Sep 11 2013

In this previous post I talked about Bokeh depth of field: where it comes from and why it is different to the types of fake depth of field effects you get in some (usually older) games. In this slightly more technical post I’ll outline a nice technique for rendering efficient depth of field, which I use in my demo code, taken from this EA talk about depth of field in Need For Speed: The Run.

The main difference is the shape of the blur – traditionally a Gaussian blur (a bell-shaped blur curve) is used, whereas real Bokeh requires a blur in the shape of the camera aperture:

Bokeh blur on the left, Gaussian on the right

The first question you might ask is: why are Gaussian blurs used instead of more realistic shapes? It comes down to rendering efficiency, and to things called separable filters. But first you need to know what a normal filter is.

Filters

You’re probably familiar with image filters from Photoshop and similar – when you perform a blur, sharpen, edge detect or any of a number of others, you’re running a filter on the image. A filter consists of a grid of numbers. Here is a simple blur filter:

\left(\begin{array}{ccc}\frac{1}{16}&\frac{2}{16}&\frac{1}{16}\\\frac{2}{16}&\frac{4}{16}&\frac{2}{16}\\\frac{1}{16}&\frac{2}{16}&\frac{1}{16}\end{array}\right)

For every pixel in the image, this grid is overlaid so that the centre number is over the current pixel and the other numbers are over the neighbouring pixels. To get the filtered result for the current pixel, the colour under each grid element is multiplied by the number over it, and then they’re all added up. So for this particular filter you can see that the result for each pixel will be 4 times the original colour, plus twice each neighbouring pixel, plus one of each diagonally neighbouring pixel, all divided by 16 so it adds up to one again. Or more simply: blend some of the surrounding eight pixels into the centre one.
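To make this concrete, here is a minimal HLSL sketch of applying that 3×3 blur in a pixel shader (the sourceImage, linearSampler and texelSize names are made up for illustration):

// Minimal sketch (not from the post): the 3x3 blur above as an HLSL pixel shader.
Texture2D    sourceImage;
SamplerState linearSampler;

cbuffer FilterConstants
{
    float2 texelSize; // 1.0 / image resolution
};

static const float kernel[3][3] =
{
    { 1.0 / 16.0, 2.0 / 16.0, 1.0 / 16.0 },
    { 2.0 / 16.0, 4.0 / 16.0, 2.0 / 16.0 },
    { 1.0 / 16.0, 2.0 / 16.0, 1.0 / 16.0 }
};

float4 BlurPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 result = 0;

    // Overlay the grid on the current pixel: multiply each neighbour by its
    // weight and add everything up.
    [unroll] for (int y = -1; y <= 1; ++y)
    {
        [unroll] for (int x = -1; x <= 1; ++x)
        {
            result += sourceImage.Sample(linearSampler, uv + float2(x, y) * texelSize)
                      * kernel[y + 1][x + 1];
        }
    }
    return result;
}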

As another example, here is a very basic edge detection filter:

\left(\begin{array}{ccc}-1&-1&-1\\-1&8&-1\\-1&-1&-1\end{array}\right)

On flat areas of the image the +8 of the centre pixel will cancel with the eight surrounding -1 values and give a black pixel. However, along the brighter side of an edge, the values won’t cancel and you’ll get bright output pixels in your filtered image.

You can find a bunch more examples, and pictures of what they do, over here.

Separable filters

These example filters are only 3×3 pixels in size, but they need to sample from the original image nine times for each pixel. A 3×3 filter can only be affected by the eight neighbouring pixels, so will only give a very small blur radius. To get a nice big blur you need a much larger filter, maybe 15×15 for a nice Gaussian. This would require 225 texture fetches for each pixel in the image, which is very slow!

Luckily, some filters have the property that they are separable. That means you can get the same end result by applying a one-dimensional filter twice, first horizontally and then vertically. So first a 15×1 filter is used to blur horizontally, and then the filter is rotated 90 degrees and the result is blurred vertically as well. This only requires 15 texture lookups per pass (as the filter only has 15 elements), giving a total of 30 texture lookups. This gives exactly the same result as performing the full 15×15 filter in one pass, which would have required 225 texture lookups.

Original image / horizontal pass / both passes
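As a hedged sketch, the same idea in HLSL might look like this – one shader run twice, with blurDirection set to (1, 0) for the horizontal pass and (0, 1) for the vertical pass, the second pass reading the output of the first (all resource and constant names here are made up):

// Separable blur sketch: 15 taps per pass instead of 225 for the full 2D filter.
Texture2D    sourceImage;
SamplerState linearSampler;

cbuffer SeparableBlurConstants
{
    float2 texelSize;     // 1.0 / image resolution
    float2 blurDirection; // (1, 0) for the horizontal pass, (0, 1) for the vertical pass
    float  weights[15];   // 1D Gaussian weights, summing to 1
};

float4 SeparableBlurPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 result = 0;
    [unroll] for (int i = 0; i < 15; ++i)
    {
        float2 offset = (i - 7) * blurDirection * texelSize;
        result += sourceImage.Sample(linearSampler, uv + offset) * weights[i];
    }
    return result;
}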

Unfortunately only a few special filters are separable – there is no way to produce the hard-edged circular filter at the top of the page with a separable filter, for example. A size n blur would require the full n-squared texture lookups, which is far too slow for large n (and you need a large blur to create a noticeable effect).

Bokeh filters

So what we need to do is find a way to use separable filters to create a plausible Bokeh shape (e.g. circle, pentagon, hexagon etc). Another type of separable filter is the box filter. Here is a 5×1 box filter:

\left(\begin{array}{ccccc}\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}&\frac{1}{5}\end{array}\right)

Apply this in both directions and you’ll see that it just turns a pixel into a 5×5 square (and in the real thing we’ll actually use something a lot bigger than 5×5). Unfortunately you don’t get square Bokeh (well, you might, but it doesn’t look nice), so we’ll have to go further.

One thing to note is that you can skew your square filter and keep it separable:

Then you could perhaps do this three times in different directions and add the results together:

And here we have a hexagonal blur, which is a much nicer Bokeh shape! Unfortunately doing all these individual blurs and adding them up is still pretty slow, but we can do some tricks to combine them together. Here is how it works.

First pass

Start with the unblurred image.

Original image

Perform a blur directly upwards, and another down and left (at 120°). You use two output textures – into one write just the upwards blur:

Output 1 – blurred upwards

Into the other write both blurs added together:

Output 2 – blurred upwards plus blurred down and left
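Here is a rough HLSL sketch of this first pass, writing to two render targets at once. The names, the fixed tap count and the example blur vectors are my own assumptions, not taken from the talk:

// First pass sketch: blur straight up into one target, up plus down-left into the other.
Texture2D    sourceImage;
SamplerState linearSampler;

cbuffer HexBlurConstants
{
    float2 blurVectorUp;       // one tap's step straight up, e.g. (0, -1) * texelSize * blurRadius / NUM_TAPS
    float2 blurVectorDownLeft; // one tap's step down-left at 120 degrees, e.g. (-0.866, 0.5) * texelSize * blurRadius / NUM_TAPS
};

static const int NUM_TAPS = 16;

float4 LineBlur(float2 uv, float2 stepVector)
{
    // Average NUM_TAPS samples along one direction, starting at the pixel itself.
    float4 sum = 0;
    [unroll] for (int i = 0; i < NUM_TAPS; ++i)
    {
        sum += sourceImage.Sample(linearSampler, uv + i * stepVector);
    }
    return sum / NUM_TAPS;
}

struct FirstPassOutput
{
    float4 blurUp             : SV_Target0; // output 1: just the upward blur
    float4 blurUpPlusDownLeft : SV_Target1; // output 2: upward plus down-left blur, added together
};

FirstPassOutput HexBlurFirstPassPS(float4 pos : SV_Position, float2 uv : TEXCOORD0)
{
    float4 up       = LineBlur(uv, blurVectorUp);
    float4 downLeft = LineBlur(uv, blurVectorDownLeft);

    FirstPassOutput result;
    result.blurUp             = up;
    result.blurUpPlusDownLeft = up + downLeft;
    return result;
}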

Second pass

The second pass uses the two output images from above and combines them into the final hexagonal blur. Blur the first texture (the vertical blur) down and left at 120° to make a rhombus. This is the upper left third of the hexagon:

Intermediate 1 – first texture blurred down and left

At the same time, blur the second texture (vertical plus diagonal blur) down and right at 120° to make the other two thirds of the hexagon:

Intermediate 2 – second texture blurred down and right

Finally, add both of these blurs together and divide by three (each individual blur preserves the total brightness of the image, but the final stage adds together three of them – one from the first input texture and two from the second input texture). This gives you your final hexagonal blur:

Final combined output
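And a matching sketch of the second pass (again, names and tap count are assumptions; LineBlur is the same helper as in the first-pass sketch, but taking the texture to sample):

// Second pass sketch: smear the two first-pass targets along the remaining diagonals and combine.
Texture2D    blurUpTexture;             // output 1 from the first pass
Texture2D    blurUpPlusDownLeftTexture; // output 2 from the first pass
SamplerState linearSampler;

cbuffer HexBlurConstants
{
    float2 blurVectorDownLeft;  // down-left at 120 degrees from vertical
    float2 blurVectorDownRight; // down-right at 120 degrees from vertical
};

static const int NUM_TAPS = 16;

float4 LineBlur(Texture2D tex, float2 uv, float2 stepVector)
{
    float4 sum = 0;
    [unroll] for (int i = 0; i < NUM_TAPS; ++i)
    {
        sum += tex.Sample(linearSampler, uv + i * stepVector);
    }
    return sum / NUM_TAPS;
}

float4 HexBlurSecondPassPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // Upper-left rhombus: the vertical blur smeared down and to the left.
    float4 upperLeft = LineBlur(blurUpTexture, uv, blurVectorDownLeft);

    // The other two thirds: (vertical + down-left) smeared down and to the right.
    float4 rest = LineBlur(blurUpPlusDownLeftTexture, uv, blurVectorDownRight);

    // Three rhombi in total, so divide by three to preserve brightness.
    return (upperLeft + rest) / 3.0;
}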

Controlling the blur size

So far in this example, every pixel has been blurred into the same sized large hexagon. However, depth of field effects require different sized blurs for each pixel. Ideally, each pixel would scatter colour onto surrounding pixels depending on how blurred it is (and this is how the draw-a-sprite-for-each-pixel techniques work). Unfortunately we can’t do that in this case – the shader is applied by drawing one large polygon over the whole screen so each pixel is only written to once, and can therefore only gather colour data from surrounding pixels in the input textures. Thus for each pixel the shader outputs, it has to know which surrounding pixels are going to blur into it. This requires a bit of extra work.

The alpha channel of the original image is unused so far. In a previous pass we can use the depth of that pixel to calculate the blur size, and write it into the alpha channel. The size of the blur (i.e. the size of the circle of confusion) for each pixel is determined by the physical properties of the camera: the focal distance, the aperture size and the distance from the camera to the object. You can work out the CoC size by using a bit of geometry which I won’t go into. The calculation looks like this if you’re interested (taken from the talk again):

CoCSize = z * CoCScale + CoCBias
CoCScale = (A * focalLength * focalPlane * (zFar - zNear)) / ((focalPlane - focalLength) * zNear * zFar)
CoCBias = (A * focalLength * (zNear - focalPlane)) / ((focalPlane - focalLength) * zNear)

[A is the aperture size, focalLength is a property of the lens, and focalPlane is the distance from the camera that is in focus. zFar and zNear are from the projection matrix, and all that stuff is required to convert post-projection Z values back into real-world units. CoCScale and CoCBias are constant across the whole frame, so the only calculation done per-pixel is a multiply and an add, which is quick. Edit – thanks to Vincent for pointing out the previous error in CoCBias!]
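As a minimal sketch, the pre-pass might look something like this in HLSL – the constants are computed once per frame on the CPU from the formulas above, and the texture and constant names are made up:

// CoC pre-pass sketch: z is the post-projection depth buffer value, which is
// what the scale/bias formulation above expects.
Texture2D    sceneColour;
Texture2D    sceneDepth;
SamplerState pointSampler;

cbuffer DepthOfFieldConstants
{
    float cocScale; // computed once per frame on the CPU from the formula above
    float cocBias;  // likewise
};

float4 WriteCoCPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float z   = sceneDepth.Sample(pointSampler, uv).r;
    float coc = z * cocScale + cocBias; // one multiply and one add per pixel

    // Pack the signed CoC size (negative in front of the focal plane) into the
    // otherwise unused alpha channel of the colour buffer.
    return float4(sceneColour.Sample(pointSampler, uv).rgb, coc);
}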

In the images above, every pixel is blurred by the largest amount. Now we can have different blur sizes per pixel. Because any pixel could have another pixel blurring over it, a full-sized blur must always be performed. When sampling each pixel from the input texture, the CoCSize of that pixel is compared with how far it is from the pixel being shaded, and if it’s bigger then it’s added in. This means that in scenes with little blurring there are a lot of wasted texture lookups, but it’s the only way to simulate pixel ‘scatter’ in a ‘gather’ shader.

Per-pixel blur size – near blur, in focus and far blur

Another little issue is that blur sizes can only grow by a whole pixel at a time, which introduces some ugly popping as the CoCSize changes (e.g. when the camera moves). To reduce this you can soften the edge – for example, when sampling a pixel 5 pixels away, fade its contribution in as the CoCSize goes from 4 to 5 pixels.
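One way to express that per-sample test with the softened edge is the small helper below (just a sketch, not my exact shader – dist is how many pixels away the sample is along the blur direction, and sampleCoC is the CoC size read from that sample’s alpha channel):

// 0 when the sample's CoC is a whole pixel too small to reach this pixel,
// 1 when it fully covers it, with a linear ramp in between to hide popping.
float SampleWeight(float sampleCoC, float dist)
{
    return saturate(sampleCoC - dist + 1.0);
}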

Near and far depth of field

There are a couple of subtleties with near and far depth of field. Objects behind the focal plane don’t blur over things that are in focus, but objects in front do (do an image search for “depth of field” to see examples of this). Therefore, when sampling to see if another pixel is going to blur over the one you’re currently shading, make sure that either it’s in front of the focal plane (its CoCSize is negative), or both the currently shaded pixel and the sampled pixel are behind the focal plane and the sampled pixel isn’t too far behind (in my implementation, ‘too far’ is more than twice the CoCSize).

[Edit: tweaked the implementation of when to use the sampled pixel]

This isn’t perfect because objects at different depths don’t properly occlude each others’ blurs, but it still looks pretty good and catches the main cases.
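Putting those acceptance rules into code, a sketch might look like the function below; it would be combined with the soft-edge weight from earlier inside the gather loop. Note that I’m reading ‘twice the CoCSize’ as twice the CoC of the pixel being shaded – treat that as an assumption:

// centreCoC is the CoC of the pixel being shaded, sampleCoC the CoC of the
// pixel being gathered; negative values mean in front of the focal plane.
bool UseSample(float centreCoC, float sampleCoC)
{
    // Near-field pixels blur over everything behind them.
    if (sampleCoC < 0.0)
        return true;

    // Far-field pixels only blur over other far-field pixels, and only if the
    // sample isn't too far behind the pixel being shaded.
    return centreCoC > 0.0 && sampleCoC < centreCoC * 2.0;
}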

And finally, here’s some shader code.

Jul 01 2013

So far I’ve covered the basics of getting objects on the screen with textures, lighting, reflections and shadows. The results look reasonable but to increase the realism you need to more accurately simulate the behaviour of light when a real image is seen. An important part of this is the characteristics of the camera itself.

One important property of cameras is depth of field. Depth of field effects have been in games for a good few years, but it’s only recently that we’ve started doing them ‘properly’ in real-time graphics.

What is depth of field?

Cameras can’t keep everything in focus at once – they have a focal depth, which is the distance from the camera at which objects are perfectly in focus. The further away from this distance you get, the more out of focus an object becomes. Depth of field is the size of the region in which the image looks sharp. Because every real image we see is seen through a lens (whether in a camera or in the eye), to make a believable image we need to simulate this effect.

The quick method

Until the last couple of years, most depth of field effects were done using a ‘hack it and hope’ approach – do something that looks vaguely right and is quick to render. In this case, we just need to make objects outside of a certain depth range look blurry.

So first you need a blurry version of the screen. To do this you draw everything in the scene as normal, and then create a blurred version as a separate texture. There are a few methods of blurring the screen, depending on how much processing time you want to spend. The quickest and simplest is to scale the texture down four times and then scale it back up again, letting the texture filtering fill in the extra pixels. Or, if you’re really posh, you can use a 5×5 Gaussian blur (or something similar), which gives a smoother blur (especially noticeable when the camera moves). You should be able to see that the upscaled version looks more pixelated:

Blurring using reduce and upscale, and a Gaussian blur
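For reference, here is a minimal sketch of the ‘posh’ option – a one-pass 5×5 Gaussian blur of the screen into a separate texture. The resource names are made up, and in practice you would usually split this into two separable 5-tap passes:

// One-pass 5x5 Gaussian blur sketch.
Texture2D    sceneColour;
SamplerState linearSampler;

cbuffer BlurConstants
{
    float2 texelSize; // 1.0 / resolution of the texture being blurred
};

// 1D binomial weights (1 4 6 4 1) / 16; the 2D kernel is the outer product
// of this row with itself.
static const float weights1D[5] = { 1.0 / 16.0, 4.0 / 16.0, 6.0 / 16.0, 4.0 / 16.0, 1.0 / 16.0 };

float4 GaussianBlurPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 result = 0;
    [unroll] for (int y = -2; y <= 2; ++y)
    {
        [unroll] for (int x = -2; x <= 2; ++x)
        {
            float weight = weights1D[x + 2] * weights1D[y + 2];
            result += sceneColour.Sample(linearSampler, uv + float2(x, y) * texelSize) * weight;
        }
    }
    return result;
}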

Then you make up four distance values: near blur minimum and maximum distances, and far blur minimum and maximum distances. The original image and the blurry version are then blended together to give the final image – further away than the ‘far minimum’ distance you blend in more and more of the blurry image, up until you’re showing the fully blurred image at the ‘far maximum’ distance (and the same for the near blur).
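A sketch of that final blend might look like this (the four distances and all resource names are made up, and depth is assumed to already be a linear distance from the camera):

// Blend the sharp and blurry images based on depth.
Texture2D    sharpImage;
Texture2D    blurryImage;
Texture2D    sceneDepth;
SamplerState linearSampler;

cbuffer DepthOfFieldConstants
{
    float nearBlurMin; // fully blurred at or closer than this distance
    float nearBlurMax; // fully sharp beyond this distance...
    float farBlurMin;  // ...until this distance, where the far blur starts
    float farBlurMax;  // fully blurred again beyond this distance
};

float4 BlendDofPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float depth = sceneDepth.Sample(linearSampler, uv).r;

    // 0 = fully sharp, 1 = fully blurred, ramping over the near and far ranges.
    float nearBlur   = 1.0 - smoothstep(nearBlurMin, nearBlurMax, depth);
    float farBlur    = smoothstep(farBlurMin, farBlurMax, depth);
    float blurAmount = saturate(nearBlur + farBlur);

    return lerp(sharpImage.Sample(linearSampler, uv),
                blurryImage.Sample(linearSampler, uv),
                blurAmount);
}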

In the end you get something that looks a bit like this (view the full size image to see more pronounced blurring in the distance):

Depth of field in Viva Piñata

This looks fairly OK, but it’s nothing like how a real camera works. To get better results we need to go back to the theory and understand the reasons you get depth of field in real cameras.

Understanding it properly

Light bounces off objects in the world in all directions. Cameras and eyes have fairly large openings to let in lots of light, which means they capture a cone of light bounced off the object (light can bounce off the object anywhere within a cone of angles and still enter the camera). Therefore, cameras need lenses to focus all this light back onto a single point.

Light cones from objects at different distances to a lens

A lens bends all incoming light the same. This means that light bouncing off objects at different distances from the lens will converge at different points on the other side of it. In the diagram the central object is in focus, because the red lines converge on the projection plane. The green light converges too early because the object is too far away. The blue light converges too late because the object is too close.

What the focus control on your camera does is move the lens backwards and forwards. You can see that moving the lens away from the projection plane would mean that the blue lines converge on the plane, so closer objects would be in focus.

There is a technical term, circle of confusion (CoC), which is the circular area on the projection plane over which the light from an object is spread. The red lines show a very tiny CoC, while the blue lines show a larger one. The green lines show the largest CoC of the three objects, as the light is spread out over a large area. This is what causes the blur on out-of-focus objects, as their light is spread over the image. This picture is a great example of the effect, where the light from each individual bulb on the Christmas tree is spread into a perfect circle:

Bokeh

The circle of confusion doesn’t always appear circular. It is circular in some cases because the aperture of the camera is circular, letting in light from a full cone. When the aperture is partly closed it becomes more pentagonal/hexagonal/octagonal, depending on how many blades make up the aperture. Light is blocked by the blades, so the CoC will actually take the shape of the aperture.

This lens has an aperture with six blades, so will give a hexagonal circle of confusion:

So why is simulating Bokeh important? It can be used for artistic effect because it gives a nice quality to the blur, and also it will give you a more believable image because it will simulate how a camera actually works. Applying a Gaussian blur to the Christmas tree picture would give an indistinct blurry mess, but the Bokeh makes the individual bright lights stand out even though they are out of focus.

Here is the difference between applying a Bokeh blur to a bright pixel, compared to a Gaussian blur. As you can see, the Gaussian smears out a pixel without giving those distinct edges:

Bokeh blur on the left, Gaussian on the right

Using Bokeh in real-time graphics

In principle, Bokeh depth of field isn’t complicated to implement in a game engine. For any pixel you can work out the size of the CoC from the depth, focal length and aperture size. If the CoC is smaller than one pixel then it’s completely in focus, otherwise the light from that pixel will be spread over a number of pixels, depending on the CoC size. The use of real camera controls such as aperture size and focal length means that your game camera now functions much more like a real camera with the same settings, and setting up cameras will be easier for anyone who is familiar with real cameras.

In practice, Bokeh depth of field isn’t trivial to implement in real-time. Gaussian blurs are relatively fast (and downsize/upscaling is even faster), which is why these types of blurs were used for years. There aren’t any similarly quick methods of blurring an image with an arbitrarily shaped blur (i.e. to get a blur like the left image above, rather than the right).

However, GPUs are getting powerful enough to use a brute force approach, which is the approach that was introduced in Unreal Engine 3. You draw a texture of your Bokeh shape (anything you like), and then for each pixel in your image you work out the CoC size (from the depth and camera settings). Then, to make your final image, instead of drawing a single pixel for each pixel in the original image you draw a sprite using the Bokeh texture. Draw the sprite the same colour as the original pixel, and the same size as the CoC. This will accurately simulate the light from a pixel being spread over a wide area. Here it is in action, courtesy of Unreal Engine:

Depth of field with different Bokeh shapes
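To illustrate the idea (this is not Unreal’s actual code, just a hedged sketch with made-up names), a geometry shader could expand one point per source pixel into a quad scaled by that pixel’s CoC; the pixel shader (not shown) would sample the Bokeh shape texture, tint it with the pixel’s colour and divide by the sprite area so total brightness is preserved:

// Expand one point per source pixel into a Bokeh sprite sized by its CoC.
struct GSInput
{
    float4 colourAndCoC : COLOR0;    // rgb = pixel colour, a = CoC size in pixels
    float2 clipPos      : TEXCOORD0; // pixel centre in clip space
};

struct GSOutput
{
    float4 pos    : SV_Position;
    float2 uv     : TEXCOORD0;
    float4 colour : COLOR0;
};

cbuffer SpriteConstants
{
    float2 pixelToClip; // 2.0 / render target resolution
};

[maxvertexcount(4)]
void BokehSpriteGS(point GSInput input[1], inout TriangleStream<GSOutput> stream)
{
    float  coc      = input[0].colourAndCoC.a; // sprite radius in pixels
    float2 halfSize = coc * pixelToClip;       // convert pixels to clip-space units

    const float2 corners[4] = { float2(-1, -1), float2(-1, 1), float2(1, -1), float2(1, 1) };

    [unroll] for (int i = 0; i < 4; ++i)
    {
        GSOutput o;
        o.pos    = float4(input[0].clipPos + corners[i] * halfSize, 0, 1);
        o.uv     = corners[i] * 0.5 + 0.5; // UV into the Bokeh shape texture
        o.colour = float4(input[0].colourAndCoC.rgb, 1);
        stream.Append(o);
    }
}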

The downside of this technique is that it’s very slow. If your maximum Bokeh sprite size is, say, 8 pixels wide, then in the worst case each pixel in the final image will be made up of 64 composited textures. Doubling the width of the blur increases the fill cost by four times. This approach looks really nice, but you need to use some tricks to get it performing well on anything but the most powerful hardware (for example, draw one sprite for every 2×2 block of pixels to reduce the fill cost).

An alternative method

There is a good alternative method that I like which is much quicker to draw, and I shall go through that soon in a more technical post. It was presented in this talk from EA at Siggraph 2011, but it takes a bit of thought to decipher the slides into a full implementation so I’ll try to make it clearer. This is actually the technique I use in my Purple Space demo.

Cheaper depth of field effect