In this previous post I talked about Bokeh depth of field, where it comes from and why it is different to the type of fake depth of field effects you get in some (usually older) games. In this slightly more technical post I’ll be outlining a nice technique for rendering depth of field efficiently, which I use in my demo code. It’s taken from this EA talk about the depth of field in Need For Speed: The Run.
The main difference is the shape of the blur – traditionally a Gaussian blur is performed (a blur with a bell-shaped falloff), whereas real Bokeh requires a blur in the shape of the camera aperture:
The first question you might be asking is why are Gaussian blurs used instead of more realistic shapes? It comes down to rendering efficiency, and things called separable filters. But first you need to know what a normal filter is.
Filters
You’re probably familiar with image filters from Photoshop and similar – when you perform a blur, sharpen, edge detect or any of a number of others, you’re running a filter on the image. A filter consists of a grid of numbers. Here is a simple blur filter:
For every pixel in the image, this grid is overlaid so that the centre number is over the current pixel and the other numbers are over the neighbouring pixels. To get the filtered result for the current pixel, the colour under each grid element is multiplied by the number over it, and then they’re all added up. So for this particular filter you can see that the result for each pixel will be 4 times the original colour, plus twice each neighbouring pixel, plus one of each diagonally neighbouring pixel, all divided by 16 so the weights add up to one again. Or more simply: blend some of the surrounding eight pixels into the centre one.
As another example, here is a very basic edge detection filter:
On flat areas of the image the +8 of the centre pixel will cancel with the eight surrounding -1 values and give a black pixel. However, along the brighter side of an edge, the values won’t cancel and you’ll get bright output pixels in your filtered image.
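To make that concrete, here’s a rough GLSL sketch of a fragment shader applying the 3×3 blur filter above (the texture and uniform names are just placeholders of my own, not from any particular engine):

```glsl
#version 330 core

uniform sampler2D uImage;   // input image
uniform vec2 uTexelSize;    // 1.0 / image resolution

in vec2 vUV;
out vec4 fragColor;

// The 3x3 blur grid from above, already divided by 16 so the weights sum to one
const float kWeights[9] = float[9](
    1.0/16.0, 2.0/16.0, 1.0/16.0,
    2.0/16.0, 4.0/16.0, 2.0/16.0,
    1.0/16.0, 2.0/16.0, 1.0/16.0
);

void main()
{
    vec4 sum = vec4(0.0);
    for (int y = -1; y <= 1; ++y)
    {
        for (int x = -1; x <= 1; ++x)
        {
            // Multiply each neighbouring pixel by the weight overlaid on it, and add up
            float w = kWeights[(y + 1) * 3 + (x + 1)];
            sum += w * texture(uImage, vUV + vec2(x, y) * uTexelSize);
        }
    }
    // Swap kWeights for the -1/+8 grid (without the divide) to get the edge detect instead
    fragColor = sum;
}
```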
You can find a bunch more examples, and pictures of what they do, over here.
Separable filters
These example filters are only 3×3 pixels in size, but they need to sample from the original image nine times for each pixel. A 3×3 filter can only be affected by the eight neighbouring pixels, so will only give a very small blur radius. To get a nice big blur you need a much larger filter, maybe 15×15 for a nice Gaussian. This would require 225 texture fetches for each pixel in the image, which is very slow!
Luckily some filters have the property that they are separable. That means you can get the same end result by applying a one-dimensional filter twice, first horizontally and then vertically. So first a 15×1 filter is used to blur horizontally, and then the filter is rotated 90 degrees and the result is blurred vertically as well. Each pass only requires 15 texture lookups per pixel (as the filter only has 15 elements), giving a total of 30 – yet it produces exactly the same result as performing the full 15×15 filter in one pass, which needed 225 texture lookups.
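As a sketch of what one of those 1D passes might look like (the names, the 15-tap size and the idea of passing the direction as a uniform are all just my illustration), the same shader can be run twice, once per direction, with the intermediate result written to a temporary texture:

```glsl
#version 330 core

uniform sampler2D uImage;
uniform vec2 uTexelSize;    // 1.0 / image resolution
uniform vec2 uDirection;    // (1,0) for the horizontal pass, (0,1) for the vertical pass

in vec2 vUV;
out vec4 fragColor;

const int RADIUS = 7;                 // 15 taps in total: -7..+7
uniform float uWeights[RADIUS + 1];   // Gaussian weights for offsets 0..7 (symmetric)

void main()
{
    vec4 sum = uWeights[0] * texture(uImage, vUV);
    for (int i = 1; i <= RADIUS; ++i)
    {
        vec2 offset = float(i) * uDirection * uTexelSize;
        // The kernel is symmetric, so each weight covers the taps on both sides
        sum += uWeights[i] * (texture(uImage, vUV + offset) +
                              texture(uImage, vUV - offset));
    }
    fragColor = sum;
}
```

The weights should sum to one across all 15 taps so the overall brightness of the image is preserved.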
Unfortunately only a few special filters are separable – there is no way to produce the hard-edged circular filter at the top of the page with a separable filter, for example. A size n blur would require the full n-squared texture lookups, which is far too slow for large n (and you need a large blur to create a noticeable effect).
Bokeh filters
So what we need to do is find a way to use separable filters to create a plausible Bokeh shape (e.g. circle, pentagon, hexagon etc). Another type of separable filter is the box filter. Here is a 5×1 box filter:
Apply this in both directions and you’ll see that it just turns each pixel into a 5×5 square (and in the real thing we’ll actually use something a lot bigger than 5×5). Unfortunately you don’t get square Bokeh (well, you might, but it doesn’t look nice), so we’ll have to go further.
One thing to note is that you can skew your square filter and keep it separable:
Then you could perhaps do this three times in different directions and add the results together:
And here we have a hexagonal blur, which is a much nicer Bokeh shape! Unfortunately doing all these individual blurs and adding them up is still pretty slow, but we can do some tricks to combine them together. Here is how it works.
First pass
Start with the unblurred image.
Perform a blur directly upwards, and another down and left (at 120°). You use two output textures – into one write just the upwards blur:
Into the other write both blurs added together:
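As a rough sketch of that first pass (the uniform names, the sample count and the blurInDirection helper are my own illustration, not the exact shader from the talk), a single fragment shader can write both results using two render targets:

```glsl
#version 330 core

uniform sampler2D uImage;
uniform vec2 uTexelSize;    // 1.0 / image resolution
uniform float uBlurRadius;  // blur size in pixels

in vec2 vUV;
layout(location = 0) out vec4 outVertical;          // just the upwards blur
layout(location = 1) out vec4 outVerticalDiagonal;  // upwards + down-left blurs added together

const int NUM_SAMPLES = 16;

// One-sided blur: average the image along 'direction' starting at uv
vec4 blurInDirection(vec2 uv, vec2 direction)
{
    vec4 sum = vec4(0.0);
    for (int i = 0; i < NUM_SAMPLES; ++i)
    {
        float t = float(i) / float(NUM_SAMPLES - 1);
        sum += texture(uImage, uv + direction * t);
    }
    return sum / float(NUM_SAMPLES);
}

void main()
{
    // Straight up, and down-left at 120 degrees from it
    vec2 up       = vec2(0.0, 1.0)        * uBlurRadius * uTexelSize;
    vec2 downLeft = vec2(-0.866025, -0.5) * uBlurRadius * uTexelSize;

    vec4 vertical = blurInDirection(vUV, up);
    vec4 diagonal = blurInDirection(vUV, downLeft);

    outVertical         = vertical;
    outVerticalDiagonal = vertical + diagonal;
}
```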
Second pass
The second pass uses the two output images from above and combines them into the final hexagonal blur. Blur the first texture (the vertical blur) down and left at 120° to make a rhombus. This is the upper left third of the hexagon:
At the same time, blur the second texture (vertical plus diagonal blur) down and right at 120° to make the other two thirds of the hexagon:
Finally, add both of these blurs together and divide by three (each individual blur preserves the total brightness of the image, but the final stage adds together three lots of these – one in the first input texture and two in the second input texture). This gives you your final hexagonal blur:
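And a matching sketch of the second pass, reading the two textures written by the first pass (same caveats as above about the names and the helper):

```glsl
#version 330 core

uniform sampler2D uVertical;          // first output of the first pass
uniform sampler2D uVerticalDiagonal;  // second output of the first pass
uniform vec2 uTexelSize;
uniform float uBlurRadius;

in vec2 vUV;
out vec4 fragColor;

const int NUM_SAMPLES = 16;

vec4 blurInDirection(sampler2D image, vec2 uv, vec2 direction)
{
    vec4 sum = vec4(0.0);
    for (int i = 0; i < NUM_SAMPLES; ++i)
    {
        float t = float(i) / float(NUM_SAMPLES - 1);
        sum += texture(image, uv + direction * t);
    }
    return sum / float(NUM_SAMPLES);
}

void main()
{
    vec2 downLeft  = vec2(-0.866025, -0.5) * uBlurRadius * uTexelSize;
    vec2 downRight = vec2( 0.866025, -0.5) * uBlurRadius * uTexelSize;

    // Upper-left third of the hexagon: the vertical blur smeared down-left
    vec4 upperLeft = blurInDirection(uVertical, vUV, downLeft);

    // The other two thirds: the vertical + diagonal blur smeared down-right
    vec4 rest = blurInDirection(uVerticalDiagonal, vUV, downRight);

    // Three rhombi in total, so divide by three to keep the brightness constant
    fragColor = (upperLeft + rest) / 3.0;
}
```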
Controlling the blur size
So far in this example, every pixel has been blurred into the same sized large hexagon. However, depth of field effects require different sized blurs for each pixel. Ideally, each pixel would scatter colour onto surrounding pixels depending on how blurred it is (and this is how the draw-a-sprite-for-each-pixel techniques work). Unfortunately we can’t do that in this case – the shader is applied by drawing one large polygon over the whole screen so each pixel is only written to once, and can therefore only gather colour data from surrounding pixels in the input textures. Thus for each pixel the shader outputs, it has to know which surrounding pixels are going to blur into it. This requires a bit of extra work.
The alpha channel of the original image is unused so far. In a previous pass we can use each pixel’s depth to calculate its blur size and write it into the alpha channel. The size of the blur (i.e. the size of the circle of confusion) for each pixel is determined by the physical properties of the camera: the focal distance, the aperture size and the distance from the camera to the object. You can work out the CoC size using a bit of geometry which I won’t go into, but the calculation looks like this if you’re interested (taken from the talk again):
CoCSize = z * CoCScale + CoCBias
CoCScale = (A * focalLength * focalPlane * (zFar - zNear)) / ((focalPlane - focalLength) * zNear * zFar)
CoCBias = (A * focalLength * (zNear - focalPlane)) / ((focalPlane - focalLength) * zNear)
[A is aperture size, focal length is a property of the lens, focal plane is the distance from the camera that is in focus. zFar and zNear are from the projection matrix, and all that stuff is required to convert post-projection Z values back into real-world units. CoCScale and CoCBias are constant across the whole frame, so the only calculation done per-pixel is a multiply and add, which is quick. Edit – thanks to Vincent for pointing out the previous error in CoCBias!]
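In code, the pre-pass that writes the CoC size into the alpha channel might look something like the following sketch (the uniform names are mine, and it assumes the depth texture holds the post-projection z used in the formula above, with CoCScale and CoCBias computed once per frame on the CPU):

```glsl
#version 330 core

// Computed once per frame on the CPU from the formula above and uploaded as uniforms
uniform float uCoCScale;
uniform float uCoCBias;

uniform sampler2D uColour;
uniform sampler2D uDepth;   // post-projection depth buffer

in vec2 vUV;
out vec4 fragColor;

void main()
{
    float z = texture(uDepth, vUV).r;

    // The only per-pixel work: one multiply and one add.
    // Negative in front of the focal plane, positive behind it.
    float cocSize = z * uCoCScale + uCoCBias;

    // Keep the colour and stash the CoC size in the otherwise unused alpha channel
    fragColor = vec4(texture(uColour, vUV).rgb, cocSize);
}
```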
In the images above, every pixel is blurred by the largest amount. Now we can have different blur sizes per pixel. Because any surrounding pixel could potentially blur over the one being shaded, a full-sized blur must always be performed. When sampling each pixel from the input texture, its CoCSize is compared with how far away it is from the pixel being shaded, and if it’s larger then the sample is added in. This means that in scenes with little blurring there are a lot of wasted texture lookups, but it’s the only way to simulate pixel ‘scatter’ in a ‘gather’ shader.
Another little issue is that blur sizes can only grow by a whole pixel at a time, which introduces some ugly popping as the CoCSize changes (e.g. when the camera moves). To reduce this you can soften the edge – for example, if sampling a pixel 5 pixels away, fade its contribution out as that pixel’s CoCSize drops from 5 to 4 pixels.
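Putting those two ideas together, the 1D blur helper from the earlier sketches could become something like this (again only a sketch of the idea, not the exact NFS shader; the extra maxRadiusPixels parameter is the length of the blur direction expressed in pixels):

```glsl
const int NUM_SAMPLES = 16;

// One-sided blur where each tap only contributes if its own CoC (stored in alpha)
// is big enough to reach the pixel being shaded, which simulates 'scatter' in a
// gather shader, with a one-pixel soft edge to avoid popping.
vec4 blurInDirection(sampler2D image, vec2 uv, vec2 direction, float maxRadiusPixels)
{
    vec4 centre = texture(image, uv);
    vec3 sum = centre.rgb;              // the centre pixel always contributes
    float weightSum = 1.0;

    for (int i = 1; i < NUM_SAMPLES; ++i)
    {
        float t = float(i) / float(NUM_SAMPLES - 1);
        float dist = t * maxRadiusPixels;        // distance of this tap in pixels

        vec4 s = texture(image, uv + direction * t);
        float coc = abs(s.a);                    // CoC size written in the pre-pass

        // Fade the tap in over one pixel as its CoC grows past the tap distance
        float weight = smoothstep(dist - 1.0, dist, coc);

        sum += s.rgb * weight;
        weightSum += weight;
    }

    // Carry the centre pixel's CoC through in alpha so later passes can still read it
    return vec4(sum / weightSum, centre.a);
}
```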
Near and far depth of field
There are a couple of subtleties with near and far depth of field. Objects behind the focal plane don’t blur over things that are in focus, but objects in front do (do an image search for “depth of field” to see examples of this). Therefore, when sampling to see whether another pixel will blur over the one you’re currently shading, make sure that either the sampled pixel is in front of the focal plane (its CoCSize is negative), or both the sampled pixel and the currently shaded pixel are behind the focal plane and the sampled pixel isn’t too far behind (in my implementation ‘too far’ is more than twice the CoCSize).
[Edit: tweaked the implementation of when to use the sampled pixel]
This isn’t perfect because objects at different depths don’t properly occlude each other’s blurs, but it still looks pretty good and catches the main cases.
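As a sketch of how that test can be folded into the gather loop, the per-tap weight might be computed like this; this is my own reading of the rules above (in particular, ‘too far behind’ is interpreted here as the sample’s CoC being more than twice the CoC of the pixel being shaded):

```glsl
// Decide how much a sampled pixel is allowed to blur over the pixel being shaded.
// centreCoC and sampleCoC are signed: negative in front of the focal plane,
// positive behind it. dist is the tap distance in pixels.
float sampleWeight(float centreCoC, float sampleCoC, float dist)
{
    // The sample's CoC must be big enough to reach this pixel at all (soft edge as before)
    float reach = smoothstep(dist - 1.0, dist, abs(sampleCoC));

    // In front of the focal plane: allowed to blur over anything
    bool nearField = sampleCoC < 0.0;

    // Behind the focal plane: only blurs over other pixels behind the focal plane,
    // and only if it isn't too far behind (read here as: its CoC is no more than
    // twice the CoC of the pixel being shaded)
    bool farField = centreCoC > 0.0 && sampleCoC > 0.0 &&
                    sampleCoC <= 2.0 * centreCoC;

    return (nearField || farField) ? reach : 0.0;
}
```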
And finally, here’s some shader code.