So far I’ve covered the basics of getting objects on the screen with textures, lighting, reflections and shadows. The results look reasonable but to increase the realism you need to more accurately simulate the behaviour of light when a real image is seen. An important part of this is the characteristics of the camera itself.
One important property of cameras is depth of field. Depth of field effects have been in games for a good few years, but it’s only recently that we’ve started to do them ‘properly’ in real-time graphics.
What is depth of field?
Cameras can’t keep everything in focus at once – they have a focal depth which is the distance from the camera where objects are perfectly in focus. The further away from this distance you get, the more out of focus an object becomes. Depth of field is the size of the region in which the image looks sharp. Because every real image we see is seen through a lens (whether in a camera or in the eye), to make a believable image we need to simulate this effect.
The quick method
Until the last couple of years, most depth of field effects were done using a ‘hack it and hope’ approach – do something that looks vaguely right and is quick to render. In this case, we just need to make objects outside of a certain depth range look blurry.
So first you need a blurry version of the screen. To do this you draw everything in the scene as normal, and then create a blurred version as a separate texture. There are a few methods of blurring the screen, depending on how much processing time you want to spend. The quickest and simplest is to scale down the texture four times and then scale it back up again, where the texture filtering will fill in the extra pixels. Or, if you’re really posh, you can use a 5×5 Gaussian blur (or something similar) which gives a smoother blur (especially noticeable when the camera moves). You should be able to see that the upscaled version looks more pixelated:
Then you make up four distance values: near blur minimum and maximum distances, and far blur minimum and maximum distances. The original image and the blurry version are then blended together to give the final image – further away than the ‘far minimum’ distance you blend in more and more of the blurry image, up until you’re showing the fully blurred image at the ‘far maximum’ distance (and the same for the near blur).
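As a rough sketch of that blend, here is one way to turn a pixel's depth into a blur amount and mix the two images. The function and parameter names are made up for illustration, and it assumes the four distances are ordered near_full < near_start <= far_start < far_full, with a simple linear ramp between each pair (a real shader might use a different curve).

```python
def blur_amount(depth, near_full, near_start, far_start, far_full):
    """Return 0.0 (fully sharp) to 1.0 (fully blurred) for a given depth.

    Assumes near_full < near_start <= far_start < far_full (hypothetical names).
    """
    if depth < near_start:
        # Near blur: ramps up as the object gets closer to the camera.
        t = (near_start - depth) / (near_start - near_full)
    elif depth > far_start:
        # Far blur: ramps up as the object gets further away.
        t = (depth - far_start) / (far_full - far_start)
    else:
        # Inside the sharp range.
        t = 0.0
    return min(max(t, 0.0), 1.0)


def blend(sharp, blurred, amount):
    """Linearly interpolate between the sharp and blurred pixel colours."""
    return tuple(s + (b - s) * amount for s, b in zip(sharp, blurred))
```

In a real renderer this would run per pixel in a shader, reading the depth from the depth buffer and the two colours from the original and blurred textures.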
In the end you get something that looks a bit like this (view the full size image to see more pronounced blurring in the distance):
This looks fairly OK, but it’s nothing like how a real camera works. To get better results we need to go back to the theory and understand the reasons you get depth of field in real cameras.
Understanding it properly
Light bounces off objects in the world in all directions. Cameras and eyes have fairly large openings to let in lots of light, which means that they will capture a cone of light bounced off from the object (light can bounce off the object anywhere within a cone of angles and still enter the camera). Therefore, cameras need lenses to focus all this light back onto a single point.
A lens bends all incoming light in the same way. This means that light bouncing off objects at different distances from the lens will converge at different points on the other side of it. In the diagram the central object is in focus, because the red lines converge on the projection plane. The green light converges too early because the object is too far away. The blue light converges too late because the object is too close.
What the focus control on your camera does is move the lens backwards and forwards. You can see that moving the lens away from the projection plane would mean that the blue lines converge on the plane, so closer objects would be in focus.
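The standard thin-lens equation, 1/f = 1/object_distance + 1/image_distance, puts numbers on this. It is an idealisation that ignores lens thickness and aberrations, and the function name below is just for illustration.

```python
def image_distance(focal_length, object_distance):
    """Distance behind an ideal thin lens at which light from the object converges.

    Rearranged from 1/f = 1/object_distance + 1/image_distance.
    Both arguments share the same unit, and object_distance must exceed focal_length.
    """
    return focal_length * object_distance / (object_distance - focal_length)
```

For a 50mm lens, an object 2000mm away converges about 51.3mm behind the lens, while an object 500mm away converges about 55.6mm behind it. Closer objects converge further back, which is why the lens has to move away from the projection plane to bring them into focus.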
There is a technical term, circle of confusion (CoC), which is the circular area over which the light from an object is spread on the projection plane. The red lines show a very tiny CoC, while the blue lines show a larger one. The green lines show the largest CoC of the three objects, as the light is spread out over a large area. This is what causes the blur on out of focus objects, as their light is spread over the image. This picture is a great example of this effect, where the light from each individual bulb on the Christmas tree is spread into a perfect circle:
The circle of confusion doesn’t always appear circular. It is circular in some cases because the aperture of the camera is circular, letting in light from a full cone. When the aperture is partly closed it becomes more pentagonal/hexagonal/octagonal, depending on how many blades make up the aperture. Light is blocked by the blades, so the CoC will actually take the shape of the aperture.
This lens has an aperture with six blades, so will give a hexagonal circle of confusion:
This shaped blur is known as Bokeh. So why is simulating Bokeh important? It can be used for artistic effect because it gives a nice quality to the blur, and also it will give you a more believable image because it will simulate how a camera actually works. Applying a Gaussian blur to the Christmas tree picture would give an indistinct blurry mess, but the Bokeh makes the individual bright lights stand out even though they are out of focus.
Here is the difference between applying a Bokeh blur to a bright pixel, compared to a Gaussian blur. As you can see, the Gaussian smears out a pixel without giving those distinct edges:
Using Bokeh in real-time graphics
In principle, Bokeh depth of field isn’t complicated to implement in a game engine. For any pixel you can work out the size of the CoC from the depth, focal length and aperture size. If the CoC is smaller than one pixel then it’s completely in focus, otherwise the light from that pixel will be spread over a number of pixels, depending on the CoC size. The use of real camera controls such as aperture size and focal length means that your game camera now functions much more like a real camera with the same settings, and setting up cameras will be easier for anyone who is familiar with real cameras.
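As a sketch of that calculation, the usual thin-lens formula for the CoC diameter looks like this. Note that it also needs the distance the camera is focused at, and the aperture diameter is taken as focal length divided by f-number. The function names, the sensor width and the pixel conversion are assumptions for illustration, not something taken from a particular engine.

```python
def coc_diameter_mm(object_dist_mm, focus_dist_mm, focal_length_mm, f_number):
    """Circle of confusion diameter on the sensor for an ideal thin lens.

    Assumes focus_dist_mm > focal_length_mm and object_dist_mm > 0.
    """
    aperture_mm = focal_length_mm / f_number  # aperture diameter
    magnification = focal_length_mm / (focus_dist_mm - focal_length_mm)
    return (aperture_mm * magnification
            * abs(object_dist_mm - focus_dist_mm) / object_dist_mm)


def coc_pixels(coc_mm, sensor_width_mm, image_width_px):
    """Convert a CoC diameter on the sensor into a diameter in screen pixels."""
    return coc_mm * image_width_px / sensor_width_mm
```

For example, a 50mm lens at f/2 focused at 2 metres gives a CoC of roughly 0.32mm for an object 4 metres away, which is about 17 pixels on a 1920 pixel wide image if you assume a 36mm wide sensor. An object exactly at the focus distance gives a CoC of zero.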
In practice, Bokeh depth of field isn’t trivial to implement in real-time. Gaussian blurs are relatively fast (and downscaling/upscaling is even faster) which is why these types of blurs were used for years. There aren’t any similarly quick methods of blurring an image with an arbitrarily shaped blur (i.e. to get a blur like the left image above, rather than the right).
However, GPUs are getting powerful enough to use a brute force approach, which is the approach that was introduced in Unreal Engine 3. You draw a texture of your Bokeh shape (anything you like), and then for each pixel in your image you work out the CoC size (from the depth and camera settings). Then to make your final image, instead of drawing a single pixel for each pixel in the original image, you draw a sprite using the Bokeh texture. Draw the sprite the same colour as the original pixel, and the same size as the CoC. This will accurately simulate the light from a pixel being spread over a wide area. Here it is in action, courtesy of Unreal Engine:
The downside of this technique is that it’s very slow. If your maximum Bokeh sprite size is, say, 8 pixels wide, then in the worst case each pixel in the final image will be made up of 64 composited sprites. Doubling the width of the blur quadruples the fill cost. This approach looks really nice, but you need to use some tricks to get it performing well on anything but the most powerful hardware (for example, draw one sprite for every 2×2 block of pixels to reduce the fill cost).
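To make the sprite idea and its cost concrete, here is a minimal CPU sketch on a greyscale image stored as nested lists. It is not Unreal's implementation: the disc shape stands in for a Bokeh texture, all names are made up for illustration, and dividing each pixel's brightness by the sprite's area (so the total light stays the same) is an assumption about how you would normally composite the sprites.

```python
def disc(u, v):
    """Hypothetical Bokeh shape: a filled circle in normalised [-1, 1] coordinates."""
    return u * u + v * v <= 1.0


def scatter_bokeh(image, coc, shape=disc):
    """Spread every pixel over a sprite the size of its circle of confusion.

    image and coc are equally sized 2D lists; coc holds diameters in pixels.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r = max(coc[y][x], 1.0) / 2.0  # sprite radius, at least one pixel wide
            reach = int(r)
            # Pixel offsets covered by the shape at this size.
            covered = [(dx, dy)
                       for dy in range(-reach, reach + 1)
                       for dx in range(-reach, reach + 1)
                       if shape(dx / r, dy / r)]
            if not covered:
                covered = [(0, 0)]
            # Share the pixel's light equally over the sprite (assumed normalisation).
            weight = image[y][x] / len(covered)
            for dx, dy in covered:
                px, py = x + dx, y + dy
                if 0 <= px < w and 0 <= py < h:
                    out[py][px] += weight
    return out


def worst_case_overdraw(sprite_width, block=1):
    """Sprites overlapping one output pixel when every sprite is at maximum size.

    block=2 models drawing one sprite per 2x2 block of source pixels.
    """
    return sprite_width ** 2 // block ** 2
```

The inner loops are where the time goes: worst_case_overdraw(8) is 64 and worst_case_overdraw(16) is 256, while drawing one sprite per 2×2 block brings the 8 pixel case down to 16.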
An alternative method
There is a good alternative method that I like which is much quicker to draw, and I shall go through that soon in a more technical post. It was presented in this talk from EA at SIGGRAPH 2011, but it takes a bit of thought to turn the slides into a full implementation so I’ll try to make it clearer. This is actually the technique I use in my Purple Space demo.