When we want to draw something in 3D on a screen, what we’re really doing is trying to draw a flat picture of it as it would look on a film, or projected onto the retina in your eye. So we have an “eye” position, which is the point from which the scene is viewed, and we have the projection plane, which is the “film” of the camera.

When you take an image with a camera the image is projected reversed onto the film, because the projection plane is behind the lens (you can see this by looking at the path the light takes through the lens, in red). When rendering, it’s conceptually simpler to think of the projection plane as being in front of the eye. It’s also easier to see what we mean by a projection – we can think of the projection plane as being a window in front of the eye, through which we see the world. Each pixel wants to be drawn the colour of the light that passes through the corresponding part of the “window”, to reach the eye.
A basic property of a camera is the Field Of View (FOV). Cameras don’t capture the scene in a full 360 degrees, and the FOV is the angle that the camera can see, shown as the angle between the red lines in the diagram. Continuing the window analogy, there are two ways to change the field of view: you can make the window bigger or smaller, or you can stand closer to or further away from the window. Both of these will alter how much of the world you can see on the other side.
The most basic concept in 3D is perspective. It’s so simple that it’s been explained by Father Ted. Perspective just means that the further away things are, the smaller they look. Further to that, the apparent size is inversely proportional to the distance. What this means is that something twice as far away will look half as big (specifically, half as big when you measure a length; the area will be a quarter of the size). So if you want to work out how big something will be on the screen, you divide the size by the distance from the eye position.
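As a quick illustration of the divide-by-distance rule, here is a minimal Python sketch. The function name apparent_size and the sample numbers are made up for this example.

```python
def apparent_size(size, distance):
    """Apparent size of an object: its real size divided by its distance from the eye."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return size / distance


# The same 2 metre object at 5 metres and at 10 metres (hypothetical values).
near = apparent_size(2.0, 5.0)
far = apparent_size(2.0, 10.0)

# Twice as far away means half the apparent length.
print(near, far)
```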
To start rendering in 3D we just need to know a few numbers that define the “camera view” that will be used to draw the scene. These are the size of the projection plane, and the distance it is from the eye (the projection plane is always some small distance in front of the eye to avoid nastiness later on with divide-by-zero and things).

In the diagram, take one grid square to be 1 unit in size. It makes no difference what the units are, as long as you’re consistent. For simplicity let’s work in metres. So in this diagram we can see the two pieces of information we need. The distance from the camera to the projection plane (called the camera near distance) is 1 metre, and the size of the projection plane is around 1.5 metres (specific numbers don’t matter at this point). You can see the field of view that this arrangement gives in red. In this diagram we want to draw the blue triangle, so we need to know where the three corner vertices will be projected to on the projection plane.
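The field of view follows from these two numbers with a bit of trigonometry: half the plane size over the near distance is the tangent of half the angle. The sketch below assumes the eye is centred on the projection plane, and the function name is just for illustration.

```python
import math


def field_of_view_degrees(plane_size, near_distance):
    """FOV in degrees for a projection plane centred in front of the eye."""
    half_size = plane_size / 2
    return math.degrees(2 * math.atan(half_size / near_distance))


# The arrangement in the diagram: a 1.5 metre plane, 1 metre from the eye.
fov = field_of_view_degrees(1.5, 1.0)
print(round(fov, 2))  # about 73.74 degrees

# A bigger window, or standing closer to it, both widen the view.
wider_window = field_of_view_degrees(3.0, 1.0)
closer_eye = field_of_view_degrees(1.5, 0.5)
```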
Positions in 3D space are given using three coordinates, x, y, and z. These specify the distance along the x axis, y axis and z axis respectively, where the three axes are perpendicular to each other. There are various different coordinate spaces used in rendering, where coordinate space means the origin and orientation of these three axes. For example, world space is where things are in your ‘world’, i.e. the scene that you are rendering, so there is the origin (0, 0, 0) at some fixed point in the world and everything is positioned relative to that. In this case x and z specify the horizontal position and y specifies the height.
The coordinate space we’re interested in at the moment though is camera space. In camera space, x is the distance left or right in your window, y is the distance up or down, and z is the distance forwards and backwards, i.e. into or out of the window. The origin is at the eye position and the camera traditionally looks along the negative z axis, so in the diagram the z axis will point to the right. The diagram is 2D so only shows one of the other axes, so we’ll ignore the third one for now.
We can now do a bit of simple maths to work out where to draw one of the vertices, the one marked with a dot. The approximate position of the vertex is (X, Z) = (1.0, -5.2), found by counting the squares along each axis (yes, this is the other way around from your traditional axes on a graph, but that just reinforces the point about different coordinate spaces). So to project this onto the screen we simply divide by Z to find the point where the green line intersects the line where Z=-1. This gives X = 1.0/-5.2 = -0.192.
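The divide-by-Z step is a one-liner in code. This is only a sketch using the numbers from the example; the function name and the check for points behind the camera are my own additions.

```python
def perspective_divide(x, z):
    """Divide a camera-space X coordinate by Z, keeping the sign of Z."""
    # The camera looks along negative Z, so visible points have z < 0.
    if z >= 0:
        raise ValueError("point must be in front of the camera (negative z)")
    return x / z


projected_x = perspective_divide(1.0, -5.2)
print(round(projected_x, 3))  # -0.192
```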
Now we need to convert this to screen space, which is as shown in this diagram:

This is where we use the size of the projection plane, and its distance from the eye, to find a scaling factor. We said that the projection plane was 1.5m in total, so it is 0.75 metres from the centre to each side, and it is at -1.0 metres from the eye along the z axis. So the scaling factor is -1.0/0.75 = -1.333.
Now we can combine these to find where on the screen the vertex should be drawn:
X = -0.192*-1.333 = 0.256
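Here is the same calculation as a small Python sketch, using the numbers from the example (plane at Z = -1.0, 1.5 metres wide). The function names are invented for illustration.

```python
def screen_scale(near_z, plane_size):
    """Scaling factor: plane position along Z divided by half the plane size."""
    return near_z / (plane_size / 2)


def to_screen_space(x, z, near_z, plane_size):
    """Perspective divide followed by the scale into the -1.0 to 1.0 range."""
    return (x / z) * screen_scale(near_z, plane_size)


scale = screen_scale(-1.0, 1.5)
screen_x = to_screen_space(1.0, -5.2, -1.0, 1.5)
print(round(scale, 3), round(screen_x, 3))  # -1.333 0.256
```

The two negative signs cancel, so a vertex with positive X ends up on the positive side of the screen.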
There is one final transform that needs to be done, to work out the actual pixel coordinates on the screen. To do this we simply map the -1.0 to 1.0 range of the screen space into the viewport, which is defined in pixels. So if you’re drawing to a widescreen TV the viewport would be 1280×768 pixels in size, so the actual x pixel coordinate of the example would be:
((1.0 + 0.256) * 0.5) * 1280 = 804
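The viewport mapping can be sketched in Python like this. The function name is hypothetical, and rounding to the nearest pixel is one possible choice rather than the only one.

```python
def to_pixel(screen_coord, viewport_size):
    """Map a screen-space coordinate in [-1.0, 1.0] to a pixel position."""
    return (screen_coord + 1.0) * 0.5 * viewport_size


pixel_x = to_pixel(0.256, 1280)
print(round(pixel_x))  # 804

# The edges of screen space land on the edges of the viewport.
left_edge = to_pixel(-1.0, 1280)
right_edge = to_pixel(1.0, 1280)
```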
Then simply do the same again with the Y axis and you’ve drawn a 3D point! Do this with the other two points as well, and then draw straight lines between them all, and you’ve got a 3D triangle!
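Putting all the steps together for a whole triangle might look like the sketch below. The triangle vertices and the 0.9 metre plane height are made-up values (the height was picked to match the 1280×768 viewport shape), and Y is treated exactly like X here. Many graphics systems count pixel rows from the top of the screen, in which case the Y result would need flipping.

```python
def project_vertex(vertex, near_z, plane_w, plane_h, viewport_w, viewport_h):
    """Project a camera-space (x, y, z) vertex to pixel coordinates."""
    x, y, z = vertex
    # Perspective divide and scale into the -1.0 to 1.0 screen-space range.
    screen_x = (x / z) * (near_z / (plane_w / 2))
    screen_y = (y / z) * (near_z / (plane_h / 2))
    # Map screen space into the viewport, measured in pixels.
    pixel_x = (screen_x + 1.0) * 0.5 * viewport_w
    pixel_y = (screen_y + 1.0) * 0.5 * viewport_h
    return pixel_x, pixel_y


# Hypothetical triangle; the first vertex reuses the worked example with y = 0.
triangle = [(1.0, 0.0, -5.2), (-1.0, 0.5, -4.0), (0.5, -0.5, -3.0)]
pixels = [project_vertex(v, -1.0, 1.5, 0.9, 1280, 768) for v in triangle]

for px, py in pixels:
    print(round(px), round(py))
```

Drawing straight lines between the three resulting pixel positions gives the outline of the triangle.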
This works as long as the camera is at the origin, looking straight down the z axis. Next time I’ll talk a bit about transforms and how this is generalised to any view.