In this section we will try to formulate an answer to the following questions. What do images tell us about a 3D scene? How can we get 3D information from these images? What do we need to know beforehand? A few problems and difficulties will also be presented.
An image like the one in Figure 1.1 tells us a lot about the observed scene. There is, however, not enough information to reconstruct the 3D scene, at least not without making a significant number of assumptions about the structure of the scene. This is due to the nature of the image formation process, which consists of a projection from a three-dimensional scene onto a two-dimensional image. During this process the depth is lost.
Figure 1.2 illustrates this. The three-dimensional point corresponding to a specific image point is constrained to lie on the associated line of sight. From a single image it is not possible to determine which point of this line corresponds to the image point.
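The loss of depth can be made concrete with a minimal sketch. It assumes an idealized pinhole camera with the camera centre at the origin and the optical axis along Z; the function name `project_pinhole` and the chosen numbers are illustrative only and do not come from the text.

```python
import numpy as np

def project_pinhole(point_3d, focal_length=1.0):
    """Project a 3D point given in camera coordinates (Z > 0) onto the image plane."""
    x, y, z = point_3d
    return np.array([focal_length * x / z, focal_length * y / z])

# Three points at different depths along one line of sight through the camera centre.
direction = np.array([0.2, -0.1, 1.0])
depths = [1.0, 2.5, 10.0]
projections = [project_pinhole(d * direction) for d in depths]

# Every point on the line maps to the same image point, so the image alone
# cannot tell us which depth was observed.
same_pixel = all(np.allclose(p, projections[0]) for p in projections)
print(same_pixel)
```

Scaling a point along its line of sight scales X, Y and Z by the same factor, which cancels in the division by Z. This is why a single image point only determines a line, not a point.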
An image of a scene
Back-projection of a point along the line of sight.
If two (or more) images are available, then, as can be seen from Figure 1.3, the three-dimensional point can be obtained as the intersection of the two lines of sight. This process is called triangulation. Note, however, that a number of things are needed for this:
- Corresponding image points
- Relative pose of the camera for the different views
- Relation between the image points and the corresponding lines of sight
The relation between an image point and its line of sight is given by the camera model (e.g. pinhole camera) and the calibration parameters. These parameters are often called the intrinsic camera parameters, while the position and orientation of the camera are in general called the extrinsic parameters.
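The three ingredients listed above can be put together in a small sketch of triangulation. It is only one possible formulation, under stated assumptions: a pinhole camera with a hypothetical calibration matrix `K`, poses written as `x_cam = R @ X_world + t`, and noise-free correspondences. The midpoint of the shortest segment between the two lines of sight is used as the reconstructed point; the function names are illustrative.

```python
import numpy as np

def project(X, K, R, t):
    """Project a world point into an image (assumed convention: x_cam = R @ X + t)."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def back_project(pixel, K, R, t):
    """Return the camera centre and unit direction of a pixel's line of sight."""
    centre = -R.T @ t
    direction = R.T @ np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))
    return centre, direction / np.linalg.norm(direction)

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment joining two (non-parallel) lines of sight."""
    # Find s1, s2 minimizing |(c1 + s1*d1) - (c2 + s2*d2)|^2.
    A = np.stack([d1, -d2], axis=1)
    (s1, s2), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    return 0.5 * ((c1 + s1 * d1) + (c2 + s2 * d2))

# Hypothetical intrinsic parameters (focal length and principal point in pixels).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Extrinsic parameters: view 1 at the origin, view 2 shifted and slightly rotated.
R1, t1 = np.eye(3), np.zeros(3)
angle = -0.1
R2 = np.array([[np.cos(angle), 0.0, np.sin(angle)],
               [0.0, 1.0, 0.0],
               [-np.sin(angle), 0.0, np.cos(angle)]])
t2 = -R2 @ np.array([1.0, 0.0, 0.0])

# Corresponding image points, simulated here from a known 3D point.
X_true = np.array([0.3, -0.2, 5.0])
p1 = project(X_true, K, R1, t1)
p2 = project(X_true, K, R2, t2)

c1, d1 = back_project(p1, K, R1, t1)
c2, d2 = back_project(p2, K, R2, t2)
X_est = triangulate_midpoint(c1, d1, c2, d2)
print(np.allclose(X_est, X_true))
```

With real, noisy correspondences the two lines of sight generally do not intersect exactly, which is why the sketch returns the midpoint of the closest approach instead of a strict intersection.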
In the following chapters we will learn how all these elements can be retrieved from the images. The key to this is the set of relations between multiple views, which tell us that corresponding sets of points must contain some structure and that this structure is related to the poses and the calibration of the camera.
Reconstruction of a three-dimensional point through triangulation.
Note that different viewpoints are not the only depth cues that are available in images. In Figure 1.4 some other depth cues are illustrated. Although approaches have been presented that can exploit most of these, in this text we will concentrate on the use of multiple views.
Shading (top-left), shadows/symmetry/silhouette (top-right), texture (bottom-left) and focus (bottom-right) also give some hints about depth or local geometry.
In Figure 1.5 a few problems for 3D modeling from images are illustrated. Most of these problems will limit the application of the presented method. However, some of the problems can be tackled by the presented approach.
Another type of problem arises when the imaging process does not satisfy the camera model that is used. Two examples are given in Figure 1.6. In the left image considerable radial distortion is present, which means that the assumption of a pinhole camera is not satisfied. It is, however, possible to extend the model to take the distortion into account. The right image is much harder to use, since an important part of the scene is not in focus. There is also some blooming in that image (i.e. overflow of a CCD pixel into the whole column). Most of these problems can be avoided under normal imaging circumstances.
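As an illustration of how a pinhole model might be extended to account for radial distortion, the sketch below uses a one-coefficient polynomial model on normalized image coordinates. This particular model, the coefficient `k1` and the function names are assumptions made for the example; the text does not specify which distortion model is used.

```python
import numpy as np

def distort(xy, k1):
    """Apply one-coefficient radial distortion to normalized image coordinates."""
    xy = np.asarray(xy, dtype=float)
    r2 = np.sum(xy ** 2)
    return xy * (1.0 + k1 * r2)

def undistort(xy_d, k1, iterations=20):
    """Invert the distortion by fixed-point iteration (adequate for mild distortion)."""
    xy_d = np.asarray(xy_d, dtype=float)
    xy = xy_d.copy()
    for _ in range(iterations):
        r2 = np.sum(xy ** 2)
        xy = xy_d / (1.0 + k1 * r2)
    return xy

ideal = np.array([0.4, -0.3])       # point predicted by the pinhole model
k1 = -0.2                           # hypothetical coefficient (negative: barrel distortion)
observed = distort(ideal, k1)       # where the point actually appears
recovered = undistort(observed, k1)
print(np.allclose(recovered, ideal))
```

Once the coefficient is known, observed points can be corrected in this way so that the pinhole assumption holds again for the corrected coordinates.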
Some difficult scenes: moving objects (top-left), complex scene with many discontinuities (top-right), reflections (bottom-left) and another hard scene (bottom-right).
Some problems with image acquisition: radial distortion (left), lack of focus and blooming (right).