In recent years computer graphics has made tremendous progress in visualizing 3D models. Many techniques have reached maturity and are being ported to hardware. This explains why, in the area of 3D visualization, performance is increasing even faster than Moore's law. What required a million-dollar computer a few years ago can now be achieved with a game computer costing a few hundred dollars. It is now possible to visualize complex 3D scenes in real time.
This evolution has created a strong demand for more complex and realistic models. The problem is that, even though the available tools for three-dimensional modeling are becoming ever more powerful, synthesizing realistic models is difficult and time-consuming, and thus very expensive. Since many virtual objects are inspired by real objects, it would be attractive to acquire the models directly from the real object.
Researchers have been investigating methods to acquire 3D information from objects and scenes for many years. In the past the main applications were visual inspection and robot guidance. Nowadays, however, the emphasis is shifting: there is a growing demand for 3D content for computer graphics, virtual reality and communication, and with it a change in requirements. Visual quality has become one of the main points of attention. It is therefore no longer sufficient to measure the position of a small number of points with high accuracy; the geometry and appearance of all points of the surface have to be captured.
The acquisition conditions and the technical expertise of the users in these new application domains often cannot be matched with the requirements of existing systems, which demand intricate calibration procedures every time they are used. There is a strong demand for flexibility in acquisition: calibration procedures should be absent or restricted to a minimum.
Additionally, existing systems are often built around specialized hardware (e.g. laser range finders or stereo rigs), which makes them expensive. Many new applications, however, require robust low-cost acquisition systems. This stimulates the use of consumer photo or video cameras, and the recent progress in consumer digital imaging facilitates this. Moore's law also tells us that more and more can be done in software.
Due to the convergence of these factors, many techniques have been developed over the last few years. Many of them require no more than a camera and a computer to acquire three-dimensional models of real objects.
These techniques can be divided into active and passive ones. The former control the lighting of the scene (e.g. by projecting structured light), which on the one hand simplifies the problem, but on the other hand restricts the applicability. The latter are often more flexible, but computationally more expensive and dependent on the structure of the scene itself.
Some examples of state-of-the-art active techniques are the simple shadow-based approach proposed by Bouguet and Perona or the grid projection approach proposed by Proesmans et al. [126,133], which is able to extract dynamic textured 3D shapes (this technique is commercially available). For the passive techniques many approaches exist; they differ mainly in the required level of calibration and the amount of user interaction.
For many years photogrammetry has dealt with the extraction of high-accuracy measurements from images. These techniques mostly require very precise calibration and offer almost no automation, so the detailed acquisition of models is very time-consuming. Besides the tools available to professionals, some simpler tools are commercially available (e.g. PhotoModeler).
In recent years researchers in computer vision have tried both to reduce the calibration requirements and to increase the automation of the acquisition. The goal is to automatically extract a realistic 3D model by freely moving a camera around an object.
An early approach was proposed by Tomasi and Kanade. They used an affine factorization method to extract 3D structure from image sequences. An important restriction of this approach is the assumption of orthographic projection.
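The factorization idea can be illustrated briefly: under orthographic (more generally, affine) projection, the matrix of centered image measurements over all frames has rank at most three, so a singular value decomposition separates it into a camera factor and a shape factor, up to an affine ambiguity. The following is a minimal synthetic sketch of this rank-3 property in Python/NumPy, not the authors' implementation (and it omits the metric upgrade step):

```python
import numpy as np

rng = np.random.default_rng(0)
P = 20                                   # number of 3D points
F = 6                                    # number of frames
X = rng.standard_normal((3, P))          # random 3D shape

# Affine cameras: each frame projects the points with a 2x3 matrix
# plus an image translation, stacked into a 2F x P measurement matrix.
W = np.zeros((2 * F, P))
for f in range(F):
    M = rng.standard_normal((2, 3))      # affine camera for frame f
    t = rng.standard_normal((2, 1))      # image translation
    W[2 * f:2 * f + 2] = M @ X + t

# Subtracting the per-row centroid removes the translations.
W_centered = W - W.mean(axis=1, keepdims=True)

# Rank-3 factorization via SVD: W_centered = motion @ shape.
U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
motion = U[:, :3] * np.sqrt(s[:3])        # 2F x 3 camera factor
shape = np.sqrt(s[:3])[:, None] * Vt[:3]  # 3 x P shape (affine frame)

# With noise-free affine projections the rank-3 model is exact.
print(np.allclose(motion @ shape, W_centered))  # True
```

The recovered shape is only defined up to an affine transformation; Tomasi and Kanade impose additional metric constraints on the camera rows to resolve this ambiguity.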
Another type of system starts from an approximate 3D model and camera poses and refines the model based on images (e.g. Facade, proposed by Debevec et al. [22,148]). The advantage is that fewer images are required; on the other hand, a preliminary model must be available and the geometry should not be too complex.
This text explains how a 3D surface model can be obtained from a sequence of images taken with off-the-shelf consumer cameras. The user acquires the images by freely moving the camera around the object; neither the camera motion nor the camera settings have to be known. The obtained 3D model is a scaled version of the original object (i.e. a metric reconstruction), and the surface albedo is obtained from the image sequence as well. This approach has been developed over the last few years [99,102,103,105,107,111,109,71,112,113,66,101]. The presented system uses full perspective cameras, does not require prior models, and combines state-of-the-art algorithms to solve the different subproblems.
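One recurring subproblem in any such pipeline is triangulation: recovering a 3D point from its projections once camera matrices are available. The following Python/NumPy sketch shows the standard linear (DLT) method on synthetic data with assumed projection matrices; it is an illustration of the subproblem, not the system described in this text:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: (u, v) image points.
    Each view contributes two linear equations on the homogeneous
    point X, e.g. u * (p3 . X) - (p1 . X) = 0, where pi are the
    rows of the projection matrix.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                   # null vector of A (least-squares)
    return X[:3] / X[3]          # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic check: two known cameras observing a known point.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])              # reference view
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])  # translated view
X_true = np.array([0.5, -0.2, 4.0])

X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_est, X_true))  # True
```

With noisy measurements the linear solution is typically refined by minimizing reprojection error, which is one of the places where bundle adjustment enters such pipelines.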