    VACE: 3D Content Extraction from Video Streams

    Principal Investigator: Marc Pollefeys 
    Funding Agency: IARPA
    Agency Number: NBCHC060154

    Abstract
    We address Content Extraction from Ground Reconnaissance Videos, in particular 3D scene and object modeling and mensuration in 3D space. The proposed work augments current VACE capabilities with rapid and accurate 3D reconstructions of static and dynamic scenes captured by calibrated or uncalibrated cameras. Camera parameter and motion estimation is the key enabling technology that makes accurate modeling possible. We have already developed a real-time robust pose estimation system for a calibrated camera rig mounted on a vehicle; we will extend it to handle uncalibrated cameras, as well as other challenges specific to VACE such as footage from handheld cameras. Specifically, we intend to address camera motion estimation in dynamic scenes, and auto-calibration and structure from motion with wide-angle lenses for indoor scenes.

    For 3D scene and object modeling, we propose to leverage our ongoing work on the rapid 3D reconstruction of urban scenes from a moving vehicle under the DARPA UrbanScape program. We will investigate different stereo reconstruction algorithms, giving the user the ability to process a wider range of Ground Reconnaissance Videos and to trade off the speed and quality of the 3D reconstruction. Dynamic scenes, which are outside the scope of UrbanScape, will be an objective of the proposed system. We will develop an efficient representation for the potentially massive 3D models in the form of a streaming polygonal mesh, which enables standard VRML or X3D viewers to display the models on consumer computers.

    To automatically process large quantities of video that were not necessarily recorded with 3D modeling and mensuration in mind, we will develop two key components. The first will employ robust model selection algorithms to partition video streams into segments that are suitable for 3D modeling, segments that only allow the reconstruction of a 2D panorama (when the camera is only rotating or zooming), and segments where the camera is static. Each segment can then be processed with the appropriate algorithms to extract the richest representation that can be deduced. The second component will recognize previously observed scenes or archived locations. This enables localization and classification of video segments, and yields more complete 3D models by fusing information from multiple segments or multiple videos.

    Once 3D models of the scene, and potentially of the objects in it, have been computed, tasks such as mensuration and context inference will be performed. Accurate 3D geometry provides powerful cues for further analysis of the video, such as whether it was captured indoors or outdoors, whether it contains natural or man-made structures, and the level of activity in terms of independently moving objects. Finally, we propose a comprehensive evaluation approach for the individual components as well as for the entire system using ground truth: accurate 3D models will be generated with active sensors and surveying techniques, while GPS and INS sensors will provide ground-truth camera motion data.
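
    The robust pose estimation described above is, in its standard robust form, RANSAC-based perspective-n-point (PnP) estimation from 2D-3D correspondences. The Python/OpenCV sketch below is illustrative only: the correspondence arrays and the intrinsic matrix K are placeholders, and the system's actual implementation may differ.

        import numpy as np
        import cv2

        # Placeholder inputs: 3D scene points, their 2D projections in the
        # current frame, and calibrated intrinsics K (all illustrative).
        object_points = np.random.rand(50, 3).astype(np.float32)       # Nx3 world points
        image_points = np.random.rand(50, 2).astype(np.float32) * 640  # Nx2 pixel coords
        K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
        dist = np.zeros(5)  # assume lens distortion is already corrected

        # RANSAC-based PnP: robust to outlier correspondences, as needed
        # when processing real reconnaissance footage.
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            object_points, image_points, K, dist,
            reprojectionError=2.0, confidence=0.999)

        if ok:
            R, _ = cv2.Rodrigues(rvec)   # rotation matrix from axis-angle vector
            center = -R.T @ tvec         # camera position in world coordinates
            print(len(inliers), "inliers; camera center:", center.ravel())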
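
    To make the mesh representation concrete, the sketch below writes one polygonal mesh chunk as VRML97, the viewer-compatible container the abstract mentions. How chunks are ordered and streamed is a design question of the project; this only shows the output format.

        def write_vrml_mesh(path, vertices, faces):
            """Write one polygonal mesh chunk as VRML97, displayable in
            standard VRML/X3D viewers. vertices: (x, y, z) triples;
            faces: tuples of vertex indices."""
            with open(path, "w") as f:
                f.write("#VRML V2.0 utf8\n")
                f.write("Shape { geometry IndexedFaceSet {\n")
                f.write(" coord Coordinate { point [\n")
                for x, y, z in vertices:
                    f.write("  %g %g %g,\n" % (x, y, z))
                f.write(" ] }\n coordIndex [\n")
                for face in faces:
                    # each face is a run of vertex indices terminated by -1
                    f.write("  " + " ".join(str(i) for i in face) + " -1,\n")
                f.write(" ]\n} }\n")

        # e.g. a single triangle:
        write_vrml_mesh("chunk0.wrl", [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])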
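
    The segment classification in the first component can be realized, in a common formulation, by comparing how well a homography (which explains pure rotation or zoom) accounts for inter-frame feature matches versus a fundamental matrix (general motion). The inlier-ratio criterion and thresholds below are illustrative stand-ins for the robust model selection scores (e.g., GRIC) such systems typically use.

        import numpy as np
        import cv2

        def classify_motion(pts1, pts2, static_px=1.0, h_ratio=0.9):
            """Classify inter-frame camera motion from matched points
            (Nx2 arrays). Returns 'static', 'rotation_or_zoom' (panorama
            only), or 'general' (suitable for 3D modeling)."""
            # Static camera: feature displacement is negligible.
            if np.median(np.linalg.norm(pts1 - pts2, axis=1)) < static_px:
                return "static"

            # Fit both motion models robustly.
            H, h_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
            F, f_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.999)
            h_in = int(h_mask.sum()) if h_mask is not None else 0
            f_in = int(f_mask.sum()) if f_mask is not None else 0

            # If the homography explains nearly as many matches as the
            # fundamental matrix, there is (almost) no parallax: the camera
            # is only rotating or zooming, so just a 2D panorama is recoverable.
            if f_in == 0 or h_in >= h_ratio * f_in:
                return "rotation_or_zoom"
            return "general"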
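
    Given registered views, mensuration reduces to triangulating user-marked pixels into 3D and measuring Euclidean distances. A minimal sketch: the projection matrices here are placeholders (intrinsics folded in, unit-length baseline), and measurements are metric only if the camera motion itself is metric, e.g., from GPS/INS.

        import numpy as np
        import cv2

        # Placeholder projection matrices P = K[R|t] for two registered views;
        # in the full pipeline these come from the pose estimation stage.
        # Here K is the identity (normalized image coordinates) and the
        # baseline is one unit long.
        P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

        def measure_distance(a1, a2, b1, b2):
            """Distance between two scene points A and B, each marked in
            both views (a1/b1 in view 1, a2/b2 in view 2, as (u, v))."""
            pts1 = np.float64([a1, b1]).T                    # 2x2 for view 1
            pts2 = np.float64([a2, b2]).T                    # 2x2 for view 2
            Xh = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4x2
            X = (Xh[:3] / Xh[3]).T                           # Euclidean 3D points
            return np.linalg.norm(X[0] - X[1])               # in baseline units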
