Research Topics

My current research interests cover several aspects of 3D modeling from images. For more information, see also the publications page.

Structure from Motion

What Can Missing Correspondences Tell Us About 3D Structure and Motion?

Practically all existing approaches to structure and motion computation use only positive image correspondences to verify camera pose hypotheses. Incorrect epipolar geometries are detected solely by identifying outliers among the found correspondences. As a result, ambiguous patterns in the images are often handled incorrectly by these standard methods. In this work we propose two approaches to overcome such problems. First, we apply non-monotone reasoning on view triplets using a Bayesian formulation. In contrast to two-view epipolar geometry, image triplets allow the prediction of features in the third image. The absence of these features (i.e. missing correspondences) enables additional inference about the view triplet. Second, we integrate this view triplet handling into an incremental procedure for structure and motion computation. Thus, our approach is able to refine the maintained 3D structure when additional image data is provided.
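To illustrate how missing correspondences can carry evidence, here is a minimal sketch of a Bayesian update for a single view triplet. It is not the formulation from the paper: the independent per-feature detection model, the function name posterior_correct, and the values of p_hit, p_false and prior are all assumptions chosen for illustration.

```python
import math


def posterior_correct(n_predicted, n_found, p_hit=0.7, p_false=0.05, prior=0.5):
    """Posterior probability that a view triplet hypothesis is correct.

    Hypothetical model (an assumption, not taken from the paper):
    each feature predicted in the third image is found independently,
    with probability p_hit if the triplet geometry is correct and
    p_false if it is wrong.
    """
    n_missing = n_predicted - n_found
    # Log-likelihood of the observation under each hypothesis.
    ll_correct = n_found * math.log(p_hit) + n_missing * math.log(1.0 - p_hit)
    ll_wrong = n_found * math.log(p_false) + n_missing * math.log(1.0 - p_false)
    log_odds = math.log(prior / (1.0 - prior)) + ll_correct - ll_wrong
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    print(posterior_correct(20, 15))  # most predictions confirmed
    print(posterior_correct(20, 1))   # almost all predictions missing
```

Under these assumed numbers, a triplet in which nearly all predicted features are missing receives a posterior close to zero, even though its few matches would pass a purely outlier-based test.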

Figures: incorrectly merged reconstruction; correctly separated model for scene 1; correctly separated model for scene 2.

C. Zach, A. Irschara, H. Bischof. CVPR 2008.

Towards Wiki-based Dense City Modeling

This work reports on the advances and on the current status of a terrestrial city modeling approach, which uses images contributed by end-users as input. Hence, the Wiki principle well known from textual knowledge databases is transferred to the goal of incrementally building a virtual representation of the occupied habitat. In order to achieve this objective, many state-of-the-art computer vision methods must be applied and modified according to this task. We describe the utilized 3D vision methods and show initial results obtained from the current image database acquired by in-house participants.

Figures: sparse urban reconstruction; dense textured model.

A. Irschara, C. Zach, H. Bischof. VRML Workshop in conjunction with ICCV 2007.

Multi-view stereo and dense 3D modeling

A Globally Optimal Algorithm for Robust TV-L^1 Range Image Integration

Robust integration of range images is an important task for building high-quality 3D models. Since range images, and in particular range maps from stereo vision, may contain a substantial number of outliers, any integration approach aiming at high-quality models needs an increased level of robustness. Additionally, a certain level of regularization is required to obtain smooth surfaces. Computational efficiency and global convergence are further desirable properties. The contribution of this paper is a unified framework that addresses all these issues. Our method is based on minimizing an energy functional consisting of a total variation (TV) regularization force and an L^1 data fidelity term. We present a novel and efficient numerical scheme, which combines the duality principle for the TV term with a point-wise optimization step. We demonstrate the superior performance of our algorithm on the well-known Middlebury multi-view database and additionally on real-world multi-view images.
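The point-wise optimization step can be sketched for a single sample as follows. This assumes, as is common for such schemes, that the step minimizes a quadratic coupling to the current smoothed value u plus a weighted sum of absolute differences to the input values. The names l1_pointwise_step, depths, lam and theta are hypothetical, and the TV step based on duality is not shown.

```python
def l1_pointwise_step(u, depths, lam, theta):
    """Minimize (v - u)^2 / (2 * theta) + lam * sum_i |v - depths[i]| over v.

    The objective is convex and piecewise quadratic, so its minimizer is
    either a kink (one of the input values) or the stationary point of
    one of the quadratic pieces. Evaluating all candidates is sufficient.
    """
    n = len(depths)

    def energy(v):
        return (v - u) ** 2 / (2.0 * theta) + lam * sum(abs(v - f) for f in depths)

    # Stationary point of the piece on which exactly k inputs lie below v.
    stationary = [u + lam * theta * (n - 2 * k) for k in range(n + 1)]
    candidates = list(depths) + stationary
    return min(candidates, key=energy)


if __name__ == "__main__":
    # Two consistent samples and one gross outlier: the result stays
    # close to the consistent samples instead of being pulled to 5.0.
    print(l1_pointwise_step(1.0, [1.0, 1.1, 5.0], lam=1.0, theta=0.1))
```

Because the data term uses absolute differences rather than squared ones, a single gross outlier shifts the result by a bounded amount only, which is the robustness property the abstract refers to.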

Figures: Temple ring model (views 1 and 2); Dino ring model (views 1 and 2).

C. Zach, T. Pock, H. Bischof. ICCV 2007.

Mumford-Shah Meets Stereo: Integration of Weak Depth Hypotheses

Recent results on stereo indicate that an accurate segmentation is crucial for obtaining faithful depth maps. Variational methods have successfully been applied to both image segmentation and computational stereo. In this paper we propose to combine the two in a unified framework. In particular, we use a Mumford-Shah-like functional to compute a piecewise smooth depth map of a stereo pair. Our approach has two novel features: First, the regularization term of the functional combines edge information obtained from the color segmentation with flow-driven depth discontinuities emerging during the optimization procedure. Second, we propose a robust data term which adaptively selects the best matches obtained from different weak stereo algorithms. We integrate these features in a theoretically consistent framework. The final depth map is the minimizer of the energy functional, which can be solved by the associated functional derivatives. The underlying numerical scheme allows an efficient implementation on modern graphics hardware. We illustrate the performance of our algorithm on the Middlebury database as well as on real imagery.

T. Pock, C. Zach, H. Bischof. CVPR 2007.

Efficient computer vision methods on the GPU

A Duality Based Approach for Realtime TV-L^1 Optical Flow

Variational methods are among the most successful approaches for calculating the optical flow between two image frames. A particularly appealing formulation is based on total variation (TV) regularization and the robust L^1 norm in the data fidelity term. This formulation can preserve discontinuities in the flow field and offers an increased robustness against illumination changes, occlusions and noise. In this work we present a novel approach to solve the TV-L^1 formulation. Our method results in a very efficient numerical scheme, which is based on a dual formulation of the TV energy and employs an efficient point-wise thresholding step. Additionally, our approach can be accelerated by modern graphics processing units. We demonstrate the real-time performance (30 fps) of our approach for video inputs at a resolution of 320x240 pixels.
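A point-wise thresholding step of this kind can be sketched for one pixel as follows. The sketch assumes that the brightness-constancy residual has been linearized around the current flow estimate and that the step minimizes a quadratic coupling to u plus the weighted absolute residual. The names threshold_step, rho_u, lam and theta are assumptions for illustration and do not appear on this page.

```python
def threshold_step(u, grad, rho_u, lam, theta):
    """Minimize |v - u|^2 / (2 * theta) + lam * |rho(v)| over v for one pixel.

    u      -- current flow estimate (u1, u2)
    grad   -- spatial image gradient (gx, gy) of the warped second frame
    rho_u  -- linearized residual evaluated at u, where
              rho(v) = rho_u + grad . (v - u)
    """
    gx, gy = grad
    g2 = gx * gx + gy * gy
    if g2 == 0.0:
        # No gradient information: the data term is constant in v.
        return u
    lt = lam * theta
    if rho_u < -lt * g2:
        step = lt            # residual strongly negative: full step along grad
    elif rho_u > lt * g2:
        step = -lt           # residual strongly positive: full step against grad
    else:
        step = -rho_u / g2   # small residual: move so that rho(v) becomes zero
    return (u[0] + step * gx, u[1] + step * gy)


if __name__ == "__main__":
    print(threshold_step((0.0, 0.0), (1.0, 0.0), 0.05, lam=1.0, theta=0.1))
    print(threshold_step((0.0, 0.0), (1.0, 0.0), 0.5, lam=1.0, theta=0.1))
```

Each pixel is processed independently with only a few arithmetic operations and one comparison chain, which is why a step of this form maps well onto graphics hardware.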

Figures: Rheinhafen sequence (frames and flow field); live video (frames and flow field).

C. Zach, T. Pock, H. Bischof. DAGM 2007.