
Detecting dominant planes

The first part of the solution consists of detecting the cases where only planar features are being matched. To this end, the Geometric Robust Information Criterion (GRIC) model selection approach proposed in [150] is briefly reviewed. GRIC selects the model with the lowest score. The score of a model is the sum of two contributions: the first measures the goodness of the fit, and the second measures the parsimony of the model. It is important that a robust Maximum Likelihood Estimator (MLE) be used to estimate the different structure and motion models being compared through GRIC. GRIC takes into account the number $n$ of inliers plus outliers, the residuals $e_i$, the standard deviation $\sigma$ of the measurement error, the dimension $r$ of the data, the number $k$ of motion model parameters and the dimension $d$ of the structure:

\begin{displaymath}
\mbox{GRIC} = \sum_{i=1}^{n} \rho(e^2_i) + n d \ln (r) + k \ln (r n) \enspace ,
\end{displaymath} (E9)

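where the robust function $\rho(e^2)$ caps the contribution of each residual, so that outliers cannot dominate the score: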
\begin{displaymath}
\rho(e^2)=\min \left(\frac{e^2}{\sigma^2},2(r-d)\right) \enspace .
\end{displaymath} (E10)

In (E9), $nd \ln (r)$ is the penalty term for the structure, which has $n \times d$ parameters, each estimated from $r$ observations, and $k \ln (rn)$ is the penalty term for the motion model, which has $k$ parameters estimated from $rn$ observations.
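A minimal Python sketch of (E9) and (E10) is given below. The function names `rho` and `gric` are hypothetical, and the residuals are assumed to come from a robust MLE fit of the model being scored. The value $r=4$ for an image pair (one $(x,y)$ measurement in each of the two views) is the usual choice for two-view models and is an assumption here, since this section only states $r=6$ for triplets.

```python
import math
from typing import Sequence


def rho(e2: float, sigma: float, r: int, d: int) -> float:
    """Robust term of (E10): the squared residual, normalised by sigma^2,
    is capped at 2(r - d) so that an outlier adds only a bounded cost."""
    return min(e2 / sigma**2, 2.0 * (r - d))


def gric(residuals: Sequence[float], sigma: float, r: int, d: int, k: int) -> float:
    """GRIC score of (E9) for one model; the model with the lowest score wins.

    residuals: residuals e_i of all n matches (inliers plus outliers)
    sigma:     standard deviation of the measurement error
    r:         dimension of the data (e.g. 4 for an image pair, 6 for a triplet)
    d:         dimension of the structure
    k:         number of motion model parameters
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("GRIC needs at least one residual")
    goodness_of_fit = sum(rho(e * e, sigma, r, d) for e in residuals)
    structure_penalty = n * d * math.log(r)   # n*d parameters, each from r observations
    motion_penalty = k * math.log(r * n)      # k parameters from r*n observations
    return goodness_of_fit + structure_penalty + motion_penalty
```

Scoring each candidate model on the same set of matches and keeping the one with the lowest `gric` value implements the selection rule described above.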

For each image pair, GRIC$({\bf F})$ and GRIC$({\bf H})$ can be compared. If GRIC$({\bf H})$ yields the lower value, it is assumed that most matched features are located on a dominant plane and that a homography model is therefore appropriate. Conversely, when GRIC$({\bf F})$ yields the lower value, one could assume, as did Torr [155], that standard projective structure and motion recovery can proceed. This is correct in most cases; however, in some cases it can still fail. The problem is illustrated on the left side of Figure 5.6: both ${\bf F}_{12}$ and ${\bf F}_{23}$ can be computed successfully, but structure and motion recovery fails because all features common to the three views are located on a plane.

Figure 5.6: Left: Although each pair contains non-coplanar features, the three views only have coplanar points in common. Right: Illustration of the remaining ambiguity when the center of projection of view 2 is required to coincide in reconstructions 1-2 and 2-3.
\begin{figure}\centerline{
\epsfig{figure=sam/planecommon.eps, width=6cm}\epsfig{figure=sam/planecommon2.eps, width=6cm}}\end{figure}
Estimating the pose of camera 3 from features reconstructed from views 1 and 2, or alternatively estimating the trifocal tensor from the triplets, would yield a three-parameter family of solutions. However, requiring reconstructions 1-2 and 2-3 to be aligned (including the center of projection of view 2) reduces the ambiguity to a one-parameter family of solutions. This ambiguity is illustrated on the right side of Figure 5.6. Relative to the reference frame of cameras 1 and 2, the position of camera 3 can change arbitrarily as long as the epipole in image 2 is not modified (i.e. camera 3 moves along the line connecting the centers of projection of views 2 and 3). Since the intersection of corresponding rays has to be preserved and the image of the common plane also has to remain invariant, the transformation of the rest of space is completely determined. Note, as seen in Figure 5.6, that this remaining ambiguity can still cause significant distortion.
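The three-parameter family can be made explicit with a short derivation. As an assumption of this sketch, choose the projective frame so that the common plane is $Z=0$, write ${\bf P}_3 = [{\bf p}_1\;{\bf p}_2\;{\bf p}_3\;{\bf p}_4]$, and let ${\bf H}_3 = [{\bf h}_1\;{\bf h}_2\;{\bf h}_3]$ be the homography mapping plane coordinates $(X,Y,W)^\top$ to image 3, with the scale of ${\bf P}_3$ fixed to that of ${\bf H}_3$:

\begin{displaymath}
{\bf P}_3 \left(\begin{array}{c} X \\ Y \\ 0 \\ W \end{array}\right)
= X\,{\bf p}_1 + Y\,{\bf p}_2 + W\,{\bf p}_4
= {\bf H}_3 \left(\begin{array}{c} X \\ Y \\ W \end{array}\right)
\quad\Rightarrow\quad
{\bf P}_3 = \left[\,{\bf h}_1 \;\; {\bf h}_2 \;\; {\bf p}_3 \;\; {\bf h}_3\,\right] \enspace .
\end{displaymath}

The triplets determine ${\bf H}_3$, but ${\bf p}_3$ never multiplies a point of the plane, so its three entries remain free: this is the three-parameter family. Keeping the epipole of camera 3 in image 2 fixed adds two constraints, which generically leaves the one-parameter family described above.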

For the reasons described above, we propose to apply the GRIC criterion to triplets of views ($r=6$). On the one hand, GRIC(PPP) is based on a model containing 3 projection matrices (up to a projective ambiguity), with $k=3 \times 11-15=18$ and $d=3$ (a model based on the trifocal tensor would be equivalent); on the other hand, GRIC(HH) is based on a model containing 2 homographies, with $k=2 \times 8= 16$ and $d=2$. To compute the MLE of both PPP and HH efficiently, the sparse structure of the problem is exploited (as in bundle adjustment). We can now distinguish two cases. Case A: GRIC(PPP)$<$GRIC(HH): the three views observe general 3D structure. Case B: GRIC(PPP)$>$GRIC(HH): the structure common to the three views is planar. Note that it does not make sense to consider mixed cases such as HF or FH, since structure and motion recovery requires triplets, and in these cases the triplets would all be located on a plane anyway.
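The triplet test can be sketched in a few lines of Python using the values of $r$, $k$ and $d$ given above. The names `classify_triplet`, `K_PPP`, `K_HH` and so on are hypothetical, and the residuals of both models are assumed to be produced elsewhere by the sparse MLE fits; this is an illustration of the decision rule, not the original implementation.

```python
import math
from typing import Sequence

# Model dimensions quoted in the text for a triplet of views.
R_TRIPLET = 6                   # data dimension: an (x, y) measurement in each of 3 views
K_PPP, D_PPP = 3 * 11 - 15, 3   # 3 projection matrices up to a projective ambiguity; 3D structure
K_HH, D_HH = 2 * 8, 2           # 2 homographies; planar (2D) structure


def gric(residuals: Sequence[float], sigma: float, r: int, d: int, k: int) -> float:
    """GRIC score of (E9), with the robust term rho of (E10)."""
    n = len(residuals)
    if n == 0:
        raise ValueError("GRIC needs at least one residual")
    fit = sum(min(e * e / sigma**2, 2.0 * (r - d)) for e in residuals)
    return fit + n * d * math.log(r) + k * math.log(r * n)


def classify_triplet(res_ppp: Sequence[float], res_hh: Sequence[float], sigma: float) -> str:
    """Return "A" (general 3D structure) or "B" (common structure is planar).

    res_ppp and res_hh are the residuals of the same n triplets under the MLE
    of the PPP and HH models; computing those MLEs (e.g. with a sparse,
    bundle-adjustment-like optimiser) is assumed to happen elsewhere.
    """
    if len(res_ppp) != len(res_hh):
        raise ValueError("both models must be scored on the same triplets")
    g_ppp = gric(res_ppp, sigma, R_TRIPLET, D_PPP, K_PPP)
    g_hh = gric(res_hh, sigma, R_TRIPLET, D_HH, K_HH)
    # Ties are not covered by the text; this sketch assigns them to case B.
    return "A" if g_ppp < g_hh else "B"
```

When both models fit equally well, the parsimony terms decide: HH has fewer structure and motion parameters, so the triplet is classified as case B. Only a clearly better fit of PPP outweighs its larger penalty.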

In addition, one should verify that a sufficient number of triplets remains (say, more than $50$) to allow a reliable estimation. When too few points are seen in common over three views, the sequence is also split up; the parts can be reassembled at a later stage (using the procedure laid out in Section 5.4.3). This avoids the risk of a (slight) change of projective basis caused by an unreliable estimation based on too few points. Avoiding such a change is important, because the different parts of the recovered structure and motion would otherwise require different transformations to be brought back to a metric reference frame. In practice this causes self-calibration to fail.
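The splitting rule can be illustrated with a small Python sketch. It assumes a hypothetical input `common_counts`, where entry `t` is the number of features seen in common by consecutive views `t`, `t+1` and `t+2`, and a hypothetical scheme in which consecutive segments share one view, so that no segment contains an unreliable triplet. The actual splitting and reassembly of Section 5.4.3 may differ.

```python
from typing import List, Sequence, Tuple


def split_sequence(common_counts: Sequence[int], min_triplets: int = 50) -> List[Tuple[int, int]]:
    """Split a sequence of views into segments (first_view, last_view), inclusive.

    common_counts[t] is the number of features matched in common by views
    t, t+1 and t+2, so a sequence of m views has m - 2 entries. A triplet is
    considered reliable only if its count exceeds min_triplets.
    """
    num_views = len(common_counts) + 2
    segments: List[Tuple[int, int]] = []
    start = 0
    for t, count in enumerate(common_counts):
        if count <= min_triplets:
            # Views t and t+2 must not end up in the same segment: close the
            # current segment at view t+1 and start the next one there.
            segments.append((start, t + 1))
            start = t + 1
    segments.append((start, num_views - 1))
    return segments
```

For example, four triplet counts `[120, 30, 80, 90]` over six views give the segments `(0, 2)` and `(2, 5)`: the weak triplet of views 1, 2 and 3 is never used, and reassembling the two segments is left to the later stage.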


Marc Pollefeys 2002-11-22