The first part of the solution consists of detecting the cases where only planar features are being matched. To this end, the Geometric Robust Information Criterion (GRIC) model selection approach proposed by Torr is briefly reviewed. GRIC selects the model with the lowest score. The score of a model is the sum of two contributions: the first measures the goodness of the fit and the second measures the parsimony of the model.
It is important that a robust Maximum Likelihood Estimator (MLE) be used for estimating the different structure and motion models being compared through GRIC.
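GRIC takes into account the number $n$ of inliers plus outliers, the residuals $e_i$, the standard deviation $\sigma$ of the measurement error, the dimension $r$ of the data, the number $k$ of motion model parameters and the dimension $d$ of the structure:
\[
\mathrm{GRIC} = \sum_{i} \rho(e_i^2) + \bigl( n d \ln r + k \ln (r n) \bigr),
\qquad
\rho(e^2) = \min\left( \frac{e^2}{\sigma^2},\; 2 (r - d) \right).
\]
Here $n d \ln r$ is the penalty for the structure, which has $n \times d$ parameters each estimated from $r$ observations, and $k \ln (r n)$ is the penalty for the motion model, which has $k$ parameters estimated from $r n$ observations.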
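As a rough illustration, the score could be computed as in the sketch below. It assumes the usual form of GRIC due to Torr: a robust fit term with each squared residual capped at $2(r-d)$, plus the penalty $n d \ln r + k \ln(rn)$. The function name `gric` and its argument names are hypothetical, and the squared residuals are assumed to come from a robust MLE fit of the model under test.

```python
import math

def gric(sq_residuals, sigma, r, k, d):
    """GRIC score of one model (assumed form, after Torr).

    sq_residuals: squared residual e_i^2 for every match (inliers and outliers)
    sigma:        standard deviation of the measurement error
    r:            dimension of the data (4 for two views, 6 for three views)
    k:            number of motion model parameters
    d:            dimension of the structure
    """
    n = len(sq_residuals)
    cap = 2.0 * (r - d)  # robust cap, so outliers contribute a bounded cost
    fit = sum(min(e2 / sigma ** 2, cap) for e2 in sq_residuals)
    penalty = n * d * math.log(r) + k * math.log(r * n)
    return fit + penalty
```

The cap on each residual is what keeps the criterion robust: a gross outlier costs at most $2(r-d)$, whichever model is being scored.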
For each image pair, GRIC(F) and GRIC(H) can be compared. If GRIC(H) yields the lowest value, it is assumed that most matched features are located on a dominant plane and that a homography model is therefore appropriate. If, on the contrary, GRIC(F) yields the lowest value, one could assume, as did Torr, that standard projective structure and motion recovery can be continued. In most cases this is correct, but in some cases it still fails. An illustration of the problem is given on the left side of Figure 5.6, where both $F_{12}$ and $F_{23}$ could be successfully computed, but where structure and motion recovery would fail because all features common to the three views are located on a plane.
For the reason described above, we propose to use the GRIC criterion on triplets of views ($r = 6$). On the one hand we have GRIC(PPP), based on a model containing 3 projection matrices (up to a projective ambiguity) with $k = 3 \times 11 - 15 = 18$ and $d = 3$ (note that using a model based on the trifocal tensor would be equivalent). On the other hand we have GRIC(HH), based on a model containing 2 homographies with $k = 2 \times 8 = 16$ and $d = 2$. To compute the MLE of both PPP and HH efficiently, the sparse structure of the problem is exploited (similar to bundle adjustment). We can now differentiate between two cases:
Case A: $\mathrm{GRIC}(PPP) < \mathrm{GRIC}(HH)$: the three views observe general 3D structure.
Case B: $\mathrm{GRIC}(PPP) \geq \mathrm{GRIC}(HH)$: the structure common to the three views is planar.
Note that it does not make sense to consider mixed cases such as HF or FH: structure and motion recovery requires features matched over three views, and in these cases all such triplets would be located on a plane anyway.
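The triplet test can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the parameter counts are obtained by counting degrees of freedom (three projection matrices with 11 each, minus 15 for the projective ambiguity; two homographies with 8 each), the GRIC form is the usual one due to Torr, and the names `MODELS`, `R_TRIPLET` and `classify_triplet` are hypothetical. Both residual lists are assumed to cover the same $n$ features, each fitted by a robust MLE of the corresponding model.

```python
import math

R_TRIPLET = 6                                # data dimension: one 2D point in each of 3 views
MODELS = {"PPP": (18, 3), "HH": (16, 2)}     # (k, d) per model, from degree-of-freedom counting

def gric(sq_residuals, sigma, r, k, d):
    """Assumed GRIC form: capped fit term plus structure and motion penalties."""
    n = len(sq_residuals)
    fit = sum(min(e2 / sigma ** 2, 2.0 * (r - d)) for e2 in sq_residuals)
    return fit + n * d * math.log(r) + k * math.log(r * n)

def classify_triplet(sq_res_ppp, sq_res_hh, sigma):
    """Return 'A' (general 3D structure) or 'B' (planar common structure)."""
    g_ppp = gric(sq_res_ppp, sigma, R_TRIPLET, *MODELS["PPP"])
    g_hh = gric(sq_res_hh, sigma, R_TRIPLET, *MODELS["HH"])
    return "A" if g_ppp < g_hh else "B"
```

When both models fit the data equally well, the smaller penalty of HH decides in favor of case B, which is the intended behavior for planar scenes.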
In addition, one should verify that a sufficient number of triplets remain (more than some minimum number) to allow a reliable estimation. When too few points are seen in common over three views, the sequence is also split up. At a later stage it can be reassembled (using the procedure laid out in Section 5.4.3). This avoids the risk of a (slight) change of projective basis caused by an unreliable estimation based on too few points. Avoiding this is important, since a change of basis would mean that different transformations are required to bring the different parts of the recovered structure and motion back to a metric reference frame. In practice this causes self-calibration to fail.
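One possible way to organize the resulting bookkeeping is sketched below. It is a hypothetical arrangement, not the procedure of the original text: each consecutive view triplet receives a label, and runs of identical labels become subsequences. The names `label_triplet` and `segment`, the label `"split"`, and the threshold parameter `min_common` (whose value is left to the caller) are all assumptions.

```python
from itertools import groupby

def label_triplet(gric_ppp, gric_hh, n_common, min_common):
    """Label one view triplet: 'split' if too few common points, else 'A' or 'B'."""
    if n_common <= min_common:      # not "more than" the threshold: estimation is unreliable
        return "split"
    return "A" if gric_ppp < gric_hh else "B"

def segment(labels):
    """Group consecutive triplet indices that share a label.

    Returns (label, first_index, last_index) tuples; 'split' triplets are
    dropped and therefore act as breaks between subsequences.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != "split":
            segments.append((label, start, start + length - 1))
        start += length
    return segments
```

For example, the label sequence `["A", "A", "B", "split", "A"]` yields three subsequences: triplets 0 to 1 (general structure), triplet 2 (planar), and triplet 4 (general structure), which can then be processed separately and reassembled later.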