One of the most important requirements for a feature point is that it can be differentiated from its neighboring image points. If this were not the case, it wouldn't be possible to match it uniquely with a corresponding point in another image. Therefore, the neighborhood of a feature should be sufficiently different from the neighborhoods obtained after a small displacement.
A second order approximation of the dissimilarity, as defined in Eq. (4.1), between a image window
and a slightly translated image window is given by
| (D3) |
In the case of separate frames as obtained with a still camera, there is the additional requirement that as much image points originating from the same 3D points as possible should be extracted. Therefore, only local maxima of the corner response function are considered as features. Sub-pixel precision can be achieved through quadratic approximation of the neighborhood of the local maxima. A typical choice for
in this case is a Gaussian with
. Matching is typically done by comparing small, e.g.
, windows centered around the feature through SSD or NCC. This measure is only invariant to image translations and can therefore not cope with too large variations in camera pose.
To match images that are more widely separated, it is required to cope with a larger set of image variations. Exhaustive search over all possible variations is computationally untractable. A more interesting approach consists of extracting a more complex feature that not only determines the position, but also the other unknowns of a local affine transformation [164] (see Section 4.1.3).
In practice often far too much corners are extracted. In this case it is often interesting to first restrict the numbers of corners before trying to match them. One possibility consists of only selecting the corners with a value
above a certain threshold. This threshold can be tuned to yield the desired number of features. Since for some scenes most of the strongest corners are located in the same area, it can be interesting to refine this scheme further to ensure that in every part of the image a sufficient number of corners are found.
In figure 4.1 two images are shown with the extracted corners. Note that it is not possible to find the corresponding corner for each corner, but that for many of them it is.
In figure 4.2 corresponding parts of two images are shown. In each the position of 5 corners is indicated. In figure 4.3 the neighborhood of each of these corners is shown. The intensity cross-correlation was computed for every possible combination. This is shown in Table 4.1. It can be seen that in this case the correct pair matches all yield the highest cross-correlation values (i.e. highest values on diagonal). However, the combination 2-5, for example, comes very close to 2-2. In practice, one can certainly not rely on the fact that all matches will be correct and automatic matching procedures should therefore be able to deal with important fraction of outliers. Therefore, further on robust matching procedures will be introduced.
If one can assume that the motion between two images is small (which is needed anyway for the intensity cross-correlation measure to yield good results), the location of the feature can not change widely between two consecutive views. This can therefore be used to reduce the combinatorial complexity of the matching. Only features with similar coordinates in both images will be compared. For a corner located at
, only the corners of the other image with coordinates located in the interval
.
and
are typically
or
of the image.