Omnidirectional Camera Image Roaming



Li Guan and Changchang Wu

{lguan, ccwu}@cs.unc.edu

University of North Carolina at Chapel Hill

Department of Computer Science




 

Figure 1. Left: 360 One VR Omnidirectional Camera [1] mounted on Canon S1 IS together with a tripod. Right: a typical image taken with this setup.

 

Abstract

We present a general framework for roaming through omnidirectional images. After camera calibration, raw images are unwarped to projective images. We show that a topological graph of the camera shooting path can be built based on image feature points. Image morphing is then applied to achieve roaming through the image sequence. By globally minimizing the distances between neighboring nodes in the graph, we are able to lay out the map cleanly. Finally, we provide a general GUI for smoothly exploring the environment via the image sequence.

1.     Introduction

The task we want to achieve in this project is automatic topological map construction from a sequence of omnidirectional camera images, as well as smooth roaming through these images. Originally, we proposed to mount the omnidirectional camera system (Fig. 1) on an ER1 Personal Robot System [2], extending our 'Sensor-based Motion Planning' project, 2005 [3]. Unfortunately, due to damage to the robot's main board, we have only been able to finish the vision system in simulation so far. Basically, we assume four kinds of robot motion: (1) forward, (2) backward, (3) turn left 90 degrees, and (4) turn right 90 degrees. The robot motion simulation is designed so that merging the robot platform with the vision system should be straightforward.

We discuss omnidirectional camera calibration and image unwarping in Section 2, image feature point extraction and matching in Section 3, image linking and loop detection in Section 4, topological map construction in Section 5, image morphing in Section 6, and the roaming GUI in Section 7. Finally, in Section 8, we summarize what has been achieved within the final project period and what remains to be done. The main contributions of Changchang Wu are Sections 3, 4, and 6. The main contributions of Li Guan are Sections 2, 5, and 7, as well as data acquisition.

For the experiment, we acquired a sequence of 28 omnidirectional images of size 2048x1536 around a reading room; the shooting path forms a closed loop and contains some overlap.

2.     Camera calibration & image unwarping

Calibrating an omnidirectional camera means finding the relation between a given 2D pixel point (u, v) and the 3D vector P emanating from the effective viewpoint of the mirror. Only once this mapping is determined can we unwarp the distorted images into projective images, i.e. the kind taken by ordinary cameras.

2.1 Camera calibration

In order to find the mapping function, we adopt the mirror model from [4], as shown in Fig. 2, which maps an image point (u, v) into its corresponding 3D vector P. The third component f of this vector plays the role of the focal length of the camera. Assuming the mirror is rotationally symmetric, f depends only on the distance rho of a point from the image center, and the model describes f by means of a polynomial whose coefficients are the calibration parameters to be estimated. That is:

    P = [u, v, f(rho)]^T,   f(rho) = a0 + a1*rho + a2*rho^2 + a3*rho^3 + a4*rho^4,   rho = sqrt(u^2 + v^2)

Figure 2. Mirror model used for calibration and unwarping.

Then we are able to solve the linear system for the coefficients by providing 3D points on a checkerboard plane, as Fig. 3 shows. We use an 11x8 checkerboard grid at 8 different positions, i.e. 11x8x8x2 = 1408 equations for the five unknown parameters.
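
For concreteness, the following is a minimal MATLAB sketch of evaluating the calibrated mapping for a single pixel; the coefficient values, the image center, and the pixel coordinates are hypothetical stand-ins rather than our actual calibration result.

    a  = [-300 0 0.002 0 0];                       % hypothetical a0..a4 of the polynomial f(rho)
    cx = 1024;  cy = 768;                          % hypothetical image center
    % Map a pixel (u, v) to its (un-normalized) 3D viewing ray [u - cx; v - cy; f(rho)].
    pixel_to_ray = @(u, v) [u - cx; v - cy; polyval(fliplr(a), hypot(u - cx, v - cy))];
    P = pixel_to_ray(1400, 900);
    P = P / norm(P);                               % unit viewing ray from the mirror viewpoint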

Figure 3. Mirror surface calibration.

After linear and nonlinear optimization we get:

       

Therefore the omnidirectional mirror of our system is a parabolic surface. Fig. 4 shows the reconstructed mirror surface with the coefficients.

Figure 4. Reconstructed mirror surface with the calibrated coefficients.

2.2 Image unwarping

There are two reasons we need the unwarping process. First, the unwarped images look more natural for roaming. Second, and more importantly, the unwarped images contain only projective distortion, so that we can apply scale- and rotation-invariant features, i.e. SIFT features [5], for matching and image correspondence. Otherwise, the higher-order distortions would severely weaken the power of SIFT matching.

The mapping function works as follows: each unwarped pixel (i, j) defines a horizontal and a vertical viewing angle within the chosen field of view; these angles, together with the focal length factor alpha, determine a 3D ray (x, y, z); and the mirror model of Section 2.1 then gives the omnidirectional image coordinates (u, v) to be sampled. Here theta_v is the vertical field of view and theta_h is the horizontal field of view, as shown in Fig. 5. In our unwarping process, the horizontal field of view is 120 degrees and the vertical field of view runs from +45 degrees (upward) to -150 degrees (downward), as listed in the caption of Fig. 6.

Figure 5. An illustration of the unwarping process.

Since, in simulation, the ER1 robot travels in four directions, we unwarp the original image into four images, i.e. 'North', 'East', 'South', and 'West', as shown in Fig. 6.

Figure 6. Unwarping the original image into the four images N, E, S, and W in the local camera shooting coordinates. The horizontal field of view is 120 degrees, and the vertical field of view runs from +45 degrees (upward) to -150 degrees (downward).
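
As a sketch of the unwarping itself, the following MATLAB fragment produces one unwarped view by solving, for each output row, for the image radius at which the mirror polynomial yields the desired elevation angle. The coefficients, image center, and file name are the same hypothetical stand-ins as above, and a narrower vertical range is used here because the tangent parametrization in this sketch breaks down near plus or minus 90 degrees; none of these are the values used in our experiments.

    a  = [-300 0 0.002 0 0];                     % hypothetical mirror polynomial a0..a4
    cx = 1024;  cy = 768;                        % hypothetical image center
    img = im2double(imread('omni.jpg'));         % hypothetical omnidirectional image

    W = 480;  H = 360;                           % size of one unwarped view
    theta = linspace(-60, 60, W) * pi/180;       % horizontal field of view: 120 degrees
    phi   = linspace(45, -45, H) * pi/180;       % narrower vertical range for this sketch

    out = zeros(H, W, size(img, 3));
    for r = 1:H
        % Solve f(rho) = rho * tan(phi) for the image radius rho of this elevation.
        c   = [a(5), a(4), a(3), a(2) - tan(phi(r)), a(1)];   % descending powers of rho
        rts = roots(c);
        rho = min(rts(abs(imag(rts)) < 1e-9 & real(rts) > 0)); % smallest positive real root
        u = cx + rho * cos(theta);               % source columns for this output row
        v = cy + rho * sin(theta);               % source rows
        for ch = 1:size(img, 3)
            out(r, :, ch) = interp2(img(:, :, ch), u, v, 'linear', 0);
        end
    end
    imshow(out);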

3.     Feature extraction & matching

After image unwarping, vision techniques are applied to match the unwarped images and find the camera movement between frames.

SIFT (Scale Invariant Feature Transform) [5] is used in our project to extract features from the unwarped images. SIFT features are located at the local extrema of the DoG (Difference of Gaussians) scale space. By computing weighted gradient orientation histograms over normalized image patches, which are scaled and rotated to cancel out the similarity transform, SIFT features are invariant to similarity transforms, and they have also proved robust to perspective and illumination changes. In our experiments, the unwarped images normally still have some distortion at the bottom, but SIFT works well for most of the image area. Figure 7 shows a typical image, in which feature scales and orientations are displayed.

Figure 7. An example of SIFT features in an image.

SIFT features can be matched easily by finding nearest neighbors in the descriptor space. The features are normally represented as 128-dimensional normalized vectors, and the distance between two SIFT features can then simply be evaluated as the arccos of their dot product. Figure 8 shows a typical feature matching result, in which the blue lines are the feature tracks.
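
A minimal MATLAB sketch of this matching step, assuming D1 (n1 x 128) and D2 (n2 x 128) already hold unit-norm SIFT descriptors for the two images; the 0.8 threshold is the commonly used distance-ratio value from [5], not a tuned parameter of ours.

    cosines = D1 * D2';                               % pairwise dot products of unit descriptors
    dists   = acos(min(max(cosines, -1), 1));         % angular distances, clamped for safety
    [sorted_d, idx] = sort(dists, 2);                 % nearest and second-nearest per feature
    ratio   = sorted_d(:, 1) ./ sorted_d(:, 2);
    good    = find(ratio < 0.8);                      % distance-ratio test
    matches = [good, idx(good, 1)];                   % [index in image 1, index in image 2]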

Figure 8. An example of feature matching.

A model selection step is then needed to filter out the outliers in the putative matches. The relationship between points in two projective images can be captured by the fundamental matrix model, and the RANSAC technique can be applied to fit this model [6]. Figure 9 shows the resulting inliers for the previous image pair. The images in our experiments sometimes constitute a degenerate case for fundamental matrix estimation, because most of the correct feature matches lie on a dominant plane. In this case, a technique called QDEGSAC [7] works better by automatically selecting the rank of the model.

Given two unwarped images, the number of inlier feature matches from fundamental matrix estimation is then used as their matching score.
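
A sketch of the scoring step, assuming the MATLAB Computer Vision Toolbox and that pts1 and pts2 are the M x 2 coordinates of the putative matches from the previous step; the RANSAC parameter values shown are placeholders.

    [F, inlierIdx] = estimateFundamentalMatrix(pts1, pts2, ...
        'Method', 'RANSAC', 'NumTrials', 2000, 'DistanceThreshold', 1e-2);
    score = sum(inlierIdx);                 % inlier count used as the pair's matching score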

Figure 9. The inliers computed by RANSAC.

4.     Image linking & loop detection

Based on the matching scores of projective image pairs, we can then compute the matching of two frames. Each frame is initially an omnidirectional image and is later unwarped into 4 projective images. Given that we only have 4 different orientations in this particular experiment, there are only 4 possible rotations between two frames: no rotation, left turn, right turn, and 180-degree turn. For a given rotation, we match the front and back images of one frame to the corresponding images of the other and keep the larger matching score of the two. We then try the four possibilities; the largest matching score is kept as the matching score of the two frames, and the corresponding rotation is recovered.
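
The frame-level matching described above can be sketched as follows, where score_fn(a, b) is a hypothetical helper returning the inlier count between view a (1..4 for N, E, S, W) of the first frame and view b of the second frame, and the choice of views 1 and 3 as "front and back" is an illustrative assumption.

    best_score = -inf;  best_rot = 0;
    for r = 0:3                                        % 0, 90, 180, or 270 degree rotation
        s = 0;
        for k = [1 3]                                  % front and back views of the first frame
            s = max(s, score_fn(k, mod(k - 1 + r, 4) + 1));
        end
        if s > best_score
            best_score = s;                            % matching score of the two frames
            best_rot   = r;                            % recovered relative rotation
        end
    end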

If the recovered rotation is zero, we assume that the camera is moving forward. This is true in our experiment because we never move backward, but the two movements could be distinguished by checking whether the image points are moving away from the epipole: the points move away from the epipole when the camera moves forward, and toward the epipole when it moves backward.
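
A sketch of this check, assuming the epipole e = [ex, ey] and the matched inlier coordinates pts1 (earlier frame) and pts2 (later frame) are available:

    d1 = hypot(pts1(:, 1) - e(1), pts1(:, 2) - e(2));   % distances to the epipole, earlier frame
    d2 = hypot(pts2(:, 1) - e(1), pts2(:, 2) - e(2));   % distances to the epipole, later frame
    if median(d2 - d1) > 0
        motion = 'forward';                             % points move away from the epipole
    else
        motion = 'backward';
    end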

With the relative movements between adjacent frames, it is easy to compute the absolute orientation of the camera by accumulating all the rotations. The remaining problem is to detect a loop when the camera comes back to the same place after several frames.

Loops can be detected by checking the score of matching the current frame against existing frames. First, a global threshold is used to discard all pairs whose inlier count is small. Then, checking along the timeline, we declare a loop when there is a local maximum of the matching score. This can be made more robust by applying a simple de-noising filter to the matching scores along the timeline.
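
A sketch of the loop test, where scores(t) is assumed to hold the matching score of the current frame against frame t along the timeline, and the threshold value is an illustrative placeholder.

    smoothed = conv(scores, ones(1, 3) / 3, 'same');     % simple de-noising filter
    thresh   = 30;                                       % hypothetical global inlier threshold
    loops = [];
    for t = 2:numel(smoothed) - 1
        if smoothed(t) > thresh && smoothed(t) >= smoothed(t-1) && smoothed(t) >= smoothed(t+1)
            loops(end+1) = t;                            % local maximum above the threshold
        end
    end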

5.     Topological map construction

Given the linking relationships between the images, we are now able to construct a topological graph for further use. First of all, we store the image linking relations in a table. To be more specific, given the shooting path in Fig. 10, the table looks like Tab. 1.

Figure 10. One corner of the image shooting path of 25 images. Notice that the orientation of each number denotes the camera orientation in the global coordinate frame when the image was shot.

Node ID   North (1)   East (2)   South (3)   West (4)
#1        2.1         0           25.2        0
#2        3.1         0           1.3         0
#3        4.1         0           2.3         0
#4        5.1         0           3.3         0
#5        6.1         0           4.3         0
#6        7.1         0           5.3         0
#7        8.1         0           6.3         0
#8        0           0           7.3         9.1
#9        10.1        0           8.2         0
#10       11.1        0           9.3         0
#11       12.1        0           10.3        0
#12       0           0           11.3        13.1

Table 1

This table stores the information guiding the morphing process in Section 6. There are four directions for every node, i.e. North (1), East (2), South (3), and West (4). Each table element consists of an integer part I and a decimal part D. I indicates the node ID that the current node is linked to; D indicates which orientation image the current image should morph to when moving from the current node to node I. If a node has no neighboring node in a certain direction, the respective element in the table is assigned 0. As an example, node #8 in Tab. 1 has two neighboring nodes, #7 on the south and #9 on the west. When the robot moves from node #8 to #9, it has to first turn to the west and then move forward along #9's local north direction, so the fourth element of node #8 is 9.1, where "1" denotes the north image of #9 in its local coordinates.
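
Decoding a table element into its two parts is straightforward; for instance, the west entry 9.1 of node #8:

    val = 9.1;                       % element of Tab. 1
    I = floor(val);                  % neighbor node ID            -> 9
    D = round(10 * (val - I));       % neighbor's local direction  -> 1 (north)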

Notice that we only store local orientations in the table, because only local orientations and turns are detected in the image linking and loop detection stage. Although this is sufficient for morphing, it is confusing for topological map construction. However, since the morphing procedure is repeated during roaming, while the topological map is built once at the beginning, we keep the local information of Tab. 1 in the program and compute a linking table in global coordinates once, before topological map construction. If we take the north direction of node #1 as the global orientation, we obtain the global table below.

Node ID   North (1)   East (2)   South (3)   West (4)
#1        2           0           25          0
#2        3           0           1           0
#3        4           0           2           0
#4        5           0           3           0
#5        6           0           4           0
#6        7           0           5           0
#7        8           0           6           0
#8        0           0           7           9
#9        0           8           0           10
#10       0           9           0           11
#11       0           10          0           12
#12       0           11          13          0

Table 2

But constructing a decent-looking map from this global relationship table is still nontrivial. In general there will be a different number of nodes on each straight path, so the topological edges between nodes cannot all have the same length; otherwise a loop will not close when we travel incrementally from a starting node back to itself as the map construction strategy. When there is more than one loop in the map, things become messier: even if one loop can be closed by local adjustment, it is difficult to close all the loops that way. The adjustment is in fact a global problem, and we therefore model it as a global minimization problem. The setup for Fig. X is:

    minimize over the node coordinates {(x_i, y_i)} the total squared edge length
        sum over all linked pairs (i, j) of (x_i - x_j)^2 + (y_i - y_j)^2,
    subject to: for every linked pair (i, j), the displacement from node i to node j points along the
        edge's global direction (north, east, south, or west) and has length no less than 1 unit.

Here (x_i, y_i) are the global coordinates of node i. Fig. X is the output of the MATLAB minimization and plotting.
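
The following MATLAB sketch shows one way to set this up with fmincon (Optimization Toolbox); the small edge list is only an illustrative subset of Tab. 2, not the full map, and pinning node #1 at the origin removes the translational ambiguity of the objective.

    edges = [1 2 1; 2 3 1; 3 4 1; 4 5 1; 5 6 1; 6 7 1; 7 8 1; 8 9 4];  % [i j dir], dir 1..4 = N,E,S,W
    N = 9;  E = size(edges, 1);
    dirvec = [0 1; 1 0; 0 -1; -1 0];               % global unit steps for N, E, S, W

    % Decision vector p = [x1..xN, y1..yN]; objective = total squared edge length.
    obj = @(p) sum((p(edges(:,2)) - p(edges(:,1))).^2 + ...
                   (p(N + edges(:,2)) - p(N + edges(:,1))).^2);

    A   = zeros(E, 2*N);      b   = -ones(E, 1);   % along-edge displacement >= 1 unit
    Aeq = zeros(E + 2, 2*N);  beq = zeros(E + 2, 1);
    for e = 1:E
        i = edges(e, 1);  j = edges(e, 2);  d = dirvec(edges(e, 3), :);
        A(e,   [i j N+i N+j]) = [ d(1) -d(1)  d(2) -d(2)];    % -d . (pj - pi) <= -1
        Aeq(e, [i j N+i N+j]) = [-d(2)  d(2)  d(1) -d(1)];    % perpendicular component = 0
    end
    Aeq(E+1, 1) = 1;  Aeq(E+2, N+1) = 1;           % pin node #1 at the origin

    p  = fmincon(obj, rand(2*N, 1), A, b, Aeq, beq);
    xy = [p(1:N), p(N+1:2*N)];                     % node coordinates for plotting the map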

Node ID   North (1)   East (2)   South (3)   West (4)
#1        2.1         0           9.2         0
#2        3.1         12.3        1.3         0
#3        0           0           2.3         4.1
#4        5.1         16.2        3.2         0
#5        0           0           4.3         6.1
#6        0           0           5.2         7.1
#7        8.1         0           6.2         0
#8        9.1         0           7.3         0
#9        10.1        0           8.3         1.1
#10       0           0           9.3         11.1
#11       0           0           10.2        12.1
#12       2.4         13.1        11.2        0
#13       0           0           12.4        14.1
#14       15.1        0           13.2        0
#15       16.1        0           14.3        0
#16       0           0           15.3        4.4

Table 3

To evaluate the effectiveness of this method, we randomly construct a table similar to Tab. 1, shown in Tab. 3, and run the minimization process. The output is shown in Fig. 11, from which we can see that this is a difficult problem: there are three loops, and most of the straight paths have a different number of nodes from the path on the opposite side of the same loop.

Figure 11. A challenging map construction test from Tab. 3. The oriented number beside each red node is the node ID. The numbers on the axes indicate positions in the global coordinate frame.

6.     Image morphing

With the topological map and the entire set of pre-captured frames, an image morphing based virtual exploration is then implemented. A person is allowed to move along the existing path in the original or reverse direction, and also to turn and look around.

When the person turns left or right, rotational morphing is used to generate a smooth video. Since we have already implemented the unwarping function, this part is easy: we only need to generate the unwarped image over a different view range.

When the person moves forward or backward along the path, we use morphing from the current image to the next image to visualize a smooth transition. If the two images do not share a large number of feature matches, simple linear morphing is used. This happens at places with large non-textured regions such as walls.
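
A sketch of this fallback, where I1 and I2 are the current and next unwarped images along the travel direction:

    for t = 0:0.1:1
        frame = (1 - t) * im2double(I1) + t * im2double(I2);   % simple cross-dissolve
        imshow(frame);  drawnow;
    end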

If the current image and the next image share enough feature matches, triangulation-based morphing is used. Due to the perspective change, the relative positions of points change from one image to the other, and the triangulation of feature points may become inconsistent. A fix step is used to detect bad points in the triangulation and make sure that every triangle in the first image corresponds to a triangle with the same orientation in the other. When a triangle changes orientation from one image to the other, the point with the largest displacement is removed from the point set. By repeating this fix step several times, we obtain a mapping from the triangulation in one image to the triangulation in the other. Linear morphing between corresponding triangles is then used.
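
A sketch of the fix step, assuming pts1 and pts2 are the M x 2 coordinates of the matched inliers in the current and next image; triangle orientation is tested with the signed area, and the worst point of any flipped triangle is dropped before re-triangulating.

    tri = delaunay(pts1(:, 1), pts1(:, 2));                 % triangulate in the first image
    signed_area = @(p, t) (p(t(:,2),1) - p(t(:,1),1)) .* (p(t(:,3),2) - p(t(:,1),2)) - ...
                          (p(t(:,3),1) - p(t(:,1),1)) .* (p(t(:,2),2) - p(t(:,1),2));
    while true
        flipped = sign(signed_area(pts1, tri)) ~= sign(signed_area(pts2, tri));
        if ~any(flipped), break; end
        bad = unique(tri(flipped, :));                      % points of orientation-flipped triangles
        disp_len = hypot(pts2(bad,1) - pts1(bad,1), pts2(bad,2) - pts1(bad,2));
        [~, k] = max(disp_len);
        keep = true(size(pts1, 1), 1);  keep(bad(k)) = false;   % drop the worst point
        pts1 = pts1(keep, :);  pts2 = pts2(keep, :);
        tri  = delaunay(pts1(:, 1), pts1(:, 2));            % re-triangulate and repeat
    end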

Figure 12. A mapping of triangulations between two images.

This gives a very good transition for almost all triangles except those connected to the four image corners, because an image corner in one image does not match the image corner of the other. We therefore use the epipole to give the image corners approximate displacements, which improves the result.

The epipole is the image in one view of the camera center of the other view. In forward motion in particular, all the pixels appear to move away from the epipole, as in Figure 13.

    

Figure 13. An example of epipolar lines in forward motion.

Given the polygon formed by the inlier feature points, we approximate the displacement of each corner by making the image corner move at the same speed as the intersection point of its epipolar line with the polygon. For example, in Figure 14, we assume corner c has the same speed as its intersection point p; we then move the image corner c1 in the first image so that the distance ratio |c1 e1| / |p1 e1| equals |c e| / |p e|. A similar adjustment is made in the second image. Figure 15 gives the result of morphing between the two images in Figure 13, and there is an obvious image corner movement.

Figure 14. Image corner displacement: (a) morphed image, (b) first image, (c) second image.

Figure 15. Morphing of the two images in Figure 13.

7.     Roaming GUI

We have built a GUI in MATLAB to facilitate roaming through the images, as shown in Fig. 16. The roaming window is on the left. The map, together with the current location and orientation of the robot, is on the right. The pull-down list specifies which image sequence is to be browsed. The "Show triangulation" radio button is checked if the user would like to see the tessellation during forward and backward roaming. If the robot can move in a certain direction from the current location, the respective direction button on the bottom right is enabled. The user can rotate the view 360 degrees at any time.

Figure 16. The GUI for roaming through the images.

8.     Conclusion & future work

In this project, we are able to automatically determine neighboring information purely from image feature matching and to detect the occurrence of a loop in the shooting path. Based on this local information, we then build a decent topological map by casting the global adjustment problem as a minimization problem. Finally, we create a GUI that displays morphing through these images, serving as a visitor roaming system.

In the future, the whole computation process can be integrated into a complete system. We can try more complicated datasets and mount the system on a real robot platform. Challenges also remain in handling more than four directions of motion. Finally, the calibration and unwarping algorithms can be improved by first searching for the omnidirectional image center.

References

 

[1]     http://www.kaidan.com/

[2]     http://www.evolution.com/er1/

[3]     http://www.cs.unc.edu/~lguan/COMP290-58.files/COMP290-58.htm/

[4]     Davide Scaramuzza, Omnidirectional Camera Calibration Toolbox, http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=12384&objectType=file

[5]     D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, November 2004. http://www.cs.ubc.ca/~lowe/keypoints/

[6]     R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049, 2000.

[7]     J.-M. Frahm and M. Pollefeys. RANSAC for (quasi-)degenerate data (QDEGSAC). In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Volume 1. IEEE Computer Society, Washington, DC, 2006.