Omnidirectional Camera Image
Roaming
Li Guan and Changchang Wu
{lguan, ccwu}@cs.unc.edu
Department of Computer Science

Figure 1. Left: 360 One VR Omnidirectional Camera [1] mounted on Canon S1 IS together with a tripod. Right: a typical image taken with this setup.
Abstract
We present a general framework for roaming through
omnidirectional images. After camera calibration, raw images are unwarped to
projective images. We acclaim that a topological graph of the camera shooting
path can be build based on image feature point. Image morphing is then applied
to achieve roaming though the image sequence. By globally minimizing the
distance between neighboring nodes in the graph, we are able to display the map
decently. Finally, we provide a general GUI for exploring the environment via
the image sequence smoothly.
The task that we want to achieve in this
project is an automatic topological map construction, given a sequence of
omnidirectional camera images, as well as smoothly roaming through these
images. Originally, we proposed to mount the omnidirectional camera system,
(Fig. 1), on an ER1 Personal Robot System [2], extending our ‘Sensor-based
Motion Planning’, 2005 [3]. Unfortunately due to the robot main board damage,
we can only finish the vision system with simulation so far. Basically we
assume four kinds of robot motion: (1) forward, (2) backward, (3) turn left 90
degrees, and (4) turn right 90 degrees. The robot motion simulation is designed
such that merging the robot platform and vision system is expectedly trivial.
We talk about omnidirectional camera
calibration and image unwarping in Section 2, image feature point extraction and
matching in Section 3, image linking and
loop detection in Section 4, topological map construction in Section 5, image
morphing in Section 6, and introduction to a roaming GUI in Section 7. Finally,
in Section 8, we summarize what have been achieved within the final project
period and what is required to be done in future. The main contributions of
Changchang Wu are Section 3, 4 and 6. The main contributions of Li Guan are
Section 2, 5 and 7, as well as data acquisition.
For the experiment, we have
acquired a sequence of 28 2048x1536 omnidirectional images around a reading
room, which forms a closed loop as well as some overlapping.
Calibrating an
omnidirectional camera implies finding the relation between a given 2D pixel
point
and the 3D vector P emanating
from the mirror effective viewpoint. Only when the mapping is determined, can
we unwarp the distorted images back to projective images, the ones that are
taken from ordinary cameras.
In order to find the
mapping function
, we adopt the mirror model from [4], as shown in Fig. 2,
which maps an image point
into its corresponding 3D vector
. The coefficient
can be taken as the focal length of the camera. Assume that the
mirror is rotationally symmetric,
depends only on the distance of a point from the image center
. And the model describes
by means of a polynomial, whose coefficients are the
calibration parameters to be estimated. That is:
![]()

Figure 2. Mirror model used for
calibration and unwarping.
Then
we are able to solve the linear system of the coefficients by providing 3D
points on a checkerboard plane, as Fig.3 shows. We use 11x8 checkerboard grid
at 8 different positions, i.e. 11x8x8x2=1408 equations for five unknown
parameters.

Figure 3. Mirror surface calibration.
After
linear and nonlinear optimization we get:
![]()
Therefore
the omnidirectional mirror of our system is a parabolic surface. Fig. 4 shows
the reconstructed mirror surface with the coefficients.

Figure 4. Reconstructed mirror surface
with the calibrated coefficients.
There are two reasons we need to have the
unwarping process. Firstly, we can have more natural images for the roaming
performance. Secondly, and which is more important, images will only have
projective distortions, so that we can apply scale and rotation invariant
feature, i.e. SIFT features [5] to perform matching and image correspondence.
Otherwise, the high order distortions would severely weaken the power of SIFT
matching.
The mapping function is as follows.

Where,
x, y, z are 3D coordinates,
alpha is the focal length factor, u and v are
omnidirectional image coordinates, i
and j are unwarped image
coordinates,
is the vertical field
of view and
is the horizontal
filed of view, as shown in Fig. 5. In our unwarping process, ![]()

Figure 5. An illustration of the
unwarping process.
Since in simulation, the ER1 robot has four
directions to travel. We unwarp the original image into four images, i.e. ‘North’,
‘East’, ‘South’, and ‘West’, as shown in Fig. 6.

Figure 6. Unwarp the original image into N, E, S, and W four
images in the local camera shooing coordinates. The horizontal filed of view is
120 degrees, and the vertical field of view is from +45 degrees (upward) to
-150 degrees (downward).
After image unwrapping, vision techniques are
then applied to match the unwrapped images to find the camera movement between
frames.
SIFT (Scale Invariant Feature Transform) [5]
is used in our project to extract features from the unwrapped images. SIFT features are located at the local
extrema of DoG (Difference of Gaussian) Scale space. By computing weight edge
histogram of normalized image patches, which is scaled and rotated to cancel
out similarity transform, SIFT feature is invariant to Similarity transform,
and it is also proved to be robust to perspective change, illumination change.
In our experiments, the unwrapped images normally still have some distortions
at the bottom, but the SIFT works pretty will for most of the image areas.
Figure 7 demonstrates a typical image, in which features scales and feature
orientations are displayed.

Figure 7. an example of SIFT features
in image.
SIFT features can be easily matched by finding
the nearest neighbors in the descriptor space. The features are normally
represented as 128D normalized vectors, and the distance of two SIFT features then
can be simply evaluated by the arccos of their dot products. Figure 8
demonstrates a typical feature matching, in which the blue lines are the tracks
of features.

Figure 8. an example of feature
matching.
A model selection step is then needed to filter out the outliers in the putative matches. The relationship of points in two projective images can be captured by the Fundamental Matrix model. RANSAC technique can be applied to fit this model [6]. Figure 9 is the resulted inliers of the previous image pair. Actually, the images in our experiments sometimes belong to a degenerate case of Fundamental Matrix estimation, because most of the correct feature matches are on a dominant plane. In this case, a technique called QDEGSAC [7] will work better by automatically selecting the rank of models.
Given two unwrapped images, the numbers of inlier feature matches from Fundamental Matrix estimation is then used as the matching score of them.

Figure 9. the inliers computed by RANSAC
Based on the matching scores of projective
image pairs, we can then compute the matching of two frames. Each frame is
initially an omnidirectional image, and later unwrapped to 4 projective
images. Given that we only have 4
different orientations in this particular experiment, there are only 4
possibilities of rotation between two frames: no rotation, left, turn right, turn
and 180 degree turn. For a certain rotation, we match the front and back images
of one frame to the corresponding images of the other, and keep the largest
matching score of them. Then we can try the four different possibilities, and
the largest matching score will be kept as the matching score of two frames,
and the corresponding rotation is the recovered.
If the recovered rotation is zero, we will
assume that the camera is moving forward. This is true in our experiment
because we never move backward, but the two movements should be able to be
distinguished by checking whether the image points are moving away from the epiploe
point. The points are actually moving away from the epiploe when camera is
moving forward, and toward the epiploe when backward.
With the relative movements between every
adjacent frame, it is very easy to compute the absolute orientation of a camera
by accumulating all the rotations. The rest problem will be to detect a look
when the camera comes back to a same place after several frames.
Given the linking relationship of the images,
we are now able to construct a topological graph for further use. First of all,
we store the image linking relations into a table. To be more specific, given
the shooting path as in Fig. 10, the table should look like Tab. 1

Figure 10. One corner of the image shooting path of 25 images.
Notice the orientation of the numbers denotes the orientation in the global
coordinate when shot the image.
|
Node ID |
North (1) |
East (2) |
South (3) |
West (4) |
|
#1 |
2.1 |
0 |
25.2 |
0 |
|
#2 |
3.1 |
0 |
1.3 |
0 |
|
#3 |
4.1 |
0 |
2.3 |
0 |
|
#4 |
5.1 |
0 |
3.3 |
0 |
|
#5 |
6.1 |
0 |
4.3 |
0 |
|
#6 |
7.1 |
0 |
5.3 |
0 |
|
#7 |
8.1 |
0 |
6.3 |
0 |
|
#8 |
0 |
0 |
7.3 |
9.1 |
|
#9 |
10.1 |
0 |
8.2 |
0 |
|
#10 |
11.1 |
0 |
9.3 |
0 |
|
#11 |
12.1 |
0 |
10.3 |
0 |
|
#12 |
0 |
0 |
11.3 |
13.1 |
|
… |
… |
… |
… |
… |
Table 1
This table stores the information guiding the
morphing process in Section 6. There are four directions for every node, i.e.
North (1), East (2), South (3), and West (4). Each table element consists of
the integer part I, and the decimal part D. I indicates
the node ID that the current node is linked to. D indicates which
orientation image should the current image morph to when moving from the
current node to node I. If for a certain node, there is no neighboring
node in a certain direction, the respective element in the table is assigned 0.
As an example, for node #
Notice that we only store the local
orientations in the table, because only local orientations and turnings are
detected in the image linking and looping stage. Although, this is feasible for
morphing, it is very confusing for the topological map construction. However,
since the morphing procedure is repeated in the roaming process, while the
topological map is built once for all at the beginning, we still store the
local information as in Tab.
|
Node ID |
North (1) |
East (2) |
South (3) |
West (4) |
|
#1 |
2 |
0 |
25 |
0 |
|
#2 |
3 |
0 |
1 |
0 |
|
#3 |
4 |
0 |
2 |
0 |
|
#4 |
5 |
0 |
3 |
0 |
|
#5 |
6 |
0 |
4 |
0 |
|
#6 |
7 |
0 |
5 |
0 |
|
#7 |
8 |
0 |
6 |
0 |
|
#8 |
0 |
0 |
7 |
9 |
|
#9 |
0 |
8 |
0 |
10 |
|
#10 |
0 |
9 |
0 |
11 |
|
#11 |
0 |
10 |
0 |
12 |
|
#12 |
0 |
11 |
13 |
0 |
|
… |
… |
… |
… |
… |
Table 2
But constructing a decent looking map from
this global relationship table is still nontrivial, because in general there
will be different number of nodes on every straight path, therefore the
topological edge between two nodes should not be always the same length
otherwise a loop will not be closed, when we travel incrementally from a
starting node and back to it again as map construction strategy. When we have
more than one loop in the map, thing becomes more messy because, even when you
are lucky to close one loop be local adjustment, it is difficult to close all
the loops by that means. As a matter of fact, the adjustment is a global
problem, and we thus come up with an idea that model this as a global
minimization problem. And here are the formulae for the setup for Fig. X

Where (xi,yi)
are the global coordinates for node i.
And we constrain the edges between neighboring nodes are no less than 1 unit.
And Fig. X is the output of the MATLAB minimization and plotting.
|
Node ID |
North (1) |
East (2) |
South (3) |
West (4) |
|
#1 |
2.1 |
0 |
9.2 |
0 |
|
#2 |
3.1 |
12.3 |
1.3 |
0 |
|
#3 |
0 |
0 |
2.3 |
4.1 |
|
#4 |
5.1 |
16.2 |
3.2 |
0 |
|
#5 |
0 |
0 |
4.3 |
6.1 |
|
#6 |
0 |
0 |
5.2 |
7.1 |
|
#7 |
8.1 |
0 |
6.2 |
0 |
|
#8 |
9.1 |
0 |
7.3 |
0 |
|
#9 |
10.1 |
0 |
8.3 |
1.1 |
|
#10 |
0 |
0 |
9.3 |
11.1 |
|
#11 |
0 |
0 |
10.2 |
12.1 |
|
#12 |
2.4 |
13.1 |
11.2 |
0 |
|
#13 |
0 |
0 |
12.4 |
14.1 |
|
#14 |
15.1 |
0 |
13.2 |
0 |
|
#15 |
16.1 |
0 |
14.3 |
0 |
|
#16 |
0 |
0 |
15.3 |
4.4 |
Table 3
To
evaluate the efficiency of this method, we randomly construct a table similar
to Tab. 1 as in Tab. 3, and run the minimization process. And the output is in
Fig. 11, from which we can see that this is a difficult problem since there are
three loops and most of the straight paths have different number of nodes from
the path on the opposite side of the same loop.

Figure 11. A challenging map construction test from Tab. 3. The
oriented number beside each red node is the node number. The numbers on the
axes indicate the positions of the map in the global coordinate.
With the topological map and the entire set
of pre-captured frames, an image morphing based virtual exploration is then
implemented. A person will be allowed to move along the existing path along the
original or reverse directions, and also be allowed to turn and look around.
When the person wants to turn left or right,
a rotational morphing is used to generate smooth video. Since we’ve already
implemented the unwrapping function, this part will be very easy, and we only
need to generate the unwrapped image from a different view range.
When the person is moving forward or backward
along the path, we use a morphing between the current images to the next images
to visualize a smooth transition. If the matching of those two images does not
have a large number of feature matching, a simple linear morphing is used. This
happens at the places where there are large non-textured regions like the wall.
If the current image and the next image have
enough number of feature matching, a triangulation-based morphing will be used.
Due to perspective change, the relative positions of points will change from
one image to another, and the triangulation of feature points will be
contaminated. A fix step is used to detect the bad point in the triangulation
to make sure all the triangles in the first image correspond to triangles with
the same orientation in the other. When some triangle changes orientation from
one image to the other, the point will the largest displacement will be removed
from the point set. By doing this fix step several times, we will have a
mapping from triangulation in one image to the triangulation in the other. Then
a linear morphing between triangles will be used.


Figure 12. a mapping of triangulations
between two images
This will give very good transition in almost
all the triangles except the triangles connecting the four image corners. This
is because the image corner in one image dose not math the image corner of the
other. We then use the epipole to give the image corner some approximate
displacements to make it a little better.
The epiploe is the image in on view of the camera
center of the other, and particularly
in forward motion, there will be an effect of all the pixels moving away from
the epiploe, for example in Figure 13.

Figure 13. an example of epiploe lines in forward motion
Given the polygon from all the inter feature
points, we approximate the displacement of corners by making the image corners
moving at the same speed as the intersection point of its epiploe line with the
polygon. For example in Figure 14, we assume corner c has the same speed as the its intersection point p, then we move the image corner c

(a) Morph image

(b) first image

(c) Second image
Figure 14. Image corner displacement

Figure 15. Morphing of the two images in Figure 8.
We have built a GUI using MATLAB to
facilitate the roaming through images, as shown in Fig. 16. The roaming window
is on the left. The map as well as the current location and orientation of the
robot are on the right. The pull-down dialog box specifies which image sequence
is about to be browsed. The “Show triangulation” radio box is checked, if the
user would like to see the tessellation during the forward and backward roaming
period. If at the current location, the robot can move in a certain direction,
the respective direction button on the bottom-right will be enabled. The user
can rotate the view 360 degrees at anytime he/she wishes to.

Figure 16. The GUI for user’s easy roaming though images.
In this project, we are able to automatically
determine neighboring information purely from image feature matching, and
detect the occurrence of a loop in the shooting path. We then based on this
local information, build decent topological map by deriving the global
adjustment problem into a minimization problem. Finally, we create a GUI which
displays morphing through these images as a visitor roaming system.
In the future, the whole computation process
can be integrated as a complete product. We can try more complicated datasets
and mounting the system onto a real robot platform. Also challenges remain in
more than 4 directions of freedom. And finally, the calibration and the
unwarping algorithm can be improved, by searching for the omnidirectional
center first.
[2]
http://www.evolution.com/er1/
[3]
http://www.cs.unc.edu/~lguan/COMP290-58.files/COMP290-58.htm/
[4]
Davide Scaramuzza,
Omnidirectional Camera Calibration Toolbox, http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=12384&objectType=file
[5]
D. G. Lowe.
Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision, 60, November 2004. http://www.cs.ubc.ca/
lowe/keypoints/.
[6]
R. I. Hartley and
A. Zisserman. Multiple View Geometry in Computer Vision.
[7]
Frahm, J. and
Pollefeys, M. 2006. RANSAC for (Quasi-) Degenerate data (QDEGSAC). In
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition - Volume 1. CVPR. IEEE Computer Society,