Sensor-based
Robot Navigation and Map Construction
Li Guan
lguan@cs.unc.edu
Department of Computer Science
Figure 1. The ER1 robot consists basically of motors, motor drivers, a laptop, and cameras.
Abstract
We present a general program framework for the ER1 robot, consisting of sensing, control, and motion planning modules. On top of this platform, we propose a specific application: topological map building of the third floor of Sitterson Hall using one laptop, one webcam, and one ER1 robot. Different algorithms are introduced to achieve robust navigation. Although a real topological map was not completed within the time provided, owing to the instability of landmark recognition, experiments on simulated data show the feasibility of map construction, and we expect that, given the robustness of the navigation algorithm, the final goal can be achieved shortly.
The tasks we want to achieve are collision-free navigation through an indoor environment, corridors to be specific, where static as well as dynamic obstacles may exist. The goal also includes topological map construction of the landmarks in the environment, e.g., doors, corners, dead ends, and side paths.
In 2003, Evolution Robotics™ introduced the ER1 Personal Robot System [1], which provides the full programming and processing power of a personal computer, as well as wireless capabilities, as shown in Figure 1. Thanks to the OpenCV library [2], we can also intercept the camera output and run more advanced vision algorithms for our special purpose. This is why we propose single-webcam-based mobile robot navigation and map construction as our final project.
Due to the complexity of the project, which combines theoretical results with practical expectations, this is by no means a final report on the project. On the positive side, given what we have achieved, the final goal is expandable and foreseeable. In Section 2, we introduce the complete system architecture, and we discuss the controller, vision, and motion planning modules in Sections 3, 4, and 5, respectively. Section 6 discusses map construction. Experimental results and further discussion are given in Section 7. Finally, in Section 8, we summarize what has been achieved within the final project period and what remains to be done in the future.
The ER1 robot platform is truly a general-purpose robotic platform. To make full use of this advantage, our main concern in the framework design stage is expandability. In this section, we identify three main tasks related to sensor-based mobile robots: the controlling interface (robot driver interface), the sensing interface, and task management. The relationship between the three is depicted in Figure 2. This lets us concentrate on each sub-problem one at a time, without interfering with the others. Another advantage of this design is that the whole system is driven by input sensor data; with well-defined application strategies, this closed-loop feedback system tends to be robust. The process is also similar to the way humans perform tasks.
For our single-webcam-based mobile robot navigation and map
reconstruction, the three sub-problems are explained as follows.
Figure 2. The three main system modules: controller, sensor, and task management. Together they form a closed-loop, data-driven system.
Controller module: ER1 provides an API for users to control the robot directly. The API is accessed by opening a TCP/IP socket to the RCC and sending text commands over the socket. A minimal sketch of this connection is given below.
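As an illustration only, the following is a minimal sketch of such a controller connection, written here in Python; the host, port, and exact command strings are assumptions about our local setup rather than values taken from the ER1 documentation (command names follow Table 1).

    import socket

    RCC_HOST = "127.0.0.1"   # assumed: RCC runs on the same laptop
    RCC_PORT = 9000          # assumed: port configured in the RCC API settings

    def send_command(cmd):
        # open a TCP connection to the RCC telnet server, send one text command,
        # and return the reply line
        with socket.create_connection((RCC_HOST, RCC_PORT), timeout=5.0) as sock:
            sock.sendall((cmd + "\r\n").encode("ascii"))
            return sock.recv(4096).decode("ascii").strip()

    # example usage: drive forward 50 cm, then query the odometry position
    print(send_command("move 50 cm"))
    print(send_command("position"))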
Sensor module: Since the only sensor we have in this application is a webcam, the sensor module consists mainly of vision-based algorithms for information extraction.
To be more specific, there are three main tasks required of the vision module. The first is to infer directions from the single image as guidance for the robot's heading. The second is obstacle detection, including detection of dynamic and static obstacles as well as dead ends. The final one is to recognize landmarks of the environment for further use, such as map construction in our application. We should emphasize that more tasks can be added to this module as well.
Task management module: This module takes the data output of the sensor module and performs further computation. The robot motion planning strategies are embedded in this module, as are other application-defined tasks such as the map construction in our case.
The integration happens at the hardware level as well as the software level, as shown in Figure 3.
Hardware part: It consists of one webcam, the ER1 robot, and a laptop PC. The data flow starts from the webcam and is grabbed via the OpenCV library in real time on the laptop. After computation in the software part on the laptop, new commands are generated and sent through a TCP socket to the RCC telnet server and then to the low-level motor driver.
Figure 3. Architecture of the system in
this paper. Hardware and software parts are separated by the dashed line. All
arrows indicate dataflow.
Software part: After a new image has been sent to the laptop, the system dispatches it to the front-end vision module to detect obstacles and doors along the corridor, to determine the navigation path, and so on. After this high-level information has been extracted, the motion planning algorithms generate new commands accordingly, and if a landmark has been detected, the topological map is updated. A minimal sketch of this processing loop is given below.
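For illustration, the loop below sketches this data flow in Python with OpenCV; vision_module, plan_motion, and send_command are hypothetical placeholders standing in for the modules described in this paper, not the actual implementation.

    import cv2

    def vision_module(frame):
        # placeholder: would return wall edges, doors, obstacles, and the guidance direction
        return {}

    def plan_motion(detections):
        # placeholder: would map detections to an RCC text command, or None to keep moving
        return None

    def send_command(cmd):
        # placeholder: would send the text command over the TCP socket to the RCC
        pass

    cap = cv2.VideoCapture(0)                 # the single webcam
    while True:
        ok, frame = cap.read()                # the data flow starts at the webcam
        if not ok:
            break
        command = plan_motion(vision_module(frame))
        if command is not None:
            send_command(command)
    cap.release()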
A nice property of this framework, as mentioned before, is its divide-and-conquer strategy: one can debug one sub-problem freely without worrying about interference with the other sub-problems. The framework is also highly expandable. For example, for a more advanced self-localization problem, we can use the images we have already acquired as a training database, and for new images coming from the webcam, estimate the posterior probability that we have been to this shooting spot before. The only thing to add to the framework is then an image matching and query procedure.
The controller in our application is a socket-based legal command generator. It is responsible for command assembly as well as command parsing. The legal ER1 robot commands used in our project are listed in Table 1. One important command is "events", which listens for the completion of a specific command. We have not yet been able to get it to work in the "blocking" mode of the TCP/IP socket, and have not had time to try other socket modes. We hope to put it to use eventually, because the feedback of a command is needed in some situations. For now, without this mechanism, we simply wait sufficiently long before generating the next command (a possible workaround is sketched after Table 1).
Table 1. ER1 robot command list (partial)
Command   | Sample | Unit     | Description
move      | -5     | cm       | Move forward/backward.
move      | 90     | degree   | Turn left/right.
set v     | 50     | cm/s     | Set linear velocity.
set w     | 45     | degree/s | Set angular velocity.
position  | -      | -        | Get current position w.r.t. origin.
stop      | -      | -        | Halt any current motion.
events    | -      | -        | Feedback from the robot about completion of a command.
clear     | -      | -        | Throw away all events that have not yet been sent to the user.
exit      | -      | -        | Exit the RCC and close all windows.
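As a possible workaround for the "events" issue mentioned above, one could poll "events" with a short socket timeout instead of blocking on the socket. The sketch below assumes a hypothetical completion string and is not taken from the RCC documentation.

    import socket
    import time

    def wait_for_completion(sock, timeout_s=10.0, poll_s=0.5):
        # poll "events" with a short timeout instead of blocking on the socket
        sock.settimeout(poll_s)
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            sock.sendall(b"events\r\n")
            try:
                reply = sock.recv(4096).decode("ascii", errors="ignore")
            except socket.timeout:
                continue                       # nothing received yet, poll again
            if "done" in reply.lower():        # hypothetical completion text, not verified
                return True
            time.sleep(poll_s)
        return False                           # fall back to waiting "sufficiently long"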
We resort to a single webcam for sensing because the device is highly available and provides relatively good-quality images compared with those of other robots such as the Sony AIBO™. The major disadvantage is that we cannot directly acquire depth information, which is very important for collision-free autonomous navigation and environment (here, topological map) construction; therefore we try to extract depth cues by making full use of the single camera view, both spatially and temporally. The reason we do not use a stereo pair for depth information is that the quality and computation time of those algorithms are not satisfying. Although a Bumblebee™ stereo camera and a Canesta™ 3D camera are available to us, and both provide good-quality depth results, they need to trail a power line behind them, whereas a USB 2.0 webcam does not have this mobility constraint. It should be pointed out, however, that with the framework provided in Section 3, substituting a stereo pair for the webcam is straightforward. There is also inspiring prior work [3] on a car-shaped robot roaming freely in the bushes with only one color camera.
Since our main task is performed in an indoor environment, we can exploit abundant right-angle and parallel-line information to infer and align the robot orientation. One typical setting is shown in Figure 4. For direction guidance, we mainly resort to the Hough transform.
The Hough transform is usually used as a global method for finding straight lines hidden in larger amounts of other data, and there are many advanced applications of the method. For line detection in images, the image is first binarized using some form of thresholding, and the positive instances are then catalogued in an examples dataset. Since the implementation is very standard, we do not go into detail here; the reader is referred to J. Illingworth et al. [4].
Figure 4. Direction guidance with the Hough transform on edges. Left: original image captured by the webcam. Right: Hough transform result. Only lines with certain slopes are selected and depicted as blue lines. The wall edges are then averaged and drawn as green lines. The point at infinity is the intersection of the two green lines, and a red vertical line is drawn through that point for direction guidance.
The original image taken from the camera is first thresholded for very dark edge pixels between the wall and the floor. The Hough transform is then performed to obtain lines whose slopes fall within a preset range. A mean line segment is calculated as the wall edge for each of the two sloping sides; these are drawn as green lines in the right image of Figure 4. The intersection of the two edges is the point at infinity, toward which the robot should move. A code sketch of this step is given below.
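The following is a minimal sketch of this direction-guidance step in Python with OpenCV; the thresholds, slope bounds, and Hough parameters are illustrative assumptions rather than the exact values used in our implementation.

    import cv2
    import numpy as np

    def vanishing_point(frame_bgr, dark_thresh=60, min_slope=0.2, max_slope=2.0):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        h = gray.shape[0]
        bottom = gray[h // 2:, :]                              # bottom half only (see below)
        edges = cv2.Canny(bottom, 50, 150)
        edges[bottom > dark_thresh] = 0                        # keep only very dark edge pixels
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                                minLineLength=40, maxLineGap=10)
        if lines is None:
            return None
        left, right = [], []
        for x1, y1, x2, y2 in lines[:, 0]:
            if x1 == x2:
                continue
            slope = (y2 - y1) / float(x2 - x1)
            if min_slope < abs(slope) < max_slope:             # keep wall-edge slopes only
                (left if slope < 0 else right).append((slope, y1 - slope * x1))
        if not left or not right:
            return None
        ml, bl = np.mean(left, axis=0)                         # mean left wall edge
        mr, br = np.mean(right, axis=0)                        # mean right wall edge
        if abs(ml - mr) < 1e-6:
            return None
        x = (br - bl) / (ml - mr)                              # intersection: point at infinity
        return int(x), int(ml * x + bl) + h // 2               # shift y back to full-image coords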
We have tried other approaches to finding a proper path to move along, such as segmenting the floor out of the image. When we encoded all intensity variations of the floor color into a single Gaussian, it turned out that almost every pixel on the wall was misclassified as a floor pixel, which is neither promising nor satisfying.
In fact, in order to improve calculation speed, we apply the Hough transform only to the bottom half of the image. The top half contains little information regarding wall-edge finding, but does contain unexpected noise from the textures of posters on the wall, as can be observed in Figure 4.
This is a subtle problem due to the non-zero robot size. Even when the robot's direction is perfectly aligned with the point at infinity, the robot might be too close to one side of the wall, and would thus get scratched as it moves forward with unavoidable perturbation. This is clearly illustrated in Figure 5.
Figure 5. Although the point at infinity (indicated by the intersection of the green lines) lies perfectly in the center of the image, we still have the "too close to wall" problem.
As you may observe, however, this problem is well indicated by the intersection of the wall edge (the green line) with the bottom edge of the image: the intersection is too close to the center. A small sketch of this test is given below.
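The sketch below illustrates this test, assuming the averaged wall edge is given as a line y = m*x + b in image coordinates; the margin fraction is an illustrative assumption.

    def too_close_to_wall(m, b, img_w, img_h, margin_frac=0.35):
        # x-coordinate where the wall edge crosses the bottom image row
        x_bottom = (img_h - 1 - b) / m
        # the robot is too close if that crossing lies too near the image centre
        return abs(x_bottom - img_w / 2.0) < margin_frac * (img_w / 2.0)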
This detection also takes advantage of the Hough transform. This time we detect horizontal line segments lying within a preset angle of the horizon. Sample lines are also shown in Figure 4.
Figure 6. Door detection with edge
information.
This part is quite challenging because, first of all, doors are usually not in the front view of the camera motion. As shown in Figure 6, if we place a local window at the edge ending toward the nearer end of the corridor in the image, we observe a lack of edge pixels when a door is moving out of the thresholded image on the right.
Although this indication is not very robust, we obtained good performance using just this cue on a recorded sample video sequence. Due to the time limit, we take improving it as a high-priority direction for future work. A minimal sketch of this cue is given below.
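The sketch below illustrates the door cue; the window placement and the edge-pixel threshold are illustrative assumptions.

    import cv2

    def door_candidate(edge_img, window_rect, min_edge_pixels=30):
        # window placed at the wall-edge ending toward the nearer end of the corridor
        x, y, w, h = window_rect
        patch = edge_img[y:y + h, x:x + w]
        # too few edge pixels in the window suggests a door opening is passing by
        return cv2.countNonZero(patch) < min_edge_pixels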
Since the robot is often in motion itself, there is no big difference between absolutely static obstacles (static with respect to the ground floor) and dynamic obstacles. Another observation is that if an obstacle is static relative to the robot, it poses no harm to the robot at all, so there is no need to detect it. Therefore, we introduce a general obstacle detection approach based exactly on the relative motion between the obstacle and the robot.
This approach is based on the Motion History Image (MHI) [5] computed from the camera view. The MHI is a static image template in which pixel intensity is a function of the recency of motion in a sequence. Figure 7 shows two MHIs of the corridor while the robot is moving forward.
Figure 7. MHIs from the robot's camera: the darker the intensity, the longer ago the pixel was occupied, similar to the trace of fireworks. Left: the motion is towards the right, indicating that the robot is turning slightly to the left while moving forward. Right: the robot is moving forward while turning slightly to the right. In the right image the shape of an obstacle on the path can be clearly observed.
We then concentrate on the region within the robot's path for obstacle detection. The algorithm is as follows (a code sketch of these steps is given after Figure 8):
(1) Get the Motion History Image.
(2) Mask pixels outside the path as zero.
(3) Sweep a horizontal line from the bottom row of the MHI to the top:
    a) if a non-zero pixel is found, assign a positive depth value to the same column of pixels in a new depth map (actually a 1D map);
    b) decrease the depth value as the sweep moves upwards toward the top of the image.
(4) After the horizontal sweep, we obtain a rough depth map in which the smallest depth to the robot is marked white and larger depths appear darker. Figure 8 shows a typical depth map.
Figure 8. The depth map on the bottom is calculated from the MHI on the top. As can be seen, the depth map is a 1D map. The image has mainly two vertical bands because there are mainly two depth steps: the nearer one, with lighter color on the left, is due to the trash bin on the path; the farther one, with darker color on the right, is due to the end of the wall.
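The following is a rough sketch of steps (1)-(4) in Python with OpenCV and NumPy; the simple frame-difference MHI update and the parameter values are illustrative assumptions, not the exact implementation.

    import cv2
    import numpy as np

    MHI_DURATION = 1.0   # seconds of motion history to keep

    def update_mhi(mhi, prev_gray, gray, timestamp, diff_thresh=30):
        # simple MHI update: moving pixels get stamped with the current time,
        # and motion older than MHI_DURATION is forgotten
        motion_mask = cv2.absdiff(gray, prev_gray) > diff_thresh
        mhi[motion_mask] = timestamp
        mhi[mhi < timestamp - MHI_DURATION] = 0
        return mhi          # mhi is a float32 array initialised to zeros

    def depth_map_from_mhi(mhi, path_mask, max_depth=255):
        masked = np.where(path_mask > 0, mhi, 0)     # step (2): zero outside the path
        h, w = masked.shape
        depth = np.zeros(w, dtype=np.uint8)          # the 1D depth map, one value per column
        for row in range(h - 1, -1, -1):             # step (3): sweep from the bottom row up
            cols = np.flatnonzero(masked[row] > 0)
            new = cols[depth[cols] == 0]             # columns not yet assigned a depth
            depth[new] = int(max_depth * row / (h - 1))   # steps (3a)/(3b): lower row = nearer = whiter
            if np.all(depth > 0):
                break
        return depth                                 # step (4): rough 1D depth map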
The depth map can be regarded as a kind of "potential field" and is used later for the obstacle avoidance strategies in the motion planning section.
As you may have observed, the MHI provides good boundary information not only on obstacles, but also on wall edges and the end of the path. Therefore, as a complement, we also run the Hough transform on the MHI and take the average of the edges from the two Hough transforms as the final wall edges.
By now we have introduced all the vision techniques used in our system. As stated before, new techniques can easily be plugged into the framework, thanks to its expandability.
Given the detections from the front-end vision, we can now design motion strategies for the robot. A good way of representing the motion planning strategies is a state machine chart, as shown in Figure 9.
Now let us discuss each node in detail.
Normal Detection: In this state the robot is moving smoothly forward, performing all of its routines, such as wall-edge extraction, door detection, obstacle detection, and direction calculation.
Direction Adjustment: This state is entered when direction guidance (Section 4.1) finds that the moving direction is too far from the point at infinity (i.e., the horizontal distance to the image center is larger than a threshold). The robot then turns in the direction opposite to the departure. When this is done, it automatically returns to Normal Detection.
Back to Track: This state is entered when "close to wall" detection (Section 4.2) is triggered. The robot first turns toward the center of the path, then moves forward a little toward the center, and finally turns back to the original direction.
Figure 9. The state transition map of the system. The gray nodes indicate states that had not been finished by the time of writing this paper. For completeness, there should also be a starting state and an ending state, which perform initialization and recognize the starting position, respectively.
New Path Searching: This state is entered when the robot reaches a dead end. The robot first turns 90 degrees to the left to detect whether a path exists, in terms of a point at infinity, and then turns 180 degrees to check the other side. If a path exists, the robot moves in that direction, favoring left over right. If there is no path in either direction, the robot turns backwards and triggers the "Tracing Back" state.
Obstacle Avoidance: We are only concerned with the nearest obstacle. If it is within a certain depth, the robot enters this state. If the obstacle is too big, the robot simply halts. Otherwise, the robot chooses to bypass it on the left or the right, based on which side leaves more space for the robot to move. The basic movements are similar to "Back to Track".
Tracing back: Similar to
The other two states, namely "Draw Map" and "Trace Location in Map", will be discussed in the next section. A minimal code sketch of the state machine follows.
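The sketch below outlines the state machine of Figure 9; only the transitions out of Normal Detection are shown, and the Detections fields are hypothetical names for the vision-module outputs rather than the actual interface.

    from dataclasses import dataclass
    from enum import Enum, auto

    class State(Enum):
        NORMAL_DETECTION = auto()
        DIRECTION_ADJUSTMENT = auto()
        BACK_TO_TRACK = auto()
        NEW_PATH_SEARCHING = auto()
        OBSTACLE_AVOIDANCE = auto()
        TRACING_BACK = auto()

    @dataclass
    class Detections:
        obstacle_near: bool = False        # nearest obstacle within the depth threshold
        dead_end: bool = False             # near-horizontal edges detected across the path
        too_close_to_wall: bool = False    # wall edge meets the bottom row near the image centre
        off_direction: bool = False        # point at infinity too far from the image centre

    def next_state(state, d):
        if state is State.NORMAL_DETECTION:
            if d.obstacle_near:
                return State.OBSTACLE_AVOIDANCE
            if d.dead_end:
                return State.NEW_PATH_SEARCHING
            if d.too_close_to_wall:
                return State.BACK_TO_TRACK
            if d.off_direction:
                return State.DIRECTION_ADJUSTMENT
        # Direction Adjustment and Back to Track return to Normal Detection
        # automatically once their corrective motions complete
        return State.NORMAL_DETECTION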
There are mainly three kinds of nodes: doors, ends of path, and bypasses. Every node stores its position and orientation with respect to the starting point, its absolute orientation marked as East, North, West, or South, pointers to its neighboring nodes, and the physical length calculated from the position information.
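A possible node structure following this description is sketched below; the field names are illustrative assumptions rather than the exact implementation.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class MapNode:
        kind: str                          # "door", "end_of_path", or "bypass"
        position: tuple                    # (x, y) with respect to the starting point, in cm
        heading: str                       # absolute orientation: "E", "N", "W", or "S"
        side: Optional[str] = None         # which side of the path a door lies on
        neighbors: List["MapNode"] = field(default_factory=list)

        def edge_length(self, other):
            # physical path length between two neighboring nodes
            dx = other.position[0] - self.position[0]
            dy = other.position[1] - self.position[1]
            return (dx * dx + dy * dy) ** 0.5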
We also keep a record of the robot's current absolute direction, marked as East, North, West, or South, for tracing the robot's location. For example, when a new door is detected on the left side while walking toward the South, a node is created in the map to indicate that, on a path spanning North and South, a door has been detected on the East side.
Trace Location does not create new nodes in the graph, but updates the node that the robot has currently moved to. This state is important for navigating to paths that have been detected but not yet traveled. A map snapshot is given in Figure 10.
Right now the robot does not recognize bypasses, so it will never complete a full loop around Sitterson Hall. The left image of Figure 10 is the constructed map of part of Sitterson Hall from pre-captured video. This video was recorded while the robot was driven around manually, for the purpose of testing most of the vision algorithms. Notice that all the landmarks are correctly extracted. The right image of Figure 10 is an example of real-time map construction on the same path.
Figure 10. Landmarks are denoted by circles. Red indicates doors, green the corner, and magenta the node the robot currently occupies. Red T-junctions inside the circles indicate on which side of the path each door is located. The light gray circle indicates a path that has been detected but not yet traveled. Left image: the map constructed from the captured video sequence; all landmarks as well as the un-traveled path are detected. Right image: the real-time updated map from the robot. Some doors are falsely detected, but the robot successfully traveled automatically from the starting position to where it was supposed to go.
In this final project, we proposed and implemented a single-camera-based autonomous robot navigation and indoor topological map construction system. In tests with the real robot, the navigation process in the absence of obstacles is now very robust. Although this is only an initial step toward the goal we proposed, thanks to the expandability of the architecture design, improvements in landmark detection and obstacle avoidance strategies can easily be integrated to reach the final goal.
[1] http://www.evolution.com/er1/
[2] http://www.intel.com/research/mrl/research/opencv/
[3] Jeff Michels, Ashutosh Saxena, Andrew Y. Ng, "High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning," Computer Science Department, Stanford University, Stanford, CA 94305 USA.
[4] J. Illingworth, J. Kittler, "A survey of the Hough transform," Computer Vision, Graphics, and Image Processing, Volume 44, October 1988.
[5] Gary R. Bradski, James W. Davis, "Motion segmentation and pose recognition with motion history gradients," Machine Vision and Applications, vol. 13, no. 3, pp. 174-184, July 2002.