Sensor-based Robot Navigation and Map Construction


Package (password required, and sample videos are distributed upon request)


Li Guan

University of North Carolina at Chapel Hill

Department of Computer Science


Figure 1. ER1 Robot basically consists of motors, drivers, a laptop and cameras.


We present a general program framework for ER1 robot, which consists of sensing, control and motion planning module. Based on the platform, we propose a specific application—topological map building of the third floor of Sitterson Hall based on one laptop, one webcam and one ER1 robot. Different algorithms are introduced to achieve robust navigation. Although real topological map is not completed with the time provided, because of the instability of landmark recognition, but experiments on simulated data shows the feasibility of map construction. And we expect that, given the robustness of the navigation algorithm, the final goal will be achieved shortly.

1.     Introduction

The tasks that we want to achieve are collision-free navigation through an indoor environment, corridors, to be specific, where static as well as dynamic obstacle would exist. The goal also includes topological map construction of the landmarks in the environment, e.g., doors, corners, dead ends and sideways.

In 2003, Evolution RoboticsTM introduces the ER1 Personal Robot System [1], which provides the full programming and processing power of the personal computer, as well as wireless capabilities, as shown in Figure 1. Also thanks to the OpenCV library [2], we can intercept camera output to do more advanced vision algorithms for our special purpose. And that is why we propose for the final project as single-webcam-based mobile robot navigation and map reconstruction.

Due to the complication of the project combining theoretical results as well as practical expectations, this is never a final report of the project. But the good side is that given what we have achieved, the final goal is expandable and foreseeable. In Section 2, we introduce the complete system architecture and talks about controller, vision and motion planning modules respectively in section 3, 4 and 5. Section 6 discusses map construction. Experimental results and further discussions are given in Section 7. Finally, in Section 8, we summarize what have been achieved within the final project period and what is required to be done in future.

2.     System Architecture

The ER1 robot platform is really a general-purpose robotic platform. In order to make the full use of this advantage, in the framework design stage, our main concern is the expandability. In this section, we have generalized three main tasks related to sensor-based mobile robots. They are controlling interface (robot driver interface), sensing interface, and task management. The relationship between the three is depicted in Figure 2. Therefore, later on, we can concentrate on each sub-problem one at a time, without interfering with other problems. Another advantage of this design is that the whole system is driven by input sensor data. With well-defined application strategies, this closed-loop feedback system is bound to be robust. This process is also similar to the way humans perform tasks.

For our single-webcam-based mobile robot navigation and map reconstruction, the three sub-problems are explained as follows.

Figure 2. Three main system modules, controller, sensor and task . It is a closed-loop data-driven system.

Controller module: ER1 provides API for users to directly control the robot. The API is accessed by opening a TCP/IP socket to RCC and sending text commands along the socket.

Sensor Module: Since the only sensor we have in this application is a webcam, the sensor module consists of mainly vision-based algorithms, information extraction.

To be more specific, there are mainly three required tasks for the vision module. The first is to infer directions from the single image as guidance of the direction of the robot. The second one is obstacle detection, including detections of dynamic/static obstacles as well as a dead end. The final one is to recognize landmarks of the environment, for further use like map construction in our application. One thing we should emphasize is that more tasks can be added to this module too.

Task management module: This module takes the data output of the sensor module and do some further computations. Robot motion planning strategies are actually embedded in this module. So are some other application-defined tasks such as the map construction in our case.

2.1 Integration

The integration happens on hardware level as well as software level, as shown in Figure 3.

Hardware Part: It consists of one webcam, ER1 robot, and laptop PC. The data flow starts from the webcam, and is grabbed via OpenCV library in real-time to the laptop. After calculation in the software part in the laptop, new commands are generated and being sent to through TCP socket to RRC Telnet server then to low-level motor driver.

Figure 3. Architecture of the system in this paper. Hardware and software parts are separated by the dashed line. All arrows indicate dataflow.

Software Part: After a new image has been sent to the laptop, the system dispatches it to front-end vision module to detect obstacles and doors of the corridor, to direct navigation path and so on. After this high-level information has been extracted, motion planning algorithms can generate new commands accordingly, and if a landmark has been detected, the topological map is updated.

A nice thing about this framework as mentioned before is its divide-and-conquer strategy—one can debug the performance of one sub-problem freely without worrying about interference with other sub-problems. Also this framework is highly expandable. For example, for a more advanced self-localization problem, we can train the images we have already acquired as a database, and for new images getting from the webcam, we might give the posterior probability that we have been to this shooting spot before. The only thing to add into the framework is then an image matching and query procedure.

3.     Controller

The controller in our application is a socket-based legal command generator. It is responsible for command assembly and command parsing as well. Legal ER1 robot commands used in our project are as in Table 1. There is one important command“events” that listens to the completion of a specific command. I have not been successful to get it work in the “blocking” mode of TCP/IP socket, but I have to time to try other sockets right now. I hope I can finally get it in use, because I need to know the feedback of a command in some situations. Right now, without this mechanism, I just wait sufficiently long to generate the next command.

Table 1. ER1 robot command list (partial)








Move forward/backward.




Turn left/right.

set v



Set linear velocity.

set w



Set angular velocity.




Get current position w.r.t. origin.




Halt for any current motion.




Feedback from the robot about completion of a command.




Throws away all events which have not yet been sent to the user.




Exit the RCC, close all windows.

4.     Single camera vision

We resort to single webcam for sensing, because this device is highly available and provides relatively good quality images, comparing with other robots like Sony AIBOTM. The major disadvantage is that we cannot directly acquire depth information for the robot, which is a very important for collision-free autonomous navigation and environment (topological map in this case) construction; therefore we try to extract depth cues by making full use of the single camera view spatially as well as temporally. The reason that I don’t use stereo-pair for depth information is because of the quality and the computation time of those algorithms are not satisfying. Although BumblebeeTM stereo camera and Canesta TM 3D camera are available for me to use, and they both provide good quality depth results, they need to carry a power line behind, while USB2.0 webcam does not have this mobility constraint. But one thing should be pointed out is that, with the framework provided in Section 3, to substitute the webcam with a stereo-pair is just straightforward. Also some papers [3]already exist about inspiring car-shaped robot roaming freely in the bushes with only one color camera.

4.1 Direction guidance

Since our main task is performed in an indoor environment. We can refer to a lot of right angle and parallel line information to infer and align robot orientation. One of the typical settings is shown in Figure 4. For direction guidance, we mainly resort to Hough transform.

It is usually used as a global method for finding straight lines hidden in larger amounts of other data. In fact there are many advance applications for this method. For line detection in images, the image is first binarised using some form of thresholding and then the positive instances catalogued in an examples dataset. Since the implementation is very standardized, we do not plan to go into detail, but you can refer to J. Illingworth et al. [4].

Figure 4. Direction guidance with Hough transform on edge. Left: original image captured by the webcam. Right: Hough transform result. Only lines with certain slopes are selected and depicted as blue lines. Wall edges are then averaged and drawn as green lines. The point at infinity is the intersection of the two green lines. And we draw a red vertical line through that point for direction guidance.

The original image taken from the camera is first thresholded for very dark edge pixels between the wall and the floor. Then Hough transform is performed to get lines having slope between . A mean line segment is calculated as a wall edge for each of the two sloping lines. They are drawn as green lines in the right image in Figure 4. The intersection of the two edges is the point at infinity, in which direction robot should move towards.  

We have tried other approaches to find a proper path to moving on, such as segmentation of the floor out of the image. Once we encoded all intensity variations of the floor color into one Gaussian; it turned out almost every pixel on the wall is miss-classified as a floor pixel, which is not very promising and satisfying.

In fact, in order to improve calculation speed, we only implement Hough transform to the bottom half of the image. The top half contains little information regarding wall edge finding, but some un-expected noise of textures of posters on the wall, as you can observe from Figure 3.

4.2 Close to wall detection

This is a subtle problem due to non-zero robot size. Even though the robot’s direction is perfectly aligned with the point at infinity, the robot might be too close to one side of the wall, and thus get scratches while it moves forward with unavoidable perturbation. It is clearly illustrated in Figure 5.

Figure 5. Although the point at infinity (indicated by the intersection of the green lines) lies perfectly in t he center of the image, we still have “too close to wall” problem.

But as you may observer, this problem is well-indicated by the intersection of the edge (the green line) with the bottom edge of the image—the intersection is too close to the center.

4.3 End of wall detection

This detection also takes advantage of Hough transform. This time we just detect horizontal line segments with in an angle of with respect to the horizon. Sample lines are also shown in Figure 4.

4.4 Door (landmark) detection

Figure 6. Door detection with edge information.

This part is quite challenging, because, first of all, doors are usually not in the front view of the camera motion. As shown in Figure 6, if we cast a local window at the edge ending to the nearer end of the corridor in the image, we would observe a lack of edge pixels when the door is moving out of the thresholded image on the right.

Although this indication is not very robust, we have good performance using just this indication over recorded sample video sequence. And due to time limit, we take it as a high-priority future improvement direction.

4.5 Obstacle detection

Since the robot is often in motion itself. It makes no big difference between absolute static obstacles (static with respect to the ground floor) and dynamic obstacles. Another thought is that, if an obstacle is relatively static to the robot, it will make no harm to the robot at all, thus makes no sense to actually detect it. Therefore, we introduce a general obstacle detection approach exactly based on the relative motion of the obstacle and the robot.

This approach is based on the Motion History Image (MHI) [5] from the camera view. The MHI is a static image template where pixel intensity is a function of the recency of motion in a sequence. Figure 7 shows two MHIs of the corridor when the robot is moving forward.

Figure 7. MHIs of the robot, the darker the intensity is, the older the pixel has been occupied, which is similar to the trace of fireworks. Left: the motion is towards right, indicating the robot is turning a little bit towards left while moving forward. Right: the robot is moving forward while turning a little bit towards right. In the right image you can clearly observe the shape of an obstacle on the path.

We then concentrate within the robot path for obstacle detection. The algorithm is as follows:

(1)     Get the Motion History Image;

(2)     Mask pixels outside the path as zero;

(3)     Starting a horizontal line sweeping from the bottom row of the MHI image to the top;

a)         If we found a non-zero pixel, we assign a positive value of depth to the same column of pixels in a new depth map (actually, it is a 1D map);

b)         We decrease depth, as we move upwards to the top of the image.

(4)     After the horizontal line sweep, we get a rough depth map marking the smallest depth to the robot as white and further away as dark. Figure 8 shows a typical depth map.

Figure 8. The depth map on the bottom is calculated from the MHI on the top. As you can see, the depth map is a 1D map. The reason that the image has mainly two vertical bands is because there are mainly two depth steps. The nearer one with lighter color on the left is due to the trash bin on the path; the farther one with darker color on the right is due to the end of wall.

The depth map can be taken as a kind of “potential field”, and used for obstacle avoidance strategies later in motion planning section.

As you might have observed, the MHI not only provides good boundary information on obstacles, but also on wall edges and end of the path too. Therefore, as a complement, we use MHI to calculate the Hough transform again and take the average edges of the two Hough transforms as the final edges of the wall.

By now, we have introduced all vision techniques being used in our system. As we said before, new techniques can be easily plugged in the framework, thanks to the expandability.

5.     Sensor-based motion planning

Given the detection of front-end vision, we can now design motion strategies for the robot. A good way of representing the motion planning strategies is the state machine chart, as shown in Figure 9.

Now, let us discuss about each node in detail.

Normal Detection: This state is when robot is moving smoothly forward, doing all kinds of routines, such as wall edge extraction, door detection, obstacle detection and direction calculation.

Direction Adjustment: This is when direction guidance (Section 4.1) found that the moving direction is too far (i.e., distance to the image center larger than a threshold in horizontal direction) away from the point at infinity. So the robot performs a turning in the opposite direction of departure. When this is done, it is automatically returned to Normal Detection.

Back to Track: This is when “Close to wall detection” (Section 4.2) is triggered. The robot is first turning to the center of the path, then move forward a little bit to the center, and finally turning back to the original direction.

Figure 9. The state transition map of the system. The gray nodes indicate the states that have not been finished by the time of writing this paper. For completeness, there should also be a starting state and an ending state, which does some initialization and recognize the starting position respectively.

New Path Searching: This is because we reach a dead end. The robot first turns 90 degrees to the left to detect if any path exists in terms of point at infinity, and then turn 180 degrees to check again. If there exists a path the robot moves to that direction, favor left direction than right. If there is no path in either direction, the robot turns backwards, and triggers a “tracing back” state.

Obstacle avoidance: We only concern about the nearest obstacle. If it is within a certain depth, the robot enters this state. If the obstacle is too big, the robot just halts there. Otherwise, the robot chooses to bypass it from left or right, based on which direction has larger space for the robot to move. The basic movements are similar to “Back to Track”.

Tracing back: Similar to Normal detection, but not to draw the map.

The other two states, namely “draw map” and “trace location in map” will be discussed in the next section.

6. Map construction

There are mainly three kinds of nodes, i.e., door, end of path, and bypass. Every node has position and orientation information with respect to the starting point, its absolute orientation marked as East, North, West, or South, pointers to its neighboring nodes, and the physical length calculated from their position information.

We also have a record of the current robot absolute direction marked as East, North, West, or South for tracing the robot locations. So for example, when a new door is detected on the left side of the walking direction towards South, then a node is created in the map to indicate that on a path spanning North and South, there is a door being detected on the East.

Trace Location does not create new nodes in the graph, but update the node that the robot is currently moved to. This state is important for navigation to paths having been detected but not yet traveled. A map snapshot is given in Figure 10.

7. Experiment results

Right now, the robot does not recognize bypasses. So it will never complete a full pass around the Sitterson Hall. Figure 10, on the left is the constructed map of part of Sitterson Hall with pre-captured video. This video is captured while the robot is manually driven around, for the purpose of testing most of vision algorithms. Notice that all the landmarks are correctly extracted out. Figure 10 on the right is an example of real-time map construction on the same path.


Figure 10. Landmarks are denoted with circles. Red indicates doors, green the corner and magenta the one that the robot currently possesses. Red T-junctions in the circles indicates on which side of the path the door is located. The light gray one indicates a path but not yet having been traveled. Left image: the map constructed with captured video sequence. All landmarks as well as un-traveled path are detected. Right image: the real-time updated map from the robot. Doors are falsely detected. But the robot successfully traveled from the starting position to where it is supposed to be automatically.

8. Conclusion

In this final project, we propose and implemented a single-camera-based autonomous robot navigation and indoor topological map construction system. With the real robot test, the robot navigation process without obstacle is very robust right now. Although this is just an initial step for the goal we proposed, with the expandability of architecture design, improvements of landmark detection and obstacle avoidance strategies, can be easily integrated for the final goal.




[3]       Je Michels, Ashutosh Saxena, Andrew Y. Ng, High Speed Obstacle Avoidance using Monocular Vision and Reinforcement LearningComputer Graphics, Computer Science Department, Stanford University, Stanford, CA 94305 USA.

[4]       J. Illingworth, J. Kittler, A survey of the Hough transform, Computer Vision, Graphics, and Image Processing archive, Volume 44 ,  October 1988

[5]       Gary R. Bradski , James W. Davis, Motion segmentation and pose recognition with motion history gradients, Machine Vision and Applications, v.13 n.3, p.174-184, July 2002