Research Work


Heterogeneous Sensor Network

Multiview Shape Computation

Time-of-Flight Camera & Camcorder Network
Calibration & Reconstruction

Calibration with a Sphere (Swiss Ranger 3100). The specular highlight is the sphere center position.

We propose a unified calibration technique for a heterogeneous sensor network of video camcorders and Time-of-Flight (ToF) cameras. By moving a spherical calibration target around the commonly observed scene, we can robustly and conveniently extract the sphere centers in the observed images and recover the geometric extrinsics for both types of sensors.

We then combine the camcorder silhouette cues with the RIM camera depth information for the reconstruction. Our main contribution is a sensor fusion framework that keeps the computation general, simple and scalable. The reconstruction is formulated as a Bayesian inference problem and can be solved robustly.

Although we only discuss the fusion of conventional cameras and RIM cameras in this paper, the proposed framework can be applied to any vision sensor. This framework uses a space occupancy grid as a probabilistic 3D representation of scene contents.
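
As a rough illustration of what a probabilistic occupancy grid involves, the sketch below fuses several sensor observations of a single voxel with Bayes' rule. It is not the released MATLAB code: the function name `fuse_occupancy`, the conditional-independence assumption between sensors, and the example likelihood values are all assumptions made for this example.

```python
def fuse_occupancy(prior, likelihoods):
    """Posterior P(occupied | observations) for a single voxel.

    prior       -- P(occupied) before any observation
    likelihoods -- one (P(obs | occupied), P(obs | empty)) pair per sensor;
                   sensors are assumed conditionally independent (assumption)
    """
    occupied = prior
    empty = 1.0 - prior
    for p_given_occupied, p_given_empty in likelihoods:
        occupied *= p_given_occupied
        empty *= p_given_empty
    total = occupied + empty
    if total == 0.0:
        # Every hypothesis was ruled out; fall back to the prior.
        return prior
    return occupied / total


# Hypothetical voxel seen by one camcorder (silhouette cue) and one
# ToF camera (depth cue); the likelihood values are made up.
silhouette_cue = (0.9, 0.1)
depth_cue = (0.8, 0.3)
posterior = fuse_occupancy(0.5, [silhouette_cue, depth_cue])
```

Running the same update over every voxel of the grid gives a per-voxel occupancy probability, which can then be thresholded or visualized. Adding another sensor type only requires another likelihood pair.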

The result has been accepted by 3DPVT 2008 and M2SFA2 2008 (in conjunction with ECCV 2008) both as oral presentations. Please check out the publication page for details.

The MATLAB reconstruction code is available now.


Cam 1 & 4 are ToF cameras (Swiss Ranger 3100)
Cam 2 & 3 are video camcorders (Canon HG10)

Shape Estimation Result. Left: visual hull; Right: our accurate estimation.



Multi-Object Simultaneous Shape Estimation & Tracking

We propose a new algorithm to automatically detect and reconstruct scenes with a variable number of dynamic objects. Our formulation distinguishes between m different silhouettes in the scene by using automatically learnt view-specific object appearance models, eliminating the color calibration requirement. Bayesian reasoning is then applied to solve the m-shape occupancy problem, with m updated as objects enter or leave the scene.

Several outdoor natural environment datasets as well as indoor datasets collected both by us and from other research institutes show that this method yields multiple silhouette-based estimates that drastically improve scene reconstructions over traditional two-label silhouette scene analysis. This enables the method to also efficiently deal with multi-person tracking problems.

The result has been accepted by CVPR 2008, and an extended version of this paper has been submitted to IJCV for review. Please check out the publication page for details.

The cluster dataset will be online shortly.



3D Occlusion Inference with a Bayesian Framework

This paper shows that occluders in the interaction space of dynamic objects can be detected, and their 3D shape fully recovered, as a byproduct of shape-from-silhouette (SfS) analysis. The setting is a calibrated, inward-looking multi-camera setup in natural, uncontrolled environments, where occlusions are common and unavoidable for SfS techniques. We provide a Bayesian sensor fusion formulation to process all occlusion cues occurring in a multi-view sequence. Several outdoor natural environment datasets as well as an indoor dataset show that the shape of static occluders can be robustly recovered from pure dynamic object motion, and that this information can be used for online self-correction and consolidation of dynamic object shape reconstruction.

The result has been accepted by CVPR 2007 as an oral presentation. Please check out the publication page for details.

The dataset sculpture2ppl used in the paper is available online now.



Occluder Segmentation With Motion

We present a system for segmenting occluders in a scene from a single fixed viewpoint, given a video stream of random active motions seen from that view. We first detect moving object silhouettes using a pre-learned background and shadow model, so that the system is robust against lighting changes, which are frequent in such a setting. Then we analyze the motion of the objects by looking at spatio-temporal Motion History Images (MHI) of the silhouettes. Based on the motion direction, we propose the concept of the Effective Edge of the moving silhouette, which is guaranteed to enclose the occluder boundary. We show that it can distinguish the real occluder boundary from the interface that moving objects have simply never reached. After a final refinement, the actual occluder is segmented as a binary mask image. A real-time system has been implemented to validate the theory.
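
To make the MHI step concrete, here is a minimal sketch of the standard Motion History Image update: pixels covered by the current silhouette are stamped with a maximum value `tau`, and all other pixels decay by one per frame. The function name `update_mhi`, the plain nested-list image format, and the one-step decay are assumptions for illustration; the project report may use a different variant.

```python
def update_mhi(mhi, silhouette, tau):
    """Return the next Motion History Image.

    mhi        -- 2D list of ints, the history so far
    silhouette -- 2D list of 0/1 flags for the current frame
    tau        -- value stamped on pixels that are moving right now
    """
    return [
        [tau if moving else max(0, history - 1)
         for history, moving in zip(mhi_row, sil_row)]
        for mhi_row, sil_row in zip(mhi, silhouette)
    ]


# A one-pixel blob moving left to right across a 1x3 image.
frames = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
mhi = [[0, 0, 0]]
for frame in frames:
    mhi = update_mhi(mhi, frame, tau=3)
print(mhi)  # [[1, 2, 3]]
```

The values increase along the direction of motion, which is the cue a motion-direction analysis can read from the MHI.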

Full-resolution real-time example. (a) camera view; (b) MHI; (c) CSI; (d) silhouette image; (e) silhouette boundary; (f) EE; (g) COBI; (h) final COBI; (i) CSI boundary; (j) after thresholding with R; (k) after flood filling.


Please refer to the project report for details.


Graphcut Background Subtraction (with C Source Code)

This is a C implementation of background subtraction given a set of background frames as a training set. Download zipfile.

The results above were obtained with a single set of parameters. The first row is from the Old Well dataset; the second and third rows are from N. Martel-Brisson & A. Zaccarin's paper "Moving Cast Shadow Detection from a Gaussian Mixture Shadow Model".

The code does the following:
(1) background RGB Gaussian model training. There is no limit on the number of training images.
(2) shadow modelling (soft shadow & hard shadow). Please refer to N. Martel-Brisson & A. Zaccarin's "Moving Cast Shadow Detection from a Gaussian Mixture Shadow Model" for details about shadow removal, and to section 3.1 of my technical report for soft shadow.
(3) graphcut cleaning. Please refer to Boykov's graphcut papers for more detail.
(4) non-recursive largest binary blob finding.
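
Steps (1) and (4) can be sketched as follows. This is a simplified Python illustration, not the C code in the zip file: it fits an independent Gaussian per pixel and channel, thresholds at `k` standard deviations, and finds the largest 4-connected blob with an explicit queue instead of recursion. The function names, the variance floor of 1.0, the default `k=3.0`, and the 4-connectivity are assumptions. Shadow modelling and graphcut cleaning are omitted.

```python
from collections import deque


def train_background(frames):
    """Per-pixel, per-channel (mean, variance) from any number of frames.

    frames -- list of images; an image is a list of rows of (r, g, b) tuples
    """
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    model = []
    for y in range(height):
        row = []
        for x in range(width):
            channels = list(zip(*(frame[y][x] for frame in frames)))
            mean = tuple(sum(c) / n for c in channels)
            # Floor the variance so constant pixels do not divide the
            # threshold down to zero (assumed value).
            var = tuple(max(sum((v - m) ** 2 for v in c) / n, 1.0)
                        for c, m in zip(channels, mean))
            row.append((mean, var))
        model.append(row)
    return model


def foreground_mask(image, model, k=3.0):
    """True where any channel is more than k standard deviations off."""
    return [
        [any((v - m) ** 2 > k * k * s for v, m, s in zip(pixel, mean, var))
         for pixel, (mean, var) in zip(image_row, model_row)]
        for image_row, model_row in zip(image, model)
    ]


def largest_blob(mask):
    """Largest 4-connected True region, found without recursion."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, blob = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    return best
```

The explicit queue keeps memory use predictable on large masks, where a recursive flood fill could overflow the call stack.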

The code has been tested with a sample dataset under Windows XP. OpenCV 1.0 is required for I/O. For a detailed description of the parameters, please check the readme file in the zip package. For more information or questions, please feel free to contact


SensorTalk Protocol

I developed the patented product SensorTalk under the supervision of Dr. Hector Gonzales-Banos during a 12-week internship at Honda Research Institute (HRI) in Mountain View, CA, in the summer of 2005.

The objective of the product is to provide a unified framework for communicating with and manipulating heterogeneous sensors. It also sets standards for writing sensor drivers, so that new sensors can be added to the network. In principle, SensorTalk can be regarded as a set of protocols independent of the operating system. Once sensors are subscribed to the network as services, SensorTalk gives clients the right to tune sensor parameters and to set the Quality of Service (QoS) according to the needs of the applications built on top of it. Combined with RoboTalk, a product also developed at HRI Mountain View, it lets humanoids and other kinds of robots be guided by peripheral sensors as well as onboard sensors to perform sophisticated tasks that could not be done before.

SensorTalk was developed in C++ with Visual Studio 6.0 on Windows XP. It involves multithreaded programming, STL programming, network programming and object-oriented programming. It consists of 15 files (10 header files and 5 .cpp files) totaling 3573 lines of code. Real-time demos of applications built on top of SensorTalk were shown in the final presentation of the internship to all staff of HRI Mountain View and HRI Boston.

The product comes with complete Doxygen-generated documentation, and the patent has been filed by HRI Mountain View.

For more information, please contact or

ER1 Robot Exploration & Map Construction

The ER1 robot consists of motors, drivers, a laptop and cameras.

In COMP290-58 Robot Motion Planning, I presented a general program framework for the ER1 robot, consisting of sensing, control and motion planning modules. On this platform, we propose a specific application: building a topological map of the third floor of Sitterson Hall using one laptop, one webcam and one ER1 robot. Different algorithms are introduced to achieve robust navigation. The real topological map was not completed in the time available because landmark recognition was unstable, but experiments on simulated data show that map construction is feasible. Given the robustness of the navigation algorithm, we expect the final goal to be achieved shortly.

As a result, my ER1 robot can roam the corridor automatically without colliding with the walls or with slow-moving dynamic objects (humans, for example), while constructing a topological map of walls, doors and corners. The following is an example of the constructed map.

Together with Changchang Wu in COMP790 Robotics, we extended the map construction algorithm to close the loop using vision techniques. For this step, we use an omnidirectional camera to capture sample images, but the motion planning strategies still apply. We also re-order the topological map using global minimization.

For detailed information, please refer to my final report of COMP290-58 and COMP790.


Visual Hull in the Presence of Occlusion

A visual hull is the intersection of the back-projections of the object silhouettes seen from each camera view. Because it is simple to construct and effective, the visual hull is often used as a first analysis of 3D objects, or to bootstrap more sophisticated algorithms that recover finer 3D shape.

(figure from

All of this relies on the constraint of conservativeness: a visual hull is the largest volume in which objects can reside that is consistent with all the silhouette information. In other words, the actual 3D object is guaranteed to be within the visual hull.
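
The intersection and its conservativeness can be illustrated with a small voxel-carving sketch. It is a toy example, not the method of the paper: the function name `visual_hull`, the use of caller-supplied projection functions, and the two orthographic views in the usage example are assumptions.

```python
def visual_hull(voxels, views):
    """Keep the voxels that project inside the silhouette in every view.

    voxels -- iterable of (x, y, z) tuples
    views  -- list of (project, silhouette) pairs; project maps a voxel to
              an integer (row, col) pixel, silhouette is a 2D list of bools
    """
    hull = []
    for voxel in voxels:
        for project, silhouette in views:
            row, col = project(*voxel)
            inside = (0 <= row < len(silhouette)
                      and 0 <= col < len(silhouette[0])
                      and silhouette[row][col])
            if not inside:
                break  # carved away by this view
        else:
            hull.append(voxel)
    return hull


# Toy scene: a 2x2x2 grid containing two object voxels, seen by two
# orthographic cameras (one looking along z, one along x).
grid = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
scene = [(0, 0, 0), (1, 0, 1)]
along_z = lambda x, y, z: (y, x)
along_x = lambda x, y, z: (z, y)


def render(project):
    image = [[False, False], [False, False]]
    for voxel in scene:
        row, col = project(*voxel)
        image[row][col] = True
    return image


hull = visual_hull(grid, [(along_z, render(along_z)),
                          (along_x, render(along_x))])
```

In this toy scene the hull contains four voxels: the two real ones plus two that the silhouettes cannot rule out. The object is always inside the hull, but the hull can be larger than the object. If one silhouette were missing pixels because of an occluder, the corresponding object voxels would be carved away and the guarantee would no longer hold.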

There are occasions, however, when this constraint is not satisfied. For example, the observed object may be occluded by other objects in some camera view, so that a complete silhouette can never be obtained from that view. If we still construct the visual hull from these incomplete silhouettes, the final visual hull is torn apart, as shown in the figure below. A visual hull obtained this way cannot be used for any further shape analysis.

Earlier papers tend to stress the efficiency and effectiveness of visual hull construction, but say little about occlusion. They either assume that there is no occlusion in the scene or, when occlusions do occur in some camera views, simply discard the occluded views. In fact, occlusions are very common, especially in complicated scenes. Simply discarding the occluded views may waste substantial information that could help construct a better visual hull, that is, a smaller and more accurate volume representation of the 3D subject.

This work has been accepted as an oral presentation at the 3rd International Symposium on 3D Data Processing, Visualization & Transmission (3DPVT), Chapel Hill, Jun. 2006. Please check out the publication page for details.


Image Registration in Robocup Field Calibration

Image registration is the process of establishing point-by-point correspondence between two images of a scene. This process is needed in various computer vision applications, such as stereo depth perception, motion analysis, change detection, object localization, object recognition, and image fusion. During my stay in Germany, I worked with Prof. Dr. Bernd Fischer at the University of Luebeck on variational registration approaches. Afterwards, I modified the curvature smoother to handle the Robocup field distortion problem and obtained very good results for the Zhejiang University Robocup F180 Team, 2004.

This work has been accepted by the International Symposium on Intelligent Multimedia, Video & Speech Processing (ISIMP), Hong Kong, 2004. Please check out the publication page for details.


Image Matting based on Alpha Values

The purpose of digital image matting is to extract objects of interest from the background of a digital image. It can supply film-making and Image Based Modeling and Rendering with natural source material, and some of its techniques can be applied to Object-Oriented Media Compression. Finding ways to extract the foreground accurately and conveniently is therefore a popular topic. It is not an easy task, however, since the boundaries of a natural object in an image are often blurred. Alpha-based image matting approaches aim at exactly these cases.

From left to right are natural images, alpha images and extracted objects embedded in the blue sky respectively.

Some special effects that you can achieve with these image matting approaches.

In this paper we introduce and implement in MATLAB three methods based on alpha values: the Hillman, Ruzon & Tomasi, and Poisson algorithms, the last of which was introduced at SIGGRAPH 2004. After testing, we compare and analyze them systematically. We then improve the “Distance Measure” definition of Ruzon & Tomasi’s approach and obtain better results. Finally, all three methods and the improved version are integrated into a single GUI as a simple matting tool.
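
All alpha-based methods rest on the compositing equation C = alpha * F + (1 - alpha) * B, where F is the foreground color, B the background color and alpha the per-pixel opacity. Once a matting method has estimated alpha and F, the extracted object can be placed on a new background. The sketch below shows only this compositing step, not any of the three matting algorithms; the function name `composite` and the nested-list image format are assumptions.

```python
def composite(foreground, alpha, background):
    """Blend per pixel: C = alpha * F + (1 - alpha) * B.

    foreground, background -- 2D lists of (r, g, b) tuples
    alpha                  -- 2D list of floats in [0, 1]
    """
    return [
        [tuple(a * f + (1.0 - a) * b for f, b in zip(f_pixel, b_pixel))
         for f_pixel, a, b_pixel in zip(f_row, a_row, b_row)]
        for f_row, a_row, b_row in zip(foreground, alpha, background)
    ]


# A red object composited over a blue background: fully opaque, a
# blurred boundary pixel, and fully transparent.
red, blue = (255, 0, 0), (0, 0, 255)
result = composite([[red, red, red]], [[1.0, 0.5, 0.0]], [[blue, blue, blue]])
```

The boundary pixel with alpha 0.5 ends up as an even mix of both colors, which is why a fractional alpha handles blurred object boundaries better than a hard binary mask.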

We notice that when foreground and background colors contrast sharply and the colors change smoothly along the boundary, the matting result is satisfactory in terms of both quality and time cost.

This work has been summarized in my bachelor thesis (Excellent Thesis Award, Zhejiang University, 2004). Please check out the publication page for details.

The GUI demo is available now.


The Myth of Fingerprints - Mathematical Modeling

Problem A, Mathematical Contest in Modeling, U.S.A 2004

It is a commonplace belief that the thumbprint of every human who has ever lived is different. Develop and analyze a model that will allow you to assess the probability that this is true. Compare the odds (that you found in this problem) of misidentification by fingerprint evidence against the odds of misidentification by DNA evidence.

Terminology of human fingerprints and their discretized feature representation.

This work was awarded Meritorious Winner in the contest. Please check out the publication page for details.

