Research Work


[Multi-Object Simultaneous Shape Estimation and Tracking]

We propose a new algorithm to automatically detect and reconstruct scenes with a variable number of dynamic objects. Our formulation distinguishes between m different silhouettes in the scene by using automatically learnt view-specific object appearance models, eliminating the color calibration requirement. Bayesian reasoning is then applied to solve the m-shape occupancy problem, with m updated as objects enter or leave the scene.

Results show that this method yields multiple silhouette-based estimates that drastically improve scene reconstructions over traditional two-label silhouette scene analysis. This also allows the method to handle multi-person tracking problems efficiently.

This work has been accepted to CVPR 2008. Please check out the publication page for details.

The cluster dataset will be online shortly.



[3D Occlusion Inference with Bayesian framework]

This paper shows that occluders in the interaction space of dynamic objects can be detected, and their 3D shape fully recovered, as a byproduct of shape-from-silhouette (SfS) analysis. The setting is a calibrated, inward-looking multi-camera setup in natural, uncontrolled environments, where occlusions are common and unavoidable for SfS techniques. We provide a Bayesian sensor fusion formulation to process all occlusion cues occurring in a multi-view sequence. Results show that the shape of static occluders can be robustly recovered from pure dynamic object motion, and that this information can be used for online self-correction and consolidation of dynamic object shape reconstruction.

This work has been accepted to CVPR 2007 as an oral presentation. Please check out the publication page for details.

The dataset sculpture2ppl used in the paper is available online now.



[Occluder Segmentation With Motion]

We present a system for segmenting occluders in a scene from a single fixed viewpoint, given a video stream of random active motion seen from that view. We first detect moving object silhouettes using a pre-learned background and shadow model, so that the system is robust against lighting changes, which are frequent in such a setting. We then analyze the motion of the objects by looking at spatio-temporal Motion History Images (MHI) of the silhouettes. Based on the motion direction, we propose the concept of the Effective Edge of a moving silhouette, which is guaranteed to enclose the occluder boundary. We show that it can distinguish the real occluder boundary from an interface that moving objects have simply never reached. After a final refinement, the actual occluder is segmented as a binary mask image. A real-time system has been implemented to validate the theory.
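The MHI step above can be illustrated with the commonly used update rule: a pixel covered by the current silhouette is set to a maximum value, and every other pixel decays by one per frame. This is only a minimal sketch and not the project's implementation; the function name `update_mhi`, the parameter `tau` and the toy frames are assumptions made for illustration.

```python
def update_mhi(mhi, silhouette, tau):
    """Return the next Motion History Image (MHI).

    mhi        -- 2D list of ints, the previous MHI
    silhouette -- 2D list of 0/1 values, the current binary silhouette
    tau        -- number of frames for which a motion trace persists
    """
    return [
        # Pixels inside the silhouette are refreshed to tau; others decay.
        [tau if s else max(0, h - 1) for h, s in zip(mhi_row, sil_row)]
        for mhi_row, sil_row in zip(mhi, silhouette)
    ]


# Toy input: a one-pixel blob moving left to right across a 1x4 image.
frames = [
    [[1, 0, 0, 0]],
    [[0, 1, 0, 0]],
    [[0, 0, 1, 0]],
]

mhi = [[0, 0, 0, 0]]
for frame in frames:
    mhi = update_mhi(mhi, frame, tau=3)

print(mhi)  # [[1, 2, 3, 0]]
```

The values grow toward the most recent position of the blob, so the gradient of the MHI indicates the motion direction, which is the cue the Effective Edge relies on.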

Full-resolution real-time example. (a) camera view; (b) MHI; (c) CSI; (d) silhouette image; (e) silhouette boundary; (f) EE; (g) COBI; (h) final COBI; (i) CSI boundary; (j) after thresholding with R; (k) after flood filling.


Please refer to the project report for details.



[SensorTalk]

I developed the patented product SensorTalk under the supervision of Dr. Hector Gonzales-Banos during a 12-week internship at Honda Research Institute (HRI) in Mountain View, CA, in the summer of 2005.

The objective of the product is to provide a unified framework for communicating with and manipulating heterogeneous sensors. It also sets standards for writing sensor drivers, so that new sensors can be integrated into the network. In principle, SensorTalk can be viewed as a set of protocols independent of the operating system. Once sensors are subscribed to the network as services, SensorTalk lets clients tune sensor parameters and set the Quality of Service (QoS) to suit the various applications that might be built on top of it. Combined with RoboTalk, a product also developed at HRI Mountain View, humanoids and other kinds of robots can be guided by peripheral sensors as well as onboard sensors to perform sophisticated tasks that were not possible before.

SensorTalk was developed in C++ with Visual Studio 6.0 on Windows XP. It involves multithreaded programming, STL programming, network programming and object-oriented programming. It consists of 15 files (10 header files and 5 .cpp files) with 3573 lines of code in total. Real-time demos of applications built on top of SensorTalk were shown in the final presentation of the internship to all staff of HRI Mountain View and HRI Boston.

The product comes with complete Doxygen-generated documentation, and the patent has been filed by HRI Mountain View.

The following video shows two image sequence servers, which can be thought of as 'virtual cameras', playing the Kung Fu Girl dataset. A client subscribes to both servers and computes a volumetric visual hull from their frames. The rotation of the volume shows the 3D shape, and the changing pose of the girl demonstrates that the client is actually receiving new frames from the servers. The frame rate of this client is around 1 Hz. Another client then connects to one of the servers and simply displays the images it acquires; its frame rate is around 20 Hz. Note that the frame rate can be set by the client in advance, and the SensorTalk architecture automatically adjusts it according to the resources available across the whole system. I wrote the drivers of the virtual cameras according to the SensorTalk protocol, which is itself a set of rules designed by our team. Finally, a colleague of mine appears to say hello to everybody; thanks to him, we have this demo video. :-)

Here is the demo video. (The file is about 120 MB and you need an MPEG-2 decoder to play it.)

For more information, please contact or

[ER1 Robot Exploration & Map Construction]

The ER1 robot essentially consists of motors, drivers, a laptop and cameras.

For COMP290-58 Robot Motion Planning, I present a general program framework for the ER1 robot, consisting of sensing, control and motion planning modules. Based on this platform, we propose a specific application: building a topological map of the third floor of Sitterson Hall using one laptop, one webcam and one ER1 robot. Different algorithms are introduced to achieve robust navigation. The real topological map was not completed in the time available because landmark recognition was unstable, but experiments on simulated data show the feasibility of map construction. Given the robustness of the navigation algorithm, we expect the final goal to be achieved shortly.

As a result, my ER1 robot can roam the corridor automatically without colliding with the walls or with low-velocity dynamic objects (people, for example), while constructing a topological map of walls, doors and corners. The following is an example of the constructed map.

Together with Changchang Wu in COMP790 Robotics, I extend the map construction algorithm to close the loop using vision techniques. For this step, we use an omnidirectional camera to capture sample images, but the motion planning strategies still apply. We also re-order the topological map using global minimization.

For detailed information, please refer to my final reports for COMP290-58 and COMP790.


[Visual Hull in the Presence of Occlusion]

A visual hull is the intersection of the back-projections of an object's silhouettes as seen from each camera view. Because it is simple to construct and effective, the visual hull is often used for a first analysis of 3D objects, or to bootstrap more sophisticated algorithms that recover a finer shape of the 3D objects.

(figure from

All of this rests on the conservativeness property: a visual hull is the largest volume in which objects can reside that is consistent with all the silhouette information. In other words, the actual 3D object is guaranteed to lie within the visual hull.
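The definition and the containment guarantee can be illustrated with a small voxel-carving sketch. It is not the method of the paper: the orthographic "cameras" `along_x` and `along_y`, the 2x2x2 grid and the function name `carve_visual_hull` are all simplifying assumptions chosen to keep the example short.

```python
from itertools import product


def carve_visual_hull(size, views):
    """Keep every voxel whose projection falls inside all silhouettes.

    size  -- edge length of the cubic voxel grid
    views -- list of (project, silhouette) pairs, where `project` maps a
             voxel (x, y, z) to a pixel and `silhouette` is the set of
             foreground pixels seen by that camera
    """
    return {
        voxel
        for voxel in product(range(size), repeat=3)
        if all(project(voxel) in silhouette for project, silhouette in views)
    }


def along_x(voxel):
    """Hypothetical orthographic camera looking along the x axis."""
    return (voxel[1], voxel[2])


def along_y(voxel):
    """Hypothetical orthographic camera looking along the y axis."""
    return (voxel[0], voxel[2])


# Toy object: two voxels on a diagonal of the bottom layer of the grid.
obj = {(0, 0, 0), (1, 1, 0)}

# Complete silhouettes are simply the projections of the object.
views = [(cam, {cam(v) for v in obj}) for cam in (along_x, along_y)]
hull = carve_visual_hull(2, views)

print(sorted(hull))  # the whole bottom layer: 4 voxels
print(obj <= hull)   # True: the object lies inside its visual hull
```

With complete silhouettes the hull is larger than the object but never smaller, which is the conservativeness property stated above.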

There are occasions, however, when this constraint is not satisfied. For example, suppose the observed object is occluded by other objects in some camera view, so that a complete silhouette can never be obtained from that view. If we still construct the visual hull from these incomplete silhouettes, the final visual hull will be torn apart, as shown in the figure below. A visual hull obtained this way cannot be used for any further shape analysis.

Earlier papers tend to stress the efficiency and effectiveness of visual hull construction, but say little about occlusion. They either assume that there is no occlusion in the scene or, whenever occlusions do occur in some camera views, simply discard the occluded views. In practice, however, occlusions are very common, especially in complicated scenes. Simply discarding the occluded views may waste substantial information that could help construct a better visual hull, that is, a smaller and more accurate volume representation of the 3D subject.

This work has been accepted as an oral presentation at the 3rd International Symposium on 3D Data Processing, Visualization & Transmission (3DPVT), Chapel Hill, Jun. 2006. Please check out the publication page for details.


[Image Registration in Robocup Field Calibration]

Image registration is the process of establishing point-by-point correspondence between two images of a scene. This process is needed in various computer vision applications, such as stereo depth perception, motion analysis, change detection, object localization, object recognition, and image fusion. During my stay in Germany, I worked with Prof. Dr. Bernd Fischer at the University of Luebeck on variational registration approaches. Afterwards, I adapted the curvature smoother to the Robocup field distortion problem and obtained very good results for the Zhejiang University Robocup F180 Team, 2004.

This work has been accepted at the International Symposium on Intelligent Multimedia, Video & Speech Processing (ISIMP), Hong Kong, 2004. Please check out the publication page for details.


[Image Matting based on Alpha Values]

The purpose of digital image matting is to extract objects of interest from the background of a digital image. It can supply film-making and Image Based Modeling and Rendering with natural source material, and some of its techniques can be applied to Object-Oriented Media Compression. Finding ways to extract the foreground accurately and conveniently is therefore an active topic. This is not an easy task, however, since the boundaries of a natural object in an image are often blurred. Alpha-based image matting approaches target exactly these cases.

From left to right: natural images, alpha images, and the extracted objects embedded in a blue sky.
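Alpha-based matting builds on the standard compositing equation C = alpha * F + (1 - alpha) * B, where C is the observed pixel color, F the foreground color and B the background color. The sketch below shows the forward direction (embedding a foreground pixel in a new background) and a simple per-pixel alpha estimate for the case where F and B are already known. It is an illustration only and not one of the algorithms compared in the thesis; the function names and the pixel values are assumptions.

```python
def composite(foreground, background, alpha):
    """Blend one RGB pixel: C = alpha * F + (1 - alpha) * B."""
    return tuple(
        alpha * f + (1.0 - alpha) * b
        for f, b in zip(foreground, background)
    )


def estimate_alpha(observed, foreground, background):
    """Estimate alpha for one pixel when F and B are known.

    Projects (C - B) onto (F - B) and clamps the result to [0, 1].
    """
    diff_fb = [f - b for f, b in zip(foreground, background)]
    diff_cb = [c - b for c, b in zip(observed, background)]
    norm_sq = sum(d * d for d in diff_fb)
    if norm_sq == 0:
        # F equals B, so alpha cannot be determined from this pixel.
        return 0.0
    alpha = sum(c * d for c, d in zip(diff_cb, diff_fb)) / norm_sq
    return min(1.0, max(0.0, alpha))


# Illustrative values: an orange foreground pixel over a blue background.
fg = (200.0, 100.0, 0.0)
sky = (0.0, 0.0, 255.0)

mixed = composite(fg, sky, alpha=0.25)
print(mixed)                           # (50.0, 25.0, 191.25)
print(estimate_alpha(mixed, fg, sky))  # 0.25
```

The hard part of matting, and the part the compared algorithms address, is that F and B are not known at blurred boundaries and have to be estimated together with alpha.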

Some special effects that you can achieve with these image matting approaches.

In this paper we introduce and implement, in the MATLAB environment, three methods based on alpha values: the Hillman, Ruzon & Tomasi, and Poisson algorithms, the last of which was introduced at SIGGRAPH 2004. After testing, we compare and analyze them systematically. We then improve the “Distance Measure” definition of Ruzon & Tomasi’s approach and obtain improved results. Finally, all three methods and the improved version are integrated into a single GUI as a simple matting “tool”.

We observe that when the foreground and background colors contrast sharply and the colors change smoothly along the boundary, the matting result is satisfactory in terms of both quality and time cost.

This work is summarized in my bachelor thesis (Excellent Thesis Award, Zhejiang University, 2004). Please check out the publication page for details.

The GUI demo is available now.


[The Myth of Fingerprints - Mathematical Modeling]

Problem A, Mathematical Contest in Modeling, U.S.A 2004

It is a commonplace belief that the thumbprint of every human who has ever lived is different. Develop and analyze a model that will allow you to assess the probability that this is true. Compare the odds (that you found in this problem) of misidentification by fingerprint evidence against the odds of misidentification by DNA evidence.
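One simple way to frame such a probability is the birthday-problem bound: if each of n people independently receives one of N equally likely, distinguishable patterns, the chance that all n patterns differ is the product of (1 - k/N) for k = 0..n-1, which is close to exp(-n(n-1)/(2N)) when n is much smaller than N. This is only a textbook sketch and not the model developed in the contest entry; the uniform-pattern assumption and the function names are illustrative, and the classic birthday numbers are used as a sanity check instead of fingerprint figures.

```python
import math


def prob_all_distinct_exact(n, num_patterns):
    """Exact probability that n uniform draws from num_patterns all differ."""
    if n > num_patterns:
        return 0.0  # pigeonhole: a repeat is certain
    p = 1.0
    for k in range(n):
        p *= 1.0 - k / num_patterns
    return p


def prob_all_distinct_approx(n, num_patterns):
    """Approximation exp(-n(n-1)/(2N)), usable when n is far below N."""
    return math.exp(-n * (n - 1) / (2.0 * num_patterns))


# Sanity check with the classic birthday problem (23 people, 365 days).
print(prob_all_distinct_exact(23, 365))
print(prob_all_distinct_approx(23, 365))
```

The exact loop is impractical for a population-sized n, which is why the closed-form approximation is the useful one for this kind of question.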

Terminology of human fingerprints and its discretized feature presentation.

This work was designated Meritorious Winner in the contest. Please check out the publication page for details.
