Robust Estimation




RECON: Scale Adaptive Robust Estimation via Residual Consensus
Rahul Raguram and Jan-Michael Frahm
International Conference on Computer Vision (ICCV), 2011 (Oral Presentation).
[paper (pdf)]   [code]   [presentation video]

In this work, we present a threshold-free robust estimation framework capable of efficiently fitting models to contaminated data. While RANSAC and its many variants have emerged as popular tools for robust estimation, their performance depends largely on the availability of a reasonable prior estimate of the inlier threshold. This work builds on the simple observation that models generated from uncontaminated minimal subsets are "consistent" in terms of the behavior of their residuals, while contaminated models exhibit uncorrelated behavior. Leveraging this observation, we develop a very simple yet effective algorithm that requires a priori knowledge of neither the scale of the noise nor the fraction of uncontaminated points.
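
To make the residual-consistency observation concrete, here is a minimal Python sketch for 2D line fitting. It is not the RECON algorithm itself: the overlap statistic, the helper names (`line_through`, `top_n`, `overlap`) and the toy data are assumptions chosen purely for illustration. It shows only that two models built from uncontaminated minimal subsets agree on which points have the smallest residuals, while a model built from outliers does not, and that no inlier threshold is needed to see this.

```python
import math

def line_through(p, q):
    """Line a*x + b*y + c = 0 with unit normal (a, b) through two distinct points."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def top_n(model, points, n):
    """Indices of the n points with the smallest residuals under the model."""
    a, b, c = model
    res = [abs(a * x + b * y + c) for x, y in points]
    return set(sorted(range(len(points)), key=res.__getitem__)[:n])

def overlap(model_a, model_b, points, n):
    """Fraction of the n best-fitting points that the two models share."""
    return len(top_n(model_a, points, n) & top_n(model_b, points, n)) / n

# Toy data (hypothetical): 10 slightly noisy points on y = 2x + 1, plus 10 gross outliers.
inliers = [(float(x), 2.0 * x + 1.0 + 0.05 * (-1) ** x) for x in range(10)]
outliers = [(1.0, 9.0), (3.0, -4.0), (5.0, 20.0), (7.0, 2.0), (8.0, 30.0),
            (2.0, 15.0), (6.0, -8.0), (9.0, 5.0), (0.0, 12.0), (4.0, 25.0)]
points = inliers + outliers

good_1 = line_through(inliers[0], inliers[9])   # uncontaminated minimal subset
good_2 = line_through(inliers[2], inliers[7])   # another uncontaminated subset
bad = line_through(outliers[0], outliers[3])    # contaminated minimal subset

agree_good = overlap(good_1, good_2, points, n=10)
agree_bad = overlap(good_1, bad, points, n=10)
print(agree_good, agree_bad)
```

The two uncontaminated models rank the same ten points as their best fits, whereas the contaminated model shares fewer of them.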








covRANSAC: Exploiting Uncertainty in Random Sample Consensus
Rahul Raguram, Jan-Michael Frahm and Marc Pollefeys
International Conference on Computer Vision (ICCV), 2009.
[paper (pdf)]

One of the implicit assumptions in RANSAC is that a model produced from an all-inlier minimal subset will be consistent with all other inliers in the data. However, in practice, it has been observed that this is rarely the case; since model hypotheses are generated from noisy data points, the model parameters are also affected by noise. In this work, we show how the various uncertainties of the estimation process can be explicitly modeled within a robust estimation framework. Coupled with tests to characterize the "non-randomness" of a solution, this strategy can result in up to an order of magnitude improvement in efficiency over RANSAC.
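
The assumption described above is built into the standard RANSAC stopping criterion, which counts a hypothesis as "good" as soon as its minimal sample contains only inliers. The sketch below computes that textbook iteration count; the function name and defaults are illustrative assumptions and are not taken from the covRANSAC paper.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Number of samples needed so that, with probability `confidence`,
    at least one minimal sample is free of outliers.

    Assumes (as standard RANSAC does) that any all-inlier sample yields a
    model consistent with every other inlier.
    """
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    p_good = inlier_ratio ** sample_size      # P(sample is all-inlier)
    if p_good == 0.0:
        raise ValueError("inlier_ratio ** sample_size underflows to zero")
    if p_good >= 1.0:
        return 1
    # log1p(-p) = log(1 - p), computed accurately even for very small p
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_good))

print(ransac_iterations(0.5, 4))  # 4-point sample, 50% inliers
```

When noisy all-inlier samples produce models that miss some of the true inliers, this count underestimates the number of samples actually required.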







ARRSAC: Adaptive Real-Time Random Sample Consensus
Rahul Raguram, Jan-Michael Frahm and Marc Pollefeys
European Conference on Computer Vision (ECCV), 2008.
[paper (pdf)]

There are a number of scenarios (e.g., real-time 3D reconstruction) where time-constrained robust estimation is of interest. While there have been a number of recent efforts aimed at increasing the efficiency of the basic RANSAC algorithm, few of them are directly applicable in situations where real-time performance is required. To this end, we have proposed a real-time robust estimation framework that builds upon the strengths of previous approaches, bringing together various ideas in order to achieve state-of-the-art performance. In particular, ARRSAC is suitable for use in real-time applications with a limited time budget, and is capable of providing accurate results over a wide range of inlier ratios. At UNC, we have used ARRSAC in various large-scale 3D reconstruction systems, operating on both video sequences and Internet photo collections.








Universal/Über RANSAC
Rahul Raguram, Ondrej Chum, Marc Pollefeys, Jiri Matas, and Jan-Michael Frahm.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Nov 2012.
[webpage]

We have recently developed a generalization of the standard hypothesize-and-verify structure of RANSAC, building upon and extending many of the ideas incorporated in ARRSAC. This work will include a freely available software library for robust estimation, comprising source code and test datasets. The library can be used as a stand-alone tool for specific applications, as a benchmark for comparing new algorithms, and as an easily extensible base for developing new algorithms.
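
For readers unfamiliar with the hypothesize-and-verify structure mentioned above, the following is a minimal sketch of the basic loop in Python, with the model-specific steps passed in as functions. It does not reflect the API of the library described here; every name (`hypothesize_and_verify`, `fit_line`, `point_line_distance`) and the toy data are hypothetical.

```python
import math
import random

def hypothesize_and_verify(data, sample_size, fit, residual,
                           threshold, iterations, rng):
    """Basic RANSAC loop: sample, fit a hypothesis, count its support."""
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(data, sample_size)    # hypothesize
        model = fit(sample)
        if model is None:                         # degenerate sample
            continue
        inliers = [p for p in data if residual(model, p) <= threshold]  # verify
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

def fit_line(sample):
    """Line a*x + b*y + c = 0 (unit normal) through two points, or None."""
    (x1, y1), (x2, y2) = sample
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def point_line_distance(model, p):
    a, b, c = model
    return abs(a * p[0] + b * p[1] + c)

# Toy data: 20 points exactly on y = 2x + 1, plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -5.0), (12.0, 2.0), (15.0, 60.0), (18.0, 0.0)]

model, inliers = hypothesize_and_verify(
    points, sample_size=2, fit=fit_line, residual=point_line_distance,
    threshold=0.1, iterations=200, rng=random.Random(0))
print(len(inliers))
```

Keeping sampling, fitting and verification as separate, replaceable steps is what makes a loop like this easy to extend with alternative strategies.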








Computer Vision Meets Computer Security








iSpy: Automatic Reconstruction of Typed Input from Compromising Reflections
Rahul Raguram, Andrew White, Dibyendusekhar Goswami, Fabian Monrose and Jan-Michael Frahm
ACM Conference on Computer and Communications Security (CCS), Oct 2011 (Oral Presentation).
[paper (pdf)]   [webpage]   [slides (ppt)]   [video]

Today, personal mobile devices, such as smartphones, are all around us. While the ubiquity of these powerful personal computing devices has changed how we communicate and store information, it also provides new possibilities for the surreptitious observation of private messages and data. In this work, we demonstrate the use of computer vision and machine learning techniques to compromise the privacy of users typing on virtual keyboards. Specifically, we show that so-called compromising reflections (in, for example, a victim's sunglasses) of a device's screen are sufficient to enable automated reconstruction, from video, of text typed on a touchscreen keyboard. Despite our deliberate use of low-cost commodity video cameras, we are able to reconstruct fluent translations of recorded data, even in very challenging scenarios. We believe these results highlight the importance of recalibrating our expectations of privacy in response to emerging technologies.

Press: Engadget   PhysOrg   Gizmodo   New Scientist



Modeling and Organizing Large Scale Internet Photo Collections







Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs
Rahul Raguram, Changchang Wu, Jan-Michael Frahm, and Svetlana Lazebnik
International Journal of Computer Vision (IJCV), 2011.
[paper (pdf)]   [webpage]

Building Rome on a Cloudless Day
Jan-Michael Frahm, Pierre Georgel, David Gallup, Tim Johnson, Rahul Raguram, Changchang Wu, Yi-Hung Jen, Enrique Dunn, Brian Clipp, Svetlana Lazebnik, Marc Pollefeys
European Conference on Computer Vision (ECCV), 2010.
[paper (pdf)]   [webpage]

Recent years have seen an explosion in consumer digital photography and a phenomenal growth of community photo-sharing websites. At UNC, we have developed systems for modeling and visualizing landmarks based on large-scale, heavily contaminated image collections gathered from the Internet. Our approach to this problem encompasses image clustering, robust geometric verification, structure from motion and stereo to achieve high computational performance. Our original system was capable of handling datasets containing tens of thousands of images, and we have more recently extended this to be able to process millions of images in a day, on a single PC.


Press: BBC   ReeseNews   ReadWriteWeb   PhysOrg   Tech Journal   NewsWise








Computing Iconic Summaries of General Visual Concepts
Rahul Raguram and Svetlana Lazebnik
Workshop on Internet Vision, Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[paper (pdf)]

In this work, we consider the problem of selecting iconic images to summarize general visual categories. We define iconic images as high-quality representatives of a large group of images consistent both in appearance and semantics. To find such groups, we perform joint clustering in the space of global image descriptors and latent topic vectors of tags associated with the images. To select the representative iconic images for the joint clusters, we use a quality ranking learned from a large collection of labeled images. Results on four large-scale datasets demonstrate the ability of our approach to discover plausible themes and recurring visual motifs for challenging abstract concepts such as "love" and "beauty".












2D/3D Scene Analysis






Semantic Segmentation of Urban Video Imagery
Every day, more and more of the Earth's cities and sights are photographed with a variety of digital cameras, from many viewing positions and angles, and under varying weather and illumination conditions. Given the massive growth of datasets of this nature (e.g., Google Street View and Microsoft Bing Maps), we explore the problem of interpreting this imagery using a combination of appearance and geometry cues.

This work was carried out during an internship at Google in the summer of 2011.





Real-Time 3D Reconstruction from Video






Fast Robust Large-scale Mapping from Video and Internet Photo Collections
Jan-Michael Frahm, Marc Pollefeys, Svetlana Lazebnik, Christopher Zach, David Gallup, Brian Clipp, Rahul Raguram, Changchang Wu, Tim Johnson
Special issue: "100 years of ISPRS", ISPRS Journal of Photogrammetry and Remote Sensing, 2010.
[paper (pdf)]   [webpage]

At UNC, the 3D vision group has developed a real-time system, called UrbanScape, for performing large-scale 3D reconstruction from video. The system achieves high computational performance through algorithmic optimizations, coupled with parallelization and execution on commodity graphics hardware. My contributions to this system include vision-based pose estimation, self-calibration and real-time robust estimation.






Image Compression








Improved Resolution Scalability for Bi-level Image Data in JPEG2000
Rahul Raguram, Michael W. Marcellin and Ali Bilgin
IEEE Transactions on Image Processing, 2009.
[paper (pdf)]   [invention disclosure (pdf)]

Improved Resolution Scalability for Bi-level Image Data in JPEG2000
Rahul Raguram, Michael W. Marcellin and Ali Bilgin
Data Compression Conference (DCC), 2007.
[paper (pdf)]

The JPEG2000 image compression standard is designed to compress both bilevel and continuous tone image data using a single unified framework; however, there exist significant limitations with respect to its use in the lossless compression of bilevel imagery. In particular, there is substantial degradation in image quality at low resolutions, which severely limits the resolution scalable features of the JPEG2000 code-stream. We examine these effects and present two efficient methods to improve resolution scalability for bilevel imagery in JPEG2000. By analyzing the sequence of rounding operations performed in the JPEG2000 lossless compression pathway, we introduce a very simple pixel flipping scheme that improves image quality for commonly occurring types of bilevel imagery. Additionally, we develop a more general strategy based on the JPIP protocol, which enables efficient interactive access of compressed bilevel imagery.
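
To illustrate how integer rounding affects bilevel data at reduced resolution, the sketch below applies one level of a reversible 5/3 lifting transform to a 1D row of 0/1 pixels. It assumes the lossless pathway uses the usual floor-based 5/3 lifting steps, works in one dimension only, and uses a simplified symmetric boundary extension. It is not the pixel flipping scheme from the paper, and the function names are hypothetical.

```python
def forward_53(x):
    """One level of reversible 5/3 lifting on an even-length integer row.

    Returns (low, high): the half-resolution row and the detail row.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length sequence")
    even, odd = x[0::2], x[1::2]
    half = len(even)
    high = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]   # mirror at right edge
        high.append(odd[n] - (even[n] + right) // 2)       # predict step (floor)
    low = []
    for n in range(half):
        left = high[n - 1] if n > 0 else high[0]           # mirror at left edge
        low.append(even[n] + (left + high[n] + 2) // 4)    # update step (floor)
    return low, high

def inverse_53(low, high):
    """Exactly undo forward_53 by running the lifting steps in reverse."""
    half = len(low)
    even = []
    for n in range(half):
        left = high[n - 1] if n > 0 else high[0]
        even.append(low[n] - (left + high[n] + 2) // 4)
    x = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        x.extend([even[n], high[n] + (even[n] + right) // 2])
    return x

dither = [0, 1, 0, 1, 0, 1, 0, 1]       # alternating pixels (a 50% pattern)
thin_line = [0, 0, 0, 1, 0, 0, 0, 0]    # a single isolated pixel

low_dither, high_dither = forward_53(dither)
low_line, high_line = forward_53(thin_line)
print(low_dither, low_line)
```

In this toy case the alternating pattern collapses to a solid row and the isolated pixel vanishes from the half-resolution row, even though the full-resolution data remains exactly recoverable from the two bands.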