    RI: Small: Modeling and Recognition of Landmarks and Urban Environments

    Principal Investigator: Jan-Michael Frahm
    Funding Agency: National Science Foundation
    Agency Number: IIS-0916829

    Abstract
    In the last few years, digital cameras have become cheap and ubiquitous, enabling millions of people to snap photos of their environments and to instantly upload and share them via the Internet. This technological revolution has led to an explosion of geo-tagged content on websites such as Flickr.com, and has whetted the public's interest in visual representations of landmarks and geographic locations. On the one hand, we have large-scale, centralized efforts for acquiring urban street-level imagery, e.g., Google StreetView. On the other hand, we have "crowd-sourced" efforts, like the Microsoft PhotoSynth project, whereby images taken and uploaded by individual users can be automatically matched and combined into "collective" 3D scene representations. Computer vision has made these exciting developments possible through significant advances in geometric structure from motion (SFM) techniques. However, to date, existing SFM techniques have been applied to the problem of modeling large-scale 3D environments such as landmarks and cities in a fairly naive, brute-force fashion, with little regard for robustness or scalability to millions of images in unorganized, unstructured, heterogeneous collections such as those found on photo-sharing websites. Accordingly, there is an urgent need for scalable and reliable computer vision techniques to build models of outdoor environments, specifically cities. Such techniques can achieve broad societal impact through application domains such as tourism, games and virtual environments, movie special effects, security, military, and cultural heritage preservation.

    This proposal describes the design of a robust and efficient system for modeling and recognition of outdoor urban environments, from a relatively local scale (a building or a landmark) to a scale of several miles (a city). Unlike any existing technique, our system will simultaneously provide multiple functionalities, including 3D reconstruction, browsing, summarization, recognition and localization. Moreover, the system will be designed from the ground up with computational efficiency and scalability as the main considerations, making it applicable to the kinds of datasets that cannot be successfully handled by existing approaches.

    Our key methodological insight is that the successful design of such a system requires a hybrid approach that combines the strengths of statistical recognition approaches and geometric reconstruction approaches. Recognition approaches are well suited to dealing with noise and uncertainty and make effective use of statistical appearance-based modeling, but lack the strong geometric constraints needed for modeling categories with a common rigid 3D structure, such as famous tourist sites and landmarks. Structure-from-motion methods, on the other hand, employ powerful geometric constraints and produce compelling 3D reconstructions, but are currently not well suited to take advantage of more than a small subset of a large and noisy community photo collection. The intellectual merit of this proposal consists in combining the complementary strengths of these two traditions in computer vision to develop a robust, scalable, and comprehensive landmark modeling and recognition system.
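    To make this complementarity concrete, the short Python sketch below pairs appearance-based feature matching with RANSAC-based geometric verification on two overlapping photographs. It is an illustrative toy example using OpenCV, not the system proposed here; the image file names, the 0.75 ratio-test threshold, and the RANSAC parameters are assumptions chosen only for the sketch.

        # Toy illustration (not the proposed system): appearance-based matching
        # followed by geometric verification with a RANSAC-estimated fundamental
        # matrix. Requires OpenCV (cv2) and NumPy; file names are placeholders.
        import cv2
        import numpy as np

        img1 = cv2.imread("photo_a.jpg", cv2.IMREAD_GRAYSCALE)
        img2 = cv2.imread("photo_b.jpg", cv2.IMREAD_GRAYSCALE)

        # Appearance side: local SIFT descriptors and nearest-neighbor matching.
        sift = cv2.SIFT_create()
        kp1, desc1 = sift.detectAndCompute(img1, None)
        kp2, desc2 = sift.detectAndCompute(img2, None)

        matcher = cv2.BFMatcher(cv2.NORM_L2)
        candidates = matcher.knnMatch(desc1, desc2, k=2)
        good = []
        for pair in candidates:
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                good.append(pair[0])  # Lowe ratio test keeps distinctive matches

        # Geometry side: epipolar constraints reject matches that look alike
        # but are inconsistent with any rigid two-view geometry.
        pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
        F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)

        inliers = int(mask.sum()) if mask is not None else 0
        print(f"{len(good)} appearance matches, {inliers} geometrically verified")

    The surviving inliers are the kind of geometrically verified correspondences that an SFM pipeline can triangulate into 3D structure, while the appearance-only matches remain useful for recognition-style retrieval.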

    The proposal will address the following technical challenges: (1) geometric image organization, (2) geo-spatial and semantic image organization, (3) scalable reconstruction, (4) recognition and localization, (5) scene segmentation, and (6) illumination modeling. In addition to these technical contributions, our proposed project will include a substantial education and outreach component through curriculum development at UNC, efforts to involve high school students, as well as broader dissemination of data and results.
