COMP 790-096: Computer Vision and the Web

Fall 2007, Tuesdays 3:30-4:30, SN 115

Instructor: Svetlana Lazebnik  (lazebnik -at-

Quick links:  presentation schedule,  reading list


Over the last few years, we have seen an explosion in the sheer amount of image and video data available to us over the Internet. The number of images indexed by Google and Yahoo is growing exponentially and has now reached several billion. The revolution is not only technological but also cultural, giving rise to the phenomenon popularly referred to as Web 2.0. Part of this phenomenon is the emergence of digital communities like Flickr and YouTube that enable users to upload, tag, and share images and videos with millions of other users.

The wealth of images on the Internet is beginning to revolutionize computer vision. Researchers in object recognition are already taking advantage of the Internet for dataset collection and automatic discovery of object categories. But apart from simply using the Internet as a source of data, computer vision research can play a significant role in helping people navigate the chaotic sea of visual information. More and more exciting Web-related research ideas are showing up in the latest vision and graphics literature. These ideas include organizing large image collections based on semantics or 3D geometry, using the content of these collections to synthesize new pictures through computational photography techniques such as image completion, and enabling users to interact with photos in novel ways, such as creating 3D pop-ups from their vacation snapshots. The purpose of the course is to get acquainted with these directions and to speculate about promising avenues for future research and, more generally, the future role of computer vision in the Web 2.0 revolution (and beyond).

This course is set up for variable credit. The basic format (for one unit) will consist of a weekly paper reading group. Discussions will emphasize the big picture and conceptual issues, so there are no formal technical prerequisites. Anyone with an ability to understand (at a high level) papers from recent computer vision and graphics conferences can benefit from the course. Interested students can choose to do additional readings, a report, or a project for higher credit.


Date | Topic | Presenter
August 21 | Show and tell: PowerPoint, links in HTML format | Lana Lazebnik
August 28 | Photo Tourism / PhotoSynth | Brian Clipp
September 4 | Automatic Photo Pop-up | David Gallup
September 11 | Photo Clip Art; Image Completion | Brian Eastwood
September 18 | AutoCollage; Content-Aware Image Resizing | Stephen Guy
September 25 | Photo Quality Assessment | Zhimin Ren
October 2 | Names and Faces in the News; Animals on the Web | Josh Markwordt
October 9 | Visual Category Filter; Learning from Google's Image Search | Peter Lincoln
October 16 | ICCV 2007 (week off) |
October 23 | ICCV Recap (part I) | Lana Lazebnik
October 30 | ICCV Recap (part II) | Lana Lazebnik
November 6 | LabelMe; Tiny Images | Miranda Steed
November 13 | ESP Game; Peekaboom | Ryan Schubert
November 20 | Show and Tell | Everybody
November 27 | Project presentations | Nick, Sashi, Marc
December 4 | Project presentations | Xiaowei, Ram, Jake, Rahul
December 11 | Final project reports due by the end of the day |

Reading List*

*Starred papers are not on the presentation list, but feel free to read them for your own enlightenment or to incorporate them into your presentations if they are closely related to your main topic.

From 2D to 3D

Computational photography

Pictures and words

  • *Matching words and pictures.
    Kobus Barnard, Pinar Duygulu, Nando de Freitas, David Forsyth, David Blei, and Michael I. Jordan.
    Journal of Machine Learning Research, Vol 3, pp 1107-1135, 2003.

  • *Mutual information of words and pictures.
    Kobus Barnard and Keiji Yanai.
    Information Theory and Applications Inaugural Workshop, 2006.

  • Names and Faces in the News.
    Tamara L. Berg, Alexander C. Berg, Jaety Edwards, Michael Maire, Ryan White, Yee Whye Teh, Erik Learned-Miller, and David A. Forsyth.
    Computer Vision and Pattern Recognition (CVPR) 2004.

  • Animals on the Web.
    Tamara L. Berg and David A. Forsyth.
    Computer Vision and Pattern Recognition (CVPR) 2006.

Learning visual models from Google image search

Dataset collection

Learning basic image properties

Indexing and retrieval in large datasets