SuperParsing: Scalable Nonparametric Image Parsing with Superpixels

Joseph Tighe and Svetlana Lazebnik
Dept. of Computer Science, University of North Carolina at Chapel Hill

Abstract: This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach is based on lazy learning, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. Given a test image, it first performs global scene-level matching against the training set, followed by superpixel-level matching and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 45,676 images and 232 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Finally, we present an extension of our method to video sequences and report results on a video dataset with frames densely labeled at 1 Hz.
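The three stages named in the abstract (scene-level matching, superpixel-level matching, MRF smoothing) can be illustrated with a small toy sketch. This is not the authors' implementation: the squared Euclidean distance, the k-nearest-neighbor vote with add-one smoothing, the Potts-style penalty, and the iterated conditional modes (ICM) optimizer are all simplifying assumptions chosen for brevity, and every function and parameter name below is hypothetical.

```python
import math
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieval_set(test_global, train_globals, k):
    """Stage 1 (assumed form): indices of the k training images whose global
    descriptors are closest to the test image's global descriptor."""
    order = sorted(range(len(train_globals)),
                   key=lambda i: sq_dist(test_global, train_globals[i]))
    return order[:k]

def superpixel_scores(feat, train_superpixels, allowed_images, labels, k):
    """Stage 2 (assumed form): log of add-one-smoothed label frequencies among
    the k nearest training superpixels drawn from the retrieval set."""
    pool = [sp for sp in train_superpixels if sp["image"] in allowed_images]
    pool.sort(key=lambda sp: sq_dist(feat, sp["feat"]))
    votes = Counter(sp["label"] for sp in pool[:k])
    total = sum(votes.values())
    return {c: math.log((votes[c] + 1) / (total + len(labels))) for c in labels}

def smooth_icm(scores, edges, labels, weight=0.2, sweeps=5):
    """Stage 3 (assumed form): ICM on a Potts-style MRF. Each superpixel pays
    -score for its label plus `weight` per neighbor with a different label."""
    labeling = [max(labels, key=lambda c: s[c]) for s in scores]
    neighbors = {i: [] for i in range(len(scores))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(sweeps):
        changed = False
        for i, s in enumerate(scores):
            def energy(c):
                disagreements = sum(labeling[j] != c for j in neighbors[i])
                return -s[c] + weight * disagreements
            best = min(labels, key=energy)
            if best != labeling[i]:
                labeling[i] = best
                changed = True
        if not changed:
            break
    return labeling

def parse_image(test_global, test_superpixels, edges,
                train_globals, train_superpixels, labels,
                k_images=2, k_superpixels=3):
    """Run the three stages in order and return one label per test superpixel."""
    allowed = set(retrieval_set(test_global, train_globals, k_images))
    scores = [superpixel_scores(f, train_superpixels, allowed, labels, k_superpixels)
              for f in test_superpixels]
    return smooth_icm(scores, edges, labels)
```

In this sketch, training superpixels are dictionaries with hypothetical keys `image`, `feat`, and `label`, and `edges` lists pairs of adjacent test superpixels. Restricting the superpixel search to the retrieval set is what keeps the matching step lazy: nothing is trained in advance, and labels that never occur in similar scenes receive no votes.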
Citation:
Joseph Tighe and Svetlana Lazebnik, "SuperParsing: Scalable Nonparametric Image Parsing with Superpixels," European Conference on Computer Vision, 2010. (PDF) (Poster)
New! Accepted Journal Version:
Joseph Tighe and Svetlana Lazebnik, "SuperParsing: Scalable Nonparametric Image Parsing with Superpixels," accepted by the International Journal of Computer Vision. (PDF) (New Code)

SIFT Flow Dataset:

Output for our entire test set: Web, Matlab
Confusion Matrix
Full Dataset

Barcelona Dataset:

Output for our entire test set: Web, Matlab
Confusion Matrix
Full Dataset

LM+Sun Dataset:

Output for our entire test set: Web
Full Dataset

CamVid Video Dataset:


Note on CamVid training data: For the CamVid results of Section 4.2, our training set is not identical to that of [3, 23, 40, 47]. Specifically, we also included the 101 frames labeled at 15 Hz, which increases the training set from the 367 frames used in [3, 23, 40, 47] to 468 frames. We do not believe this extra training data has a significant impact on the accuracy of our system, since it only adds frames that are very similar to ones already seen, but the comparison to other works is not strictly fair.