COMP 776 Spring 2009

Final Assignment: Bag-of-Features Image Classification (with Competition!)

Due date: April 21, 5PM

The Data (45 MB)

(source: Caltech Vision Group)

The goal of the assignment is to implement a system for bag-of-features image classification; the author of the highest-performing system will win a prize (see below)! The task is four-class image classification, with the four classes being airplanes, motorbikes, faces, and cars. The data file contains training and test subdirectories for each category. The test subdirectories contain 50 images each, and the training subdirectories contain up to 500 images each. You must test your system on all the test images, and train it on at least 50 training images per class (take either the first 50 or a random subset of 50). Keep in mind that using more training data will almost certainly result in better performance. However, if your computational resources are limited and your system is slow, it is OK to use less training data to save time. You can also experiment with splitting the training images into two subsets: one for learning the visual dictionary, and one for learning the classifier.

System Outline and Implementation Details

  1. Feature extraction. You can use any of the following methods:
    • Sampling of fixed-size image patches on a regular grid. You can use either a single image size or several different sizes.
    • Sampling of random-size patches at random locations.
    • Regions produced by your blob detector from Assignment 2. It is also fine to use the blob detector provided as the solution to the assignment or to download somebody else's detector from the Web.
    • Fixed-size patches sampled around corner locations (sample corner detector).
    • Patches produced by any other detector you download.

  2. Feature description. You can use the raw patches themselves (possibly downsampled or intensity-normalized), compute SIFT descriptors of the patches, or use any other descriptor you find in the literature, e.g., a color histogram. Here is sample code for computing SIFT descriptors of circular regions, such as the ones returned by the blob detector from Assignment 2. Note that this code is not rotation-invariant, i.e., it does not attempt to normalize the patches by rotating them so that the horizontal direction is aligned with the dominant gradient orientation of the patch. However, rotation invariance is not really necessary for this assignment.

  3. Dictionary computation. Run k-means clustering on a subset of all training features to learn the dictionary centers. You can write your own k-means code or find code on the web. Set the dictionary size to about 500, or experiment with several different sizes.

  4. Feature quantization and histogram computation. For each feature in a training or a test image, find the index of the nearest codevector in the dictionary. You may want to use this code for fast computation of squared Euclidean distances between two sets of vectors (i.e., all descriptors in an image and the codebook). Hint: if you are implementing k-means, this code should be very useful for that as well. Following quantization, represent each image by the histogram of these indices (check out MATLAB's hist function). Because different images can have different numbers of features, the histograms should be normalized to sum to one.

  5. Classifier training. The simplest options for this part of the assignment are a k-nearest-neighbor (kNN) classifier or a Naive Bayes classifier. For a kNN classifier, experiment with different distances discussed in class, such as L1 and chi2. For better performance, you should try to download and use a support vector machine (SVM) classifier. Here is one SVM package that is fairly easy to integrate with MATLAB.
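
Steps 3-5 above can be sketched as follows. This is a minimal sketch in Python/NumPy rather than MATLAB; all function and variable names are illustrative, and the choices shown (the vectorized squared-distance trick mentioned in step 4, a plain k-means loop, and a majority-vote kNN under the chi-squared distance) are just one possible set of implementation decisions, not required ones:

```python
import numpy as np

def pairwise_sq_dists(X, Y):
    """Squared Euclidean distances between rows of X (n x d) and Y (m x d),
    with no explicit loops, using ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2."""
    return (np.sum(X**2, axis=1)[:, None]
            - 2.0 * X @ Y.T
            + np.sum(Y**2, axis=1)[None, :])

def kmeans(X, k, n_iters=20, seed=0):
    """Plain k-means: learn k dictionary centers from the rows of X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each feature to its nearest center, then recompute centers.
        labels = np.argmin(pairwise_sq_dists(X, centers), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize each descriptor to its nearest codevector and return the
    histogram of indices, normalized to sum to one."""
    idx = np.argmin(pairwise_sq_dists(descriptors, centers), axis=1)
    h = np.bincount(idx, minlength=len(centers)).astype(float)
    return h / h.sum()

def chi2_dist(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2)**2 / (h1 + h2 + eps))

def knn_classify(test_hist, train_hists, train_labels, k=5):
    """Majority vote among the k nearest training histograms (chi2 distance)."""
    d = np.array([chi2_dist(test_hist, h) for h in train_hists])
    nearest = np.argsort(d)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

The same squared-distance helper serves double duty, as hinted in step 4: it drives both the k-means assignment step and the feature quantization.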


For full credit, you should implement a working, fully documented system by making a single implementation choice for each of the above components, and obtain results that are (significantly) above chance. The performance of your system should be measured in terms of the classification rate, or the percentage of all test images correctly classified by your system. Please make sure that the best classification rate achieved by your algorithm is prominently featured in the report.
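
For reference, assuming your system already produces a predicted label for every test image, the classification rate (together with a per-class confusion matrix, which is often informative in the report) could be computed along these lines. This is a sketch in Python/NumPy rather than MATLAB, and the function name is hypothetical:

```python
import numpy as np

CLASSES = ["airplanes", "motorbikes", "faces", "cars"]

def evaluate(true_labels, predicted_labels, classes=CLASSES):
    """Return the overall classification rate and a confusion matrix
    (rows = true class, columns = predicted class)."""
    idx = {c: i for i, c in enumerate(classes)}
    confusion = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        confusion[idx[t], idx[p]] += 1
    rate = np.trace(confusion) / confusion.sum()  # fraction on the diagonal
    return rate, confusion
```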

For bonus points, you should compare performance of different implementation choices for one or more system components, and/or investigate the effect of important system parameters (dictionary size, number of training images used, k in kNN, etc.). Wherever relevant, feel free to discuss computation time in addition to classification rate.

The grading will be based primarily on your report. I do not intend to run your code, though you must include it, and I may look at parts of it. The report should thoroughly document everything you implemented and all important experimental findings (recognition rates for different versions of features, descriptors, classifiers, etc.). If you download code from the Web, state exactly where you downloaded it and how you used it. DO NOT download somebody else's complete recognition system; only individual pieces that help with some aspects of the assignment are allowed.


In an attempt to make this assignment more fun and exciting, I am adding a competition aspect. The person who achieves the highest classification rate on the dataset will receive bonus points and a valuable prize that I will reveal on the last day of class. Apart from the competition, the classification rate of your algorithm will not be strongly considered as part of your grade, unless it reflects serious implementation mistakes.

Turning in the Assignment

As usual, please email me or post on the Web your report in PDF and your code. The firm deadline is 5PM, Tuesday, April 21. If you think you might have a problem meeting this deadline, please let me know ahead of time! Late assignments will not be accepted without prior arrangement.