COMP 590-096 Fall 2011
Assignment 3: Naive Bayes Classification
Due date: Thursday, November 17, 11:59 PM
The goal of this assignment is to implement a Naive Bayes classifier as described
in this lecture and to apply it to
the task of classifying handwritten digits.
(Adapted from Berkeley CS 188 Project 5.)
Detailed Instructions
Data: This file is a zip archive containing
training and test digits, together with their ground truth labels (see readme.txt in the zip
archive for an explanation of the data format). There are 5000 training exemplars (roughly 500
per class) and 1000 test exemplars (roughly 100 per class).
Features: The basic feature set consists of a single binary indicator feature
for each pixel. Specifically, the feature F_{ij} indicates the status of the (i,j)th
pixel. Its value is 1 if the pixel is foreground, and 0 if it is background.
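To make the feature extraction concrete, here is a minimal loading sketch in Python/NumPy. It assumes the usual Berkeley ASCII encoding described in readme.txt, where each image is a 28x28 block of text lines and any non-space character ('+' or '#') marks a foreground pixel; the function names and file layout are illustrative, not prescribed.

    import numpy as np

    def load_images(path, rows=28, cols=28):
        # Each image occupies `rows` consecutive text lines; any non-space
        # character marks a foreground pixel (feature value 1).
        with open(path) as f:
            lines = [line.rstrip('\n') for line in f]
        n = len(lines) // rows
        feats = np.zeros((n, rows, cols), dtype=np.uint8)
        for k in range(n):
            for i in range(rows):
                row = lines[k * rows + i].ljust(cols)
                feats[k, i] = [0 if ch == ' ' else 1 for ch in row[:cols]]
        return feats

    def load_labels(path):
        # One integer label (0-9) per line.
        with open(path) as f:
            return np.array([int(line) for line in f if line.strip()])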
Training: The goal of the training stage is to estimate the likelihoods
P(F_{ij} | class) for every pixel location (i,j) and for every
digit class from 0 to 9. The likelihood estimate is defined as
P(F_{ij} = f | class) = (# of times pixel (i,j) has value f in training examples from this class) /
(Total # of training examples from this class).
In addition, as discussed in the lecture, you have to smooth the likelihoods to
ensure that there are no zero counts. Laplace smoothing is a very simple method
that increases the observation count of every value f by some constant k. This corresponds to adding
k to the numerator above, and k*V to the denominator (where V is the number of possible
values the feature can take on). The higher the value of k, the stronger the smoothing. Experiment with different integer
values of k (say, from 1 to 50) and find the one that gives the highest classification accuracy.
You should also estimate the priors P(class) by the empirical
frequencies of different classes in the training set.
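For concreteness, here is a minimal training sketch implementing the smoothed likelihoods and empirical priors above (the array layout and function name are assumptions; k is the Laplace constant, and V = 2 for binary features):

    import numpy as np

    def train(feats, labels, k=1, n_classes=10, n_values=2):
        # feats: (N, 28, 28) binary array; labels: (N,) integers 0-9.
        # Returns likelihood[c, i, j, f] = P(F_ij = f | class c) and priors P(class).
        n, rows, cols = feats.shape
        likelihood = np.zeros((n_classes, rows, cols, n_values))
        priors = np.zeros(n_classes)
        for c in range(n_classes):
            members = feats[labels == c]     # training examples of class c
            count1 = members.sum(axis=0)     # # of times pixel (i,j) has value 1
            total = len(members)
            # Laplace smoothing: add k to each count, k*V to the denominator.
            likelihood[c, :, :, 1] = (count1 + k) / (total + k * n_values)
            likelihood[c, :, :, 0] = (total - count1 + k) / (total + k * n_values)
            priors[c] = total / n            # empirical class frequency
        return likelihood, priors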
Testing: You will perform maximum a posteriori (MAP) classification of test digits
according to the learned Naive Bayes model. Suppose a test image has feature values f_{1,1},
f_{1,2}, ... , f_{28,28}. According to this model, the posterior probability (up to scale)
of each class given the digit is given by
P(class) ⋅ P(f_{1,1} | class) ⋅ P(f_{1,2} | class) ⋅ ... ⋅
P(f_{28,28} | class).
Note that in order to avoid underflow, you need to work with the log of the above quantity:
log P(class) + log P(f_{1,1} | class) + log P(f_{1,2} | class) + ... +
log P(f_{28,28} | class).
After you compute the above decision function values for all ten classes for every test image, you will use
them for MAP classification. You should also try maximum likelihood (ML) classification omitting
the prior term from the above, and report whether this makes a lot of difference in the results.
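A corresponding decision-stage sketch, continuing the illustrative array layout from the training sketch above (use_prior=False gives the ML variant):

    import numpy as np

    def classify(feats, likelihood, priors, use_prior=True):
        # MAP classification: argmax over classes of
        # log P(class) + sum_ij log P(f_ij | class).
        preds = np.zeros(len(feats), dtype=int)
        for t, img in enumerate(feats):
            scores = np.zeros(len(priors))
            for c in range(len(priors)):
                # Pick P(F_ij = f | c) for the observed value f at each pixel.
                pix = np.where(img == 1, likelihood[c, :, :, 1], likelihood[c, :, :, 0])
                scores[c] = np.log(pix).sum()
                if use_prior:
                    scores[c] += np.log(priors[c])   # drop this term for ML
            preds[t] = np.argmax(scores)
        return preds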
Evaluation: Use the true class labels of the test images from the testlabels
file to check the correctness of the estimated label for each test digit. Report your performance in terms
of the classification rate for each digit (percentage of all test images of a given digit correctly
classified). Also report your confusion matrix. This is a 10x10 matrix whose entry in row r and column c
is the percentage of test images from class r that are classified as class c. In addition, for each digit class,
show the test example with the highest posterior probability (i.e., the most "prototypical" instance
of that digit), as well as some "interesting" examples of incorrect classifications made by your system.
Important: The ground truth labels of test images should be used only
to evaluate classification accuracy. They should not be used in any way during the decision process.
Tip: You should be able to achieve at least 70% accuracy on the test set.
One "warning sign" that you have a bug in your implementation is if some digit gets 100% or 0% classification
accuracy (that is, your system either labels all the test images as the same class, or never wants to label
any test images as some particular class).
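One possible way to tabulate these numbers (a sketch only; the confusion matrix is row-normalized to percentages, matching the definition above):

    import numpy as np

    def evaluate(preds, truth, n_classes=10):
        # confusion[r, c] = % of test images from class r classified as class c;
        # the diagonal entries are the per-digit classification rates.
        confusion = np.zeros((n_classes, n_classes))
        for r in range(n_classes):
            mask = (truth == r)
            for c in range(n_classes):
                confusion[r, c] = 100.0 * np.sum(preds[mask] == c) / np.sum(mask)
        for r in range(n_classes):
            print('digit %d: %.1f%% correct' % (r, confusion[r, r]))
        return confusion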
Odds ratios: When using classifiers in real domains, it is important to be able to inspect
what they have learned. One way to inspect a naive Bayes model is to look at the most likely features for a given label.
Another tool for understanding the parameters is to look at odds ratios. For each pixel feature
F_{ij} and pair of classes c_{1}, c_{2}, the odds ratio is defined as
odds(F_{ij}=1, c_{1}, c_{2}) =
P(F_{ij}=1 | c_{1}) / P(F_{ij}=1 | c_{2}).
This ratio will be greater than one for features which cause belief in c_{1} to increase over
the belief in c_{2}. The features that have the greatest impact on classification are those
with both a high probability (because they appear often in the data) and a high odds ratio
(because they strongly bias one label versus another).
Take four pairs of digits that have the highest confusion rates according to your confusion matrix,
and for each pair, display the maps of feature likelihoods for both classes as well as the odds ratio
for the two classes. For example, the figure below shows the log likelihood maps for 1 (left) and 8 (center),
and the log odds ratio for 1 over 8 (right):
[Figure: log likelihood maps for digits 1 and 8, and the log odds ratio map for 1 over 8.]
If you cannot do a graphical display like the one above, you can display the maps in ASCII format
using some coding scheme of your choice. For example, for the odds ratio map, you can use '+' to
denote features with positive log odds, ' ' for features with log odds close to zero, and '-' for
features with negative log odds.
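A minimal sketch of the odds ratio computation and one possible ASCII rendering (the likelihood array layout follows the earlier sketches, and the 0.5 display threshold is an arbitrary choice, not part of the assignment):

    import numpy as np

    def log_odds_map(likelihood, c1, c2):
        # log [ P(F_ij = 1 | c1) / P(F_ij = 1 | c2) ] at every pixel (i,j).
        return np.log(likelihood[c1, :, :, 1]) - np.log(likelihood[c2, :, :, 1])

    def ascii_map(m, eps=0.5):
        # '+' for strongly positive log odds, '-' for strongly negative,
        # ' ' for values near zero; eps controls the cutoff.
        for row in m:
            print(''.join('+' if v > eps else ('-' if v < -eps else ' ') for v in row))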
For Extra Credit
Experiment with more sophisticated features to improve the accuracy of the Naive Bayes model.
In particular, consider non-binary features or features that combine the values of several pixels.
You can also try to implement a bag of features model (see
this lecture from my
computer vision class for details).
Apply the Naive Bayes model to the task of classifying faces vs. non-faces. See the
Berkeley CS 188 Project 5 page
for the data and details of the task.
Submission Instructions
As usual, you will submit through Blackboard a report in PDF format
named lastname_firstname_assignment3.pdf and your source code
in ZIP format, named lastname_firstname_assignment3.zip.
Late policy: You lose 25% of the points for every day the assignment is late.
If you have a compelling reason for not being able to submit the assignment on time
and would like to make a special arrangement, you must let me know at least a week before the due date
(any genuine emergency situations will be handled on an individual basis).
Academic integrity: Feel free to discuss the assignment with each other in general terms,
and to search the Web for general guidance (not for complete solutions). Coding should be done
individually. If you make substantial use of some code snippets or information from outside sources,
be sure to acknowledge the sources in your report.
