COMP 590-096 Fall 2010

Assignment 4

Due date: Tuesday, December 14, 11:59 PM

Submission Instructions

This is the final assignment. There are four problems, each worth 20 points, and the assignment as a whole is worth 40 points. That is, you only need to complete any two problems for full credit, but if you turn in additional work, you can earn up to 40 points of extra credit. Problems 1 and 2 are programming exercises, and problems 3 and 4 are written exercises.

As usual, you should turn in a report file named lastname_firstname_assignment4.pdf or lastname_firstname_assignment4.doc, and a source file named lastname_firstname_assignment4.zip. Note that if you choose not to do any of the programming problems, you do not have to turn in a source file. Submission is via Blackboard, as before.

Late policy: There will be absolutely no extensions.

Policy on outside sources: As usual, do not submit anybody else's work as your own, and acknowledge all outside sources used in the preparation of this assignment.

Problem 1 (Programming): Naive Bayes Text Classification

In this problem, your goal is to implement a Naive Bayes text classifier as described in this lecture. The two target classes are:
  1. NIPS: Research articles published in the proceedings of Neural Information Processing Systems, a machine learning conference. There are 1500 documents in total, 100 of which will be used for training and the rest for testing.

  2. NSF: Abstracts of research projects funded by the National Science Foundation. There are 5000 documents in total, 1000 of which will be used for training and the rest for testing (there is more training data for this class because its documents tend to be shorter, and so give fewer word occurrence samples).

Details

  • Data: This file (4.2MB) contains the documents from both classes already pre-processed into a bag-of-words format, the vocabulary words, and the indices of the files to be used for training. Please see the readme.txt file in the archive for details of the data format.

  • Training: The goal of the training stage is to estimate the likelihoods P(word | class1) and P(word | class2) for all the vocabulary words. As explained in the lecture, the likelihood estimate for a given word is:

    P(W | class) = (Total # of occurrences of W in training documents from the class) / (Total # of word occurrences in the training documents from the class).

    In addition, as discussed in the lecture, you have to smooth the likelihoods to ensure there are no zero counts. Laplacian smoothing is a very simple method that increases the observation count of every word by one. This corresponds to adding 1 to the numerator above, and V to the denominator (where V is the number of words in the vocabulary).

  • Testing: You will perform maximum likelihood classification of test documents according to the learned Naive Bayes model. That is, you will not be using a prior for each document class; equivalently, you will be assuming that the prior is uniform (a 0.5 chance of observing a document from each class). Suppose a document D consists of words W1, ..., WV with counts C1, ..., CV. Then the likelihood of the document given a class is:
     P(D | class) = P(W1 | class)^C1 ⋅ ... ⋅ P(WV | class)^CV.
     As explained in the lecture slides, though, in order to avoid underflow, you need to work with the log of the above quantity. After you compute the log-likelihoods for both classes for every test document, you will classify that document as the class with the higher log-likelihood. Because you know which file the document actually came from, you know its true class and can use this information to check whether your classification is correct. Important: don't forget to exclude the training documents from the test set! (A minimal sketch of the training and classification computations appears after this list.)
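
Getting started (optional): the following is a minimal Python/NumPy sketch of the training and classification computations described above. It assumes you have already parsed the provided data into word-count vectors over the vocabulary (one total-count vector per training class, and one count vector per test document); the function and variable names are illustrative only and are not part of the provided data format.

    import numpy as np

    def train_likelihoods(class_counts, V):
        """class_counts: length-V vector of total occurrences of each vocabulary
        word in the training documents of one class.
        Returns log P(w | class) with Laplacian (add-one) smoothing:
        (count_w + 1) / (total count + V)."""
        total = class_counts.sum()
        return np.log((class_counts + 1.0) / (total + V))

    def doc_log_likelihood(doc_counts, log_lik):
        """doc_counts: length-V vector of counts C1..CV for one document.
        Returns log P(D | class) = sum_w C_w * log P(w | class)."""
        return float(doc_counts @ log_lik)

    def classify(doc_counts, log_lik_nips, log_lik_nsf):
        """Maximum-likelihood decision: pick the class whose document
        log-likelihood is larger (no class prior)."""
        if doc_log_likelihood(doc_counts, log_lik_nips) >= doc_log_likelihood(doc_counts, log_lik_nsf):
            return "NIPS"
        return "NSF"

Working entirely in log space avoids the underflow issue mentioned above, and the per-word log-likelihoods returned by train_likelihoods are also what you need for report item 1 below.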

What to put in the report

  1. For each of the two classes, display the top 50 words that get the highest likelihood. Intuitively, these are the most "characteristic" words for those classes. (To save space, separate words by spaces instead of line breaks. Or better yet, do some kind of "tag cloud" display where the size of the word corresponds to its likelihood).

  2. For each of the two classes, display the top two test documents that get the highest likelihood scores. These are the most "characteristic" documents for their classes.

  3. Report the number of misclassifications for test documents from each of the classes (that is, how many NIPS documents were misclassified as NSF and vice versa). Hint: this is a pretty easy text classification problem, so the number of mistakes should be pretty low.

  4. Display each misclassified document. Do the mistakes make sense?

  5. (Optional, potentially worth some extra credit) Any additional analysis or data exploration you think is interesting, for example, plots of distributions of document likelihoods, relationship between document class and document length, classification performance with smaller training set sizes, etc. Since this text classification problem is pretty easy, is there some way to use an even simpler document model and still do well? For example, can we ignore the word counts and simply look at whether given words are present or absent? Can we reduce the vocabulary to only the most "informative" words for each class, and how can we find such words?


Problem 2 (Programming): MDPs and Reinforcement Learning

Consider the following environment, similar to the one in Section 17.1 of the textbook:

[Figure: the grid-world environment for this problem.]

As in the textbook, the transition model is as follows: the intended outcome occurs with probability 0.8, and with probability 0.1 each, the agent moves at a right angle to the intended direction (to its left or right). If the move would make the agent walk into a wall, the agent stays in the same place. The reward for every non-terminal state is -0.04.
  1. Assuming the transition model and reward function given above are known, find the optimal policy and the utilities of all the states using value iteration or policy iteration. Display the resulting policy and utilities, and plot the utility estimates as a function of the number of iterations, as in Figure 17.5(a) (for value iteration, you should need no more than 50 iterations to reach convergence). In this question and the next one, use a discount factor of 0.99. Click here for reference utility values to help you check the correctness of your answer.

  2. Now consider the reinforcement learning scenario in which the transition model and the reward function are unknown to the agent, but the agent can observe the outcome of its actions and the reward received in any state. (In the implementation, this means that the successor states and rewards will be given to the agent by black-box functions whose parameters the agent cannot access.) Use TD Q-learning, as in Section 21.3.2, to learn an action-utility function. Experiment with different parameters for the exploration function f as defined in Section 21.3.1, and report which choice works best. For the learning rate function, start with the suggestion on p. 837 of the book, i.e., alpha(t) = 60/(59+t), and adjust the numbers to get the best behavior. (A minimal sketch of the value iteration and Q-learning updates appears after this list.)

     In the report, plot utility estimates and their RMS error as a function of the number of trials (you will probably need several thousand), as in Figure 21.7. RMS stands for root mean squared error:

     RMS error = sqrt( (1/N) * sum_s (U'(s) - U(s))^2 ),

     where U'(s) is the estimated utility of state s, U(s) is the "true" utility as determined by value iteration, and N is the number of states.

  3. (Optional; for more extra credit) Design a more complicated maze environment of your own and run the algorithms from parts 1 and 2 on it. How do the number of states and the complexity of the environment affect convergence? How complex can you make the environment and still be able to learn the right policy?

  4. (Optional; for more extra credit) Experiment with any advanced methods from Chapters 17 and 21 of the textbook. For example, instead of Q-learning, try to use an explicit state-based representation and learn the transition function of the environment.
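
As a starting point for both parts, here is a minimal Python sketch of the two update rules involved: the Bellman update used in value iteration, and the TD Q-learning update with the learning rate alpha(t) = 60/(59+t) and the simple exploration function from Section 21.3.1. Everything here is illustrative: the state and action representations, the names P, R, actions, Q, and N, and the values of R_PLUS and N_E are placeholders you would replace or tune in your own implementation.

    GAMMA = 0.99  # discount factor specified in the assignment

    def value_iteration(states, actions, P, R, eps=1e-6):
        """P[(s, a)] is a list of (probability, next_state) pairs and R[s] is the
        reward of state s. Repeats the Bellman update
        U(s) <- R(s) + gamma * max_a sum_s' P(s' | s, a) * U(s')
        until the largest change falls below eps."""
        U = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                if not actions(s):          # terminal state: utility is its reward
                    new_u = R[s]
                else:
                    new_u = R[s] + GAMMA * max(
                        sum(p * U[s2] for p, s2 in P[(s, a)]) for a in actions(s))
                delta = max(delta, abs(new_u - U[s]))
                U[s] = new_u
            if delta < eps:
                return U

    def alpha(t):
        """Learning-rate schedule suggested on p. 837: alpha(t) = 60 / (59 + t)."""
        return 60.0 / (59.0 + t)

    R_PLUS, N_E = 2.0, 5    # optimistic reward estimate and visit threshold; tune these

    def f(u, n):
        """Exploration function from Section 21.3.1: be optimistic about
        state-action pairs that have been tried fewer than N_E times."""
        return R_PLUS if n < N_E else u

    def q_update(Q, N, s, a, r, s2, actions):
        """One TD Q-learning update after observing the transition (s, a, r, s2).
        N counts visits to each (state, action) pair and drives the learning rate."""
        N[(s, a)] = N.get((s, a), 0) + 1
        q_sa = Q.get((s, a), 0.0)
        best_next = max((Q.get((s2, a2), 0.0) for a2 in actions(s2)), default=0.0)
        Q[(s, a)] = q_sa + alpha(N[(s, a)]) * (r + GAMMA * best_next - q_sa)

    def rms_error(U_est, U_true):
        """Root mean squared error between estimated and true utilities."""
        n = len(U_true)
        return (sum((U_est.get(s, 0.0) - U_true[s]) ** 2 for s in U_true) / n) ** 0.5

During learning, the action taken in state s would be the one maximizing f(Q[(s, a)], N[(s, a)]), and the utility estimate used for the RMS error plot is U'(s) = max_a Q[(s, a)].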


Problem 3 (Written): 20 Questions

Consider the problem of writing an AI program to play the 20 Questions Game. Such programs exist, for example, at 20q.net. Write a detailed proposal of how you would go about implementing such a program. Feel free to search the Web for information on how such programs may work (and be sure to reference any outside material you consulted), but do not constrain yourself to describing an existing solution. Instead, you should try to come up with your own solution based on what you have learned in this course or from other sources. Your proposal should include the following aspects:
  • The problem domain: In which domain would you want to play the game, and why? What considerations are important to you in selecting the target domain?

  • The model: How would you formulate the problem? What would be the underlying representation (model structure) used in your program, and what parameters does it have?

  • Model instantiation or learning: How would you instantiate the parameters of the model or the knowledge base that it needs to operate on? If you plan to acquire these parameters from training data, what is a feasible data source (give specific data collection scenarios or URLs of possible sources)? How much training data do you think would be required to learn a good model, and can your proposed source supply this amount?

  • Testing or inference: Assuming you have your knowledge base and/or model parameters, what strategy would you use to ask the user questions and how would you compute the answer? How would you deal with noise in your knowledge base or unreliable user responses?

  • Continuous improvement: How can you use user input to improve performance? Can you use it to grow the knowledge base, to adjust the model parameters, or to learn a better question-asking policy?
If you are not certain what the best solution is for some of the above questions, you can list and discuss several alternatives with their pros and cons. But for the most part, you should try to commit to a single overall solution and flesh it out as much as possible.

If you don't like the 20 Questions game and want to write a proposal for a different AI system, please let me know ASAP and we can discuss the possibility individually.

The target length of the writeup should be at least three pages, single-spaced, 1-inch margins, 11-point font. The more specific and detailed you are, the more points you will receive.

Problem 4 (Written): Recent Advances in AI

Choose at least three items from the class discussion board on AI in the news/on the Web. These items should all share the same theme (e.g., autonomous driving or game playing). If there aren't three items for some topic you are interested in, feel free to post more until there are enough! Having chosen the items, write a summary or synthesis report comparing and contrasting them. In your writeup, try to touch on as many of the following aspects as possible:
  • Problem definition: give an agent-based formulation of the overall problem addressed in your chosen articles or web links. It doesn't have to be a complete "PEAS-style" specification, but be sure to touch on the most relevant and interesting characteristics of the problem, the environment, and the performance measure. What are the key technical challenges in solving this problem?

  • Technical solution: give an outline of the solution(s) proposed in the articles or web links. Be as specific as possible about the technical components of that solution, including algorithms, hardware, data, etc. If you feel that insufficient detail is given in the items themselves, try to gather additional information from related literature or make an educated guess based on your general knowledge and/or what you've learned from the class. What are the most interesting/revolutionary aspects of the technical approach? What are the similarities/differences between the different items you chose?

  • Connection to class material: How do the solution components relate to topics or themes covered in this class? Be as specific and detailed as possible.

  • Past context: What accounts for the recent progress made on this problem? Possible factors may include (but are not limited to) advances in knowledge, changes in scientific paradigm, increases in computing power, and increasing availability of training data. List all that are relevant and elaborate on them.

  • Future trends: What future developments do you anticipate for this problem/technology? Is it basically "solved," or if not, what challenges still need to be overcome? How quickly will progress continue to be made on this problem?
The target length of the writeup should be at least three pages, single-spaced, 1-inch margins, 11-point font. The more specific and detailed you are, the more points you will receive.