Planning Project

Learning Behavior for Autonomous Agents using Multi-Armed Bandit Algorithms

Project Proposal

Learning to Plan with Desperation Planning

Learning RVO

UPDATE

Project change requested, so pursuing learning RVO for the final project.

Intro

All funny working names aside, while working on my presentation on machine learning and motion planning, I ran across a paper which uses an idea called Viability Filtering. Essentially, this technique removes the clear query and replaces it with a viability query, where viability is decided by a binary classifier trained via ML techniques. The paper reports some difficulties transferring learned viability between vastly different scenarios, for example viability trained on large passages being used for small passages, and vice versa. There could also be an issue of classification error, since a failure of the viability query is treated exactly the same as a failure of the clear query. Moreover, the technique is not probabilistically complete; on the author's project website, under the FAQ, he responds to this problem by proposing that a standard planning algorithm be run in parallel.
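To make the query swap concrete, here is a rough sketch of how I read the idea (my own paraphrase in C++, not the paper's code; the classifier type and feature names are placeholders):

    #include <functional>
    #include <vector>

    using Features = std::vector<double>;

    // Placeholder for the paper's viability model: a binary classifier trained offline.
    using BinaryViability = std::function<bool(const Features&)>;

    // A standard sampling-based planner keeps a sampled state or edge only if the
    // exact clear (collision) query passes. Viability Filtering swaps that test
    // for the learned classifier, so a misclassification is treated exactly like
    // a failed clear query, which is one reason the method is not
    // probabilistically complete on its own.
    bool keepSample(const BinaryViability& viable, const Features& stateFeatures) {
        return viable(stateFeatures);
    }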

Upon further thought I realized that while the original topic is still interesting to pursue, it might be more interesting to look into an ML-based planner inspired by Viability Filtering.

The first step was actually coming up with an ML-based planning algorithm. This algorithm is fundamentally different from Viability Filtering in that it uses the idea that we can have real-valued ML classifiers. Here is a sketch of the algorithm.

As in Viability Filtering, we replace the clear query with a viability query, which in the following we will refer to as VQ. VQ is a real-valued function with range (0,1) which, given a feature vector of the current state of the environment, gives the probability that expanding that state is feasible, where 1 is the highest probability and 0 the lowest. The algorithm decides whether to keep a sampled point or throw it away based on desperation, a value in (0,1) that decreases with time. Essentially, the planner will start off expanding only paths that are highly probable; as time progresses the planner becomes more desperate and expands less likely paths. Interestingly, this still uses some of the same techniques as the multi-armed bandit problem with respect to adjusting the desperation parameter, only with a different concept of reward.

One thing to note is that once the desperation parameter reaches 0, the algorithm is simply using whatever probabilistic method it is sitting on top of.
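Here is a minimal sketch of what I have in mind for the acceptance rule. Everything in it is illustrative: the exponential decay schedule is just one plausible choice, and the viability query is assumed to come from some trained real-valued classifier (for example an SVM with a probability-style output).

    #include <functional>
    #include <vector>

    using Features = std::vector<double>;

    // VQ: a trained real-valued classifier returning a feasibility probability in (0,1).
    using ViabilityQuery = std::function<double(const Features&)>;

    struct DesperationFilter {
        ViabilityQuery vq;           // assumed to come from a trained SVM model
        double desperation = 0.95;   // start near 1: expand only very probable states
        double decay       = 0.999;  // one possible (assumed) decay schedule

        // Called in place of the clear query when the underlying planner (SBL
        // in my case) decides whether to keep a sampled state for expansion.
        bool accept(const Features& f) {
            bool keep = vq(f) >= desperation;  // keep states rated above the threshold
            desperation *= decay;              // planner grows more desperate over time
            return keep;
        }
        // Once desperation has decayed to (effectively) 0, accept() keeps
        // everything, and the planner behaves like the base probabilistic method.
    };

The surrounding planner would call accept() on the feature vector of each candidate sample before inserting it into its trees; everything else about the base planner stays unchanged.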

Itemized progress report for Oct 31:

Successes

Tasks which I have completed so far:
  1. Algorithm design - I also have some ideas for ways to extend it once the general framework is in place.
  2. Experimented with various ML libraries
  3. Decided on using SVMs for my ML algorithm
  4. Decided on svm_light as the SVM library

Under development

What I am currently working on:
  1. Creating a prototype implementation of the algorithm using the SBL planner of MPK as a base
  2. Gathering data to use for training
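For the training data, the tentative plan is to log sampled states together with the outcome of the exact clear query and write them out in SVMlight's sparse input format, so svm_learn can read them directly. The collection details in this sketch are assumptions on my part; only the file format itself comes from the SVMlight documentation.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Hypothetical training example: label is +1 if the sampled state turned out
    // to be feasible (its exact clear query succeeded), -1 otherwise; the feature
    // vector describes the state and its local environment.
    struct Example {
        int label;                        // +1 feasible, -1 infeasible
        std::vector<double> features;
    };

    // Write examples in SVMlight's sparse input format:
    //   <target> <index>:<value> <index>:<value> ...   (indices start at 1)
    void writeSvmLightFile(const char* path, const std::vector<Example>& data) {
        FILE* f = std::fopen(path, "w");
        if (!f) return;
        for (const Example& ex : data) {
            std::fprintf(f, "%d", ex.label);
            for (std::size_t i = 0; i < ex.features.size(); ++i)
                std::fprintf(f, " %zu:%g", i + 1, ex.features[i]);
            std::fprintf(f, "\n");
        }
        std::fclose(f);
    }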

Goals

What I want to get done by the last day of Thanksgiving vacation:
  1. Finish the implementation on top of SBL, train models, and gather results
  2. Request the RRT-Blossom with Viability Filtering implementation for comparison

Final project goal

What I hope to achieve with my project:
  1. Achieve positive results for domain transfer

Happy Halloween