Project change requested, so I'm pursuing RVO for the final project.
All funny working names aside, while working on my presentation on machine learning and motion planning, I ran across a paper that uses an idea called Viability Filtering. Essentially, this technique removes the clear query and replaces it with a viability query, where viability is decided by a binary classifier trained via ML techniques. The paper reports some difficulty transferring learned viability between vastly different scenarios: for example, viability trained on large passages being used for small passages, and vice versa. There is also the issue of classification error, since a failure of the viability query is treated exactly the same as a failure of the clear query. Moreover, the technique is not probabilistically complete; on the project website, under the FAQ, the author responds to this problem by proposing that a standard planning algorithm be run in parallel.
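The substitution at the heart of the technique can be sketched as follows. The classifier here is a hypothetical placeholder, not the paper's actual model, which would be trained offline on planning outcomes:

```python
def clear_query(state):
    """Conventional exact clear/collision query (stubbed for illustration)."""
    return True

def viability_query(state_features):
    """Learned binary classifier: True if expanding toward this state looks
    viable. Placeholder decision rule standing in for a trained model."""
    return sum(state_features) > 0.5

def should_expand(state_features):
    # Viability Filtering uses the learned answer in place of clear_query;
    # a viability "no" is treated exactly like a collision, which is why
    # classification error can discard genuinely feasible expansions.
    return viability_query(state_features)
```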
Upon further thought I realized that while the original topic is still interesting to pursue, it might be more interesting to look into an ML-based planner inspired by Viability Filtering.
The first step was actually coming up with an ML-based planning algorithm. This algorithm is fundamentally different from Viability Filtering in that it uses the idea that we can have real-valued ML classifiers. Here is a sketch of the algorithm.
As in Viability Filtering, we replace the clear query with a viability query, which in the following we will refer to as VQ. VQ is a real-valued function with range (0,1) which, given a feature vector of the current state of the environment, tells you the probability of feasibility, where 1 is the highest probability and 0 the lowest. The algorithm decides whether to keep a point or throw it away based on desperation, a value in (0,1) that decreases with respect to time. Essentially, the planner will start off expanding only paths that are highly probable; as time progresses it becomes more desperate, expanding less likely paths. Interestingly, this still uses some of the same techniques as the multi-armed bandit problem with respect to changing the desperation parameter, only with a different concept of reward.
One thing to note is that once the desperation parameter reaches 0, the algorithm is simply using whatever probabilistic method it is sitting on top of.
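The idea above can be sketched as a filter wrapped around a generic sampling loop. Everything here is an assumption for illustration: the VQ model, the feature extraction, the exponential decay schedule for desperation, and the RRT-style expansion hooks are all hypothetical stand-ins, not a fixed design:

```python
import math

def vq(features):
    """Real-valued viability query with range (0, 1): estimated probability
    that expanding toward this state is feasible. Placeholder model."""
    return 1.0 / (1.0 + math.exp(-sum(features)))

def desperation(t, half_life=50.0):
    """Desperation parameter, decreasing toward 0 over time; exponential
    decay is just one possible schedule."""
    return math.exp(-t / half_life)

def plan(sample_state, extract_features, extend, is_goal, max_iters=1000):
    """Generic sampling-based loop with viability-filtered expansion."""
    t = 0
    path = []
    while t < max_iters:
        candidate = sample_state()
        d = desperation(t)
        t += 1
        # Keep the candidate only if its viability beats the current
        # desperation threshold. Early on (d near 1) only highly probable
        # expansions pass; as d approaches 0 every sample passes and the
        # loop reduces to the underlying probabilistic planner.
        if vq(extract_features(candidate)) < d:
            continue
        node = extend(candidate)
        if node is not None:
            path.append(node)
            if is_goal(node):
                return path
    return None
```

One design note: because the filter only rejects samples, and rejection becomes impossible as desperation decays to 0, the wrapped planner eventually behaves exactly like its base method, which is what recovers the completeness guarantee discussed above.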