

Reinforcement Learning

Greg Grudic, Lyle Ungar, Vijay Kumar


Traditional instantiations of RL algorithms appear incompatible with robotics: successful RL implementations are typically characterized by a) small, discrete state spaces, b) hundreds of thousands of learning runs, and c) exploration via stochastic search. In contrast, robot control is characterized by a) large, noisy, continuous state spaces, b) only a limited number of learning runs, and c) random actions that can have dangerous or expensive outcomes. We are developing new RL algorithms specifically intended for the large continuous state spaces typically found in robotics. Our first results are 1) Boundary Localized Reinforcement Learning (BLRL) and 2) the Action Transition Policy Gradient (ATPG) algorithm. BLRL develops a policy gradient framework for mode switching in high-dimensional state spaces and shows that, through the use of deterministic modes, search remains computationally tractable even when the state space is very high dimensional. BLRL also shows that locally optimal mode switching policies can be found by restricting search to states near mode boundaries.
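The sketch below illustrates the boundary-localized idea in the simplest terms; it is not the BLRL algorithm itself. The linear per-mode score functions, the softmax surrogate gradient for the deterministic argmax choice, and the boundary_eps threshold are all illustrative assumptions: the point is only that parameter updates are applied solely at states near a mode boundary, where a small parameter change can actually alter which mode is selected.

```python
# Minimal sketch of boundary-localized updates for a deterministic
# mode-switching controller (illustrative assumptions, not BLRL itself).
import numpy as np

class ModeSwitchingController:
    def __init__(self, n_modes, state_dim, boundary_eps=0.05, lr=1e-2):
        # One linear score function per mode; the active mode is the argmax.
        self.W = np.random.randn(n_modes, state_dim) * 0.1
        self.boundary_eps = boundary_eps  # "near boundary" threshold (assumed)
        self.lr = lr

    def mode(self, state):
        # Deterministic mode selection: highest-scoring mode wins.
        return int(np.argmax(self.W @ state))

    def near_boundary(self, state):
        # A state is near a mode boundary when the top two scores are close.
        scores = np.sort(self.W @ state)
        return (scores[-1] - scores[-2]) < self.boundary_eps

    def update(self, state, mode_taken, advantage):
        # Boundary-localized update: skip states far from every boundary,
        # since small parameter changes cannot change the mode chosen there.
        if not self.near_boundary(state):
            return
        # Softmax surrogate gradient of log pi(mode | state) w.r.t. W,
        # standing in for the non-differentiable argmax selection.
        scores = self.W @ state
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        grad = -np.outer(probs, state)
        grad[mode_taken] += state
        self.W += self.lr * advantage * grad
```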

ATPG is a policy gradient algorithm that is theoretically guaranteed to find locally optimal policies and can be applied to both deterministic mode switching controllers and stochastic controllers. By restricting policy gradient (PG) estimates to times when relative estimates of the value of executing actions are available (which in continuous state spaces corresponds to when the agent changes actions), ATPG converges orders of magnitude faster than traditional PG algorithms such as REINFORCE, as well as newer algorithms that use function approximation to improve convergence.
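A hedged sketch of the action-transition idea follows; it is not the authors' formulation of ATPG. The trajectory layout, the use of per-step returns as the weighting, and the user-supplied score_fn (the gradient of the log-policy) are all assumptions. The only feature it is meant to convey is that score-function terms are accumulated only at time steps where the executed action changes, rather than at every step as in REINFORCE.

```python
# Illustrative sketch: accumulate policy-gradient terms only at action
# transitions (assumed interface; not the published ATPG algorithm).
import numpy as np

def transition_policy_gradient(states, actions, returns, score_fn):
    """Estimate a policy gradient from one trajectory.

    states, actions, returns : per-step trajectory data (lists or arrays)
    score_fn(state, action)  : gradient of log pi(action | state) with
                               respect to the policy parameters
                               (hypothetical, user-supplied)
    """
    grad = None
    for t in range(1, len(actions)):
        # Only time steps where the agent switches actions contribute;
        # in continuous state spaces these are the points where relative
        # value information between actions is available.
        if actions[t] == actions[t - 1]:
            continue
        term = returns[t] * np.asarray(score_fn(states[t], actions[t]))
        grad = term if grad is None else grad + term
    # Returns None if the trajectory contains no action transitions.
    return grad
```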


