Hierarchical Programmatic Reinforcement Learning
via Learning to Compose Programs
We reformulate solving a reinforcement learning task as synthesizing
a task-solving program that can be executed to interact with the environment
and maximize the return.
and maximize the return. We first learn a program embedding space that
continuously parameterizes a diverse set of programs sampled from a program dataset.
Then, we train a meta-policy, whose action space is the learned program embedding space,
to sequentially produce programs (i.e., predict a series of actions) that
compose into a task-solving program.
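The two-stage pipeline above can be sketched as follows. This is a minimal, illustrative sketch, not the paper's implementation: `decode_program`, `MetaPolicy`, and the toy primitive set are all hypothetical stand-ins for the learned decoder, the trained meta-policy, and the program DSL.

```python
import random

def decode_program(z):
    """Stand-in decoder: maps a latent vector to a small DSL program (a string).
    In the actual method this would be the learned program decoder."""
    primitives = ["move()", "turnLeft()", "pickMarker()", "putMarker()"]
    # Deterministically pick a primitive from the latent code, for illustration only.
    idx = int(sum(z)) % len(primitives)
    return primitives[idx]

class MetaPolicy:
    """Toy meta-policy whose 'action' is a point in the program embedding space."""
    def __init__(self, dim=4, horizon=3, seed=0):
        self.dim = dim
        self.horizon = horizon
        self.rng = random.Random(seed)

    def act(self, state):
        # A trained policy would condition on the environment state; here we
        # sample a latent vector as a placeholder action.
        return [self.rng.random() for _ in range(self.dim)]

def compose_task_program(policy, initial_state=None):
    """Roll out the meta-policy for `horizon` steps and concatenate the
    decoded sub-programs into one composed task-solving program."""
    subprograms = []
    state = initial_state
    for _ in range(policy.horizon):
        z = policy.act(state)          # the action IS a program embedding
        subprograms.append(decode_program(z))
    return "; ".join(subprograms)

program = compose_task_program(MetaPolicy())
print(program)
```

Each meta-policy step selects an embedding, the decoder turns it into an executable sub-program, and the concatenation of the decoded sub-programs forms the final program returned to the environment.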