Diffuser: Reinforcement Learning with Diffusion Models

Planning with Diffusion for Flexible Behavior Synthesis

Michael Janner^*, Yilun Du^*, Joshua Tenenbaum, and Sergey Levine

ICML 2022 (long talk)

*equal contribution

Planning as denoising

Diffuser is a denoising diffusion probabilistic model that plans by iteratively refining randomly sampled noise. The denoising process lends itself to flexible conditioning, by either using gradients of an objective function to bias plans toward high-reward regions or conditioning the plan to reach a specified goal.
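The loop below is a minimal sketch of this idea, not the released Diffuser code. It assumes a hypothetical stand-in denoiser (`toy_denoiser_mean`, a simple temporal smoother rather than a learned network), a made-up linear noise schedule, and a simplified guidance step that adds a scaled objective gradient to the predicted mean. Start and goal conditioning are illustrated by overwriting the first and last states after every step.

```python
import numpy as np

def toy_denoiser_mean(tau, step):
    # Hypothetical stand-in for a learned denoising network: it pulls each
    # interior state toward the average of itself and its two neighbours.
    # A real model would also condition on `step`; this toy ignores it.
    smoothed = tau.copy()
    smoothed[1:-1] = (tau[:-2] + tau[1:-1] + tau[2:]) / 3.0
    return smoothed

def guided_plan(horizon, dim, start, objective_grad, goal=None,
                n_steps=50, guide_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # The plan starts as pure Gaussian noise of shape (horizon, dim).
    tau = rng.standard_normal((horizon, dim))
    for step in reversed(range(n_steps)):
        sigma = 0.5 * step / n_steps  # assumed toy schedule; 0 at the last step
        mean = toy_denoiser_mean(tau, step)
        # Gradient guidance: nudge the mean toward higher objective values.
        mean = mean + guide_scale * objective_grad(mean)
        tau = mean + sigma * rng.standard_normal(tau.shape)
        # Conditioning: pin the known start state (and goal, if given).
        tau[0] = start
        if goal is not None:
            tau[-1] = goal
    return tau

# Objective J(tau) = sum of all entries, so its gradient is all ones.
plan = guided_plan(horizon=16, dim=2, start=np.zeros(2),
                   objective_grad=lambda tau: np.ones_like(tau))
```

Passing `goal=np.ones(2)` to the same function pins the final state instead of, or in addition to, using the gradient term.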


Variable-length planning

Diffuser's planning horizon is determined by the size of the random noise used to initialize the denoising process.
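As a rough illustration, the sketch below assumes a denoiser whose parameters do not depend on sequence length, here a fixed temporal convolution kernel (`KERNEL`, a hypothetical stand-in for learned weights). Under that assumption the same sampling function can be run on noise of any length, and the length of the noise sets the length of the plan.

```python
import numpy as np

# Hypothetical fixed weights standing in for a learned temporal convolution.
KERNEL = np.array([0.25, 0.5, 0.25])

def denoise_step(tau):
    # Apply the same kernel along the time axis of every state dimension.
    # mode="same" keeps the sequence length unchanged.
    columns = [np.convolve(tau[:, d], KERNEL, mode="same")
               for d in range(tau.shape[1])]
    return np.stack(columns, axis=1)

def sample_plan(horizon, dim=2, n_steps=20, seed=0):
    rng = np.random.default_rng(seed)
    # The horizon enters only through the shape of the initial noise.
    tau = rng.standard_normal((horizon, dim))
    for _ in range(n_steps):
        tau = denoise_step(tau)
    return tau

short_plan = sample_plan(horizon=8)    # shape (8, 2)
long_plan = sample_plan(horizon=64)    # shape (64, 2)
```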

Flexible behavior synthesis

Diffuser acts as an unconditional prior over possible behaviors. We can plan for new test-time tasks by guiding its sampled plans with reward functions or constraints. All of the plans below are executed by a single model.
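To make the "single model, many tasks" point concrete, here is a toy sketch in which the prior is frozen and only the guide function changes between tasks. Everything in it is an illustrative assumption: `prior_mean` is a hypothetical stand-in for the trained model, and the two reward gradients are invented examples rather than the objectives used in the experiments.

```python
import numpy as np

def prior_mean(tau):
    # Hypothetical frozen prior shared by every task: shrink toward zero.
    return 0.9 * tau

def sample(guide_grad=None, horizon=12, dim=1, n_steps=40,
           scale=0.05, seed=0):
    rng = np.random.default_rng(seed)
    tau = rng.standard_normal((horizon, dim))
    for step in reversed(range(n_steps)):
        mean = prior_mean(tau)
        if guide_grad is not None:
            # Only this term differs from task to task.
            mean = mean + scale * guide_grad(mean)
        noise_scale = 0.1 * step / n_steps  # assumed toy schedule
        tau = mean + noise_scale * rng.standard_normal(tau.shape)
    return tau

def raise_final_grad(tau):
    # Gradient of the reward r(tau) = sum of the final state's entries.
    grad = np.zeros_like(tau)
    grad[-1] = 1.0
    return grad

def lower_final_grad(tau):
    # Gradient of the opposite reward, r(tau) = -sum of the final state.
    return -raise_final_grad(tau)

unguided = sample()                          # samples from the prior alone
raised = sample(guide_grad=raise_final_grad)
lowered = sample(guide_grad=lower_final_grad)
```

Because all three calls share the same seed and prior, any difference between the resulting plans comes from the guide alone.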

Unconditional stacking: Maximize the height of a block tower, with no further constraints.