Diffuser: Reinforcement Learning with Diffusion Models

Planning with Diffusion for Flexible Behavior Synthesis

*equal contribution


Planning as denoising

Diffuser is a denoising diffusion probabilistic model that plans by iteratively refining randomly sampled noise. The denoising process lends itself to flexible conditioning, either by using gradients of an objective function to bias plans toward high-reward regions or by conditioning the plan to reach a specified goal.
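The two conditioning modes can be sketched in a few lines. This is a minimal, assumption-laden toy, not Diffuser's implementation: each plan is a list of scalar states, `denoiser_mean` is a hypothetical stand-in for the learned denoiser (a simple smoothing prior), and `reward_grad` is the gradient of a hypothetical objective J(τ) = -Σ_t (x_t - target)². Reward guidance shifts each step's mean along ∇J in the style of classifier-guided sampling, and goal conditioning is done by inpainting, overwriting the known start and goal states after every step.

```python
import random

def denoiser_mean(plan):
    # Hypothetical stand-in for the learned mean: pulls each state toward the
    # average of itself and its neighbours (a simple smoothness prior).
    last = len(plan) - 1
    return [
        (plan[max(t - 1, 0)] + plan[t] + plan[min(t + 1, last)]) / 3.0
        for t in range(len(plan))
    ]

def reward_grad(plan, target):
    # Gradient of the toy objective J(tau) = -sum_t (x_t - target)^2.
    return [-2.0 * (x - target) for x in plan]

def sample_plan(horizon, start, goal=None, target=None,
                n_steps=50, guide_scale=0.1, sigma=0.05, seed=0):
    rng = random.Random(seed)
    # Start from pure noise: one scalar state per timestep.
    plan = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for i in reversed(range(n_steps)):
        mu = denoiser_mean(plan)
        if target is not None:
            # Reward guidance: nudge the mean along grad J
            # (guide_scale plays the role of the step size times the covariance).
            g = reward_grad(mu, target)
            mu = [m + guide_scale * gi for m, gi in zip(mu, g)]
        # Sample the next, less noisy plan; no noise on the final step.
        noise = sigma if i > 0 else 0.0
        plan = [m + noise * rng.gauss(0.0, 1.0) for m in mu]
        # Goal conditioning by inpainting: clamp known states every step.
        plan[0] = start
        if goal is not None:
            plan[-1] = goal
    return plan

reached = sample_plan(8, start=0.0, goal=1.0)   # endpoints fixed by inpainting
guided = sample_plan(8, start=0.0, target=2.0)  # states pulled toward 2.0
```

Clamping inside the loop, rather than once at the end, lets the free states be denoised consistently with the fixed ones, so the plan bends smoothly toward the goal instead of jumping to it.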






Variable-length planning

Diffuser's planning horizon is determined by the size of the random noise used to initialize the denoising process: a noise trajectory with more timesteps yields a longer plan.
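A minimal sketch of why the horizon can be a property of the noise rather than of the model, assuming a denoiser built from temporal convolutions whose weights are shared across timesteps. `temporal_conv`, `KERNEL`, and `plan_with_horizon` are hypothetical names, and the repeated smoothing only stands in for a learned denoising step; it is not Diffuser's network.

```python
import random

KERNEL = (0.25, 0.5, 0.25)  # hypothetical temporal filter weights

def temporal_conv(plan, kernel=KERNEL):
    # The same weights are applied at every timestep (edge-padded),
    # so the layer accepts a plan of any length and preserves that length.
    padded = [plan[0]] + list(plan) + [plan[-1]]
    return [
        sum(w * padded[t + k] for k, w in enumerate(kernel))
        for t in range(len(plan))
    ]

def plan_with_horizon(horizon, n_steps=20, seed=0):
    rng = random.Random(seed)
    # The horizon is chosen here, by the length of the initial noise.
    plan = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for _ in range(n_steps):
        plan = temporal_conv(plan)  # toy denoising step; never changes len(plan)
    return plan

for h in (4, 16, 64):
    print(h, len(plan_with_horizon(h)))
```

Because no layer depends on a fixed sequence length, the same weights produce a 4-step or a 64-step plan; only the shape of the sampled noise changes.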



Flexible behavior synthesis

Diffuser acts as an unconditional prior over possible behaviors. We can plan for new test-time tasks by guiding its sampled plans with reward functions or constraints. All of the plans below are executed by a single model.


Unconditional stacking: Maximize the height of a block tower, with no further constraints.


 

Conditional stacking: Stack towers subject to test-time constraints.
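A minimal sketch of reusing one task-agnostic prior for several test-time tasks, assuming each task contributes only a gradient and that gradients from multiple objectives are simply summed before each denoising step. `prior_mean`, `maximize_height`, and `ceiling` are hypothetical scalar toys, not the actual stacking rewards or constraints; the constraint here is a soft penalty rather than a hard rule.

```python
import random

def prior_mean(plan):
    # Hypothetical stand-in for the shared, task-agnostic denoiser:
    # it shrinks every state toward 0, playing the role of the learned mean.
    return [0.9 * x for x in plan]

def maximize_height(plan):
    # Gradient of J(tau) = sum_t x_t: pushes every state upward.
    return [1.0 for _ in plan]

def ceiling(limit, weight=5.0):
    # Gradient of a soft penalty -weight * sum_t max(0, x_t - limit)^2.
    def grad(plan):
        return [-2.0 * weight * max(0.0, x - limit) for x in plan]
    return grad

def guided_sample(horizon, objectives, n_steps=100, scale=0.1,
                  sigma=0.05, seed=0):
    rng = random.Random(seed)
    plan = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for i in reversed(range(n_steps)):
        mu = prior_mean(plan)
        # Compose tasks by summing the gradients of all active objectives.
        grads = [objective(mu) for objective in objectives]
        mu = [m + scale * sum(g[t] for g in grads) for t, m in enumerate(mu)]
        noise = sigma if i > 0 else 0.0  # no noise on the final step
        plan = [m + noise * rng.gauss(0.0, 1.0) for m in mu]
    return plan

# Same prior, two tasks: only the list of objectives changes.
free = guided_sample(10, [maximize_height])
capped = guided_sample(10, [maximize_height, ceiling(0.5)])
print(max(free), max(capped))
```

With these toy numbers the unconstrained plan settles around 1.0, where the prior's pull balances the height objective, while the capped plan stays at or below about 0.6. Adding a task means adding a gradient function; the prior itself is never retrained.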






