Model-free diffusion planners have shown great promise for robot motion planning, but practical robotic systems often require combining them with model-based optimization modules to enforce constraints, such as safety. Naively integrating these modules presents compatibility challenges when diffusion's multi-modal outputs behave adversarially to optimization-based modules. To address this, we introduce Joint Model-based Model-free Diffusion (JM2D), a novel generative modeling framework. JM2D formulates module integration as a joint sampling problem to maximize compatibility via an interaction potential, without additional training. Using importance sampling, JM2D guides module outputs based only on evaluations of the interaction potential, thus handling non-differentiable objectives commonly arising from non-convex optimization modules. We evaluate JM2D via application to aligning diffusion planners with safety modules on offline RL and robot manipulation. JM2D significantly improves task performance compared to conventional safety filters without sacrificing safety. Further, we show that conditional generation is a special case of JM2D and elucidate key design choices by comparing with SOTA gradient-based and projection-based diffusion planners.
Problem. Model-based planners are safe but less capable; model-free diffusion planners are capable but unsafe. How can we mutually align a model-free diffusion planner and a model-based optimization module?
Key idea. Model the planning variables (x) and the optimization variables (k) with a joint distribution whose interaction potential scores their compatibility. Then sample (x, k) jointly with diffusion, with no extra training. We estimate the joint denoising score via Monte Carlo using only zeroth-order evaluations of the potential, enabling performance–safety reasoning that is gradient-free and training-free.
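To make the zeroth-order estimate concrete, here is a minimal sketch of a generic importance-sampling denoiser, not the paper's implementation. It assumes a variance-preserving noise schedule with coefficient `alpha_bar`, stacks the joint variable as z = (x, k), and, for simplicity, treats the target density as proportional to the potential alone (the learned prior pθ(x) is omitted). The function names and the toy potential are hypothetical.

```python
import numpy as np

def log_potential(z, dim_x):
    """Hypothetical, non-differentiable log interaction potential on z = (x, k)."""
    x, k = z[:, :dim_x], z[:, dim_x:]
    closeness = -np.abs(x - k).sum(axis=1)              # prefer k close to x
    penalty = -5.0 * (np.linalg.norm(k, axis=1) > 1.0)  # step penalty: no gradient
    return closeness + penalty

def mc_denoise(z_t, alpha_bar, log_phi, n_samples=256, rng=None):
    """Estimate E[z_0 | z_t] and the score using only evaluations of log_phi."""
    rng = np.random.default_rng() if rng is None else rng
    # As a function of z_0, the forward kernel N(z_t; sqrt(a) z_0, (1 - a) I)
    # is Gaussian with mean z_t / sqrt(a) and variance (1 - a) / a.
    mean = z_t / np.sqrt(alpha_bar)
    std = np.sqrt((1.0 - alpha_bar) / alpha_bar)
    candidates = mean + std * rng.standard_normal((n_samples, z_t.shape[0]))
    # Self-normalized importance weights from zeroth-order evaluations.
    log_w = log_phi(candidates)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    z0_hat = w @ candidates
    # Tweedie's formula converts the posterior mean into a score estimate.
    score = -(z_t - np.sqrt(alpha_bar) * z0_hat) / (1.0 - alpha_bar)
    return z0_hat, score
```

A call such as `mc_denoise(z_t, 0.5, lambda z: log_potential(z, dim_x=2))` returns a denoised estimate and a score for the stacked (x, k) vector. Because the potential is only evaluated and never differentiated, step functions or the output of a black-box solver can be used in its place.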
Overview. We provide the following toy problem as an illustrative example.
Setting. A model-free planner pθ(x) samples start–goal pairs (●, ●) in a donut-shaped region (gray). A model-based optimization module finds a waypoint (★) so that the path (●-★-●) is the longest collision-free route that avoids inference-time obstacles (transparent red circles).
Result. Sequential sampling often produces infeasible pairs (i.e., the path intersects obstacles). In contrast, JM2D jointly searches over (x, k), achieving the highest rate of compatible samples.
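The gap between sequential and joint sampling can be illustrated with a much simpler stand-in than the donut example. The sketch below is a hypothetical 1-D toy, not the experiment above, and it replaces diffusion with plain candidate search: x is drawn from a standard normal "planner", k must lie in [1, 2], and a pair is compatible when |x − k| < 0.5. All names and thresholds are assumptions chosen for illustration.

```python
import numpy as np

def compatible(x, k):
    """Hypothetical non-differentiable compatibility check for a pair (x, k)."""
    return (np.abs(x - k) < 0.5) & (k >= 1.0) & (k <= 2.0)

def sequential(n, rng):
    """Planner commits to x first; the module then picks the closest feasible k."""
    x = rng.standard_normal(n)
    k = np.clip(x, 1.0, 2.0)
    return x, k

def joint(n, rng, n_candidates=64):
    """Score candidate pairs together and keep the first compatible one."""
    x = rng.standard_normal((n, n_candidates))
    k = rng.uniform(1.0, 2.0, (n, n_candidates))
    ok = compatible(x, k)
    idx = np.argmax(ok, axis=1)  # index of first compatible pair (0 if none)
    rows = np.arange(n)
    return x[rows, idx], k[rows, idx]

rng = np.random.default_rng(0)
xs, ks = sequential(2000, rng)
xj, kj = joint(2000, rng)
print("sequential compatible rate:", compatible(xs, ks).mean())
print("joint compatible rate:", compatible(xj, kj).mean())
```

The joint search spends extra candidate evaluations, so this is not a like-for-like comparison. It only shows the failure mode: once x is fixed to a value with no compatible k, no downstream optimization can recover, whereas scoring pairs avoids committing to such an x.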
Setting. The robot must pick and place a mug while avoiding novel obstacles that appear only at inference time. Diffusion Policy (DP) is trained without these obstacles; we compare JM2D against DP and a conventional safety filter.
Result. DP collides with the new obstacles. The safety filter avoids collisions but requires frequent interventions, increasing task completion time. JM2D requires 3x fewer safety interventions and completes the task 2x faster, all in real time.
Legend: red = safety violation; green circle = intervened; blue circle = not intervened.
Setting. The robot must reach its goal (red) while avoiding walls, even when they are padded at inference time. Diffusion Policy (DP) is trained only on the unpadded environment; we compare JM2D against DP and a safety filter under various paddings. The video turns red when safety is violated. A green circle means intervened; a blue circle means not intervened.
Result. JM2D balances task completion and safety in a unified way. DP alone fails when unseen obstacles appear, while the conventional safety filter, RAIL, improves safety but reduces performance. By jointly sampling plans and safety corrections, JM2D consistently achieves higher success rates, fewer interventions, and faster task completion.
Overview. No backup policy? You can still use JM2D. As shown in the paper, JM2D subsumes conditional generation and model-based diffusion, so you can deploy only the conditional-generation component when a backup policy is unavailable. We benchmark this setting against prior diffusion-based motion planners with safety constraints.
Setting. The robot must reach the green line without contacting red obstacles. At inference, additional blue obstacles are introduced as safety constraints. We evaluate four scenarios: the three from the DPCC benchmark and a new Cluttered case. Baselines include a gradient-based planner (MPD) and projection-based diffusion planners (DPCC, SafeDiffuser).
Result. JM2D yields higher success with fewer constraint violations. It produces higher-fidelity, dynamically feasible plans while respecting safety constraints, outperforming all baselines.
The most appealing property of JM2D is that it is training-free and gradient-free. This lets us combine many different modules without retraining them or differentiating through them.
@inproceedings{
jung2025joint,
title={Joint Model-based Model-free Diffusion for Planning with Constraints},
author={Wonsuhk Jung and Utkarsh Aashu Mishra and Nadun Ranawaka Arachchige and Yongxin Chen and Danfei Xu and Shreyas Kousik},
booktitle={9th Annual Conference on Robot Learning},
year={2025},
url={https://openreview.net/forum?id=E9t1ekt6W9}
}