Joint Model-based Model-free Diffusion for Planning with Constraints

Wonsuhk Jung*¹, Utkarsh Mishra*¹, Nadun Ranawaka¹,

Yongxin Chen†¹, Danfei Xu†¹, Shreyas Kousik†¹

¹Georgia Institute of Technology

* Equal authorship, † Equal advising

The link is in preparation

How do we bridge the gap between what robots can achieve and what they can safely execute?

We propose Joint Model-based Model-free Diffusion (JM2D): a unified diffusion framework that combines the expressiveness of model-free planning with the safety guarantees of model-based optimization, enabling robots to achieve both performance and safety.

Abstract

Model-free diffusion planners have shown great promise for robot motion planning, but practical robotic systems often require combining them with model-based optimization modules to enforce constraints, such as safety. Naively integrating these modules presents compatibility challenges when diffusion's multi-modal outputs behave adversarially to optimization-based modules. To address this, we introduce Joint Model-based Model-free Diffusion (JM2D), a novel generative modeling framework. JM2D formulates module integration as a joint sampling problem to maximize compatibility via an interaction potential, without additional training. Using importance sampling, JM2D guides module outputs based only on evaluations of the interaction potential, thus handling non-differentiable objectives commonly arising from non-convex optimization modules. We evaluate JM2D by applying it to align diffusion planners with safety modules on offline RL and robot manipulation. JM2D significantly improves task performance compared to conventional safety filters without sacrificing safety. Further, we show that conditional generation is a special case of JM2D and elucidate key design choices by comparing with state-of-the-art gradient-based and projection-based diffusion planners.

Video

Method: How does JM2D work?

A unified diffusion framework for planning and optimization

Problem. Model-based planners are safe but less capable; model-free diffusion planners are capable but unsafe. How can we mutually align a model-free diffusion planner and a model-based optimization module?

Key idea. Treat planning (x) and optimization variables (k) as a joint distribution with an interaction potential that scores their compatibility. Then sample (x, k) jointly with diffusion—no extra training. We estimate the joint denoising score via Monte Carlo using only zeroth-order evaluations, enabling performance–safety reasoning that is gradient-free and training-free.
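The zeroth-order Monte Carlo estimate described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a variance-exploding noise model (noisy sample = clean sample + sigma * noise), uses a hypothetical `interaction_potential` that simply rewards agreement between x and k, and omits the learned planner prior p_θ(x) for brevity. Candidates are drawn around the noisy joint sample and weighted only by potential evaluations, so no gradients of the potential are needed.

```python
import numpy as np

def interaction_potential(x, k):
    """Hypothetical compatibility score: larger when x and k agree."""
    return np.exp(-0.5 * np.sum((x - k) ** 2, axis=-1) / 0.1)

def mc_denoise_estimate(z_noisy, sigma, potential, n_samples=512, rng=None):
    """Importance-weighted estimate of the clean joint sample z = (x, k).

    Assumes z_noisy = z_clean + sigma * noise. Candidates are proposed from a
    Gaussian centered at z_noisy and weighted by potential evaluations only.
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = z_noisy.shape[0]
    d = dim // 2  # first half is x, second half is k (an assumption of this sketch)
    candidates = z_noisy + sigma * rng.standard_normal((n_samples, dim))
    weights = potential(candidates[:, :d], candidates[:, d:])
    weights = weights / (weights.sum() + 1e-12)  # self-normalized importance weights
    z_hat = weights @ candidates                 # estimated posterior mean of z_clean
    score = (z_hat - z_noisy) / sigma**2         # Tweedie-style score estimate
    return z_hat, score

# Usage: x = 1 and k = -1 disagree, so the estimate pulls them toward each other.
z_hat, score = mc_denoise_estimate(
    np.array([1.0, -1.0]), sigma=1.0, potential=interaction_potential,
    n_samples=4096, rng=np.random.default_rng(0),
)
```

Because the weights depend only on function values, the potential could equally be the outcome of a non-differentiable optimization routine, which is the setting the key idea targets.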

🍩 Toy Example: Donut

Overview. We provide the following toy problem as an illustrative example.

JM2D (Ours)
Sequential Sampling

Setting. A model-free planner p_θ(x) samples start–goal pairs (●, ●) in a donut-shaped region (gray). A model-based optimization module finds a waypoint (★) so that the path (●-★-●) is the longest collision-free route that avoids inference-time obstacles (transparent red circles).

Result. Sequential sampling often produces infeasible pairs (i.e., the path intersects with obstacles). In contrast, JM2D jointly searches over (x, k), achieving the highest rate of compatible samples.
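To make the notion of an infeasible pair concrete, the following sketch checks whether a two-segment path start → waypoint → goal intersects any circular obstacle. The function names and the obstacle representation (center, radius) are assumptions made for illustration; the toy problem's actual implementation is not given on this page.

```python
import numpy as np

def segment_hits_circle(p, q, center, radius):
    """True if the segment p-q comes within `radius` of `center`."""
    p, q, c = (np.asarray(v, dtype=float) for v in (p, q, center))
    d = q - p
    denom = d @ d
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if denom == 0.0 else np.clip(((c - p) @ d) / denom, 0.0, 1.0)
    closest = p + t * d
    return bool(np.linalg.norm(c - closest) <= radius)

def path_is_feasible(start, waypoint, goal, obstacles):
    """Check both segments of start -> waypoint -> goal against every obstacle."""
    return not any(
        segment_hits_circle(a, b, center, radius)
        for a, b in ((start, waypoint), (waypoint, goal))
        for center, radius in obstacles
    )

# Usage: one obstacle at the origin between start and goal.
obstacles = [((0.0, 0.0), 0.5)]
through = path_is_feasible((-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), obstacles)
around = path_is_feasible((-1.0, 0.0), (0.0, 2.0), (1.0, 0.0), obstacles)
```

A check of this kind is a zeroth-order signal: it can score a sampled (x, k) pair without being differentiable.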

Results: What can JM2D do?

(1) JM2D increases system performance under strict safety guarantee

Safety filter as JM2D. Classic safety filters pair a learning policy with a verified backup: if the learned action is unsafe, the backup overrides it, guaranteeing safety but often degrading performance when objectives conflict. JM2D inserts joint sampling between the two, reducing policy–backup conflict while preserving safety guarantees. We evaluate on (1) safe manipulation with novel obstacles and (2) safe navigation in novel mazes.
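The classic safety-filter logic described above can be sketched in a few lines. This is only an illustration of the override rule, using a hypothetical 1-D system in which the state must stay below a wall; the `safety_filter` and `run_episode` names, the wall position, and the zero-motion backup are all assumptions, and the sketch does not include JM2D's joint sampling step.

```python
def safety_filter(action, is_safe, backup_action):
    """Return (executed_action, intervened): the backup overrides unsafe actions."""
    if is_safe(action):
        return action, False
    return backup_action, True

def run_episode(policy_actions, state=0.0, wall=1.0):
    """Hypothetical 1-D rollout: each action is a displacement toward the wall."""
    interventions = 0
    for action in policy_actions:
        # An action is safe if the next state stays strictly below the wall.
        is_safe = lambda u, s=state: s + u < wall
        executed, intervened = safety_filter(action, is_safe, backup_action=0.0)
        state += executed
        interventions += int(intervened)
    return state, interventions

# Usage: the third action would cross the wall, so the backup (stay put) overrides it.
final_state, n_interventions = run_episode([0.4, 0.4, 0.4, 0.1])
```

Each intervention here costs a step of progress, which mirrors the performance degradation that arises when the learned policy and the backup conflict.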

Setting. The robot must pick and place a mug while avoiding novel obstacles that appear only at inference time. Diffusion Policy (DP) is trained without these obstacles; we compare JM2D against DP and a conventional safety filter.

Result. DP collides with the new obstacles. The safety filter avoids collisions but requires frequent interventions, increasing task completion time. JM2D incurs 3x fewer safety interventions and completes the task 2x faster, all in real time.

Safe Navigation in Novel Mazes

JM2D (Ours)
Diffusion Policy (DP)
DP + Safety Filter

Red = safety violation; green circle = intervened; blue circle = not intervened

Setting. The robot must reach its goal (red) while avoiding walls, even when they are padded at inference time. Diffusion Policy (DP) is trained only on the unpadded environment; we compare JM2D against DP and a safety filter under various paddings. The video turns red when safety is violated. A green circle means the safety filter intervened; a blue circle means it did not.

Result. JM2D balances task completion and safety in a unified way. DP alone fails when unseen obstacles appear, while the conventional safety filter, RAIL, improves safety but reduces performance. By jointly sampling plans and safety corrections, JM2D consistently achieves higher success rates, fewer interventions, and faster task completion.

(2) JM2D generalizes Conditional Generation and Model-based Diffusion

Overview. No backup policy? You can still use JM2D. As shown in the paper, JM2D subsumes conditional generation and model-based diffusion, so you can deploy only the conditional-generation component when a backup policy is unavailable. We benchmark this setting against prior diffusion-based motion planners with safety constraints.

Setting. The robot must reach the green line without contacting red obstacles. At inference, additional blue obstacles are introduced as safety constraints. We evaluate four scenarios: the three from the DPCC benchmark and a new Cluttered case. Baselines include a gradient-based planner (MPD) and projection-based diffusion planners (DPCC, SafeDiffuser).

Result. JM2D yields higher success with fewer constraint violations. It produces higher-fidelity, dynamically feasible plans while respecting safety constraints, outperforming all baselines.

(3) JM2D is training-free and gradient-free

The most appealing property of JM2D is that it is training-free and gradient-free. Because it needs neither additional training nor gradients of the interaction potential, modules can be swapped and combined freely.

Contact

For further discussion or questions, please contact Wonsuhk Jung.

BibTeX

@inproceedings{
    jung2025joint,
    title={Joint Model-based Model-free Diffusion for Planning with Constraints},
    author={Wonsuhk Jung and Utkarsh Aashu Mishra and Nadun Ranawaka Arachchige and Yongxin Chen and Danfei Xu and Shreyas Kousik},
    booktitle={9th Annual Conference on Robot Learning},
    year={2025},
    url={https://openreview.net/forum?id=E9t1ekt6W9}
}