MPA: Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving

1CMU, 2Stanford, 3NVIDIA

MPA enables E2E driving agents to adapt to challenging scenarios in closed-loop simulation.

Abstract

End-to-end (E2E) autonomous driving models have demonstrated strong performance in open-loop evaluations but often suffer from cascading errors and poor generalization in closed-loop settings. To address this gap, we propose Model-based Policy Adaptation (MPA), a general framework that enhances the robustness and safety of pretrained E2E driving agents during deployment.

MPA first generates diverse counterfactual trajectories using a geometry-consistent simulation engine, exposing the agent to scenarios beyond the original dataset. Based on this generated data, MPA trains a diffusion-based policy adapter to refine the base policy's predictions and a multi-step Q value model to evaluate long-term outcomes. At inference time, the adapter proposes multiple trajectory candidates, and the Q value model selects the one with the highest expected utility. Experiments on the nuScenes benchmark using a photorealistic closed-loop simulator demonstrate that MPA significantly improves performance across in-domain, out-of-domain, and safety-critical scenarios. We further investigate how the scale of counterfactual data and inference-time guidance strategies affect overall effectiveness.

Motivation: Performance in Closed-Loop Evaluation

The figure on the right compares the L2 error of E2E policies under different data sources. We use HUGSIM, a 3DGS-based simulator, to reconstruct scenes from the nuScenes dataset. When evaluated on ground-truth data, all three E2E policies achieve very similar open-loop motion-prediction performance, which confirms the reconstruction fidelity of the 3DGS-based simulation.
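For reference, the open-loop L2 metric averages the Euclidean distance between predicted and ground-truth ego waypoints over the planning horizon. A minimal sketch, assuming (T, 2) arrays of (x, y) waypoints (the function name and shapes are illustrative, not the benchmark's exact implementation):

import numpy as np

def average_l2_error(pred_traj, gt_traj):
    # pred_traj, gt_traj: (T, 2) arrays of (x, y) ego waypoints in meters.
    # Returns the mean Euclidean distance over the planning horizon.
    pred_traj = np.asarray(pred_traj, dtype=float)
    gt_traj = np.asarray(gt_traj, dtype=float)
    return float(np.linalg.norm(pred_traj - gt_traj, axis=-1).mean())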

We also find that E2E driving agents pretrained in open loop, such as UniAD, degrade in closed-loop evaluation due to the distribution shift caused by compounding errors.
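The mechanism is visible in the structure of a closed-loop rollout: each executed action determines the next observation, so per-step prediction errors feed back into the policy's input and accumulate. A minimal sketch, where the policy and simulator interfaces are hypothetical stand-ins rather than HUGSIM's actual API:

def closed_loop_rollout(policy, simulator, horizon):
    # Unlike open-loop evaluation, each observation depends on the
    # agent's own previous actions, so errors compound over time.
    obs = simulator.reset()
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)          # small errors here...
        obs = simulator.step(action)  # ...shift the next observation,
        trajectory.append(obs)        # pushing inputs off-distribution
    return trajectory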

We further visualize qualitative examples of these failure modes below.

Closed-Loop Error Analysis

Driving off the road into a non-drivable area near a construction zone.

Crashing into a stopped vehicle ahead with no collision avoidance.

Core Contribution

Methodology Overview

MPA includes three key modules: (i) Counterfactual Data Generation to diversify the ego vehicle's behavior, (ii) a Diffusion-based Policy Adapter to refine the base policy's actions, and (iii) a Multi-step Q Value Model to evaluate and select the best action sequence at inference time (sketched below). For more details, please refer to our paper.

MPA Methodology Overview
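To make the inference-time procedure concrete, here is a minimal propose-and-select sketch. The adapter.sample and q_model.score interfaces, the candidate count, and the tensor shapes are illustrative assumptions, not the released implementation:

import torch

def mpa_inference(base_policy, adapter, q_model, obs, num_candidates=8):
    # The frozen base E2E policy proposes an initial plan: (T, 2) waypoints.
    base_plan = base_policy(obs)
    # The diffusion adapter refines it into diverse candidates: (N, T, 2).
    candidates = adapter.sample(obs, base_plan, n=num_candidates)
    # The multi-step Q value model estimates each candidate's
    # long-term utility: (N,).
    scores = q_model.score(obs, candidates)
    # Execute the candidate with the highest expected utility.
    return candidates[torch.argmax(scores)]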

Key Experiment Results

MPA-enhanced end-to-end driving agents outperform their base policies across in-domain, out-of-domain, and safety-critical scenarios in closed-loop simulation.


Qualitative results of the closed-loop evaluation

Related Links

There is a lot of excellent work that provides useful resources or shares similar insights with our MPA framework. HUGSIM introduces a decomposed 3DGS scene representation to render photorealistic driving scenes, which paves the way for us to generate counterfactual driving scenarios. DiffusionDrive introduces a truncated diffusion policy framework to enhance the robustness of E2E driving agents. RAP introduces rasterization-augmented planning, which augments the ego policy with diverse behaviors from surrounding vehicles to bridge the sim-to-real gap in E2E planning.

Our experiments are conducted using open-source E2E models, including UniAD, VAD, LTF, and LAW.

BibTeX

@inproceedings{lin2025modelbased,
  title     = {Model-Based Policy Adaptation for Closed-Loop End-to-end Autonomous Driving},
  author    = {Haohong Lin and Yunzhi Zhang and Wenhao Ding and Jiajun Wu and Ding Zhao},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year      = {2025},
  url       = {https://openreview.net/forum?id=4OLbpaTKJe}
}