MPA: Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving

1CMU, 2Stanford, 3NVIDIA

MPA enables E2E driving agents to adapt to challenging scenarios in closed-loop simulation.

Abstract

End-to-end (E2E) autonomous driving models have demonstrated strong performance in open-loop evaluations but often suffer from cascading errors and poor generalization in closed-loop settings. To address this gap, we propose Model-based Policy Adaptation (MPA), a general framework that enhances the robustness and safety of pretrained E2E driving agents during deployment.

MPA first generates diverse counterfactual trajectories using a geometry-consistent simulation engine, exposing the agent to scenarios beyond the original dataset. Based on this generated data, MPA trains a diffusion-based policy adapter to refine the base policy's predictions and a multi-step Q value model to evaluate long-term outcomes. At inference time, the adapter proposes multiple trajectory candidates, and the Q value model selects the one with the highest expected utility. Experiments on the nuScenes benchmark using a photorealistic closed-loop simulator demonstrate that MPA significantly improves performance across in-domain, out-of-domain, and safety-critical scenarios. We further investigate how the scale of counterfactual data and inference-time guidance strategies affect overall effectiveness.

Motivation: Performance in Closed-Loop Evaluation

The figure on the right compares the L2 error of E2E policies under different data sources. We use HUGSIM, a 3DGS-based simulator, to reconstruct scenes from the nuScenes dataset. When evaluated on ground-truth data, all three E2E policies achieve very similar open-loop motion-prediction performance, which confirms the reconstruction fidelity of the 3DGS-based simulation.
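For reference, the open-loop L2 metric averages the Euclidean distance between predicted and ground-truth ego waypoints over the planning horizon. A minimal sketch, assuming (T, 2) arrays of (x, y) waypoints (the function name and shapes are illustrative, not the benchmark's exact implementation):

import numpy as np

def average_l2_error(pred_traj, gt_traj):
    # pred_traj, gt_traj: (T, 2) arrays of (x, y) ego waypoints in meters.
    # Returns the mean Euclidean distance over the planning horizon.
    pred_traj = np.asarray(pred_traj, dtype=float)
    gt_traj = np.asarray(gt_traj, dtype=float)
    return float(np.linalg.norm(pred_traj - gt_traj, axis=-1).mean())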

We also find that E2E driving agents pretrained in open loop, such as UniAD, degrade in closed-loop evaluation due to the distribution shift caused by compounding errors.
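The mechanism is visible in the structure of a closed-loop rollout: each executed action determines the next observation, so per-step prediction errors feed back into the policy's input and accumulate. A minimal sketch, where the policy and simulator interfaces are hypothetical stand-ins rather than HUGSIM's actual API:

def closed_loop_rollout(policy, simulator, horizon):
    # Unlike open-loop evaluation, each observation depends on the
    # agent's own previous actions, so errors compound over time.
    obs = simulator.reset()
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)          # small errors here...
        obs = simulator.step(action)  # ...shift the next observation,
        trajectory.append(obs)        # pushing inputs off-distribution
    return trajectory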

We further visualize qualitative examples of these failure modes below.

Closed-Loop Error Analysis

Driving off the road into a non-drivable area near a construction zone.

Crashing into a stopped vehicle ahead with no collision avoidance.

Core Contribution

Methodology Overview

MPA includes three key modules: (i) Counterfactual Data Generation to diversify the ego vehicle's behavior, (ii) a Diffusion-based Policy Adapter to refine the base policy's actions, and (iii) a Multi-step Q Value Model to evaluate and select the best action sequence at inference time (sketched below). For more details, please refer to our paper.

MPA Methodology Overview
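To make the inference-time procedure concrete, here is a minimal propose-and-select sketch. The adapter.sample and q_model.score interfaces, the candidate count, and the tensor shapes are illustrative assumptions, not the released implementation:

import torch

def mpa_inference(base_policy, adapter, q_model, obs, num_candidates=8):
    # The frozen base E2E policy proposes an initial plan: (T, 2) waypoints.
    base_plan = base_policy(obs)
    # The diffusion adapter refines it into diverse candidates: (N, T, 2).
    candidates = adapter.sample(obs, base_plan, n=num_candidates)
    # The multi-step Q value model estimates each candidate's
    # long-term utility: (N,).
    scores = q_model.score(obs, candidates)
    # Execute the candidate with the highest expected utility.
    return candidates[torch.argmax(scores)]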

Key Experiment Results

MPA-enhanced end-to-end driving agents outperform their base policies across in-domain, out-of-domain, and safety-critical scenarios in closed-loop simulation.


Qualitative results of the closed-loop evaluation

Related Links

There is a lot of excellent work that provides useful resources or shares similar insights with our MPA framework. HUGSIM introduces a decomposed 3DGS scene representation to render photorealistic driving scenes, which paves the way for us to generate counterfactual driving scenarios. DiffusionDrive introduces a truncated diffusion policy framework to enhance the robustness of E2E driving agents. RAP introduces rasterization-augmented planning, which augments the ego policy with diverse behaviors from surrounding vehicles to bridge the sim-to-real gap in E2E planning.

Our experiments are conducted using open-source E2E models, including UniAD, VAD, LTF, and LAW.

BibTeX

@inproceedings{lin2025modelbased,
  title     = {Model-Based Policy Adaptation for Closed-Loop End-to-end Autonomous Driving},
  author    = {Haohong Lin and Yunzhi Zhang and Wenhao Ding and Jiajun Wu and Ding Zhao},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year      = {2025},
  url       = {https://openreview.net/forum?id=4OLbpaTKJe}
}