ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation

Fully controllable dynamic driving scenes, with RL-based post-training for photorealistic generation of out-of-distribution (OOD) safety-critical scenarios.

arXiv Paper | Code (Coming Soon)
▲ ReinDriveGen enables photorealistic generation of OOD driving scenarios
We present ReinDriveGen, a framework that enables full controllability over dynamic driving scenes, allowing users to freely edit actor trajectories to simulate safety-critical corner cases such as front-vehicle collisions, drifting cars, vehicles spinning out of control, pedestrians jaywalking, and cyclists cutting across lanes. Our approach constructs a dynamic 3D point cloud scene from multi-frame LiDAR data, introduces a vehicle completion module to reconstruct full 360° geometry from partial observations, and renders the edited scene into 2D condition images that guide a video diffusion model. Since such edited scenarios inevitably fall outside the training distribution, we further propose an RL-based post-training strategy with a pairwise preference model and a pairwise reward mechanism, enabling robust quality improvement under out-of-distribution conditions without ground-truth supervision.
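A pairwise preference model of the kind described above is commonly trained with a Bradley–Terry style objective. The following is a minimal sketch under that assumption; the function name and exact loss form are illustrative, not the paper's released code:

```python
import numpy as np

def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(s_w - s_l).
    Minimized when the preferred sample scores higher than the rejected one.
    Illustrative formulation; ReinDriveGen's exact objective may differ."""
    margin = score_preferred - score_rejected
    # -log sigmoid(m) == log(1 + exp(-m)), computed stably via logaddexp
    return float(np.logaddexp(0.0, -margin))
```

Because the loss depends only on the score margin between the two samples in a pair, it needs no ground-truth target video, which is what makes it usable on OOD edits.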
Pipeline Overview
Our framework combines explicit 3D point cloud editing with video diffusion synthesis, enhanced by RL post-training for OOD robustness.
Pipeline overview
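At its core, the trajectory-editing step amounts to rigidly re-posing an actor's completed point cloud at each frame of the new trajectory. A simplified numpy sketch under that assumption (the full pipeline then re-renders the edited scene into 2D condition images for the video diffusion model):

```python
import numpy as np

def edit_actor_pose(points: np.ndarray, yaw: float,
                    translation: np.ndarray) -> np.ndarray:
    """Rigidly re-pose an actor point cloud of shape (N, 3):
    rotate about the vertical (z) axis by `yaw` radians, then translate.
    A simplified stand-in for per-frame trajectory editing; not the
    paper's actual implementation."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c,  -s,  0.0],
                  [s,   c,  0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T + translation
```

Applying this per frame with a varying yaw is enough to express edits like in-place spinning or drifting, while the translation sequence encodes lane changes and collisions.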
OOD Safety-Critical Scenarios
Each demo shows the original scene on the left and our edited result on the right.

OOD Vehicle Edits

🔄 In-place Spinning (Scene 1)
🔄 In-place Spinning (Scene 2)
🔃 Vehicle Rollover (Scene 1)
🔃 Vehicle Rollover (Scene 2)
🏎️ Vehicle Drifting
➡ Vehicle Change Lane to Right
↱ Vehicle Turn Right
💥 Vehicle Head-on Collision

Actor Edits

🚶 Pedestrian Jaywalking (Scene 1)
🚶 Pedestrian Jaywalking (Scene 2)
🚶 Pedestrian Jaywalking (Scene 3)
🚴 Cyclist Cut-in

Novel-View Ego Edits

⬅ Ego Shift Left 3m
⬅ Ego Shift Left 2m
↱ Ego Turn Right
⬆ Ego Shift Up 2m
Baseline Comparisons
Side-by-side comparison with existing methods on novel ego trajectory and vehicle trajectory editing tasks.
Novel Ego Trajectory: comparison across DriveDreamer4D, DeformableGS, Street Gaussians, ReconDreamer, and Ours.
Vehicle Trajectory Editing: comparison across Street Gaussians, StreetCrafter, and Ours.
Quantitative Results
| Method | NTA-IoU ↑ | NTL-IoU ↑ | FID ↓ |
|---|---|---|---|
| PVG | 0.256 | 50.70 | 105.29 |
| S³Gaussian | 0.175 | 49.05 | 124.90 |
| DD4D w/ S³Gauss. | 0.495 | 53.42 | 66.93 |
| Recon. w/ S³Gauss. | 0.413 | 51.62 | 123.61 |
| Deform.-GS | 0.240 | 51.62 | 92.24 |
| Ours | 0.549 | 56.13 | 51.99 |
RL Post-Training Progression
As RL post-training progresses, vehicles exhibit increasingly accurate geometry, higher texture fidelity, and more plausible lighting.
RL post-training progression
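One way the pairwise reward mechanism behind this progression could work (a hedged sketch; the paper's exact reward design may differ) is to score a batch of generated clips with the preference model and convert pairwise comparisons into per-clip rewards:

```python
import numpy as np

def pairwise_rewards(scores: np.ndarray) -> np.ndarray:
    """Convert preference-model scores for N generated clips into per-clip
    rewards: reward_i = mean over j != i of sigmoid(s_i - s_j), i.e. the
    average probability that clip i beats each other clip in the batch.
    Hypothetical formulation, not the paper's released code."""
    diff = scores[:, None] - scores[None, :]      # (N, N) score margins
    win_prob = 1.0 / (1.0 + np.exp(-diff))        # P(clip i beats clip j)
    n = len(scores)
    off_diag = ~np.eye(n, dtype=bool)             # exclude self-comparisons
    return (win_prob * off_diag).sum(axis=1) / (n - 1)
```

Rewards defined this way are relative within the batch, so they remain informative even when every clip is out-of-distribution and no ground-truth frame exists to score against.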