# Reward-Guided Domain Randomization (RGDR)

## Project Overview
This project introduces Reward-Guided Domain Randomization (RGDR), a scalable and principled framework designed to improve the robustness and zero-shot generalization of reinforcement-learning policies for autonomous driving. Unlike traditional domain randomization that samples parameters uniformly, RGDR adaptively prioritizes challenging environments using real-time reward feedback, enabling policies to focus on scenarios where they currently underperform.
## Key Contributions
- **Reward-Based Training Metric:** Introduced mean reward per step as a stable, algorithm-agnostic indicator of environment difficulty, applicable to both value-based and policy-gradient RL methods.
- **Adaptive Sampling Strategy:** Developed a reward-guided weighted KDE method that assigns higher sampling probability to low-reward environments, emphasizing difficult regions of the domain space.
- **Efficient Exploration–Exploitation Balance:** Designed a dynamic sampling mechanism that transitions from broad exploration to targeted exploitation without additional network components.
- **High Computational Efficiency:** Achieved linear or sublinear per-iteration complexity, significantly lower than Active Domain Randomization (ADR), while maintaining comparable or superior robustness.
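The reward-guided weighted KDE idea can be sketched in a few lines. The snippet below is illustrative only: the function names, the softmax-style weighting, the 1-D Gaussian kernel, and the toy gravity values are assumptions for exposition, not the project's actual implementation.

```python
import math
import random

def difficulty_weights(mean_rewards, temperature=1.0):
    """Convert per-environment mean reward per step into sampling weights:
    lower reward -> higher weight, so harder environments are sampled more."""
    m = max(mean_rewards)
    # Softmax over negated rewards; temperature controls how sharply
    # sampling concentrates on the hardest environments.
    exps = [math.exp((m - r) / temperature) for r in mean_rewards]
    total = sum(exps)
    return [e / total for e in exps]

def sample_weighted_kde(params, weights, bandwidth=0.1):
    """Draw a new 1-D environment parameter from a weighted Gaussian KDE
    fit to previously visited parameters: pick a kernel center in
    proportion to its weight, then perturb it."""
    center = random.choices(params, weights=weights, k=1)[0]
    return random.gauss(center, bandwidth)

# Toy usage: gravity values visited so far and their mean reward per step.
gravities = [8.0, 9.8, 12.0, 15.0]
mean_rewards = [-1.0, -2.0, -6.0, -9.0]  # harder settings earn less reward

w = difficulty_weights(mean_rewards, temperature=2.0)
next_gravity = sample_weighted_kde(gravities, w)
```

Because weights grow as mean reward falls, the sampler naturally concentrates on the low-reward (hard) region of the parameter space while the Gaussian perturbation keeps some local exploration.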
## Technical Highlights
- Weighted Kernel Density Estimation (KDE) for difficulty-aware environment sampling
- Linear-complexity sampling pipeline with constant-time perturbation and buffer updates
- Integration with PPO for large-scale autonomous-driving tasks
- Extensive evaluations in both OpenAI Gym (Pendulum-v1) and CARLA (Adaptive Cruise Control)
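Two of the highlights above, the constant-time buffer updates and the exploration-to-exploitation transition, can be sketched as follows. The class and function names, the ring-buffer design, and the linear annealing schedule are assumptions for illustration, not the project's actual code.

```python
def anneal_temperature(step, total_steps, t_start=5.0, t_end=0.5):
    """Linearly anneal the sampling temperature from broad exploration
    (high T, near-uniform sampling) toward exploitation of hard
    environments (low T, reward-weighted sampling)."""
    frac = min(step / total_steps, 1.0)
    return t_start + frac * (t_end - t_start)

class RewardBuffer:
    """Fixed-size ring buffer of (env_param, mean_reward_per_step) pairs,
    giving O(1) insertion regardless of training length."""
    def __init__(self, capacity=256):
        self.capacity = capacity
        self.data = []
        self.idx = 0

    def add(self, param, mean_reward):
        entry = (param, mean_reward)
        if len(self.data) < self.capacity:
            self.data.append(entry)
        else:
            self.data[self.idx] = entry  # O(1) overwrite of the oldest slot
        self.idx = (self.idx + 1) % self.capacity

# Toy usage: record environment evaluations as training progresses.
buf = RewardBuffer(capacity=2)
buf.add(9.8, -1.0)
buf.add(12.0, -5.0)
buf.add(8.0, -0.5)  # overwrites the oldest entry; size stays at capacity
```

Keeping the buffer bounded and the temperature schedule closed-form is one way to keep per-iteration cost linear in the buffer size, in contrast to methods that train an extra sampler network.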
## Experimental Results
- **OpenAI Gym (Pendulum-v1):** RGDR concentrates sampling on harder gravity intervals, improving robustness without excessive sampling. It outperforms uniform domain randomization (UDR) in difficult scenarios and matches ADR at a far lower computational cost.
- **CARLA (Adaptive Cruise Control):**
  - Demonstrated strong generalization across randomized road-friction and vehicle-mass settings.
  - Achieved higher rewards than UDR and ADR in challenging and out-of-distribution conditions.
  - Validated that adaptive sampling leads to more balanced and effective policy training.
## Outcome
RGDR provides a general, scalable, and computationally efficient solution for training robust autonomous-driving RL policies. The method outperforms or matches state-of-the-art baselines while reducing computational overhead, making it suitable for high-dimensional, safety-critical driving tasks such as those simulated in CARLA.
This project forms the basis of ongoing work on robustness, sim-to-real transfer, and domain adaptation in autonomous driving.
