Post by Tim Walter

Doctoral Candidate at the Cyber-Physical Systems group at TUM

🤖 Safe Reinforcement Learning, now with analytic gradients!

Control policies derived from reinforcement learning have achieved superhuman performance in games like Go and enabled robots to dance and box. But in safety-critical real-world applications, we need hard safety guarantees without giving up that remarkable performance.

Our latest paper, "Leveraging Analytic Gradients in Provably Safe Reinforcement Learning" (with Hannah Markgraf, Jonathan Külz, and Matthias Althoff), takes a step toward providing such guarantees while harnessing the power of analytic-gradient RL.

Analytic gradients, made possible through differentiable simulators, let us compute exact gradients instead of relying on stochastic estimates. This means we can train faster, use fewer samples, and leverage our deep knowledge of physics to inform policies, rather than learning from scratch and ignoring decades of research.

However, integrating safety constraints into this gradient-based framework is non-trivial. We propose the first safeguarding mechanisms tailored for analytic-gradient RL: safe, differentiable mappings that keep policies within verified safe sets without sacrificing performance or sample efficiency.

The result: agents that remain provably safe during training and achieve state-of-the-art performance.

📄 Open-access paper: https://lnkd.in/euvmiGqu
💻 Project website: https://lnkd.in/ebNvmm_u