headshots of Drs. Tracey and Citrin

Brendan Tracey (left) and Jonathan Citrin (right)

Advancing tokamak scenario optimization and control through reinforcement learning and simulation

Brendan Tracey and Jonathan Citrin

Google DeepMind

Thursday, May 23, 2024


NW17-218 (hybrid)

Abstract: Artificial intelligence presents many opportunities for advancing our understanding of plasma physics and for making the best use of tokamak reactors. Here, we present our work on deriving tokamak magnetic controllers through reinforcement learning, and then discuss our development of scalable, differentiable solvers to unlock broader capabilities in scenario design.
Deep reinforcement learning (RL) is a powerful technique for finding control policies through global search. We recently demonstrated the use of deep RL to create controllers for tokamak magnetic confinement, achieving a variety of plasma configurations on the Tokamak à Configuration Variable (TCV) [1], with subsequent improvements to training speed and performance [2]. This reward-function approach to controller design focuses on ‘what’ to control rather than ‘how’, and enables rapid development of controllers for novel scenarios, including the first sustained plasma “droplet”. We present a “zero-shot” architecture, in which control agents are trained entirely in simulation before being tested on the real plant. This approach relies heavily on fast and realistic simulation of the tokamak control environment, here modeled by EPFL’s MEQ suite.

Further generalization to new optimization and control targets (e.g. fusion power), as well as constraint calculations, demands an enrichment of the simulation environment. Therefore, we have developed TORAX [3], an open-source differentiable tokamak core transport simulator written in Python using JAX. TORAX solves coupled time-dependent 1D PDEs for core ion and electron heat transport, particle transport, and current diffusion. It maintains Python's ease of use and extensibility, while JAX's just-in-time compilation provides fast computation. Auto-differentiability enables gradient-based optimization techniques and trajectory sensitivity analysis for controller design, without time-consuming manual Jacobian calculations. ML-surrogate coupling, key for fast and accurate differentiable simulation, is greatly facilitated by JAX's inherent support for neural network development and inference. TORAX broadens the capabilities of simulation environments for RL controller learning, and can be applied to general-purpose tokamak pulse planning and optimization workflows.
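To illustrate the kind of workflow the paragraph above describes (this is a hypothetical toy sketch, not the TORAX API), one can write a 1D diffusion step in JAX, roll it forward in time, and use `jax.grad` to get the sensitivity of a scalar outcome to a transport coefficient, with `jax.jit` compiling the whole trajectory:

```python
import jax
import jax.numpy as jnp

# Toy illustration (not the TORAX API): one explicit finite-difference
# step of 1D heat diffusion, dT/dt = chi * d2T/dx2, fixed boundaries.
def diffusion_step(T, chi, dx=0.1, dt=0.001):
    lap = (jnp.roll(T, -1) - 2.0 * T + jnp.roll(T, 1)) / dx**2
    return T.at[1:-1].set(T[1:-1] + dt * chi * lap[1:-1])

def final_core_temp(chi, n_steps=100):
    # Roll the PDE forward and read out a scalar figure of merit.
    T = jnp.exp(-jnp.linspace(-3.0, 3.0, 64) ** 2)  # initial Gaussian profile
    for _ in range(n_steps):
        T = diffusion_step(T, chi)
    return T[32]  # value at the "core" grid point

# JIT compilation for speed; autodiff gives the sensitivity of the
# outcome to the transport coefficient -- no hand-derived Jacobian.
grad_fn = jax.jit(jax.grad(final_core_temp))
sensitivity = grad_fn(1.0)  # negative: more diffusion flattens the peak
```

The same pattern, applied to the coupled transport equations and with neural-network surrogates in the loop, is what makes gradient-based scenario optimization tractable.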

[1] J. Degrave, F. Felici, J. Buchli et al. "Magnetic control of tokamak plasmas through deep reinforcement learning". Nature 602, 414–419 (2022).

[2] B.D. Tracey et al. "Towards practical reinforcement learning for tokamak magnetic control". Fusion Engineering and Design 200, 114161 (2024).

[3] https://github.com/google-deepmind/torax


Brendan bio:

Brendan is a Staff Research Engineer at Google DeepMind, where he is a lead of the fusion effort. Brendan completed his PhD in Aeronautics and Astronautics at Stanford, focusing on machine learning and turbulence modeling. He then completed a joint post-doc at MIT and the Santa Fe Institute. From there, he joined DeepMind in 2018, working on reinforcement learning applied to real-world systems, including recent efforts to advance tokamak control and simulation.

Jonathan bio:

Jonathan has been a research scientist at Google DeepMind since 2023. He completed his PhD on tokamak advanced scenario simulation in 2012 at the Eindhoven University of Technology, and spent 3 years at CEA Cadarache as a post-doc working on high-fidelity and reduced gyrokinetic models. From 2016 to 2023 he led a research group at DIFFER, focusing on integrated modeling, gyrokinetics, reduced turbulence modeling, and ML-surrogate development.