Environment Description
The environment used for this example is the ITER hybrid ramp-up scenario, provided as an example configuration file in TORAX. The scenario consists of a 100-second power ramp-up phase followed by a 50-second nominal phase. The ramp-up phase takes place in so-called L-mode (low-confinement regime), while the nominal phase occurs in H-mode (high-confinement regime). These are two distinct plasma confinement regimes with different physical properties.
The environment is named IterHybridEnv.
Actions
The environment features three actions:
IpAction: total current (\(I_p\)),
NbiAction: neutral beam injection (NBI),
EcrhAction: electron cyclotron resonance heating (ECRH).
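Its action space \(\mathcal{A}\) is bounded: the total current is capped at 15 MA, the NBI power at 33 MW, and the ECRH power at 20 MW. In addition, a ramp-rate limit of 0.2 MA/s is imposed on the total current.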
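To make the combined effect of a bound and a ramp-rate limit concrete, here is a minimal sketch in plain Python. It is not the gymtorax implementation: the function name `limit_ip`, the lower bound of zero, and the symmetric treatment of ramp-up and ramp-down are assumptions made for illustration. The default values (15 MA and 0.2 MA/s) are taken from the environment definition listed further down.

```python
def limit_ip(previous_ip, requested_ip, dt, ip_max=15e6, ramp_rate=0.2e6):
    """Clip a requested plasma current (in A) to a ramp-rate limit and a bound.

    Hypothetical helper for illustration only; gymtorax may enforce
    these limits differently.
    """
    # Largest change allowed over one control step of length dt (in s)
    max_step = ramp_rate * dt
    # Assumption: the ramp-rate limit applies in both directions
    lower = previous_ip - max_step
    upper = previous_ip + max_step
    limited = min(max(requested_ip, lower), upper)
    # Assumption: the current is bounded below by zero
    return min(max(limited, 0.0), ip_max)


# A 2 MA jump requested over a 1 s step is reduced to the allowed 0.2 MA
step_limited = limit_ip(10e6, 12e6, dt=1.0)
# Near the upper bound, the absolute cap takes over from the ramp-rate limit
bound_limited = limit_ip(14.9e6, 16e6, dt=1.0)
```

In this sketch the ramp-rate limit is applied first and the absolute bound second, so a request can never exceed either constraint.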
Observations
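By default, the environment is fully observable and uses the AllObservation class, with custom bounds applied to some variables. These bounds are loaded from the file gymtorax/envs/iter_hybrid.json through the custom_bounds_file argument.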
Reward
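The reward function is a linear combination of four elements:
\[ r = \alpha_Q\, g_Q + \alpha_{q_{min}}\, g_{q_{min}} + \alpha_{q_{95}}\, g_{q_{95}} + \alpha_{\mathrm{H98}}\, g_{\mathrm{H98}} \]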
In this equation, \(\alpha_i\) and \(g_i\), with \(i\in\{Q,q_{min},q_{95},\mathrm{H98}\}\), are the weights and the component functions, respectively. The components relate to the fusion gain \(Q\), the minimum safety factor \(q_{min}\), the edge safety factor \(q_{95}\), and the H-mode confinement quality factor \(\mathrm{H98}\).
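The component functions \(g_i\) are not written out in the text, but they can be read off the _compute_reward method listed further down. The following block is a transcription of that code into math, not an independent definition; "in H-mode" stands for the code's check that both temperatures at the first grid point exceed 10.

```latex
\[
\begin{aligned}
g_Q &= \begin{cases} Q/10 & \text{in H-mode} \\ 0 & \text{otherwise} \end{cases} \\
g_{\mathrm{H98}} &= \begin{cases} \min(\mathrm{H98},\, 1) & \text{in H-mode} \\ 0 & \text{otherwise} \end{cases} \\
g_{q_{min}} &= \min(q_{min},\, 1) \\
g_{q_{95}} &= \min(q_{95}/3,\, 1)
\end{aligned}
\]
```

In the code all four entries of weight_list are set to 1. The two H-mode terms are additionally divided by 50, and the two safety-factor terms by 150.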
This environment is used both for physical validation and for control experiments.
The environment is defined as follows (CONFIG stands for the TORAX configuration dictionary, which is elided here):
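import gymtorax.action_handler as ah
import gymtorax.observation_handler as oh
from gymtorax import rewards as reward  # assumed module path for the reward helpers used below
from gymtorax.envs.base_env import BaseEnv

# TORAX configuration of the ITER hybrid scenario (elided)
CONFIG = {...}
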
class IterHybridEnv(BaseEnv):
    def __init__(self, render_mode=None, **kwargs):
        # Set environment-specific defaults
        kwargs.setdefault("log_level", "warning")
        kwargs.setdefault("plot_config", "default")
        super().__init__(render_mode=render_mode, **kwargs)

    def _define_action_space(self):
        actions = [
            ah.IpAction(
                max=[15e6],  # 15 MA max plasma current
                ramp_rate=[0.2e6],  # 0.2 MA/s ramp rate limit
            ),
            ah.NbiAction(
                max=[33e6, 1.0, 1.0],  # 33 MW max NBI power
            ),
            ah.EcrhAction(
                max=[20e6, 1.0, 1.0],  # 20 MW max ECRH power
            ),
        ]
        return actions

    def _define_observation_space(self):
        # Observe all variables, with custom bounds loaded from a JSON file
        return oh.AllObservation(custom_bounds_file="gymtorax/envs/iter_hybrid.json")

    def _get_torax_config(self):
        return {
            "config": CONFIG,
            "discretization": "fixed",
            "ratio_a_sim": 1,
        }

    def _compute_reward(self, state, next_state, action):
        # Equal weights for the four reward components
        weight_list = [1, 1, 1, 1]

        def _is_H_mode():
            # Heuristic: H-mode when both temperatures at the first
            # grid point exceed 10
            return (
                next_state["profiles"]["T_e"][0] > 10
                and next_state["profiles"]["T_i"][0] > 10
            )

        def _r_fusion_gain():
            # Normalize with ITER target; rewarded only in H-mode
            fusion_gain = reward.get_fusion_gain(next_state) / 10
            return fusion_gain if _is_H_mode() else 0

        def _r_h98():
            # Rewarded only in H-mode and capped at 1
            h98 = reward.get_h98(next_state)
            return min(h98, 1) if _is_H_mode() else 0

        def _r_q_min():
            # Capped at 1
            q_min = reward.get_q_min(next_state)
            return min(q_min, 1)

        def _r_q_95():
            # Normalized by 3 and capped at 1
            q_95 = reward.get_q95(next_state)
            return min(q_95 / 3, 1)

        # Calculate individual reward components
        r_fusion_gain = weight_list[0] * _r_fusion_gain() / 50
        r_h98 = weight_list[1] * _r_h98() / 50
        r_q_min = weight_list[2] * _r_q_min() / 150
        r_q_95 = weight_list[3] * _r_q_95() / 150
        total_reward = r_fusion_gain + r_h98 + r_q_min + r_q_95
        return total_reward
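
The arithmetic of _compute_reward can be checked without running a simulation. The sketch below reproduces the same caps and scalings on plain numbers. The function `hybrid_reward` and its arguments are hypothetical names introduced for this illustration; in the environment itself the four quantities come from the reward helpers and the H-mode flag from the temperature check.

```python
def hybrid_reward(q_fusion, h98, q_min, q_95, in_h_mode, weights=(1, 1, 1, 1)):
    """Standalone copy of the reward arithmetic, for illustration only."""
    # Fusion gain, normalized by 10, counts only in H-mode
    g_q = q_fusion / 10 if in_h_mode else 0.0
    # H98 counts only in H-mode and is capped at 1
    g_h98 = min(h98, 1) if in_h_mode else 0.0
    # Safety-factor terms are capped at 1 in both regimes
    g_q_min = min(q_min, 1)
    g_q_95 = min(q_95 / 3, 1)
    return (
        weights[0] * g_q / 50
        + weights[1] * g_h98 / 50
        + weights[2] * g_q_min / 150
        + weights[3] * g_q_95 / 150
    )


# Same plasma quantities, evaluated once as H-mode and once as L-mode
r_h_mode = hybrid_reward(10.0, 1.0, 1.5, 3.5, in_h_mode=True)
r_l_mode = hybrid_reward(10.0, 1.0, 1.5, 3.5, in_h_mode=False)
```

With these inputs every component reaches its cap, so the H-mode step earns 2/50 + 2/150 while the L-mode step earns only the safety-factor part, 2/150.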