RL Users

This section is intended for users who want to leverage the Gym-TORAX package for reinforcement learning tasks. It provides an overview of how to interact with the environments, including key methods and a simple example.

BaseEnv.reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[dict[str, Any], dict[str, Any]]

Reset the environment to its initial state for a new episode.

This method initializes a new simulation episode by:

Resetting internal counters and flags
Starting the TORAX simulation from initial conditions
Extracting the initial observation state
Optionally rendering the initial state

Parameters:

seed (int or None) – Random seed for reproducible episode initialization. Used to seed the environment’s random number generator for deterministic behavior across resets. If None, no seeding is performed. Defaults to None.
options (dict[str, Any] or None) – Additional options for environment reset. Currently unused but maintained for Gymnasium compatibility. Defaults to None.

Returns:

observation (dict): Initial observation of plasma state
info (dict): Additional information (empty dict)

Return type:

tuple[dict[str, Any], dict[str, Any]]

BaseEnv.step(action: dict[str, ndarray[tuple[int, ...], dtype[floating]]]) → tuple[dict[str, Any], float, bool, bool, dict[str, Any]]

Execute one environment step with the given action.

This method implements the core RL interaction by:

Capturing the current state before action
Applying the action to update TORAX configuration
Running the simulation for one time interval
Extracting the new observation state
Computing the reward signal
Checking for episode termination
Updating time counters

Parameters:

action (dict[str, numpy.ndarray]) – Action dictionary containing parameter values for all configured actions.

Returns:

observation (dict): New plasma state observation
reward (float): Reward signal for this step
terminated (bool): True if episode ended due to terminal condition
truncated (bool): True if episode ended due to time/step limits
info (dict): Additional step information

Return type:

tuple[dict[str, Any], float, bool, bool, dict[str, Any]]

BaseEnv.render() → ndarray | None

Render the current environment state following Gymnasium convention.

Returns:

RGB array of shape (height, width, 3) if render_mode is “rgb_array”; None if render_mode is “human” or no renderer is available.

Return type:

numpy.ndarray or None
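Because render() may return either an array or None, code that records frames should check the result. A minimal sketch; `collect_frame` and the two stub classes are hypothetical and only imitate the two documented return cases (nested lists stand in for a NumPy image).

```python
from typing import Any


def collect_frame(env: Any, frames: list[Any]) -> bool:
    """Render once and store the frame if one was returned.

    In "rgb_array" mode render() returns an array of shape (height, width, 3);
    in "human" mode, or without a renderer, it returns None.
    """
    frame = env.render()
    if frame is None:
        return False
    frames.append(frame)
    return True


class RgbStub:
    """Hypothetical stub mimicking render_mode="rgb_array" (a 1x2 black image)."""

    def render(self) -> list[list[list[int]]]:
        return [[[0, 0, 0], [0, 0, 0]]]


class HumanStub:
    """Hypothetical stub mimicking render_mode="human" (draws, returns None)."""

    def render(self) -> None:
        return None


frames: list[Any] = []
collect_frame(RgbStub(), frames)    # frame appended
collect_frame(HumanStub(), frames)  # nothing appended
```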

BaseEnv.close() → None

Clean up environment resources.

Return type:

None
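To make sure close() runs even when an episode raises partway through, the environment can be wrapped in the standard library's `contextlib.closing`, which works with any object that has a close() method. `DummyEnv` is a hypothetical stub whose step() fails on purpose.

```python
import contextlib


class DummyEnv:
    """Hypothetical stub standing in for a Gym-TORAX environment."""

    def __init__(self) -> None:
        self.closed = False

    def step(self, action):
        raise RuntimeError("simulated failure during the episode")

    def close(self) -> None:
        self.closed = True


env = DummyEnv()
try:
    # contextlib.closing calls env.close() on exit, even if the body raises.
    with contextlib.closing(env) as e:
        e.step({})
except RuntimeError:
    pass
```

After the block, `env.closed` is True even though step() raised.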

BaseEnv.save_file(file_name)

Save the simulation output data to a file.

This method saves the complete simulation history to a specified file. The simulation must have been initialized with store_history=True for this method to work properly.

Parameters:

file_name (str) – The path and filename where the output should be saved. The file format is typically NetCDF (.nc extension).

Raises:

ctypes.ArgumentError – If the environment was created without store_history=True.
RuntimeError – If there was an error during the save operation.

Here is a simple example of how to use the Gym-TORAX package for reinforcement learning applications:

import gymtorax
env = gymtorax.make("basic_env")
agent = YourRLAgent(env.action_space, env.observation_space)  # placeholder: replace with your own agent class
obs, info = env.reset()
terminated = False
truncated = False

while not terminated and not truncated:
    action = agent.act(obs)  # Get action from your RL agent
    obs, reward, terminated, truncated, info = env.step(action)
    agent.learn(obs, reward)  # Update your RL agent with the new observation and reward
    env.render()  # Optional: render the environment

env.close()
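`YourRLAgent` in the example above is a placeholder for your own agent. A minimal class that satisfies the `act`/`learn` interface the loop relies on is a random baseline; `RandomAgent` and `StubSpace` are hypothetical names, and the sketch assumes `env.action_space` is a Gymnasium space, whose `sample()` method draws a random valid action.

```python
from typing import Any


class RandomAgent:
    """Hypothetical minimal agent exposing the act/learn interface used above.

    It ignores observations and samples from the action space, which makes it
    a useful baseline and a quick way to smoke-test an environment.
    """

    def __init__(self, action_space: Any, observation_space: Any) -> None:
        self.action_space = action_space
        self.observation_space = observation_space

    def act(self, obs: dict[str, Any]) -> Any:
        # Gymnasium spaces define sample(); a Dict space returns a dict of actions.
        return self.action_space.sample()

    def learn(self, obs: dict[str, Any], reward: float) -> None:
        # A random policy has nothing to update.
        pass


class StubSpace:
    """Hypothetical stand-in for env.action_space in this self-contained sketch."""

    def sample(self) -> dict[str, float]:
        return {"action": 0.5}


agent = RandomAgent(StubSpace(), observation_space=None)
action = agent.act({})
```

With a real environment you would pass `env.action_space` and `env.observation_space` instead of the stub.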

To record videos without an interactive display, render to RGB arrays and wrap the environment with Gymnasium's RecordVideo:

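import gymnasium as gym
from gymnasium.wrappers import RecordVideo

import gymtorax  # assumed to register the gymtorax/* environment IDs

env = gym.make('gymtorax/IterHybrid-v0', render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos")  # frames come from render(), not a window

obs, info = env.reset()
terminated = False
truncated = False
while not terminated and not truncated:
    action = agent.act(obs)  # agent as in the previous example
    obs, reward, terminated, truncated, info = env.step(action)

env.close()  # also stops the video recorder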