RL Users

This section is intended for users who want to leverage the Gym-TORAX package for reinforcement learning tasks. It provides an overview of how to interact with the environments, including key methods and a simple example.

BaseEnv.reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[dict[str, Any], dict[str, Any]]

Reset the environment to its initial state for a new episode.

This method initializes a new simulation episode by:

  1. Resetting internal counters and flags

  2. Starting the TORAX simulation from initial conditions

  3. Extracting the initial observation state

  4. Optionally rendering the initial state

Parameters:
  • seed (int or None) – Random seed for reproducible episode initialization. Used to seed the environment’s random number generator for deterministic behavior across resets. If None, no seeding is performed. Defaults to None.

  • options (dict[str, Any] or None) – Additional options for environment reset. Currently unused but maintained for Gymnasium compatibility. Defaults to None.

Returns:

  • observation (dict): Initial observation of plasma state

  • info (dict): Additional information (empty dict)

Return type:

tuple[dict[str, Any], dict[str, Any]]
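The reset contract described above can be sketched as follows. This is only an illustration: `StubEnv` is a hypothetical stand-in (not part of Gym-TORAX) that mimics the documented signature and return shape so the snippet runs without TORAX installed, and the observation key `"T_e"` is invented.

```python
import random
from typing import Any, Optional


class StubEnv:
    """Hypothetical stand-in mimicking the documented reset() signature."""

    def __init__(self) -> None:
        self._rng = random.Random()

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[dict] = None,
    ) -> tuple:
        # Seed only when a seed is given; with seed=None no seeding happens.
        if seed is not None:
            self._rng.seed(seed)
        observation = {"T_e": self._rng.random()}  # invented observation key
        info: dict = {}  # reset() is documented to return an empty info dict
        return observation, info


env = StubEnv()
obs_a, info_a = env.reset(seed=42)  # keyword-only arguments, as in the signature
obs_b, info_b = env.reset(seed=42)  # same seed, so the stub yields the same observation
```

With a real environment, the same unpacking (`obs, info = env.reset(seed=...)`) applies; whether two resets with the same seed produce identical plasma states depends on the environment itself.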

BaseEnv.step(action: dict[str, ndarray[tuple[int, ...], dtype[floating]]]) → tuple[dict[str, Any], float, bool, bool, dict[str, Any]]

Execute one environment step with the given action.

This method implements the core RL interaction by:

  1. Capturing the current state before action

  2. Applying the action to update TORAX configuration

  3. Running the simulation for one time interval

  4. Extracting the new observation state

  5. Computing the reward signal

  6. Checking for episode termination

  7. Updating time counters

Parameters:

action (dict[str, numpy.ndarray]) – Action dictionary containing parameter values for all configured actions.

Returns:

  • observation (dict): New plasma state observation

  • reward (float): Reward signal for this step

  • terminated (bool): True if episode ended due to terminal condition

  • truncated (bool): True if episode ended due to time/step limits

  • info (dict): Additional step information

Return type:

tuple[dict[str, Any], float, bool, bool, dict[str, Any]]
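A minimal sketch of consuming the five-element return value of step() in an episode loop. `CountdownEnv` is a hypothetical stand-in that truncates after a fixed number of steps, and the action key `"Ip"` is an invented name; neither is taken from Gym-TORAX.

```python
from typing import Any, Callable


class CountdownEnv:
    """Hypothetical stand-in: ends each episode by truncation after max_steps."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        self.steps = 0
        return {"t": 0.0}, {}

    def step(self, action: dict):
        self.steps += 1
        observation = {"t": float(self.steps)}
        reward = 1.0
        terminated = False  # this stub has no terminal condition
        truncated = self.steps >= self.max_steps  # step limit reached
        return observation, reward, terminated, truncated, {}


def run_episode(env: Any, policy: Callable[[dict], dict]) -> tuple:
    """Run one episode and return (total reward, number of steps)."""
    obs, info = env.reset()
    total_reward, num_steps = 0.0, 0
    terminated = truncated = False
    # Stop on either flag: an episode can end by termination or by truncation.
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        num_steps += 1
    return total_reward, num_steps


# Constant policy with an invented action key.
total_reward, num_steps = run_episode(CountdownEnv(max_steps=3), lambda obs: {"Ip": [0.0]})
```

Checking both flags matters because an episode that only hits its time or step limit never sets `terminated`.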

BaseEnv.render() → ndarray | None

Render the current environment state following Gymnasium convention.

Returns:

  • numpy.ndarray: RGB array of shape (height, width, 3) if render_mode is “rgb_array”

  • None: If render_mode is “human” or the renderer is not available

Return type:

numpy.ndarray | None
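Because render() may return None, callers that collect frames should check the result before using it. The sketch below shows one way to do that. `StubRenderEnv` is a hypothetical stand-in, and nested lists take the place of the NumPy array a real environment would return.

```python
from typing import Optional


class StubRenderEnv:
    """Hypothetical stand-in mimicking the documented render() behavior."""

    def __init__(self, render_mode: Optional[str] = None) -> None:
        self.render_mode = render_mode

    def render(self):
        if self.render_mode == "rgb_array":
            # 2x2 black frame; nested lists stand in for a (height, width, 3) array.
            return [[[0, 0, 0] for _ in range(2)] for _ in range(2)]
        return None  # "human" mode or no renderer available


def collect_frame(env, frames: list) -> bool:
    """Append the current frame to `frames` if one was produced."""
    frame = env.render()
    if frame is None:
        return False
    frames.append(frame)
    return True


frames: list = []
got_rgb = collect_frame(StubRenderEnv("rgb_array"), frames)
got_human = collect_frame(StubRenderEnv("human"), frames)
```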

BaseEnv.close() → None

Clean up environment resources.

Return type:

None

BaseEnv.save_file(file_name)

Save the simulation output data to a file.

This method saves the complete simulation history to a specified file. The simulation must have been initialized with store_history=True for this method to work properly.

Parameters:

file_name (str) – The path and filename where the output should be saved. The file format is typically NetCDF (.nc extension).

Raises:

Here is a simple example of how to use the Gym-TORAX package for reinforcement learning applications:

import gymtorax

env = gymtorax.make("basic_env")
# YourRLAgent is a placeholder for your own agent class; it is not provided by Gym-TORAX
agent = YourRLAgent(env.action_space, env.observation_space)

obs, info = env.reset()
terminated = False
truncated = False

# Run until the episode ends, by a terminal condition or by a time/step limit
while not (terminated or truncated):
    action = agent.act(obs)  # Get action from your RL agent
    obs, reward, terminated, truncated, info = env.step(action)
    agent.learn(obs, reward)  # Update your RL agent with the new observation and reward
    env.render()  # Optional: render the environment

env.close()  # Release environment resources
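The example above expects an agent object with act(obs) and learn(obs, reward) methods. A minimal sketch of such an agent, one that samples random actions, might look like the following. `RandomAgent`, `StubSpace`, and the action name `"heating_power"` are hypothetical illustrations, not part of Gym-TORAX. The stub space only mirrors the sample() method that Gymnasium spaces provide.

```python
import random


class StubSpace:
    """Hypothetical stand-in exposing sample(), as Gymnasium spaces do."""

    def __init__(self, low: float, high: float, seed: int = 0) -> None:
        self.low, self.high = low, high
        self._rng = random.Random(seed)

    def sample(self) -> dict:
        # "heating_power" is an invented action name.
        return {"heating_power": self._rng.uniform(self.low, self.high)}


class RandomAgent:
    """Baseline agent: ignores observations and samples random actions."""

    def __init__(self, action_space, observation_space) -> None:
        self.action_space = action_space
        self.observation_space = observation_space
        self.total_reward = 0.0

    def act(self, obs) -> dict:
        return self.action_space.sample()

    def learn(self, obs, reward: float) -> None:
        # No learning here; only track the cumulative reward.
        self.total_reward += reward


agent = RandomAgent(StubSpace(0.0, 1.0), observation_space=None)
action = agent.act(obs={})
agent.learn(obs={}, reward=0.5)
```

A random agent like this is mainly useful as a sanity check that the environment loop runs end to end before a learning agent is plugged in.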

For video recording without interactive display:

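import gymnasium as gym
from gymnasium.wrappers import RecordVideo
import gymtorax  # assumed to register the gymtorax/* environment IDs on import

env = gym.make('gymtorax/IterHybrid-v0', render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos")  # Write episode videos to ./videos

# `agent` is the placeholder agent from the previous example
obs, info = env.reset()
terminated = False
truncated = False
# Check both flags so the loop also stops when a time/step limit is reached
while not (terminated or truncated):
    action = agent.act(obs)
    obs, reward, terminated, truncated, info = env.step(action)

env.close()  # Closing the wrapped environment finalizes the recording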