RL Users
This section is for users who want to use the Gym-TORAX package for reinforcement learning. It outlines how to interact with the environments, covering the key methods and a simple example.
- BaseEnv.reset(*, seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[dict[str, Any], dict[str, Any]]
Reset the environment to its initial state for a new episode.
This method initializes a new simulation episode by:
Resetting internal counters and flags
Starting the TORAX simulation from initial conditions
Extracting the initial observation state
Optionally rendering the initial state
- Parameters:
seed (int or None) – Random seed for reproducible episode initialization. Used to seed the environment’s random number generator for deterministic behavior across resets. If None, no seeding is performed. Defaults to None.
options (dict[str, Any] or None) – Additional options for environment reset. Currently unused but maintained for Gymnasium compatibility. Defaults to None.
- Returns:
observation (dict): Initial observation of plasma state
info (dict): Additional information (empty dict)
- Return type:
tuple[dict[str, Any], dict[str, Any]]
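As a rough illustration of the return contract described for reset(), the sketch below unpacks the (observation, info) pair and shows the effect of passing or omitting seed. StubEnv is a hypothetical stand-in that only mirrors the documented signature so the snippet runs without TORAX; it is not part of Gym-TORAX, and its observation key is invented.

```python
from __future__ import annotations

import random
from typing import Any


class StubEnv:
    """Hypothetical stand-in that mirrors the documented reset() signature."""

    def __init__(self) -> None:
        self._rng = random.Random()

    def reset(
        self, *, seed: int | None = None, options: dict[str, Any] | None = None
    ) -> tuple[dict[str, Any], dict[str, Any]]:
        if seed is not None:
            # Re-seed only when a seed is given; None leaves the RNG untouched.
            self._rng = random.Random(seed)
        observation = {"noise": self._rng.random()}  # invented observation key
        return observation, {}  # info is an empty dict, as documented


env = StubEnv()
obs_a, info_a = env.reset(seed=42)  # seeded: reproducible starting point
obs_b, info_b = env.reset()         # unseeded: continues the same RNG stream
obs_c, info_c = env.reset(seed=42)  # same seed again: same initial observation
```

With a real environment, the usual Gymnasium pattern is to pass seed on the first reset of an experiment and omit it afterwards, so later episodes draw from the same seeded stream.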
- BaseEnv.step(action: dict[str, ndarray[tuple[int, ...], dtype[floating]]]) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]
Execute one environment step with the given action.
This method implements the core RL interaction by:
Capturing the current state before action
Applying the action to update TORAX configuration
Running the simulation for one time interval
Extracting the new observation state
Computing the reward signal
Checking for episode termination
Updating time counters
- Parameters:
action (dict[str, numpy.ndarray]) – Action dictionary containing parameter values for all configured actions.
- Returns:
observation (dict): New plasma state observation
reward (float): Reward signal for this step
terminated (bool): True if the episode ended due to a terminal condition
truncated (bool): True if the episode ended due to time/step limits
info (dict): Additional step information
- Return type:
tuple[dict[str, Any], float, bool, bool, dict[str, Any]]
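The five values returned by step() are usually consumed in a rollout loop that treats terminated and truncated as separate end conditions. The helper below is a minimal sketch of that pattern, not code from Gym-TORAX: run_episode, its max_steps guard, and CountdownEnv (a stand-in that truncates after a fixed number of steps) are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

from typing import Any, Callable


def run_episode(
    env: Any, policy: Callable[[dict], dict], max_steps: int = 1000
) -> tuple[float, int, str]:
    """Roll out one episode; return (total reward, step count, end reason)."""
    obs, info = env.reset()
    total_reward, steps, reason = 0.0, 0, "max_steps"
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        steps += 1
        if terminated:  # a terminal condition was reached
            reason = "terminated"
            break
        if truncated:  # a time/step limit was reached
            reason = "truncated"
            break
    return total_reward, steps, reason


class CountdownEnv:
    """Hypothetical stand-in: reward 1.0 per step, truncates after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return {"t": self.t}, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        return {"t": self.t}, 1.0, False, truncated, {}


result = run_episode(CountdownEnv(horizon=3), policy=lambda obs: {})
```

Keeping the end reason is useful when training: many algorithms bootstrap the value estimate after a truncation but not after a termination.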
- BaseEnv.render() -> ndarray | None
Render the current environment state following Gymnasium convention.
- Returns:
RGB array of shape (height, width, 3) if render_mode is “rgb_array”
None if render_mode is “human” or the renderer is not available
- Return type:
ndarray | None
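Because render() may return either a frame or None, calling code generally needs to check the result before using it. The following sketch shows one way to do that; collect_frame and StubRenderEnv are hypothetical helpers, and the stub returns a nested list in place of a NumPy array so the snippet has no dependencies.

```python
def collect_frame(env, frames):
    """Append the rendered frame to `frames`; return True if one was produced."""
    frame = env.render()
    if frame is None:
        # "human" mode, or no renderer available: nothing to store.
        return False
    frames.append(frame)
    return True


class StubRenderEnv:
    """Hypothetical stand-in that follows the documented render() contract."""

    def __init__(self, render_mode):
        self.render_mode = render_mode

    def render(self):
        if self.render_mode != "rgb_array":
            return None
        height, width = 2, 3
        # Nested list standing in for an array of shape (height, width, 3).
        return [[[0, 0, 0] for _ in range(width)] for _ in range(height)]


frames = []
got_rgb = collect_frame(StubRenderEnv("rgb_array"), frames)
got_human = collect_frame(StubRenderEnv("human"), frames)
```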
- BaseEnv.save_file(file_name)
Save the simulation output data to a file.
This method saves the complete simulation history to a specified file. The simulation must have been initialized with store_history=True for this method to work properly.
- Parameters:
file_name (str) – The path and filename where the output should be saved. The file format is typically NetCDF (.nc extension).
- Raises:
ctypes.ArgumentError – If the environment was created without store_history=True.
RuntimeError – If there was an error during the save operation.
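The two documented exceptions can be handled separately, since one points to a configuration problem and the other to a failure while writing. The sketch below assumes only what is listed under Raises; try_save and StubEnv are hypothetical, and the stub records the file name instead of writing a NetCDF file.

```python
import ctypes


def try_save(env, file_name):
    """Call env.save_file() and report success instead of raising."""
    try:
        env.save_file(file_name)
    except ctypes.ArgumentError:
        # Documented case: the environment was created without store_history=True.
        print("History was not stored; recreate the environment with store_history=True.")
        return False
    except RuntimeError as exc:
        # Documented case: the save operation itself failed.
        print(f"Saving to {file_name} failed: {exc}")
        return False
    return True


class StubEnv:
    """Hypothetical stand-in that raises the documented exception type."""

    def __init__(self, store_history):
        self.store_history = store_history
        self.saved_to = None

    def save_file(self, file_name):
        if not self.store_history:
            raise ctypes.ArgumentError("history was not stored")
        self.saved_to = file_name  # the stub only records the target path


saved_without_history = try_save(StubEnv(store_history=False), "run.nc")
saved_with_history = try_save(StubEnv(store_history=True), "run.nc")
```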
Here is a simple example of how to use the Gym-TORAX package for reinforcement learning applications:
import gymtorax

# YourRLAgent is a placeholder for your own agent class
env = gymtorax.make("basic_env")
agent = YourRLAgent(env.action_space, env.observation_space)

obs, info = env.reset()
terminated = False
truncated = False
while not terminated and not truncated:
    action = agent.act(obs)  # Get action from your RL agent
    obs, reward, terminated, truncated, info = env.step(action)
    agent.learn(obs, reward)  # Update your RL agent with the new observation and reward
    env.render()  # Optional: render the environment
env.close()
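The example leaves YourRLAgent undefined. One possible shape for such a class, matching the act() and learn() calls used in the loop, is sketched below as a random-action baseline. It assumes the action space offers a sample() method, as Gymnasium spaces do; RandomAgent, StubActionSpace, and the action key "Ip" are hypothetical names chosen for illustration, not part of Gym-TORAX.

```python
import random


class RandomAgent:
    """Baseline agent: samples random actions and only tracks the reward."""

    def __init__(self, action_space, observation_space):
        self.action_space = action_space
        self.observation_space = observation_space
        self.total_reward = 0.0

    def act(self, obs):
        # Ignore the observation and draw an action from the action space.
        return self.action_space.sample()

    def learn(self, obs, reward):
        # A real agent would update its policy here; this one just accumulates.
        self.total_reward += reward


class StubActionSpace:
    """Hypothetical stand-in for a dict-valued action space with sample()."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def sample(self):
        return {"Ip": [self._rng.uniform(0.0, 1.0)]}  # invented action key


agent = RandomAgent(StubActionSpace(), observation_space=None)
action = agent.act({})
agent.learn({}, 0.5)
agent.learn({}, 1.5)
```

A random agent like this is mainly useful as a smoke test for the environment loop and as a lower bound when comparing learned policies.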
For video recording without interactive display:
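import gymnasium as gym
from gymnasium.wrappers import RecordVideo
import gymtorax  # assumed to register the gymtorax/* environment IDs on import

env = gym.make('gymtorax/IterHybrid-v0', render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos")
obs, info = env.reset()
terminated = False
truncated = False
while not (terminated or truncated):  # stop on either end condition
    action = agent.act(obs)  # agent created as in the previous example
    obs, reward, terminated, truncated, info = env.step(action)
env.close()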