Example
To illustrate how Gym-TORAX can be used, we provide an example in which three different policies are used in the IterHybridEnv environment. The three policies are compared based on the expected return \(J(\pi)\) given by
\[\begin{equation}
J(\pi) = \underset{\pi}{\mathbb{E}} \left[ \sum_{t=0}^{T} \gamma^t r(s_t, a_t) \right] \quad .
\end{equation}\]
This example is organized into three parts: