Unity-Technologies · montplaisir · Jul 3, 2026
diff --git a/colab/Colab_UnityEnvironment_4_SB3VectorEnv.ipynb b/colab/Colab_UnityEnvironment_4_SB3VectorEnv.ipynb
@@ -161,8 +161,8 @@
     "from pathlib import Path\n",
     "from typing import Callable, Any\n",
     "\n",
-    "import gym\n",
-    "from gym import Env\n",
+    "import gymnasium as gym\n",
+    "from gymnasium import Env\n",
     "\n",
     "from stable_baselines3 import PPO\n",
     "from stable_baselines3.common.vec_env import VecMonitor, VecEnv, SubprocVecEnv\n",
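
The notebook now imports `gymnasium`, but the Stable-Baselines3 `VecEnv` interface it relies on still reports a single `done` flag per environment. The sketch below shows one way a gymnasium-style 5-tuple can be folded back into that shape while keeping the truncation signal in `info["TimeLimit.truncated"]`. That is our reading of how SB3's vectorized wrappers behave, not a quote of their code. The helper name `fold_step_result` is hypothetical and is not part of SB3 or ML-Agents.

```python
from typing import Any, Dict, Tuple


def fold_step_result(
    obs: Any,
    reward: float,
    terminated: bool,
    truncated: bool,
    info: Dict[str, Any],
) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Hypothetical helper: collapse a gymnasium (obs, reward, terminated,
    truncated, info) step result into an old-style (obs, reward, done, info)."""
    # The episode is over if it either reached a terminal state or was cut short.
    done = terminated or truncated
    # Copy so the caller's info dict is not mutated.
    info = dict(info)
    # Flag truncation only when the episode did not also terminate, so a
    # learner can still bootstrap values after a time-limit cutoff.
    info["TimeLimit.truncated"] = truncated and not terminated
    return obs, reward, done, info


if __name__ == "__main__":
    print(fold_step_result([0.0], 1.0, False, True, {}))
    # prints: ([0.0], 1.0, True, {'TimeLimit.truncated': True})
```

Keeping `terminated` and `truncated` separate until the last moment is the point of the gymnasium API. Collapsing them loses the distinction unless it is carried somewhere, such as the `info` dict.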

diff --git a/com.unity.ml-agents/CHANGELOG.md b/com.unity.ml-agents/CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to
 [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
+### Major Changes
+#### ml-agents / ml-agents-envs
+- Migrated from gym to gymnasium (#6309)
+
 ### Minor Changes
 #### com.unity.ml-agents (C#)
 - Fixed StackingSensor compressed observation for sensors with more than 3 channels. (#6299)

diff --git a/com.unity.ml-agents/Documentation~/ML-Agents-Overview.md b/com.unity.ml-agents/Documentation~/ML-Agents-Overview.md
@@ -28,7 +28,7 @@ The ML-Agents Toolkit contains five high-level components:
 - **Python Low-Level API** - which contains a low-level Python interface for interacting and manipulating a learning environment. Note that, unlike the Learning Environment, the Python API is not part of Unity, but lives outside and communicates with Unity through the Communicator. This API is contained in a dedicated `mlagents_envs` Python package and is used by the Python training process to communicate with and control the Academy during training. However, it can be used for other purposes as well. For example, you could use the API to use Unity as the simulation engine for your own machine learning algorithms. See [Python API](Python-LLAPI.md) for more information.
 - **External Communicator** - which connects the Learning Environment with the Python Low-Level API. It lives within the Learning Environment.
 - **Python Trainers** which contains all the machine learning algorithms that enable training agents. The algorithms are implemented in Python and are part of their own `mlagents` Python package. The package exposes a single command-line utility `mlagents-learn` that supports all the training methods and options outlined in this document. The Python Trainers interface solely with the Python Low-Level API.
-- **Gym Wrapper** (not pictured). A common way in which machine learning researchers interact with simulation environments is via a wrapper provided by OpenAI called [gym](https://github.com/openai/gym). We provide a gym wrapper in the `ml-agents-envs` package and [instructions](Python-Gym-API.md) for using it with existing machine learning algorithms which utilize gym.
+- **Gym Wrapper** (not pictured). A common way in which machine learning researchers interact with simulation environments is via the [gymnasium](https://gymnasium.farama.org/) API, maintained by the Farama Foundation (formerly OpenAI `gym`). We provide a gymnasium wrapper in the `ml-agents-envs` package and [instructions](Python-Gym-API.md) for using it with existing machine learning algorithms that use gymnasium.
 - **PettingZoo Wrapper** (not pictured) PettingZoo is python API for interacting with multi-agent simulation environments that provides a gym-like interface. We provide a PettingZoo wrapper for Unity ML-Agents environments in the `ml-agents-envs` package and [instructions](Python-PettingZoo-API.md) for using it with machine learning algorithms.
 
 <p align="center"> <img src="images/learning_environment_basic.png" alt="Simplified ML-Agents Scene Block Diagram" width="600" border="10" /> </p>
@@ -68,7 +68,7 @@ It is important to note that the ML-Agents Toolkit leverages [Sentis](Inference-
 
 ### Custom Training and Inference
 
-In the previous mode, the Agents were used for training to generate a PyTorch model that the Agents can later use. However, any user of the ML-Agents Toolkit can leverage their own algorithms for training. In this case, the behaviors of all the Agents in the scene will be controlled within Python. You can even turn your environment into a [gym.](Python-Gym-API.md)
+In the previous mode, the Agents were used for training to generate a PyTorch model that the Agents can later use. However, any user of the ML-Agents Toolkit can leverage their own algorithms for training. In this case, the behaviors of all the Agents in the scene will be controlled within Python. You can even turn your environment into a [gymnasium environment](Python-Gym-API.md).
 
 Unity doesn't provide a tutorial highlighting this mode, but you can learn more about the Python API in [Unity ML-Agents Python Low Level API](Python-LLAPI.md).
 

diff --git a/com.unity.ml-agents/Documentation~/Python-Gym-API-Documentation.md b/com.unity.ml-agents/Documentation~/Python-Gym-API-Documentation.md
@@ -42,19 +42,19 @@ Environment initialization
 #### reset
 
 ```python
- | reset() -> Union[List[np.ndarray], np.ndarray]
+ | reset(*, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) -> Tuple[np.ndarray, Dict]
 ```
 
-Resets the state of the environment and returns an initial observation. Returns: observation (object/list): the initial observation of the space.
+Resets the state of the environment and returns the initial observation together with an info dictionary. Returns: observation (object/list): the initial observation of the space. info (dict): contains auxiliary diagnostic information.
 
 <a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step"></a>
 #### step
 
 ```python
- | step(action: List[Any]) -> GymStepResult
+ | step(action: Any) -> GymStepResult
 ```
 
-Run one timestep of the environment's dynamics. When end of episode is reached, you are responsible for calling `reset()` to reset this environment's state. Accepts an action and returns a tuple (observation, reward, done, info).
+Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling `reset()` to reset this environment's state. Accepts an action and returns a tuple (observation, reward, terminated, truncated, info).
 
 **Arguments**:
 
@@ -63,14 +63,15 @@ Run one timestep of the environment's dynamics. When end of episode is reached,
 **Returns**:
 
 - `observation` _object/list_ - agent's observation of the current environment reward (float/list) : amount of reward returned after previous action
-- `done` _boolean/list_ - whether the episode has ended.
+- `terminated` _boolean/list_ - whether the episode ended because a terminal state was reached.
+- `truncated` _boolean/list_ - whether the episode was cut short before reaching a terminal state (for example, by a maximum step limit).
 - `info` _dict_ - contains auxiliary diagnostic information.
 
 <a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render"></a>
 #### render
 
 ```python
- | render(mode="rgb_array")
+ | render()
 ```
 
 Return the latest visual observations. Note that it will not render a new frame of the environment.
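
With these signatures, a rollout loop unpacks `(observation, info)` from `reset()` and five values from `step()`, and ends the episode when either `terminated` or `truncated` is true. The sketch below uses a hypothetical stand-in environment, `CountdownEnv`, written in plain Python so it runs without a Unity build. It only imitates the call shapes documented above and is not the `UnityToGymWrapper` implementation. With a real build, the same `reset`/`step` calls would go to the wrapper instead.

```python
import random
from typing import Any, Dict, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy environment that mimics the gymnasium-style
    reset()/step() signatures documented above (not ML-Agents code)."""

    def __init__(self, start: int = 2, max_steps: int = 3) -> None:
        self.start = start
        self.max_steps = max_steps  # time limit that triggers truncation
        self._value = start
        self._steps = 0

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[int, Dict[str, Any]]:
        # seed/options are accepted for API compatibility; this toy env is deterministic.
        self._value = self.start
        self._steps = 0
        return self._value, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Action 1 decrements the counter; any other action leaves it unchanged.
        self._value -= 1 if action == 1 else 0
        self._steps += 1
        terminated = self._value <= 0  # reached the goal (terminal) state
        truncated = not terminated and self._steps >= self.max_steps  # hit the time limit
        reward = 1.0 if terminated else 0.0
        return self._value, reward, terminated, truncated, {"steps": self._steps}


def run_episode(env: CountdownEnv, seed: int) -> Tuple[float, bool, bool]:
    """Roll out one episode with a seeded random policy.
    Returns (total_reward, terminated, truncated)."""
    obs, info = env.reset(seed=seed)
    policy_rng = random.Random(seed)
    total = 0.0
    while True:
        action = policy_rng.choice([0, 1])
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:  # either flag ends the episode
            return total, terminated, truncated


if __name__ == "__main__":
    print(run_episode(CountdownEnv(start=2, max_steps=3), seed=0))
```

Checking `terminated or truncated` replaces the single `done` check from the old gym API. A learner that bootstraps value estimates would typically treat only `terminated` as a true end of return.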