
Using the MuJoCo RL Environment


Variables for the MuJoCo_RL environment

Here's a breakdown of the arguments used in a multi-agent reinforcement learning environment that utilizes MuJoCo:

  1. agents: A list of agents participating in the environment. Each name must match the name of the MuJoCo body that forms the top-level object of that agent in the environment.

  2. xmlPath: The file path to the XML description of the environment. MuJoCo uses XML files to define physical simulations. Users can also pass a list of XML file paths; in that case, one level is selected at random at each reset() (see the snippet after this list).

  3. infoJson: An optional JSON file that provides additional information or configuration for the environment. Users can pass either the path to a single file or a list of file paths. If a list is provided, the file names of the JSON files have to match the names of the XML level files. If a list of XML files is provided but only one JSON file, that JSON file is used for all levels.

  4. renderMode: A boolean flag indicating whether to enable rendering of the environment. If set to True, the environment is visually displayed during the simulation. Be careful when setting this flag to True while using Ray, as each worker will spawn its own environment and start its own rendering process.

  5. exportPath: The path to which the environment exports each frame, for use in the Unreal Engine for better visualization.

  6. freeJoint: A boolean flag specifying whether free joints are used for movement in the environment. If set to False, the actuators are used instead. If set to True, the environment ignores all actuators and only uses free-joint movements.

  7. skipFrames: The number of frames to skip between each agent action. This value determines the granularity of the simulation steps.

  8. maxSteps: The maximum number of steps or time-steps allowed in the environment before the simulation terminates. If this limit is reached, the simulation ends regardless of the agent's progress.

  9. rewardFunctions: A list of reward functions that define the rewards given to the agents based on their actions and the state of the environment. Reward functions are typically designed to encourage desired behavior or the achievement of specific goals (see the sketch after this list).

  10. doneFunctions: A list of done functions that determine when an episode or simulation should be considered complete or terminated. Done functions are used to define conditions that signify the end of an episode, such as reaching a goal or exceeding a time limit.

  11. environmentDynamics: A list of classes, each of which must implement an __init__(self, mujoco_gym) and a dynamic(self, agent, actions) method. The latter must return a reward and a NumPy array of observations, in that order. Note that every dynamics class also needs to set self.observation_space = {"low":[], "high":[]} and self.action_space = {"low":[], "high":[]} in its constructor (see the sketch after this list).

  12. agentCameras: A boolean flag indicating whether agent-specific cameras should be enabled. If set to True, each agent may have its own camera view within the environment. Note that actually using those cameras creates a huge overhead and decreases performance significantly.

  13. sensorResolution: A tuple specifying the resolution at which camera data is rendered. Images always use three color channels.
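
As a quick illustration of the multi-level option from items 2 and 3, the two keys can be combined as follows. The file names here are placeholders:

configDict = {
    "xmlPath": ["levels/level1.xml", "levels/level2.xml"],  # one level is picked at random on each reset()
    "infoJson": ["levels/level1.json", "levels/level2.json"],  # names must match the XML file names
    # ... remaining keys as described above
}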
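
Items 9 to 11 reference user-supplied hooks. Below is a minimal sketch: the ExampleDynamic class follows the contract from item 11, while the reward and done function signatures are an assumption (this page does not spell them out); they are shown here receiving the environment instance and an agent name, mirroring the dynamic() hook.

import numpy as np

class ExampleDynamic:
    # Sketch of an environmentDynamics class following the contract above.
    def __init__(self, mujoco_gym):
        self.mujoco_gym = mujoco_gym
        # Every dynamics class must declare bounds for its observations and actions.
        self.observation_space = {"low": [0.0], "high": [1.0]}
        self.action_space = {"low": [], "high": []}

    def dynamic(self, agent, actions):
        # Must return a reward and a NumPy observation array, in that order.
        reward = 0.0
        observation = np.array([0.5])
        return reward, observation

# Assumed signatures for reward and done functions (not specified on this
# page): each receives the environment instance and an agent name.
def reward1(mujoco_gym, agent):
    return 1.0  # e.g. a constant living bonus per step

def done1(mujoco_gym, agent):
    return False  # e.g. never end the episode early; maxSteps still applies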

These arguments collectively define various aspects of the reinforcement learning multi-agent environment, such as agent configuration, simulation parameters, rendering options, reward and termination conditions, and additional environment dynamics. By configuring these arguments appropriately, you can create different environments suited for different learning tasks and scenarios.

Example and default values

Below is an example of a config dict in Python where all the values are stored:

from MuJoCo_Gym.mujoco_rl import MuJoCo_RL

configDict = {
    "agents": ["Agent1", "Agent2"],  # List of agents (default: [])
    "xmlPath": "/path/to/xml/file.xml",  # XML file path
    "infoJson": "/path/to/info.json",  # Info JSON file path (default: "")
    "renderMode": True,  # Render mode (default: False)
    "exportPath": "/path/to/export",  # Export path
    "freeJoint": True,  # Free joint option (default: False)
    "skipFrames": 2,  # Number of frames to skip (default: 1)
    "maxSteps": 2048,  # Maximum number of steps (default: 1024)
    "rewardFunctions": [reward1, reward2],  # List of reward functions (default: [])
    "doneFunctions": [done1, done2],  # List of done functions (default: [])
    "environmentDynamics": [dyn1, dyn2],  # List of environment dynamics (default: [])
    "agentCameras": True,  # Agent cameras option (default: False)
    "sensorResolution": (64, 64) # Camera rendering resolution (default: (64, 64))
}

environment = MuJoCo_RL(configDict)
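
Once constructed, the environment can be driven with a standard multi-agent loop. The following is a sketch under assumptions: a Gym/RLlib-style step() that takes and returns per-agent dicts keyed by agent name, and a hypothetical policy function standing in for your controller; exact return signatures may differ between versions.

# `policy` is a hypothetical stand-in for your controller.
observations = environment.reset()
for _ in range(configDict["maxSteps"]):
    actions = {agent: policy(observations[agent]) for agent in configDict["agents"]}
    observations, rewards, dones, infos = environment.step(actions)
    if all(dones.values()):
        break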

Explanation of the default values:

  • agents: The default value is an empty list [].
  • xmlPath: There is no default value specified.
  • infoJson: The default value is an empty string "".
  • renderMode: The default value is False.
  • exportPath: There is no default value specified.
  • freeJoint: The default value is False.
  • skipFrames: The default value is 1.
  • maxSteps: The default value is 1024.
  • rewardFunctions: The default value is an empty list [].
  • doneFunctions: The default value is an empty list [].
  • environmentDynamics: The default value is an empty list [].
  • agentCameras: The default value is False.
  • sensorResolution: The default value is (64, 64).

You can customize these values in the configDict according to your specific requirements.