This repository contains the code and data for our paper on emergent language in complex, situated multi-agent systems. While traditional research on language emergence has largely relied on reference games with simple interactions, our work explores how communication develops in more sophisticated, open-ended environments where agents interact over multiple time steps.
To run the experiments, the environment and the reinforcement learning code must first be installed as a package. Clone this repository, open a terminal in the repository root, and execute the following steps.
- Install the repository as a pip package:

```bash
pip install .
```
- Check whether the installation was successful:

```bash
python -c "import OpenEndedLanguage"
```
The basic multi-agent Pong environment can be imported and trained like this:
```python
from OpenEndedLanguage.Environments.Multi_Pong.multi_pong import PongEnv
from OpenEndedLanguage.Reinforcement_Learning.Centralized_PPO.multi_ppo import PPO_Multi_Agent_Centralized


def make_env(seed, vocab_size, sequence_length, max_episode_steps):
    env = PongEnv(width=20, height=20, vocab_size=vocab_size,
                  sequence_length=sequence_length, max_episode_steps=max_episode_steps)
    return env


if __name__ == "__main__":
    num_envs = 64
    seed = 1

    # Sweep over message sequence lengths; 0 corresponds to no communication channel.
    for sequence_length in [2, 3, 1, 0]:
        vocab_size = 3
        max_episode_steps = 2048
        total_timesteps = 1_000_000_000

        # 64 parallel environment instances for vectorized rollouts.
        envs = [make_env(seed, vocab_size, sequence_length, max_episode_steps) for _ in range(num_envs)]
        agent = PPO_Multi_Agent_Centralized(envs, device="cpu")
        agent.train(total_timesteps, exp_name=f"multi_pong_{sequence_length}",
                    tensorboard_folder="Final_OneHot", checkpoint_path="models/checkpoints",
                    num_checkpoints=40, learning_rate=0.001)
        agent.save(f"models/final_model_multi_pong_{sequence_length}.pt")
```
While running, this script writes TensorBoard logs to the `Final_OneHot` folder and periodic checkpoints to `models/checkpoints`; the final model for each run is saved under `models/`.
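Training progress can then be monitored with TensorBoard. The exact log directory layout depends on how the trainer nests its runs, so the path below is an assumption based on the `tensorboard_folder` argument used above:

```bash
tensorboard --logdir Final_OneHot
```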
In our experiments, agents are challenged to solve environments where optimal performance requires language-based communication. We have developed two such environments:
- Multi-Agent Pong: In this environment, two agents must coordinate to catch two simultaneously moving balls. Each agent's observation does not include the position of the other agent, making coordination essential. The agents receive a reward of +1 for successfully catching a ball; if they miss a ball, both agents receive a penalty of -1 and the episode terminates. A minimal interaction sketch for this environment is shown after this list.
- Collectors: Here, two agents must collect targets by colliding with them, without being able to see each other. Unlike in Pong, agents can move both vertically and horizontally. Each target has a visible countdown within which it must be collected. Agents receive a reward of +1 for each successfully collected target; if a target's countdown expires, both agents receive a penalty of -1 and the episode ends. Due to the distance between agents and the spawn frequency of the targets, agents must use the language channel to coordinate consistently and succeed in this task.
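As referenced above, here is a minimal interaction sketch for Multi-Agent Pong. It assumes `PongEnv` follows a Gym-style `reset`/`step` interface and exposes an `action_space`; the exact action and observation formats (e.g., how the two agents' movement actions and language tokens are packed) are assumptions, so check the environment source for the authoritative API.

```python
from OpenEndedLanguage.Environments.Multi_Pong.multi_pong import PongEnv

# Hypothetical rollout loop, assuming a Gym-style API (reset/step/action_space).
env = PongEnv(width=20, height=20, vocab_size=3,
              sequence_length=2, max_episode_steps=2048)

obs = env.reset()  # assumed: one observation per agent
done = False
while not done:
    # Random joint actions for both agents; a trained policy would go here.
    actions = env.action_space.sample()  # assumed: samples actions for all agents
    # reward and done may be per-agent rather than scalar in this repository.
    obs, reward, done, info = env.step(actions)
```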