Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization
This is the official implementation of CollabUIAgents, a two-stage multi-agent learning framework for interactive environments.
In single-agent learning, the agent can encounter obstacles when the target environment differs from the training one, whereas in multi-agent learning, collaboration between agents can enable effective decision-making in both environments.
The challenge of achieving both strong performance and good generalization has hindered the progress of multi-agent systems for interactive environments. To address these issues, we propose CollabUIAgents, a multi-agent reinforcement learning framework with a novel multi-agent credit re-assignment (CR) strategy: process rewards are assigned by LLMs rather than taken from environment-specific rewards, and the agents learn from synthesized preference data, fostering generalizable, collaborative behaviors among the role-free agents' policies (as shown in Figure (a)). Empirical results show that our framework improves both the performance and the cross-environment generalizability of multi-agent systems. Moreover, our 7B-parameter system achieves results on par with or exceeding those of strong closed-source models, and even of the LLM that guides the CR. We also provide insights into using granular CR rewards effectively for environment generalization, and into accommodating trained LLMs in multi-agent systems.
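To make the CR strategy concrete, below is a minimal sketch of how LLM-assigned process rewards can be turned into preference pairs. All names here (`critic_llm`, `critic_reward`, `build_preference_pairs`, the `Step` record, and the reward scale) are illustrative assumptions, not the released implementation.

```python
# Sketch of LLM-based credit re-assignment (hypothetical helper names).
# A critic LLM grades each intermediate step of a trajectory instead of
# relying on a sparse, environment-specific terminal reward; graded steps
# are then paired into preference data for policy optimization.

from dataclasses import dataclass

@dataclass
class Step:
    observation: str   # UI state description at this step
    action: str        # action proposed by an agent

def critic_reward(critic_llm, task: str, step: Step) -> float:
    """Ask the critic LLM for a scalar process reward in [0, 1]."""
    prompt = (
        f"Task: {task}\n"
        f"Observation: {step.observation}\n"
        f"Action: {step.action}\n"
        "Rate how much this action advances the task "
        "(0 = harmful, 1 = ideal). Answer with a single number."
    )
    return float(critic_llm(prompt).strip())

def build_preference_pairs(critic_llm, task, trajectory, adversarial_actions,
                           margin=0.2):
    """Pair each on-trajectory action against an adversarial alternative
    whenever the critic's rewards differ by at least `margin`."""
    pairs = []
    for step, neg_action in zip(trajectory, adversarial_actions):
        r_pos = critic_reward(critic_llm, task, step)
        r_neg = critic_reward(critic_llm, task, Step(step.observation, neg_action))
        if r_pos - r_neg >= margin:
            pairs.append({
                "prompt": f"{task}\n{step.observation}",
                "chosen": step.action,
                "rejected": neg_action,
            })
    return pairs
```

Pairs in this chosen/rejected format can then be consumed by a standard preference-optimization trainer, for example LLaMA-Factory's DPO recipe.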
The training data for agentic fine-tuning is synthesized automatically by a multi-agent data synthesis pipeline and consists of progressively complex instruction sets at three levels, designed to help agents build a strong foundation of environmental knowledge. The UI agent faithfully generates responses to synthesized queries, the adversarial agent generates negative samples, and the critic agent grades process rewards. As shown in Figure (b), given a task, the pipeline autonomously collects data at each step, covering basic environmental knowledge, simple instruction knowledge, and process preference knowledge in interactive environments. The training dataset is publicly available on ModelScope; model training was implemented with LLaMA-Factory.
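For illustration, here is a hypothetical record layout for the three knowledge levels the pipeline produces; the field names are assumptions and do not reflect the released dataset schema.

```python
# Hypothetical record layouts for the three knowledge levels; field names
# are illustrative and do not reflect the released dataset schema.

def level1_record(observation: str, description: str) -> dict:
    # Basic environmental knowledge: teach the agent to read the UI state.
    return {"instruction": "Describe the current screen.",
            "input": observation, "output": description}

def level2_record(task: str, observation: str, action: str) -> dict:
    # Simple instruction knowledge: map a task + UI state to the next action.
    return {"instruction": task, "input": observation, "output": action}

def level3_record(task: str, observation: str, chosen: str, rejected: str) -> dict:
    # Process preference knowledge: critic-graded action pairs for
    # DPO-style preference training.
    return {"prompt": f"{task}\n{observation}",
            "chosen": chosen, "rejected": rejected}
```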
We are currently working on open-sourcing the code and models. The following components are available:
- Data collection pipeline
- Model inference code
- Detailed description
- Model weights
- Evaluation scripts
This repository is licensed under the Apache-2.0 License. All open-sourced data is for research purposes only.
The code is based on AndroidWorld, LLaMA-Factory, and GPTSwarm.
If you find our work beneficial, please cite:
@article{he2025enhancing,
  title={Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization},
  author={He, Zhitao and Liu, Zijun and Li, Peng and Fung, May and Yan, Ming and Zhang, Ji and Huang, Fei and Liu, Yang},
  journal={arXiv preprint arXiv:2502.14496},
  year={2025}
}