
LLMRsearcher-code

arXiv | Website

⚠ We are still testing the code repository; until testing is complete, the code may not work properly. You can also find our simplified Python implementation of the underwater information collection task under the RL_task folder in this repository. Please feel free to contact @360ZMEM (Guanwen Xie) if you encounter any issues.

  • Oct 22, 2024 - We updated the full Python code.
  • Oct 16, 2024 - We released our early-version code.
  • Sep 21, 2024 - The GitHub page and supplementary material website of LLMRsearcher became available.

This repository contains the LangChain🦜️🔗 0.3 implementation code for the paper "LLMs as Efficient Reward Function Searchers for Custom-Environment MORL".

(Overview figure: llmrsearcher)

Get Started

Run this command to install dependencies:

pip install -r ./requirements.txt

Our paper uses the OpenAI API for language model queries, so make sure to specify the OpenAI API key and base URL (if applicable) in the config.py file:

openai_api_key = 'your_api_key'
openai_api_base = 'https://api.openai.com/' # an example
openai_model = 'gpt-4o-mini'
opensource_model = None # 'meta-llama/Meta-Llama-3-70B' / 'Qwen/Qwen2.5-72B' ...

Alternatively, if you wish to use open-source LLMs such as Llama or Qwen, specify the model's name in opensource_model. Note that setting this value overrides the openai_model argument.
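For reference, below is a minimal sketch of how these settings could be wired into a LangChain 0.3 chat model. The variable names match config.py, but the selection logic and the ChatOpenAI usage are illustrative assumptions (they assume any open-source model is served through an OpenAI-compatible endpoint), not the repository's exact code:

# Illustrative sketch, not the repository's exact code: build a LangChain 0.3
# chat model from the settings in config.py. Assumes open-source models are
# served through an OpenAI-compatible endpoint (e.g., vLLM).
from langchain_openai import ChatOpenAI

import config

# opensource_model, when set, takes precedence over openai_model.
model_name = config.opensource_model or config.openai_model

llm = ChatOpenAI(
    model=model_name,
    api_key=config.openai_api_key,
    base_url=config.openai_api_base,
)

print(llm.invoke("Reply with the single word OK.").content)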

Reward Code Search

The following script executes the reward code design and feedback process unattended:

python reward_code_search.py

Alternatively, you can execute the reward code design and feedback process separately. Run this command to generate the reward function code:

python ERFSL/reward_code_gen.py

The following script repeatedly validates the reward components through training and revises them with the reward critic until every component meets its corresponding requirement:

python reward_code_tfeedback.py
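At a high level, these two scripts alternate reward code generation, RL training, and critic feedback until every requirement is satisfied. The sketch below is only a schematic of that loop; the helper names (generate_reward_code, train_and_evaluate, critique) and the example requirements are hypothetical placeholders, not the repository's actual API:

# Schematic of the reward code design / feedback loop. All helpers below are
# hypothetical placeholders, not the repository's actual functions.

def generate_reward_code(requirements, feedback=None):
    """Query the LLM for reward component code (placeholder)."""
    return "def reward(state, action): return 0.0"

def train_and_evaluate(reward_code):
    """Train the policy with the candidate reward and return per-component metrics (placeholder)."""
    return {}

def critique(metrics, requirements):
    """Ask the reward critic which components violate their requirements (placeholder)."""
    return []  # an empty list means every component meets its requirement

requirements = ["keep energy consumption within budget", "maximize data collection rate"]
feedback = None
for iteration in range(10):            # bounded number of revision rounds
    code = generate_reward_code(requirements, feedback)
    metrics = train_and_evaluate(code)
    violations = critique(metrics, requirements)
    if not violations:                  # every component satisfies its requirement
        break
    feedback = violations               # feed the critic's comments back to the generator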

Reward Weight Search

Similarly, you can run this script to execute the reward weight generation and search process unattended:

python reward_weight_search.py

Alternatively, first run this command to generate initial weight groups:

python ERFSL/reward_weight_initializer.py
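For intuition, the initializer produces a set of candidate weight groups that are trained and compared in the subsequent search. The example below only illustrates the general shape of such a set; the component names and the exact format produced by ERFSL/reward_weight_initializer.py are assumptions:

# Illustrative example only: a possible set of initial weight groups for a task
# with three reward components. Component names are hypothetical.
initial_weight_groups = [
    {"w_collect": 1.0, "w_energy": 0.1, "w_collision": 0.5},
    {"w_collect": 1.0, "w_energy": 1.0, "w_collision": 0.5},
    {"w_collect": 0.5, "w_energy": 1.0, "w_collision": 1.0},
]

# Each group is trained independently; the weight searcher then compares the
# resulting multi-objective metrics and proposes refined groups.
for group in initial_weight_groups:
    print(group)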

NOTE: You can interrupt the script at any time; when you run it again, it will resume training from the last complete iteration before the interruption. If this is not what you want, remove all temporary files in the reward_funcs folder, or pass this argument:

python reward_weight_search.py --restart

Custom Environment Guide

ERFSL can also be deployed to your custom MORL environment, and it effectively benefits from human prior knowledge, although it also works well without it. For more information, refer to the document custom_guide.md.

Citation

If you find this work useful, please cite:

@article{xie2024llmrsearcher,
  title={Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning},
  author={Xie, Guanwen and Xu, Jingzehua and Yang, Yiyuan and Ren, Yong and Ding, Yimian and Zhang, Shuai},
  journal={arXiv preprint arXiv:2409.02428},
  year={2024}
}
