⚠ We are still testing the code repository, so the code may not yet work properly. You can also find our simplified Python implementation of the underwater information collection task under the RL_task folder in the code repository. Please feel free to contact @360ZMEM (Guanwen Xie) if you encounter any issues.
- Oct 22, 2024 - We updated the full Python code.
- Oct 16, 2024 - We released our early-version code.
- Sep 21, 2024 - The GitHub page and supplementary material of LLMRsearcher are now available.
This repository contains the langchain🦜️🔗 0.3 implementation code for the paper LLMs as Efficient Reward Function Searchers for Custom-Environment MORL.
Run this command to install dependencies:
pip install -r ./requirements.txt
Our paper uses the OpenAI API for language model queries. Therefore, make sure to specify the OpenAI API key and base URL (if applicable) in the config.py file:
openai_api_key = 'your_api_key'
openai_api_base = 'https://api.openai.com/' # an example
openai_model = 'gpt-4o-mini'
opensource_model = None # 'meta-llama/Meta-Llama-3-70B' / 'Qwen/Qwen2.5-72B' ...
Alternatively, if you wish to use open-source LLMs such as Llama or Qwen, specify the model's name in opensource_model. Note that setting this will override the openai_model argument.
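For reference, the following sketch shows one way these settings could be wired into a langchain 0.3 chat model; this is an illustrative assumption, not necessarily the exact code the repository uses:

```python
# Illustrative sketch only -- the repository's actual wiring may differ.
from langchain_openai import ChatOpenAI

import config

# An open-source model name, if specified, takes precedence over openai_model.
model_name = config.opensource_model or config.openai_model

llm = ChatOpenAI(
    model=model_name,
    api_key=config.openai_api_key,
    base_url=config.openai_api_base,  # drop this if you use the default OpenAI endpoint
)
print(llm.invoke("Hello").content)
```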
The following script executes the reward code design and feedback process unattended:
python reward_code_search.py
Alternatively, you can execute the reward code design and feedback process separately. Run this command to generate the reward function code:
python ERFSL/reward_code_gen.py
The following script repeatedly validates the reward components through training and revises them using the reward critic until all components meet the corresponding requirements:
python reward_code_tfeedback.py
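Conceptually, the feedback loop behaves like the hypothetical sketch below; train_and_evaluate, check_requirements, and revise_with_critic are placeholder names for illustration, not the repository's actual API:

```python
# Hypothetical outline of the training-feedback loop (placeholder names).
def search_reward_code(components, requirements, max_iters=5):
    for _ in range(max_iters):
        metrics = train_and_evaluate(components)            # run RL training with the current reward code
        failed = check_requirements(metrics, requirements)  # components that miss their requirement
        if not failed:
            return components                               # all requirements satisfied
        # Ask the LLM reward critic to revise only the failing components.
        components = revise_with_critic(components, failed, metrics)
    return components
```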
Similarly, you can run this script to execute the reward weight generation and search process unattended:
python reward_weight_search.py
Alternatively, first run this command to generate initial weight groups:
python ERFSL/reward_weight_initializer.py
NOTE: You can interrupt script execution at any time; if you run it again, it will resume training from the last full iteration completed before the interruption. If this is not what you want, remove all temporary files in the reward_funcs folder, or specify this argument:
python reward_weight_search.py --restart
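The resume behavior can be pictured roughly as follows (hypothetical file pattern and variable names; see reward_weight_search.py for the actual logic):

```python
# Rough sketch of resuming from temporary files (hypothetical file pattern).
import argparse, glob, os

parser = argparse.ArgumentParser()
parser.add_argument("--restart", action="store_true",
                    help="discard previous progress and start from scratch")
args = parser.parse_args()

checkpoints = sorted(glob.glob("reward_funcs/iter_*.json"))  # placeholder pattern
if args.restart:
    for path in checkpoints:
        os.remove(path)
    start_iter = 0
else:
    start_iter = len(checkpoints)  # continue from the last full iteration
```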
ERFSL can also be deployed to your own custom MORL environment, and it can effectively leverage human prior knowledge, although it also works well without any. For more information, refer to the document custom_guide.md.
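As a purely illustrative example (the actual interface requirements are described in custom_guide.md), a custom MORL environment typically exposes each objective as a separate reward component, which a searched weight vector then scalarizes:

```python
# Purely illustrative, with hypothetical names -- not the interface from custom_guide.md.
import numpy as np

def reward_components(info: dict) -> np.ndarray:
    """Return each objective separately so a weight vector can combine them."""
    r_task   = info["data_collected"]     # e.g. information collected this step
    r_energy = -info["energy_used"]       # penalize energy consumption
    r_safe   = -float(info["collision"])  # penalize unsafe behaviour
    return np.array([r_task, r_energy, r_safe])

def scalarized_reward(components: np.ndarray, weights: np.ndarray) -> float:
    """Linear scalarization used to train a standard single-policy RL agent."""
    return float(np.dot(weights, components))

# Example: evaluate one candidate weight group on a single transition.
w = np.array([1.0, 0.1, 0.5])
info = {"data_collected": 2.0, "energy_used": 0.3, "collision": False}
print(scalarized_reward(reward_components(info), w))
```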
If you find this work useful, please cite:
@article{xie2024llmrsearcher,
title={Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning},
author={Xie, Guanwen and Xu, Jingzehua and Yang, Yiyuan and Ren, Yong and Ding, Yimian and Zhang, Shuai},
journal={arXiv preprint arXiv:2409.02428},
year={2024}
}