
SFTLLMs_for_ChemText_Mining

Download

git clone https://github.com/zw-SIMM/SFTLLMs_for_chemtext_mining
cd SFTLLMs_for_chemtext_mining

🖊 Datasets and Code

Preprocessed data, fine-tuning code, and README workflows are located in the corresponding folders:

  • Paragraph2Compound/

  • Paragraph2RXNRole/prod/ and Paragraph2RXNRole/role/

  • Paragraph2MOFInfo/

  • Paragraph2NMR/

  • Paragraph2Action/ (dataset derived from the Pistachio dataset, which is available upon request)

💿 Fine-tuning ChatGPT (GPT-3.5-Turbo) and Prompt-Engineering GPT-4

Environment (OS: Windows or Linux)

pip install openai
pip install pandas

Note: the fine-tuning code changed slightly when the openai package was updated to v1.0.0+. Here, we provide the latest code.
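
For orientation, here is a minimal sketch of the v1.0.0+ fine-tuning workflow (the file name, model suffix, and prompt are placeholders, not the repo's exact settings):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload the chat-format JSONL training file
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# 2. Launch the fine-tuning job
job = client.fine_tuning.jobs.create(training_file=train_file.id, model="gpt-3.5-turbo")

# 3. After the job completes, query the fine-tuned model
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:org::xxxx",  # placeholder; returned as job.fine_tuned_model
    messages=[{"role": "user", "content": "Extract the compounds from: ..."}],
)
print(response.choices[0].message.content)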

Implementation

Specific scripts for each task are in the corresponding folders.

All notebooks for fine-tuning and prompt-engineering GPTs (GPT-4, GPT-3.5), as well as for evaluation on each task, have been released!

Demo of Fine-tuning ChatGPT on a Small Dataset

Here, we give an example notebook of fine-tuning ChatGPT on 25 Paragraph2NMR examples in demo/fine-tuning_chatgpt_on_25_paragraph2NMR_data.ipynb, covering the following steps (a data-formatting sketch follows the list):

  • Preprocessing
  • Training
  • Inference
  • Evaluation
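
As a hedged illustration of the preprocessing step, the snippet below converts paragraph/NMR pairs into the chat-format JSONL that the OpenAI fine-tuning API expects (the file name, column names, and system prompt are hypothetical; see the demo notebook for the exact format used):

import json
import pandas as pd

# Hypothetical input file with "paragraph" and "nmr" columns
df = pd.read_csv("paragraph2nmr_train.csv")

with open("train.jsonl", "w") as f:
    for _, row in df.iterrows():
        record = {"messages": [
            {"role": "system", "content": "Extract the NMR data from the paragraph."},
            {"role": "user", "content": row["paragraph"]},
            {"role": "assistant", "content": row["nmr"]},
        ]}
        f.write(json.dumps(record) + "\n")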

📀 Fine-tuning Open-source Language Models (Mistral, Llama3, Bart, T5)

Environment (Linux)

mamba create -n llm python=3.10
mamba activate llm 
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas numpy ipywidgets tqdm
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch==2.1.2  transformers==4.38.2 datasets tiktoken wandb==0.11 openpyxl
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple peft==0.8.0 accelerate bitsandbytes safetensors jsonlines
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple vllm==0.3.1
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple trl==0.7
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorboardX tensorboard
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple textdistance nltk matplotlib seaborn seqeval
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple modelscope
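
A quick sanity check after installation (a convenience we add here, not part of the original workflow):

python -c "import torch, transformers, peft, trl; print(torch.__version__, torch.cuda.is_available(), transformers.__version__, peft.__version__, trl.__version__)"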

Pretrained Models Downloads

Open-source pretrained models (Llama3, Llama2, Mistral, Bart, T5) can be downloaded from Hugging Face or ModelScope.

Here is an example script for downloading pretrained models from ModelScope on a Linux server:

from modelscope import snapshot_download
model_dir = snapshot_download("LLM-Research/Meta-Llama-3-8B-Instruct", revision='master', cache_dir='/home/pretrained_models')
model_dir = snapshot_download('AI-ModelScope/Mistral-7B-Instruct-v0.2', revision='master', cache_dir='/home/pretrained_models')
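
The returned model_dir is a local path, so it can be passed straight to transformers (a minimal loading sketch; device_map="auto" relies on the accelerate package installed above):

from transformers import AutoModelForCausalLM, AutoTokenizer

# model_dir is the local path returned by snapshot_download above
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype="auto", device_map="auto")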

Fine-tuning

The code and tutorials for fine-tuning language models (ChatGPT, Llama3, Llama2, Mistral, Bart, T5) on each task are in the corresponding folders.
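
For orientation, here is a minimal LoRA fine-tuning sketch against the pinned peft/trl versions from the environment above (the dataset path, target modules, and hyperparameters are illustrative placeholders; the per-task scripts in each folder are the authoritative versions):

from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical dataset where each record has a single "text" field
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# LoRA adapter configuration (rank and target modules are illustrative)
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="/home/pretrained_models/LLM-Research/Meta-Llama-3-8B-Instruct",
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    peft_config=peft_config,
    args=TrainingArguments(output_dir="outputs", per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=2e-4),
)
trainer.train()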