
SFTLLMs_for_ChemText_Mining

Download

git clone https://github.com/zw-SIMM/SFTLLMs_for_chemtext_mining
cd SFTLLMs_for_chemtext_mining

🖊 Datasets and Codes

Preprocessed data, fine-tuning code, and README workflows are placed in the corresponding folders:

  • Paragraph2Compound/

  • Paragraph2RXNRole/prod/ and Paragraph2RXNRole/role/

  • Paragraph2MOFInfo/

  • Paragraph2NMR/

  • Paragraph2Action/ (the dataset is derived from the Pistachio dataset, which is available upon request)

💿 Fine-tuning ChatGPT (GPT-3.5-Turbo) and Prompt-Engineering GPT-4

Environment (OS: Windows or Linux)

pip install openai
pip install pandas

Note: the fine-tuning code changed slightly when the openai package was updated to v1.0.0+. Here, we provide code for the latest version.
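For orientation, a minimal sketch of launching a fine-tuning job with the v1.0.0+ openai client; the training file name is a placeholder, not the repository's actual setting:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a chat-format JSONL training file (placeholder path)
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# Launch a fine-tuning job on GPT-3.5-Turbo
job = client.fine_tuning.jobs.create(training_file=train_file.id, model="gpt-3.5-turbo")
print(job.id, job.status)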

Implementation

Specific scripts for each task are in the corresponding folders.

All notebooks for fine-tuning and prompt-engineering GPTs (GPT-4, GPT-3.5), as well as for evaluating each task, have been released!

Demo of Fine-tuning ChatGPT on a Small Dataset

Here, we give an example notebook of fine-tuning ChatGPT on 25 Paragraph2NMR examples in demo/fine-tuning_chatgpt_on_25_paragraph2NMR_data.ipynb (a data-format sketch follows the list), including:

  • Preprocessing
  • Training
  • Inferencing
  • Evaluating
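For the preprocessing step, the OpenAI fine-tuning API expects chat-format JSONL. A minimal sketch, assuming hypothetical "paragraph" and "nmr" columns and a placeholder file name (the notebook's actual names may differ):

import json
import pandas as pd

df = pd.read_csv("paragraph2nmr_train.csv")  # placeholder file name

with open("train.jsonl", "w") as f:
    for _, row in df.iterrows():
        record = {"messages": [
            {"role": "system", "content": "Extract the NMR data from the paragraph."},
            {"role": "user", "content": row["paragraph"]},
            {"role": "assistant", "content": row["nmr"]},
        ]}
        f.write(json.dumps(record) + "\n")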

📀 Fine-tuning Open-source Language Models (Mistral, Llama3, Bart, T5)

Environment (Linux)

mamba create -n llm python=3.10
mamba activate llm 
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas numpy ipywidgets tqdm
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch==2.1.2  transformers==4.38.2 datasets tiktoken wandb==0.11 openpyxl
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple peft==0.8.0 accelerate bitsandbytes safetensors jsonlines
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple vllm==0.3.1
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple trl==0.7
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorboardX tensorboard
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple textdistance nltk matplotlib seaborn seqeval
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple modelscope
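Optionally, a quick sanity check that the pinned packages import correctly and a GPU is visible:

import torch, transformers, peft, trl
print(torch.__version__, transformers.__version__, peft.__version__, trl.__version__)
print("CUDA available:", torch.cuda.is_available())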

Pretrained Model Downloads

Open-source pretrained models (Llama3, Llama2, Mistral, Bart, T5) can be downloaded from Hugging Face or ModelScope.

Here is an example script for downloading pretrained models from ModelScope on a Linux server (a Hugging Face alternative follows):

from modelscope import snapshot_download
model_dir = snapshot_download("LLM-Research/Meta-Llama-3-8B-Instruct", revision='master', cache_dir='/home/pretrained_models')
model_dir = snapshot_download('AI-ModelScope/Mistral-7B-Instruct-v0.2', revision='master', cache_dir='/home/pretrained_models')
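Equivalently, a sketch of downloading from Hugging Face with huggingface_hub (installed as a dependency of transformers); note that gated repos such as Llama require accepting the license and logging in with a token:

from huggingface_hub import snapshot_download

# Cache path is a placeholder; gated repos need HF_TOKEN or `huggingface-cli login`
model_dir = snapshot_download("meta-llama/Meta-Llama-3-8B-Instruct", cache_dir="/home/pretrained_models")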

Fine-tuning

The code and tutorials for fine-tuning language models (ChatGPT, Llama3, Llama2, Mistral, Bart, T5) on each task are in the corresponding folders.
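As rough orientation (not the repository's actual training script), a minimal LoRA fine-tuning sketch against the pinned peft==0.8.0 / trl==0.7 versions; the model path, dataset file, "text" field, and hyperparameters are all placeholders:

from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_path = "/home/pretrained_models/LLM-Research/Meta-Llama-3-8B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Placeholder dataset: one prompt+completion string per example in a "text" field
dataset = load_dataset("json", data_files="train.jsonl")["train"]

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    peft_config=lora_config,
    args=TrainingArguments(output_dir="outputs", num_train_epochs=3, per_device_train_batch_size=2),
)
trainer.train()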
