Issues Converting lit_model.pth to Huggingface Format Using convert_from_litgpt #1847

Open
Kidand opened this issue Dec 2, 2024 · 7 comments
Labels: question (Further information is requested)

Kidand commented Dec 2, 2024

Hello,

I have been using litgpt to pretrain a model, which produces a lit_model.pth file. This model functions correctly when loaded with LLM.load() for inference.

However, when I attempt to convert this model to the Huggingface format using the convert_from_litgpt script provided by litgpt, it outputs a model.pth file. This file doesn't meet Huggingface's expected format, and when I try to load it using Huggingface's tools, I receive the following error:

import torch
from transformers import AutoModel

state_dict = torch.load("model.pth")
model = AutoModel.from_pretrained(
    model_id, local_files_only=True, state_dict=state_dict
)

OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory

I am unsure how to resolve this issue. Is there a step I'm missing in the conversion process, or is there a compatibility issue with the convert_from_litgpt script?

Additionally, I noticed that the .pth file obtained after training with lit-gpt is twice the size of the original model. Could you please explain why this is happening?

Thank you for your assistance.
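
(Note for readers hitting the same OSError: from_pretrained looks up weight files by name, so a directory that contains only model.pth triggers this error even when a state_dict is passed. A minimal workaround, assuming the same directory also holds a config.json that matches the converted weights, is to re-save the converted state dict under a file name from_pretrained recognizes; the paths below are placeholders.)

import torch

# Placeholder path: the directory produced by `litgpt convert_from_litgpt`,
# with a matching config.json copied next to the converted weights.
state_dict = torch.load("converted/model.pth", map_location="cpu")
torch.save(state_dict, "converted/pytorch_model.bin")  # a file name from_pretrained looks for

Whether the re-saved weights then load cleanly still depends on the state-dict keys matching the class being instantiated (AutoModel vs. AutoModelForCausalLM); see the key-comparison sketch further down the thread.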

Kidand added the question label on Dec 2, 2024
Andrei-Aksionov (Collaborator) commented:

Hello @Kidand

Could you verify that you did all the steps from this tutorial?

Kidand (Author) commented Dec 3, 2024

> Hello @Kidand
>
> Could you verify that you did all the steps from this tutorial?

Yes, I followed the tutorial. I noticed that after AutoModel.from_pretrained() reads the config.json file, it looks for model files like .bin and .safetensors but cannot properly load .pth files or a passed-in state_dict. Why can't the converted files be in .bin or .safetensors format?
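
One way to end up with a weight file that from_pretrained can find is to rebuild the Transformers model from the copied config.json, load the converted state dict into it, and re-save everything with save_pretrained, which writes model.safetensors. This is only a sketch, assuming config.json describes the same architecture as the converted weights and that the exported keys use the causal-LM layout; all paths are placeholders.

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder paths: "converted/" holds model.pth from convert_from_litgpt
# plus a config.json describing the same architecture.
config = AutoConfig.from_pretrained("converted")
model = AutoModelForCausalLM.from_config(config)

state_dict = torch.load("converted/model.pth", map_location="cpu")
model.load_state_dict(state_dict)  # if keys are reported missing/unexpected, AutoModel may be the right class instead

model.save_pretrained("converted_hf")  # writes config.json + model.safetensors

After that, AutoModelForCausalLM.from_pretrained("converted_hf", local_files_only=True) should no longer hit the file-name check.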

2533245542 commented:

I am having the same problem, except that I copied the config file.

import torch
from transformers import AutoModel

state_dict = torch.load("out/model.pth")
model = AutoModel.from_pretrained(
    'out', local_files_only=True, state_dict=state_dict
)

The directory out contains model.pth, which was generated by the conversion script, and config.json, which I copied from the litgpt-trained model directory.

I also got the error:

OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory xxx/out.
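
A quick way to see why the state_dict argument alone does not help (and whether AutoModel is even the right class for these keys) is to compare the key names in the exported file with the keys the instantiated model expects; the path below is a placeholder.

import torch
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM

sd = torch.load("out/model.pth", map_location="cpu")
print(sorted(sd.keys())[:5])  # key names written by convert_from_litgpt

config = AutoConfig.from_pretrained("out")  # the copied config.json
print(sorted(AutoModel.from_config(config).state_dict().keys())[:5])
print(sorted(AutoModelForCausalLM.from_config(config).state_dict().keys())[:5])

If the exported keys start with "model." and include lm_head.weight, they correspond to the causal-LM class rather than the bare AutoModel.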

2533245542 commented:

Is it possible to have a script that converts directly to a Huggingface-readable format?

For example, by appending something like this at the end of the original conversion script:

state_dict = torch.load("out/model.pth")
model = AutoModel.from_pretrained(
    'out', local_files_only=True, state_dict=state_dict
)
tokenizer.save_pretrained(path)
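
A fuller version of that ending, sketched under the same assumptions (a config.json matching the converted weights sits next to model.pth; directory names such as hf_out and the tokenizer path are placeholders), would also save the model itself so the output directory becomes directly loadable:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

config = AutoConfig.from_pretrained("out")                 # the copied config.json
model = AutoModelForCausalLM.from_config(config)
model.load_state_dict(torch.load("out/model.pth", map_location="cpu"))

model.save_pretrained("hf_out")                            # model.safetensors + config.json
tokenizer = AutoTokenizer.from_pretrained("my_tokenizer")  # placeholder tokenizer directory
tokenizer.save_pretrained("hf_out")

# afterwards, this should load without the OSError:
# model = AutoModelForCausalLM.from_pretrained("hf_out", local_files_only=True)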

Andrei-Aksionov (Collaborator) commented:

Yes, I guess we can do something like that.

cc @rasbt

2533245542 commented:

Any updates on this? I have:

python=3.10.15
torch=2.4.1
transformers=4.46.3

and for litgpt:

>>> from importlib.metadata import version
>>> print(version("litgpt"))
0.5.3

My config for running litgpt pretrain --config config.yaml:

model_name: mymodel
out_dir: out/custom-model
data:
  class_path: litgpt.data.TextFiles
  init_args:
    train_data_path: mycorpus

    
tokenizer_dir: my_tokenizer

train:
  max_tokens: 600_000_0
  max_seq_length: 8192
  micro_batch_size: 1
  
model_config:

  ## llama3.2 1b
  block_size: 131072
  padded_vocab_size: 128256
  n_layer: 16
  n_embd: 2048
  n_head: 32
  n_query_groups: 8
  rotary_percentage: 1.0
  parallel_residual: false
  bias: false
  norm_class_name: "RMSNorm"
  mlp_class_name: "LLaMAMLP"
  intermediate_size: 8192
  rope_base: 500000
  rope_adjustments:
    factor: 32.0
    low_freq_factor: 1.0
    high_freq_factor: 4.0
    original_max_seq_len: 8192

  vocab_size: 115418


devices: 2
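
For a from-scratch pretrain like this there is no upstream Huggingface checkpoint to copy config.json from, so the config has to be written by hand. A rough, unverified mapping of the model_config above onto a transformers LlamaConfig might look like the sketch below; every field correspondence (and whether the padded or unpadded vocab size matches the exported weights) is an assumption that should be checked against the tensor shapes in the converted model.pth.

from transformers import LlamaConfig

# Assumed mapping from the litgpt model_config above; verify against the
# converted checkpoint before relying on it.
config = LlamaConfig(
    vocab_size=128256,               # padded_vocab_size (or 115418 if the export keeps the unpadded vocab)
    hidden_size=2048,                # n_embd
    intermediate_size=8192,
    num_hidden_layers=16,            # n_layer
    num_attention_heads=32,          # n_head
    num_key_value_heads=8,           # n_query_groups
    max_position_embeddings=131072,  # block_size
    rope_theta=500000.0,             # rope_base
    rope_scaling={                   # rope_adjustments
        "rope_type": "llama3",
        "factor": 32.0,
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_max_position_embeddings": 8192,
    },
    attention_bias=False,            # bias: false
)
config.save_pretrained("converted")  # placeholder: the directory holding the converted model.pth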

pe-hy commented Dec 8, 2024

Same issue. Is it simply impossible to load a trained model converted to HF format using AutoModel?
