
failing to use BART models - Breaking the generation loop! #42

Open
kontramind opened this issue Oct 18, 2023 · 3 comments

@kontramind commented Oct 18, 2023

Hi,

I'm trying to use, e.g., 'sshleifer/distilbart-cnn-6-6' and failing with the following message:

An error has occurred: Breaking the generation loop! To address this issue, consider fine-tuning the GReaT model for an longer period. This can be achieved by increasing the number of epochs. Alternatively, you might consider increasing the max_length parameter within the sample function. For example: model.sample(n_samples=10, max_length=2000) If the problem persists despite these adjustments, feel free to raise an issue on our GitHub page at: https://github.com/kathrinse/be_great/issues
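
For context, here is roughly the flow that triggers the error (a minimal sketch assuming the standard be_great API and the sklearn California housing frame; the hyperparameter values are placeholders, and our full training loop is in a comment below):

```python
from be_great import GReaT
from sklearn.datasets import fetch_california_housing

# Minimal sketch with placeholder hyperparameters; we assume the sklearn
# California housing data here. The error above is raised during sampling.
data = fetch_california_housing(as_frame=True).frame
great = GReaT('sshleifer/distilbart-cnn-6-6', batch_size=32, epochs=50)
great.fit(data)
samples = great.sample(n_samples=10, max_length=2000)
```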

Aleksandar

@unnir (Collaborator) commented Oct 18, 2023

Hi,

Could you please provide your training hyperparameters or your whole Python code?

@kontramind (Author) commented Oct 20, 2023

> Hi,
>
> Could you please provide your training hyperparameters or your whole Python code?

Hi @unnir,

Sure, here is the code. We run training on the California dataset.
Keep in mind that we also introduce a workaround for BelenGarciaPascual's question; Belen and I are collaborating on the same task, and we are planning to work on a proper PR.

In the code below, the total number of epochs is 8 × 9 = 72 (eight outer passes, with one fit per each of the nine columns).

```python
from pathlib import Path
from shutil import rmtree

from be_great import GReaT

# `base` (the LLM name, here 'sshleifer/distilbart-cnn-6-6'), `llm` (a short
# tag used in the directory names) and `data` (the California DataFrame) are
# defined earlier in our script.
batch_size = 32
steps = len(data) // batch_size

epochs = [0, 1, 2, 3, 4, 5, 6, 7]
columns = data.columns

for epoch in epochs:
    for idx, column in enumerate(columns):
        print(f'{epoch=} -> {column=}')
        great = GReaT(
            base,                                        # Name of the large language model used (see HuggingFace for more options)
            batch_size=batch_size,
            epochs=epoch * len(data.columns) + idx + 1,  # Cumulative number of epochs to train up to after this fit
            save_steps=steps,                            # Save model weights every x steps
            logging_steps=steps,                         # Log the loss and learning rate every x steps
            experiment_dir=f"aleks_{llm}_trainer",       # Name of the directory where all intermediate steps are saved
        )

        if epoch == 0 and idx == 0:
            trainer = great.fit(data, conditional_col=column)
        else:
            trainer = great.fit(data, conditional_col=column, resume_from_checkpoint=True)
            # Remove the checkpoint we just resumed from to keep disk usage bounded.
            rmtree(Path(f"aleks_{llm}_trainer") / f"checkpoint-{epoch * len(data.columns) * steps + idx * steps}")

        great.save(f"aleks_california_{llm}")

        for path in Path(f"aleks_{llm}_trainer").iterdir():
            if path.is_dir():
                print(f'{path=}')
```
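
Afterwards we sample along these lines (a sketch: `load_from_dir` is the standard be_great loader, and the `n_samples`/`max_length` values are taken from the example in the error message):

```python
from be_great import GReaT

# Sketch of the sampling step where the error above is raised; the model is
# loaded from the directory written by great.save() in the loop above.
great = GReaT.load_from_dir(f"aleks_california_{llm}")
samples = great.sample(n_samples=10, max_length=2000)
```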

@unnir (Collaborator) commented Oct 20, 2023

My suggestion, again, is to train the model longer, but I will try to reproduce the error and debug it.
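
Concretely, that suggestion amounts to something like the following sketch (the epoch count is an arbitrary illustrative value; `base` and `data` are as in the code above, and the larger `max_length` follows the error message's own example):

```python
from be_great import GReaT

# Sketch of the suggested remedy: train for more epochs (the count here is
# an arbitrary illustrative value), then sample with a larger max_length.
great = GReaT(base, batch_size=32, epochs=200)
great.fit(data)
samples = great.sample(n_samples=10, max_length=2000)
```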
