Quality of training results? #1851
Replies: 4 comments 8 replies
-
Which nanoGPT configuration and which data preparation did you use?
-
Here's the output of training-text-from-scratch with the latest default parameters from the examples page. It seems to be an exact chunk of the input text:
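If the sample looks like a verbatim chunk of the training text, that suggests memorization rather than generalization. A quick sanity check is to search the training corpus for the longest substring of the generated sample that appears verbatim in it; a minimal sketch (the corpus and sample strings below are made up for illustration):

```python
def longest_verbatim_overlap(corpus: str, sample: str, min_len: int = 30) -> int:
    """Length of the longest substring of `sample` (at least `min_len`
    characters) that occurs verbatim in `corpus`; 0 if none does."""
    best = 0
    n = len(sample)
    for i in range(n - min_len + 1):
        if sample[i:i + min_len] not in corpus:
            continue
        # grow the verbatim match greedily to the right
        j = i + min_len
        while j < n and sample[i:j + 1] in corpus:
            j += 1
        best = max(best, j - i)
    return best

# illustrative data, not the thread's actual training text
corpus = "the quick brown fox jumps over the lazy dog " * 3
sample = "fox jumps over the lazy dog and then some novel text"
print(longest_verbatim_overlap(corpus, sample, min_len=10))  # → 28
```

A long overlap relative to the sample length is a strong hint that the model is reproducing training chunks; for real corpora a suffix-automaton or n-gram index would scale better than this quadratic substring scan.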
-
1 training epoch output:
perplexity on input text:
2nd iteration of training output:
2nd iteration perplexity:
3rd iteration output:
3rd iteration perplexity:
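For reference, the perplexity figures printed between iterations are just the exponential of the mean negative log-likelihood over the evaluated tokens, so they can be reproduced from per-token probabilities. A minimal sketch (the probability values are made up for illustration):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood over the tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# made-up probabilities the model assigned to each reference token
probs = [0.25, 0.5, 0.125, 0.25]
print(round(perplexity(probs), 4))  # → 4.0
```

Lower is better; a perplexity near the vocabulary size means the model is guessing roughly uniformly, while a value near 1 on the training text itself is another symptom of memorization.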
-
I have trained on 100 MB of medical records from scratch with the aim of text completion. Training took one week on a Mac Pro (48 GB RAM). The same training data gave excellent results when LoRA-finetuning a 7B LLaMA (PyTorch, GPU cluster).
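For context on why LoRA finetuning is so much cheaper than from-scratch training: the base weight W stays frozen and only a low-rank update is learned, so the effective weight is W + (alpha/r)·B·A with B of shape d×r and A of shape r×k for a small rank r. A minimal sketch of the merge step in plain Python (shapes and values are illustrative, not the poster's actual setup):

```python
def matmul(X, Y):
    """Naive matrix multiply for nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def merge_lora(W, A, B, alpha, r):
    """Effective weight after merging a LoRA adapter:
    W_eff = W + (alpha / r) * B @ A, with B: d x r and A: r x k."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

# toy 2x2 base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]    # d x r = 2 x 1
A = [[0.5, 0.5]]      # r x k = 1 x 2
print(merge_lora(W, A, B, alpha=2.0, r=1))  # → [[2.0, 1.0], [2.0, 3.0]]
```

During training only A and B, i.e. r·(d+k) parameters per layer, receive gradients, which is why a 7B model can be finetuned on far less memory than training it from scratch would need; after merging, inference costs the same as the base model.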
-
I've been experimenting with training-text-from-scratch. Compared to training on the same data with nanoGPT, the results seem considerably worse (given roughly the same training time on CPU). I've been using the default parameters, which differ between the two code bases (llama.cpp vs. nanoGPT). I've noticed that the loading of the training data is somewhat different as well. What are your experiences with llama.cpp training?