I made a few h8 quants expecting them to be better, but it bothered me that there was no information on whether that was actually true. There are sometimes perplexity differences even between exllamav2 versions, so it's hard to tell whether any comparison is conclusive in the long run.
Has there been any comprehensive research on whether changing any of the defaults in convert.py has a beneficial effect? I see anecdotal comments that a particular calibration length and row count is better, and that a 6-bit head is better at 6.0 bpw and below while an 8-bit head is better above 6.0 bpw. Are those differences actually noticeable? Also, is there any benefit to changing the measurement length and rows? (A sketch of the knobs I mean follows below.)
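For concreteness, here is roughly the invocation I have in mind; the flag names and default values are my assumptions based on exllamav2's documentation and may differ between versions:

```bash
# Assumed convert.py flags (check against your exllamav2 version):
#   -b          target bits per weight (bpw)
#   -hb         head bits (the anecdote: 6 at <=6.0 bpw, 8 above)
#   -l  / -r    calibration length / rows (defaults assumed: 2048 / 100)
#   -ml / -mr   measurement length / rows (defaults assumed: 2048 / 16)
python convert.py \
    -i /path/to/model \
    -o /path/to/workdir \
    -cf /path/to/output \
    -b 6.0 -hb 6 \
    -l 2048 -r 100 \
    -ml 2048 -mr 16
```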
I make my own quants of models I like and keep them in HF repos, and I would like to know I am making them in the best way possible. I understand the benefit of the highest bpw you can squeeze onto your GPUs; it's the rest I am unsure about.
I am starting to do my own testing, from 8B-parameter models up to 120B, but damn it's time-consuming and I really don't want to. 😭
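To make the testing a bit less painful I was thinking of scripting it, something like the sketch below; it assumes exllamav2's bundled test_inference.py supports a perplexity eval via -ed, and the paths are just placeholders:

```bash
# Assumed batch perplexity comparison across several quants of one model;
# test_inference.py and its -m / -ed flags are taken from exllamav2's repo
# and should be verified against the version you have installed.
for quant in /quants/llama3-8b-*bpw; do
    echo "== $quant =="
    python test_inference.py -m "$quant" -ed /data/wikitext-2/test.parquet
done
```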