When using starchat, the model will likely start producing gibberish (in Spanish) after printing out the sequence:
<|system|> <|end|> <|user|>
I added a rudimentary fix that stops generating new tokens once starchat has output <|system|> <|end|> <|user|>:
// check if model is santacoder
if (model.hparams.n_layer <= 30 && embd.back() == 49152) {
    break;
}
// check if model is starcoder
else if (embd.back() == 0) { // TODO: this is only for starcoder
    break;
}
// otherwise assume starchat, since only these 3 models are supported atm
else {
    // break to prevent starchat from talking gibberish
    if (output.find("<|system|>\n<|end|>\n<|user|>") != std::string::npos) {
        break;
    }
}
Would this be a desirable PR? Also, what is the correct way to detect that the model is indeed starcoder, rather than relying on the default?
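To make the proposal easier to review, here is a minimal sketch of the same check pulled out into a standalone helper. The function name `should_stop` and the constant names are hypothetical and do not exist in the code base. The token ids, the layer threshold, and the stop sequence are copied from the snippet above. It still guesses the model from `n_layer`, so it does not answer the detection question.

```cpp
#include <cstdint>
#include <string>

// Hypothetical names; values are taken from the snippet above.
constexpr int32_t kSantaCoderMaxLayers = 30;     // n_layer threshold used to guess santacoder
constexpr int32_t kSantaCoderEndToken  = 49152;  // end token checked for santacoder
constexpr int32_t kStarCoderEndToken   = 0;      // end token checked for starcoder
constexpr const char * kStarChatStopSequence = "<|system|>\n<|end|>\n<|user|>";

// Returns true when generation should stop.
// n_layer    : number of layers of the loaded model
// last_token : id of the most recently generated token
// output     : all text generated so far
bool should_stop(int32_t n_layer, int32_t last_token, const std::string & output) {
    // santacoder: small model that emitted its end token
    if (n_layer <= kSantaCoderMaxLayers && last_token == kSantaCoderEndToken) {
        return true;
    }
    // starcoder: emitted its end token
    if (last_token == kStarCoderEndToken) {
        return true;
    }
    // fallback (assumed starchat): stop once the empty system/user turn shows up
    return output.find(kStarChatStopSequence) != std::string::npos;
}
```

In the generation loop this would be used as `if (should_stop(model.hparams.n_layer, embd.back(), output)) break;`. Keeping the three conditions in one function would also make it easier to swap the `n_layer` heuristic for a proper model-type check later.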