add LLamaReranker and tests #1150
base: master
Conversation
Thanks, from a quick skim this looks great! I'll give it a proper in-depth review this weekend :)
Mainly based on the embedding example of llama.cpp
LLama/LLamaReranker.cs
Outdated
private async Task<(float Score, int Tokens)> GetRelevanceScoreWithTokenCount(string input, string document, CancellationToken cancellationToken = default)
{
    var prompt = $"{input}</s><s>{document}";
Should this be using StrBOS and StrEOS instead of hardcoding them?
LLama/LLamaReranker.cs
Outdated
{
    var prompt = $"{input}</s><s>{document}";
    // Add all of the tokens to the batch
    var tokens = Context.Tokenize(prompt, special: true);
Tokenizing the entire prompt with special: true means that characters in the input or document can be interpreted as special tokens. That's probably not what you want! Would it be better to tokenize input and document separately, and directly insert the EOS/BOS tokens? That also means you don't have to handle EOS/BOS as strings, which is neater.
Something like this:
var inputTokens = Context.Tokenize(input);
var docTokens = Context.Tokenize(document);
var eos = Context.Vocab.EOS!.Value;
var bos = Context.Vocab.BOS!.Value;
LLamaToken[] tokens = [..inputTokens, eos, bos, ..docTokens];
Good suggestion, I will test it.
var query = "what is panda?";
var document = "hi";
var tokens1 = reranker.Context.Tokenize($"{query}</s><s>{document}");
var tokens2 = reranker.Context.Tokenize($"{query}</s><s>{document}", special: true);
var eos = reranker.Context.Vocab.EOS!.Value;
var bos = reranker.Context.Vocab.BOS!.Value;
LLamaToken[] tokens3 = [.. reranker.Context.Tokenize(query), eos, bos, .. reranker.Context.Tokenize(document)];
LLamaToken[] tokens4 = [.. reranker.Context.Tokenize(query), .. reranker.Context.Tokenize(document)];
LLamaToken[] tokens5 = [.. reranker.Context.Tokenize(query, special: true), eos, bos, .. reranker.Context.Tokenize(document, special: true)];
Console.WriteLine(string.Join(" ", tokens1));
Console.WriteLine(string.Join(" ", tokens2));
Console.WriteLine(string.Join(" ", tokens3));
Console.WriteLine(string.Join(" ", tokens4));
Console.WriteLine(string.Join(" ", tokens5));
Console.ReadLine();
I haven't fully understood the meaning of the special param in the Tokenize func, but of the outputs above, only tokens2 and tokens4 meet the requirements. So I ended up going with this approach:
LLamaToken[] tokens = [..inputTokens, ..docTokens];
In addition, using LLamaBatch is 2-3 times faster than a foreach over the documents.
The special flag controls how strings like <s> will be parsed into tokens. With special=false it will be handled as plain text, so the model would see the characters <s> as normal, just the same as the string "Hello". With special=true it would be converted into the special BOS token, instead of text.
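The effect of that flag can be illustrated with a toy tokenizer. This is a purely illustrative Python sketch, not llama.cpp's real tokenizer: the token ids and the one-token-per-character rule are made up.

```python
# Toy illustration of a `special` flag: with special=True the literal
# text "<s>" / "</s>" collapses to a single special-token id; with
# special=False the same characters are tokenized as plain text.

SPECIAL_TOKENS = {"<s>": 1, "</s>": 2}  # hypothetical ids

def toy_tokenize(text: str, special: bool = False) -> list[int]:
    tokens = []
    i = 0
    while i < len(text):
        if special:
            hit = next((t for t in SPECIAL_TOKENS if text.startswith(t, i)), None)
            if hit is not None:
                tokens.append(SPECIAL_TOKENS[hit])
                i += len(hit)
                continue
        tokens.append(100 + ord(text[i]))  # pretend: one token per character
        i += 1
    return tokens

print(toy_tokenize("a</s><s>b", special=True))   # -> [197, 2, 1, 198]
print(toy_tokenize("a</s><s>b", special=False))  # markers become plain text
```

With special=True the query/document separator survives as two special tokens; with special=False the model would just see the nine raw characters, which is why the two settings produce different token streams for the same prompt.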
LLama/LLamaReranker.cs
Outdated
List<float> scores = new List<float>(documents.Count);
foreach (var document in documents)
{
    var score = (await GetRelevanceScoreWithTokenCount(input, document, cancellationToken).ConfigureAwait(false)).Score;
This runs the model once for each document. You could probably adapt this to run all at once for every document fairly easily: when you set up the LLamaBatch, just use a unique LLamaSeqId for each string.
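The pattern being suggested (every query/document pair gets its own sequence id inside one batch, so a single decode call scores all documents) can be sketched in Python. This is a hypothetical helper, not the LLamaSharp LLamaBatch API; the (token, position, seq_id) entry format and token ids are placeholders.

```python
# Sketch of batched reranking: each query/document pair becomes its own
# sequence inside one combined batch, identified by a unique seq_id.
# A real implementation would feed these entries to one decode call.

def build_rerank_batch(query_tokens, docs_tokens, eos, bos):
    """Return (token, position, seq_id) entries for one combined batch."""
    batch = []
    for seq_id, doc_tokens in enumerate(docs_tokens):
        # Layout per pair: query tokens, EOS, BOS, document tokens
        tokens = list(query_tokens) + [eos, bos] + list(doc_tokens)
        for pos, tok in enumerate(tokens):
            batch.append((tok, pos, seq_id))
    return batch

entries = build_rerank_batch([10, 11], [[20], [30, 31]], eos=2, bos=1)
# Two sequences share one batch: seq 0 has 5 entries, seq 1 has 6.
```

Because positions restart at 0 for each seq_id, the sequences are independent, which is what lets one decode replace the per-document foreach loop.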
Sorry about the delay on the review. Overall this looks pretty good, I've suggested a few tweaks but nothing major 👍
I tested with different ContextSize values, and there is something abnormal: why is the scores output different from before when ContextSize = 2048? I haven't found the reason yet.
/// <param name="token"></param>
/// <param name="isSpecialToken"></param>
/// <returns></returns>
public string? LLamaTokenToString(LLamaToken? token, bool isSpecialToken)
Is this still needed after the latest changes? It looks like it's not used any more
No longer needed in LLamaReranker, but I suggest that this could be kept as a public function.
Compare 3a57779 to d99670c
If the score is very similar but not identical then I think that's probably expected (might be worth asking in the llama.cpp repo to double check though).
Is this ready to merge, or did you still have more you wanted to do (e.g. checking consistency with scores etc)?
Let's merge later; I found a new bug in Chinese reranking.
I tested the reranker using a mixed Chinese/English approach; in the test example, the Chinese and English sentences have the same meaning. I am unable to use the llama-embedding tool for comparative testing of content that contains Chinese, as I have not yet found a way to set a wide-char encoding for llama-embedding. I don't think this is caused by LLamaSharp, so it shouldn't block the merge.
#1148