add LLamaReranker and tests #1150


Open · nipeone wants to merge 8 commits into master from feature-llamareranker

Conversation

@nipeone nipeone commented Apr 3, 2025


nipeone commented Apr 3, 2025

This is the test output:
[image]

This is the llama-embedding output:
[image]

llama-embedding script:

```shell
llama-embedding.exe --model jina-reranker-v1-tiny-en-FP16.gguf -p "what is panda?</s><s>hi\nwhat is panda?</s><s>it's a bear\nwhat is panda?</s><s>The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." -c 0 --pooling rank --embd-normalize -1
```

@martindevans (Member) commented:

Thanks, from a quick skim this looks great! I'll give it a proper in-depth review this weekend :)

Is the LLamaReranker class based on some example code from llama.cpp? (just to give me something I can compare against while reviewing).

@nipeone (Author) commented Apr 3, 2025:

Mainly based on llama.cpp's embedding example:


```csharp
private async Task<(float Score, int Tokens)> GetRelevanceScoreWithTokenCount(string input, string document, CancellationToken cancellationToken = default)
{
    var prompt = $"{input}</s><s>{document}";
```
@martindevans (Member):

Should this be using `StrBOS` and `StrEOS` instead of hardcoding them?

```csharp
{
    var prompt = $"{input}</s><s>{document}";
    // Add all of the tokens to the batch
    var tokens = Context.Tokenize(prompt, special: true);
```
@martindevans (Member):

Tokenizing the entire prompt with `special: true` means that characters in the input or document can be interpreted as special tokens. That's probably not what you want!

Would it be better to tokenize input and document separately, and directly insert the EOS/BOS tokens? That also means you don't have to handle EOS/BOS as strings, which is neater.

Something like this:

```csharp
var inputTokens = Context.Tokenize(input);
var docTokens = Context.Tokenize(document);

LLamaToken[] tokens = [..inputTokens, Context.Vocab.EOS, Context.Vocab.BOS, ..docTokens];
```
@nipeone (Author):

Good suggestion, I will test it.

@nipeone (Author):

```csharp
var query = "what is panda?";
var document = "hi";
var tokens1 = reranker.Context.Tokenize($"{query}</s><s>{document}");
var tokens2 = reranker.Context.Tokenize($"{query}</s><s>{document}", special: true);
var eos = reranker.Context.Vocab.EOS!.Value;
var bos = reranker.Context.Vocab.BOS!.Value;
LLamaToken[] tokens3 = [.. reranker.Context.Tokenize(query), eos, bos, .. reranker.Context.Tokenize(document)];
LLamaToken[] tokens4 = [.. reranker.Context.Tokenize(query), .. reranker.Context.Tokenize(document)];
LLamaToken[] tokens5 = [.. reranker.Context.Tokenize(query, special: true), eos, bos, .. reranker.Context.Tokenize(document, special: true)];

Console.WriteLine(string.Join(" ", tokens1));
Console.WriteLine(string.Join(" ", tokens2));
Console.WriteLine(string.Join(" ", tokens3));
Console.WriteLine(string.Join(" ", tokens4));
Console.WriteLine(string.Join(" ", tokens5));
Console.ReadLine();
```

I haven't fully understood the meaning of the `special` parameter in the `Tokenize` function, but of the outputs above, only `tokens2` and `tokens4` meet the requirements. So I ended up going with this approach:

```csharp
LLamaToken[] tokens = [..inputTokens, ..docTokens];
```

In addition, using `LLamaBatch` is 2 to 3 times faster than iterating over the documents one at a time.

@martindevans (Member):

The `special` flag controls how strings like `<s>` are parsed into tokens.

With `special: false` it is handled as plain text, so the model sees the characters `<s>` as normal text, just the same as the string `"Hello"`.

With `special: true` it is converted into the special BOS token instead of text.
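As a concrete illustration (a hypothetical sketch, not code from this PR; `context` stands for an already-created `LLamaContext`, and the exact token IDs depend on the model's vocabulary):

```csharp
// Hypothetical sketch, assuming an existing LLamaContext `context`.

// special: false -- "<s>" is treated as the literal characters '<', 's', '>',
// so the model sees it as ordinary text, just like "Hello".
var literal = context.Tokenize("<s>hello", special: false);

// special: true -- "<s>" is collapsed into the single special BOS token.
var parsed = context.Tokenize("<s>hello", special: true);

// `parsed` should therefore start with context.Vocab.BOS and be shorter
// than `literal`, which contains only ordinary text tokens.
```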

```csharp
List<float> scores = new List<float>(documents.Count);
foreach (var document in documents)
{
    var score = (await GetRelevanceScoreWithTokenCount(input, document, cancellationToken).ConfigureAwait(false)).Score;
```
@martindevans (Member):

This runs the model once for each document. You could probably adapt this to run all at once for every document fairly easily: when you set up the `LLamaBatch`, just use a unique `LLamaSeqId` for each string.
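A rough sketch of that idea (not the PR's actual implementation; it assumes LLamaSharp's `LLamaBatch.Add(token, pos, seqId, logits)` overload, `Context.DecodeAsync`, and that all query/document pairs fit in the context window):

```csharp
// Hypothetical sketch: score every document in one decode pass by giving
// each query/document pair its own sequence id in a shared batch.
var batch = new LLamaBatch();
for (var i = 0; i < documents.Count; i++)
{
    var tokens = Context.Tokenize($"{input}</s><s>{documents[i]}", special: true);
    for (var p = 0; p < tokens.Length; p++)
    {
        // One LLamaSeqId per document; only the last token needs output.
        batch.Add(tokens[p], p, (LLamaSeqId)i, logits: p == tokens.Length - 1);
    }
}
// A single decode then evaluates all sequences together, after which the
// per-sequence outputs can be read back to compute each document's score.
await Context.DecodeAsync(batch, cancellationToken);
```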

@martindevans (Member) left a comment:

Sorry about the delay on the review. Overall this looks pretty good, I've suggested a few tweaks but nothing major 👍


nipeone commented Apr 11, 2025

```
ContextSize=512
Avg = 3597ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3555ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3510ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281

ContextSize=1024
GPU=1.9G
Avg = 3380ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3393ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3333ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281

ContextSize=2048
GPU=2.1G
Avg = 4189ms
0.4367 0.4170 0.3518 0.2965 0.2267 0.2950 0.2669 0.3071 0.3486 0.3063 0.2483 0.2753 0.3544 0.2878 0.2273 0.4158 0.2197 0.2941 0.2468 0.4283
Avg = 4202ms
0.4367 0.4170 0.3518 0.2965 0.2267 0.2950 0.2669 0.3071 0.3486 0.3063 0.2483 0.2753 0.3544 0.2878 0.2273 0.4158 0.2197 0.2941 0.2468 0.4283
Avg = 4197ms
0.4367 0.4170 0.3518 0.2965 0.2267 0.2950 0.2669 0.3071 0.3486 0.3063 0.2483 0.2753 0.3544 0.2878 0.2273 0.4158 0.2197 0.2941 0.2468 0.4283
```

I tested different ContextSize values, and there is something abnormal: why is the score output different from before when ContextSize = 2048? I haven't found the reason yet.

```csharp
/// <param name="token"></param>
/// <param name="isSpecialToken"></param>
/// <returns></returns>
public string? LLamaTokenToString(LLamaToken? token, bool isSpecialToken)
```
@martindevans (Member):

Is this still needed after the latest changes? It looks like it's not used any more.

@nipeone (Author):

It's no longer needed in LLamaReranker, but I suggest exposing this as a public function.

@nipeone force-pushed the feature-llamareranker branch from 3a57779 to d99670c on April 15, 2025
@martindevans (Member) commented:

> I tested different ContextSize values, and there is something abnormal: why is the score output different from before when ContextSize = 2048? I haven't found the reason yet.

If the score is very similar but not identical then I think that's probably expected (might be worth asking in the llama.cpp repo to double check though).

@martindevans (Member) commented:

Is this ready to merge, or did you still have more you wanted to do (e.g. checking consistency with scores etc)?


nipeone commented Apr 18, 2025

> Is this ready to merge, or did you still have more you wanted to do (e.g. checking consistency with scores etc)?

Let's merge later; I found a new bug in Chinese reranking.


nipeone commented Apr 18, 2025

1. All in English:

```csharp
query = "what's the weather today?";
documents = new string[] { "i'm so sad", "The weather is nice today, suitable for an outing.", "hello, good morning", "it's sunny today", "the answer is invalid" };
```

scores = 0.4259 0.5292 0.4416 0.5237 0.3766

2. Query in English, documents mixed:

```csharp
query = "what's the weather today?";
documents = new string[] { "i'm so sad", "今天天气很好,适合出去郊游", "hello, good morning", "今天是晴天", "the answer is invalid" };
```

scores = 0.4259 0.4407 0.4416 0.4087 0.3766

3. Query in Chinese, documents mixed:

```csharp
query = "今天天气怎么样?";
documents = new string[] { "i'm so sad", "今天天气很好,适合出去郊游", "hello, good morning", "it's sunny today", "the answer is invalid" };
```

scores = 0.4408 0.5250 0.4199 0.5139 0.3574

4. All in Chinese:

```csharp
query = "今天天气怎么样?";
documents = new string[] { "我很伤心", "今天天气很好,适合出去郊游", "你好", "今天是晴天", "这个回答无效" };
```

scores = 0.4237 0.5250 0.4345 0.5474 0.4148

I tested the reranker with a mixed Chinese/English setup; in the test examples, the Chinese and English sentences have the same meaning. When the query and documents are all in the same language, whether English or Chinese (cases 1 and 4), the results meet expectations. When the query is in Chinese and the documents are a mixture of Chinese and English (case 3), the ranking is also correct. However, when the query is in English and the documents are a mixture of Chinese and English (case 2), there are some issues with the ranking results.

I am unable to use the llama-embedding tool for comparative testing when the content contains Chinese, as I have not yet found a way to set a wide-character encoding for llama-embedding.

I don't think this is caused by LLamaSharp, so it shouldn't block the merge.
