add LLamaReranker and tests #1150


Open · nipeone wants to merge 8 commits into master from feature-llamareranker

Conversation

@nipeone nipeone commented Apr 3, 2025


nipeone commented Apr 3, 2025

This is the test output:
[image]

This is the llama-embedding output:
[image]

llama-embedding script:

```shell
llama-embedding.exe --model jina-reranker-v1-tiny-en-FP16.gguf -p "what is panda?</s><s>hi\nwhat is panda?</s><s>it's a bear\nwhat is panda?</s><s>The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." -c 0 --pooling rank --embd-normalize -1
```

@martindevans (Member) commented:

Thanks, from a quick skim this looks great! I'll give it a proper in-depth review this weekend :)

Is the LLamaReranker class based on some example code from llama.cpp? (just to give me something I can compare against while reviewing).

@nipeone (Author) commented Apr 3, 2025:

Mainly based on llama.cpp's embedding example:


```csharp
private async Task<(float Score, int Tokens)> GetRelevanceScoreWithTokenCount(string input, string document, CancellationToken cancellationToken = default)
{
    var prompt = $"{input}</s><s>{document}";
```
@martindevans (Member):

Should this be using `StrBOS` and `StrEOS` instead of hardcoding them?

```csharp
{
    var prompt = $"{input}</s><s>{document}";
    // Add all of the tokens to the batch
    var tokens = Context.Tokenize(prompt, special: true);
```
@martindevans (Member):

Tokenizing the entire prompt with `special: true` means that characters in the input or document can be interpreted as special tokens. That's probably not what you want!

Would it be better to tokenize input and document separately, and directly insert the EOS/BOS tokens? That also means you don't have to handle EOS/BOS as strings, which is neater.

Something like this:

```csharp
var inputTokens = Context.Tokenize(input);
var docTokens = Context.Tokenize(document);

LLamaToken[] tokens = [..inputTokens, Context.Vocab.EOS, Context.Vocab.BOS, ..docTokens];
```
@nipeone (Author):

Good suggestion, I will test it.

@nipeone (Author):

```csharp
var query = "what is panda?";
var document = "hi";
var tokens1 = reranker.Context.Tokenize($"{query}</s><s>{document}");
var tokens2 = reranker.Context.Tokenize($"{query}</s><s>{document}", special: true);
var eos = reranker.Context.Vocab.EOS!.Value;
var bos = reranker.Context.Vocab.BOS!.Value;
LLamaToken[] tokens3 = [.. reranker.Context.Tokenize(query), eos, bos, .. reranker.Context.Tokenize(document)];
LLamaToken[] tokens4 = [.. reranker.Context.Tokenize(query), .. reranker.Context.Tokenize(document)];
LLamaToken[] tokens5 = [.. reranker.Context.Tokenize(query, special: true), eos, bos, .. reranker.Context.Tokenize(document, special: true)];

Console.WriteLine(string.Join(" ", tokens1));
Console.WriteLine(string.Join(" ", tokens2));
Console.WriteLine(string.Join(" ", tokens3));
Console.WriteLine(string.Join(" ", tokens4));
Console.WriteLine(string.Join(" ", tokens5));
Console.ReadLine();
```

I haven't fully understood the meaning of the `special` parameter in the `Tokenize` function, but of the outputs above, only `tokens2` and `tokens4` meet the requirements. So I ended up going with this approach:

```csharp
LLamaToken[] tokens = [..inputTokens, ..docTokens];
```

In addition, using `LLamaBatch` is 2 to 3 times faster than iterating over the documents one at a time.

@martindevans (Member):

The `special` flag controls how strings like `<s>` are parsed into tokens.

With `special: false` it is handled as plain text, so the model sees the characters `<s>` as normal text, just the same as the string `"Hello"`.

With `special: true` it is converted into the special BOS token instead of text.
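As a concrete illustration (a hypothetical sketch, not code from this PR; `context` stands for an already-created `LLamaContext`, and the exact token IDs depend on the model's vocabulary):

```csharp
// Hypothetical sketch, assuming an existing LLamaContext `context`.

// special: false -- "<s>" is treated as the literal characters '<', 's', '>',
// so the model sees it as ordinary text, just like "Hello".
var literal = context.Tokenize("<s>hello", special: false);

// special: true -- "<s>" is collapsed into the single special BOS token.
var parsed = context.Tokenize("<s>hello", special: true);

// `parsed` should therefore start with context.Vocab.BOS and be shorter
// than `literal`, which contains only ordinary text tokens.
```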

```csharp
List<float> scores = new List<float>(documents.Count);
foreach (var document in documents)
{
    var score = (await GetRelevanceScoreWithTokenCount(input, document, cancellationToken).ConfigureAwait(false)).Score;
```
@martindevans (Member):

This runs the model once for each document. You could probably adapt this to run all at once for every document fairly easily: when you set up the `LLamaBatch`, just use a unique `LLamaSeqId` for each string.
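A rough sketch of that idea (not the PR's actual implementation; it assumes LLamaSharp's `LLamaBatch.Add(token, pos, seqId, logits)` overload, `Context.DecodeAsync`, and that all query/document pairs fit in the context window):

```csharp
// Hypothetical sketch: score every document in one decode pass by giving
// each query/document pair its own sequence id in a shared batch.
var batch = new LLamaBatch();
for (var i = 0; i < documents.Count; i++)
{
    var tokens = Context.Tokenize($"{input}</s><s>{documents[i]}", special: true);
    for (var p = 0; p < tokens.Length; p++)
    {
        // One LLamaSeqId per document; only the last token needs output.
        batch.Add(tokens[p], p, (LLamaSeqId)i, logits: p == tokens.Length - 1);
    }
}
// A single decode then evaluates all sequences together, after which the
// per-sequence outputs can be read back to compute each document's score.
await Context.DecodeAsync(batch, cancellationToken);
```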

@martindevans (Member) left a comment:

Sorry about the delay on the review. Overall this looks pretty good, I've suggested a few tweaks but nothing major 👍


nipeone commented Apr 11, 2025

```
ContextSize=512
Avg = 3597ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3555ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3510ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281

ContextSize=1024
GPU=1.9G
Avg = 3380ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3393ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281
Avg = 3333ms
0.4354 0.4157 0.3509 0.2964 0.2264 0.2947 0.2665 0.3069 0.3483 0.3066 0.2492 0.2744 0.3542 0.2867 0.2273 0.4175 0.2186 0.2942 0.2466 0.4281

ContextSize=2048
GPU=2.1G
Avg = 4189ms
0.4367 0.4170 0.3518 0.2965 0.2267 0.2950 0.2669 0.3071 0.3486 0.3063 0.2483 0.2753 0.3544 0.2878 0.2273 0.4158 0.2197 0.2941 0.2468 0.4283
Avg = 4202ms
0.4367 0.4170 0.3518 0.2965 0.2267 0.2950 0.2669 0.3071 0.3486 0.3063 0.2483 0.2753 0.3544 0.2878 0.2273 0.4158 0.2197 0.2941 0.2468 0.4283
Avg = 4197ms
0.4367 0.4170 0.3518 0.2965 0.2267 0.2950 0.2669 0.3071 0.3486 0.3063 0.2483 0.2753 0.3544 0.2878 0.2273 0.4158 0.2197 0.2941 0.2468 0.4283
```

I tested different ContextSize values, and there is something abnormal: why is the score output different from before when ContextSize = 2048? I haven't found the reason yet.

```csharp
/// <param name="token"></param>
/// <param name="isSpecialToken"></param>
/// <returns></returns>
public string? LLamaTokenToString(LLamaToken? token, bool isSpecialToken)
```
@martindevans (Member):

Is this still needed after the latest changes? It looks like it's not used any more.

@nipeone (Author):

It's no longer needed in LLamaReranker, but I suggest exposing this as a public function.

@nipeone force-pushed the feature-llamareranker branch from 3a57779 to d99670c on April 15, 2025
@martindevans (Member) commented:

> I tested different ContextSize values, and there is something abnormal: why is the score output different from before when ContextSize = 2048? I haven't found the reason yet.

If the score is very similar but not identical then I think that's probably expected (might be worth asking in the llama.cpp repo to double check though).

@martindevans (Member) commented:

Is this ready to merge, or did you still have more you wanted to do (e.g. checking consistency with scores etc)?


nipeone commented Apr 18, 2025

> Is this ready to merge, or did you still have more you wanted to do (e.g. checking consistency with scores etc)?

Let's merge later; I found a new bug in Chinese reranking.


nipeone commented Apr 18, 2025

1. All in English:

```csharp
query = "what's the weather today?";
documents = new string[] { "i'm so sad", "The weather is nice today, suitable for an outing.", "hello, good morning", "it's sunny today", "the answer is invalid" };
```

scores = 0.4259 0.5292 0.4416 0.5237 0.3766

2. Query in English, documents mixed:

```csharp
query = "what's the weather today?";
documents = new string[] { "i'm so sad", "今天天气很好,适合出去郊游", "hello, good morning", "今天是晴天", "the answer is invalid" };
```

scores = 0.4259 0.4407 0.4416 0.4087 0.3766

3. Query in Chinese, documents mixed:

```csharp
query = "今天天气怎么样?";
documents = new string[] { "i'm so sad", "今天天气很好,适合出去郊游", "hello, good morning", "it's sunny today", "the answer is invalid" };
```

scores = 0.4408 0.5250 0.4199 0.5139 0.3574

4. All in Chinese:

```csharp
query = "今天天气怎么样?";
documents = new string[] { "我很伤心", "今天天气很好,适合出去郊游", "你好", "今天是晴天", "这个回答无效" };
```

scores = 0.4237 0.5250 0.4345 0.5474 0.4148

I tested the reranker with a mixed Chinese/English setup; in the test examples, the Chinese and English sentences have the same meaning. When the query and documents are all in the same language, whether English or Chinese (cases 1 and 4), the results meet expectations. When the query is in Chinese and the documents are a mixture of Chinese and English (case 3), the ranking is also correct. However, when the query is in English and the documents are a mixture of Chinese and English (case 2), there are some issues with the ranking results.

I am unable to use the llama-embedding tool for comparative testing when the content contains Chinese, as I have not yet found a way to set a wide-character encoding for llama-embedding.

I don't think this is caused by LLamaSharp, so it shouldn't block the merge.
