From 9ffb0a4ecd65464bbfd1a31e353f022d0563580b Mon Sep 17 00:00:00 2001
From: Teo Orthlieb <teo.orthlieb@gmail.com>
Date: Thu, 15 Sep 2022 14:37:27 +0200
Subject: [PATCH] Update README.md

---
 README.md | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 750654b..68899d3 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,26 @@
 # Fast-BM25
-a fast implementation of BM25
+A fast implementation of [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) in Python.  
+BM25 is a simple and fast ranking function for search engines operating on words (tokens).  
+It does not play well with misspelling so use it only in contexts where that's not a problem.
+
+The base BM25 implementation is from [dorianbrown/rank_bm25](https://github.com/dorianbrown/rank_bm25/blob/master/rank_bm25.py).
+
+## How to use
+Initialize BM25 by passing it a corpus, aka an iterator over tokenized documents (a list of Strings).
+```py
+from fast_bm25 import BM25
+
+# Load your corpus
+corpus = ...
+
+bm25 = new BM25(corpus)
+results = bm25.get_top_n(["largest", "city", "in", "Japan"], corpus);
+```
+*It's not a python package, copy the file if you want to use it*  
+
+## Principle
+In a text corpus, the most common words (the, a, an, ...) are often the least informative.  
+By cutting them off from the query and only searching documents containing at least a word of the query, 
+BM25 gain a lot of speed while loosing very little precision.  
+This trade-off is controlled by the parameter `alpha`: higher alpha => more speed and more word cut-off.  
+At $\alpha = -\inf$ the algorithm is equivalent to regular BM25.