a book about information retrieval
↓
[a, book, about, information, retrieval]
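The split above can be reproduced with a minimal sketch, assuming tokens are simply separated by whitespace. The function name `whitespace_tokenize` is a hypothetical helper, not part of any library.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Split on runs of whitespace; no other processing is applied."""
    return text.split()


tokens = whitespace_tokenize("a book about information retrieval")
print(tokens)  # ['a', 'book', 'about', 'information', 'retrieval']
```

This only works because the input is lowercase and free of punctuation. Anything attached to a word, such as quotes or an exclamation mark, stays glued to the token.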
"Information Retrieval": Also available as e-book!
↓
[Information Retrieval]? [Information, Retrieval]?
[e-book]? [eBook]? [e, book]?
An inverted index can only match exact tokens:
| Term | Doc IDs |
|---|---|
| book | #1, #2, #3 |
| information | #1, #2, #3 |
| retrieval | #1 |
| search | #2 |
A query for e-book returns no results!
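The exact-match behaviour can be sketched as follows. The three document texts are hypothetical, chosen only so that the resulting index matches the table; `build_index` and `lookup` are illustrative names.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each whitespace-separated token to the IDs of the docs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[int]], term: str) -> list[int]:
    """Return the sorted doc IDs for an exact term, or an empty list."""
    return sorted(index.get(term, set()))


# Hypothetical documents that reproduce the table above.
docs = {
    1: "book information retrieval",
    2: "book information search",
    3: "book information",
}
index = build_index(docs)

print(lookup(index, "book"))    # [1, 2, 3]
print(lookup(index, "e-book"))  # []
print(lookup(index, "books"))   # []
```

The lookup is a plain dictionary access, so any difference in spelling, hyphenation, or inflection between query and index means no match.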
- How can a query for books find book?
- wi-fi ↔ wifi?
- Jack's ↔ Jack?
- MMT ↔ Multimediatechnology?
- U.S.A. ↔ USA?
- running ↔ run?
- Analyze docs and query
- Add, remove, change terms
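One way to picture this is to run the same analysis function over documents at index time and over the query at search time. The sketch below is a simplification under stated assumptions: `analyze` only lowercases, strips surrounding punctuation, drops a possessive 's, and removes hyphens and periods. Stemming (running ↔ run) and abbreviation expansion (MMT) would need a stemmer or synonym list and are left out. All names and the two documents are hypothetical.

```python
import re


def analyze(text: str) -> list[str]:
    """Turn raw text into normalized terms (very simplified)."""
    terms = []
    for token in text.lower().split():
        token = token.strip("\"'.,:;!?()")  # drop surrounding punctuation
        token = re.sub(r"'s$", "", token)   # jack's -> jack
        token = re.sub(r"[-.]", "", token)  # wi-fi -> wifi, u.s.a -> usa
        if token:
            terms.append(token)
    return terms


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Index the analyzed terms, not the raw tokens."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Analyze the query the same way, then AND the posting lists."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical documents for illustration.
docs = {
    1: '"Information Retrieval": Also available as e-book!',
    2: "Jack's wi-fi in the U.S.A.",
}
index = build_index(docs)

print(search(index, "eBook"))     # [1]
print(search(index, "wifi USA"))  # [2]
print(search(index, "Jack"))      # [2]
```

Because both sides pass through `analyze`, the index and the query agree on one spelling per term, which is what makes the lookups succeed.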
Try improved tokenization in Elasticsearch.
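A possible starting point is Elasticsearch's `_analyze` endpoint, which returns the tokens an analyzer produces for a given text. The sketch below only builds the HTTP request and does not send it. The base URL `http://localhost:9200` and the helper name `build_analyze_request` are assumptions; adjust them to your own setup.

```python
import json
import urllib.request


def build_analyze_request(
    text: str,
    analyzer: str = "standard",
    base_url: str = "http://localhost:9200",  # assumed local cluster
) -> urllib.request.Request:
    """Prepare a POST to the _analyze endpoint without sending it."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("Also available as e-book!")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print([t["token"] for t in json.load(response)["tokens"]])
```

Swapping the `analyzer` value (for example `whitespace` instead of `standard`) is a quick way to compare how different built-in analyzers split the same text.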
- Token
  - Character sequence, meaningful semantic unit
  - No analysis yet
  - e.g. the, routers, the
- Term
  - Indexed token (after analysis)
  - e.g. router
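The distinction can be sketched in a few lines, assuming that analysis here means lowercasing, dropping stop words, and stripping a plural "s". The stop word list and the plural rule are deliberately naive and hypothetical; a real analyzer would use a proper stemmer.

```python
STOPWORDS = {"the", "a", "an"}  # hypothetical, minimal stop word list


def tokenize(text: str) -> list[str]:
    """Tokens: raw character sequences, no analysis yet."""
    return text.split()


def to_terms(tokens: list[str]) -> list[str]:
    """Terms: the distinct normalized tokens that end up in the index."""
    terms = set()
    for token in tokens:
        token = token.lower()
        if token in STOPWORDS:
            continue
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]  # naive plural stripping: routers -> router
        terms.add(token)
    return sorted(terms)


tokens = tokenize("the routers the")
print(tokens)            # ['the', 'routers', 'the']
print(to_terms(tokens))  # ['router']
```

Three tokens collapse into a single term: duplicates and stop words disappear, and the surviving token is normalized.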