This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

introduce the roaring multiop in milli #581

Draft
wants to merge 25 commits into
base: main

Conversation

irevoire
Member

@irevoire irevoire commented Jul 6, 2022

@irevoire irevoire added indexing Related to the documents/settings indexing algorithms. no breaking The related changes are not breaking (DB nor API) performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption labels Jul 6, 2022
@irevoire
Member Author

irevoire commented Jul 15, 2022

The current benchmark is pretty bad, actually.

From what I understand, we're missing two important functions.
First, we need a `try_` version of every operator: without it, we have to collect all the roaring bitmaps into a `Result<Vec<_>>` and unwrap it before calling the op. That's already pretty bad.

But it gets worse when we need to merge roaring bitmaps lazily because there are too many of them and we don't want to collect everything up front.
So I think we need something else that tries not to collect all the roaring bitmaps, for example when the iterator is unsized 🤔
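
A minimal sketch of the difference between the two approaches, under stated assumptions: `BTreeSet<u32>` stands in for a roaring bitmap so the example only needs the standard library, and `union_collect` and `try_union` are hypothetical names, not the roaring-rs API.

```rust
use std::collections::BTreeSet;

// Stand-in for a roaring bitmap (assumption: keeps the sketch std-only).
type Bitmap = BTreeSet<u32>;

/// Eager approach: collect every bitmap into a `Result<Vec<_>>` first,
/// then run the union. All bitmaps are alive in memory at the same time.
fn union_collect<E>(bitmaps: impl Iterator<Item = Result<Bitmap, E>>) -> Result<Bitmap, E> {
    let all: Vec<Bitmap> = bitmaps.collect::<Result<Vec<_>, E>>()?;
    Ok(all.into_iter().fold(Bitmap::new(), |mut acc, bitmap| {
        acc.extend(bitmap);
        acc
    }))
}

/// Hypothetical `try_` variant: fold the fallible iterator directly.
/// Only the accumulator and the current bitmap are alive at any time,
/// and the first error stops the iteration.
fn try_union<E>(mut bitmaps: impl Iterator<Item = Result<Bitmap, E>>) -> Result<Bitmap, E> {
    bitmaps.try_fold(Bitmap::new(), |mut acc, bitmap| {
        acc.extend(bitmap?);
        Ok(acc)
    })
}
```

Both functions return the same result; the second one never materialises the intermediate `Vec`, which is the property a lazy merge would need when the number of bitmaps is large.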

 % ./compare.sh indexing_test-roaring-multiop_0c74cbf4.json indexing_main_ebddfdb9.json
group                                                                     indexing_main_ebddfdb9                 indexing_test-roaring-multiop_0c74cbf4
-----                                                                     ----------------------                 --------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.16      2.2±1.22ms        ? ?/sec    1.00  1870.0±346.85µs        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.20     10.9±5.34ms        ? ?/sec    1.00      9.1±2.84ms        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.00     14.2±4.58ms        ? ?/sec    1.14     16.2±8.05ms        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.00    53.5±10.97ms        ? ?/sec    1.19    63.4±17.39ms        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.05   313.7±11.20ms        ? ?/sec    1.00   297.4±16.71ms        ? ?/sec
indexing/Indexing geo_point                                               1.05      61.2±0.60s        ? ?/sec    1.00      58.2±0.58s        ? ?/sec
indexing/Indexing movies in three batches                                 1.00      19.3±0.17s        ? ?/sec    1.00      19.3±0.10s        ? ?/sec
indexing/Indexing movies with default settings                            1.01      18.8±0.14s        ? ?/sec    1.00      18.6±0.15s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.00      25.8±0.17s        ? ?/sec    1.00      25.7±0.18s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.00      24.7±0.13s        ? ?/sec    1.00      24.6±0.14s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.01      66.4±0.64s        ? ?/sec    1.00      65.5±0.72s        ? ?/sec
indexing/Indexing songs with default settings                             1.07      58.9±1.27s        ? ?/sec    1.00      55.0±1.56s        ? ?/sec
indexing/Indexing songs without any facets                                1.07      54.0±0.87s        ? ?/sec    1.00      50.6±1.17s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.08      58.1±1.17s        ? ?/sec    1.00      53.6±1.21s        ? ?/sec
indexing/Indexing wiki                                                    1.03   1067.3±13.56s        ? ?/sec    1.00   1036.9±21.26s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.00    1191.5±9.27s        ? ?/sec    1.00   1195.0±10.50s        ? ?/sec
indexing/Reindexing geo_point                                             1.06      67.9±0.41s        ? ?/sec    1.00      64.0±0.85s        ? ?/sec
indexing/Reindexing movies with default settings                          1.01      18.9±0.19s        ? ?/sec    1.00      18.7±0.11s        ? ?/sec
indexing/Reindexing songs with default settings                           1.08      61.9±0.71s        ? ?/sec    1.00      57.3±1.13s        ? ?/sec
indexing/Reindexing wiki                                                  1.02   1770.6±32.30s        ? ?/sec    1.00   1736.7±20.85s        ? ?/sec

@irevoire irevoire force-pushed the test-roaring-multiop branch 2 times, most recently from 8e4920b to 1c93186 Compare August 19, 2022 06:29
@irevoire irevoire changed the title introduce the roaring multiop in the grenad merger introduce the roaring multiop in milli Aug 22, 2022
@irevoire
Member Author

irevoire commented Aug 25, 2022

With a new batch of optimisations, the indexing part is now faster than on main (by ~5-10%):

% ./compare.sh indexing_main_a79ff8a1.json indexing_test-roaring-multiop_36e27e21.json
group                                                                     indexing_main_a79ff8a1                 indexing_test-roaring-multiop_36e27e21
-----                                                                     ----------------------                 --------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.14    44.6±13.36ms        ? ?/sec    1.00     39.1±5.16ms        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.00      9.7±3.06ms        ? ?/sec    1.12     10.9±3.61ms        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.06     14.4±3.87ms        ? ?/sec    1.00     13.6±4.24ms        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.05    52.8±15.61ms        ? ?/sec    1.00    50.1±14.69ms        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.08   287.8±10.67ms        ? ?/sec    1.00   266.7±11.59ms        ? ?/sec
indexing/Indexing geo_point                                               1.04      47.6±0.14s        ? ?/sec    1.00      45.9±0.62s        ? ?/sec
indexing/Indexing movies in three batches                                 1.04      13.8±0.39s        ? ?/sec    1.00      13.3±0.14s        ? ?/sec
indexing/Indexing movies with default settings                            1.06      11.3±0.22s        ? ?/sec    1.00      10.7±0.09s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.09       8.4±0.16s        ? ?/sec    1.00       7.7±0.14s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.08       7.9±0.20s        ? ?/sec    1.00       7.4±0.34s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.00      47.6±1.06s        ? ?/sec    1.01      48.2±1.20s        ? ?/sec
indexing/Indexing songs with default settings                             1.06      45.9±1.28s        ? ?/sec    1.00      43.2±1.39s        ? ?/sec
indexing/Indexing songs without any facets                                1.05      43.4±0.92s        ? ?/sec    1.00      41.3±1.26s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.07      45.3±1.22s        ? ?/sec    1.00      42.5±0.85s        ? ?/sec
indexing/Indexing wiki                                                    1.02     878.2±9.30s        ? ?/sec    1.00    863.6±21.88s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.00     936.9±6.64s        ? ?/sec    1.01     945.4±5.52s        ? ?/sec
indexing/Reindexing geo_point                                             1.02      16.1±0.25s        ? ?/sec    1.00      15.7±0.26s        ? ?/sec
indexing/Reindexing movies with default settings                          1.22   323.1±21.42ms        ? ?/sec    1.00   265.3±15.75ms        ? ?/sec
indexing/Reindexing songs with default settings                           1.01       4.3±0.13s        ? ?/sec    1.00       4.3±0.11s        ? ?/sec
indexing/Reindexing wiki                                                  1.00    1470.2±8.37s        ? ?/sec    1.01   1492.0±25.25s        ? ?/sec

That's cool. Search on wiki (so, presumably, when there are a lot of words?) also improved:

TODO

Anyway, the real issue is that search on songs got slower on many benchmarks (around 20% slower, and sometimes up to twice as slow):

% ./compare.sh search_songs_main_a79ff8a1.json search_songs_test-roaring-multiop_36e27e21.json
group                                                                                                    search_songs_main_a79ff8a1             search_songs_test-roaring-multiop_36e27e21
-----                                                                                                    --------------------------             ------------------------------------------
smol-songs.csv: asc + default/Notstandskomitee                                                           1.06      3.1±0.65ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
smol-songs.csv: asc + default/charles                                                                    1.00      2.2±0.01ms        ? ?/sec    1.23      2.7±0.01ms        ? ?/sec
smol-songs.csv: asc + default/charles mingus                                                             1.00      3.1±0.01ms        ? ?/sec    1.35      4.2±0.02ms        ? ?/sec
smol-songs.csv: asc + default/david                                                                      1.00      2.9±0.01ms        ? ?/sec    1.18      3.4±0.03ms        ? ?/sec
smol-songs.csv: asc + default/david bowie                                                                1.00      4.6±0.55ms        ? ?/sec    1.22      5.6±0.02ms        ? ?/sec
smol-songs.csv: asc + default/john                                                                       1.00      3.1±0.01ms        ? ?/sec    1.20      3.8±0.01ms        ? ?/sec
smol-songs.csv: asc + default/marcus miller                                                              1.00      5.0±0.02ms        ? ?/sec    1.21      6.0±0.02ms        ? ?/sec
smol-songs.csv: asc + default/michael jackson                                                            1.00      4.7±0.40ms        ? ?/sec    1.16      5.5±0.03ms        ? ?/sec
smol-songs.csv: asc + default/thelonious monk                                                            1.00      4.4±0.02ms        ? ?/sec    1.53      6.8±0.03ms        ? ?/sec
smol-songs.csv: asc/charles mingus                                                                       1.00   783.5±68.03µs        ? ?/sec    1.11    871.9±3.18µs        ? ?/sec
smol-songs.csv: asc/david bowie                                                                          1.00   1121.2±7.70µs        ? ?/sec    1.06   1191.0±7.29µs        ? ?/sec
smol-songs.csv: asc/michael jackson                                                                      1.00   1061.7±9.80µs        ? ?/sec    1.06  1130.6±11.77µs        ? ?/sec
smol-songs.csv: asc/thelonious monk                                                                      1.10      3.0±0.01ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee"                                                      1.22    188.8±1.47µs        ? ?/sec    1.00    154.7±0.74µs        ? ?/sec
smol-songs.csv: basic with quote/"charles"                                                               1.00    163.7±6.29µs        ? ?/sec    1.14    186.9±1.38µs        ? ?/sec
smol-songs.csv: basic with quote/"david"                                                                 1.00    233.0±1.23µs        ? ?/sec    1.19    277.8±2.02µs        ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie"                                                         1.00  1386.6±33.56µs        ? ?/sec    1.22   1691.7±7.20µs        ? ?/sec
smol-songs.csv: basic with quote/"john"                                                                  1.00    349.6±1.91µs        ? ?/sec    1.14    400.0±2.23µs        ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson"                                                     1.00  1337.4±31.72µs        ? ?/sec    1.13   1513.7±8.17µs        ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk"                                                     1.00  1236.8±11.12µs        ? ?/sec    1.10  1362.0±10.19µs        ? ?/sec
smol-songs.csv: basic without quote/charles                                                              1.00    273.6±2.37µs        ? ?/sec    1.33    363.1±3.19µs        ? ?/sec
smol-songs.csv: basic without quote/charles mingus                                                       1.00      2.3±0.01ms        ? ?/sec    1.08      2.5±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/david                                                                1.00    432.0±2.56µs        ? ?/sec    1.23    530.3±3.53µs        ? ?/sec
smol-songs.csv: basic without quote/david bowie                                                          1.00      5.6±0.02ms        ? ?/sec    1.10      6.1±0.03ms        ? ?/sec
smol-songs.csv: basic without quote/john                                                                 1.12  1297.8±11.98µs        ? ?/sec    1.00   1154.8±4.19µs        ? ?/sec
smol-songs.csv: basic without quote/tamo                                                                 1.11    811.8±8.56µs        ? ?/sec    1.00    731.5±4.36µs        ? ?/sec
smol-songs.csv: basic without quote/thelonious monk                                                      1.11      3.8±0.01ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
smol-songs.csv: big filter/charles mingus                                                                1.00    649.1±3.51µs        ? ?/sec    1.10    714.0±6.44µs        ? ?/sec
smol-songs.csv: big filter/david                                                                         1.00  1015.5±18.94µs        ? ?/sec    1.20  1214.3±33.74µs        ? ?/sec
smol-songs.csv: big filter/david bowie                                                                   1.00   1910.2±9.33µs        ? ?/sec    1.20      2.3±0.01ms        ? ?/sec
smol-songs.csv: big filter/john                                                                          1.00    871.3±4.01µs        ? ?/sec    1.22  1058.9±15.94µs        ? ?/sec
smol-songs.csv: big filter/marcus miller                                                                 1.00    716.8±3.93µs        ? ?/sec    1.08   775.7±17.11µs        ? ?/sec
smol-songs.csv: big filter/michael jackson                                                               1.00   1657.9±9.33µs        ? ?/sec    1.11  1840.2±15.93µs        ? ?/sec
smol-songs.csv: desc + default/charles                                                                   1.00  1605.9±10.44µs        ? ?/sec    1.20   1919.7±6.58µs        ? ?/sec
smol-songs.csv: desc + default/charles mingus                                                            1.00      2.4±0.01ms        ? ?/sec    1.08      2.5±0.01ms        ? ?/sec
smol-songs.csv: desc + default/david                                                                     1.00      5.7±0.02ms        ? ?/sec    1.18      6.7±0.03ms        ? ?/sec
smol-songs.csv: desc + default/david bowie                                                               1.00      9.0±0.57ms        ? ?/sec    1.26     11.3±0.04ms        ? ?/sec
smol-songs.csv: desc + default/john                                                                      1.00      4.6±0.95ms        ? ?/sec    1.13      5.2±0.02ms        ? ?/sec
smol-songs.csv: desc + default/marcus miller                                                             1.00      3.8±0.01ms        ? ?/sec    1.38      5.3±0.02ms        ? ?/sec
smol-songs.csv: desc + default/michael jackson                                                           1.00      6.5±0.03ms        ? ?/sec    1.35      8.8±0.03ms        ? ?/sec
smol-songs.csv: desc + default/thelonious monk                                                           1.00      4.5±0.02ms        ? ?/sec    1.52      6.8±0.02ms        ? ?/sec
smol-songs.csv: desc/charles                                                                             1.00    479.6±7.86µs        ? ?/sec    1.06    508.1±4.09µs        ? ?/sec
smol-songs.csv: desc/charles mingus                                                                      1.00  794.9±118.88µs        ? ?/sec    1.10    875.4±6.00µs        ? ?/sec
smol-songs.csv: desc/david bowie                                                                         1.00  1121.5±12.35µs        ? ?/sec    1.06  1183.8±11.13µs        ? ?/sec
smol-songs.csv: desc/michael jackson                                                                     1.00   1064.2±9.50µs        ? ?/sec    1.06  1125.1±14.91µs        ? ?/sec
smol-songs.csv: desc/thelonious monk                                                                     1.14      3.1±0.42ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
smol-songs.csv: prefix search/x                                                                          1.00    284.8±1.76µs        ? ?/sec    1.05    300.4±1.43µs        ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie                                             1.00      4.9±0.98ms        ? ?/sec    1.20      5.9±0.02ms        ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus                                               1.00      5.6±0.02ms        ? ?/sec    1.14      6.3±0.02ms        ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights                                                    1.00      2.5±0.01ms        ? ?/sec    1.40      3.4±0.01ms        ? ?/sec
smol-songs.csv: proximity/black saint sinner lady                                                        1.16      4.8±0.02ms        ? ?/sec    1.00      4.2±0.01ms        ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots                                     1.00     21.0±0.12ms        ? ?/sec    2.03     42.5±0.12ms        ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title         1.00     48.4±0.12ms        ? ?/sec    1.53     74.0±0.20ms        ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song                                          1.00     13.9±0.06ms        ? ?/sec    1.12     15.5±0.07ms        ? ?/sec
smol-songs.csv: words/seven nation mummy                                                                 1.00  1059.0±13.98µs        ? ?/sec    1.08  1147.5±23.15µs        ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one     1.00     66.3±0.36ms        ? ?/sec    1.51    100.2±0.20ms        ? ?/sec
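
For reading the tables above: the number in front of each timing appears to be that run's mean time divided by the faster of the two runs, so the faster side shows 1.00. A small sketch of that arithmetic, assuming this convention holds; the helper `relative` is hypothetical, and because the displayed means are rounded, recomputed ratios can differ slightly from the table.

```rust
/// Returns each mean time divided by the faster of the two,
/// so the faster side reads 1.00 and the other reads its slowdown factor.
/// Both inputs must use the same unit (for example milliseconds).
fn relative(main_time: f64, branch_time: f64) -> (f64, f64) {
    let fastest = main_time.min(branch_time);
    (main_time / fastest, branch_time / fastest)
}
```

For example, `relative(21.0, 42.5)` for the "7000 Danses / Le Baiser / je me trompe de mots" row gives a branch factor of roughly 2.02, in line with the 2.03 shown in the table.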

That's not acceptable. The next step is to profile milli to understand where the regression comes from.

Here are flamegraphs for the search "7000 Danses / Le Baiser / je me trompe de mots", which is twice as slow on this branch.

  • main branch: flamegraph image (alt text: main)
  • this branch: flamegraph image (alt text: roaring)

@saik0

saik0 commented Nov 17, 2022

Thinking out loud: could we integrate typical indexing datasets and workflows into the roaring-rs performance benchmarks? IMO, preventing surprise perf regressions like this would benefit both projects.

@curquiza
Member

@irevoire is it still relevant?

Development

Successfully merging this pull request may close these issues.

Improve the indexing and search performance with the new roaring MultiOps trait
3 participants