Use threads to speed up indexing #38
"notify.rkt"
"static.rkt")

(define NUM-THREADS 4)
I can run a proper experiment to find a good number. In my initial run (which re-fetched the entire database), I used 8 threads, but that only gave a 4x speedup. That's why I settled on 4 threads here. It's also a number that doesn't seem too high.
Full content fetch (on a decent internet connection):
1 thread: 47 mins
2 threads: 31 mins
4 threads: 25 mins
8 threads: 24 mins
Going from 4 to 8 threads saves only about a minute, so I think 4 threads is the right call here.
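To make the diminishing returns in the timings above explicit, here is a small sketch that computes the speedup of each run relative to the single-threaded one. The names `timings` and `speedup` are made up for this illustration and are not part of the PR.

```racket
#lang racket

;; Wall-clock minutes from the full-content fetch above, keyed by thread count.
(define timings '((1 . 47) (2 . 31) (4 . 25) (8 . 24)))

;; Speedup relative to the baseline run, as an inexact number.
;; (Hypothetical helper, for illustration only.)
(define (speedup baseline minutes)
  (exact->inexact (/ baseline minutes)))

;; The single-threaded run is the baseline.
(define baseline (cdr (assv 1 timings)))

(for ([entry (in-list timings)])
  (printf "~a thread(s): ~a min, ~ax speedup\n"
          (car entry)
          (cdr entry)
          (~r (speedup baseline (cdr entry)) #:precision 2)))
```

The ratio barely moves between 4 and 8 threads, which matches the choice of 4.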
IIRC, the reason I didn't use threads in the past was that the EC2 instance couldn't handle it; I seem to remember it being slower. Maybe I'm misremembering or things have changed, but make sure you test on the actual server and not just locally.
On an AWS instance provided by @samdphillips (which has the same environment as the actual server):
1 thread: 72 mins
2 threads: 49 mins
4 threads: 42 mins
8 threads: 41 mins
A regular run takes ~30 mins. With this PR, it takes ~15.
The speedup will be even more significant when the entire database truly requires re-fetching (in that case, it took ~2 hours before this PR).
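For readers following the thread, here is a minimal sketch of the kind of worker-pool pattern a `NUM-THREADS` setting usually drives. It is not the PR's actual code: `fetch-item` and `process-in-parallel` are hypothetical names, and the `sleep` call only stands in for network latency.

```racket
#lang racket

(define NUM-THREADS 4)

;; Hypothetical stand-in for the per-item work. The real indexer fetches
;; content over the network; here we just simulate some I/O latency.
(define (fetch-item item)
  (sleep 0.01)
  (* item item))

;; Run `proc` on every element of `items` using `n` worker threads that
;; pull jobs from a shared channel. Returns a hash from item to result.
(define (process-in-parallel proc items [n NUM-THREADS])
  (define jobs (make-channel))
  (define results (make-hash))
  (define lock (make-semaphore 1))
  (define workers
    (for/list ([i (in-range n)])
      (thread
       (λ ()
         (let loop ()
           (define job (channel-get jobs))
           (unless (eof-object? job)
             (let ([result (proc job)])
               ;; Serialize writes to the shared results table.
               (call-with-semaphore
                lock
                (λ () (hash-set! results job result))))
             (loop)))))))
  ;; Hand out the work, then send one stop signal per worker.
  (for ([item (in-list items)])
    (channel-put jobs item))
  (for ([i (in-range n)])
    (channel-put jobs eof))
  (for-each thread-wait workers)
  results)
```

Calling `(process-in-parallel fetch-item '(1 2 3 4 5))` returns a table with one entry per item. Because Racket threads overlap while blocked on I/O, this pattern helps most when the work is network-bound, which is consistent with the timings flattening out beyond 4 threads.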