This repository was archived by the owner on May 20, 2026. It is now read-only.

Use thread to speedup indexing #38

Draft: sorawee wants to merge 1 commit into racket:master from sorawee:thread-speedup

Conversation

@sorawee sorawee commented Jun 9, 2022

A regular indexing run takes ~30 mins. With this PR, it takes ~15 mins.

The speedup will be even more significant when the entire database truly requires re-fetching (in that case, it takes ~2 hours before this PR).
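
For readers without the diff at hand, a bounded worker pool along the following lines is one way to get this kind of speedup. This is a minimal sketch, not the code in this PR: `fetch-one` and `parallel-map` are hypothetical names, and only `NUM-THREADS` appears in the diff. By default Racket threads are green threads, so any gain here would come from overlapping network waits rather than from CPU parallelism.

```racket
#lang racket/base

(define NUM-THREADS 4)

;; Hypothetical stand-in for the per-package fetch; not taken from update.rkt.
(define (fetch-one pkg)
  (string-append "fetched:" pkg))

;; Apply proc to every item using num-threads worker threads.
;; Workers claim the next unprocessed index under a lock, so at most
;; num-threads calls to proc are in flight at once. Results keep input order.
;; If proc raises, that worker dies and its slot stays #f (assumption: fine for a sketch).
(define (parallel-map proc items num-threads)
  (define n (length items))
  (define in (list->vector items))
  (define out (make-vector n #f))
  (define next 0) ; index of the next unclaimed item
  (define lock (make-semaphore 1))
  (define (claim!)
    (call-with-semaphore
     lock
     (lambda ()
       (and (< next n)
            (begin0 next (set! next (add1 next)))))))
  (define workers
    (for/list ([worker-id (in-range num-threads)])
      (thread
       (lambda ()
         (let loop ()
           (define i (claim!))
           (when i
             (vector-set! out i (proc (vector-ref in i)))
             (loop)))))))
  (for-each thread-wait workers)
  (vector->list out))

(parallel-map fetch-one '("a" "b" "c") NUM-THREADS)
```

Claiming work from a shared index, rather than splitting the list into fixed chunks up front, keeps all workers busy when some fetches take much longer than others.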

Comment thread official/update.rkt
"notify.rkt"
"static.rkt")

(define NUM-THREADS 4)
Member

How did you choose 4?

Contributor Author

I can run a proper experiment to see what a good number is. In my initial run (which re-fetched the entire database), I used 8 threads, but that gave only a 4x speedup. That's why I settled on 4 threads here. It's also a number that doesn't seem too high.

Contributor Author

@sorawee sorawee Jun 10, 2022

Full content fetching (with a decent internet connection):

1 thread: 47 mins
2 threads: 31 mins
4 threads: 25 mins
8 threads: 24 mins

So I think 4 threads is the right call here.
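
To make the diminishing returns explicit, the timings above can be turned into speedups relative to the single-thread run. The snippet below is only a sketch for that arithmetic; `timings` and `speedups` are names made up here, and the numbers are copied from this comment.

```racket
#lang racket

;; Wall-clock minutes from the comment above, keyed by thread count.
(define timings '((1 . 47) (2 . 31) (4 . 25) (8 . 24)))

;; Speedup of each run relative to the first (single-thread) entry,
;; kept as exact rationals so nothing is lost to rounding.
(define (speedups timings)
  (define base (cdr (car timings)))
  (for/list ([t (in-list timings)])
    (cons (car t) (/ base (cdr t)))))

;; Print each speedup rounded to two decimal places.
(for ([s (in-list (speedups timings))])
  (printf "~a threads: ~ax\n" (car s) (real->decimal-string (cdr s) 2)))
```

This works out to roughly 1.5x at 2 threads, 1.9x at 4, and 2.0x at 8, so doubling from 4 to 8 threads buys almost nothing.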

Contributor

IIRC, the reason I didn't use threads in the past was that the EC2 instance couldn't handle it. I seem to remember it was slower. Maybe I am misremembering or things have changed, but make sure you test on the actual server and not just locally.

Contributor Author

On an AWS instance provided by @samdphillips (which has the same environment as the actual server):

1 thread: 72 mins
2 threads: 49 mins
4 threads: 42 mins
8 threads: 41 mins
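
Read as speedups over the single-thread run, with $T(n)$ the measured minutes at $n$ threads, the AWS numbers above give approximately:

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad
S(2) = \frac{72}{49} \approx 1.47, \quad
S(4) = \frac{72}{42} \approx 1.71, \quad
S(8) = \frac{72}{41} \approx 1.76
```

On these figures threads are faster than the single-thread run on the server-like instance too, and going from 4 to 8 threads saves only about one minute.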

@sorawee sorawee marked this pull request as draft July 13, 2022 09:38
3 participants