Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve creating tfIdf with cache #433

Merged
merged 1 commit into from
Feb 4, 2024
Merged

perf: improve creating tfIdf with cache #433

merged 1 commit into from
Feb 4, 2024

Conversation

vivekjoshi556
Copy link
Contributor

@vivekjoshi556 vivekjoshi556 commented Jan 4, 2024

Description

Improve #353

While building indexes there is a call to creating TfIdf.
I noticed that the the getIdf() method only depends on the word and doesn't the other values remained constant, so I cached the result of getIdf() for different words and before calling it again I check if I already had the value.

This improved the performance significantly.

getTfIdf() performance:
Before: 1:52.468s. (maybe my sys is slow)
After: 9.021s

Checklist

Please review this checklist before submitting a pull request.

  • [x ] Code compiles correctly
  • Created tests, if possible
  • All tests passing (npm run test:all)
  • Extended the README / documentation, if necessary

@kbdharun kbdharun requested a review from owenvoke January 4, 2024 17:22
@vivekjoshi556
Copy link
Contributor Author

vivekjoshi556 commented Jan 4, 2024

As mentioned in #350 . @agnivade you mentioned:

I think it's rare to see this in a practical scenario. You want to lookup a command, you see that it doesn't exist. Would you want to retry that command again in 2 minutes? That seems unlikely. But I agree that there's definitely things to improve on. Let's track those here: #353

I agree that same command might not be looked up immediately but what if user looks for some other command that also doesn't exist.
If you agree I would like to work in this idea as well and add this suggestion in this commit as well.

Copy link
Member

@kbdharun kbdharun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks for this enhancement (and sorry for the late review). Tested it on Windows and Linux the indexing in the update process is much faster now.

@kbdharun kbdharun changed the title perf: Improved creating tfIdf with cache perf: improve creating tfIdf with cache Feb 4, 2024
@kbdharun kbdharun merged commit 0df16ea into tldr-pages:main Feb 4, 2024
9 checks passed
@vivekjoshi556 vivekjoshi556 deleted the issue_speed branch February 4, 2024 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants