[Q] In what format and where are the indices stored? #78

Open
NightMachinery opened this issue May 13, 2021 · 3 comments

Comments

@NightMachinery

Adding this info to the readme can be helpful.

@danuker

danuker commented Mar 16, 2022

Possibly related to #62

@phil294

phil294 commented Mar 16, 2022

I had a look at the extension's source. The data is stored in the WebExtensions storage.local area as [time]: { text } objects. The extension also keeps a time index (the timestamps of all existing entries) and a two-week "preloaded cache" for quick access, which means the text of every site visited in the past 14 days is held in browser memory at all times. This could theoretically lead to memory problems, as the websites' texts are never truncated. If you search for text older than that, all entries for the specified time frame (determined using the time index) are retrieved from storage (again, possibly a large amount of memory) and then processed.
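
For readers following along, here is a rough sketch of the layout described above. It is not the extension's actual code: the class and field names (HistoryStore, timeIndex, cache, storageReads) are made up for illustration, and a plain in-memory Map stands in for storage.local so the example runs outside a browser.

```typescript
// Hypothetical model of the described layout; all names are illustrative
// and not taken from the extension's source.
interface PageEntry {
  text: string; // full page text, stored without truncation
}

const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

class HistoryStore {
  // Stand-in for storage.local: one { text } record per visit, keyed by time.
  private storage = new Map<number, PageEntry>();
  // Time index: sorted timestamps of all existing entries.
  private timeIndex: number[] = [];
  // "Preloaded cache": entries from the last 14 days, kept in memory.
  private cache = new Map<number, PageEntry>();
  // Counts lookups that had to go to storage (only here to make the cost visible).
  storageReads = 0;

  add(time: number, text: string, now: number): void {
    const entry: PageEntry = { text };
    this.storage.set(time, entry);
    this.timeIndex.push(time);
    this.timeIndex.sort((a, b) => a - b);
    if (now - time <= TWO_WEEKS_MS) {
      this.cache.set(time, entry);
    }
  }

  // Returns the timestamps of entries in [from, to] whose text contains the query.
  search(query: string, from: number, to: number): number[] {
    const needle = query.toLowerCase();
    const hits: number[] = [];
    for (const time of this.timeIndex) {
      if (time < from || time > to) continue;
      // Recent entries come from the cache; older ones are fetched from storage.
      let entry = this.cache.get(time);
      if (entry === undefined) {
        this.storageReads += 1;
        entry = this.storage.get(time);
      }
      if (entry !== undefined && entry.text.toLowerCase().includes(needle)) {
        hits.push(time);
      }
    }
    return hits;
  }
}
```

In this sketch a search restricted to the last two weeks never touches storage, while a wider time range loads every older entry in that range before scanning it, which is where the memory concern comes from.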

This is all pretty clever and a reasonable implementation, in my opinion. I'm not sure what better way there could be within web extensions that still supports full-text search (FTS); the only thing I can think of is a WASM SQLite module.

@danuker

danuker commented Mar 16, 2022

Thanks for figuring it out! I did not know extensions had a separate "inspect" area when debugging them. More here for newbies, though it seems like it still has no easy export/import functionality in the browser.

> I'm not sure what better way there could be

There are clever data structures such as inverted indices, whose term dictionary grows with the vocabulary (which is bounded) rather than with the total amount of text on the pages you visit. The list of URLs recorded for each term still keeps growing, though.
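
To make the inverted-index idea concrete, here is a minimal sketch. It is only an illustration of the general technique, not something the extension does: InvertedIndex and tokenize are hypothetical names, the tokenizer handles lowercase ASCII letters and digits only, and the example URLs are placeholders.

```typescript
// Split text into lowercase alphanumeric terms (ASCII only, for simplicity).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Minimal inverted index: each term maps to the set of URLs containing it.
class InvertedIndex {
  private postings = new Map<string, Set<string>>();

  add(url: string, text: string): void {
    for (const term of tokenize(text)) {
      let urls = this.postings.get(term);
      if (urls === undefined) {
        urls = new Set<string>();
        this.postings.set(term, urls);
      }
      urls.add(url);
    }
  }

  // Returns the URLs that contain every term of the query, sorted.
  search(query: string): string[] {
    const terms = tokenize(query);
    if (terms.length === 0) return [];
    let result: string[] | undefined;
    for (const term of terms) {
      const urls = this.postings.get(term);
      if (urls === undefined) return []; // unknown term: no page can match
      result = result === undefined
        ? Array.from(urls)
        : result.filter((u) => urls.has(u));
    }
    return result === undefined ? [] : result.sort();
  }

  // Number of distinct terms seen so far.
  get vocabularySize(): number {
    return this.postings.size;
  }
}
```

Indexing a new page that only reuses known words leaves vocabularySize unchanged and just adds the page's URL to the existing sets, which matches the trade-off above: the dictionary levels off, but the per-term URL lists keep growing.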


3 participants