What if the data grows bigger than the client can realistically hold in memory? #244
Replies: 1 comment
I share your concern. Basically I'm treating TinyBase as the data that must sync between devices, and using Dexie as a sort of "cache" to speed up the UI a bit and fill in the missing pieces that can be inferred from the metadata in TinyBase. It's also a multi-tenant app where each user gets their own database. If I wanted to share a bunch of data between users, I might opt for https://www.triplit.dev/, which seems to cater more to a "one-big-database-but-filter-it-down-by-user" approach vs. the "bunch-of-little-databases-for-each-user" way of thinking with TinyBase (at least, that's how I'm using it). I haven't used Triplit so I don't know the in-depth details, but I've heard good things 😁
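For what it's worth, here's a minimal sketch of that split, under my own assumptions: a hypothetical `notes` table where TinyBase holds the small metadata that syncs between devices, and a Dexie table caches the bulkier content that can be dropped and rebuilt.

```ts
// Sketch only: TinyBase holds the small, syncable metadata; Dexie (IndexedDB)
// holds bulkier, re-derivable content keyed by the same ids.
// The "notes" table and its fields are hypothetical.
import { createStore } from 'tinybase';
import Dexie, { type Table } from 'dexie';

interface CachedNote { id: string; body: string; updatedAt: number; }

class AppCache extends Dexie {
  notes!: Table<CachedNote, string>;
  constructor() {
    super('app-cache');
    this.version(1).stores({ notes: 'id, updatedAt' }); // 'id' is the primary key
  }
}

const store = createStore();  // synced between devices
const cache = new AppCache(); // local-only, safe to drop and rebuild

// Write metadata to TinyBase and the heavy body to the Dexie cache.
async function saveNote(id: string, title: string, body: string) {
  store.setRow('notes', id, { title, updatedAt: Date.now() });
  await cache.notes.put({ id, body, updatedAt: Date.now() });
}

// Read metadata synchronously from TinyBase, fall back to the cache for the body.
async function loadNote(id: string) {
  const meta = store.getRow('notes', id);
  const cached = await cache.notes.get(id);
  return { ...meta, body: cached?.body ?? '' };
}
```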
What if the data grows bigger than the client can realistically hold in memory?
For context: I don't have much experience working with client-side databases. I've been researching different design solutions for a data-heavy web app that I want to build over the summer. At a very high level, this app will essentially have the same functionality as a todo app: the client will make a ton of CRUD operations on their own data.
My concern is that the data will grow beyond what the client can realistically hold in memory. After the client has been continuously adding data over a very long period of time (think a few years), we might reach a point where it gets to, say, 250 MB. At that point the size of the data will affect the user's performance. I'm aware that this is hypothetical and potentially not a real concern; I asked ChatGPT and it said that 250 MB is roughly equal to 2.6 million lines of JSON (which is a lot).
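(A quick back-of-envelope check of that figure: at roughly 100 bytes per JSON line, 250 MB / 100 B ≈ 2.5 million lines, so the estimate is at least the right order of magnitude.)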
The problem of memory usage grows as the number of tabs the user has open grows.
What can we do about that?
I was thinking I would use Dexie.js, which stores the data on disk through IndexedDB. Most of the operations would be sub-16 ms, but they would have to be async. That means a frame could get rendered, and then when the promise resolves another frame is rendered (this time with the data included), so the user would experience a flicker (which isn't very nice). Also, ideally I'd like the local database to double as a state manager, and async complicates that idea quite a bit. I was thinking I could cache these queries with something like TanStack Query, so the user would only have to experience the flicker once for every unique query (see the sketch below). IndexedDB has a very high size limit, so I doubt there would ever be any issues with it.
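Here's a rough sketch of that caching idea, assuming a hypothetical `todos` table and hook name:

```ts
// Sketch only: cache an async Dexie/IndexedDB query with TanStack Query so the
// "empty frame then data frame" flicker happens at most once per unique query.
// The `todos` table, its schema, and the hook name are hypothetical.
import Dexie, { type Table } from 'dexie';
import { useQuery } from '@tanstack/react-query';

interface Todo { id: string; text: string; done: number; updatedAt: number; }

class TodoDb extends Dexie {
  todos!: Table<Todo, string>;
  constructor() {
    super('todo-db');
    this.version(1).stores({ todos: 'id, done, updatedAt' });
  }
}
const db = new TodoDb();

// React hook: the first render for a given key is async (possible flicker),
// but later renders are served from TanStack Query's in-memory cache.
export function useOpenTodos() {
  return useQuery({
    queryKey: ['todos', 'open'],
    queryFn: () => db.todos.where('done').equals(0).toArray(),
    staleTime: 30_000, // assumption: 30s of staleness is acceptable for this UI
  });
}
```

(Dexie also ships `useLiveQuery` in `dexie-react-hooks`, which keeps a query result reactively updated; either way, the first paint for a brand-new query is still async.)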
I also have an idea for a system where, if we get to the point where we can no longer store all of the records on the client (because of size, whether in memory or on disk), we omit some of them. The omission would be based on how long ago the records were last read. Omitting data implies a loss of integrity, so we could flag the table(s) where we omitted data as "omitted", or maybe even store an integer count of the omitted records per table. When we then perform a query that involves tables flagged as omitted, we would first return the result from the local database (so the user doesn't have to see spinners and such), but tell the consumer that the data may be incorrect. Then we would query the central database (through a backend server); if its result differs from the local result, that tells us we're omitting records that are necessary for an accurate answer. The backend would find the records the client omitted (the ones causing the inaccurate result) and send them to it, essentially re-syncing some of the records to the client. (I thought of this idea myself, but I probably wasn't the first; maybe it already has a name, please let me know if it does. A rough sketch of the flow follows.)
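Something like this, where the local answer is returned immediately and the server verification runs in the background; `localQuery`, `serverQuery`, and `resyncRows` are hypothetical stand-ins:

```ts
// Sketch only: answer from the local store immediately, then verify against the
// central database when the table is flagged as having omitted records.
interface QueryResult<T> { rows: T[]; maybeIncomplete: boolean; }

const omittedRowCounts = new Map<string, number>(); // table -> evicted-row count

async function query<T>(
  table: string,
  localQuery: () => Promise<T[]>,
  serverQuery: () => Promise<T[]>,
  resyncRows: (rows: T[]) => Promise<void>,
  onCorrection: (rows: T[]) => void,
): Promise<QueryResult<T>> {
  const local = await localQuery();
  const maybeIncomplete = (omittedRowCounts.get(table) ?? 0) > 0;
  if (!maybeIncomplete) return { rows: local, maybeIncomplete: false };

  // Return the fast-but-possibly-incomplete result right away and verify in
  // the background; the caller patches the UI if the server disagrees.
  void (async () => {
    const remote = await serverQuery();
    if (remote.length !== local.length) { // crude difference check
      await resyncRows(remote);           // pull the missing records back down
      onCorrection(remote);
    }
  })();

  return { rows: local, maybeIncomplete: true };
}
```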
The central database stores the full and complete data. The client could store a subset of that data in an on-disk DB like IndexedDB (maybe with a size limit of 2 GB), and then, for speed, keep a subset of the on-disk data in an in-memory DB (maybe with a size limit of 250 MB). Then, in case we make queries that involve tables where records are omitted both in the in-memory DB and in the on-disk DB, the query would kind of bubble up from the in-memory DB to the on-disk DB to the central DB (sketched below).
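A minimal version of that "bubble up" for a single-record lookup, with the tier interfaces as my own hypothetical abstraction (real range/filter queries would also need per-tier completeness flags):

```ts
// Sketch only: a per-record lookup that bubbles up through the three tiers
// (in-memory -> on-disk -> central) and promotes hits back down the stack.
interface Tier<T> {
  get(id: string): Promise<T | undefined>;
  put(id: string, value: T): Promise<void>;
}

async function tieredGet<T>(
  id: string,
  memory: Tier<T>,  // e.g. a Map-backed store, ~250 MB budget
  disk: Tier<T>,    // e.g. Dexie/IndexedDB, ~2 GB budget
  central: Tier<T>, // e.g. a fetch() call to the backend
): Promise<T | undefined> {
  const fromMemory = await memory.get(id);
  if (fromMemory !== undefined) return fromMemory;

  const fromDisk = await disk.get(id);
  if (fromDisk !== undefined) {
    await memory.put(id, fromDisk);  // promote so the next read is fast
    return fromDisk;
  }

  const fromCentral = await central.get(id);
  if (fromCentral !== undefined) {
    await disk.put(id, fromCentral); // re-sync down through the tiers
    await memory.put(id, fromCentral);
  }
  return fromCentral;
}
```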
Is this even a reasonable concern? Am I overthinking and prematurely optimising, or would this be reasonable to implement?