Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite List Questions, Memory Usage #217

Closed
azizhk opened this issue Jan 26, 2019 · 17 comments
Closed

Infinite List Questions, Memory Usage #217

azizhk opened this issue Jan 26, 2019 · 17 comments
Labels
wontfix This will not be worked on

Comments

@azizhk
Copy link
Contributor

azizhk commented Jan 26, 2019

Hi WatermelonDB developers.
Wanted to ask about WatermelonDB usage with an infinite list. I understand that WatermelonDB is lazy first, would I be able to lazily load results of a query until the items are on screen. Wondering if watermelon DB supports something like this using RxJS

$items = $scrollPositionStream.switchMap(postion => runQueryForPosition(position))

Where I have only those objects which are on screen in memory (or those which have appeared on screen once)

@radex
Copy link
Collaborator

radex commented Jan 27, 2019

There is no specific support for this, but sure, you can use RxJs for this, like so:

withObservables([???], ({ ... }) => ({
  items: $scrollPositionStream.switchMap(postion => db.collections.get('x').query(Q.where('position', Q.between(position, position+100)).observe()))
}))

see #113
#113
#113

@azizhk
Copy link
Contributor Author

azizhk commented Jan 17, 2020

Hi @radex,
So we've been working on some POCs. We wanted to know your opinions.
Wondering if it would be easy for 🍉 to provide only a few columns like id and the columns needed for sorting:

const cols$ = database.collections
  .get("table")
  .query(Q.where("name", Q.like(sanitizedInput)))
  .observeOnlyColumns(["id", "updated_at"])

It should follow the same functionality as observeWithColumns. Here cols$ is Observable<Array<[string, Date]>>

const sortedIds$ = cols$.pipe(
  map(rows => rows
    .sort((a, b) => a[1] - b[1])
    .map(r => r[0])
  )
)

Then we sort these on the basis of the columns in JS, and pass it to the FlatList along with a RenderItem Function which returns a component wrapped using withObservables to query the model based on id.

Would this solve the problems of #113 and #15 while still being reactive?
This would help reduce our memory footprint.

@azizhk azizhk reopened this Jan 17, 2020
@radex
Copy link
Collaborator

radex commented Jan 17, 2020

This would help reduce our memory footprint.

Hm, is that a hypothesis, or did you verify that indeed individual Model instances take enough RAM to be a real problem?

My intuition is that moving from a Model to a tuple of simple types wouldn't save much memory unless you have so many of them that other performance bottlenecks arise first

@azizhk
Copy link
Contributor Author

azizhk commented Jan 17, 2020

Well right now we are storing everything in redux and we can see clear correlation between number of entries and RAM usage.
(Can also be because of Redux Persist)
We have 18 columns where there are multiple json and blob text fields.

posts: blog.posts.observe(),

Like even in the example, you would have all the post bodies in memory when rendering the list of posts.
I know an alternative would be to move blobs to different file and save their references, but i think querying what is needed can make things simpler while retaining reactivity.

@pranjal-jain
Copy link

@radex It's not only a problem with memory. But also passing huge amount of data over the bridge which cause performance issues. With just selected columns of data we'll reduce this overhead substantially.

@radex
Copy link
Collaborator

radex commented Jan 20, 2020

I know an alternative would be to move blobs to different file and save their references, but i think querying what is needed can make things simpler while retaining reactivity.

I think it would be much simpler for you in the long term to move blobs to a separate place, since only getting raw column values from Watermelon really dumbs down the framework's capabilities - you no longer have reactive records, just pure data.

But I understand you point, and can see how this may be necessary / a good idea in some extreme circumstances. So I very much encourage you to continue developing the #609 PR. I will be happy to do code review and merge it if it's good :)

@Kenneth-KT
Copy link
Contributor

There is no specific support for this, but sure, you can use RxJs for this, like so:

withObservables([???], ({ ... }) => ({
  items: $scrollPositionStream.switchMap(postion => db.collections.get('x').query(Q.where('position', Q.between(position, position+100)).observe()))
}))

see #113
#113
#113

@radex Thanks your suggestions, but I afraid the workaround may not quite work for large list in offline first apps.

Let say I want to limit the list to only show first 5 records in a large table, and I have the list of records pre-filled with sequential position during sync. Using your workaround, I can query first 5 records with position >= 0 AND position < 5 condition.

record #1, position: 0
record #2, position: 1
record #3, position: 2
record #4, position: 3
record #5, position: 4

But now if the user deletes record #3, the same query will just return 4 records:

record #1, position: 0
record #2, position: 1
record #4, position: 3
record #5, position: 4

Unless we run additional raw SQL query to manually shift the position number up for each record like:
UPDATE position SET position = position - 1 WHERE position > 2
But that would be a big performance hit for having to run such big updates for each record delete.
The cost for implementing such workaround with that bad performance is just not worth it.

I understand supporting LIMIT and ORDER BY queries may requires some sacrifice on observability, for my use case I am willing to sacrifice some observability to exchange for this performance gain.
However, I don't quite see how certain observability feature in 🍉 especially hard to support.
For example, I think that would be okay for single record observability (ie. monitoring changes of data fields in single record), queries result set without limit operation, etc. Even for queries that has limit applied to it, the worse case scenario we can still run the query in SQLite database.

(PS: Please point out if i got any concepts wrong)

Now we are planning to switch out from Realm, we really love the async nature of 🍉 as oppose to Realm's everything is synchronous which freezes the UI and also their performance of insertion is really terrible (more than 0.5 sec of UI freeze per record insertion) as our database size grows. It renders our app completely unusable during data sync. Our use case requires the database to be able to fetch few records with pagination from a list of > 100K records and able to search from it (may use FTS or roll out own tokenization indexer) without freezing the UI.

Please advise what I can help on that feature implementation or suggest workaround to use LIMIT query. Probably using raw queries? But then we would have to compose SQL without query builder, and also I am not sure if it is possible to observe that single record fetched with raw queries. The record set pagination feature is really a deal breaker for us to adopt 🍉.

(Side note: I just realized I have commented on a closed issue #113, so I repost it on a relevant open issue)

@azizhk
Copy link
Contributor Author

azizhk commented Feb 3, 2020

Hi @Kenneth-KT,
Yeah we are not going ahead with this approach.
We are actually going ahead with #609 for our infinite list requirements. It's working well so far and we might share our experiences once we are close to a production release.
We've seen a 20% reduction in memory usage in development, (will share prod stats soon).

@radex wanted to ask you what is the cache bursting strategy in 🍉?

@radex
Copy link
Collaborator

radex commented Feb 4, 2020

@Kenneth-KT You're right, my suggested workaround has real flaws, which depending on the exact use case may be fatal. To be clear: I'm not against adding SQL-level limiting/ordering to WatermelonDB, I just don't need it for my use cases, so I'm not going to drive the implementation of it. But if this is key to you, I would highly encourage you to consider contributing those, even in a limited form. I can offer guidance. And yes, @azizhk's PR is another useful workaround for large lists. Using raw SQL queries is also a great option, if you have a good way to know when to trigger re-fetch. Yes, records fetched using SQL are still 🍉 objects and you can observe them normally.

@azizhk

wanted to ask you what is the cache bursting strategy in 🍉?

None. Observable objects are cached on Query, Relation, but only for as long as they're observed. And Collection has a cache of { id : Record } objects. Once object with given ID is fetched, it's going to stay there forever.

My initial plan was to implement a two-level cache, keeping Records in a Map for the first X (100? 200?) records, and then move oldest records to a WeakMap, so that they're cached only as long as some piece of the app actually keeps a reference to it. If memory usage is important to your app, try contributing this improvement - it may help a lot if you're dealing with a lot of Records

@sofiageo
Copy link
Contributor

sofiageo commented Feb 4, 2020

Using raw SQL queries is also a great option, if you have a good way to know when to trigger re-fetch. Yes, records fetched using SQL are still watermelon objects and you can observe them normally.

Hi @radex , I was planning on asking it on another issue but since you mentioned it, I've been trying WatermelonDB for the past few weeks and currently I don't need the observability for my project, so I've been only using raw SQL queries. I didn't have much luck with JOINing tables with raw SQL queries, so I guess it's not supported, or maybe there was something wrong with the syntax I used, however it's not documented so I'm not sure. Could you clarify if it's possible and if it's not at the moment should I open an issue about it ?

I'm not against adding SQL-level limiting/ordering to WatermelonDB, I just don't need it for my use cases, so I'm not going to drive the implementation of it. But if this is key to you, I would highly encourage you to consider contributing those, even in a limited form

This sounds great, I hope the community can start helping with it. Would it make sense to have it in a different API for it in order not to conflict with the normal API ? Something like mango-queries rxdb has?

@radex
Copy link
Collaborator

radex commented Feb 4, 2020

I didn't have much luck with JOINing tables with raw SQL queries, so I guess it's not supported, or maybe there was something wrong with the syntax I used, however it's not documented so I'm not sure. Could you clarify if it's possible and if it's not at the moment should I open an issue about it ?

You can do any SQL which results in a list of records. (SQLs that result in a number or a list of numbers or something like that are not supported). You can do JOINs - check out tests for encodeQuery, you can see what built in join queries produce for SQL.

Please don't open issues for unimplemented features -- but please do send improvements in pull requests :)

Would it make sense to have it in a different API for it in order not to conflict with the normal API ? Something like mango-queries rxdb has?

No, I don't like the idea of maintaining two different query syntaxes. I think the current query builder can be extended to add limits and order

@Kenneth-KT
Copy link
Contributor

Kenneth-KT commented Feb 13, 2020

I've just implement sortBy, take, skip query operators (for sqlite only, and definitely without taking observability in mind) at
https://github.com/Kenneth-KT/WatermelonDB/commits/sort_take_skip_query_operators

I am planning to create a PR for this branch.

@radex What are your thoughts about it

@radex
Copy link
Collaborator

radex commented Feb 13, 2020

@Kenneth-KT Freaking amazing! Awesome work. Please do make a PR for that -- it will be much easier for me to review this code and give feedback this way

@Kenneth-KT
Copy link
Contributor

Kenneth-KT commented Feb 13, 2020

@radex It has some flaws currently, the order of objects might get messed up after creating new objects since the processChangeSet in subscribeToSimpleQuery has a code like this:

...
    } else if (matches && !currentlyMatching) {
      // Add if should be included but isn't
      mutableMatchingRecords.push(record)
      shouldEmit = true
    }

If I need to make it better, I guess what I need to do is to implement reloading change set post-processor which refetches objects from database.
@radex Do you think that complete handling is necessary for a PR to be accepted?

But anyway, I will make the PR after I finished testing it all out.

@radex
Copy link
Collaborator

radex commented Feb 13, 2020

Do you think that complete handling is necessary for a PR to be accepted?

No, as long as you clearly convey that the API may change or not fully work, e.g. Q.experimentalSortBy().

@Kenneth-KT
Copy link
Contributor

Take a look at this PR #622
It implements SQL ORDER BY, LIMIT and OFFSET by Q.sortBy(sortColumn, sortOrder), Q.take(count) and Q.skip(count) query operators.

@stale
Copy link

stale bot commented Aug 13, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix This will not be worked on label Aug 13, 2020
@stale stale bot closed this as completed Sep 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

5 participants