Infinite List Questions, Memory Usage #217

azizhk · 2019-01-26T12:08:12Z

Hi WatermelonDB developers.
Wanted to ask about WatermelonDB usage with an infinite list. I understand that WatermelonDB is lazy first, would I be able to lazily load results of a query until the items are on screen. Wondering if watermelon DB supports something like this using RxJS

$items = $scrollPositionStream.switchMap(postion => runQueryForPosition(position))

Where I have only those objects which are on screen in memory (or those which have appeared on screen once)

The text was updated successfully, but these errors were encountered:

radex · 2019-01-27T10:41:15Z

There is no specific support for this, but sure, you can use RxJs for this, like so:

withObservables([???], ({ ... }) => ({
  items: $scrollPositionStream.switchMap(postion => db.collections.get('x').query(Q.where('position', Q.between(position, position+100)).observe()))
}))

see #113
#113
#113

azizhk · 2020-01-17T11:54:39Z

Hi @radex,
So we've been working on some POCs. We wanted to know your opinions.
Wondering if it would be easy for 🍉 to provide only a few columns like id and the columns needed for sorting:

const cols$ = database.collections
  .get("table")
  .query(Q.where("name", Q.like(sanitizedInput)))
  .observeOnlyColumns(["id", "updated_at"])

It should follow the same functionality as observeWithColumns. Here cols$ is Observable<Array<[string, Date]>>

const sortedIds$ = cols$.pipe(
  map(rows => rows
    .sort((a, b) => a[1] - b[1])
    .map(r => r[0])
  )
)

Then we sort these on the basis of the columns in JS, and pass it to the FlatList along with a RenderItem Function which returns a component wrapped using withObservables to query the model based on id.

Would this solve the problems of #113 and #15 while still being reactive?
This would help reduce our memory footprint.

radex · 2020-01-17T13:00:59Z

This would help reduce our memory footprint.

Hm, is that a hypothesis, or did you verify that indeed individual Model instances take enough RAM to be a real problem?

My intuition is that moving from a Model to a tuple of simple types wouldn't save much memory unless you have so many of them that other performance bottlenecks arise first

azizhk · 2020-01-17T13:17:48Z

Well right now we are storing everything in redux and we can see clear correlation between number of entries and RAM usage.
(Can also be because of Redux Persist)
We have 18 columns where there are multiple json and blob text fields.

WatermelonDB/examples/web/src/components/Blog/index.js

Line 84 in faeda58

posts: blog.posts.observe(),

Like even in the example, you would have all the post bodies in memory when rendering the list of posts.
I know an alternative would be to move blobs to different file and save their references, but i think querying what is needed can make things simpler while retaining reactivity.

pranjal-jain · 2020-01-17T14:44:41Z

@radex It's not only a problem with memory. But also passing huge amount of data over the bridge which cause performance issues. With just selected columns of data we'll reduce this overhead substantially.

radex · 2020-01-20T13:18:57Z

I know an alternative would be to move blobs to different file and save their references, but i think querying what is needed can make things simpler while retaining reactivity.

I think it would be much simpler for you in the long term to move blobs to a separate place, since only getting raw column values from Watermelon really dumbs down the framework's capabilities - you no longer have reactive records, just pure data.

But I understand you point, and can see how this may be necessary / a good idea in some extreme circumstances. So I very much encourage you to continue developing the #609 PR. I will be happy to do code review and merge it if it's good :)

Kenneth-KT · 2020-02-03T18:25:05Z

There is no specific support for this, but sure, you can use RxJs for this, like so:
withObservables([???], ({ ... }) => ({
  items: $scrollPositionStream.switchMap(postion => db.collections.get('x').query(Q.where('position', Q.between(position, position+100)).observe()))
}))
see #113
#113
#113

@radex Thanks your suggestions, but I afraid the workaround may not quite work for large list in offline first apps.

Let say I want to limit the list to only show first 5 records in a large table, and I have the list of records pre-filled with sequential position during sync. Using your workaround, I can query first 5 records with position >= 0 AND position < 5 condition.

record #1, position: 0
record #2, position: 1
record #3, position: 2
record #4, position: 3
record #5, position: 4

But now if the user deletes record #3, the same query will just return 4 records:

record #1, position: 0
record #2, position: 1
record #4, position: 3
record #5, position: 4

Unless we run additional raw SQL query to manually shift the position number up for each record like:
UPDATE position SET position = position - 1 WHERE position > 2
But that would be a big performance hit for having to run such big updates for each record delete.
The cost for implementing such workaround with that bad performance is just not worth it.

I understand supporting LIMIT and ORDER BY queries may requires some sacrifice on observability, for my use case I am willing to sacrifice some observability to exchange for this performance gain.
However, I don't quite see how certain observability feature in 🍉 especially hard to support.
For example, I think that would be okay for single record observability (ie. monitoring changes of data fields in single record), queries result set without limit operation, etc. Even for queries that has limit applied to it, the worse case scenario we can still run the query in SQLite database.

(PS: Please point out if i got any concepts wrong)

Now we are planning to switch out from Realm, we really love the async nature of 🍉 as oppose to Realm's everything is synchronous which freezes the UI and also their performance of insertion is really terrible (more than 0.5 sec of UI freeze per record insertion) as our database size grows. It renders our app completely unusable during data sync. Our use case requires the database to be able to fetch few records with pagination from a list of > 100K records and able to search from it (may use FTS or roll out own tokenization indexer) without freezing the UI.

Please advise what I can help on that feature implementation or suggest workaround to use LIMIT query. Probably using raw queries? But then we would have to compose SQL without query builder, and also I am not sure if it is possible to observe that single record fetched with raw queries. The record set pagination feature is really a deal breaker for us to adopt 🍉.

(Side note: I just realized I have commented on a closed issue #113, so I repost it on a relevant open issue)

azizhk · 2020-02-03T18:49:15Z

Hi @Kenneth-KT,
Yeah we are not going ahead with this approach.
We are actually going ahead with #609 for our infinite list requirements. It's working well so far and we might share our experiences once we are close to a production release.
We've seen a 20% reduction in memory usage in development, (will share prod stats soon).

@radex wanted to ask you what is the cache bursting strategy in 🍉?

radex · 2020-02-04T07:44:43Z

@Kenneth-KT You're right, my suggested workaround has real flaws, which depending on the exact use case may be fatal. To be clear: I'm not against adding SQL-level limiting/ordering to WatermelonDB, I just don't need it for my use cases, so I'm not going to drive the implementation of it. But if this is key to you, I would highly encourage you to consider contributing those, even in a limited form. I can offer guidance. And yes, @azizhk's PR is another useful workaround for large lists. Using raw SQL queries is also a great option, if you have a good way to know when to trigger re-fetch. Yes, records fetched using SQL are still 🍉 objects and you can observe them normally.

@azizhk

wanted to ask you what is the cache bursting strategy in 🍉?

None. Observable objects are cached on Query, Relation, but only for as long as they're observed. And Collection has a cache of { id : Record } objects. Once object with given ID is fetched, it's going to stay there forever.

My initial plan was to implement a two-level cache, keeping Records in a Map for the first X (100? 200?) records, and then move oldest records to a WeakMap, so that they're cached only as long as some piece of the app actually keeps a reference to it. If memory usage is important to your app, try contributing this improvement - it may help a lot if you're dealing with a lot of Records

sofiageo · 2020-02-04T09:57:33Z

Using raw SQL queries is also a great option, if you have a good way to know when to trigger re-fetch. Yes, records fetched using SQL are still watermelon objects and you can observe them normally.

Hi @radex , I was planning on asking it on another issue but since you mentioned it, I've been trying WatermelonDB for the past few weeks and currently I don't need the observability for my project, so I've been only using raw SQL queries. I didn't have much luck with JOINing tables with raw SQL queries, so I guess it's not supported, or maybe there was something wrong with the syntax I used, however it's not documented so I'm not sure. Could you clarify if it's possible and if it's not at the moment should I open an issue about it ?

I'm not against adding SQL-level limiting/ordering to WatermelonDB, I just don't need it for my use cases, so I'm not going to drive the implementation of it. But if this is key to you, I would highly encourage you to consider contributing those, even in a limited form

This sounds great, I hope the community can start helping with it. Would it make sense to have it in a different API for it in order not to conflict with the normal API ? Something like mango-queries rxdb has?

radex · 2020-02-04T11:59:29Z

I didn't have much luck with JOINing tables with raw SQL queries, so I guess it's not supported, or maybe there was something wrong with the syntax I used, however it's not documented so I'm not sure. Could you clarify if it's possible and if it's not at the moment should I open an issue about it ?

You can do any SQL which results in a list of records. (SQLs that result in a number or a list of numbers or something like that are not supported). You can do JOINs - check out tests for encodeQuery, you can see what built in join queries produce for SQL.

Please don't open issues for unimplemented features -- but please do send improvements in pull requests :)

Would it make sense to have it in a different API for it in order not to conflict with the normal API ? Something like mango-queries rxdb has?

No, I don't like the idea of maintaining two different query syntaxes. I think the current query builder can be extended to add limits and order

Kenneth-KT · 2020-02-13T14:57:12Z

I've just implement sortBy, take, skip query operators (for sqlite only, and definitely without taking observability in mind) at
https://github.com/Kenneth-KT/WatermelonDB/commits/sort_take_skip_query_operators

I am planning to create a PR for this branch.

@radex What are your thoughts about it

radex · 2020-02-13T15:00:41Z

@Kenneth-KT Freaking amazing! Awesome work. Please do make a PR for that -- it will be much easier for me to review this code and give feedback this way

Kenneth-KT · 2020-02-13T15:03:54Z

@radex It has some flaws currently, the order of objects might get messed up after creating new objects since the processChangeSet in subscribeToSimpleQuery has a code like this:

...
    } else if (matches && !currentlyMatching) {
      // Add if should be included but isn't
      mutableMatchingRecords.push(record)
      shouldEmit = true
    }

If I need to make it better, I guess what I need to do is to implement reloading change set post-processor which refetches objects from database.
@radex Do you think that complete handling is necessary for a PR to be accepted?

But anyway, I will make the PR after I finished testing it all out.

radex · 2020-02-13T15:09:10Z

Do you think that complete handling is necessary for a PR to be accepted?

No, as long as you clearly convey that the API may change or not fully work, e.g. Q.experimentalSortBy().

Kenneth-KT · 2020-02-15T05:10:22Z

Take a look at this PR #622
It implements SQL ORDER BY, LIMIT and OFFSET by Q.sortBy(sortColumn, sortOrder), Q.take(count) and Q.skip(count) query operators.

stale · 2020-08-13T09:46:09Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

azizhk closed this as completed Jan 27, 2019

radex mentioned this issue Nov 12, 2019

Large list performance #561

Closed

azizhk reopened this Jan 17, 2020

pranjal-jain mentioned this issue Jan 20, 2020

Add support to observe on selected columns of a collection #609

Open

4 tasks

stale bot added the wontfix This will not be worked on label Aug 13, 2020

stale bot closed this as completed Sep 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Infinite List Questions, Memory Usage #217

Infinite List Questions, Memory Usage #217

azizhk commented Jan 26, 2019

radex commented Jan 27, 2019

azizhk commented Jan 17, 2020

radex commented Jan 17, 2020

azizhk commented Jan 17, 2020

pranjal-jain commented Jan 17, 2020

radex commented Jan 20, 2020

Kenneth-KT commented Feb 3, 2020

azizhk commented Feb 3, 2020

radex commented Feb 4, 2020

sofiageo commented Feb 4, 2020 •

edited

Loading

radex commented Feb 4, 2020

Kenneth-KT commented Feb 13, 2020 •

edited

Loading

radex commented Feb 13, 2020

Kenneth-KT commented Feb 13, 2020 •

edited

Loading

radex commented Feb 13, 2020

Kenneth-KT commented Feb 15, 2020

stale bot commented Aug 13, 2020

Infinite List Questions, Memory Usage #217

Infinite List Questions, Memory Usage #217

Comments

azizhk commented Jan 26, 2019

radex commented Jan 27, 2019

azizhk commented Jan 17, 2020

radex commented Jan 17, 2020

azizhk commented Jan 17, 2020

pranjal-jain commented Jan 17, 2020

radex commented Jan 20, 2020

Kenneth-KT commented Feb 3, 2020

azizhk commented Feb 3, 2020

radex commented Feb 4, 2020

sofiageo commented Feb 4, 2020 • edited Loading

radex commented Feb 4, 2020

Kenneth-KT commented Feb 13, 2020 • edited Loading

radex commented Feb 13, 2020

Kenneth-KT commented Feb 13, 2020 • edited Loading

radex commented Feb 13, 2020

Kenneth-KT commented Feb 15, 2020

stale bot commented Aug 13, 2020

sofiageo commented Feb 4, 2020 •

edited

Loading

Kenneth-KT commented Feb 13, 2020 •

edited

Loading

Kenneth-KT commented Feb 13, 2020 •

edited

Loading