Pagination using Range cannot be consistent #87
Comments
PS: This is related, but not identical, to #81.
Hey! This isn't too obvious from the source material, but the So for example, you'd get back a range that looks like:
When you feed that into an API, it should translate to something that looks roughly like:

    SELECT * FROM table WHERE condition AND id >= 'resource_key_123' ORDER BY field LIMIT size

Aside from being unstable as you suggest, you should also never use offset-based pagination because its performance degrades the further you page. This Postgres presentation is still the best description of that problem out there. At Stripe, we actually had to migrate from offset-based to identifier-based pagination, and trust me that it has been a long and painful process :)

I'm going to close this out for now, but let me know if you have any other questions/comments!
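As a rough illustration of the identifier-based query shape in the comment above, here is a minimal Python sketch using the standard-library `sqlite3` module. The `items` table, its columns, and the `fetch_page` helper are hypothetical names chosen for the example; they are not part of the guide or of any real API. The sketch fetches one extra row per page so the server knows which id the next range should start at.

```python
import sqlite3


def fetch_page(conn, start_id, size):
    """Return (rows, next_start_id) for one page ordered by id.

    Hypothetical helper: one extra row is fetched so that its id can be
    handed back to the client as the start of the next range.
    """
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id >= ? ORDER BY id LIMIT ?",
        (start_id, size + 1),
    ).fetchall()
    page, extra = rows[:size], rows[size:]
    next_start = extra[0][0] if extra else None  # None means no further page
    return page, next_start


# Assumed in-memory table with ids 1..7, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 8)],
)

first_page, next_start = fetch_page(conn, 0, 3)
second_page, after_second = fetch_page(conn, next_start, 3)
```

Because each page is located by comparing ids rather than by counting rows, the cost of a page does not grow with how far the client has paged, provided `id` is indexed.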
@brandur I'm aware of id-based pagination, but it still doesn't solve the issue I mentioned. PS: The example I initially gave works for both id-based pagination and index-based pagination. This was on purpose.
@KellerFuchs In your example above, you said:
Paginated by IDs does solve the problem of a client seeing duplicates. The API has told the client that the next page starts at There is definitely a more general problem in that it's very difficult for a client to paginate an entire list and be sure that it's received every item because of the possibility of subsequent insertions. You could try to use your database's isolation to keep a consistent snapshot as a client iterates, but even there you still have a few problems:
One fairly easy solution might be to use atomically incremented IDs to in effect make the list append-only.
Yes, as I mentioned, id-based pagination combined with an append-only list is sufficient to get an atomic view of the result (which might be a more recent one than the one at the time you start iterating, though). Yes, I agree that breaking consistency (or isolation, in DB-speak) is a solution in some cases, but being utterly silent about this issue in a “best practices” guide seems shoddy.
Yeah, the state of this info in the guide remains unsatisfactory for me as well. Unfortunately there are quite a lot of different ways to do this, and the right one depends a lot on the use cases and data stores involved. As such, it's been rather tricky to pin down anything that looks particularly best-practice-esque.

One option to improve the existing usage might be to add data to the next-range which would allow the server to figure out if things have mutated. This would normally fall to

All that being said, for many use cases (e.g. displaying a UI), eventual consistency is good enough, with a slightly out-of-date list not really being too problematic.
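One way to picture the "add data to the next-range" idea from the comment above is a version counter that travels with the range. This is only a sketch under assumptions: the `Collection` class, the `version` field, and `StaleRangeError` are hypothetical names invented here, and nothing in the thread says this is how Heroku or the guide does it.

```python
class StaleRangeError(Exception):
    """Raised when the collection changed after the range was issued."""


class Collection:
    def __init__(self, items):
        self.items = list(items)
        self.version = 0  # bumped on every mutation

    def insert(self, index, item):
        self.items.insert(index, item)
        self.version += 1

    def page(self, start, size, expected_version=None):
        # Refuse to serve a continuation if the data mutated in between.
        if expected_version is not None and expected_version != self.version:
            raise StaleRangeError("collection changed since the range was issued")
        chunk = self.items[start:start + size]
        next_range = {"start": start + size, "version": self.version}
        return chunk, next_range


coll = Collection(f"e{i}" for i in range(30))
first, next_range = coll.page(0, 10)
coll.insert(9, "x")  # concurrent mutation between the two requests

try:
    coll.page(next_range["start"], 10, next_range["version"])
    detected = False
except StaleRangeError:
    detected = True
```

This only detects the mutation; what the client should do next (restart, accept a possibly inconsistent page, and so on) is a separate design decision.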
Hence why I suggested using
Fair point. We have largely expected
Section 1.6 and the Heroku API reference both recommend using `Range` for pagination.

Using the `Range`/`Next-Range` mechanism from HTTP accidentally exposes in the API that the data is (likely) obtained by computing the requested data within a selected window; for instance, for a query that exposes data from a SQL database:

Moreover, this implementation pattern cannot provide a consistent view of the data when concurrent actions can introduce new elements at arbitrary indexes. Consider the following sequence of events:
`Next-Range: 20 ..`; let's call the elements `e0` to `e9`;
`x` and `y` at indexes 9 and 14;
`e10 = e9`, `e11` ... `e19`: `e9` has been seen twice by the client (unless you use id-based pagination), and it saw `y` but not `x` (despite them having been added simultaneously, regardless of whether you use id-based pagination).

The right solution is to provide the API client with a consistent view of the data; for APIs returning results of DB queries, this is easily done using either cursors or materialized views, the second having the additional advantage of supporting later refinement queries.
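The sequence of events described above can be reproduced with a small in-memory simulation. This is a sketch, not the issue author's code: the list of 30 elements and the `page_by_offset` helper are assumptions made for the example, and the page size of 10 is taken from the `e0` to `e9` naming.

```python
def page_by_offset(data, start, size):
    """Index-based window, as a Range-style implementation might compute it."""
    return data[start:start + size]


data = [f"e{i}" for i in range(30)]

# First request: the client receives e0..e9.
first = page_by_offset(data, 0, 10)

# Concurrent writes: x and y are inserted at indexes 9 and 14.
data.insert(9, "x")
data.insert(14, "y")

# Second request: the client asks for the next window of 10 elements.
second = page_by_offset(data, 10, 10)

seen = first + second
duplicates = sorted({e for e in seen if seen.count(e) > 1})
```

In this run the second page starts with `e9` again, and the client ends up having seen `y` but never `x`, even though both were inserted at the same time.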
At the API level, this should be materialized using the `Link` header, with (at least) a reference tagged `rel=next`. Using `Link` allows the API implementer to store there whatever is required to designate the right query result (for instance, a DB cursor id).
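To make the `Link` proposal concrete, here is a minimal sketch of a server packing opaque cursor state into a `rel="next"` link and reading it back on the following request. Everything specific in it is an assumption for illustration: the `https://api.example.com/items` URL, the `cursor` query parameter, the contents of the state dictionary, and the helper names.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlparse


def encode_cursor(state):
    """Serialize server-side paging state into an opaque, URL-safe token."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token):
    """Inverse of encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def next_link_header(base_url, state):
    """Build the value of a Link header pointing at the next page."""
    query = urlencode({"cursor": encode_cursor(state)})
    return f'<{base_url}?{query}>; rel="next"'


# Hypothetical state: a snapshot identifier plus the last id already served.
state = {"snapshot": "abc123", "after_id": 42}
header = next_link_header("https://api.example.com/items", state)

# What the server would do when the client follows the link.
target = header[1:header.index(">")]
token = parse_qs(urlparse(target).query)["cursor"][0]
recovered = decode_cursor(token)
```

The client treats the URL as opaque and simply follows it, so the server stays free to change what the cursor contains. A production version would likely also sign or encrypt the token so clients cannot tamper with it.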