Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP Storage Plugin INDEX pagination mode is field-order sensitive and should not be #2895

Open
hyperbolix opened this issue Mar 26, 2024 · 2 comments
Labels

Comments

@hyperbolix
Copy link

hyperbolix commented Mar 26, 2024

Describe the bug
HTTP Storage Plugin INDEX pagination mode is field-order sensitive and should not be (JSON is supposed to be unordered). When the pagination fields 'has_next' and 'next_page' come after the data path field 'results', the presence of a next page is ignored. The next page fields are only honored when they come before the data path field.

To Reproduce
Steps to reproduce the behavior:

  1. Start Drill.
  2. Start the Python3 HTTP server that demonstrates the issue:
    https://gist.github.com/hyperbolix/b66f07d5d007ca2df321550241d748ac
  3. Enable the http storage plugin and configure it using this configuration: https://gist.github.com/hyperbolix/7df108c8169cf54bcc72fdc499be1dce
  4. Open the Drill query interface and issue this query:
    select * from http.a_bug_test where total_results_val=300 and page_size_val=10 limit 15
  5. See that it returns 10 results, but it should return 15.
  6. Modify the Python3 HTTP server, moving lines 52-58 so they come before line 50, so that the 'has_next' and 'next_page' fields come before the 'results' field.
  7. Restart the Python3 HTTP server and issue the query again from step 4.
  8. Observe than now 15 results are returned.

Expected behavior
The INDEX pagination mode page token fields should be honored regardless of field ordering in the JSON response. Or, at a minimum, the documentation should clearly indicate that these fields must come before the results field (which is not the order in the example in the HTTP pagination readme today).

Error detail, log output or screenshots
When the 'has_next' and 'next_page' fields come after the data path 'results' field, they are ignored, meaning the result set is incorrectly trimmed short, as though there are not additional pages, yet there are additional pages.

Drill version
Observed in 1.21.1 and also in the latest commit as of 2024-03-26: 749772c

@hyperbolix hyperbolix added the bug label Mar 26, 2024
@cgivre
Copy link
Contributor

cgivre commented Mar 27, 2024

@hyperbolix I've been reading this ticket and I'm a little confused here. The index pagination is intended to be used when the API in question returns an index or an index link to indicate the next page.

With that said, the parameters page_size and total_results suggest that the correct pagination method would be page pagination. (https://github.com/apache/drill/blob/master/contrib/storage-http/Pagination.md#page-pagination). Page pagination is when the API provides paging information such as the page size and result count.

If you use the page pagination with a limit, that will work and there are unit tests for that.

@hyperbolix
Copy link
Author

hyperbolix commented Mar 27, 2024

This test rig doesn't exactly represent the system I've been experimenting with using Drill that led to the discovery of this bug. It was simply the most expedient way to demonstrate the issue. It helped to be able to tune total result size and max results per page to prove the issue would occur at various page sizes, and it helped to keep the 'next_page' token simple.

I appreciate your guidance in the matter; I can assure you the I am actually using page tokens in my experiments thereby necessitating the use of the 'index' pagination method. I can't recall a single cloud computing list API that doesn't use page tokens, and if I came across one, I'd be worried about using it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants