[Bug]: list_of_documents does not fetch all collection entries #12

flashsturz opened this issue Mar 23, 2023 · 6 comments
Labels
bug Something isn't working triage

Comments

@flashsturz
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

-   OS: Win11
-   Python: 3.9.16

What happened?

When I call list_of_documents() to fetch all entries of a collection, only a subset of the collection is returned.
Looking at list_documents() in collection.py of the google_cloud_firestore library seems to confirm this. It takes an optional "page_size" argument that sets the maximum number of documents in each page of results. The API chooses the default value, so there is no guarantee that a single call returns every document in the collection (e.g. if you have several hundred documents).

I currently do not know how to work around this issue. Any ideas?

Code Snippet

# I used the following command to fetch all documents of a large collection:

active_ids = fsdb.collection("active_files").list_of_documents(self.user["idToken"])
# Returns only a subset of the documents in "active_files" that are visible in the Firestore console.

Relevant log output

No response

Anything else?

No response

@flashsturz flashsturz added bug Something isn't working triage labels Mar 23, 2023
@AsifArmanRahman
Owner

@flashsturz
Does the issue also occur when you use service account credentials? With a service account the library uses google_cloud_firestore, whereas for user-based access it uses an API endpoint. The code snippet you provided uses the API endpoint, while the description mentions the official library.

@flashsturz
Contributor Author

I use a user-based account and can confirm that it uses the API endpoint to get the documents. Sorry about the confusion.
Is there a parameter that we can pass with the request header in list_of_documents() to specify that we want to fetch all available documents from req_ref?

@AsifArmanRahman
Owner

@flashsturz
The method is supposed to return all the documents. The API endpoint wasn't set properly, which will be fixed in the next version.

> This is set per default by the API, but it is not certain that this command returns all available documents in the collection (e.g. if you have several hundred documents.)

Were you able to find out the default value used in the official library? I wonder whether it's a constant or whether it changes based on something the API might determine.

@flashsturz
Contributor Author

Unfortunately, I do not know how the default value is chosen, since the documentation of the firestore library does not state exactly how it is set:

page_size (Optional[int]]): The maximum number of documents
in each page of results from this request. Non-positive values
are ignored. Defaults to a sensible value set by the API.

(From list_documents in collection.py of the firestore_v1 library)

@AsifArmanRahman
Owner

Sorry for getting back to this so late. From what I checked, the REST API for Firestore doesn't return all documents at once; it returns a next page token, which has to be used to retrieve the remaining ones. This means the library would have to make multiple requests before it can return all documents, and with a large number of documents that might take a long time. I could allow passing a page_size argument, so that when the number of docs is expected to be large, the dev can set a larger page size. I'll set the default page size to 20, as is customary.

Does it seem fine? @flashsturz
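
For illustration, the token-following loop described above could look roughly like the sketch below. It is not this library's implementation: `fetch_page` and `iter_all_documents` are hypothetical helper names, the project ID and ID token are placeholders, and the sketch assumes the Firestore REST list endpoint with its `pageSize` and `pageToken` query parameters and the `documents` / `nextPageToken` response fields.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the Firestore REST API.
BASE_URL = "https://firestore.googleapis.com/v1"


def fetch_page(project_id, collection, id_token, page_size, page_token=None):
    """Fetch a single page of documents (hypothetical helper, does real HTTP)."""
    params = {"pageSize": page_size}
    if page_token:
        params["pageToken"] = page_token
    url = (
        f"{BASE_URL}/projects/{project_id}/databases/(default)/documents/"
        f"{urllib.parse.quote(collection)}?{urllib.parse.urlencode(params)}"
    )
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {id_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_all_documents(get_page, page_size=20):
    """Yield every document by following nextPageToken until none is returned.

    get_page is any callable taking (page_size, page_token) and returning the
    parsed JSON of one page; passing it in keeps the loop easy to test.
    """
    page_token = None
    while True:
        page = get_page(page_size, page_token)
        yield from page.get("documents", [])
        page_token = page.get("nextPageToken")
        if not page_token:
            break


# Example wiring (placeholders, not executed here):
# get_page = lambda size, token: fetch_page(
#     "my-project", "active_files", id_token, size, token)
# all_docs = list(iter_all_documents(get_page, page_size=100))
```

A larger page size only reduces the number of round trips; the loop still has to run until no next page token comes back, since a single page is not guaranteed to hold the whole collection.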

@flashsturz
Contributor Author

Sorry for the delayed response on my side. Yes, I think it would be good to let the user set the page_size argument.
Do you know how large page_size can be, i.e. whether there is an upper limit? Or is it possible to specify a very large number to ensure that all documents fit on one page?

2 participants