[Feature request] Speed up GetSchema #25
Comments
Hi @karn09, thanks for the detailed feedback. I'm very surprised by the time spent retrieving the schema: how many measurements do you have in the DB, and what cardinality? I think I know what you propose, but I'm not sure a raw `show field keys` over the whole DB is safe in every case. Just to be sure, can you check where the time is actually being spent? Thanks,
Hi @sbengo, thanks for following up. In one extreme case we have 10391 measurements, though a query for measurement cardinality gives 10389. Given the high cardinality on this DB, I'm hesitant to load all measurement field keys via a single bulk `show field keys` query.
For comparison, I have another DB with a measurement cardinality of 13, where a bulk query returns almost immediately. You bring up valid points that I had not considered; I can see both approaches being problematic. To try to accommodate filtering, I put together a query that applies a regex to the measurement names. Perhaps a 'use at your own risk' warning could surround an option to bulk load fields, which would either disable filtering or apply it after all the fields are fetched.
Hi @karn09, thanks for your feedback. As you have said, and IMHO, it's not acceptable to spend 8-20 minutes retrieving the schema. Personally, I agree with implementing what you proposed. If you can, make an initial PR and we will discuss it there. Remember to apply the measurement filter after retrieving all fields with the bulk query. Thanks,
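A minimal sketch of that ordering in Go (the repo's language), with `measurementMatches` as a hypothetical predicate standing in for syncflux's real measurement filter:

```go
// filterSchema applies the measurement filter AFTER a bulk fetch has
// returned field keys for every measurement, so filtering happens
// locally instead of steering per-measurement queries.
// measurementMatches is a hypothetical stand-in for the real filter.
func filterSchema(all map[string][]string, measurementMatches func(string) bool) map[string][]string {
	out := make(map[string][]string, len(all))
	for m, fieldKeys := range all {
		if measurementMatches(m) {
			out[m] = fieldKeys
		}
	}
	return out
}
```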
Loading the schema from a DB with a large number of measurements takes a long time. I've observed anywhere from 8 to 20 minutes before `GetSchema` completes. I suspect the cause of the long load times to be:
`syncflux/pkg/agent/hacluster.go`, line 147 (commit 9d69de4), which makes an individual API call for each measurement to fetch its field keys.
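For context, that per-measurement pattern looks roughly like the sketch below. This is a simplified reconstruction, not the actual syncflux code; the client setup, function name, and response parsing are assumptions based on the InfluxDB 1.x Go client.

```go
package schema

import (
	"fmt"

	client "github.com/influxdata/influxdb1-client/v2"
)

// fetchFieldKeysPerMeasurement issues one SHOW FIELD KEYS query per
// measurement. With ~10k measurements this means ~10k round trips,
// which is the suspected cause of the 8-20 minute GetSchema runs.
// (Illustrative reconstruction, not the actual syncflux code.)
func fetchFieldKeysPerMeasurement(cli client.Client, db string, measurements []string) (map[string][]string, error) {
	fields := make(map[string][]string)
	for _, m := range measurements {
		q := client.NewQuery(fmt.Sprintf("SHOW FIELD KEYS FROM %q", m), db, "")
		resp, err := cli.Query(q)
		if err != nil {
			return nil, err
		}
		if resp.Error() != nil {
			return nil, resp.Error()
		}
		for _, res := range resp.Results {
			for _, row := range res.Series {
				for _, v := range row.Values {
					// First column of each row is the field key name.
					if name, ok := v[0].(string); ok {
						fields[m] = append(fields[m], name)
					}
				}
			}
		}
	}
	return fields, nil
}
```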
I was thinking that it may be possible to use `show field keys on <sdb>`, so that the API responds with field keys for ALL measurements in the selected DB. I think this would work, but I haven't investigated whether there are any size limitations with InfluxDB JSON responses or the REST client used. With 1000 measurements, the API took 12s to respond with a 1.72 MB JSON payload, compared to a request for fields on a single measurement, which took between 500-800ms within a small sample of requests.
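A minimal sketch of that bulk approach, continuing the previous snippet (same assumed client library). It relies on the fact that each series in a `SHOW FIELD KEYS` result is named after the measurement it belongs to, so the response can be regrouped locally:

```go
// fetchFieldKeysBulk retrieves field keys for ALL measurements of a DB
// in one request. Each series in the response is named after its
// measurement, so the flat result is regrouped into a per-measurement map.
func fetchFieldKeysBulk(cli client.Client, db string) (map[string][]string, error) {
	// Passing db to NewQuery plays the role of the ON <db> clause here.
	q := client.NewQuery("SHOW FIELD KEYS", db, "")
	resp, err := cli.Query(q)
	if err != nil {
		return nil, err
	}
	if resp.Error() != nil {
		return nil, resp.Error()
	}
	fields := make(map[string][]string)
	for _, res := range resp.Results {
		for _, row := range res.Series {
			for _, v := range row.Values {
				if name, ok := v[0].(string); ok {
					fields[row.Name] = append(fields[row.Name], name)
				}
			}
		}
	}
	return fields, nil
}
```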
An alternative could be splitting the list of measurements and fetching field keys in batches, but this could also be very slow. For example, `show field keys from disk,diskio,interrupts,kernel` would take upward of 12s, sometimes even returning an empty response. Maybe InfluxDB does not index for this sort of query? In my limited testing, I am running InfluxDB 1.7.7, with queries routed through influxdb-srelay. Queries made directly to the master were slightly faster, with all fields being returned in 4s, and batches of 4 varying between 4-12s per request.
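If batching turns out to be viable despite these timings, it could look something like the following. Again a sketch on top of the earlier snippets (it additionally needs the `strings` import); the batch size is arbitrary:

```go
// fetchFieldKeysInBatches is a middle ground: one query per chunk of
// measurements instead of one per measurement or one for the whole DB.
func fetchFieldKeysInBatches(cli client.Client, db string, measurements []string, batchSize int) (map[string][]string, error) {
	fields := make(map[string][]string)
	for start := 0; start < len(measurements); start += batchSize {
		end := start + batchSize
		if end > len(measurements) {
			end = len(measurements)
		}
		// Quote each name and join: SHOW FIELD KEYS FROM "disk","diskio",...
		quoted := make([]string, 0, end-start)
		for _, m := range measurements[start:end] {
			quoted = append(quoted, fmt.Sprintf("%q", m))
		}
		q := client.NewQuery("SHOW FIELD KEYS FROM "+strings.Join(quoted, ","), db, "")
		resp, err := cli.Query(q)
		if err != nil {
			return nil, err
		}
		if resp.Error() != nil {
			return nil, resp.Error()
		}
		for _, res := range resp.Results {
			for _, row := range res.Series {
				for _, v := range row.Values {
					if name, ok := v[0].(string); ok {
						fields[row.Name] = append(fields[row.Name], name)
					}
				}
			}
		}
	}
	return fields, nil
}
```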
It would be awesome if we could set a command-line flag to force bulk loading of all field keys in a single request, or have some logic that automatically switches to bulk loading once a certain number of measurements is seen in one DB. If batching requests is workable with additional configuration in InfluxDB, that would also be great.
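One possible shape for that switching logic, building on the sketches above; the flag and the threshold are assumptions, not existing syncflux options:

```go
// bulkLoadThreshold is an assumed cutoff, not an existing syncflux setting.
const bulkLoadThreshold = 500

// getFieldKeys picks a strategy: forced bulk loading (e.g. via a
// hypothetical command-line flag), automatic bulk loading above a
// measurement-count threshold, or the existing per-measurement path.
func getFieldKeys(cli client.Client, db string, measurements []string, forceBulk bool) (map[string][]string, error) {
	if forceBulk || len(measurements) >= bulkLoadThreshold {
		return fetchFieldKeysBulk(cli, db)
	}
	return fetchFieldKeysPerMeasurement(cli, db, measurements)
}
```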
I'd be happy to submit a PR with my proposed solution, but would appreciate some feedback on the correct approach to take.