WriteAPI.Flush doesn't do what's expected #289

Open
pabigot opened this issue Dec 13, 2021 · 6 comments

@pabigot
Contributor

pabigot commented Dec 13, 2021

Steps to reproduce:

  1. Start a loop writing data once per second, with a WriteFailedCallback that detects loss of connection, prevents new data from being added, and tells the infrastructure to retry the send of data it's already been given.
  2. Restart the influxdb process, which causes a transient failure
  3. Logic in the application checks the client Ready/Health status and verifies that the server is back up, and re-enables writing after invoking w.Flush() to first send what's already queued.

Expected behavior:

Upon flush the pending data from queued retries will be written to the server and new data will be promptly written as specified by the FlushInterval.

Actual behavior:

New data is queued in individual batches until the original retryDelay elapses, possibly causing data loss when the number of queued batches exceeds RetryBufferLimit / RetryBatchSize (10 by default).

pre write 11 ... past write 11
2021/12/13 09:43:42.876083 influxdb2client I! sending batch
2021/12/13 09:43:42.878517 influxdb2client E! Write error: Post "http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us": dial tcp 192.168.65.21:8086: connect: connection refused
Batch kept for retrying
2021/12/13 09:43:42.878537 Write failed: 0, counter,id=HACK/influx.go:2021-12-13T09:43:32 counter=10i 1639413821630922
: Post "http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us": dial tcp 192.168.65.21:8086: connect: connection refused
2021/12/13 09:43:42.878544 influxdb2client D! Write proc: next wait for write is 8081ms
2021/12/13 09:43:42.878562 influxdb2client D! Write proc: received write request
2021/12/13 09:43:42.878569 influxdb2client D! Write proc: taking batch from retry queue
2021/12/13 09:43:42.878572 influxdb2client W! Write proc: cannot write yet, storing batch to queue
2021/12/13 09:43:42.878596 WriteError: *fmt.wrapError write failed (attempts 1): Post "http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us": dial tcp 192.168.65.21:8086: connect: connection refused true
2021/12/13 09:43:43.631849 influxdb2client I! HTTP GET req to http://tirzah.pab:8086/health
Status: &domain.HealthCheck{
    Checks: &[]domain.HealthCheck{
    },
    Commit:  &"657e1839de",
    Message: &"ready for queries and writes",
    Name:    "influxdb",
    Status:  "pass",
    Version: &"2.1.1",
}
Past flush
pre write 12 ... past write 12
2021/12/13 09:43:43.875926 influxdb2client I! sending batch
2021/12/13 09:43:43.875953 influxdb2client D! Write proc: received write request
2021/12/13 09:43:43.875957 influxdb2client D! Write proc: taking batch from retry queue
2021/12/13 09:43:43.875962 influxdb2client W! Write proc: cannot write yet, storing batch to queue
pre write 13 ... past write 13
2021/12/13 09:43:44.876353 influxdb2client I! sending batch
2021/12/13 09:43:44.876390 influxdb2client D! Write proc: received write request
2021/12/13 09:43:44.876394 influxdb2client D! Write proc: taking batch from retry queue
2021/12/13 09:43:44.876398 influxdb2client W! Write proc: cannot write yet, storing batch to queue
pre write 14 ... past write 14
2021/12/13 09:43:45.876116 influxdb2client I! sending batch
2021/12/13 09:43:45.876153 influxdb2client D! Write proc: received write request
2021/12/13 09:43:45.876167 influxdb2client D! Write proc: taking batch from retry queue
2021/12/13 09:43:45.876175 influxdb2client W! Write proc: cannot write yet, storing batch to queue
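The capacity arithmetic in the actual-behavior description can be modeled with a toy bounded queue. This is only a sketch, not the client's code: the values 50000 and 5000 are assumed defaults chosen to match the "10 by default" figure above, and `retryQueue`, `maxQueuedBatches`, and `simulate` are hypothetical names. It also assumes the oldest batch is evicted when the queue is full.

```go
package main

import "fmt"

// retryQueue is a toy bounded FIFO of batches. It is NOT the client's
// implementation; it only models "drop the oldest batch when full".
type retryQueue struct {
	limit   int      // maximum number of batches held
	batches []string // batches waiting for a retry
	dropped []string // batches evicted because the queue was full
}

// push appends a batch, evicting the oldest one if the queue is full.
func (q *retryQueue) push(b string) {
	if len(q.batches) == q.limit {
		q.dropped = append(q.dropped, q.batches[0])
		q.batches = q.batches[1:]
	}
	q.batches = append(q.batches, b)
}

// maxQueuedBatches is the quotient quoted in the issue:
// retry buffer limit (in points) divided by batch size (in points).
func maxQueuedBatches(bufferLimit, batchSize int) int {
	return bufferLimit / batchSize
}

// simulate queues one batch per second for the given number of seconds,
// as happens while the write proc reports "cannot write yet".
func simulate(limit, seconds int) *retryQueue {
	q := &retryQueue{limit: limit}
	for s := 1; s <= seconds; s++ {
		q.push(fmt.Sprintf("batch-%d", s))
	}
	return q
}

func main() {
	limit := maxQueuedBatches(50000, 5000) // assumed default values
	q := simulate(limit, 12)               // 12 seconds of blocked writes
	fmt.Println("capacity:", limit)
	fmt.Println("queued:", len(q.batches), "dropped:", q.dropped)
}
```

Under these assumptions, any retry delay longer than about ten seconds at one write per second starts evicting batches, which is the data-loss scenario described above.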

Specifications:

  • Client Version: 2.6.0
  • InfluxDB Version: 2.1.1
  • Platform: Linux go 1.17
@pabigot
Contributor Author

pabigot commented Dec 13, 2021

With #291 the observed behavior is:

pre write 3 ... past write 3
2021/12/13 12:39:56.078557 influxdb2client I! sending batch
2021/12/13 12:39:56.078582 influxdb2client D! Write proc: received write request
2021/12/13 12:39:56.078594 influxdb2client D! Writing batch: counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=3i 1639424396078515
2021/12/13 12:39:56.078611 influxdb2client I! HTTP POST req to http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us
pre write 4 ... past write 4
2021/12/13 12:39:57.328254 influxdb2client I! sending batch
2021/12/13 12:39:57.328280 influxdb2client D! Write proc: received write request
2021/12/13 12:39:57.328288 influxdb2client D! Writing batch: counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=4i 1639424397078939
2021/12/13 12:39:57.328305 influxdb2client I! HTTP POST req to http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us
pre write 5 ... past write 5
2021/12/13 12:39:58.328885 influxdb2client I! sending batch
2021/12/13 12:39:58.338488 influxdb2client E! Write error: Post "http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us": dial tcp 192.168.65.21:8086: connect: connection refused
Batch kept for retrying
2021/12/13 12:39:58.338521 Write failed: 0, counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=4i 1639424397078939
: Post "http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us": dial tcp 192.168.65.21:8086: connect: connection refused
2021/12/13 12:39:58.338528 influxdb2client D! Write proc: next wait for write is 8081ms
2021/12/13 12:39:58.338555 influxdb2client D! Write proc: received write request
2021/12/13 12:39:58.338561 influxdb2client D! Write proc: taking batch from retry queue
2021/12/13 12:39:58.338565 influxdb2client W! Write proc: cannot write yet, storing batch to queue
2021/12/13 12:39:58.338595 WriteError: *fmt.wrapError write failed (attempts 1): Post "http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us": dial tcp 192.168.65.21:8086: connect: connection refused true
2021/12/13 12:39:59.079777 influxdb2client I! HTTP GET req to http://tirzah.pab:8086/health
Status: &domain.HealthCheck{
    Checks: &[]domain.HealthCheck{
    },
    Commit:  &"657e1839de",
    Message: &"ready for queries and writes",
    Name:    "influxdb",
    Status:  "pass",
    Version: &"2.1.1",
}
2021/12/13 12:39:59.080886 influxdb2client I! Triggering write
2021/12/13 12:39:59.080895 influxdb2client I! Triggered write active
Past flush
pre write 6 ... 2021/12/13 12:39:59.080904 influxdb2client D! Write proc: received flush retry request
2021/12/13 12:39:59.080916 influxdb2client D! Write proc: taking batch from retry queue
past write 6
2021/12/13 12:39:59.080944 influxdb2client D! Writing batch: counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=4i 1639424397078939
2021/12/13 12:39:59.080963 influxdb2client I! HTTP POST req to http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us
2021/12/13 12:39:59.091512 influxdb2client D! Write proc: taking batch from retry queue
2021/12/13 12:39:59.091534 influxdb2client D! Writing batch: counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=5i 1639424398079554
2021/12/13 12:39:59.091552 influxdb2client I! HTTP POST req to http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us
2021/12/13 12:39:59.328624 influxdb2client I! sending batch
2021/12/13 12:39:59.328643 influxdb2client D! Write proc: received write request
2021/12/13 12:39:59.328648 influxdb2client D! Writing batch: counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=6i 1639424399080907
2021/12/13 12:39:59.328665 influxdb2client I! HTTP POST req to http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us
pre write 7 ... past write 7
2021/12/13 12:40:00.329102 influxdb2client I! sending batch
2021/12/13 12:40:00.329126 influxdb2client D! Write proc: received write request
2021/12/13 12:40:00.329141 influxdb2client D! Writing batch: counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=7i 1639424400080708
2021/12/13 12:40:00.329173 influxdb2client I! HTTP POST req to http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us
pre write 8 ... past write 8
2021/12/13 12:40:01.328216 influxdb2client I! sending batch
2021/12/13 12:40:01.328240 influxdb2client D! Write proc: received write request
2021/12/13 12:40:01.328248 influxdb2client D! Writing batch: counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=8i 1639424401080881
2021/12/13 12:40:01.328264 influxdb2client I! HTTP POST req to http://tirzah.pab:8086/api/v2/write?bucket=sandbox&org=green-tirzah&precision=us

@vlastahajek
Contributor

@pabigot, thanks for using this client.
As you noticed, you are facing the built-in retry algorithm, which is standard for InfluxDB 2 clients. Default options are chosen for InfluxDB Cloud.

You can customize its parameters to get shorter wait times, which are better suited to an OSS server. For example:

client := influxdb2.NewClientWithOptions("http://localhost:8888", "token", influxdb2.DefaultOptions().SetRetryInterval(1000)) // retry interval in ms

@vlastahajek
Contributor

@pabigot, the v2.9.2 release added flushing of the retry queue when calling WriteAPI.Flush. This should resolve your issue.

@pabigot
Contributor Author

pabigot commented Jul 29, 2022

Thanks; I ultimately gave up on WriteAPI since there were multiple places where failure situations would cause loss of data without notifying the application. I have a robust alternative solution with WriteAPIBlocking. I'll close this assuming this problem has been fixed.

@vlastahajek
Contributor

Data loss occurs only when a batch is removed from the retry queue, after reaching either the max retry time or the max retry attempts.
I plan to add a callback that is invoked when a batch is about to be removed.

From your use case, I understand you would also like to force a retry attempt, right?

@pabigot
Contributor Author

pabigot commented Aug 25, 2022

@vlastahajek yes, that may be the only place that loses data without informing the application.

I do want retry, but under complete application control. This branch has my approach, which needs documentation review before I might publish it.

This provides data structures for connections, and buckets that are accessed through connections. The connections signal whether they're ready or not, allowing the application to determine whether to submit data to a bucket or retain it locally.

Buckets include information about the measurement structures, and goroutines to handle communication with the server in the background. The application is notified of the success or failure of every server communication. All failed writes will be offered to the user to determine whether to resubmit, drop, or cache locally, and on shutdown all unsent and in-flight data is reclaimed and presented to the application for disposition. Unlike the upstream WriteAPI this implementation guarantees that no data will be dropped without giving the application an opportunity to record it for future resubmission.
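A minimal sketch of the "application decides" idea described above, not the code from the branch: every failed write is offered to a callback that chooses whether to resubmit, drop, or cache it, and shutdown hands back everything unsent. All names here (`Bucket`, `Disposition`, `Drain`, `Close`, `flakySender`) are hypothetical, and the sketch is synchronous where the description mentions background goroutines.

```go
package main

import (
	"errors"
	"fmt"
)

// Disposition is the application's decision about a failed batch.
type Disposition int

const (
	Resubmit     Disposition = iota // keep the batch for the next Drain
	Drop                            // discard the batch deliberately
	CacheLocally                    // set the batch aside for later resubmission
)

// Sender writes one Line Protocol batch to the server.
type Sender func(batch string) error

// OnFailure is called for every failed write, so nothing is lost silently.
type OnFailure func(batch string, err error) Disposition

// Bucket holds batches under application control.
type Bucket struct {
	send      Sender
	onFailure OnFailure
	pending   []string
	cached    []string
}

// Submit queues a batch for the next Drain.
func (b *Bucket) Submit(batch string) { b.pending = append(b.pending, batch) }

// Drain tries each pending batch once and returns how many were sent.
func (b *Bucket) Drain() int {
	queue := b.pending
	b.pending = nil
	sent := 0
	for _, batch := range queue {
		err := b.send(batch)
		if err == nil {
			sent++
			continue
		}
		switch b.onFailure(batch, err) {
		case Resubmit:
			b.pending = append(b.pending, batch)
		case CacheLocally:
			b.cached = append(b.cached, batch)
		case Drop:
			// The application chose to discard this batch.
		}
	}
	return sent
}

// Close reclaims every unsent batch and returns it for disposition.
func (b *Bucket) Close() []string {
	unsent := append(append([]string{}, b.pending...), b.cached...)
	b.pending, b.cached = nil, nil
	return unsent
}

// flakySender simulates a server that is reachable only while *up is true.
func flakySender(up *bool) Sender {
	return func(batch string) error {
		if !*up {
			return errors.New("connection refused")
		}
		return nil
	}
}

func main() {
	up := false
	b := &Bucket{
		send: flakySender(&up),
		onFailure: func(batch string, err error) Disposition {
			fmt.Printf("write failed (%v): %q\n", err, batch)
			return Resubmit
		},
	}
	b.Submit("counter counter=4i\n")
	b.Submit("counter counter=5i\n")
	fmt.Println("sent while down:", b.Drain())
	up = true
	fmt.Println("sent after recovery:", b.Drain())
	fmt.Println("unsent at shutdown:", len(b.Close()))
}
```

The point of the design is that the only paths out of the bucket are a successful send, an explicit `Drop` from the application, or the slice returned by `Close`.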

All operations traffic in newline-terminated Line Protocol-encoded batches, simplifying local storage of data when the server is unavailable.
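One reason newline-terminated batches simplify local storage is that concatenating them yields another valid batch, and splitting on newlines recovers the individual records. A small sketch of that property, with hypothetical helper names (`newStore`, `appendBatch`, `splitRecords`) and an in-memory buffer standing in for a file:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// newStore returns an empty in-memory store; a file would work the same way.
func newStore() *bytes.Buffer { return &bytes.Buffer{} }

// appendBatch adds a Line Protocol batch to the store, making sure it ends
// with a newline so the next batch starts on its own line.
func appendBatch(store *bytes.Buffer, batch string) {
	if batch == "" {
		return
	}
	store.WriteString(batch)
	if !strings.HasSuffix(batch, "\n") {
		store.WriteByte('\n')
	}
}

// splitRecords returns the non-empty lines of the stored data, one per record.
func splitRecords(data []byte) []string {
	var out []string
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		if line := sc.Text(); line != "" {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	store := newStore()
	// Records copied from the log output earlier in this issue.
	appendBatch(store, "counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=4i 1639424397078939\n")
	appendBatch(store, "counter,id=HACK/influx.go:2021-12-13T12:39:54 counter=5i 1639424398079554\n")
	// The whole store can be resubmitted as one batch, or record by record.
	for _, rec := range splitRecords(store.Bytes()) {
		fmt.Println(rec)
	}
}
```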

I'm unlikely to go back to WriteAPI at this point, so feel free to close this.
