-
Notifications
You must be signed in to change notification settings - Fork 3.7k
feat: update loki canary to support batched log push #17558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit introduces a new `EntryWriter` implementation in the Loki canary codebase, supporting batched log pushes. A new runtime CLI opt, `-log-batch-size` receives an int, specifying the # of log lines to batch before they are sent to Loki. This opt may be set to '0' or '1' to disable batching entirely, in which case the canary functionality remains as-is. If an int value > 1 is passed to this CLI opt, a new `BatchedPush` instance is created. This instance collects log lines from its channel and stores them in local array; once the array size is at least the value of `-log-batch-size`, the logs are then marshaled and pushed to Loki using the standard push API. This implementation also maintains a 30s countdown timer; every 30s, the timer triggers and whatever logs exist in the "batch" array are pushed to Loki. This prevents a slowdown in log ingestion into the canary instance from preventing logs from backing up for too long. Moreover, if/when the canary org is shutdown (i.e., the termination channel receives a signal to quit), the remaining logs are likewise pushed to Loki and the shutdown process continues.
Rather than have a lockable struct with the log lines array, just maintain a simple callback mechanism which implements locking. This means every time a channel in the `BatchedPush.run` function receives a response, run a function which acquires the lock, checks the array size (if applicable), flushes all batched log lines to Loki, and releases the lock. Either of these implementations should work, but this is simpler.
Updates the push_test.go module with additional tests for batching logs from the loki-canary instance.
Updates the tests in `canary/writer/push_test.go` to include batching a logs and terminating the batched writer. This tests our ability to stop a writer, but ensure the batched logs are pushed before the writer is completely stopped.
Adds a CLI arg w/a default value to limit the upper bound of the loki-canary log pusher batch size. Canary pods don't have a lot of memory, so this implements a limit of 20 log lines in a given batch, the goal being to restrict the memory to 20MB of logs in memory at any given point in time. This limit is only imposed in the canary's `main.go` module when initializing the canary. We don't do any additional checks in the `batched_push.go` implementation to restrict the batch size to an upper-bound. This also reorganizes the push tests a little.
Fixes the existing terminate test so it accurately tests shutting down the BufferedPush writer. This also adds a separate test for ensuring the timeout countdown effectively sends all logs in the batched array. Updates the `main.go` module which checks the logBatchSize and Max so we allow the batch size to be equal to the max size. Also adds a few very minor changes to the `batched_push.go` module: a few comment changes, moving some things around, removing an unnecessary log line, and updating the `Stop()` function.
This was causing a race condition in the test which checks for larger batching. The WaitGroup was potentially incremented apart from the other goroutines, causing a potential inconsistency with how this test waits for routines to wrap up.
…com:grafana/loki into agoss/enhance-loki-canary-ingestion-process
paul1r
approved these changes
May 20, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
This commit introduces a new
EntryWriter
implementation in the Loki canary codebase, supporting batched log pushes. A new runtime CLI opt,-logs-batch-size
receives an int, specifying the # of log lines to batch before they are sent to Loki.This opt may be set to '0' or '1' to disable batching entirely, in which case the canary functionality remains as-is. If an int value > 1 is passed to this CLI opt, a new
BatchedPush
instance is created. This instance collects log lines from its channel and stores them in local array; once the array size is at least the value of-logs-batch-size
, the logs are then marshaled and pushed to Loki using the standard push API.A second CLI opt is added,
-logs-batch-size-max
. This arg restricts the-logs-batch-size
opt passed by the user so the # of batched log lines don't blow up the memory of the loki-canary pod. The default value is 20, ensuring the array stays w/in ~20MBs (1MB log payloads). If 20MB is too restrictive, the-logs-batch-size-max
can be used to exceed this limit.This implementation also maintains a 30s countdown timer; every 30s, the timer triggers and whatever logs exist in the "batch" array are pushed to Loki. This ensures batched logs are pushed to Loki even if the canary fails to receive more logs.
Moreover, if/when the canary org is shutdown (i.e., the termination channel receives a signal to quit), the remaining logs are likewise pushed to Loki and the shutdown process will continue.
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR