Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Skip writeback for chunks fetched by queriers older than a duration #15393

Closed
wants to merge 104 commits into from

Conversation

mveitas
Copy link
Contributor

@mveitas mveitas commented Dec 12, 2024

What this PR does / why we need it:
This change will prevent any chunks fetched by the querier to not be written back to the chunk cache that are before a given time range.

95% of the searches that our engineering teams execute are within a 7 day range and we have a cache enabled to support 14 days worth of chunks to aid in query performance. We recently found that we have compliance need to maintain searchable logs for 365 days. While these queries outside of the 7 day window are rare, they tend to be longer duration searches of up to 30 days for compliance, contain a lot of data, and tend to be one-off searches.

Currently these chunks fetched by queriers outside our target window are written back to the chunk cache and causing more recent data to be evicted from the cache and impacting the query performance for the most recent data.

Which issue(s) this PR fixes:
Fixes #14983

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@mveitas mveitas requested a review from a team as a code owner December 12, 2024 14:04
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Dec 12, 2024
@@ -138,6 +140,10 @@ func (c *Fetcher) FetchChunks(ctx context.Context, chunks []chunk.Chunk) ([]chun
l2OnlyChunks := make([]chunk.Chunk, 0, len(chunks))

for _, m := range chunks {
if c.skipQueryWritebackCacheOlderThan > 0 && m.From.Time().Before(time.Now().UTC().Add(-c.skipQueryWritebackCacheOlderThan)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not look like the right place for this check. It is preventing the fetcher from fetching chunks older than skipQueryWritebackCacheOlderThan.

WriteBackCache() method might be a better place to add this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching that and I do have the check in the WriteBackCache.

What I do want to add is the ability to short circuit a cache lookup if it before this duration value. There might already be a setting for that and need to take a deeper look

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashwanthgoli This is going to fit our needs for the time being and would like to get this merged.

If you feel there should be some work to be done to short-circuit the read path for these requests to the cache, we can look to add that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I do want to add is the ability to short circuit a cache lookup if it before this duration value.

Sorry, I think you go it right the first time. processCacheResponse should be able to figure out the missing keys and request them from store using the chunks request

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest looks good! Happy to approve once you revert the latest commit

@@ -230,6 +231,10 @@ func (c *Fetcher) WriteBackCache(ctx context.Context, chunks []chunk.Chunk) erro
keysL2 := make([]string, 0, len(chunks))
bufsL2 := make([][]byte, 0, len(chunks))
for i := range chunks {
if c.skipQueryWritebackCacheOlderThan > 0 && chunks[i].From.Time().Before(time.Now().UTC().Add(-c.skipQueryWritebackCacheOlderThan)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return early if the resulting keys or keysL2 is empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would we return early? There is already a check if keysL2 is greater than 0. I can add that to the main cache, but an empty slice passed to the store is a no-op

@@ -56,6 +56,11 @@ func TestPipeline_JSON(t *testing.T) {
entry string
expectedExtract map[string]interface{}
}{
"successfully run a pipeline with 1 logfmt stage with log not using json formatted string": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks unrelated. can you raise a separate pr for this if needed?

@@ -44,6 +44,11 @@ func TestPipeline_Logfmt(t *testing.T) {
entry string
expectedExtract map[string]interface{}
}{
"successfully run a pipeline with 1 logfmt stage with log not using logfmt formatted string": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, looks unrelated to this pr.

@@ -38,7 +39,7 @@ func (cfg *ChunkStoreConfig) RegisterFlags(f *flag.FlagSet) {
f.DurationVar(&cfg.L2ChunkCacheHandoff, "store.chunks-cache-l2.handoff", 0, "Chunks will be handed off to the L2 cache after this duration. 0 to disable L2 cache.")
f.BoolVar(&cfg.chunkCacheStubs, "store.chunks-cache.cache-stubs", false, "If true, don't write the full chunk to cache, just a stub entry.")
cfg.WriteDedupeCacheConfig.RegisterFlagsWithPrefix("store.index-cache-write.", "", f)

f.DurationVar(&cfg.SkipQueryWritebackOlderThan, "store.skip-query-writeback-older-than", 0, "Chunks fetched from queriers before this duration will not be written to the cache. A value of 0 will write all chunks to the cache")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wdyt about SkipCachingChunksOlderThan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for including Query in the name was to distinguish that these were the chunks being written by the querier to the cache. This configuration only applies to the query operation.

If I needed to ingest historical data from 3 months ago into Loki, using SkipCachingChunksOlderThan might imply that the ingested data is not written to the cache

mveitas and others added 19 commits December 18, 2024 14:07
…rafana#15378)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…hunks-inspect (grafana#15377)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…ion filter prefix (grafana#15381)

Exposes the filter_prefix in the lambda_function of the aws_s3_bucket_notification resource as a terraform variable.
…/v2 to v2.2.0 (grafana#15392)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…afana#15396)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…a#15400)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…ken (grafana#13195)

This adds an extra nil check to verify the Azure blob storage client has received a service principal token before trying to do anything with said token. This has led to crashes if `use_federated_token is enabled` and an error message is returned instead of a token.
ashwanthgoli and others added 22 commits January 6, 2025 07:01
…ana#15556)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…rafana#15540)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…ana#15529)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…na#15539)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…5564)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…rafana#15573)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…rafana#15569)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…fana#15581)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@mveitas
Copy link
Contributor Author

mveitas commented Jan 6, 2025

Closing this PR and will submit a new one as I messed up branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm sig/operator size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't write to chunk cache if query results are outside of a desired window of time