Skip to content

Add discoveryServiceMapEnabled field to AgentConfiguration proto#471

Merged
BarFinsdd merged 6 commits into
masterfrom
bar.fins/discovery-dep-map-poc
May 12, 2026
Merged

Add discoveryServiceMapEnabled field to AgentConfiguration proto#471
BarFinsdd merged 6 commits into
masterfrom
bar.fins/discovery-dep-map-poc

Conversation

@BarFinsdd
Copy link
Copy Markdown
Contributor

What does this PR do?

Adds a new discoveryServiceMapEnabled (field 7) boolean to the AgentConfiguration proto message in process/connections.proto. This allows the agent to signal whether the Discovery service map feature is enabled in its
configuration.

Motivation

Enables the agent to communicate whether the Discovery dependency map (service map) feature is turned on, so downstream consumers (e.g. the backend/process-agent) can act on that configuration flag. This is the proto
groundwork for the Discovery service map POC.

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Reviewer's Checklist

Reviewers: please see the review guidelines.

@BarFinsdd BarFinsdd self-assigned this Apr 14, 2026
@BarFinsdd BarFinsdd requested review from a team as code owners April 14, 2026 14:23
@BarFinsdd BarFinsdd requested a review from matthewleese April 14, 2026 14:23
@BarFinsdd BarFinsdd force-pushed the bar.fins/discovery-dep-map-poc branch 2 times, most recently from 0233288 to c9b279f Compare April 21, 2026 10:14
Copy link
Copy Markdown
Member

@brycekahle brycekahle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems fine, but I'm curious if you need a flag at all? Is there going to be additional data in the payload? Would the existence of that data indicate the feature is enabled?

@BarFinsdd BarFinsdd requested review from a team as code owners April 28, 2026 13:48
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76133a4c54

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread process/agent.pb.go Outdated
// Populated by discovery service map mode in lieu of DDSketches and
// firstLatencySample, since discovery mode does not retain a sketch
// or per-request samples.
AvgLatency float64 `protobuf:"fixed64,5,opt,name=avgLatency,proto3" json:"avgLatency,omitempty"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Regenerate descriptor metadata for new avgLatency field

Adding AvgLatency here without updating the embedded fileDescriptor_69b34851fbf62631 leaves HTTPStats_Data.Descriptor() out of sync with the struct and marshal/unmarshal logic. Any reflection-based consumer (e.g., descriptor-driven tooling, dynamic protobuf handling, or schema introspection) will not see avgLatency and can silently ignore/drop it even though generated encode/decode methods now use field 5; this commit should include a full proto regeneration so descriptor metadata matches the new field.

Useful? React with 👍 / 👎.

@DanielLavie
Copy link
Copy Markdown
Contributor

It seems fine, but I'm curious if you need a flag at all? Is there going to be additional data in the payload? Would the existence of that data indicate the feature is enabled?

@brycekahle This flag means we won’t bill for USM and will process the data through the new discovery service map pipeline

@BarFinsdd
Copy link
Copy Markdown
Contributor Author

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 12, 2026

View all feedbacks in Devflow UI.

2026-05-12 09:09:59 UTC ℹ️ Start processing command /merge


2026-05-12 09:10:06 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-05-12 10:10:28 UTC ⚠️ MergeQueue: This merge request was unqueued

bar.fins@datadoghq.com unqueued this merge request

@BarFinsdd
Copy link
Copy Markdown
Contributor Author

/merge -c

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 12, 2026

View all feedbacks in Devflow UI.

2026-05-12 10:10:22 UTC ℹ️ Start processing command /merge -c

BarFinsdd and others added 5 commits May 12, 2026 13:14
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Discovery service map mode does not retain DDSketches or per-request
samples; it tracks only counts and a running latency sum. Add a
dedicated avgLatency field (nanoseconds) so downstream consumers can
distinguish discovery-mode averages from the legacy firstLatencySample
single-sample optimization.

Field 5 on HTTPStats.Data, double, fixed64. Generated code edited by
hand (proto codegen toolchain isn't available on Windows).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovery service map mode previously shipped a precomputed mean
(LatencySum/Count) on the wire. That prevents the backend from
correctly aggregating across hosts (an average of averages requires
weighting by sample count, and we lose that information once the
mean is computed client-side per flush window).

Switch the proto field to carry the raw running latency sum. The
backend (NSX) computes the mean as latencySum/count when emitting
dd.apm.usm_service_map.latency_ns, matching the resolved design in
the USM Discovery Mode RFC, Appendix B.

Field number unchanged (5) so wire encoding is identical; only the
field name and the agent's interpretation of it change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@BarFinsdd BarFinsdd force-pushed the bar.fins/discovery-dep-map-poc branch from 8f6dc28 to 595b669 Compare May 12, 2026 10:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 595b669d7d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread process/agent.pb.go
// and firstLatencySample. The mean is computed at the backend as
// latencySum / count; shipping the raw sum (instead of a precomputed
// mean) preserves correct cross-host aggregation.
LatencySum float64 `protobuf:"fixed64,5,opt,name=latencySum,proto3" json:"latencySum,omitempty"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Regenerate the descriptor with the latencySum name

Fresh evidence after the rename: decompressing fileDescriptor_69b34851fbf62631 in this file still shows field 5 as avgLatency and contains no latencySum, while the generated struct/marshal/unmarshal code now uses LatencySum with protobuf name latencySum. Reflection-based consumers using HTTPStats_Data.Descriptor() will therefore see the wrong schema name for field 5 and can ignore or mis-handle the new metric; please regenerate the protobuf output so the embedded descriptor matches the proto and generated methods.

Useful? React with 👍 / 👎.

@BarFinsdd BarFinsdd requested a review from a team as a code owner May 12, 2026 11:04
@BarFinsdd BarFinsdd requested a review from nathan-b May 12, 2026 11:04
@BarFinsdd BarFinsdd force-pushed the bar.fins/discovery-dep-map-poc branch from 1f6234f to 595b669 Compare May 12, 2026 11:06
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1f6234f814

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread tasks/codegen.py Outdated
Comment on lines +116 to +117
rm -rf $GOPATH/src/github.com/DataDog/agent-payload
ln -sf {root_dir} $GOPATH/src/github.com/DataDog/agent-payload
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid deleting the checked-out repository during codegen

In the codegen workflow I checked, the job runs from /home/runner/work/agent-payload/agent-payload/go/src/github.com/DataDog/agent-payload with GOPATH=/home/runner/work/agent-payload/agent-payload/go (.github/workflows/test.yml), so $GOPATH/src/github.com/DataDog/agent-payload is the current checkout and also {root_dir}. This new rm -rf therefore deletes the repository before recreating a symlink to the same now-deleted path, causing the subsequent mkdir, protoc, and final git diff steps to operate on a missing/self-referential checkout; it also breaks the documented local setup where the repo is already checked out inside GOPATH.

Useful? React with 👍 / 👎.

…dep-map-poc

# Conflicts:
#	process/agent.pb.go
#	process/connections.pb.go
@BarFinsdd BarFinsdd force-pushed the bar.fins/discovery-dep-map-poc branch from 624fb85 to e634912 Compare May 12, 2026 11:10
@BarFinsdd
Copy link
Copy Markdown
Contributor Author

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 12, 2026

View all feedbacks in Devflow UI.

2026-05-12 11:13:23 UTC ℹ️ Start processing command /merge


2026-05-12 11:13:27 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 3m (p90).


2026-05-12 11:15:26 UTC 🚨 MergeQueue: This merge request is in error because of DDCI

DDCI pipeline didn't start (sourcing_failed)... Please retry.

DDCI Change Request: 8823896131387910928
You can get help from #ci-infra-support

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants