[USM] Fix Istio issue #43651

amitslavin · 2025-11-30T10:56:22Z

What does this PR do?

Fix issue for customer - wip

Motivation

Describe how you validated your changes

Additional Notes

…nch (#40610)

Backport 6f75013 from #40576. ___ ### What does this PR do? Adds a `conf.yaml.example` for the Versa integration ### Motivation Prepare Versa integration for public use ### Describe how you validated your changes Ran through the versa.go to ensure that the expected configuration options are exposed. ### Additional Notes Co-authored-by: Ken Schneider <[email protected]> Co-authored-by: Dustin Long <[email protected]>

…40658) This is a manual backport of #40639 to 7.71.x due to merge conflicts. ### What does this PR do? This PR fixes an issue where datadog-traceroute was potentially incorrectly handling file descriptors on linux. More details on the datadog-traceroute PR: [\[linux\] Fix linuxSink file descriptor handling](DataDog/datadog-traceroute#24) ### Motivation Errors from the current 7.71 RC ### Describe how you validated your changes This was triggered by the finalizer of `*os.File` -- the datadog traceroute library now has zero manual calls to unix.Close() on linux which was the source of this issue. It purely uses os.File's handling to close files

### Motivation This fixes issues with S3 cache by fetching the target branch when doing `git merge-base` operations. ### Describe how you validated your changes ### Additional Notes #incident-42773

Backport ff4c8bb from #40623. ___ This PR fixes a bug found during QA where leftover config/package experiments could be picked up during unrelated package/config experiments. This should ensure a config experiment always uses the stable package and a package experiment always uses the stable config. Co-authored-by: Arthur Bellal <[email protected]>

Backport 6936e9f from #40607. ___ These tests were poorly conceived. I wanted to make sure that the tests still ran with a low depth limit, but I wasn't validating much. The way to get the test to avoid having expectations was to tell it it was in rewrite mode. The downside of being in rewrite mode is we do not know how many events to expect, so we just wait a while. This turns out to be problematic and can cause flakes in production. We do exercise a bunch of depth limits explicitly. For now, just remove this bad subtest. Co-authored-by: ajwerner <[email protected]> Co-authored-by: piob-io <[email protected]>

Backport a8014fd from #40680.

___

### What does this PR do?

Following #40345, the scrubber was scrubbing debugging information. This PR aims to solve this issue.

### Motivation

Avoid decreasing support efficiency by not being able to read the configuration for the secret feature:

```
root@datadog-qn879:/tmp# ./agent config | grep secret_
secret_name: "********"
secret_audit_file_max_size: "********"
secret_backend_arguments: "********"
secret_backend_command: "********"
secret_backend_command_allow_group_exec_perm: "********"
secret_backend_config: "********"
secret_backend_output_max_size: "********"
secret_backend_remove_trailing_line_break: "********"
secret_backend_skip_checks: "********"
secret_backend_timeout: "********"
secret_backend_type: "********"
secret_image_to_secret: "********"
secret_kubernetes: "********"
secret_refresh_interval: "********"
secret_refresh_scatter: "********"
```

### Describe how you validated your changes

CI

### Additional Notes

Co-authored-by: louis-cqrl <[email protected]>

…ronously (#40687)

### What does this PR do?

This fixes a missing nil check by pulling in the datadog-traceroute PR [\[filters\] Check for nil in SetBPFAndDrain](DataDog/datadog-traceroute#28). Normally this almost always has an error, but occasionally `MSG_DONTWAIT` can finish synchronously.

### Motivation

Rarely we get this error running netpath at scale in staging:

```
SYS-PROBE | ERROR | (cmd/system-probe/modules/traceroute.go:72 in func1) | unable to run traceroute for host: 10.128.11.211: UDP traceroute failed to set packet filter: SetPacketFilter failed to apply BPF filter: SetBPFAndDrain failed to drain: %!w(<nil>)
```

Apparently this syscall can occasionally finish synchronously which results in a nil error. I think it happens on hosts with barely any network traffic.

### Describe how you validated your changes

In datadog-traceroute, I ran the packet filtering suite 100 times:

```
sudo env PATH="$PATH" go test -v -tags linux,test,root github.com/DataDog/datadog-traceroute/packets -count 100
```

(I had to make a few extra changes to get the root privileged test suite compiling: [\[packets\] Use os.File to close all fds](DataDog/datadog-traceroute#25))

---------

Co-authored-by: sabrina lu <[email protected]>

Backport c86042b from #40650. ___ ### What does this PR do? Adds the Versa integration to the build tasks as a core check so the `conf.yaml.example` is included in the build. ### Motivation ### Describe how you validated your changes ### Additional Notes Co-authored-by: Ken Schneider <[email protected]>

…gths (#40775) Backport 4b43b06 from #40765. ___ This fixes a bug that occurs when the compressed length is longer than the uncompressed one. Add a CLI for running irgen. Co-authored-by: Piotr Bejda <[email protected]>

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

…es (#40788) Backport d4093ba from #40779. ___ #### dyninst/rcscrape: mark if a data item had a failed read We've seen data items missing related to a runtime ID. Perhaps we failed to read those memory addresses. We'd like to know about that. Note that this same treatment should get applied to regular decoding (perhaps in a more principled way). That's left for a different change that won't be backported. #### dyninst/rcscrape: gracefully handle decoding failures If we can't decode a message, we don't want to shut down the entire dyninst subsystem. The fact that that's what happens when decoding fails is not great, but it's not for this change we intend to backport. There's an upcoming refactor to address that. Fixes [DEBUG-4455](https://datadoghq.atlassian.net/browse/DEBUG-4455). [DEBUG-4455]: https://datadoghq.atlassian.net/browse/DEBUG-4455?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Co-authored-by: ajwerner <[email protected]>

…tion/successful_async_initialization` (#40799) Backport e0d98b0 from #40642.

___

### What does this PR do?

#### The Test Bug

`TestAsyncInitialization/successful_async_initialization` was flaking due to a race condition that led to a deadlock. This PR fixes the test to un-flake it. This is not a bug in the ImageResolver feature, but a bug specifically in the way the test is configured.

Prior to this change, the following race condition would cause a deadlock:

1. `mockClient := newMockRCClient(...)` -> `blockGetConfigs`: `false` and `configsReady`: `make(chan struct{})`
2. `mockClient.setBlocking(true)` -> mutex lock **HOLD**, `blockGetConfigs`: `true`, mutex lock **RELEASE**
3. `resolver := newRemoteConfigImageResolverWithRetryConfig(...)`
   - `go func() waitForInitialConfig()` -> `rcClient.GetConfigs(...)` -> mutex lock **HOLD** until `configsReady` closed
4. `mockClient.setBlocking(false)` -> mutex lock **HOLD** ... ❌ cannot close `configsReady` because the mutex is held by the goroutine.

If step 3 (the goroutine) gets the mutex lock first, a deadlock results. If step 4 (the test) gets the mutex lock first, it successfully closes the `configsReady` channel and releases the mutex lock; the goroutine then takes the mutex lock, does its work, and releases it.

#### The Fix

The fix is to explicitly release the mutex in `GetConfigs(...)` after first storing the state of `m.blockGetConfigs` and `m.configsReady` in local variables. `GetConfigs(...)` then blocks on those local variables. This way, regardless of which step obtains the mutex first, it is released properly.

### Motivation

Un-flake the `TestAsyncInitialization/successful_async_initialization` test for the ImageResolver to avoid blocking CI. See an example of the failure [here](https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1113117038).

### Describe how you validated your changes

Reproduced the CI issue locally with artificial delays. The above fix prevents this race condition consistently.

### Additional Notes

Co-authored-by: erikayasuda <[email protected]>
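The locking pattern described in the fix can be sketched as follows. This is an illustrative reconstruction, not the test code from the repository: the field names `blockGetConfigs` and `configsReady` come from the description, while the method bodies, the `[]string` return type, and the `runScenario` helper are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// mockRCClient is a simplified stand-in for the mock used in the test.
type mockRCClient struct {
	mu              sync.Mutex
	blockGetConfigs bool
	configsReady    chan struct{}
}

func newMockRCClient() *mockRCClient {
	return &mockRCClient{configsReady: make(chan struct{})}
}

// setBlocking toggles blocking; unblocking closes configsReady under the mutex.
func (m *mockRCClient) setBlocking(block bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.blockGetConfigs && !block {
		close(m.configsReady)
	}
	m.blockGetConfigs = block
}

// GetConfigs copies the shared state into locals, releases the mutex, and only
// then waits. Waiting while still holding the mutex is what caused the deadlock.
func (m *mockRCClient) GetConfigs() []string {
	m.mu.Lock()
	block, ready := m.blockGetConfigs, m.configsReady
	m.mu.Unlock()
	if block {
		<-ready
	}
	return []string{"config"}
}

// runScenario replays steps 1-4 and reports whether GetConfigs returned in time.
func runScenario() bool {
	m := newMockRCClient()
	m.setBlocking(true)
	done := make(chan struct{})
	go func() {
		m.GetConfigs()
		close(done)
	}()
	m.setBlocking(false)
	select {
	case <-done:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("completed without deadlock:", runScenario())
}
```

Because the channel is captured before the mutex is released, the goroutine still observes the close no matter which side acquires the mutex first.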

### Motivation Backports #40685 and #40798, which fix some edge cases for incident-42773. ### Describe how you validated your changes ### Additional Notes

…eResolver.Resolve()` (#40785) Backport 730bb5d from #40697.

___

### What does this PR do?

Adds `image_resolution_attempts` telemetry to the newly added `ImageResolver.Resolve()` function to track image resolution attempts, and whether they resulted in a successful image digest resolution, or if they defaulted to using the mutable tag.

### Motivation

This telemetry is necessary for the new K8s SSI gradual rollout feature in order to determine whether a rollout is working successfully or not. The goal is for this to be easier to display on a dashboard:

| `registry` | `repository` | `digest_resolution` | `outcome` | Notes |
|----------|----------|----------|----------|----------|
| `gcr.io/datadoghq` | `dd-lib-python-init` | `enabled` | `sha256:abc123` | ✅ Was supposed to resolve, did resolve |
| `hub.docker.com/r/datadog` | `dd-lib-java-init` | `enabled` | `v2` | ❌ Was supposed to resolve, but it did NOT resolve |
| `mycustomregistry.org` | `dd-lib-php-init` | `enabled` | `v2` | ✅ Cannot resolve for custom registry, did NOT resolve |
| `gallery.ecr.aws/datadog` | `dd-lib-rb-init` | `disabled` | `v1` | ✅ Was NOT supposed to resolve, did NOT resolve |
| `gcr.io/datadoghq` | `dd-lib-dotnet-init` | `disabled` | `sha256:abc123` | ❌ Was NOT supposed to resolve, did resolve *this shouldn't be possible |

### Describe how you validated your changes

Ran a local app via `injector-dev` to verify that the telemetry counts for `apm-inject` and `dd-lib-python-init` were accurate.

### Additional Notes

Co-authored-by: erikayasuda <[email protected]>
Co-authored-by: sabrina lu <[email protected]>

Backport d3ce06c from #40789. ___ ### What does this PR do? This commit fixes an issue where we only resolve image tags during startup and not on every pod mutation by ensuring the resolution happens just before use. ### Motivation We've added gradual rollout support for Single Step Instrumentation so that language libraries are released in a gradual fashion in #39915. This was missed during code review and caught during testing. ### Describe how you validated your changes I tested this using [injector-dev](https://github.com/DataDog/injector-dev): ``` injector-dev apply -f dev.yaml --profile staging --build ``` <details> <summary>dev.yaml</summary> ```yaml helm: apps: - name: gradual-rollout-test namespace: application values: env: - name: DD_TRACE_DEBUG value: "true" - name: DD_APM_INSTRUMENTATION_DEBUG value: "true" image: repository: registry.ddbuild.io/ci/injector-dev/python tag: 2cd78ded podLabels: language: python tags.datadoghq.com/env: local service: port: "8080" versions: agent: 7.69.1 cluster_agent: version: 7.69.1 build: {} injector: version: 0.44.0 config: clusterAgent: env: - name: DD_REMOTE_CONFIGURATION_ENABLED value: "true" datadog: site: "datad0g.com" apm: instrumentation: enabled: true targets: - name: python podSelector: matchLabels: language: python ddTraceVersions: python: default ``` </details> ### Additional Notes We will need this backported to `7.71.x` Co-authored-by: Mark Spicer <[email protected]> Co-authored-by: adel121 <[email protected]>

…history from status and allow directional-only fallback (#40823) Backport f65e87a from #40542 ___ ### What does this PR do? * Load recommendation history from new `LastRecommendations` field in CR * Allow local fallback to only be applied if it is for upscale / downscale / for both ### Motivation Loading recommendation history allows us to more accurately determine the stabilized recommendation Changes to fallback allow us to shorten the time to enable fallback by providing users the flexibility to set how they'd like fallback to be activated ### Describe how you validated your changes 1. Deploy these changes; set up DPA CR to set fallback direction 2. Verify that fallback is only enabled when the scaling direction matches the enabled direction 3. Restart the cluster agent - verify that DPAs in store is populated with data from the CR for recommendation history ### Additional Notes

… checks (#40767) Backport f10a8b6 from #40532. ___ ### What does this PR do? Stop using GET /probe endpoints to perform connectivity checks. Replaced by POST with empty payloads or removed. ### Motivation Support question from a concerned customer In the connectivity check, we use GET /probe endpoints. These endpoints are exposed by synthetics which is available depending on customer orgs setup and causes 403 when the endpoint is not available. ### Describe how you validated your changes Manual QA ### Additional Notes Co-authored-by: san-jos <[email protected]>

…40832) Backport 4dc700f from #40827. ___ Before this change, we wouldn't send fresh diagnostics for updated probes, making it seem like we haven't installed them. Fixes https://datadoghq.atlassian.net/browse/DEBUG-4467 Co-authored-by: ajwerner <[email protected]>

…tchers for api/app keys & common HTTP auth headers (#40837) Backport 2643133 from #40774.

___

### What does this PR do?

- **Normalizes YAML keys to lowercase before matching** so scrubbing is case-insensitive across config variants.
  - In `ScrubDataObj`, keys are lowercased for the `YAMLKeyRegex` check.
- **Expands key match coverage**:
  - `api_key` → `api[-_]?key`
  - `ap(?:p|plication)_?key` → `ap(?:p|plication)[-_]?key`
- Adds support for **HTTP header-style fields** via `matchYAMLKeyPrefixSuffix("x-","key|token|auth", …)` and explicit lists:
  - `x-api-key`, `x-rapidapi-key`, `x-functions-key`, `x-octopus-apikey`, `x-dreamfactory-api-key`, `x-lz-api-key`, `x-pm-partner-key`, `x-sungard-idp-api-key`, `x-vtex-api-appkey`
  - `x-auth-token`, `x-rundeck-auth-token`
  - `x-auth`, `x-stratum-auth`
- Adds **exact key matches** for common auth fields:
  - `auth-tenantid`, `authority`, `cainzapp-api-key`, `cms-svc-api-key`, `lodauth`, `sec-websocket-key`, `statuskey`
- **Regex robustness**:
  - Allow hyphens in YAML key detectors (`(\w|_|-)`) in `matchYAMLKeyPart` and `matchYAMLKeyEnding`.
  - New helper: `matchYAMLKeyPrefixSuffix`.
- **Version metadata**:
  - Sets `LastUpdated` to `7.70.2` for updated replacers.
- **Tests**:
  - `pkg/util/scrubber/default_test.go`: new `TestNewHTTPHeaderAndExactKeys`.
  - `pkg/util/scrubber/yaml_scrubber_test.go`: comprehensive table-driven cases for case/format variants.

### Motivation

Configurations and headers vary widely in **case** (`APIKEY`, `Api-Key`) and **separators** (`_` vs `-`). Prior scrubbing missed many real-world keys, potentially leaking secrets in logs/support bundles. Lowercasing keys for matching + expanding patterns closes these gaps without over-scrubbing generic keys.

### Describe how you validated your changes

- **Unit tests** (new & updated) cover:
  - Case variants: `APIKEY`, `Api_key`, `api-key`, `apikey`
  - HTTP headers with `x-` prefix and `key`/`token`/`auth` suffixes
  - Exact-match auth fields
  - Non-matching benign keys remain untouched

Co-authored-by: louis-cqrl <[email protected]>
Co-authored-by: sabrina lu <[email protected]>
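To illustrate the lowercase-then-match idea from this PR, here is a small standalone sketch. It reuses only the two expanded patterns quoted above (`api[-_]?key` and `ap(?:p|plication)[-_]?key`). The `isSensitiveKey` helper, the combined `keyPattern`, and the `^...$` anchoring are assumptions made for the example and do not reflect how `pkg/util/scrubber` composes its replacers.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keyPattern combines the two expanded key patterns from the PR description.
// Anchoring with ^ and $ is an assumption made to keep the example strict.
var keyPattern = regexp.MustCompile(`^(?:api[-_]?key|ap(?:p|plication)[-_]?key)$`)

// isSensitiveKey (hypothetical helper) lowercases the key before matching, so
// a single lowercase pattern covers every case variant.
func isSensitiveKey(key string) bool {
	return keyPattern.MatchString(strings.ToLower(key))
}

func main() {
	for _, k := range []string{"APIKEY", "Api_key", "api-key", "Application-Key", "hostname"} {
		fmt.Printf("%-16s sensitive=%v\n", k, isSensitiveKey(k))
	}
}
```

Lowercasing the key once keeps the patterns readable, compared with adding case-insensitive flags or character classes to every expression.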

Backport 909cc7d from #40565. ___ ### What does this PR do? Simply adds minified version of several javascript files and replaces originals in deliverables to optimize their size. Co-authored-by: Joseph Gette <[email protected]>

…name as key (#40871) Backport 191acf6 from #40762. ___ ### What does this PR do? Restructures the `ImageResolver` cache so that the keys are only the repository names (e.g. `dd-lib-python-init`) and not the repository URLs (e.g. `gcr.io/dd-lib-python-init`). It also enables us to configure the default Datadog container registries via configuration. This is not public-facing, and will be used primarily for dogfooding the new feature on staging (given staging uses different container registries). ### Motivation The repository URL field in the `K8S_INJECTION_DD` remote config data was not designed to be consumed by DCA, but primarily for use by our internal `equilibrium` tooling. This meant that using the repository URL as the key would limit gradual rollout to customers using `gcr.io/datadoghq` as their registry. See [here](https://docs.datadoghq.com/tracing/trace_collection/automatic_instrumentation/single-step-apm/kubernetes/?tab=agentv764recommended#change-the-default-image-registry) for other valid Datadog registries. ### Describe how you validated your changes - Updated existing tests - Added new unit tests - Local E2E testing with `injector-dev` ### Additional Notes Co-authored-by: erikayasuda <[email protected]>

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

Backport 35654a3 from #40756. ___ Co-authored-by: Florent Clarret <[email protected]>

Backport 6b6ae07 from #40884. ___ Co-authored-by: Florent Clarret <[email protected]>

…imit (#40876) Backport e693ea6 from #40860. ___ ### What does this PR do? Before this change, we'd flush *after* adding a message that puts the batch over the limit. Now we'll flush the current buffer before exceeding the limit. Fixes https://datadoghq.atlassian.net/browse/DEBUG-4480 ### Motivation We've been seeing 413 errors in staging. ### Describe how you validated your changes Added testing. Co-authored-by: ajwerner <[email protected]>
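The change in flush ordering can be sketched as below. This is a minimal illustration under assumptions: the `batcher` type, its fields, and the `batchSizes` helper are hypothetical and do not mirror the uploader code in the repository. The point shown is checking the limit before appending rather than after.

```go
package main

import "fmt"

// batcher (hypothetical) accumulates messages up to maxBytes per batch.
type batcher struct {
	maxBytes     int
	pending      [][]byte
	pendingBytes int
	flush        func(batch [][]byte)
}

// add flushes the current buffer first if appending msg would exceed the
// limit, so a flushed batch never exceeds maxBytes (unless a single message is
// itself larger than the limit, in which case it is sent alone).
func (b *batcher) add(msg []byte) {
	if len(b.pending) > 0 && b.pendingBytes+len(msg) > b.maxBytes {
		b.flushPending()
	}
	b.pending = append(b.pending, msg)
	b.pendingBytes += len(msg)
}

// flushPending hands the buffered messages to flush and resets the buffer.
func (b *batcher) flushPending() {
	if len(b.pending) == 0 {
		return
	}
	b.flush(b.pending)
	b.pending = nil
	b.pendingBytes = 0
}

// batchSizes feeds messages of the given sizes through a batcher and returns
// the total byte size of every flushed batch.
func batchSizes(maxBytes int, msgSizes ...int) []int {
	var sizes []int
	b := &batcher{maxBytes: maxBytes}
	b.flush = func(batch [][]byte) {
		total := 0
		for _, m := range batch {
			total += len(m)
		}
		sizes = append(sizes, total)
	}
	for _, n := range msgSizes {
		b.add(make([]byte, n))
	}
	b.flushPending()
	return sizes
}

func main() {
	// With a 10-byte limit and four 4-byte messages, each batch holds two messages.
	fmt.Println(batchSizes(10, 4, 4, 4, 4))
}
```

With the previous ordering (append, then flush once over the limit), the first batch in this scenario would have carried 12 bytes, which is the kind of oversized payload that can trigger a 413 response.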

…I are properly sorted (#40895) Backport 7be3f24 from #40889. ___ ### What does this PR do? Ensures that the devices returned by the PodResources API are properly sorted before being returned. In some cases we have seen k8s return them in a different order from the one in which they appear in the system. ### Motivation Ensure correct attribution of devices. https://datadoghq.atlassian.net/browse/EBPF-813 ### Describe how you validated your changes Added unit tests. ### Additional Notes Co-authored-by: Guillermo Julián <[email protected]>

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

Backport 7bd16cd from #40924. ___ ### What does this PR do? Fixes #39647 by closing the root after using it. ### Motivation ### Describe how you validated your changes ### Additional Notes Co-authored-by: Paul Cacheux <[email protected]>

…by main consumer loop (#41226) Backport 74a7786 from #41193. ___ ### What does this PR do? Changes the processing of process exit events so that they happen in the main consumer loop. ### Motivation Avoid race conditions by ensuring process exits are handled in the same goroutine, no matter whether they come from process scans or from the process monitor. ### Describe how you validated your changes Added unit tests to ensure process exit is properly handled. ### Additional Notes Co-authored-by: Guillermo Julián <[email protected]>

The kubelet entity is currently only used to generate a tag on kubernetes check metrics, it has no tags of its own. (cherry picked from commit 34f0e21)

### What does this PR do?

Fixes the error logs that occur when trying to build the kubelet's tagger entity ID

### Motivation

Reduce error logs

### Describe how you validated your changes

1. Deploy RC 7 and run `agent check kubelet` to kickoff the workloadmeta kubelet collector

```
root@justin-lesko-minikube:/# agent check kubelet | grep "kubelet-id"
2025-09-23 19:05:49 UTC | CORE | ERROR | (comp/core/tagger/common/entity_id_builder.go:35 in BuildTaggerEntityID) | can't recognize entity "kubelet-id" with kind "kubelet"; trying kubelet-id://kubelet as tagger entity
2025-09-23 19:05:49 UTC | CORE | ERROR | (comp/core/tagger/collectors/workloadmeta_extract.go:161 in processEvents) | cannot handle event for entity "kubelet-id" with kind "kubelet"
```

2. Deploy this branch and observe no more errors

```
root@justin-lesko-minikube:/# agent check kubelet | grep "kubelet-id"
root@justin-lesko-minikube:/#
```

### Additional Notes

…r older CRs (#41234) Backport 0b9ffe1 from #41212. ___ ### What does this PR do? Fixes a case where the CR does not contain any value, which prevented any fallback from being applied. Fixes the feature introduced in #40542 ### Motivation Bugfix ### Describe how you validated your changes Use a 7.71+ Agent version with outdated CRs or an outdated CRD; local fallback should still happen instead of being refused because the scaling direction is disabled. ### Additional Notes Cluster Agent impact only Co-authored-by: Vincent Boulineau <[email protected]>

…d for psycopg #41003 (#41238) ### What does this PR do? ### Motivation ### Describe how you validated your changes ### Additional Notes --------- Co-authored-by: Florent Clarret <[email protected]> Co-authored-by: sabrina-datadog <[email protected]>

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

…ClosedProcesses flaky test (#41310) Backport ec77d23 from #41302. ___ ### What does this PR do? Fixes a low frequency flaky test. The way the test was written, the `procRoot` with the fake data was not being actually used to check for the closed processes (it was being passed to the context but not the consumer), and instead it was using the real `/proc` path. This made the test work most of the time, except when we had a real process with the PID of our fake process. ### Motivation Eliminate flaky tests. ### Describe how you validated your changes Fixed the unit test. ### Additional Notes Co-authored-by: Guillermo Julián <[email protected]>

…re release (#41314) ### What does this PR do? Backports #41196 to 7.71.x ### Motivation Making sure tests don't break on 7.71.x when a new install script gets released ### Describe how you validated your changes If E2E tests pass we're good! ### Additional Notes --------- Co-authored-by: Arthur Bellal <[email protected]>

Backport c46abad from #40181. ___  ### What does this PR do? > [!NOTE] > Buildimages are also bumped in this PR. This migrates Gitlab PATs, now we use a custom API to generate short lived Gitlab PATs. ### Motivation ### Describe how you validated your changes  ### Possible Drawbacks / Trade-offs ### Additional Notes  Co-authored-by: Célian Raimbault <[email protected]>

…eludes (#41413)

### What does this PR do? fix release notes ### Motivation ### Describe how you validated your changes ### Additional Notes

…onfig stream snapshot creation (#41289) Backport e4fcdfc from #41279. ___ ### What does this PR do? Updates config stream to handle nested keys in the configuration that are of type `map[interface{}]interface{}` ### Motivation Previously, we were only converting the top level to `map[string]interface{}` ### Describe how you validated your changes 1. Created a custom image and deployed it on an experimental cluster (sasquatch) with ADP enabled. 2. Run `agent-data-plane config` and see the config. 3. Update a setting (e.g. `agent config set dogstatsd_stats true`) 4. Run `agent-data-plane config` and see the updated config value. ### Additional Notes Co-authored-by: Raymond Zhao <[email protected]>
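A recursive conversion of the kind described above might look like the following sketch. The `normalize` function is a hypothetical illustration, not the function used by the config stream; stringifying non-string keys with `fmt.Sprint` is an assumption.

```go
package main

import "fmt"

// normalize (hypothetical) walks a decoded config value and converts every
// map[interface{}]interface{} it finds, at any depth, into
// map[string]interface{}. Non-string keys are stringified with fmt.Sprint.
func normalize(v interface{}) interface{} {
	switch t := v.(type) {
	case map[interface{}]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			out[fmt.Sprint(k)] = normalize(val)
		}
		return out
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			out[k] = normalize(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = normalize(val)
		}
		return out
	default:
		// Scalars and other types are returned unchanged.
		return v
	}
}

func main() {
	cfg := map[string]interface{}{
		"logs_config": map[interface{}]interface{}{
			"processing": map[interface{}]interface{}{"enabled": true},
		},
	}
	fmt.Printf("%#v\n", normalize(cfg))
}
```

Converting only the top level leaves nested `map[interface{}]interface{}` values in place, which typically cannot be serialized by encoders that require string keys.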

Backport a4e26e7 from #41166. ___ ### What does this PR do? don't pass context when launching detached process ### Motivation https://datadoghq.atlassian.net/browse/WINA-1666 https://datadoghq.atlassian.net/browse/WINA-1707 fix bug that can cause fleet upgrades to fail ### Describe how you validated your changes existing E2E tests fail due to this issue once every few days manual test: added sleep after `i.stop()` to give time for background terminate process to finish ### Additional Notes before returning from main, `hookCommand` calls defer `i.stop(err)` https://github.com/DataDog/datadog-agent/blob/4c3ba894409b1fde22ed134e0b7a2eb518b80297/pkg/fleet/installer/commands/hooks.go#L37-L38 `i.stop` calls `c.stopsighandler` https://github.com/DataDog/datadog-agent/blob/4c3ba894409b1fde22ed134e0b7a2eb518b80297/pkg/fleet/installer/commands/command.go#L73-L80 which is the context cancelfunc https://github.com/DataDog/datadog-agent/blob/4c3ba894409b1fde22ed134e0b7a2eb518b80297/pkg/fleet/installer/commands/command.go#L46-L55 Since this context was passed to `exec.CommandContext` when launching the detached `postStartExperimentBackground` subprocess, it sees `ctx.Done()` and sends a kill signal to the subprocess. Co-authored-by: Branden Clark <[email protected]>
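The difference between the two ways of launching the subprocess can be sketched as follows. This only illustrates the context aspect described above, using the standard `os/exec` package (the `Cmd.Cancel` field requires Go 1.20 or later). The `demo` helper and the `sleep` command are placeholders, and a truly detached process on Windows needs additional process-creation settings that are not shown here.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// demo (hypothetical) builds the same command twice without starting it and
// reports whether each one carries a cancel hook tied to a context.
func demo() (boundHasHook, detachedHasHook bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// exec.CommandContext installs a Cancel hook: once ctx is done, a started
	// child is killed by default. This is what terminated the background process.
	bound := exec.CommandContext(ctx, "sleep", "60")

	// exec.Command has no context, so cancelling the parent's context does not
	// signal the child.
	detached := exec.Command("sleep", "60")

	return bound.Cancel != nil, detached.Cancel != nil
}

func main() {
	bound, detached := demo()
	fmt.Println("context-bound command has cancel hook:", bound)
	fmt.Println("detached command has cancel hook:", detached)
}
```

Neither command is started in the sketch, so it is safe to run on hosts where `sleep` is unavailable.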

…hes do not match (#41324) Backport 8fb9d9b from #41301. ___ ### What does this PR do? This PR fixes the behaviour of the Cluster Agent when the `.Spec` retrieved _after_ a DPA has been created/updated contains values not originating from the Cluster Agent itself. The discrepancy can come from different sources (admission controller, CRD defaulting, etc.), which are not necessarily expected but should not break the overall feature either. ### Motivation Fixing inactive remote Autoscalers when the Remote Spec hash is different from the actual object Spec hash after an update. ### Describe how you validated your changes Change validated in a test Kubernetes cluster. It can be reproduced by creating a remote autoscaler with a recent version of the Cluster Agent/CRD that has defaulting for local fallback values. Added a unit test to cover this specific case. ### Additional Notes The current solution is not optimal: we cannot cope with differences introduced manually if the Cluster Agent leader was not available at the time of the update. Co-authored-by: Vincent Boulineau <[email protected]>

… sizes (#41359) Backport 5d7eaae from #41293. ___ ### What does this PR do? ### Motivation https://datadoghq.atlassian.net/browse/VULN-12554 https://nvd.nist.gov/vuln/detail/CVE-2025-8194 https://gist.github.com/sethmlarson/1716ac5b82b73dbcbf23ad2eff8b33e1 ### Describe how you validated your changes ### Additional Notes Co-authored-by: Kyle Neale <[email protected]>

…-definitions to 2a6d59a9b3f3a7a6c91630515ad6ee659256b9a2 (#41437) This PR was automatically created by the test-infra-definitions bump task. This PR bumps the test-infra-definitions submodule to 2a6d59a9b3f3a7a6c91630515ad6ee659256b9a2 from 1faea1273955. Here is the full changelog between the two commits: DataDog/test-infra-definitions@1faea12...2a6d59a ⚠️ This PR is opened with the `qa/no-code-change` and `changelog/no-changelog` labels by default. Please make sure this is appropriate ### What does this PR do? ### Motivation ### Describe how you validated your changes ### Additional Notes --------- Co-authored-by: agent-platform-auto-pr[bot] <153269286+agent-platform-auto-pr[bot]@users.noreply.github.com> Co-authored-by: Célian Raimbault <[email protected]>

### What does this PR do? Backports #41375 & #41458 to 7.71.x ### Motivation Fixing the APM SSI script ### Describe how you validated your changes E2E, manual QA ### Additional Notes

Backport 02bfb4b from #41435. ___ ### What does this PR do? Unpins the install script in the test where it is pinned ### Motivation Test is currently failing because the pin is >5 versions old ### Describe how you validated your changes E2E ### Additional Notes Co-authored-by: Baptiste Foy <[email protected]>

Fixes the generic container corecheck processor to receive the parsed value for ExtendedMemory collection

Address bug with the parsing and configuration of extended memory metric collection in the containers processor

Deploy Agent locally with container check enabled with extended memory metric collection enabled:

```
datadog:
  confd:
    container.yaml: |-
      ad_identifiers:
        - _container
      init_config:
      instances:
        - extended_memory_metrics: true
```

Run the container corecheck and expect extended memory metrics like `container.memory.active_file` to be outputted.

```
k exec -it datadog-agent-linux-bj667 -n datadog-agent -- agent check container | grep container.memory.active_file
Defaulted container "agent" out of: agent, trace-agent, process-agent, init-volume (init), init-config (init)
"metric": "container.memory.active_file",
"metric": "container.memory.active_file",
"metric": "container.memory.active_file",
"metric": "container.memory.active_file",
```

Follow up e2e tests can be added for detecting the presence of extended memory metrics

(cherry picked from commit 8f75bfd)

### What does this PR do?

### Motivation

### Describe how you validated your changes

### Additional Notes

Backport 91376d3 from #41428. ___ This PR fixes a bug preventing the update of configurations through Fleet on windows. With the recent switch to modifying the user configuration dir directly we failed to account that the "configuration" directory on windows also contains all the runtime files (python cache, installer state, etc...). This broke the current implementation of the experiment. This PR fixes the issue by disabling config experiments on windows. ### QA This change was QA'd manually. Co-authored-by: Arthur Bellal <[email protected]>

…only (#41508) Backport 46246af from #41506. ___ ### What does this PR do? Skips cgroup tests for `pkg/gpu` when the cgroupfs is not writable. Additionally, it marks the oracle job as allowed to fail, as it started to fail at the same point as the pkg/gpu tests. ### Motivation Fix for #incident-43807 ### Describe how you validated your changes CI green. These tests also execute in KMT, where we have full access to the cgroup, so we do not lose coverage. ### Additional Notes Co-authored-by: Guillermo Julián <[email protected]>

… generation (#41509) Backport d950e76 from #41507. ___ ### What does this PR do? Don't use short lived tokens in the CI. ### Motivation ### Describe how you validated your changes ### Additional Notes Co-authored-by: Célian Raimbault <[email protected]>

…eludes (#41603)

agent-platform-auto-pr · 2025-11-30T13:13:02Z

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 20c9c84

Successful checks

Info

| Status | Quality gate | Delta | On disk size (MiB) | Delta | On wire size (MiB) |
|---|---|---|---|---|---|
| ✅ | agent_deb_amd64 | DataNotFound | 694.01 < 709.39 | DataNotFound | 174.99 < 178.58 |
| ✅ | agent_deb_amd64_fips | DataNotFound | 688.52 < 703.09 | DataNotFound | 173.47 < 178.12 |
| ✅ | agent_heroku_amd64 | DataNotFound | 332.37 < 355.37 | DataNotFound | 88.24 < 95.72 |
| ✅ | agent_msi | DataNotFound | 981.8 < 986.02 | DataNotFound | 150.15 < 152.67 |
| ✅ | agent_rpm_amd64 | DataNotFound | 694.0 < 709.38 | DataNotFound | 177.29 < 181.22 |
| ✅ | agent_rpm_amd64_fips | DataNotFound | 688.51 < 703.08 | DataNotFound | 175.75 < 179.85 |
| ✅ | agent_rpm_arm64 | DataNotFound | 680.4 < 695.74 | DataNotFound | 158.68 < 163.96 |
| ✅ | agent_rpm_arm64_fips | DataNotFound | 675.98 < 693.05 | DataNotFound | 157.54 < 163.0 |
| ✅ | agent_suse_amd64 | DataNotFound | 694.0 < 709.38 | DataNotFound | 177.29 < 181.22 |
| ✅ | agent_suse_amd64_fips | DataNotFound | 688.51 < 703.08 | DataNotFound | 175.75 < 179.85 |
| ✅ | agent_suse_arm64 | DataNotFound | 680.4 < 695.74 | DataNotFound | 158.68 < 163.96 |
| ✅ | agent_suse_arm64_fips | DataNotFound | 675.98 < 693.05 | DataNotFound | 157.54 < 163.0 |
| ✅ | docker_agent_amd64 | DataNotFound | 765.4 < 788.65 | DataNotFound | 262.37 < 272.01 |
| ✅ | docker_agent_arm64 | DataNotFound | 775.78 < 802.0 | DataNotFound | 248.47 < 259.7 |
| ✅ | docker_agent_jmx_amd64 | DataNotFound | 956.28 < 979.84 | DataNotFound | 330.99 < 340.95 |
| ✅ | docker_agent_jmx_arm64 | DataNotFound | 955.38 < 981.8 | DataNotFound | 313.1 < 324.65 |
| ✅ | docker_cluster_agent_amd64 | DataNotFound | 213.04 < 214.5 | DataNotFound | 72.32 < 73.51 |
| ✅ | docker_cluster_agent_arm64 | DataNotFound | 228.98 < 230.33 | DataNotFound | 68.58 < 69.77 |
| ✅ | docker_cws_instrumentation_amd64 | DataNotFound | 7.07 < 7.12 | DataNotFound | 2.95 < 3.29 |
| ✅ | docker_cws_instrumentation_arm64 | DataNotFound | 6.69 < 6.92 | DataNotFound | 2.71 < 3.07 |
| ✅ | docker_dogstatsd_amd64 | DataNotFound | 38.37 < 39.57 | DataNotFound | 14.82 < 15.76 |
| ✅ | docker_dogstatsd_arm64 | DataNotFound | 37.07 < 38.2 | DataNotFound | 14.27 < 14.83 |
| ✅ | dogstatsd_deb_amd64 | DataNotFound | 29.59 < 31.4 | DataNotFound | 7.81 < 8.95 |
| ✅ | dogstatsd_deb_arm64 | DataNotFound | 28.18 < 29.97 | DataNotFound | 6.76 < 7.89 |
| ✅ | dogstatsd_rpm_amd64 | DataNotFound | 29.59 < 31.4 | DataNotFound | 7.81 < 8.96 |
| ✅ | dogstatsd_suse_amd64 | DataNotFound | 29.59 < 31.4 | DataNotFound | 7.81 < 8.96 |
| ✅ | iot_agent_deb_amd64 | DataNotFound | 54.42 < 54.97 | DataNotFound | 13.73 < 14.45 |
| ✅ | iot_agent_deb_arm64 | DataNotFound | 51.71 < 51.9 | DataNotFound | 11.87 < 12.63 |
| ✅ | iot_agent_deb_armhf | DataNotFound | 51.29 < 51.84 | DataNotFound | 11.94 < 12.74 |
| ✅ | iot_agent_rpm_amd64 | DataNotFound | 54.42 < 54.97 | DataNotFound | 13.75 < 14.47 |
| ✅ | iot_agent_suse_amd64 | DataNotFound | 54.42 < 54.97 | DataNotFound | 13.75 < 14.47 |

FlorentClarret and others added 30 commits September 5, 2025 09:14

[release] Update release.json and .gitlab-ci.yml files for 7.71.x branch (#40610)

353c39b

[release] Update release.json and Go modules for 7.71.0-rc.1 (#40663)

eea9090

[incident-42773] Fix cache errors on git merge-base operations (#40664)

e32cea3

### Motivation
This fixes issues with S3 cache by fetching the target branch when doing `git merge-base` operations.
### Describe how you validated your changes
### Additional Notes
#incident-42773

[Backport 7.71.x] [dyninst/object] Simplify computing compression lengths (#40775)

864fcf7

Backport 4b43b06 from #40765.
___
This fixes a bug that occurs when the compressed length is longer than the uncompressed one. It also adds a CLI for running irgen.
Co-authored-by: Piotr Bejda

[release] Update release.json and Go modules for 7.71.0-rc.2 (#40754)

6cd4d8d

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

[incident-42773] Backport #40685 (#40790)

0f3107f

### Motivation
Backports #40685 and #40798, which fix some edge cases for incident-42773.
### Describe how you validated your changes
### Additional Notes

[Backport 7.71.x] Minify javascript files (#40869)

0163c8a

Backport 909cc7d from #40565.
___
### What does this PR do?
Adds minified versions of several JavaScript files and replaces the originals in deliverables to reduce their size.
Co-authored-by: Joseph Gette

[release] Update release.json and Go modules for 7.71.0-rc.3 (#40865)

1173574

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

[Backport 7.71.x] Changelog update for 7.70.1 release (#40758)

0c13fbf

Backport 35654a3 from #40756.
___
Co-authored-by: Florent Clarret

[Backport 7.71.x] Changelog update for 7.70.2 release (#40887)

6ccb19a

Backport 6b6ae07 from #40884.
___
Co-authored-by: Florent Clarret

[release] Update release.json and Go modules for 7.71.0-rc.4 (#40904)

d1d9271

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

dd-octo-sts bot and others added 26 commits September 24, 2025 11:16

[release] Update release.json and Go modules for 7.71.0-rc.7 (#41252)

526ab09

Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>

Final updates for release.json and Go modules for 7.71.0 release + preludes (#41413)

f1671a4

[Backport 7.71.x] fix release notes (#41418)

8e73054

### What does this PR do?
Fix release notes
### Motivation
### Describe how you validated your changes
### Additional Notes

Changelog update for 7.71.0 release (#41419)

6a1c0c2

[Backport to 7.71.x] fix(fleet): Fixes to the APM SSI script (#41484)

e929478

### What does this PR do?
Backports #41375 & #41458 to 7.71.x
### Motivation
Fixing the APM SSI script
### Describe how you validated your changes
E2E, manual QA
### Additional Notes

[release] Update release.json and Go modules for 7.71.1-rc.1 (#41501)

a2c0913

[release] Update release.json and Go modules for 7.71.1-rc.2 (#41591)

24dbdf7

Final updates for release.json and Go modules for 7.71.1 release + preludes (#41603)

bc04a78

fix istio

bad6f6a

amitslavin added the labels changelog/no-changelog, team/usm (The USM team), and qa/done (QA done before merge and regressions are covered by tests) on Nov 30, 2025


18 participants
