[SDTEST-2055] Add documentation for AI flaky test categorization (#30589)
* [SDTEST-2055] Add documentation for AI flaky test categorization
* fix capitalization
* add metric
* be more specific with time
* change descriptions
content/en/tests/flaky_management/_index.md (26 additions, 0 deletions)
@@ -46,6 +46,7 @@ Track the evolution of the number of flaky tests with the `test_optimization.tes
- `branch`
- `flaky_status`
- `test_codeowners`
+ - `flaky_category`
The `branch` tag only exists when the test has flaked in the default branch of the repository during the last 30 days. This helps you discard flaky tests that have only exhibited flakiness in feature branches, as these may not be relevant. You can configure the default branch of your repositories under [Repository Settings][2].
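With the new tag in place, the flaky-test count can be broken down by root cause in a dashboard or monitor query. The query below is an illustrative sketch, not from the docs: it assumes the metric truncated in the hunk header above is `test_optimization.test_management.flaky_tests`, and that `flaky_status:active` is a valid tag value (the docs list `Active` as a status).

```
sum:test_optimization.test_management.flaky_tests{flaky_status:active} by {flaky_category}
```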
@@ -73,6 +74,31 @@ When you fix a flaky test, Test Optimization's remediation flow can confirm the
- If all retries pass, updates the test's status to `Fixed`.
- If any retry fails, keeps the test's current status (`Active`, `Quarantined`, or `Disabled`).
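The decision rule in the two bullets above is simple enough to model directly. A minimal Python sketch of the logic, using hypothetical names (`FlakyStatus`, `confirm_fix`) that are not part of any Datadog API:

```python
from enum import Enum


class FlakyStatus(Enum):
    ACTIVE = "Active"
    QUARANTINED = "Quarantined"
    DISABLED = "Disabled"
    FIXED = "Fixed"


def confirm_fix(current: FlakyStatus, retry_passed: list[bool]) -> FlakyStatus:
    """Model of the remediation rule: every retry must pass to confirm a fix;
    a single failing retry keeps the test's current status."""
    if retry_passed and all(retry_passed):
        return FlakyStatus.FIXED
    return current


# Example: one failing retry out of ten keeps the test Quarantined.
assert confirm_fix(FlakyStatus.QUARANTINED, [True] * 9 + [False]) == FlakyStatus.QUARANTINED
```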
+ ## AI-powered flaky test categorization
+
+ Flaky Test Management uses AI to automatically assign a root cause category to each flaky test based on execution patterns and error signals. This helps you filter, triage, and prioritize flaky tests more effectively.
+
+ <div class="alert alert-info"><strong>Note:</strong> A test must have at least one failed execution that includes both <code>@error.message</code> and <code>@error.stack</code> tags to be eligible for categorization. If the test was recently detected, categorization may take several minutes to complete.</div>
+ ### Categories
+
+ | Category | Description |
+ |----------|-------------|
+ | **Concurrency** | Test invokes multiple threads that interact in an unsafe or unanticipated manner. Flakiness is caused by, for example, race conditions that result from implicit assumptions about the order of execution, leading to deadlocks in certain test runs. |
+ | **Randomness** | Test uses the result of a random data generator. If the test does not account for all possible cases, it may fail intermittently, for example, only when the random number generator returns zero. |
+ | **Floating Point** | Test uses the result of a floating-point operation. Floating-point operations can suffer from precision overflow and underflow, non-associative addition, and similar issues, which, if not properly accounted for, can produce inconsistent outcomes (for example, comparing a floating-point result to an exact real value in an assertion). |
+ | **Unordered Collection** | Test assumes a particular iteration order for an unordered-collection object. Because no order is guaranteed, tests that assume a fixed order are likely to be flaky for various reasons (for example, the collection-class implementation). |
+ | **Too Restrictive Range** | Test's assertions accept only part of the valid output range, so it fails intermittently on unhandled corner cases. |
+ | **Timeout** | Test fails due to time limitations, either at the individual test level or as part of a suite. This includes tests that exceed their execution time limit (for a single test or the whole suite) and fail intermittently due to varying execution times. |
+ | **Order Dependency** | Test depends on a shared value or resource modified by another test. Changing the test-run order can break those dependencies and produce inconsistent outcomes. |
+ | **Resource Leak** | Test improperly handles an external resource (for example, failing to release memory). Subsequent tests that reuse the resource may become flaky. |
+ | **Asynchronous Wait** | Test makes an asynchronous call or waits for elements to load or render without explicitly waiting for completion (often relying on a fixed delay instead). If the call or rendering takes longer than the delay, the test fails. |
+ | **IO** | Test is flaky due to its handling of input/output, for example, failing when disk space runs out during a write. |
+ | **Network** | Test depends on network availability (for example, querying a server). If the network is unavailable or congested, the test may fail. |
+ | **Time** | Test relies on system time and may be flaky due to precision or timezone discrepancies (for example, failing when midnight passes in UTC). |
+ | **Environment Dependency** | Test depends on a specific OS, library version, or hardware. It may pass in one environment but fail in another, especially in cloud CI environments where machines vary nondeterministically. |
+ | **Unknown** | Test is flaky for an unknown reason. |
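As a concrete illustration of one category, the hypothetical pytest-style test below (not from the docs) is flaky in the "Unordered Collection" sense: CPython randomizes string hashing per process, so the iteration order of a set of strings can differ between test runs.

```python
def get_user_roles() -> set[str]:
    # Hypothetical system under test: returns an unordered collection.
    return {"admin", "editor", "viewer"}


def test_roles_flaky():
    # Flaky (Unordered Collection): set iteration order is not guaranteed,
    # and for strings it varies across Python processes due to hash
    # randomization, so this comparison passes only on some runs.
    assert list(get_user_roles()) == ["admin", "editor", "viewer"]


def test_roles_stable():
    # Stable rewrite: assert order-insensitively.
    assert get_user_roles() == {"admin", "editor", "viewer"}
```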
## Compatibility
To use Flaky Test Management features, you must use Datadog's native instrumentation for your test framework. The table below outlines the minimum versions of each Datadog tracing library required to quarantine, disable, and attempt to fix flaky tests. Click a language name for setup information: