Draft
Commits
33 commits
6c88227
Add flake detector skill
yschimke Jan 10, 2026
186848b
add reproducer
yschimke Jan 10, 2026
c688d83
update command
yschimke Jan 10, 2026
f2e6a16
update command
Jan 10, 2026
49166fe
update command
Jan 10, 2026
a8c3893
more repeated tests
Jan 10, 2026
ca993c6
more repeated tests
Jan 10, 2026
da50082
update script
Jan 10, 2026
1d37e57
update script
Jan 10, 2026
804a20d
feat: Add local flake reproduction with @RepeatedTest
Jan 10, 2026
be8150c
update script
Jan 10, 2026
2e3e0c2
update script
Jan 10, 2026
98c94bf
update script
Jan 10, 2026
51d56a3
Merge branch 'master' into flake_skill
yschimke Jan 10, 2026
a88e9cc
Merge branch 'master' into flake_skill
Jan 10, 2026
3bc5880
Merge branch 'master' into flake_skill
yschimke Jan 11, 2026
e9d3623
Merge branch 'master' into flake_skill
Jan 11, 2026
093eb50
Merge remote-tracking branch 'yschimke/flake_skill' into flake_skill
Jan 11, 2026
b5b7864
Duplex test
Jan 11, 2026
9ddcc5a
Merge branch 'master' into flake_skill
yschimke Jan 11, 2026
cefe25c
Merge branch 'master' into flake_skill
yschimke Jan 11, 2026
509ad5a
Add sleep to HttpOverHttp2Test to prevent flakiness
yschimke Jan 11, 2026
dce7f71
Fix flaky tests: HttpOverHttp2Test, Http2ConnectionTest, WebSocketHtt…
yschimke Jan 11, 2026
98e3f78
Merge remote-tracking branch 'yschimke/flake_skill' into flake_skill
yschimke Jan 11, 2026
b828c91
More fixes
yschimke Jan 11, 2026
8d51dff
Fix DuplexTest.duplexWithRedirect flake by flushing headers.
yschimke Jan 11, 2026
f3ec0d6
Fix EventListenerTest flakes.
yschimke Jan 11, 2026
12467ea
Revert @RepeatedTest to @Test in RouteFailureTest.
yschimke Jan 11, 2026
e0a95ce
More fixes
yschimke Jan 11, 2026
22acc81
Merge branch 'master' into flake_skill
yschimke Jan 11, 2026
c33320b
Final cleanup of flaky tests.
yschimke Jan 11, 2026
4df76de
Test who close to failing we are
yschimke Jan 11, 2026
061bf37
Merge branch 'master' into flake_skill
yschimke Jan 16, 2026
75 changes: 75 additions & 0 deletions .github/skills/flake-detector/SKILL.md
@@ -0,0 +1,75 @@
# Flake Detector Skill

## Description
This skill helps identify flaky tests in the OkHttp project by analyzing recent failures in the GitHub Actions `build.yml` workflow on the `master` branch. It fetches failed run logs, extracts test failure patterns, and **aggregates the number of failures encountered per test class**.
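A minimal sketch of the aggregation step, assuming the class names have already been extracted from the logs, one per failed test. The `count_failures_per_class` helper and the sample input are illustrative and are not part of the skill's scripts:

```bash
#!/bin/bash
# Hypothetical helper: reads class names on stdin (one per failed test) and
# prints "<count> <class>" lines, most frequent first.
count_failures_per_class() {
  sort | uniq -c | sort -nr | awk '{print $1, $2}'
}

# Illustrative input; real input would come from the workflow logs.
printf '%s\n' CacheTest RouteFailureTest CacheTest DuplexTest CacheTest RouteFailureTest |
  count_failures_per_class
```

The `awk` step only strips the leading padding that `uniq -c` emits, so the counts line up the same way on every platform.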

## Reproduction
To reproduce these flakes locally:

1. **Identify** the specific failing method using the detector:
   ```bash
   ./.github/skills/flake-detector/identify-flakes.sh
   ```

2. **Modify** the test code to repeat the test.
   * Open the failing test file.
   * Replace `@Test` with `@RepeatedTest(100)` for the failing test method.
   * Add import `org.junit.jupiter.api.RepeatedTest`.

3. **Run** the reproduction script:
   ```bash
   ./.github/skills/flake-detector/reproduce-flakes.sh
   ```
   The script reads the test names in `flaky-tests.txt` (written by `identify-flakes.sh`) and runs them. Any test annotated with `@RepeatedTest(100)` runs 100 times.

   **Note**: To investigate a particular test, pass its name as an argument. The script then ignores `flaky-tests.txt` and runs only that test:
   ```bash
   ./.github/skills/flake-detector/reproduce-flakes.sh okhttp3.CacheTest.testGoldenCacheHttpsResponseOkHttp27
   ```

4. **Cleanup**: Revert `@RepeatedTest` to `@Test` on tests that have not flaked recently in CI and that also pass reliably locally, so the test suite does not slow down.
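
One way to spot annotations left behind before committing is a recursive search over the test sources. This is a sketch only: `find_repeated_tests` is a hypothetical helper, not part of the skill, and it assumes a `grep` that supports `-r` and `--include` (GNU and BSD grep both do):

```bash
#!/bin/bash
# Hypothetical helper: lists every @RepeatedTest annotation under a directory
# as "file:line:text", so leftovers can be reverted to @Test.
find_repeated_tests() {
  local dir="${1:-.}"
  grep -rn --include='*.kt' --include='*.java' '@RepeatedTest' "$dir"
}

# Search the current checkout; grep exits non-zero when nothing matches.
find_repeated_tests . || echo "No @RepeatedTest annotations found."
```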

## Known Flakes (as of Jan 2026)
Based on recent analysis, the following tests are known to be flaky, ordered by observed frequency:

1. **`CacheTest` (Golden Cache) - High Frequency**
   * `testGoldenCacheHttpsResponseOkHttp27`, `testGoldenCacheHttpsResponseOkHttp30`
   * **Symptom**: `java.net.SocketTimeoutException: Read timed out` or `timeout`. Also `AssertionFailedError` on content mismatches.

2. **`RouteFailureTest`**
   * `http2OneBadHostRetryOnConnectionFailureFastFallback`
   * `http2OneBadHostOneGoodNoRetryOnConnectionFailureFastFallback`
   * `http2OneBadHostRetryOnConnectionFailure`
   * **Symptom**: `AssertionFailedError: expected:<[1]> but was:<[0]>` (Retry count mismatch).

3. **`ServerTruncatesRequestTest`**
   * `serverTruncatesRequestButTrailersCanStillBeReadHttp1`
   * `serverTruncatesRequestOnLongPostHttp1`
   * **Symptom**: `java.net.SocketException: An established connection was aborted by the software in your host machine` (likely environment specific).

4. **`WebSocketHttpTest`**
   * `closeWithoutSuccessfulConnect`
   * **Symptom**: `AssertionFailedError: Still 0 connections open ==> expected: <0> but was: <1>`

5. **`Http2ConnectionTest`**
   * `discardedDataFramesAreCounted`
   * **Symptom**: Data frame count mismatch (`1024` vs `2048`).

6. **`EventListenerTest_Relay`**
   * `cancelAsyncCall`
   * **Symptom**: Unexpected event sequence.

7. **`DuplexTest`**
   * `duplexWithRedirect`
   * **Symptom**: `java.util.concurrent.TimeoutException` (timed out after 30 seconds).

8. **`AlpnOverrideTest`**
   * **Symptom**: `java.net.ConnectException` (often transient CI network issues connecting to google.com).

9. **`ThreadInterruptTest`**
   * `forciblyStopDispatcher`
   * **Symptom**: `java.util.concurrent.TimeoutException`.

10. **`HttpOverHttp2Test`**
    * `recoverFromMultipleCancelReusesConnection`
    * **Symptom**: `AssertionFailedError` (Connection count mismatch).
24 changes: 24 additions & 0 deletions .github/skills/flake-detector/flaky-tests.txt
@@ -0,0 +1,24 @@
CacheTest.testGoldenCacheHttpResponseOkHttp30
CacheTest.testGoldenCacheHttpsResponseOkHttp27
CacheTest.testGoldenCacheHttpsResponseOkHttp30
CacheTest.testGoldenCacheResponse
CookiesTest.testQuotedAttributeValues
DuplexTest.duplexWithRedirect
EventListenerTest.timeToFirstByteHttp2OverHttps
EventListenerTest_Relay.cancelAsyncCall
EventListenerTest_Relay.successfulCallEventSequenceForEnqueue
Http2ConnectionTest.discardedDataFramesAreCounted
HttpOverHttp2Test.recoverFromMultipleCancelReusesConnection
HttpOverHttp2Test_HTTP_2.connectionTimeout
HttpOverHttp2Test_HTTP_2.oneStreamTimeoutDoesNotBreakConnection
HttpOverHttp2Test_HTTP_2.readResponseHeaderTimeout
HttpOverHttp2Test_HTTP_2.readTimeoutOnSlowConnection
HttpOverHttp2Test_HTTP_2.streamTimeoutDegradesConnectionAfterNoPong
RouteFailureTest.http2OneBadHostOneGoodNoRetryOnConnectionFailure
RouteFailureTest.http2OneBadHostOneGoodNoRetryOnConnectionFailureFastFallback
RouteFailureTest.http2OneBadHostRetryOnConnectionFailure
RouteFailureTest.http2OneBadHostRetryOnConnectionFailureFastFallback
ServerTruncatesRequestTest.serverTruncatesRequestButTrailersCanStillBeReadHttp1
ServerTruncatesRequestTest.serverTruncatesRequestOnLongPostHttp1
ThreadInterruptTest.forciblyStopDispatcher
WebSocketHttpTest.closeWithoutSuccessfulConnect
79 changes: 79 additions & 0 deletions .github/skills/flake-detector/identify-flakes.sh
@@ -0,0 +1,79 @@
#!/bin/bash
# Identify flaky tests in GitHub Actions workflow runs.
# Requires: gh CLI authenticated

LIMIT=${1:-10}
WORKFLOW="build.yml"
BRANCH="master"
REPO="square/okhttp"
SKILL_DIR=$(dirname "$0")
FAILURES_FILE=$(mktemp)
OUTPUT_FILE="$SKILL_DIR/flaky-tests.txt"

# Clear previous output
rm -f "$OUTPUT_FILE"
touch "$OUTPUT_FILE"

echo "Fetching last $LIMIT failed runs for $WORKFLOW on $BRANCH..."

RUN_IDS=$(gh run list --workflow "$WORKFLOW" --branch "$BRANCH" --status failure --limit "$LIMIT" --repo "$REPO" --json databaseId --jq '.[].databaseId')

if [ -z "$RUN_IDS" ]; then
echo "No failed runs found."
rm -f "$FAILURES_FILE"
exit 0
fi

for run_id in $RUN_IDS; do
echo "--------------------------------------------------------------------------------"
echo "Run ID: $run_id"
echo "URL: https://github.com/$REPO/actions/runs/$run_id"

# Get failed job IDs
JOB_DATA=$(gh api "repos/$REPO/actions/runs/$run_id/jobs" --jq '.jobs[] | select(.conclusion=="failure") | "\(.id) \(.name)"')

if [ -z "$JOB_DATA" ]; then
echo " No failed jobs found (possibly cancelled or infra failure)."
continue
fi

while read -r job_id job_name; do
echo " Job: $job_name (ID: $job_id)"
# Fetch logs
LOG_CONTENT=$(gh api "repos/$REPO/actions/jobs/$job_id/logs")

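# Extract failure details for display.
# The pipeline's exit status is that of sed, which succeeds even on empty
# input, so test the captured output instead of chaining with ||.
FAILURE_DETAILS=$(echo "$LOG_CONTENT" | grep "FAILED" -A 1 | grep -v "Task :" | sed 's/^/ /')
if [ -n "$FAILURE_DETAILS" ]; then
echo "$FAILURE_DETAILS"
else
echo " Could not extract failure details from logs."
fi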

# Extract class names for summary
echo "$LOG_CONTENT" | grep "FAILED" | grep -v "Task :" | grep -v "BUILD FAILED" | \
sed -E 's/^.*Z //;s/^[[:space:]]*//;s/\[.*//;s/ >.*//' >> "$FAILURES_FILE"

# Extract clean test names for output file (ClassName.methodName)
# Filter out Android tests and malformed lines
echo "$LOG_CONTENT" | grep "FAILED" | grep " > " | grep -v "Task :" | grep -v "android" | \
sed -E 's/^.*Z //;s/\[.*\] > /./;s/\(.*//;s/_[0-9]+$//' >> "$OUTPUT_FILE"

done <<< "$JOB_DATA"
done

echo ""
echo "========================================"
echo "SUMMARY OF FAILURES PER CLASS"
echo "========================================"
if [ -s "$FAILURES_FILE" ]; then
sort "$FAILURES_FILE" | uniq -c | sort -nr
else
echo "No specific test failures identified."
fi

# Unique and sort the output file
if [ -s "$OUTPUT_FILE" ]; then
sort -u "$OUTPUT_FILE" -o "$OUTPUT_FILE"
echo ""
echo "Clean list of flaky tests written to: $OUTPUT_FILE"
else
echo "No clean test names extracted."
fi

rm -f "$FAILURES_FILE"
123 changes: 123 additions & 0 deletions .github/skills/flake-detector/reproduce-flakes.sh
@@ -0,0 +1,123 @@
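#!/usr/bin/env bash
# Reproduce flaky tests locally.
# Requires: Bash 4+ (uses associative arrays; the /bin/bash shipped with macOS is 3.2).
# Usage: ./reproduce-flakes.sh [TEST_FILTER...]
#
# This script:
# 1. Reads the list of flaky tests from flaky-tests.txt, or from the
#    arguments if any are given.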
# 2. Maps them to the correct Gradle module and task.
# 3. Runs them in appropriate Gradle invocations.
#
# RECOMMENDATION:
# Before running this, manually edit the failing test methods in your IDE
# to use @RepeatedTest(100) instead of @Test. This ensures they run enough
# times to trigger the flake.

SKILL_DIR=$(dirname "$0")
FLAKY_TESTS_FILE="$SKILL_DIR/flaky-tests.txt"
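CLASS_FILE_MAP_FILE=$(mktemp)
# Remove the temporary map file on every exit path, including early errors.
trap 'rm -f "$CLASS_FILE_MAP_FILE"' EXIT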

# 1. Determine which tests to run
TESTS_TO_RUN=()
if [ "$#" -ge 1 ]; then
echo "Overriding flaky-tests.txt with provided test filters: $*"
for arg in "$@"; do
TESTS_TO_RUN+=("$arg")
done
else
# Check for flaky tests file
if [ ! -f "$FLAKY_TESTS_FILE" ]; then
echo "Error: flaky-tests.txt not found."
echo "Run ./identify-flakes.sh first to generate the list of flakes or provide a test filter as an argument."
exit 1
fi

if [ ! -s "$FLAKY_TESTS_FILE" ]; then
echo "No flaky tests found in flaky-tests.txt."
rm -f "$CLASS_FILE_MAP_FILE"
exit 0
fi

echo "Reading flaky tests from $FLAKY_TESTS_FILE..."
while read -r test_entry; do
if [ -n "$test_entry" ]; then
TESTS_TO_RUN+=("$test_entry")
fi
done < "$FLAKY_TESTS_FILE"
fi

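# Generate class name to file path mapping once.
# Match both multiplatform source sets (src/jvmTest, src/androidTest, ...) and plain src/test.
echo "Generating class file map for faster lookups..."
find . \( -path "*/src/*Test/*" -o -path "*/src/test/*" \) \( -name "*.kt" -o -name "*.java" \) -print0 | while IFS= read -r -d $'\0' file; do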
BASENAME=$(basename "$file")
CLASS_NAME="${BASENAME%.*}"
echo "${CLASS_NAME};${file}" >> "$CLASS_FILE_MAP_FILE"
done

# associative array to hold class name to file path
declare -A CLASS_FILE_MAP
while IFS=';' read -r class_name file_path; do
CLASS_FILE_MAP["$class_name"]="$file_path"
done < "$CLASS_FILE_MAP_FILE"

echo "--------------------------------------------------"

# associative array to hold task -> test filters
declare -A TASK_FILTERS

for test_entry in "${TESTS_TO_RUN[@]}"; do
# Extract the fully qualified class name (e.g., "okhttp3.RouteFailureTest")
FULLY_QUALIFIED_CLASS_NAME=$(echo "$test_entry" | sed -E 's/\.[^.]+$//')

# Extract the simple class name (e.g., "RouteFailureTest") for the find command
CLASS_NAME=$(basename "$(echo "$FULLY_QUALIFIED_CLASS_NAME" | tr . /)")

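# Look up the file in the prebuilt map; fall back to a filesystem search
FILE_PATH="${CLASS_FILE_MAP[$CLASS_NAME]}"
if [ -z "$FILE_PATH" ]; then
FILE_PATH=$(find . \( -name "${CLASS_NAME}.kt" -o -name "${CLASS_NAME}.java" \) | head -n 1)
fi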

if [ -z "$FILE_PATH" ]; then
echo "Warning: Could not find file for class $FULLY_QUALIFIED_CLASS_NAME. Skipping."
continue
fi

# Determine module and task
# Example path: ./okhttp/src/jvmTest/kotlin/okhttp3/CacheTest.kt -> module: okhttp, task: jvmTest
# Example path: ./mockwebserver/src/test/java/... -> module: mockwebserver, task: test

MODULE=$(echo "$FILE_PATH" | cut -d'/' -f2)

if [[ "$FILE_PATH" == *"/src/jvmTest/"* ]]; then
TASK=":$MODULE:jvmTest"
elif [[ "$FILE_PATH" == *"/src/test/"* ]]; then
TASK=":$MODULE:test"
elif [[ "$FILE_PATH" == *"/src/androidTest/"* ]]; then
# Skip Android instrumentation tests for local reproduction for now
echo "Skipping Android instrumentation test: $test_entry"
continue
else
# Default fallback
TASK=":$MODULE:test"
fi

# Append to the list for this task
if [ -z "${TASK_FILTERS[$TASK]}" ]; then
TASK_FILTERS[$TASK]="--tests $test_entry"
else
TASK_FILTERS[$TASK]="${TASK_FILTERS[$TASK]} --tests $test_entry"
fi

done

echo "--------------------------------------------------"

# Run Gradle commands
for TASK in "${!TASK_FILTERS[@]}"; do
ARGS="${TASK_FILTERS[$TASK]}"
echo "Running tests for task $TASK..."
echo "./gradlew $TASK $ARGS"
# We intentionally don't quote $ARGS here to allow word splitting of multiple --tests flags
# shellcheck disable=SC2086
./gradlew "$TASK" $ARGS
echo "--------------------------------------------------"
done

rm -f "$CLASS_FILE_MAP_FILE"
@@ -15,6 +15,8 @@
*/
package okhttp3

+ import java.util.concurrent.atomic.AtomicInteger

/**
* A special [EventListener] for testing the mechanics of event listeners.
*
@@ -37,10 +39,10 @@ class EventListenerRelay(
val eventListener: EventListener
get() = eventListenerAdapter

- var eventCount = 0
+ var eventCount = AtomicInteger()

private fun onEvent(callEvent: CallEvent) {
- if (eventCount++ == 0) {
+ if (eventCount.getAndIncrement() == 0) {
eventRecorder.logEvent(callEvent)
val next = EventListenerRelay(call, eventRecorder)
call.addEventListener(next.eventListener)
@@ -52,7 +52,7 @@ open class EventRecorder(
private val forbiddenLocks = mutableListOf<Any>()

/** The timestamp of the last taken event, used to measure elapsed time between events. */
- private var lastTimestampNs: Long? = null
+ var lastTimestampNs: Long? = null

/** Confirm that the thread does not hold a lock on `lock` during the callback. */
fun forbidLock(lock: Any) {
@@ -90,6 +90,7 @@ class Http2ExchangeCodec(
}
stream!!.readTimeout().timeout(chain.readTimeoutMillis.toLong(), TimeUnit.MILLISECONDS)
stream!!.writeTimeout().timeout(chain.writeTimeoutMillis.toLong(), TimeUnit.MILLISECONDS)
+ http2Connection.flush()
}

override fun flushRequest() {
@@ -20,10 +20,12 @@ class WindowCounter(
) {
/** The total number of bytes consumed. */
var total: Long = 0L
+ @Synchronized get
private set

/** The total number of bytes acknowledged by outgoing `WINDOW_UPDATE` frames. */
var acknowledged: Long = 0L
+ @Synchronized get
private set

val unacknowledged: Long