A multi-language flaky test detection system powered by GitHub Agentic Workflows (gh-aw). This repository contains sample applications with intentionally flaky tests across 14 programming languages, used to validate and demonstrate automated flaky test detection, triage, and remediation via AI agents.
- Test workflows run across all language projects on a schedule (and on push/PR)
- Each language job uploads its test report as an artifact (JUnit XML for most languages; the Swift job produces text output)
- The corn-flakes-detection agentic workflow analyzes test artifacts daily
- An AI agent detects flaky tests, creates/manages GitHub Issues, and assigns Copilot Coding Agent to fix them
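The detection step in the list above can be illustrated with a small sketch. This is not the agent's actual logic, which the source does not describe; it is a plain heuristic that treats a test as flaky when the same test ID both passes and fails across the reports it is given. The function names `parse_junit_outcomes` and `find_flaky_tests` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_junit_outcomes(xml_text):
    """Return a list of (test_id, outcome) pairs for one JUnit XML report.

    A list (not a dict) is used because a repeated test may appear
    several times in the same report.
    """
    root = ET.fromstring(xml_text)
    outcomes = []
    # iter() finds <testcase> whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        if case.find("failure") is not None or case.find("error") is not None:
            outcomes.append((test_id, "fail"))
        elif case.find("skipped") is not None:
            outcomes.append((test_id, "skip"))
        else:
            outcomes.append((test_id, "pass"))
    return outcomes


def find_flaky_tests(reports):
    """Given JUnit XML strings from one or more runs, return test IDs
    that were observed both passing and failing (skips are ignored)."""
    seen = defaultdict(set)
    for xml_text in reports:
        for test_id, outcome in parse_junit_outcomes(xml_text):
            if outcome != "skip":
                seen[test_id].add(outcome)
    return sorted(t for t, results in seen.items() if results == {"pass", "fail"})
```

A test that fails in every run is reported as consistently failing rather than flaky under this rule, which is one reason a real detector would likely also look at history and error messages.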
┌─────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│  test.yml   │────▶│  Test Artifacts  │────▶│  corn-flakes-       │
│  (14 langs) │     │  (JUnit XML)     │     │  detection agent    │
└─────────────┘     └──────────────────┘     └──────────┬──────────┘
                                                        │
                                           ┌────────────▼────────────┐
                                           │  GitHub Issues          │
                                           │  + Copilot Agent Fixes  │
                                           └─────────────────────────┘
| Language | Folder | Test Framework(s) | Report Format |
|---|---|---|---|
| Java | `java/` | JUnit 5 (Maven Surefire) | JUnit XML |
| Python | `python/` | pytest + unittest | JUnit XML |
| TypeScript | `typescript/` | Jest + Playwright | JUnit XML |
| Go | `golang/` | built-in testing (gotestsum) | JUnit XML |
| C# | `csharp/` | xUnit.net | JUnit XML |
| Rust | `rust/` | cargo test (cargo2junit) | JUnit XML |
| C++ | `cpp/` | Google Test (gTest) | CTest JUnit XML |
| C | `c/` | Unity (CTest) | CTest JUnit XML |
| Swift | `swift/` | Swift Testing (XCTest) | Text output |
| Kotlin | `kotlin/` | kotlin.test (Gradle) | JUnit XML |
| PHP | `php/` | PHPUnit | JUnit XML |
| Ruby | `ruby/` | RSpec | JUnit XML |
| Elixir | `elixir/` | ExUnit | JUnit XML |
| Dart | `dart/` | dart test | JUnit XML |
Each language folder follows the same pattern, with a `MathOperations` class (deterministic) and a `RandomMathOperations` class (intentionally flaky):
corn-test/
├── java/          # Java (JUnit 5 / Maven)
├── python/        # Python (pytest + unittest)
├── typescript/    # TypeScript (Jest + Playwright)
├── golang/        # Go (built-in testing)
├── csharp/        # C# (xUnit.net / .NET)
├── rust/          # Rust (cargo test)
├── cpp/           # C++ (Google Test / CMake)
├── c/             # C (Unity / CMake)
├── swift/         # Swift (XCTest / SPM)
├── kotlin/        # Kotlin (kotlin.test / Gradle)
├── php/           # PHP (PHPUnit / Composer)
├── ruby/          # Ruby (RSpec / Bundler)
├── elixir/        # Elixir (ExUnit / Mix)
├── dart/          # Dart (dart test)
├── docs/
│   └── ADOPTION_GUIDE.md    # How to adopt corn-flakes-detection in your repo
└── .github/
    └── workflows/
        ├── test.yml                          # Multi-language test runner
        ├── corn-flakes-detection.md          # Agentic workflow definition
        ├── corn-flakes-detection.lock.yml    # Compiled workflow (auto-generated)
        └── scripts/
            ├── analyze_gh_test_failures.py   # JUnit XML test report analyzer
            └── analyze_test_results.py       # Multi-framework report analyzer
Every language project implements the same two classes:
`MathOperations` provides reliable mathematical operations that always produce consistent results:
- `add`, `subtract`, `multiply`, `divide` (with zero-check)
- `power`, `factorial`, `derivative` (polynomial)
- `pi` (first 40 decimals), `gcd` (Euclidean algorithm)
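As a rough illustration of two of the deterministic operations listed above, here is a minimal Python sketch of a zero-checked `divide` and a Euclidean `gcd`. The method names come from the list; the signatures, the exception type, and the bodies are assumptions and may differ from what the `python/` project actually contains.

```python
class MathOperations:
    """Illustrative sketch only; not the repository's implementation."""

    @staticmethod
    def divide(a, b):
        # Zero-check: fail loudly and predictably instead of returning inf/NaN.
        if b == 0:
            raise ZeroDivisionError("cannot divide by zero")
        return a / b

    @staticmethod
    def gcd(a, b):
        # Euclidean algorithm: replace (a, b) with (b, a mod b) until b is 0.
        a, b = abs(a), abs(b)
        while b:
            a, b = b, a % b
        return a
```

Because neither method depends on time, randomness, or shared state, a test such as `assert MathOperations.gcd(48, 18) == 6` gives the same result on every run, which is what makes this class a useful control group next to the flaky one.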
`RandomMathOperations` provides functions designed to produce intermittent test failures:
| Function | Behavior | Flakiness |
|---|---|---|
| `generateRandomOddNumber()` | Returns odd number 1–99 | Reliable |
| `generateRandomEvenNumber()` | Returns even number 0–100 | |
| `generateRandomPrimeCandidate()` | Returns prime from list 2–97 | Reliable |
Each language's test suite runs the flaky tests 20 times per execution to amplify the detection signal.
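Why repetition amplifies the signal can be shown with a short calculation. If a test fails independently with probability p on each run, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below assumes independent runs, and the 10% failure rate is a hypothetical figure; the source does not state how often the flaky tests fail.

```python
def probability_of_at_least_one_failure(p_fail, runs=20):
    """Chance that a test failing with probability `p_fail` per run
    fails at least once in `runs` independent runs."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return 1.0 - (1.0 - p_fail) ** runs


# Hypothetical 10% per-run failure rate: a single run exposes the
# failure 10% of the time, while 20 runs expose it about 88% of the time.
single = probability_of_at_least_one_failure(0.10, runs=1)
repeated = probability_of_at_least_one_failure(0.10, runs=20)
print(f"1 run: {single:.2f}, 20 runs: {repeated:.2f}")
```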
Each language folder is self-contained. Navigate to the folder and use the language's standard tooling:
# Java
cd java && mvn test
# Python
cd python && pip install -r requirements.txt && python -m pytest -v
# TypeScript
cd typescript && npm ci && npm test
# Go
cd golang && go test -v ./...
# C#
cd csharp && dotnet test
# Rust
cd rust && cargo test
# C++
cd cpp && mkdir -p build && cd build && cmake .. && make && ctest
# C
cd c && mkdir -p build && cd build && cmake .. && make && ctest
# Swift
cd swift && swift test
# Kotlin
cd kotlin && ./gradlew test
# PHP
cd php && composer install && vendor/bin/phpunit
# Ruby
cd ruby && bundle install && bundle exec rspec
# Elixir
cd elixir && mix deps.get && mix test
# Dart
cd dart && dart pub get && dart test

The `.github/workflows/test.yml` workflow runs 14 parallel jobs (one per language); a `collect-results` job then merges all artifacts into a single `test-results` artifact consumed by the flake detection workflow.
All test jobs use continue-on-error: true to ensure artifacts are always uploaded, even when flaky tests fail.
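To give a sense of what a consumer of the merged artifact might do, the sketch below walks a directory of downloaded reports and tallies tests and failures per language. The layout (one top-level subfolder per language containing `*.xml` files) and the function name `summarize_artifacts` are assumptions for illustration; the source does not describe the artifact's internal structure, and text-only reports such as Swift's would need separate handling.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def summarize_artifacts(root_dir):
    """Tally test cases and failures per top-level subfolder of `root_dir`.

    Assumes a hypothetical layout: <root_dir>/<language>/**/*.xml
    """
    root = Path(root_dir)
    summary = {}
    for xml_path in sorted(root.rglob("*.xml")):
        language = xml_path.relative_to(root).parts[0]
        try:
            report = ET.parse(xml_path).getroot()
        except ET.ParseError:
            continue  # skip truncated or non-JUnit files rather than abort
        counts = summary.setdefault(language, {"tests": 0, "failed": 0})
        for case in report.iter("testcase"):
            counts["tests"] += 1
            if case.find("failure") is not None or case.find("error") is not None:
                counts["failed"] += 1
    return summary
```

Skipping unparseable files mirrors the intent of `continue-on-error: true`: one broken report should not prevent the remaining languages from being analyzed.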
Want to add automated flaky test detection to your own repository? See the Adoption Guide (`docs/ADOPTION_GUIDE.md`) for step-by-step instructions covering:
- Setting up the test workflow for your language
- Installing and configuring the agentic workflow
- Token and permission configuration
- Customization options
This project is designed for:
- Flake detection: automatically identify tests that pass and fail intermittently
- AI-powered remediation: Copilot Coding Agent is assigned to fix detected flaky tests
- Reliability metrics: track test stability across multiple languages and frameworks
- CI/CD resilience: validate retry strategies and failure handling
- Reference implementation: example of multi-language test infrastructure with agentic workflows