Optimize CI: consolidate workflows, fix caching, speed up e2e tests #771
Closed
vikrantpuppala wants to merge 1 commit into ci/protected-runners-jfrog from
Workflow consolidation:
- Delete integration.yml and daily-telemetry-e2e.yml (redundant with the
coverage workflow, which already runs all e2e tests)
- Add push-to-main trigger to coverage workflow
- Run all tests (including telemetry) in single pytest invocation with
--dist=loadgroup to respect xdist_group markers for isolation
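With --dist=loadgroup, pytest-xdist keeps all tests that carry the same xdist_group(name=...) marker on one worker. Unmarked tests are distributed individually, as in --dist=load. The sketch below is a simplified model of that grouping rule. It relies on our understanding of pytest-xdist's implementation, which appends "@<group>" to the node ID of each marked test. The node IDs and the "telemetry" group name are hypothetical and are not taken from this repository.

```python
from collections import defaultdict


def loadgroup_scope(nodeid: str) -> str:
    """Return the scheduling unit for a node ID under a loadgroup-style rule.

    Marked tests carry "@<group>" after the node ID; the rfind comparison skips
    an "@" that only appears inside parametrize brackets such as "[a@b]".
    """
    if nodeid.rfind("@") > nodeid.rfind("]"):
        return nodeid.split("@")[-1]
    return nodeid  # unmarked: the test is its own scheduling unit


def schedule_units(nodeids: list[str]) -> dict[str, list[str]]:
    """Bucket node IDs so that each bucket would be sent whole to a single worker."""
    units: dict[str, list[str]] = defaultdict(list)
    for nodeid in nodeids:
        units[loadgroup_scope(nodeid)].append(nodeid)
    return dict(units)


# Hypothetical node IDs: two telemetry tests share a group, one query test does not.
example = [
    "tests/e2e/test_telemetry.py::test_flush@telemetry",
    "tests/e2e/test_telemetry.py::test_retry@telemetry",
    "tests/e2e/test_queries.py::test_select_one",
]
```

Here the two telemetry tests form one unit and always run on the same worker. The query test is free to go to any worker.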
Fix pyarrow cache:
- Remove cache-path: .venv-pyarrow from pyarrow jobs. Poetry always
creates .venv regardless of the cache-path input, so the cache was
never saved ("Path does not exist" error). The cache-suffix already
differentiates keys between variants.
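The reasoning can be modelled with a toy cache. A save step can only archive a directory that exists, and two variants can safely share the .venv path as long as their keys differ. The key layout, the cache_key/save_cache helpers, and the "-pyarrow" suffix value are hypothetical illustrations; the PR does not show the real action's key format.

```python
import hashlib

VENV_PATH = ".venv"  # Poetry creates this directory whatever cache-path says


def cache_key(runner_os: str, python_version: str, lockfile: bytes, suffix: str = "") -> str:
    """Hypothetical key: OS, Python version, lockfile hash, then a per-variant suffix."""
    digest = hashlib.sha256(lockfile).hexdigest()[:12]
    return f"venv-{runner_os}-py{python_version}-{digest}{suffix}"


def save_cache(existing_paths: set[str], path: str, key: str, store: dict[str, str]) -> bool:
    """Toy save step: it fails when the configured path was never created."""
    if path not in existing_paths:
        return False  # analogous to the "Path does not exist" error
    store[key] = path
    return True
```

Saving ".venv-pyarrow" fails because nothing ever creates it. Saving ".venv" under the default key and under the suffixed key produces two separate cache entries.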
Fix 3.14 post-test DNS hang:
- Add enable_telemetry=False to unit test DUMMY_CONNECTION_ARGS that
use server_hostname="foo". This prevents FeatureFlagsContext from
making real HTTP calls to fake hosts, eliminating an ~8 min hang caused by
ThreadPoolExecutor threads timing out on DNS lookups on protected runners.
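The sketch below shows why the flag matters. It assumes, as the commit message describes, that the feature-flag fetch runs as a background task on a ThreadPoolExecutor and is skipped when telemetry is disabled. connect(), _fetch_feature_flags(), and the module-level executor are hypothetical stand-ins, not the connector's real code. The dummy args show only the two keys named above.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

# Only these two keys come from the commit message; the real tests may pass more.
DUMMY_CONNECTION_ARGS = {
    "server_hostname": "foo",
    "enable_telemetry": False,
}

_executor = ThreadPoolExecutor(max_workers=2)


def _fetch_feature_flags(host: str) -> dict:
    # Stand-in for an HTTP request; against an unresolvable host a real call
    # can sit in DNS resolution until the resolver times out.
    time.sleep(0.01)
    return {}


def connect(**kwargs) -> Optional[Future]:
    """Hypothetical connect(): schedules the fetch only when telemetry is on."""
    if kwargs.get("enable_telemetry", True):
        return _executor.submit(_fetch_feature_flags, kwargs["server_hostname"])
    return None
```

The hang appeared after the tests had finished because CPython joins ThreadPoolExecutor worker threads during interpreter shutdown. Any worker still blocked on DNS therefore delays process exit.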
Improve e2e test parallelization:
- Split TestPySQLLargeQueriesSuite into 3 separate classes
(TestPySQLLargeWideResultSet, TestPySQLLargeNarrowResultSet,
TestPySQLLongRunningQuery) so xdist distributes them across workers
instead of all landing on one.
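The following toy model shows the imbalance. It assumes that tests from one class are kept together on a single worker, as pytest-xdist's loadscope mode does, or as happens when a whole class shares one xdist_group. The scheduler here is simple round-robin over classes, not xdist's real algorithm. The file and method names are hypothetical, since the PR does not list the six slow tests or how they are split.

```python
from collections import defaultdict


def class_scope(nodeid: str) -> str:
    # "file.py::Class::test_x" -> "file.py::Class"
    return nodeid.rsplit("::", 1)[0]


def assign(nodeids: list[str], n_workers: int) -> dict[str, list[str]]:
    """Toy scheduler: hand each whole class to the next worker, round-robin."""
    scopes: dict[str, list[str]] = defaultdict(list)
    for nodeid in nodeids:
        scopes[class_scope(nodeid)].append(nodeid)
    workers: dict[str, list[str]] = defaultdict(list)
    for i, tests in enumerate(scopes.values()):
        workers[f"gw{i % n_workers}"].extend(tests)
    return dict(workers)


# Hypothetical split of six slow tests across the three new classes.
SLOW_TESTS = {
    "TestPySQLLargeWideResultSet": ["test_wide_a", "test_wide_b"],
    "TestPySQLLargeNarrowResultSet": ["test_narrow_a", "test_narrow_b"],
    "TestPySQLLongRunningQuery": ["test_long_a", "test_long_b"],
}
before = [f"test_large_queries.py::TestPySQLLargeQueriesSuite::{t}"
          for tests in SLOW_TESTS.values() for t in tests]
after = [f"test_large_queries.py::{cls}::{t}"
         for cls, tests in SLOW_TESTS.items() for t in tests]
```

With four workers, the single suite class lands entirely on gw0. After the split, three workers each receive one class.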
Speed up slow tests:
- Reduce large result set sizes from 300MB to 100MB (still validates
large fetches, lz4, chunking, row integrity)
- Start test_long_running_query at scale_factor=50 instead of 1 to
skip ramp-up iterations that finish instantly
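The sketch below shows why a higher starting point helps. It assumes the test grows the scale factor geometrically (doubling here) until a query runs longer than a target duration. The doubling factor, the ramp_up() helper, and the linear duration model are all hypothetical, since the PR does not show the test body.

```python
from typing import Callable, Tuple


def ramp_up(duration_at: Callable[[int], float], target_s: float,
            start_scale: int, growth: int = 2, max_iters: int = 20) -> Tuple[int, int]:
    """Return (scale_factor, iterations) for the first scale whose query is long enough."""
    scale = start_scale
    for iteration in range(1, max_iters + 1):
        if duration_at(scale) >= target_s:
            return scale, iteration
        scale *= growth  # each too-short run is a wasted iteration
    raise RuntimeError("query never reached the target duration")


def fake_duration(scale: int) -> float:
    # Hypothetical linear model: 0.1 s of query time per unit of scale factor.
    return scale / 10
```

With a 5 s target, ramp_up(fake_duration, 5, start_scale=1) needs seven iterations (scale factors 1, 2, 4, ..., 64). With start_scale=50 it succeeds on the first iteration.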
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Summary
- Delete integration.yml and daily-telemetry-e2e.yml: the coverage workflow already runs all e2e tests. Add a push: main trigger to coverage, and run all tests (including telemetry) in a single pytest invocation with --dist=loadgroup for xdist_group isolation.
- Remove cache-path: .venv-pyarrow: Poetry always creates .venv, so the cache was never saved. This alone fixes the 3.14 pyarrow job taking 15+ min (it compiled mypy/pyarrow from source on every run).
- Add enable_telemetry=False to the unit test dummy connection args. Unit tests using server_hostname="foo" were triggering real HTTP calls to fake hosts. On protected runners, the DNS timeout caused an 8 min process hang after the tests had finished.
- Split TestPySQLLargeQueriesSuite into 3 separate classes so the 6 slowest tests are distributed across workers instead of all landing on one (gw3 was running for 40 min while gw0/gw1 sat idle after 5 min).
- test_long_running_query now starts at scale_factor=50 instead of 1 to skip ramp-up.

Expected Impact
Test plan
SKIP_COVERAGE_CHECK = CI workflow changes only, no source code coverage impact
This pull request was AI-assisted by Isaac.