Fix botocore endpoint resolution by preserving top-level data files #23039

Open
piochelepiotr wants to merge 2 commits into master from piotr.wolski/fix-botocore-endpoints-data

Conversation

@piochelepiotr
Contributor

Summary

  • PR #22283 ("Delete botocore API schemas that we don't use to save disk space") used `botocore/data/*` to strip unused service models at build time, but this glob also matched essential top-level files (endpoints.json, partitions.json, sdk-default-configuration.json, _retry.json) that botocore needs to resolve any AWS service endpoint
  • Changes the pattern from `botocore/data/*` (matches files and directories) to `botocore/data/*/` (matches only subdirectories), so top-level files are preserved automatically
  • Fixes `DataNotFoundError: Unable to load data for: endpoints`, which breaks amazon_msk and kafka_consumer when they use MSK IAM auth
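
The difference between the two patterns can be sketched with the standard library alone. The `to_regex` helper below is hypothetical and covers only a tiny subset of gitignore-style syntax (literal text, `*`, and a trailing `/`); it is a simplified stand-in for illustration, not pathspec's actual translation.

```python
import re


def to_regex(pattern: str) -> re.Pattern:
    """Translate a tiny subset of gitignore-style syntax into a regex.

    Hypothetical helper: handles literal text, '*' and a trailing '/' only.
    """
    dir_only = pattern.endswith("/")
    body = pattern.rstrip("/")
    # '*' matches any run of characters that does not cross a '/'.
    core = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in body)
    # A directory-only pattern requires a '/' after the matched segment.
    # Otherwise the match may be the path itself or one of its ancestors.
    tail = "/.*" if dir_only else "(?:/.*)?"
    return re.compile(f"^{core}{tail}$")


TOP_LEVEL_FILE = "botocore/data/endpoints.json"
SERVICE_MODEL = "botocore/data/ec2/2016-11-15/service-2.json"

for pattern in ("botocore/data/*", "botocore/data/*/"):
    regex = to_regex(pattern)
    for path in (TOP_LEVEL_FILE, SERVICE_MODEL):
        print(pattern, path, bool(regex.match(path)))
```

Under these simplified rules, `botocore/data/*` matches both paths, while `botocore/data/*/` matches only the ec2 service model and leaves endpoints.json alone.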

Test plan

  • Verify with pathspec that top-level files are no longer excluded:
    python3 -c "
    import pathspec, tomllib
    with open('.builders/scripts/files_to_remove.toml', 'rb') as fh:
        spec = pathspec.PathSpec.from_lines('gitwildmatch', tomllib.load(fh)['excluded_paths'])
    # Top-level data files that botocore needs must be preserved.
    for path in ['botocore/data/endpoints.json', 'botocore/data/_retry.json',
                 'botocore/data/partitions.json', 'botocore/data/sdk-default-configuration.json']:
        assert not spec.match_file(path), f'{path} should not be excluded'
        print(f'PASS: {path} preserved')
    # An unused service model must still be stripped.
    for path in ['botocore/data/ec2/2016-11-15/service-2.json']:
        assert spec.match_file(path), f'{path} should be excluded'
        print(f'PASS: {path} excluded')
    # Service models expected to be kept (sts, kafka) must be preserved.
    for path in ['botocore/data/sts/2011-06-15/service-2.json', 'botocore/data/kafka/2018-11-14/service-2.json']:
        assert not spec.match_file(path), f'{path} should not be excluded'
        print(f'PASS: {path} preserved')
    "
  • Build agent wheels and confirm botocore/data/endpoints.json is present in the output
  • Deploy to a cluster running kafka_consumer + amazon_msk against MSK and verify checks pass
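
The wheel check in the second step could be scripted roughly as follows. This is a minimal sketch, assuming the built wheel is a regular zip archive that stores the data files under `botocore/data/`; the `missing_botocore_data` helper and the in-memory demo archive are hypothetical, not part of the build tooling.

```python
import io
import zipfile

# Top-level data files named in the summary above.
REQUIRED = (
    "botocore/data/endpoints.json",
    "botocore/data/partitions.json",
    "botocore/data/sdk-default-configuration.json",
    "botocore/data/_retry.json",
)


def missing_botocore_data(wheel) -> list[str]:
    """Return the required data files that are absent from a wheel.

    `wheel` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(wheel) as zf:
        names = set(zf.namelist())
    return [name for name in REQUIRED if name not in names]


# Demo on an in-memory archive that ships only endpoints.json.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("botocore/data/endpoints.json", "{}")

demo_missing = missing_botocore_data(buf)
print(demo_missing)
```

Against a real build, the function would be called with the path of the produced wheel, and an empty list would indicate that all four files survived the stripping step.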

🤖 Generated with Claude Code

PR #22283 used `botocore/data/*` to strip unused service models, but
this also removed essential top-level files (endpoints.json,
partitions.json, sdk-default-configuration.json, _retry.json) that
botocore requires to function. This causes
`DataNotFoundError: Unable to load data for: endpoints` breaking
amazon_msk and kafka_consumer with MSK IAM auth.

Change the pattern from `botocore/data/*` (matches everything) to
`botocore/data/*/` (matches only subdirectories), so top-level files
are preserved automatically without needing to enumerate them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

⚠️ Recommendation: Add qa/skip-qa label

This PR does not modify any files shipped with the agent.

To help streamline the release process, please consider adding the qa/skip-qa label if these changes do not require QA testing.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37ffa88109

The previous approach using `botocore/data/*/` was defeated by
build_wheels.py's `is_excluded_from_wheel`, which appends "/" to every
path before matching. A file such as endpoints.json is therefore tested
as endpoints.json/ and matches the directory-only pattern.

Instead, keep the original `botocore/data/*` glob and add a negation
`!botocore/data/*.json` to explicitly preserve the top-level JSON
files that botocore needs (endpoints.json, partitions.json,
_retry.json, sdk-default-configuration.json).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
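
The interaction described in this commit message can be reproduced in a few lines. The sketch below is an assumption-laden illustration: `to_regex` handles only literal text, `*`, and a trailing `/`, and `is_excluded` merely imitates the described behaviour (append "/" to the path, last matching pattern wins). Neither is the real `is_excluded_from_wheel` nor pathspec.

```python
import re


def to_regex(pattern: str) -> re.Pattern:
    """Hypothetical translator for a tiny subset of gitignore-style syntax."""
    dir_only = pattern.endswith("/")
    body = pattern.rstrip("/")
    # '*' matches any run of characters that does not cross a '/'.
    core = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in body)
    tail = "/.*" if dir_only else "(?:/.*)?"
    return re.compile(f"^{core}{tail}$")


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Imitate the described check: append '/' and let the last match win."""
    candidate = path + "/"  # assumption: mirrors the appended slash
    excluded = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if to_regex(body).match(candidate):
            excluded = not negated
    return excluded


ENDPOINTS = "botocore/data/endpoints.json"
EC2_MODEL = "botocore/data/ec2/2016-11-15/service-2.json"

first_attempt = ["botocore/data/*/"]
second_attempt = ["botocore/data/*", "!botocore/data/*.json"]

print(is_excluded(ENDPOINTS, first_attempt))   # slash-suffixed file looks like a directory
print(is_excluded(ENDPOINTS, second_attempt))  # negation re-includes top-level JSON
print(is_excluded(EC2_MODEL, second_attempt))  # nested model is not covered by *.json
```

Because `*` does not cross a `/` in this model, the negation `!botocore/data/*.json` re-includes only JSON files directly under `botocore/data/`, so nested service models stay excluded.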