Fix RagTokenizer attribute delegation #46919

Open
Pruthvi226 wants to merge 1 commit into huggingface:main from Pruthvi226:contribution-good-first-issue

Conversation

Pruthvi226 commented Jun 26, 2026

What does this PR do?

Fixes #35532.

This PR updates RagTokenizer so that missing tokenizer attributes and methods are delegated to the active current_tokenizer.

Previously, RagTokenizer.__call__, decode, and batch_decode were forwarded, but other tokenizer APIs such as encode, patch_token, and patch_token_id were not available through the wrapper. Accessing them raised AttributeError even though the active underlying tokenizer supported them.

The change adds a guarded __getattr__ implementation that looks up missing attributes on current_tokenizer. This keeps the existing input/target tokenizer switching behavior while making RagTokenizer behave more like the tokenizer it wraps.

A regression test was added to check:

  • RagTokenizer.encode(...) delegates to the question encoder in input mode
  • tokenizer attributes like patch_token and patch_token_id are accessible through the wrapper
  • missing attributes still behave correctly with hasattr
  • encode(...) delegates to the generator after switching to target mode
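
The delegation and the four checks above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR diff: StubTokenizer, DelegatingWrapper, use_input_mode, and use_target_mode are hypothetical names invented for illustration, and the two guards shown are one plausible reading of "guarded", assumed here rather than taken from the source.

```python
class StubTokenizer:
    """Hypothetical stand-in for a real tokenizer (not a transformers class)."""

    def __init__(self, prefix, patch_token, patch_token_id):
        self.prefix = prefix
        self.patch_token = patch_token
        self.patch_token_id = patch_token_id

    def encode(self, text):
        # Toy encoding: tag each whitespace-separated word with this tokenizer's prefix.
        return [f"{self.prefix}:{word}" for word in text.split()]


class DelegatingWrapper:
    """Hypothetical wrapper that mimics input/target tokenizer switching."""

    def __init__(self, question_encoder, generator):
        self.question_encoder = question_encoder
        self.generator = generator
        self.current_tokenizer = question_encoder  # input mode by default

    def use_input_mode(self):
        self.current_tokenizer = self.question_encoder

    def use_target_mode(self):
        self.current_tokenizer = self.generator

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup has failed,
        # so attributes defined on the wrapper itself are never shadowed.
        # Assumed guard 1: read current_tokenizer from __dict__ so a partially
        # initialised instance raises AttributeError instead of recursing.
        current = self.__dict__.get("current_tokenizer")
        # Assumed guard 2: do not delegate dunder names, which copy and pickle probe.
        if current is None or (name.startswith("__") and name.endswith("__")):
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            )
        # A miss on the wrapped tokenizer raises AttributeError, so hasattr stays correct.
        return getattr(current, name)


wrapper = DelegatingWrapper(
    StubTokenizer("q", "<patch>", 7),  # plays the question encoder
    StubTokenizer("g", "<patch>", 9),  # plays the generator
)
print(wrapper.encode("hello world"))           # ['q:hello', 'q:world']
print(wrapper.patch_token, wrapper.patch_token_id)  # <patch> 7
print(hasattr(wrapper, "no_such_attribute"))   # False
wrapper.use_target_mode()
print(wrapper.encode("hello world"))           # ['g:hello', 'g:world']
```

Because the lookup goes through current_tokenizer at call time, switching modes changes which underlying tokenizer answers without any per-method forwarding code.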

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by code agents. We are currently bottlenecked by our ability to review and respond to them. As a result, we ask that new users do not submit pure code agent PRs at this time. You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result, this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

I used an AI coding assistant while drafting this change. I reviewed the issue, source changes, tests, and final diff before submitting.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline and the Pull Request checks?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.

Issue: #35532

  • Did you make sure to update the documentation with your changes according to the guidelines?

No documentation update needed; this is a small tokenizer wrapper behavior fix.

  • Did you write any new necessary tests?

Added RagTokenizerTest::test_delegates_missing_attributes_to_current_tokenizer.

Tests

PYTHONPATH=src python -m pytest -q -p no:cacheprovider tests/models/rag/test_tokenization_rag.py -k "delegates_missing_attributes_to_current_tokenizer or save_load_pretrained_with_saved_config"
# Passed 3 consecutive runs locally.

PYTHONPATH=src python -m pytest -q -p no:cacheprovider tests/models/rag/test_tokenization_rag.py
# 2 passed, 2 skipped locally. Slow pretrained RAG tests were skipped.

python -m ruff check src/transformers/models/rag/tokenization_rag.py tests/models/rag/test_tokenization_rag.py
python -m ruff format --check src/transformers/models/rag/tokenization_rag.py tests/models/rag/test_tokenization_rag.py
git diff --check origin/main...HEAD -- src/transformers/models/rag/tokenization_rag.py tests/models/rag/test_tokenization_rag.py

@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: rag

@github-actions

Contributor

CI Dashboard: View test results in Grafana

Development

Successfully merging this pull request may close these issues.

RagTokenizer Missing patch_token_id, patch_token, and encode Functionality

1 participant