Fix tokenizers #3737
Conversation
Pull Request Overview
This PR addresses a critical issue where the optimum CLI fails with a ModuleNotFoundError for openvino_tokenizers. The fix restructures the Docker build process to ensure the Python tokenizers module is properly installed before attempting to build the native library components.
Key changes:
- Reorders Python tokenizers installation to occur before native library compilation
- Moves Python environment setup earlier in the build process
- Adds missing `which` package dependency for RedHat-based images
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| create_package.sh | Relocates OpenVINO GenAI library symbolic link creation to after patchelf operations |
| Dockerfile.ubuntu | Restructures tokenizers build to install Python components before native library compilation |
| Dockerfile.redhat | Similar restructuring as Ubuntu plus adds missing which package dependency |
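As a rough illustration only, the reordering described for the Dockerfiles can be sketched as below. This is a simplified, hypothetical fragment, not the actual Dockerfile contents; the clone URL, install paths, and build commands are assumptions:

```dockerfile
# Simplified sketch: install the Python openvino_tokenizers module first,
# then compile the native library, so `python3 -m openvino_tokenizers.cli`
# resolves regardless of how the native library is later obtained.
RUN git clone https://github.com/openvinotoolkit/openvino_tokenizers.git /openvino_tokenizers
RUN pip3 install /openvino_tokenizers
RUN cmake -S /openvino_tokenizers -B /openvino_tokenizers/build && \
    cmake --build /openvino_tokenizers/build
```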
```dockerfile
# hadolint ignore=DL3003
RUN git clone https://github.com/$ov_tokenizers_org/openvino_tokenizers.git /openvino_tokenizers && cd /openvino_tokenizers && git checkout $ov_tokenizers_branch && git submodule update --init --recursive
RUN if ! [[ $debug_bazel_flags == *"_py_off"* ]]; then \
    mkdir -p /opt/intel/openvino/python/openvino_tokenizers/lib ; \
```
Why are we moving this logic here?
There is a usage of libopenvino_tokenizers.so here, but the tokenizer build happens in the next section; is that not a problem?
We only create directories and links here. We need to copy the python folder in both cases, ov_use_binary=1 and ov_use_binary=0, so we move the logic outside the conditional.
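A minimal sketch of that idea, run in a throwaway temp directory for illustration (the link target and layout are assumptions modeled on the snippet above): the directory and symlink are created unconditionally, independent of the ov_use_binary branch.

```shell
# Sketch: create the python tokenizers dir and link a stand-in native library,
# regardless of whether the library was built from source or taken from a binary package.
root=$(mktemp -d)
mkdir -p "$root/python/openvino_tokenizers/lib"
touch "$root/libopenvino_tokenizers.so"   # stand-in for the built or binary lib
ln -sf "$root/libopenvino_tokenizers.so" "$root/python/openvino_tokenizers/lib/"
ls "$root/python/openvino_tokenizers/lib"
```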
```dockerfile
    fi

# hadolint ignore=DL3003
RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
```
Same question as for the RedHat image: why do we need to move this logic?
We only create directories and links here. We need to copy the python folder in both cases, ov_use_binary=1 and ov_use_binary=0, so we move the logic outside the conditional. The GenAI binary package does not ship the Python tokenizers setup.
Dockerfile.redhat (outdated diff)
```diff
 $DNF_TOOL install -y python3-libs --setopt=install_weak_deps=0 --nodocs; \
 fi ; \
-$DNF_TOOL install -y shadow-utils; \
+$DNF_TOOL install -y shadow-utils which; \
```
Why is `which` installed here?
Required for convert_tokenizers check.
Could we use `command -v convert_tokenizer` instead?
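For context on this suggestion: `command -v` is a POSIX shell builtin, so it resolves a command's path without requiring the external `which` binary at all, which matters in minimal container images.

```shell
# Prints the resolved path of a command; exits non-zero if it is not found.
command -v ls
# Unlike `which`, this works even in images that do not package `which`.
command -v some_hopefully_missing_tool || echo "not found"
```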
🛠 Summary
Fixes the optimum CLI error:

```
Tokenizer won't be converted.
/usr/bin/python3: Error while finding module specification for 'openvino_tokenizers.cli' (ModuleNotFoundError: No module named 'openvino_tokenizers')
```
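A hedged way to reproduce the underlying check (the module name comes from the error message above; whether it resolves depends entirely on the environment):

```shell
# Exit status 0 only if the openvino_tokenizers Python module can be found.
if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('openvino_tokenizers') else 1)"; then
    echo "openvino_tokenizers installed"
else
    echo "openvino_tokenizers missing"
fi
```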
🧪 Checklist