Conversation

@rasapala
Collaborator

🛠 Summary

Fixes the following optimum CLI error:
Tokenizer won't be converted.
/usr/bin/python3: Error while finding module specification for 'openvino_tokenizers.cli' (ModuleNotFoundError: No module named 'openvino_tokenizers')

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

@rasapala rasapala requested review from dtrawins and mzegla October 28, 2025 12:20
@mzegla mzegla requested a review from Copilot October 29, 2025 12:27
Contributor

Copilot AI left a comment


Pull Request Overview

This PR addresses a critical issue where the optimum CLI fails with a ModuleNotFoundError for openvino_tokenizers. The fix restructures the Docker build process to ensure the Python tokenizers module is properly installed before attempting to build the native library components.

Key changes:

  • Reorders Python tokenizers installation to occur before native library compilation
  • Moves Python environment setup earlier in the build process
  • Adds missing which package dependency for RedHat-based images
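The error being fixed is a module-lookup failure: `optimum-cli` invokes `python3 -m openvino_tokenizers.cli`, and "Error while finding module specification" means Python cannot locate the package at all. This is not the PR's fix, just a minimal reproduction of the failure mode using `importlib`; it assumes `openvino_tokenizers` is absent from the environment:

```shell
#!/usr/bin/env sh
# Reproduce the module-spec lookup that `python3 -m openvino_tokenizers.cli`
# performs before it can run anything.
python3 - <<'EOF'
import importlib.util

# A stdlib module always resolves to a module spec:
print(importlib.util.find_spec("json") is not None)   # True

# An uninstalled package resolves to no spec at all, which is exactly
# what makes `python3 -m openvino_tokenizers.cli` fail:
print(importlib.util.find_spec("openvino_tokenizers") is None)
EOF
```

Installing the Python package before any step that shells out to the CLI makes the first lookup succeed, which is the essence of the reordering described above.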

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
create_package.sh: Relocates OpenVINO GenAI library symbolic link creation to after the patchelf operations
Dockerfile.ubuntu: Restructures the tokenizers build to install Python components before native library compilation
Dockerfile.redhat: Same restructuring as Ubuntu, plus adds the missing which package dependency
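The create_package.sh change only moves when the symbolic link is created, so it happens after the patchelf operations. A minimal sketch of the link step itself, with hypothetical filenames standing in for the real GenAI library:

```shell
#!/usr/bin/env bash
# Hypothetical filenames; the real script links the OpenVINO GenAI library.
cd "$(mktemp -d)"
touch libopenvino_genai.so.2025.4.0        # stand-in for the built library
# ... patchelf operations on the real file would run before this point ...
ln -sf libopenvino_genai.so.2025.4.0 libopenvino_genai.so
readlink libopenvino_genai.so              # → libopenvino_genai.so.2025.4.0
```

`ln -sf` replaces any existing link, so the step is safe to run after the binary has been rewritten.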


# hadolint ignore=DL3003
RUN git clone https://github.com/$ov_tokenizers_org/openvino_tokenizers.git /openvino_tokenizers && cd /openvino_tokenizers && git checkout $ov_tokenizers_branch && git submodule update --init --recursive
RUN if ! [[ $debug_bazel_flags == *"_py_off"* ]]; then \
mkdir -p /opt/intel/openvino/python/openvino_tokenizers/lib ; \
Collaborator

why are we moving this logic here?

libopenvino_tokenizers.so is referenced here, but the tokenizer build happens in the next section; is that not a problem?

Collaborator Author

We only create directories and links here. The Python folder needs to be copied in both cases (ov_use_binary=1 and ov_use_binary=0), so the logic is moved outside the conditional.

fi
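The `_py_off` guard above relies on bash's `[[ ... == *pattern* ]]` substring matching. A standalone illustration (the flag value below is a made-up example, not taken from the actual build):

```shell
#!/usr/bin/env bash
# Substring matching with [[ ... == *pattern* ]], as used by the _py_off guard.
debug_bazel_flags="--config=mp_on_py_on --strip=never"
if ! [[ $debug_bazel_flags == *"_py_off"* ]]; then
    echo "python enabled"     # taken: "_py_off" does not occur in the flags
else
    echo "python disabled"
fi
```

Note that `[[ ... ]]` with glob patterns is a bash feature; in a plain POSIX `sh` the equivalent check would need `case` instead.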

# hadolint ignore=DL3003
RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \
Collaborator

same as for redhat, why do we need to move this logic?

Collaborator Author

We only create directories and links here. The Python folder needs to be copied in both cases (ov_use_binary=1 and ov_use_binary=0), so the logic is moved outside the conditional. The GenAI binary package does not ship the Python tokenizers setup.
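The `then true ; else exit 0` idiom in the snippet above turns the whole RUN layer into a successful no-op when `ov_use_binary` is not 0, so the commands after the guard only execute for source builds. A standalone sketch with an example value:

```shell
#!/usr/bin/env bash
# Early-exit guard: succeed immediately (skipping the rest of the script)
# unless we are building from source (ov_use_binary=0).
ov_use_binary="1"   # example value; in the Dockerfile this is a build arg
if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi
echo "building tokenizers from source"   # skipped when ov_use_binary=1
```

Exiting with status 0 (rather than a non-zero status) matters in a Dockerfile: a non-zero exit would fail the whole build instead of just skipping the layer's remaining commands.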

$DNF_TOOL install -y python3-libs --setopt=install_weak_deps=0 --nodocs; \
fi ; \
-$DNF_TOOL install -y shadow-utils; \
+$DNF_TOOL install -y shadow-utils which; \
Collaborator

why is which installed here?

Collaborator Author

Required for the convert_tokenizer availability check.

Collaborator

could we use command -v convert_tokenizer instead?
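As the reviewer suggests, `command -v` would avoid the extra `which` package entirely, since it is a POSIX shell builtin rather than a separate binary. A small portable sketch (this example assumes `convert_tokenizer` is not on the PATH):

```shell
#!/usr/bin/env sh
# `command -v` is specified by POSIX and needs no extra package;
# `which` is an external binary that must be installed separately.
if command -v sh >/dev/null 2>&1; then
    echo "sh found"            # → sh found
fi
# Probing a missing tool simply yields a non-zero exit status:
command -v convert_tokenizer >/dev/null 2>&1 || echo "convert_tokenizer missing"
```

Because `command -v` is a builtin, it also works in minimal container images where no `which` package is available.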

@dtrawins dtrawins added this to the 2025.4rc milestone Nov 7, 2025