Unable to create tensor #599

Open
BhuviTheDataGuy opened this issue Dec 15, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@BhuviTheDataGuy

Command I ran:

 docling --no-ocr https://arxiv.org/pdf/2408.09869

Error:

ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

Full Error:


docling --no-ocr https://arxiv.org/pdf/2408.09869

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "/Library/Frameworks/Python.framework/Versions/3.11/bin/docling", line 5, in <module>
    from docling.cli.main import app
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/cli/main.py", line 13, in <module>
    from docling_core.types.doc import ImageRefMode
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling_core/types/__init__.py", line 8, in <module>
    from docling_core.types.doc.document import DoclingDocument
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling_core/types/doc/__init__.py", line 9, in <module>
    from .document import (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling_core/types/doc/document.py", line 19, in <module>
    import pandas as pd
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/__init__.py", line 26, in <module>
    from pandas.compat import (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/compat/__init__.py", line 27, in <module>
    from pandas.compat.pyarrow import (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/compat/pyarrow.py", line 8, in <module>
    import pyarrow as pa
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "/Library/Frameworks/Python.framework/Versions/3.11/bin/docling", line 5, in <module>
    from docling.cli.main import app
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/cli/main.py", line 13, in <module>
    from docling_core.types.doc import ImageRefMode
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling_core/types/__init__.py", line 8, in <module>
    from docling_core.types.doc.document import DoclingDocument
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling_core/types/doc/__init__.py", line 9, in <module>
    from .document import (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling_core/types/doc/document.py", line 19, in <module>
    import pandas as pd
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/__init__.py", line 49, in <module>
    from pandas.core.api import (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.dtypes.dtypes import (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/dtypes/dtypes.py", line 24, in <module>
    from pandas._libs import (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "/Library/Frameworks/Python.framework/Versions/3.11/bin/docling", line 5, in <module>
    from docling.cli.main import app
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/cli/main.py", line 43, in <module>
    from docling.document_converter import DocumentConverter, FormatOption, PdfFormatOption
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/document_converter.py", line 35, in <module>
    from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/pipeline/standard_pdf_pipeline.py", line 22, in <module>
    from docling.models.easyocr_model import EasyOcrModel
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/models/easyocr_model.py", line 6, in <module>
    import torch
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/__init__.py", line 1477, in <module>
    from .functional import *  # noqa: F403
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/functional.py", line 9, in <module>
    import torch.nn.functional as F
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
WARNING:docling.pipeline.base_pipeline:Encountered an error during conversion of document 82dd470712ce8389f19f20eb9330475e2166a281f8c7990a9f1d0763d73b4d22:
Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/feature_extraction_utils.py", line 193, in convert_to_tensors
    tensor = as_tensor(value)
             ^^^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/feature_extraction_utils.py", line 150, in as_tensor
    return torch.from_numpy(value)
           ^^^^^^^^^^^^^^^^^^^^^^^

RuntimeError: Numpy is not available


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/pipeline/base_pipeline.py", line 150, in _build_document
    for p in pipeline_pages:  # Must exhaust!

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/pipeline/base_pipeline.py", line 116, in _apply_on_pages
    yield from page_batch

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/models/page_assemble_model.py", line 59, in __call__
    for page in page_batch:

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/models/table_structure_model.py", line 113, in __call__
    for page in page_batch:

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling/models/layout_model.py", line 300, in __call__
    for ix, pred_item in enumerate(

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/docling_ibm_models/layoutmodel/layout_predictor.py", line 138, in predict
    inputs = self._image_processor(
             ^^^^^^^^^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/image_processing_utils.py", line 41, in __call__
    return self.preprocess(images, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/utils/generic.py", line 852, in wrapper
    return func(*args, **valid_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/models/rt_detr/image_processing_rt_detr.py", line 1021, in preprocess
    encoded_inputs = BatchFeature(data={"pixel_values": images}, tensor_type=return_tensors)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/feature_extraction_utils.py", line 79, in __init__
    self.convert_to_tensors(tensor_type=tensor_type)

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/feature_extraction_utils.py", line 199, in convert_to_tensors
    raise ValueError(

ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

WARNING:docling.cli.main:Document /var/folders/lc/q0txc0ts5799d5zl19143g6c0000gn/T/tmphwep0ko0/2408.09869v5.pdf failed to convert.

Device:

  • MacBook Pro, Intel processor

BhuviTheDataGuy added the bug label Dec 15, 2024

@dolfim-ibm
Contributor

@BhuviTheDataGuy can you please post the following details?

  1. Which Docling version? You can check it by running docling --version
  2. Which numpy version? For example, run pip freeze | grep numpy
  3. Which torch version? For example, run pip freeze | grep torch

I think you might have a mix of incompatible torch and numpy versions in your environment.
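
The three version checks above could also be gathered in one step from Python. This is only a minimal sketch: it assumes the packages are installed under the distribution names docling, numpy, and torch, and the helper name collect_versions is hypothetical, not part of Docling.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("docling", "numpy", "torch")):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}=={ver}" if ver else f"{name} is not installed")
```

Run it with the same interpreter that runs docling (here, the Python 3.11 framework install shown in the traceback paths), otherwise it may report versions from a different environment.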

@BhuviTheDataGuy
Author

Docling version: 2.12.0
Docling Core version: 2.10.0
Docling IBM Models version: 3.1.0
Docling Parse version: 3.0.0
numpy==2.2.0
torch==2.2.2
torchvision==0.17.2
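
The reported pins line up with the warning in the log: NumPy 2.2.0 is installed, and the warning says modules compiled against NumPy 1.x cannot run under it. A rough sketch of that check follows. The helper names parse_freeze and numpy_is_2_or_newer are hypothetical, and the printed suggestion only repeats what the warning text itself says; it is not a confirmed fix for this issue.

```python
def parse_freeze(lines):
    """Map 'name==version' lines (pip freeze style) to {name: version}."""
    pins = {}
    for line in lines:
        name, sep, ver = line.strip().partition("==")
        if sep:  # skip lines that are not simple 'name==version' pins
            pins[name.lower()] = ver
    return pins


def numpy_is_2_or_newer(pins):
    """True when the pinned NumPy major version is 2 or higher."""
    ver = pins.get("numpy")
    if ver is None:
        return False
    return int(ver.split(".")[0]) >= 2


# Versions as reported in this thread.
reported = ["numpy==2.2.0", "torch==2.2.2", "torchvision==0.17.2"]
pins = parse_freeze(reported)

if numpy_is_2_or_newer(pins):
    print("NumPy 2.x detected; the log suggests 'numpy<2' "
          "or upgrading the affected module.")
```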
