Document too long at transforms #1852

Closed
@itogaston

Description

  • I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
I have a really long PDF (475 pages), and processing it raises this error: "Documents appears to be too short (ie 100 tokens or less). Please provide longer documents."
I think the problem is not that the document is too short but the opposite.

Code that classifies documents by size:

bin_ranges = [(0, 100), (101, 500), (501, 100000)]
result = count_doc_length_bins(documents, bin_ranges)
result = {k: v / len(documents) for k, v in result.items()}

Debugging my code, the size of my document comes out as 390000. That exceeds the upper limit of the last bin (100000), so there is no bin to place it in, and it falls through to the default condition, which raises the above exception.

I think the document should either be processed like those in the last bin, or an exception should be raised saying the document is too big.
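To illustrate the fall-through, here is a minimal sketch of range-based binning. It is not the library's actual code: `count_length_bins` is a hypothetical stand-in, and I am assuming the bins are inclusive `(start, end)` ranges as in the snippet above. It shows that a length of 390000 matches no bin when the last range is capped at 100000, and that an open-ended last range (one possible fix) catches it.

```python
import math


def count_length_bins(lengths, bin_ranges):
    """Count lengths per inclusive (start, end) range.

    Hypothetical helper for illustration only. Returns the per-bin
    counts and the number of lengths that matched no bin at all.
    """
    counts = {r: 0 for r in bin_ranges}
    unbinned = 0
    for n in lengths:
        for start, end in bin_ranges:
            if start <= n <= end:
                counts[(start, end)] += 1
                break
        else:
            # No range matched: this is the case the issue describes.
            unbinned += 1
    return counts, unbinned


lengths = [390000]  # document size reported in this issue

# Bins as in the snippet above: the last range stops at 100000.
closed = [(0, 100), (101, 500), (501, 100000)]
closed_counts, closed_unbinned = count_length_bins(lengths, closed)

# Possible fix (an assumption, not the maintainers' choice):
# leave the last range open-ended.
open_ended = [(0, 100), (101, 500), (501, math.inf)]
open_counts, open_unbinned = count_length_bins(lengths, open_ended)

print(closed_counts, closed_unbinned)
print(open_counts, open_unbinned)
```

With the closed ranges every bin stays at zero, so any logic that reads "nothing in the medium or long bins" as "too short" would give a misleading message. Tracking the unbinned count separately would also make it possible to raise a dedicated "document too long" error instead.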


Labels: answered, bug, question
