Description
- I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
I have a really long PDF (475 pages), and it raises this error: "Documents appears to be too short (ie 100 tokens or less). Please provide longer documents."
I think the problem is not that the document is too short, but the opposite.
Code that classifies documents based on size:
bin_ranges = [(0, 100), (101, 500), (501, 100000)]
result = count_doc_length_bins(documents, bin_ranges)
result = {k: v / len(documents) for k, v in result.items()}
While debugging my code, I found that the size of my document is 390000. This exceeds the upper limit of the last bin (100000), so there is no bin to place it in, and execution falls through to the default condition, which raises the exception above.
I think the document should either be processed like those in the last bin, or an exception should be raised saying that the document is too big.
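To illustrate the first option, here is a minimal sketch of a binning routine that counts anything above the last upper bound in the last bin. It is not the library's implementation: `bin_doc_lengths` is a hypothetical stand-in for `count_doc_length_bins`, and it assumes the token counts have already been computed, since I don't know how the library measures document length.

```python
def bin_doc_lengths(doc_lengths, bin_ranges):
    """Count document lengths per bin (hypothetical helper, not the library's code).

    Lengths above the upper bound of the last bin are counted in the last bin
    instead of being left unassigned.
    """
    counts = {bin_range: 0 for bin_range in bin_ranges}
    last_bin = bin_ranges[-1]
    for length in doc_lengths:
        for low, high in bin_ranges:
            if low <= length <= high:
                counts[(low, high)] += 1
                break
        else:
            # No bin matched: treat oversized documents as members of the last bin.
            if length > last_bin[1]:
                counts[last_bin] += 1
            else:
                raise ValueError(f"Document length {length} does not fit any bin")
    return counts


bin_ranges = [(0, 100), (101, 500), (501, 100000)]
doc_lengths = [390000]  # the size reported for the 475-page PDF
counts = bin_doc_lengths(doc_lengths, bin_ranges)
proportions = {k: v / len(doc_lengths) for k, v in counts.items()}
```

With this change the 390000-token document lands in the `(501, 100000)` bin, so the "too short" branch would no longer be reached for it. The alternative would be to raise a dedicated "document too long" error in the same `else` branch.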