Add support for XLSX files generated by Apache POI and Numbers #107

Open: wants to merge 1 commit into master


ricardolopezb-kognitos

Fix: Incorrect detection of certain xlsx files in infer::get()

When using infer::get(), I found that some xlsx files were detected as zip files. Whether this happened seemed to depend on which application generated the file.

Reviewing the msooxml detection code, I found that xlsx files generated by Numbers and Apache POI place the xlb bytes at a different position than files from other applications. Detecting them correctly requires skipping ahead and checking the next PK header as well.

This PR adds a check for the 5th PK Header, allowing proper detection of xlsx files from these sources.
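
To illustrate the idea, here is a minimal, standard-library-only sketch. It is not the code in this PR and not infer's internal implementation. The names `local_header_offsets`, `looks_like_xlsx` and `fake_entry` are hypothetical, and the entry order in `main` is invented for the demo rather than taken from a real Numbers or Apache POI file. It assumes the marker being looked for is an entry name beginning with `xl/`, and shows how a spreadsheet whose `xl/` entry only appears at the fifth PK local file header is missed when fewer headers are inspected.

```rust
/// ZIP local file header signature: "PK\x03\x04".
const LOCAL_HEADER_SIG: [u8; 4] = [b'P', b'K', 0x03, 0x04];
/// Size of the fixed part of a local file header; the entry name follows it.
const LOCAL_HEADER_LEN: usize = 30;

/// Offsets of the first `max_headers` local file header signatures in `buf`.
fn local_header_offsets(buf: &[u8], max_headers: usize) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut i = 0;
    while i + 4 <= buf.len() && offsets.len() < max_headers {
        if buf[i..i + 4] == LOCAL_HEADER_SIG {
            offsets.push(i);
            i += LOCAL_HEADER_LEN; // skip the fixed part of this header
        } else {
            i += 1;
        }
    }
    offsets
}

/// True if any of the first `max_headers` entries has a name starting with "xl/".
fn looks_like_xlsx(buf: &[u8], max_headers: usize) -> bool {
    local_header_offsets(buf, max_headers).into_iter().any(|off| {
        buf.get(off + LOCAL_HEADER_LEN..)
            .map_or(false, |name| name.starts_with(b"xl/"))
    })
}

/// Builds a fake, data-less local file header for the demo (hypothetical helper).
fn fake_entry(name: &str) -> Vec<u8> {
    let mut entry = LOCAL_HEADER_SIG.to_vec();
    entry.resize(LOCAL_HEADER_LEN, 0);
    // The file name length is stored little-endian at bytes 26..28.
    entry[26..28].copy_from_slice(&(name.len() as u16).to_le_bytes());
    entry.extend_from_slice(name.as_bytes());
    entry
}

fn main() {
    // Invented entry order in which "xl/" first appears in the fifth entry.
    let names = [
        "[Content_Types].xml",
        "_rels/.rels",
        "docProps/app.xml",
        "docProps/core.xml",
        "xl/workbook.xml",
    ];
    let buf: Vec<u8> = names.iter().flat_map(|n| fake_entry(n)).collect();

    println!("checking 4 headers: {}", looks_like_xlsx(&buf, 4));
    println!("checking 5 headers: {}", looks_like_xlsx(&buf, 5));
}
```

With this synthetic buffer, inspecting four headers misses the `xl/` entry and inspecting five finds it. A real detector would also need to bound how far it scans and handle truncated buffers.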
