Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug47950 -- make stream/directory name lookup in OLE2 case insensitive #477

Closed
wants to merge 9 commits into from

Conversation

tballison
Copy link
Contributor

This is a first draft at making stream/directory name look up case insensitive for OLE2 containers.

@@ -297,11 +298,15 @@ public static String getWorkbookDirEntryName(DirectoryNode directory) {
"It must be decrypted before use by XSSF, it cannot be used by HSSF");
}

// check for previous version of file format
if (directory.hasEntry(OLD_WORKBOOK_DIR_ENTRY_NAME)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal if this is too hard but is there any chance that hasEntry could remain case sensitive for backward compatibility reasons? And a new caseInsensitiveHasEntry added?

If leaving hasEntry case sensitive is too hard, could you could not using directory.getEntryNames().contains and adding a caseSensitiveHasEntry instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y, I don't like the backwards compatibility break that this proposed change brings about. If we don't do this, though, then we'll have to change hasEntry(name) and getEntry(name) to the case insensitive options throughout our codebase. I can do that if desired.

Yep, I'll add that now.

@tballison
Copy link
Contributor Author

tballison commented Jun 21, 2023

I look forward to running the regression tests after this change. 😄 Let me know what else you find. I haven't worked with POI in a while, and this is a major change in behavior.

@tballison
Copy link
Contributor Author

tballison commented Jun 22, 2023

Results show no significant changes in ~400k msoffice files (not all ole2): https://corpora.tika.apache.org/base/reports/tika-2.8.1-msoffice-reports.tgz

There is a major difference in handling "true"/"false" "1/0" in xlsx(?only?) between the versions of POI, but this change is unrelated.

@pjfanning
Copy link
Contributor

@tballison could you raise an issue for the xssf boolean issue with and xlsx attached that makes it easier to debug?

With this PR, I'm still a little concerned about changing the behaviour of hasEntry/getEntry. I suppose that we could agree that the next release of POI has to be a v5.3.0 instead of v5.2.4 because of the behaviour change.

@tballison
Copy link
Contributor Author

tballison commented Jun 22, 2023

I'm not sure the boolean issue is wrong, just different. When I opened a couple of the files, there was nothing in the cells. Will look deeper later.

I'm also concerned about changing the behavior. We could go to 5.3.0, or I could have a major refactoring where I modify all the internal hasEntry and getEntry to hasEntryCaseInsensitive and getEntryCaseInsensitive. If Intellij is accurate, there are ~230 uses of getEntry and ~90 hasEntry.

I'm willing to make that change, but it will be a large set of changes -- TBF, a bunch are in unit tests. Let me know what you think.

The other thing I shouldn't have done was change the values of InternalWorkbook#WORKBOOK_DIR_ENTRY_NAMES. I should have left that as is and made my own mods on a new field. I'll change that back to what it was.

@pjfanning
Copy link
Contributor

If possible and the IDE makes it easy, I think keeping getEntry/hasEntry case-sensitive and switching most code to a set of case-sensitive methods makes releasing easier.

@tballison
Copy link
Contributor Author

Should I add getEntryCaseInsensitive and hasEntryCaseInsensitive to the DirectoryEntry interface?

@pjfanning
Copy link
Contributor

Should I add getEntryCaseInsensitive and hasEntryCaseInsensitive to the DirectoryEntry interface?

Seems best to expose them fully.

@tballison
Copy link
Contributor Author

Agree, but won't that break any implementations of DirectoryEntry? I can't imagine anyone is actually doing this, but, it still would break backward compatibility, no?

@pjfanning
Copy link
Contributor

I still think we should add it to the interface. The number of affected users will be low.

…nsitive; restore InternalWorkbook.WORKBOOK_DIR_ENTRY_NAMES
@tballison
Copy link
Contributor Author

PR just got huge...LOL...

@tballison
Copy link
Contributor Author

@pjfanning , wdyt? Ready to merge or further work required? Thanks, again!

@pjfanning
Copy link
Contributor

Sure - seems ok to me. You'll need to rebase your branch before submitting to svn because a few changes have gone into trunk.

@pjfanning
Copy link
Contributor

@tballison is this PR still something that you want to merge? It's a year since the last POI release and it would be good to get a release out in the next month or 2.

@tballison
Copy link
Contributor Author

Y, will update it today and merge. Thank you!

@tballison
Copy link
Contributor Author

Argh... Monday...

# Conflicts:
#	poi-scratchpad/src/main/java/org/apache/poi/hpbf/model/HPBFPart.java
#	poi-scratchpad/src/main/java/org/apache/poi/hslf/record/CurrentUserAtom.java
#	poi-scratchpad/src/main/java/org/apache/poi/hslf/usermodel/HSLFSlideShowImpl.java
#	poi-scratchpad/src/main/java/org/apache/poi/hwpf/HWPFDocumentCore.java
#	poi-scratchpad/src/test/java/org/apache/poi/hsmf/parsers/TestPOIFSChunkParser.java
#	src/resources/ooxml-lite-report.xsb
@tballison
Copy link
Contributor Author

I'm consistently getting two test failures now locally:

[TestAllFiles](file:///home/tallison/Intellij/my-poi/poi-integration/build/reports/tests/test/classes/org.apache.poi.stress.TestAllFiles.html). [Extracting - #443 document/clusterfuzz-testcase-minimized-POIXWPFFuzzer-6733884933668864.docx XWPF](file:///home/tallison/Intellij/my-poi/poi-integration/build/reports/tests/test/classes/org.apache.poi.stress.TestAllFiles.html#handleExtracting(String,%20FileHandlerKnown,%20String,%20Class,%20String)[443])
    [TestAllFiles](file:///home/tallison/Intellij/my-poi/poi-integration/build/reports/tests/test/classes/org.apache.poi.stress.TestAllFiles.html). [#443 document/clusterfuzz-testcase-minimized-POIXWPFFuzzer-6733884933668864.docx XWPF](file:///home/tallison/Intellij/my-poi/poi-integration/build/reports/tests/test/classes/org.apache.poi.stress.TestAllFiles.html#handleFile(String,%20FileHandlerKnown,%20String,%20Class,%20String)[443])
org.opentest4j.AssertionFailedError: document/clusterfuzz-testcase-minimized-POIXWPFFuzzer-6733884933668864.docx - failed for handler XWPFFileHandler:  ==> Unexpected exception thrown: org.apache.xmlbeans.XmlException: error: Element or attribute "a:" do not match QName production: QName::=(NCName:)?NCName.
	at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
...
Caused by: org.apache.xmlbeans.XmlException: error: Element or attribute "a:" do not match QName production: QName::=(NCName:)?NCName.
...
Caused by: org.xml.sax.SAXParseException; systemId: file://; lineNumber: 2; columnNumber: 6188; Element or attribute "a:" do not match QName production: QName::=(NCName:)?NCName.
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:399)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:326)

@centic9
Copy link
Member

centic9 commented Sep 18, 2023

Sorry, my bad, different version of Java have different XML Parsers, should be fixed on latest trunk, please rebase/merge onto latest.

@tballison
Copy link
Contributor Author

tballison commented Sep 18, 2023 via email

@centic9
Copy link
Member

centic9 commented Sep 18, 2023

As this file is now set to IGNORED in stress.xls it should not be executed at all, so if you see the same file-name "clusterfuzz-testcase-minimized-POIXWPFFuzzer-6733884933668864.docx" in the failures, the necessary change seems to be still missing.

CI-Jobs at https://ci-builds.apache.org/job/POI/ were failing as well and are now fixed, thus I am fairly confident that latest trunk has it fixed now.

@tballison
Copy link
Contributor Author

Pushed merge remote-tracking. I'm getting a clean build now. Thank you @centic9 !

@tballison
Copy link
Contributor Author

Looks like I don't have write access?! @pjfanning or @centic9 would you be able to merge? THANK YOU!

@pjfanning
Copy link
Contributor

@tballison You need to commit this to svn. https://svn.apache.org/repos/asf/poi/trunk

The git repo is a read-only mirror afaik.

@tballison
Copy link
Contributor Author

tballison commented Sep 20, 2023 via email

@tballison
Copy link
Contributor Author

47950.zip
This patch looks about right. Will apply shortly. Sorry about that, and thank you, again, @pjfanning and @centic9!

@pjfanning
Copy link
Contributor

@tballison could you close this? I believe that this was applied and was part of POI 5.2.4 release.

@tballison
Copy link
Contributor Author

@tballison tballison closed this Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants