Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AssertionError when opening some (but not all) files in "rw" mode in openxlsx (no formulas present in any file) #210

Open
Boxylmer opened this issue Aug 18, 2022 · 8 comments

Comments

@Boxylmer
Copy link

When opening a file with

 XLSX.openxlsx(results_path, mode="rw") do xf
        sheet = xf[sheet_name]
        # more reading functions
end

I get

ERROR: AssertionError: Some internal files were not loaded into memory. Did you use `XLSX.open_xlsx_template` to open this file?
Stacktrace:
 [1] writexlsx(output_filepath::String, xf::XLSX.XLSXFile; overwrite::Bool)
   @ XLSX C:\Users\Will Laptop\.julia\packages\XLSX\FFzH0\src\write.jl:64
...

Where the call is occurring during writexlsx(filepath, xf, overwrite=true)

Originally thought to be a ram issue, I have this calling on hundreds of other excel sheets (some larger, some smaller) with no real telltale sign as to why one would give this error. What's going on here and how can I narrow down why this would happen in a particular file? Alternatively, what's another design pattern I can apply to large files I need to open and edit? I've tried enable_cache=false, but on rw it appears to be a requirement to also have it enabled. (https://github.com/felipenoris/XLSX.jl/blob/a9738321c945ad4f1a44c1c6406b260f04d411bb/src/read.jl#L184-187)

@Boxylmer
Copy link
Author

Updates:

I rolled back to 0.7.10 and faced no issues. I'll have to stay here until there's some more knowledge about what's going on here. Could this be due to new versions in ZipFile? I didn't see any code that would cause breaking changes like this when moving from 0.7.x -> 0.8.x, but this transition is what causes the issue.

@Boxylmer
Copy link
Author

This is still present in 0.9.x unfortunately :(

@TimG1964
Copy link
Contributor

TimG1964 commented Feb 7, 2024

I have this issue, too. Is there a fix, or do I have to roll back to 0.7.10 (and how do I do that, I wonder...)
ERROR: LoadError: AssertionError: Some internal files were not loaded into memory. Did you use XLSX.open_xlsx_template to open this file? Stacktrace: [1] writexlsx(output_source::String, xf::XLSX.XLSXFile; overwrite::Bool) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:64 [2] writexlsx(output_source::String, xf::XLSX.XLSXFile) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:60

Note: Can't roll back because I'm using Add keep_empty_rows kwarg in 0.10.0

@TimG1964
Copy link
Contributor

TimG1964 commented Feb 7, 2024

Did you use XLSX.open_xlsx_template to open this file?
Also, if my answer to this question is "yes", what does that imply?

@Boxylmer
Copy link
Author

Boxylmer commented Feb 7, 2024

I have this issue, too. Is there a fix, or do I have to roll back to 0.7.10 (and how do I do that, I wonder...) ERROR: LoadError: AssertionError: Some internal files were not loaded into memory. Did you use XLSX.open_xlsx_template to open this file? Stacktrace: [1] writexlsx(output_source::String, xf::XLSX.XLSXFile; overwrite::Bool) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:64 [2] writexlsx(output_source::String, xf::XLSX.XLSXFile) @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:60

Note: Can't roll back because I'm using Add keep_empty_rows kwarg in 0.10.0

Im still unsure. Personally I would roll back and edit your project to filter empty rows manually.

@TimG1964
Copy link
Contributor

TimG1964 commented Jun 17, 2024

I managed to work around this for a while, but now it's got me again and now it's a bigger problem for me. I've done a little digging. I don't understand much, but here is what I've found:

ERROR: LoadError: AssertionError: Some internal files were not loaded into memory. Did you use `XLSX.open_xlsx_template` to open this file?
Stacktrace:
 [1] writexlsx(output_source::String, xf::XLSX.XLSXFile; overwrite::Bool)
   @ XLSX C:\Users\TGebbels\.julia\packages\XLSX\U2Bcm\src\write.jl:64

Using some different .xlsx files works just fine.

Line 64 in write.jl says

@assert all(values(xf.files)) "Some internal files were not loaded into memory. Did you use `XLSX.open_xlsx_template` to open this file?"

So I tried to see what was going on with

println(keys(template.files))
println(values(template.files))

which showed:

["xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "customXml/item2.xml", "customXml/itemProps1.xml", "customXml/itemProps2.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "xl/worksheets/sheet2.xml", "customXml/_rels/item1.xml.rels", "xl/worksheets/_rels/sheet1.xml.rels", "customXml/item1.xml", "xl/theme/theme1.xml", "customXml/_rels/item2.xml.rels", "docProps/app.xml"]
Bool[1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1]

for a file that fails whereas, for a working file, it shows

["xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "xl/worksheets/sheet2.xml", "xl/worksheets/_rels/sheet1.xml.rels", "xl/theme/theme1.xml", "docProps/app.xml"]
Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

So it looks like XLSX.jl doesn't correctly handle the customXml content.

I can manually delete the customXml folder if I open the xlsx file as a zip. If I do that, the file will work again with XLSX.jl.

["xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "xl/worksheets/sheet2.xml", "xl/worksheets/_rels/sheet1.xml.rels", "xl/theme/theme1.xml", "docProps/app.xml"]
Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

However, it will no longer work with Excel which offers to try to repair the file. Once repaired by Excel, the file fails again but now with even more customXml elements!

["customXml/_rels/item3.xml.rels", "xl/sharedStrings.xml", "xl/drawings/drawing1.xml", "customXml/item2.xml", "customXml/itemProps1.xml", "customXml/item3.xml", "customXml/itemProps2.xml", "xl/worksheets/sheet1.xml", "xl/ctrlProps/ctrlProp1.xml", "_rels/.rels", "xl/workbook.xml", "xl/styles.xml", "[Content_Types].xml", "docProps/core.xml", "xl/_rels/workbook.xml.rels", "customXml/itemProps3.xml", "xl/worksheets/sheet2.xml", "customXml/_rels/item1.xml.rels", "xl/worksheets/_rels/sheet1.xml.rels", "customXml/item1.xml", "xl/theme/theme1.xml", "docProps/custom.xml", "customXml/_rels/item2.xml.rels", "docProps/app.xml"]
Bool[0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]

I don't understand the internals of Excel and I don't understand how XLSX.jl works. However, I found the lines:

                    # ignore custom XML internal files
                    if startswith(f.name, "customXml")
                        continue
                    end

in the code to read the template file. Are you ignoring these elements on read but requiring them on write? It would be really great to see resolution of this issue.

TimG1964 added a commit to TimG1964/XLSX.jl that referenced this issue Jun 20, 2024
@TimG1964
Copy link
Contributor

Also see this thread on Discourse.

I’ve had a go at this with some success.

To start with, I edited read.jl to remove the exclusion of the customXml internal files from being processed by internal_xml_file_read by commenting out the lines:

# ignore custom XML internal files
#if startswith(f.name, "customXml")
#    continue
#end

This resulted in the following warnings:

┌ Warning: XMLError: xmlns: URI ed0bb2e3-83d5-46a8-9e95-92c1ca9cc599 is not absolute from XML Namespace module (code: 100, line: 1)
└ @ EzXML C:\Users\TGebbels\.julia\packages\EzXML\DL8na\src\error.jl:97
┌ Warning: XMLError: xmlns: URI 8fe8bdaf-491a-4d2e-89dd-756120b60898 is not absolute from XML Namespace module (code: 100, line: 1)
└ @ EzXML C:\Users\TGebbels\.julia\packages\EzXML\DL8na\src\error.jl:97

Besides these, the process seemed to work for my purposes. I don’t really know how significant these warnings are but I suppose Excel knows best what it puts in these elements of a .xlsx file.

Second, I changed the initial condition so that the customXml internal files were treated as binary files and simply “passed through” from read to write without being touched.

if !startswith(f.name, "customXml") && (endswith(f.name, ".xml") || endswith(f.name, ".rels"))
#if endswith(f.name, ".xml") || endswith(f.name, ".rels")

This also worked for my purpose and without producing any errors or warnings!

I ran tests on the revised code which gave the following results:

     Testing Running tests...
Test Summary:   | Pass  Total  Time
read test files |   23     23  3.2s
Test Summary: | Pass  Total  Time
Cell names    |  326    326  0.0s
Test Summary: | Pass  Total  Time
getindex      |   12     12  1.2s
Test Summary:     | Pass  Total  Time
Time and DateTime |    6      6  0.1s
Test Summary:  | Pass  Total  Time
number formats |   16     16  1.3s
Test Summary: | Pass  Total  Time
Defined Names |   24     24  0.5s
Test Summary: | Pass  Total  Time
Book1.xlsx    |   32     32  0.3s
Test Summary:       | Pass  Total  Time
book_1904_ptbr.xlsx |    9      9  0.3s
Test Summary: | Pass  Total  Time
numbers.xlsx  |   75     75  0.0s
Test Summary: | Pass  Total  Time
Column Range  |    9      9  0.1s
Test Summary:      | Pass  Total  Time
CellRange iterator |    1      1  0.1s
Test Summary: | Pass  Total  Time
Table         |  598    598  2.8s
Test Summary:    | Pass  Total  Time
Helper functions |   32     32  0.5s
Test Summary: | Pass  Total  Time
Write         |   73     73  0.3s
Test Summary: | Pass  Total  Time
Edit Template |    3      3  0.2s
Test Summary: | Pass  Total  Time
addsheet!     |   10     10  0.1s
Test Summary: | Pass  Total  Time
Edit          |   13     13  0.3s
Test Summary: | Pass  Total  Time
writetable    |  183    183  1.3s
Test Summary: | Pass  Total  Time
Styles        |   74     74  0.2s
Test Summary: | Pass  Total  Time
filemodes     |  200    200  2.8s
Test Summary: | Pass  Total  Time
escape        |   84     84  0.2s
Test Summary: | Pass  Total  Time
row_index     |    1      1  0.1s
Test Summary: |Time
show xlsx     | None  0.0s
Test Summary:  | Pass  Total  Time
relative paths |    5      5  0.2s
Test Summary:         | Pass  Total  Time
windows compatibility |    3      3  0.0s
Test Summary:    | Pass  Total  Time
whitespace nodes |    8      8  0.0s
Test Summary: | Pass  Total  Time
inlineStr     |   20     20  0.0s
Test Summary:         | Pass  Total  Time
Tables.jl integration |   27     27  3.4s
     Testing XLSX tests passed

TimG1964 added a commit to TimG1964/XLSX.jl that referenced this issue Jun 21, 2024
This is a test for issue felipenoris#210. It checks that templates that are read can be written again. It needs an example .xlsx file containing internal customXml files to test.
@TimG1964
Copy link
Contributor

I'm hoping this PR will be merged soon. I'm still having to use the dev'd version...
😃

felipenoris pushed a commit that referenced this issue Aug 26, 2024
* Update read.jl to pass through Custom XML internal files

Addresses issue #210

* Add example containing customXml internal files

* Read and write template containing customXml 

This is a test for issue #210. It checks that templates that are read can be written again. It needs an example .xlsx file containing internal customXml files to test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants