Cannot open file (failed to parse internal XML file) #243

Open
tbeason opened this issue Sep 18, 2023 · 1 comment
tbeason commented Sep 18, 2023

I'm not quite sure what the problem is here, but I've encountered a file that I am not able to open! Luckily the file is public so maybe somebody can figure this out. Link: https://www.aqr.com/-/media/AQR/Documents/Insights/Data-Sets/Quality-Minus-Junk-Factors-Monthly.xlsx

With that file downloaded, I try this:

using Chain, DataFrames, Dates, XLSX

df = @chain begin
    XLSX.readtable("Quality-Minus-Junk-Factors-Monthly.xlsx", "QMJ Factors", "A:AE"; first_row=19, infer_eltypes=true)
    DataFrame
    transform("DATE" => ByRow(d -> Date(d, dateformat"m/d/Y")) => "DATE")
    select("DATE", "USA" => "QMJ")
end

But I get this error:

┌ Error: Failed to parse internal XML file `_rels/.rels`
└ @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:446
ERROR: EOFError: read end of file
Stacktrace:
  [1] read!
    @ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\Zlib.jl:299 [inlined]
  [2] unsafe_read(f::ZipFile.ReadableFile, p::Ptr{UInt8}, n::UInt64)
    @ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\ZipFile.jl:498
  [3] unsafe_read
    @ EzXML .\io.jl:774 [inlined]
  [4] (::EzXML.var"#7#8")(context::ZipFile.ReadableFile, buffer::Ptr{UInt8}, len::Int32)
    @ EzXML C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:218
  [5] macro expansion
    @ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\error.jl:50 [inlined]
  [6] readxml
    @ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:154 [inlined]
  [7] internal_xml_file_read(xf::XLSX.XLSXFile, filename::String)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:444
  [8] xmldocument
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:480 [inlined]
  [9] xmlroot
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:484 [inlined]
 [10] get_package_relationship_root(xf::XLSX.XLSXFile)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\relationship.jl:51
 [11] parse_relationships!(xf::XLSX.XLSXFile)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:296
 [12] open_or_read_xlsx(source::String, read_files::Bool, enable_cache::Bool, read_as_template::Bool)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:235
 [13] openxlsx(f::XLSX.var"#32#33"{Int64, Nothing, Bool, Bool, Bool, Nothing, Bool, String, String}, source::String; mode::String, enable_cache::Bool)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:135
 [14] openxlsx
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:128 [inlined]
 [15] #readtable#31
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:611 [inlined]
 [16] top-level scope
    @ REPL[12]:2

Here are the packages in my Project.toml (I am on Julia 1.10 beta 2):

  [336ed68f] CSV v0.10.11
  [13f3f980] CairoMakie v0.10.9
  [8be319e6] Chain v0.5.0
  [992eb4ea] CondaPkg v0.2.18
  [60f91f6f] CovarianceMatrices v0.10.4
  [a10d1c49] DBInterface v2.5.0
  [a93c6f00] DataFrames v1.6.1
  [d2f5444f] DuckDB v0.8.1
  [bd2a388e] FamaFrenchData v0.4.3
⌃ [38e38edf] GLM v1.8.3
  [5432bcbf] PaddedViews v0.5.12
  [6099a3de] PythonCall v0.9.14
  [cbe49d4c] RemoteFiles v0.5.0
⌅ [2913bbd2] StatsBase v0.33.21
⌅ [3eaba693] StatsModels v0.6.33
  [bd369af6] Tables v1.10.1
  [fdbf4ff8] XLSX v0.10.0
  [ade2ca70] Dates
  [f43a241f] Downloads v1.6.0
  [37e2e46d] LinearAlgebra
  [10745b16] Statistics v1.9.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

If I open the file in Excel and Save As to a new file, everything works fine. I don't know how to do that programmatically though!
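
Since re-saving in Excel makes the problem go away, the ZIP container may be the culprit rather than the sheet data. The following is a minimal diagnostic sketch, not a fix: it tries to read every entry of the archive with ZipFile.jl (the package that appears in the stack trace) and records which entries fail. The helper name `check_zip_entries` is hypothetical and not part of XLSX.jl or ZipFile.jl.

```julia
using ZipFile  # the package that raises the EOFError in the stack trace

# Hypothetical diagnostic helper: attempt to read each entry of the archive
# and record the outcome, instead of stopping at the first error.
function check_zip_entries(path::AbstractString)
    results = Pair{String,Any}[]
    reader = ZipFile.Reader(path)
    try
        for f in reader.files
            status = try
                bytes = read(f)
                # Compare against the size recorded in the ZIP directory.
                length(bytes) == f.uncompressedsize ? :ok : :size_mismatch
            catch err
                err  # keep the exception so it can be inspected later
            end
            push!(results, f.name => status)
        end
    finally
        close(reader)
    end
    return results
end
```

Calling `check_zip_entries("Quality-Minus-Junk-Factors-Monthly.xlsx")` on the downloaded file and on the copy re-saved from Excel should show whether `_rels/.rels` is the only entry that cannot be read.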


tbeason commented Sep 18, 2023

I should add that I have been able to open the file, immediately after downloading, with the pandas.read_excel function.
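
Because pandas can read the file, one possible workaround from Julia is to go through PythonCall, which is already in the Project.toml listed earlier. This is only a sketch under assumptions: pandas and openpyxl must be installed in the Python environment PythonCall uses (for example via `CondaPkg.add("pandas")` and `CondaPkg.add("openpyxl")`), and the helper name `read_sheet_via_pandas` is hypothetical.

```julia
using PythonCall
using DataFrames

# Hypothetical helper (not part of XLSX.jl): read one sheet through pandas
# and convert the result to a Julia DataFrame.
# Assumes pandas and openpyxl are available to PythonCall.
function read_sheet_via_pandas(path::AbstractString, sheet::AbstractString;
                               first_row::Integer=1,
                               columns::AbstractString="A:AE")
    pd = pyimport("pandas")
    # Skipping first_row - 1 rows makes spreadsheet row `first_row` the header,
    # which mirrors the `first_row` keyword of XLSX.readtable.
    pdf = pd.read_excel(path; sheet_name=sheet, skiprows=first_row - 1, usecols=columns)
    # PyTable wraps the pandas DataFrame as a Tables.jl-compatible table.
    return DataFrame(PyTable(pdf))
end
```

For the file in this issue the call would be `read_sheet_via_pandas("Quality-Minus-Junk-Factors-Monthly.xlsx", "QMJ Factors"; first_row=19)`. pandas may already parse the DATE column into datetimes, so the `Date(d, dateformat"m/d/Y")` step from the original snippet might need adjusting.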
