Skip to content

fix deflation of epub files with abnormal zip header #315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

fred777
Copy link

@fred777 fred777 commented Sep 19, 2021

This fixes zip deflation bug triggered by some watermarked, DRM-free epub file which I bought from a book store. It seems to have been created with Adobe Indesign 9.2.

Cool Reader standalone (as well as KOReader on my Tolino based on crengine, see related issue koreader/koreader#8222) was unable to display it while other readers like the one from Calibre does not have issues. As soon as I extract all contents and re-compress them into a new epub file, Cool Reader did not have issues anymore, so the culprit lies in the binary zip structure of the epub file.

It turned out that packSize was detected as zero within LVZipDecodeStream::Create while unpSize was detected correctly, so the existing workaround did not kick in => I have separated it into two distinct if clauses.

Unfortunately I can't give you the epub due to the watermarking and copyright :-/

if unpSize is detected correctly but packSize is not,
use packSize provided by central director instead
@virxkane
Copy link
Collaborator

Please check InfoZip AppNote 6.3.9:

   4.4.8 compressed size: (4 bytes)
   4.4.9 uncompressed size: (4 bytes)

       The size of the file compressed (4.4.8) and uncompressed,
       (4.4.9) respectively.  When a decryption header is present it 
       will be placed in front of the file data and the value of the
       compressed file size will include the bytes of the decryption
       header.  If bit 3 of the general purpose bit flag is set, 
       these fields are set to zero in the local header and the 
       correct values are put in the data descriptor and
       in the central directory.
       .....................
        Bit 3: If this bit is set, the fields crc-32, compressed 
               size and uncompressed size are set to zero in the 
               local header.  The correct values are put in the 
               data descriptor immediately following the compressed
               data.  (Note: PKZIP version 2.04g for DOS only 
               recognizes this bit for method 8 compression, newer 
               versions of PKZIP recognize this bit for any 
               compression method.)

Can you check this flag with this files?

lUInt16 Flags; // 6

Perhaps you need to add yet another workaround for this case.

@virxkane
Copy link
Collaborator

virxkane commented Sep 19, 2021

Also, please format the code to match the formatting of this file (lvzipdecodestream.cpp).
What is this block for? The condition was removed, but the block remained. I would like the files to be formatted in the same style.

https://github.com/fred777/coolreader/blob/fdcf0a6f638e1d45818d64b554dd30fc5278935b/crengine/src/lvstream/lvzipdecodestream.cpp#L383-L388

Thanks for the fix.
It is inconvenient that you cannot provide the problematic file. This means you need to check that there are no regressions :).

Will there be no problems with catalogs/directories with these conditions?

@fred777
Copy link
Author

fred777 commented Sep 19, 2021

Regarding the formatting - will fix that as you prefer - I just kept the block for cosmetic reasons, i.e. to indicate where the comment is related to...

Regressions - not sure how to deal with it - if you have a test suite somewhere or can give some files that you suspect to be problematic I can check them.

@fred777
Copy link
Author

fred777 commented Sep 19, 2021

The value of hdr.Flags is 8 for META-INF/content.xml => only bit 4 is set. The full contents of the hdr struct is:

		hdr	@0x7ffe6d2396c0	ZipLocalFileHdr
			Flags	8	lUInt16
			Mark	67324752	lUInt32
			UnpOS	0	lUInt8
			UnpVer	20	lUInt8
			others	@0x7ffe6d2396c8	lUInt16[11]
				[0]	8	lUInt16
				[1]	20672	lUInt16
				[2]	17665	lUInt16
				[3]	0	lUInt16
				[4]	0	lUInt16
				[5]	0	lUInt16
				[6]	0	lUInt16
				[7]	266	lUInt16
				[8]	0	lUInt16
				[9]	22	lUInt16
				[10]	0	lUInt16

...and this is the related zipinfo output:

Central directory entry #122:
---------------------------

  There are an extra 16 bytes preceding this file.

  META-INF/container.xml

  offset of local header from start of archive:   3060847
                                                  (00000000002EB46Fh) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   2.0
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          yes
  file last modified on (DOS date/time):          2014 Aug 1 10:06:00
  32-bit CRC value (hex):                         053eef39
  compressed size:                                188 bytes
  uncompressed size:                              266 bytes
  length of filename:                             22 characters
  length of extra field:                          0 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (00 hex):                none

  There is no file comment.

Not sure if this helps...

@virxkane
Copy link
Collaborator

Regressions - not sure how to deal with it - if you have a test suite somewhere or can give some files that you suspect to be problematic I can check them.

Unfortunately, we do not have any test suite, so you can just check the functionality on other zip files. Although your code looks safe, I think it shouldn't be a problem.
How often have I thought so, and how often have I been wrong :)
But OK, I don't mind. But we'll have to wait for @buggins, I don't have an agreement with him about accepting other people's PR.

@virxkane
Copy link
Collaborator

The value of hdr.Flags is 8 for META-INF/content.xml => only bit 4 is set. The full contents of the hdr struct is:

You can also uncomment this line:

//#define DUMP_ZIP_HEADERS

and see in console some debug messages including contents of the extra fields. Description of this extra field you can find in InfoZip AppNote.

@poire-z
Copy link
Contributor

poire-z commented Sep 19, 2021

Looks quite innofensive to me.
I haven't looked a the code around, but the only situation that could feel it needs a test is with a file with empty content inside the zip (where I expect unpSize to be 0 while packSize might not be).

@fred777
Copy link
Author

fred777 commented Sep 19, 2021

There you go:

2021/09/19 19:19:16.2621 TRACE ZIP entry 'META-INF/container.xml' unpSz=266, pSz=188, m=8, offs=3060847, zAttr=0, flg=0, addL=0, commL=0, dn=0

For your convenience I also tried an empty epub file (aka empty zip file) => no crashes / errors, I guess because it will never try to read META-INF/container.xml in this cause ;-) The first thing it tries to extract from epub zip is mimedata in zip root - which is not present in this case here, so the code will exit early.

@poire-z
Copy link
Contributor

poire-z commented Sep 19, 2021

TRACE ZIP entry 'META-INF/container.xml' unpSz=266, pSz=188

That was with an empty META-INF/container.xml ? So, unpSz is not the size of the file content, but the size of the zip item/entry, which must includes the date and other attributes, so always non-zero, right ?

@virxkane
Copy link
Collaborator

virxkane commented Sep 19, 2021

So, unpSz is not the size of the file content, but the size of the zip item/entry, which must includes the date and other attributes, so always non-zero, right ?

Isn't this information present in localzipheader and extra fields?
This is exactly the size of the file. See appnote infozip, section 4.4.9.

@fred777
Copy link
Author

fred777 commented Sep 19, 2021

No, that was with the original epub file. With empty container.xml the result is

2021/09/19 19:41:33.0281 TRACE ZIP entry 'META-INF/container.xml' unpSz=0, pSz=0, m=0, offs=780464, zAttr=0, flg=81800000, addL=24, commL=0, dn=0
2021/09/19 19:41:33.0281 TRACE   ZIP entry extra data: :55:54:05:00:03:77:75:47:61:75:78:0B:00:01:04:E8:03:00:00:04:64:00:00:00

=> notice that the original epub file does not have extra header data - opposed to the modified file I just created with empty container.xml

@poire-z
Copy link
Contributor

poire-z commented Sep 19, 2021

OK, I don't want to put my head in the specs, so trusting you that your small fix is sane with various combinations of packSize / unpSize when only one is zero, and that it is sane to use only one of srcPackSize / srcUnpSize :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants