Support for "Unicode Path Extra Field (0x7075)" and "Unicode Comment Extra Field (0x6375)" #656
Replies: 2 comments 3 replies
Instead of using those, you can just set the Unicode flag (general purpose bit 11). The documentation for the extra fields you mentioned explicitly suggests doing that unless you need different encodings for the comment and the file name.
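To illustrate what bit 11 looks like in practice, here is a minimal sketch in Python rather than SharpZipLib code. It relies on the standard-library `zipfile` module, which sets the flag on its own when an entry name is not pure ASCII. The helper name `entry_has_utf8_flag` and the constant `UTF8_FLAG` are made up for this example.

```python
import io
import zipfile

# General purpose bit 11: entry name and comment are stored as UTF-8.
UTF8_FLAG = 0x0800


def entry_has_utf8_flag(name: str) -> bool:
    """Write one empty entry to an in-memory ZIP and report whether bit 11 is set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"")
    # Re-open the archive and read the flags back from the central directory.
    with zipfile.ZipFile(buf) as zf:
        return bool(zf.infolist()[0].flag_bits & UTF8_FLAG)
```

With the flag set, a reader that honors it decodes the name as UTF-8 and needs no extra field.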
Are these the same fields mentioned in the (rather old) #33?
Request
I would like you to implement one of the following features.
Background of request
I'm using Windows (NTFS), and I mainly use "WinRAR" for compression and decompression.
I often work with ZIP files whose entry names contain multibyte characters, because many of my file names contain them.
In most cases no problem occurs. In rare cases, however, an entry name is not converted correctly, because NTFS file names are stored as Unicode while the ZIP entry names are encoded as Shift-JIS (for Japanese).
"WinRAR" adds a "Unicode Path Extra Field" when an entry name contains multibyte characters, probably to avoid the problem described above.
Separately, I'm developing software that batch-processes ZIP files, and I'm thinking of using "SharpZipLib" for it.
Unfortunately, the current version of "SharpZipLib" doesn't seem to support the "Unicode Path Extra Field". It does expose the "ITaggedData" interface, though, so I'm trying to implement the field myself.
However, according to the "Unicode Path Extra Field (0x7075)" specification (see section 4.6.9 of "https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT"), the UTF-8 string in this extra field may be used only under one condition: the NameCRC32 value (a 4-byte integer) in the extra field must match the CRC of the byte array of the original entry name (the Name property of the ZipEntry class). The current version of "SharpZipLib" offers no way to get the byte array of an entry name or its CRC.
I can convert the value of the Name property of the ZipEntry class to a byte array using the default Encoding, but the result doesn't always match the original byte array.
For these reasons, I would like a way to get the byte array of the entry name, or its CRC.
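The CRC condition can be sketched as follows. This is an illustration in Python, not SharpZipLib code and not my actual implementation. It assumes the layout given in APPNOTE for this field (a 1-byte version equal to 1, a 4-byte little-endian NameCRC32, then the UTF-8 name) and that the caller already has the raw name bytes from the ZIP header, which is exactly the piece that is missing today. The function name `read_unicode_path` is hypothetical.

```python
import struct
import zlib


def read_unicode_path(extra_payload: bytes, raw_name: bytes):
    """Return the UTF-8 name from a 0x7075 payload, or None if it must be ignored.

    extra_payload is the field data after the 2-byte header ID and 2-byte size.
    raw_name is the entry name exactly as stored in the ZIP header (undecoded).
    """
    if len(extra_payload) < 5:
        return None  # too short to hold the version and the CRC
    version, name_crc = struct.unpack_from("<BI", extra_payload, 0)
    if version != 1:
        return None  # unknown version of the field
    if name_crc != (zlib.crc32(raw_name) & 0xFFFFFFFF):
        return None  # header name changed after the field was written
    try:
        return extra_payload[5:].decode("utf-8")
    except UnicodeDecodeError:
        return None  # malformed UTF-8, fall back to the header name
```

The check only works against the original bytes. Re-encoding the already decoded name with some default encoding can produce different bytes and therefore a different CRC.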
For reference, I am posting the source code of my in-progress implementation of the "Unicode Path Extra Field (0x7075)".
You are free to modify or redistribute this source code.
Note that the "Unicode Comment Extra Field (0x6375)" for entry comments can be implemented in the same way, except that the tag ID is different.
In that case as well, however, it must be possible to obtain the byte array (or its CRC) on which the "Comment" property of the "ZipEntry" class is based.
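To show how similar the two fields are, here is a hedged sketch of a single builder that serializes either one. It is written in Python for illustration and is not SharpZipLib code. It assumes both fields share the same layout (header ID, data size, version 1, CRC-32 of the original bytes, UTF-8 text), and `build_unicode_extra` is a made-up name.

```python
import struct
import zlib

UNICODE_PATH_TAG = 0x7075     # Unicode Path Extra Field
UNICODE_COMMENT_TAG = 0x6375  # Unicode Comment Extra Field


def build_unicode_extra(tag: int, raw_bytes: bytes, text: str) -> bytes:
    """Serialize one extra field: header ID, size, version, CRC-32, UTF-8 text.

    raw_bytes is the name or comment exactly as stored in the ZIP header;
    text is the same string in Unicode.
    """
    crc = zlib.crc32(raw_bytes) & 0xFFFFFFFF
    payload = struct.pack("<BI", 1, crc) + text.encode("utf-8")
    # Header ID and data size are both 2-byte little-endian values.
    return struct.pack("<HH", tag, len(payload)) + payload
```

Only the `tag` argument differs between the path and the comment variant. The CRC input differs too: it is the raw name bytes in one case and the raw comment bytes in the other.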
Source code of "UnicodePathExtraField" class (sample under development)
How to use the "UnicodePathExtraField" class