ToDoAndDone

Christian Schneider edited this page Mar 25, 2018 · 1 revision

List of stable online parsers from EMMA that can be adapted

  • BMP/DIB/ICO/CUR/ANI. For BMP it detects a few valid header types: BITMAPCOREHEADER, BITMAPINFOHEADER, BITMAPV4HEADER, BITMAPV5HEADER - done, see bitmapparser.h
  • TIFF images (with multipage support), RAW photo formats: ARW, RAW, RW2, RWL, RAF, KDC, ORF, PEF, MOS, MEF, ERF
  • JPEG images, with support for embedded thumbnails and MJPEG frames - in progress, see jpegparser.h, #12, #13
  • GIF images
  • DICOM/ACR NEMA 1.0 & 2.0 medical images
  • Photoshop PSD images
  • Truevision TGA images
  • Netpbm image formats: PBM, PGM, PPM, PAM
  • SGI Graphics images: SGI, RGB
  • RIFF files: WAV, AVI, SF2
  • AIFF files
  • Module audio formats: MOD (done, see modparser.cpp), IT/MPTM, S3M, XM
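
As an illustration of the BMP item above, the four header types can be told apart by the size field that opens the DIB header (12, 40, 108 and 124 bytes respectively). The following is only a minimal sketch under that assumption; `detectBmpHeaderType` is a hypothetical helper and is not taken from bitmapparser.h, which may work differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not the code in bitmapparser.h).
// Expects a buffer that starts with a BMP file header and returns the
// name of the DIB header type, or an empty string if nothing matches.
std::string detectBmpHeaderType(const std::uint8_t* data, std::size_t size) {
    // 14-byte BITMAPFILEHEADER plus the 4-byte size field of the DIB header.
    if (data == nullptr || size < 18) return "";
    if (data[0] != 'B' || data[1] != 'M') return "";

    // The DIB header size is stored little-endian at offset 14.
    const std::uint32_t dibSize =
        static_cast<std::uint32_t>(data[14]) |
        (static_cast<std::uint32_t>(data[15]) << 8) |
        (static_cast<std::uint32_t>(data[16]) << 16) |
        (static_cast<std::uint32_t>(data[17]) << 24);

    switch (dibSize) {
        case 12:  return "BITMAPCOREHEADER";
        case 40:  return "BITMAPINFOHEADER";
        case 108: return "BITMAPV4HEADER";
        case 124: return "BITMAPV5HEADER";
        default:  return "";  // unknown or unsupported header size
    }
}
```

A real parser would go on to validate dimensions, bit depth and the pixel data offset before accepting the file; the size check alone is prone to false positives.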

Archiving, parsers and extractors:

  • libarchive - GitHub, website, license: New BSD License, latest activity: February 2018
  • libzpaq
  • 7-Zip code, for about two dozen different archive types.
  • The Unarchiver, for "more formats than I can remember" and "stuff I don't even know what it is", in its author's words.
  • QuickBMS: supports a very large number of file formats, archives, encryptions, compressions (over 500), obfuscations and other algorithms. Currently over 2100 plugins are available to open different archives, mostly from games but also from general-purpose packers such as balz.

Specific compressors and recompressors:

Uncompressed audio:

  • TTA: very fast while maintaining good compression.
  • OptimFROG: stronger but slower.
  • WavPack: the codec used in zipx.
  • FLAC: "the fastest and most widely supported lossless audio codec" according to its authors.
  • ALAC.

JPG images:

  • Lepton: fastest, weakest.
  • PackJPG: medium speed and strength, no support for arithmetic-coded JPEGs.
  • Paq model: strongest, slowest, no support for progressive JPEGs.

MP3 audio:

  • PackMP3: ~15% savings.

MP2 audio:

  • unpackMP2+grzip:m3 (as in fazip): ~19% savings, 2-3x faster than packMP3.

Deflate, bzip and LZW (gif):

  • precomp

zlib:

  • Anti-z

Microsoft algorithms:

  • wimlib (no working recompressor implemented yet, but the code is ready to use)

General purpose codecs:

Asymmetric:

  • LZMA - Deprecated in favour of LZMA2
  • LZMA2
  • Radyx: LZMA2 with a more parallelizable match finder; it can fit a larger dictionary in the same RAM, which helps on machines with ~2-4 GB of memory and with large archives.
  • CSArc: faster than LZMA2, still good compression and good filters too.
  • BSC: A little stronger/slower than LZMA.
  • ZSTD: very efficient on fast compression.
  • LZO: hellishly fast compression.
  • GLZA: good on text, not so much on binary.

Symmetric:

  • MCM: fast CM (context mixing)
  • GRZip: BWT (Burrows-Wheeler transform)
  • ppmd: good and fast on text, not so much on binary
  • paq* family: best ratios, worst speed.

Filters:

Dedupe:

  • Per file: as in WIM or squashfs files.
  • Bulat's rep: Very fast and efficient; memory hungry.
  • zpaq's hash-based dedupe: works best at large distances and can be reused in an incremental run.
  • rzip
  • zstd new implementation

Executables:

  • BCJ2
  • E8E9
  • Dispack
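
To show the idea behind the E8E9 entry above: x86 CALL (0xE8) and JMP (0xE9) instructions carry a 32-bit relative target, so repeated calls to the same function look different at every call site. Converting the relative operand to an absolute one makes them identical and easier to compress. The sketch below is a simplified variant written for illustration; `e8e9Transform` is a hypothetical name, and the real filters in existing compressors add range checks and other heuristics that are omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified E8/E9 transform (illustrative only, not any tool's exact filter).
// encode == true : relative operand -> absolute target
// encode == false: absolute target  -> relative operand
void e8e9Transform(std::vector<std::uint8_t>& buf, bool encode) {
    std::size_t i = 0;
    while (i + 5 <= buf.size()) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            // Read the 32-bit little-endian operand that follows the opcode.
            std::uint32_t v =
                static_cast<std::uint32_t>(buf[i + 1]) |
                (static_cast<std::uint32_t>(buf[i + 2]) << 8) |
                (static_cast<std::uint32_t>(buf[i + 3]) << 16) |
                (static_cast<std::uint32_t>(buf[i + 4]) << 24);

            // The operand is relative to the address of the next instruction.
            const std::uint32_t next = static_cast<std::uint32_t>(i + 5);
            v = encode ? v + next : v - next;

            buf[i + 1] = static_cast<std::uint8_t>(v);
            buf[i + 2] = static_cast<std::uint8_t>(v >> 8);
            buf[i + 3] = static_cast<std::uint8_t>(v >> 16);
            buf[i + 4] = static_cast<std::uint8_t>(v >> 24);

            // Skip the operand so encode and decode visit the same positions.
            i += 5;
        } else {
            ++i;
        }
    }
}
```

Because the operand bytes are always skipped after a match, decoding scans exactly the same opcode positions as encoding, which is what makes the transform reversible.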

Delta:

  • Bulat's
  • Igor's
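
For reference, the general principle of a delta filter is to replace each byte with its difference from the byte one stride earlier, so that slowly varying data (audio samples, image rows, tables of numbers) turns into runs of small values. The sketch below is a generic byte-wise version and makes no claim to match Bulat's or Igor's implementations; `deltaEncode`, `deltaDecode` and the `stride` parameter are names chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic byte-wise delta filter (illustrative; not a specific tool's code).
// stride is the distance in bytes between values that belong together,
// e.g. 1 for 8-bit mono samples or 4 for interleaved 32-bit values.
void deltaEncode(std::vector<std::uint8_t>& buf, std::size_t stride) {
    if (stride == 0) return;
    // Walk backwards so each subtraction still sees the original predecessor.
    for (std::size_t i = buf.size(); i-- > stride; ) {
        buf[i] = static_cast<std::uint8_t>(buf[i] - buf[i - stride]);
    }
}

void deltaDecode(std::vector<std::uint8_t>& buf, std::size_t stride) {
    if (stride == 0) return;
    // Walk forwards so each addition sees the already restored predecessor.
    for (std::size_t i = stride; i < buf.size(); ++i) {
        buf[i] = static_cast<std::uint8_t>(buf[i] + buf[i - stride]);
    }
}
```

The arithmetic wraps modulo 256, so the transform is lossless for any input; practical filters differ mainly in how they detect a suitable stride automatically.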

Text:

  • XWRT
  • FA's lzp and dict

Data rippers (used to identify, for example, a JPG image embedded in an unknown container and process it with a corresponding algorithm):

  • paq8px detection code for uncompressed audio and bitmaps, exe code, gif, jpeg and zlib
  • precomp detection code for gif, jpeg, mp3, pdf bitmaps, deflate and bz2 streams
  • extrJPG (from the author of packJPG)
  • Dragon UnPACKer / Hyper Ripper: 23 formats supported: AVI, BIK, BMP, DDS, EMF, GIF, IFF, JPEG, MIDI, MOV, MPEG Audio, OGG, PNG, TGA, VOC, WAV, WMF, XM and a few more that are prone to false positives. Pretty slow if the container is unknown.
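
As a rough illustration of what a data ripper does, the sketch below scans an arbitrary buffer for the JPEG start-of-image marker (FF D8) followed by the first byte of the next marker (FF) and reports candidate offsets. It is a deliberately naive example; `findJpegCandidates` is a hypothetical function and does not reflect the detection code of paq8px, precomp, extrJPG or Hyper Ripper, all of which validate much more of the stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive signature scan (illustrative only).
// Returns every offset where the bytes FF D8 FF occur, i.e. a JPEG
// start-of-image marker followed by the start of another marker.
std::vector<std::size_t> findJpegCandidates(const std::vector<std::uint8_t>& data) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 3 <= data.size(); ++i) {
        if (data[i] == 0xFF && data[i + 1] == 0xD8 && data[i + 2] == 0xFF) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

Each candidate would still need to be confirmed by parsing the segments that follow, since a three-byte signature can occur by chance inside unrelated data.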