GitHub - serverless-mom/frequency_compressor

Compressing English with a frequency table

This Python tool attempts to compress a large English text (in this example, Moby Dick) using a frequency table of the most common English words.
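The idea can be illustrated with a minimal sketch. This is not the code in compressor.py; the function names, the escape character, and the base-36 code scheme are all assumptions chosen for illustration. Each common word is swapped for a short code derived from its rank in the frequency table, and a word is only replaced when its code is actually shorter.

```python
import re

# Assumption: this character never occurs in the input text.
ESCAPE = "\x00"
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def build_codes(common_words):
    """Map each word to ESCAPE + its rank in base 36 (hypothetical scheme)."""
    codes = {}
    for rank, word in enumerate(common_words):
        digits = ""
        n = rank
        while True:
            digits = ALPHABET[n % 36] + digits
            n //= 36
            if n == 0:
                break
        codes[word] = ESCAPE + digits
    return codes


def compress(text, codes):
    """Replace whole whitespace-delimited words that have a shorter code."""
    parts = re.split(r"(\s+)", text)  # the capture group keeps the whitespace
    out = []
    for part in parts:
        code = codes.get(part)
        out.append(code if code is not None and len(code) < len(part) else part)
    return "".join(out)


def decompress(text, codes):
    """Reverse compress() by mapping codes back to their words."""
    reverse = {code: word for word, code in codes.items()}
    return "".join(reverse.get(part, part) for part in re.split(r"(\s+)", text))
```

In this sketch, words with attached punctuation (such as "the,") are left untouched, which is one reason a simple word table gives only a modest reduction.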

To use:

Save a .txt file in the root directory
Set the textFileName and compressFileName in compressor.py
Run compressor.py

-credits-

This ended up reinventing Huffman coding, with some inefficiencies compared to that technique.

http://en.wikipedia.org/wiki/Huffman_coding
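For comparison, here is a minimal sketch of textbook Huffman coding applied to words. It is not taken from this repository, and the function name huffman_codes is hypothetical. More frequent symbols receive shorter bit strings, and no code is a prefix of another.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}

    # Heap entries are (weight, tiebreaker, {symbol: code_so_far}).
    # The unique tiebreaker keeps the dicts from ever being compared.
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest subtree
        w2, _, right = heapq.heappop(heap)  # second lightest subtree
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]
```

Unlike a fixed table of common English words, the codes here are derived from the frequencies in the text being compressed.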

-status-

Currently the compression/decompression process is non-harmful, but the reduction in file size is small. Because some double-spaced sentence endings are changed along the way, the checksum of the decompressed file isn't identical to that of the source, but the readable text should be identical.
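One way to check this claim is to compare the two files both byte-for-byte and after normalizing spacing. The sketch below assumes the only difference is that runs of spaces collapse to a single space; the exact whitespace change made by the tool is not documented here, and the helper names are hypothetical.

```python
import hashlib
import re


def sha256_hex(text):
    """Checksum of the text as UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize_spaces(text):
    """Collapse runs of two or more spaces into one (assumed difference)."""
    return re.sub(r" {2,}", " ", text)


def same_readable_text(original, restored):
    """True when the texts match once spacing differences are ignored."""
    return normalize_spaces(original) == normalize_spaces(restored)
```

With this check, two files can have different checksums while still counting as the same readable text.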

-to do-

A few changes to bring us closer to generalized Huffman coding for English should improve the compression rate vastly.

Folders and files (1 commit):

Sources
CompressDick.txt
MobyDickshort.txt
compressor.py
deCompressDick.txt
lineprinter.py
readme.md
top1000.txt
