Overview
In the age of big data, there is a constant struggle with regard to storage. For example, a single Whole Genome Sequencing experiment can generate over 100 GB of raw data. For years, computer scientists have attempted to reduce the storage size of files without losing any information. In this assignment, we will be implementing our own file compression tool using the Huffman algorithm.
Task: File Compression and Decompression
In this project, "all" you have to do is implement two programs: compress
and decompress
. The usage should be as follows:
./compress ./decompress
File Compression
The compress
program will take as input an arbitrary file (original_file
) and will use Huffman Compression to create a compressed version of the file (compressed_file
):
Construct a Huffman code for the contents of original_file
Use the constructed Huffman code to encode original_file
Write the results (with a header to allow for decompression) to compressed_file
File Decompression
The decompress
program will then take as input a compressed file created by your compress
program (compressed_file
) and will decompress the file (decompressed_file
):
Construct a Huffman code by reading the header of compressed_file
Use the constructed Huffman code to decode compressed_file
Write the results to decompressed_file
Note that decompressed_file
must be completely identical to original_file
! In other words, calling diff
should return no differences:
diff decompressed_file original_file # this should not output anything
Reference Solution
You have been provided two programs (refcompress
and refdecompress
), which are a reference solution implementation of this Project. You can use them to help guide you. Note that the reference binaries were compiled to run on Ed, so if you attempt to run them on a different architecture, they will most likely not work.
Note: Your compress
is not expected to work with refdecompress
, and your decompress
is not expected to work with refcompress
. The provided reference executable files are a matched pair.
Compiling and Running Your Code
We have provided a Makefile
, but you are free to modify anything in the starter code that you'd like (including the Makefile
and any of the .cpp
or .hpp
files) however you wish (including adding public
methods). You may also create your own source files and header files. Our only requirements are the following:
You should not import or use any libraries or methods that directly perform the compression or decompression for you.
You should not change any of the g++ compile flags (CXXFLAGS
).
Your code needs to be compilable and produce two executables, compress
and decompress
, using the make
command
Your compress
and decompress
programs can be run as described above after make
.