This lesson addresses the problem of analyzing data stored in custom binary formats and describes how to parse such data using Kaitai Struct and Construct. It is structured to give readers an understanding of how custom file types are created and of the operations that computers and their programs perform to translate those file types into a human-readable form.
This lesson is broken into the following sections:
A brief overview of Kaitai Struct, Construct, and the problem that they exist to solve.
Instructions for configuring Python for Construct and for accessing the Kaitai IDE.
A discussion of file types, computer number systems, counting with binary and hexadecimals, and endianness.
A discussion of how computers interpret binary data.
An overview of how to use Kaitai, with descriptions of the IDE, the different sections of the Kaitai editor, and important keywords, using the .gif filetype as an example.
An overview of how to use Construct, with explanations of important keywords, using the .gif filetype as an example.
The creation of example data and an example filetype that will be used in later sections. The example uses Python to generate data in the form of an arbitrary function that is separated into different sections.
A demonstration of how to use Kaitai to extract the data from the example file. It walks through the creation of a .ksy file, the generation of a Python parser from that file, and the use of the parser inside a Python environment to return the example data.
A demonstration of how to use Construct to extract the data from the example file. It walks through the creation and combination of Structs to describe and load the file data before visualizing it in Python.
Additional resources for Kaitai and Construct, such as documentation, forum communities, and mailing lists.
Installation advice and advanced options for users of both Kaitai and Construct.
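As a small preview of the number-system and endianness material listed above, the following sketch shows how the same integer is laid out in little-endian and big-endian byte order. It uses Python's standard `struct` module rather than Kaitai or Construct, and the value and variable names are chosen purely for illustration.

```python
import struct

value = 1  # arbitrary illustrative value, not taken from the lesson's data

# Pack the same 32-bit unsigned integer in both byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())  # 01000000
print(big.hex())     # 00000001

# Reading the little-endian bytes as if they were big-endian
# produces a different number, which is why byte order matters.
misread = int.from_bytes(little, byteorder="big")
print(misread)       # 16777216
```

A parser for a custom binary format has to know which byte order the file's author chose; otherwise every multi-byte field is misread in the way the last line shows.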
By the end of this lesson, you should be able to:
- Understand filetypes and file formats as custom binary data formats
- Have a basic understanding of how to use Kaitai and Construct to parse custom data formats
This lesson requires:
- Python 3.6+
- Construct 2.10+
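One possible way to check an environment against the version requirements listed above is sketched below. The helper name `check_requirements` is hypothetical, and reading `__version__` from the `construct` package is an assumption about how the installed release exposes its version; the code falls back to "unknown" if that attribute is absent.

```python
import sys


def check_requirements():
    """Return (python_ok, construct_version).

    construct_version is None when Construct is not installed.
    """
    python_ok = sys.version_info >= (3, 6)
    try:
        import construct
        # Assumption: the package exposes a __version__ string.
        construct_version = getattr(construct, "__version__", "unknown")
    except ImportError:
        construct_version = None
    return python_ok, construct_version


python_ok, construct_version = check_requirements()
print("Python 3.6+:", python_ok)
print("Construct version:", construct_version or "not installed")
```

If Construct is reported as not installed, it is typically added with `pip install construct`.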