File Format Specification Style
To keep things simple and easy to understand, we prefer that all file format specifications submitted to the HedgeLib Wiki follow the style guidelines below.
Here's a simple example of a file, followed by a specification of the format that file uses.
00 00 00 07 00 00 00 02 00 04 00 01 01 00 00 00
class DATA
{
    uint importantNumber; // This number is used for a very important task.
    uint importantNumber2; // This one is almost even more important!
    ushort importantBooleanCount; // How many importantBooleans there are
    bool[] importantBooleans = new bool[importantBooleanCount];
    // Pad to a 0x4 offset.
}
This specification is quite obviously C#-based pseudo-code, which I've chosen as my preferred method of writing file format specifications for HedgeLib as I feel that:
- With proper knowledge of computer data, it's extremely simple to read and understand.
- The fact that it's C#-based makes it more consistent with the actual code used in HedgeLib, and therefore, again, easier to understand.
- Specifications like these can be written quickly and easily, therefore large amounts of time don't need to be wasted on things such as formatting.
All of these combined are why I've chosen to write all specifications I publish to this Wiki in a similar fashion to the example seen above, and why I recommend you do the same.
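To show how directly a specification in this style maps onto reading code, here is a minimal Python sketch that reads the sample bytes from the first example, field by field, in the order the specification lists them. It assumes the data is big-endian and that each bool takes one byte (the sample bytes suggest both, but the example itself doesn't state it); the function and variable names are made up for illustration and aren't part of HedgeLib.

```python
import struct

# The sample file from the first example on this page.
data = bytes.fromhex("00 00 00 07 00 00 00 02 00 04 00 01 01 00 00 00")

def read_data(buf):
    """Read the DATA class top to bottom (assumes big-endian, 1-byte bools)."""
    offset = 0

    # uint importantNumber; uint importantNumber2; ushort importantBooleanCount;
    important_number, important_number2, boolean_count = struct.unpack_from(">IIH", buf, offset)
    offset += struct.calcsize(">IIH")

    # bool[] importantBooleans = new bool[importantBooleanCount];
    important_booleans = [b != 0 for b in buf[offset:offset + boolean_count]]
    offset += boolean_count

    # Pad to a 0x4 offset: round the offset up to the next multiple of 4.
    offset = (offset + 3) & ~3

    return important_number, important_number2, important_booleans, offset

print(read_data(data))
```

The last value returned is the offset after padding, which for this sample lands exactly at the end of the 16-byte file.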
Let's say we wanted to write a specification to explain something like this:
00 00 00 02 00 00 00 07 00 00 00 01 00 00 00 1A
00 00 00 01
Here, we have a file which starts with a count, then repeats a basic structure of one unsigned integer followed by a boolean that for some reason is padded out to be four bytes (wow, whoever designed this format must be an idiot). How do we represent this as a specification? Well, with something like this:
class DATA
{
    uint importantBlobCount;
    ImportantBlob[] blobs = new ImportantBlob[importantBlobCount];
}
struct ImportantBlob
{
    uint importantNumber;
    bool importantBool; // For some reason padded out to be 4 bytes, as if it's an integer. Yeah, ikr? SO stupid!!
}
We basically just define an "ImportantBlob" data-type in our specification, then we say the file contains an array of them. See? Simple! So at this point, the basic rules are as follows:
- Classes are chunks of data that are, for sure, in the file, in order from top to bottom.
- Structs are data types that can be used in the file.
- No matter what, data is represented via a data type, followed by a name, in order from top to bottom.
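As a rough illustration of these rules, the sketch below reads the ImportantBlob sample bytes shown earlier in Python. The class becomes the top-level read, and the struct becomes a small unit that gets read once per array element. Big-endian byte order is an assumption here (the sample bytes suggest it), and the function name is hypothetical.

```python
import struct

# The sample file from the ImportantBlob example above.
data = bytes.fromhex(
    "00 00 00 02 00 00 00 07 00 00 00 01 00 00 00 1A"
    "00 00 00 01"
)

def read_blobs(buf):
    """Read DATA, then each ImportantBlob in order (assumes big-endian)."""
    # uint importantBlobCount;
    (blob_count,) = struct.unpack_from(">I", buf, 0)
    offset = 4

    blobs = []
    for _ in range(blob_count):
        # uint importantNumber; bool importantBool (padded out to 4 bytes).
        important_number, raw_bool = struct.unpack_from(">II", buf, offset)
        blobs.append((important_number, raw_bool != 0))
        offset += 8

    return blobs

print(read_blobs(data))
```

Since the boolean is padded to four bytes, the sketch simply reads it as an unsigned integer and compares it against zero.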
Alright, I think you get the gist at this point to be honest, but there are a few more things I need to go over, as they sometimes do come up when trying to write a specification.
Almost always when writing a specification you need to define some rules, or other such information. For example:
- Whether the file is typically big or little endian.
- If strings in the file are null-terminated or length-prefixed.
- What encoding is to be used for string/character data.
- Who wrote the specification and when.
- Whether the specification is a work-in-progress or not.
Just to name a few.
Typically, when I write up a specification in the style explained on this page and need to define these things, I do so using a multi-line C/C++/C#-style comment towards the top of the file.
However, while this works fine on specifications I share via simple text files, on the HedgeLib Wiki we can do a lot more than that! That's why I recommend you define this information at the top of the Wiki page for the format in the following fashion:
- Specification Version: 1.0
- Author(s): Radfordhound
- Rules:
  - All data is in big-endian, unless otherwise specified.
  - All strings are length-prefixed, unless otherwise specified.
You can "specify" exceptions to the rules you list (if any) for the specification simply via a comment at the end of the line, for example:
class DATA
{
    uint importantNumber;
    uint importantWeirdoNumber; // In little-endian for some reason.
}
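In reading code, an exception like this usually just means switching byte order for that one field. Here's a small Python sketch of that idea; the sample bytes are made up for illustration (they aren't from a real file), chosen so that both fields hold the value 7.

```python
import struct

# Hypothetical sample: 7 in big-endian, followed by 7 in little-endian.
sample = bytes.fromhex("00 00 00 07 07 00 00 00")

# uint importantNumber; (follows the page-level rule: big-endian)
(important_number,) = struct.unpack_from(">I", sample, 0)

# uint importantWeirdoNumber; // In little-endian for some reason.
(important_weirdo_number,) = struct.unpack_from("<I", sample, 4)

print(important_number, important_weirdo_number)
```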
Apart from the above-mentioned rules/exceptions stating whether strings in the file are typically length-prefixed or null-terminated, you can also state this per-string, like so:
class DATA
{
    string importantString; // Length-prefixed
    string anotherImportantString; // Null-terminated
    // This string must always be written as 0x20 characters; pad it with nulls if the string you want to write is shorter.
    char[0x20] textOfCertainLength;
}
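To make the three string styles concrete, here's a hedged Python sketch of how each might be read. The specification above doesn't say how big the length prefix is or which encoding is used, so the 4-byte big-endian prefix and ASCII encoding below are assumptions, and all the helper names are hypothetical.

```python
import struct

def read_length_prefixed(buf, offset):
    """Assumes a 4-byte big-endian length prefix (not stated in the spec)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length].decode("ascii"), start + length

def read_null_terminated(buf, offset):
    """Read up to (not including) the next null byte, then skip past it."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii"), end + 1

def read_fixed(buf, offset, size=0x20):
    """Read exactly `size` bytes and drop the trailing null padding."""
    raw = buf[offset:offset + size]
    return raw.rstrip(b"\x00").decode("ascii"), offset + size

def write_fixed(text, size=0x20):
    """Write `text` as exactly `size` bytes, adding nulls if it's shorter."""
    raw = text.encode("ascii")
    if len(raw) > size:
        raise ValueError("string does not fit in the fixed-size field")
    return raw.ljust(size, b"\x00")

# Made-up sample data laid out like the DATA class above.
sample = struct.pack(">I", 5) + b"hello" + b"world\x00" + write_fixed("padded")

important_string, pos = read_length_prefixed(sample, 0)
another_important_string, pos = read_null_terminated(sample, pos)
text_of_certain_length, pos = read_fixed(sample, pos)
print(important_string, another_important_string, text_of_certain_length)
```

Each helper returns the new offset alongside the string, so the fields can be read one after another in the same top-to-bottom order as the specification.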
Got a piece of data that's supposed to be the same in every file that follows the format? Simple! Just define the piece of data in the specification similarly to how you would define a variable in C#:
class DATA
{
    char[4] importantSignature = "ISIG";
    uint numberThatForSomeReasonIsAlwaysTheSame = 7;
    uint hexMagic = 0x20;
}
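One way to treat values like these in reading code is as checks: if a file doesn't contain the expected value, it probably isn't in this format. The Python sketch below shows that approach for the DATA class above. It assumes big-endian integers, the sample bytes are constructed for illustration, and `read_header` is a hypothetical name.

```python
import struct

def read_header(buf):
    """Read DATA and verify each constant (assumes big-endian integers)."""
    signature, always_the_same, hex_magic = struct.unpack_from(">4sII", buf, 0)

    # char[4] importantSignature = "ISIG";
    if signature != b"ISIG":
        raise ValueError(f"bad signature: {signature!r}")
    # uint numberThatForSomeReasonIsAlwaysTheSame = 7;
    if always_the_same != 7:
        raise ValueError(f"expected 7, got {always_the_same}")
    # uint hexMagic = 0x20;
    if hex_magic != 0x20:
        raise ValueError(f"expected 0x20, got {hex_magic:#x}")

    return signature, always_the_same, hex_magic

# Made-up sample that matches the specification.
sample = b"ISIG" + struct.pack(">II", 7, 0x20)
print(read_header(sample))
```

Whether a mismatch should be a hard error or just a warning is a design choice for the reader; the specification style itself only says what the value is expected to be.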
Sometimes there's certain data that's only included with a file under a certain condition. For example, let's say you have a model format that can be written as static-terrain-models, or as skeletal-models.
Obviously, if you saved the model as a terrain-model, you wouldn't want to save skeletal data along with it, as there would be no use for such data, and including it would just increase file size and possibly file read/write times!
This is a common case for a lot of formats. Basically, there are times when certain data just isn't included, but of course still needs to be documented. So how do we do that?
Well, as much as I know this triggers the inner-programmer in all of us, I've found the simplest way is this:
class DATA
{
    bool hasSkeletalData; // Whether or not this model contains skeletal information.
    if (hasSkeletalData)
    {
        uint boneCount;
        Bone[] bones = new Bone[boneCount];
    }
}
struct Bone
{
    // Shh just pretend I put stuff here
}
You can also include else statements, else-if statements, etc. if you feel they're needed to explain an optional piece of data in the file.
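The nice thing about writing optional data this way is that reading code ends up with the same shape as the specification. Here's a minimal Python sketch of the model example above. Since the Bone struct is left empty on this page, the sketch pretends each Bone is a single uint ID, which is purely a placeholder; big-endian byte order and a 1-byte bool are also assumptions.

```python
import struct

def read_model(buf):
    """Read DATA; bones are only present when hasSkeletalData is set."""
    # bool hasSkeletalData; (assumed to be a single byte)
    has_skeletal_data = buf[0] != 0
    offset = 1

    bones = []
    if has_skeletal_data:
        # uint boneCount;
        (bone_count,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        for _ in range(bone_count):
            # Placeholder Bone layout: one uint ID (not from the spec).
            (bone_id,) = struct.unpack_from(">I", buf, offset)
            bones.append(bone_id)
            offset += 4

    return has_skeletal_data, bones

# Made-up samples: a terrain model, and a skeletal model with two bones.
terrain = bytes([0])
skeletal = bytes([1]) + struct.pack(">I", 2) + struct.pack(">II", 10, 11)

print(read_model(terrain))
print(read_model(skeletal))
```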
TODO: Link to a full-fledged specification as a more complex example.
Unless otherwise specified, all content contained within the HedgeLib wiki falls under the MIT License, just like HedgeLib itself.