@aliheidary1381

I've also added support for encoding and decoding raw, WAVE (RIFF), and AIFF float formats.

The float encoding feature achieves a compression ratio of nearly 70%, which is better than nothing. No oss-fuzz coverage or tests have been added yet (sorry). The ReplayGain feature should also be expanded in the future to support float samples.

Sorry for the big PR. It's pretty readable, though! I tried to make it as modular and independent as possible, and the documentation is good.

Should this make it into a release, an update to the FLAC RFC could also be considered. The changes are backwards-compatible and are as follows:

  1. When storing float samples, the bps bits in the streaminfo metadata block should be 0b00000.
  2. When storing float samples, the zero-padded bit, originally reserved, after the bit-depth bits of each frame header should be 1.
  3. When storing float samples, the actual data samples stored in each subframe are obtained by doing some bit manipulation before encoding and after decoding. That part is in src/libFLAC/transform_float.c and is necessary to boost the compression ratio (more info in the file comments). Monkey's Audio .ape does something similar, but this one seems to achieve better ratios (~8% improvement). Sorry for the unorthodox type conversions; I'll fix them if I see this being considered for a merge.
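
To make item 1 concrete, here is a minimal Python sketch of how a reader might spot the proposed marker. The STREAMINFO field layout follows RFC 9639; treating a bps field of 0b00000 as "float" is this PR's proposal and not part of the RFC. `parse_streaminfo` and `build_streaminfo` are hypothetical helpers written for illustration, not libFLAC functions.

```python
import struct

# Assumption: 0b00000 marks float samples, as proposed in this PR (not in RFC 9639).
FLOAT_BPS_FIELD = 0b00000


def parse_streaminfo(body):
    """Unpack the 34-byte STREAMINFO body (field layout per RFC 9639)."""
    if len(body) != 34:
        raise ValueError("STREAMINFO body must be 34 bytes")
    min_block, max_block = struct.unpack(">HH", body[:4])
    # Bytes 10..17 pack: sample rate (20 bits), channels-1 (3), bps-1 (5), total samples (36).
    packed = int.from_bytes(body[10:18], "big")
    bps_field = (packed >> 36) & 0x1F
    is_float = bps_field == FLOAT_BPS_FIELD
    return {
        "min_block": min_block,
        "max_block": max_block,
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        # For integer PCM the field stores bps - 1; the float width is not encoded here.
        "bits_per_sample": None if is_float else bps_field + 1,
        "is_float": is_float,
        "total_samples": packed & ((1 << 36) - 1),
    }


def build_streaminfo(sample_rate, channels, bps_field, total_samples=0):
    """Hypothetical helper: build a STREAMINFO body for experiments (zeroed MD5)."""
    packed = (sample_rate << 44) | ((channels - 1) << 41) | (bps_field << 36) | total_samples
    return struct.pack(">HH", 4096, 4096) + bytes(6) + packed.to_bytes(8, "big") + bytes(16)
```

A decoder that follows the current RFC would see a bit depth of 1 here, which is outside the allowed 4 to 32 range, so it has a reason to reject the stream rather than play garbage.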

@ktmf01
Collaborator

ktmf01 commented Aug 12, 2025

Hi,

Many thanks for filing a PR. I haven't tested it, and as you might understand, this will need a lot of testing :) I've tried doing this in the past, but didn't get very far.

Float PCM coding has been an often-requested feature, so this would be a great addition in that regard. I do have some "objections on principle" to floating point audio in FLAC. For example, floating point audio isn't meant for playback. It also breaks forward compatibility: I expect this will confuse a lot of users, as these FLAC files will not play back on current equipment. But then again, that was also the case (and still is for some newly manufactured equipment) for 24-bit audio. I'm not saying this won't be merged, but I'll need to hear a few opinions from other people involved in the FLAC project. Still, I already know a lot of people want this.

I am surprised you didn't need any new coding methods (subframe types or residual coding methods). I think this is really nice.

FLAC is also used for compressing measurement signals (for scientific experiments and such), where this might also be useful. I'm not sure whether your bit manipulation works for that as well.

After I test this myself (which might take a while), I would like to propose this change through various channels with FLAC enthusiasts. However, before that, I would like to make the format changes future-proof. The problem is that the reserved bit after the bit depth bits that you use is the last remaining reserved bit. Using it makes it impossible for the frame header to be extended any further in the future. I would like this bit to have a different function: signal an extension to the frame header with one byte of extra flag bits (or feature bits) and a channel mask.

Anyway, many thanks for taking on this task. Please be patient, as merging might take quite a while, and publishing a release after that a while longer.

@ktmf01
Collaborator

ktmf01 commented Aug 12, 2025

By the way, perhaps we should also define a new STREAMINFO metadata block variant.

@aliheidary1381
Author

aliheidary1381 commented Aug 12, 2025

Thx!

The problem is that the reserved bit after the bit depth bits that you use is the last remaining reserved bit. Using it makes it impossible for the frame header to be extended any further in the future. I would like this bit to have a different function: signal an extension to the frame header with one byte of extra flag bits (or feature bits) and a channel mask.

There's also another reserved bit in the frame header, just after the sync bits at the very beginning (FLAC__FRAME_HEADER_RESERVED_LEN). That one is better suited to your proposed (alternative) functionality, as it is closer to the beginning.
IMHO, the sample type indicator fits better in this place (FLAC__FRAME_HEADER_ZERO_PAD_LEN, right after the sample bps).

By the way, perhaps we should also define a new STREAMINFO metadata block variant.

I don't think it'd be necessary. A bps of 1-3 is forbidden according to the current RFC. Using those values seems like a no-brainer to me. It also tells older decoders to stop playing these files (because of the forbidden value), and requires minimal changes to the standard and other implementations.

p.s. Sorry for closing the PR, wrong button 😅

@ktmf01
Collaborator

ktmf01 commented Aug 12, 2025

There's also another reserved bit in the frame header

No, there isn't. The code still refers to it, but the RFC made it part of the sync code. That is to make it distinguishable from MPEG (see https://lists.xiph.org/pipermail/flac-dev/2008-December/002607.html). So, we only have one bit remaining.

By the way, perhaps we should also define a new STREAMINFO metadata block variant.

I don't think it'd be necessary. A bps of 1-3 is forbidden according to the current RFC. Using those values seems like a no-brainer to me. It also tells older decoders to stop playing these files (because of the forbidden value), and requires minimal changes to the standard and other implementations.

When it is necessary to deviate from the standard, I'd like to do it in a clear and conscious way. So, there'll be more features incorporated in a new streaminfo metadata block, like increasing the max number of samples or perhaps increasing the max samplerate and the total number of samples. Still really niche stuff, so it is really only necessary for exotic stuff. The thing is, like using floats (which is a niche), there are others using FLAC for stuff it wasn't made for, like RF captures.

So, this will take a long time.

@aliheidary1381
Author

No, there isn't.

Oh, OK.
There's also a reserved bit pattern for the frame header's bit depth bits (0b011). How about doing something similar to what's done to the streaminfo bps bits?
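
For reference, the bits under discussion can be pulled out of the first four bytes of a frame header as in the Python sketch below. The layout (15-bit sync code, blocking strategy bit, then 4+4+4+3+1 bits) and the bit depth table follow RFC 9639; `parse_frame_header_start` is a hypothetical name for illustration, not a libFLAC function, and the sketch ignores the rest of the header (coded number, optional fields, CRC-8).

```python
SYNC_CODE = 0b111111111111100  # 15 bits; includes the formerly reserved bit

# Bit depth codes per RFC 9639; 0b011 is the reserved pattern mentioned above.
BIT_DEPTH_CODES = {
    0b000: "from STREAMINFO",
    0b001: 8,
    0b010: 12,
    0b011: "reserved",
    0b100: 16,
    0b101: 20,
    0b110: 24,
    0b111: 32,
}


def parse_frame_header_start(header):
    """Decode the fixed first 4 bytes of a FLAC frame header."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(header[:2], "big")
    if word >> 1 != SYNC_CODE:
        raise ValueError("bad sync code")
    bit_depth_code = (header[3] >> 1) & 0x7
    return {
        "variable_blocksize": bool(word & 1),
        "block_size_code": header[2] >> 4,
        "sample_rate_code": header[2] & 0xF,
        "channel_code": header[3] >> 4,
        "bit_depth_code": bit_depth_code,
        "bit_depth": BIT_DEPTH_CODES[bit_depth_code],
        # The last remaining reserved bit, which the original PR sets to 1 for float.
        "reserved_bit": header[3] & 1,
    }
```

For example, `parse_frame_header_start(bytes([0xFF, 0xF8, 0xC9, 0x18]))` reports a 16-bit frame with the reserved bit clear, while a final byte of `0x16` would carry the reserved 0b011 bit depth pattern.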

@H2Swine
Contributor

H2Swine commented Aug 16, 2025

I completely agree both on

  1. cool! and
  2. think over this more than just once.

There is also the 64-bit float format (in both endiannesses) - not that it is any more urgently needed for listening, but for compatibility (edit: with DAW plugins, for example) it wouldn't be a bad thing for a FLAC plugin to be able to handle "everything" the application could save. (64-bit would likely need more than 5-bit Rice, but who cares if that element also makes today's decoders err out on something they cannot decode.)

Here is a part of a possible solution, if we adopt the following view:

For example, floating point audio isn't meant for playback.

The FLAC format admits sample rate "0" for non-audio. It could also be used for "audio in files that are stored as non-audio", with the FLAC format being used to compress the file and not just the audio stream: interpret the use of "0" as "files you need to treat as a full file", and a player/DAW should then skip it unless it knows what it is doing.

There are a bunch of APPLICATION block types left to be used. You've got ones for foreign metadata already, but here you might consider ones for mandatory file headers/footers (I think a footer would be potentially more crucial for float than for integer, with possible metadata chunks for volume?!), and maybe one for audio properties the decoder needs to reconstruct the files (endianness, and signedness although that isn't applicable for float) - and source file extension (like WavPack does)?

So the workflow of a "player"/DAW would then be:

  • Step 0: Read the "0" sample rate. If it doesn't know there could be something for it, then treat it as per current RFC, refuse to recognize it as anything playable. Otherwise:
  • Step 1: Read and understand those APPLICATION block types and the content - or err out. If it continues:
  • Step 2: Receive the "audio"; flac-the-decoder should then likely know enough to reorder endianness (/signedness) and interleave the channels.
  • Step 3: Treat the bitstream as if it were the original file, from the beginning of the file headers.
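
The decision flow in these steps could be sketched roughly as follows (Python). Everything here is hypothetical: the `Action` names, the `plan_playback` function, and the block names are placeholders, since no such block types have been defined yet.

```python
from enum import Enum


class Action(Enum):
    PLAY = "decode as ordinary audio"
    REFUSE = "not playable (current RFC behaviour for sample rate 0)"
    RECONSTRUCT = "decode, reorder, and treat the result as the original file"


def plan_playback(sample_rate, extension_aware, required_blocks, understood_blocks):
    """Return what a player/DAW would do with a stream, following steps 0-3 above.

    required_blocks: names of the (hypothetical) new metadata blocks in the file.
    understood_blocks: names of the blocks this implementation can interpret.
    """
    if sample_rate != 0:
        return Action.PLAY
    # Step 0: a player unaware of the extension treats rate 0 as non-audio.
    if not extension_aware:
        return Action.REFUSE
    # Step 1: every required block must be understood, otherwise err out.
    unknown = set(required_blocks) - set(understood_blocks)
    if unknown:
        raise ValueError("cannot interpret blocks: %s" % sorted(unknown))
    # Steps 2-3: decode, fix endianness/interleaving, then rebuild the file.
    return Action.RECONSTRUCT
```

The point of the sketch is only that an unaware player and an aware player diverge at step 0, so existing software keeps its current behaviour.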

This could also make it possible to store AIFF(/CAF) with non-integer sampling rates - or object-based audio in the BW64 format (like Monkey's and WavPack do), as there would be no need for FLAC to "understand" the object metadata. Sure, the _en_coder does need to know what a sample and a channel is, and if it is not aware of the input format it could be force-fed with "raw format options" plus OptimFROG-style --headersize and --tailsize. As long as the source file is structured in the order header--audio--footer, with nothing in between the audio, it would be future-proof against new file types? (And past-proof enough to encode .au and A-law and µ-law ... just what the world has not been waiting for.)
To handle headerless input, the _en_coder needs to have a minimal file header and store it. The default for FLAC would be a WAVE header although AIFF can contain more info, like non-integer sample rates - and the level of support would be up to implementing more "raw format" parameters.

One more consideration at file-level: lift the max metadata size limit. Attached art could be much bigger nowadays. Edit: Thanks to a correction that the size limit is per metadata block, it could just be done by distinguishing "first 16 meg of file header" from "next 16 meg of file header" and same for footer.
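
As a rough illustration of the "first 16 meg / next 16 meg" idea, the Python sketch below splits a large payload over several metadata blocks. The 4-byte block header (1-bit last-block flag, 7-bit type, 24-bit length) is the existing FLAC layout, which caps a block body at 2^24 - 1 bytes. The `first_type` and `continued_type` numbers are hypothetical, since no such block types exist, and the last-block flag is simply left at 0.

```python
MAX_BLOCK_BODY = (1 << 24) - 1  # largest value of the 24-bit length field


def split_into_blocks(payload, first_type, continued_type, max_body=MAX_BLOCK_BODY):
    """Split payload into metadata blocks: one 'first' block, then 'continued' blocks."""
    blocks = []
    for index, offset in enumerate(range(0, len(payload), max_body)):
        body = payload[offset:offset + max_body]
        block_type = first_type if index == 0 else continued_type
        # Header: last-block flag (0) + 7-bit type, then a 24-bit big-endian length.
        header = bytes([block_type & 0x7F]) + len(body).to_bytes(3, "big")
        blocks.append(header + body)
    return blocks


def join_blocks(blocks):
    """Inverse of split_into_blocks: concatenate the bodies in order."""
    return b"".join(block[4:] for block in blocks)
```

A reader would reassemble the original header or footer by concatenating the bodies in file order, as `join_blocks` does.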

In frame headers then:

There's also a reserved bit pattern for the frame header's bit depth bits

And more:

  • Bit depth info, as you say, has vacant/forbidden values 1 to 3, although I suggest that "1" should be kept available in case of a future extension to compress 1-bit PWM (there is one such compressor out there, WavPack, which has a permissive license, but I would surely take the polite route and ask first; "1" for "other, please specify" of course also keeps it available).
  • A frame header has three distinct ways to set "zero" sample rate. This idea likely spends two of them to distinguish between "0 that means rewind to beginning and see file metadata" and one for "0 that means this IS STILL NOT audio so don't even try!".
  • Channel bits info has vacant values/patterns (already when the channel mask Vorbis comment was introduced, one such could be used for "see that info or err out").

@aliheidary1381
Author

I don't know how WavPack (or its patented competitor, DST) compresses 1-bit PWM streams. I suppose we also need to reserve some space for defining new subframe types later?

AFAIK, with --keep-foreign-metadata used, FLAC could store BW64 XML chunks (including ADM chna and axml chunks used for immersive/3d moving sound objects) as-is. It can also store channel masks in Vorbis comments, at least as a file header.

As for the frame header, I will use bit_depth=0b000 to indicate that it is only stored in the streaminfo metadata block, with no changes to the frame header definition. The float samples feature is already outside of the streamable subset. There was no need to change anything in frame headers. Though I think a channel mask on frame headers is a very welcome addition for the streamable subset, it is outside the scope of this PR.

I'm more in favour of defining a new STREAMINFO_EXTENSION metadata block type (instead of wrapping it inside an APPLICATION block), with the same workflow you said.
The required fields, using the ideas mentioned here, are:

  1. sample type (float PCM, int PCM, 1-bit PWM, or even A-law, µ-law, etc.). It should also specify the number of bits, endianness, and signedness, ultimately specifying the samples' bit structure.
  2. sample rate, stored as a float (adding support for bigger and non-integer rates).
  3. more than 8 channels. It could also be used when/if support for immersive/3d moving sound objects becomes better. Dolby Atmos, for example, supports up to ~128 channels (and 16 even for its downmixed spatial coded streams).
  4. more reserved bits.

@H2Swine
Contributor

H2Swine commented Aug 19, 2025

Oof, my bad, I wrote APPLICATION when I meant "metadata". I agree, yes: a different (new) type - there are 120 left, so we're not running out soon. STREAMINFO_EXTENSION would inform a "new audio type-aware" decoder/player how to play, and a "this block type-aware but won't play" decoder how to order the decoded bits wrt. endianness, signedness, and interleaving. Then I suggest types to describe and store the file header and footer when those are "mandatory" to get the output right. If full file headers/footers are indeed indispensable, there should be a way to get them past the 16 MiB limit by spanning them over several blocks if necessary; whether the STREAMINFO_EXTENSION contains the info about that or there are just block types for "first" and "continued" header and ditto footer ... there are many ways.

Though I think a channel mask on frame headers is a very welcome addition for the streamable subset, it is outside the scope of this PR.

I tend to disagree (I think the "subset" should not become more permissive, as that would put new demands on finalized software that claims to decode the subset), and maybe it has to be sorted out before committing, but ... it's not needed yet.

@aliheidary1381
Author

aliheidary1381 commented Aug 19, 2025

Well, now that we're modifying the STREAMINFO block, we can also replace the 3-byte block size indicator with an Elias gamma representation...
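
For anyone unfamiliar with it, Elias gamma coding writes a positive integer n as floor(log2 n) zero bits followed by the binary form of n. Below is a minimal Python sketch using bit strings for readability. The function names are illustrative only, and how a length of zero would be mapped (for example by coding n + 1) is an open assumption, since Elias gamma cannot represent 0.

```python
def elias_gamma_encode(n):
    """Return the Elias gamma code of n (n >= 1) as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]  # binary form of n, always starting with '1'
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits):
    """Decode one value from the front of bits; return (value, remaining_bits)."""
    zeros = len(bits) - len(bits.lstrip("0"))  # count of leading zero bits
    end = 2 * zeros + 1
    if end > len(bits):
        raise ValueError("truncated Elias gamma code")
    return int(bits[zeros:end], 2), bits[end:]
```

The code length is 2 * floor(log2 n) + 1 bits, so it beats the fixed 24-bit field only for values below 4096; the 34-byte STREAMINFO length takes 11 bits, while the current maximum of 2^24 - 1 would take 47. In exchange, the size limit disappears.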
