Unused series unloading #110

gshigin · 2025-06-02T16:02:27Z

Loader/Unloader classes for unused series manipulation
Test coverage

# Conflicts: # pp/bare_bones/encoding.h

cherep58 · 2025-06-03T09:00:36Z

pp/bare_bones/bit_sequence.h

@@ -321,6 +325,13 @@ class PROMPP_ATTRIBUTE_PACKED CompactBitSequence : public CompactBitSequenceBase
 public:
  [[nodiscard]] PROMPP_ALWAYS_INLINE BitSequenceReader reader() const noexcept { return {Base::memory_, size_in_bits_}; };

+  PROMPP_ALWAYS_INLINE void trim_lower_bytes(uint32_t bytes_count) {
+    assert(Bit::to_bits(bytes_count) <= Base::size_in_bits());
+    memset(Base::memory_, '\0', bytes_count);


The wrong bytes are zeroed: you should zero the bytes in the region starting at Base::size_in_bytes() - bytes_count.
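To illustrate one reading of this comment: the rest of trim_lower_bytes is not shown in the diff, so the sketch below assumes the method shifts the surviving bytes to the front of the buffer and must then clear the vacated tail. The free function trim_lower_bytes_sketch and its parameters are hypothetical stand-ins, not the class's actual members.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for trim_lower_bytes: drops the first `bytes_count`
// bytes of a buffer that holds `size_in_bytes` bytes.
inline void trim_lower_bytes_sketch(uint8_t* memory, uint32_t size_in_bytes, uint32_t bytes_count) {
  assert(bytes_count <= size_in_bytes);
  const uint32_t remaining = size_in_bytes - bytes_count;
  // Shift the surviving bytes to the start of the buffer (the regions may overlap).
  std::memmove(memory, memory + bytes_count, remaining);
  // Zero the vacated tail, which starts at size_in_bytes - bytes_count.
  std::memset(memory + remaining, 0, bytes_count);
}
```

Under that assumption, zeroing from Base::memory_ (as in the diff) would wipe live data instead of the stale tail.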


cherep58 · 2025-06-03T09:22:41Z

pp/bare_bones/bitset.h

+  template <OutputStream S>
+  PROMPP_ALWAYS_INLINE void write_to(S& stream) const noexcept {
+    const uint32_t data_size_in_bits = size();
+    const uint32_t data_size_in_bytes = ((data_size_in_bits + 63) >> 6) * 8;


You can create a method in bare_bones/bit.h:

template <class T>
PROMPP_ALWAYS_INLINE constexpr T used_bytes(T bits) noexcept { return to_bytes(bits + kByteBits - 1); }

and use it here. You can also refactor similar code in CompactBitSequenceBase::size_in_bytes and in the Loader class.
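A self-contained sketch of the proposed helper, for comparison with the line in the diff. The definitions of kByteBits and to_bytes here are assumptions about bare_bones/bit.h (8 bits per byte, integer division), and word_aligned_bytes is a hypothetical name for the expression currently used in write_to.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for the existing helpers in bare_bones/bit.h.
constexpr uint32_t kByteBits = 8;

template <class T>
constexpr T to_bytes(T bits) noexcept {
  return bits / kByteBits;
}

// Proposed helper: number of bytes needed to hold `bits`, rounded up.
template <class T>
constexpr T used_bytes(T bits) noexcept {
  return to_bytes(bits + kByteBits - 1);
}

// The expression from the diff: rounds up to whole 64-bit words, in bytes.
constexpr uint32_t word_aligned_bytes(uint32_t bits) noexcept {
  return ((bits + 63) >> 6) * 8;
}

static_assert(used_bytes(9u) == 2u, "9 bits occupy 2 bytes");
static_assert(word_aligned_bytes(9u) == 8u, "9 bits occupy one 64-bit word");
```

Note that the two roundings differ: used_bytes rounds up to a byte, while the diff rounds up to 8 bytes. If the on-disk format relies on word alignment, the helper would need a word-sized variant rather than a drop-in replacement.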

cherep58 · 2025-06-03T11:06:59Z

pp/series_data/unloading/unloader.h

+
+  [[nodiscard]] PreparedSequences prepare_sequences() const noexcept {
+    PreparedSequences result{};
+    result.total_bitseqs_size = 0;


It's better to initialize this field in PreparedSequences:

struct PreparedSequences { ... uint32_t total_bitseqs_size{}; }
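A minimal sketch of the effect, assuming a cut-down struct: PreparedSequencesSketch and prepare_sequences_sketch are hypothetical names, and the real struct's other members are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cut-down version of PreparedSequences; other members omitted.
struct PreparedSequencesSketch {
  // Default member initializer: value-initializes the field to 0.
  uint32_t total_bitseqs_size{};
};

inline PreparedSequencesSketch prepare_sequences_sketch() {
  PreparedSequencesSketch result;  // no explicit `= 0` assignment needed
  return result;
}
```

With the initializer in the struct, every construction site gets a zeroed field, so the assignment in prepare_sequences can be dropped.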

cherep58 · 2025-06-03T11:09:07Z

pp/series_data/unloading/unloader.h

+
+    for (const auto ls_id : storage_.unused_series_bitmap) {
+      const auto encoding_type = storage_.open_chunks[ls_id].encoding_state.encoding_type;
+      if (!storage_.open_chunks[ls_id].is_empty() && is_unloadable_encoder(encoding_type)) {


Suggested change

if (!storage_.open_chunks[ls_id].is_empty() && is_unloadable_encoder(encoding_type)) {

if (is_unloadable_encoder(encoding_type)) {

An unloadable chunk can't be empty, so the is_empty() check is redundant.

cherep58 · 2025-06-03T11:13:37Z

pp/series_data/unloading/unloader.h

+    for (const auto ls_id : storage_.unused_series_bitmap) {
+      const auto encoding_type = storage_.open_chunks[ls_id].encoding_state.encoding_type;
+      if (!storage_.open_chunks[ls_id].is_empty() && is_unloadable_encoder(encoding_type)) {
+        result.ls_id_bitmap.resize(ls_id + 1);


You call result.ls_id_bitmap.resize inside the loop, which degrades performance.
Instead, find the max ls_id of an unloadable chunk (use a reverse iterator) and call result.ls_id_bitmap.resize once before the loop.
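A rough sketch of that approach with standard containers. The sorted std::vector<uint32_t> stands in for unused_series_bitmap, std::vector<bool> for BareBones::Bitset, and the predicate for the is_unloadable_encoder check; all of these, along with the name collect_unloadable, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// `unused_ids` must be sorted in ascending order, as iteration over a bitmap would be.
template <class Pred>
std::vector<bool> collect_unloadable(const std::vector<uint32_t>& unused_ids, Pred is_unloadable) {
  std::vector<bool> ls_id_bitmap;
  // Walk from the largest id downwards to find the last unloadable one.
  const auto last = std::find_if(unused_ids.rbegin(), unused_ids.rend(), is_unloadable);
  if (last == unused_ids.rend()) {
    return ls_id_bitmap;  // nothing to unload
  }
  ls_id_bitmap.resize(*last + 1);  // single resize, before the loop
  for (const auto ls_id : unused_ids) {
    if (is_unloadable(ls_id)) {
      ls_id_bitmap[ls_id] = true;
    }
  }
  return ls_id_bitmap;
}
```

The reverse scan usually stops after a few elements, so its cost is small compared with resizing on every iteration.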

cherep58 · 2025-06-03T11:30:06Z

pp/series_data/unloading/unloader.h

+                              const BareBones::Bitset& ls_id_bitmap,
+                              const EncodingChunkLengthSequence& chunk_length_sequence,
+                              const EncodingChunkIDSequence& chunk_id_sequence) noexcept {
+    ls_id_bitmap.write_to(stream);


You can reserve space in the stream: calculate the required memory size in prepare_sequences and call stream.reserve. See how it is implemented in series_data::serialization::Serializer::serialize_impl
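The general pattern, sketched with std::string as a stand-in for the output stream. The function write_all and its input are hypothetical; the actual Serializer::serialize_impl is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Writes all sequences into one buffer, reserving the total size up front
// so that the appends do not trigger repeated reallocations.
inline std::string write_all(const std::vector<std::string>& sequences) {
  std::size_t total_size = 0;
  for (const auto& sequence : sequences) {
    total_size += sequence.size();  // in the PR this would be computed in prepare_sequences
  }

  std::string stream;
  stream.reserve(total_size);
  for (const auto& sequence : sequences) {
    stream.append(sequence);
  }
  return stream;
}
```

Since prepare_sequences already accumulates total_bitseqs_size, the extra cost of computing the reserve size should be small.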

cherep58 · 2025-06-03T15:48:20Z