Add tracing support to the compressor#7385

Draft
connortsui20 wants to merge 4 commits into develop from ct/compress-tracing

@connortsui20 (Contributor) commented Apr 10, 2026

Summary

Tracking issue: #7216

We have very little observability into the compressor. When debugging, we have no good view of which schemes the compressor tries, how accurate its estimates are, how reliable sampling is, what the cascading paths look like, etc.

This change adds tracing support to vortex-compressor. The compressor now emits structured tracing spans and events across four composable RUST_LOG targets (cascade, select, estimate, encode).

The most important event is scheme.compress_result, which reports before/after byte counts and the estimated vs. actual ratio. A new short_circuit { reason = "larger_output" } surfaces the previously silent case where a chosen scheme produced output larger than the canonical input.
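
To make those fields concrete, here is a minimal std-only sketch of the arithmetic behind such an event. The names `CompressResult` and `summarize` are hypothetical and are not taken from vortex-compressor. The acceptance rule (reject only when the output is strictly larger than the input) is an assumption based on the `larger_output` reason described above.

```rust
/// Hypothetical summary of one compression attempt (illustrative names only).
#[derive(Debug)]
pub struct CompressResult {
    pub before_nbytes: u64,
    pub after_nbytes: u64,
    pub estimated_ratio: Option<f64>,
    pub actual_ratio: f64,
    pub accepted: bool,
}

/// Computes the actual ratio and whether the output would be kept.
///
/// Assumption: an attempt is rejected only when the output is strictly larger
/// than the input; how the real compressor treats equal sizes is not stated.
pub fn summarize(
    before_nbytes: u64,
    after_nbytes: u64,
    estimated_ratio: Option<f64>,
) -> CompressResult {
    // Guard against division by zero for an empty output.
    let actual_ratio = if after_nbytes == 0 {
        f64::INFINITY
    } else {
        before_nbytes as f64 / after_nbytes as f64
    };
    CompressResult {
        before_nbytes,
        after_nbytes,
        estimated_ratio,
        actual_ratio,
        accepted: after_nbytes <= before_nbytes,
    }
}
```

For instance, `summarize(2, 640, Some(1.6))` yields an actual ratio far below 1.0 and `accepted == false`, which is the kind of case the new short-circuit reason is meant to expose.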

All instrumentation lives in the orchestration layer; none of the ~23 individual Scheme impls were touched. Field names are stable, so tracing-perfetto, tracing-opentelemetry, and tracing-timing subscribers work with no adapter code, and an integration test pins the event names so that renames are caught.

I still need to figure out how to keep this useful when the compressor generates a huge volume of logs (for example, while generating TPC-H partition files).

Testing

Some basic integration testing for tracing.

@connortsui20 connortsui20 added the changelog/feature A new feature label Apr 10, 2026
@connortsui20 connortsui20 requested review from a10y and robert3005 April 10, 2026 14:24
@connortsui20 (Contributor, Author) commented

Here is an example of the information we can get from the tracing JSON output.

This looks at the trace from generating TPC-H SF1 data, which produces 6837 log records.

RUST_LOG=vortex_compressor::encode=debug \
          cargo run --release --bin data-gen -- \
              --log-format json \
              --opt scale-factor=1.0 \
              --formats vortex \
              tpch \
          2> trace.jsonl

This counts, per scheme, every time the compressor chose a scheme and the final result ended up larger than the original array:

jq -r 'select(.fields.message == "scheme.compress_result"
                and .fields.accepted == false)
         | .fields.scheme' trace.jsonl \
        | sort | uniq -c | sort -rn

 143 vortex.int.for
  27 vortex.bool.constant
  24 vortex.int.dict
  10 vortex.int.bitpacking
   6 vortex.int.constant

We can also see that the estimator for FoR is far off, for reasons not yet understood:

❯ jq -n '
    [inputs
     | select(.fields.message == "scheme.compress_result")
     | .fields as $f
     | select(($f.after_nbytes // 0) > 0
              and ($f.before_nbytes // 0) > 0
              and ($f.estimated_ratio // null) != null)
     | ($f.before_nbytes / $f.after_nbytes) as $actual_ratio
     | {
         scheme:          $f.scheme,
         estimated_ratio: $f.estimated_ratio,
         actual_ratio:    $actual_ratio,
         before_nbytes:   $f.before_nbytes,
         after_nbytes:    $f.after_nbytes,
         accepted:        $f.accepted,
         relative_error:  (($f.estimated_ratio - $actual_ratio) / $actual_ratio)
       }
     | select(.relative_error > 0)
    ]
    | sort_by(.relative_error)
    | reverse
    | .[:15]
  ' trace.jsonl
[
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 1.6,
    "actual_ratio": 0.003125,
    "before_nbytes": 2,
    "after_nbytes": 640,
    "accepted": false,
    "relative_error": 511
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 4.571428571428571,
    "actual_ratio": 0.008928571428571428,
    "before_nbytes": 8,
    "after_nbytes": 896,
    "accepted": false,
    "relative_error": 511
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 4.0,
    "actual_ratio": 0.0078125,
    "before_nbytes": 8,
    "after_nbytes": 1024,
    "accepted": false,
    "relative_error": 511
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 1.6,
    "actual_ratio": 0.003125,
    "before_nbytes": 2,
    "after_nbytes": 640,
    "accepted": false,
    "relative_error": 511
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 8.0,
    "actual_ratio": 0.015625,
    "before_nbytes": 2,
    "after_nbytes": 128,
    "accepted": false,
    "relative_error": 511
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.56,
    "actual_ratio": 0.005,
    "before_nbytes": 16,
    "after_nbytes": 3200,
    "accepted": false,
    "relative_error": 511
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 5.818181818181818,
    "actual_ratio": 0.011363636363636364,
    "before_nbytes": 16,
    "after_nbytes": 1408,
    "accepted": false,
    "relative_error": 511
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.6666666666666665,
    "actual_ratio": 0.005208333333333333,
    "before_nbytes": 2,
    "after_nbytes": 384,
    "accepted": false,
    "relative_error": 510.99999999999994
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 5.333333333333333,
    "actual_ratio": 0.010416666666666666,
    "before_nbytes": 8,
    "after_nbytes": 768,
    "accepted": false,
    "relative_error": 510.99999999999994
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.0,
    "actual_ratio": 0.0078125,
    "before_nbytes": 4,
    "after_nbytes": 512,
    "accepted": false,
    "relative_error": 255
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.0,
    "actual_ratio": 0.0078125,
    "before_nbytes": 4,
    "after_nbytes": 512,
    "accepted": false,
    "relative_error": 255
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.0,
    "actual_ratio": 0.0078125,
    "before_nbytes": 4,
    "after_nbytes": 512,
    "accepted": false,
    "relative_error": 255
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.0,
    "actual_ratio": 0.0078125,
    "before_nbytes": 4,
    "after_nbytes": 512,
    "accepted": false,
    "relative_error": 255
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.0,
    "actual_ratio": 0.0078125,
    "before_nbytes": 4,
    "after_nbytes": 512,
    "accepted": false,
    "relative_error": 255
  },
  {
    "scheme": "vortex.int.for",
    "estimated_ratio": 2.0,
    "actual_ratio": 0.0078125,
    "before_nbytes": 4,
    "after_nbytes": 512,
    "accepted": false,
    "relative_error": 255
  }
]
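
For reference, the relative_error in the query above is (estimated_ratio - actual_ratio) / actual_ratio, where actual_ratio is before_nbytes / after_nbytes. A small Rust sketch of the same arithmetic (the function name is made up for illustration) can be used to double-check individual rows:

```rust
/// Relative error of an estimated compression ratio against the observed one,
/// following the jq expression above. Returns `None` when either byte count is
/// zero, since the query filters those records out.
pub fn relative_error(
    estimated_ratio: f64,
    before_nbytes: u64,
    after_nbytes: u64,
) -> Option<f64> {
    if before_nbytes == 0 || after_nbytes == 0 {
        return None;
    }
    let actual_ratio = before_nbytes as f64 / after_nbytes as f64;
    Some((estimated_ratio - actual_ratio) / actual_ratio)
}
```

With the first row's values (estimated 1.6, 2 bytes before, 640 bytes after) this gives roughly 511, meaning the estimate is about 512 times the observed ratio. The rows with a relative error of 255 correspond to an estimate about 256 times the observed ratio.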

Here is how many bytes each scheme saved across all of the TPC-H SF1 data:

❯ jq -c 'select(.fields.message == "scheme.compress_result"
                and .fields.accepted == true)
         | {scheme: .fields.scheme,
            saved: (.fields.before_nbytes - .fields.after_nbytes)}' \
          trace.jsonl \
        | jq -s 'group_by(.scheme)
             | map({scheme: .[0].scheme,
                    n: length,
                    total_saved: (map(.saved) | add)})
             | sort_by(-.total_saved)'

[
  {
    "scheme": "vortex.string.fsst",
    "n": 1330,
    "total_saved": 357089797
  },
  {
    "scheme": "vortex.int.bitpacking",
    "n": 3670,
    "total_saved": 195278268
  },
  {
    "scheme": "vortex.decimal.byte_parts",
    "n": 247,
    "total_saved": 172207404
  },
  {
    "scheme": "vortex.int.for",
    "n": 762,
    "total_saved": 82129120
  },
  {
    "scheme": "vortex.int.runend",
    "n": 74,
    "total_saved": 25347394
  },
  {
    "scheme": "vortex.int.constant",
    "n": 181,
    "total_saved": 6516208
  },
  {
    "scheme": "vortex.int.sequence",
    "n": 40,
    "total_saved": 5439040
  },
  {
    "scheme": "vortex.string.dict",
    "n": 17,
    "total_saved": 2188574
  },
  {
    "scheme": "vortex.string.constant",
    "n": 39,
    "total_saved": 128547
  },
  {
    "scheme": "vortex.int.rle",
    "n": 71,
    "total_saved": 62026
  },
  {
    "scheme": "vortex.int.dict",
    "n": 69,
    "total_saved": 19213
  },
  {
    "scheme": "vortex.bool.constant",
    "n": 120,
    "total_saved": 3290
  },
  {
    "scheme": "vortex.int.sparse",
    "n": 7,
    "total_saved": 1405
  }
]
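
If the same aggregation is ever wanted inside a Rust tool rather than in jq, it could look roughly like the sketch below. `Record` and `saved_by_scheme` are hypothetical names, and the record is reduced to the three fields the query uses; parsing the JSON lines is left out.

```rust
use std::collections::BTreeMap;

/// One accepted `scheme.compress_result` record, reduced to the fields used here.
pub struct Record<'a> {
    pub scheme: &'a str,
    pub before_nbytes: i64,
    pub after_nbytes: i64,
}

/// Groups records by scheme and returns `(scheme, n, total_saved)` rows sorted
/// by `total_saved` descending, like the `group_by` / `sort_by` pipeline above.
pub fn saved_by_scheme<'a>(records: &[Record<'a>]) -> Vec<(&'a str, usize, i64)> {
    let mut groups: BTreeMap<&'a str, (usize, i64)> = BTreeMap::new();
    for r in records {
        let entry = groups.entry(r.scheme).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += r.before_nbytes - r.after_nbytes;
    }
    let mut rows: Vec<(&'a str, usize, i64)> = groups
        .into_iter()
        .map(|(scheme, (n, saved))| (scheme, n, saved))
        .collect();
    // Stable sort: schemes with equal savings stay in alphabetical order.
    rows.sort_by(|a, b| b.2.cmp(&a.2));
    rows
}
```

Signed integers are used for the byte counts so that a record whose output grew contributes a negative saving instead of underflowing.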

connortsui20 added a commit that referenced this pull request Apr 13, 2026
## Summary

Tracking issue: #7216

Makes the compressor types more robust (removes the possibility of
invalid state), which also makes adding tracing easier (draft at
#7385).

## API Changes

Changes some types:

```rust
/// Closure type for [`DeferredEstimate::Callback`].
///
/// The compressor calls this with the same arguments it would pass to sampling. The closure must
/// resolve directly to a terminal [`EstimateVerdict`].
#[rustfmt::skip]
pub type EstimateFn = dyn FnOnce(
        &CascadingCompressor,
        &mut ArrayAndStats,
        CompressorContext,
    ) -> VortexResult<EstimateVerdict>
    + Send
    + Sync;

/// The result of a [`Scheme`]'s compression ratio estimation.
///
/// This type is returned by [`Scheme::expected_compression_ratio`] to tell the compressor how
/// promising this scheme is for a given array without performing any expensive work.
///
/// [`CompressionEstimate::Verdict`] means the scheme already knows the terminal answer.
/// [`CompressionEstimate::Deferred`] means the compressor must do extra work before the scheme can
/// produce a terminal answer.
#[derive(Debug)]
pub enum CompressionEstimate {
    /// The scheme already knows the terminal estimation verdict.
    Verdict(EstimateVerdict),

    /// The compressor must perform deferred work to resolve the terminal estimation verdict.
    Deferred(DeferredEstimate),
}

/// The terminal answer to a compression estimate request.
#[derive(Debug)]
pub enum EstimateVerdict {
    /// Do not use this scheme for this array.
    Skip,

    /// Always use this scheme, as it is definitively the best choice.
    ///
    /// Some examples include constant detection, decimal byte parts, and temporal decomposition.
    ///
    /// The compressor will select this scheme immediately without evaluating further candidates.
    /// Schemes that return `AlwaysUse` must be mutually exclusive per canonical type (enforced by
    /// [`Scheme::matches`]), otherwise the winner depends silently on registration order.
    ///
    /// [`Scheme::matches`]: crate::scheme::Scheme::matches
    AlwaysUse,

    /// The estimated compression ratio. This must be greater than `1.0` to be considered by the
    /// compressor, otherwise it is worse than the canonical encoding.
    Ratio(f64),
}

/// Deferred work that can resolve to a terminal [`EstimateVerdict`].
pub enum DeferredEstimate {
    /// The scheme cannot cheaply estimate its ratio, so the compressor should compress a small
    /// sample to determine effectiveness.
    Sample,

    /// A fallible estimation requiring a custom expensive computation.
    ///
    /// Use this only when the scheme needs to perform trial encoding or other costly checks to
    /// determine its compression ratio. The callback returns an [`EstimateVerdict`] directly, so
    /// it cannot request more sampling or another deferred callback.
    Callback(Box<EstimateFn>),
}
```

This will also make some changes that we want to make in the future
easier (tracing, better decisions about what to try, etc.).
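
As a rough illustration of how these types fit together, the sketch below uses simplified stand-ins: the callback takes no arguments, nothing returns `VortexResult`, and the verdict derives `Clone`/`Copy`/`PartialEq` for convenience. `resolve` and `select` are hypothetical helpers, not the compressor's actual functions. The selection rules follow the doc comments above (skip `Skip`, take `AlwaysUse` immediately, otherwise prefer the highest ratio above `1.0`).

```rust
/// Simplified stand-in for the verdict type above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EstimateVerdict {
    Skip,
    AlwaysUse,
    Ratio(f64),
}

/// Simplified stand-in: the real callback receives compressor arguments.
pub enum DeferredEstimate {
    Sample,
    Callback(Box<dyn FnOnce() -> EstimateVerdict + Send + Sync>),
}

pub enum CompressionEstimate {
    Verdict(EstimateVerdict),
    Deferred(DeferredEstimate),
}

/// Resolves an estimate to a terminal verdict. `sample` is a hypothetical hook
/// standing in for compressing a small sample and measuring its ratio.
pub fn resolve(estimate: CompressionEstimate, sample: impl FnOnce() -> f64) -> EstimateVerdict {
    match estimate {
        CompressionEstimate::Verdict(v) => v,
        CompressionEstimate::Deferred(DeferredEstimate::Sample) => {
            EstimateVerdict::Ratio(sample())
        }
        // The callback returns a verdict, so by construction it cannot ask
        // for more sampling or another deferred step.
        CompressionEstimate::Deferred(DeferredEstimate::Callback(f)) => f(),
    }
}

/// Picks a winner from resolved verdicts, given in registration order.
pub fn select<'a>(verdicts: &[(&'a str, EstimateVerdict)]) -> Option<&'a str> {
    let mut best: Option<(&'a str, f64)> = None;
    for &(name, verdict) in verdicts {
        match verdict {
            EstimateVerdict::Skip => {}
            // Selected immediately, without looking at later candidates.
            EstimateVerdict::AlwaysUse => return Some(name),
            // Ratios at or below 1.0 are no better than the canonical encoding.
            EstimateVerdict::Ratio(r) if r > 1.0 => {
                if best.map_or(true, |(_, b)| r > b) {
                    best = Some((name, r));
                }
            }
            EstimateVerdict::Ratio(_) => {}
        }
    }
    best.map(|(name, _)| name)
}
```

Because `resolve` always ends in an `EstimateVerdict`, there is no state in which a deferred estimate can defer again, which is the "invalid state" the type split rules out.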

## Testing

Some new tests

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 force-pushed the ct/compress-tracing branch 4 times, most recently from 56bdc36 to 414149e April 13, 2026 20:58
@connortsui20 connortsui20 marked this pull request as ready for review April 13, 2026 21:24
@connortsui20 connortsui20 marked this pull request as draft April 14, 2026 01:40
@robert3005 (Contributor) left a comment

I think this is reasonable but will wait for it to not be a draft

claude and others added 4 commits April 14, 2026 11:20
Instrument the cascading compressor with composable `tracing` spans and
events so users can see what the compressor is doing, compare estimated
and actual compression ratios, time individual phases, and surface
previously-silent "compressed but the output grew" decisions.

Four targets let users select one aspect at a time via `RUST_LOG`:
- `vortex_compressor::cascade` — top-level + `compress_child` spans
- `vortex_compressor::select`  — scheme eligibility, evaluation, winner,
  and short-circuit reasons
- `vortex_compressor::estimate` — sampling span and sample.collected /
  sample.result events
- `vortex_compressor::encode`  — per-scheme encode span and the
  scheme.compress_result event with estimated vs actual ratio + accepted

Spans are at `trace` level so `tracing-perfetto` / `tracing-timing` /
`tracing-opentelemetry` only materialize them on demand. Events are at
`debug` for outcomes so `RUST_LOG=vortex_compressor::encode=debug`
produces one readable summary line per leaf.

New `tests/tracing.rs` uses a custom capture layer (not `TestWriter`) to
pin the names and stable fields of the emitted events so downstream
observability tooling does not break under rename.

Instrumentation lives entirely in the orchestration layer
(compressor.rs + estimate.rs); individual scheme implementations are
untouched. The existing unstructured calls in estimate.rs and the stale
commented-out line in compressor.rs are removed.

A new `# Observability` section in the crate docs carries the full
target / span / event reference with `RUST_LOG` recipes.

Signed-off-by: Claude <noreply@anthropic.com>

Instrument `BtrBlocksCompressor::compress` with a
`#[tracing::instrument]` on the `vortex_compressor::cascade` target so
downstream trace consumers (tracing-perfetto, tracing-opentelemetry)
get a distinct BtrBlocks entry frame nested above the generic
`CascadingCompressor::compress` pipeline span.

Also delete the stray `tracing::debug!("zigzag output: {}", ...)` line
in `schemes/integer.rs` — it predates the centralized
`scheme.compress_result` event and is now redundant.

Add a short `# Observability` section to the crate docs pointing at
`vortex_compressor`'s full reference, plus one recipe.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
3 participants