This is a preview of the changelog of the next release. If this branch is not up to date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog. Note that this might contain changes that are on main but not yet released.
Changelog: 0.11.1 (2026-03-11)
Features
Bug Fixes
Pull request overview
Adds a new MutationProfile filter expression to SILO (per #1179) and introduces an optimized compilation path for large N-Of expressions over sequence positions to avoid repeated vertical-index lookups. This enables efficient “distance to profile” queries that expand into many per-position symbol conditions.
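The "distance to profile" semantics can be illustrated with a brute-force reference sketch. It assumes that a sequence is within distance d of a profile exactly when it is not the case that at least d + 1 positions mismatch, which is how a Not(N-Of(...)) rewrite would express it. The function names are hypothetical and not taken from SILO, and ambiguity codes are compared literally here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not SILO code): counts the positions at which
// `sequence` differs from `profile`. Symbols are compared literally.
std::size_t countMismatches(const std::string& sequence, const std::string& profile) {
    assert(sequence.size() == profile.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < sequence.size(); ++i) {
        if (sequence[i] != profile[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}

// "At least n of the per-position mismatch conditions hold" (the N-Of part).
bool atLeastNMismatches(const std::string& sequence, const std::string& profile, std::size_t n) {
    return countMismatches(sequence, profile) >= n;
}

// Within distance d  <=>  NOT (at least d + 1 mismatches).
bool withinDistance(const std::string& sequence, const std::string& profile, std::size_t distance) {
    return !atLeastNMismatches(sequence, profile, distance + 1);
}
```

With the profile "ACGT", the sequence "ACGA" has one mismatch, so it is accepted at distance 1 and rejected at distance 0.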
Changes:
- Introduces NucleotideMutationProfile / AminoAcidMutationProfile expression that rewrites into an N-Of / Not form.
- Adds a single-pass vertical-index DP helper (VerticalSequenceIndex::buildNOfDpTable) and a new NOf compile fast-path for SymbolInSet children on the same sequence.
- Adds docs, integration tests, and performance benchmarks/utilities for profiling the new behavior.
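The threshold DP behind an N-Of evaluation can be sketched as follows. This is a minimal illustration and not the buildNOfDpTable implementation from this PR: it uses a 64-bit mask with one bit per row as a stand-in for a compressed bitmap, and the names RowMask and atLeastNOf are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per row (sequence); a stand-in for a compressed bitmap, limited to 64 rows.
using RowMask = std::uint64_t;

// Returns the rows for which at least `n` of the `children` masks have their bit set.
// Each child is visited exactly once (a single pass over the children).
RowMask atLeastNOf(const std::vector<RowMask>& children, std::size_t n, RowMask all_rows) {
    if (n == 0) {
        return all_rows;
    }
    if (n > children.size()) {
        return 0;
    }
    // dp[k] holds the rows that match at least k of the children seen so far.
    std::vector<RowMask> dp(n + 1, 0);
    dp[0] = all_rows;
    for (RowMask child : children) {
        assert((child & ~all_rows) == 0);
        // Descending k so that dp[k - 1] still refers to the state before this child.
        for (std::size_t k = n; k >= 1; --k) {
            dp[k] |= dp[k - 1] & child;
        }
    }
    return dp[n];
}
```

Only n + 1 masks are kept regardless of how many children there are, which is the property that makes a single forward scan over per-position bitmaps attractive for very large N-Of expressions.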
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/silo/query_engine/filter/expressions/mutation_profile.h | Defines the new MutationProfile expression template and JSON parsing hook. |
| src/silo/query_engine/filter/expressions/mutation_profile.cpp | Implements profile construction (querySequence / sequenceId / mutations) and rewrite to Not(N-Of(...)). |
| src/silo/query_engine/filter/expressions/expression.cpp | Registers new expression types: NucleotideMutationProfile and AminoAcidMutationProfile. |
| src/silo/query_engine/filter/expressions/symbol_in_set.h | Adds getters needed for the new NOf compilation optimization. |
| src/silo/query_engine/filter/expressions/nof.cpp | Adds optimized compile path that batches vertical-index access and inlines the threshold DP. |
| src/silo/storage/column/vertical_sequence_index.h | Declares PositionQuery and buildNOfDpTable DP helper. |
| src/silo/storage/column/vertical_sequence_index.cpp | Implements buildNOfDpTable with a forward scan over vertical_bitmaps. |
| src/silo/test/mutation_profile.test.cpp | Adds integration tests for NucleotideMutationProfile behavior and error cases. |
| documentation/query_documentation.md | Documents NucleotideMutationProfile and AminoAcidMutationProfile JSON formats and semantics. |
| performance/sequence_generator.h | Adds shared benchmark utilities for generating synthetic sequences/reads and initializing DBs. |
| performance/nof_sequence_filter.cpp | Adds a benchmark targeting the large-N-Of optimization via MutationProfile. |
| performance/many_short_read_filters.cpp | Refactors to reuse sequence_generator.h. |
| performance/CMakeLists.txt | Ensures benchmarks can include performance headers and adds the new benchmark target. |
}

template <typename SymbolType>
std::unique_ptr<Expression> MutationProfile<SymbolType>::rewrite(
After this, we get a gigantic log message:
[2026-03-18 08:52:33.409] [logger] [debug] [database.cpp:531] Request Id [7abbed49-a609-4294-9941-4c4173da3621] - Filter after rewrite for partition 0: !([-2147483647-of:(main:symbol at position 1 in {-, C, G, T, Y, S, K, B}), (main:symbol at position 2 in {-, A, C, G, R, S, M, V}), (... goes on a bit for every position in the genome)
Does it make sense to do something about that?
"filterExpression": {
  "type": "NucleotideMutationProfile",
  "distance": 0,
  "mutations": [{"position": 1, "symbol": "C"}]
Let's add a test where a mutation is out of bounds?
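For illustration, the kind of check such a test would exercise might look like the sketch below. It assumes 1-based positions, as in the request above. The helper names and the error text are hypothetical and do not reflect the validation code or messages in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical validation helper: rejects positions outside 1..reference_length.
void checkMutationPosition(std::size_t position, std::size_t reference_length) {
    assert(reference_length > 0);
    if (position < 1 || position > reference_length) {
        throw std::invalid_argument(
            "mutation position " + std::to_string(position) +
            " is outside 1.." + std::to_string(reference_length)
        );
    }
}

// Returns true when the check rejects the position, which is what a test would assert.
bool isRejected(std::size_t position, std::size_t reference_length) {
    try {
        checkMutationPosition(position, reference_length);
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}
```

A test along these lines would cover position 0, the last valid position, and one past the end of the reference.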
resolves #1179
Summary
This adds a MutationProfile filter to SILO. The behavior of this filter is outlined in #1179.
PR Checklist