This file contains a high-level description of changes that were merged into the Inseq main branch since the last release. Refer to the releases page for an exhaustive overview of changes introduced at each release.
- Added `treescope` for interactive model and tensor visualization (#283).
- New `treescope`-powered methods `FeatureAttributionOutput.show_granular` and `FeatureAttributionSequenceOutput.show_tokens` for interactive visualization of multidimensional attribution tensors and token highlights (#283).
- Added new models `DbrxForCausalLM`, `OlmoForCausalLM`, `Phi3ForCausalLM`, `Qwen2MoeForCausalLM`, `Gemma2ForCausalLM`, `OlmoeForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM` to the model config.
- Added `rescale_attributions` to Inseq CLI commands to enable `rescale=True` (#280).
- Rows and columns in the visualization now have indices alongside tokens to facilitate index-based slicing, aggregation and alignment (#282).
- New parameter `clean_special_chars` in `model.attribute` to automatically clean special characters from output tokens, such as `▁` and `Ġ` (#289).
- Added a `scores_precision` parameter to `FeatureAttributionOutput.save` to enable efficient saving in `float16` and `float8` formats. This is useful for saving large attribution outputs in a more memory-efficient way (#273).
```python
import inseq

attrib_model = inseq.load_model("gpt2", "attention")
out = attrib_model.attribute("Hello world", generation_kwargs={"max_new_tokens": 100})

# Previous usage, memory inefficient
out.save("output.json")

# Memory-efficient saving
out.save("output_fp16.json", scores_precision="float16")  # or "float8"

# Scores are automatically converted back to float32 upon loading
out_loaded = inseq.FeatureAttributionOutput.load("output_fp16.json")
```
- A new `SliceAggregator` (`"slices"`) is added to allow for slicing source (in encoder-decoder) or target (in decoder-only) tokens from a `FeatureAttributionSequenceOutput` object, using the same syntax as `ContiguousSpanAggregator`. The `__getitem__` method of `FeatureAttributionSequenceOutput` is a shortcut for this, allowing slicing with `[start:stop]` syntax (#282).
```python
import inseq
from inseq.data.aggregator import SliceAggregator

attrib_model = inseq.load_model("gpt2", "attention")

input_prompt = """Instruction: Summarize this article.
Input_text: In a quiet village nestled between rolling hills, an ancient tree whispered secrets to those who listened. One night, a curious child named Elara leaned close and heard tales of hidden treasures beneath the roots. As dawn broke, she unearthed a shimmering box, unlocking a forgotten world of wonder and magic.
Summary:"""

full_output_prompt = input_prompt + " Elara discovers a shimmering box under an ancient tree, unlocking a world of magic."

out = attrib_model.attribute(input_prompt, full_output_prompt)[0]

# These are all equivalent ways to slice only the input text contents
out_sliced = out.aggregate(SliceAggregator, target_spans=(13, 73))
out_sliced = out.aggregate("slices", target_spans=(13, 73))
out_sliced = out[13:73]
```
- A new `StringSplitAggregator` (`"split"`) is added to support more complex aggregation procedures beyond simple subword merging in `FeatureAttributionSequenceOutput` objects. More specifically, splitting supports regular expressions to match split points even when these are (potentially overlapping) parts of existing tokens. The `split_mode` parameter can be set to `"single"` (default) to keep tokens containing matched split points separate while aggregating the rest, or to `"start"` or `"end"` to concatenate them to the preceding/following aggregated token sequence (#290).
```python
# Split on newlines. Default split_mode = "single".
out.aggregate("split", split_pattern="\n").aggregate("sum").show(do_aggregation=False)

# Split on whitespace-separated words of length 5.
# Note: this works if clean_special_chars=True is used; otherwise the
# split_pattern should be adjusted to match special characters like "Ġ" or "▁".
out.aggregate("split", split_pattern=r"\s(\w{5})(?=\s)", split_mode="end")
```
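Outside of Inseq, the behavior of a pattern like the one above can be checked with Python's `re` module (the sample sentence below is purely illustrative): the lookahead leaves the trailing whitespace unconsumed, so consecutive five-letter words all match.

```python
import re

text = "Hello brave world these words vary in size"
# \s consumes the leading whitespace, \w{5} captures a five-letter word, and
# the (?=\s) lookahead requires trailing whitespace without consuming it.
# "Hello" is skipped because it has no preceding whitespace.
matches = re.findall(r"\s(\w{5})(?=\s)", text)
print(matches)  # ['brave', 'world', 'these', 'words']
```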
- The `__sub__` method in `FeatureAttributionSequenceOutput` is now used as a shortcut for `PairAggregator` (#282).
```python
import inseq

attrib_model = inseq.load_model("gpt2", "saliency")

out_male = attrib_model.attribute(
    "The director went home because",
    "The director went home because he was tired",
    step_scores=["probability"],
)[0]
out_female = attrib_model.attribute(
    "The director went home because",
    "The director went home because she was tired",
    step_scores=["probability"],
)[0]

(out_male - out_female).show()
```
- Fixed an issue in the attention implementation from #268 where non-terminal positions in the tensor were set to `nan` if they were zeros (#269).
- Fixed the pad token in cases where it is not specified by default in the loaded model (e.g. for Qwen models) (#269).
- Fixed the bug reported in #266 that made `value_zeroing` unusable for SDPA attention. This enables using the method on models with SDPA attention as default (e.g. `GemmaForCausalLM`) without passing `model_kwargs={'attn_implementation': 'eager'}` (#267).
- Fixed multi-device support and duplicate BOS for chat template models (#280).
- The directions of generated/attributed tokens were clarified in the visualization using arrows instead of x/y (#282).
- Fixed support for multi-EOS tokens (e.g. LLaMA 3.2, see #287).
- Fixed copying configuration parameters to aggregated `FeatureAttributionSequenceOutput` objects (#292).
- Updated the tutorial with `treescope` usage examples.
- Dropped support for Python 3.9. Supported versions are now Python >= 3.10, <= 3.12 (#283).