This file contains a high-level description of changes that were merged into the Inseq main branch since the last release. Refer to the releases page for an exhaustive overview of changes introduced at each release.
- Added `treescope` for interactive model and tensor visualization (#283).
- New `treescope`-powered methods `FeatureAttributionOutput.show_granular` and `FeatureAttributionSequenceOutput.show_tokens` for interactive visualization of multidimensional attribution tensors and token highlights (#283).
- Added new models `DbrxForCausalLM`, `OlmoForCausalLM`, `Phi3ForCausalLM`, `Qwen2MoeForCausalLM`, `Gemma2ForCausalLM`, `OlmoeForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM` to the model config.
- Added `rescale_attributions` to Inseq CLI commands to enable `rescale=True` (#280).
- Rows and columns in the visualization now have indices alongside tokens to facilitate index-based slicing, aggregation and alignment (#282).
- New parameter `clean_special_chars` in `model.attribute` to automatically clean special characters from output tokens, such as `▁` and `Ġ` (#289).
- Added a `scores_precision` parameter to `FeatureAttributionOutput.save` to enable efficient saving in `float16` and `float8` formats. This is useful for saving large attribution outputs in a more memory-efficient way (#273).
```python
import inseq

attrib_model = inseq.load_model("gpt2", "attention")
out = attrib_model.attribute("Hello world", generation_kwargs={"max_new_tokens": 100})

# Previous usage, memory inefficient
out.save("output.json")

# Memory-efficient saving
out.save("output_fp16.json", scores_precision="float16")  # or "float8"

# Scores are automatically converted back to float32 upon loading
out_loaded = inseq.FeatureAttributionOutput.load("output_fp16.json")
```
- A new `SliceAggregator` (`"slices"`) is added to allow for slicing source (in encoder-decoder) or target (in decoder-only) tokens from a `FeatureAttributionSequenceOutput` object, using the same syntax as `ContiguousSpanAggregator`. The `__getitem__` method of `FeatureAttributionSequenceOutput` is a shortcut for this, allowing slicing with `[start:stop]` syntax (#282).
```python
import inseq
from inseq.data.aggregator import SliceAggregator

attrib_model = inseq.load_model("gpt2", "attention")

input_prompt = """Instruction: Summarize this article.
Input_text: In a quiet village nestled between rolling hills, an ancient tree whispered secrets to those who listened. One night, a curious child named Elara leaned close and heard tales of hidden treasures beneath the roots. As dawn broke, she unearthed a shimmering box, unlocking a forgotten world of wonder and magic.
Summary:"""

full_output_prompt = input_prompt + " Elara discovers a shimmering box under an ancient tree, unlocking a world of magic."

out = attrib_model.attribute(input_prompt, full_output_prompt)[0]

# These are all equivalent ways to slice only the input text contents
out_sliced = out.aggregate(SliceAggregator, target_spans=(13, 73))
out_sliced = out.aggregate("slices", target_spans=(13, 73))
out_sliced = out[13:73]
```
- A new `StringSplitAggregator` (`"split"`) is added to support more complex aggregation procedures beyond simple subword merging in `FeatureAttributionSequenceOutput` objects. More specifically, splitting supports regular expressions to match split points even when these are (potentially overlapping) parts of existing tokens. The `split_mode` parameter can be set to `"single"` (default) to keep tokens containing matched split points separate while aggregating the rest, or to `"start"` or `"end"` to concatenate them to the preceding/following aggregated token sequence (#290).
```python
# Split on newlines. Default split_mode = "single".
out.aggregate("split", split_pattern="\n").aggregate("sum").show(do_aggregation=False)

# Split on whitespace-separated words of length 5.
# Note: this works if clean_special_chars=True is used; otherwise the
# split_pattern should be adjusted to match special characters like "Ġ" or "▁".
out.aggregate("split", split_pattern=r"\s(\w{5})(?=\s)", split_mode="end")
```
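Outside of Inseq, the behavior of a pattern like the one above can be checked with Python's `re` module (the sample sentence below is purely illustrative): the lookahead leaves the trailing whitespace unconsumed, so consecutive five-letter words all match.

```python
import re

text = "Hello brave world these words vary in size"
# \s consumes the leading whitespace, \w{5} captures a five-letter word, and
# the (?=\s) lookahead requires trailing whitespace without consuming it.
# "Hello" is skipped because it has no preceding whitespace.
matches = re.findall(r"\s(\w{5})(?=\s)", text)
print(matches)  # ['brave', 'world', 'these', 'words']
```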
- The `__sub__` method in `FeatureAttributionSequenceOutput` is now used as a shortcut for `PairAggregator` (#282).
```python
import inseq

attrib_model = inseq.load_model("gpt2", "saliency")

out_male = attrib_model.attribute(
    "The director went home because",
    "The director went home because he was tired",
    step_scores=["probability"],
)[0]
out_female = attrib_model.attribute(
    "The director went home because",
    "The director went home because she was tired",
    step_scores=["probability"],
)[0]

(out_male - out_female).show()
```
- Fixed an issue in the attention implementation from #268 where non-terminal positions in the tensor were set to `nan` if they were zeros (#269).
- Fixed the pad token in cases where it is not specified by default in the loaded model (e.g. for Qwen models) (#269).
- Fixed the bug reported in #266 that made `value_zeroing` unusable for SDPA attention. This enables using the method on models with SDPA attention as default (e.g. `GemmaForCausalLM`) without passing `model_kwargs={'attn_implementation': 'eager'}` (#267).
- Fixed multi-device support and duplicate BOS for chat template models (#280).
- The directions of generated/attributed tokens were clarified in the visualization using arrows instead of x/y (#282).
- Fixed support for multi-EOS tokens (e.g. LLaMA 3.2, see #287).
- Fixed copying configuration parameters to aggregated `FeatureAttributionSequenceOutput` objects (#292).
- Updated the tutorial with `treescope` usage examples.
- Dropped support for Python 3.9. Supported versions are now Python >= 3.10, <= 3.12 (#283).