
Support for functional localization #240

Open · BKHMSI wants to merge 8 commits into main

Conversation

@BKHMSI commented Jul 6, 2024

Users can now perform functional localization as described in "Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network".

Changes:

  • Localization stimuli are saved in data/fedorenko2010_localization and can be loaded via the data_registry.
  • A localization script in model_helpers/localize computes the language mask according to the paper mentioned above (see the sketch after this list).
  • The language mask is cached in .brainio.
  • The HuggingfaceSubject class was adapted to extract activations from multiple layers at once and to use the localization script when the use_localizer flag is set to True; in that case, only the language-selective units are kept from the extracted activations.
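
Concretely, the mask follows the standard functional-localizer contrast from the paper: record each candidate unit's response to the Fedorenko et al. (2010) sentence and non-word stimuli, t-test the sentences > non-words contrast per unit, and keep the top_k units with the largest t-values. A minimal sketch of that selection step (the function name and array layout are assumptions, not the PR's exact code):

import numpy as np
from scipy.stats import ttest_ind

def compute_language_mask(sentence_acts, nonword_acts, top_k):
    # sentence_acts / nonword_acts: (num_stimuli, num_units) activations,
    # with units pooled across all recorded layers
    t_values, _ = ttest_ind(sentence_acts, nonword_acts, axis=0)
    mask = np.zeros(t_values.shape[0], dtype=bool)
    mask[np.argsort(t_values)[-top_k:]] = True  # top_k most language-selective units
    return mask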

Usage:

  • An example script can be found in examples/score_localization:
from brainscore_language import load_benchmark
from brainscore_language.artificial_subject import ArtificialSubject
from brainscore_language.model_helpers.huggingface import HuggingfaceSubject

benchmark = load_benchmark('Pereira2018.243sentences-linear')

# all 48 candidate GPT-2 sublayers: 12 blocks x 4 sublayer types
num_blocks = 12
layer_names = [f'transformer.h.{block}.{layer_type}'
    for block in range(num_blocks)
    for layer_type in ['ln_1', 'attn', 'ln_2', 'mlp']
]

model = HuggingfaceSubject(model_id='gpt2',
    region_layer_mapping={ArtificialSubject.RecordingTarget.language_system: layer_names},
    use_localizer=True,
    localizer_kwargs={
        'hidden_dim': 768,   # GPT-2 hidden size
        'batch_size': 16,
        'top_k': 4096,       # number of language-selective units to keep
    }
)

model_score = benchmark(model)
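
With this configuration, the localizer records 12 × 4 = 48 GPT-2 sublayers of 768 units each (36,864 candidate units) and keeps the top_k = 4096 most language-selective ones; benchmark(model) then scores linear predictivity on those units only.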

Comment on lines 29 to 30
for stimuli_idx in range(3, 14):
    data["sent"] += " " + data[f"stim{stimuli_idx}"].apply(str.lower)
Member:
what does this do? add comment

Author:
added
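
For reference, the loop concatenates the lowercased word columns stim3 through stim13 onto data["sent"], building each localizer stimulus as a single string; the added comment presumably says as much:

# append word columns stim3..stim13 (lowercased) to build each stimulus string
for stimuli_idx in range(3, 14):
    data["sent"] += " " + data[f"stim{stimuli_idx}"].apply(str.lower)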

from brainscore_language import load_dataset

BRAINIO_CACHE = os.environ.get("BRAINIO", f"{Path.home()}/.brainio")
os.environ["TOKENIZERS_PARALLELISM"] = "False"
Member:
comment why this is necessary

Author:
added
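
The added comments are presumably along these lines (the stated reasons are my reading, not quoted from the commit):

# brainio cache directory; the computed language mask is cached here
BRAINIO_CACHE = os.environ.get("BRAINIO", f"{Path.home()}/.brainio")
# disable tokenizer parallelism to silence the HuggingFace tokenizers fork warning
os.environ["TOKENIZERS_PARALLELISM"] = "False"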


class Fed10_langlocDataset(Dataset):
    def __init__(self):
        self.num_samples = 240
Member:
where is this being used?

Author:
line #103 in the extract_representations function

Member:
ok I'm not actually sure what this does -- looks like it's just used to zero-fill layer_name (??)
Could this not also be derived from self.sentences?

final_layer_representations = {
    "sentences": {layer_name: np.zeros((langloc_dataset.num_samples, hidden_dim)) for layer_name in layer_names},
    "non-words": {layer_name: np.zeros((langloc_dataset.num_samples, hidden_dim)) for layer_name in layer_names}
}

Author:
replaced langloc_dataset.num_samples with len(langloc_dataset.sentences)
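
i.e., the allocation presumably now reads:

final_layer_representations = {
    "sentences": {layer_name: np.zeros((len(langloc_dataset.sentences), hidden_dim)) for layer_name in layer_names},
    "non-words": {layer_name: np.zeros((len(langloc_dataset.sentences), hidden_dim)) for layer_name in layer_names}
}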
