[Bug Report] run_with_cache(device=...) permanently moves the model and leaves cfg.device stale #1336

**Describe the bug**
On a single-device model, `TransformerBridge.run_with_cache(input, device=...)` moves the underlying model (and the input tensors) to that device and never moves it back. The `device=` argument is meant to choose where cached activations are stored, not to relocate the model: the legacy [`get_caching_hooks`](https://github.com/TransformerLensOrg/TransformerLens/blob/v3.2.1/transformer_lens/hook_points.py#L799) documents `device` as "the device to store on", and [`ActivationCache.to`](https://github.com/TransformerLensOrg/TransformerLens/blob/v3.2.1/transformer_lens/ActivationCache.py#L192-L211) deprecates its `move_model` argument. After the call, the model lives on `device` while `cfg.device` still reports the original device, and any subsequent forward or `generate` call fails.

**Code example**
```python
import torch
from transformer_lens.model_bridge import TransformerBridge

m = TransformerBridge.boot_transformers("distilgpt2", device="mps")  # or "cuda"
toks = m.to_tokens("hello world")
_, cache = m.run_with_cache(toks, device="cpu")        # intent: offload the cache to CPU

print(next(m.original_model.parameters()).device)      # cpu  <- the MODEL was moved
print(m.cfg.device)                                     # mps  <- now stale / inconsistent
m.generate(m.to_tokens("again"), max_new_tokens=3)      # RuntimeError: Placeholder storage has not been allocated on MPS device!
```
Root cause: the single-device branch of `run_with_cache` ([bridge.py L2058-L2063](https://github.com/TransformerLensOrg/TransformerLens/blob/v3.2.1/transformer_lens/model_bridge/bridge.py#L2058-L2063)) runs `self.original_model = self.original_model.to(cache_device)` and never restores the original placement (the `finally` at [L2082-L2084](https://github.com/TransformerLensOrg/TransformerLens/blob/v3.2.1/transformer_lens/model_bridge/bridge.py#L2082-L2084) only removes hooks). The per-activation caching hook already offloads cache tensors via `tensor.detach().to(cache_device)` ([L1980](https://github.com/TransformerLensOrg/TransformerLens/blob/v3.2.1/transformer_lens/model_bridge/bridge.py#L1980)), so the model move is unnecessary. For comparison, the `n_devices > 1` branch ([L2046](https://github.com/TransformerLensOrg/TransformerLens/blob/v3.2.1/transformer_lens/model_bridge/bridge.py#L2046)) already declines to move the model: it warns and leaves cache entries on their per-layer devices.
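To illustrate the intended semantics, here is a minimal sketch in plain PyTorch, not the bridge's actual code. The function name `run_with_cache_sketch` and the toy model are hypothetical; the only idea taken from the issue is that each hook offloads its own tensor with `detach().to(cache_device)` while the model itself is never moved.

```python
import torch
from torch import nn


def run_with_cache_sketch(model, inputs, cache_device=None):
    """Hypothetical helper: cache submodule outputs on `cache_device`
    without relocating the model."""
    cache = {}
    handles = []

    def make_hook(name):
        def hook(_module, _args, output):
            if isinstance(output, torch.Tensor):
                tensor = output.detach()
                # Only the cached copy is moved; the model stays where it is.
                if cache_device is not None:
                    tensor = tensor.to(cache_device)
                cache[name] = tensor
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module (empty name)
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            output = model(inputs)
    finally:
        # Same cleanup the issue describes: only the hooks are removed.
        for handle in handles:
            handle.remove()
    return output, cache


# Toy stand-in for the wrapped model (an assumption for illustration).
toy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
device_before = next(toy.parameters()).device
out, cache = run_with_cache_sketch(toy, torch.randn(3, 4), cache_device="cpu")
device_after = next(toy.parameters()).device
```

With the model on an accelerator and `cache_device="cpu"`, `device_after` would still equal `device_before`, and only the entries of `cache` would live on the CPU.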

**System Info**
 * Installed from source; also present in released v3.1.0 through v3.2.1
 * macOS (Apple Silicon / MPS), reproduced above; the same code path affects any non-CPU primary device (e.g. CUDA)
 * Python 3.12

Note: it does not reproduce on a CPU-only setup, where `device="cpu"` makes the move a no-op. It surfaces only when the model's device differs from the cache device, which is why CI (CPU) does not catch it.
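A regression check for this would have to run on an accelerator. The sketch below is one possible shape for it, assuming the fix leaves the model in place; `pick_accelerator` and `check_run_with_cache_keeps_model_in_place` are hypothetical names, and the check returns `"skipped"` on CPU-only machines rather than passing vacuously. The `TransformerBridge` calls are the same ones used in the reproduction above.

```python
import torch


def pick_accelerator():
    """Return "cuda" or "mps" if one is available, else None."""
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return None


def check_run_with_cache_keeps_model_in_place(model_name="distilgpt2"):
    device = pick_accelerator()
    if device is None:
        # On CPU-only machines the move is a no-op, so the bug cannot show up.
        return "skipped"

    # Imported lazily so the skip path does not need transformer_lens.
    from transformer_lens.model_bridge import TransformerBridge

    m = TransformerBridge.boot_transformers(model_name, device=device)
    before = next(m.original_model.parameters()).device
    m.run_with_cache(m.to_tokens("hello world"), device="cpu")
    after = next(m.original_model.parameters()).device

    # Expected once fixed: the cache is offloaded, the model is not moved.
    assert after == before, f"model moved from {before} to {after}"
    return "ok"
```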

**Additional context**
Found while implementing #697 (adding `return_cache` to `generate`), where exposing a `device=` cache-offload option ([maintainer's note](https://github.com/TransformerLensOrg/TransformerLens/issues/697#issuecomment-4548602069)) led me to `run_with_cache(device=...)`.

### Checklist
- [x] I have checked that there is no similar issue in the repo

