Implements Vera #763

Open · wants to merge 68 commits into main

Commits (68)
93bee1a
Make generation tests generic
TimoImhof Sep 16, 2024
f51cfdb
Merge remote-tracking branch 'origin/main' into dev/test-refactoring
TimoImhof Oct 16, 2024
7e65e82
Draft Refactoring AdapterTestBase
TimoImhof Oct 28, 2024
793cbe5
Merge branch 'adapter-hub:main' into dev/test-refactoring
TimoImhof Oct 30, 2024
65c3fb7
Replace import class names
TimoImhof Oct 30, 2024
afdcfdd
Merge branch 'dev/test-refactoring' of https://github.com/TimoImhof/a…
TimoImhof Oct 30, 2024
ee6166c
Base refactoring:
TimoImhof Nov 1, 2024
630b722
remove redundant imports
TimoImhof Nov 1, 2024
0d3577f
Add pytest markers and respective pytest commands
TimoImhof Nov 1, 2024
1300856
Add draft of README
TimoImhof Nov 1, 2024
78387db
Refactoring:
TimoImhof Nov 5, 2024
83d3b32
Fix make quality
TimoImhof Nov 5, 2024
5e8e1b8
Add gpt2 tests
TimoImhof Nov 5, 2024
53eb0b9
Fix config union and head tests
TimoImhof Nov 7, 2024
1dbd412
Fix paths and imports
TimoImhof Nov 7, 2024
cf4f6a7
remove accidently added prompt tuning from gpt2 and make style
TimoImhof Nov 7, 2024
b390d61
Revert PromptTuning changes
TimoImhof Nov 7, 2024
2193aee
Revert "Revert PromptTuning changes"
TimoImhof Nov 7, 2024
f555484
Re-add missing adapter model tests
TimoImhof Nov 7, 2024
8dccda2
Refactoring:
TimoImhof Nov 7, 2024
c665948
Introduce generic test creator function
TimoImhof Nov 8, 2024
fb425b6
Re-add beit adapter method tests
TimoImhof Nov 8, 2024
225439c
Refactor & Re-add bertgeneration and bert
TimoImhof Nov 9, 2024
09f9cdc
Re-add clip tests
TimoImhof Nov 11, 2024
7934350
Re-add:
TimoImhof Nov 11, 2024
5f55935
Add more models
TimoImhof Nov 21, 2024
147c8af
Re-add whisper
TimoImhof Nov 27, 2024
57c5131
initial commit
julian-fong Dec 1, 2024
259a268
improved docstring and fixed formatting issues
julian-fong Dec 1, 2024
b66571c
fixed formatting
julian-fong Dec 1, 2024
acee994
updates
julian-fong Dec 12, 2024
18182af
Updates
julian-fong Dec 12, 2024
f28e508
Updates
julian-fong Dec 12, 2024
f38b0e3
removed typo
julian-fong Dec 12, 2024
385cd35
fix black
julian-fong Dec 12, 2024
46af3fd
updates
julian-fong Dec 14, 2024
9f3a202
fixed typo
julian-fong Dec 14, 2024
b2979ce
Changes:
TimoImhof Dec 16, 2024
ffd21a9
Add debug statements and only execute failing test
TimoImhof Dec 18, 2024
0dba87c
Add verbose information
TimoImhof Dec 18, 2024
c333467
check package versions
TimoImhof Dec 18, 2024
aac4038
More debugging statements
TimoImhof Dec 18, 2024
0f4c9b6
Merge branch 'adapter-hub:main' into dev/test-refactoring
TimoImhof Dec 22, 2024
12379e3
Merge branch 'main' into implement_vera
julian-fong Dec 23, 2024
0c0f7e6
updates
julian-fong Dec 23, 2024
99cfb68
Merge branch 'implement_vera' of github.com:julian-fong/adapters into…
julian-fong Dec 23, 2024
9ac515c
Merge branch 'adapter-hub:main' into dev/test-refactoring
TimoImhof Dec 23, 2024
4af10df
Fix failing test:
TimoImhof Dec 23, 2024
1229fc5
added review updates
julian-fong Dec 24, 2024
20ddb5c
apply fix from #770
julian-fong Dec 24, 2024
dbd4965
Update README
TimoImhof Dec 24, 2024
25fe0a9
updated docstring
julian-fong Dec 24, 2024
7f79832
updated docstring
julian-fong Dec 24, 2024
d1a4a09
Merge branch 'main' of https://github.com/TimoImhof/adapters into dev…
TimoImhof Dec 27, 2024
c516464
Fix hf version and clip tests
TimoImhof Dec 27, 2024
470169f
Merge branch 'adapter-hub:main' into implement_vera
julian-fong Jan 2, 2025
bb019b0
Merge branch 'adapter-hub:main' into implement_vera
julian-fong Jan 6, 2025
87c0998
Merge branch 'adapter-hub:main' into dev/test-refactoring
TimoImhof Jan 8, 2025
2c80a5c
Polish:
TimoImhof Jan 8, 2025
be69f0a
Merge branch 'main' into dev/test-refactoring
TimoImhof Jan 8, 2025
f1b1136
Merge branch 'main' into dev/test-refactoring
TimoImhof Jan 8, 2025
ebdf0a7
Merge remote-tracking branch 'github-desktop-TimoImhof/dev/test-refac…
julian-fong Jan 11, 2025
cd95c06
configure vera tests to #740
julian-fong Jan 11, 2025
a0e578a
fix quality:
julian-fong Jan 11, 2025
3e7f2a5
update model_mixin.py to use forwardcontext in merge_adapter and rese…
julian-fong Jan 18, 2025
f62a44d
update black
julian-fong Jan 18, 2025
ac914c4
updated black
julian-fong Jan 18, 2025
ef574bf
updates
julian-fong Jan 23, 2025
19 changes: 15 additions & 4 deletions Makefile
@@ -28,18 +28,29 @@ style:
isort $(check_dirs)
${MAKE} extra_style_checks

# Run tests for the library
# Library Tests

# run all tests in the library
test:
python -m pytest -n auto --dist=loadfile -s -v ./tests/
python -c "import transformers; print(transformers.__version__)"

# run all tests for the adapter methods for all adapter models
test-adapter-methods:
python -m pytest --ignore ./tests/models -n auto --dist=loadfile -s -v ./tests/
python -m pytest -n auto --dist=loadfile -s -v ./tests/test_methods/

# run a subset of the adapter method tests for all adapter models
# list of all subsets: [core, heads, embeddings, composition, prefix_tuning, prompt_tuning, reft, unipelt, compacter, bottleneck, ia3, lora, config_union]
subset ?=
test-adapter-method-subset:
@echo "Running subset $(subset)"
python -m pytest -n auto --dist=loadfile -s -v ./tests/test_methods/ -m $(subset)


# run the Hugging Face test suite for all adapter models
test-adapter-models:
python -m pytest -n auto --dist=loadfile -s -v ./tests/models
python -m pytest -n auto --dist=loadfile -s -v ./tests/test_models/

# Run tests for examples

test-examples:
python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/
5 changes: 5 additions & 0 deletions conftest.py
@@ -87,3 +87,8 @@ def check_output(self, want, got, optionflags):


doctest.OutputChecker = CustomOutputChecker


def pytest_collection_modifyitems(items):
# Exclude the 'test_class' group from the test collection since it's not a real test class but a byproduct of the generic test class generation.
items[:] = [item for item in items if 'test_class' not in item.nodeid]
1 change: 1 addition & 0 deletions docs/overview.md
@@ -56,6 +56,7 @@ Identifiers and configuration classes are explained in more detail in the [next
| `prefix_tuning_flat` | `PrefixTuningConfig(flat=True)` | [Prefix Tuning](methods.html#prefix-tuning) |
| `lora` | `LoRAConfig()` | [LoRA](methods.html#lora) |
| `ia3` | `IA3Config()` | [IA³](methods.html#ia-3) |
| `vera` | `VeraConfig()` | [Vera](methods.html#vera) |
| `mam` | `MAMConfig()` | [Mix-and-Match Adapters](method_combinations.html#mix-and-match-adapters) |
| `unipelt` | `UniPELTConfig()` | [UniPELT](method_combinations.html#unipelt) |
| `prompt_tuning` | `PromptTuningConfig()` | [Prompt Tuning](methods.html#prompt-tuning) |
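For orientation, a minimal usage sketch of the new identifier (the `bert-base-uncased` checkpoint and the adapter names are illustrative choices, not part of this PR):

```python
import adapters
from adapters import VeraConfig
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
adapters.init(model)  # make the plain transformers model adapter-capable

# The new shorthand maps to VeraConfig() with its defaults ...
model.add_adapter("vera_default", config="vera")
# ... or an explicit config can override r, vera_d, vera_b, init_weights, etc.
model.add_adapter("vera_custom", config=VeraConfig(r=8, vera_d=0.1, vera_b=0.0))
model.train_adapter("vera_custom")
```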
2 changes: 1 addition & 1 deletion examples/pytorch/language-modeling/run_clm.py
@@ -442,7 +442,7 @@ def main():
else:
model = AutoModelForCausalLM.from_config(config, trust_remote_code=model_args.trust_remote_code)
n_params = sum({p.data_ptr(): p.numel() for p in model.parameters()}.values())
logger.info(f"Training new model from scratch - Total size={n_params/2**20:.2f}M params")
logger.info(f"Training new model from scratch - Total size={n_params / 2**20:.2f}M params")

# Convert the model into an adapter model
adapters.init(model)
15 changes: 13 additions & 2 deletions pyproject.toml
@@ -1,10 +1,21 @@
[tool.black]
line-length = 119
target-version = ['py38', 'py39', 'py310']

# copied from HF for testing
[tool.pytest.ini_options]
markers = [
"core: marks tests as core adapter test",
"composition: marks tests as composition adapter test",
"heads: marks tests as heads adapter test",
"embeddings: marks tests as embeddings adapter test",
"class_conversion: marks tests as class conversion adapter test",
"prefix_tuning: marks tests as prefix tuning adapter test",
"prompt_tuning: marks tests as prompt tuning adapter test",
"reft: marks tests as reft adapter test",
"unipelt: marks tests as unipelt adapter test",
"compacter: marks tests as compacter adapter test",
"bottleneck: marks tests as bottleneck adapter test",
"ia3: marks tests as ia3 adapter test",
"lora: marks tests as lora adapter test",
"flash_attn_test: marks tests related to flash attention (deselect with '-m \"not flash_attn_test\"')",
"bitsandbytes: select (or deselect with `not`) bitsandbytes integration tests",
"generate: marks tests that use the GenerationTesterMixin"
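As a hypothetical illustration of how these markers are meant to be consumed (the class and test names below are placeholders, not the actual tests under `tests/test_methods/`):

```python
import pytest


# Placeholder test class; the real adapter-method tests are generated under tests/test_methods/.
@pytest.mark.lora
class TestLoraVariantsOnSomeModel:
    def test_add_adapter(self):
        ...  # test body elided


# Selecting a subset then mirrors the Makefile target above:
#   python -m pytest -m lora ./tests/test_methods/
```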
3 changes: 3 additions & 0 deletions setup.cfg
@@ -49,6 +49,9 @@ use_parentheses = True
[flake8]
ignore = E203, E501, E731, E741, W503, W605
max-line-length = 119
per-file-ignores =
tests/test_methods/generator.py: F401, F403, F405
tests/test_methods/test_*.py: F403, F405

[tool:pytest]
doctest_optionflags=NUMBER NORMALIZE_WHITESPACE ELLIPSIS
2 changes: 2 additions & 0 deletions src/adapters/__init__.py
@@ -64,6 +64,7 @@
"SeqBnInvConfig",
"StaticAdapterFusionConfig",
"UniPELTConfig",
"VeraConfig",
],
"context": [
"AdapterSetup",
@@ -181,6 +182,7 @@
SeqBnInvConfig,
StaticAdapterFusionConfig,
UniPELTConfig,
VeraConfig,
)
from .context import AdapterSetup, ForwardContext
from .heads import (
37 changes: 36 additions & 1 deletion src/adapters/configuration/adapter_config.py
@@ -487,11 +487,20 @@ class LoRAConfig(AdapterConfig):
(addition of decomposed matrix, as in LoRA) or "scale" (element-wise multiplication of vector, as in
(IA)^3). "scale" can only be used together with r=1. Defaults to "add".
init_weights (:obj:`str`, optional): Initialization method for the weights of the LoRA modules.
Currently, this can be either "lora" (default) or "bert".
Currently, this can be "lora" (default), "bert", or "vera".
use_gating (:obj:`bool`, optional):
Place a trainable gating module besides the added parameter module to control module activation. This is
e.g. used for UniPELT. Defaults to False. Note that modules with use_gating=True cannot be merged using
`merge_adapter()`.
vera_d (:obj:`float`, optional):
The initial value of the trainable scaling parameter `d` placed before the decomposition matrix A,
as used by VeRA. Defaults to None.
vera_b (:obj:`float`, optional):
The initial value of the trainable scaling parameter `b` placed before the decomposition matrix B,
as used by VeRA. Defaults to None.
dtype (str, optional): torch dtype for reparametrization tensors. Defaults to None.
"""

@@ -509,6 +518,8 @@ class LoRAConfig(AdapterConfig):
composition_mode: str = "add"
init_weights: str = "lora"
use_gating: bool = False
vera_d: float = None
vera_b: float = None
dtype: Optional[str] = None


@@ -535,6 +546,29 @@ class IA3Config(LoRAConfig):
dtype: Optional[str] = None


@dataclass(eq=False)
class VeraConfig(LoRAConfig):
"""
LoRA config that applies Vector-based Random Matrix Adaptation (VeRA). It adds trainable
diagonal scaling matrices `d` and `b` while keeping the LoRA matrices A and B frozen, random,
and shared across all layers. See the paper for details: https://arxiv.org/pdf/2310.11454.
Note that `r` must still be supplied since the decomposition matrices A and B are still
initialized. The `composition_mode` parameter must be set to `add`.
"""

selfattn_lora: bool = True
intermediate_lora: bool = False
output_lora: bool = False

r: int = 8
vera_d: float = 0.1
vera_b: float = 0.0
init_weights: str = "vera"
composition_mode: str = "add"
dtype: Optional[str] = None


@dataclass(eq=False)
class ReftConfig(AdapterConfig):
"""
@@ -770,6 +804,7 @@ def __init__(
"prompt_tuning": PromptTuningConfig(),
"lora": LoRAConfig(),
"ia3": IA3Config(),
"vera": VeraConfig(),
"loreft": LoReftConfig(),
"noreft": NoReftConfig(),
"direft": DiReftConfig(),
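Taken together with the `Vera` module added in `src/adapters/methods/lora.py` below, the new fields describe the following weight update (a summary in the VeRA paper's notation, with `h` the model hidden size):

```math
\Delta W = \Lambda_b \, B \, \Lambda_d \, A, \qquad A \in \mathbb{R}^{r \times h}, \quad B \in \mathbb{R}^{h \times r}
```

Here A and B are frozen, random, and shared across layers, while the trainable scaling matrices are initialised as Λ_d = vera_d · I_r and Λ_b = vera_b · I_h; the adapter output is additionally scaled by `alpha / r`, as in LoRA.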
128 changes: 126 additions & 2 deletions src/adapters/methods/lora.py
@@ -16,6 +16,7 @@

from ..composition import Average, BatchSplit, Parallel, Stack
from ..configuration import LoRAConfig, ModelAdaptersConfig
from ..context import ForwardContext
from .adapter_layer_base import AdapterLayerBase, ComposableAdapterLayerBase
from .utils import dequantize_bnb_weight

@@ -37,6 +38,7 @@ def __init__(
lora_B_shape,
config: LoRAConfig,
gating_heads: int = 1,
name: str = None,
):
super().__init__()
assert config.composition_mode == "add", "LoRA module only supports composition_mode='add'."
@@ -45,6 +47,7 @@ def __init__(
self.composition_mode = config.composition_mode
self.attn_matrices = config.attn_matrices
self.use_gating = config.use_gating
self.name = name
# Optional dropout
if config.dropout > 0.0:
self.lora_dropout = nn.Dropout(p=config.dropout)
Expand All @@ -69,6 +72,9 @@ def __init__(
elif config.init_weights == "ia3":
nn.init.ones_(self.lora_A)
nn.init.ones_(self.lora_B)
elif config.init_weights == "vera":
nn.init.kaiming_uniform_(self.lora_A)
nn.init.kaiming_uniform_(self.lora_B)
else:
raise ValueError("Unknown init_weights type: {}".format(config.init_weights))

@@ -112,6 +118,7 @@ def __init__(
lora_B_shape,
config: LoRAConfig,
gating_heads: int = 1,
name: str = None,
):
super().__init__()
assert config.composition_mode == "scale", "IA3 module only supports composition_mode='scale'."
@@ -122,6 +129,7 @@ def __init__(
self.composition_mode = config.composition_mode
self.attn_matrices = config.attn_matrices
self.use_gating = config.use_gating
self.name = name
# Optional dropout
if config.dropout > 0.0:
raise ValueError("IA3 module does not support dropout.")
@@ -133,7 +141,7 @@ def __init__(
# For compatibility with LoRA, allow all init_weights types here.
# Usually should be "ia3".
if config.init_weights == "lora":
logger.warning("(IA)^3 module initialized with LoRA zeo init. Ignore if this is intended.")
logger.warning("(IA)^3 module initialized with LoRA zero init. Ignore if this is intended.")
nn.init.zeros_(self.lora_B)
elif config.init_weights == "bert":
nn.init.normal_(self.lora_B, std=0.02)
@@ -177,6 +185,116 @@ def forward(self, hidden_states: Optional[torch.Tensor], layer_input: torch.Tens
return hidden_states, gate


class Vera(nn.Module):
def __init__(
self,
lora_A_shape,
lora_B_shape,
config: LoRAConfig,
gating_heads: int = 1,
name: str = None,
):
super().__init__()
Review comment (Member): we should also add an assert for composition mode "add" here (same as in LoRA init), just to make sure

self.d = config.vera_d
self.b = config.vera_b
self.r = config.r
self.alpha = config.alpha
self.use_gating = config.use_gating
self.name = name

# check to make sure that the `composition_mode` is set to `add`
self.composition_mode = config.composition_mode
if self.composition_mode != "add":
raise ValueError("Vera module only supports composition_mode='add'.")

# Optional dropout
if config.dropout > 0.0:
self.lora_dropout = nn.Dropout(p=config.dropout)

self.lora_A_shape = lora_A_shape
self.lora_B_shape = lora_B_shape
self.d_shape = self.lora_A_shape[0]
self.b_shape = self.lora_B_shape[0]

# Actual trainable parameters
self.vera_D = nn.Parameter(torch.diag(torch.ones(self.d_shape) * self.d))
self.vera_B = nn.Parameter(torch.diag(torch.ones(self.b_shape) * self.b))
self.scaling = self.alpha / self.r

if self.use_gating:
self.gate = nn.Linear(lora_A_shape[-1], gating_heads)
nn.init.normal_(self.gate.weight, std=0.02)

@property
def delta_w(self) -> torch.Tensor:
parameters = ForwardContext.get_context().shared_parameters[self.name]
lora_A = parameters["lora_A"]
lora_B = parameters["lora_B"]
return self.vera_B @ lora_B @ self.vera_D @ lora_A

def com(self, weights: torch.Tensor, added: torch.Tensor, scaling=None) -> torch.Tensor:
"""Performs the composition operation between existing and injected weights."""
if scaling is None:
scaling = self.scaling
return weights + added * scaling

def com_inv(self, weights: torch.Tensor, added: torch.Tensor) -> torch.Tensor:
"""Inverts the composition operation between existing and injected weights."""
return weights - added * self.scaling

def forward(self, hidden_states: Optional[torch.Tensor], layer_input: torch.Tensor):
parameters = ForwardContext.get_context().shared_parameters[self.name]
lora_A = parameters["lora_A"]
lora_B = parameters["lora_B"]

if hidden_states is None:
hidden_states = layer_input

if getattr(self, "lora_dropout"):
hidden_states = self.lora_dropout(hidden_states)

hidden_states = hidden_states @ torch.t(self.vera_B @ lora_B @ self.vera_D @ lora_A)

if self.use_gating:
gate = torch.sigmoid(self.gate(layer_input))
gate = torch.mean(gate, dim=1).unsqueeze(-1)
hidden_states = hidden_states * gate
else:
gate = None
Review comment (Member): as this is likely merged after #770, the same fix from there should be applied here

hidden_states = hidden_states * self.scaling

return hidden_states, gate
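
As a standalone shape check of the composition used in `delta_w` and `forward` above (plain tensors stand in for the module parameters and the shared A/B; sizes are illustrative):

```python
import torch

h, r = 768, 8                                # illustrative hidden size and rank
lora_A = torch.randn(r, h)                   # frozen, random, shared across layers
lora_B = torch.randn(h, r)                   # frozen, random, shared across layers
vera_D = torch.diag(torch.full((r,), 0.1))   # trainable, seeded from vera_d
vera_B = torch.diag(torch.full((h,), 0.0))   # trainable, seeded from vera_b

delta_w = vera_B @ lora_B @ vera_D @ lora_A  # (h, h)
assert delta_w.shape == (h, h)

x = torch.randn(2, 16, h)                    # (batch, seq_len, hidden)
assert (x @ delta_w.t()).shape == x.shape

# With the default vera_b = 0.0 the initial delta is all zeros, so a freshly added
# Vera adapter leaves the base model's output unchanged until training updates vera_B.
assert torch.count_nonzero(delta_w) == 0
```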


def init_shared_vera_parameters(model_config, adapter_config, device):
hidden_size = model_config.hidden_size
r = adapter_config["r"]

parameters = nn.ParameterDict()

# initialize frozen, random tensors A, B
parameters["lora_A"] = torch.zeros(r, hidden_size).to(device)
parameters["lora_B"] = torch.zeros(hidden_size, r).to(device)

if adapter_config["init_weights"] == "lora":
# initialize A the same way as the default for nn.Linear and B to zero
nn.init.kaiming_uniform_(parameters["lora_A"], a=math.sqrt(5))
nn.init.zeros_(parameters["lora_B"])
elif adapter_config["init_weights"] == "bert":
nn.init.normal_(parameters["lora_A"], std=0.02)
nn.init.normal_(parameters["lora_B"], std=0.02)
elif adapter_config["init_weights"] == "ia3":
nn.init.ones_(parameters["lora_A"])
nn.init.ones_(parameters["lora_B"])
elif adapter_config["init_weights"] == "vera":
nn.init.kaiming_uniform_(parameters["lora_A"])
nn.init.kaiming_uniform_(parameters["lora_B"])
else:
raise ValueError("Unknown init_weights type: {}".format(adapter_config["init_weights"]))

return parameters
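
For intuition on why A and B are shared: a back-of-the-envelope count of the conceptually trainable numbers per adapted weight matrix, using illustrative sizes (note that the module above materialises the diagonal scalings as full r×r and h×h parameter tensors, so the literal `nn.Parameter` sizes are larger than this conceptual count):

```python
h, r, n_layers = 768, 8, 12                  # illustrative hidden size, rank, layer count

lora_trainable = n_layers * (r * h + h * r)  # per-layer trainable A and B
vera_shared    = r * h + h * r               # frozen random A/B, stored once for the model
vera_trainable = n_layers * (r + h)          # per-layer diagonal scalings d and b

print(lora_trainable)  # 147456
print(vera_shared)     # 12288
print(vera_trainable)  # 9312
```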


class LoRALayer(AdapterLayerBase):
adapter_modules_name = "loras"

@@ -202,6 +320,7 @@ def _get_lora_shapes(self, config: LoRAConfig):

def add_adapter(self, adapter_name: str, layer_idx: int) -> bool:
self.layer_idx = layer_idx

lora_config = self.adapters_config.match(
adapter_name,
config_type=LoRAConfig,
@@ -210,7 +329,10 @@ def add_adapter(self, adapter_name: str, layer_idx: int) -> bool:
)
if lora_config is not None and self._check_lora_location(lora_config):
if lora_config.composition_mode == "add":
lora_cls = LoRA
if isinstance(lora_config.vera_d, float) or isinstance(lora_config.vera_b, float):
lora_cls = Vera
else:
lora_cls = LoRA
elif lora_config.composition_mode == "scale":
lora_cls = IA3
else:
@@ -219,7 +341,9 @@ def add_adapter(self, adapter_name: str, layer_idx: int) -> bool:
*self._get_lora_shapes(lora_config),
lora_config,
gating_heads=self.get_n_heads(lora_config),
name=adapter_name,
)

lora.train(self.training)
lora = lora.to(self.weight.device)
self.loras[adapter_name] = lora
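A hedged note on the branch in `add_adapter` above: any `composition_mode="add"` config whose `vera_d` or `vera_b` is a float is routed to the `Vera` module, so in principle VeRA can also be reached from a plain `LoRAConfig`. Whether the surrounding model code also sets up the shared A/B in that case is not visible in this diff, so `VeraConfig` remains the documented entry point. A sketch mirroring the selection logic (illustrative only):

```python
from adapters import LoRAConfig, VeraConfig

def picks_vera(cfg) -> bool:
    """Mirror of the class-selection branch in LoRALayer.add_adapter (illustrative only)."""
    return cfg.composition_mode == "add" and (
        isinstance(cfg.vera_d, float) or isinstance(cfg.vera_b, float)
    )

print(picks_vera(LoRAConfig()))                                 # False (vera_d/vera_b default to None)
print(picks_vera(VeraConfig()))                                 # True  (vera_d=0.1, vera_b=0.0)
print(picks_vera(LoRAConfig(vera_d=0.1, init_weights="vera")))  # True
```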