[WIP] Add diffllama #34083

Open
wants to merge 44 commits into base: main
Changes from 13 commits
Commits (44)
3bd9e34
first adding diffllama
weak-kajuma Oct 11, 2024
269055e
add Diff Attention and other but still with errors
weak-kajuma Oct 11, 2024
dbbf073
complete make attention Diff-Attention
weak-kajuma Oct 16, 2024
c4ea9df
fix some bugs which may be caused by transformers-cli while adding model
weak-kajuma Oct 16, 2024
e072544
fix a bug caused by forgetting KV cache...
weak-kajuma Oct 16, 2024
674d7a2
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
9eac636
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
0e99dbd
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
1e445c7
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
cca6a5c
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
dd167af
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
23099cb
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
faac378
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 20, 2024
53e13aa
I found Attention missed implemented from paper still on e072544a3bfc…
weak-kajuma Oct 20, 2024
63b018a
re-implemented
weak-kajuma Oct 20, 2024
204bec8
adding groupnorm
weak-kajuma Oct 20, 2024
bce12e5
align with transformers code style
weak-kajuma Oct 20, 2024
44d8423
fix typo
weak-kajuma Oct 20, 2024
6dc6f81
adding groupnorm
weak-kajuma Oct 20, 2024
48b38e8
change SdpaAttention to DiffSdpaAttention
weak-kajuma Oct 20, 2024
997f561
fix bug
weak-kajuma Oct 20, 2024
107bd3c
Update src/transformers/models/diffllama/modeling_diffllama.py
weak-kajuma Oct 21, 2024
26307d9
fix bugs of places of "GroupNorm with scale" and etc
weak-kajuma Oct 21, 2024
22aa145
Revert "fix bugs of places of "GroupNorm with scale" and etc"
weak-kajuma Oct 21, 2024
cc472be
simplify multiple of attention (matmul) operations into one by repeat…
weak-kajuma Oct 22, 2024
e834129
simplify multiple of attention (matmul) operations into one by repeat…
weak-kajuma Oct 22, 2024
e9d94e5
simplify multiple of attention (matmul) operations into one by repeat…
weak-kajuma Oct 22, 2024
0352999
remove missed type
weak-kajuma Oct 22, 2024
843178a
add diffllama model_doc
weak-kajuma Oct 29, 2024
71c8d12
apply make style/quality
weak-kajuma Oct 29, 2024
fea95fa
apply review comment about model
weak-kajuma Oct 30, 2024
b3f8dd5
apply review comment about test
weak-kajuma Oct 30, 2024
50ce353
place diffllama alphabetically on the src/transformers/__init__.py
weak-kajuma Oct 30, 2024
6f25333
fix forgot code
weak-kajuma Oct 31, 2024
dd2282e
Supports parameters that are not initialized with standard deviation …
weak-kajuma Oct 31, 2024
9e7a9c3
add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPO…
weak-kajuma Oct 31, 2024
8c98d19
remove unused property of config
weak-kajuma Nov 1, 2024
cbf217d
add to supported model list
weak-kajuma Nov 1, 2024
c873982
add to sdpa supported model list
weak-kajuma Nov 1, 2024
b003a53
fix copyright, remove pretraining_tensor_parallel, and modify for ini…
weak-kajuma Nov 7, 2024
37c7a88
remove unused import and etc.
weak-kajuma Nov 7, 2024
ba92d5c
empty commit
weak-kajuma Nov 7, 2024
8cc823e
empty commit
weak-kajuma Nov 7, 2024
d47631d
empty commit
weak-kajuma Nov 7, 2024
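
Read together, these commits implement Differential Attention for a Llama-style decoder: queries and keys are split into two halves, the two resulting softmax attention maps are subtracted with a learnable lambda, and the per-head output is normalized ("GroupNorm with scale") before being rescaled. The sketch below is a minimal illustration of that mechanism, assuming the formulation of the Differential Transformer paper; the function name, shapes, and normalization callable are assumptions, not the PR's actual code.

    # Minimal, illustrative sketch of differential attention; names, shapes, and
    # the per-head norm are assumptions, not the exact code added by this PR.
    # Causal masking is omitted for brevity.
    import torch
    import torch.nn.functional as F

    def diff_attention(q1, q2, k1, k2, v, lambda_full, lambda_init, head_norm):
        # q1/q2, k1/k2: (batch, heads, seq, head_dim); v: (batch, heads, seq, 2 * head_dim)
        scale = q1.size(-1) ** -0.5
        a1 = F.softmax((q1 @ k1.transpose(-1, -2)) * scale, dim=-1)
        a2 = F.softmax((q2 @ k2.transpose(-1, -2)) * scale, dim=-1)
        # Subtracting the two softmax maps cancels common-mode "attention noise".
        attn = a1 - lambda_full * a2
        out = attn @ v
        # Per-head normalization ("GroupNorm with scale" in the commit messages above).
        out = head_norm(out)
        return out * (1.0 - lambda_init)
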
4 changes: 3 additions & 1 deletion docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -376,6 +376,8 @@
title: DeBERTa-v2
- local: model_doc/dialogpt
title: DialoGPT
- local: model_doc/diffllama
title: DiffLlama
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/dpr
Expand Down Expand Up @@ -969,4 +971,4 @@
- local: internal/time_series_utils
title: Utilities for Time Series
title: Internal Helpers
title: API
title: API
63 changes: 63 additions & 0 deletions docs/source/en/model_doc/diffllama.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# DiffLlama

## Overview

The DiffLlama model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
<INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

*<INSERT PAPER ABSTRACT HERE>*

Tips:

<INSERT TIPS ABOUT MODEL HERE>

This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).


## DiffLlamaConfig

[[autodoc]] DiffLlamaConfig

## DiffLlamaModel

[[autodoc]] DiffLlamaModel
- forward

## DiffLlamaForCausalLM

[[autodoc]] DiffLlamaForCausalLM
- forward

## DiffLlamaForSequenceClassification

[[autodoc]] DiffLlamaForSequenceClassification
- forward

## DiffLlamaForQuestionAnswering

[[autodoc]] DiffLlamaForQuestionAnswering
- forward

## DiffLlamaForTokenClassification

[[autodoc]] DiffLlamaForTokenClassification
- forward
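
A usage sketch that could accompany this doc page once checkpoints exist; the repository name below is a placeholder used only for illustration, not a real checkpoint.

    # Hypothetical usage after the PR is merged; "org/diffllama-placeholder" is
    # a made-up checkpoint name, not a real repository.
    from transformers import AutoTokenizer, DiffLlamaForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("org/diffllama-placeholder")
    model = DiffLlamaForCausalLM.from_pretrained("org/diffllama-placeholder")

    inputs = tokenizer("Differential attention aims to", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
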
28 changes: 28 additions & 0 deletions src/transformers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -524,6 +524,7 @@
"models.levit": ["LevitConfig"],
"models.lilt": ["LiltConfig"],
"models.llama": ["LlamaConfig"],
"models.diffllama": ["DiffLlamaConfig"],
"models.llava": [
"LlavaConfig",
"LlavaProcessor",
Expand Down Expand Up @@ -1007,6 +1008,7 @@
_import_structure["models.gpt_sw3"].append("GPTSw3Tokenizer")
_import_structure["models.layoutxlm"].append("LayoutXLMTokenizer")
_import_structure["models.llama"].append("LlamaTokenizer")
_import_structure["models.diffllama"].append("DiffLlamaTokenizer")
_import_structure["models.m2m_100"].append("M2M100Tokenizer")
_import_structure["models.marian"].append("MarianTokenizer")
_import_structure["models.mbart"].append("MBartTokenizer")
Expand Down Expand Up @@ -2554,6 +2556,16 @@
"LlamaPreTrainedModel",
]
)
_import_structure["models.diffllama"].extend(
[
"DiffLlamaForCausalLM",
"DiffLlamaForQuestionAnswering",
"DiffLlamaForSequenceClassification",
"DiffLlamaForTokenClassification",
"DiffLlamaModel",
"DiffLlamaPreTrainedModel",
]
)
_import_structure["models.llava"].extend(
[
"LlavaForConditionalGeneration",
Expand Down Expand Up @@ -4728,6 +4740,7 @@
)
_import_structure["models.gptj"].extend(["FlaxGPTJForCausalLM", "FlaxGPTJModel", "FlaxGPTJPreTrainedModel"])
_import_structure["models.llama"].extend(["FlaxLlamaForCausalLM", "FlaxLlamaModel", "FlaxLlamaPreTrainedModel"])
_import_structure["models.diffllama"].extend(["FlaxDiffLlamaForCausalLM", "FlaxDiffLlamaModel", "FlaxDiffLlamaPreTrainedModel"])
_import_structure["models.gemma"].extend(["FlaxGemmaForCausalLM", "FlaxGemmaModel", "FlaxGemmaPreTrainedModel"])
_import_structure["models.longt5"].extend(
[
Expand Down Expand Up @@ -5364,6 +5377,7 @@
from .models.levit import LevitConfig
from .models.lilt import LiltConfig
from .models.llama import LlamaConfig
from .models.diffllama import DiffLlamaConfig
from .models.llava import (
LlavaConfig,
LlavaProcessor,
Expand Down Expand Up @@ -5903,6 +5917,7 @@
from .models.gpt_sw3 import GPTSw3Tokenizer
from .models.layoutxlm import LayoutXLMTokenizer
from .models.llama import LlamaTokenizer
from .models.diffllama import DiffLlamaTokenizer
from .models.m2m_100 import M2M100Tokenizer
from .models.marian import MarianTokenizer
from .models.mbart import MBartTokenizer
Expand Down Expand Up @@ -7207,6 +7222,14 @@
LlamaModel,
LlamaPreTrainedModel,
)
from .models.diffllama import (
DiffLlamaForCausalLM,
DiffLlamaForQuestionAnswering,
DiffLlamaForSequenceClassification,
DiffLlamaForTokenClassification,
DiffLlamaModel,
DiffLlamaPreTrainedModel,
)
from .models.llava import (
LlavaForConditionalGeneration,
LlavaPreTrainedModel,
Expand Down Expand Up @@ -8956,6 +8979,11 @@
FlaxLlamaModel,
FlaxLlamaPreTrainedModel,
)
from .models.diffllama import (
FlaxDiffLlamaForCausalLM,
FlaxDiffLlamaModel,
FlaxDiffLlamaPreTrainedModel,
)
from .models.longt5 import (
FlaxLongT5ForConditionalGeneration,
FlaxLongT5Model,
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,7 @@
levit,
lilt,
llama,
diffllama,
llava,
llava_next,
llava_next_video,
Expand Down
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,7 @@
("levit", "LevitConfig"),
("lilt", "LiltConfig"),
("llama", "LlamaConfig"),
("diffllama", "DiffLlamaConfig"),
("llava", "LlavaConfig"),
("llava_next", "LlavaNextConfig"),
("llava_next_video", "LlavaNextVideoConfig"),
Expand Down Expand Up @@ -452,6 +453,7 @@
("levit", "LeViT"),
("lilt", "LiLT"),
("llama", "LLaMA"),
("diffllama", "DiffDiffLlama"),
weak-kajuma marked this conversation as resolved.
Show resolved Hide resolved
("llama2", "Llama2"),
("llama3", "Llama3"),
("llava", "LLaVa"),
Expand Down
5 changes: 5 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,7 @@
("levit", "LevitModel"),
("lilt", "LiltModel"),
("llama", "LlamaModel"),
("diffllama", "DiffLlamaModel"),
("longformer", "LongformerModel"),
("longt5", "LongT5Model"),
("luke", "LukeModel"),
Expand Down Expand Up @@ -497,6 +498,7 @@
("jamba", "JambaForCausalLM"),
("jetmoe", "JetMoeForCausalLM"),
("llama", "LlamaForCausalLM"),
("diffllama", "DiffLlamaForCausalLM"),
("mamba", "MambaForCausalLM"),
("mamba2", "Mamba2ForCausalLM"),
("marian", "MarianForCausalLM"),
Expand Down Expand Up @@ -954,6 +956,7 @@
("led", "LEDForSequenceClassification"),
("lilt", "LiltForSequenceClassification"),
("llama", "LlamaForSequenceClassification"),
("diffllama", "DiffLlamaForSequenceClassification"),
("longformer", "LongformerForSequenceClassification"),
("luke", "LukeForSequenceClassification"),
("markuplm", "MarkupLMForSequenceClassification"),
Expand Down Expand Up @@ -1039,6 +1042,7 @@
("led", "LEDForQuestionAnswering"),
("lilt", "LiltForQuestionAnswering"),
("llama", "LlamaForQuestionAnswering"),
("diffllama", "DiffLlamaForQuestionAnswering"),
("longformer", "LongformerForQuestionAnswering"),
("luke", "LukeForQuestionAnswering"),
("lxmert", "LxmertForQuestionAnswering"),
Expand Down Expand Up @@ -1136,6 +1140,7 @@
("layoutlmv3", "LayoutLMv3ForTokenClassification"),
("lilt", "LiltForTokenClassification"),
("llama", "LlamaForTokenClassification"),
("diffllama", "DiffLlamaForTokenClassification"),
("longformer", "LongformerForTokenClassification"),
("luke", "LukeForTokenClassification"),
("markuplm", "MarkupLMForTokenClassification"),
Expand Down
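
These mapping entries let the Auto classes resolve a DiffLlamaConfig to the matching task heads. A small resolution sketch, assuming DiffLlamaConfig accepts the Llama-style constructor arguments shown (the values are illustrative, not recommended defaults):

    # Sketch of Auto-class resolution once the mappings above are registered.
    from transformers import AutoModelForCausalLM, DiffLlamaConfig

    config = DiffLlamaConfig(hidden_size=256, intermediate_size=512, num_hidden_layers=2, num_attention_heads=4)
    model = AutoModelForCausalLM.from_config(config)
    print(type(model).__name__)  # expected: DiffLlamaForCausalLM
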
7 changes: 7 additions & 0 deletions src/transformers/models/auto/tokenization_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,13 @@
"LlamaTokenizerFast" if is_tokenizers_available() else None,
),
),
(
"diffllama",
(
"LlamaTokenizer" if is_sentencepiece_available() else None,
"LlamaTokenizerFast" if is_tokenizers_available() else None,
),
),
("llava", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("llava-onevision", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("llava_next", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
Expand Down
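
Because DiffLlama reuses the Llama tokenizer classes, AutoTokenizer should fall back to LlamaTokenizer or LlamaTokenizerFast for a checkpoint whose config reports model_type == "diffllama", depending on which backends are installed. A hedged sketch with a placeholder checkpoint name:

    # Hypothetical resolution path through the mapping added above; the
    # checkpoint name is a placeholder, not a real repository.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("org/diffllama-placeholder")
    print(type(tokenizer).__name__)  # expected: LlamaTokenizerFast (or LlamaTokenizer)
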
63 changes: 63 additions & 0 deletions src/transformers/models/diffllama/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Copyright 2024 EleutherAI and The HuggingFace Inc. team. All rights reserved.
Reviewer comment (Member): Please check the company here, it is very likely wrong!
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_torch_available,
)


_import_structure = {
"configuration_diffllama": ["DiffLlamaConfig"],
}

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_diffllama"] = [
"DiffLlamaForCausalLM",
"DiffLlamaModel",
"DiffLlamaPreTrainedModel",
"DiffLlamaForSequenceClassification",
"DiffLlamaForQuestionAnswering",
"DiffLlamaForTokenClassification",
]

if TYPE_CHECKING:
from .configuration_diffllama import DiffLlamaConfig

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_diffllama import (
DiffLlamaForCausalLM,
DiffLlamaForQuestionAnswering,
DiffLlamaForSequenceClassification,
DiffLlamaForTokenClassification,
DiffLlamaModel,
DiffLlamaPreTrainedModel,
)

else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
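
The new __init__.py follows the library's lazy-import pattern: _import_structure declares which symbols exist, and _LazyModule defers loading the torch-backed modeling_diffllama module until a symbol is first accessed. A rough sketch of the observable behaviour, under that assumption:

    # Sketch only: with the _LazyModule registration above, importing the
    # package is cheap; the modeling file is imported on first attribute access.
    import transformers.models.diffllama as diffllama

    print(type(diffllama).__name__)             # _LazyModule stands in for the package
    model_cls = diffllama.DiffLlamaForCausalLM  # triggers the import of modeling_diffllama
    print(model_cls.__name__)
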