Fix: Update docstrings.
kpoeppel committed Dec 20, 2024
1 parent ba5c457 commit 827133b
Showing 2 changed files with 8 additions and 17 deletions.
docs/source/en/model_doc/xlstm.md (23 changes: 7 additions & 16 deletions)
@@ -14,31 +14,22 @@ rendered properly in your Markdown viewer.
 -->
 
 # xLSTM
 
-# xLSTM
-
-# xLSTM
-
-# xLSTM
-
-# xLSTM
-
 ## Overview

-The xLSTM model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
-<INSERT SHORT SUMMARY HERE>
+The xLSTM model was proposed in [xLSTM: Extended Long Short-Term Memory](https://openreview.net/forum?id=ARAxPPIAhq) by Maximilian Beck*, Korbinian Pöppel*, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter and Sepp Hochreiter.
+xLSTM updates the original LSTM architecture to be competitive with Transformer models: it introduces exponential gating, a matrix memory expansion, and parallelizable training/ingestion.
+
+The [7B model](https://hf.co/NX-AI/xLSTM-7b) variant was trained by the xLSTM team Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Richard Kurle, Patrick Blies, Sebastian Böck and Sepp Hochreiter at NXAI.
 
 The abstract from the paper is the following:
-*<INSERT PAPER ABSTRACT HERE>*
+*In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.*
 
-Tips:
-
-<INSERT TIPS ABOUT MODEL HERE>

-This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
-The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).
+This model was contributed by [NX-AI](https://huggingface.co/NX-AI).
+The original code can be found [here](https://github.com/NX-AI/xlstm).


 ## xLSTMConfig
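
The rewritten overview above points readers at the [7B checkpoint](https://hf.co/NX-AI/xLSTM-7b). A minimal usage sketch, assuming the Auto classes resolve that repository to the xLSTM classes this integration adds (the class resolution and generation settings here are assumptions, not part of the commit):

```python
# Hypothetical loading sketch: assumes NX-AI/xLSTM-7b resolves through the
# Auto classes once this integration is merged; not taken from the commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")
model = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b")

inputs = tokenizer("The xLSTM architecture extends the LSTM with", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```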
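The quoted abstract also describes the mLSTM recurrence: exponential input gating, a matrix memory, and a covariance update rule. A toy NumPy sketch of one step, with the paper's normalization and stabilization tricks deliberately omitted; this is illustrative, not the transformers implementation:

```python
# Toy mLSTM step per the abstract: exponential input gate, matrix memory C
# with covariance update C_t = f_t * C_{t-1} + i_t * v_t k_t^T, and a
# normalizer state n used to bound the readout. A sketch only.
import numpy as np

def mlstm_step(C, n, q, k, v, i_preact, f_preact, o_gate):
    """One mLSTM step: matrix memory C is (d, d), normalizer n is (d,)."""
    i = np.exp(i_preact)                    # exponential input gating
    f = 1.0 / (1.0 + np.exp(-f_preact))     # sigmoid forget gate
    C = f * C + i * np.outer(v, k)          # covariance update rule
    n = f * n + i * k                       # normalizer update
    h = o_gate * (C @ q) / max(abs(n @ q), 1.0)  # normalized, output-gated state
    return C, n, h

# Tiny smoke test with random query/key/value vectors
d = 4
rng = np.random.default_rng(0)
C, n = np.zeros((d, d)), np.zeros(d)
C, n, h = mlstm_step(C, n, rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=d), 0.5, 2.0, np.ones(d))
```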
src/transformers/models/xlstm/configuration_xlstm.py (2 changes: 1 addition & 1 deletion)
@@ -46,7 +46,7 @@ class xLSTMConfig(PretrainedConfig):
"""
This is the configuration class to store the configuration of a [`xLSTM`]. It is used to instantiate a xLSTM
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the XLSTM
defaults will yield a similar configuration to that of the xLSTM-7b [NX-AI/xLSTM-7b](https://huggingface.co/NX-AI/xLSTM-7b) model.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
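The docstring above follows the standard transformers configuration pattern. A short sketch of that pattern, assuming `xLSTMConfig` and `xLSTMModel` are exported as this pull request intends (names may differ in the merged release):

```python
# Sketch of the usual config workflow; the xLSTMConfig/xLSTMModel exports
# are assumed from this integration, not confirmed by the commit itself.
from transformers import xLSTMConfig, xLSTMModel

# A configuration with library defaults (per the docstring, similar to NX-AI/xLSTM-7b)
configuration = xLSTMConfig()

# A randomly initialized model built from that configuration
model = xLSTMModel(configuration)

# The configuration is retrievable from the model
configuration = model.config
```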
