This repository contains the implementation of Structured State Space Models as part of the final project for the course Development Tools for Scientific Computing held at SISSA during the academic year 2024–2025.
Authors: @FilippoOlivo, @GiovanniCanali.
State Space Models (SSMs) are an emerging class of deep learning architectures for sequence modeling. They have recently achieved state-of-the-art results on tasks such as time series forecasting and audio generation, outperforming traditional recurrent and convolutional approaches in both accuracy and efficiency.
This project investigates structured variants of SSMs with a focus on computational efficiency and scalability. The following architectures are implemented and evaluated:
- S4
- S6
- H3
- Gated MLP
- Mamba
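The SSM-based architectures above share a linear state space recurrence that can be evaluated either step by step or as a convolution. The following is a minimal sketch, not the repository's code. It assumes a real diagonal state matrix (a simplification of the complex diagonal parameterization in the diagonal S4 paper) and zero-order-hold discretization. All function names are hypothetical, and the sketch checks that the recurrent and convolutional views agree:

```python
import math
from typing import List, Tuple


def discretize_zoh(a: float, b: float, delta: float) -> Tuple[float, float]:
    """Zero-order-hold discretization of one diagonal mode x'(t) = a x(t) + b u(t)."""
    if a == 0.0:
        raise ValueError("this sketch assumes a nonzero (ideally negative) state coefficient")
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_recurrent(u: List[float], a: List[float], b: List[float],
                  c: List[float], delta: float) -> List[float]:
    """Recurrent view: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k, with x_{-1} = 0."""
    modes = [discretize_zoh(a_n, b_n, delta) for a_n, b_n in zip(a, b)]
    state = [0.0] * len(modes)
    outputs = []
    for u_k in u:
        # Diagonal A_bar: each state component evolves independently.
        state = [a_bar * x + b_bar * u_k for (a_bar, b_bar), x in zip(modes, state)]
        outputs.append(sum(c_n * x for c_n, x in zip(c, state)))
    return outputs


def ssm_kernel(length: int, a: List[float], b: List[float],
               c: List[float], delta: float) -> List[float]:
    """Convolutional view: K_j = C A_bar^j B_bar for j = 0, ..., length - 1."""
    modes = [discretize_zoh(a_n, b_n, delta) for a_n, b_n in zip(a, b)]
    return [
        sum(c_n * a_bar**j * b_bar for c_n, (a_bar, b_bar) in zip(c, modes))
        for j in range(length)
    ]


def causal_conv(u: List[float], kernel: List[float]) -> List[float]:
    """y_k = sum_{j <= k} K_j u_{k-j}; a naive O(L^2) loop, whereas S4 uses FFT convolution."""
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


# Example: two decaying modes driven by a short input sequence.
u = [1.0, 0.0, 0.0, 0.5, -0.25]
a, b, c, delta = [-0.5, -1.0], [1.0, 1.0], [1.0, -0.5], 0.1
y_rec = ssm_recurrent(u, a, b, c, delta)
y_conv = causal_conv(u, ssm_kernel(len(u), a, b, c, delta))
```

The convolutional form is what makes time-invariant models such as S4 fast to train, since the kernel can be computed once per sequence. Input-dependent models such as S6/Mamba lose this property and rely on a scan instead.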
The primary objective is to compare the aforementioned models in terms of accuracy, training time, and memory consumption.
To ensure consistency and relevance, the evaluation is conducted on synthetic sequence modeling tasks, namely the copy task and the selective copy task. These tasks serve as controlled benchmarks to assess the models' ability to retain and manipulate sequential information over long contexts.
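The two benchmarks can be illustrated with a small data generator. This is a hedged sketch rather than the project's data pipeline. The token conventions (`BLANK = 0` as filler, data tokens drawn from `[1, vocab_size)`) and the function names are hypothetical. In the copy task the tokens to remember form a contiguous block followed by a fixed-length gap. In the selective copy task they are scattered at random positions among filler tokens:

```python
import random
from typing import List, Tuple

# Hypothetical token conventions for this sketch (not taken from the repository).
BLANK = 0       # filler token used for the gap / noise positions
FIRST_DATA = 1  # data tokens are drawn from [FIRST_DATA, vocab_size)


def copy_sample(rng: random.Random, n_tokens: int, delay: int,
                vocab_size: int) -> Tuple[List[int], List[int]]:
    """Copy task: a contiguous block of tokens followed by a fixed-length gap."""
    data = [rng.randrange(FIRST_DATA, vocab_size) for _ in range(n_tokens)]
    inputs = data + [BLANK] * delay
    return inputs, data  # the model must reproduce `data` after reading `inputs`


def selective_copy_sample(rng: random.Random, n_tokens: int, seq_len: int,
                          vocab_size: int) -> Tuple[List[int], List[int]]:
    """Selective copy task: data tokens placed at random, order-preserving positions."""
    if n_tokens > seq_len:
        raise ValueError("n_tokens must not exceed seq_len")
    data = [rng.randrange(FIRST_DATA, vocab_size) for _ in range(n_tokens)]
    positions = sorted(rng.sample(range(seq_len), n_tokens))
    inputs = [BLANK] * seq_len
    for pos, token in zip(positions, data):
        inputs[pos] = token
    return inputs, data  # targets keep the original order of the data tokens


rng = random.Random(0)
copy_inputs, copy_targets = copy_sample(rng, n_tokens=4, delay=6, vocab_size=10)
sel_inputs, sel_targets = selective_copy_sample(rng, n_tokens=4, seq_len=16, vocab_size=10)
```

Because the gap in the copy task has a fixed length, a time-invariant model can in principle solve it by tracking positions. The random spacing in the selective copy task instead requires filtering tokens by content, which is the motivation the Mamba paper gives for input-dependent (selective) parameters.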
Follow these steps to set up the environment:
- Clone the repository and navigate into it:
git clone https://github.com/FilippoOlivo/SSM.git
cd SSM
- Create a Conda environment with Python 3.12:
conda create --name ssm-env python=3.12 -y
- Activate the environment:
conda activate ssm-env
- Install the package:
python -m pip install .
The implemented components are based on the following works:
- S4 and low-rank S4 blocks: Efficiently Modeling Long Sequences with Structured State Spaces.
- Diagonal S4 block: On the Parameterization and Initialization of Diagonal State Space Models.
- H3 model and shift S4 block: Hungry Hungry Hippos: Towards Language Modeling with State Space Models.
- Mamba model and S6 block: Mamba: Linear-Time Sequence Modeling with Selective State Spaces.
- Parallel scan algorithm: Efficient Parallelization of a Ubiquitous Sequential Computation.
- Swish activation function: Searching for Activation Functions.
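Selective models such as S6/Mamba use input-dependent coefficients, so the recurrence h_t = a_t h_{t-1} + b_t cannot be rewritten as a fixed convolution and is computed with a scan instead. The following is a minimal sketch of a generic associative scan in the Hillis-Steele style for scalar recurrences. It is a swapped-in textbook formulation, not necessarily the method of the cited paper or the repository's implementation. The function names are hypothetical:

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # affine map h -> a * h + b, stored as (a, b)


def combine(earlier: Pair, later: Pair) -> Pair:
    """Compose two affine maps, applying `earlier` first (associative, not commutative)."""
    a1, b1 = earlier
    a2, b2 = later
    return a1 * a2, a2 * b1 + b2


def parallel_scan(a: List[float], b: List[float]) -> List[float]:
    """Inclusive scan in O(log L) levels; all updates within a level are independent."""
    elems = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        # Each position reads only the previous level, so a GPU could run this level in parallel.
        elems = [
            elems[i] if i < offset else combine(elems[i - offset], elems[i])
            for i in range(len(elems))
        ]
        offset *= 2
    # With h_{-1} = 0, h_t equals the offset term of the composed prefix map.
    return [b_t for _, b_t in elems]


def sequential_scan(a: List[float], b: List[float]) -> List[float]:
    """Reference loop: h_t = a_t * h_{t-1} + b_t with h_{-1} = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


coeffs = [0.5, 0.9, 1.1, 0.3, 0.7]
inputs = [1.0, 2.0, -1.0, 0.5, 3.0]
h_parallel = parallel_scan(coeffs, inputs)
h_sequential = sequential_scan(coeffs, inputs)
```

The Hillis-Steele variant performs O(L log L) work; work-efficient alternatives such as the Blelloch scan reduce this to O(L) at the cost of a second sweep.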