
ScatCluster

A workflow for clustering continuous time series with a deep scattering network.


Installation

Installation is achieved via pip. To install the CPU-only version, run:

# CPU-only installation
pip install scatcluster

If you want to use a GPU, you need to install the package together with CuPy. The code will look for CuPy at runtime and use it if it is installed. You can install the GPU extra with the following command.

# GPU usage
pip install scatcluster[gpu]
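
The snippet below is a minimal sketch of this kind of optional-GPU fallback, assuming a plain try/except import; it illustrates the general pattern and is not ScatCluster's actual internals.

    # Sketch of an optional-GPU fallback (illustrative only, not ScatCluster code)
    try:
        import cupy as xp   # GPU arrays when CuPy is installed
        GPU_AVAILABLE = True
    except ImportError:
        import numpy as xp  # transparent fallback to CPU arrays
        GPU_AVAILABLE = False

    print(f"GPU acceleration available: {GPU_AVAILABLE}")
    signal = xp.zeros((4, 1024))  # same array API on CPU and GPU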

Getting Started

To help you get started with ScatCluster, we have created a set of notebooks located in the ./docs/source/getting_started directory. These notebooks guide you through the entire workflow, from processing through visualisation to analysis. Below is a detailed description of each group of notebooks.

1. Processing

The Processing section includes three notebooks that prepare your data for interpretation:

  • 01_processing_01_choose_network.ipynb: This notebook offers an interactive means to determine the best network architecture for your dataset.

  • 01_processing_02_set_config.ipynb: In this notebook, you will set the parameters for your experiment and specify the location where results will be stored. This step is crucial for ensuring that your data processing and analysis are properly configured.

  • 01_processing_03_processing_workflow.ipynb: This notebook executes the processing of the ScatCluster transform. The duration of this step depends on the dataset parameters you set in the previous notebook. It is designed to handle the heavy lifting of data transformation, preparing your time series data for clustering.
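
If you still need to assemble continuous waveform data before running these notebooks, ObsPy (already a dependency) is a typical way to fetch it. The sketch below is a generic ObsPy example; the FDSN data centre, network, station, and channel codes are placeholders to adapt to your own dataset, not values required by ScatCluster.

    # Generic ObsPy download of continuous data (placeholder codes; adapt to your dataset)
    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    client = Client("INGV")                     # any FDSN data centre works
    t0 = UTCDateTime("2021-01-01T00:00:00")
    stream = client.get_waveforms(network="IV", station="STA01", location="*",
                                  channel="HH?", starttime=t0, endtime=t0 + 3600)
    stream.merge(method=1)                      # stitch gaps in the continuous record
    stream.write("continuous_data.mseed", format="MSEED")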

2. Analysis

The Analysis section contains two notebooks that help you interpret the clustering results:

  • 02_analysis_01_dendrograms_waveforms.ipynb: Use this notebook to build hierarchical clustering diagrams and generate appropriate clusters. Visualization is key to understanding the structure and relationships within your time series data.

  • 02_analysis_02_timewindow_cluster_comparison.ipynb: This notebook allows you to analyze the different clusters identified in the previous stages. By comparing clusters across different time windows, you can gain insights into the dynamics and patterns within your time series data.
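
For orientation, the sketch below shows generic hierarchical clustering with fastcluster and SciPy on a random placeholder feature matrix. It mirrors the dendrogram-and-cluster step conceptually but is not ScatCluster's own code.

    # Generic hierarchical clustering illustration (random features stand in for
    # scattering coefficients; this is not ScatCluster code)
    import numpy as np
    import fastcluster
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, fcluster

    features = np.random.randn(200, 16)                    # placeholder feature matrix
    linkage = fastcluster.linkage_vector(features, method="ward")
    labels = fcluster(linkage, t=5, criterion="maxclust")  # cut tree into 5 clusters

    dendrogram(linkage, no_labels=True)
    plt.title("Hierarchical clustering of time windows")
    plt.show()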

By following these notebooks, you will be able to execute the full workflow of ScatCluster, from data preparation and transformation to visualization and in-depth analysis. This structured approach ensures a comprehensive understanding and effective clustering of your time series data. Happy clustering!

Dependencies

This package can be executed on either CPU or GPU. The main dependencies are:

  • obspy>=1.4.0
  • scatseisnet>=0.2.1
  • fastcluster

If GPU is required, ensure that cupy>=11.3.0 is available.

Additional dependencies are listed in requirements.txt.
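
As a convenience (not part of the package itself), you can quickly confirm that the main dependencies are installed with a few lines of standard-library Python:

    # Quick dependency check using the standard library (convenience sketch only)
    from importlib.metadata import version, PackageNotFoundError

    requirements = {"obspy": "1.4.0", "scatseisnet": "0.2.1", "fastcluster": None}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
            note = f" (requires >= {minimum})" if minimum else ""
            print(f"{name}: {installed}{note}")
        except PackageNotFoundError:
            print(f"{name}: NOT INSTALLED")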

Documentation

Please check the documentation.

Questions and Issues

Should you have any questions or issues with this code, you can reach out at [email protected] or [email protected].

Contribution guidelines

Thank you for your interest in contributing to this project! Here are some guidelines to help ensure a smooth and successful contribution process. Please read them carefully before contributing. We are happy to answer any questions you may have, and to welcome you as a contributor.

  1. Fork the project to your own GitHub account by clicking the "Fork" button in the top right corner of the repository page. This will allow you to make changes to the project without affecting the main project.

  2. Create a new branch for your contribution. This will keep your changes separate from the main branch and make it easier to review and merge your changes. The name of your branch should be concise and descriptive. For example, if you are adding a new feature, you might call your branch "add-feature".

  3. Write concise commit messages that describe the changes you made. Use the present tense and avoid redundant information. We try to follow the Conventional Commits specification (for example, "docs: fix typo in README").

  4. Make sure your changes work as intended and do not introduce new bugs or problems. Write tests if applicable.

  5. Document your changes following the numpydoc format. This step is important to ensure that the package documentation is up-to-date and complete. If you are not sure about this step, we can help you.

  6. When you are ready to submit your changes, create a pull request from your branch to the main branch of the original repository. Provide a clear description of your changes and why they are necessary, and we will review your contribution.

Thank you again for your interest in contributing to this project!

Making a Release

To make a release, please follow these steps:

  1. Stage all your changes with:

    git stage .
  2. Commit the code to the master branch. Make sure to use a proper commit message. Use the following command:

    git commit -s -m "this is a proper commit message"
  3. Tag the code with the next version number x.x.x. Please note that version numbering needs to follow proper major, minor, micro increments. Use the following command:

    git tag x.x.x
  4. Push to origin with:

    git push origin --tags

    This will trigger GitHub Actions workflows that:

    1. Automatically publish the release to PyPI and PyPI test (PyPI release workflow).

    2. Run pre-commit checks to verify code quality (pre-commit workflow).

    3. Build the documentation (documentation build workflow).