Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download all LargeMix datasets in a single command #1

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 0 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@

---

[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/sdGggS)
[![PyPI](https://img.shields.io/pypi/v/graphium)](https://pypi.org/project/graphium/)
[![Conda](https://img.shields.io/conda/v/conda-forge/graphium?label=conda&color=success)](https://anaconda.org/conda-forge/graphium)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/graphium)](https://pypi.org/project/graphium/)
Expand Down Expand Up @@ -34,10 +33,6 @@ A deep learning library focused on graph representation learning for real-world

Visit https://graphium-docs.datamol.io/.

[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/sdGggS)

You can try running Graphium on Graphcore IPUs for free on Gradient by clicking on the button above.

## Installation for developers

### For CPU and GPU developers
Expand Down
33 changes: 33 additions & 0 deletions download_datasets.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#!/bin/bash

# Function to download dataset
download_dataset() {
local dataset_url=$1
local dataset_path=$2

# Create directory if it does not exist
mkdir -p $(dirname "${dataset_path}")

# Download the dataset
wget -O "${dataset_path}" "${dataset_url}"
}

# L1000_VCAP
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/LINCS_L1000_VCAP_0-4.csv.gz" "graphium/data/neurips2023/large-dataset/LINCS_L1000_VCAP_0-2_th2.csv.gz"
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/l1000_vcap_random_splits.pt" "graphium/data/neurips2023/large-dataset/l1000_vcap_random_splits.pt"

# l1000_MCF7
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/LINCS_L1000_MCF7_0-4.csv.gz" "graphium/data/neurips2023/large-dataset/LINCS_L1000_MCF7_0-2_th2.csv.gz"
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/l1000_mcf7_random_splits.pt" "graphium/data/neurips2023/large-dataset/l1000_mcf7_random_splits.pt"

# PCBA_1328
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/PCBA_1328_1564k.parquet" "graphium/data/neurips2023/large-dataset/PCBA_1328_1564k.parquet"
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/pcba_1328_random_splits.pt" "graphium/data/neurips2023/large-dataset/pcba_1328_random_splits.pt"

# PCQM4M_G25
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/PCQM4M_G25_N4.parquet" "graphium/data/neurips2023/large-dataset/PCQM4M_G25_N4.parquet"
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/pcqm4m_g25_n4_random_splits.pt" "graphium/data/neurips2023/large-dataset/pcqm4m_g25_n4_random_splits.pt"

# PCQM4M_N4
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/PCQM4M_G25_N4.parquet" "graphium/data/neurips2023/large-dataset/PCQM4M_G25_N4.parquet"
download_dataset "https://storage.googleapis.com/graphium-public/datasets/neurips_2023/Large-dataset/pcqm4m_g25_n4_random_splits.pt" "graphium/data/neurips2023/large-dataset/pcqm4m_g25_n4_random_splits.pt"