integrate dev branch and update software environments #25


Open
wants to merge 45 commits into
base: main
4a3fdc3
run dev branch
btraven00 May 8, 2025
e89adda
docs: use the public repo URI
btraven00 Mar 17, 2025
52ebb55
chore: add convenience target to build singularity env
Mar 18, 2025
83c6f0b
feat: parametrize num threads on the makefile
Mar 18, 2025
dc2d629
chore: ignore common temporary outputs and image build artifacts
Mar 18, 2025
f91603a
feat: parametrize num threads on the makefile
Mar 18, 2025
bea2a75
fix: use --cores, --task-timeout
May 5, 2025
67e8cf8
update .eb files to easybuild 5.0
btraven00 May 7, 2025
931389f
remove remote storage
btraven00 May 8, 2025
60ac47b
do not run artifact if not in main repo
btraven00 May 8, 2025
1b972bf
Update Makefile
btraven00 May 8, 2025
49646db
streamline envmodules yaml
btraven00 May 8, 2025
fc53991
update clustbench
btraven00 May 8, 2025
54b7279
add rmarkdown-python bundles, without checksums
btraven00 May 10, 2025
1b57e44
inject checksums to rmarkdown easyconfig
btraven00 May 10, 2025
dfd5b93
update sklearn singularity definition
btraven00 May 11, 2025
0056b7f
factorize sklearn singularity pip block
btraven00 May 11, 2025
cef3a6b
extract variable in build script
btraven00 May 11, 2025
2ee17ca
revert include, should use m4
btraven00 May 11, 2025
c4cbe5c
update python version
btraven00 May 11, 2025
21bdd66
do a little bit of cleanup with the multiple envs
btraven00 May 11, 2025
e8e0f7e
escape
btraven00 May 11, 2025
a8336fb
install updated python
btraven00 May 11, 2025
518c2f6
sync the two build recipes
btraven00 May 11, 2025
2f4131f
delete source folder
btraven00 May 11, 2025
c72eb27
add microbenchmark for numpy operations
btraven00 May 11, 2025
937e455
fix path
btraven00 May 11, 2025
b0bd85a
default reps
btraven00 May 11, 2025
83f9b07
refs
btraven00 May 11, 2025
744c978
duplicate the apptainer clustering yaml
btraven00 May 12, 2025
ec18dcf
update the oras yaml. not working, just to keep in sync
btraven00 May 12, 2025
cf52a2c
update the rmarkdown environment
btraven00 May 12, 2025
934ce8b
update makefile
btraven00 May 12, 2025
3890cb4
add apptainer definition for rmarkdown
btraven00 May 12, 2025
c80adc1
remove unneeded dependencies
btraven00 May 12, 2025
b19a489
update makefile
btraven00 May 12, 2025
ebd69b7
cleanup r/fcps deps
btraven00 May 12, 2025
1afaa2f
cleanup image
btraven00 May 12, 2025
9e2168a
update readme
btraven00 May 12, 2025
6199c0a
fixes
btraven00 May 12, 2025
b017cb0
apptainer smoketest
btraven00 May 12, 2025
98777a5
add git in the image
btraven00 May 12, 2025
f4ae29d
try to debug fastcluster problem
btraven00 May 12, 2025
72cdc59
fail if the exit code fails
btraven00 May 14, 2025
01243de
use conda short for test
btraven00 May 14, 2025
13 changes: 5 additions & 8 deletions .github/workflows/benchmark.yml
@@ -18,7 +18,6 @@ jobs:
   run-benchmark:
     name: Run Benchmark
     runs-on: ubuntu-latest
-    ## runs-on: self-hosted
     steps:
       - name: Check out repository
         uses: actions/checkout@v4
@@ -49,7 +48,7 @@ jobs:
         shell: bash -l {0}
         run: |
           mamba install -y pip
-          pip install git+https://github.com/omnibenchmark/omnibenchmark.git@reduce_install_scope
+          pip install git+https://github.com/omnibenchmark/omnibenchmark.git@dev

       - name: Load benchmark cache
         id: cache-benchmark
@@ -60,16 +59,15 @@ jobs:

       - name: Run benchmark
         shell: bash -l {0}
-        continue-on-error: true
+        continue-on-error: false
         run: |
-          echo "y" | ob run benchmark -b Clustering.yaml --local --cores 3 --continue-on-error
+          echo "y" | ob run benchmark -b Clustering_conda_smoketest.yml --local --cores 3 --continue-on-error

   upload-artifact:
     name: Benchmark Artifact
     runs-on: ubuntu-latest
-    ## runs-on: self-hosted
     needs: run-benchmark
-    if: always()
+    if: github.ref == 'refs/heads/main' && github.repository_owner == 'omnibenchmark'
     steps:
       - name: Check out repository
         uses: actions/checkout@v4
@@ -100,12 +98,11 @@ jobs:

- name: Deploy to GitHub Pages
uses: actions/deploy-pages@v4

- name: Create Job Summary
if: always()
run: |
echo "### Reports" >> $GITHUB_STEP_SUMMARY
echo "- [Plotting Report](https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }})" >> $GITHUB_STEP_SUMMARY
echo "### All Outputs" >> $GITHUB_STEP_SUMMARY
echo "- [Complete Benchmark Output](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts)" >> $GITHUB_STEP_SUMMARY

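Note on the new `upload-artifact` condition in this workflow diff: `if: always()` is replaced by a ref and owner check. Because the new expression contains no status function, GitHub Actions also applies its implicit `success()` check, so the job should be skipped when `run-benchmark` fails. A minimal Python sketch of that gating logic, under stated assumptions: the function name `should_upload` and its arguments are hypothetical stand-ins for the `github` context values, and the case-insensitive comparison mirrors how GitHub expressions compare strings.

```python
def should_upload(ref: str, repository_owner: str, needs_succeeded: bool) -> bool:
    """Hypothetical helper mirroring the new `if:` on the upload-artifact job."""
    # No always()/failure() in the expression, so an implicit success() check applies.
    on_main = ref.casefold() == "refs/heads/main"
    in_org = repository_owner.casefold() == "omnibenchmark"
    return needs_succeeded and on_main and in_org


if __name__ == "__main__":
    # Upstream main branch, benchmark job succeeded: artifact job runs.
    print(should_upload("refs/heads/main", "omnibenchmark", True))
    # Same branch name on a fork (owner name is made up): artifact job is skipped.
    print(should_upload("refs/heads/main", "some-fork-owner", True))
    # Upstream main, but the benchmark job failed: artifact job is skipped.
    print(should_upload("refs/heads/main", "omnibenchmark", False))
```

In practice this means forks and feature branches still run the benchmark job but no longer attempt the Pages deployment.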
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
+# image build artifacts
+envs/*.sif
+
+# snakemake
+snakemake.log
+.snakemake/
+
+# vim swaps
+*.swp
+*.swo
6 changes: 3 additions & 3 deletions Clustering.yaml
@@ -2,10 +2,10 @@ id: clustering_example
 description: Clustering benchmark on Gagolewski's, true number of clusters plus minus 2.
 version: 1.2
 benchmarker: "Izaskun Mallona, Daniel Incicau"
-storage: https://play.min.io
 benchmark_yaml_spec: 0.04
-storage_api: S3
-storage_bucket_name: clustering_example
+# storage: https://play.min.io
+# storage_api: S3
+# storage_bucket_name: clustering_example
 software_backend: conda
 software_environments:
   clustbench:
160 changes: 33 additions & 127 deletions Clustering_singularity.yml → Clustering_apptainer_optimized.yml
@@ -1,42 +1,37 @@
-id: clustering_example_apptainer
+id: clustering_example_apptainer_optimized
 description: Clustering benchmark on Gagolewski's, true number of clusters plus minus 2.
-version: 1.4
-benchmarker: "Izaskun Mallona, Daniel Incicau"
-storage: http://omnibenchmark.org:9000
-benchmark_yaml_spec: 0.04
-storage_api: S3
-storage_bucket_name: clusteringexampleapptainer
+
+version: 1.5
+benchmarker: "Izaskun Mallona, Daniel Incicau, Ben Carrillo"
+benchmark_yaml_spec: 0.4
+
 software_backend: apptainer
+
 software_environments:
+
   clustbench:
-    description: "clustbench on py3.12.6"
-    conda: envs/clustbench.yml
-    envmodule: clustbench
-    apptainer: envs/clustbench.sif
-  sklearn:
-    description: "Daniel's on py3.12.6"
-    conda: envs/sklearn.yml
-    apptainer: envs/sklearn.sif
-    envmodule: clustbench # not true, but
-  R:
-    description: "Daniel's R with readr, dplyr, mclust, caret"
-    conda: envs/r.yml
-    apptainer: envs/r.sif
-    envmodule: fcps # not true, but
-  rmarkdown:
-    description: "R with some plotting dependencies"
-    conda: envs/rmarkdown.yml
-    apptainer: envs/r.sif # not true, but
-    envmodule: fcps # not true, but
+    description: "clustbench on py3.12.9, optimized python build"
+    conda: envs/clustbench.yml # not used
+    envmodule: na
+    apptainer: envs/clustbench-optimized.sif
+
   fcps:
     description: "CRAN's FCPS"
-    conda: envs/fcps.yml
+    conda: envs/fcps.yml # not used
+    envmodule: na
     apptainer: envs/fcps.sif
-    envmodule: fcps
+
+  rmarkdown:
+    description: "R with some plotting dependencies"
+    conda: envs/rmarkdown.yml # not used
+    envmodule: na
+    apptainer: envs/rmarkdown.sif
+
+
 metric_collectors:
   - id: plotting
     name: "Single-backend metric collector."
-    software_environment: "rmarkdown"
+    software_environment: rmarkdown
     repository:
       url: https://github.com/imallona/clustering_report
       commit: 1d6bdf5
@@ -45,14 +40,14 @@ metric_collectors:
outputs:
- id: plotting.html
path: "{input}/{name}/plotting_report.html"

stages:
## clustbench data ##########################################################

- id: data
modules:
- id: clustbench
name: "clustbench datasets, from https://www.sciencedirect.com/science/article/pii/S0020025521010082#t0005 Table1"
software_environment: "clustbench"
software_environment: clustbench
repository:
url: https://github.com/imallona/clustbench_data
commit: 366c5a2
@@ -125,16 +120,13 @@ stages:
- id: data.true_labels
path: "{input}/{stage}/{module}/{params}/{dataset}.labels0.gz"

## clustbench methods (fastcluster) ###################################################################

- id: clustering
modules:
- id: fastcluster
name: "fastcluster algorithm"
software_environment: "clustbench"
software_environment: clustbench
repository:
url: https://github.com/imallona/clustbench_fastcluster
# url: /home/imallona/src/clustbench_fastcluster/
commit: "45e43d3"
parameters:
- values: ["--linkage", "complete"]
@@ -143,12 +135,12 @@ stages:
           - values: ["--linkage", "weighted"]
           - values: ["--linkage", "median"]
           - values: ["--linkage", "centroid"]
+
       - id: sklearn
-        name: "sklearn"
-        software_environment: "clustbench"
+        name: sklearn
+        software_environment: clustbench
         repository:
           url: https://github.com/imallona/clustbench_sklearn
-          #url: /home/imallona/src/clustbench_sklearn
           commit: 5877378
         parameters:
           - values: ["--method", "birch"]
@@ -166,8 +158,8 @@ stages:
           - values: ["--linkage", "complete"]
           - values: ["--linkage", "ward"]
       - id: genieclust
-        name: "genieclust"
-        software_environment: "clustbench"
+        name: genieclust
+        software_environment: clustbench
         repository:
           url: https://github.com/imallona/clustbench_genieclust
           commit: 6090043
@@ -206,7 +198,7 @@ stages:
     modules:
       - id: partition_metrics
         name: "clustbench partition metrics"
-        software_environment: "clustbench"
+        software_environment: clustbench
         repository:
           url: https://github.com/imallona/clustbench_metrics
           commit: 9132d45
@@ -229,89 +221,3 @@ stages:
     outputs:
       - id: metrics.scores
         path: "{input}/{stage}/{module}/{params}/{dataset}.scores.gz"
-
-# ## daniel's data ###########################################################################
-
-# - id: danielsdata
-# modules:
-# - id: iris_manual
-# name: "Iris Dataset"
-# software_environment: "sklearn"
-# repository:
-# url: https://github.com/omnibenchmark-example/iris.git
-# commit: 47c63f0
-# - id: penguins
-# name: "Penguins Dataset"
-# software_environment: "sklearn"
-# repository:
-# url: https://github.com/omnibenchmark-example/penguins.git
-# commit: 9032478
-# outputs:
-# - id: data.features
-# path: "{input}/{stage}/{module}/{params}/{dataset}.features.csv"
-# - id: data.labels
-# path: "{input}/{stage}/{module}/{params}/{dataset}.labels.csv"
-
-# ## daniel's distances ########################################################################
-
-# - id: distances
-# modules:
-# - id: D1
-# software_environment: "sklearn"
-# parameters:
-# - values: ["--measure", "cosine"]
-# - values: ["--measure", "euclidean"]
-# - values: ["--measure", "manhattan"]
-# - values: ["--measure", "chebyshev"]
-# repository:
-# url: https://github.com/omnibenchmark-example/distance.git
-# commit: dd99d4f
-# inputs:
-# - entries:
-# - data.features
-# outputs:
-# - id: distances
-# path: "{input}/{stage}/{module}/{params}/{dataset}.distances.csv"
-
-# ## daniel's methods ###################################################################
-
-# - id: danielmethods
-# modules:
-# - id: kmeans
-# software_environment: "sklearn"
-# repository:
-# url: https://github.com/omnibenchmark-example/kmeans.git
-# commit: 049c8b1
-# - id: ward
-# software_environment: "R"
-# repository:
-# url: https://github.com/omnibenchmark-example/ward.git
-# commit: 976e3f3
-# inputs:
-# - entries:
-# - distances
-# outputs:
-# - id: methods.clusters
-# path: "{input}/{stage}/{module}/{params}/{dataset}.clusters.csv"
-
-# ## daniel's metrics ###################################################################
-
-# - id: danielsmetrics
-# modules:
-# - id: ari
-# software_environment: "R"
-# repository:
-# url: https://github.com/omnibenchmark-example/ari.git
-# commit: 72708f0
-# - id: accuracy
-# software_environment: "R"
-# repository:
-# url: https://github.com/omnibenchmark-example/accuracy.git
-# commit: e26b32f
-# inputs:
-# - entries:
-# - methods.clusters
-# - data.labels
-# outputs:
-# - id: metrics.mapping
-# path: "{input}/{stage}/{module}/{params}/{dataset}.metrics.txt"
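Note on the `path:` templates used by the stage outputs in this benchmark YAML (for example `{input}/{stage}/{module}/{params}/{dataset}.scores.gz`): each placeholder is filled in per module run. The sketch below only illustrates the substitution with Python's `str.format`; it is not omnibenchmark's actual resolution code, and the helper name `expand` and every wildcard value are made up for illustration.

```python
# Template copied from the metrics stage output in the benchmark YAML.
TEMPLATE = "{input}/{stage}/{module}/{params}/{dataset}.scores.gz"


def expand(template: str, **wildcards: str) -> str:
    """Hypothetical helper: substitute each {placeholder} with a concrete value."""
    return template.format(**wildcards)


if __name__ == "__main__":
    # All values below are invented; real runs derive them from the workflow.
    path = expand(
        TEMPLATE,
        input="out/clustering/fastcluster/params_a",
        stage="metrics",
        module="partition_metrics",
        params="params_b",
        dataset="example",
    )
    print(path)
```

Because `{input}` is itself the directory of the upstream output, the resulting paths nest one stage inside another.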