Automated pipeline for evaluating pre-computed fingerprints #2

Open: wants to merge 48 commits into main


@blazejba (Member) commented Dec 21, 2023

Included scripts:

  • analyze_results.py - uses the W&B API to access the fingerprinting sweeps and, for each sweep, finds the "absolute" best test score (the best test score over all runs) and the "fair" best test score (the test score of the run with the lowest validation loss, which serves as the selection criterion).
  • download_datasets.sh - downloads all LargeMix datasets and their splits and saves them under the paths the existing configs expect.
  • finetune_on_fingerprints.py - takes a file with fingerprints and a TDC dataset name, trains a small model, and reports the results to W&B. The script exposes a number of arguments for customizing the model and the training. It is called many times when running sweeps. Everything runs on CPU, and the script is independent of Graphium.
  • sweep_checkpoint_on_all_tdc.py - runs sweeps over all 22 TDC datasets, using finetune_on_fingerprints.py as the sweep program and the search space defined in finetune_on_fingerprints_config.yaml.
  • graphium/cli/get_final_fingerprints.py - extracts fingerprints by running all TDC molecules through a selected checkpoint. Requires the checkpoint and its associated config. Everything runs on CPU; larger models need a lot of RAM.
  • graphium/nn/architectures/global_architectures.py - contains the changes needed to run graphium/cli/get_final_fingerprints.py.
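The distinction analyze_results.py draws between the "absolute" and the "fair" best test score can be sketched as follows. This is a minimal illustration, not the script's code: `RunSummary`, its field names, and `best_scores` are hypothetical, and in the real script the per-run values would presumably come from each run's summary in the W&B API. The absolute score picks the best test score directly (optimistic, since it peeks at the test set), while the fair score reports the test score of whichever run had the lowest validation loss.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RunSummary:
    """Final metrics of one sweep run (hypothetical field names)."""
    run_id: str
    val_loss: float
    test_score: float


def best_scores(runs: Iterable[RunSummary], higher_is_better: bool = True) -> Tuple[float, float]:
    """Return (absolute_best_test, fair_best_test) for one sweep.

    absolute: best test score over all runs (selection uses the test set).
    fair: test score of the run with the lowest validation loss.
    """
    runs = list(runs)
    if not runs:
        raise ValueError("sweep has no finished runs")
    pick = max if higher_is_better else min
    absolute = pick(r.test_score for r in runs)
    fair = min(runs, key=lambda r: r.val_loss).test_score
    return absolute, fair


runs = [
    RunSummary("a", val_loss=0.40, test_score=0.81),
    RunSummary("b", val_loss=0.35, test_score=0.78),
    RunSummary("c", val_loss=0.52, test_score=0.84),
]
print(best_scores(runs))  # (0.84, 0.78)
```

The `higher_is_better` flag is there because TDC metrics differ in direction (e.g. AUROC is maximized, MAE is minimized), so the "best" test score has to be picked per dataset, whereas validation loss is always minimized.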

The workflow:

  1. [1 day - 2 weeks] Pretrain a model in Graphium
  2. [5 minutes - a few hours] Extract fingerprints using graphium/cli/get_final_fingerprints.py; usage is explained in this issue: https://github.com/graphcore-research/scaling-molecular-gnns/issues/45
  3. [5 days] Sweep on all TDC using sweep_checkpoint_on_all_tdc.py
  4. [minutes] Analyze results using analyze_results.py
  5. Update the results in the table (ask @blazejba for access)
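Step 3 amounts to one sweep per TDC dataset, each sharing the same search space but fixing the dataset and the fingerprint file. A minimal sketch of how such configs could be built in the W&B sweep-config format is shown below. It is an assumption, not what sweep_checkpoint_on_all_tdc.py actually does: the parameter names (`lr`, `hidden_dim`, `dataset`, `fingerprints_path`), the metric name `val_loss`, the dataset names, and `fingerprints.pt` are all hypothetical stand-ins for whatever finetune_on_fingerprints_config.yaml defines.

```python
import copy

# Hypothetical base search space; the real one lives in finetune_on_fingerprints_config.yaml.
BASE_SWEEP = {
    "program": "finetune_on_fingerprints.py",
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"values": [1e-4, 3e-4, 1e-3]},
        "hidden_dim": {"values": [256, 512, 1024]},
    },
}


def sweep_configs(datasets, fingerprints_path):
    """Build one W&B-style sweep config per TDC dataset.

    The dataset and fingerprint file are fixed per sweep ("value");
    every other parameter is searched ("values").
    """
    configs = {}
    for name in datasets:
        # Deep copy so the nested "parameters" dict of BASE_SWEEP is never mutated.
        cfg = copy.deepcopy(BASE_SWEEP)
        cfg["name"] = f"fingerprints-{name}"
        cfg["parameters"]["dataset"] = {"value": name}
        cfg["parameters"]["fingerprints_path"] = {"value": fingerprints_path}
        configs[name] = cfg
    return configs


configs = sweep_configs(["caco2_wang", "herg"], "fingerprints.pt")
print(sorted(configs))                            # ['caco2_wang', 'herg']
print(configs["herg"]["parameters"]["dataset"])   # {'value': 'herg'}
```

Each dict could then be registered with `wandb.sweep(...)` and executed by agents, which by default launch the `program` with the sampled parameters as command-line arguments; this assumes finetune_on_fingerprints.py accepts arguments under those names.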

@blazejba marked this pull request as draft December 21, 2023 17:04
@blazejba changed the title Enable quick finetuning on precomputed fingerprints [WIP] Enable quick finetuning on precomputed fingerprints Dec 21, 2023
@kerstink-GC left a comment

Looks good to me

@blazejba changed the title [WIP] Enable quick finetuning on precomputed fingerprints Enable quick finetuning on precomputed fingerprints Jan 3, 2024
@blazejba self-assigned this Jan 3, 2024
@blazejba marked this pull request as ready for review January 3, 2024 09:01
@blazejba changed the title Enable quick finetuning on precomputed fingerprints Automated pipeline for evaluating pre-computed fingerprints Jan 4, 2024
2 participants