Add resume functionality #1107

fcanogab · 2025-02-21T14:38:23Z

Add functionality to resume a garak scan that stopped before finishing, for example because of an error.

CLI argument:
Add --resume, -R to specify the garak JSONL report file to use as the reference.

Example usage:
$ garak --model_type rest -G rest_config.json --resume ~/.local/share/garak/garak_runs/garak.e784d299-a838-4d20-b18a-6ef8055f9491.report.jsonl
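For readers who want to see the shape of the option, here is a minimal sketch of how a `--resume` / `-R` argument could be parsed. It is not the code in this PR: the parser name `garak-resume-sketch`, the helper `parse_resume`, and the `.jsonl` suffix check are all assumptions made for illustration.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-alone parser; garak's real CLI defines many more options.
    parser = argparse.ArgumentParser(prog="garak-resume-sketch")
    parser.add_argument(
        "--resume",
        "-R",
        type=Path,
        default=None,
        metavar="REPORT_JSONL",
        help="report.jsonl file of the interrupted run to use as the reference",
    )
    return parser


def parse_resume(argv: Sequence[str]) -> Optional[Path]:
    """Return the report path given on the command line, or None if absent."""
    args = build_parser().parse_args(argv)
    # Assumed sanity check: only accept files that look like JSONL reports.
    if args.resume is not None and args.resume.suffix != ".jsonl":
        raise ValueError(f"expected a .jsonl report file, got {args.resume}")
    return args.resume
```

With this sketch, `parse_resume(["-R", "garak.report.jsonl"])` returns the path and `parse_resume([])` returns `None`, so a normal run is unaffected when the flag is omitted.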

github-actions · 2025-02-21T14:38:38Z

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

fcanogab · 2025-02-21T14:45:14Z

I have read the DCO Document and I hereby sign the DCO

fcanogab · 2025-02-21T14:45:59Z

recheck

leondz · 2025-02-21T14:51:27Z

thanks! and I can see this assumes that the config is still available, i.e. doesn't attempt to load it from the report.jsonl. which is good. we'll take a look!

jmartin-tech

Thank you for the interest in moving this issue forward.

There is a lot of runtime-state work that needs to be accounted for here. Support for the various combinations of configuration needs to be in place before resuming a run can be officially supported.

To resume a run, all of the original configuration needs to be loaded. Consider the somewhat contrived execution config below:

~/hf_gpu_enabled.yaml

plugins:
  generators:
    huggingface:
      Pipeline:
        hf_args:
          device: cuda
          torch_dtype: float16

With a CLI call such as:

garak -m huggingface -n meta-llama/Llama-3.1-8B-Instruct --config ~/hf_gpu_enabled.yaml -p lmrc

This is a valid way to run garak. Note that the runtime configuration is a combination of the configuration file provided, which in this case contains the details required to ensure CUDA acceleration of inference for the generator, and the CLI argument -p, which in this case restricts the probe set to the lmrc.* probes.
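To make the layering concrete, here is a minimal sketch of combining a config file with CLI arguments, assuming a simple recursive merge in which CLI values win. The function `deep_merge` and the key names `model_type`, `model_name` and `probe_spec` are illustrative assumptions, not a description of how garak's `_config` module loads settings.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values from `override` take precedence over `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or mismatched types: the override replaces the base.
            merged[key] = copy.deepcopy(value)
    return merged


# Contents of the YAML file above, already parsed into a dict.
file_config = {
    "plugins": {
        "generators": {
            "huggingface": {
                "Pipeline": {"hf_args": {"device": "cuda", "torch_dtype": "float16"}}
            }
        }
    }
}

# Assumed representation of `-m huggingface -n ... -p lmrc`.
cli_overrides = {
    "plugins": {
        "model_type": "huggingface",
        "model_name": "meta-llama/Llama-3.1-8B-Instruct",
        "probe_spec": "lmrc",
    }
}

effective = deep_merge(file_config, cli_overrides)
```

The point for resume is that `effective` cannot be rebuilt from either source alone: the file contributes `hf_args` and the CLI contributes the probe selection, so both would have to be recoverable.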

Depending on where the run stopped within the set of 7 probe classes, which utilize 17 detectors, there are a number of runtime state actions and objects that may need to be skipped or restored to determine where to resume testing.

A preferred user experience to resume from an unexpected failure might be:

garak --resume <filename_for_failed.report.jsonl>

Or, a little less optimally:

garak -m huggingface -n meta-llama/Llama-3.1-8B-Instruct --resume <filename_for_failed.report.jsonl>

While it may be possible to require the user to provide the exact original command when adding --resume on the CLI, in practice this is not likely to be a reasonable request to enforce in all cases. Some validation that the same command is being run would also be needed to be able to trust the output as complete.
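One way such validation could work is to fingerprint the effective configuration and compare it with a fingerprint taken from the original run. The sketch below assumes the configuration is available as a JSON-serializable dict on both sides; the helpers `config_fingerprint` and `same_run_config` are hypothetical, and nothing here implies the report.jsonl currently stores enough to do this.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a config dict in a way that ignores key order."""
    # sort_keys plus fixed separators gives a canonical serialization.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_run_config(original: dict, resumed: dict) -> bool:
    """True when both configs serialize to the same canonical form."""
    return config_fingerprint(original) == config_fingerprint(resumed)
```

A mismatch would not have to abort the run; it could instead be recorded in the results so that readers know the resumed output may differ from a clean run.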

A basic diagram of the current runtime flow for the default harness as of v0.10.2, though subject to change, is available here. To resume, entry points need to be identified that will either start a probe over if it failed or restore the attempts in memory to be processed by the detectors that are enabled based on the original command.

To that end, some execution flow and runtime architecture changes may need to be implemented so that a resumed run can load the state of partially complete probes or detectors, or rewind to resume from the last successfully completed probe class or some other verifiable state.
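As a rough illustration of finding a verifiable state, the sketch below scans report lines and separates probes that reached evaluation from probes that only have attempts recorded. It assumes each line is a JSON object with an `entry_type` field, that `attempt` entries carry `probe_classname`, and that `eval` entries carry `probe`; treat those field names and the helper `summarize_report` as assumptions to check against the actual report format.

```python
import json
from typing import Iterable, Set, Tuple


def summarize_report(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (probes with an eval entry, probes with attempts but no eval)."""
    evaluated: Set[str] = set()
    attempted: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # A crash can leave a truncated final line; skip it.
            continue
        kind = entry.get("entry_type")
        if kind == "eval" and "probe" in entry:
            evaluated.add(entry["probe"])
        elif kind == "attempt" and "probe_classname" in entry:
            attempted.add(entry["probe_classname"])
    return evaluated, attempted - evaluated


sample_lines = [
    '{"entry_type": "attempt", "probe_classname": "lmrc.Bullying"}',
    '{"entry_type": "eval", "probe": "lmrc.Bullying", "detector": "example"}',
    '{"entry_type": "attempt", "probe_classname": "lmrc.Deadnaming"}',
    '{"entry_type": "attempt", "probe_cl',  # truncated by the failure
]
complete, partial = summarize_report(sample_lines)
```

Here `partial` holds the probes that would need to be restarted or have their attempts restored. A single eval entry does not prove that every enabled detector finished for that probe, so a real implementation would need a stricter completeness check.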

There are a number of prerequisites for this feature that are not yet well supported in the tooling. One is full access to validate that the configuration passed is the same as in the original run. The project has recently seen that this needs to be better supported, likely in the report.jsonl. In lieu of validation, some notation in the results could clarify that the resumed run may not be as comprehensive as, or even consistent with, a clean one.

jmartin-tech · 2025-02-21T20:33:24Z

garak/probes/base.py

+            if hasattr(_config.system, 'previous_attempts') and (seq, prompt) in _config.system.previous_attempts:
+                continue


The probe() method should not access _config directly; previously completed state should be injected in some way.

Some probes also randomize selection of a subset of prompts, so the prompts selected on resume would not properly match the saved state.
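A minimal sketch of both points, under stated assumptions: `SketchProbe` is a hypothetical stand-in and not garak's `Probe` class, and `sample_prompts` assumes the seed of the original run is recorded somewhere and reused on resume. Completed `(seq, prompt)` pairs are passed in by the caller rather than read from global config.

```python
import random
from typing import Iterable, List, Optional, Sequence, Tuple


class SketchProbe:
    """Hypothetical probe that receives prior state instead of reading _config."""

    def __init__(
        self,
        prompts: Sequence[str],
        completed: Optional[Iterable[Tuple[int, str]]] = None,
    ) -> None:
        self.prompts = list(prompts)
        # Injected by the harness; empty for a fresh run.
        self.completed = frozenset(completed or ())

    def pending(self) -> List[Tuple[int, str]]:
        """Return the (seq, prompt) pairs that still need to be sent."""
        return [
            (seq, prompt)
            for seq, prompt in enumerate(self.prompts)
            if (seq, prompt) not in self.completed
        ]


def sample_prompts(pool: Sequence[str], k: int, seed: int) -> List[str]:
    """Pick k prompts reproducibly; the same seed yields the same subset."""
    rng = random.Random(seed)  # local generator, global random state untouched
    return rng.sample(list(pool), k)
```

Without a recorded seed, a randomized probe would draw a different subset on resume, and the `(seq, prompt)` pairs saved from the first run would no longer line up with the new prompts.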

Add resume functionality

07d49c8

fcanogab mentioned this pull request Feb 21, 2025

support continuation / resume of runs #141

Open

github-actions bot added a commit that referenced this pull request Feb 21, 2025

@fcanogab has signed the CLA in #1107

38aa21f

jmartin-tech reviewed Feb 21, 2025
