Skip to content

feat: support opt-in $ref loading in workflows#1867

Open
njzjz-bot wants to merge 11 commits intodeepmodeling:develfrom
njzjz-bot:feat/allow-ref-secure-default
Open

feat: support opt-in $ref loading in workflows#1867
njzjz-bot wants to merge 11 commits intodeepmodeling:develfrom
njzjz-bot:feat/allow-ref-secure-default

Conversation

@njzjz-bot
Copy link
Contributor

@njzjz-bot njzjz-bot commented Feb 24, 2026

Summary

Enable dargs $ref loading in key DP-GEN workflows with explicit opt-in.

What changed

  • Add allow_ref parameter to dpgen.util.normalize
  • Thread allow_ref into:
    • dpgen run
    • dpgen simplify
    • dpgen init_reaction
  • Add CLI option --allow-ref for the commands above
  • Keep default behavior unchanged (disabled by default)
  • Bump minimal dependency to dargs>=0.5.0
  • Document usage in docs:
    • doc/run/overview-of-the-run-process.md
    • doc/simplify/simplify.md
    • doc/init/init-reaction.md
  • Add parameter docs for updated function signature

Security

$ref loading is disabled by default and must be explicitly enabled.

Authored by OpenClaw (model: gpt-5.3-codex)

Summary by CodeRabbit

  • New Features

    • Added a new --allow-ref CLI flag to init-reaction, run, and simplify to optionally allow loading external JSON/YAML snippets (disabled by default for security).
  • Documentation

    • Updated guides and examples to show how to use the --allow-ref flag.
  • Tests

    • Added checks that the help output advertises the --allow-ref flag for relevant subcommands.
  • Dependencies

    • Bumped dargs requirement to >=0.5.0.

njzjz-bot and others added 2 commits February 24, 2026 19:14
- add --allow-ref to run/simplify/init_reaction\n- add allow_ref parameter in normalize path\n- keep  disabled by default for security\n- bump minimum dargs to >=0.5.0\n- document usage in README\n\nAuthored by OpenClaw (model: gpt-5.3-codex)
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 24, 2026

Warning

Rate limit exceeded

@njzjz-bot has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 38 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between bbabb98 and f259aa7.

📒 Files selected for processing (1)
  • tests/test_cli.py
📝 Walkthrough

Walkthrough

Adds an explicit allow_ref option to normalization and runtime paths, threads it through CLI, generator, simplify, and data modules to enable optional $ref-based external snippet loading (disabled by default). Also bumps dargs to >=0.5.0 and updates docs/tests.

Changes

Cohort / File(s) Summary
Core normalization & plumbing
dpgen/util.py, dpgen/data/reaction.py
normalize signature gains allow_ref: bool = False; calls to normalization in reaction now pass the flag.
Generator runtime
dpgen/generator/run.py
run_iter signature gains allow_ref: bool = False and passes it into normalize; call sites updated to propagate flag.
Simplify flow
dpgen/simplify/simplify.py
run_iter updated to accept allow_ref (docstring updated) and forwards it to normalize; invocation sites updated.
CLI
dpgen/main.py
Added --allow-ref (store_true) to init_reaction, run, and simplify subcommands; flag wired into downstream calls.
Documentation
doc/init/init-reaction.md, doc/run/overview-of-the-run-process.md, doc/simplify/simplify.md
Notes and examples added documenting --allow-ref usage and security implications.
Tests
tests/test_cli.py
Help-output tests updated to assert presence of --allow-ref for the relevant subcommands.
Project metadata
pyproject.toml
Bumped dargs requirement from >=0.4.0 to >=0.5.0.

Sequence Diagram(s)

sequenceDiagram
    participant U as User/CLI
    participant M as dpgen/main
    participant R as run_iter (generator/simplify)
    participant N as dpgen.util.normalize
    participant L as ExternalRefLoader

    U->>M: invoke subcommand (e.g. run --allow-ref)
    M->>R: call run_iter(..., allow_ref=flag)
    R->>N: normalize(data, allow_ref=flag)
    alt allow_ref == true
        N->>L: resolve `$ref` entries from external files
        L-->>N: loaded snippets
    else
        N-->>R: validate/normalize without external refs
    end
    N-->>R: normalized data
    R-->>M: continue processing
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: support opt-in $ref loading in workflows' directly summarizes the main change: adding optional support for external reference loading ($ref) in DP-GEN workflows with a secure-by-default approach.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dpgen/simplify/simplify.py (1)

511-529: ⚠️ Potential issue | 🟡 Minor

Update run_iter docstring to include allow_ref.

The signature changed but the docstring doesn’t describe the new parameter, so the Numpy-style documentation is now incomplete.

📝 Proposed docstring update
@@
     06 make_fp
     07 run_fp (same as generator)
     08 post_fp (same as generator)
+
+    Parameters
+    ----------
+    param_file : str
+        parameter file (json/yaml)
+    machine_file : str
+        machine file (json/yaml)
+    allow_ref : bool, default=False
+        allow external JSON/YAML snippets via $ref during normalization
     """

As per coding guidelines: “Write Numpy-style docstrings for Python code”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpgen/simplify/simplify.py` around lines 511 - 529, The docstring for
function run_iter is missing documentation for the new allow_ref parameter;
update the Numpy-style docstring at the top of run_iter to describe allow_ref
(type: bool), its default value, and its behavior/impact on normalization (e.g.,
whether it allows references during normalize), and add it to the Parameters
section alongside param_file and machine_file so the docstring is complete and
follows the project’s Numpy-style docs.
dpgen/util.py (1)

81-105: ⚠️ Potential issue | 🟡 Minor

Document the new allow_ref parameter in the docstring.

The normalize docstring is now missing the new parameter, which makes the Numpy-style documentation incomplete.

📝 Proposed docstring update
@@
     strict_check : bool, default=True
         strict check data or not
+    allow_ref : bool, default=False
+        allow external JSON/YAML snippets via $ref during normalization

As per coding guidelines: “Write Numpy-style docstrings for Python code”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpgen/util.py` around lines 81 - 105, The docstring for function normalize is
missing the new allow_ref parameter; update the Numpy-style "Parameters" section
of normalize to document allow_ref: describe it as a bool (default False) that
controls whether reference values are permitted during normalization and
checking, and mention its effect on calls to arginfo.normalize_value and
arginfo.check_value so readers know it toggles allow_ref behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dpgen/data/reaction.py`:
- Around line 214-215: The call to normalize uses args.allow_ref which raises
AttributeError for programmatic callers without that attribute; update the call
in the gen_init_reaction flow to read allow_ref via a safe fallback (e.g., use
getattr(args, "allow_ref", False) or check hasattr(args, "allow_ref") and
default to False) when computing jdata (after
init_reaction_jdata_arginfo()/before calling normalize) so legacy args objects
don't break.

---

Outside diff comments:
In `@dpgen/simplify/simplify.py`:
- Around line 511-529: The docstring for function run_iter is missing
documentation for the new allow_ref parameter; update the Numpy-style docstring
at the top of run_iter to describe allow_ref (type: bool), its default value,
and its behavior/impact on normalization (e.g., whether it allows references
during normalize), and add it to the Parameters section alongside param_file and
machine_file so the docstring is complete and follows the project’s Numpy-style
docs.

In `@dpgen/util.py`:
- Around line 81-105: The docstring for function normalize is missing the new
allow_ref parameter; update the Numpy-style "Parameters" section of normalize to
document allow_ref: describe it as a bool (default False) that controls whether
reference values are permitted during normalization and checking, and mention
its effect on calls to arginfo.normalize_value and arginfo.check_value so
readers know it toggles allow_ref behavior.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 663d8d6 and 8aa1294.

📒 Files selected for processing (7)
  • README.md
  • dpgen/data/reaction.py
  • dpgen/generator/run.py
  • dpgen/main.py
  • dpgen/simplify/simplify.py
  • dpgen/util.py
  • pyproject.toml

- document --allow-ref in run/simplify/init_reaction docs\n- add allow_ref parameter docs in dpgen.util.normalize\n\nAuthored by OpenClaw (model: gpt-5.3-codex)
@njzjz-bot njzjz-bot changed the title feat: add opt-in `` support (secure by default) feat: support opt-in $ref loading in workflows (secure by default) Feb 24, 2026
@njzjz njzjz requested a review from Copilot February 24, 2026 19:41
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds opt-in support for $ref loading in DP-GEN workflows, enabling users to reference external JSON/YAML snippets in their configuration files. The feature is secure by default, requiring explicit opt-in via the --allow-ref CLI flag or allow_ref=True parameter in the Python API.

Changes:

  • Updated dargs dependency from >=0.4.0 to >=0.5.0 to support the allow_ref parameter
  • Added allow_ref parameter to dpgen.util.normalize() function with default value False
  • Threaded allow_ref parameter through three key workflows: dpgen run, dpgen simplify, and dpgen init_reaction
  • Added --allow-ref CLI flags to the three commands above
  • Updated documentation to explain the feature and its security implications

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pyproject.toml Bumped dargs dependency from >=0.4.0 to >=0.5.0 to support allow_ref parameter
dpgen/util.py Added allow_ref parameter to normalize() function with security documentation
dpgen/generator/run.py Updated run_iter() to accept allow_ref parameter and threaded it through gen_run()
dpgen/simplify/simplify.py Updated run_iter() to accept allow_ref parameter and threaded it through gen_simplify()
dpgen/data/reaction.py Updated gen_init_reaction() to pass allow_ref from CLI args to normalize()
dpgen/main.py Added --allow-ref CLI option to init_reaction, run, and simplify commands
doc/run/overview-of-the-run-process.md Added documentation for --allow-ref flag usage with dpgen run
doc/simplify/simplify.md Added documentation for --allow-ref flag usage with dpgen simplify
doc/init/init-reaction.md Added documentation for --allow-ref flag usage with dpgen init_reaction
README.md Added section explaining $ref support and its secure-by-default behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

njzjz-bot and others added 2 commits February 24, 2026 20:03
- use getattr(args, 'allow_ref', False) in init_reaction\n- add allow_ref docs in simplify.run_iter docstring\n\nAuthored by OpenClaw (model: gpt-5.3-codex)
@njzjz-bot
Copy link
Contributor Author

Addressed review comments:\n- now uses for compatibility with programmatic callers\n- added missing documentation in docstring\n- docs remain in pages (not README)\n\nAlso kept secure default () unchanged.

@njzjz-bot
Copy link
Contributor Author

Follow-up (clean summary): addressed review comments.

  • gen_init_reaction now uses getattr(args, "allow_ref", False) for programmatic compatibility.
  • Added missing allow_ref docs in dpgen.simplify.run_iter docstring.
  • Kept docs in doc/ pages only.
  • Secure default remains unchanged (allow_ref=False).

Local syntax check passed (python3 -m py_compile).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
dpgen/simplify/simplify.py (1)

634-639: Consider using getattr for allow_ref for consistency with gen_init_reaction.

gen_init_reaction in dpgen/data/reaction.py (line 215) defensively uses getattr(args, "allow_ref", False), while gen_simplify accesses args.allow_ref directly. If gen_simplify is ever invoked programmatically with a legacy args object, this will raise AttributeError.

♻️ Proposed fix for consistency
-        run_iter(args.PARAM, args.MACHINE, allow_ref=args.allow_ref)
+        run_iter(args.PARAM, args.MACHINE, allow_ref=getattr(args, "allow_ref", False))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpgen/simplify/simplify.py` around lines 634 - 639, gen_simplify currently
accesses args.allow_ref directly which can raise AttributeError for legacy args
objects; change the call to run_iter in gen_simplify to pass
allow_ref=getattr(args, "allow_ref", False) (mirror the defensive pattern used
in gen_init_reaction) and ensure any logging or debug checks still use
args.debug via getattr if needed to maintain consistency and robustness when
args lacks that attribute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@dpgen/simplify/simplify.py`:
- Around line 634-639: gen_simplify currently accesses args.allow_ref directly
which can raise AttributeError for legacy args objects; change the call to
run_iter in gen_simplify to pass allow_ref=getattr(args, "allow_ref", False)
(mirror the defensive pattern used in gen_init_reaction) and ensure any logging
or debug checks still use args.debug via getattr if needed to maintain
consistency and robustness when args lacks that attribute.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa1294 and 25c7af6.

📒 Files selected for processing (6)
  • doc/init/init-reaction.md
  • doc/run/overview-of-the-run-process.md
  • doc/simplify/simplify.md
  • dpgen/data/reaction.py
  • dpgen/simplify/simplify.py
  • dpgen/util.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • dpgen/util.py

- guard allow_ref for programmatic callers in init_reaction\n- complete docstrings for allow_ref path\n- add CLI test assertions for --allow-ref\n\nAuthored by OpenClaw (model: gpt-5.3-codex)
@njzjz-bot
Copy link
Contributor Author

Addressed review comments in latest commits:\n\n- made robust for programmatic callers via \n- completed docstrings for path\n- added CLI coverage assertions for in \n\nAlso removed README changes; docs are in only.

@njzjz-bot njzjz-bot changed the title feat: support opt-in $ref loading in workflows (secure by default) feat: support opt-in $ref loading in workflows Feb 24, 2026
Keep documentation updates in doc/ only.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_cli.py`:
- Around line 16-18: Replace direct "dpgen" subprocess invocations with the
interpreter module form and suppress Ruff S603: import sys at the top of the
test file, change sp.check_output(["dpgen", ...]) to
sp.check_output([sys.executable, "-m", "dpgen", ...]) for both the initial dpgen
-h call and the subcommand checks, and add a trailing "# noqa: S603" to those
sp.check_output lines to satisfy the linter while avoiding a partial executable
path; use the existing sp.check_output symbol and sys.executable -m dpgen in the
updated calls.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25c7af6 and d4d044b.

📒 Files selected for processing (1)
  • tests/test_cli.py

Use explicit args.allow_ref consistent with CLI contract.\n\nAuthored by OpenClaw (model: gpt-5.3-codex)
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
dpgen/data/reaction.py (1)

215-215: ⚠️ Potential issue | 🟠 Major

args.allow_ref direct access re-introduces the AttributeError risk for programmatic callers.

A getattr fallback was added to address this in a prior commit, then removed by "remove getattr fallback for allow_ref". Any programmatic caller that constructs an args-like object without an allow_ref attribute (e.g., unit tests, legacy scripts, or a bare argparse.Namespace()) will now hit an AttributeError at this line.

If the intent is to enforce that all callers must provide allow_ref explicitly, the function signature should be updated to accept it as a direct parameter instead of relying on duck-typing of args. Otherwise, restore the fallback:

🛡️ Option A – restore getattr fallback (minimal change, backward compatible)
-    jdata = normalize(jdata_arginfo, jdata, allow_ref=args.allow_ref)
+    allow_ref = getattr(args, "allow_ref", False)
+    jdata = normalize(jdata_arginfo, jdata, allow_ref=allow_ref)
♻️ Option B – accept allow_ref as an explicit parameter (cleaner API)
-def gen_init_reaction(args):
+def gen_init_reaction(args, allow_ref: bool = False):
     jdata = load_file(args.PARAM)
     mdata = load_file(args.MACHINE)

     jdata_arginfo = init_reaction_jdata_arginfo()
-    jdata = normalize(jdata_arginfo, jdata, allow_ref=args.allow_ref)
+    jdata = normalize(jdata_arginfo, jdata, allow_ref=allow_ref)

CLI call site would then pass allow_ref=args.allow_ref explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpgen/data/reaction.py` at line 215, The call uses args.allow_ref directly
which can raise AttributeError for programmatic callers; either restore the safe
fallback by replacing args.allow_ref with getattr(args, "allow_ref", False) when
calling normalize(jdata_arginfo, jdata, allow_ref=...), or change the
surrounding function signature to accept an explicit allow_ref parameter and
update the call to normalize(jdata_arginfo, jdata, allow_ref=allow_ref) (and
update CLI call sites to pass args.allow_ref). Target the call site with the
normalize invocation and the surrounding function parameters to implement one of
these two fixes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@dpgen/data/reaction.py`:
- Line 215: The call uses args.allow_ref directly which can raise AttributeError
for programmatic callers; either restore the safe fallback by replacing
args.allow_ref with getattr(args, "allow_ref", False) when calling
normalize(jdata_arginfo, jdata, allow_ref=...), or change the surrounding
function signature to accept an explicit allow_ref parameter and update the call
to normalize(jdata_arginfo, jdata, allow_ref=allow_ref) (and update CLI call
sites to pass args.allow_ref). Target the call site with the normalize
invocation and the surrounding function parameters to implement one of these two
fixes.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4d044b and bbabb98.

📒 Files selected for processing (1)
  • dpgen/data/reaction.py

Authored by OpenClaw (model: gpt-5.3-codex)
Use sys.executable -m dpgen.main for lint-safe invocation and working module path in CI.\n\nAuthored by OpenClaw (model: gpt-5.3-codex)
@codecov
Copy link

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 38.46154% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.79%. Comparing base (663d8d6) to head (f259aa7).

Files with missing lines Patch % Lines
dpgen/main.py 0.00% 3 Missing ⚠️
dpgen/generator/run.py 33.33% 2 Missing ⚠️
dpgen/simplify/simplify.py 33.33% 2 Missing ⚠️
dpgen/data/reaction.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #1867      +/-   ##
==========================================
- Coverage   49.80%   49.79%   -0.01%     
==========================================
  Files          83       83              
  Lines       14988    14991       +3     
==========================================
  Hits         7465     7465              
- Misses       7523     7526       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants