Skip to content

GOBBLIN-2263: Refresh DagAction latency anchor#4198

Merged
Blazer-007 merged 1 commit into
apache:masterfrom
agam-99:agsingh/fix-dag-action-latency-anchor
Jun 1, 2026
Merged

GOBBLIN-2263: Refresh DagAction latency anchor#4198
Blazer-007 merged 1 commit into
apache:masterfrom
agam-99:agsingh/fix-dag-action-latency-anchor

Conversation

@agam-99

@agam-99 agam-99 commented May 27, 2026

Copy link
Copy Markdown
Contributor

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

GaaS launch latency is emitted when each job is submitted to the executor. The metric reads its anchor from dagAction.launch.storeInsertTimeMillis in the submitted JobSpec.

For multi-hop DAGs, downstream REEVALUATE DagActions already have fresh store-insert timestamps, but the downstream JobSpecs still carried the original flow LAUNCH timestamp from compilation time. As a result, downstream jobs reported job submission time - original flow launch time, which made normal long-running DAG progress look like multi-minute or multi-hour launch latency.

This PR refreshes dagAction.launch.storeInsertTimeMillis on the JobSpec immediately before each job submission using the current DagTask lease params. That makes the metric measure job submission time - current DagAction insert time for both initial LAUNCH jobs and downstream REEVALUATE/RESUME jobs.

Implementation details:

  • DagProcUtils replaces stale dagAction.launch.storeInsertTimeMillis in both JobSpec config and configAsProperties before producer.addSpec(jobSpec).
  • LaunchDagProc, ReevaluateDagProc, and ResumeDagProc pass their current LeaseParams into the submission path.
  • Existing overloads remain for callers that do not have DagAction lease metadata.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    • Added DagProcUtilsTest.testSubmitJobToExecutorReplacesStaleStoreInsertTimeMillis to verify stale anchors are replaced in both JobSpec config and properties before submission.
    • Added ReevaluateDagProcTest.testCurrentJobToRunReplacesStaleStoreInsertTimeMillis to cover the downstream REEVALUATE path.
    • JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_282-msft.jdk/Contents/Home" ./gradlew :gobblin-service:test --tests "org.apache.gobblin.service.modules.orchestration.proc.DagProcUtilsTest"
    • git diff --check origin/master...HEAD
    • Previously attempted JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_282-msft.jdk/Contents/Home" ./gradlew :gobblin-service:test --tests "org.apache.gobblin.service.modules.orchestration.proc.DagProcUtilsTest" --tests "org.apache.gobblin.service.modules.orchestration.proc.ReevaluateDagProcTest" --tests "org.apache.gobblin.service.modules.orchestration.proc.LaunchDagProcTest"; the suite compiled but failed during TestMetastoreDatabaseFactory setup because Docker/Testcontainers is not available in this environment.
    • mint build not applicable: this checkout is not a LinkedIn multiproduct directory.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Made with Cursor

@agam-99 agam-99 force-pushed the agsingh/fix-dag-action-latency-anchor branch from 1f9e13a to 5bc0ce0 Compare June 1, 2026 06:00
@Blazer-007 Blazer-007 requested a review from Copilot June 1, 2026 07:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes inflated “launch latency” metrics for downstream jobs in multi-hop DAGs by refreshing the dagAction.launch.storeInsertTimeMillis anchor on each JobSpec immediately before submission, using the current DagTask lease metadata.

Changes:

  • Add DagProcUtils.submitNextNodes(...) / submitJobToExecutor(...) overloads that accept LeaseParams, and stamp JobSpec config + configAsProperties before producer.addSpec(jobSpec).
  • Update LaunchDagProc, ReevaluateDagProc, and ResumeDagProc to pass their current LeaseParams into the submission path.
  • Add unit tests verifying stale timestamps are replaced in both the JobSpec config and properties during submission.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtils.java Adds lease-aware submission overloads and stamps JobSpec with current store-insert time prior to executor submission.
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/LaunchDagProc.java Passes current lease params into submitNextNodes so initial job submissions use the fresh anchor.
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/ReevaluateDagProc.java Passes current lease params into job (re)submission and downstream node submission to keep anchors fresh.
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/ResumeDagProc.java Passes current lease params into submitNextNodes when resuming a failed DAG.
gobblin-service/src/test/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtilsTest.java Adds a test asserting stale store-insert time is replaced in config + properties before submission.
gobblin-service/src/test/java/org/apache/gobblin/service/modules/orchestration/proc/ReevaluateDagProcTest.java Refactors task construction and adds REEVALUATE-path coverage for replacing stale store-insert timestamps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.23%. Comparing base (26406a8) to head (5bc0ce0).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4198      +/-   ##
============================================
+ Coverage     43.26%   51.23%   +7.96%     
- Complexity     2583     6235    +3652     
============================================
  Files           516     1102     +586     
  Lines         22087    43000   +20913     
  Branches       2505     4841    +2336     
============================================
+ Hits           9557    22032   +12475     
- Misses        11566    19132    +7566     
- Partials        964     1836     +872     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Stamp each submitted job with the current DagAction store insert time
so the launch latency metric measures per-action queueing rather than
elapsed DAG runtime.

Co-authored-by: Cursor <cursoragent@cursor.com>
@agam-99 agam-99 force-pushed the agsingh/fix-dag-action-latency-anchor branch from 5bc0ce0 to fd239ac Compare June 1, 2026 07:53
@Blazer-007 Blazer-007 merged commit 87834d5 into apache:master Jun 1, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants