[rush] Design proposal: balance the minimum number of tasks executed with the maximum level of parallel execution #5119

Open
L-Qun opened this issue Feb 20, 2025 · 3 comments

L-Qun commented Feb 20, 2025

Summary

I believe Rush's task scheduling capabilities are excellent, but there is still a flaw that I find hard to tolerate. Specifically, the problem is how to balance executing the minimum number of tasks with achieving the maximum level of parallel execution.

Details

Let's say our project's dependency relationships are as follows:

Image

Usually, on CI we need to run the build, lint, and test tasks. If we want to run these three tasks with maximum parallelism, we need to define them as phases of a single command in command-line.json, as follows:

{
  "commands": [
    {
      "name": "test",
      "commandKind": "phased",
      "phases": ["_phase:build", "_phase:lint", "_phase:test"],
      // ...
    }
  ]
}

On CI, we will run the command:

rush test --from git:origin/master

Thus, for the above project, when project B changes, we need to run the above command on projects A, B, E, C, D, and G.

Image

In the end, we need to execute 6 * 3 = 18 tasks. However, in this case, we don't need to run lint and test for A and E, right?
At this point, we can split the above command-line.json into:

{
  "commands": [
    {
      "name": "build",
      "commandKind": "phased",
      "phases": ["_phase:build"],
      // ...
    },
    {
      "name": "test",
      "commandKind": "phased",
      "phases": ["_phase:lint", "_phase:test"],
      // ...
    }
  ]
}

On CI, we will execute the following commands separately:

1. rush build --from git:origin/master
2. rush test --impacted-by git:origin/master

At this point, we only need to execute 6 + 2 * 4 = 14 tasks, which means we don't need to run lint and test for A and E.
However, splitting the entire execution process into two separate runs means we cannot maximize the parallel execution of all tasks.
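
To make the arithmetic concrete, here is a minimal sketch of the two selections and the resulting task counts. The dependency graph in it is an assumption (the original diagram is not reproduced here), chosen only so that the selections match the project lists above. `selectFrom` and `selectImpactedBy` are hypothetical helpers that approximate the documented behavior of `--from` and `--impacted-by`; they are not Rush APIs.

```typescript
// Hypothetical dependency graph: project -> projects it depends on.
// This is an assumption, picked so the selections match the text above.
const dependsOn: Record<string, string[]> = {
  A: [],
  E: [],
  F: [], // unrelated project, never selected when B changes
  B: ["A"],
  C: ["B"],
  D: ["B", "E"],
  G: ["C"],
};

// Collect every project reachable from `start` by following `edges`.
function reachable(start: Iterable<string>, edges: (p: string) => string[]): Set<string> {
  const seen = new Set<string>(start);
  const stack = Array.from(seen);
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const next of edges(current)) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return seen;
}

const dependenciesOf = (p: string): string[] => dependsOn[p] ?? [];
const consumersOf = (p: string): string[] =>
  Object.keys(dependsOn).filter((q) => dependenciesOf(q).includes(p));

// Approximates `--impacted-by`: the changed projects plus everything downstream.
function selectImpactedBy(changed: string[]): Set<string> {
  return reachable(changed, consumersOf);
}

// Approximates `--from`: the impacted projects plus all of their dependencies.
function selectFrom(changed: string[]): Set<string> {
  return reachable(selectImpactedBy(changed), dependenciesOf);
}

const changed = ["B"];
const fromSet = selectFrom(changed); // A, B, C, D, E, G
const impactedSet = selectImpactedBy(changed); // B, C, D, G

// Single command: build + lint + test on every project selected by --from.
const singleCommand = fromSet.size * 3;
// Split commands: build on the --from set, lint + test on the --impacted-by set.
const splitCommands = fromSet.size + impactedSet.size * 2;

console.log(singleCommand, splitCommands); // 18 14
```

The four tasks saved are exactly the lint and test runs for A and E, which are only needed as build inputs.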

Therefore, we need a way to both execute tasks in parallel and minimize the number of tasks executed.

So, going back to the single-command setup, let's assume the phase dependencies are defined as follows:

"phases": [
  {
    "name": "_phase:lint",
    "dependencies": {
      "self": ["_phase:build"]
    },
    // ...
  },
  {
    "name": "_phase:test",
    "dependencies": {
      "self": ["_phase:build"]
    },
    // ...
  }
]

This means we need to execute the build before lint and test, so the command can now be simplified to:

rush test --impacted-by git:origin/master (--phase-safe | --with-phase-deps | --include-phase-deps)

Under the hood, Rush would apply --impacted-by in a phase-aware way: it would still run the build phase for A and E, as shown in the diagram above, but skip their lint and test phases.
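
A rough sketch of what this phase-aware expansion could look like follows. It is not Rush's implementation. The project graph is hypothetical (the diagram is not reproduced here), the `upstream` dependency of `_phase:build` on `_phase:build` is an assumption, and `expandWithPhaseDeps` is an illustrative name. Only the `self` dependencies of lint and test come from the configuration above.

```typescript
type Phase = "_phase:build" | "_phase:lint" | "_phase:test";

interface PhaseDependencies {
  self: Phase[]; // phases of the same project that must run first
  upstream: Phase[]; // phases of each dependency project that must run first
}

// lint/test "self" dependencies are taken from the proposal;
// build's "upstream" dependency is an assumption.
const phaseDeps: Record<Phase, PhaseDependencies> = {
  "_phase:build": { self: [], upstream: ["_phase:build"] },
  "_phase:lint": { self: ["_phase:build"], upstream: [] },
  "_phase:test": { self: ["_phase:build"], upstream: [] },
};

// Hypothetical project graph: project -> projects it depends on.
const projectDeps: Record<string, string[]> = {
  A: [],
  E: [],
  F: [],
  B: ["A"],
  C: ["B"],
  D: ["B", "E"],
  G: ["C"],
};

// Start from the requested (project, phase) pairs and pull in every
// operation they transitively depend on, and nothing else.
function expandWithPhaseDeps(projects: string[], phases: Phase[]): Set<string> {
  const operations = new Set<string>();

  const visit = (project: string, phase: Phase): void => {
    const key = `${project};${phase}`;
    if (operations.has(key)) {
      return;
    }
    operations.add(key);
    for (const dep of phaseDeps[phase].self) {
      visit(project, dep);
    }
    for (const upstreamProject of projectDeps[project] ?? []) {
      for (const dep of phaseDeps[phase].upstream) {
        visit(upstreamProject, dep);
      }
    }
  };

  for (const project of projects) {
    for (const phase of phases) {
      visit(project, phase);
    }
  }
  return operations;
}

// Projects impacted by a change to B, with the phases of the "test" command.
const operations = expandWithPhaseDeps(["B", "C", "D", "G"], ["_phase:lint", "_phase:test"]);
console.log(operations.size); // 14
```

Under these assumptions the result is the same 14 operations as the two-command approach (6 builds plus lint and test for B, C, D, and G), but they all live in one graph, so a scheduler could start lint for B as soon as B's build finishes instead of waiting for every build to complete.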

L-Qun commented Feb 20, 2025

@dmichon-msft

dmichon-msft commented

Generally speaking, the model to date has been that we expect the unchanged tasks to replay from the build cache and therefore have a minimal impact on overall runtime.

L-Qun commented Feb 21, 2025

> Generally speaking, the model to date has been that we expect the unchanged tasks to replay from the build cache and therefore have a minimal impact on overall runtime.

Yes, caching is a huge optimization, but we also need a smarter scheduling strategy on top of it.
