Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch windows workflow to run on Azure hosted ARC runner (#18859) #18866

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Eliasj42
Copy link
Contributor

Progress on #18813

---------

Signed-off-by: Elias Joseph <[email protected]>
Co-authored-by: Elias Joseph <[email protected]>
@Eliasj42 Eliasj42 requested review from ScottTodd and removed request for ScottTodd October 21, 2024 22:26
@ScottTodd ScottTodd added infrastructure Relating to build systems, CI, or testing platform/windows 🚪 Windows-specific build, execution, benchmarking, and deployment labels Oct 21, 2024
Comment on lines 29 to 34
- name: "Install choco reqs"
run: |
Get-PSDrive | Format-Table Name, Used, Free
Test-Path C:\mnt\azure
Get-ChildItem -Path C:\mnt\azure
fsutil volume diskfree C:\mnt\azure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All generic (non-project-specific) setup should be moved to the Dockerfile. I'm less concerned about steps taking up to 1 minute, but the 15 minute MSVC install will need to be moved to ahead of time. Our target for presubmit should be 10-20 minutes per run, including the source build (warm cache). We can budget for 5 minutes or so of overhead (git checkout, installing deps, etc.).

defaults:
run:
shell: bash
runs-on: arc-runner-set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many cores is this configured with right now? https://github.com/iree-org/iree/actions/runs/11448943622/job/31853515374 is making slow progress.

@saienduri said that we might only have quota for small runners, but we need to make a case for larger runners based on utilization or something? If so, that's silly - we know that we need 32 (or even 64/96) core runners and we already have data to back that up. We could switch the current nightly build to these new runners if it is necessary to make a case for quota increases, but we're currently paying $0 for free runners that take 4h30m. Paying for runners that are similarly slow doesn't make sense to me :P

cc @amd-chrissosa

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this cluster is using instances with 8 cpu cores

Comment on lines 56 to 77
uses: ilammy/[email protected]
- name: "Installing iree reqs"
run: |
choco install cmake
choco install ninja
choco install Ninja
Import-Module $env:ChocolateyInstall\helpers\chocolateyProfile.psm1
refreshenv
- name: "check space"
run: |
Get-PSDrive | Format-Table Name, Used, Free
Test-Path C:\mnt\azure
- name: "Install Bash"
run: |
$gitPath = "C:\Program Files\Git\bin"
$env:PATH += ";C:\Program Files\Git\bin;C:\Program Files\Git\usr\bin"
[System.Environment]::GetEnvironmentVariable("PATH", "Process")
Test-Path -Path "C:/Program Files/Git/bin/bash.exe"
- name: Add Bash to PATH
run: |
echo "Adding Bash to PATH"
echo "C:\Program Files\Git\bin" >> $Env:GITHUB_PATH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(All of these steps should move to the Dockerfile too. Some amount of workflow scripting is okay if it unblocks these new runners faster... if the overhead is low enough)

Comment on lines +82 to +83
- name: "Clean up build dir"
run: Remove-Item -Path "C:\mnt\azure\build-windows" -Recurse -Force
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? The GitHub runner should do some automatic cleanup already. We can tolerate some scripting like this if it unblocks other work, but this does increase workflow complexity. This also fails to clean up the directory if the workflow fails or is cancelled (it would need a if(always)... but even that can fail to run if the runner goes offline before running that step).

Maybe keep the build directory in the github workspace? Why did you put things under C:\mnt\azure\?

Copy link
Collaborator

@saienduri saienduri Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, the default storage drive is hard set to less space than needed and flipping different switches/requesting more memory wasn't able to resolve it for the pod instances. So, we decided to use a storage mount that we have full control over

Copy link
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this is great progress, thanks so much! Getting the runners set up with the general set of software installed is a huge milestone.

Comment on lines +78 to +79
- name: "Building Iree"
run: bash ./build_tools/cmake/build_all.sh "/c/mnt/azure/build-windows"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build error: https://github.com/iree-org/iree/actions/runs/11448943622/job/31853515374#step:13:8554

[7708/8119] Linking CXX shared library tools\IREECompiler.dll
FAILED: tools/IREECompiler.dll lib/IREECompiler.lib 
C:\Windows\system32\cmd.exe /C "cd . && "C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe" -E vs_link_dll --intdir=compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir --rc="C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64\rc.exe" --mt="C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64\mt.exe" --manifests  -- "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64\link.exe" /nologo @CMakeFiles\iree_compiler_API_SharedImpl.rsp  /out:tools\IREECompiler.dll /implib:lib\IREECompiler.lib /pdb:tools\IREECompiler.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL  -natvis:C:/home/runner/_work/iree/iree/runtime/iree.natvis && cd ."
LINK Pass 1: command "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\iree_compiler_API_SharedImpl.rsp /out:tools\IREECompiler.dll /implib:lib\IREECompiler.lib /pdb:tools\IREECompiler.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL -natvis:C:/home/runner/_work/iree/iree/runtime/iree.natvis /MANIFEST /MANIFESTFILE:compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir/intermediate.manifest compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir/manifest.res" failed (exit code 1140) with the following output:
LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored

LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored

   Creating library lib\IREECompiler.lib and object lib\IREECompiler.exp

LINK : fatal error LNK1318: Unexpected PDB error; LIMIT (12) 'C:\mnt\azure\build-windows\tools\IREECompiler.pdb'

Hmmm... could be multiple reasons for that. First thing I'd suspect is out of disk / RAM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry I've been neglecting this page while troubleshooting the above issue. I fixed it, the only remaining issue is a certification error

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)

https://github.com/iree-org/iree/actions/runs/11469526800/job/31916920407

For some reason when I install msvc in the Docker container, the workflow can't find it.

This reverts commit 06aaf4a.
@Eliasj42 Eliasj42 force-pushed the windows-runner-staging branch 2 times, most recently from 44955f9 to 642a9e8 Compare October 22, 2024 16:27
@ScottTodd
Copy link
Member

If iterating on a single workflow via a pull request, consider using skip-ci or the other git trailers to avoid running unrelated workflows: https://iree.dev/developers/general/contributing/#ci-behavior-manipulation . You can also use workflow_dispatch to run just when you want and avoid sending notifications for every test you run

@Eliasj42 Eliasj42 force-pushed the windows-runner-staging branch 2 times, most recently from 1b535fe to 4ea1578 Compare October 22, 2024 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infrastructure Relating to build systems, CI, or testing platform/windows 🚪 Windows-specific build, execution, benchmarking, and deployment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants