-
Notifications
You must be signed in to change notification settings - Fork 603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch windows workflow to run on Azure hosted ARC runner (#18859) #18866
base: main
Are you sure you want to change the base?
Conversation
Progress on #18813 --------- Signed-off-by: Elias Joseph <[email protected]> Co-authored-by: Elias Joseph <[email protected]>
- name: "Install choco reqs" | ||
run: | | ||
Get-PSDrive | Format-Table Name, Used, Free | ||
Test-Path C:\mnt\azure | ||
Get-ChildItem -Path C:\mnt\azure | ||
fsutil volume diskfree C:\mnt\azure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All generic (non-project-specific) setup should be moved to the Dockerfile. I'm less concerned about steps taking up to 1 minute, but the 15 minute MSVC install will need to be moved to ahead of time. Our target for presubmit should be 10-20 minutes per run, including the source build (warm cache). We can budget for 5 minutes or so of overhead (git checkout, installing deps, etc.).
defaults: | ||
run: | ||
shell: bash | ||
runs-on: arc-runner-set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many cores is this configured with right now? https://github.com/iree-org/iree/actions/runs/11448943622/job/31853515374 is making slow progress.
@saienduri said that we might only have quota for small runners, but we need to make a case for larger runners based on utilization or something? If so, that's silly - we know that we need 32 (or even 64/96) core runners and we already have data to back that up. We could switch the current nightly build to these new runners if it is necessary to make a case for quota increases, but we're currently paying $0 for free runners that take 4h30m. Paying for runners that are similarly slow doesn't make sense to me :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this cluster is using instances with 8 cpu cores
uses: ilammy/[email protected] | ||
- name: "Installing iree reqs" | ||
run: | | ||
choco install cmake | ||
choco install ninja | ||
choco install Ninja | ||
Import-Module $env:ChocolateyInstall\helpers\chocolateyProfile.psm1 | ||
refreshenv | ||
- name: "check space" | ||
run: | | ||
Get-PSDrive | Format-Table Name, Used, Free | ||
Test-Path C:\mnt\azure | ||
- name: "Install Bash" | ||
run: | | ||
$gitPath = "C:\Program Files\Git\bin" | ||
$env:PATH += ";C:\Program Files\Git\bin;C:\Program Files\Git\usr\bin" | ||
[System.Environment]::GetEnvironmentVariable("PATH", "Process") | ||
Test-Path -Path "C:/Program Files/Git/bin/bash.exe" | ||
- name: Add Bash to PATH | ||
run: | | ||
echo "Adding Bash to PATH" | ||
echo "C:\Program Files\Git\bin" >> $Env:GITHUB_PATH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(All of these steps should move to the Dockerfile too. Some amount of workflow scripting is okay if it unblocks these new runners faster... if the overhead is low enough)
- name: "Clean up build dir" | ||
run: Remove-Item -Path "C:\mnt\azure\build-windows" -Recurse -Force |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? The GitHub runner should do some automatic cleanup already. We can tolerate some scripting like this if it unblocks other work, but this does increase workflow complexity. This also fails to clean up the directory if the workflow fails or is cancelled (it would need a if(always)
... but even that can fail to run if the runner goes offline before running that step).
Maybe keep the build directory in the github workspace? Why did you put things under C:\mnt\azure\
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, the default storage drive is hard set to less space than needed and flipping different switches/requesting more memory wasn't able to resolve it for the pod instances. So, we decided to use a storage mount that we have full control over
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW this is great progress, thanks so much! Getting the runners set up with the general set of software installed is a huge milestone.
- name: "Building Iree" | ||
run: bash ./build_tools/cmake/build_all.sh "/c/mnt/azure/build-windows" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build error: https://github.com/iree-org/iree/actions/runs/11448943622/job/31853515374#step:13:8554
[7708/8119] Linking CXX shared library tools\IREECompiler.dll
FAILED: tools/IREECompiler.dll lib/IREECompiler.lib
C:\Windows\system32\cmd.exe /C "cd . && "C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe" -E vs_link_dll --intdir=compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir --rc="C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64\rc.exe" --mt="C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64\mt.exe" --manifests -- "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64\link.exe" /nologo @CMakeFiles\iree_compiler_API_SharedImpl.rsp /out:tools\IREECompiler.dll /implib:lib\IREECompiler.lib /pdb:tools\IREECompiler.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL -natvis:C:/home/runner/_work/iree/iree/runtime/iree.natvis && cd ."
LINK Pass 1: command "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\iree_compiler_API_SharedImpl.rsp /out:tools\IREECompiler.dll /implib:lib\IREECompiler.lib /pdb:tools\IREECompiler.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL -natvis:C:/home/runner/_work/iree/iree/runtime/iree.natvis /MANIFEST /MANIFESTFILE:compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir/intermediate.manifest compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir/manifest.res" failed (exit code 1140) with the following output:
LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored
LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored
Creating library lib\IREECompiler.lib and object lib\IREECompiler.exp
LINK : fatal error LNK1318: Unexpected PDB error; LIMIT (12) 'C:\mnt\azure\build-windows\tools\IREECompiler.pdb'
Hmmm... could be multiple reasons for that. First thing I'd suspect is out of disk / RAM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, sorry I've been neglecting this page while troubleshooting the above issue. I fixed it, the only remaining issue is a certification error
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)
https://github.com/iree-org/iree/actions/runs/11469526800/job/31916920407
For some reason when I install msvc in the Docker container, the workflow can't find it.
This reverts commit 06aaf4a.
44955f9
to
642a9e8
Compare
If iterating on a single workflow via a pull request, consider using |
1b535fe
to
4ea1578
Compare
4ea1578
to
422bd21
Compare
Progress on #18813
workflow running: https://github.com/iree-org/iree/actions/runs/11448943622/job/31853515374