Switch windows workflow to run on Azure hosted ARC runner (#18859) #18866

Open
wants to merge 4 commits into
base: main
48 changes: 37 additions & 11 deletions .github/workflows/ci_windows_x64_msvc.yml
@@ -22,31 +22,57 @@ concurrency:

jobs:
windows_x64_msvc:
runs-on: windows-2022
defaults:
run:
shell: bash
runs-on: arc-runner-set
Member

How many cores is this configured with right now? https://github.com/iree-org/iree/actions/runs/11448943622/job/31853515374 is making slow progress.

@saienduri said that we might only have quota for small runners, but we need to make a case for larger runners based on utilization or something? If so, that's silly - we know that we need 32 (or even 64/96) core runners and we already have data to back that up. We could switch the current nightly build to these new runners if it is necessary to make a case for quota increases, but we're currently paying $0 for free runners that take 4h30m. Paying for runners that are similarly slow doesn't make sense to me :P

cc @amd-chrissosa

Collaborator

Currently, this cluster uses instances with 8 CPU cores.

env:
BUILD_DIR: build-windows
BUILD_DIR: C:\mnt\azure\build-windows
steps:
- name: "Checking out repository"
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
uses: actions/[email protected]
with:
submodules: true
- name: "Clean up build dir"
run: |
Remove-Item -Path "C:\mnt\azure\build-windows" -Recurse -Force
mkdir "C:\mnt\azure\build-windows"
- name: "Setting up Python"
uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3 # v5.1.0
uses: actions/setup-python@v4
with:
python-version: "3.10" # Needs pybind >= 2.10.1 for Python >= 3.11
- name: "Installing Python packages"
run: |
python3 -m venv .venv
.venv/Scripts/activate.bat
python3 -m pip install -r runtime/bindings/python/iree/runtime/build_requirements.txt
python3 -m pip install --upgrade certifi
- name: "Installing requirements"
run: choco install ccache --yes
- name: "Installing MSVC requirements"
run: |
choco install visualstudio2022buildtools -y
choco install visualstudio2022community --package-parameters "--add Microsoft.VisualStudio.Workload.CoreEditor --add Microsoft.VisualStudio.Workload.ManagedDesktop --add Microsoft.VisualStudio.Workload.DesktopDevelopmentWithC++" -y
choco install visualstudio2022-workload-nativedesktop -y
- name: "Configuring MSVC"
uses: ilammy/msvc-dev-cmd@0b201ec74fa43914dc39ae48a89fd1d8cb592756 # v1.13.0
- name: "Building IREE"
run: ./build_tools/cmake/build_all.sh "${BUILD_DIR}"
uses: ilammy/[email protected]
- name: "Installing iree reqs"
run: |
choco install cmake
choco install ninja
choco install Ninja
Import-Module $env:ChocolateyInstall\helpers\chocolateyProfile.psm1
refreshenv
- name: "Install Bash"
run: |
$gitPath = "C:\Program Files\Git\bin"
$env:PATH += ";C:\Program Files\Git\bin;C:\Program Files\Git\usr\bin"
[System.Environment]::GetEnvironmentVariable("PATH", "Process")
Test-Path -Path "C:/Program Files/Git/bin/bash.exe"
- name: Add Bash to PATH
run: |
echo "Adding Bash to PATH"
echo "C:\Program Files\Git\bin" >> $Env:GITHUB_PATH
- name: "Building Iree"
run: bash ./build_tools/cmake/build_all.sh "/c/mnt/azure/build-windows"
Comment on lines +73 to +74
Member

Build error: https://github.com/iree-org/iree/actions/runs/11448943622/job/31853515374#step:13:8554

[7708/8119] Linking CXX shared library tools\IREECompiler.dll
FAILED: tools/IREECompiler.dll lib/IREECompiler.lib 
C:\Windows\system32\cmd.exe /C "cd . && "C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe" -E vs_link_dll --intdir=compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir --rc="C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64\rc.exe" --mt="C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64\mt.exe" --manifests  -- "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64\link.exe" /nologo @CMakeFiles\iree_compiler_API_SharedImpl.rsp  /out:tools\IREECompiler.dll /implib:lib\IREECompiler.lib /pdb:tools\IREECompiler.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL  -natvis:C:/home/runner/_work/iree/iree/runtime/iree.natvis && cd ."
LINK Pass 1: command "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\iree_compiler_API_SharedImpl.rsp /out:tools\IREECompiler.dll /implib:lib\IREECompiler.lib /pdb:tools\IREECompiler.pdb /dll /version:0.0 /machine:x64 -fuse-ld=lld /debug /INCREMENTAL -natvis:C:/home/runner/_work/iree/iree/runtime/iree.natvis /MANIFEST /MANIFESTFILE:compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir/intermediate.manifest compiler\src\iree\compiler\API\CMakeFiles\iree_compiler_API_SharedImpl.dir/manifest.res" failed (exit code 1140) with the following output:
LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored

LINK : warning LNK4044: unrecognized option '/fuse-ld=lld'; ignored

   Creating library lib\IREECompiler.lib and object lib\IREECompiler.exp

LINK : fatal error LNK1318: Unexpected PDB error; LIMIT (12) 'C:\mnt\azure\build-windows\tools\IREECompiler.pdb'

Hmmm... could be multiple reasons for that. First thing I'd suspect is out of disk / RAM.
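
One way to check the disk/RAM theory is a diagnostic step placed just before the build. This is a minimal sketch, not part of the PR: the step name is hypothetical, and it assumes the runner's default shell is PowerShell (as the other steps in this workflow do) and that CIM queries work inside the pod.

```yaml
      # Hypothetical diagnostic step; not part of this PR.
      - name: "Report free disk space and memory"
        run: |
          # Used and free space, in bytes, for every filesystem drive.
          Get-PSDrive -PSProvider FileSystem | Format-Table Name, Used, Free
          # Total and free physical memory, reported in kilobytes.
          Get-CimInstance Win32_OperatingSystem | Format-List TotalVisibleMemorySize, FreePhysicalMemory
```

Running the same step again with `if: always()` after the build would show how much headroom was left when the link step failed.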

Contributor Author

Hi, sorry I've been neglecting this page while troubleshooting the above issue. I fixed it; the only remaining issue is a certificate error:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)

https://github.com/iree-org/iree/actions/runs/11469526800/job/31916920407

For some reason, when I install MSVC in the Docker container, the workflow can't find it.
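
A common workaround for `CERTIFICATE_VERIFY_FAILED` on a machine without a usable system certificate store is to point Python's OpenSSL at the CA bundle that `certifi` ships. The step below is a sketch under that assumption. The step name is hypothetical, and whether this resolves the failure in the linked run has not been verified.

```yaml
      # Hypothetical step; assumes Python is already on PATH from "Setting up Python".
      - name: "Point Python at the certifi CA bundle"
        run: |
          python -m pip install --upgrade certifi
          # certifi.where() returns the path to the bundled cacert.pem.
          $caBundle = python -c "import certifi; print(certifi.where())"
          # Export SSL_CERT_FILE to all later steps; utf8 keeps the env file readable
          # by the runner under Windows PowerShell as well as PowerShell 7.
          "SSL_CERT_FILE=$caBundle" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
```

`SSL_CERT_FILE` is read by OpenSSL's default verify paths, so it only helps code that uses the default SSL context. Tools that configure their own trust store would need a separate setting.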

- name: "Testing IREE"
run: ./build_tools/cmake/ctest_all.sh "${BUILD_DIR}"
run: bash ./build_tools/cmake/ctest_all.sh "/c/mnt/azure/build-windows"
- name: "Clean up build dir"
run: Remove-Item -Path "C:\mnt\azure\build-windows" -Recurse -Force
Comment on lines +77 to +78
Member

Is this necessary? The GitHub runner should do some automatic cleanup already. We can tolerate some scripting like this if it unblocks other work, but this does increase workflow complexity. This also fails to clean up the directory if the workflow fails or is cancelled (it would need an if: always() condition... but even that can fail to run if the runner goes offline before running that step).

Maybe keep the build directory in the github workspace? Why did you put things under C:\mnt\azure\?
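
If the cleanup step stays, a version that also runs after failures and cancellations could look like the sketch below. The path is the one used in this PR. The `Test-Path` guard is an addition so the step does not error when the directory was never created.

```yaml
      - name: "Clean up build dir"
        # Run even when an earlier step failed or the run was cancelled.
        if: always()
        run: |
          if (Test-Path "C:\mnt\azure\build-windows") {
            Remove-Item -Path "C:\mnt\azure\build-windows" -Recurse -Force
          }
```

As noted above, this still cannot run if the runner itself goes offline, so the matching cleanup at the start of the job remains useful.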

Collaborator

@saienduri Oct 23, 2024

For some reason, the default storage drive is hard-set to less space than needed, and flipping different switches or requesting more memory wasn't able to resolve it for the pod instances. So we decided to use a storage mount that we have full control over.
