104 changes: 0 additions & 104 deletions .github/workflows/codeql.yml

This file was deleted.

17 changes: 7 additions & 10 deletions .github/workflows/lint.yml
@@ -7,41 +7,38 @@

jobs:
linters:
runs-on: ubuntu-20.04
runs-on: ubuntu-24.04

steps:
- name: Check out Git repository
uses: actions/checkout@v4

- name: Install ClangFormat
run: sudo apt-get install -y clang-format

- name: Run git-clang-format
run: git clang-format --style=file --diff

- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: 3.8
python-version: '3.12'

- name: Install Python dependencies
run: python3.8 -m pip install black
run: pip install black

- name: Run black
run: python3.8 -m black --check --config pyproject.toml .
- name: Run lint
run: bash tools/lint.sh dry

spelling:

Check warning: Code scanning / CodeQL
Workflow does not contain permissions (Medium): Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}
runs-on: ubuntu-20.04
runs-on: ubuntu-24.04

steps:
- name: Check out Git repository
uses: actions/checkout@v4

- name: Download misspell
run: |
curl -L https://github.com/client9/misspell/releases/download/v0.3.4/misspell_0.3.4_linux_64bit.tar.gz -o /tmp/misspell_0.3.4_linux_64bit.tar.gz
tar -xzf /tmp/misspell_0.3.4_linux_64bit.tar.gz -C .

- name: Check spelling
run: |
./misspell -error .github ark examples python scripts

Check warning: Code scanning / CodeQL
Workflow does not contain permissions (Medium): Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}
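
The CodeQL warning above asks for an explicit `permissions` block. As a rough illustration of what such a check looks for (not CodeQL's actual implementation), the sketch below takes an already-parsed workflow as a plain dictionary and lists the jobs that have neither a job-level nor a workflow-level `permissions` key. The function name `jobs_missing_permissions` and the dictionary shape are assumptions made for this example.

```python
from typing import Any, Dict, List


def jobs_missing_permissions(workflow: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: names of jobs with no effective `permissions` block."""
    # A workflow-level block applies to every job, so nothing is missing.
    if "permissions" in workflow:
        return []
    jobs = workflow.get("jobs", {})
    # Otherwise, report each job that does not declare its own block.
    return [name for name, job in jobs.items() if "permissions" not in job]


# Shape loosely mirrors lint.yml after this change (only the relevant keys).
lint_workflow = {
    "jobs": {
        "linters": {"runs-on": "ubuntu-24.04"},
        "spelling": {"runs-on": "ubuntu-24.04"},
    }
}

print(jobs_missing_permissions(lint_workflow))
```

Adding a top-level `"permissions": {"contents": "read"}` entry (the minimal starting point quoted in the warning) makes the helper return an empty list.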
64 changes: 0 additions & 64 deletions .github/workflows/ut-rocm.yml

This file was deleted.

57 changes: 41 additions & 16 deletions .github/workflows/ut-cuda.yml → .github/workflows/ut.yml
@@ -1,4 +1,4 @@
name: "Unit Tests (CUDA)"
name: "Unit Tests"

on:
push:
@@ -8,49 +8,68 @@ on:
branches:
- main
types: [opened, synchronize, reopened, ready_for_review]
schedule:
- cron: '42 20 * * 4'

jobs:
UnitTest:
runs-on: [ self-hosted, A100 ]
if: github.event_name != 'schedule'
defaults:
run:
shell: bash
timeout-minutes: 30
timeout-minutes: 60
permissions:
actions: read
contents: read
strategy:
fail-fast: false
matrix:
cuda: [ cuda11.8, cuda12.2 ]
include:
- platform: cuda
runner: [self-hosted, CUDA]
container: nvcr.io/nvidia/pytorch:26.03-py3
container_options: --privileged --ipc=host --gpus=all --ulimit memlock=-1:-1
- platform: rocm
runner: [self-hosted, ROCM]
container: rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0
container_options: --privileged --ipc=host --security-opt seccomp=unconfined --group-add video --ulimit memlock=-1:-1
runs-on: ${{ matrix.runner }}
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.cuda }}
group: ${{ github.workflow }}-${{ matrix.platform }}-${{ github.ref }}
cancel-in-progress: true
container:
image: "ghcr.io/microsoft/ark/ark:base-dev-${{ matrix.cuda }}"
options: --privileged --ipc=host --gpus=all --ulimit memlock=-1:-1
image: ${{ matrix.container }}
options: ${{ matrix.container_options }}

steps:
- name: Checkout
uses: actions/checkout@v4

- name: LockGPUClock
run: |
sudo nvidia-smi -pm 1
for i in $(seq 0 $(( $(nvidia-smi -L | wc -l) - 1 ))); do
sudo nvidia-smi -ac $(nvidia-smi --query-gpu=clocks.max.memory,clocks.max.sm --format=csv,noheader,nounits -i $i | sed 's/\ //') -i $i
done

- name: Dubious ownership exception
run: |
git config --global --add safe.directory /__w/ark/ark

- name: Build
run: |
apt-get update && apt-get install -y lcov
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Debug"
if [ "${{ matrix.platform }}" = "rocm" ]; then
CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc"
fi
cmake $CMAKE_ARGS ..
make -j ut ark_py

- name: Run C++ UT
if: github.event_name != 'schedule'
run: |
cd build
ARK_ROOT=$PWD ctest --stop-on-failure --verbose --schedule-random

- name: C++ Coverage
if: github.event_name != 'schedule'
run: |
cd build
lcov --capture --directory . --output-file cpp_coverage.info
lcov --remove cpp_coverage.info \
'/usr/*' \
@@ -65,19 +84,22 @@ jobs:
lcov --list cpp_coverage.info

- name: Install Python Dependencies
if: github.event_name != 'schedule'
run: |
python3 -m pip install -r requirements.txt

- name: Run Python UT
if: github.event_name != 'schedule'
run: |
cd build
PYTHONPATH=$PWD/python ARK_ROOT=$PWD python3 -m pytest \
--cov=python/ark \
--cov-report lcov:py_coverage.info \
--verbose \
../python/unittest/test.py
../python/unittest/

- name: Report Coverage
if: github.event_name != 'schedule'
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
run: |
@@ -86,9 +108,12 @@ jobs:
bash <(curl -s https://codecov.io/bash) -f coverage.info || echo "Codecov did not collect coverage reports"

- name: Install Python
if: github.event_name != 'schedule'
run: |
python3 -m pip install .

- name: Run Tutorials
if: github.event_name != 'schedule'
run: |
python3 ./examples/tutorial/quickstart_tutorial.py
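
The `schedule` trigger in this workflow uses the cron expression `42 20 * * 4`. GitHub Actions evaluates schedules in UTC, and 4 in cron's day-of-week field means Thursday, so the trigger fires every Thursday at 20:42 UTC. The sketch below computes the next such time. `next_scheduled_run` is a hypothetical helper written for this note, not part of the repository, and it ignores the start delays that scheduled workflows can experience.

```python
from datetime import datetime, timedelta, timezone

# Fields of the cron expression '42 20 * * 4' from the workflow:
# minute=42, hour=20, any day-of-month, any month, day-of-week=4.
CRON_MINUTE = 42
CRON_HOUR = 20
CRON_DOW = 4  # cron convention: 0 = Sunday, so 4 = Thursday


def next_scheduled_run(after: datetime) -> datetime:
    """Hypothetical helper: first Thursday 20:42 UTC strictly after `after`.

    `after` is expected to be a timezone-aware datetime.
    """
    after = after.astimezone(timezone.utc)
    candidate = after.replace(
        hour=CRON_HOUR, minute=CRON_MINUTE, second=0, microsecond=0
    )
    # Python's weekday() uses Monday=0, so shift the cron value (Sunday=0).
    target_weekday = (CRON_DOW - 1) % 7
    candidate += timedelta(days=(target_weekday - candidate.weekday()) % 7)
    # If that slot is not in the future, move to the following week.
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate


print(next_scheduled_run(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```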

22 changes: 18 additions & 4 deletions CMakeLists.txt
@@ -65,13 +65,27 @@ if(ARK_USE_CUDA)
endif()

# Set CUDA architectures
if(CUDAToolkit_VERSION_MAJOR GREATER_EQUAL 11)
if(CUDAToolkit_VERSION_MAJOR GREATER_EQUAL 13)
# CUDA 13+ dropped sm_60 and sm_70
set(CMAKE_CUDA_ARCHITECTURES 80 90)
elseif(CUDAToolkit_VERSION_MAJOR GREATER_EQUAL 12)
set(CMAKE_CUDA_ARCHITECTURES 60 70 80 90)
elseif(CUDAToolkit_VERSION_MAJOR GREATER_EQUAL 11)
set(CMAKE_CUDA_ARCHITECTURES 60 70 80)
endif()

# Hopper architecture
if(CUDAToolkit_VERSION_MAJOR GREATER_EQUAL 12)
set(CMAKE_CUDA_ARCHITECTURES ${CMAKE_CUDA_ARCHITECTURES} 90)
# CUDA 13+ moved CCCL headers into a cccl/ subdirectory.
# Add it to the include path so third-party code (e.g. MSCCL++)
# that includes <cuda/atomic> can still find the headers.
if(CUDAToolkit_VERSION_MAJOR GREATER_EQUAL 13)
list(GET CUDAToolkit_INCLUDE_DIRS 0 _CUDA_INCLUDE_FIRST)
set(CCCL_INCLUDE_DIR "${_CUDA_INCLUDE_FIRST}/cccl")
if(EXISTS "${CCCL_INCLUDE_DIR}")
include_directories(SYSTEM "${CCCL_INCLUDE_DIR}")
message(STATUS "CUDA 13+: added CCCL include dir ${CCCL_INCLUDE_DIR}")
else()
message(WARNING "CUDA 13+: CCCL include dir not found at ${CCCL_INCLUDE_DIR}. Build may fail.")
endif()
endif()
else() # ARK_USE_ROCM
set(CMAKE_HIP_STANDARD 17)
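
The updated architecture selection can be read as a three-way branch on the CUDA toolkit major version. The sketch below restates that branch in Python purely for illustration. `cuda_architectures` is a hypothetical name, and the empty list for toolkits older than 11 stands in for the CMake block leaving `CMAKE_CUDA_ARCHITECTURES` untouched.

```python
from typing import List


def cuda_architectures(toolkit_major: int) -> List[int]:
    """Hypothetical mirror of the if/elseif chain in CMakeLists.txt."""
    if toolkit_major >= 13:
        # Per the comment in the diff: CUDA 13+ dropped sm_60 and sm_70.
        return [80, 90]
    if toolkit_major >= 12:
        # CUDA 12 keeps the older targets and adds 90 (Hopper).
        return [60, 70, 80, 90]
    if toolkit_major >= 11:
        return [60, 70, 80]
    # Below CUDA 11 the CMake block sets nothing.
    return []


for major in (11, 12, 13):
    print(major, cuda_architectures(major))
```

Compared with the previous logic, which appended 90 to the CUDA 11 list for toolkits 12 and newer, the only behavioral difference in this sketch is the CUDA 13+ branch.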
6 changes: 2 additions & 4 deletions README.md
@@ -4,15 +4,13 @@ A GPU-driven system framework for scalable AI applications.

[![Latest Release](https://img.shields.io/github/release/microsoft/ark.svg)](https://github.com/microsoft/ark/releases/latest)
[![License](https://img.shields.io/github/license/microsoft/ark.svg)](LICENSE)
[![CodeQL](https://github.com/microsoft/ark/actions/workflows/codeql.yml/badge.svg)](https://github.com/microsoft/ark/actions/workflows/codeql.yml)
[![Unit Tests](https://github.com/microsoft/ark/actions/workflows/ut.yml/badge.svg)](https://github.com/microsoft/ark/actions/workflows/ut.yml)
[![codecov](https://codecov.io/gh/microsoft/ark/graph/badge.svg?token=XmMOK85GOB)](https://codecov.io/gh/microsoft/ark)

| Pipelines | Build Status |
|-------------------|-------------------|
| Unit Tests (CUDA) | [![Build Status](https://dev.azure.com/binyli/HPC/_apis/build/status%2Fark-test?branchName=main)](https://dev.azure.com/binyli/HPC/_build/latest?definitionId=6&branchName=main) |
| Unit Tests (ROCm) | [![Unit Tests (ROCm)](https://github.com/microsoft/ark/actions/workflows/ut-rocm.yml/badge.svg?branch=main)](https://github.com/microsoft/ark/actions/workflows/ut-rocm.yml) |

*NOTE (Nov 2023): ROCm unit tests will be replaced into an Azure pipeline in the future.*
| Unit Tests | [![Unit Tests](https://github.com/microsoft/ark/actions/workflows/ut.yml/badge.svg?branch=main)](https://github.com/microsoft/ark/actions/workflows/ut.yml) |

See [Quick Start](docs/quickstart.md) to quickly get started.

4 changes: 1 addition & 3 deletions ark/api/context.cpp
@@ -29,8 +29,6 @@ void Context::set(const std::string& key, const std::string& value,
this->impl_->set(key, value_json, type);
}

std::string Context::dump() const {
return this->impl_->dump().dump();
}
std::string Context::dump() const { return this->impl_->dump().dump(); }

} // namespace ark