    Repositories list

    • Modified UCX library to track communications
      C
      Other
      425000 Updated Oct 16, 2024
    • Cuda
      1310 Updated Jun 13, 2024
    • Snoopie

      Public
      Multi-GPU communication profiler and visualizer
      C
      Other
      11720 Updated Jun 10, 2024
    • GPU fusion code and algorithm
      Cuda
      MIT License
      0100 Updated May 24, 2024
    • TypeScript
      0000 Updated May 22, 2024
    • barnes

      Public
      C
      0000 Updated May 15, 2024
    • 0020 Updated May 10, 2024
    • Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.
      Cuda
      MIT License
      21600 Updated Apr 25, 2024
    • C
      0100 Updated Apr 25, 2024
    • C++
      1200 Updated Apr 25, 2024
    • BeyondMoore has an ambitious goal to develop a software framework that performs static and dynamic optimizations, issues accelerator-initiated data transfers, and reasons about parallel execution strategies that exploit both processor and memory heterogeneity.
      0000 Updated Apr 25, 2024
    • .github

      Public
      Homepage README.
      0000 Updated Apr 4, 2024
    • C
      Other
      0000 Updated Mar 22, 2024
    • DaCe - Data Centric Parallel Programming
      Python
      BSD 3-Clause "New" or "Revised" License
      129000 Updated Feb 2, 2024
    • splash2

      Public
      Splash 2 Benchmarks
      C
      11000 Updated Nov 28, 2023
    • ComScribe

      Public
      ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.
      C++
      BSD 3-Clause "New" or "Revised" License
      42512 Updated Jul 6, 2023
    • C++
      0000 Updated Jun 13, 2023
    • HPCToolkit performance tools: measurement and analysis components
      C++
      60001 Updated Mar 17, 2023
    • The microbenchmarks that are used to verify the accuracy of ComDetective.
      Makefile
      2000 Updated Mar 17, 2023
    • Mixed and Multi-Precision SpMV for GPUs with Row-wise Precision Selection.
      Cuda
      MIT License
      1410 Updated Mar 12, 2023
    • A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs.
      Shell
      41510 Updated Feb 3, 2023
    • HPCToolkit performance tools: essential third party libraries for hpctoolkit
      Shell
      Other
      6000 Updated Oct 9, 2022
    • AMD Research Instruction Based Sampling Toolkit
      C
      16000 Updated Aug 6, 2022
    • pardnn

      Public
      C++
      1100 Updated May 20, 2022
    • C
      1000 Updated Apr 16, 2022
    • The split execution framework can automatically determine the suitability of an SpTRSV for split-execution, find the appropriate split point, and execute SpTRSV in a split fashion using two SpTRSV algorithms while automatically managing any required inter-platform communication. The model is implemented as a C++/CUDA library supporting multiple …
      C++
      Other
      0300 Updated Sep 7, 2021
    • The SpTRSV prediction framework is an automated prediction framework for the fastest sparse triangular solve (SpTRSV) algorithm for a given input sparse matrix on a CPU-GPU platform.
      C++
      Other
      2600 Updated Aug 17, 2020