cyPAPI Beta for adding Cuda GPU presets from PR#284 in PAPI #27

Open · wants to merge 2 commits into base: main
Conversation

@Treece-Burgess Treece-Burgess commented Mar 3, 2025

This PR acts as a beta/template for using the CUDA GPU presets that are currently in PAPI PR #284.

Demo script utilizing the CUDA GPU presets with PyTorch on a single A100:

#!/usr/bin/env python3

# imports necessary for script
import torch

import cypapi as cyp 

if __name__ == '__main__':
    # size of tensors
    m_rows = 1000
    n_cols = 1000

    # CUDA presets to profile on an A100
    cudaPresets = [cyp.PAPI_CUDA_FP16_FMA, cyp.PAPI_CUDA_FP32_FMA, cyp.PAPI_CUDA_FP64_FMA, cyp.PAPI_CUDA_FP_FMA]

    # check that a CUDA-capable device is available
    if torch.cuda.is_available():
        unit = "cuda"
    else:
        raise RuntimeError("An NVIDIA device is required.")

    try:
        # initialize cyPAPI
        cyp.cyPAPI_library_init(cyp.PAPI_VER_CURRENT)

        # check that cyPAPI was successfully initialized
        if cyp.cyPAPI_is_initialized() != 1:
            raise ValueError("cyPAPI has not been initialized.")

        # create a cyPAPI EventSet
        cuda_eventset = cyp.CypapiCreateEventset()

        # add the CUDA presets to the created EventSet
        for preset in cudaPresets:
            cuda_eventset.add_event(preset)
 
        # start counting hardware events in the created EventSet
        cuda_eventset.start()
    
        # create tensors for computation
        matrix_A = torch.rand( m_rows, n_cols, device = unit )
        matrix_B = torch.rand( m_rows, n_cols, device = unit )
        # perform matrix multiplication
        result_tensor = torch.mm( matrix_A, matrix_B )

        # transfer results to cpu
        result_cpu = result_tensor.detach().cpu()
    
        # stop counting hardware events in the created EventSet
        hw_counts = cuda_eventset.stop()
    # the CUDA component was not successfully built
    except Exception:
        print('\033[0;31mFAILED\033[0m')
        raise
    # the CUDA component was successfully built
    else:
        # show number of available devices
        print( "Number of available devices: ", torch.cuda.device_count() )
        # show device name
        print( "Device Name: ", torch.cuda.get_device_name( unit ) ) 
        # counts for the CUDA presets
        preset_names = ["PAPI_CUDA_FP16_FMA", "PAPI_CUDA_FP32_FMA",
                        "PAPI_CUDA_FP64_FMA", "PAPI_CUDA_FP_FMA"]
        for name, count in zip(preset_names, hw_counts):
            print(f"Hardware Counts for {name}: ", count)
        print("\033[0;32mPASSED\033[0m")

Output for the demo script:

Number of available devices:  1
Device Name:  NVIDIA A100-PCIE-40GB
Hardware Counts for PAPI_CUDA_FP16_FMA:  884736
Hardware Counts for PAPI_CUDA_FP32_FMA:  1058097408
Hardware Counts for PAPI_CUDA_FP64_FMA:  0
Hardware Counts for PAPI_CUDA_FP_FMA:  1058982144

From further testing, the following methods are confirmed to work in the CypapiCreateEventset class:

  • num_events()
  • list_events()
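
A minimal sketch of exercising those two methods, modeled on the demo script above (it assumes the cyPAPI build from this PR, a PAPI build that includes the CUDA component from PR #284, and an available NVIDIA GPU):

```python
import cypapi as cyp

# initialize cyPAPI as in the demo script
cyp.cyPAPI_library_init(cyp.PAPI_VER_CURRENT)

# create an EventSet and add two of the CUDA presets
evtset = cyp.CypapiCreateEventset()
evtset.add_event(cyp.PAPI_CUDA_FP32_FMA)
evtset.add_event(cyp.PAPI_CUDA_FP64_FMA)

# num_events() should report the two presets added above
print("Events in set:", evtset.num_events())
# list_events() should return the codes of the events currently in the set
print("Event codes:", evtset.list_events())
```

Note that this sketch cannot run without the CUDA hardware and the patched PAPI/cyPAPI builds described in this PR.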

For cyPAPI_enum_event, the following new modifiers work as expected:

  • CPU presets:
    • PAPI_PRESET_ENUM_CPU
    • PAPI_PRESET_ENUM_CPU_AVAIL
  • GPU presets:
    • PAPI_PRESET_ENUM_FIRST_COMP
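
As a hedged sketch of enumerating presets with those modifiers: the exact cyPAPI_enum_event calling convention is an assumption here, modeled on the C-level PAPI_enum_event(&EventCode, modifier) pattern, and PAPI_PRESET_MASK may not be exposed under that name in cyPAPI.

```python
import cypapi as cyp

cyp.cyPAPI_library_init(cyp.PAPI_VER_CURRENT)

# assumption: start enumeration from the first preset code,
# analogous to PAPI_PRESET_MASK in the C API
code = cyp.PAPI_PRESET_MASK

# walk the available CPU presets; enumeration is assumed to raise once no
# further event exists, analogous to PAPI_enum_event returning PAPI_ENOEVNT
while True:
    try:
        code = cyp.cyPAPI_enum_event(code, cyp.PAPI_PRESET_ENUM_CPU_AVAIL)
    except Exception:
        break
    print(cyp.cyPAPI_event_code_to_name(code))
```

Swapping in PAPI_PRESET_ENUM_FIRST_COMP would instead jump to the first component (GPU) preset, subject to the known issues listed below.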

Known issues in PR #284 for the GPU presets that are still being worked through:

  • PAPI_event_code_to_name for GPU presets other than PAPI_CUDA_FP16_FMA.
    • This will lead to issues in cyPAPI with cyPAPI_enum_events and cyPAPI_event_code_to_name.
