cyPAPI Beta for adding Cuda GPU presets from PR#284 in PAPI #27

Open · wants to merge 2 commits into base: main
Conversation

@Treece-Burgess Treece-Burgess commented Mar 3, 2025

This PR acts as a beta/template for using the CUDA GPU presets that are currently in PAPI PR #284.

Demo script utilizing the CUDA GPU presets with PyTorch on a single A100:

#!/usr/bin/env python3

# imports necessary for script
import torch

import cypapi as cyp 

if __name__ == '__main__':
    # size of tensors
    m_rows = 1000
    n_cols = 1000

    # CUDA presets to profile on an A100
    cudaPresets = [cyp.PAPI_CUDA_FP16_FMA, cyp.PAPI_CUDA_FP32_FMA, cyp.PAPI_CUDA_FP64_FMA, cyp.PAPI_CUDA_FP_FMA]

    # check that a CUDA-capable device is available
    if torch.cuda.is_available():
        unit = "cuda"
    else:
        raise RuntimeError("An NVIDIA device is required.")

    try:
        # initialize cyPAPI
        cyp.cyPAPI_library_init(cyp.PAPI_VER_CURRENT)

        # check that cyPAPI was successfully initialized
        if cyp.cyPAPI_is_initialized() != 1:
            raise ValueError("cyPAPI has not been initialized.")

        # create a cyPAPI EventSet
        cuda_eventset = cyp.CypapiCreateEventset()

        # add the CUDA presets to the created EventSet
        for preset in cudaPresets:
            cuda_eventset.add_event(preset)
 
        # start counting hardware events in the created EventSet
        cuda_eventset.start()
    
        # create tensors for computation
        matrix_A = torch.rand( m_rows, n_cols, device = unit )
        matrix_B = torch.rand( m_rows, n_cols, device = unit )
        # perform matrix multiplication
        result_tensor = torch.mm( matrix_A, matrix_B )

        # transfer results to cpu
        result_cpu = result_tensor.detach().cpu()
    
        # stop counting hardware events in the created EventSet
        hw_counts = cuda_eventset.stop()
    # the CUDA component was not successfully built
    except Exception:
        print('\033[0;31mFAILED\033[0m')
        raise
    # the CUDA component was successfully built
    else:
        # show number of available devices
        print( "Number of available devices: ", torch.cuda.device_count() )
        # show device name
        print( "Device Name: ", torch.cuda.get_device_name( unit ) ) 
        # counts for the CUDA presets
        preset_names = ["PAPI_CUDA_FP16_FMA", "PAPI_CUDA_FP32_FMA",
                        "PAPI_CUDA_FP64_FMA", "PAPI_CUDA_FP_FMA"]
        for name, count in zip(preset_names, hw_counts):
            print(f"Hardware Counts for {name}: ", count)
        print("\033[0;32mPASSED\033[0m")

Output for the demo script:

Number of available devices:  1
Device Name:  NVIDIA A100-PCIE-40GB
Hardware Counts for PAPI_CUDA_FP16_FMA:  884736
Hardware Counts for PAPI_CUDA_FP32_FMA:  1058097408
Hardware Counts for PAPI_CUDA_FP64_FMA:  0
Hardware Counts for PAPI_CUDA_FP_FMA:  1058982144

From further testing, the following methods are confirmed to work in the CypapiCreateEventset class:

  • num_events()
  • list_events()
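
A minimal sketch of exercising those two methods, modeled on the demo script above (it assumes the cyPAPI build from this PR, a PAPI build that includes the CUDA component from PR #284, and an available NVIDIA GPU):

```python
import cypapi as cyp

# initialize cyPAPI as in the demo script
cyp.cyPAPI_library_init(cyp.PAPI_VER_CURRENT)

# create an EventSet and add two of the CUDA presets
evtset = cyp.CypapiCreateEventset()
evtset.add_event(cyp.PAPI_CUDA_FP32_FMA)
evtset.add_event(cyp.PAPI_CUDA_FP64_FMA)

# num_events() should report the two presets added above
print("Events in set:", evtset.num_events())
# list_events() should return the codes of the events currently in the set
print("Event codes:", evtset.list_events())
```

Note that this sketch cannot run without the CUDA hardware and the patched PAPI/cyPAPI builds described in this PR.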

For cyPAPI_enum_event, the following new modifiers work as expected:

  • CPU presets:
    • PAPI_PRESET_ENUM_CPU
    • PAPI_PRESET_ENUM_CPU_AVAIL
  • GPU presets:
    • PAPI_PRESET_ENUM_FIRST_COMP
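
As a hedged sketch of enumerating presets with those modifiers: the exact cyPAPI_enum_event calling convention is an assumption here, modeled on the C-level PAPI_enum_event(&EventCode, modifier) pattern, and PAPI_PRESET_MASK may not be exposed under that name in cyPAPI.

```python
import cypapi as cyp

cyp.cyPAPI_library_init(cyp.PAPI_VER_CURRENT)

# assumption: start enumeration from the first preset code,
# analogous to PAPI_PRESET_MASK in the C API
code = cyp.PAPI_PRESET_MASK

# walk the available CPU presets; enumeration is assumed to raise once no
# further event exists, analogous to PAPI_enum_event returning PAPI_ENOEVNT
while True:
    try:
        code = cyp.cyPAPI_enum_event(code, cyp.PAPI_PRESET_ENUM_CPU_AVAIL)
    except Exception:
        break
    print(cyp.cyPAPI_event_code_to_name(code))
```

Swapping in PAPI_PRESET_ENUM_FIRST_COMP would instead jump to the first component (GPU) preset, subject to the known issues listed below.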

Known issues in PR #284 for the GPU presets that are still being worked through:

  • PAPI_event_code_to_name for GPU presets other than PAPI_CUDA_FP16_FMA.
    • This will lead to issues in cyPAPI with cyPAPI_enum_events and cyPAPI_event_code_to_name.
