Write a device kernel that calculates the single precision BLAS operation
saxpy, i.e. y = a * x + y.
- Initialise the vectors x and y with some values on the CPU
- Perform the computation on the host to generate reference values
- Allocate memory on the device for x and y
- Copy the host x to device x, and host y to device y
- Perform the computation on the device
- Copy the device y back to the host y
- Confirm the correctness: Is the host computed y equal to the device computed y?

You may start from the skeleton code provided in saxpy.cpp.
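
The host-side parts of the exercise (the reference computation and the final correctness check) can be sketched in plain C++ as below. This is only an illustrative sketch, not the contents of saxpy.cpp: the function names `saxpy_host` and `all_close` and the tolerance value are assumptions made for this example. The device allocation, copies, and kernel launch are left out because they depend on the GPU API the skeleton uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host reference (hypothetical helper name): y[i] = a * x[i] + y[i].
void saxpy_host(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Correctness check (hypothetical helper name): returns true if every element
// of `got` matches `ref` within a relative tolerance. The default tolerance
// is an assumed value, not one prescribed by the exercise.
bool all_close(const std::vector<float>& ref,
               const std::vector<float>& got,
               float tol = 1e-5f) {
    if (ref.size() != got.size()) {
        return false;
    }
    for (std::size_t i = 0; i < ref.size(); ++i) {
        // Scale the tolerance by the magnitude of the reference value,
        // but never below 1 so values near zero are compared absolutely.
        const float scale = std::fmax(1.0f, std::fabs(ref[i]));
        if (std::fabs(ref[i] - got[i]) > tol * scale) {
            return false;
        }
    }
    return true;
}
```

In a full solution, `ref` would hold the result of `saxpy_host` and `got` the values copied back from the device. A tolerance is used instead of exact equality because the device compiler may fuse the multiply and add into a single instruction, which can round slightly differently from the host result.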