only warn about argument sizes on Nvidia GPUs #713

Merged: 4 commits, Mar 4, 2025
21 changes: 7 additions & 14 deletions pyopencl/invoker.py
Owner

Could you explain the context a bit more? Were there spurious warnings that were bothersome?

Contributor Author

@matthiasdiener matthiasdiener Mar 3, 2025

In our experience, only Nvidia GPUs actually produce compilation errors when the total argument size is too large. Most other CL devices report artificially low limits (usually 1024 bytes, roughly corresponding to the 127-argument minimum in the C standard) but compile and run kernels with much larger argument sizes without problems, so the check produced many spurious warnings on every non-Nvidia device.
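
As a rough check of the arithmetic behind that comment (a sketch only, not part of this PR): with 8-byte pointers, a 1024-byte limit fits 128 pointer-sized arguments, which is close to the 127 mentioned. The helper name `max_pointer_args` is made up for illustration.

```python
def max_pointer_args(limit_bytes: int, address_bits: int) -> int:
    """Number of pointer-sized arguments that fit within limit_bytes."""
    ptr_size = address_bits // 8  # pointer size in bytes, as in the diff below
    return limit_bytes // ptr_size


print(max_pointer_args(1024, 64))  # 128
print(max_pointer_args(1024, 32))  # 256
```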

@@ -319,7 +319,7 @@ def _get_max_parameter_size(dev):
     dev_limit = dev.max_parameter_size
     pocl_version = get_pocl_version(dev.platform, fallback_value=(1, 8))
     if pocl_version is not None and pocl_version < (3, 0):
-        # Current PoCL versions (as of 04/2022) have an incorrect parameter
+        # Older PoCL versions (<3.0) have an incorrect parameter
         # size limit of 1024; see e.g. https://github.com/pocl/pocl/pull/1046
         if dev_limit == 1024:
             if dev.type & cl.device_type.CPU:
@@ -336,25 +336,27 @@ def _check_arg_size(function_name, num_cl_args, arg_types, devs):
     """Check whether argument sizes exceed the OpenCL device limit."""

     for dev in devs:
+        from pyopencl.characterize import nv_compute_capability
+        if nv_compute_capability(dev) is None:
+            # Only warn on Nvidia GPUs, because actual failures related to
+            # the device limit have been observed only on such devices.
+            continue
+
         dev_ptr_size = int(dev.address_bits / 8)
         dev_limit = _get_max_parameter_size(dev)

         total_arg_size = 0

-        is_estimate = False
-
         if arg_types:
             for arg_type in arg_types:
                 if arg_type is None:
-                    is_estimate = True
                     total_arg_size += dev_ptr_size
                 elif isinstance(arg_type, VectorArg):
                     total_arg_size += dev_ptr_size
                 else:
                     total_arg_size += np.dtype(arg_type).itemsize
         else:
             # Estimate that each argument has the size of a pointer on average
-            is_estimate = True
             total_arg_size = dev_ptr_size * num_cl_args

         if total_arg_size > dev_limit:
@@ -364,15 +366,6 @@ def _check_arg_size(function_name, num_cl_args, arg_types, devs):
                 f"the limit of {dev_limit} bytes on {dev}. This might "
                 "lead to compilation errors, especially on GPU devices.",
                 stacklevel=3)
-        elif is_estimate and total_arg_size >= dev_limit * 0.75:
-            # Since total_arg_size is just an estimate, also warn in case we are
-            # just below the actual limit.
-            from warnings import warn
-            warn(f"Kernel '{function_name}' has {num_cl_args} arguments with "
-                f"a total size of {total_arg_size} bytes, which approaches "
-                f"the limit of {dev_limit} bytes on {dev}. This might "
-                "lead to compilation errors, especially on GPU devices.",
-                stacklevel=3)

# }}}
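
To make the resulting behavior easier to follow, here is a minimal standalone sketch of the check as it reads after this change. It does not use pyopencl. `FakeDevice`, its `is_nvidia` flag (standing in for `nv_compute_capability(dev) is not None`), and the `arg_sizes` list of byte counts (standing in for the `arg_types` handling via `np.dtype(...).itemsize`) are all assumptions made for illustration, not the library's API.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a pyopencl device; not a real pyopencl class."""
    name: str
    address_bits: int
    max_parameter_size: int
    is_nvidia: bool  # stands in for `nv_compute_capability(dev) is not None`


def check_arg_size(function_name: str, num_cl_args: int,
                   arg_sizes: Optional[Sequence[Optional[int]]],
                   devs: Sequence[FakeDevice]) -> None:
    """Warn if the estimated argument size exceeds the limit on an Nvidia device.

    arg_sizes holds one byte count per argument; None means "unknown or
    pointer-like" and is counted as one device pointer.
    """
    for dev in devs:
        if not dev.is_nvidia:
            continue  # non-Nvidia devices never trigger the warning

        ptr_size = dev.address_bits // 8
        if arg_sizes:
            total = sum(ptr_size if s is None else s for s in arg_sizes)
        else:
            # No type information: assume pointer-sized arguments on average
            total = ptr_size * num_cl_args

        # Only a hard "exceeds the limit" warning remains; the former
        # "approaches the limit" (75%) branch is gone.
        if total > dev.max_parameter_size:
            warnings.warn(
                f"Kernel '{function_name}' has {num_cl_args} arguments with "
                f"a total size of {total} bytes, which is higher than the "
                f"limit of {dev.max_parameter_size} bytes on {dev.name}.",
                stacklevel=2)
```

With an illustrative limit of 4352 bytes on a 64-bit device marked `is_nvidia=True`, a call such as `check_arg_size("k", 600, None, [dev])` estimates 4800 bytes and warns. The same call against a device with `is_nvidia=False` stays silent regardless of its reported limit.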
