Reapply #9842: Save some size in dtype_util when dtype selective build is not in use #10490

Open — wants to merge 2 commits into base: gh/swolchok/429/head
29 changes: 28 additions & 1 deletion kernels/portable/cpu/util/dtype_util.h
@@ -228,7 +228,7 @@ enum class SupportedTensorDtypes {
namespace internal {

template <typename CTYPE_COMPUTE, const char* op_name>
- load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn(
+ load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn_impl(
const Tensor& t,
SupportedTensorDtypes dtypes) {
switch (dtypes) {
@@ -251,6 +251,10 @@ load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn(
return nullptr;
}

// NOTE: applying the #ifdef EXECUTORCH_SELECTIVE_BUILD_DTYPE
// technique used for get_load_to_compute_fn in this path was a size
// regression rather than an improvement. Haven't fully investigated
// why; just be aware when trying to improve size further.
template <typename CTYPE_COMPUTE, const char* op_name>
store_compute_to_tensor_fn<CTYPE_COMPUTE> get_store_compute_to_tensor_fn(
const Tensor& t,
@@ -285,6 +289,29 @@ store_compute_to_tensor_fn<CTYPE_COMPUTE> get_store_compute_to_tensor_fn(
return nullptr;
}

#ifndef EXECUTORCH_SELECTIVE_BUILD_DTYPE
inline constexpr const char kGenericElementwiseOpName[] =
    "generic_elementwise_op";
#endif // EXECUTORCH_SELECTIVE_BUILD_DTYPE

Contributor Author comment: marking this `inline` was needed for size (it changes the linkage; otherwise the constant is duplicated across translation units) and is a difference from the first attempt.

template <typename CTYPE_COMPUTE, const char* op_name>
load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn(
const Tensor& t,
SupportedTensorDtypes dtypes) {
// NOTE: Selective build relies on the operator name being passed
// here. When it's *not* active, using the same operator name
// everywhere saves on size because we don't require a new template
// instantiation for every operator.
return get_load_to_compute_fn_impl<
CTYPE_COMPUTE,
#ifdef EXECUTORCH_SELECTIVE_BUILD_DTYPE
op_name
#else // EXECUTORCH_SELECTIVE_BUILD_DTYPE
kGenericElementwiseOpName
#endif // EXECUTORCH_SELECTIVE_BUILD_DTYPE
>(t, dtypes);
}

bool check_tensor_dtype(
const Tensor t,
SupportedTensorDtypes dtypes,