Add more motivational use cases #80

Open: wants to merge 1 commit into base: master
nonmember_subscript_operator/main.tex: 65 changes (64 additions, 1 deletion)
@@ -25,12 +25,13 @@ \section{Problem}
That the subscript operator does not allow non\hyp member overloads appears to be an arbitrary decision, apparently motivated by a conservative approach of only adding features when the use\hyp cases are clear.

Therefore, for the rest of this section I will give examples for non\hyp member subscript operators.
In general, non\hyp member subscript is needed if the class to be subscripted is independent of the type used for the subscript argument or if modifications of the subscripted type are not possible.
In general, a non\hyp member subscript operator is needed if the class to be subscripted is independent of the type used for the subscript argument, if modifications of the subscripted type are not possible, or if the desired subscripting functionality applies to a broader range of classes.
Consequently, several examples are very similar:
A generic container class only implements the \type{size\_t} subscript operator.
A user-defined type subsequently provides a good use\hyp case for an additional subscript overload for one or more container classes.

\subsection{Vector load/store/gather/scatter on std containers}
\label{sec:simd_load}
The use\hyp case that got me started was the design of a gather/scatter API for SIMD vector types.
Consider the classical container indexing operation:
\smallskip\begin{lstlisting}
@@ -125,6 +126,68 @@ \subsection{Compatibility layer}

\end{lstlisting}

\subsection{Interpolated access}
Subscripting a random-access range with non-integral types can have interesting use\hyp cases,
such as interpolating between the elements at the nearest smaller and larger integral indices of the subscript:
\smallskip\begin{lstlisting}
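#include <cmath>
#include <cstddef>
#include <ranges>
#include <vector>
auto operator[](std::ranges::random_access_range auto&& r, double index) {
    double integ;
    const double frac = std::modf(index, &integ);
    // index via the iterator: r[integ] would select this overload again
    const auto it = std::ranges::begin(r) + static_cast<std::ptrdiff_t>(integ);
    return std::lerp(it[0], it[1], frac);
}
std::vector v{0.0, 1.0, 4.5, 7.8};
double value = v[1.6]; // value == 3.1 (up to rounding)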
\end{lstlisting}
However, adding such an overload in practice might be dangerous. If the floating-point overload is not visible, the floating-point index silently truncates and selects the range's integral subscript operator instead. If the overload is visible, it changes the behavior of existing floating-point subscripts that rely on this truncation.
This issue could be worked around by using a custom index type that cannot convert to an integral type, which also allows specifying the kind of interpolation, e.g.:
\smallskip\begin{lstlisting}
std::vector v{0.0, 1.0, 4.5, 7.8};
double value = v[1.6_linear]; // value == 3.1
\end{lstlisting}

With the adoption of P2128, we could even allow interpolation on higher\hyp dimensional objects, like the proposed \type{std::mdspan} or \type{std::mdarray}:
\smallskip\begin{lstlisting}
std::mdarray<double, N, M, K> a = ...;
double value = a[7.5, 8.9, 76.4]; // trilinear interpolation
\end{lstlisting}

While we are aware that such a usage might be niche, or that a separate function for interpolated access could arguably be better, we would like to give library writers the necessary machinery to implement such a feature within their domains if they so desire.

Comment on lines +129 to +155
@MFHava (Contributor), May 10, 2021

It will be really interesting what EWG thinks about the interpolated access example (and the simd-example for that matter), as some people recently stated that subscripting should always offer reference semantics...

@bernhardmgruber (Contributor Author), May 11, 2021

I love this feature so much that I am probably blind to the critique. @mattkretz also did not like it a lot, especially the float subscript. Being a bit of a 3D graphics fan, I am used to stuff being interpolated all the time, so this feels natural to me. But sure, if the use case is too controversial, we can also drop it.

Contributor

Is there any precedent for a float-subscript in 3D graphics? Based on my albeit limited GLSL experience, I'm inclined to believe that whilst data is often interpolated, such an access is never expressed via subscripting...

Contributor Author

Hmm, looking that up now you are probably right:

  • In OpenGL/GLSL there is the texture(texObj, coords...) function.
  • In D3D/HLSL there is the texObj.Sample(sampler, coords...) member function.
  • In Cg, there is the tex2D(sampler, coords) function.

So it seems there is no direct precedent in shading languages for interpolation using subscripting. Such functionality is indeed provided by functions.

Contributor Author

The question for me now becomes whether interpolation via subscripting is a bad idea even in niche domains, domain-specific libraries or closed internal code bases. I agree that such a feature does not belong in the standard library. But could there still be interesting applications in such limited scenarios?

\subsection{Generic subsetting}
The idea from section \ref{sec:simd_load} to use SIMD vectors as subscripts can be further generalized to allow any kind of entity representing multiple indices as a subscript.
\smallskip\begin{lstlisting}
auto operator[](std::ranges::random_access_range auto&& r, std::ranges::input_range auto&& indices);
\end{lstlisting}
The concrete implementation and return type of such an operation are debatable and thus intentionally omitted.
Still, it could allow some algorithms to be formulated more naturally, or give certain ranges more efficient access if the index set used for subscripting is known in advance.
Comment on lines +156 to +162
Contributor

Apart from a potentially terser notation, I'm not convinced that this would be better than something like:
vec | std::views::indexed(indices)

For this and all following examples: I'd recommend switching to more concrete examples than globally overloading based on concepts - as that wouldn't work anyways (ADL would never find them).

Contributor Author

Sure: we can always replace any operator by a function :) The question is which syntax feels more "natural". Maybe I should think more in a mathematical direction, where subscripting happens with non-integers.

The ADL part is a fair point! If I template the non-member operator[] on the first argument, it cannot be found if it is in another namespace. That makes it useless for std containers, since I cannot put the operator in namespace std. This is no obstacle for @mattkretz, because simd will live in std. Which makes me wonder where else people add non-member operators outside the namespaces of their parameter types.


\subsection{Predicated subsetting}
Subscripting with a unary predicate would allow us to select a subset of an existing container.
Combined with an easy way to specify these predicates, e.g. Boost.Lambda2, this allows powerful and concise expressions:
\smallskip\begin{lstlisting}
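template <std::ranges::random_access_range R, typename UnaryPredicate>
  // constrained, so that an integral index like v[0] does not match this template
  requires std::indirect_unary_predicate<std::decay_t<UnaryPredicate>, std::ranges::iterator_t<R>>
auto operator[](R&& r, UnaryPredicate&& p) {
    return std::forward<R>(r) | std::views::filter(std::forward<UnaryPredicate>(p));
}
#include <boost/lambda2.hpp>
using namespace boost::lambda2; // provides the placeholder _1
std::vector v{0.0, -1.0, 4.5, -7.8};
for (auto e : v[_1 > 0])
    ...;
\end{lstlisting}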
Comment on lines +164 to +175
Contributor

Same as above. This looks nice - at least with a terse lambda notation it's really readable, it would be hideous with the current lambda syntax...

@bernhardmgruber (Contributor Author), May 11, 2021

In @mattkretz's design for SIMD, I think he originally had this syntax instead of the where expression. With non-member operator[], such a facility could be added by users in their namespace for std::simd:

template <typename T, typename Abi>
decltype(auto) operator[](std::simd<T, Abi>& s, std::simd_mask<T, Abi> m) {
	return std::where(s, m);
}


\subsection{Slicing}
With the adoption of P2128, a generic slicing facility could be defined:
\smallskip\begin{lstlisting}
template <std::ranges::random_access_range R>
auto operator[](R&& r, std::size_t from, std::size_t to, std::size_t step = 1) { ... }

std::vector v{...};
auto slice1 = v[10, 20];    // elements at indices 10 to 19 (20 is exclusive)
auto slice2 = v[10, 20, 2]; // every second element: indices 10, 12, 14, 16, 18
\end{lstlisting}
Such slicing operators are supported in other languages, e.g. Python.
There have also been C++ language extensions to add such slicing functionality, e.g. Intel's Cilk Plus Array Notations.
The adoption of P2128 and this proposal would allow for corresponding library solutions to emerge.
Comment on lines +177 to +189
Contributor

Slicing ala Cilk+ would be quite nice for certain (numeric) domains. Generalizing the subscript operator enables emulating this without dedicated language support, but I'm not sure if that syntax would be desirable for such a feature...

Contributor Author

v[10, 20, 2] is indeed a bit ambiguous. It could also mean take the 10th, 20th and 2nd element. But then, should we disallow users to write that if they want? In Python, I can write v[10:20:2] and people usually understand what that means. Good conventions can establish in the communities.

@MFHava (Contributor), May 11, 2021

> v[10, 20, 2] is indeed a bit ambiguous. It could also mean take the 10th, 20th and 2nd element.

Without knowing the type of v it could be literally anything - e.g. for a matrix it could simply mean: v[10, 20..2].

> In Python, I can write v[10:20:2] and people usually understand what that means.

Yes, but:

  1. Does Python actually have multi-dim subscripting?
  2. If yes: Does it use : for that purpose or is that a different syntax? (e.g. ,)

It's beyond the scope of the paper but IMHO a multi-dim subscript operator is ill-suited for a slicing mechanic as it conflates indices with slices.

@bernhardmgruber (Contributor Author), Jun 6, 2021

From what I have seen so far, and I am far from fluent in Python and the popular NumPy, there is a distinction:

  • Multidim array access is done with repeated subscripts: m[4][6]
  • Slicing is done using a different notation: m[4:6], meaning the two elements at index 4 and 5.
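  • The notation m[4, 6] creates a tuple and passes that to the subscript operator. For built-in lists this results in an error, whereas NumPy arrays interpret the tuple as a multi-dimensional index (there, m[4, 6] selects the same element as m[4][6]).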

Contributor Author

> a multi-dim subscript operator is ill-suited for a slicing mechanic as it conflates indices with slices.

I guess you are right here as well. Slicing using operator[] is probably confusing.


\section{Implementation}
Implementing non-member subscript overloads for GCC was as simple as removing one line of code (a special case).
