Interopping with containers in third-party libraries #213

Open
SidneyCogdill opened this issue Nov 4, 2024 · 4 comments

SidneyCogdill commented Nov 4, 2024

I attempted to use Flux with immer::vector (I know, that's a weird thing to do; immer iterators are already const and never invalidate). For the most part (except for .to&lt;immer::vector&lt;T&gt;&gt;()) it seems to work, as long as this custom trait is defined:

The trait implementation
template<typename T>
struct flux::sequence_traits<immer::vector<T>>
{
    using value_type = std::ranges::range_value_t<immer::vector<T>>;

    static constexpr auto first(auto &) -> index_t { return index_t{0}; }

    static constexpr auto is_last(auto &self, index_t idx) { return idx >= size(self); }

    static constexpr auto inc(auto &self, index_t &idx)
    {
        FLUX_DEBUG_ASSERT(idx < size(self));
        idx = num::checked_add(idx, distance_t{1});
    }

    static constexpr auto read_at(auto &self, index_t idx) -> decltype(auto)
    {
        indexed_bounds_check(idx, size(self));
        return self[idx];
    }

    static constexpr auto read_at_unchecked(auto &self, index_t idx) -> decltype(auto)
    {
        return self[idx];
    }

    static constexpr auto dec(auto &, index_t &idx)
    {
        FLUX_DEBUG_ASSERT(idx > 0);
        idx = num::checked_sub(idx, distance_t{1});
    }

    static constexpr auto last(auto &self) -> index_t { return size(self); }

    static constexpr auto inc(auto &self, index_t &idx, distance_t offset)
    {
        FLUX_DEBUG_ASSERT(num::checked_add(idx, offset) <= size(self));
        FLUX_DEBUG_ASSERT(num::checked_add(idx, offset) >= 0);
        idx = num::checked_add(idx, offset);
    }

    static constexpr auto distance(auto &, index_t from, index_t to) -> distance_t
    {
        return num::checked_sub(to, from);
    }

    static constexpr auto size(auto &self) -> distance_t
    {
        return checked_cast<distance_t>(std::ranges::ssize(self));
    }

    static constexpr auto for_each_while(auto &self, auto &&pred) -> index_t
    {
        auto iter = std::ranges::begin(self);
        auto const end = std::ranges::end(self);

        while (iter != end) {
            if (!std::invoke(pred, *iter)) {
                break;
            }
            ++iter;
        }

        return checked_cast<index_t>(iter - std::ranges::begin(self));
    }
};

You just duplicate the default implementation for contiguous containers, make minimal adaptations, and it works.

However, that's a lot of duplication you can't get rid of, and subclassing flux::sequence_traits&lt;std::vector&lt;T&gt;&gt; won't help here: the functions that call read_at need to be shadowed by the derived class, because they're not virtual methods. Otherwise they call the base version of read_at, which in turn calls .data() (immer::vector has no .data(), because it's not contiguous).

virtual also doesn't seem to work in this case, as the return type would need to be covariant, and there are many other limitations (no return type deduction, no templates, can't be static).

Relevant paper: P2279 We need a language mechanism for customization points

This is more of a "for your information" type of issue. I'm not aware of any good way to make customization points easier to use. A workaround could be separating the trait into multiple different traits which the user can independently override, or offering a default implementation that looks for free functions the user can provide, but that could make the customization point more confusing to use.

@tcbrindle
Owner

First of all, I'm glad to hear that this (mostly) works :)

I agree that this does require more boilerplate than would be ideal. One thing I'd like to do (and I'm slowly making progress on) is to provide more default implementations when the cursor type (as returned by first()) is detected to be a signed integer type. In this case, we would only require the user to additionally provide last() and read_at_unchecked() and then be able to provide correct and efficient implementations of everything else automatically.

As it stands, it's worth pointing out that for_each_while() already has a default implementation, so you don't technically need to implement it separately unless you want to try to do something more efficient than the default impl. I don't know much about immer::vector but it might be worth commenting out the customised for_each_while and running a benchmark or two to see whether the customised version has any benefit or not.

Also, please feel free to submit an issue about immer::vector not working with flux::to -- I can't promise I'll be able to fix it but it would be interesting to see whether we can do anything.

@SidneyCogdill
Author

SidneyCogdill commented Nov 5, 2024

Just ran benchmarks with and without the custom for_each_while() implementation (with the caveat that micro-benchmarks are of limited value; profiling is much more useful).

static void run1(benchmark::State &state)
{
    auto a = v::iota(0) | v::take(2000);
    immer::vector<int> v = to_immer_vector(a);

    for (auto _ : state) {
        auto seq = f::ref(v);
        seq.for_each([](const auto &e) { boundary((void *) &e); });
    }
}
  • boundary(void *) acts as an optimization barrier (defined in a separate TU) to prevent eliding the unused variable reads.

Initially I got nearly identical results, but I had mistakenly used from_range (so the range_sequence was used in both cases, meaning the code paths really were identical). The updated results:

immer::vector<int> with default implementation:

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
run1/repeats:10_mean         6277 ns         3992 ns           10
run1/repeats:10_median       6361 ns         4129 ns           10
run1/repeats:10_stddev        206 ns          594 ns           10
run1/repeats:10_cv           3.29 %         14.88 %            10

immer::vector<int> with custom implementation (duplicated from contiguous sequence trait):

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
run1/repeats:10_mean         3305 ns         2048 ns           10
run1/repeats:10_median       3304 ns         2065 ns           10
run1/repeats:10_stddev       76.2 ns          134 ns           10
run1/repeats:10_cv           2.30 %          6.56 %            10

immer::vector<int> with from_range:

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
run1/repeats:10_mean         3169 ns         1788 ns           10
run1/repeats:10_median       3147 ns         1822 ns           10
run1/repeats:10_stddev       64.0 ns          177 ns           10
run1/repeats:10_cv           2.02 %          9.92 %            10

(updated) std::vector<int> with default implementation (contiguous sequence trait):

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
run1/repeats:10_mean         3002 ns         1723 ns           10
run1/repeats:10_median       2990 ns         1784 ns           10
run1/repeats:10_stddev       39.7 ns          164 ns           10
run1/repeats:10_cv           1.32 %          9.53 %            10

std::vector<int> with from_range:

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
run1/repeats:10_mean         2478 ns         1510 ns           10
run1/repeats:10_median       2462 ns         1522 ns           10
run1/repeats:10_stddev       44.0 ns          124 ns           10
run1/repeats:10_cv           1.78 %          8.23 %            10

environment
Run on (12 X 3493 MHz CPUs)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 512 KiB (x6)
  L3 Unified 32768 KiB (x1)
  • MSVC 17.11.5327.3
  • C++23 mode
  • CMake release build (/O2 /Ob2 /DNDEBUG)

(A contiguous container is indeed faster in the benchmark, but the immer::vector result is also acceptable given its other unique features.)


As for .to&lt;immer::vector&gt;(), I feel this is immer's issue, because it also doesn't work with the standard library's std::ranges::to&lt;immer::vector&gt;().

@SidneyCogdill
Author

(Confirmed that the updated result is stable by re-running it multiple times.)

With the updated results, the custom implementation duplicated from the contiguous sequence trait is significantly faster. I haven't looked into the default implementation of for_each_while(), but I guess it uses the cursor trait's read_at for element access, while the duplicated implementation reuses the range iterators. immer::vector is a node-based data structure that splits elements into separate chunks, so the result isn't surprising if immer::vector's own iterators perform better.

@tcbrindle
Owner

Yeah, immer::vector::iterator has knowledge of the internals of the data structure, whereas a third-party Flux sequence implementation doesn't, so I guess it's not a big surprise that the iterator version is faster.
