-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intersect skewed sparse iterators by galloping #524
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took an initial pass here, and left some comments.
As you suggested in the email, we should make galloping a scheduling directive. To do this, I suggest the following: Look at transformations.cpp to see the style that other scheduling transformations are written in. This likely does not need to be as involved as other transformations. I imagine a directive .setMergeStrategy(IndexVar)
that takes an enum MergeStrategy
, where for now MergeStrategy
can either be something like TwoFingerMerge
or GallopingMerge
. This transformation can then just rewrite the referenced Forall
in the input IndexStmt
to tag the ForallNode
with this iteration strategy.
Additionally, I think galloping should be applied by default when users apply windowing through an index set iterator, similarly to how in numpy users can index arrays with other arrays to reference only a subset of the columns. The relevant code that I imagine you needing to change is at
taco/src/lower/lowerer_impl_imperative.cpp
Line 1641 in d0654a8
for (auto& iter : filter(iterators, [](Iterator it) { return it.hasIndexSet(); })) { |
FInally, I've allowed your PR to run all of the tests, so you can see what you're failing. We'll need the tests you have disabled to also start passing. You will likely have to also add the gallop function definition to the CUDA code generator, but I don't think that there is anything more sophisticated that will need to be changed there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed some low level comments; will address the rest after implementing scheduling.
@remysucre it looks like something in the test suite is hanging with your changes. It'll be best to run it locally to debug further. |
Ah yes, please disable the tests for now. The changes break them until I implement scheduling. |
Still WIP (so some tests may hang), I will mark the PR as ready for review once I'm done. |
I still need to add more tests, but currently some simple examples work and all existing tests pass on my machine. I probably won't touch the code until next week, so please feel free to make a pass. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good! Thanks for your attention to detail, especially in a new codebase. I've requested mostly nits and verification of preconditions at the time of transformation application rather than during lowering.
Once you add some tests we'll be good to merge this, since the full suite is passing CI right now.
I addressed the comments and added some tests. I'm a bit out of my depth for how galloping should interact with other scheduling directives (or even how the existing directives work), so I just pushed a few cases where the tests passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work remy, the code looks great! I don't think galloping has too many interactions with other scheduling directives and should mostly "just work" with the other directives. If problems come up we should be able to fix them.
Before I merge this, please squash all of your commits into a single one.
This commit addes a new scheduling directive called mergeby. This directive specifies if the iterators of a given variable is merged by Two Finger merge or Galloping. The default strategy is Two Finger merge which is the same as the old behavior. Galloping merges the iterators with exponential search, which can be more efficient if the iterator sizes are skewed.
Thank you for your help! I've squashed the commits. |
Thanks for your work remy! Please reach out if you have questions / need guidance around performing benchmarks etc. |
Currently intersection is done by advancing each iterator by 1 every loop iteration. This can be slow when the intersected iterators are skewed. For example, intersecting a sparse dimension with N entries with another one with a single entry at the last position will take N steps. In such cases intersecting by galloping can bring the cost down to logarithm time. The following shows the galloping code that intersects 2 sparse vectors: