Added array optimization fuse notebook #89

Open: wants to merge 5 commits into base: main


@TomAugspurger (Member) commented Jul 18, 2019

@alimanfoo left a comment


Thanks a lot for writing this up, very useful!

applications/array-optimization.ipynb (outdated, resolved)
applications/array-optimization.ipynb (resolved)
inputs_rechunked.blocks[0, :2].visualize(optimize_graph=True)


Nice trick, didn't know about this :-)
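A minimal sketch of why the trick helps, assuming the same five-input setup the notebook uses: `Array.blocks` indexes by block rather than by element, so `blocks[0, :2]` keeps only the first two column blocks, and the optimized graph of that subset contains just the tasks those two blocks depend on.

```python
import dask.array as da

# Same setup as the notebook: five 1-D inputs stacked into shape (5, 500_000)
inputs = [da.random.random(size=500_000, chunks=90_000) for _ in range(5)]
inputs_rechunked = da.vstack(inputs).rechunk((50, 90_000))

# .blocks indexes in block space: row block 0, column blocks 0 and 1
subset = inputs_rechunked.blocks[0, :2]

print(inputs_rechunked.numblocks)  # (1, 6)
print(subset.numblocks)            # (1, 2)
print(subset.shape)                # (5, 180000)

# subset.visualize(optimize_graph=True) draws only the tasks needed for
# these two blocks (requires graphviz)
```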

applications/array-optimization.ipynb (outdated, resolved)
applications/array-optimization.ipynb (outdated, resolved)
@TomAugspurger (Member, Author)

Thanks @alimanfoo, I've applied your suggestions.

@mrocklin do you have high-level thoughts on this? Does this feel like we're just documenting a workaround to a weakness of Dask that we should instead be fixing?

@mrocklin (Member) commented Jul 19, 2019 via email

@mrocklin (Member) commented Jul 19, 2019 via email

@martindurant (Member)

@TomAugspurger, did you have plans to try to make the story here more general?

@TomAugspurger (Member, Author) commented Jul 31, 2019 via email

@TomAugspurger (Member, Author)

@mrocklin, a question on HLG (HighLevelGraph) fusion: would you expect adding
operations to the end of a task graph (e.g. .store) to result in more fusion
earlier in the graph? My guess is that extra tasks at the end won't lead to
more fusion upstream, but I may be misreading fuse.
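One way to probe this empirically is to build the graph with `da.store(..., compute=False)` and compare its optimized task count with that of the array alone. A minimal sketch, assuming an in-memory NumPy array as a stand-in store target (the notebook's real target may differ); no counts are shown because they depend on the Dask version. Calling `stored.visualize(optimize_graph=True)` gives the same comparison as a picture (requires graphviz).

```python
import dask
import dask.array as da
import numpy as np

inputs = [da.random.random(size=500_000, chunks=90_000) for _ in range(5)]
x = da.vstack(inputs).rechunk((50, 90_000))

# Assumed stand-in target: any object supporting NumPy-style __setitem__ works
target = np.zeros(x.shape, dtype=x.dtype)

# Optimized graph of the array on its own
(x_opt,) = dask.optimize(x)
n_array = len(x_opt.__dask_graph__())

# compute=False returns a Delayed whose graph also contains the store tasks
stored = da.store(x, target, compute=False)
(stored_opt,) = dask.optimize(stored)
n_stored = len(stored_opt.__dask_graph__())

print("array alone:", n_array, "tasks")
print("with store:", n_stored, "tasks; blocks written:", x.npartitions)

stored.compute()  # executes the graph, writing every block into target
```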

I ask because when I look at just the creation / stacking / rechunking, we don't
get fusion with the default parameters:

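import dask.array as da

# Five 1-D random arrays of 500,000 elements, each in 90,000-element chunks
inputs = [da.random.random(size=500_000, chunks=90_000)
          for _ in range(5)]
# Stack into a 2-D array of shape (5, 500_000), one row per input
inputs_stacked = da.vstack(inputs)
# Rechunk so each block spans all rows (50 is capped at the 5 rows available)
inputs_rechunked = inputs_stacked.rechunk((50, 90_000))
# Render the optimized task graph (requires graphviz)
inputs_rechunked.visualize(optimize_graph=True)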

[image: optimized task graph from inputs_rechunked.visualize(optimize_graph=True)]

So unless adding a .store() to the end results in more fusion earlier on (in
the creation / stacking / rechunking phase), we won't be solving this use-case.
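The "default parameters" are presumably the low-level fuse settings, which Dask reads from its configuration under optimization.fuse.*. A minimal sketch for comparing task counts without graphviz, assuming the `optimization.fuse.ave-width` configuration key is the knob of interest; the value 10 is an arbitrary illustration, not a recommendation, and whether it changes the count for this graph is not established in this thread.

```python
import dask
import dask.array as da

inputs = [da.random.random(size=500_000, chunks=90_000) for _ in range(5)]
x = da.vstack(inputs).rechunk((50, 90_000))


def n_tasks(arr):
    """Number of tasks in the optimized (culled and fused) graph of ``arr``."""
    (opt,) = dask.optimize(arr)
    return len(opt.__dask_graph__())


raw = len(x.__dask_graph__())  # before any optimization
default = n_tasks(x)           # default optimization.fuse.* settings

# Illustrative only: allow wider fused subgraphs than the default ave-width
with dask.config.set({"optimization.fuse.ave-width": 10}):
    wider = n_tasks(x)

print(f"raw: {raw}, default fuse: {default}, ave-width=10: {wider}")
```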

Base automatically changed from master to main January 27, 2021 16:07

4 participants