allow different `PartitionsDefinition`s within an `AssetsDefinition` #23824
Conversation
What happens if you create an invalid run that targets multiple assets within the multi asset which have different partitions defs?
I assume it just gets caught by the same machinery that would catch this in the non-multi-asset case, but might be nice to have a unit test there.
Otherwise I think this seems great -- just commenting instead of approving as I'm sure others will want to weigh in.
Long-term-vision-wise, I'm imagining a world where a run can have an AssetGraphSubset attached to it and so we could have a context.get_partition_subset(asset_key) to figure out the specific partitions for a given asset or something, which this fits fine with.
Oo good question. Added a block in a test:

```python
with pytest.raises(
    DagsterInvalidDefinitionError,
    match="Selected assets must have the same partitions definitions, but the selected assets ",
):
    materialize(assets=[my_assets], partition_key="b")
```
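For readers following along, here is a minimal sketch of the kind of `my_assets` definition that call could run against; the asset names and partition keys are illustrative assumptions, not the actual test fixture:

```python
from dagster import AssetSpec, MaterializeResult, StaticPartitionsDefinition, multi_asset


@multi_asset(
    specs=[
        # Each spec carries its own PartitionsDefinition (what this PR enables).
        AssetSpec("asset1", partitions_def=StaticPartitionsDefinition(["a", "b"])),
        AssetSpec("asset2", partitions_def=StaticPartitionsDefinition(["1", "2"])),
    ],
    can_subset=True,
)
def my_assets(context):
    # Only emit results for the assets selected in this run.
    for key in context.selected_asset_keys:
        yield MaterializeResult(asset_key=key)
```

Selecting both assets in a single run with one partition key, as in the `materialize` call above, is what triggers the `DagsterInvalidDefinitionError`.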
Totally
## Summary & Motivation
This is a non-behavior-changing PR that refactors some methods within the internal execution context classes. The aim of this PR is to prepare for #23824 by centralizing the logic for getting the `PartitionsDefinition` for an execution. Working on this made me aware of a behavioral weirdness that I think we should consider addressing in 1.9: `context.partition_keys` etc. will return a non-`None` value in steps that target no partitioned assets, if other steps in the same run do target partitioned assets.

## How I Tested These Changes

## Changelog
NOCHANGELOG
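As a hedged illustration of that weirdness (the asset names and dates below are made up, not from this PR):

```python
import dagster as dg

daily = dg.DailyPartitionsDefinition(start_date="2024-01-01")


@dg.asset(partitions_def=daily)
def partitioned_asset(context: dg.AssetExecutionContext) -> None:
    context.log.info(context.partition_key)  # e.g. "2024-01-01", as expected


@dg.asset
def unpartitioned_asset(context: dg.AssetExecutionContext) -> None:
    # Per the description above, because another step in this run targets a
    # partitioned asset, partition properties come back non-None here too,
    # even though this asset itself is unpartitioned.
    context.log.info(context.partition_key)


if __name__ == "__main__":
    dg.materialize(
        [partitioned_asset, unpartitioned_asset],
        partition_key="2024-01-01",
    )
```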
Queue management.
I think this should include, or have upstack, a motivating example that would demonstrate how this would be used in the dbt use case.
@schrockn I'll put together that example, but FYI this is in a reviewable state.
Given that the selected assets must have the same partitions definition, isn't it more straightforward/better for now (for the consumer at least; the multi_asset splitting would be complicated) to create N multi_assets split up by N partition definitions? This makes it so that we still handle all arbitrary subset selections (which this scheme does not allow) but "gets the job done" for now for the dbt use case. Long story short, I'm asking if we should wait to support this until we implement "Long-term-vision-wise, I'm imagining a world where a run can have an AssetGraphSubset attached to it and so we could have a context.get_partition_subset(asset_key) to figure out the specific partitions for a given asset or something, which this fits fine with." as @OwenKephart describes.
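For concreteness, a sketch of the status quo that both this PR and the alternative above are working around: one `@dbt_assets` definition per `PartitionsDefinition`. The manifest path and dbt selections here are placeholders, not from the discussion:

```python
from dagster import AssetExecutionContext, DailyPartitionsDefinition, HourlyPartitionsDefinition
from dagster_dbt import DbtCliResource, dbt_assets

MANIFEST_PATH = "target/manifest.json"  # placeholder path


@dbt_assets(
    manifest=MANIFEST_PATH,
    select="tag:daily",  # hypothetical dbt selection
    partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"),
)
def daily_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


@dbt_assets(
    manifest=MANIFEST_PATH,
    select="tag:hourly",  # hypothetical dbt selection
    partitions_def=HourlyPartitionsDefinition(start_date="2024-01-01-00:00"),
)
def hourly_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```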
Here's a change we'd be able to make in purina if this (and related dbt support) were merged: https://github.com/dagster-io/internal/pull/11784.
If I understand correctly, you mean don't make framework changes right now, but instead update the dbt integration etc. to create multiple multi-assets instead of just one? A couple limitations with that approach:
Are you able to say more about what you mean by "handle"? It will still be possible, after this PR, to select a set of differently-partitioned assets that have the same

That all said, I don't think there are mega-pressing reasons we need to try to merge this now, if we have reservations about it. The two reasons I had been pressing on this:
Hmmm, I am getting confused by this sentence and the attached screenshot. And yes, for sure https://github.com/dagster-io/internal/pull/11784 is super motivating.
I think this is a worthy workstream. But the framework change is a big one. Just asking questions about alternatives. And yeah, I'm imagining a new entry point for this advanced use case that produces multiple
Ah yes reasonable confusion:
This is equivalent to the experience you'd get today if the assets were split across multiple

In an ideal future, the
Ok thanks for working through this one. Supportive of this!
I just have one outstanding question. This, IMO, is a very "advanced" API likely to be used by a small number of sophisticated users and integration authors.

Should we make this a "builder" method on `AssetSpec`, similar to `with_io_manager`? I'm concerned about leading mainstream users astray with `multi_asset`, `asset`, and `AssetSpec` having the same variable name and causing a bunch of confusion when there are conflicts or whatever.

We could even make the builder method have a more specific name to very explicitly say its use case: `with_asset_specific_partitions_def` or something (even though it is admittedly redundant), and have a long docblock explaining its usage.

We would need some sort of principle around when to make stuff "builder methods" versus a top-level property, but just trying to solve the short-term problem.
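To make the two shapes being weighed here concrete, a small sketch. The builder method name comes from the comment above and is hypothetical, not an existing API; the constructor parameter is what this PR adds:

```python
from dagster import AssetSpec, DailyPartitionsDefinition

daily = DailyPartitionsDefinition(start_date="2024-01-01")

# What this PR adds: partitions_def as a first-class AssetSpec property.
spec_via_parameter = AssetSpec("my_asset", partitions_def=daily)

# The alternative proposed in this comment (hypothetical, not implemented):
# spec_via_builder = AssetSpec("my_asset").with_asset_specific_partitions_def(daily)
```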
I think it's harder to do the

I agree that there's a risk of confusion when people are creating multi_assets etc., but in practice, because it is a real property, I think it might be even more confusing if we try to hide that parameter and force you to use a separate method.
I think we should move forward with this. However, to talk this out: this does add multiple ways of doing things, as well as invalid parameter combinations that are only caught at runtime. In particular:

```python
@multi_asset(specs=[
    spec.replace_attributes(partitions_def=some_partition_def)
    for spec in [AssetSpec("asset_one"), AssetSpec("asset_two")]
])
def an_asset() -> None: ...
```

and

```python
@multi_asset(
    specs=[AssetSpec("asset_one"), AssetSpec("asset_two")],
    partitions_def=some_partition_def,
)
def an_asset() -> None: ...
```

I don't think it is worth deprecating

That all being said, I support this path forward, just want to write out the tradeoffs.
## Summary & Motivation
This makes it possible for assets within a single `AssetsDefinition` to have different `PartitionsDefinition`s. I.e. the example below is now allowed:
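A minimal sketch of such an example; the asset names and partition ranges here are illustrative assumptions, not taken from the PR itself:

```python
import dagster as dg


@dg.multi_asset(
    specs=[
        # Each AssetSpec carries its own PartitionsDefinition.
        dg.AssetSpec(
            "hourly_asset",
            partitions_def=dg.HourlyPartitionsDefinition(start_date="2024-01-01-00:00"),
        ),
        dg.AssetSpec(
            "daily_asset",
            partitions_def=dg.DailyPartitionsDefinition(start_date="2024-01-01"),
        ),
    ],
    # Required so that differently-partitioned assets can be executed in separate runs.
    can_subset=True,
)
def my_assets(context: dg.AssetExecutionContext):
    for asset_key in context.selected_asset_keys:
        # Within a single run, the selected assets share one PartitionsDefinition,
        # so context.partition_key is well defined here.
        context.log.info(f"Materializing {asset_key} for partition {context.partition_key}")
        yield dg.MaterializeResult(asset_key=asset_key)
```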
For this to be allowed, `can_subset` must be true. This means that we don't need to change anything about the underlying execution. If someone wants to launch an execution that targets two assets with different `PartitionsDefinition`s within the same `AssetsDefinition`, then the assets are just executed in separate runs.

This will help address a common complaint we've received with respect to our dbt integration, which is that you need to do a separate `@dbt_assets(select=...)` per `PartitionsDefinition`.

## How I Tested These Changes
`dagster dev` with the example above, launched runs, clicked around the launchpad, looked at the asset details page.

## Changelog
`AssetSpec`s may now contain a `partitions_def`. Different `AssetSpec`s passed to the same invocation of `@multi_asset` can now have different `PartitionsDefinition`s, as long as `can_subset=True`.