Skip to content

Conversation

@AdrianoKF
Copy link

@AdrianoKF AdrianoKF commented Jun 4, 2024

Open questions:

  • How to properly test the run name generation? Seems to be very hard to come up with a proper AssetExecutionContext in a test case.
  • Is logging without the Dagster run ID a bad idea? Multiple independent Dagster executions will end up in the same MLflow run that way, which could be confusing.
  • When executing with use_asset_run_key=False, the asset key tags on the MLflow run are incomplete (since every Dagster op/asset replaces the previous value). Should we drop that tag entirely? Does any other code depend on this feature currently?

@AdrianoKF AdrianoKF requested a review from janwillemkl June 4, 2024 11:16
@AdrianoKF AdrianoKF marked this pull request as ready for review June 4, 2024 11:17
@maxmynter
Copy link

Thoughts from @nicholasjng and me:

How to properly test the run name generation? Seems to be very hard to come up with a proper AssetExecutionContext in a test case.

No idea, but we can take time together tomorrow.

Is logging without the Dagster run ID a bad idea? Multiple independent Dagster executions will end up in the same MLflow run that way, which could be confusing.

No. It is not a bad idea.

Repeatedly overwriting the last run keeps the mlflow run logs lean and in the way we set it up with the start up script.

We only show the MLflow page once. If we do that after we execute the workflow it is also not surprising when the run already exists.

When executing with use_asset_run_key=False, the asset key tags on the MLflow run are incomplete (since every Dagster op/asset replaces the previous value). Should we drop that tag entirely? Does any other code depend on this feature currently?

If we drop the Dagster run id, it does not make sense to log the asset key. Plus, the logged names can be (are?) different if we prefix them with e.g. train_... and, if needed, we can use this info to trace the logging asset.

In our story we will not have time to dive into details like the asset key.

Thus we'd advocate for dropping use_asset_key.


if run_name_prefix is not None:
run_name = f"{run_name_prefix}-{run_name}"
parts.append(run_name_prefix)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either name this suffix, or actually use it as a prefix

dagster_run_id = get_run_id(context, short=True)

run_name = f"{asset_key}-{dagster_run_id}"
if run_name_prefix is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just call the param prefix? The method name _get_run_name_from_context makes it clear that the prefix concerns a run name

Co-authored-by: Nicholas Junge <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants