[DAR-2707][External] Allow repeated polling of pending export releases #876

Open
wants to merge 11 commits into
base: master

JBWilkie
Contributor

Problem

Before a dataset release can be pulled, it must finish generating. How long this takes varies with the export size and the current load on the export pipeline. If a release isn't ready when it is pulled, darwin-py throws an error.

Solution

Introduce an optional retry parameter (SDK & CLI) that allows polling of pending dataset releases. If the pending release becomes available within the allotted time, it is downloaded automatically.
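
As an illustration only, the polling behaviour described above could look roughly like the sketch below. This is not the code in this PR: `poll_until_available` and `fetch_status` are hypothetical names, the 300 second / 10 second defaults are taken from the values quoted later in the review, and treating any status other than "pending" as ready is a simplifying assumption.

```python
import time
from typing import Callable


def poll_until_available(
    fetch_status: Callable[[], str],
    retry_duration: float = 300,
    retry_interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the release stops being pending, False on timeout.

    `fetch_status` is a hypothetical callable that asks the API for the
    current status string of one release.
    """
    waited = 0.0
    while True:
        # Assumption: anything other than "pending" means the release is ready.
        if fetch_status() != "pending":
            return True
        # Give up once the allotted time has been used.
        if waited >= retry_duration:
            return False
        sleep(retry_interval)
        waited += retry_interval
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a caller would start the download only when the function returns True.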

Changelog

Allow optional polling of pending dataset releases in case the release is not yet ready for download

linear bot commented Jun 28, 2024

Contributor

@shernshiou shernshiou left a comment

I do not detect any problem codewise. Haven't tested it tho. 👍🏼

"""
Get a specific ``Release`` for this ``RemoteDataset``.

Parameters
----------
name : str, default: "latest"
Name of the export.
retry : bool, default: True
Contributor

The name of the argument retry doesn't match what it does ("return all releases, if true"). It should be something like include_pending, which is more self-explanatory.

    )
else:
    return sorted(
        filter(lambda x: x.available, releases),
Contributor

I'm sure this can be made nicer: use an if to choose only the first argument (either releases or filter(lambda x: x.available, releases)), then have a single return line:

return sorted(
    releases_fn,
    key=lambda x: x.version,
    reverse=True,
)
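
A minimal, self-contained sketch of the single-return shape suggested here. The `Release` dataclass is a hypothetical stand-in with only the two fields the snippet uses, and `sorted_releases` and `include_unavailable` are illustrative names rather than the darwin-py API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Release:
    """Hypothetical stand-in for darwin-py's Release (only the fields used here)."""
    version: str
    available: bool


def sorted_releases(
    releases: List[Release], include_unavailable: bool = False
) -> List[Release]:
    # Choose the input once, then sort in a single return statement.
    candidates = (
        releases if include_unavailable else [r for r in releases if r.available]
    )
    return sorted(candidates, key=lambda r: r.version, reverse=True)
```

As in the reviewed snippet, versions are compared as strings here.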

"""
version: str = DatasetIdentifier.parse(dataset_slug).version or "latest"
if version == "latest" and retry:
Contributor

Few questions here:

  • I don't recall the details, but is the name latest hardcoded by us?
  • What happens if a client deliberately passes the name latest with retry=True?
  • I don't think this restriction is necessary. Can't we pick the name of the latest release ourselves before performing the download, and then run the retry logic with that name instead of latest? This would ensure we refer to the same export even if a new export were created in the meantime.

Contributor Author

@JBWilkie JBWilkie Jul 2, 2024

I don't recall the details, but is the name latest hardcoded by us?

Yes, latest is a reserved release name. If you try to create an export named latest, the API responds with {"errors":{"name":["is reserved"]}}

What happens if a client deliberately passes the name latest with retry=True?

We will return the latest available release.

I don't think this restriction is necessary. Can't we pick the name of the latest release ourselves before performing the download, and then run the retry logic with that name instead of latest? This would ensure we refer to the same export even if a new export were created in the meantime.

Actually yes, I think we can, because each release has an export_date of type datetime.datetime. This allows us to select the most recent release in case retry is passed as True. I'll make this change now, thank you for flagging.
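
A rough sketch of that idea, assuming only what is stated above (each release has a name and an export_date of type datetime.datetime). The `Release` dataclass and `resolve_latest_name` are hypothetical and only illustrate pinning the concrete name before any polling starts.

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass
class Release:
    """Hypothetical stand-in carrying only the fields needed here."""
    name: str
    export_date: datetime.datetime


def resolve_latest_name(releases: List[Release]) -> str:
    """Return the name of the most recently exported release."""
    if not releases:
        raise ValueError("No releases found for this dataset")
    # Pin the concrete name up front so that later polling keeps targeting
    # the same export even if a newer one is created in the meantime.
    return max(releases, key=lambda r: r.export_date).name
```

The retry logic would then poll using the returned name rather than the reserved name latest.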

@@ -22,6 +23,8 @@ class Release:
The version of the ``Release``.
name : str
The name of the ``Release``.
status : str
Contributor

Unsure if it's common in darwin-py, but this would be better as an enum, as it only has a few possible values.
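
For illustration, a status enum along these lines might look like the sketch below. Only "pending" appears in this PR; the other members and the class name `ReleaseStatus` are assumptions and would need to be checked against the export API.

```python
from enum import Enum


class ReleaseStatus(str, Enum):
    """Hypothetical enum of export statuses."""
    PENDING = "pending"    # the only value confirmed by the code under review
    RUNNING = "running"    # assumed value, not confirmed in this PR
    COMPLETE = "complete"  # assumed value, not confirmed in this PR
```

Mixing in str keeps comparisons such as `release.status == "pending"` working, while `ReleaseStatus(raw_value)` raises ValueError for any unexpected status string.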

@@ -16,6 +16,7 @@ def release(dataset_slug: str, team_slug_darwin_json_v2: str) -> Release:
team_slug=team_slug_darwin_json_v2,
version="latest",
name="test",
status="test_status",
Contributor

For documentation purposes, it'd be best to use actual values of export statuses here instead of stubs as they are enums and not arbitrary strings.

if release.status == "pending":
    if retry:
        retry_duration = 300
        retry_interval = 10
Contributor

It would be more conventional to have these configurable via CLI or some SDK settings

Contributor Author

@balysv This makes sense. I can see 2 options, both of which involve building in some validation:

  • 1: Make these values configurable in the ~/.config.yaml file, or:
  • 2: Add two additional arguments: retry_duration and retry_interval with default values of ~10 minutes & ~10 seconds. These can be configured, but if they're passed without retry=True then we will throw an error

I'm leaning toward the additional arguments
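
A possible shape for option 2, sketched under stated assumptions: `resolve_retry_settings` is a hypothetical helper, the defaults follow the "~10 minutes & ~10 seconds" mentioned above, and the extra checks (positive values, interval not longer than duration) are one guess at what "some validation" could mean.

```python
from typing import Optional, Tuple

DEFAULT_RETRY_DURATION = 600  # seconds, i.e. the ~10 minutes proposed above
DEFAULT_RETRY_INTERVAL = 10   # seconds


def resolve_retry_settings(
    retry: bool = False,
    retry_duration: Optional[int] = None,
    retry_interval: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return (duration, interval) in seconds, or None when retry is off."""
    if not retry:
        # Passing timing arguments without retry=True is treated as an error.
        if retry_duration is not None or retry_interval is not None:
            raise ValueError(
                "retry_duration and retry_interval require retry=True"
            )
        return None
    duration = DEFAULT_RETRY_DURATION if retry_duration is None else retry_duration
    interval = DEFAULT_RETRY_INTERVAL if retry_interval is None else retry_interval
    if duration <= 0 or interval <= 0:
        raise ValueError("retry_duration and retry_interval must be positive")
    if interval > duration:
        raise ValueError("retry_interval must not exceed retry_duration")
    return duration, interval
```

Using None as the sentinel lets the helper tell "not passed" apart from an explicit value, which is what makes the retry=True check possible.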

@JBWilkie JBWilkie requested a review from balysv July 4, 2024 17:46

3 participants