CSV file download API #114

Merged - 4 commits merged into main on Feb 27, 2024

Conversation

dogversioning (Contributor):

This PR makes the following changes:

  • Cleans up module vs. function importing
  • Renames the testing utils.py to mock_utils.py
  • Adds endpoints for CSV file download (sketched below):
    • a listing endpoint for aggregates/last valid
    • individual URLs for metadata download
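
As a rough sketch of what such a download endpoint could look like - assuming a Lambda handler that 302-redirects to a presigned S3 URL, with a hypothetical bucket name, key layout, and event shape:

    # Sketch only: bucket name, key layout, and event shape are assumptions,
    # not this PR's actual implementation.
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "aggregator-site-data"  # hypothetical bucket name


    def csv_handler(event: dict, context) -> dict:
        path = event.get("pathParameters", {})
        # Assumed key layout: csv_aggregates/<study>/<subscription>.csv
        key = f"csv_aggregates/{path['study']}/{path['subscription']}.csv"
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": key},
            ExpiresIn=600,
        )
        # Redirecting keeps large CSV bodies out of the Lambda response payload.
        return {"statusCode": 302, "headers": {"Location": url}}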

dogversioning and others added 2 commits February 23, 2024 13:34
* PR feedback, removed unused pylint excepts
Resolved review threads: pyproject.toml (outdated), src/handlers/shared/functions.py, src/handlers/shared/functions.py (outdated)

"""convenience wrapper for update_metadata"""
if site is None:
if site is None and meta_type != enums.JsonFilename.COLUMN_TYPES.value:
Contributor:

nit: I assume that this logic makes sense - I've not fully grokked it yet. But this is a subtle combo of logic before falling back to `self.site` -- `if site is None:` was obvious in a way this isn't.

dogversioning (Author):

Let me add a comment - the idea here is that `column_types.json` never has a site value in it, since it's describing a data schema that is tied to the version of the study, rather than a specific site's data.
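
A minimal sketch of that guard, assuming the enum and argument names from the diff above (the real signature in src/handlers/shared/functions.py may differ):

    import enum


    class JsonFilename(enum.Enum):
        COLUMN_TYPES = "column_types"
        TRANSACTIONS = "transactions"  # assumed sibling entry


    def validate_site(site: str | None, meta_type: str) -> None:
        # column_types.json describes the study version's schema, not any one
        # site's data, so it is the only metadata type allowed to omit `site`.
        if site is None and meta_type != JsonFilename.COLUMN_TYPES.value:
            raise ValueError(f"{meta_type} metadata requires a site")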

Resolved review thread: src/handlers/site_upload/powerset_merge.py (outdated)

Comment on lines 74 to 75:

    # TODO: add x-column-descriptions once a source for column descriptions
    # has been established

Contributor:

nit: maybe add empty descriptions now? Might help avoid some "if not present" logic on the dashboard if the key at least exists already. But only if you feel confident in the key name/format.

dogversioning (Author):

@vlad-ignatov can you confirm that passing an empty header will be ok on your end?

vlad-ignatov (Contributor):

Yes, it is OK.

The number of columns in the CSV file must be the same as the number of items in those headers (after you split by `,`). However, empty or missing headers are ignored. You can omit `x-column-names`, or pass it as an empty string - both should work.
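
A hedged sketch of that contract from the consumer side - the header name comes from this thread, but the helper below is assumed, not the dashboard's actual code:

    def parse_column_names(headers: dict[str, str], num_csv_columns: int) -> list[str]:
        raw = headers.get("x-column-names", "")
        names = raw.split(",") if raw else []
        # Empty or missing headers are ignored; otherwise the item count must
        # match the number of columns in the CSV file.
        if names and len(names) != num_csv_columns:
            raise ValueError("x-column-names count does not match CSV columns")
        return names


    # e.g. parse_column_names({"x-column-names": "cnt,gender,age"}, 3)
    # -> ["cnt", "gender", "age"]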

Resolved review threads: src/handlers/dashboard/get_csv.py (outdated), tests/dashboard/test_get_csv.py

            # Should only be hit if you add a new JSON dict and forget to add it
            # to this function
            case _:
                raise OSError(f"{meta_type} does not have a handler for updates.")
        return metadata


    def write_metadata(

Contributor:

So is this one giant metadata file for all sites & studies? We should think about what that means for collisions and such - right now I'm guessing if two sites both uploaded to the aggregator at the same exact second, both would attempt to be written out, but one would win the atomic race - would there be an error about that, or would both callers think it worked?

dogversioning (Author):

The callers are notified it worked on upload completion - everything inside the aggregator errors to CloudWatch only.

We do have a race condition issue here. It's low probability, but it could occur. This is somewhat related to #67, but the fix here would be the same - if we switch from running powerset merges directly on an SNS event to running them from an SNS event in a queue, the race condition should go away.

We do need to do this eventually. With two or three sites uploading once every two weeks, it's not at the top of my list.
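
A sketch of that queue-based fix, assuming an SQS FIFO queue with a single message group (the queue URL and message shape are hypothetical):

    import json

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/powerset-merge.fifo"  # hypothetical


    def enqueue_merge(site: str, study: str) -> None:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"site": site, "study": study}),
            # A single message group means messages are delivered one at a
            # time, in order, so metadata read/modify/write cycles can no
            # longer interleave across simultaneous uploads.
            MessageGroupId="powerset-merge",
            MessageDeduplicationId=f"{site}:{study}",
        )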

Comment on lines +128 to +132:

    if target == enums.ColumnTypesKeys.COLUMNS.value:
        data_version_metadata[target] = value
    else:
        dt = dt or datetime.now(timezone.utc)
        data_version_metadata[target] = dt.isoformat()

Contributor:

This is ok... but having two different value arguments (`value` and `dt`) is feeling like maybe this method is doing too many different things? Or you could just combine them in one? Is it just the type hinting there that's a bother? Could do `time | str` and even do `isinstance(value, ...)` checks if that feels right.

dogversioning (Author):

It's definitely on the edge here - before the column_types dict, basically everything in here was a `datetime.now()` and so it didn't need to get passed explicitly, and now we're saying "ok, it's a `datetime.now()` unless you're working on this metadata dictionary specifically".

I'm inclined to leave this as is for now, but I'll add a comment here saying it should be split apart if it gets any worse.
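
For reference, a sketch of the combine-them-in-one idea, dispatching on the value's type with isinstance (the signature is an assumption, not the merged code):

    from datetime import datetime, timezone


    def set_metadata_field(
        data_version_metadata: dict,
        target: str,
        value: list | datetime | None = None,
    ) -> None:
        # Sketch only: one argument replaces the separate `value` and `dt`.
        if isinstance(value, datetime) or value is None:
            # Timestamp-style keys default to "now", stored as ISO 8601.
            dt = value or datetime.now(timezone.utc)
            data_version_metadata[target] = dt.isoformat()
        else:
            # Schema-style keys (e.g. the column types list) are stored verbatim.
            data_version_metadata[target] = value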

@dogversioning dogversioning merged commit eb2fff4 into main Feb 27, 2024
2 checks passed
@dogversioning dogversioning deleted the mg/csv_get_api branch February 27, 2024 20:29