CSV file download API #114
Conversation
* PR feedback, removed unused pylint excepts
"""convenience wrapper for update_metadata""" | ||
if site is None: | ||
if site is None and meta_type != enums.JsonFilename.COLUMN_TYPES.value: |
nit: I assume that this logic makes sense - I've not fully grokked it yet. But this is a subtle combo of logic before falling back to self.site -- `if site is None:` was obvious in a way this isn't.
Let me add a comment - the idea here is that the column_types.json never has a site
value in it, since it's describing a data schema that is tied to the version of the study, rather than a specific site's data.
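The special case described above can be sketched in isolation. This is a minimal illustration, not the project's actual code: `JsonFilename` is a stand-in for the real `enums` module, and `resolve_site`/`default_site` are hypothetical names invented for the example.

```python
from enum import Enum


# Hypothetical stand-in for the project's enums module, for illustration only.
class JsonFilename(Enum):
    COLUMN_TYPES = "column_types"
    TRANSACTIONS = "transactions"


def resolve_site(site, meta_type, default_site="site_a"):
    """Fall back to a default site, except for column_types metadata,
    which describes the study's schema and never carries a site value."""
    if site is None and meta_type != JsonFilename.COLUMN_TYPES.value:
        return default_site
    return site


# column_types metadata keeps site=None; other metadata falls back.
print(resolve_site(None, "column_types"))  # None
print(resolve_site(None, "transactions"))  # site_a
```

The point of the extra clause is that a missing site is *expected* for column_types, so only the other metadata types should trigger the fallback.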
src/handlers/dashboard/get_csv.py
Outdated
# TODO: add x-column-descriptions once a source for column descriptions
# has been established
nit: maybe add empty descriptions now? Might help avoid some "if not present" logic on the dashboard if the key at least exists already. But only if you feel confident in the key name/format.
@vlad-ignatov can you confirm that passing an empty header will be ok on your end?
Yes, it is OK.
The number of columns in the CSV file must be the same as the number of items in those headers (after you split by `,`). However, empty or missing headers are ignored. You can omit `x-column-names`, or pass it as an empty string - both should work.
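The tolerant header handling described here can be sketched as a small parser. This is a hypothetical consumer-side example, assuming the header is a plain comma-separated string; `parse_column_names` is an invented name, not an actual dashboard function.

```python
def parse_column_names(header_value):
    """Split an x-column-names-style header value into column names.

    A missing (None) or empty header is treated as 'no headers',
    matching the behavior described above: empty or missing headers
    are simply ignored.
    """
    if not header_value:
        return []
    return [name.strip() for name in header_value.split(",")]


print(parse_column_names("cnt,site,variant"))  # ['cnt', 'site', 'variant']
print(parse_column_names(""))                  # []
print(parse_column_names(None))                # []
```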
        # Should only be hit if you add a new JSON dict and forget to add it
        # to this function
        case _:
            raise OSError(f"{meta_type} does not have a handler for updates.")
    return metadata


def write_metadata(
So is this one giant metadata file for all sites & studies? We should think about what that means for collisions and such - right now I'm guessing if two sites both uploaded to the aggregator at the same exact second, both would attempt to be written out, but one would win the atomic race - would there be an error about that, or would both callers think it worked?
The callers are notified it worked on upload completion - everything inside the aggregator errors to CloudWatch only.
We do have a race condition issue here. It's low probability, but it could occur. This is somewhat related to #67, but the fix here would be the same - if we switch from running powerset merges directly on an SNS event to running on an SNS event in a queue, the race condition should go away.
We do need to do this eventually. With two or three sites uploading once every two weeks, it's not at the top of my list.
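The race being discussed is the classic last-writer-wins problem on a shared file. A toy model, with an in-memory dict standing in for the S3 metadata object (all names here are invented for illustration):

```python
# Toy model of the last-writer-wins race on a shared metadata file.
# "storage" stands in for the S3 object; neither writer sees an error.
storage = {"sites": []}


def upload(site_name, snapshot):
    """Read-modify-write against a possibly stale snapshot."""
    updated = dict(snapshot)
    updated["sites"] = snapshot["sites"] + [site_name]
    return updated


# Both sites read the same version before either writes.
snap_a = dict(storage)
snap_b = dict(storage)
write_a = upload("site_a", snap_a)
write_b = upload("site_b", snap_b)

storage = write_a  # first write lands
storage = write_b  # second write silently overwrites it
print(storage["sites"])  # ['site_b'] - site_a's update is lost
```

Serializing the writes through a queue (the fix proposed above) removes the overlap: each merge then reads the result of the previous one instead of a shared stale snapshot.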
if target == enums.ColumnTypesKeys.COLUMNS.value:
    data_version_metadata[target] = value
else:
    dt = dt or datetime.now(timezone.utc)
    data_version_metadata[target] = dt.isoformat()
This is ok... but having two different value arguments (`value` and `dt`) is feeling like maybe this method is doing too many different things? Or you could just combine them in one? Is it just the type hinting there that's a bother? Could do `time | str` and even do `isinstance(value, ...)` checks if that feels right.
It's definitely on the edge here - before the column_types dict, basically everything in here was a datetime.now() and so it didn't need to get passed explicitly, and now we're saying 'ok, it's a datetime.now() unless you're working on this metadata dictionary specifically'.
I'm inclined to leave this as is for now, but I'll add a comment here saying it should be split apart if it gets any worse.
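For reference, the single-argument shape suggested in the review could look like the sketch below. This is an assumption about how the merge might go, not the PR's actual code; `set_metadata_value` is an invented name, and the real function handles more cases.

```python
from datetime import datetime, timezone


def set_metadata_value(metadata, target, value=None):
    """One value argument instead of separate value/dt.

    Datetimes are serialized to ISO strings, anything else is stored
    as-is, and None means 'use the current UTC time' - preserving the
    old implicit datetime.now() default.
    """
    if value is None:
        value = datetime.now(timezone.utc)
    if isinstance(value, datetime):
        metadata[target] = value.isoformat()
    else:
        metadata[target] = value
    return metadata


meta = {}
set_metadata_value(meta, "columns", ["cnt", "site"])  # stored as-is
set_metadata_value(meta, "last_upload")               # ISO timestamp
```

The `isinstance` dispatch is exactly the trade-off the reviewer mentions: one argument and a `datetime | list | str` hint, at the cost of a runtime type check.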
This PR makes the following changes: