@JoshuaFortriede

This PR completes two tasks.

Lookup of DRS_URI in file_descriptors:
According to v2.1.0 of the HCA schema file_descriptors, a DRS URI can be provided for a file. This means that the actual data file will not be present in the payload, but will be linked externally. As such, we should not flag these data files as being missing in the submission.
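A minimal sketch of the lookup described above, not the code in this PR. It assumes descriptors are plain dicts keyed by data file name; the function names and the `payload_files` set are hypothetical and only illustrate the idea that a `drs_uri` key (string or null) means the data file may live outside the payload.

```python
from __future__ import annotations


def data_file_may_be_external(file_descriptor: dict) -> bool:
    """Return True when the descriptor carries a drs_uri key.

    Assumption: the presence of the key is enough, whether its value
    is a string or None (null in the JSON document).
    """
    return "drs_uri" in file_descriptor


def find_missing_data_files(
    descriptors: dict[str, dict], payload_files: set[str]
) -> list[str]:
    """List data files that are neither in the payload nor linked externally.

    `descriptors` maps a data file name to its file descriptor (hypothetical
    layout); `payload_files` holds the names actually found in the submission.
    """
    missing = []
    for name, descriptor in descriptors.items():
        if name in payload_files or data_file_may_be_external(descriptor):
            continue
        missing.append(name)
    return missing
```

With this shape, a descriptor such as `{"drs_uri": None}` is not reported as missing, while a descriptor without the key and without a matching payload file is.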

Note for the future: it would be good to check whether BOTH a DRS URI and the data_file are present. If so, the validator could raise an error.
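One possible shape for that future check, as a sketch only. The function name and the `data_file_present` flag are hypothetical, and it assumes that only a non-null `drs_uri` counts as a conflict with a data file in the payload; whether a null value should also conflict is left open here.

```python
def check_drs_uri_conflict(file_descriptor: dict, data_file_present: bool) -> None:
    """Raise if a descriptor points to an external file that is also in the payload.

    Assumption: a null drs_uri is treated as "no external link yet" and does
    not conflict with a data file in the payload.
    """
    drs_uri = file_descriptor.get("drs_uri")
    if drs_uri is not None and data_file_present:
        raise ValueError(
            f"File descriptor has drs_uri {drs_uri!r} but the data file "
            "is also present in the submission"
        )
```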

Completion of TODO Item:
There was a note to remove an unused variable. This has been accomplished.

Check if the drs_uri key is in the file_descriptor of the sequence file. If yes, it is assumed that the FASTQ file will be located externally and will get (eventually) a DRS URI. As such, do not mark the data file as missing.
Changed to just get last split.
@github-actions github-actions bot added the orange [process] Done by the Azul team label May 9, 2025
# Sequence file data_files might not be present if they are managed access.
# File Descriptor v2.1.0 allows for the drs_uri to be a string or null.
# In both of these cases, we set found_data_file to True
if metadata_file["entity_type"] == "sequence_file":
Collaborator

I don't think this is necessarily limited to sequencing files in the spec, though that is the use case for LungMap.

Author

That is correct. This could be changed to
if metadata_file["entity_type"].endswith("_file"):
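To illustrate the effect of that change, here is a small sketch with a hypothetical helper, `is_file_entity`. It assumes that every file entity type in the schema ends in `_file`, which is the premise of the suggestion rather than something verified here.

```python
def is_file_entity(entity_type: str) -> bool:
    """Return True for any entity type that names a file entity.

    Assumption: file entity types all end in "_file", so the suffix test
    covers sequence files and other file types alike.
    """
    return entity_type.endswith("_file")
```

Under that assumption, `is_file_entity("sequence_file")` and `is_file_entity("analysis_file")` both hold, while a non-file type such as `"donor_organism"` is skipped.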

@bahill
Contributor

bahill commented May 20, 2025

@JoshuaFortriede this looks great - thank you!
I'll give this a quick test and then approve when that's all set.

@bahill bahill changed the title Lungmap modifications FE-400 - Lungmap modifications May 20, 2025
@JoshuaFortriede
Author

No, not really. The bucket that we have is a "sharing" bucket and not a "staging" bucket, so it doesn't have the staging_area.json files... @ncalvanese1 might be able to share with you an appropriate staging bucket.

3 participants