-
Notifications
You must be signed in to change notification settings - Fork 16
batch saving genomes #212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
batch saving genomes #212
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #212 +/- ##
==========================================
- Coverage 80.88% 80.30% -0.59%
==========================================
Files 11 11
Lines 2998 3011 +13
==========================================
- Hits 2425 2418 -7
- Misses 573 593 +20 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at tests yet, but this is already a lot of comments
EDIT: All comments addressed
…me_mass loop; 3. make the note much more explicit
ws_name_to_id_func: Callable[[str], int] | ||
) -> Dict[str, Any]: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
f""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I was wrong about this. From https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals
Formatted string literals cannot be used as docstrings, even if they do not include expressions.
... which is a bummer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed formatted string literals
Returns: | ||
Dict[str, Any]: A dictionary containing the workspace ID and the processed parameters. The dictionary | ||
has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing | |
has keys {_WSID} and {_INPUTS}, where {_WSID} is the workspace ID and {_INPUTS} is a list containing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed formatted string literals.
validate_params_func: Callable[[Dict[str, Any]], None] | ||
) -> None: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
f""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed formatted string literals.
- _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- _WSID: A workspace ID, which must be present and valid. | |
- _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`. | |
- {_WSID}: A workspace ID, which must be present and valid. | |
- {_INPUTS}: A list of parameter dictionaries, each of which must be validated by `validate_params_func`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more of these need to be done below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed formatted string literals.
if contains: | ||
self.assertIn(error, str(context.exception)) | ||
else: | ||
self.assertEqual(error, str(context.exception)) | ||
|
||
def check_save_one_genome_output(self, ret, genome_name): | ||
def check_save_one_genome_output( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checker barely does checks anything, but I realize it was that way when you got here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could've sworn you spent a bunch of time adding rigorous tests to this module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to the comment below - the tests use the check_save_one_genome_output method as a helper for testing mass saves, so the mass save method isn't actually getting tested if it relies on that function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mass saves use the check_save_one_genome_output method as a helper for testing. Why do you think the mass isn't actually being tested? In my opinion, it conducted some testing, but it wasn't tested thoroughly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question - check_save_one_genome_output is a test method. You wouldn't add tests for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really confused at this point, because we definitely created a new issue about the checker function and also added it to our TODO list in the GFU parallelization tracking ticket. Basically, I'm asking: how is the crux of the issue mentioned in this comment different from that one - or are they essentially the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that while we do need to fix that function eventually, it was a poor test helper when we got here and fixing every problem this repo has is out of scope for the mass import project. That being said, any code changes we make or code we add needs to be tested rigorously, and any tests we add need to be rigorous. If we're relying on check_save_one_genome_output for any of those scenarios we're not testing rigorously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, what needs to be done at this point? If you think we need to write more tests, what exactly do you want to check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue I'm seeing here is that the new test_genomes and test_genomes_hidden tests use it, which means they're hardly testing anything regarding what actually happened during the upload.
If you already have rigorous tests elsewhere (which I think you do) that check the object contents, any associated files in the blobstore, provenance, and everything else that changes in the KBase data systems when genomes are uploaded, you should do the same thing here (and maybe make a general helper function based on those rigorous tests you can reuse here if you don't have one already). If you don't already have rigorous tests LMK and we can go from there
def test_genomes_with_hidden(self): | ||
self.start_test() | ||
genome_name = 'test_genome_hidden' | ||
inputs = [ | ||
{ | ||
'name': genome_name, | ||
'data': self.test_genome_data, | ||
'hidden': 1, | ||
} | ||
] | ||
params = {'workspace_id': self.wsID, 'inputs': inputs} | ||
ret = self.genome_interface.save_genome_mass(params)[0] | ||
self.check_save_one_genome_output(ret, genome_name, warnings=[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually test that the genome is hidden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there's nothing tin check_save_one_genome_output that checks the genome is hidden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how that comment applies here. The function under test is save_genome_mass, the test just uses check_save_one_genome as a helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I know where I can find information about whether it is hidden? The hidden information is not in the return value from save_one_genome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little annoying; the easiest way, it seems, is to list that one object and see if it shows up. If not, and it does show up with showHidden
enabled, it's hidden:
https://ci.kbase.us/services/ws/docs/Workspace.html#funcdefWorkspace.list_objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_hidden function added: bda89a0
# check features | ||
self.gi.check_dna_sequence_in_features(genome_obj.genome_data) | ||
|
||
# validate genome | ||
genome_obj.genome_data['warnings'] = self.gi.validate_genome(genome_obj.genome_data) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests for G2G that exercise these code paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have genbank_upload_full_test.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And there are tests that cause errors to be thown from the check / validate methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check_dna_sequence_in_features
function does not raise any errors.
validate_genome
will raise an error if the genome size is too large, but I assume this is already covered in genome_size_test.py by other devs.
https://github.com/kbaseapps/GenomeFileUtil/blob/master/test/utility/genome_size_tests.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold on, I'm confused - these two checks are being added to the import mass function, but genbank_upload_full_test isn't changing. How can it test that these functions are called correctly from the mass function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the confusion is probably because it's been 6 months since I looked at this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember why we added these two checks here? They were originally from the GenomeInterface.py file. It's been eight months, and I don't even know why they ended up in GenbankToGenome.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely remember this discussion, but I think they're required checks for genomes and either they were missing from the mass import function or they were moved from somewhere else
No description provided.