batch saving genomes #212

Open

Xiangs18 wants to merge 30 commits into master from dev-batch_genome

Contributor

Xiangs18 commented Aug 15, 2024

No description provided.


          add save_genomes function

6208a9a

Xiangs18 requested review from jsfillman, jkbaumohl and Tianhao-Gu as code owners

August 15, 2024 00:15


          fix positional arg #1 is the wrong type bug

a2fabf4

codecov bot commented Aug 16, 2024 (edited)

Codecov Report

Attention: Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.30%. Comparing base (4819598) to head (a8bced6).

Files with missing lines	Patch %	Lines
lib/GenomeFileUtil/core/GenomeInterface.py	95.91%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
- Coverage   80.88%   80.30%   -0.59%     
==========================================
  Files          11       11              
  Lines        2998     3011      +13     
==========================================
- Hits         2425     2418       -7     
- Misses        573      593      +20
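As a quick consistency check on the coverage diff above, hits plus misses should equal the tracked lines, and coverage is hits / lines. The snippet below only redoes that arithmetic with the figures from the report. It gives 80.887% and 80.306%, which agree with the displayed 80.88% and 80.30% to within 0.01 percentage points.

```python
# Figures copied from the Codecov diff: tracked lines, hits, misses.
base = {"lines": 2998, "hits": 2425, "misses": 573}
head = {"lines": 3011, "hits": 2418, "misses": 593}

for label, c in (("base", base), ("head", head)):
    # Every tracked line in this report is either a hit or a miss.
    assert c["hits"] + c["misses"] == c["lines"]
    print(f"{label}: {100 * c['hits'] / c['lines']:.3f}%")

# Per-row deltas, matching the +13 / -7 / +20 column in the report.
print({k: head[k] - base[k] for k in base})
```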

Contributor Author

Xiangs18 commented Aug 16, 2024 (edited)

self note:

a2fabf4 verifies that the refactored code works with the save_one_genome function.
330d6c2 verifies that the refactored code works with the save_genome_mass function.

Xiangs18 added 3 commits

August 15, 2024 19:58


          use batch genome save in GenbankToGenome.py

330d6c2


          make save_genome_mass function internal

27158ec


          add tests for save_genome_mass function

04aa203

Xiangs18 changed the title ~~[WIP] add save_genomes~~ add save_genomes

Xiangs18 changed the title ~~add save_genomes~~ batch saving genomes


          fix bug

8d672d1

Xiangs18 requested a review from MrCreosote

August 17, 2024 06:22

MrCreosote reviewed

Member

MrCreosote left a comment (edited)

Haven't looked at tests yet, but this is already a lot of comments

EDIT: All comments addressed

RELEASE_NOTES.md Outdated

lib/GenomeFileUtil/core/GenbankToGenome.py

lib/GenomeFileUtil/core/GenomeInterface.py Outdated

lib/GenomeFileUtil/core/GenomeInterface.py Outdated

lib/GenomeFileUtil/core/GenomeInterface.py Outdated

lib/GenomeFileUtil/core/GenomeInterface.py Outdated

lib/GenomeFileUtil/core/GenomeInterface.py Outdated

lib/GenomeFileUtil/core/GenomeInterface.py Outdated

lib/GenomeFileUtil/core/GenomeInterface.py

lib/GenomeFileUtil/core/GenomeInterface.py Outdated

Xiangs18 added 10 commits

August 26, 2024 14:42


          update release notes && make the dicts in the loop

d1dd768


          remove logging && add NOTE for workspace_datatype

e79c297


          move set_up_single_params && validate_mass_params into GenomeUtils

35030a0


          remove redundant tests

4b1b88b


          add test to cover the missing line

b56d77f


          fix params name typo

788c7ee


          add metagenome json file && cover the missing line

65467ea


          rm gff_handle_ref

66af0b7


          add features_handle_ref && protein_handle_ref before upload

fcbb510


          add boolean flag for validate_genome

32f96a0

Xiangs18 requested a review from MrCreosote

September 10, 2024 23:50

Xiangs18 added 4 commits

September 10, 2024 17:10


          test validate_genome boolean flag

41eed0b


          1. add pydoc for save_genome_mass; 2. make the dicts in the _save_genome_mass loop; 3. make the note much more explicit

71670bc


          update release notes && remove tiny files

0cbe155


          add more info and warnings checks in test

52e280d

Xiangs18 added 2 commits

September 18, 2024 21:40


          remove metagenome from test

f018f0e


          remove unused lib

54fedf0

MrCreosote reviewed

lib/GenomeFileUtil/core/GenomeUtils.py

+                  ws_name_to_id_func: Callable[[str], int]
+              ) -> Dict[str, Any]:
+                  """

Member

MrCreosote Sep 19, 2024

Suggested change

Contributor Author

Xiangs18 Oct 30, 2024

👍

Member

MrCreosote Nov 4, 2024

Unfortunately I was wrong about this. From https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals

Formatted string literals cannot be used as docstrings, even if they do not include expressions.

... which is a bummer.
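A quick way to see the restriction: an f-string in the docstring position compiles to an ordinary expression statement, so the function's `__doc__` stays `None`. If the docstring should still interpolate the key constants, one option is to assign `__doc__` explicitly after the definition; whether that is worth the indirection over a plain string (the route this PR ended up taking) is a judgment call. The constant values and function names below are hypothetical.

```python
# Hypothetical key constants, standing in for the module-level names
# discussed in the review.
_WSID = "workspace_id"
_INPUTS = "inputs"


def with_fstring_doc():
    f"""Returns a dict with keys {_WSID} and {_INPUTS}."""  # NOT a docstring


def with_plain_doc():
    """Returns a dict with keys '_WSID' and '_INPUTS'."""


def with_assigned_doc():
    pass


# Build the docstring at import time instead of using an f-string literal.
with_assigned_doc.__doc__ = f"Returns a dict with keys {_WSID} and {_INPUTS}."

print(with_fstring_doc.__doc__)  # None
print(with_plain_doc.__doc__)
print(with_assigned_doc.__doc__)
```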

Contributor Author

Xiangs18 May 19, 2025

Removed formatted string literals

lib/GenomeFileUtil/core/GenomeUtils.py Outdated

+                  Returns:
+                      Dict[str, Any]: A dictionary containing the workspace ID and the processed parameters. The dictionary
+                          has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing

Member

MrCreosote Sep 19, 2024

Suggested change

- has keys '_WSID' and '_INPUTS', where '_WSID' is the workspace ID and '_INPUTS' is a list containing
+ has keys {_WSID} and {_INPUTS}, where {_WSID} is the workspace ID and {_INPUTS} is a list containing

Contributor Author

Xiangs18 Oct 30, 2024

👍

Contributor Author

Xiangs18 May 19, 2025

Removed formatted string literals.
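For reference, the return shape documented above (a workspace ID plus a list of inputs) can be sketched as a small wrapper that turns single-genome params into the mass-save shape. This is a minimal sketch, not the GenomeUtils implementation: the key values `workspace_id` and `inputs` are taken from the mass-save params the tests pass (`{'workspace_id': self.wsID, 'inputs': inputs}`), while the `workspace` name fallback and the function name are assumptions.

```python
from typing import Any, Callable, Dict

# Assumed key values, matching the mass-save params used in the tests.
_WSID = "workspace_id"
_INPUTS = "inputs"


def set_up_single_params_sketch(
    params: Dict[str, Any],
    ws_name_to_id_func: Callable[[str], int],
) -> Dict[str, Any]:
    """Wrap single-genome params in the mass-save shape (hypothetical sketch).

    Assumes the single-genome params carry either 'workspace_id' or a
    'workspace' name; a name is resolved to an ID via ws_name_to_id_func.
    If both are present, the ID takes precedence.
    """
    inputs = dict(params)  # copy so the caller's dict is not mutated
    ws_id = inputs.pop("workspace_id", None)
    ws_name = inputs.pop("workspace", None)
    if ws_id is None:
        if ws_name is None:
            raise ValueError("one of 'workspace_id' or 'workspace' is required")
        ws_id = ws_name_to_id_func(ws_name)
    return {_WSID: ws_id, _INPUTS: [inputs]}
```

Copying the params before popping the workspace keys keeps the helper free of side effects on the caller's dict, which matters if the same params are reused in a test.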

lib/GenomeFileUtil/core/GenomeUtils.py

+                  validate_params_func: Callable[[Dict[str, Any]], None]
+              ) -> None:
+                  """

Member

MrCreosote Sep 19, 2024

Suggested change

Contributor Author

Xiangs18 Oct 30, 2024

👍

Contributor Author

Xiangs18 May 19, 2025

Removed formatted string literals.

lib/GenomeFileUtil/core/GenomeUtils.py Outdated

Comment on lines 548 to 549

                          - _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.

            

Member

MrCreosote Sep 19, 2024

Suggested change

-    - _WSID: A workspace ID, which must be present and valid.
-    - _INPUTS: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.
+    - {_WSID}: A workspace ID, which must be present and valid.
+    - {_INPUTS}: A list of parameter dictionaries, each of which must be validated by `validate_params_func`.

Member

MrCreosote Sep 19, 2024

Some more of these need to be done below

Contributor Author

Xiangs18 Oct 30, 2024

👍

Contributor Author

Xiangs18 May 19, 2025

Removed formatted string literals.
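The validation contract in that docstring (a present, valid workspace ID plus a list of dicts, each checked by `validate_params_func`) can be sketched as follows. This is an illustration under stated assumptions, not the GenomeUtils code: the function name is hypothetical, "valid" is taken to mean a positive integer, and an empty input list is assumed to be an error.

```python
from typing import Any, Callable, Dict

# Assumed key values, matching the mass-save params used in the tests.
_WSID = "workspace_id"
_INPUTS = "inputs"


def validate_mass_params_sketch(
    params: Dict[str, Any],
    validate_params_func: Callable[[Dict[str, Any]], None],
) -> None:
    """Raise ValueError unless params match the documented mass-save shape."""
    ws_id = params.get(_WSID)
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(ws_id, int) or isinstance(ws_id, bool) or ws_id < 1:
        raise ValueError(f"{_WSID} must be a positive integer, got {ws_id!r}")
    inputs = params.get(_INPUTS)
    if not isinstance(inputs, list) or not inputs:
        raise ValueError(f"{_INPUTS} must be a non-empty list")
    for i, entry in enumerate(inputs):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} in {_INPUTS} is not a dict")
        validate_params_func(entry)  # per-genome checks are delegated
```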

lib/GenomeFileUtil/core/GenomeUtils.py Outdated

lib/GenomeFileUtil/core/GenbankToGenome.py

test/problematic_tests/save_genome_test.py

test/problematic_tests/save_genome_test.py Outdated

                       if contains:
                           self.assertIn(error, str(context.exception))
                       else:
                           self.assertEqual(error, str(context.exception))
-                  def check_save_one_genome_output(self, ret, genome_name):
+                  def check_save_one_genome_output(

Member

MrCreosote Sep 19, 2024

This checker barely checks anything, but I realize it was that way when you got here.

Member

MrCreosote Sep 19, 2024 (edited)

I could've sworn you spent a bunch of time adding rigorous tests to this module

Contributor Author

Xiangs18 Nov 1, 2024

Member

MrCreosote Nov 4, 2024 (edited)

This is similar to the comment below: the tests use the check_save_one_genome_output method as a helper for testing mass saves, so the mass save method isn't actually getting tested if it relies on that function.

Contributor Author

Xiangs18 Nov 4, 2024 (edited)

The mass-save tests use the check_save_one_genome_output method as a helper. Why do you think the mass save isn't actually being tested? In my opinion, those tests do check something, just not thoroughly.

Member

MrCreosote May 21, 2025

I don't understand the question - check_save_one_genome_output is a test method. You wouldn't add tests for that

Contributor Author

Xiangs18 May 21, 2025

I'm really confused at this point, because we definitely created a new issue about the checker function and also added it to our TODO list in the GFU parallelization tracking ticket. Basically, I'm asking: how is the crux of the issue mentioned in this comment different from that one - or are they essentially the same?

Member

MrCreosote May 21, 2025

The difference is that while we do need to fix that function eventually, it was a poor test helper when we got here and fixing every problem this repo has is out of scope for the mass import project. That being said, any code changes we make or code we add needs to be tested rigorously, and any tests we add need to be rigorous. If we're relying on check_save_one_genome_output for any of those scenarios we're not testing rigorously.

Contributor Author

Xiangs18 May 21, 2025

Okay, what needs to be done at this point? If you think we need to write more tests, what exactly do you want to check?

Member

MrCreosote May 21, 2025

The main issue I'm seeing here is that the new test_genomes and test_genomes_hidden tests use it, which means they're hardly testing anything regarding what actually happened during the upload.

If you already have rigorous tests elsewhere (which I think you do) that check the object contents, any associated files in the blobstore, provenance, and everything else that changes in the KBase data systems when genomes are uploaded, you should do the same thing here (and maybe make a general helper function based on those rigorous tests that you can reuse here, if you don't have one already). If you don't already have rigorous tests, LMK and we can go from there.
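A minimal sketch of the kind of reusable helper suggested above, assuming a Workspace client that exposes `get_objects2`. It checks the object info (name, type, workspace ID), that provenance is present, and selected data fields; blobstore checks (following the handle refs stored in the data) would build on the returned object. The name `check_saved_genome` and the expected type prefix are assumptions, and plain asserts keep it framework-agnostic (inside the existing unittest classes, `self.assertEqual` would give better failure messages).

```python
from typing import Any, Dict


def check_saved_genome(
    ws_client: Any,
    ws_id: int,
    name: str,
    expected_data: Dict[str, Any],
) -> Dict[str, Any]:
    """Fetch a saved genome and assert on its info, provenance, and data.

    Hypothetical helper; ws_client is assumed to expose the KBase
    Workspace get_objects2 method.
    """
    ret = ws_client.get_objects2({"objects": [{"ref": f"{ws_id}/{name}"}]})
    obj = ret["data"][0]
    info = obj["info"]
    # object_info tuple: [objid, name, type, save_date, version, saved_by,
    #                     wsid, workspace, chsum, size, meta]
    assert info[1] == name, info
    assert info[2].startswith("KBaseGenomes.Genome-"), info
    assert info[6] == ws_id, info
    assert obj["provenance"], "expected non-empty provenance"
    for key, value in expected_data.items():
        actual = obj["data"].get(key)
        assert actual == value, (key, actual, value)
    return obj  # callers can run further checks, e.g. on handle refs
```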

test/problematic_tests/save_genome_test.py Outdated

Comment on lines 233 to 245

+                  def test_genomes_with_hidden(self):
+                      self.start_test()
+                      genome_name = 'test_genome_hidden'
+                      inputs = [
+                          {
+                              'name': genome_name,
+                              'data': self.test_genome_data,
+                              'hidden': 1,
+                          }
+                      ]
+                      params = {'workspace_id': self.wsID, 'inputs': inputs}
+                      ret = self.genome_interface.save_genome_mass(params)[0]
+                      self.check_save_one_genome_output(ret, genome_name, warnings=[])

Member

MrCreosote Sep 19, 2024

This doesn't actually test that the genome is hidden

Contributor Author

Xiangs18 Sep 19, 2024

why?

Member

MrCreosote Sep 22, 2024

Because there's nothing in check_save_one_genome_output that checks whether the genome is hidden.

Contributor Author

Xiangs18 Nov 1, 2024

Member

MrCreosote Nov 4, 2024 (edited)

I don't understand how that comment applies here. The function under test is save_genome_mass, the test just uses check_save_one_genome as a helper

Contributor Author

Xiangs18 May 20, 2025

Where can I find out whether the object is hidden? That information isn't in the return value from save_one_genome.

Member

MrCreosote May 20, 2025

It's a little annoying; the easiest way, it seems, is to list that one object and see if it shows up. If not, and it does show up with showHidden enabled, it's hidden:

https://ci.kbase.us/services/ws/docs/Workspace.html#funcdefWorkspace.list_objects

Contributor Author

Xiangs18 May 20, 2025

check_hidden function added: bda89a0
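The approach described above can be sketched with the Workspace `list_objects` call: narrow the listing to one object with `minObjectID`/`maxObjectID`, then compare the results with and without `showHidden`. This is not the `check_hidden` function from bda89a0; the name `is_hidden` and the duck-typed `ws_client` are assumptions, and the parameter names are worth double-checking against the Workspace docs linked above.

```python
from typing import Any


def is_hidden(ws_client: Any, ws_id: int, obj_id: int) -> bool:
    """Return True if the object is listed only when showHidden is set.

    Hypothetical helper; ws_client is assumed to expose the KBase
    Workspace list_objects method.
    """
    # Restrict the listing to a single object in a single workspace.
    base = {"ids": [ws_id], "minObjectID": obj_id, "maxObjectID": obj_id}
    visible = ws_client.list_objects(dict(base, showHidden=0))
    with_hidden = ws_client.list_objects(dict(base, showHidden=1))
    if not with_hidden:
        raise ValueError(f"object {ws_id}/{obj_id} not found")
    # Present with showHidden, absent without it: the object is hidden.
    return not visible
```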

lib/GenomeFileUtil/core/GenbankToGenome.py

Comment on lines +165 to +170

+                          # check features
+                          self.gi.check_dna_sequence_in_features(genome_obj.genome_data)
+                          # validate genome
+                          genome_obj.genome_data['warnings'] = self.gi.validate_genome(genome_obj.genome_data)

Member

MrCreosote Sep 19, 2024

Are there tests for G2G that exercise these code paths?

Contributor Author

Xiangs18 Oct 31, 2024

Yeah, we have genbank_upload_full_test.py.

Member

MrCreosote Nov 4, 2024

And there are tests that cause errors to be thrown from the check / validate methods?

Contributor Author

Xiangs18 May 20, 2025

The check_dna_sequence_in_features function does not raise any errors.

validate_genome will raise an error if the genome size is too large, but I assume this is already covered in genome_size_tests.py by other devs.

https://github.com/kbaseapps/GenomeFileUtil/blob/master/test/utility/genome_size_tests.py

Member

MrCreosote May 20, 2025

Hold on, I'm confused - these two checks are being added to the import mass function, but genbank_upload_full_test isn't changing. How can it test that these functions are called correctly from the mass function?

Member

MrCreosote May 20, 2025

Part of the confusion is probably because it's been 6 months since I looked at this...

Contributor Author

Xiangs18 May 21, 2025

Do you remember why we added these two checks here? They were originally from the GenomeInterface.py file. It's been eight months, and I don't even know why they ended up in GenbankToGenome.py.

Member

MrCreosote May 21, 2025

I vaguely remember this discussion, but I think they're required checks for genomes and either they were missing from the mass import function or they were moved from somewhere else
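One way to check that the mass path calls these two methods correctly, without depending on genbank_upload_full_test, is to mock the GenomeInterface and assert on the calls. The loop below is a hypothetical, simplified stand-in for the G2G code (plain dicts instead of `genome_obj.genome_data`), and the function and test names are illustrative only.

```python
from typing import Any, Dict, List
from unittest.mock import MagicMock, call


def prepare_genomes_for_mass_save(gi: Any, genomes: List[Dict[str, Any]]) -> None:
    """Hypothetical stand-in for the G2G loop: check, then validate, each genome."""
    for genome_data in genomes:
        gi.check_dna_sequence_in_features(genome_data)
        genome_data["warnings"] = gi.validate_genome(genome_data)


def test_checks_run_for_every_genome() -> None:
    gi = MagicMock()
    gi.validate_genome.side_effect = lambda g: [f"warn-{g['id']}"]
    genomes = [{"id": "g1"}, {"id": "g2"}]
    prepare_genomes_for_mass_save(gi, genomes)
    # Each genome is checked exactly once, in order.
    assert gi.check_dna_sequence_in_features.call_args_list == [
        call(genomes[0]), call(genomes[1])]
    assert gi.validate_genome.call_count == 2
    # Warnings from validate_genome end up on the matching genome.
    assert [g["warnings"] for g in genomes] == [["warn-g1"], ["warn-g2"]]


def test_validation_error_propagates() -> None:
    gi = MagicMock()
    gi.validate_genome.side_effect = ValueError("genome too large")
    try:
        prepare_genomes_for_mass_save(gi, [{"id": "g1"}])
    except ValueError as e:
        assert "too large" in str(e)
    else:
        raise AssertionError("expected ValueError")
```

A mock-based test only shows that the calls are wired up; the behavior of `validate_genome` itself still needs its own tests, such as the genome size tests mentioned above.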

Xiangs18 added 2 commits

October 30, 2024 16:35


          fix documentation

861a50b


          add workspace_id in GenomeFileUtil.spec

01a1dc8

Xiangs18 requested a review from MrCreosote

November 1, 2024 18:16

Xiangs18 added 6 commits

May 19, 2025 15:33


          replace formatted string literals by docstring

654a5e9


          add check_hidden func and tests

bda89a0


          update check info, display prov and data

6bb1009


          move info, prov, and data checks to test_utils.py

2f546ac


          fix the failed test

0d192f7


          clean up genbank_upload_full_test.py file

a8bced6

Reviewers

jsfillman: awaiting requested review

jkbaumohl: awaiting requested review (code owner)

Tianhao-Gu: awaiting requested review (code owner)

MrCreosote: awaiting requested review

At least 1 approving review is required to merge this pull request.

Labels

None yet