Improving megastudy zero imputation #32

Merged
merged 8 commits from all-study-vocabs into main on Dec 21, 2023

Conversation

d-callan
Contributor

Some things I tried to do, either intentionally or because they came up along the way of doing other things:

  1. improve the test data so it better reflects real use cases. This meant only passing variables that we expect to see in real life, and passing all study vocabs all the time.
  2. improve our handling of variable collections somewhat, mostly because it was doing something slightly unexpected that kept confusing me during testing and QC.
  3. fix a bug where, when multiple study-specific-vocab variables are present, some combinations between those different vocabs were overlooked.
  4. make sure the function that does the imputation doesn't remove columns from the data table it was passed.
  5. make things a bit more readable, hopefully.

One thing that happened along the way: I am no longer testing variable collections. In hindsight the test wasn't very good anyway, so I'm not sure we're handling variable collections well here, but I figure we can add a better test and fix things as necessary if/when we get a real use case.
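For context on point 3, the fix amounts to expanding the data to the full cross product of all study-specific vocabularies before filling zeroes, rather than only crossing each vocab against combinations that happen to appear in the data. A minimal data.table sketch of the idea (the variable names and vocab values here are hypothetical, not the package's actual API):

```r
library(data.table)

# Hypothetical study vocabularies for two study-specific-vocab variables
speciesVocab <- c("aedes", "culex", "anopheles")
attractantVocab <- c("light", "co2")

# Observed data only covers some combinations
observed <- data.table(
  species = c("aedes", "culex"),
  attractant = c("light", "light"),
  specimen_count = c(5L, 2L)
)

# Build ALL combinations across both vocabs, not just those observed,
# then left-join the observed counts and fill the gaps with zeroes
allCombos <- CJ(species = speciesVocab, attractant = attractantVocab)
imputed <- merge(allCombos, observed, by = c("species", "attractant"), all.x = TRUE)
imputed[is.na(specimen_count), specimen_count := 0L]
```

The overlooked-combinations bug corresponds to crossing each vocab only against observed values of the other, which would miss rows like anopheles/co2 entirely instead of imputing them as zero.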

Member

@bobular bobular left a comment


Tests look comprehensive! I left a few comments where I wasn't 100% sure. Thanks!

@@ -293,9 +334,9 @@ test_that("imputeZeroes method is sane", {
expect_equal(nrow(imputedDT[imputedDT$sample.specimen_count == 0]), 0)
Member


Does imputedDT have a column sample.specimen_count if we didn't ask for it?
If the column doesn't exist, does line 334 return zero and pass the test anyway?

Oh, I see that we kind of half-asked for it! (empty plotReference)

Contributor Author


If the column doesn't exist, the test won't pass. In this case I know the column was there because it's in mComplete@data, and we are no longer removing columns based on their presence in the VariableMetadataList.

dataType = new("DataType", value = 'STRING'),
dataShape = new("DataShape", value = 'CATEGORICAL'),
weightingVariableSpec = VariableSpec(variableId='specimen_count',entityId='sample'),
hasStudyDependentVocabulary = TRUE)
Member


What's the test starting at line 383 doing? There's no comment and my spot-the-difference powers are weak!

Contributor Author


The one at 341 tests the case where we have a special vocab, a downstream entity, and NO specimen_count in the plot. The one at 383 tests the case where we have a special vocab, a downstream entity, and also specimen_count in the plot. We said in these cases we would NOT impute zeroes: in the first case (341) because the variable to impute zeroes on isn't present, and in the second case (383) because the values would be removed later by our requirement for complete cases in the map. If/when the map supports proper representations of missingness, we'll have to change this test and add support.
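To illustrate the complete-cases point above: any imputed zero row would carry NA for variables that were never collected in that study, so a downstream complete-cases filter would drop it again and the imputation would gain nothing. A small illustrative sketch (the column names here are made up for the example):

```r
# An imputed zero row typically has NA for a variable the study never collected
df <- data.frame(
  species = c("aedes", "culex"),
  specimen_count = c(5, 0),
  collection_date = c("2023-06-01", NA)  # missing for the imputed row
)

# A downstream complete-cases requirement removes the imputed row again
kept <- df[complete.cases(df), ]
```

Only the fully observed row survives the filter, which is why imputing zeroes in this scenario was skipped.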

@@ -368,7 +423,7 @@ test_that("imputeZeroes method is sane", {
expect_equal(nrow(imputedDT[imputedDT$sample.specimen_count == 0]), 0)

# multiple special vocabs in same plot, w one shared weighting var
Member


Isn't this the same as the test starting at line 155?
Both have species, count, sex and attractant.
Ah, not quite. In the earlier one, sex was "not plotted".

dataType = new("DataType", value = 'STRING'),
dataShape = new("DataShape", value = 'CATEGORICAL'),
weightingVariableSpec = VariableSpec(variableId='specimen_count',entityId='sample'),
hasStudyDependentVocabulary = TRUE)
Member


Is the error expected because sex is repeated, or because we asked for a mix of sample variables with and without hasStudyDependentVocabulary?

Presumably we would have used sample.eyecolor here if we had it.

Contributor Author


Not sure, tbh. I think maybe I had a stroke while writing this one, lol.

Contributor Author


OK, I removed the duplicate sample.sex reference in the VariableMetadataList. I now expect this to fail because sample.sex is present and claims to have a study-specific vocab, but no vocab was provided for it in the megastudy object.

@d-callan d-callan merged commit 0743223 into main Dec 21, 2023
5 checks passed
@d-callan d-callan deleted the all-study-vocabs branch December 21, 2023 14:33