-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving megastudy zero imputation #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests look comprehensive! I left a few comments where I wasn't 100% sure. Thanks!
@@ -293,9 +334,9 @@ test_that("imputeZeroes method is sane", { | |||
expect_equal(nrow(imputedDT[imputedDT$sample.specimen_count == 0]), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does imputedDT
have a column sample.specimen_count
if we didn't ask for it?
If the column doesn't exist, does line 334 return zero and pass the test anyway?
Oh, I see that we kind of half-asked for it! (empty plotReference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the column doesnt exist, the test wont pass. in this case i know the column was there bc its in mComplete@data, and we are no longer removing columns based on their presence in the VariableMetadataList.
dataType = new("DataType", value = 'STRING'), | ||
dataShape = new("DataShape", value = 'CATEGORICAL'), | ||
weightingVariableSpec = VariableSpec(variableId='specimen_count',entityId='sample'), | ||
hasStudyDependentVocabulary = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the test starting at line 383 doing? There's no comment and my spot-the-difference powers are weak!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the one at 341 tests the case where we have a special vocab, a downstream entity and NO specimen_count in the plot. the one at 383 tests the case where we have a special vocab, a downstream entity and also specimen_count in the plot. we said in these cases we would NOT impute zeroes. in the first case (341) bc the variable to impute zeroes on isnt present and in the second case (383) bc the values would be removed later by our requirement for complete cases in the map. if/when the map supports representations of missingness properly, well have to change this test and add support.
@@ -368,7 +423,7 @@ test_that("imputeZeroes method is sane", { | |||
expect_equal(nrow(imputedDT[imputedDT$sample.specimen_count == 0]), 0) | |||
|
|||
# multiple special vocabs in same plot, w one shared weighting var |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same as the test starting at line 155?
Both have species, count, sex and attractant.
Ah, not quite. In the earlier one, sex was "not plotted".
dataType = new("DataType", value = 'STRING'), | ||
dataShape = new("DataShape", value = 'CATEGORICAL'), | ||
weightingVariableSpec = VariableSpec(variableId='specimen_count',entityId='sample'), | ||
hasStudyDependentVocabulary = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the error expected because sex
is repeated or because we asked for a mix of sample variables with and without hasStudyDependentVocabulary
?
Presumably would we have used sample.eyecolor
here if we had it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure tbh. i think maybe i had a stroke while writing this one lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. i removed the duplicate sample.sex reference in the VariableMetadataList. I now expect this to fail bc smaple.sex is present, and claims to have a study specific vocab, but no vocab was provided for it in the megastudy obj
Some things I tried to do, either intentionally or bc they came up along the way of doing other things:
a thing that happened along the way... i am no longer testing variable collections. the test in hindsight wasnt very good anyway. so im not sure were handling variable collections well here, but i figure we can add a better test and fix things as necessary if/ when we get a real use case.