Not sure where to post this, so please move if appropriate.
I'm interested in this, but while the proposed approach will create synthetic data, it will not create synthetic patients: the associations between data elements will not be preserved. So we can expect pregnant males, married six-year-olds, and all the other possible data weirdness. Any thoughts on this? It'd be nice to create a real set of synthetic patients (larger and more appropriate for PCORnet CDM than i2b2's 133).
Also, it occurs to me that using counts does not tell you the value distribution for, e.g., lab values. You could sample from a distribution within the normal range for those, I suppose.
Thoughts?
Thanks,
Jeff Klann
Indeed, "ugly DECOY" may well exhibit pregnant males and such. The hope is that it's still useful as a framework for test-driven development. In fact, it could serve as test data for a tool that would point out pregnant males as an anomaly.
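To make the anomaly-tool idea concrete, here is a minimal sketch of a rule-based check run over synthetic records. The record layout, the field names (`sex`, `pregnant`, `age`, `marital_status`), and the age cutoff are assumptions for illustration only; they are not PCORnet CDM columns and not part of DECOY.

```python
# Hypothetical plausibility rules; names and thresholds are illustrative assumptions.
RULES = {
    "pregnant male": lambda p: p["sex"] == "M" and p["pregnant"],
    "married child": lambda p: p["age"] < 16 and p["marital_status"] == "married",
}

def find_anomalies(patients):
    """Return (patient id, rule name) for every rule a record violates."""
    return [
        (p["id"], name)
        for p in patients
        for name, rule in RULES.items()
        if rule(p)
    ]

# Made-up test records of the kind independent sampling could produce.
patients = [
    {"id": 1, "sex": "M", "pregnant": True, "age": 34, "marital_status": "single"},
    {"id": 2, "sex": "F", "pregnant": False, "age": 6, "marital_status": "married"},
    {"id": 3, "sex": "F", "pregnant": True, "age": 29, "marital_status": "married"},
]
anomalies = find_anomalies(patients)
```

Records 1 and 2 are flagged and record 3 passes, so data with broken associations would still give such a tool something to catch.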
p.s. This is the right place! In fact, you get a bonus point for being the first to raise an issue here. Unfortunately, you conflated two issues into one, so we'll have to take that point back ;-) That is: please raise a separate issue for the "Also..." bit.
On numeric distributions and starting from more than just aggregate counts, we've started some related work: computing basic stats on tumor registry data and synthesizing data based on those stats. (The code isn't public yet. IOU.)
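As a rough illustration of synthesizing numeric values from basic stats (not the unpublished code mentioned above), the sketch below draws values from a normal distribution given a mean and standard deviation, rejecting draws outside optional bounds. The normal shape, the function name `synthesize_values`, and the example numbers are all assumptions.

```python
import random

def synthesize_values(mean, sd, n, low=None, high=None, seed=0):
    """Draw n values from N(mean, sd), rejecting draws outside [low, high]."""
    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    values = []
    while len(values) < n:
        v = rng.gauss(mean, sd)
        if (low is None or v >= low) and (high is None or v <= high):
            values.append(v)
    return values

# Made-up summary stats for a hypothetical lab value.
samples = synthesize_values(mean=13.5, sd=1.2, n=1000, low=8.0, high=18.0)
```

Sampling each variable independently like this reproduces marginal distributions only; it does nothing to preserve associations between variables.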