-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve move_to_x #661
improve move_to_x #661
Conversation
for more information, see https://pre-commit.ci
A few points on this:
import ehrapy as ep
adata = ep.dt.diabetes_130(columns_obs_only=["race", "gender", "age"])
adata_prep = adata[
:,
(adata.var.index == "time_in_hospital_days")
| (adata.var.index == "num_lab_procedures")
| (adata.var.index == "num_procedures")
| (adata.var.index == "num_medications")
| (adata.var.index == "number_diagnoses"),
]
ep.pp.pca(adata_prep, n_comps=2)
adata_prep.varm["PCs"]
adata_prep_2 = ep.ad.move_to_x(adata_prep, "gender", copy_varm=True)
Considering that |
|
||
|
||
def test_move_to_x_copy_varm(adata_move_obs_mix): | ||
move_to_obs(adata_move_obs_mix, ["name"], copy_obs=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing copy_obs=True
here means that "name" is kept as variable; so the move_to_x
doesn't actually move it back; hence the failing example in my comment was a bit hidden :)
@xinyuejohn could you also outline for us again for what exactly you need this feature, please? I do totally see @eroell points |
Thanks for your reply and I totally got your points! Here's my use case:
In .obsm, there are some awkward arrays which still valid as long as the length of observations keep same. |
Thanks for the example - so to my understanding you add some (maybe externally computed) statistics to the Quick clarification, what is in the awkward arrays (I guess not a PCA embedding as in my comment above), which would be in the |
Thanks for the diagram, I think I get the point here - it is data in the .obsm in your case, not a computed result. It probably makes sense to not mix things computed from .X with raw data together in .obsm? We have yet to include time series support in a meaningful way - a data field alike your use of .obsm is one of the candidates for sure. Whatever this might look like, copying all the -data- as you suggest here should happen indeed. I think I would refrain from copying .obsm when doing What do you think @xinyuejohn ? |
I also think that it's weird if it happens automatically, but maybe we can just parameterize this and well document the use-case? |
That would be |
Think so ya? Or just |
So what do we do with this PR? |
I'd be in favor of not having these values copied now, but storing this data later more consistently. But I'm also fine having options for copying .obsm, .uns, .obsp (while setting the default to False) |
Okay, then let's make these options if it makes it easier for ehrdata. That can also depend on an evaluation of @eroell |
I'd close this PR for now, as I think the underlying outline of data handling patterns will not be the way to go
|
PR Checklist
docs
is updatedDescription of changes
Add three arguments to move_to_x:
copy_uns
,copy_obsm
, andcopy_varm
and they are all set to be True and optional.Technical details
Tests are already been added.
Additional context