documentation of survcheck() #277

Open
ThomasSoeiro opened this issue Aug 27, 2024 · 4 comments
@ThomasSoeiro

I have trouble understanding how to use survcheck(). I do not think there is any issue in the code but maybe the documentation could be improved (in particular regarding the formula).

I have a survival data set that contains data for 3 cohorts of patients. A patient can be included in several cohorts. In the end, I build a survival model for each cohort. I start the analysis with a crude comparison of the cohorts:

survdiff(Surv(time, status) ~ cohort, df)
km <- survfit(Surv(time, status) ~ cohort, df)
plot(km)

I wanted to check the data. My first try was to reuse the same formula as above, but the RHS of the formula seems to be ignored (see the "Overlap check"):

survcheck(Surv(time, status) ~ cohort, df, id = id)
# Call:
# survcheck(formula = Surv(time, status) ~ cohort, data = df, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                107                150                140 
# 
# Transitions table:
#       to
# from     1 (censored)
#   (s0) 140          8
#   1      0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1  2 3
#   1     8 62 33 4
#   (any) 8 62 33 4
# 
# Overlap check: 39 ids (43 rows)

Finally, I found that the following calls return identical outputs (apart from the call component):

survcheck(Surv(time, status) ~ cohort, df, id = id)
survcheck(Surv(time, status) ~ 1, df, id = id)
survcheck(Surv(time, status) ~ strata(cohort), df, id = id)
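
One way to check this observation programmatically is sketched below: drop the call component from two results and compare what remains. This is only an illustration. It assumes the result is a list with a call element (as the printed output suggests), and the seed is an arbitrary choice that is not part of the original report.

```r
library(survival)

# Rebuild the example data; set.seed(1) is an arbitrary illustrative choice
set.seed(1)
df <- veteran
df$id <- seq_len(nrow(df))
df <- replicate(3, df[sample(nrow(df), 50), ], simplify = FALSE)
df <- Map(transform, df, cohort = 1:3)
df <- do.call(rbind, df)

# Same response and id, different right-hand sides
a <- survcheck(Surv(time, status) ~ cohort, df, id = id)
b <- survcheck(Surv(time, status) ~ 1, df, id = id)

# Remove the stored call (assumed to be the only part that differs), then compare
a$call <- NULL
b$call <- NULL
same <- identical(a, b)
same
```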

It seems that I need to split the data before running survcheck():

by(
  df,
  ~ cohort,
  \(x) survcheck(Surv(time, status) ~ 1, x, id = id)
)
# cohort: 1
# Call:
# survcheck(formula = Surv(time, status) ~ 1, data = x, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                 50                 50                 48 
# 
# Transitions table:
#       to
# from    1 (censored)
#   (s0) 48          2
#   1     0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1
#   1     2 48
#   (any) 2 48
# 
# ------------------------------------------------------------------------------------------------------- 
# cohort: 2
# Call:
# survcheck(formula = Surv(time, status) ~ 1, data = x, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                 50                 50                 46 
# 
# Transitions table:
#       to
# from    1 (censored)
#   (s0) 46          4
#   1     0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1
#   1     4 46
#   (any) 4 46
# 
# ------------------------------------------------------------------------------------------------------- 
# cohort: 3
# Call:
# survcheck(formula = Surv(time, status) ~ 1, data = x, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                 50                 50                 46 
# 
# Transitions table:
#       to
# from    1 (censored)
#   (s0) 46          4
#   1     0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1
#   1     4 46
#   (any) 4 46
# 
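
The same per-cohort check can also be sketched with split() and lapply(), which keeps the results in a named list for later inspection. The block also counts how many ids occur in more than one cohort, since repeated ids are what distinguish the pooled call from the per-cohort calls. The names checks, cohorts_per_id and n_shared are illustrative, and the seed is an arbitrary choice, so the counts will not match the output shown above.

```r
library(survival)

# Rebuild the example data; set.seed(1) is an arbitrary illustrative choice
set.seed(1)
df <- veteran
df$id <- seq_len(nrow(df))
df <- replicate(3, df[sample(nrow(df), 50), ], simplify = FALSE)
df <- Map(transform, df, cohort = 1:3)
df <- do.call(rbind, df)

# One survcheck() result per cohort, stored in a list named by cohort
checks <- lapply(
  split(df, df$cohort),
  function(x) survcheck(Surv(time, status) ~ 1, data = x, id = id)
)

# Number of distinct cohorts each id appears in
cohorts_per_id <- tapply(df$cohort, df$id, function(z) length(unique(z)))

# Number of ids that appear in more than one cohort
n_shared <- sum(cohorts_per_id > 1)

checks[["1"]]  # print the check for cohort 1
n_shared
```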

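Some data to reproduce the examples:

library(survival)  # provides the veteran data set and survcheck()

df <- veteran
df$id <- seq_len(nrow(df))  # one id per patient
# draw 3 cohorts of 50 patients each; no seed is set, so counts vary between runs
df <- replicate(3, df[sample(nrow(df), 50), ], simplify = FALSE)
df <- Map(transform, df, cohort = 1:3)  # label each cohort
df <- do.call(rbind, df)  # stack the cohorts into one data frame
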
@therneau
Owner

survcheck is intended for multi-state survival.

@ThomasSoeiro
Author

Currently, this does not appear in the title or in the Description, only in Details. However, if I understand correctly, some checks are also useful for a "standard" survival dataset.

I understand that this is low priority. I opened the issue just to let you know. Feel free to close without further comment. Thanks!

@therneau
Owner

I do appreciate the comment. I work hard at making good documentation, but as someone who has worked on the package for a very long time, there are blind spots where something is "obvious" to me but not to the user. Input like yours is the best way for me to find out. But I do have too many projects to get to this right away.

@ThomasSoeiro
Author

Your hard work has already paid off; I think that the survival documentation is already excellent! (even for someone like me with no formal training in statistics)
