@JunaidMB,
May I suggest the following improvements:
tabnet is good at handling categorical predictors, so in your case I would turn the "_flag" variables into logicals and the character predictors into factors.
tabnet now supports missing values for almost all tasks, so I would not remove the samples with NAs.
There is no need to dummify categorical variables, as tabnet has a more powerful embedding for that.
I recommend increasing the batch_size here (the new default is 1024^2) to improve convergence and lower training time.
strand_pat <- NHSRdatasets::stranded_data %>%
  setNames(c("stranded_class", "age", "care_home_ref_flag", "medically_safe_flag",
             "hcop_flag", "needs_mental_health_support_flag", "previous_care_in_last_12_month", "admit_date", "frail_descrip")) %>%
  mutate(across(where(is.character), as.factor),
         admit_date = as.Date(admit_date, format = "%d/%m/%Y"),
         across(ends_with("flag"), as.logical))
...
## Define Recipe to be applied to the dataset
stranded_rec <-
  recipe(stranded_class ~ ., data = train_data) %>%
  # Make a day of week and month feature from admit date and remove raw admit date
  step_date(admit_date, features = c("dow", "month")) %>%
  step_rm(admit_date) %>%
  # Upsample minority (positive) class
  themis::step_upsample(stranded_class, over_ratio = as.numeric(upsample_ratio)) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())
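
If it helps to sanity-check the type conversions before fitting, here is a minimal base R sketch on a made-up data frame. The `toy` object and its values are hypothetical, invented for illustration, and are not taken from `stranded_data`; only the column names mirror the ones above.

```r
# Toy data frame; every value here is invented for illustration only.
toy <- data.frame(
  stranded_class = c("Stranded", "Not Stranded", "Stranded"),
  age = c(82, 45, NA),
  care_home_ref_flag = c(1, 0, 1),
  frail_descrip = c("Mild", NA, "Severe"),
  stringsAsFactors = FALSE
)

# Character columns become factors.
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], as.factor)

# Columns whose name ends in "_flag" become logicals.
flag_cols <- grepl("_flag$", names(toy))
toy[flag_cols] <- lapply(toy[flag_cols], as.logical)

# Rows containing NA are kept rather than dropped.
str(toy)
```

`str(toy)` should then list the character columns as factors and the flag column as logical, with all three rows still present.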
Thanks a lot @cregouby, I've implemented those improvements! Is there any documentation where we can see best practices for implementing Tabnet? Your presentation here was excellent at showing how to get set up with Tabnet and its basic commands, but is there somewhere we can find tips like the ones you shared above?
I think many people will share my initial temptation to apply exactly the same preprocessing for Tabnet that we might use for Random Forest or XGBoost, for example. If you could share anything helpful, I'll include it in the references section of the README.
Got it! I'll add a dedicated vignette to my to-do list. But first, please measure whether your metric actually improves, because no one can know without looking at the data...