The main function of this package is runCtree(), a wrapper around partykit::ctree() with the following additional features:
- partykit::ctree() only produces the best separation at each node, i.e. one tree. By setting recursive = T in runCtree(), all trees meeting the p-value cutoff are produced and can be examined to see which one makes the most sense according to domain knowledge. Each round of recursion removes the first splitting variable from the input data.frame and runs runCtree() again; the recursion stops when no splitting variable is found.
- The info and stats of each node of each tree are collected and summarized in an Excel file, which also contains URLs to each tree.
- Before running partykit::ctree(), low-informative columns and rows are removed to reduce computation and the multiple-testing adjustment applied to the association p-values.
- Cases leading to crashes of partykit::ctree() are handled, e.g. Inf and -Inf are converted to NA to avoid the following error: "'breaks' are not unique".
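The clean-up steps in the last two items can be illustrated with a minimal base-R sketch. It is not the code used inside runCtree(): the helper name clean_for_ctree, the max_na_frac threshold, and the definition of "low-informative" (mostly NA, or fewer than two distinct values) are all assumptions made for illustration.

```r
# Hypothetical helper, not part of citree: a rough approximation of the
# kind of clean-up described above.
clean_for_ctree <- function(df, max_na_frac = 0.5) {
  # Replace Inf / -Inf with NA in numeric columns.
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) col[is.infinite(col)] <- NA
    col
  })

  # Assumed rule: drop columns that are mostly NA or that have fewer than
  # two distinct non-NA values.
  keep_col <- vapply(df, function(col) {
    mean(is.na(col)) <= max_na_frac &&
      length(unique(col[!is.na(col)])) >= 2
  }, logical(1))
  df <- df[, keep_col, drop = FALSE]

  # Assumed rule: drop rows that are mostly NA.
  keep_row <- rowMeans(is.na(df)) <= max_na_frac
  df[keep_row, , drop = FALSE]
}

# Toy data with infinite values and a constant column.
toy <- data.frame(
  y        = c(1, 2, Inf, 4, 5),
  constant = rep(1, 5),
  x        = c(0.1, -Inf, 0.3, 0.4, 0.5)
)
cleaned <- clean_for_ctree(toy)
print(cleaned)
```

In this toy example the constant column is dropped and both infinite values become NA, while all five rows are kept because none exceeds the assumed NA threshold.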
Note:
- ctree uses coin::independence_test() to test the association of two variables of any data type. See here for the theory behind the test, here for an explanation of the algorithm, and here for a nice tutorial.
- See here for discussions on the pros and cons of ctree in comparison to other trees, e.g. rpart.
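To get a feel for the association test mentioned in the note, the sketch below calls coin::independence_test() directly on two pairs of mtcars columns, one numeric vs. numeric and one numeric vs. factor. It assumes the coin package is installed; the choice of columns and of teststat = "quadratic" is only for illustration and is not taken from runCtree().

```r
library(coin)

data("mtcars")
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # treat the cylinder count as a categorical variable

# Numeric response vs. numeric predictor.
test_num <- independence_test(mpg ~ wt, data = cars, teststat = "quadratic")

# Numeric response vs. categorical predictor: same function, different data type.
test_fac <- independence_test(mpg ~ cyl, data = cars, teststat = "quadratic")

# Extract the p-values of the two tests.
p_num <- as.numeric(pvalue(test_num))
p_fac <- as.numeric(pvalue(test_fac))
print(c(wt = p_num, cyl = p_fac))
```

The same call handles both variable types, which is what makes a single p-value-based criterion usable across all candidate splitting variables.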
Since this is just a toy, I have no plan to submit it to CRAN, so please install it from GitHub directly:
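# install from GitHub (requires the devtools package)
devtools::install_github("blueskypie/citree")
library(citree)
# run on the built-in mtcars data set and write the results to 'tmp'
data('mtcars')
re <- runCtree(mtcars, 'mtcars', oDir = 'tmp')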
Check the tmp directory for output.
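For readers curious about what the recursive mode described earlier amounts to, here is a minimal sketch built directly on partykit. It is not the implementation inside runCtree(): the helpers root_split_var and fit_all_trees, the explicit response argument, and the alpha default are assumptions for illustration. It uses only partykit's ctree(), ctree_control(), node_party(), split_node() and varid_split().

```r
library(partykit)

# Hypothetical helper, not part of citree: name of the variable used for the
# root split, or NULL if the tree has no split at all.
root_split_var <- function(tree) {
  sp <- split_node(node_party(tree))
  if (is.null(sp)) return(NULL)
  names(tree$data)[varid_split(sp)]
}

# Hypothetical helper, not part of citree: fit a tree, drop the root split
# variable from the data, refit, and stop once no significant split is found.
fit_all_trees <- function(df, response, alpha = 0.05) {
  trees <- list()
  while (ncol(df) >= 2) {
    fml  <- as.formula(paste(response, "~ ."))
    tree <- ctree(fml, data = df, control = ctree_control(alpha = alpha))
    v <- root_split_var(tree)
    if (is.null(v)) break   # no splitting variable: recursion ends
    trees[[v]] <- tree      # store the tree under its root split variable
    df[[v]] <- NULL         # remove that variable before the next round
  }
  trees
}

data("mtcars")
trees <- fit_all_trees(mtcars, "mpg")
print(names(trees))  # root split variable of each tree, in order of fitting
```

Each element of the returned list is an ordinary partykit tree, so it can be printed or plotted and compared against the others by hand.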