Releases: meantrix/corrp

0.3.0

26 May 19:49
7f7c2b6

  • Added C++ implementations of the Average Correlation Clustering Algorithm (ACCA) and the average silhouette width;
  • acca: New function to cluster correlations;
  • sil_acca: New function to compute the average silhouette width of ACCA clusters;
  • best_acca: New function to find the optimal number of ACCA clusters;
  • Checks ok.

0.2.0

03 May 22:17
fba5d93
  • Changed package name from corrP to corrp;
  • Changelog file created;
  • License file GPL3 created;
  • Added new correlation analysis types: pps, dcor, mic, uncoef;
  • corrp function output has a new class clist with index matrix and data values;
  • corr_fun: New function to calculate correlation type inferences for a pair of variables;
  • corr_matrix: New function to create a correlation matrix;
  • corr_rm: New function to remove highly correlated variables from a data.frame;
  • Added verbose param to the corrp and corr_fun functions;
  • Added testthat unit tests;
  • Checks ok;
  • Fixed some bugs in functions and documentation.

0.1.1

26 Apr 16:02
a2a76dd

Details

The data.frame may contain columns of four classes: integer, numeric, factor and character. A character column is treated as a categorical variable.

In this new package, the correlation is computed automatically according to the variable types.

Also, the statistical significance of every correlation value in the matrix is tested. If a test does not yield a p-value lower than the p.value param, the null hypothesis can't be rejected and, by default, the correlation for that variable pair is set to zero.
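
The zeroing rule described above can be sketched in base R for a single numeric pair. This is only an illustration under assumptions: it uses stats::cor.test with Pearson correlation and does not call the package, and the names p_value, res and r are made up for the example.

```r
# Base R only; this is not the package's implementation.
data(airquality)
p_value <- 0.05  # hypothetical threshold, mirroring the p.value param

# Pearson test on two numeric columns; cor.test drops incomplete pairs.
res <- cor.test(airquality$Ozone, airquality$Temp, method = "pearson")

# Keep the estimate only when the null hypothesis is rejected,
# otherwise report zero for this pair.
r <- if (res$p.value < p_value) unname(res$estimate) else 0
print(r)
```

Raising or lowering p_value changes which pairs keep their estimate and which are reported as zero.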

Example:

library(corrP)
# run correlation in parallel backend
air_cor = corrP(airquality, parallel = TRUE, n.cores = 4, p.value = 0.05)
corrplot::corrplot(air_cor)
corrgram::corrgram(air_cor)

Another function in the package, rh_corrP, removes highly correlated variables from a data.frame using the corrP correlation matrix.

air_cor = corrP(airquality)
airqualityH = rh_corrP(df = airquality, corrmat = air_cor, cutoff = 0.5)

setdiff(colnames(airquality), colnames(airqualityH))

[1] "Ozone" "Temp"
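
For intuition, cutoff-based removal can be approximated in base R. This is a rough sketch and not rh_corrP itself: it assumes a plain Pearson matrix from stats::cor and a greedy rule (drop the variable with the largest mean absolute correlation while any pair exceeds the cutoff), so its result may differ from the output shown above. The names cm, keep, worst and removed are hypothetical.

```r
# Base R sketch; NOT the package's rh_corrP implementation.
data(airquality)
cutoff <- 0.5

# Absolute Pearson correlations on pairwise-complete observations.
cm <- abs(cor(airquality, use = "pairwise.complete.obs"))
diag(cm) <- 0  # ignore self-correlation

keep <- colnames(cm)
# Greedy rule (an assumption): while any remaining pair exceeds the cutoff,
# drop the variable with the largest mean absolute correlation.
while (length(keep) > 1 && max(cm[keep, keep]) > cutoff) {
  sub <- cm[keep, keep]
  worst <- names(which.max(rowMeans(sub)))
  keep <- setdiff(keep, worst)
}

removed <- setdiff(colnames(airquality), keep)
print(removed)
```

After the loop, no pair of remaining columns has an absolute correlation above the cutoff.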

The corrP package is still very new, but it already provides some interesting features. In future versions we will add plot types that can be produced from the corrP correlation matrix.