machlearn

A Simple Yet Powerful Machine Learning Python Library

Install

pip install machlearn

Example 1: k-Nearest Neighbors

from machlearn import kNN
kNN.demo("iris")
kNN.demo_from_scratch("iris")

Selected Output:

This demo uses a public dataset of Fisher's Iris, which has a total of 150 samples from three species of Iris ('setosa', 'versicolor', 'virginica').
The goal is to use 'the length and the width of the sepals and petals, in centimeters', to predict which species of Iris the sample belongs to.

Using a grid search and a kNN classifier, the best hyperparameters were found as follows:
   Step1: scaler: StandardScaler(with_mean=True, with_std=True);
   Step2: classifier: kNN_classifier(n_neighbors=12, weights='uniform', p=2.00, metric='minkowski').

[Figures: the Iris dataset; the kNN confusion matrix]
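
For reference, this tuning maps onto a standard scikit-learn pipeline. A minimal sketch of the presumed setup (using sklearn's own iris loader rather than machlearn's datasets module; the grid below is illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Fisher's iris: 150 samples, 4 features, 3 species
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),             # Step 1: standardize features
                 ("classifier", KNeighborsClassifier())])  # Step 2: kNN (metric='minkowski', p=2 is Euclidean)
param_grid = {"classifier__n_neighbors": list(range(1, 31)),
              "classifier__weights": ["uniform", "distance"]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X_train, y_train)
print(search.best_params_)
print(f"testing-set accuracy: {search.score(X_test, y_test):.3f}")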


Example 2: Naive Bayes

from machlearn import naive_bayes as nb
nb.demo_from_scratch()
nb.demo(dataset="SMS_spam")

Selected Output from nb.demo(dataset="SMS_spam"):

This demo uses a public dataset of SMS spam, which has a total of 5574 messages = 4827 ham (legitimate) and 747 spam.
The goal is to use 'term frequency in message' to predict whether the message is ham (class=0) or spam (class=1).

Using a grid search and a multinomial naive Bayes classifier, the best hyperparameters were found as follows:
   Step1: Tokenizing text: CountVectorizer(analyzer = <_lemmas>, ngram_range = (1, 1));
   Step2: Transforming from occurrences to frequency: TfidfTransformer(use_idf = True).

The top 3 terms with the highest probability of a message being spam (the classification is either spam or ham):
   "claim": 81.28%
   "prize": 80.24%
   "won": 76.29%

Application example:
   - Message: "URGENT! We are trying to contact U. Todays draw shows that you have won a 2000 prize GUARANTEED. Call 090 5809 4507 from a landline. Claim 3030. Valid 12hrs only."
   - Probability of spam (class=1): 95.85%
   - Classification: spam

[Figures: an example SMS spam text; the naive Bayes confusion matrix]

[Figures: the naive Bayes ROC curve; the naive Bayes precision-recall curve]
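
The two-step pipeline above maps directly onto scikit-learn's text-classification tools. A minimal sketch, with two made-up messages standing in for the SMS spam corpus and the lemma-based analyzer omitted for brevity:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

messages = ["URGENT! You have won a prize, call now to claim it",
            "Are we still meeting for lunch today?"]
labels = [1, 0]  # 1 = spam, 0 = ham

pipe = Pipeline([("vectorizer", CountVectorizer(ngram_range=(1, 1))),  # Step 1: token counts
                 ("tfidf", TfidfTransformer(use_idf=True)),            # Step 2: counts -> tf-idf
                 ("classifier", MultinomialNB())])
pipe.fit(messages, labels)
print(pipe.predict_proba(["You have won a GUARANTEED prize, claim now"]))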

Example 3: Decision Boundary Comparison (Classification with Two Features)

from machlearn import kNN
kNN.demo("Social_Network_Ads")

from machlearn import naive_bayes as nb
nb.demo("Social_Network_Ads")

from machlearn import SVM
SVM.demo("Social_Network_Ads")

from machlearn import decision_tree as DT
DT.demo("Social_Network_Ads", classifier_func = "DT")

from machlearn import logistic_regression as logreg
logreg.demo("Social_Network_Ads")

from machlearn import neural_network as NN
NN.demo("Social_Network_Ads")

from machlearn import ensemble
ensemble.demo("Social_Network_Ads")

[Figures: decision boundaries on the testing set for kNN, Gaussian naive Bayes, SVM, decision tree, logistic regression, MLP neural network, random forest, and gradient boosting classifiers]
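
All eight plots follow the same meshgrid recipe: predict the class at every point of a fine grid over the two features and color the regions. A minimal sketch for any fitted two-feature classifier (plot_decision_boundary is a hypothetical helper, not part of machlearn):

import matplotlib.pyplot as plt
import numpy as np

def plot_decision_boundary(clf, X, y, step=0.02):
    # clf: a fitted classifier with .predict(); X: (n_samples, 2) array; y: class labels
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)                   # decision regions
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")   # actual samples
    plt.show()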

Example 4: Imbalanced Data

from machlearn import imbalanced_data
imbalanced_data.demo()

Summary of output:

To mitigate the problems associated with class imbalance, the majority class (y=0) is downsampled to match the minority class (y=1).

These are insensitive to class imbalance:
- Area Under ROC curve
- Geometric mean
- Matthews correlation coefficient
- Recall, TPR
- Specificity, 1-FPR

These are sensitive to class imbalance:
- Area Under PR curve
- Accuracy
- F1 score
- Precision
[Figures: extremely imbalanced data (left) vs. the majority class downsampled to match the minority class (right), compared via bar charts, confusion matrices, ROC curves, and PR curves]
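
Downsampling the majority class can be done with scikit-learn's resample. A minimal sketch (downsample_majority is a hypothetical helper, not part of machlearn):

import numpy as np
from sklearn.utils import resample

def downsample_majority(X, y, majority=0, minority=1, seed=0):
    # draw, without replacement, a majority-class subset equal in size to the minority class
    X_maj, X_min = X[y == majority], X[y == minority]
    X_maj_down = resample(X_maj, replace=False, n_samples=len(X_min), random_state=seed)
    X_bal = np.vstack([X_maj_down, X_min])
    y_bal = np.array([majority] * len(X_min) + [minority] * len(X_min))
    return X_bal, y_bal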

Example 5: Regularization

from machlearn import linear_regression as linreg
linreg.demo_regularization()

Summary of output:

Issues: (a) high multicollinearity and (b) too many features; these lead to overfitting and poor generalization.
- After L2 regularization (ridge regression), variance among the coefficient estimates is reduced [more robust/stable estimates], and R-squared is higher and RMSE lower on the testing set [better generalization].
- After L1 regularization (lasso regression), coefficient estimates of relatively trivial features shrink to exactly 0 [a simpler model], and R-squared is higher and RMSE lower on the testing set [better generalization].
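
Both effects can be seen outside the demo on synthetic data with many uninformative features. A minimal sketch (the alpha values are illustrative, not the demo's settings):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# many features, few informative: a recipe for overfitting with plain OLS
X, y = make_regression(n_samples=100, n_features=80, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge (L2)", Ridge(alpha=1.0)),
                    ("Lasso (L1)", Lasso(alpha=1.0))]:
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: test R-squared = {model.score(X_test, y_test):.3f}, "
          f"test RMSE = {rmse:.3f}, zero coefficients = {np.sum(model.coef_ == 0)}")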

Example 6: Gradient Descent

from machlearn import gradient_descent as GD
GD.demo("Gender")

Summary of output:

This example uses a batch gradient descent (BGD) procedure, the cost function of logistic regression, 30,000 iterations, a learning rate of 0.00025, and Male (1, 0) as the target.
- Theta estimates of [const, Height (inch), Weight (lbs)]: [-0.00977953, -0.4779923, 0.19667817]
- Compared to estimates from statsmodels ([0.69254314, -0.49262002, 0.19834042]), the estimates associated with Height and Weight are very close
- Accuracy of prediction:  0.919

[Figures: descriptive statistics (a pair plot of the Gender dataset); batch gradient descent training loss vs. epoch; training cost as a function of theta]
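
A minimal NumPy sketch of the BGD procedure described above, using the demo's learning rate and iteration count (the function name and interface are assumptions, not machlearn's API):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_BGD(X, y, learning_rate=0.00025, n_iterations=30000):
    # X: (n_samples, n_features) with a leading column of 1s for the constant term
    # y: (n_samples,) binary targets, e.g., Male = 1, Female = 0
    theta = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        # full-batch gradient of the logistic (cross-entropy) cost
        gradient = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= learning_rate * gradient
    return theta  # [const, Height, Weight] in the demo's setup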


Example 7: Decision Tree

from machlearn import decision_tree as DT
DT.demo()
DT.demo_from_scratch(question_type="regression") # dataset='boston'
DT.demo_from_scratch(question_type="classification") # dataset='Social_Network_Ads', X=not scaled, criterion=entropy, max_depth=2

Summary of output:

- DT.demo_from_scratch(question_type="regression") uses decision_tree_regressor_from_scratch()
- DT.demo_from_scratch(question_type="classification") provides results essentially identical to the tree graph below.

[Figure: decision tree for the Social_Network_Ads dataset (X not scaled, criterion=entropy, max_depth=2)]
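
With criterion=entropy, the from-scratch classifier picks, at each node, the feature and threshold that maximize information gain. A minimal sketch of those two quantities (helper names are illustrative):

import numpy as np

def entropy(y):
    # Shannon entropy of the class labels, in bits
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, y_left, y_right):
    # reduction in entropy achieved by splitting y into two child nodes
    w = len(y_left) / len(y)
    return entropy(y) - (w * entropy(y_left) + (1 - w) * entropy(y_right))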


Example 8: Ensemble Methods

from machlearn import ensemble
ensemble.demo()
ensemble.demo("Social_Network_Ads")
ensemble.demo("boston")

Summary of output:

- These demos call the following functions, which were developed from scratch to reflect the inner workings of these methods:
* random_forest_classifier_from_scratch();
* adaptive_boosting_classifier_from_scratch();
* gradient_boosting_regressor_from_scratch() (see training history plot below): R_squared = 0.753, RMSE = 4.419

[Figure: gradient boosting training loss history on the boston dataset]
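
For the gradient boosting regressor, the core loop fits each new tree to the current residuals. A minimal sketch under squared-error loss (the function names, and the use of sklearn's DecisionTreeRegressor as the base learner, are assumptions):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boosting_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # under squared-error loss, the negative gradient is just the residual y - prediction
    f0 = y.mean()                      # initial constant prediction
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction += learning_rate * tree.predict(X)   # small step toward the residuals
        trees.append(tree)
    return f0, trees

def gradient_boosting_predict(X, f0, trees, learning_rate=0.1):
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)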


Example 9: Assumption Testing

from machlearn import linear_regression as linreg
linreg.demo_assumption_test()

Summary of output:

The assumptions of linear regression include (1) a linear relationship between X and y, (2) residuals that are independently and identically distributed as normal (I.I.D.), and (3) little or no multicollinearity when there are multiple independent variables (IVs).

Selected output:

[Figures: linearity test; homoscedasticity test]
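
The standard checks behind such plots are available in statsmodels and scipy; a minimal sketch on synthetic data (machlearn's assumption_test() may use different tests):

import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))           # constant plus two IVs
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=200)
model = sm.OLS(y, X).fit()

print(stats.shapiro(model.resid))                        # normality of residuals
print(het_breuschpagan(model.resid, model.model.exog))   # homoscedasticity
print(durbin_watson(model.resid))                        # independence (autocorrelation)
print([variance_inflation_factor(X, i) for i in (1, 2)]) # multicollinearity among IVs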

module: model_evaluation

function | description
plot_ROC_and_PR_curves() | plots both the ROC and the precision-recall curves, along with statistics
plot_ROC_curve() | plots the ROC (receiver operating characteristic) curve, along with statistics
plot_PR_curve() | plots the precision-recall curve, along with statistics
plot_confusion_matrix() | plots the confusion matrix, along with key statistics, and returns accuracy
demo_CV() | provides a demo of cross validation in this module
demo() | provides a demo of the major functions in this module

module: datasets

function | description
public_dataset() | returns a public dataset as specified (e.g., iris, SMS_spam, Social_Network_Ads)

module: kNN

function | description
kNN_classifier_from_scratch() | kNN classifier developed from scratch
demo_from_scratch() | provides a demo of selected functions in this module
demo() | provides a demo of selected functions in this module

module: naive_bayes

class/function | description
Multinomial_NB_classifier_from_scratch() | multinomial NB classifier developed from scratch
demo_from_scratch() | provides a demo of selected functions in this module
Gaussian_NB_classifier() | when X are continuous variables
Multinomial_NB_classifier() | when X are independent discrete variables with 3+ levels (e.g., term frequency in the document)
Bernoulli_NB_classifier() | when X are independent binary variables (e.g., whether a word occurs in a document or not)
demo() | provides a demo of selected functions in this module
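
The same rule of thumb (which NB variant for which feature type) carries over to the scikit-learn classes these wrappers presumably build on. A minimal sketch with made-up data:

import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 1, 0])
X_continuous = np.array([[5.1, 3.5], [6.2, 2.9], [4.8, 3.1]])  # e.g., measurements in cm
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 4]])         # e.g., term frequencies
X_binary = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])         # e.g., word presence/absence

GaussianNB().fit(X_continuous, y)    # continuous X
MultinomialNB().fit(X_counts, y)     # discrete counts with 3+ levels
BernoulliNB().fit(X_binary, y)       # binary X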

module: SVM

function | description
demo() | provides a demo of selected functions in this module

module: decision_tree

class/function | description
decision_tree_regressor_from_scratch() | decision tree regressor developed from scratch
decision_tree_classifier_from_scratch() | decision tree classifier developed from scratch
demo_from_scratch() | provides a demo of selected functions in this module
decision_tree_regressor() | decision tree regressor
decision_tree_classifier() | decision tree classifier
demo() | provides a demo of selected functions in this module

module: neural_network

function | description
multi_layer_perceptron_classifier() | multi-layer perceptron (MLP) classifier
rnn() | recurrent neural network
demo() | provides a demo of selected functions in this module

module: logistic_regression

function | description
logistic_regression_sklearn() | solutions using sklearn
logistic_regression_statsmodels() | solutions using statsmodels
demo() | provides a demo of selected functions in this module

module: linear_regression

function | description
assumption_test() | tests the assumptions of linear regression
lasso_regression() | lasso regression
ridge_regression() | ridge regression
linear_regression_normal_equation() | linear regression solved via the normal equation
linear_regression() | linear regression
demo() | provides a demo of selected functions in this module
demo_regularization() | provides a demo of regularization in this module
demo_assumption_test() | provides a demo of assumption testing in this module

module: DSA

function | description
demo() | provides a demo of selected functions in this module

module: stats

function | description
demo() | provides a demo of selected functions in this module

module: pipeline

class/function | description
demo() | provides a demo of selected functions in this module

module: imbalanced_data

function | description
demo() | provides a demo of selected functions in this module

module: decomposition

function | description
demo() | provides a demo of selected functions in this module

module: gradient_descent

class/function | description
logistic_regression_BGD_classifier() | logistic regression classifier trained with batch gradient descent
batch_gradient_descent() | batch gradient descent class
demo() | provides a demo of selected functions in this module

module: ensemble

class/function | description
gradient_boosting_regressor_from_scratch() | gradient boosting regressor developed from scratch
adaptive_boosting_classifier_from_scratch() | adaptive boosting classifier developed from scratch
random_forest_classifier_from_scratch() | random forest classifier developed from scratch
bagging_classifier_from_scratch() | bagging classifier developed from scratch
gradient_boosting_classifier() | gradient boosting classifier
adaptive_boosting_classifier() | adaptive boosting classifier
random_forest_classifier() | random forest classifier
bagging_classifier() | bagging classifier
voting_classifier() | voting classifier
demo() | provides a demo of selected functions in this module