This is an exploratory study of various machine learning models. Data analysis and preprocessing are largely omitted, since the study focuses on using various sklearn models with their default configurations.
Explores the performance of various regression models. After importing the regression.py module, you can call the function estimator(X, Y) to get the best regression model for the dataset (X, Y). The parameter X is a pandas DataFrame of input features, and Y is a pandas Series of the target variable.
Running the regression.py module as-is gives the following sample output, which uses the classic California Housing Prices dataset:
<class 'sklearn.linear_model._base.LinearRegression'>:
Training MSE : 0.4415847034707834
Validation MSE : 0.46836770999839755
<class 'sklearn.ensemble._bagging.BaggingRegressor'>:
Training MSE : 0.0495235228848702
Validation MSE : 0.2809632860237471
<class 'sklearn.ensemble._forest.RandomForestRegressor'>:
Training MSE : 0.03442765202858801
Validation MSE : 0.25679743116303017
<class 'sklearn.svm._classes.LinearSVR'>:
Training MSE : 2.126105308335331
Validation MSE : 2.0886867115616874
<class 'sklearn.neighbors._regression.KNeighborsRegressor'>:
Training MSE : 0.2680161881869106
Validation MSE : 0.42012969031690883
Best model: <class 'sklearn.ensemble._forest.RandomForestRegressor'>
Test MSE: 0.25317961771776054
In the current setting, this output is reproducible when the California Housing Prices dataset and the chosen random seed (10) are used.
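The comparison behind this output can be approximated as follows. This is a minimal sketch of the pattern estimator(X, Y) describes (fit several default-configuration regressors, compare validation MSE, keep the best), not the actual regression.py code: it uses a synthetic dataset from make_regression instead of California Housing, and only two of the five models shown above.

```python
# Sketch of the model-comparison pattern behind estimator(X, Y).
# Assumptions: synthetic data instead of California Housing, a subset of
# the models, and seed 10 as noted above. Not the actual regression.py code.
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def estimator_sketch(X, Y, seed=10):
    # Hold out a validation set; the split fraction here is an assumption.
    X_train, X_val, y_train, y_val = train_test_split(
        X, Y, test_size=0.2, random_state=seed
    )
    best_model, best_mse = None, float("inf")
    for model in (LinearRegression(), RandomForestRegressor(random_state=seed)):
        model.fit(X_train, y_train)
        val_mse = mean_squared_error(y_val, model.predict(X_val))
        print(f"{type(model)}: Validation MSE : {val_mse}")
        # Keep whichever model has the lowest validation MSE.
        if val_mse < best_mse:
            best_model, best_mse = model, val_mse
    return best_model

# Synthetic stand-in for the (X, Y) DataFrame/Series inputs described above.
X_arr, y_arr = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=10)
best = estimator_sketch(pd.DataFrame(X_arr), pd.Series(y_arr))
print("Best model:", type(best))
```

Fixing random_state in both the split and the stochastic models is what makes the reported numbers reproducible for a given seed.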
Explores the performance of various classification models. After importing the classification.py module, you can call the function estimator(X, Y) to get the best classification model for the dataset (X, Y). The parameter X is a pandas DataFrame of input features, and Y is a pandas Series of the target label. (Multi-class classification is assumed.)
Running the classification.py module as-is gives the following sample output, which uses the classic 20 Newsgroups dataset:
<class 'sklearn.linear_model._logistic.LogisticRegression'>:
Training F1 score : 0.892072528976155
Validation F1 score : 0.8261910103702513
<class 'sklearn.ensemble._bagging.BaggingClassifier'>:
Training F1 score : 0.9883881204315708
Validation F1 score : 0.7191521215859012
<class 'sklearn.ensemble._forest.RandomForestClassifier'>:
Training F1 score : 0.999933668957722
Validation F1 score : 0.8263376379540432
<class 'sklearn.svm._classes.LinearSVC'>:
Training F1 score : 0.9911449973287118
Validation F1 score : 0.9085398890414774
<class 'sklearn.neighbors._classification.KNeighborsClassifier'>:
Training F1 score : 0.7839324365743485
Validation F1 score : 0.627218602829993
Best model: <class 'sklearn.svm._classes.LinearSVC'>
Test F1-score: 0.9111325934890819
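The classification comparison follows the same pattern, with validation F1 score as the selection metric. The sketch below is a hedged approximation, not the actual classification.py code: it uses a synthetic dataset from make_classification instead of 20 Newsgroups, only two of the five models, and weighted F1 averaging (the module's actual averaging choice for the multi-class setting is not specified above).

```python
# Sketch of the classification comparison in estimator(X, Y): fit default
# classifiers, compare validation F1, keep the best. Assumptions: synthetic
# data, a subset of the models, weighted F1 averaging, and seed 10.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

def estimator_sketch(X, Y, seed=10):
    # Stratify so all three classes appear in the validation split.
    X_train, X_val, y_train, y_val = train_test_split(
        X, Y, test_size=0.2, random_state=seed, stratify=Y
    )
    best_model, best_f1 = None, -1.0
    for model in (LinearSVC(), KNeighborsClassifier()):
        model.fit(X_train, y_train)
        # Weighted average handles the multi-class labels.
        val_f1 = f1_score(y_val, model.predict(X_val), average="weighted")
        print(f"{type(model)}: Validation F1 score : {val_f1}")
        # Keep whichever model has the highest validation F1.
        if val_f1 > best_f1:
            best_model, best_f1 = model, val_f1
    return best_model

# Synthetic multi-class stand-in for the (X, Y) inputs described above.
X_arr, y_arr = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=10
)
best = estimator_sketch(pd.DataFrame(X_arr), pd.Series(y_arr))
print("Best model:", type(best))
```

Note the asymmetry with the regression case: MSE is minimized, while F1 score is maximized, so the comparison direction flips.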