ML-Models

An exploratory study of various machine learning models. Data analysis and preprocessing are often skipped, since this study focuses on using various sklearn models with their default configurations.

Regression

Explores the performance of various regression models.
After importing the regression.py module, you can call the function estimator(X, Y) to get the best regression model for the dataset (X, Y).
The parameter X is a pandas DataFrame of input features and Y is a pandas Series of the target variable.
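For example, a minimal usage sketch (assuming regression.py is importable from the working directory and that estimator(X, Y) returns the selected model):

```python
# Usage sketch: assumes regression.estimator(X, Y) accepts a pandas DataFrame of
# features and a pandas Series of targets, and returns the best-performing model.
from sklearn.datasets import fetch_california_housing

import regression

data = fetch_california_housing(as_frame=True)
X = data.data      # pandas DataFrame of input features
Y = data.target    # pandas Series of the target variable (median house value)

best_model = regression.estimator(X, Y)
print(type(best_model))
```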
Running the regression.py module as-is produces the following sample output, which uses the classic California Housing Prices dataset.

Sample Output

<class 'sklearn.linear_model._base.LinearRegression'>:
	Training MSE	: 0.4415847034707834
	Validation MSE	: 0.46836770999839755

<class 'sklearn.ensemble._bagging.BaggingRegressor'>:
	Training MSE	: 0.0495235228848702
	Validation MSE	: 0.2809632860237471 

<class 'sklearn.ensemble._forest.RandomForestRegressor'>:
	Training MSE	: 0.03442765202858801
	Validation MSE	: 0.25679743116303017 

<class 'sklearn.svm._classes.LinearSVR'>:
	Training MSE	: 2.126105308335331
	Validation MSE	: 2.0886867115616874  

<class 'sklearn.neighbors._regression.KNeighborsRegressor'>:
	Training MSE	: 0.2680161881869106
	Validation MSE	: 0.42012969031690883

Best model: <class 'sklearn.ensemble._forest.RandomForestRegressor'>
Test  MSE: 0.25317961771776054

In the current setting, this output is reproducible when the California Housing Prices dataset and the chosen random seed (10) are used.

Classification

Explores the performance of various classification models.
After importing the classification.py module, you can call the function estimator(X, Y) to get the best classification model for the dataset (X, Y).
The parameter X is a pandas DataFrame of input features and Y is a pandas Series of the target label.
(Multi-class classification is assumed.)
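For example, a minimal usage sketch (assuming classification.py is importable and that estimator(X, Y) returns the selected model; the small Iris dataset stands in here for any multi-class feature DataFrame and label Series):

```python
# Usage sketch: assumes classification.estimator(X, Y) accepts a pandas DataFrame
# of features and a pandas Series of class labels, and returns the best model.
# Iris is used only as a small multi-class stand-in; the sample output below was
# generated from the 20 Newsgroups dataset.
from sklearn.datasets import load_iris

import classification

data = load_iris(as_frame=True)
X = data.data      # pandas DataFrame of input features
Y = data.target    # pandas Series of class labels (3 classes)

best_model = classification.estimator(X, Y)
print(type(best_model))
```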
Running the classification.py module as-is produces the following sample output, which uses the classic 20 Newsgroups dataset.

Sample Output

<class 'sklearn.linear_model._logistic.LogisticRegression'>:
	Training F1 score	: 0.892072528976155
	Validation F1 score	: 0.8261910103702513

<class 'sklearn.ensemble._bagging.BaggingClassifier'>:
	Training F1 score	: 0.9883881204315708
	Validation F1 score	: 0.7191521215859012 

<class 'sklearn.ensemble._forest.RandomForestClassifier'>:
	Training F1 score	: 0.999933668957722
	Validation F1 score	: 0.8263376379540432 

<class 'sklearn.svm._classes.LinearSVC'>:
	Training F1 score	: 0.9911449973287118
	Validation F1 score	: 0.9085398890414774 

<class 'sklearn.neighbors._classification.KNeighborsClassifier'>:
	Training F1 score	: 0.7839324365743485
	Validation F1 score	: 0.627218602829993 

Best model: <class 'sklearn.svm._classes.LinearSVC'>
Test  F1-score: 0.9111325934890819
