Collection of Data Science Notebooks

Exploratory analysis and data curation, coupled with the use of different algorithms, on a number of datasets.

1) Housing

2) Power plant data: (Regression)

Power Plant Dataset :: The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. The features are the hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant. A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another.

NOTEBOOK::

The notebook uses EDA to visualize and analyze the data and to find the significant features. The algorithms applied to the data are Linear Regression, Multiple Regression and KNN, followed by a comparative analysis.
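
The comparison described above can be sketched without any libraries. This is a minimal illustration, not the notebook's code: the data is synthetic (the slope, intercept and noise level are made up), only one feature is used, and the helper names `fit_simple_linear`, `knn_predict` and `rmse` are hypothetical.

```python
import math
import random


def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Synthetic stand-in data (NOT the real CCPP measurements): the output falls
# linearly with temperature, plus Gaussian noise. All constants are assumptions.
rng = random.Random(0)
temps = [rng.uniform(2.0, 37.0) for _ in range(200)]
outputs = [500.0 - 2.0 * t + rng.gauss(0.0, 4.0) for t in temps]

# Simple hold-out split: first 150 points for training, last 50 for testing.
train_x, test_x = temps[:150], temps[150:]
train_y, test_y = outputs[:150], outputs[150:]

slope, intercept = fit_simple_linear(train_x, train_y)
linear_pred = [intercept + slope * x for x in test_x]
knn_pred = [knn_predict(train_x, train_y, x, k=5) for x in test_x]

linear_rmse = rmse(test_y, linear_pred)
knn_rmse = rmse(test_y, knn_pred)
print(f"linear RMSE: {linear_rmse:.2f}, KNN RMSE: {knn_rmse:.2f}")
```

Both models are scored on the same held-out points, so the two RMSE values are directly comparable. On data that really is linear, the linear fit would be expected to do at least as well as KNN.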

3) Random Forest and Trees (Classification)

APS Failure Dataset :: The dataset's positive class consists of component failures for a specific component of the APS (Air Pressure System). The negative class consists of trucks with failures for components not related to the APS.

NOTEBOOK::

The notebook covers data analysis and data preparation: exploring different methods for imputing missing data, checking measures of central tendency and dispersion, and checking for class imbalance and outliers. Random Forest and XGBoost were tried for classification, with SMOTE used to resample the data and tackle the class imbalance.
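
Two of the preparation steps named above, imputation and SMOTE-style resampling, can be sketched in plain Python. This is a simplified illustration under stated assumptions: the rows are toy values rather than APS data, median imputation is just one of several possible methods, and `impute_median` and `smote_like` are hypothetical helpers, not the imbalanced-learn API.

```python
import random
import statistics
from collections import Counter


def impute_median(rows):
    """Replace None in each column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = [
        statistics.median(r[j] for r in rows if r[j] is not None)
        for j in range(n_cols)
    ]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy rows (hypothetical values); None marks a missing reading.
X = [
    [1.0, 10.0], [2.0, None], [1.5, 12.0], [None, 11.0], [2.5, 9.0], [1.8, 10.5],
    [8.0, 30.0], [9.0, None], [8.5, 31.0],
]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_imp = impute_median(X)
counts = Counter(y)  # class counts reveal the imbalance (6 vs 3 here)

# Oversample the minority class until both classes have the same size.
minority = [row for row, label in zip(X_imp, y) if label == 1]
synth = smote_like(minority, counts[0] - counts[1])
X_bal = X_imp + synth
y_bal = y + [1] * len(synth)
print(Counter(y_bal))
```

In practice, resampling is applied to the training split only, so that synthetic points do not leak into the evaluation data.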

4) Urinary Tract Infection Diagnosis and Crime Rate in Communities datasets (Using Decision Trees and Regularisation)::

Acute Inflammations Dataset :: The data set was created to develop the algorithm of an expert system that performs the presumptive diagnosis of two diseases of the urinary system: acute inflammation of the urinary bladder and acute nephritis.

Communities and Crime dataset:: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.

NOTEBOOK Part 1 (For Urinary tract infection diagnosis) ::

The notebook consists of EDA, decision trees that split the features based on Gini values, and cost complexity pruning to find a decision tree that is highly interpretable.
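
As a rough illustration of splitting on Gini values, the sketch below computes the Gini impurity of a label set and searches one numeric feature for the threshold with the lowest weighted impurity. The data is a hypothetical toy example, the helper names are made up for this sketch, and cost complexity pruning is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: a temperature reading and a yes/no diagnosis.
temperature = [36.0, 36.5, 37.0, 37.5, 38.0, 39.0, 40.0, 41.0]
diagnosis = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

root_impurity = gini(diagnosis)
threshold, impurity = best_threshold(temperature, diagnosis)
print(root_impurity, threshold, impurity)
```

A tree builder would repeat this search over every feature at every node and keep the split with the lowest weighted impurity.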

NOTEBOOK Part 2 (For Crime and communities regression) ::

Performed EDA and a comparative analysis of Linear Regression, Ridge Regression (L2), Lasso Regression (L1), Principal Component Regression and Boosting on the data.
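
The difference between the L2 (ridge) and L1 (lasso) penalties is easiest to see with a single centred feature, where both estimators have a closed form. The sketch below assumes the objectives (1/2)·Σ(y − βx)² + (λ/2)·β² for ridge and (1/2)·Σ(y − βx)² + λ·|β| for lasso. The data and function names are hypothetical.

```python
def centred(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(z, lam):
    """Shrink z towards zero by lam, and set it to exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_coef(x, y, lam):
    """One-feature ridge: the L2 penalty scales the coefficient down smoothly."""
    x, y = centred(x), centred(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """One-feature lasso: the L1 penalty can set the coefficient to exactly zero."""
    x, y = centred(x), centred(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return soft_threshold(sxy, lam) / sxx


# Hypothetical data: one target strongly related to x, one almost unrelated.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_strong = [2.1, 3.9, 6.2, 7.8, 10.0]
y_weak = [5.0, 5.2, 4.9, 5.1, 5.0]

lam = 5.0
ridge_strong = ridge_coef(x, y_strong, lam)
lasso_strong = lasso_coef(x, y_strong, lam)
ridge_weak = ridge_coef(x, y_weak, lam)
lasso_weak = lasso_coef(x, y_weak, lam)
print(ridge_strong, lasso_strong, ridge_weak, lasso_weak)
```

With the weak target, the lasso coefficient is exactly zero while the ridge coefficient is small but non-zero, which is why lasso also acts as a feature selector.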


AlrikF/Data-science-statistical-modelling-projects
