- SST (total) = SSR (explained) + SSE (unexplained)
- OLS (ordinary least squares) model: minimizes SSE [lowest error]
- R-squared = SSR/SST (measures how much of the total variability of the dataset is explained by the regression, ranging from 0 to 1)
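A minimal numpy sketch of the decomposition above (the toy data and variable names are illustrative): fit a line by OLS, then check that SST = SSR + SSE and R-squared = SSR/SST.

```python
import numpy as np

# Hypothetical toy data: y roughly linear in x (values are illustrative).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS fit: chooses slope/intercept that minimize SSE.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)       # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained by the regression
sse = np.sum((y - y_hat) ** 2)          # unexplained (residual)

r_squared = ssr / sst
```

For OLS with an intercept, `sst` equals `ssr + sse` up to floating-point error, so `r_squared` always lands between 0 and 1.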
- Adjusted R-squared (< R-squared): measures how well your model fits the data, but penalizes the use of variables that are meaningless for the regression
- If adding a variable increases R-squared but decreases adjusted R-squared ⇒ omit the variable
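The standard adjusted R-squared formula (with n observations and p independent variables) makes this rule concrete; the R-squared values below are made up purely for illustration.

```python
# R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)
# n = number of observations, p = number of independent variables
def adjusted_r_squared(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative numbers: a meaningless extra variable nudges R-squared up
# only slightly, so adjusted R-squared falls -> omit the variable.
r2_adj_before = adjusted_r_squared(0.850, n=30, p=2)  # 2 predictors
r2_adj_after = adjusted_r_squared(0.852, n=30, p=3)   # +1 junk predictor
```

Here `r2_adj_after` comes out lower than `r2_adj_before` even though plain R-squared rose, which is exactly the omit-the-variable signal.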
-> Logistic regression predicts the probability of an event occurring
- Maximum likelihood estimation (MLE): estimates how likely it is that the model at hand describes the real underlying relationship of the variables
- LL-null (log likelihood-null): the log-likelihood of a model which has no independent variables
- LLR (log likelihood ratio): measures whether our model is statistically different from LL-null
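A numpy-only sketch of the three ideas above (the toy data, learning rate, and gradient-ascent fit are illustrative, not a library API): fit a logistic regression by maximizing the log-likelihood, compute LL-null from an intercept-only model, and form the LLR statistic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(y, p):
    # Bernoulli log-likelihood of the observed binary labels
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data: one predictor, binary outcome.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

# MLE via plain gradient ascent on the log-likelihood.
b0, b1 = 0.0, 0.0
for _ in range(10000):
    p = sigmoid(b0 + b1 * x)
    b0 += 0.05 * np.sum(y - p)
    b1 += 0.05 * np.sum((y - p) * x)

ll_model = log_likelihood(y, sigmoid(b0 + b1 * x))

# LL-null: no independent variables -> predict the mean of y for everyone.
ll_null = log_likelihood(y, np.full_like(x, y.mean()))

llr = 2 * (ll_model - ll_null)  # log-likelihood ratio statistic
```

Since the fitted model can only match or beat the intercept-only model, `llr` is non-negative; a large value suggests the model is statistically different from LL-null.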
- Cluster analysis: a multivariate statistical technique that groups observations on the basis of some of their features or the variables they are described by
→ maximize the similarity of observations within a cluster and maximize the dissimilarity between clusters
1. choose the number of clusters [the elbow method]
* minimize the distance between points in a cluster (low WCSS: within-cluster sum of squares)
2. specify the cluster seeds
3. assign each point to a centroid
4. adjust the centroids
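The four steps above can be sketched in plain numpy (the toy blobs, seed choice, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two well-separated 2-D blobs.
pts = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
                 rng.normal(5.0, 0.5, (20, 2))])

k = 2
# 2. specify the cluster seeds (here: one data point from each blob, for determinism)
centroids = pts[[0, 20]].copy()

for _ in range(10):
    # 3. assign each point to its nearest centroid
    dists = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # 4. adjust each centroid to the mean of its assigned points
    centroids = np.array([pts[labels == j].mean(axis=0) for j in range(k)])

# 1. the elbow method would repeat this for several k and plot WCSS against k
wcss = sum(((pts[labels == j] - centroids[j]) ** 2).sum() for j in range(k))
```

Steps 3 and 4 repeat until assignments stop changing; the elbow method picks the k where the WCSS curve stops dropping sharply.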
pros and cons of a dendrogram
- Pros
    - Hierarchical clustering shows all the possible linkages between clusters
    - No need to preset the number of clusters, unlike K-means
    - Many methods to perform hierarchical clustering
- Cons
    - Scalability
    - Computationally expensive
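A short SciPy sketch of the "no preset number of clusters" point (the toy observations and the choice of the 'ward' linkage method are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 1-D observations in two obvious groups.
obs = np.array([[1.0], [1.2], [1.1], [8.0], [8.3], [8.1]])

# 'ward' is one of many linkage methods (single, complete, average, ...).
Z = linkage(obs, method='ward')

# Unlike K-means, the number of clusters is chosen *after* the tree is
# built, by cutting the dendrogram -- here at a height giving 2 clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
```

The same linkage matrix `Z` can be cut at any height, so every possible grouping is available from one run, which is exactly why the method scales poorly on large datasets.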
'Data - Model - Objective function - Optimization algorithm'
- The objective function: a measure of how well our model’s outputs match the targets.
two types:
- loss (supervised learning): the lower the loss function, the higher the level of accuracy of the model
- reward (reinforcement learning): the higher the reward function, the higher the level of accuracy of the model
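For the loss case, a minimal numpy sketch tying all four pieces together (the toy model `y_hat = w * x`, the targets, and the learning rate are illustrative): the data and model produce outputs, the objective function is the MSE loss, and gradient descent is the optimization algorithm that drives the loss down.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])          # data (inputs)
targets = np.array([1.0, 2.0, 3.0])    # data (targets, illustrative)

def mse_loss(w):
    # objective function: mean squared error between outputs and targets
    return np.mean((w * x - targets) ** 2)

# optimization algorithm: plain gradient descent on the loss
w = 0.0                                 # model: y_hat = w * x
losses = []
for _ in range(20):
    grad = np.mean(2 * (w * x - targets) * x)  # d(loss)/dw
    w -= 0.1 * grad
    losses.append(mse_loss(w))
```

Each step moves `w` toward 1, so the recorded losses shrink toward 0: lower loss, better match between outputs and targets.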