To understand Logistic Regression, we first need to understand the concept of Maximum Likelihood. The following steps will walk us through it.
Step-1
Suppose we have a distribution of weights, e.g. mouse weights.
Step-2
We need to find the normal distribution curve that fits this data best; for that we need the optimum values of mu and sigma.
Step-3
- We first try out various values of the mean and keep the one that gives the maximum likelihood for the weights; while doing this, we fix sigma to some value and do not change it
- Once we have the optimum value of the mean (mu), we switch to sigma and vary it to finalize the curve's spread for maximum likelihood; during this, the mean stays fixed at the optimum value obtained in the previous step
- There is a catch here: we have n observations, so to calculate the overall likelihood we calculate the likelihood of each individual sample and multiply those values together
- Finally we get our optimum values of mean and sd, with which we can best fit a normal distribution to our data (a short code sketch follows this list)
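To make Step-3 concrete, here is a minimal Python sketch of this coordinate-wise search. The mouse weights, the fixed sigma, and the search ranges are all made-up numbers for illustration:

```python
import numpy as np

# Hypothetical mouse weights (grams) -- made-up numbers for illustration
weights = np.array([24.1, 25.3, 26.0, 27.2, 28.5, 29.1, 30.4])

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def likelihood(data, mu, sigma):
    # Likelihood of the whole sample = product of the individual likelihoods
    return np.prod(normal_pdf(data, mu, sigma))

# Step 3a: fix sigma, vary mu, keep the mu with the highest likelihood
sigma_fixed = 2.0
mus = np.linspace(20, 35, 301)
best_mu = mus[np.argmax([likelihood(weights, m, sigma_fixed) for m in mus])]

# Step 3b: fix mu at its optimum, vary sigma to settle the curve's spread
sigmas = np.linspace(0.5, 6.0, 301)
best_sigma = sigmas[np.argmax([likelihood(weights, best_mu, s) for s in sigmas])]

print(best_mu, best_sigma)  # lands near the sample mean and standard deviation
```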
Now we move to logistic regression, starting from our base data. The data can have n variables; just imagine an N-dimensional graph with all our data points plotted in it.
Logistic regression uses the sigmoid function (the inverse of the logit) to calculate probabilities given a set of values for the variables: p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn)). Its graph is the familiar S-shaped curve that stays between 0 and 1.
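A tiny sketch of that formula, with hypothetical coefficients b0 and b1 for a single variable:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) probability range
    return 1 / (1 + np.exp(-z))

# Hypothetical coefficients: b0 (intercept) and b1 (slope)
b0, b1 = -4.0, 0.15
x = np.array([10.0, 25.0, 40.0])
print(sigmoid(b0 + b1 * x))  # a probability for each value of x
```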
To get the best-fitting curve, we need to follow a few steps:
- First, convert the y axis from probability to log odds using the formula: log odds = log(p / (1 - p))
- Once the probabilities are converted to log odds, the y axis ranges from -infinity (formerly 0) to +infinity (formerly 1)
- We now select a random candidate line to start with and project the data points onto it (see the sketch after this list)
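Here is a small sketch of the conversion and the projection step; the starting intercept and slope are arbitrary, as the text says, and the data points are made up:

```python
import numpy as np

def log_odds(p):
    # Maps a probability in (0, 1) onto the full (-inf, +inf) axis
    return np.log(p / (1 - p))

print(log_odds(np.array([0.1, 0.5, 0.9])))  # -> [-2.197, 0.0, 2.197]

# A random starting line on the log-odds axis (hypothetical coefficients)
intercept, slope = -10.0, 0.4
x = np.array([20.0, 24.0, 31.0, 35.0])   # e.g. mouse weights
projected = intercept + slope * x        # each point's log odds on the line
p = 1 / (1 + np.exp(-projected))         # sigmoid maps back to probabilities
```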
We now calculate the likelihood of each point: for points labeled 1 it is the predicted probability p, and for points labeled 0 it is 1 - p. Earlier we multiplied the individual likelihoods to get the overall likelihood; here we work with the log likelihood instead, which is the sum of the logs of the individual likelihoods (the log of a product is the sum of the logs, and sums are easier to handle numerically).
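In code, with made-up labels and predicted probabilities, the two forms look like this and agree at the end:

```python
import numpy as np

y = np.array([0, 0, 1, 1])           # true labels
p = np.array([0.2, 0.4, 0.7, 0.9])   # predicted probabilities (made up)

# Per-point likelihood: p where the label is 1, (1 - p) where it is 0
point_likelihoods = np.where(y == 1, p, 1 - p)

likelihood = np.prod(point_likelihoods)              # product form, as before
log_likelihood = np.sum(np.log(point_likelihoods))   # sum-of-logs form
print(likelihood, log_likelihood, np.log(likelihood))  # last two match
```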
Now the algorithm rotates the line in such a way that the log likelihood increases. We repeat the steps above until we reach the best-fitting squiggle, i.e. the maximum log likelihood. In practice this "rotation" is done by an optimizer such as gradient ascent, as in the sketch below.
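Putting the whole loop together: a minimal end-to-end sketch where a plain gradient-ascent update stands in for rotating the line. The weight/label data, the coefficient names b0 and b1, the learning rate, and the iteration count are all assumptions for illustration:

```python
import numpy as np

# Hypothetical data: mouse weight (grams) -> obese (1) or not obese (0)
x = np.array([20.0, 22.0, 24.0, 28.0, 30.0, 33.0, 35.0])
y = np.array([0, 0, 0, 1, 1, 1, 1])
xc = x - x.mean()   # centering keeps the update steps well behaved

b0, b1, lr = 0.0, 0.0, 0.01   # intercept, slope, learning rate (all assumed)
for _ in range(5000):
    p = 1 / (1 + np.exp(-(b0 + b1 * xc)))   # current squiggle
    # Gradient of the log likelihood with respect to b0 and b1
    b0 += lr * np.sum(y - p)
    b1 += lr * np.sum((y - p) * xc)

log_likelihood = np.sum(np.log(np.where(y == 1, p, 1 - p)))
print(b0, b1, log_likelihood)  # log likelihood has climbed toward its maximum
```

Each pass nudges the intercept and slope in the direction that raises the log likelihood, which is exactly the "rotate, re-project, re-score" loop described above.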