Linear Regression - Machine Learning

Kapil Bhise
Published in Analytics Vidhya
3 min read, Mar 2, 2021

Get the complete code for the linear regression algorithm from scratch at ml-lab

Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous (real-valued) numeric variables such as sales, salary, age, product price, etc.

Linear regression models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes with the value of the independent variable.

When working with linear regression, our main goal is to find the best fit line, meaning the line for which the error between predicted values and actual values is minimized. The best fit line has the least error.

Different values of the weights, or line coefficients (a0, a1), give different regression lines, so we need to find the values of a0 and a1 that give the best fit line. To do this we use a cost function.

Cost function-

  • Different values of the weights or line coefficients (a0, a1) give different regression lines, and the cost function is used to estimate the coefficient values for the best fit line.
  • Minimizing the cost function optimizes the regression coefficients or weights. The cost function measures how well a linear regression model is performing.
  • We can use the cost function to measure the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the hypothesis function.
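
To make the bullets above concrete, here is a minimal Python sketch of a hypothesis function and a cost function. It assumes mean squared error (MSE) as the cost, which is a common choice but is not named in this article; the function names predict and mse_cost and the sample data are illustrative assumptions.

```python
def predict(a0, a1, x):
    """Hypothesis function: predicted y for a single input x."""
    return a0 + a1 * x


def mse_cost(a0, a1, xs, ys):
    """Mean squared error between predicted and actual values (assumed cost)."""
    n = len(xs)
    return sum((predict(a0, a1, x) - y) ** 2 for x, y in zip(xs, ys)) / n


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

good_cost = mse_cost(1.0, 2.0, xs, ys)  # coefficients of the true line
bad_cost = mse_cost(0.0, 1.0, xs, ys)   # a poorer choice of coefficients
print(good_cost, bad_cost)
```

The line with the lower cost fits the data better, which is how the cost function lets us compare candidate values of a0 and a1.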

Pseudo Code for algorithm :
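
One common way to search for the best a0 and a1 is batch gradient descent on the mean squared error. The sketch below assumes that approach; it is not necessarily the procedure used in the author's repository, and the function name fit_gradient_descent, the learning rate, the number of epochs, and the sample data are all illustrative assumptions.

```python
def fit_gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y = a0 + a1 * x by batch gradient descent on mean squared error."""
    a0, a1 = 0.0, 0.0  # start from an arbitrary line
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current coefficients.
        errors = [(a0 + a1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to a0 and a1.
        grad_a0 = (2.0 / n) * sum(errors)
        grad_a1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        # Move each coefficient a small step against its gradient.
        a0 -= learning_rate * grad_a0
        a1 -= learning_rate * grad_a1
    return a0, a1


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a0, a1 = fit_gradient_descent(xs, ys)
print(round(a0, 4), round(a1, 4))
```

The learning rate has to be small enough for the updates to converge; the value here was chosen for this tiny data set and would likely need tuning for other data.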

Discussion of algorithm:

In multiple linear regression, there are p explanatory variables, and the relationship between the dependent variable and the explanatory variables is represented by the following equation:
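
Extending the a0, a1 notation used earlier, the standard form of the multiple linear regression model can be written as follows. Here x_1 to x_p are the explanatory variables, a_0 is the intercept, a_1 to a_p are the coefficients, and the error term (written as epsilon) is an assumed symbol for the random error.

```latex
y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_p x_p + \varepsilon
```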

Goodness of fit describes how well the regression line fits the set of observations. The process of finding the best model out of various candidate models is called optimization. Goodness of fit can be measured with the method below:

1. R-squared method:

  • R-squared is a statistical measure of the goodness of fit.
  • It measures the strength of the relationship between the dependent and independent variables on a scale of 0–100%.
  • A high R-squared value indicates a small difference between the predicted values and actual values, and hence represents a good model.
  • It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
  • It can be calculated from the formula below:
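
The standard definition of the coefficient of determination is R-squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between actual and predicted values and SS_tot is the sum of squared differences between actual values and their mean. A minimal Python sketch of that calculation is shown here; the function name r_squared and the sample values are illustrative assumptions, not taken from the author's code.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    # Note: this is zero (and R-squared undefined) if all actual values are equal.
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration.
actual = [1.0, 3.0, 5.0, 7.0]
perfect = [1.0, 3.0, 5.0, 7.0]  # predictions that match exactly
rough = [2.0, 2.0, 6.0, 6.0]    # predictions that are each off by 1

r2_perfect = r_squared(actual, perfect)
r2_rough = r_squared(actual, rough)
print(r2_perfect, r2_rough)
```

A perfect fit gives 1.0 (100%), and the rougher predictions give about 0.8 (80%), so a higher value corresponds to predictions that sit closer to the actual values.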

You can see the complete implementation of the code on my GitHub account.

Thank You!
