why is it helpful to use a linear model for a set of data?

Accepted Solution

The sensible use of linear regression on a data set requires that four assumptions about that data set be true:

The relationship between the variables is linear.
The data is homoskedastic, meaning the variance in the residuals (the difference in the real and predicted values) is more or less constant.
The residuals are independent, meaning the residuals are distributed randomly and not influenced by the residuals in previous observations. If the residuals are not independent of each other, they’re considered to be autocorrelated.
The residuals are normally distributed. This assumption means the probability density function of the residual values is normally distributed at each x value. I leave this assumption for last because I don’t consider it to be a hard requirement for the use of linear regression, although if this isn’t true, some manipulations must be made to the model.