Ordinary Least Squares (OLS)

#optimization_algorithm #normal_equation

Ordinary Least Squares (OLS) is an analytical method used to compute the $\theta$ parameters of the hypothesis function.

According to the Gauss–Markov theorem, several assumptions must hold for OLS to yield the best linear unbiased estimators of the regression coefficients:

- linearity of the model in its parameters
- random sampling of the observations
- no perfect multicollinearity among the predictors
- exogeneity: the errors have zero mean conditional on the predictors
- homoscedasticity and no autocorrelation of the errors

Ignoring these assumptions can lead to incorrect results.

⚠️ This method can be used with feature scaling, but it does not require it.

Univariate OLS Linear Regression

Given the hypothesis function of a univariate linear regression:

$$h_\theta(x) = \theta_0 + \theta_1 x + \varepsilon$$

One can compute the unbiased estimators $\hat\theta_0$ and $\hat\theta_1$ of $\theta_0$ and $\theta_1$:

$$\hat\theta_0 = \bar{y} - \hat\theta_1 \bar{x} \qquad \hat\theta_1 = \frac{S_{x,y}}{S_x^2}$$

with:
- $S_{x,y}$ = the covariance of the training inputs $X$ and outputs $Y$
- $S_x^2$ = the variance of the training inputs $X$
- $\bar{x}$ = the mean of the training inputs $X$
- $\bar{y}$ = the mean of the training outputs $Y$

Then y can be estimated (predicted) with:

$$\hat{y} = \hat\theta_0 + \hat\theta_1 x$$

with:
- $x$ = the input for which one wants to predict the output $y$
- $\hat\theta_i$ = the computed estimator of $\theta_i$

And that's done!
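As a quick sketch, the closed-form estimators above can be computed directly with NumPy (the data here is made up for illustration):

```python
import numpy as np

# Hypothetical 1-D training data (any x/y arrays of equal length work).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Closed-form estimators: theta_1 = S_xy / S_x^2, theta_0 = y_bar - theta_1 * x_bar.
x_bar, y_bar = x.mean(), y.mean()
s_xy = ((x - x_bar) * (y - y_bar)).sum()   # covariance term S_xy
s_x2 = ((x - x_bar) ** 2).sum()            # variance term S_x^2
theta_1 = s_xy / s_x2
theta_0 = y_bar - theta_1 * x_bar

# Predict for a new input.
x_new = 6.0
y_hat = theta_0 + theta_1 * x_new
print(theta_0, theta_1, y_hat)
```

Note that the common normalizing factors of $S_{x,y}$ and $S_x^2$ cancel in the ratio, so the raw sums can be used directly.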


But various metrics can also be computed to evaluate the model.

The unexplained error of each training example:

$$e_i = y_i - \hat{y}_i$$

$e_i^2$ is called the residual error.

with:
- $y_i$ = the expected output of the $i^{th}$ sample
- $\hat{y}_i$ = the predicted output of the $i^{th}$ sample

The sum of squared residuals $\varepsilon$ (which is the OLS cost function):

$$\varepsilon = \sum_{i=1}^{m} e_i^2 = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{m}\left(y_i - (\hat\theta_0 + \hat\theta_1 x_i)\right)^2$$

with:
- $m$ = the number of samples
- $e_i^2$ = the residual for the error $e_i$ of the $i^{th}$ sample

The unbiased residual variance (also called unexplained variance or error variance):

$$\hat\sigma^2 = \frac{1}{m - 1 - 1} \sum_{i=1}^{m} e_i^2$$

with:
- $m$ = the number of samples
- $e_i^2$ = the residual for the error $e_i$ of the $i^{th}$ sample
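These metrics can be sketched in NumPy as follows (same made-up data as before; the denominator $m - 1 - 1$ accounts for the two estimated parameters):

```python
import numpy as np

# Hypothetical data and the fitted univariate estimators.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])
theta_1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
theta_0 = y.mean() - theta_1 * x.mean()

y_hat = theta_0 + theta_1 * x        # predictions on the training set
e = y - y_hat                        # residuals e_i
ssr = (e ** 2).sum()                 # sum of squared residuals (OLS cost)
m = len(x)
sigma2 = ssr / (m - 1 - 1)           # unbiased residual variance (m - 2 here)
print(ssr, sigma2)
```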

Vectorized Univariate OLS Linear Regression

There exist vectorized formulas for batch predictions:

$$h_\theta(x) = \theta_0 \begin{bmatrix}1 \\ \vdots \\ 1\end{bmatrix} + \theta_1 \begin{bmatrix}x_1 \\ \vdots \\ x_m\end{bmatrix} + \begin{bmatrix}e_1 \\ \vdots \\ e_m\end{bmatrix}$$

$$\varepsilon = \left\| Y - (\theta_0 \mathbf{1}_m + \theta_1 X) \right\|_{\ell_2}^2$$

with:
- $\| \cdot \|_{\ell_2}$ = the Euclidean norm
- $Y$ = the output values
- $X$ = the input values
- $\mathbf{1}_m$ = a vector filled with 1, of size $m$
- $m$ = the number of samples
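A minimal sketch of this vectorized cost, assuming some hypothetical parameters and data:

```python
import numpy as np

# Hypothetical 1-D inputs, outputs, and parameters.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.1, 4.9, 7.0])
theta_0, theta_1 = 1.0, 2.0

# Cost = squared Euclidean norm of Y - (theta_0 * 1_m + theta_1 * X).
residuals = Y - (theta_0 * np.ones_like(X) + theta_1 * X)
cost = np.linalg.norm(residuals) ** 2
print(cost)
```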

Although this approach is very simple and works very well with univariate linear regressions, it does not carry over to the multivariate version.

Multivariate OLS Linear Regression

Given the hypothesis function of a multivariate linear regression with $p$ predictors (or features):

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_p x_p + \varepsilon$$

One can compute the unbiased estimators $\hat\theta_0 \dots \hat\theta_p$ of $\theta_0 \dots \theta_p$ using the normal equation:

⚠️ $X^T X$ must be invertible. Otherwise, a pseudo-inverse matrix (e.g. the Moore–Penrose pseudo-inverse) can be used instead.

$$\hat\theta = (X^T X)^{-1} X^T Y$$

with:
- $X$ = a matrix of i.i.d. training examples arranged as rows
- $Y$ = a vector of all the expected output values

Then y for a single new input x can be estimated (predicted):

$$\hat{y}_i = \hat\theta_0 x_{i,0} + \hat\theta_1 x_{i,1} + \dots + \hat\theta_p x_{i,p}$$

with:
- $x_i$ = the input for which one wants to predict the output $y$
- $\hat\theta$ = a vector of all the estimators computed with the normal equation
- $p$ = the number of predictors / features

Or a batch of predictions can be made using the vectorized version:

$$\hat{Y} = X \hat\theta \qquad \begin{bmatrix}\hat{y}_1 \\ \vdots \\ \hat{y}_m\end{bmatrix} = \begin{bmatrix}x_{1,1} & \dots & x_{1,p} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \dots & x_{m,p}\end{bmatrix} \begin{bmatrix}\hat\theta_1 \\ \vdots \\ \hat\theta_p\end{bmatrix}$$

with:
- $X$ = a matrix of i.i.d. features arranged as rows
- $\hat\theta$ = a vector of all the estimators computed with the normal equation
- $p$ = the number of predictors / features
- $m$ = the number of samples

And that's done!
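The normal equation and the batch prediction above can be sketched in a few lines of NumPy (the design matrix here is made up, with a leading column of ones so that $\theta_0$ acts as the intercept):

```python
import numpy as np

# Hypothetical design matrix: first column is x_0 = 1 (intercept term).
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0]])
Y = np.array([5.0, 6.0, 11.0, 12.0])

# Normal equation: theta_hat = (X^T X)^(-1) X^T Y.
# Using np.linalg.pinv (Moore-Penrose pseudo-inverse) also covers the
# case where X^T X is not invertible.
theta_hat = np.linalg.pinv(X.T @ X) @ X.T @ Y

# Batch predictions for the whole matrix at once: Y_hat = X @ theta_hat.
Y_hat = X @ theta_hat
print(theta_hat, Y_hat)
```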


Similarly to the univariate linear regression, various metrics can be computed to evaluate the model (e.g. statistical tests).

The unexplained error of each training example:

$$e_i = y_i - \hat{y}_i \qquad e = Y - \hat{Y}$$

$e_i^2$ is called the residual error.

with:
- $y_i$ = the expected output of the $i^{th}$ sample
- $\hat{y}_i$ = the predicted output of the $i^{th}$ sample

The sum of squared residuals $\varepsilon$ (which is the OLS cost function):

$$\varepsilon = \sum_{i=1}^{m} e_i^2 = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{m}\left(y_i - (\hat\theta_0 x_{i,0} + \hat\theta_1 x_{i,1} + \dots + \hat\theta_p x_{i,p})\right)^2 = \sum_{i=1}^{m}\left(y_i - \sum_{j=0}^{p} \hat\theta_j x_{i,j}\right)^2$$

with:
- $m$ = the number of samples
- $p$ = the number of predictors / features
- $e_i^2$ = the residual for the error $e_i$ of the $i^{th}$ sample

The unbiased residual variance (also called unexplained variance or error variance):

$$\hat\sigma^2 = \frac{1}{m - p - 1} \sum_{i=1}^{m} e_i^2$$
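As a final sketch, the multivariate residual variance can be computed from a fitted model (again with made-up data; $p$ counts the predictors excluding the intercept column):

```python
import numpy as np

# Hypothetical data: p = 2 predictors plus the x_0 = 1 intercept column.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 6.0]])
Y = np.array([5.1, 5.9, 11.2, 11.8, 17.0])

theta_hat = np.linalg.pinv(X.T @ X) @ X.T @ Y   # normal equation
e = Y - X @ theta_hat                           # residuals
m, p = X.shape[0], X.shape[1] - 1               # p excludes the intercept column
sigma2 = (e ** 2).sum() / (m - p - 1)           # unbiased residual variance
print(sigma2)
```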