#optimization_algorithm #normal_equation
Ordinary Least Squares (OLS) is an analytical method used to compute the $\theta$ parameters of the hypothesis function.
According to the Gauss-Markov theorem, a set of assumptions must be met in order to guarantee the validity of OLS for estimating the coefficients of a regression.
Overlooking these assumptions can lead to incorrect results.
⚠️ This method can be used with feature scaling, but it does not require it.
Univariate OLS Linear Regression
Given the hypothesis function of a univariate linear regression:

$$h_\theta(x) = \theta_0 + \theta_1 x + \varepsilon$$

one can compute the unbiased estimators $\hat{\theta}_0$ and $\hat{\theta}_1$ of $\theta_0$ and $\theta_1$:
$$\hat{\theta}_1 = \frac{S_{x,y}}{S_x^2} \qquad \hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}$$

with:
- $S_{x,y}$ = the covariance of the training inputs $X$ and outputs $Y$
- $S_x^2$ = the variance of the training inputs $X$
- $\bar{x}$ = the mean of the training inputs $X$
- $\bar{y}$ = the mean of the training outputs $Y$
Then y can be estimated (predicted) with:
$$\hat{y} = \hat{\theta}_0 + \hat{\theta}_1 x$$

with:
- $x$ = the input for which one wants to predict the output $y$
- $\hat{\theta}_i$ = the computed estimator of $\theta_i$
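As an illustration, here is a minimal NumPy sketch of these two steps; the training data and variable names are hypothetical placeholders, not part of the original derivation:

```python
import numpy as np

# Hypothetical training data (placeholders)
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form estimators from the formulas above
s_xy = np.cov(x_train, y_train, ddof=1)[0, 1]   # covariance S_{x,y}
s_x2 = np.var(x_train, ddof=1)                  # variance S_x^2
theta_1_hat = s_xy / s_x2
theta_0_hat = y_train.mean() - theta_1_hat * x_train.mean()

# Prediction for a new input x
x_new = 6.0
y_hat = theta_0_hat + theta_1_hat * x_new
```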
And that's done!
But various metrics can also be computed to evaluate the model.
The unexplained error of each training example:
$$e_i = y_i - \hat{y}_i$$

where $e_i^2$ is called the residual error, with:
- $y_i$ = the expected output of the $i^{th}$ sample
- $\hat{y}_i$ = the predicted output of the $i^{th}$ sample
The sum of residuals $\varepsilon$ (which is the OLS cost function):

$$\varepsilon = \sum_{i=1}^{m} e_i^2 = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{m} \left(y_i - (\hat{\theta}_0 + \hat{\theta}_1 x_i)\right)^2$$

with:
- $m$ = the number of samples
- $e_i^2$ = the residual for the error $e_i$ of the $i^{th}$ sample
The unbiased residual variance (also called unexplained variance or error variance):

$$\hat{\sigma}^2 = \frac{1}{m - 1 - 1} \sum_{i=1}^{m} e_i^2$$

(the denominator is $m - p - 1$ with $p = 1$ predictor, i.e. $m - 2$)

with:
- $m$ = the number of samples
- $e_i^2$ = the residual for the error $e_i$ of the $i^{th}$ sample
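A possible sketch of these metrics, reusing the hypothetical x_train, y_train and estimators from the snippet above:

```python
# Residuals, OLS cost and unbiased residual variance (univariate case, p = 1)
y_hat_train = theta_0_hat + theta_1_hat * x_train   # predictions on the training set
e = y_train - y_hat_train                           # errors e_i
rss = np.sum(e ** 2)                                # sum of residuals (the OLS cost)
m = len(y_train)
sigma2_hat = rss / (m - 1 - 1)                      # unbiased residual variance (m - p - 1, with p = 1)
```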
Vectorized Univariate OLS Linear Regression
There exist vectorized formulas for batch predictions:
$$h_\theta(x) = \theta_0 \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} + \theta_1 \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} + \begin{bmatrix} e_1 \\ \vdots \\ e_m \end{bmatrix}$$

$$Y = \theta_0 \mathbf{1}_m + \theta_1 X + \varepsilon$$

$$\varepsilon = \left\| Y - (\theta_0 \mathbf{1}_m + \theta_1 X) \right\|_{\ell_2}^2$$

with:
- $\|.\|_{\ell_2}$ = the Euclidean norm
- $Y$ = the output values
- $X$ = the input values
- $\mathbf{1}_m$ = a vector filled with 1, of size $m$
- $m$ = the number of samples
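A rough vectorized counterpart in NumPy (same hypothetical variables as before) could look like:

```python
# Vectorized batch predictions: Y_hat = theta_0 * 1_m + theta_1 * X
ones_m = np.ones_like(x_train)
Y_hat = theta_0_hat * ones_m + theta_1_hat * x_train

# Cost as the squared Euclidean norm of the residual vector
cost = np.linalg.norm(y_train - Y_hat) ** 2
```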
Although this approach is very simple and works very well for univariate linear regression, it does not carry over to the multivariate version.
Multivariate OLS Linear Regression
Given the hypothesis function of a multivariate linear regression with $p$ predictors (or features):

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_p x_p + \varepsilon$$

(where $x_0 = 1$ by convention), one can compute the unbiased estimators $\hat{\theta}_0 \dots \hat{\theta}_p$ of $\theta_0 \dots \theta_p$ using the normal equation:
$$\hat{\theta} = (X^T X)^{-1} X^T Y$$

with:
- $X$ = a matrix of i.i.d. training examples arranged as rows
- $Y$ = a vector of all the expected output values
⚠️ $X^T X$ must be invertible. Otherwise, a pseudo-inverse (e.g. the Moore-Penrose pseudo-inverse) can be used instead.
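As a sketch, the normal equation (with a pseudo-inverse fallback) could be implemented in NumPy as below; the design matrix X and outputs Y are hypothetical examples, with a first column of ones for the intercept:

```python
import numpy as np

# Hypothetical design matrix: m samples as rows, x_0 = 1 column for the intercept
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

try:
    # Normal equation: theta_hat = (X^T X)^(-1) X^T Y
    theta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
except np.linalg.LinAlgError:
    # X^T X is singular: fall back to the Moore-Penrose pseudo-inverse
    theta_hat = np.linalg.pinv(X) @ Y
```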
Then y for a single new input x can be estimated (predicted) with:
$$\hat{y}_i = \hat{\theta}_0 x_{i0} + \hat{\theta}_1 x_{i1} + \dots + \hat{\theta}_p x_{ip}$$

with:
- $x_i$ = the input for which one wants to predict the output $y$
- $\hat{\theta}$ = a vector of all the estimators computed with the normal equation
- $p$ = the number of predictors / features
Or a batch of predictions can be made using the vectorized version:
$$\hat{Y} = X \hat{\theta}$$

$$\begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_m \end{bmatrix} = \begin{bmatrix} x_{10} & \dots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{m0} & \dots & x_{mp} \end{bmatrix} \begin{bmatrix} \hat{\theta}_0 \\ \vdots \\ \hat{\theta}_p \end{bmatrix}$$

with:
- $X$ = a matrix of i.i.d. input examples arranged as rows
- $\hat{\theta}$ = a vector of all the estimators computed with the normal equation
- $p$ = the number of predictors / features
- $m$ = the number of samples
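Reusing the hypothetical X and theta_hat from the previous sketch, batch and single predictions might look like:

```python
# Batch prediction: one predicted value per row of X
Y_hat = X @ theta_hat

# Single prediction for a new input (leading 1 for the intercept term)
x_new = np.array([1.0, 6.0])
y_hat_new = x_new @ theta_hat
```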
And that's done!
And similarly to univariate linear regression, various metrics can be computed to evaluate the model (e.g. statistical tests).
The unexplained error of each training example:
$$e_i = y_i - \hat{y}_i \qquad \text{or, in vector form,} \qquad e = Y - \hat{Y}$$

where $e_i^2$ is called the residual error, with:
- $y_i$ = the expected output of the $i^{th}$ sample
- $\hat{y}_i$ = the predicted output of the $i^{th}$ sample
The sum of residuals $\varepsilon$ (which is the OLS cost function):

$$\varepsilon = \sum_{i=1}^{m} e_i^2 = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{m} \left(y_i - (\hat{\theta}_0 x_{i0} + \hat{\theta}_1 x_{i1} + \dots + \hat{\theta}_p x_{ip})\right)^2 = \sum_{i=1}^{m} \left(y_i - \sum_{j=0}^{p} \hat{\theta}_j x_{ij}\right)^2$$

with:
- $m$ = the number of samples
- $p$ = the number of predictors / features
- $e_i^2$ = the residual for the error $e_i$ of the $i^{th}$ sample
The unbiased residual variance (also called unexplained variance or error variance):

$$\hat{\sigma}^2 = \frac{1}{m - p - 1} \sum_{i=1}^{m} e_i^2$$
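And a possible sketch of these last metrics, still with the hypothetical X, Y and Y_hat from above:

```python
# Residuals, OLS cost and unbiased residual variance (multivariate case)
e = Y - Y_hat                        # e = Y - Y_hat
rss = np.sum(e ** 2)                 # sum of residuals (the OLS cost)
m, p = X.shape[0], X.shape[1] - 1    # p predictors, excluding the intercept column
sigma2_hat = rss / (m - p - 1)       # unbiased residual variance
```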