#loss_function #cost_function


$$J(\theta) = \sqrt{MSE} = \sqrt{E\left[(Y - \hat{Y})^2\right]}$$

with MSE = the Mean Squared Error
     $Y$ = the ground truth output values for the training examples
     $\hat{Y}$ = the predicted output values for the training examples
     $E[z]$ = the mean estimator: $\bar{X} = \frac{1}{m}\sum_{i=1}^{m} X_i$


Unlike the Mean Squared Error (MSE), the expression of the RMSE doesn't need to be divided by 2: differentiating the square root already introduces a factor of 1/2, which cancels the 2 produced by differentiating the squared error.
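
The remark about the factor of 2 can be checked by differentiating the RMSE directly. The following is a sketch, assuming $J(\theta) > 0$ and a differentiable hypothesis $h_\theta$; the parameter index $j$ is notation introduced here for illustration:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{2\,J(\theta)} \cdot \frac{1}{m}\sum_{i=1}^{m} 2\,\bigl(y_i - h_\theta(x_i)\bigr)\left(-\frac{\partial h_\theta(x_i)}{\partial \theta_j}\right) = -\frac{1}{m\,J(\theta)}\sum_{i=1}^{m}\bigl(y_i - h_\theta(x_i)\bigr)\frac{\partial h_\theta(x_i)}{\partial \theta_j}$$

The $\frac{1}{2}$ from the square root and the $2$ from the squared error cancel, so no extra constant is needed in the cost.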

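$$J(\theta) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - h_\theta(x_i)\right)^2}$$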

with $m$ = the number of training examples
     $x_i$ = the input (feature) of the $i$-th training example
     $y_i$ = the ground truth output of the $i$-th training example
     $h_\theta(x_i)$ or $\hat{y}_i$ = the predicted output of the $i$-th training example
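
As a minimal sketch, the summation form of the RMSE can be computed in plain Python. The function name `rmse` and the sample values are hypothetical and chosen only for illustration; the predictions are passed in directly, so no particular form of $h_\theta$ is assumed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over m paired examples (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    m = len(y_true)
    if m == 0:
        raise ValueError("need at least one training example")
    # Sum the squared differences (y_i - y_hat_i)^2, average over m, take the root.
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / m)


# Illustrative values only: the squared errors are 1, 0, 1 and 4, so the mean is 1.5.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
print(rmse(y_true, y_pred))  # sqrt(1.5), roughly 1.2247
```

Because of the square root, the result is in the same units as the outputs $y_i$.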


$$J(\theta) = \sqrt{\frac{1}{m}\,(X\theta - y)^T (X\theta - y)}$$

with $X$ = a matrix of the training examples, arranged as rows of $X$
     $y$ = a vector of all the expected output values
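
The matrix form maps directly onto NumPy. This is a sketch under stated assumptions: the hypothesis is linear ($X\theta$, as the matrix expression implies), the function name `rmse_vectorized` is hypothetical, and the leading column of ones in the sample `X` is an assumed intercept term that the note itself does not mention.

```python
import numpy as np


def rmse_vectorized(X, theta, y):
    """RMSE of the linear prediction X @ theta against targets y (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[0]                    # number of training examples (rows of X)
    residual = X @ theta - y          # vector of prediction errors, shape (m,)
    # For 1-D arrays, residual @ residual is the inner product (X theta - y)^T (X theta - y).
    return float(np.sqrt(residual @ residual / m))


# Illustrative data: predictions are 2, 4, 6, so the residuals are 0, 0, -1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 7.0])
theta = np.array([0.0, 2.0])
print(rmse_vectorized(X, theta, y))  # sqrt(1/3), roughly 0.5774
```

The inner product avoids an explicit loop over the examples and gives the same value as averaging the squared residuals one by one.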