The MSE expression is often divided by 2 so that the factor of 2 produced by differentiating the squared term cancels out, which makes the derivative used in gradient descent simpler to write and compute.
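As a short worked illustration of that cancellation, here is the derivative written out. The notation is an assumption and does not come from the text above: $m$ training samples $(x^{(i)}, y^{(i)})$, a model $h_\theta$, and parameters $\theta_j$.

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$

$$
\frac{\partial J}{\partial \theta_j} = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
$$

Without the 1/2, the same expression would simply carry a leading factor of 2.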
The factor isn't required, but it turns the cost function into a good approximation of the "generalization error" for a randomly chosen new sample (one that is not in the training set).
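Including or omitting this factor doesn't change the final result: multiplying the cost function by a positive constant leaves the location of its minimum unchanged. It only scales the gradient, which has the same effect as rescaling the learning rate.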
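A quick way to convince yourself is to minimize both versions of the cost over the same set of candidate parameters. The sketch below is only an illustration under made-up assumptions: a one-parameter model y ≈ theta * x, a tiny invented data set, and a brute-force grid search standing in for gradient descent. The names `mse`, `half_mse`, `xs`, `ys`, and `candidates` are hypothetical.

```python
def mse(theta, xs, ys):
    """Mean squared error of the one-parameter model y ~ theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def half_mse(theta, xs, ys):
    """The same cost, divided by 2."""
    return 0.5 * mse(theta, xs, ys)


# Invented data, roughly following y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Candidate values of theta from 0.00 to 4.00 in steps of 0.01.
candidates = [i / 100 for i in range(401)]

# Pick the candidate with the lowest cost under each definition.
best_mse = min(candidates, key=lambda t: mse(t, xs, ys))
best_half = min(candidates, key=lambda t: half_mse(t, xs, ys))

print(best_mse, best_half)  # both costs select the same theta
```

The cost values themselves differ by a factor of 2, but the theta that minimizes them is the same.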