Consider the least-squares objective
$$\phi(x) = \tfrac{1}{2}\,\|Ax - b\|_2^2, \qquad A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m,$$
and look for the point $x^*$ where the gradient of the function vanishes. Let
$$\{\, x^* + \epsilon y \;:\; \epsilon > 0,\ y \in \mathbb{R}^n \,\}$$
be the set of vectors in the local neighbourhood of $x^*$. For a minimum,
we want $\phi(x^* + \epsilon y) \ge \phi(x^*)$ for all $y$, i.e., expanding $\phi$,
$$\phi(x^* + \epsilon y) - \phi(x^*) \;=\; \epsilon\, y^T A^T (A x^* - b) \;+\; \tfrac{\epsilon^2}{2}\,\|Ay\|_2^2 \;\ge\; 0 .$$
The second term in the above formula converges to zero (after dividing by $\epsilon > 0$ and letting $\epsilon \to 0$, as shown below):
$$\lim_{\epsilon \to 0} \frac{\phi(x^* + \epsilon y) - \phi(x^*)}{\epsilon} \;=\; y^T A^T (A x^* - b) \;\ge\; 0 \quad \text{for all } y \in \mathbb{R}^n .$$
Since this must hold for both $y$ and $-y$, it forces $y^T A^T (A x^* - b) = 0$ for every $y$.
Therefore, the gradient is zero at the solution of
$$A^T A\, x = A^T b,$$
or
$$x = (A^T A)^{-1} A^T b.$$
The above system of $n$ linear equations in $n$ unknowns is
known as the normal equations. The matrix $(A^T A)^{-1} A^T$ is also known as the pseudo-inverse of
$A$. The minimum of the least-squares system corresponds to
$$x^* = (A^T A)^{-1} A^T b .$$
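As a quick sanity check, here is a minimal sketch with made-up data (the dimensions and the use of NumPy are assumptions, not part of the notes) comparing the closed-form solution against a library least-squares solver:

```python
import numpy as np

# Made-up overdetermined example: m = 100 equations, n = 3 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)

# Solve the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Equivalently, apply the pseudo-inverse (A^T A)^{-1} A^T to b.
x_pinv = np.linalg.pinv(A) @ b

# Cross-check against NumPy's built-in least-squares routine.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.allclose(x_normal, x_pinv), np.allclose(x_normal, x_lstsq))  # True True
```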
NOTE: $A^T A$ is s.p.d., so we can use a Cholesky
decomposition ($A^T A = L L^T$). The overall cost would
be roughly $mn^2$ flops to form $A^T A$ plus $n^3/3$ flops for the factorization, i.e. $\mathcal{O}(mn^2 + n^3/3)$.
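A sketch of this normal-equations solve (assuming SciPy's `cho_factor`/`cho_solve` are available; the data below is illustrative only):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)
m, n = 500, 10                      # m >> n, so forming A^T A (~ m n^2 flops) dominates
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Form the s.p.d. normal-equations matrix and right-hand side.
AtA = A.T @ A
Atb = A.T @ b

# Cholesky factorisation A^T A = L L^T (~ n^3 / 3 flops), then two triangular solves.
factor = cho_factor(AtA)
x = cho_solve(factor, Atb)

print(np.allclose(A.T @ (A @ x), A.T @ b))  # the normal equations are satisfied
```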
Why is $x^*$ the minimum? For any $y \in \mathbb{R}^n$,
$$\phi(x^* + y) \;=\; \phi(x^*) \;+\; y^T A^T (A x^* - b) \;+\; \tfrac{1}{2}\,\|Ay\|_2^2 \;=\; \phi(x^*) \;+\; \tfrac{1}{2}\,\|Ay\|_2^2 \;\ge\; \phi(x^*),$$
since $A^T (A x^* - b) = 0$ by the normal equations. So clearly $x^*$ attains a minimum.
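A small numerical confirmation of this inequality (a sketch; the matrix, right-hand side, and perturbations are arbitrary example data):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)

def phi(x):
    # Least-squares objective phi(x) = 1/2 * ||A x - b||^2.
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

# Minimiser from the normal equations.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# phi(x* + y) - phi(x*) should equal 1/2 * ||A y||^2 >= 0 for every perturbation y.
for _ in range(3):
    y = rng.standard_normal(4)
    print(phi(x_star + y) - phi(x_star), 0.5 * np.linalg.norm(A @ y) ** 2)
```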