Given $x_1,\ldots,x_n$, you can define a function $f(a) = \sum_{i=1}^n (x_i-a)^2.$ This function is defined for all real $a$, and it measures how central $a$ is relative to the data: the smaller $f(a)$, the closer $a$ is, on the whole, to the data points.
You can try a simple example on Wolfram Alpha, say "plot((a-1)^2+(a-2)^2+(a-4)^2,a=0..5)". In this case the data points are $1,2,4$, and you get a parabola (this is no coincidence: a sum of quadratics in $a$ with positive leading coefficients is itself such a quadratic). The lowest point of the graph is where $f$ attains its minimum.
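To make this concrete, you can expand the example and read off the minimizer directly:
$$f(a) = (a-1)^2 + (a-2)^2 + (a-4)^2 = 3a^2 - 14a + 21,$$
a parabola whose vertex lies at $a = \frac{14}{2 \cdot 3} = \frac{7}{3}$, which is exactly the mean of $1,2,4$.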
The function $f(a)$ is bounded from below (since $f(a) \geq 0$), and so its values cannot be arbitrarily negative. Therefore there must be a greatest lower bound on its values - this is known as the infimum. The infimum is the value $L$ such that $f(a) \geq L$ always, and moreover, for each $l > L$, there is some value of $a$ such that $f(a) < l$; the second condition expresses the fact that $L$ is optimal (no larger lower bound works).
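In compact notation, $L = \inf_a f(a)$ is characterized by the two conditions
$$f(a) \geq L \ \text{ for all } a, \qquad \text{and} \qquad \text{for every } l > L \ \text{there is some } a \ \text{with } f(a) < l.$$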
In general, it might be that $f(a)$ can get arbitrarily close to the infimum, but never reach it. In the case at hand, this doesn't happen, and the function $f(a)$ is actually minimized at some point, namely the average (or mean).
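(For contrast, a function such as $e^{-a}$ has infimum $0$ but never attains it.) For our $f$, the minimizer can be found by elementary calculus: setting the derivative to zero gives
$$f'(a) = -2\sum_{i=1}^n (x_i - a) = 0 \quad\Longleftrightarrow\quad a = \frac{1}{n}\sum_{i=1}^n x_i,$$
and since $f$ is a quadratic in $a$ with positive leading coefficient $n$, this critical point is the global minimum.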
The equation you state can be used to justify the definition of the average - the average is the value that minimizes the average squared distance to the data points. You could choose other criteria - for example, if you replace squared distance by absolute distance, then the optimal value of $a$ is the median.
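To illustrate with the same data $1,2,4$: the absolute-distance objective is
$$g(a) = |a-1| + |a-2| + |a-4|.$$
For any $a$ between $1$ and $4$ the outer two terms sum to the constant $3$, so minimizing $g$ amounts to minimizing $|a-2|$; the optimum is $a = 2$, the median, with $g(2) = 3$.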
The reason that people care about squared distance is that it's much easier to work with, and the resulting theory is very nice. For example, the central limit theorem states that in many cases, sums of many independent samples converge to a normal distribution whose parameters depend only on the mean and variance of the original distribution - and the variance is just $\min_a f(a)$, normalized by the number of points.
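Concretely, writing $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ for the mean,
$$\operatorname{Var}(x_1,\ldots,x_n) = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n}\min_a f(a).$$
In the running example, $\min_a f(a) = f(7/3) = 14/3$, so the variance is $14/9$.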