Going though some papers in clustering in machine learning, I often find the following claim:
$(a-b)(a-b) = \|a\|^2 + \|b\|^2 - 2a \cdot b$
My question is: When does this hold? (i.e. what type of norms or inner products satisfy this)? Is this only true for the $L_2$ inner product and its induced norm?
Also, would you call the above an example of distributivity in inner products? Does distributivity hold for inner products?