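In a disjoint-set structure, path compression is used to optimize Find operations (a Find returns the representative of the set a node belongs to). Analysis of the operation yields an amortized complexity that is approximately $\mathcal{O}\left(1\right)$; more precisely, it is $\mathcal{O}\left(\alpha(n)\right)$ when path compression is combined with union by rank, where $\alpha$ is the inverse Ackermann function.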
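To make the operation concrete, here is a minimal Python sketch of Find with path compression. The class name `DisjointSet`, the method names, and the use of union by rank are my own illustrative assumptions, not taken from any particular source or library.

```python
class DisjointSet:
    """Illustrative disjoint-set forest over the elements 0..n-1 (hypothetical names)."""

    def __init__(self, n):
        # Every element starts as the root (representative) of its own set.
        self.parent = list(range(n))
        # rank[x] is an upper bound on the height of the tree rooted at x.
        self.rank = [0] * n

    def find(self, x):
        # First pass: follow parent pointers up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the
        # traversed path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        # Union by rank: attach the shallower tree under the deeper one.
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
```

After a single `find` on the deepest node of a chain, every node on that chain points directly at the root, so later Finds on those nodes take one step. This flattening is what the amortized analysis accounts for.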
Wikipedia and a variety of other sources provide brief explanations. Most identify the relation between path compression and the inverse Ackermann function, but they do not give a concise description of the reasoning behind that relation.
Why is the amortized complexity of Find operations with path compression approximately $\mathcal{O}\left(1\right)$?