I've read the above in a paper called "Translation equivalence in free groups" by Ilya Kapovich, Gilbert Levitt, Paul Schupp, and Vladimir Shpilrain.
Why can the infimum be replaced by a minimum, and how can I actually show that?
Here is a proof, essentially taken from Culler and Morgan's paper *Group actions on $\mathbb{R}$-trees*.
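Suppose $g$ fixes no point of $X$. Fix $x \in X$, let $m$ be the midpoint of the arc $[x,gx]$, and consider the translated arcs $g[x,gx] = [gx,g^2x]$ and $g^{-1}[x,gx] = [g^{-1}x,x]$. Neither translated arc contains $m$. For instance, if $m \in [gx,g^2x]$, then $m$ lies on both $[x,gx]$ and $[gx,g^2x]$ at distance $\tfrac12 d(x,gx)$ from $gx$, so $m$ is also the midpoint of $g[x,gx]$. Since $g$ maps the midpoint of $[x,gx]$ to the midpoint of $g[x,gx]$, this gives $gm = m$, a contradiction. The case $m \in g^{-1}[x,gx]$ is symmetric (it gives $g^{-1}m = m$).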
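Now consider $a = g[x,gx] \cap [x,gx]$ and $b = g^{-1}[x,gx] \cap [x,gx]$. These are subarcs of $[x,gx]$ containing $gx$ and $x$ respectively, with $a = gb$, and by the above neither contains $m$, so they are disjoint. Let $c$ be the arc of $[x,gx]$ joining $b$ to $a$. It meets each of $a$ and $b$ in exactly one point, and it contains $m$, so it has positive length. Let $A = \bigcup \{ g^n c \mid n \in \mathbb{Z} \}$. Consecutive translates $g^n c$ and $g^{n+1} c$ meet in a single point (since $c \cap gc \subseteq c \cap a$), so $A$ is isometric to $\mathbb{R}$; it is invariant under $g$, which acts on it as a translation by the length of $c$. Every point $y \in X$ is moved at least the length of $c$: if we draw the geodesic arc from $y$ to $A$ and translate it by $g$, we get a disjoint arc joining $gy$ to $A$, and the two arcs are separated along $A$ by a segment whose length equals that of $c$.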
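A sketch of that last estimate as a computation, writing $y$ for an arbitrary point (to keep it separate from the base point $x$) and $\pi(y)$ for the point of $A$ closest to $y$ (the notation $\pi$ is introduced here only for convenience). Since $g$ is an isometry preserving $A$, we have $\pi(gy) = g\,\pi(y)$, and since $g$ translates $A \cong \mathbb{R}$ by $\operatorname{length}(c) > 0$, we have $\pi(y) \neq g\,\pi(y)$. The arcs $[y,\pi(y)]$ and $[gy,g\pi(y)]$ meet $A$ only at their endpoints, so joining them through the segment of $A$ from $\pi(y)$ to $g\pi(y)$ gives a geodesic, and

$$
d(y,gy) = d(y,\pi(y)) + d(\pi(y),g\pi(y)) + d(g\pi(y),gy) = 2\,d(y,A) + \operatorname{length}(c) \;\ge\; \operatorname{length}(c),
$$

with equality exactly when $y \in A$.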
So $\ell_X(g)$ equals the length of $c$. Moreover, every point of $A$ is moved by exactly this amount, so the infimum is attained, i.e. it is a minimum.
If $g$ fixes a point $p$, then $d(p,gp) = 0$, so the infimum is $0$ and it is attained at $p$.
(It is helpful to run through the proof with an example, such as an element of $F_2$ acting on its Cayley tree with respect to a free basis.)
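To try the $F_2$ example concretely, here is a small Python sketch. It assumes $F_2 = \langle a, b \rangle$ acts on its Cayley tree by left multiplication, with vertices the freely reduced words, capital letters `A`, `B` standing for $a^{-1}, b^{-1}$ (the generators $a, b$ should not be confused with the arcs $a, b$ above), and base point $x = 1$ (the empty word). All function names are hypothetical helpers. The arc lengths come from the standard $\mathbb{R}$-tree fact that $[u,v] \cap [u,w]$ has length equal to the Gromov product $(v \mid w)_u = \tfrac12\big(d(u,v) + d(u,w) - d(v,w)\big)$; taking $u = gx$ gives $\operatorname{length}(a) = d(x,gx) - \tfrac12 d(x,g^2x)$, hence $\operatorname{length}(c) = d(x,g^2x) - d(x,gx)$. The sketch then checks, by brute force over a finite ball of vertices, that the smallest displacement $d(v,gv)$ equals $\operatorname{length}(c)$ and is attained. It is only a sanity check on examples, not a proof.

```python
# Sketch: F_2 = <a, b> acting on its Cayley tree by left multiplication.
# Vertices are freely reduced words; "A" and "B" denote a^{-1} and b^{-1}.
# All names here are hypothetical helpers, not from any library.

INV = {"a": "A", "A": "a", "b": "B", "B": "b"}


def reduce_word(w: str) -> str:
    """Freely reduce w by cancelling adjacent inverse pairs (stack-based)."""
    out: list[str] = []
    for ch in w:
        if out and out[-1] == INV[ch]:
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def inverse(w: str) -> str:
    """Formal inverse of a word: reverse it and invert each letter."""
    return "".join(INV[ch] for ch in reversed(w))


def act(g: str, v: str) -> str:
    """Left multiplication v -> g v, an isometry of the Cayley tree."""
    return reduce_word(g + v)


def dist(u: str, v: str) -> int:
    """Tree distance between vertices u and v: the reduced length of u^{-1} v."""
    return len(reduce_word(inverse(u) + v))


def cyclic_reduction(w: str) -> str:
    """Strip mutually inverse first/last letters until the word is cyclically reduced."""
    w = reduce_word(w)
    while len(w) >= 2 and w[0] == INV[w[-1]]:
        w = w[1:-1]
    return w


def ball(radius: int) -> list[str]:
    """All reduced words of length <= radius, i.e. the vertices of a ball about 1."""
    verts, frontier = [""], [""]
    for _ in range(radius):
        frontier = [w + ch for w in frontier for ch in "aAbB"
                    if not w or w[-1] != INV[ch]]
        verts.extend(frontier)
    return verts


def arc_lengths(g: str, x: str = "") -> tuple[int, int, int]:
    """Lengths of the arcs a, b, c on [x, gx], via the Gromov product at gx."""
    gx = act(g, x)
    g2x = act(g, gx)
    d1 = dist(x, gx)
    # length(a) = (x | g^2 x)_{gx}; it is an integer because trees are bipartite.
    len_a = (dist(gx, x) + dist(gx, g2x) - dist(x, g2x)) // 2
    len_b = len_a  # a = g b, so the two arcs have the same length
    return len_a, len_b, d1 - 2 * len_a


def min_displacement(g: str, radius: int) -> tuple[int, str]:
    """Smallest d(v, gv) over vertices v of the ball, and a vertex attaining it."""
    best = min(ball(radius), key=lambda v: dist(v, act(g, v)))
    return dist(best, act(g, best)), best


if __name__ == "__main__":
    for g in ["abA", "ab", "aabAA", "abAB"]:
        print(g, arc_lengths(g), len(cyclic_reduction(g)), min_displacement(g, 3))
```

For example, with $g = aba^{-1}$ (the word `"abA"`), `arc_lengths` returns `(1, 1, 1)`, and `min_displacement("abA", 3)` returns `(1, "a")`: the vertex $a$ lies on the axis, and $a^{-1}ga = b$ moves it by $1$. In this setting $\operatorname{length}(c)$ agrees with the length of the cyclic reduction of $g$. The search ball must contain a vertex of the axis, so for $g = uwu^{-1}$ with $w$ cyclically reduced the radius should be at least $|u|$.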