From Wikipedia:
In probability theory, the total variation distance between two probability measures $P$ and $Q$ on a sigma-algebra $F$ is $ \sup\left\{\,\left|P(A)-Q(A)\right| : A\in F\,\right\}. $ Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
For a finite alphabet we can write $ \delta(P,Q) = \frac 1 2 \sum_x \left| P(x) - Q(x) \right|\;. $ Sometimes the statistical distance between two probability distributions is also defined without the division by two.
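To make the two quoted formulas concrete, here is a minimal sketch that evaluates both on a small finite alphabet. The function names `tv_sup` and `tv_half_l1` and the example distributions `p` and `q` are my own hypothetical choices, not from Wikipedia, and I assume both distributions are given as dictionaries over the same outcomes. The first function brute-forces the supremum over every event $A$ (every subset of the alphabet), and the second computes half of the sum of absolute differences.

```python
from fractions import Fraction
from itertools import chain, combinations


def tv_sup(p, q):
    """Brute-force sup over all events A of |P(A) - Q(A)|.

    Assumes p and q are dicts with the same keys (a finite alphabet).
    Enumerates all 2^n subsets, so only suitable for tiny alphabets.
    """
    outcomes = list(p)
    events = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    return max(abs(sum(p[x] - q[x] for x in event)) for event in events)


def tv_half_l1(p, q):
    """Half of the sum over x of |P(x) - Q(x)|."""
    return Fraction(1, 2) * sum(abs(p[x] - q[x]) for x in p)


# Hypothetical example distributions, using exact fractions to avoid rounding.
p = {"a": Fraction(5, 10), "b": Fraction(3, 10), "c": Fraction(2, 10)}
q = {"a": Fraction(2, 10), "b": Fraction(3, 10), "c": Fraction(5, 10)}

print(tv_sup(p, q), tv_half_l1(p, q))
```

On this example the two values coincide, and the sum without the $\frac 1 2$ would be twice the supremum. This is only a check on one example, not a proof.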
I was wondering whether there is a particular reason for including the factor $\frac 1 2$ in the finite case but not in the general case. My understanding is that this total variation distance/metric is induced by the upper variation of the whole set (which is a norm, if I am correct). From there, I can't see the need to divide by 2.
Also, in the finite case, why not define it in the same way, as a $\sup$ over $A \in F$?
Thanks and regards!