I'm pretty sure SciPy's algorithm does the following:
- Uniformly scales your sample angles into the range $[0, 2\pi]$
- Converts each sample angle into a "point" on the unit circle
- Finds the mean of those points
- Computes the norm of that mean point, call it r
- Computes $\sqrt{\log(1 / r^2)}$
- Scales this value back to the original range of the samples
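The steps above can be sketched in a few lines and checked against `scipy.stats.circstd` (the variable names are my own). I assume the samples are already in $[0, 2\pi]$, so the two rescaling steps are no-ops:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Angles already in [0, 2*pi], so the first and last (rescaling) steps are no-ops here.
samples = rng.normal(loc=1.0, scale=0.4, size=2000) % (2 * np.pi)

points = np.exp(1j * samples)       # each angle becomes a unit phasor
mean_point = points.mean()          # mean of those "points"
r = np.abs(mean_point)              # norm of the mean point
manual = np.sqrt(np.log(1 / r**2))  # sqrt(log(1 / r^2)), i.e. sqrt(-2 log r)

assert np.isclose(manual, stats.circstd(samples, high=2 * np.pi, low=0))
```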
All the steps seem reasonable except maybe #5. Where does that formula come from? To answer that, we first need to ask where this more familiar formula comes from:
$$\hat\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\hat\mu)^2},\ \ \ \ \ \hat\mu=\frac{1}{n}\sum_{i=1}^nx_i$$
I write $\hat\sigma$ and $\hat\mu$ with hats because they are really estimators of the true population values of standard deviation and mean. I'll let Wikipedia do the variance proof for you, i.e. prove that $\mathbf{E}(\hat\sigma^2)=\sigma^2$.
That proof makes use of some operators we take for granted when our samples are from $\mathbb{R}$, like addition, subtraction, and a commutative multiplication. Your random angles, though, live on $SO(2)$, so we cannot be so carefree.
Rather than trying to guess an estimator of variance for random variables on $SO(2)$, we can take an alternative perspective. The normal distribution on $\mathbb{R}$ is special for many reasons, but one is that its variance is exactly one of the two parameters that define its probability density function (pdf),
$$\rho(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$
We can "wrap" this pdf around the unit circle as described here by stacking the density at $x$ with the densities at $x+2\pi k$ for every $k \in \mathbb{Z}$. Once we do this, we must restrict our sample space to $[0, 2\pi]$ (or any interval of length $2\pi$), or else we won't satisfy the defining property of a pdf: that its integral over the whole sample space is 1. Make sure you shift your samples into this range before you use a wrapped pdf!
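Here's a quick numerical check of the wrapping idea; the function name `wrapped_normal_pdf` and the truncation at `k_max` terms are my own choices, since the infinite sum converges fast for moderate $\sigma$:

```python
import numpy as np
from scipy import stats, integrate

mu, sigma = 1.0, 0.7

def wrapped_normal_pdf(theta, k_max=50):
    # Stack the N(mu, sigma^2) density at theta + 2*pi*k, truncating the
    # sum over all integers k to |k| <= k_max (the tails are negligible).
    k = np.arange(-k_max, k_max + 1)
    return stats.norm.pdf(theta + 2 * np.pi * k, loc=mu, scale=sigma).sum()

# Restricted to [0, 2*pi], the wrapped density integrates to 1.
total, _ = integrate.quad(wrapped_normal_pdf, 0, 2 * np.pi)
assert np.isclose(total, 1.0)
```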
Doing this yields the wrapped normal distribution; call its probability measure $P_o$ and its density $\rho_o$. Perhaps its variance can give us some insight into what a good "circular variance" estimator would look like. To compute that variance, we can't use the expectation over $\mathbb{R}$ like we usually do,
$$\mathbf{E}(X) \neq \int_{-\infty}^{\infty}x\rho_o(x)dx$$
for the reason I stated a moment ago. Instead, we'll represent our samples as unit phasors (unit-magnitude complex numbers), a valid representation of $SO(2)$ parameterized by the real interval $[0, 2\pi]$.
$$\mathbf{E}(X) = \int_{SO(2)}x\ dP_o = \int_{0}^{2\pi}e^{j\theta}\rho_o(\theta)d\theta$$
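We can evaluate this expectation numerically and recover $\sigma$ from the norm of the result, since for the wrapped normal $|\mathbf{E}(X)| = e^{-\sigma^2/2}$. This sketch reuses a hand-rolled wrapped-normal density (the names and the `k_max` truncation are my own):

```python
import numpy as np
from scipy import stats, integrate

mu, sigma = 1.0, 0.7

def wrapped_normal_pdf(theta, k_max=50):
    # Wrapped N(mu, sigma^2) density: sum the line density over shifts of 2*pi*k.
    k = np.arange(-k_max, k_max + 1)
    return stats.norm.pdf(theta + 2 * np.pi * k, loc=mu, scale=sigma).sum()

# E(X) = integral over [0, 2*pi] of e^{j*theta} * rho_o(theta) dtheta,
# computed as separate real and imaginary parts.
re, _ = integrate.quad(lambda t: np.cos(t) * wrapped_normal_pdf(t), 0, 2 * np.pi)
im, _ = integrate.quad(lambda t: np.sin(t) * wrapped_normal_pdf(t), 0, 2 * np.pi)
r = np.hypot(re, im)  # |E(X)| = exp(-sigma^2 / 2)

# Inverting that relationship gives back sigma: sqrt(log(1 / r^2)) = sigma.
assert np.isclose(np.sqrt(np.log(1 / r**2)), sigma)
```

This is exactly why step #5 of the algorithm takes the form it does: $\sqrt{\log(1/r^2)}$ inverts $r = e^{-\sigma^2/2}$.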
This is what is done here, where they compute the $n$th moment and then extract an expression for $\sigma$ from it that looks much like the SciPy equation. In my description of the SciPy algorithm, I put the word "point" in quotes because the samples are really treated as unit phasors, not vectors. In the next section of the Wikipedia article, they find an (albeit biased) estimator of this $\sigma$, which is exactly what SciPy computes.
I would say that this method is "correct" in that it isn't ad hoc. If you find that your estimator makes more sense for your application, perhaps it is really what you want, but I would consider the SciPy function to be the "right" way to measure the spread of angles.