I know I can use the definitions here to calculate the variance of $S$ and then take the square root of it.
Not sure I understand what you mean by that... but here we go.
Let $i=6$. For every $n\geqslant1$, call $X_n$ the result of the $n$th throw, uniformly distributed on $\{1,2,\ldots,i\}$, and $A_n$ the event that $X_k\ne i$ for every $1\leqslant k\leqslant n-1$, so that $X_n$ contributes to $S$ only if no $i$ appeared before the $n$th throw. Then
$$
S=\sum\limits_{n=1}^{+\infty}X_n\cdot[A_n].
$$
For every $n$, $\mathrm E(X_n)=x$ with $x=\frac12(i+1)$ and $\mathrm P(A_n)=a^{n-1}$ with $a=\mathrm P(X_1\ne i)=(i-1)/i$. Since $X_n$ and $A_n$ are independent,
$$
\mathrm E(S)=\sum\limits_{n=1}^{+\infty}x\cdot a^{n-1}=x\cdot(1-a)^{-1}=\tfrac12i(i+1).
$$
Likewise, since $A_k\subseteq A_n$ and hence $[A_n]\cdot[A_k]=[A_k]$ for every $k\gt n$, one has $\mathrm E(S^2)=u+2v$ with
$$
u=\sum\limits_{n=1}^{+\infty}\mathrm E(X_n^2\cdot[A_n]), \qquad v=\sum\limits_{n=1}^{+\infty}\sum\limits_{k=n+1}^{+\infty}\mathrm E(X_nX_k\cdot[A_k]).
$$
For every $n$, $\mathrm E(X_n^2)=y$ with $y=\mathrm E(X_1^2)=\frac16(i+1)(2i+1)$, and $X_n$ and $A_n$ are independent, hence
$$
u=\sum\limits_{n=1}^{+\infty}y\cdot a^{n-1}=y\cdot(1-a)^{-1}=yi.
$$
Likewise, for every $k\gt n$, $X_k$ is independent of $X_n\cdot[A_k]$ and
$$
\mathrm E(X_n\mid A_k)=\mathrm E(X_n\mid X_n\ne i)=z,
$$
with $z=\mathrm E(X_1\mid X_1\ne i)=\frac12i$, so that $\mathrm E(X_nX_k\cdot[A_k])=xz\cdot a^{k-1}$. Hence
$$
v=\sum\limits_{n=1}^{+\infty}\sum\limits_{k=n+1}^{+\infty}xz\cdot a^{k-1}=\sum\limits_{n=1}^{+\infty}xz\cdot a^{n}\cdot(1-a)^{-1}=xz\cdot a\cdot(1-a)^{-2}=xzi(i-1).
$$
Finally,
$$
\mathrm{Var}(S)=u+2v-\mathrm E(S)^2=yi+2xzi(i-1)-x^2i^2=\tfrac1{12}i(i+1)(i-1)(3i-2).
$$
For $i=6$, $\mathrm E(S)=21$ and $\mathrm{Var}(S)=280$.
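As a sanity check on the closed forms above, here is a minimal Python sketch. It evaluates $u$, $v$ and $\mathrm{Var}(S)$ in exact arithmetic and compares them with a plain Monte Carlo simulation of "throw until $i$ shows up". The function names `moments_closed_form` and `simulate`, the seed and the number of runs are illustrative choices, not part of the derivation.

```python
from fractions import Fraction
import random


def moments_closed_form(i):
    """Return (E(S), Var(S)) from the formulas above, using exact fractions."""
    x = Fraction(i + 1, 2)                  # E(X_1)
    y = Fraction((i + 1) * (2 * i + 1), 6)  # E(X_1^2)
    z = Fraction(i, 2)                      # E(X_1 | X_1 != i)
    u = y * i                               # sum of E(X_n^2 [A_n])
    v = x * z * i * (i - 1)                 # sum of E(X_n X_k [A_k]) over k > n
    mean = x * i
    return mean, u + 2 * v - mean ** 2


def simulate(i, runs, seed=0):
    """Estimate (E(S), Var(S)) by throwing an i-sided die until i appears."""
    rng = random.Random(seed)               # seeded so the run is reproducible
    total = 0
    total_sq = 0
    for _ in range(runs):
        s = 0
        while True:
            throw = rng.randint(1, i)       # uniform on {1, ..., i}
            s += throw                      # the final throw (equal to i) is counted too
            if throw == i:
                break
        total += s
        total_sq += s * s
    mean = total / runs
    return mean, total_sq / runs - mean ** 2


if __name__ == "__main__":
    print(moments_closed_form(6))           # (Fraction(21, 1), Fraction(280, 1))
    print(simulate(6, 100_000))             # estimates, expected to land near 21 and 280
```

The simulated values fluctuate from seed to seed, so they only corroborate the exact ones; they are not a proof.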
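Edit: One sees that $\mathrm E(S)=\mathrm E(X_1)\,\mathrm E(N)$, where $N$ is the time of the first occurrence of $i$. This is Wald's formula. According to this WP page, the corresponding formula for the variance is known as the Blackwell–Girshick equation:
$$
\mathrm{Var}(S)=\mathrm{Var}(X_1)\cdot\mathrm E(N)+\mathrm E(X_1)^2\cdot\mathrm{Var}(N).
$$
Note that this equation assumes that $N$ is independent of $(X_n)$, which is not the case here since $N$ is determined by the throws. Indeed, with $\mathrm E(N)=i$ and $\mathrm{Var}(N)=i(i-1)$, its right-hand side equals $385$ for $i=6$, not the value $280$ computed directly.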