There's a way to approximate the answer using information theory. Let $X$ be the variable representing the good side, and let $D$ be the outcome of a roll. Each roll conveys $ I(X;D) = H(D) - H(D|X) = \log 96 + 95 \cdot 0.01 \log 0.01 + 0.05 \log 0.05 \approx 0.0396 $ nats of information about $X$ (using natural logarithms). As an approximation, the information gained from different rolls is additive; it is certainly at most additive. Since $H(X) = \log 96$ (you have no a priori belief about $X$), the number of rolls needed is at least $ \frac{H(X)}{I(X;D)} \approx 115. $
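For the skeptical reader, here is a quick numerical check of this bound (a sketch in Python; the setup, with one side of probability $0.05$ and $95$ sides of probability $0.01$ each, is as above):

```python
import math

# One side has probability 0.05, the other 95 sides 0.01 each,
# so the marginal distribution of a roll D is uniform over 96 outcomes.
H_D = math.log(96)                      # entropy of a roll, in nats
H_D_given_X = -(95 * 0.01 * math.log(0.01) + 0.05 * math.log(0.05))
info_per_roll = H_D - H_D_given_X       # I(X;D), roughly 0.0396 nats

rolls_lower_bound = math.log(96) / info_per_roll  # H(X) / I(X;D)
print(round(info_per_roll, 4), round(rolls_lower_bound))  # ≈ 0.0396, ≈ 115
```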
Another way to approach the same question is as follows. Suppose the die was thrown $N$ times. The good side came up approximately $N/20$ times, and each of the others came up $\mathrm{Bin}(N,0.01) \approx \mathcal{N}(0.01N,0.0099N)$ times. So we want the probability that such a normal variable is larger than $N/20$ to be smaller than $1/95$ (since there are $95$ such variables; we're ignoring their dependence). The gap is $\frac{0.05N - 0.01N}{\sqrt{0.0099N}} \approx 0.4\sqrt{N}$ standard deviations. Looking at a table, we need roughly $2.3$ standard deviations to get a probability of $1/95$, so we need $N \approx (2.3/0.4)^2 \approx 33. $ Of course, this estimate is pretty rough (it doesn't even match our previous supposed lower bound).
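The same back-of-the-envelope calculation, done numerically (using the standard library's `statistics.NormalDist` for the normal quantile, instead of looking at a table):

```python
import math
from statistics import NormalDist

# Require P(N(0.01N, 0.0099N) > N/20) < 1/95.
z = NormalDist().inv_cdf(1 - 1 / 95)     # ≈ 2.3 standard deviations
# The gap in standard deviations is 0.04N / sqrt(0.0099N) ≈ 0.402 sqrt(N);
# solve 0.402 sqrt(N) = z for N.
per_sqrt_N = 0.04 / math.sqrt(0.0099)
N = (z / per_sqrt_N) ** 2
print(round(z, 2), round(N))  # ≈ 2.31, ≈ 33
```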
The estimate in the previous paragraph is particularly problematic since in this range the convergence to the normal distribution is pretty bad (the expectation is very small, roughly $1$). In this regime, the distribution is actually roughly Poissonian. Let's estimate the probability of success when $N=100$ using this approximation. The good side is going to show up roughly $5$ times. Each other side shows up roughly $\mathrm{Pois}(1)$ many times, so it shows up at least $5$ times (matching or beating the good side) with probability $ 1 - e^{-1} \sum_{t=0}^4 \frac{1^t}{t!} \approx 0.00366. $ So the expected number of "contenders" is $ 0.00366 \cdot 95 \approx 0.35.$ Assuming independence, the probability of no contenders is roughly $ (1 - 0.00366)^{95} \approx 0.706. $ This approximation is only roughly vindicated by the experiments below.
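In code, the same Poisson estimate (the numbers here are for $N=100$):

```python
import math

# P(Pois(1) >= 5): a given bad side matches or beats the good side's ~5 showings.
p_contender = 1 - math.exp(-1) * sum(1 / math.factorial(t) for t in range(5))
expected_contenders = 95 * p_contender        # ≈ 0.35
p_no_contenders = (1 - p_contender) ** 95     # ≈ 0.706, assuming independence
print(round(p_contender, 5), round(expected_contenders, 2), round(p_no_contenders, 3))
```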
In order to find a better approximation, let's take into account the distribution of the number of times the good side comes up, which is roughly $\mathrm{Pois}(5)$: $ e^{-5} \sum_{H=0}^\infty \frac{5^H}{H!} \left(e^{-1} \sum_{t=0}^{H-1} \frac{1}{t!} \right)^{95} \approx 0.527. $ This conforms much better to the results below.
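This sum is easy to evaluate numerically; here is a sketch that also generalizes to other values of $N$ (the outer sum is truncated at $60$ terms, which is plenty for the rates involved):

```python
import math

def poisson_approx(N, cutoff=60):
    """P(good side strictly leads), with counts approximated as independent
    Poissons: Pois(0.05*N) for the good side, Pois(0.01*N) for each bad side."""
    lam_good, lam_bad = 0.05 * N, 0.01 * N
    total = 0.0
    for H in range(cutoff):
        p_H = math.exp(-lam_good) * lam_good ** H / math.factorial(H)
        # probability that one bad side comes up fewer than H times
        p_below = math.exp(-lam_bad) * sum(lam_bad ** t / math.factorial(t)
                                           for t in range(H))
        total += p_H * p_below ** 95
    return total

print(round(poisson_approx(100), 3))  # ≈ 0.527
```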
Finally, here are experimental results from one million trials, along with the Poisson approximation:
$ \begin{array}{r|c|c} N & \text{Experiment} & \text{Approx} \\\hline 50 & 0.272323 & 0.274900 \\ 100 & 0.526765 & 0.527424 \\ 150 & 0.718004 & 0.714956 \\ 200 & 0.840143 & 0.836808 \\ 250 & 0.912917 & 0.910115 \\ 300 & 0.954097 & 0.951953 \end{array}$
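For reproducibility, here is a sketch of the kind of simulation that produces such a table (with far fewer trials than the million used above, so the estimate is correspondingly noisier):

```python
import random

SIDES = list(range(96))
WEIGHTS = [0.05] + [0.01] * 95  # side 0 is the good one

def trial(N, rng):
    """Roll N times; success iff the good side strictly leads the counts."""
    counts = [0] * 96
    for s in rng.choices(SIDES, weights=WEIGHTS, k=N):
        counts[s] += 1
    return counts[0] > max(counts[1:])

rng = random.Random(0)
T = 20_000
successes = sum(trial(100, rng) for _ in range(T))
print(successes / T)  # should land near 0.527
```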
The results support the information-theoretic bound, as well as our estimates obtained using the Poisson approximation.
Note: When the success probability is $p$ and there are $T$ trials, the standard deviation of the empirical frequency is $\sqrt{\frac{p(1-p)}{T}} \leq \frac{1}{2\sqrt{T}}.$ In our case, $T = 10^6$, so one standard deviation is at most $0.5\cdot 10^{-3}$. With probability $95\%$, the error is at most two standard deviations, i.e. $\pm 0.001$. So the experimental results are correct to roughly three decimal places (the third digit may itself be slightly off).
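The arithmetic of this note, spelled out:

```python
# Standard error of a Monte Carlo estimate over T trials.
T = 10 ** 6
max_sd = 0.5 / T ** 0.5       # sqrt(p(1-p)/T) <= 1/(2 sqrt(T)) = 0.0005
two_sigma = 2 * max_sd        # 95% confidence half-width: 0.001
print(max_sd, two_sigma)
```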