My view is that a probability is anything that obeys the probability axioms. If the theorems of probability apply to it, it's a probability.
There are two main interpretations of probability: the Bayesian interpretation and the frequency interpretation. Under the Bayesian interpretation, a probability represents a degree of confidence. Under the frequency interpretation, a probability represents the frequency of an event over an infinite number of trials (or, if you prefer to avoid talking about infinities, the value the frequency approaches as you do more trials).
There used to be fierce debate in statistics over which interpretation was appropriate for statistical analysis. For example, in the introduction to his 1950 probability textbook, Feller carefully says that probability theory is for statistical probability, which he defines according to the frequency interpretation, and that if you want to learn about degrees of confidence you should go get a textbook on inductive logic. By contrast, the 1939 textbook by Jeffreys takes the Bayesian degree-of-confidence interpretation. For a taste of the debate, you could read "Confidence Intervals vs Bayesian Intervals", a talk by the Bayesian E. T. Jaynes, with responses by frequentist statisticians and counter-responses by Jaynes. The tone is often unfriendly.
However, I think by now statisticians are more or less over the debate and use both interpretations fluidly. That makes sense, because to me it's a non-issue: frequencies obviously obey the probability laws, and degrees of confidence do under certain conditions, so both are equally valid interpretations.
There's a great article on interpretations of probability in the Stanford Encyclopedia of Philosophy.
So, back to your scenario. There are actually many interpretations of the probability you could take. For example, you could consider the Bayesian degree of confidence that a speaker you meet on the street will pronounce the R the first time you hear it from them. Out of this abundance of choices, I'm going to focus on this interpretation: the fraction of tokens for which a particular individual (say, the next speaker you meet) pronounces the R's. I'll explain in a bit why I think I can pick an interpretation arbitrarily like this.
This number has only a theoretical existence; you can't really ask someone to pronounce all tokens. It's a counterfactual: what they would say if they were asked to. Let's call this number $p_R$.
So, what are you doing with your corpus? You say you are "converting" a frequency to a probability, but I think the more appropriate term is "estimating." "Estimation" is the standard term in statistics for guessing an unknown number based on some random data that has some connection to that number. What you are doing is estimating $p_R$ from the frequencies within the corpus. This will be a good estimate if the corpus comes from people similar to the next speaker you meet. But if there's a lot of variation--the speakers in the corpus are from a variety of times, or a variety of places, making them unlike that speaker--it will be a worse estimate.
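To make "estimation" concrete, here's a minimal simulation sketch in Python. The true value $p_R = 0.7$ and the corpus size are made-up numbers for illustration; the point is just that the observed frequency in a corpus of tokens from a speaker with that rate lands close to $p_R$.

```python
import random

random.seed(0)

# Hypothetical setup: suppose the speaker's true rate of pronouncing
# the R is p_R = 0.7 (a made-up number for illustration).
p_R = 0.7

# Simulate a corpus of n tokens from this speaker: each token has the
# R pronounced with probability p_R, independently of the others.
n = 10_000
corpus = [random.random() < p_R for _ in range(n)]

# The estimate of p_R is simply the observed frequency.
estimate = sum(corpus) / n
print(estimate)  # close to 0.7
```

Of course, a real corpus pools many speakers, which is exactly the complication discussed below.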
This is why I think the choice of interpretation is a bit arbitrary: the frequency in the corpus is a good estimate of more than one thing. It's a good estimate of $p_R$, but it's also a good estimate of, for example, the frequency in a second, unobserved corpus, or the frequency if you could hear all conversations happening right now, and so on. Once you let go of the idea that you are doing a strictly logically justified conversion, it can serve as an estimate of many things.
However, you may notice that I'm dismissing the issue of variation within the corpus to some realm outside of probability theory. I'm saying that if there's a lot of temporal variation, it's a bad estimate for the language as currently spoken, but I'm treating that as a qualitative judgment, not something subject to calculation. In fact, there are mathematical tools within probability theory for dealing with exactly this kind of thing.
Suppose that the language changes every year, and consider some probability of interest, such as the probability of pronouncing an R, an event we'll name $R$.
Let $P(R|Y=y)$ be the probability of pronouncing the R in year $y$. This is a conditional probability.
Then, suppose you have a corpus, which is from a distribution of years. Let $P(Y=y)$ be the probability of a conversation from year $y$ ending up in your corpus.
Now, let's define the marginal probability $P(R)$ to be the probability of a random pronunciation that makes it into your corpus having the R pronounced.
This marginal probability is related to the conditional probability by the law of total probability:
$$P(R) = \sum_y P(R|Y=y)P(Y=y)$$
The frequency you observe will likely be close to $P(R)$.
You can view this as a weighted average. (If each of the last ten years has equal probability of inclusion, then $P(Y=y)=1/10$, there are ten conditional probabilities, and $P(R)$ is their plain average.) If what you're really interested in is $P(R|Y=2016)$, then $P(R)$ will only be close to it if $P(Y=y)$ is high for nearby years, since those have similar conditional probabilities.
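Here's the weighted average worked out numerically. The conditional probabilities are hypothetical (a linear drift over ten years); with uniform inclusion probabilities $P(Y=y)=1/10$, the law of total probability reduces to the plain average of the conditionals.

```python
# Hypothetical conditional probabilities P(R | Y=y) for ten years,
# drifting linearly from 0.50 up to 0.77 (made-up numbers).
years = list(range(2007, 2017))
p_R_given_y = {y: 0.50 + 0.03 * (y - 2007) for y in years}

# Uniform inclusion probability: P(Y=y) = 1/10 for each year.
p_y = {y: 1 / len(years) for y in years}

# Law of total probability: P(R) = sum over y of P(R|Y=y) * P(Y=y).
p_R = sum(p_R_given_y[y] * p_y[y] for y in years)

# With uniform weights, this equals the plain average of the conditionals.
plain_average = sum(p_R_given_y.values()) / len(years)
print(p_R, plain_average)  # both 0.635
```

Note how far the marginal 0.635 sits from the hypothetical 2016 value of 0.77: a corpus spread evenly over a decade of drift estimates the decade's average, not the present.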
One keyword of interest might be "hierarchical model": what we have here is a simple hierarchical model, with a distribution over years at the top level and, beneath it in the hierarchy, a distribution over whether the R is pronounced.
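This two-level hierarchy can be simulated directly (again with hypothetical numbers): first sample a year, then sample whether the R is pronounced given that year. The frequency over many simulated tokens lands near the marginal $P(R)$, just as the law of total probability says.

```python
import random

random.seed(1)

# Hypothetical two-level model: a uniform distribution over ten years,
# and under each year a probability of pronouncing the R (made up).
years = list(range(2007, 2017))
p_R_given_y = {y: 0.50 + 0.03 * (y - 2007) for y in years}

def sample_token():
    y = random.choice(years)                 # top level: which year
    return random.random() < p_R_given_y[y]  # bottom: R pronounced?

n = 100_000
freq = sum(sample_token() for _ in range(n)) / n

# By the law of total probability, this should be near the weighted
# average of the conditionals, which for these numbers is 0.635.
print(freq)
```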
So, you asked whether this conversion from a frequency to a probability assumes a static system. My answer is: if what you are interested in is the language as spoken in the present moment, then a corpus of recent speech provides a better estimate than a corpus going back several decades. As I type that out, I feel a bit stupid for saying something so obvious-sounding. To go back to being philosophical and fancy, I'd say you should reframe the question from "what do we have to assume in order for this conversion to be logically justified?" to "what are the conditions under which we get a good or bad estimate of what we're really interested in?"