There are at least two properties of the situation that a reasonable probability model needs to capture. First, there is positive skew: lots of people sit at the lower end of the spectrum and very few at the higher end (there are many athletes with moderate ability but very few with high ability). Second, the variance at the higher end of the spectrum is smaller: ability varies less among high-ability athletes than among moderate-ability athletes.
One could in general capture the first property with a positively skewed distribution (e.g., the Pareto). But such a distribution does not necessarily yield a lower variance at the higher end of the spectrum. A natural way to model both phenomena is to assume that the population consists of two types: 'novices' and 'experts'. You can then model novices and experts separately, each with their own mean and variance, as follows. For the sake of illustration I will assume normality, but that is not needed.
Let:
$x$ be the variable of interest (e.g., how high a person jumps).
We can assume that:
$x \sim N(\mu_n,\sigma^2_n)$ if the person is a novice,
$x \sim N(\mu_e,\sigma^2_e)$ if the person is an expert,
Let $\pi$ be the proportion of novices in the population.
By definition:
$1-\pi$ will be the proportion of experts in the population
A priori you do not know whether any particular athlete is a novice or an expert. Thus, the observed variable $x$ (e.g., how high a person jumps) can be modeled as a mixture of the above two distributions:
$f(x|-) = \pi f_n(x|\mu_n,\sigma^2_n) + (1-\pi) f_e(x|\mu_e,\sigma^2_e)$
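To make the mixture concrete, here is a minimal Python sketch that evaluates the density and draws samples by first picking a latent novice/expert label. The parameter values (jump heights in metres) are invented purely for illustration, and the helper names `normal_pdf`, `mixture_pdf` and `sample_mixture` are my own.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, pi, mu_n, sigma_n, mu_e, sigma_e):
    """Two-component mixture density: pi * f_n + (1 - pi) * f_e."""
    return (pi * normal_pdf(x, mu_n, sigma_n)
            + (1.0 - pi) * normal_pdf(x, mu_e, sigma_e))

def sample_mixture(size, pi, mu_n, sigma_n, mu_e, sigma_e, rng):
    """Draw from the mixture: pick a latent label, then the matching normal."""
    is_novice = rng.random(size) < pi
    x = np.where(is_novice,
                 rng.normal(mu_n, sigma_n, size),
                 rng.normal(mu_e, sigma_e, size))
    return x, is_novice

# Hypothetical values, chosen only for illustration
params = dict(pi=0.8, mu_n=1.2, sigma_n=0.25, mu_e=2.0, sigma_e=0.08)
rng = np.random.default_rng(0)
x, is_novice = sample_mixture(100_000, rng=rng, **params)

print("share of novices:", is_novice.mean())
print("sd among novices:", x[is_novice].std())
print("sd among experts:", x[~is_novice].std())
```

In real data the label `is_novice` is not observed; it is kept here only so the two groups can be compared.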
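Note that a mixture of normal densities is not itself a normal density (unlike a linear combination of independent normal random variables, which is normal). The mixture can be skewed or even bimodal, which is what lets it capture the two properties above. Its mean and variance are:
$E(x) = \pi \mu_n + (1-\pi) \mu_e$
$Var(x) = \pi \sigma^2_n + (1-\pi) \sigma^2_e + \pi (1-\pi) (\mu_n - \mu_e)^2$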
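As a numerical sanity check, one can simulate from the mixture and compare the sample mean and variance with the moments implied by the law of total variance. The sketch below also computes, for contrast, the variance of the weighted sum $\pi x_n + (1-\pi) x_e$ of two independent normal variables, which is a different quantity from the variance of the mixture. All parameter values are hypothetical, and `mixture_moments` is an illustrative helper of my own.

```python
import numpy as np

def mixture_moments(pi, mu_n, sigma_n, mu_e, sigma_e):
    """Mean and variance of the two-component mixture (law of total variance)."""
    mean = pi * mu_n + (1.0 - pi) * mu_e
    var = (pi * sigma_n**2 + (1.0 - pi) * sigma_e**2
           + pi * (1.0 - pi) * (mu_n - mu_e)**2)
    return mean, var

# Hypothetical values, chosen only for illustration
pi, mu_n, sigma_n, mu_e, sigma_e = 0.8, 1.2, 0.25, 2.0, 0.08

rng = np.random.default_rng(1)
n = 200_000
is_novice = rng.random(n) < pi
x = np.where(is_novice,
             rng.normal(mu_n, sigma_n, n),
             rng.normal(mu_e, sigma_e, n))

mean_theory, var_theory = mixture_moments(pi, mu_n, sigma_n, mu_e, sigma_e)
# Variance of the weighted SUM of two independent normals (not the mixture)
var_weighted_sum = pi**2 * sigma_n**2 + (1.0 - pi)**2 * sigma_e**2

print("mixture mean: theory", mean_theory, "simulated", x.mean())
print("mixture var:  theory", var_theory, "simulated", x.var())
print("weighted-sum var (different quantity):", var_weighted_sum)
```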
You could then estimate the parameters of interest by maximum likelihood, for example via the EM algorithm. Suppose that your population is such that:
$\mu_e > \mu_n$ (i.e., experts are better than novices),
$\pi > 1-\pi$ (i.e., you have more novices than experts) and
$\sigma_e < \sigma_n$ (i.e., experts have lower variance than novices)
In the above situation, given sufficient data your estimates will accurately reflect the above pattern.
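The estimation step can be sketched with a bare-bones EM loop for the two-component normal mixture. This is only an illustration under stated assumptions: the data are simulated with made-up parameter values, the starting values come from a crude median split, and the loop runs a fixed number of iterations with no convergence check. The function name `em_two_normals` is my own.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def em_two_normals(x, n_iter=300):
    """EM for a two-component normal mixture.

    Returns (pi, mu_n, sigma_n, mu_e, sigma_e), where the 'novice'
    component is the one started from the lower half of the sample.
    """
    # Crude starting values: split the sample at its median
    med = np.median(x)
    lo, hi = x[x <= med], x[x > med]
    pi = 0.5
    mu_n, sigma_n = lo.mean(), lo.std()
    mu_e, sigma_e = hi.mean(), hi.std()
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is a novice
        w_n = pi * normal_pdf(x, mu_n, sigma_n)
        w_e = (1.0 - pi) * normal_pdf(x, mu_e, sigma_e)
        r = w_n / (w_n + w_e)
        # M-step: responsibility-weighted proportion, means and sds
        pi = r.mean()
        mu_n = np.sum(r * x) / np.sum(r)
        mu_e = np.sum((1.0 - r) * x) / np.sum(1.0 - r)
        sigma_n = np.sqrt(np.sum(r * (x - mu_n)**2) / np.sum(r))
        sigma_e = np.sqrt(np.sum((1.0 - r) * (x - mu_e)**2) / np.sum(1.0 - r))
    return pi, mu_n, sigma_n, mu_e, sigma_e

# Simulated data with hypothetical true values: more novices, better and
# less variable experts
rng = np.random.default_rng(2)
n = 20_000
is_novice = rng.random(n) < 0.8
x = np.where(is_novice,
             rng.normal(1.2, 0.25, n),
             rng.normal(2.0, 0.08, n))

pi_hat, mu_n_hat, sigma_n_hat, mu_e_hat, sigma_e_hat = em_two_normals(x)
print("pi:", pi_hat)
print("novices: mean", mu_n_hat, "sd", sigma_n_hat)
print("experts: mean", mu_e_hat, "sd", sigma_e_hat)
```

Which component is called 'novice' and which 'expert' is a labeling convention; here it is fixed by the starting values. With components that overlap more heavily, or with little data, EM may need several starting points.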