The key section of the Wikipedia article says
The median-calculating recursive call does not exceed worst-case
linear behavior because the list of medians is 20% of the size of the
list, while the other recursive call recurse on at most 70% of the
list, making the running time $$T(n) \leq T(n/5) + T(7 \cdot n/10) + O(n).$$
The O($n$) is for the partitioning work (we visited each element a
constant number of times, in order to form them into $n/5$ groups and
take each median in O($1$) time). From this, one can then show that
$$T(n) \leq c \cdot n \cdot (1 + (9/10) + (9/10)^2 + \cdots) \in O(n).$$
using the fact that at most 70% of the list is to one side of the median of the medians with groups of five.
If instead you had groups of three the first inequality would be $$T(n) \leq T(n/3) + T(2 \cdot n/3) + O(n)$$ so you would not get a convergent series in in the second inequality.
There is no reason why you should not use something greater than five; for example with seven the first inequality would be $$T(n) \leq T(n/7) + T(5 \cdot n/7) + O(n)$$ which also works, but five is the smallest odd number (useful for medians) which works.