The key section of the Wikipedia article says
The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians is 20% of the size of the list, while the other recursive call recurse on at most 70% of the list, making the running time $T(n) \leq T(n/5) + T(7 \cdot n/10) + O(n).$
The O($n$) is for the partitioning work (we visited each element a constant number of times, in order to form them into $n/5$ groups and take each median in O($1$) time). From this, one can then show that $T(n) \leq c \cdot n \cdot (1 + (9/10) + (9/10)^2 + \cdots) \in O(n).$
using the fact that at most 70% of the list is to one side of the median of the medians with groups of five.
If instead you had groups of three the first inequality would be $T(n) \leq T(n/3) + T(2 \cdot n/3) + O(n)$ so you would not get a convergent series in in the second inequality.
There is no reason why you should not use something greater than five; for example with seven the first inequality would be $T(n) \leq T(n/7) + T(5 \cdot n/7) + O(n)$ which also works, but five is the smallest odd number (useful for medians) which works.