Most authors develop Bayesian inference using the gamma distribution as the prior distribution. What are the practical motivations behind using the gamma prior for Bayesian estimation, apart from its computational ease and nice theoretical properties?
Why choose the gamma distribution as a prior?
3 Answers
Your claim is incorrect. The gamma distribution is not always a suitable prior for a given Bayesian model of the data distribution.
If the data is (univariate) normally distributed, a suitable prior distribution for the mean would also be normal.
If the data is binomial or Bernoulli distributed, a suitable prior for the success probability parameter would be beta distributed.
So right there, I have mentioned two examples of commonly used parametric models for which the "suitable" prior is not gamma distributed. I have no idea why you think "most authors developed Bayesian inference by considering gamma distributions as priors." This statement makes no sense.
This raises a related question: what do I mean by "suitable"? Well, clearly it would not make sense to model a binomial probability of success as a random variable whose support extends beyond $[0,1]$. For example, it makes no sense to give such a probability a normal distribution, because a normal distribution assigns strictly positive probability to values that are negative or greater than 1.
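To make the support argument concrete, here is a small Python sketch that computes how much prior probability a normal distribution places outside $[0,1]$. The prior mean 0.9 and standard deviation 0.1 are hypothetical values chosen for illustration, and the helper names are illustrative only.

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) at x, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mass_outside_unit_interval(mu: float, sigma: float) -> float:
    """Prior probability that a N(mu, sigma^2) 'probability' is below 0 or above 1."""
    below_zero = normal_cdf(0.0, mu, sigma)
    above_one = 1.0 - normal_cdf(1.0, mu, sigma)
    return below_zero + above_one

# Hypothetical prior belief: success probability around 0.9, sd 0.1
leak = mass_outside_unit_interval(0.9, 0.1)
print(f"P(p < 0 or p > 1) = {leak:.4f}")  # 0.1587
```

Roughly 16% of this prior's mass sits on impossible values, whereas a beta prior with a similar mean and spread puts none there.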
But there is more to the "suitability" of a prior distribution than considerations of support. A frequently desired property is conjugacy, i.e., the posterior distribution belongs to the same parametric family as the prior distribution.
So, for example, with a normal likelihood (with known variance) and a normal prior on the mean, the posterior distribution of the mean is also normal. For a Bernoulli or binomial likelihood and a beta prior, the posterior distribution of the success probability is also beta.
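A minimal Python sketch of these two conjugate updates, assuming the data variance `sigma2` is known in the normal case. The class and function names are hypothetical, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass

@dataclass
class Normal:
    mean: float
    var: float

@dataclass
class Beta:
    a: float
    b: float

def update_normal_mean(prior: Normal, data: list[float], sigma2: float) -> Normal:
    """Normal prior on the mean + normal likelihood with known variance sigma2 -> normal posterior."""
    n = len(data)
    post_precision = 1.0 / prior.var + n / sigma2  # precisions add
    post_mean = (prior.mean / prior.var + sum(data) / sigma2) / post_precision
    return Normal(post_mean, 1.0 / post_precision)

def update_beta(prior: Beta, successes: int, trials: int) -> Beta:
    """Beta prior + binomial likelihood -> beta posterior (add successes and failures)."""
    return Beta(prior.a + successes, prior.b + trials - successes)

print(update_normal_mean(Normal(0.0, 1.0), [1.0, 2.0, 3.0], sigma2=1.0))
# Normal(mean=1.5, var=0.25)
print(update_beta(Beta(1.0, 1.0), successes=7, trials=10))
# Beta(a=8.0, b=4.0)
```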
For a Poisson likelihood and a gamma prior, the posterior distribution of the Poisson rate parameter is also gamma.
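The gamma-Poisson case can be sketched the same way. This assumes the shape/rate parameterization of the gamma (prior mean = shape / rate); the function name and the counts are hypothetical.

```python
def update_gamma_poisson(shape: float, rate: float, counts: list[int]) -> tuple[float, float]:
    """Gamma(shape, rate) prior on a Poisson rate + observed counts -> Gamma posterior.

    Shape gains the total count; rate gains the number of observations.
    """
    return shape + sum(counts), rate + len(counts)

shape, rate = update_gamma_poisson(2.0, 1.0, [3, 5, 4])
print(shape, rate)   # 14.0 4.0
print(shape / rate)  # posterior mean of the Poisson rate: 3.5
```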
The practical motivation for desiring a conjugate prior is obvious: when the prior is conjugate, the posterior belongs to the same parametric family, so it can serve directly as the prior when new data arrive, and updating one's beliefs reduces to updating a few parameters. If your posterior does not belong to the same family as the prior, then it is potentially difficult to update your belief about the distribution of the parameter of interest.
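To illustrate the updating point, the sketch below (Python, with a hypothetical gamma-Poisson helper and made-up counts, shape/rate parameterization assumed) feeds each posterior back in as the prior for the next observation and checks that the result matches a single batch update. Because the posterior stays in the gamma family, each step only changes two numbers.

```python
def update_gamma_poisson(shape: float, rate: float, counts: list[int]) -> tuple[float, float]:
    """Gamma(shape, rate) prior on a Poisson rate -> Gamma posterior."""
    return shape + sum(counts), rate + len(counts)

counts = [3, 5, 4]

# Update once with all the data
batch = update_gamma_poisson(2.0, 1.0, counts)

# Update one observation at a time: each posterior becomes the next prior
sequential = (2.0, 1.0)
for x in counts:
    sequential = update_gamma_poisson(*sequential, [x])

print(batch, sequential)  # (14.0, 4.0) (14.0, 4.0)
```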
The normal distribution can be a standard choice for quantities on $(-\infty, \infty)$ because that is its support, and the beta distribution can be a standard choice for quantities on $(0, 1)$ for the same reason. Similarly, the gamma distribution can be a standard choice for positive continuous quantities, i.e. those on $(0, \infty)$, because that is the support of the gamma distribution. It is thus often used as a prior for the precision $\tau = \frac{1}{\sigma^2}$ of a normal distribution.
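As a sketch of the precision case, assume the normal mean $\mu$ is known and put a $\text{Gamma}(\alpha, \beta)$ prior (shape/rate) on $\tau$. The likelihood is proportional to $\tau^{n/2} \exp\left(-\tau \sum_i (x_i - \mu)^2 / 2\right)$, so the posterior is $\text{Gamma}\left(\alpha + n/2,\ \beta + \sum_i (x_i - \mu)^2 / 2\right)$. The function name and data below are hypothetical.

```python
def update_precision(shape: float, rate: float, data: list[float], mu: float) -> tuple[float, float]:
    """Gamma(shape, rate) prior on the precision tau of N(mu, 1/tau) data, mu assumed known."""
    ss = sum((x - mu) ** 2 for x in data)  # sum of squared deviations from the known mean
    return shape + len(data) / 2, rate + ss / 2

shape, rate = update_precision(1.0, 1.0, [1.0, -1.0, 2.0, -2.0], mu=0.0)
print(shape, rate)           # 3.0 6.0
print(shape / rate)          # posterior mean of tau: 0.5
print(rate / (shape - 1))    # posterior mean of sigma^2 = 1/tau (inverse gamma, needs shape > 1): 3.0
```

The last line uses the fact that if $\tau$ is gamma distributed, then $\sigma^2 = 1/\tau$ is inverse-gamma distributed with the same parameters.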
I believe the main motivation for the gamma prior is usually to constrain the random variables to positive values. This is especially useful when dealing with random variables that represent the variance of another distribution.