
I'm given nucleotide counts $ \{ n_X : X \in \Omega \} $ to model a DNA sequence $y$ under an independence model, where $ \Omega $ denotes the DNA alphabet. We also suppose that the prior density of the nucleotide probabilities $ p = \{ p_X : X \in \Omega \}$ is a Dirichlet distribution with parameters $ \{ \beta_X : X \in \Omega \}$, and we set $n = \sum_{X \in \Omega} n_X$ and $\beta=\sum_{X\in\Omega} \beta_X$.
I derived the likelihood $f(y|p)$ to be a multinomial distribution with parameters $n$ and $p$, so that $ \displaystyle f(y|p) = \frac{n!}{\prod_{X \in \Omega} n_X!} \prod_{X \in \Omega} p_X^{n_X}$. Since the prior is a Dirichlet distribution, we also have $\displaystyle f(p)=\frac{\Gamma(\beta)}{\prod_{X \in \Omega} \Gamma(\beta_X)} \prod_{X \in \Omega} p_X^{\beta_X-1}$, and I derived the joint density to be $ \displaystyle f(y,p)= \frac{\Gamma(\beta)}{\prod_{X \in \Omega} \Gamma(\beta_X)} \frac{n!}{\prod_{X \in \Omega} n_X!}\prod_{X \in \Omega} p_X^{n_X+\beta_X-1} $.

Then, using the fact that on the simplex $ \Delta = \{ p : p_X > 0, \ \sum_{ X \in \Omega} p_X = 1 \} $ we have $ \displaystyle \int_\Delta \prod_{X \in \Omega} p_X^{n_X+\beta_X-1} dp = \frac{ \prod_{X \in \Omega} \Gamma(n_X+\beta_X)}{\Gamma(n+\beta)}$, I computed the marginal density $\displaystyle f(y) = \int_\Delta f(y,p) dp = \frac{n! \Gamma(\beta) }{\Gamma(n+\beta)} \prod_{X \in \Omega} \frac{\Gamma(n_X+\beta_X)}{\Gamma(\beta_X)n_X!} = \frac{1}{ {n+\beta-1 \choose n} } \prod_{X \in \Omega} { {n_X+\beta_X-1 \choose n_X} } $
(which appears to be the pmf of a Dirichlet-multinomial distribution, according to Wikipedia).
However, the exercise suggests that the answer should be $\displaystyle f(y)=\frac{1}{ {n+\beta \choose n} } \prod_{X \in \Omega} { {n_X+\beta_X \choose n_X} }$
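
A quick numerical sanity check of the two candidate formulas, as a sketch only: the function names, the example counts, and the Dirichlet parameters below are made up for illustration, and the binomial coefficient for non-integer arguments is assumed to mean ${a \choose k} = \Gamma(a+1)/(\Gamma(k+1)\Gamma(a-k+1))$. The code compares the Gamma-function form with both binomial forms and checks whether each one sums to 1 over all count vectors with a fixed $n$.

```python
from itertools import product
from math import exp, isclose, lgamma


def log_marginal_gamma(counts, betas):
    """log f(y) written directly with Gamma functions."""
    n, beta = sum(counts), sum(betas)
    out = lgamma(n + 1) + lgamma(beta) - lgamma(n + beta)
    for n_x, b_x in zip(counts, betas):
        out += lgamma(n_x + b_x) - lgamma(b_x) - lgamma(n_x + 1)
    return out


def log_gen_binom(a, k):
    """log of the generalized binomial coefficient C(a, k) (assumed convention)."""
    return lgamma(a + 1) - lgamma(k + 1) - lgamma(a - k + 1)


def log_marginal_binom(counts, betas, shift):
    """Binomial form: shift=-1 is the derived formula, shift=0 the exercise's."""
    n, beta = sum(counts), sum(betas)
    out = -log_gen_binom(n + beta + shift, n)
    for n_x, b_x in zip(counts, betas):
        out += log_gen_binom(n_x + b_x + shift, n_x)
    return out


def count_vectors(n, k):
    """All k-tuples of non-negative integers that sum to n."""
    for head in product(range(n + 1), repeat=k - 1):
        if sum(head) <= n:
            yield head + (n - sum(head),)


# Hypothetical example values, not taken from the exercise.
counts = [3, 1, 2, 4]
betas = [0.5, 1.5, 2.0, 1.0]

gamma_form = log_marginal_gamma(counts, betas)
derived_form = log_marginal_binom(counts, betas, shift=-1)
exercise_form = log_marginal_binom(counts, betas, shift=0)
print("Gamma form matches derived form: ", isclose(gamma_form, derived_form))
print("Gamma form matches exercise form:", isclose(gamma_form, exercise_form))

# A pmf over count vectors with n fixed should sum to 1.
total_derived = sum(exp(log_marginal_binom(c, betas, -1)) for c in count_vectors(4, 4))
total_exercise = sum(exp(log_marginal_binom(c, betas, 0)) for c in count_vectors(4, 4))
print("Sum of derived form over counts: ", total_derived)
print("Sum of exercise form over counts:", total_exercise)
```

Working in log space with `lgamma` avoids overflow for large counts; the normalization check is limited to a small $n$ so that the enumeration stays cheap.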

Where is my mistake?

Thanks in advance!

  • 0
    What I found is actually $ { n+\beta - 1 \choose n } $ instead of $ { n+\beta \choose n } $ (and likewise for the other factors). 2017-02-23
  • 0
    It's a bit odd to use the factorial symbol for $n+\beta$, which might not be an integer, so maybe they assumed that we could write $\Gamma(n+\beta)=(n+\beta)!$ or $ \Gamma(\beta) = \beta! $, in which case we get "their" result. I really don't see where my mistake could be, since the Dirichlet PDF is taken from Wikipedia. Unless the likelihood is not a multinomial distribution? 2017-02-23
  • 0
    Maybe I'm missing something trivial, but why is there no $y$ on the right-hand side of any formula for $f(y...)$? 2017-02-24
  • 0
    Because $ f(y) = f( n_1, \dots , n_X , \dots ) $, meaning the DNA sequence is represented by its nucleotide counts $ \{ n_X : X \in \Omega \}$. 2017-02-25
  • 0
    More precisely, if $N_X$ is the random variable representing the number of nucleotides of type $ X \in \Omega $, then $f(y|p) = P( N_1 = n_1 , N_2 = n_2 , \dots , N_X = n_X , \dots ) = \frac{n!}{\prod_{X \in \Omega } n_X!} \prod_{ X\in \Omega} p_X^{n_X}$. 2017-02-25
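
For reference, the step relating the Gamma functions to the two binomial forms discussed in the comments can be written out as follows. This is a sketch that assumes the generalized binomial coefficient ${a \choose k} = \frac{\Gamma(a+1)}{\Gamma(k+1)\,\Gamma(a-k+1)}$ for non-integer $a$, and reads $b!$ as $\Gamma(b+1)$; neither convention is stated in the exercise.

$$ \frac{\Gamma(n+\beta)}{n!\,\Gamma(\beta)} = \frac{\Gamma\big((n+\beta-1)+1\big)}{\Gamma(n+1)\,\Gamma\big((n+\beta-1)-n+1\big)} = {n+\beta-1 \choose n}, \qquad \frac{\Gamma(n+\beta+1)}{n!\,\Gamma(\beta+1)} = \frac{(n+\beta)!}{n!\,\beta!} = {n+\beta \choose n}. $$

The first identity gives the $-1$ form; the second expression only arises if $\Gamma(b)$ is replaced by $b!$ instead of $(b-1)!$, which would reproduce the form stated in the exercise.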
