In the following data, I am trying to run a simple Markov model.

block_M1    block_M2    hybrid_block    block_S1    block_S2
A|T         T|A         A|C             C|G         T|A
T|G         C|T         T|A             A|T         C|A
C|A         A|G         C|G             G|A         C|G
G|T         A|T         G|T             C|T         A|T

So block M contains several strings that belong to one category. Reading down the left and right side of each `|`, block M1 holds the strings ATCG and TGAT, and block M2 holds TCAA and ATGT. Similarly, block S has 4 strings in total (CAGC and GTAT in S1, TCCA and AAGT in S2).
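A minimal Python sketch of reading the table above into strings, assuming each cell `x|y` gives the nucleotide of a block's left and right string at that row, so each block column yields two strings read top to bottom. `parse_blocks` is a hypothetical helper, not part of any library:

```python
# Assumption: in each cell "x|y", x belongs to the block's left string and
# y to its right string; rows are consecutive positions.
TABLE = """\
block_M1    block_M2    hybrid_block    block_S1    block_S2
A|T         T|A         A|C             C|G         T|A
T|G         C|T         T|A             A|T         C|A
C|A         A|G         C|G             G|A         C|G
G|T         A|T         G|T             C|T         A|T
"""

def parse_blocks(text):
    """Return {block_name: (left_string, right_string)}."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    blocks = {}
    for col, name in enumerate(header):
        pairs = [row[col].split("|") for row in rows]
        left = "".join(p[0] for p in pairs)   # left side of every "|", top to bottom
        right = "".join(p[1] for p in pairs)  # right side of every "|", top to bottom
        blocks[name] = (left, right)
    return blocks

blocks = parse_blocks(TABLE)
for name, strings in blocks.items():
    print(name, strings)
# block_M1 ('ATCG', 'TGAT')
# block_M2 ('TCAA', 'ATGT')
# hybrid_block ('ATCG', 'CAGT')
# block_S1 ('CAGC', 'GTAT')
# block_S2 ('TCCA', 'AAGT')
```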

The hybrid block has two strings (ATCG and CAGT): one of them is inherited from block M and the other from block S.

I am trying to build a Markov model that can help me identify which string in the hybrid block came from which block. In this example I can tell that ATCG in the hybrid block came from block M and CAGT came from block S.

I followed the example in this link: http://web.stanford.edu/class/stats366/exs/HMM1.html

Unlike the CpG island problem, which uses a Markov model to compute transition probabilities from nucleotide 's' to nucleotide 't' along the length of a given sequence, our model needs to compute the transition probability from nucleotide 's' (at the previous position) to nucleotide 't' (at the next position) using the strings of each block (M and S).

So the transition probability of the Markov model for block M can be written as:

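$a_{s,t}^{m} = \frac{c_{s,t}^{m}}{\sum_{k} c_{s,k}^{m}}$

where $c_{s,t}^{m}$ is the number of times nucleotide $s$ at one position is followed by nucleotide $t$ at the next position, counted over all strings in block M, and the sum runs over all nucleotides $k \in \{A, C, G, T\}$.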

A similar Markov model can be built for block S.
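A sketch of the estimate $a_{s,t}^{m} = c_{s,t}^{m} / \sum_k c_{s,k}^{m}$ in Python, pooling the position-to-next-position transitions of all strings in one block. The block strings are read off the table (M1 and M2 for M, S1 and S2 for S); `transition_probs` and its `pseudocount` argument are illustrative assumptions, and `pseudocount=0.0` reproduces the formula exactly:

```python
from collections import Counter

NUCLEOTIDES = "ACGT"

def transition_probs(strings, pseudocount=0.0):
    """Estimate a[s][t] = c[s,t] / sum_k c[s,k], pooling the transitions
    (position i -> position i+1) of every string in one block."""
    counts = Counter()
    for seq in strings:
        for s, t in zip(seq, seq[1:]):
            counts[s, t] += 1
    probs = {}
    for s in NUCLEOTIDES:
        row_total = sum(counts[s, k] for k in NUCLEOTIDES) + pseudocount * len(NUCLEOTIDES)
        # Rows with no observed transitions (and no pseudocount) are left as all zeros.
        probs[s] = {
            t: (counts[s, t] + pseudocount) / row_total if row_total > 0 else 0.0
            for t in NUCLEOTIDES
        }
    return probs

block_M = ["ATCG", "TGAT", "TCAA", "ATGT"]  # M1 and M2 strings from the table
block_S = ["CAGC", "GTAT", "TCCA", "AAGT"]  # S1 and S2 strings from the table

a_M = transition_probs(block_M)
a_S = transition_probs(block_S)
print(a_M["A"])  # {'A': 0.25, 'C': 0.0, 'G': 0.0, 'T': 0.75}
print(a_S["C"])  # {'A': 0.6666666666666666, 'C': 0.3333333333333333, 'G': 0.0, 'T': 0.0}
```

Note the zero entries: A→G never occurs in block M, so any string containing that transition gets probability 0 under this unsmoothed estimate; a pseudocount greater than 0 avoids that.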

But unlike in the CpG island problem, I have to model the Markov process so that deciding which hybrid string belongs to which main block combines (multiplies) the transition probabilities across all the observations in the string, i.e., something like p(A|C) p(G|A) p(C|G) p(C|C), evaluated under both the M-block and the S-block models.
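A minimal sketch of that scoring, under two assumptions that are not in the question: (1) Laplace smoothing with `pseudocount=1.0`, so that transitions never seen in a block (such as A→G in block M) do not force the probability to 0, and (2) the first nucleotide is not scored, only the transitions. The product of transition probabilities is computed as a sum of logs to avoid underflow on longer strings, and because exactly one hybrid string comes from each block, both possible assignments are compared. All function names are hypothetical:

```python
import math
from collections import Counter
from itertools import permutations

NUCLEOTIDES = "ACGT"

def transition_probs(strings, pseudocount=1.0):
    """Laplace-smoothed a[s][t] from pooled i -> i+1 transitions of one block."""
    counts = Counter()
    for seq in strings:
        for s, t in zip(seq, seq[1:]):
            counts[s, t] += 1
    return {
        s: {
            t: (counts[s, t] + pseudocount)
            / (sum(counts[s, k] for k in NUCLEOTIDES) + pseudocount * len(NUCLEOTIDES))
            for t in NUCLEOTIDES
        }
        for s in NUCLEOTIDES
    }

def log_likelihood(seq, a):
    """log P(seq | model) = sum over i of log a[x_{i-1}][x_i] (first nucleotide not scored)."""
    return sum(math.log(a[s][t]) for s, t in zip(seq, seq[1:]))

models = {
    "M": transition_probs(["ATCG", "TGAT", "TCAA", "ATGT"]),
    "S": transition_probs(["CAGC", "GTAT", "TCCA", "AAGT"]),
}
hybrid = ["ATCG", "CAGT"]

def assignment_score(labels):
    """Total log-likelihood when hybrid[i] is assumed to come from block labels[i]."""
    return sum(log_likelihood(seq, models[lab]) for seq, lab in zip(hybrid, labels))

# One string comes from M and the other from S: try ("M", "S") and ("S", "M").
best = max(permutations(models), key=assignment_score)
assignment = dict(zip(hybrid, best))
print(assignment)  # {'ATCG': 'M', 'CAGT': 'S'}
```

With this data the assignment ATCG → M, CAGT → S scores higher, which agrees with the assignment stated in the question.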

Any suggestions on how I can take this further?

Also, is it better to model the start state as p(A|A), or rather to model the end state as p(C|C)?
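One common convention for first-order Markov chains, offered here as an assumption rather than an answer from the thread, is to give the first nucleotide its own initial probability $\pi(s)$ instead of a self-transition such as p(A|A), so that $P(x) = \pi(x_1) \prod_{i \ge 2} a_{x_{i-1},x_i}$. A sketch of estimating $\pi$ per block from the first nucleotide of each string, again with an assumed pseudocount because each block has only four strings; `start_probs` and `log_start` are hypothetical names:

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"

def start_probs(strings, pseudocount=1.0):
    """pi(s) = (number of strings starting with s + pseudocount) / normaliser."""
    firsts = Counter(seq[0] for seq in strings)
    total = len(strings) + pseudocount * len(NUCLEOTIDES)
    return {s: (firsts[s] + pseudocount) / total for s in NUCLEOTIDES}

def log_start(seq, pi):
    """Log-probability contribution of the first nucleotide under one block's model."""
    return math.log(pi[seq[0]])

pi_M = start_probs(["ATCG", "TGAT", "TCAA", "ATGT"])
pi_S = start_probs(["CAGC", "GTAT", "TCCA", "AAGT"])
print(pi_M)  # {'A': 0.375, 'C': 0.125, 'G': 0.125, 'T': 0.375}
print(pi_S)  # {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}
print(log_start("ATCG", pi_M) - log_start("ATCG", pi_S))
```

Under this convention, `log_start(seq, pi)` is simply added to the transition log-likelihood of the same string under the same block.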

Please let me know if the problem isn't clear.

Additionally, I want to write a program to model this. What approach or module should I look into?

Thanks,

  • This may be better received on the stats or programming sites, but I am not sure. Just an FYI. (2017-01-29)
  • I want to focus on programming later, after the model is complete. I thought `StackExchange Maths` was more oriented towards this problem, but I will check `crossvalidated` now. Any other forum suggestions? (2017-01-29)
  • I mean, it is fine here in my opinion, but I'm still getting the hang of what exactly is and is not off-topic. No worries about posting here, just a heads-up. (2017-01-29)
  • OK, thanks. I am hoping someone is open to discussing this problem. Stack Overflow is quite active, but how active is this forum? (2017-01-29)
  • Similarly active. (2017-01-29)
