2

(first-time post alert)

Hello All,

I have an empirical binary sequence (i.e. 0/1) of observations taken for equal discrete time intervals. (A typical length of such a sequence would be about 120). The probability for a "1" may decrease over time.

Now I'd like to code a procedure which generates "similar" sequences, so that I can obtain longer series with "essentially" the same behaviour; and in order to do that I'd need parameters describing the sequences. So the goal here is "merely" description, not "real statistics" in the sense of trying to estimate any "true" parameters from those sample sequences.

In which general direction should I be looking? Moving averages? Autocorrelation? Markov chains? (Probably not the latter: it may well be that event(i) "looks back further" than event(i-1) ) Something completely different?

I'm aware it's a rather broad (and probably ill-defined :-) question, but I am not expecting detailed analyses or guidelines; I'm asking "just" for some appropriate pointers or keywords to get me started, so that I do not take off into a completely wrong direction.

Essentially, I am trying to find the right terms for the search engines, and then will take it from there... :-)

Thanks for your help!

  • 0
    There is a separate Stack Exchange site just for data analysis: http://stats.stackexchange.com/ 2011-12-05
  • 0
    I see you didn't like my edit (!?). Note that it is common here not to include a greeting or other unnecessary information in questions. Nevertheless, welcome to math.stackexchange :) 2011-12-05
  • 0
    Without knowing _what_ behavior you consider to be "essentially the same", it is hard to know what to suggest. Different methods try to simulate different aspects. For example, if you want roughly equal numbers of $0$ and $1$ bits, you could use a PN sequence (a.k.a. pseudorandom or $m$-sequence) or you could just use a "divide-by-2" circuit on your clock to generate alternating $0$s and $1$s. Both replicate the "essential" roughly equal numbers of $0$s and $1$s. Which one is more suitable to your application? 2011-12-05
  • 0
    @Listing #1: Oh I see, thanks, very good tip. But can I now re-post the question there, or would that be considered bad manners? (Like I said: newbie...) 2011-12-05
  • 0
    @Listing #2: "I see you didn't like my edit (!?)" Some crossed wires here: My own edit was just adding the last sentence; I did in fact not notice yours. I did notice the tags had changed, and found that an improvement. Nothing else intentionally rejected. 2011-12-05
  • 0
    @clüles You can ask a moderator to migrate your question. I think there was some confusion with the edit because we edited at the same time, so some of the changes I made were not applied, but it's no problem. 2011-12-05
  • 0
    @Dilip Sarwate: "behaviour": things like e.g. probabilities (over the whole sequence or parts thereof), change of those probabilities over time, number / length of "runs", etc. Finding out what "essentially the same" actually means is just the question. The practical goal is to generate a sequence which in its characteristics is "indistinguishable" from the observed ones. I do realize these questions may well be un-answerable. (It's not about _checking data against_ a hypothesis, it's about _finding_ one.) 2011-12-05

2 Answers

1

For some arbitrarily chosen (but not too big) $m$, consider the sequence as an order-$m$ Markov chain. That is, you look at the conditional probabilities for $a_t$ given $a_{t-1}, a_{t-2}, \ldots, a_{t-m}$.
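As a rough illustration of this idea, here is a minimal Python sketch that estimates the conditional probabilities by counting and then generates a longer sequence from them. The function names `fit_markov` and `simulate`, the fallback for unseen histories, and the sample data are assumptions made for the example, not part of the suggestion above. It assumes $m \ge 1$ and a sequence longer than $m$. Note also that it uses fixed transition probabilities, so on its own it would not reproduce a probability of "1" that drifts over time.

```python
import random
from collections import defaultdict

def fit_markov(seq, m):
    """Count how often each length-m history is followed by a 0 or a 1."""
    counts = defaultdict(lambda: [0, 0])  # history -> [count of 0, count of 1]
    for t in range(m, len(seq)):
        history = tuple(seq[t - m:t])
        counts[history][seq[t]] += 1
    return counts

def simulate(counts, seed_bits, n, rng=random):
    """Generate n bits, using seed_bits (length m >= 1) as the initial history."""
    m = len(seed_bits)
    out = list(seed_bits)
    # Assumed fallback for histories never seen in the data:
    # the overall frequency of 1s among all counted transitions.
    total0 = sum(c[0] for c in counts.values())
    total1 = sum(c[1] for c in counts.values())
    p_fallback = total1 / (total0 + total1)
    while len(out) < n:
        c = counts.get(tuple(out[-m:]))
        p1 = c[1] / (c[0] + c[1]) if c else p_fallback
        out.append(1 if rng.random() < p1 else 0)
    return out[:n]

# Made-up observations, only for demonstration.
observed = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
counts = fit_markov(observed, m=2)
longer = simulate(counts, observed[:2], n=50, rng=random.Random(0))
print(longer)
```

With a sequence of about 120 observations, a larger $m$ leaves very few samples per history ($2^m$ histories in total), so small values of $m$ are probably the only ones that give usable estimates.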

  • 0
    Thanks, this sounds useful. I didn't realize (or rather, had forgotten) that there are M-chains which actually "look back" _more_ than just 1 step. 2011-12-05
0

Here is an example program that simulates flips of a biased coin in Python. You can design a function that decreases the probability of getting a 1 with each toss.

```python
import random

def flip(p):
    # Return 1 with probability p, otherwise 0.
    return 1 if random.random() < p else 0

p = 0.5  # initially, 0 and 1 are equally likely

for i in range(120):
    p = p - 0.5 / 120  # example function that decreases the probability of getting 1
    print(flip(p), end=" ")
```
  • 0
    Thanks, instructive example. But it would take care only of the changing _probability_, but not necessarily of other features such as run length which might also be needed to "reproduce" the observed sequence. 2011-12-05
  • 0
    It will. Just change 120 to some other length. 2011-12-05
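
One way to check whether a generated sequence reproduces the features raised in the comments (run lengths, and the probability of a "1" changing over time) is to compute the same summaries for the observed and the generated sequence and compare them. The following is only a sketch: the helper names `run_lengths` and `windowed_frequency`, the window size, and the sample data are assumptions chosen for illustration.

```python
from itertools import groupby

def run_lengths(seq):
    """Return (bit, length) for each maximal run of equal bits."""
    return [(bit, len(list(group))) for bit, group in groupby(seq)]

def windowed_frequency(seq, window):
    """Fraction of 1s in consecutive non-overlapping windows."""
    chunks = [seq[i:i + window] for i in range(0, len(seq), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]

# Made-up observations, only for demonstration.
observed = [1, 1, 0, 1, 1, 1, 0, 0]
print(run_lengths(observed))            # runs of 1s and 0s with their lengths
print(windowed_frequency(observed, 4))  # share of 1s in each block of 4
```

If the windowed frequencies fall over time in both sequences and the distributions of run lengths look alike, the generator captures at least those two aspects; other aspects would need their own summaries.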