$\begingroup$

I have a fairly small data set (on the order of 8-20 values) from an essentially unknown system, and I would like to predict a value that will be higher than the next number generated by the same system 90% of the time. Both underestimation and overestimation are problematic.

What is the mathematically "correct" way to do this?

If I could also generate a level-of-confidence estimate, it would wow my manager. Also, let me say I'm not a math major, so thanks for any help, however remedial it may be :)

  • 0
    You don't have any information about the random numbers, such as their distribution? Are all generated numbers equally likely? What is the range of the numbers? 2011-02-14
  • 0
    @Matt: the numbers are not random, but the system is unknown. Values can range from 60 to 200M. In some sets the system is very regular (350-400 across the board), and in others it goes wild (300 to 12M with no discernible distribution). 2011-02-14
  • 0
    The Wikipedia article on this topic is worthless, which is surprising because nonparametric prediction intervals have seen wide use in environmental monitoring during the last 20 years. See http://info.ngwa.org/gwol/pdf/912554528.PDF, which includes a sketch of the theory. 2011-02-14
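
For readers who want to try the nonparametric prediction limit mentioned in the last comment, here is a minimal sketch. It is not taken from the linked paper. It relies on one standard fact: if the past values and the next value are exchangeable draws from a continuous distribution (a real assumption, and one that may not hold for an "unknown system"), then the next value falls at or below the k-th smallest of n past values with probability k/(n+1). The function name `upper_prediction_limit` and the sample numbers are made up for illustration.

```python
import math


def upper_prediction_limit(sample, coverage=0.90):
    """Return (limit, achieved_coverage) from order statistics.

    Assumption: past values and the next value are exchangeable draws
    from a continuous distribution, so that
        P(next <= k-th smallest of n) = k / (n + 1).
    """
    n = len(sample)
    # Smallest rank k with k / (n + 1) >= coverage; the tiny offset
    # guards against floating-point noise in coverage * (n + 1).
    k = math.ceil(coverage * (n + 1) - 1e-9)
    if k > n:
        raise ValueError(
            f"{n} values cannot give {coverage:.0%} coverage; "
            f"the maximum only reaches {n / (n + 1):.1%}"
        )
    ordered = sorted(sample)
    return ordered[k - 1], k / (n + 1)


# Hypothetical data in the "regular" 350-400 range described above.
values = [352, 398, 371, 365, 389, 377, 360, 394, 383]
limit, achieved = upper_prediction_limit(values)
print(limit, achieved)  # 398 0.9
```

With 9 values the limit is simply the sample maximum, and with 19 values it is the second largest. With only 8 values the maximum covers 8/9 (about 88.9%), so the sketch raises an error rather than claim 90%. The achieved coverage k/(n+1) is the figure one could report alongside the limit, subject to the exchangeability assumption.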

4 Answers