Problem
I'd appreciate some ideas on how to define a formula to estimate the value of a future data point for a continuously sampled event, based on past measurements and their tendency.
At any given time, I have exactly 15 past measurements of the event.
Let's assume that what I'm trying to predict is the free throw (FT) accuracy (%) of a basketball player in Game 18.
My 15 past measurements are his FT% on Games 3, 4, 5, (...), 16 and 17.
Approach
I'm able to specify three concepts that my formula should account for:
1. Consistency
In Game 18, a player who has been scoring consistently close to his average accuracy (low standard deviation) is likely to do so again, much more so than a player with the exact same average but a much higher deviation.
2. Tendency of accuracy score
E.g.: Let's assume that a basketball player had a FT% of 80 in each of games 3 to 13.
Games 14 to 17 were a disaster, with his FT% dropping to values around 50.
For Game 18, although his average FT% is 72, the data indicates a recent drop in form, and the player is likely to score well below his average.
I'd say 60 would be an acceptable estimation in this scenario.
Giving more weight to recent events should be enough to add this notion to the formula, I think.
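To make the weighting idea concrete, here is a minimal sketch of what I have in mind, assuming exponentially decaying weights. The function name `weighted_mean` and the `decay` value of 0.8 are my own illustrative choices, not an established method or a recommended setting.

```python
def weighted_mean(values, decay=0.8):
    """Average in which the newest value has weight 1 and each step back
    in time multiplies the weight by `decay` (an assumed tuning knob)."""
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# FT% for games 3..17 from the example: eleven games at 80, then four at 50.
ft = [80] * 11 + [50] * 4

plain = sum(ft) / len(ft)   # the ordinary average
recent = weighted_mean(ft)  # pulled toward the recent slump
print(plain, round(recent, 1))
```

With `decay=0.8` the weighted value comes out at roughly 62, not far from the 60 I guessed above. A smaller `decay` forgets old games faster and moves the estimate closer to 50.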
3. Tendency of deviation
A player who used to be consistent but has recently shown high deviation is likely to score far from his average again, more likely than the overall standard deviation (as in concept 1) would indicate.
(E.g., a player with an average FT% of 60 who never scored outside the 50-70 range until game 11, but whose recent history (games 12 to 17) has been wildly up and down, with some values in the 90s and others in the 30s.)
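The same weighting could be applied to the deviation. The sketch below is only an assumption about how that might look: it computes a recency-weighted standard deviation and compares it with the plain one. The game values are made up to match the scenario (average of 60, calm until game 11, erratic afterwards), and `weighted_mean_std` and `decay` are hypothetical names.

```python
import math
import statistics


def weighted_mean_std(values, decay=0.8):
    """Recency-weighted mean and standard deviation.
    The newest value has weight 1; each older one is scaled by `decay`."""
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(var)


# Made-up FT% for games 3..17: nine calm games, then six erratic ones.
ft = [60, 55, 65, 58, 62, 57, 63, 61, 59, 90, 31, 92, 33, 80, 34]

plain_std = statistics.pstdev(ft)            # treats all games equally
recent_mean, recent_std = weighted_mean_std(ft)
print(round(plain_std, 1), round(recent_std, 1))
```

Because the erratic games carry the largest weights, the weighted deviation comes out larger than the plain one, which is the behaviour I'm after.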
I suppose that this kind of formula will be heavily based on deviation values, suitably weighted to give more importance to the most recent data points, perhaps with some offset based on positive/negative trends.
Should probability also play a part here?
Remarks
I strongly feel that there must be a standard/known method to perform this kind of analysis, hence this question before trying to craft some intricate approach.
Standard or not, it should encompass the three concepts that I've mentioned above - I'm open to suggestions for additional metrics that could help the estimation!
An important clarification: a formula that outputs an estimated range (confidence region) instead of a single value is also acceptable. For my particular application I'll end up computing a single value from that range, but that's irrelevant to the issue in question.
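As a rough illustration of the kind of range output I mean, the sketch below combines a recency-weighted mean with a recency-weighted deviation and reports center ± k deviations, clamped to 0-100. This is an assumption on my part and not a proper confidence interval; `estimate_range`, `decay` and `k` are hypothetical names and values.

```python
import math


def estimate_range(values, decay=0.8, k=1.0):
    """Return (low, center, high) for the next value.
    center: recency-weighted mean; spread: recency-weighted std deviation.
    The band is center +/- k * spread, clamped to the valid 0..100 range."""
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    center = sum(w * v for w, v in zip(weights, values)) / total
    spread = math.sqrt(
        sum(w * (v - center) ** 2 for w, v in zip(weights, values)) / total
    )
    low = max(0.0, center - k * spread)
    high = min(100.0, center + k * spread)
    return low, center, high


steady = estimate_range([80] * 15)             # perfectly consistent player
slump = estimate_range([80] * 11 + [50] * 4)   # the recent-drop example
print(steady)
print(slump)
```

A consistent player gets an essentially zero-width band around his average, while the player in a slump gets a lower center and a much wider band.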
Thanks!
