I have been working on a system to analyze horse racing data. By experimenting with various inputs to machine learning algorithms run in Weka, I have found that the best result I have achieved so far comes from the intersection of the predictions of the Naive Bayes, J48 and Ridor algorithms.
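Here is a minimal sketch of what taking "the intersection of the predictions" could look like. It assumes each classifier's output has been exported from Weka as a set of selections it predicts to win, keyed by a race/runner ID string. The ID format and the `intersect_predictions` helper are hypothetical and are not part of Weka.

```python
# Hypothetical sketch: keep only the selections on which every classifier agrees.
# The "raceN:horseX" IDs and the export format are assumptions, not Weka output.


def intersect_predictions(*predicted_winners: set[str]) -> set[str]:
    """Return the selections that every classifier predicted as winners."""
    if not predicted_winners:
        return set()
    common = set(predicted_winners[0])
    for preds in predicted_winners[1:]:
        common &= preds  # drop anything this classifier did not also pick
    return common


if __name__ == "__main__":
    naive_bayes = {"race1:horseA", "race2:horseC", "race3:horseB"}
    j48 = {"race1:horseA", "race2:horseC", "race3:horseD"}
    ridor = {"race1:horseA", "race3:horseB", "race2:horseC"}
    print(sorted(intersect_predictions(naive_bayes, j48, ridor)))
    # ['race1:horseA', 'race2:horseC']
```

Requiring agreement from all three classifiers trades the number of bets for (hopefully) a higher hit rate.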
I then run the predictions through a betting simulator. This best result achieves a hit rate of 75% with average odds of 1.26 (decimal odds, stake included, equivalent to fractional 26/100; i.e. a 1-unit winning bet returns 1 + 0.26). This is below the 1.33 (1 / 0.75) required to break even at a 75% hit rate, as this screenshot of my betting simulator demonstrates.
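The break-even arithmetic can be checked with a short sketch (the helper names are hypothetical). With decimal odds o (stake included) and hit rate h, the expected profit per 1-unit stake is h * o - 1. Break-even odds are therefore 1 / h, and the break-even hit rate at odds o is 1 / o:

```python
def break_even_odds(hit_rate: float) -> float:
    """Decimal odds (stake included) at which the expected profit is zero."""
    return 1.0 / hit_rate


def break_even_hit_rate(decimal_odds: float) -> float:
    """Hit rate needed to break even at the given decimal odds."""
    return 1.0 / decimal_odds


def expected_profit_per_unit(hit_rate: float, decimal_odds: float) -> float:
    """Expected profit per 1-unit stake: a win returns decimal_odds, a loss returns 0."""
    return hit_rate * decimal_odds - 1.0


if __name__ == "__main__":
    print(f"{break_even_odds(0.75):.3f}")                  # 1.333
    print(f"{break_even_hit_rate(1.26):.1%}")              # 79.4%
    print(f"{expected_profit_per_unit(0.75, 1.26):+.3f}")  # -0.055
```

At a 75% hit rate and average odds of 1.26, this works out to an expected loss of about 5.5% of the stake per bet.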
However, I have detected some anomalies in the winning and losing streaks:
Winning Streak (loss occurred after wins)
Wins  Count  %       Cumulative %
0     151    22.01%  22.01%
1     129    18.80%  40.82%
2     119    17.35%  58.16%
3     65     9.48%   67.64%
4     61     8.89%   76.53%
5     35     5.10%   81.63%
6     27     3.94%   85.57%
7     34     4.96%   90.52%
8     17     2.48%   93.00%
9     15     2.19%   95.19%
10    9      1.31%   96.50%
11    9      1.31%   97.81%
12    2      0.29%   98.10%
13    4      0.58%   98.69%
14    2      0.29%   98.98%
15    2      0.29%   99.27%
16    2      0.29%   99.56%
18    1      0.15%   99.71%
20    2      0.29%   100.00%
Total 686
Losing Streak (win occurred after losses)
Losses  Count  %       Cumulative %
0       1531   74.14%  74.14%
1       414    20.05%  94.19%
2       94     4.55%   98.74%
3       21     1.02%   99.76%
4       5      0.24%   100.00%
Total   2065

Total bets (686 losses + 2065 wins): 2751
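The following minimal sketch makes the table definitions explicit by producing both tables from the raw sequence of bet outcomes. It assumes, hypothetically, that the simulator can export each bet's result as a list of booleans (True = win) in the order the bets were placed. `streak_tables` and `print_table` are illustrative names. Each row counts the losses (or wins) that were preceded by exactly N consecutive wins (or losses), and the percentages are shares of all losses (or all wins).

```python
from collections import Counter


def streak_tables(results: list[bool]) -> tuple[Counter, Counter]:
    """Return (wins_before_loss, losses_before_win).

    wins_before_loss[k]  = number of losses preceded by exactly k consecutive wins
    losses_before_win[k] = number of wins preceded by exactly k consecutive losses
    """
    wins_before_loss: Counter = Counter()
    losses_before_win: Counter = Counter()
    run_wins = run_losses = 0  # length of the run that ends at the previous bet
    for won in results:
        if won:
            losses_before_win[run_losses] += 1
            run_wins += 1
            run_losses = 0
        else:
            wins_before_loss[run_wins] += 1
            run_losses += 1
            run_wins = 0
    return wins_before_loss, losses_before_win


def print_table(title: str, table: Counter) -> None:
    """Print rows of: streak length, count, % of total, cumulative %."""
    total = sum(table.values())
    cumulative = 0
    print(title)
    for k in sorted(table):
        cumulative += table[k]
        print(f"{k:<5} {table[k]:<6} {table[k] / total:7.2%} {cumulative / total:7.2%}")
    print(f"Total {total}")


if __name__ == "__main__":
    wbl, lbw = streak_tables([True, True, False, True, False, False, True])
    print_table("Winning Streak (loss occurred after wins)", wbl)
    print_table("Losing Streak (win occurred after losses)", lbw)
```

For this short example sequence, there is one loss after each of 2, 1 and 0 wins. The wins split as two after 0 losses, one after 1 loss and one after 2 losses.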
In other words, a win is 74.14% likely to follow a win, 94.19% likely to follow a single loss and 98.74% likely to follow two losses in a row; a loss is 22.01% likely to follow a loss, 40.82% likely to follow a single win, etc. (assuming I haven't blundered).
Can anyone suggest a betting strategy that can take advantage of these streaks?
Edit: After posting the question, an obvious answer seemed to be to place a bet only after a loss. That would give at least a 94.19% chance of a win and, assuming average odds of 1.26, would produce long-term profits. I modified the betting simulator to place a bet only after a loss, but it achieved only a 78% hit rate, so there is something fundamentally flawed in my thinking.
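The "bet only after a loss" filter can also be computed directly from the same kind of hypothetical win/loss sequence: it is the fraction of wins among bets that immediately follow a loss. A rough version can be read off the tables. Wins preceded by at least one loss number 2065 - 1531 = 534, and bets that follow a loss number about 686 (one fewer if the final bet was a loss). That gives roughly 78%, close to the hit rate reported above. Both helper names below are hypothetical.

```python
def hit_rate_after_loss(results: list[bool]) -> float:
    """Win rate over the bets that immediately follow a losing bet.

    `results` is a hypothetical export of bet outcomes in the order they were
    placed (True = win). Returns NaN if no bet follows a loss.
    """
    # Pair every bet with the one before it and keep those whose predecessor lost.
    after_loss = [cur for prev, cur in zip(results, results[1:]) if not prev]
    if not after_loss:
        return float("nan")
    return sum(after_loss) / len(after_loss)


def hit_rate_after_loss_from_tables(total_wins: int, wins_after_zero_losses: int,
                                    total_losses: int) -> float:
    """Approximate the same rate from the streak-table totals.

    Numerator: wins preceded by at least one loss.
    Denominator: losses, used as a stand-in for bets that follow a loss
    (it overcounts by one if the final bet was a loss).
    """
    return (total_wins - wins_after_zero_losses) / total_losses


if __name__ == "__main__":
    # Totals taken from the Losing Streak and Winning Streak tables above.
    print(f"{hit_rate_after_loss_from_tables(2065, 1531, 686):.1%}")  # 77.8%
```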

