My question is related to classification in data mining, but I believe anyone with a good background in math/statistics can answer it.
As you may remember, discrete-valued training data of one dimension (e.g. from sampling and quantizing a time signal) can be a
sequence such as [2,3,2,2], but a set must have unique values, so the training set should be of the form
{2,3}.
I have the following table as my data set (the famous play golf example). This table stems from several realizations of a stochastic process:
Outlook    Temp.   Humidity   Windy   Play Golf
Rainy      Hot     High       False   No
Rainy      Hot     Low        True    No
Overcast   Hot     High       False   Yes
Sunny      Cold    High       False   Yes
?          ?       ?          ?       ?
As we can see, each feature (attribute) is a discrete variable. Outlook can take 3 values; Temp., Humidity and Windy can each take only 2 values. As I said, this table comes from several realizations, so I know that by doing many more experiments (realizations) all the missing rows of this table can be filled in (all combinations: 3*2*2*2 = 24).
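To make the count of 24 concrete, here is a minimal Python sketch that enumerates every possible feature combination. The variable names are only illustrative, and the value lists are assumed from the rows shown in the table.

```python
from itertools import product

# Possible values per attribute, as assumed from the table above.
outlook = ["Rainy", "Overcast", "Sunny"]
temp = ["Hot", "Cold"]
humidity = ["High", "Low"]
windy = [False, True]

# Cartesian product: every distinct input row the complete table could contain.
all_rows = list(product(outlook, temp, humidity, windy))

print(len(all_rows))  # 3 * 2 * 2 * 2 = 24
```

Each tuple in `all_rows` is one possible input to the model; the complete table would attach one Play Golf value to each of them.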
This table is a representation of a model (a function or mapping) from these 4 features to ONE class (the Play Golf class).
Guys, are you with me up to this point?
Here comes the question:
The above (truth) table is a way of representing a function (model), but it doesn't indicate how many times each realization occurs (its frequency).
For instance, if I repeat the experiment 2 more times (2 realizations) and in both cases I get
outlook=Rainy, Temp=Hot, Humidity=High, Windy=False, Play Golf=No,
then I have obtained the same result as the first row of my table. But I can't add it to this table, because the table shows a SET, not repeated values.
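The difference can be sketched in Python as follows. This is only an illustration of the problem, not a proposed solution: the `observations` list is hypothetical and simply holds the four known rows plus the two repeated realizations described above.

```python
from collections import Counter

# The four known rows plus two repeats of the first row (hypothetical data).
observations = [
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "Low", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Cold", "High", False, "Yes"),
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", False, "No"),
]

# A set keeps only the distinct rows, so the repeats vanish.
as_set = set(observations)

# A Counter (a multiset) keeps the distinct rows and how often each occurred.
as_counts = Counter(observations)

print(len(as_set))
print(as_counts[("Rainy", "Hot", "High", False, "No")])
```

The set still has only the four original rows, while the counter records that the first row was observed three times. That frequency is exactly the information the table cannot show.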
In building my real database, I have training data values (from different realizations) that are repeated, so they won't fill in any more of my table, because a set has no repeated values. So how can these repeated values help me in finding a function (model) for play golf?
Thanks for your attention and contributions.