I have an assignment to implement a random tree classifier in MATLAB.
The lecture says:
Input: observations and labels
While the stopping criterion is not reached:
1. Node optimization:
   - several split candidates are randomly generated
   - the best splitting function is chosen according to some quality measure
2. Data splitting: observations are pushed to the left or right branch.
3. Move to the next node.
Stopping criteria:
- Quality measure
- Number of data points in the current node/leaf
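For reference, here is a minimal Python sketch of the loop above (easy to port to MATLAB). Several parts are assumptions, not the lecture's definition: each split candidate is a random (feature, threshold) pair, with the threshold drawn uniformly between the feature's min and max among the node's samples (one common choice, used for example by Extremely Randomized Trees), and the quality measure is information gain. A classic decision tree would instead search every feature and every threshold for the best split. The names `grow`, `predict`, `random_candidates`, `n_candidates`, `min_leaf` and `min_gain` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of both children."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def random_candidates(X, n_candidates, rng):
    """Hypothetical generator: random feature, then a threshold drawn
    uniformly between that feature's min and max in the current node."""
    cands = []
    for _ in range(n_candidates):
        f = rng.randrange(len(X[0]))
        values = [row[f] for row in X]
        cands.append((f, rng.uniform(min(values), max(values))))
    return cands


def majority(y):
    """Most frequent label in a node (used as the leaf prediction)."""
    return Counter(y).most_common(1)[0][0]


def grow(X, y, rng, n_candidates=10, min_leaf=2, min_gain=0.0):
    """Internal nodes are (feature, threshold, left, right) tuples;
    leaves are labels (assumes labels are not tuples)."""
    # Stopping criterion: too few data points in the node, or node already pure.
    if len(y) <= min_leaf or len(set(y)) == 1:
        return majority(y)
    # Node optimization: score each random candidate, keep the best one.
    best = None
    for f, t in random_candidates(X, n_candidates, rng):
        left = [i for i, row in enumerate(X) if row[f] < t]
        right = [i for i, row in enumerate(X) if row[f] >= t]
        if not left or not right:
            continue  # degenerate split that sends everything to one side
        gain = information_gain(y, [y[i] for i in left], [y[i] for i in right])
        if best is None or gain > best[0]:
            best = (gain, f, t, left, right)
    # Stopping criterion: no candidate improves the quality measure enough.
    if best is None or best[0] <= min_gain:
        return majority(y)
    _, f, t, left, right = best

    def child(idx):
        # Data splitting + "move to next node": recurse into one branch.
        return grow([X[i] for i in idx], [y[i] for i in idx],
                    rng, n_candidates, min_leaf, min_gain)

    return (f, t, child(left), child(right))


def predict(node, x):
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] < t else right
    return node
```

The recursion in `grow` stands in for the slide's explicit "while / move to next node" loop; in MATLAB the same thing could be done with a queue of pending nodes. Usage: `tree = grow(X, y, random.Random(0))`, then `predict(tree, x)`.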
My problem is that I do not understand how the randomly generated split candidates are obtained. Do I take them from the input values? But then I would just get a decision tree (pick a random element x and send values > x to the right node and values < x to the left node). I also do not understand what, in the end, the difference between a random tree and a decision tree is.
The lecture also says:
Choosing the best candidate according to a quality measure:
- Out-of-bag (OOB) error: minimize the error rate after splitting, using a test set
- Information gain: maximize the information gain after splitting
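Assuming the lecture uses the standard entropy-based definition (the slides do not spell out the formula), the information gain of splitting a node's sample set S into a left part S_L and a right part S_R is, with p_c the fraction of samples in S that have label c:

$$
H(S) = -\sum_{c} p_c \log_2 p_c, \qquad
IG(S, S_L, S_R) = H(S) - \frac{|S_L|}{|S|} H(S_L) - \frac{|S_R|}{|S|} H(S_R)
$$

For example, a node with 3 samples of class a and 3 of class b has H(S) = 1 bit; a split into two pure children gives IG = 1 - 0 - 0 = 1, the maximum possible for that node.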
But which test set should I use? The same data that was already used to train the tree?
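If the lecture's "out-of-bag error" is meant in the usual random-forest sense (an assumption; the slides only say "using a test set"), each tree is trained on a bootstrap sample drawn with replacement from the training data, and the samples never drawn form that tree's out-of-bag set, which can serve as its test set without holding out separate data. A minimal Python sketch; `bootstrap_split` and `oob_error` are hypothetical names:

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement (the bootstrap sample one tree trains on).
    Indices that were never drawn form that tree's out-of-bag (OOB) set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def oob_error(predict_fn, X, y, oob):
    """Misclassification rate of a trained classifier on its OOB samples."""
    if not oob:
        return float('nan')  # every sample was drawn at least once
    wrong = sum(predict_fn(X[i]) != y[i] for i in oob)
    return wrong / len(oob)
```

For large n, about (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the samples end up out of bag for each tree, so every tree gets a reasonably sized test set for free.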
Wikipedia and Google did not help me either. The code of the MATLAB stub can be found here: http://pastebin.com/iuzqF8gG
I appreciate your help.