
I have an assignment to implement a random tree classifier in MATLAB.

The lecture says:

Input: observations and labels
While the stopping criterion is not reached:
1. Node optimization:
   - several split candidates are randomly generated
   - the best splitting function is chosen according to some quality measure
2. Data splitting: observations are pushed to the left or right branch.
3. Move to the next node.

Stopping criteria:
- Quality measure
- Number of data points in the current node/leaf
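A minimal sketch of the lecture's loop, written in Python rather than MATLAB, so treat it as pseudocode to port and not as the API of the course stub. It expresses "move to the next node" as recursion, draws `k` random (feature, threshold) candidates per node, scores them by information gain, and stops when a node is pure or holds fewer than `min_samples` points. The names `grow`, `predict`, `majority`, `k` and `min_samples` are hypothetical, and drawing the threshold uniformly between the node's minimum and maximum is an assumption, not something the lecture specifies.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def majority(labels):
    """Most frequent label; used as the prediction stored in a leaf."""
    return Counter(labels).most_common(1)[0][0]


def grow(X, y, rng, k=10, min_samples=2):
    """Grow a random tree recursively.

    Returns either a leaf label or a tuple (feature, threshold, left, right).
    Assumes labels are not tuples, so leaves and inner nodes can be told apart.
    """
    # Stopping criteria: pure node, or too few data points in the node.
    if len(set(y)) == 1 or len(y) < min_samples:
        return majority(y)
    parent = entropy(y)
    best = None
    for _ in range(k):
        # 1. Node optimization: draw one random split candidate
        #    (random feature, threshold uniform in the node's value range).
        f = rng.randrange(len(X[0]))
        col = [row[f] for row in X]
        t = rng.uniform(min(col), max(col))
        left = [i for i, v in enumerate(col) if v < t]
        right = [i for i, v in enumerate(col) if v >= t]
        if not left or not right:
            continue  # candidate does not split the data
        # Quality measure: information gain of this candidate.
        children = (len(left) * entropy([y[i] for i in left])
                    + len(right) * entropy([y[i] for i in right])) / len(y)
        gain = parent - children
        if best is None or gain > best[0]:
            best = (gain, f, t, left, right)
    if best is None:  # no candidate separated the data: make a leaf
        return majority(y)
    _, f, t, left, right = best
    # 2. Data splitting: push observations to the left or right branch,
    # 3. then move on to the child nodes (here: by recursion).
    return (f, t,
            grow([X[i] for i in left], [y[i] for i in left], rng, k, min_samples),
            grow([X[i] for i in right], [y[i] for i in right], rng, k, min_samples))


def predict(tree, x):
    """Walk from the root to a leaf, using the same rule as in grow()."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] < t else right
    return tree
```

For example, `grow([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1], random.Random(0))` builds a tree that reproduces the training labels. A loop with an explicit stack of open nodes would be equivalent; recursion just keeps the sketch short.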

My problem is that I do not understand how to generate the random split candidates. Do I take them from the input values? But then I would just get a decision tree (pick a random element x and send values > x to the right node and values < x to the left node). I also do not understand what the difference between a random tree and a decision tree is in the end.
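One common reading of "randomly generated split candidates", assumed here rather than taken from the lecture: a classic decision tree searches every feature and every threshold between consecutive distinct values, then keeps the best one, while a random tree only draws a few (feature, threshold) pairs at random and picks the best of those. Drawing the threshold uniformly between the feature's minimum and maximum in the current node is close to the "extremely randomized trees" scheme; Breiman's random forests instead restrict the exhaustive search to a random subset of features. The helper names below are hypothetical.

```python
import random


def exhaustive_candidates(X):
    """Decision-tree style: every feature, and every midpoint between
    consecutive distinct values of that feature, is a candidate."""
    cands = []
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            cands.append((f, (lo + hi) / 2))
    return cands


def random_candidates(X, k, rng):
    """Random-tree style: draw only k (feature, threshold) pairs, with the
    threshold uniform between the feature's min and max in this node."""
    cands = []
    for _ in range(k):
        f = rng.randrange(len(X[0]))
        col = [row[f] for row in X]
        cands.append((f, rng.uniform(min(col), max(col))))
    return cands
```

For `X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0]]`, `exhaustive_candidates(X)` returns `[(0, 1.5), (0, 2.5), (1, 3.5), (1, 4.5)]`, whereas `random_candidates(X, 3, random.Random(0))` returns three random pairs. Because every random tree sees different candidates, the trees differ from one another, and that variety is what makes averaging many of them in a forest useful; an exhaustively grown decision tree is deterministic for given data.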

Also the lecture says:

Choosing the best candidates, according to a quality measure:
- Out-of-bag error (OOB): minimize the error rate after splitting, using a test set
- Information gain: maximize the information gain after splitting
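Information gain is the entropy of the labels in a node minus the size-weighted entropy of its two children: IG = H(S) - |S_L|/|S| * H(S_L) - |S_R|/|S| * H(S_R), where H(S) = -sum over classes c of p_c * log2(p_c). A small stdlib-only Python sketch of that formula (the function names are illustrative):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits; an empty node counts as 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted
```

A split of `['a', 'a', 'b', 'b']` into `['a', 'a']` and `['b', 'b']` has gain 1.0 (perfect separation), while a split into `['a', 'b']` and `['a', 'b']` has gain 0.0; the candidate with the largest gain is kept.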

But which test set should I use? The data that is already in the tree and was used for training?
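In random forests, the test set for the out-of-bag error usually comes from bagging: each tree is trained on a bootstrap sample of n draws with replacement, and the observations that were never drawn (about 1/e, roughly 37%, on average for large n) form that tree's out-of-bag set. They were not used to train the tree, so they can act as its test set. Whether the lecture means exactly this for split selection is an assumption; the sketch below only shows how in-bag and out-of-bag indices and an OOB error rate could be computed. `bootstrap_split` and `oob_error` are hypothetical names, and `predict` stands for any trained tree's prediction function.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement (in-bag); the never-drawn ones are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def oob_error(predict, X, y, oob):
    """Misclassification rate of a trained tree on its out-of-bag samples."""
    if not oob:
        return float("nan")  # every observation was drawn: no OOB estimate
    wrong = sum(1 for i in oob if predict(X[i]) != y[i])
    return wrong / len(oob)
```

The tree would be grown only on `[X[i] for i in in_bag]`, and `oob_error` would then be evaluated on the indices in `oob`, so no training observation is reused as test data for that tree.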

Wikipedia and Google did not help me either. The code of the MATLAB stub can be found here: http://pastebin.com/iuzqF8gG

I appreciate your help.

    BTW, here you can find loads of useful links, especially one to the Friedman book, which contains a whole chapter about random forests: http://stats.stackexchange.com/a/635/2401 (2012-02-05)
