The files in this directory recreate some of the experiments reported in the
paper

Maxout Networks. Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron
Courville, and Yoshua Bengio. arXiv 2013

If you use the code provided here, please cite that paper.

The experiments reproduced here are as follows:

1) MNIST permutation invariant result

The paper reports obtaining a test set error rate of 0.94%, the best ever
reported (as of this writing) for a multilayer perceptron with no unsupervised
pretraining. To reproduce this result, run

train.py mnist_pi.yaml
train.py mnist_pi_continued.yaml
print_monitor.py mnist_pi_continued.pkl | grep test_y_misclass

You should get 0.009399999999, i.e., 94 mistakes on the test set.

Note: you probably will want to run the above scripts with
THEANO_FLAGS="device=gpu,floatX=float32" for optimal speed, but if you don't
have a GPU it should work without one. I have, however, not tested this case.
The remaining results all require GPU as they are implemented using Alex Krizvhevsky's
CUDA convolution library.

2) MNIST result

The paper reports obtaining a test set error rate of 0.45%, the state of the
art for the MNIST dataset without data augmentation. To reproduce this result,
run:

THEANO_FLAGS="device=gpu,floatX=float32" train.py mnist.yaml
THEANO_FLAGS="device=gpu,floatX=float32" train.py mnist_continued.yaml
THEANO_FLAGS="device=gpu,floatX=float32" python compute_test_err.py mnist_continued.pkl

The final command should print out some progress messages followed by "0.0045"

3) CIFAR-10 results

There are several maxout results on the CIFAR-10 dataset. To reproduce these
results, you must first have prepared a version of the CIFAR-10 dataset with
global contrast normalization and whitening. To do so, run
pylearn2/scripts/datasets/make_cifar10_gcn_whitened.py.
(If you are in the LISA lab, don't do this. It has already been made for you)

You may then proceed to reproducing the various results described below:

3a) State of the art on CIFAR-10 without data augmentation

The ICML paper reports a state of the art result on the test set of 11.68%
classification error. You can reproduce this result by running:

THEANO_FLAGS="device=gpu,floatX=float32" train.py cifar10_no_aug.yaml

This should run for 785 epochs. In our tests, it obtained a test set error
of 11.62%. This slight improvement in the test set error relative to the
result in the ICML paper is likely due to running the script on a different
machine than we used for the original experiment.

The number of epochs was obtained by monitoring the error on the validation
set when running this script:

THEANO_FLAGS="device=gpu,floatX=float32" train.py cifar10_no_aug_valid.yaml

(valid_y_misclass should bottom out at 0.12009999156 on epoch 785, if you run
on exactly the same platform as I did. Expect small changes if your platform
is different)

3b) State of the art on CIFAR-10 with data augmentation

The ICML paper reports a state of the art result on the test set of 9.38%
classification error. You can reproduce this result by running:

THEANO_FLAGS="device=gpu, floatX=float32" train.py cifar10.yaml

This should run for 474 epochs. In our tests, it obtained a test set error
of 9.45%. This slight degradation in the test set error relative to the
result in the ICML paper is likely due to running the script on a different
machine than we used for the original experiment.

The number of epochs was obtained by monitoring the error on the validation
set when running this script:

THEANO_FLAGS="device=gpu,floatX=float32" train.py cifar10_valid.yaml

(valid_y_misclass should bottom out at 0.094499990344 on epoch 474, if you run
on exactly the same platform as I did. Expect small changes if your platform
is different)

3c) An older state of the art on CIFAR-10 without data augmentation, using a
smaller, faster model.

The February 2013 version of the paper that we originally posted on
arXiv reports a test set error of 12.93% on the CIFAR-10 dataset,
which was the state of the art without data augmentation. This result has since
been beaten by the model described in section 3a, but you may still be interested
in running the older model because it was smaller and cheaper to run.

THEANO_FLAGS="device=gpu,floatX=float32" train.py cifar10_old_small.yaml

At this point, the model has only been trained on the first 40k examples.
Run
THEANO_FLAGS="device=gpu,floatX=float32" python compute_test_err.py cifar10_best.pkl

It should report a test error of around 0.1405 at this point.

If you continue training on the remaining 10k training examples, the test error
should decrease to around 0.1293.

We have upgraded the ZCA code since these results were obtained, and we haven't
re-run this yaml file, so the numbers may have changed somewhat.

4) CIFAR-100 result

In the days ahead, we will provide our yaml config files for getting the state of
the art on CIFAR-100.

5) SVHN

The paper reports a test error of 2.45 on the SVHN dataset, which is the state
of the art for a single model.

Since the data is large and might not fit into the memory, we use PyTables for storing
the dataset. For pre-processing the data you need to make a copy of data to a local
path defined by SVHN_LOCAL_PATH environment variable. The reason behind is that with
PyTables any change to the data, will result in change to the data on the disk and
we do not want to modify the original raw data.

export SVHN_LOCAL_PATH=/tmp/SVHN/
python svhn_preprocessing.py

Afterwards we train our model using the svhn.yaml file

THEANO_FLAGS="device=gpu,floatX=float32" python train.py svhn.yaml

This should give the best validation error of 3.08% which corresponds to
test error of 2.50% at epoch 142.

Afterwards using the trick from this paper:

On the importance of momentum and initialization in deep learning,
Ilya Sutskever, James Martens, George Dahl, and Geoffery Hinton, ICML 2013

We can re-run the model from the previous best point by using smaller
learning rate which will result in best validation error of 3.07% and
test error of 2.47% after 9 more epochs.

THEANO_FLAGS="device=gpu,floatX=float32" python train.py svhn2.yaml
