Department of Computer Science
Pattern Recognition (Topics in Artificial Intelligence, Winter 2009-2010), 4005-759-01 (Calendar Description)
	Instructor: Richard Zanibbi, Office: 70-3551 (Golisano College) 
	
Office hours: 2-3pm Tues. and Thurs., 10am-12pm on Wednesdays
	
	Lectures: 4-5:50pm Tuesdays and Thursdays, Room 70-3445 (Golisano College)
	
Weight: 20% of final grade
For the course project, students will work in teams to construct a pattern recognizer for handwritten digits. Students will use the MNIST data set for experiments. This data set contains 60,000 training images and 10,000 testing images. Yann LeCun, one of the individuals who developed the MNIST data set, was also involved in creating algorithms for recognizing the digits in the set, including the LeNet-5 convolutional neural network.
Students may use any language of their choosing for the project, though starting with the PRTools MATLAB toolkit is recommended (see below). MATLAB provides a simple, complete environment for implementing algorithms, running experiments, and visualizing results, and PRTools makes it easy to generate data, build simple classifiers, etc. within that environment. The course web pages list available pattern recognition libraries under the "Resources" link; others are available elsewhere.
	Please consult the "Resources" page in the course web pages for materials on
	carrying out research and writing research papers.
	
	 
	To obtain the MNIST data set, download the
	files stored here, and place them in a directory within your
	MATLAB search path.  The training and testing sets are raw image data: each feature
	vector is a 28 x 28 image, stored as a single vector with 784 elements.
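
	As a rough illustration of this layout, the sketch below converts between a
	784-element vector and a 28 x 28 image using only built-in MATLAB functions.
	The vector x is a made-up stand-in for one MNIST feature vector, and the
	row-by-row pixel order is an assumption: display an image from the actual
	data to confirm its orientation.

```matlab
% Hypothetical stand-in for one MNIST feature vector: 784 pixel values.
% A real vector would come from the downloaded training or testing set.
x = 1:784;

% Recover the 28 x 28 image. reshape fills column by column, so the
% transpose is needed IF the pixels were stored row by row (an assumption).
img = reshape(x, 28, 28)';

% Flatten back to a 1 x 784 row vector in the same row-by-row order.
x2 = reshape(img', 1, []);

% Check that the round trip preserves every pixel.
roundTripOk = isequal(x, x2);
disp(size(img));      % image dimensions
disp(roundTripOk);    % 1 when x2 matches x
```

	If the displayed digit appears rotated or mirrored, drop the transposes: the
	data is then stored column by column instead.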
	
	 
	selx makes use of a more general selection function called selectx.
	To select all the testing data images containing the digit '3', and then display the
	first 200 of them, use the selectx and show commands listed under
	"PRTools Data and Utility Functions for MNIST".
	Once you have downloaded the data, you can use the built-in PRTools functions to
	generate simple classifiers (e.g. linear discriminant classifiers using ldc),
	apply feature transformations, etc. (see the PRTools documentation).
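
	A minimal sketch of this workflow is given below. It assumes that PRTools is
	on the MATLAB search path and that the downloaded files provide PRTools
	datasets for training and testing. The variable name Train is an assumption
	(Test is the name used in the selection example on this page). Training ldc
	on all 784 raw pixel features may be slow or need regularization, so treat
	this as a starting point and consult the PRTools documentation.

```matlab
% Assumes PRTools is installed and that Train and Test are PRTools
% datasets already loaded from the downloaded MNIST files.
% (Train is an assumed variable name, not one confirmed by this page.)

W = ldc(Train);       % train a linear discriminant classifier
D = Test * W;         % apply the trained classifier to the test images
e = D * testc;        % estimated classification error on the test set

fprintf('ldc test error: %.4f\n', e);
```

	The same pattern should work for other PRTools classifiers by replacing ldc
	with another trainable mapping.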
	
	 Proposal due: December 17, 2009 (start of class, Thursday of Week 3). Weight: 5% of final grade
	 
	Each team will provide a proposal for their digit recognizer (maximum 5 pages single-spaced,
	including references). It must include:
	 
	The instructor will use this to check that experiments are well-defined, appropriate, and
	executable within the quarter. Reports will be graded for clarity, completeness, and
	correctness. The proposal does not need to describe an implemented system.
	
	 Final report due: February 18, 2010 (start of class, Thursday of Week 10). Weight: 15% of final grade
	 
	Each team will provide a technical report (maximum 10 pages, including references) summarizing the outcome of their experiment, and
	comparing their results to published results. The report will include:
	 
	 
	The performance of your algorithms is important, but do not worry if your algorithm is
	not performing as well as or better than the state-of-the-art: you primarily need to
	make a serious attempt at creating an effective algorithm, and then be able to intelligently discuss the outcome
	of your experiment.
	Reports will be graded for clarity, completeness, correctness, and
	reproducibility: the report should provide enough detail to allow
	someone working in pattern recognition to repeat the experiment. 
	Make sure to use figures and tables to
	summarize your results and clarify your presentation.
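
	One table that often helps when discussing digit recognition results is a
	confusion matrix, reported alongside the overall error rate. The sketch
	below computes both with built-in MATLAB functions. The label vectors are
	made-up values for illustration only, not results from any experiment.

```matlab
% Hypothetical true and predicted digit labels (0-9) for six test images.
trueLabels = [3 3 5 7 7 7];
predLabels = [3 8 5 7 1 7];

% Error rate: fraction of test images assigned the wrong label.
errorRate = mean(predLabels ~= trueLabels);

% 10 x 10 confusion matrix: row = true digit, column = predicted digit.
% Labels are shifted by one because MATLAB indices start at 1.
confusion = accumarray([trueLabels' + 1, predLabels' + 1], 1, [10 10]);

fprintf('Error rate: %.2f%%\n', 100 * errorRate);
disp(confusion);
```

	Off-diagonal entries show which digits are mistaken for which, which gives
	concrete material for discussing the outcome of an experiment.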
	
	PRTools Data and Utility Functions for MNIST
	
		
			Threes = selectx( Test, findlabels(Test, [3]) );

	The semicolon (;) prevents MATLAB from printing the result of an operation
	(printing can significantly slow down a MATLAB program). Here
	findlabels is a PRTools method that returns a vector of index values
	for the instances in a dataset with the given label(s) (here just threes: [3]).

			show( selx(Threes, 1, 200) );
			
	Part I. Proposal
	
		
	Part II. Final Report