Ankur Teredesai

Data Mining Concepts
CS 590 / CS 759 – 06 : Fall 2004
Duration: Sept 7th - Nov 18th.

Lectures: 2pm - 3:50pm Tuesdays and Thursdays

Recitation: every Wednesday at 11 am in the Golisano 70-DB Lab.




About: This is an introductory course in data mining techniques. Data mining draws on several areas within computer science as well as disciplines outside it. This course will introduce you to the general area of data mining and give you a sampling of its problems and solutions. A good background in the following areas is expected if you decide to enroll in this course.

For graduate students, this course is required in the Database cluster sequence and is an advanced elective in the Intelligent Systems cluster sequence. It can also be taken as an optional elective alongside any other cluster sequence. In general, it has proved to be very good preparation for co-ops and full-time jobs, given the growing importance of good algorithmic solutions for data-intensive applications such as internet search, data visualization and knowledge management.

For undergraduates, this course is an elective in the AI and Database sequences.

For everyone else... well, it's a fun course if you want to learn about the challenges and issues the world faces today due to data explosion and information overload.


Instructor: Ankur Teredesai

Teaching Assistants:

Muhammad Ahmad: maa2454@cs.rit.edu

Juveria Kanodia: jxk5005@cs.rit.edu

Jim Kang: jmk4644@cs.rit.edu

 

Textbooks:

We will follow these two texts:

a) Data Mining: Concepts and Techniques by J. Han and M. Kamber, Morgan Kaufmann Publishers, 2001

b) The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani and J. Friedman, Springer, 2001

We may also cover some sections from:

c) Principles of Data Mining by D. Hand, H. Mannila and P. Smyth, MIT Press, 2001.

 

The instructor will provide additional study material, such as papers from recent conferences like SIGKDD, ICML, SIGMOD and VLDB, showcasing the state of the art in data mining.

 

Reference:

Machine Learning by T. Mitchell, McGraw-Hill, 1997

 

Course Outline:

 

Grading Policy*:

Undergraduates:

Homework: 10%

Project: 50%

Mid-Term (in-class on October 21st, 2004): 10%

Final (take-home; submit online and e-mail to the instructor by 11:59 pm on Nov 14th, 2004): 20%

Report Writing: 10%

Graduates:

Homework: 10%

Project: 45%

Mid-Term (in-class on October 21st, 2004): 10%

Final (take-home; submit online and e-mail to the instructor by 11:59 pm on Nov 14th, 2004): 20%

Term Paper: 15%

 

*Bring any grading issues to the instructor's notice within one week of receiving your grades.

 


Groups

Group 1

Group 2

Group 3

Group 4



Project

Specifications

Reference: see www.cs.rit.edu/~dmrg/CoMMA for details.


Links to Papers
Schedule

 

General Links:
ACM SIGKDD

KDnuggets

How to convert a text file to XML.
Another link on converting text to XML, with source code and a description.

Glossary of Data Mining Terms

Link to Data Mining Software Resources