Data Mining Concepts
CS 590 / CS 759 – 06 : Fall 2004
Duration: Sept 7th - Nov 18th.
Lectures: 2pm - 3:50pm Tuesdays and Thursdays
Recitation: Every Wednesday at 11 am in the Golisano 70-DB Lab.
About: This is an introductory-level course in data mining techniques. Data mining is a confluence of several areas within computer science and of disciplines outside it. This course will introduce you to the general area of data mining and give you a sampling of its problems and solutions. A good background in the following areas is expected if you decide to enroll:
Databases
Theory of Computation
Algorithms and Data Structures
AI/Machine Learning/Pattern Recognition
Programming Skills
For graduate students, this is a required course in the Database cluster sequence and an advanced elective in the Intelligent Systems cluster sequence. It is an optional elective that can be taken with any other cluster sequence. In general it has proved to be very good preparation for co-ops and full-time job opportunities, given the increasing importance of good algorithmic solutions for data-intensive applications such as internet search, data visualization and knowledge management.
For undergraduates, this course is an AI and Database sequence elective.
For everyone else, it's a fun course if you want to know about the challenges and issues the world faces today due to data explosion and information overload.
Instructor: Ankur Teredesai
Teaching Assistant(s):
Muhammad Ahmad: maa2454@cs.rit.edu
Juveria Kanodia: jxk5005@cs.rit.edu
Jim Kang: jmk4644@cs.rit.edu
Textbooks:
We will follow two texts:
a) Data Mining: Concepts and Techniques by J. Han and M. Kamber, Morgan Kaufmann Publishers, 2001
b) The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, Jerome Friedman
At times we might follow some sections from:
c) Principles of Data Mining by D. Hand, H. Mannila and P. Smyth, MIT Press, 2001.
The instructor will provide additional study material such as papers from recent conferences like SIGKDD, ICML, SIGMOD, VLDB, etc. showcasing the state of the art in data mining.
Reference:
Machine Learning, Tom Mitchell, McGraw Hill, 1997
There will be two lectures per week and one recitation session conducted by the TA or the instructor.
Lectures will cover the fundamental topics in data mining according to the schedule that will be posted here. Recitations are to help you with the homework and projects and to make sure that you are making adequate progress in the course. Attendance is not mandatory but strongly encouraged.
Required reading, when assigned, really is required.
A project team consists of at most 2 people, so it's a paired project and not a team project. If you already know people taking the course, we encourage you to find someone you are comfortable working with early on. Last term two projects were required. This term we will have only one project, which will consist of several phases. Evaluation will be based on achieving the target at each phase.
Your choice of partner will have a significant impact on your grade. There have been instances when good students suffered and had to do all the work on their own because of an incompetent project partner. Avoid undue stress. Find someone you can work with effectively and who matches your skill level. Diversity in expertise is good, so a good team will pair someone who knows databases well with someone who has a good knowledge of AI or machine learning techniques.
Team members are typically expected to sit together in the classroom, since this helps team problem solving.
Homework assignments will typically consist of solving problems on paper or a very short programming assignment to build your data mining skills. We will help you learn some of the out-of-the-box data mining packages as part of the homework. Homework assignments are to be completed individually.
Academic honesty is the best policy here at RIT and everywhere else, so why not follow it and stay out of trouble?
Undergraduates:
Homework: 10%
Project: 50%
Mid-Term (in-class on October 21st, 2004): 10%
Final (take-home, to be submitted online and e-mailed to the instructor on or before Nov 14th, 2004, 11:59 pm): 20%
Report Writing: 10%
Graduates:
Homework: 10%
Project: 45%
Mid-Term (in-class on October 21st, 2004): 10%
Final (take-home, to be submitted online and e-mailed to the instructor on or before Nov 14th, 2004, 11:59 pm): 20%
Term Paper: 15%
*Bring all grading issues to the instructor's notice within a week of getting back your grades.
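If you want to estimate where you stand, the percentages above can be combined as in the rough Python sketch below. It assumes each component is scored on a 0-100 scale and that the final grade is a plain weighted sum; this page does not state the exact combination rule, so treat that as an assumption. The function name, the component keys and the example scores are all hypothetical.

```python
# Weights copied from the grading policy above.
# Component key names are illustrative, not official.
UNDERGRAD_WEIGHTS = {
    "homework": 10,
    "project": 50,
    "midterm": 10,
    "final": 20,
    "report_writing": 10,
}
GRAD_WEIGHTS = {
    "homework": 10,
    "project": 45,
    "midterm": 10,
    "final": 20,
    "term_paper": 15,
}


def weighted_total(scores, weights):
    """Combine per-component scores (each 0-100) into a 0-100 total.

    Assumes a simple linear weighted sum, which the syllabus does not
    explicitly confirm.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


# Hypothetical scores, purely for illustration.
example_scores = {
    "homework": 90,
    "project": 80,
    "midterm": 70,
    "final": 85,
    "report_writing": 95,
}
print(weighted_total(example_scores, UNDERGRAD_WEIGHTS))  # prints 82.5
```

Note how heavily the project counts: under this assumed rule, ten points on the project move an undergraduate total by five points, as much as ten points on the mid-term, homework and report writing put together would move it by three.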
Group 1
Group 2
Group 3
Group 4
Reference: see www.cs.rit.edu/~dmrg/CoMMA for details.
How to convert a text file to XML.
Another link to convert text to XML, with source code and description.
Link to Data Mining Software Resources