EMTM 554: Data Mining
What is data mining? The data mining process. Data visualization.
Data mining methods: decision trees, regression, neural
nets, clustering, network analysis and feature selection. How the
methods work. When to use them. Evaluation of business intelligence
systems. Strategic use of information. Emerging data mining methods: text mining.
This technical elective covers the various methodologies in detail
and looks at current industry trends.
Prerequisite: Statistics and knowledge of JMP.
Administrivia
- Course Syllabus
- Homework
- Bulkpack
- Textbook
- Data Mining: Concepts and Techniques, third edition (Han and Kamber)
- Professor
- Lyle Ungar (ungar@cis.upenn.edu)
- Canvas (replaces WebCafe)
- Quiz sample problems and solutions
- References
- Data sets
See in particular the KDD cups and the UCI repository. Also of note are
other competition web sites such as Kaggle, and data from the US and
European governments.
The first couple of references offer much cleaner data sets (since they
are prepared for competitions), but if you want more complex data, such
as geospatial data, the government sites are useful.
- Software: The course will use JMP.
- Grades
- 25% based on the final quiz
- 50% based on homework other than the final project
- 25% based on the final project
Announcements
- The pre-assignment, due the first day of class, is posted as hw1
under "Homework".
- Readings are listed under the day on which they are discussed. Unless
otherwise specified, you can read them either before or after the lecture.
This page is http://www.cis.upenn.edu/~ungar/DBM/