EMTM 554: Data Mining Syllabus
Administrivia
- Homework 1 is due at the first class, Homework 2 at the second class, and so on.
- Videos need to be watched before the class they are listed under.
- Note: the videos are videos; if you can't see the
pictures, you're doing something wrong. They should play in QuickTime.
- All videos, readings (except those in the textbook), supplemental readings, homeworks, and lecture slides are on Canvas.
- The required readings are on Study.net
- Textbook: Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques,
3rd edition, Morgan Kaufmann.
- This is largely supplemental. Students in past years have
requested a reference.
- Software: JMP
- Please make sure you have a copy of JMP from the Statistics
course. If you do not, please contact the EMTM office.
- There will be a Quiz on the last day of class.
- Prerequisite: the EMTM Statistics course.
- Course webpage: http://www.cis.upenn.edu/~ungar/DBM/
(see also Canvas)
Lecture 1: Overview of Data Mining
- Topics
- What is Data Mining, and what is it used for?
- Strategic marketing
- Data Warehousing
- introDBM.ppt, strategy.ppt, DataWarehousing.ppt
- Required readings
- Text: Chapt 1 Introduction
- Data mining in context from Mastering Data Mining (Berry and Linoff)
- Why master the art? from Mastering Data Mining (Berry and Linoff)
- The Long Tail (Chris Anderson)
- Capital One (Clemens and Thatcher)
- Supplemental readings
- Text: Chapt 5 (see HW for specific sections of text for Data Cubes)
- Text: Chapt 4 Data Warehouse and OLAP Technology
- Discovering Knowledge in Data - Chapt 1 (Larose)
- Homework 1
Lecture 2: The DBM process & software; Text mining
- Topics
- The DBM process, CRISP-DM
- DBM Software and Industries, vertical and horizontal
- Intro to web search
- Text mining: IR and IE, easy and hard
- textmining.pptx, process.pptx, tools.pptx, process.m4a, tools.m4a, costing.m4a
- Required Readings
- Data Mining Methodology: The Virtuous Cycle Revisited from Mastering Data Mining (Berry and Linoff)
- Text Mining: Predictive Methods for Analyzing Unstructured Information - Chapt 1 (Weiss et al.)
- Supplemental readings
- Homework 2
Lecture 3: Methods
- Topics
- Visualization: PTDD
- Personalization: collaborative filtering
- Market segmentation: clustering
- Prediction: Decision trees and regression methods
- JMP for Data Mining: The good, the bad, and the ugly
- Linear regression in more depth
- methods.ppt, regression.m4a
- Required readings
- Text: Chapt 8.1, 8.2 - Classification and Regression
- Text: Chapt 10.1, 10.2.1 k-means - clustering
- Supplemental readings
- Information Visualization in Data Mining and Knowledge Discovery, Chapt 2 (color plates are separate)
- Discovering Knowledge in Data (Larose), Chapt 6 & 7: Decision trees, Neural networks
- Homework 3
Lecture 4: Evaluation
- Topics
- Evaluation: prediction and pitfalls
- Correlation and causality
- evaluation.ppt, gazelle.ppt
- Required readings
- Text: Chapt 8.5 Accuracy and Error Measures
- Homework 4
Lecture 5: Hands-on JMP tutorial
- Instead of class we'll have an optional help session to
go through the JMP homeworks.
- Homework 5
Lecture 6: Social Networks
- Topics
- Social network analysis
- summary.ppt, socialNets.ppt
- Readings
- Social Networks from The Economist special report Jan 28, 2010
- Final Project
ungar@cis.upenn.edu