By the end of the course, you should know the following key concepts:

Uses of Data Mining
  examples, industry areas
  CRM, targeted marketing, personalization
  strategic use (e.g., Capital One case)
  roles of data mining vs. experimental design

Data Warehousing
  relational vs. transactional (OLAP vs. OLTP)
  OLAP functions: slice, dice, roll-up, drill-down
  star and snowflake schemas
  ETL (Extract, Transform, Load)
  metadata, importance of
  problems of data quality

Methods
  visualization (PTDD)
    e.g., mosaic plots, interactive visualization (e.g., Spotfire)
  clustering
    k-means, k-nearest neighbors, agglomerative clustering
  decision trees
    splitting, pruning
  linear and logistic regression
    independence, collinearity, interactions
    stepwise regression
  artificial neural networks
  support vector machines (SVM)
  categorical data
    how to use in regression methods
  missing data
    missing at random / not at random
    how to handle in regression, decision trees

Evaluation
  accuracy and its limitations
  precision/recall, lift, confusion matrix
  overfitting
  cross-validation, test/train, in-sample/hold-out
  causality vs. correlation
  lurking variables and the role of time (e.g., gazelle.com case)
  evaluation of model is not business evaluation
  "type 3 errors"

Process
  CRISP methodology
    Business Understanding
    Data Understanding
    Data Preparation
      Data cleansing
    Evaluation
    Deployment
  Common causes for Data Mining project failure
    Failure to define the business problem
    Failure to get key people bought into the project early
    Insufficient data size or quality
      - insufficient effective number of observations
      - missing key features
  Tools and their attributes

Text Mining and Search
  information retrieval vs. information extraction
  collaborative filtering
  PageRank
  named entity recognition
  why text mining is hard