Final project - EMTM 554

For your final paper, you may focus on the business side (without collecting and analyzing data), focus on the analytics (looking at real data, but not treating the management issues in detail), or do a mixture of the two. Your deliverable is a project plan that addresses the key questions at each stage of the CRISP-DM process. For background, you should watch the short QuickTime videos on Webcafe: process.m4a and process.m4a; you should already have seen tools.m4a. PowerPoint versions of the videos are also available on Webcafe.

In your report, please specify:
- What business decisions would be affected, and how important are they? (Here "business" includes science, etc.)
- What is the scope of the project?
- How will you measure success?
- What data (and metadata) would need to be collected?
- How big are these data sets?
- Are there particular issues in data quality and data cleaning?
- What database or data warehouse would be (or is) used to store the data?
- What software tools should be used to analyze the data?
- What analytical methods would be used to model the data?
- How will the results of the data mining be evaluated? (Be precise!)
- Are there any other relevant issues (e.g., privacy, data ownership, ...)?

For Data Analysis (optional)
Do a preliminary analysis of some of the data using a data mining tool such as JMP. What patterns do you see? Which features emerge as most important? Which methods proved most successful? Describe your results, including how you selected the training and testing data.

I am expecting a final report in the range of 10-15 double-spaced pages, plus figures as needed (less if you are doing the project by yourself, and often less if you are doing more data analysis). Please include an executive summary.

Final projects should be posted to the Webcafe site; please do not email them to me. I will also accept a hard copy if you would prefer that other students not see your project.
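For the optional data analysis, "selection of training and testing data" can be as simple as a reproducible random hold-out split, with results compared against a trivial baseline. The sketch below is one hypothetical way to do this in plain Python, assuming each record is a (features, label) pair; the function names are illustrative only, and a tool such as JMP offers its own validation options.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(actual, predicted):
    """Fraction of test cases whose predicted label equals the actual label."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def majority_baseline(train_labels):
    """Most common training label; a useful model should beat always predicting it."""
    return Counter(train_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical records: (customer_id, bought_from_category_x)
    records = [(i, i % 3 == 0) for i in range(100)]
    train, test = train_test_split(records, test_fraction=0.3, seed=1)
    baseline = majority_baseline([label for _, label in train])
    actual = [label for _, label in test]
    print(len(train), len(test), accuracy(actual, [baseline] * len(test)))
```

Reporting the seed and the split fraction lets a reader reproduce the test set, and reporting the baseline accuracy shows whether a model's accuracy is actually an improvement.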
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Be sure to address: How will you evaluate your performance? (Please give a precise metric!)
e.g. 1) estimated potential market (in $) for cross-sales from products in category X, not "increased revenue"
e.g. 2) predicted labor requirements for each day of 2003 for our call center, based on training data from 2000-2002, not "predicted call center volume"

Depending on the mix you are planning between specifying a system to be built and analyzing a data set, the particular details you provide may vary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Suggested sections

Project Summary

Introduction: the business problem, motivation, goals

The data to be used
  Description of the data tables: products, people, categories, demographics, ...; how many observations, how many features, what time periods and intervals? Is there structure (e.g., categories of product, customer, geography)?

Data Mining Project Plan
  If you are describing a large project (e.g., one that you will not be able to complete before the last day of class, when the papers are due), describe the different phases of the project and provide estimates of the cost of each. In particular, focus on a first phase that provides useful results. Example stages:
    team formation
    warehouse creation, data merging (if not already done)
    data cleaning (if needed)
    tool to be used (with justification)
    what types of model you expect to use (neural net, logistic regression, association rules, ...)
    what structure in the data you must deal with (data cubes, ...)
    initial analysis: What subset of the data will you start with? Give a detailed example of what an initial model might look like.
    evaluation metrics (see below)
    rollout of the first phase
    extensions

and/or

Data Analysis
  If you plan to analyze data, what data do you have? Be realistic about the difficulties of data collection and of merging databases (remember lecture 1).
  Data to be used (see above)
  Tool to be used (with justification)
  Models tried (for now, what you plan to try)
  Results: clusters found, expected prediction accuracy, ...

Conclusions
  What you learned; implications for action?
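The call center metric from the evaluation guidance (predicted labor requirements for each day of 2003, trained on 2000-2002) can be made concrete with a time-based split and an error measure such as mean absolute percentage error (MAPE). The sketch below is a hypothetical illustration, not a required method: it assumes daily values stored as a {date: agents_needed} mapping, uses a simple same-weekday average as the forecast, and all names and data are made up.

```python
from datetime import date, timedelta
from statistics import mean


def split_by_year(daily_values, train_years, test_year):
    """Split a {date: value} mapping into training years and a single later test year."""
    train = {d: v for d, v in daily_values.items() if d.year in train_years}
    test = {d: v for d, v in daily_values.items() if d.year == test_year}
    return train, test


def weekday_average_forecast(train, test_days):
    """Forecast each test day as the mean training value for the same day of the week."""
    by_weekday = {}
    for d, v in train.items():
        by_weekday.setdefault(d.weekday(), []).append(v)
    return {d: mean(by_weekday[d.weekday()]) for d in test_days}


def mape(actual, forecast):
    """Mean absolute percentage error (in percent) over days with a nonzero actual value."""
    errors = [abs(actual[d] - forecast[d]) / actual[d] for d in actual if actual[d] != 0]
    if not errors:
        raise ValueError("no days with a nonzero actual value")
    return 100 * mean(errors)


if __name__ == "__main__":
    # Hypothetical data: agents needed per day, 40 on weekdays and 15 on weekends.
    daily = {}
    day = date(2000, 1, 1)
    while day <= date(2003, 12, 31):
        daily[day] = 40 if day.weekday() < 5 else 15
        day += timedelta(days=1)
    train, test = split_by_year(daily, train_years={2000, 2001, 2002}, test_year=2003)
    forecast = weekday_average_forecast(train, test.keys())
    print(f"2003 MAPE: {mape(test, forecast):.1f}%")
```

On this noise-free synthetic data the weekday average is exact, so the script prints "2003 MAPE: 0.0%"; real call center data would give a nonzero error, and that number is the kind of precise metric the evaluation section asks for. Splitting by year rather than at random keeps the test period strictly after the training period, as it would be in practice.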