EMTM 554 Data Mining
Homework 3 (Questions on methods)

1. Name (or describe) two good methods for viewing four-dimensional data.

2. When or why might k-means be better to use than k-nearest neighbors?

3. When would one prefer building a dendrogram (tree) with agglomerative clustering over using k-means?

4. When would one prefer k-means to agglomerative clustering?

5. Give an example, other than the ones used in class, of an "interaction" that might occur in real-world data.

6. Which of the following methods DO NOT do a good job of handling interactions?
   a) linear regression
   b) logistic regression
   c) decision trees
   d) artificial neural nets
   (If you think the question is at all ambiguous, please explain your answer.)

7. What is a p-value for a regression coefficient? Why does it need to be "corrected" when doing data mining?

8. When should one use logistic regression instead of linear regression? Why?

9. Stepwise logistic regression was used to predict purchase of a luxury good as a function of a set of consumer demographic characteristics and prior purchases. The resulting model included "car model" and "value of house", but not "income". Does this mean that "income" is uncorrelated with purchase of this luxury good? Please explain.