Database Mining References
What is the best Data Mining Textbook?
- A statistician would say Hastie, Tibshirani & Friedman, Elements of Statistical Learning
- a database person might say:
Jiawei Han and Micheline Kamber, (2001), Data Mining: Concepts and ...
- a business person would choose something like
Data Mining Techniques by G. Linoff and M. Berry, 2nd ed.
- somewhat farther afield, an EE might choose: Pattern Classification 2nd
ed. by Duda, Hart, and Stork, 2001.
All are good in their own right. The question is what level of
math you want and what your data looks like (a single table, or a
database), etc.
KDD, Data Mining - overview
- Data Mining Techniques,
M. Berry and G. Linoff,
John Wiley, 1997
- a readable, if manager-oriented, overview of data mining
- or their second book: Mastering Data Mining: Art and Science of Customer Relationship Management, Wiley and Sons, 1999
- KDnuggets: the best data mining site
Data Preparation
- Data Preparation for Data Mining,
D. Pyle,
Morgan Kaufmann, 1999.
Data Warehousing
- Data Mining Solutions,
C. Westphal and T. Blaxton,
John Wiley, 1998
Data Visualization
- E. Tufte,
The Visual Display of Quantitative Information,
Envisioning Information and
his other books, (Graphics Press).
- These are wonderful books about how to present data graphically.
- Visual Revelations,
H. Wainer,
Copernicus, 1997
Machine Learning
- The Elements of Statistical Learning,
Hastie, Tibshirani & Friedman
- strongly biased towards a statistics viewpoint, but still the
best thing out there.
- Reinforcement Learning: An Introduction,
Sutton, R. and A. Barto
MIT Press, 1998
- MLC++ code library
- the best free, wide-coverage C++ code for machine learning
- WEKA Java code library
- the best free, wide-coverage Java code for machine learning
Clustering and Collaborative Filtering
- Recommender Systems
- Pointers to many companies and classic papers
- Cluster Analysis, 3rd Edition,
Brian S. Everitt,
Halsted Press, 1993.
- A very readable short overview of clustering methods.
- "Locally Weighted Learning",
C. G. Atkeson, S. A. Schaal and A. W. Moore,
AI Review, Volume 11, pages 11-73 (Kluwer Publishers), 1997
- a detailed overview of K-nearest neighbor and related methods
- k-means clustering code
- with a cumbersome input format, but it runs well
- standard packages like R, Matlab, and all data mining software have many more options
Decision trees, CART and MARS
- Classification and regression trees,
Leo Breiman ... et al.,
Wadsworth International Group, 1984.
- The original CART book; a bit dated, but still a classic
- C4.5: Programs for Machine Learning,
J.R. Quinlan,
Morgan Kaufmann, 1992
- A modern presentation of decision tree methods. Very readable and
comes with code.
- "Multivariate adaptive regression splines,"
Friedman, J.H.
Annals of Stat. 1991, 19(1) 1-141.
- A technical paper describing MARS
- CART and MARS software
- free Fortran version is apparently no longer available from Statlib
- commercial version available from
Salford Systems
- Other versions (which I have not tested) include ones from
Gauss
- IND
- A good free package for decision trees
- decision trees are also available in most statistics packages
Neural Networks
- Neural Networks for Pattern Recognition,
Bishop, C.M.,
Oxford University Press, 1995.
- An excellent overview of multilayer perceptron and radial basis
function neural networks from a statistician's point of view.
- Neural Networks: A Comprehensive Foundation,
Haykin, S.,
Macmillan, 1994.
- A good overview of neural nets from an electrical engineer's viewpoint;
covers a wide range of neural network types
- The Neural Network FAQ
- overview of neural nets and pointers to software
- More neural net pointers [postscript]
Statistical Methods
- stepwise regression
- logistic regression
- Linear Statistical Methods,
Fox,
Wiley
- logistic regression is nicely covered on pp. 307-310.
- Statistical Models in S, Chambers and Hastie, Wadsworth, 1992
- covers a range of advanced statistical methods
Bayesian Belief Nets
- Charniak, Eugene, "Bayesian Networks without tears", AI Magazine
12(4):50-63, Winter 1991.
- Intro to Bayesian networks for beginners.
- Neapolitan, Richard E., "Probabilistic Reasoning in Expert Systems:
Theory and Algorithms", John Wiley and Sons, 1990.
- Practical guide to implementation.
- Finn V. Jensen, "Introduction to Bayesian Networks", 1996,
Springer Verlag; ISBN: 0387915028
available at Amazon
- Pearl, Judea, "Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference", Morgan Kaufmann, San Mateo,
California, 1988.
- Theoretical framework for Bayesian networks; the book that got the whole field going
- Lots more references
- Bayesian networks
- What are belief nets good for and where to get code.
- other good free software: Netica
Genetic Algorithms
- "Genetic Algorithms",
J. Holland,
Scientific American. July 1992. pp. 66-72.
- a nice overview of genetic algorithms
- Genetic Algorithms in Search, Optimization, and Machine Learning,
Goldberg, D.,
Addison-Wesley, 1989
- An Introduction to Genetic Algorithms,
Mitchell, M.,
MIT Press, 1996
Hidden Markov Models and Speech
- Statistical Methods for Speech Recognition,
Jelinek, F.
MIT Press, 1998
Information Theory
- Elements of Information Theory, T. M. Cover and J. A. Thomas,
Wiley, 1991
- a solid introduction to information theory
Other
Database Mining Companies
A slightly out-of-date overview is in an article by Two Crows.
ungar@cis.upenn.edu