Database Mining References
KDD, Data Mining - overview
- Data Mining Techniques,
M. Berry and G. Linoff,
John Wiley, 1997
- a readable, if manager-oriented, overview of data mining
- or their second book: Mastering Data Mining: The Art and Science of Customer Relationship Management, Wiley and Sons, 1999
- KDNuggets: the best data mining site
- quetek has more pointers
Data Preparation
- Data Preparation for Data Mining,
D. Pyle,
Morgan Kaufmann, 1999.
Data Warehousing
- Data Mining Solutions,
C. Westphal and T. Blaxton,
John Wiley, 1998
Data Visualization
- E. Tufte,
The Visual Display of Quantitative Information,
Envisioning Information and
his other books, (Graphics Press).
- These are wonderful books about how to present data graphically.
- Visual Revelations,
H. Wainer,
Copernicus, 1997
Machine Learning
Clustering and Collaborative Filtering
- Recommender Systems
- Pointers to many companies and classic papers
- Cluster Analysis, 3rd Edition,
Brian S. Everitt,
Halsted Press, 1993.
- A very readable short overview of clustering methods.
- "Locally Weighted Learning",
C. G. Atkeson, S. A. Schaal and A. W. Moore,
Artificial Intelligence Review, Volume 11, Pages 11-73 (Kluwer Publishers), 1997
- a detailed overview of K-nearest neighbor and related methods
- k-means clustering code
- with a cumbersome input format, but it runs well
- standard packages like R, Matlab, and all data mining software have many more options
Decision trees, CART and MARS
- Classification and regression trees,
Leo Breiman ... et al.,
Wadsworth International Group, 1984.
- The original CART book; a bit dated, but still a classic
- C4.5: Programs for Machine Learning,
J.R. Quinlan,
Morgan Kaufmann, 1992
- A modern presentation of decision tree methods. Very readable and
comes with code.
- "Multivariate adaptive regression splines,"
Friedman, J.H.
Annals of Statistics, 1991, 19(1), 1-141.
- A technical paper describing MARS
- CART and MARS software
- free Fortran version is apparently no longer available from Statlib
- commercial version available from
Salford Systems
- Other versions (which I have not tested) include ones from
Gauss
- IND
- A good free package for decision trees
- decision trees are also available in most statistics packages
Neural Networks
- Neural Networks for Pattern Recognition,
Bishop, C.M.,
Oxford University Press, 1995.
- An excellent overview of multilayer perceptron and radial basis
function neural networks from a statistician's point of view.
- Neural Networks: A Comprehensive Foundation,
Haykin, S.,
Macmillan, 1994.
- A good overview of neural nets from an electrical engineer's viewpoint;
covers a wide range of neural network types
- The Neural network FAQ
- overview of neural nets and pointers to software
- Nevprop
- one of the better free packages
- More neural net pointers [postscript]
Statistical Methods
- stepwise regression
- logistic regression
- Linear Statistical Methods,
Fox,
Wiley
- logistic regression is nicely covered on pp. 307-310.
- Statistical Models in S, Chambers and Hastie, Wadsworth, 1992
- covers a range of advanced statistical methods
Bayesian Belief Nets
- Charniak, Eugene, "Bayesian Networks without Tears", AI Magazine
12(4):50-63, Winter 1991.
- Intro to Bayesian networks for beginners.
- Neapolitan, Richard E., "Probabilistic Reasoning in Expert Systems:
Theory and Algorithms", John Wiley and Sons, 1990.
- Practical guide to implementation.
- Jensen, Finn V., "An Introduction to Bayesian Networks", Springer Verlag,
1996; ISBN: 0387915028
available at Amazon
- Pearl, Judea, "Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference", Morgan Kaufmann, San Mateo,
California, 1988.
- Theoretical framework for Bayesian networks; the book that got the whole field going.
- Lots more references
- Bayesian networks testimonial and routine
- What are belief nets good for and where to get code.
- other good free software: Netica
- Belief Network software
Genetic Algorithms
- "Genetic Algorithms,"
J. Holland,
Scientific American. July 1992. pp. 66-72.
- a nice overview of genetic algorithms
- Genetic Algorithms in Search, Optimization, and Machine Learning,
Goldberg, D.,
Addison-Wesley, 1989
- An Introduction to Genetic Algorithms,
Mitchell, M.,
MIT Press, 1996
Hidden Markov Models and Speech
- Statistical Methods for Speech Recognition,
Jelinek, F.
MIT Press, 1998
Information Theory
- Elements of Information Theory, T. M. Cover and J. A. Thomas,
Wiley, 1991
- a solid introduction to information theory
Other
Database Mining Companies
Many of these products - and others - are briefly described in an article by Two Crows.
University Centers
Current Research - General
ungar@cis.upenn.edu