Database Mining References
KDD, Data Mining - overview
 Data Mining Techniques,
M. Berry and G. Linoff,
John Wiley, 1997
 a readable, if manager-oriented, overview of data mining
 or their second book: Mastering Data Mining: The Art and Science of Customer Relationship Management, Wiley and Sons, 1999
 KDNuggets: the
best data mining site
 quetek
has more pointers
Data Preparation
 Data Preparation for Data Mining,
D. Pyle,
Morgan Kaufmann, 1999.
Data Warehousing
 Data Mining Solutions,
C. Westphal and T. Blaxton,
John Wiley, 1998
Data Visualization
 E. Tufte,
The Visual Display of Quantitative Information,
Envisioning Information and
his other books, (Graphics Press).
 These are wonderful books about how to present data graphically.
 Visual Revelations,
H. Wainer,
Copernicus, 1997
Machine Learning
Clustering and Collaborative Filtering
 Recommender Systems
 Pointers to many companies and classic papers
 Cluster Analysis, 3rd Edition,
Brian S. Everitt,
Halsted Press, 1993.
 A very readable short overview of clustering methods.
 "Locally Weighted Learning",
C. G. Atkeson, S. A. Schaal and A. W. Moore,
AI Review, Volume 11, Pages 11-73 (Kluwer Publishers), 1997
 a detailed overview of K-nearest neighbor and related methods
 k-means clustering code
 with a cumbersome input format, but it runs well
 standard packages like R, Matlab, and all data mining software have many more options
Decision trees, CART and MARS
 Classification and regression trees,
Leo Breiman ... et al.,
Wadsworth International Group, 1984.
 The original CART book; a bit dated, but still a classic
 C4.5: Programs for Machine Learning,
J.R. Quinlan,
Morgan Kaufmann, 1992
 A modern presentation of decision tree methods. Very readable and
comes with code.
 "Multivariate adaptive regression splines,"
Friedman, J.H.
Annals of Stat. 1991, 19(1), 1-141.
 A technical paper describing MARS
 CART and MARS software
 free Fortran version is apparently no longer available from Statlib
 commercial version available from
Salford Systems
 Other versions (which I have not tested) include ones from
Gauss
 IND
 A good free package for decision trees
 decision trees are also available in most statistics packages
Neural Networks
 Neural Networks for Pattern Recognition,
Bishop, C.M.,
Oxford University Press, 1995.
 An excellent overview of multilayer perceptron and radial basis
function neural networks from a statistician's point of view.
 Neural Networks: A Comprehensive Foundation,
Haykin, S.,
Macmillan, 1994.
 A good overview of neural nets from an electrical engineer's viewpoint;
covers a wide range of neural network types
 The Neural network FAQ
 overview of neural nets and pointers to software
 Nevprop
 is one of the better free packages

More Neural net pointers [postscript]
Statistical Methods
 stepwise regression
 logistic regression
 Linear Statistical Methods,
Fox,
Wiley
 logistic regression is nicely covered on pp. 307-310.
 Statistical Models in S, Chambers and Hastie, Wadsworth, 1992
 covers a range of advanced statistical methods
Bayesian Belief Nets

Charniak, Eugene, "Bayesian Networks without tears", AI Magazine
12(4):50-63, Winter 1991.
 Intro to Bayesian networks for beginners.

Neapolitan, Richard E., "Probabilistic Reasoning in Expert Systems:
Theory and Algorithms", John Wiley and Sons, 1990.
 Practical guide to implementation.
 Finn V. Jensen, "Introduction to Bayesian Networks" 1996,
Springer Verlag; ISBN: 0387915028
available at amazon

Pearl, Judea, "Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference", Morgan Kaufmann, San Mateo,
California, 1988.
 Theoretical framework for Bayesian networks. The book that got the whole field going.
 Lots more references
 Bayesian networks
testimonial and
routine
 What are belief nets good for and where to get code.
 other good free software: Netica

Belief Network software
Genetic Algorithms
 "Genetic Algorithms.",
J. Holland,
Scientific American, July 1992, pp. 66-72.
 a nice overview of genetic algorithms
 Genetic Algorithms in Search, Optimization, and Machine Learning,
Goldberg, D.,
Addison-Wesley, 1989
 An Introduction to Genetic Algorithms,
Mitchell, M.,
MIT Press, 1996
Hidden Markov Models and Speech
 Statistical Methods for Speech Recognition,
Jelinek, F.
MIT Press, 1998
Information Theory
 Elements of Information Theory, T. M. Cover and J. A. Thomas,
Wiley, 1991
 a solid introduction to information theory
Other
Database Mining Companies
Many of these products, and others, are briefly described in an article by Two Crows.
University Centers
Current Research - General
ungar@cis.upenn.edu