Text Mining tutorials and references
Database Mining References
What is the best Data Mining Textbook?
All are good in their own right. The question is what level of
math you want and what your data looks like (a single table, or a
- A statistician would say Hastie, Tibshirani & Friedman, Elements of Statistical Learning
- a database person might say:
Jiawei Han and Micheline Kamber, (2001), Data Mining: Concepts and ...
- a business person would choose something like
Data Mining Techniques, 2nd ed., by G. Linoff and M. Berry
- somewhat farther afield, an EE might choose: Pattern Classification 2nd
ed. by Duda, Hart, and Stork, 2001.
KDD, Data Mining - overview
- Data Mining Techniques,
M. Berry and G. Linoff,
John Wiley, 1997
- a readable, if manager-oriented, overview of data mining
- or their second book: Mastering Data Mining: Art and Science of Customer Relationship Management, Wiley and Sons, 1999
- KDNuggets: the best data mining site
- Data Preparation for Data Mining,
Morgan Kaufmann, 1999.
- Data Mining Solutions,
C. Westphal and T. Blaxton,
John Wiley, 1998
- E. Tufte,
The Visual Display of Quantitative Information,
Envisioning Information and
his other books, (Graphics Press).
- These are wonderful books about how to present data graphically.
- Visual Revelations,
- The Elements of Statistical Learning
Hastie, Tibshirani & Friedman,
- strongly biased towards a statistics viewpoint, but still the
best thing out there.
- Reinforcement Learning: An Introduction,
Sutton, R. and A. Barto
MIT Press, 1998
WEKA Java code library
- the best free, wide-coverage Java code for machine learning;
very widely used
MLC++ code library
- the best free, wide-coverage C++ code for machine learning;
not widely used
Clustering and Collaborative Filtering
- Recommender Systems
- Pointers to many companies and classic papers
Cluster Analysis, 3rd Edition,
Halsted Press, 1993.
- A very readable short overview of clustering methods.
- "Locally Weighted Learning",
C. G. Atkeson, S. A. Schaal and A. W. Moore,
AI Review, Volume 11, pages 11-73 (Kluwer Publishers), 1997
- a detailed overview of K-nearest neighbor and related methods
- k-means clustering code
- with a cumbersome input format, but it runs well
- standard packages like R, Matlab, and all data mining software have many more options
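For readers who want to see what such k-means code does before picking a package, here is a minimal sketch in plain Python. It is not the code linked above: the function name kmeans, its parameters, and the random initialization are all assumptions made for illustration, and a real package will handle initialization, empty clusters, and input formats far more carefully.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples into k groups (Lloyd's algorithm).

    Hypothetical helper for illustration only; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Assumption: start from k input points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: give each point the index of its nearest centroid,
        # measured by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the procedure has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two well-separated groups of points along a diagonal.
points = [(0, 0), (1, 1), (2, 2), (10, 10), (11, 11), (12, 12)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the starting centroids in general, which is one reason the standard packages offer multiple restarts and smarter initialization.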
Decision trees, CART and MARS
- C4.5: Programs for Machine Learning,
- A modern presentation of decision tree methods. Very readable and
comes with code.
- Classification and Regression Trees,
Leo Breiman ... et al.,
Wadsworth International Group, 1984.
- The original CART book; a bit dated, but still a classic
- CART and MARS software
- Neural Networks for Pattern Recognition,
Oxford Press, 1995.
- An excellent overview of multilayer perceptron and radial basis
function neural networks from a statistician's point of view.
- Neural Networks: A Comprehensive Foundation,
- A good overview of neural nets from an electrical engineer's viewpoint;
covers a wide range of neural network types
- The Neural network FAQ
- overview of neural nets and pointers to software
More Neural net pointers [postscript]
- stepwise regression
- logistic regression
- Linear Statistical Methods,
- logistic regression is nicely covered on pp. 307-310.
- Statistical Models in S, Chambers and Hastie, Wadsworth, 1992
- covers a range of advanced statistical methods
Bayesian Belief Nets
Charniak, Eugene, "Bayesian Networks without tears", AI Magazine
12(4):50-63, Winter 1991.
- Intro to Bayesian networks for beginners.
Neapolitan, Richard E., "Probabilistic Reasoning in Expert Systems:
Theory and Algorithms", John Wiley and Sons, 1990.
- Practical guide to implementation.
- Finn V. Jensen, "Introduction to Bayesian Networks" 1996,
Springer Verlag; ISBN: 0387915028
available at Amazon
Pearl, Judea, "Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference", Morgan Kaufmann, San Mateo,
- Theoretical framework for Bayesian networks; the book that got the whole field going
- Lots more references
- Bayesian networks
- What are belief nets good for and where to get code.
- other good free software: Netica
- "Genetic Algorithms",
Scientific American, July 1992, pp. 66-72.
- a nice overview of genetic algorithms
- Genetic Algorithms in Search, Optimization, and Machine Learning,
- An Introduction to Genetic Algorithms,
MIT Press, 1996
Hidden Markov Models and Speech
- Statistical Methods for Speech Recognition,
MIT Press, 1998
- Information Theory, T.M. Cover and J. A. Thomas.
- a solid introduction to Information theory
Sources of Data
- Papers: supplemental material
- An industry-oriented overview is in the article by Two Crows.