Penn Data Mining Group: Publications
Main Research Areas
Feature Selection
When one has far more potentially predictive features than
observations, standard penalty methods such as AIC and BIC fail. We
have developed a set of Streamwise Feature Selection (SFS) methods
that interleave the generation and selection of
features. SFS excels when a huge number of features can be generated
(given enough computer time), but only a small fraction of them are
significant.
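The streamwise idea can be sketched under a simplifying assumption: each candidate feature arrives with a p-value already computed against the current model. How that p-value is obtained (for example, a t-test on the new coefficient) is left out. The acceptance rule is a simplified alpha-investing scheme in the spirit of the alpha-investing paper listed below. The function name `alpha_investing` and the defaults `w0 = 0.5` and `alpha_delta = 0.5` are illustrative assumptions, not the published code.

```python
from typing import Iterable, List


def alpha_investing(p_values: Iterable[float],
                    w0: float = 0.5,
                    alpha_delta: float = 0.5) -> List[int]:
    """Select features from a stream using a simplified alpha-investing rule.

    p_values: one p-value per arriving candidate feature (assumed to be computed
              against the model built from the features accepted so far).
    w0: initial alpha-wealth (illustrative default).
    alpha_delta: wealth earned back each time a feature is accepted (illustrative).
    Returns the 0-based positions of the accepted features.
    """
    wealth = w0
    selected: List[int] = []
    for i, p in enumerate(p_values, start=1):
        alpha_i = wealth / (2 * i)           # spend a shrinking share of current wealth
        if p < alpha_i:
            selected.append(i - 1)           # accept the feature
            wealth += alpha_delta - alpha_i  # pay for the test, then earn a reward
        else:
            wealth -= alpha_i                # reject: the test's cost is lost
    return selected
```

Each test spends part of the current alpha-wealth, and an accepted feature earns wealth back. A stream rich in real signal therefore keeps the thresholds comparatively generous. A long run of useless features drives the wealth, and with it the thresholds, down. For example, `alpha_investing([0.001, 0.5, 0.3, 0.0001, 0.9])` accepts positions `[0, 3]`.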
- Multi-Task Feature Selection using the Multiple Inclusion Criterion (MIC).
Paramveer S. Dhillon, Brian Tomasik, Dean Foster and Lyle Ungar.
ECML-PKDD (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases), Bled, Slovenia,
Sept. 2009
- Transfer Learning, Feature Selection and Word Sense Disambiguation.
Paramveer S. Dhillon and Lyle Ungar.
ACL-IJCNLP (Annual Meeting of the Association for Computational
Linguistics), Singapore, Aug. 2009
-
Efficient Feature Selection in the Presence of Multiple Feature Classes.
Paramveer S. Dhillon, Dean Foster and Lyle H. Ungar
IEEE International Conference on Data Mining (ICDM), 2008.
- Streamwise Feature Selection,
Jing Zhou, Bob Stine, Dean Foster, and Lyle Ungar, Journal of Machine
Learning Research (JMLR) 7 1861-1885, 2006.
-
Streaming Feature Selection using alpha investing,
Jing Zhou, Bob Stine, Dean Foster, and Lyle Ungar, SIGKDD-2005, 384-393, 2005.
-
Streaming Feature Selection,
Lyle Ungar, Jing Zhou, Dean Foster and Bob Stine,
AI and Statistics, 2005.
-
Streaming Feature Selection,
Lyle Ungar, Dean Foster and Bob Stine,
Snowbird Learning Conference, 2004. For more detail, see our
work in progress.
-
Feature selection methods make implicit assumptions about the distributions of
the features; estimating these distributions leads to methods with superior
performance.
-
Characterizing the generalization performance of model selection strategies,
D. Schuurmans, D.P. Foster and L.H. Ungar, presented at ML 1997
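One standard way to make the implicit assumption explicit is shown below. It is a textbook illustration, not the group's own analysis. A penalty of λ per selected feature corresponds, up to a constant, to an independent prior that includes each of the p candidate features with probability π. Coefficient values are ignored here. RIC, the risk inflation criterion, is an additional example that the text above does not name.

```latex
\begin{aligned}
&\text{Penalized fit with } q \text{ of } p \text{ features:}\quad
  -2\log L + \lambda q,\qquad
  \lambda_{\mathrm{AIC}} = 2,\;\; \lambda_{\mathrm{BIC}} = \log n,\;\; \lambda_{\mathrm{RIC}} = 2\log p \\
&\text{Prior with inclusion probability } \pi:\quad
  -2\log P(\text{model}) = -2q\log\pi - 2(p-q)\log(1-\pi)
  = q\cdot 2\log\frac{1-\pi}{\pi} \;-\; 2p\log(1-\pi) \\
&\text{Matching the two:}\quad
  \lambda = 2\log\frac{1-\pi}{\pi}
  \;\Longleftrightarrow\;
  \pi = \frac{1}{1+e^{\lambda/2}},\qquad
  \pi_{\mathrm{AIC}} = \frac{1}{1+e} \approx 0.27,\;\;
  \pi_{\mathrm{BIC}} = \frac{1}{1+\sqrt{n}},\;\;
  \pi_{\mathrm{RIC}} = \frac{1}{1+p}
\end{aligned}
```

Read this way, AIC behaves as if roughly a quarter of all candidates were real. That is far too many when there are thousands of candidates and only a handful of true signals. A penalty that grows with p, such as RIC, encodes a much sparser prior. Estimating π from the data, instead of fixing it in advance, is one way the idea above could be made concrete.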
Text Mining
-
Web-scale named entity recognition,
Whitelaw, Kehlenbeck, Petrovic, and Ungar,
Proceedings of the 17th ACM conference on Information and
knowledge management (CIKM '08) 2008
-
Positioning Knowledge: Schools of Thought and New Knowledge Creation,
S. Phineas Upham, Lori Rosenkopf and Lyle H. Ungar, Scientometrics
-
Using Text Mining to Analyze User Forums. In special issue "Web Mining
for E-commerce & E-services",
Journal of Online Information Review, 2009
- Efficient Clustering of Web-Derived Data Sets.
Luis Sarmento, Alexander Kehlenbeck, Eugenio Oliveira, and Lyle Ungar
International Conference on Machine Learning and Data Mining
(MLDM) 2009
-
An Approach to Web-scale Named-Entity Disambiguation.
Luis Sarmento, Alexander Kehlenbeck, Eugenio Oliveira, and Lyle Ungar
International Conference on Machine Learning and Data Mining
(MLDM) 2009
-
Finding cohesive clusters for analyzing knowledge communities.
Vasileios Kandylas, S. Phineas Upham and Lyle H. Ungar,
IEEE Knowledge and Information Systems 17(3) p. 335, (2008)
- Multiway Clustering for Creating Biomedical Term Sets.
V Kandylas, L Ungar, T Sandler, S Jensen
Proceedings of the 2008 IEEE International Conference on
Bioinformatics and Biomedicine (BIBM '08), 2008
-
Using Text Mining to Analyze User Forums
R. Feldman, M. Fresko, J. Goldenberg, O. Netzer, L. Ungar
5th IEEE ICSSSM'08, Melbourne, 2008.
- Web-Scale Named Entity Recognition
Casey Whitelaw, Alex Kehlenbeck, Nemanja Petrovic and Lyle Ungar
ACM 17th Conference on Information and Knowledge Management
(CIKM), 2008
-
Information Extraction from Informal Texts
Presentation at ICDM 2007.
-
Knowledge Positioning: Schools of Thought and New Knowledge Creation,
Phin Upham, Lori Rosenkopf, and Lyle Ungar, 2006 Academy of
Management Annual Meeting, August 11-16, Atlanta, Georgia.
-
Automatic Term List Generation for Entity Tagging, Ted Sandler, Andrew
I. Schein and Lyle H. Ungar, Bioinformatics 22(6): 651, 2006.
-
Integrated Annotation for Biomedical Information Extraction
Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein and Lyle Ungar,
HLT/NAACL, Boston, May 2004
-
Shallow Semantic Annotation of Biomedical Corpora for Information Extraction
Seth Kulick, Mark Liberman, Martha Palmer, and Andrew Schein.
Proceedings of the 2003 ISMB Special Interest Group Meeting on Text
Mining (a.k.a. BioLink). June 27, 2003. Brisbane, Australia.
- As part of a large project on mining the Bibliome
(Information Extraction from the Biomedical Literature), we
annotated MEDLINE documents with entities and their relations and used
machine learning methods for automatic tagging and information extraction.
See also Statistical Relational Learning.
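Entity taggers of this kind are commonly trained on token-level labels derived from annotated spans. The sketch below shows the widely used BIO encoding. The text does not say which encoding the Bibliome annotation used, so the span format `(start, end_exclusive, label)`, the function name `spans_to_bio` and the example label are assumptions.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn entity spans over a token list into one BIO tag per token.

    spans: (start, end_exclusive, label) triples, assumed non-overlapping.
    """
    tags = ["O"] * len(tokens)                  # O = outside any entity
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"bad span {(start, end, label)}")
        tags[start] = f"B-{label}"              # B = first token of an entity
        for j in range(start + 1, end):
            tags[j] = f"I-{label}"              # I = continuation of the entity
    return tags
```

A sequence classifier then predicts one tag per token, and entity spans are read back from runs of tags that begin with `B-`.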
Genomics and Proteomics
We apply a variety of statistical methods to problems in genomics
and proteomics. Much of our recent work studies motifs involved in
protein-protein and protein-DNA binding, including
interactions between HIV and human proteins.
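In its simplest form, a "shared sequence motif" can be illustrated by listing the exact length-k substrings common to a viral and a host protein sequence. This is a simplification and not the method of the papers below. Real motif analyses allow degenerate positions and assess statistical significance. The function name and the example sequences are made up for illustration.

```python
from typing import Set


def shared_kmers(seq_a: str, seq_b: str, k: int = 4) -> Set[str]:
    """Return the length-k substrings (k-mers) that occur in both sequences."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Sliding windows; a sequence shorter than k yields no k-mers.
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    return kmers_a & kmers_b
```

For example, `shared_kmers("MKTAYIAK", "GGTAYIQQ", k=3)` returns `{"TAY", "AYI"}`.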
- Prediction of HIV-1 virus-host protein interactions using virus and
host sequence motifs.
Perry Evans, Will Dampier, Lyle Ungar and Aydin Tozeren
BMC Medical Genomics 2009
- Host sequence motifs shared by HIV predict response to antiretroviral
therapy.
William Dampier, Perry Evans, Lyle Ungar and Aydin Tozeren
BMC Medical Genomics 2009 2:47.
- A predictive model for identifying mini-regulatory modules in the mouse genome.
Mahesh Yaragatti, Ted Sandler and Lyle Ungar
Bioinformatics 2008
- MetaProm: a neural network based meta-predictor for alternative human promoter prediction.
Junwen Wang, Sridhar Hannenhalli and Lyle H Ungar
BMC Genomics 8:374, 2007.
-
Patterns of sequence conservation in presynaptic neural genes. Dexter Hadley
et al. Genome Biology, November 10, 2006.
- Identification of potential CSF biomarkers in ALS,
G. M. Pasinetti, L. H. Ungar et al., Neurology, February 15, 2006.
- Using Prior Knowledge to Improve Genetic Network Reconstruction from
Microarray Data, Bahl, Le, and Ungar, In Silico Biology (ISB), 2004.
- Maximum Entropy Methods for Biological Sequence Modeling
BIOKDD 2001 workshop.
-
Chloroplast Transit Peptide Prediction: a Peek Behind the Black Box.
Andrew I. Schein, Jessica C. Kissinger, and Lyle H. Ungar,
Nucleic Acids Research Methods, 2001, Vol 29, No. 16 e82.
- Deriving Promoter Rules from Microarray Data,
Buehler, E. and L.H. Ungar,
presented at Microarray Algorithms and Statistical Analysis: Methods and Standards,
1999.
Clustering and Collaborative Filtering
Collaborative filtering problems (e.g. recommending movies based
on what other people have liked) can be modeled and optimally solved
using generative statistical models. Current (not yet published) work shows how
EM methods on mixture models can systematically be changed to winner-take-all
hard clustering methods. The evolution of citation clusters over time has
implications for theories of the growth and decline of knowledge communities.
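The soft-to-hard relationship shows up in EM for a one-dimensional Gaussian mixture with equal weights and a fixed shared variance. In the sketch below, a `hard` flag replaces the E-step's soft responsibilities with a winner-take-all assignment. Under those simplifying assumptions the hard variant is k-means. The function and its parameters are illustrative, and the group's unpublished procedure is not described here. Shrinking `var` toward zero is one standard way to move continuously from the soft version to the hard one.

```python
import numpy as np


def em_gmm_1d(x, means, n_iter=50, hard=False, var=1.0):
    """EM for a 1-D Gaussian mixture with equal weights and a fixed, shared variance.

    hard=False: standard (soft) EM; each point is shared among components.
    hard=True: winner-take-all E-step; each point goes wholly to its most
    likely component, which under these assumptions is k-means.
    """
    x = np.asarray(x, dtype=float)
    means = np.array(means, dtype=float)
    for _ in range(n_iter):
        # E-step: log-likelihood of each point under each component (n x k)
        log_lik = -0.5 * (x[:, None] - means[None, :]) ** 2 / var
        if hard:
            resp = np.zeros_like(log_lik)
            resp[np.arange(len(x)), log_lik.argmax(axis=1)] = 1.0
        else:
            log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
            resp = np.exp(log_lik)
            resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means; empty components keep their mean
        weight = resp.sum(axis=0)
        nonempty = weight > 0
        means[nonempty] = (resp[:, nonempty] * x[:, None]).sum(axis=0) / weight[nonempty]
    return means
```

On well-separated data both variants find nearly the same centers. They diverge on overlapping clusters, where soft EM lets a point pull on several means at once.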
-
CROC: A New Evaluation Criterion for Recommender Systems.
Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, David
M. Pennock, Electronic Commerce Research 5(1): 51-74, 2005.
-
A Generalized Linear Model for Principal Component Analysis of Binary Data.
Andrew I. Schein, Lawrence K. Saul, and Lyle H. Ungar.
Proc. 9th International Workshop of AI & Statistics 2003.
-
Methods and
Metrics for Cold-Start Recommendations,
A. I. Schein, A. Popescul, L. H. Ungar and D. M. Pennock
in Proceedings of the
25th Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval
(SIGIR 2002), pp. 253-260.
-
Generative Models for Cold-Start Recommendations,
A. I. Schein, A. Popescul, L. H. Ungar and D. M. Pennock,
Workshop on Recommender Systems, SIGIR 2001, September 2001
-
PennAspect: A Two-Way Aspect Model Implementation,
Andrew I. Schein, Alexandrin Popescul and Lyle H. Ungar
Pennsylvania Department of Computer and Information Science,
Technical Report MS-CIS-01-25.
-
Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments,
A. Popescul, L. H. Ungar, D. M. Pennock, and S. Lawrence,
UAI 2001, Seattle, WA, August 2001
-
Clustering and Identifying Temporal Trends in Document Databases,
Alexandrin Popescul, Gary William Flake, Steve Lawrence, Lyle Ungar, C. Lee Giles,
In Proc. IEEE Advances in Digital Libraries 2000 Conference, Washington, DC, May 2000
-
String edit analysis for merging databases,
Zhu, J.J., and L.H. Ungar, KDD Workshop, 2000.
-
Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Andrew McCallum, Kamal Nigam and Lyle
Ungar. KDD00. 2000.
-
Automatic
Labeling of Document Clusters,
Alexandrin Popescul, Lyle H. Ungar,
Unpublished
-
Clustering Methods for Collaborative Filtering
L.H. Ungar and D.P. Foster,
AAAI Workshop on Recommendation Systems, 1998.
-
A Formal Statistical Approach to
Collaborative Filtering L.H. Ungar and D.P. Foster, Conference on
Automated Learning and Discovery (CONALD), 1998.
-
Demo of a recommender system by P. Labys
with L.H. Ungar and F. Herz, 1998.
Statistical Relational Learning
To build predictive models from data in relational data bases,
one needs to intelligently search the space of data base queries.
Clustering can be used to create new database tables, augmenting
the relational database, and reducing data sparsity problems.
Feature selection on infinite streams of features requires careful
control of false discovery rates.
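A derived cluster table can reduce sparsity, as the sketch below shows using a hypothetical `cites(citing, cited)` relation and a hypothetical table assigning each document a cluster id (from any clustering method). Joining the two and aggregating turns the rarely repeated fact "cites document d" into the far more frequently observed "cites k documents in cluster c". In SQL terms this is roughly `SELECT citing, cluster, COUNT(*) FROM cites JOIN doc_cluster ON cited = doc GROUP BY citing, cluster`, and the Python below does the same with dictionaries. Table and function names are illustrative, not taken from the papers listed below.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cluster_count_features(cites: List[Tuple[str, str]],
                           cluster_of: Dict[str, int]) -> Dict[str, Counter]:
    """For each citing document, count its cited documents per cluster.

    cites: rows of a hypothetical cites(citing, cited) table.
    cluster_of: a derived table mapping document -> cluster id.
    Cited documents without a cluster are skipped (an inner join).
    """
    features: Dict[str, Counter] = {}
    for citing, cited in cites:
        if cited in cluster_of:
            features.setdefault(citing, Counter())[cluster_of[cited]] += 1
    return features
```

Each (citing document, cluster) count can then enter a model as one candidate feature. Because such joins can generate very many candidates, they pair naturally with a streamwise selection rule.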
-
Statistical Relational Learning at Penn
A. Popescul, D. Foster, L. Ungar
-
Cluster-based Concept Invention for Statistical Relational Learning, Alexandrin Popescul, Lyle H. Ungar,
In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004).
-
Dynamic Feature Generation for Relational Learning,
Alexandrin Popescul, Lyle H. Ungar,
In Proceedings of Multi-Relational Data Mining Workshop (MRDM 2004) at KDD-04
-
Statistical Relational Learning for Document Mining, Alexandrin
Popescul, Lyle H. Ungar, Steve Lawrence, David M. Pennock, In
Proceedings of IEEE International Conference on Data Mining (ICDM
2003).
-
Structural Logistic Regression for Link Analysis, Alexandrin
Popescul, Lyle H. Ungar, Workshop on Multi-Relational Data Mining
at KDD 2003.
-
Statistical Relational Learning for Link Prediction, Alexandrin
Popescul, Lyle H. Ungar, Workshop on Learning Statistical Models
from Relational Data at IJCAI 2003.
-
Towards Structural Logistic Regression: Combining Relational and
Statistical Learning, Alexandrin Popescul, Lyle H. Ungar, Steve
Lawrence, David M. Pennock, Workshop on Multi-Relational Data
Mining at KDD 2002.
-
A Proposal for Learning by Ontological Leaps,
D. Foster and L. Ungar
-
Towards Structural Logistic Regression: Combining Relational and
Statistical Learning, A. Popescul, L. H. Ungar, S. Lawrence,
D. M. Pennock,
Workshop on Multi-Relational Data Mining at the
Eighth ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD 2002).
Active Learning
The Bayesian A-optimality condition from experimental design provides a
principled foundation for active learning. Active learning
significantly reduces the cost of tagging for word sense disambiguation.
Multi-armed bandits and other methods support active learning for determining
how to grasp an object or what path a robot should follow.
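Bayesian A-optimal selection can be sketched under a simplifying assumption of a Bayesian linear-Gaussian model. The text does not specify the model used for word sense disambiguation. Among unlabeled candidates, the rule queries the one whose label would most reduce the trace of the posterior covariance of the weights. The Sherman-Morrison identity gives that reduction in closed form. `a_optimal_choice`, `update_cov` and `noise_var` are hypothetical names.

```python
import numpy as np


def a_optimal_choice(cov, candidates, noise_var=1.0):
    """Pick the candidate whose label would most reduce trace(posterior covariance).

    Model: y = x @ w + Gaussian noise with variance noise_var; cov is the current
    posterior covariance of w. Observing x gives (Sherman-Morrison)
        cov' = cov - (cov x)(cov x)^T / (noise_var + x^T cov x),
    so the trace drops by ||cov x||^2 / (noise_var + x^T cov x).
    """
    cov = np.asarray(cov, dtype=float)
    X = np.asarray(candidates, dtype=float)
    cx = X @ cov                                   # row i is (cov @ x_i)^T; cov is symmetric
    reduction = (cx ** 2).sum(axis=1) / (noise_var + (cx * X).sum(axis=1))
    return int(np.argmax(reduction)), reduction


def update_cov(cov, x, noise_var=1.0):
    """Rank-one posterior covariance update after observing a label at x."""
    cov = np.asarray(cov, dtype=float)
    x = np.asarray(x, dtype=float)
    cx = cov @ x
    return cov - np.outer(cx, cx) / (noise_var + x @ cx)
```

After the chosen point is labeled, `update_cov` applies the same rank-one formula and the selection repeats. The criterion favors directions in which the weights are still most uncertain.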
-
Active Learning for Vision-Based Robot Grasping,
Salganicoff, M., L.H. Ungar, and R. Bajcsy
Machine Learning Journal 23, 251-278, 1996.
-
Active Exploration-Based ID-3 Learning for Robot Grasping,
Salganicoff, M., L.G. Kunin and L.H. Ungar,
Proceedings of the
Workshop on Robot Learning, 11th Intl Conf. on Machine Learning, July, 1994.
-
Active Exploration and Learning in Real-Valued Spaces using
Multi-Armed Bandit, M. Salganicoff and L.H. Ungar,
Proc. 12th Intl. Conf. on Machine Learning, July, 1995.
Neural Networks and Nonlinear Modeling
Neural networks can be viewed as regression models. Accurate prediction
intervals can be derived by correctly estimating the degrees of freedom of the
neural net model. Alternative statistical models such as MARS may or may not be
more accurate depending on the type of problem being solved. Combining neural
nets with first principles models improves performance.
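A nonlinear-regression style interval can be sketched assuming the network has already been fit and its Jacobian with respect to the weights is available. The interval is y0_hat +/- t_crit * s * sqrt(1 + g0^T (J^T J)^{-1} g0), where s^2 is the residual sum of squares divided by the degrees of freedom. The `df` argument is where the point about degrees of freedom enters. The default n - p counts raw parameters, and a better estimate of the effective degrees of freedom can be passed instead. Function and argument names are illustrative. This is the generic textbook formula, not necessarily the exact procedure of the Technometrics paper below.

```python
import numpy as np


def nonlinear_prediction_interval(J, residuals, g0, y0_hat, t_crit, df=None):
    """Approximate prediction interval for a fitted nonlinear regression (e.g. a neural net).

    J: n x p Jacobian of the fitted values w.r.t. the parameters at the fitted weights.
    residuals: the n training residuals.
    g0: gradient of the prediction at the new input x0 w.r.t. the parameters.
    t_crit: t quantile for the desired coverage (e.g. from scipy.stats.t.ppf).
    df: residual degrees of freedom; defaults to n - p, the naive parameter count.
    """
    J = np.asarray(J, dtype=float)
    r = np.asarray(residuals, dtype=float)
    g0 = np.asarray(g0, dtype=float)
    n, p = J.shape
    if df is None:
        df = n - p
    s2 = (r @ r) / df                        # residual variance estimate
    v = np.linalg.solve(J.T @ J, g0)         # (J^T J)^{-1} g0 without an explicit inverse
    half_width = t_crit * np.sqrt(s2 * (1.0 + g0 @ v))
    return y0_hat - half_width, y0_hat + half_width
```

For an over-parameterized net, J^T J can be singular, and `np.linalg.solve` will then raise an error. A regularized variant (for example, one matched to weight decay) would be needed, which this sketch does not attempt.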
-
Hybrid neural network models for environmental process control,
R.D. De Veaux, R. Bain, and L.H. Ungar,
Environmetrics 10(3), 225-236, 1999.
-
Prediction Intervals for Neural Networks via Nonlinear Regression,
R. de Veaux, J. Schumi, J. Schweinsberg, D. Shellington and L.H. Ungar,
Technometrics 40(4), 273-282, 1998.
-
Estimating Monotonic Functions and Their Bounds.
H. Kay and L.H. Ungar,
AIChE Journal, 46(12), 2425-2434
-
A Brief Introduction to Neural Networks, R. De Veaux and L.H. Ungar,
Unpublished, 1997.
-
Estimating Prediction Intervals for
Artificial Neural Networks, E. Rosengarten and R. de Veaux,
Ninth Yale Workshop on Adaptive and Learning Systems, 1996.
-
Multicollinearity: A Tale of Two
Non-parametric Regressions, R.D. de Veaux and L.H. Ungar. In
Selecting Models from Data: AI and Statistics IV (ed.
P. Cheeseman and R.W. Oldford), pp. 293-302. Springer-Verlag, 1994.
-
SVD-Net: An Algorithm which
Automatically Selects Network Structure,
Psichogios, D.C. and L.H. Ungar,
IEEE Transactions on Neural
Networks, 5(3) 513-515, 1994.
-
A Hybrid Neural Network - First
Principles Approach to Process Modeling,
D.C. Psichogios and L.H. Ungar,
AIChE Journal, 1499-1512, October, 1992.
-
Using Radial Basis Functions to Approximate a Function and Its Error
Bounds,
Leonard, J.A., M.A. Kramer and L.H. Ungar,
IEEE Transactions on Neural Networks, 3(4), 624-627, 1992.
-
A Neural Network Architecture that Computes its own Reliability,
Leonard, J.A., M.A. Kramer, and L.H. Ungar,
Computers and Chem. Engr., 16(9), 819-837, 1992.
Neural networks, particularly radial basis function networks, can be attractive models
for model-based process control, but accurate models do not guarantee stable
control.
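A Gaussian radial basis function network can be sketched with the centers and a shared width fixed in advance. This is an assumption, since they are often chosen by clustering. With them fixed, fitting the output weights is an ordinary linear least-squares problem. That linear-in-the-weights structure is part of what makes such models convenient inside a model-based controller. The sketch says nothing about closed-loop stability, which is the caveat the text raises. All names are illustrative.

```python
import numpy as np


def rbf_design(x, centers, width):
    """Gaussian basis activations: one row per input, one column per center."""
    x = np.asarray(x, dtype=float)[:, None]
    c = np.asarray(centers, dtype=float)[None, :]
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))


def fit_rbf(x, y, centers, width):
    """Fit output-layer weights by linear least squares (centers and width fixed)."""
    Phi = rbf_design(x, centers, width)
    weights, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=float), rcond=None)
    return weights


def predict_rbf(x, centers, width, weights):
    """Network output: weighted sum of the Gaussian basis activations."""
    return rbf_design(x, centers, width) @ weights
```

Far from every center all activations decay toward zero, so the output also falls toward zero. This is one reason RBF models give a visible warning when the process moves outside the region covered by the data.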
-
Radial Basis Function
Neural Networks for Process Control
L.H. Ungar, T. Johnson and R.D. de Veaux, CIMPRO Proceedings pp.357-364,
1994.
-
A Statistical Basis for Using Radial Basis
Functions for Process Control, L.H. Ungar and R. de Veaux,
American Control Conference (ACC) Proceedings, 1995.
-
Neural Networks for Process Control,
Ungar, L.H., E.D. Hartman, J.D. Keeler and G.D. Martin,
Proc. Intelligent Systems in Process Engineering (ISPE '95), 1995.
-
Radial Basis Function Neural Networks for Process Control,
Ungar, L.H., T. Johnson and R.D. De Veaux,
Computer-Integrated Manufacturing in the PROcess industries (CIMPRO) Proceedings, 357-364, 1994.
-
Stability of Neural Net Based Model Predictive Control,
Eaton, J.W., J.B. Rawlings, and L.H. Ungar,
Proceedings of the ACC, 2481-85, 1994.
-
Direct and Indirect Model Based
Control Using Artificial Neural Networks,
Psichogios, D.C., and L.H. Ungar,
I&EC Res. 30, 2564-2573, 1991.
Reinforcement Learning for Robotics and Multi-agent Systems
Robotic learning requires searching for optimal control policies. This is often
best done by exploring in the vicinity of a decision boundary between different
control policies.
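The policy-gradient (REINFORCE) idea can be sketched on a one-step toy task. This is not a robot task and not the localized algorithms listed below. The agent sees s in [-1, 1] and is rewarded for choosing action 1 exactly when s exceeds a threshold. For the logistic policy used here, the score term (a - p) has variance p(1 - p). That variance is largest where p is near 0.5, i.e. at the policy's decision boundary. Once the policy is reasonably sharp, most of the gradient signal therefore comes from states near that boundary. The task, threshold, learning rate and baseline are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_policy(threshold=0.3, steps=20000, lr=0.5, seed=0):
    """REINFORCE on a one-step task with policy P(a=1 | s) = sigmoid(w*s + b).

    Action 1 is rewarded when s > threshold, action 0 otherwise.
    The learned decision boundary is s = -b / w.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)
        p = sigmoid(w * s + b)
        a = 1 if rng.random() < p else 0
        r = 1.0 if (a == 1) == (s > threshold) else 0.0
        # Score of the Bernoulli policy: d/dz log pi(a|s) = a - p, with z = w*s + b.
        adv = r - 0.5                        # constant baseline reduces variance
        w += lr * adv * (a - p) * s
        b += lr * adv * (a - p)
    return w, b
```

After training, `-b / w` estimates the threshold. Acting greedily (action 1 when `sigmoid(w * s + b) > 0.5`) then picks the rewarded action for states well away from the boundary.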
-
Using Policy Gradient Reinforcement Learning on Autonomous Robot Controllers
G. Z. Grudic, Vijay Kumar and L. H. Ungar,
IEEE-RSJ International Conference on Intelligent Robots and Systems (IROS03), 2003
-
Rates of Convergence of Performance Gradient Estimates
Using Function Approximation and Bias in
Reinforcement Learning,
G. Z. Grudic and L. H. Ungar.
Neural Information Processing Systems (NIPS*2001)
Vancouver, Canada, 2001.
-
Learning Multi-agent Co-ordination using Secondary Reinforcers,
G. Z. Grudic and L. H. Ungar, submitted, 2003.
- Using Reinforcement Learning to Refine Autonomous Robot Controllers,
G. Z. Grudic, V. Kumar, L. H. Ungar, submitted,
2003.
-
Exploiting Multiple Secondary Reinforcers in Policy Gradient Reinforcement Learning,
G. Z. Grudic and L. H. Ungar,
Seventeenth International Joint Conference on Artificial Intelligence (IJCAI 01),
Seattle, USA, August, 2001
- Localizing Search in Reinforcement Learning, G. Z. Grudic
and L. H. Ungar, Proc. 18th National
Conference on Artificial Intelligence, (AAAI-00), 590-595, 2000.
-
Localizing Policy Gradient Estimates to Action Transitions, G. Z. Grudic
and L. H. Ungar, ICML 2000.
-
Active Learning for Vision-Based Robot Grasping,
Salganicoff, M., L.H. Ungar, and R. Bajcsy
Machine Learning Journal 23, 251-278, 1996.
-
Active Exploration-Based ID-3 Learning for Robot Grasping,
Salganicoff, M., L.G. Kunin and L.H. Ungar,
Proceedings of the
Workshop on Robot Learning, 11th Intl Conf. on Machine Learning, July, 1994.
-
Active Exploration and Learning in Real-Valued Spaces using
Multi-Armed Bandit, M. Salganicoff and L.H. Ungar,
Proc. 12th Intl. Conf. on Machine Learning, July, 1995.