PennAspect 1.01


The two-way aspect model is a statistical model for performing soft two-way clustering of data such as documents and words, or people and movies. In the document/word domain the model consists of parameters P(w|z), P(d|z), P(z) where z is a latent variable.
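
As a rough illustration of how these parameters combine, the sketch below computes the probability of a single document/word observation as P(d,w) = sum over z of P(z) P(d|z) P(w|z), the usual symmetric form of the aspect model. It is not taken from the PennAspect source: the class name, method name, array layout (indexed by z first) and toy parameter values are all assumptions made for this example.

```java
// Hypothetical sketch; not part of the PennAspect distribution.
public final class AspectModelSketch {

    /**
     * Joint probability of observing document d with word w:
     * P(d, w) = sum_z P(z) * P(d|z) * P(w|z).
     * Assumed layout: pDGivenZ[z][d] and pWGivenZ[z][w].
     */
    public static double jointProbability(double[] pZ,
                                          double[][] pDGivenZ,
                                          double[][] pWGivenZ,
                                          int d, int w) {
        double sum = 0.0;
        for (int z = 0; z < pZ.length; z++) {
            sum += pZ[z] * pDGivenZ[z][d] * pWGivenZ[z][w];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy parameters: 2 latent classes, 2 documents, 3 words.
        double[] pZ = {0.6, 0.4};
        double[][] pDGivenZ = {{0.7, 0.3}, {0.2, 0.8}};
        double[][] pWGivenZ = {{0.5, 0.5, 0.0}, {0.1, 0.1, 0.8}};
        System.out.println(jointProbability(pZ, pDGivenZ, pWGivenZ, 0, 0));
    }
}
```

Because each row of the conditional tables sums to one and P(z) sums to one, the joint probabilities over all document/word pairs also sum to one, which makes a convenient sanity check on any parameter set.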

What is PennAspect?
PennAspect is our own implementation of the two-way aspect model. The package is distributed without support or warranty under the GNU public license. The code can be used without modification to train a model and calculate observation probabilities. The package also includes Java classes that you might find handy for formatting your datasets in the appropriate layout and for reading the parameters generated by the model. PennAspect uses flat files for input and output, in anticipation of Unix script pipelining and integration with external applications. The PennAspect implementation of the aspect model is described in the Penn CIS technical report MS-CIS-01-25. Earlier versions of this code were used in the research article:
Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock.
Methods and Metrics for Cold-Start Recommendations.
Appeared in Proceedings of the 25th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval (SIGIR 2002), pp. 253-260.


Downloads:
PennAspect is distributed as a jar (Java archive) file containing source and class files. Download here.
In addition, you might find the documentation handy.

What will I need to run PennAspect?
PennAspect requires a Java Runtime Environment that is compatible with Sun's version 1.2.2 or higher. Our research group is fond of IBM's JIT Java compiler, which comes with the IBM Developer Kit for Linux.

You will also need to install several other packages employed by our code. These packages come with their own license agreements. We link to the appropriate web pages rather than providing the packages on our own page in order to avoid license violations. Once you have downloaded the relevant jar files, add them to your CLASSPATH variable.

1. You will need the freeware sparse matrix package from Ops Research.com (we use release 1.2.4).
2. We employ sorting routines from the Colt package. More information about Colt is available on the Colt web site. We use release 1.0.1.
3. We use the Arguments package (java version). Full details about the Arguments package can be found on freshmeat or here. We have a local copy for download. We use release 1.0.

Changes:
Feb 3, 2003 - Version 1.01 is released, replacing the initial 1.0 release. The changes are to the user interface: PennAspect gives improved feedback when bad command-line arguments are given.

FAQ:
Q: PennAspect overfits on the first iteration, even when I play with the 
annealing term.  What is going on? 

A: In our experience the aspect model does not fit well on extremely 
sparse data sets. This could explain your difficulty. See the 
Popescul et al. UAI-2001 paper.
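
One quick way to see how sparse a data set is before training is to measure the fraction of document/word cells that hold at least one observation. The sketch below does this for a small dense count matrix. It is not part of PennAspect: the class name, method name and dense int[][] layout are assumptions for illustration, and no particular threshold for "extremely sparse" is implied.

```java
// Hypothetical sketch; not part of the PennAspect distribution.
public final class SparsityCheck {

    /**
     * Fraction of cells in a document-by-word count matrix that are non-zero.
     * Assumed layout: counts[d][w] is the number of times word w occurs in document d.
     */
    public static double density(int[][] counts) {
        long nonZero = 0;
        long cells = 0;
        for (int d = 0; d < counts.length; d++) {
            for (int w = 0; w < counts[d].length; w++) {
                cells++;
                if (counts[d][w] != 0) {
                    nonZero++;
                }
            }
        }
        // An empty matrix is reported as density 0 rather than dividing by zero.
        return cells == 0 ? 0.0 : (double) nonZero / cells;
    }

    public static void main(String[] args) {
        // Toy matrix: 3 documents, 4 words, 3 non-zero cells.
        int[][] counts = {
            {2, 0, 0, 0},
            {0, 0, 1, 0},
            {0, 0, 0, 5}
        };
        System.out.println("density = " + density(counts));
    }
}
```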


This page is maintained by Andrew Schein.