The two-way aspect model is a statistical model for performing soft
two-way clustering of data such as documents and words, or
people and movies. In the document/word domain the model
consists of parameters P(w|z), P(d|z), P(z) where z is a
latent class variable.
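To make the parameterization concrete, here is a minimal Java sketch of how the three parameter tables combine into the probability of one (document, word) observation. It assumes the standard aspect model factorization P(d,w) = sum over z of P(z) P(d|z) P(w|z). The class name, method name, array layout, and toy numbers are illustrative assumptions and are not part of the PennAspect API.

```java
// Sketch only: names and array layout are hypothetical, not PennAspect's API.
final class AspectModelSketch {

    /**
     * Joint probability of observing document d together with word w:
     * P(d, w) = sum over z of P(z) * P(d|z) * P(w|z).
     *
     * pZ[z]          holds P(z)
     * pDGivenZ[z][d] holds P(d|z)
     * pWGivenZ[z][w] holds P(w|z)
     */
    static double jointProbability(double[] pZ, double[][] pDGivenZ,
                                   double[][] pWGivenZ, int d, int w) {
        double sum = 0.0;
        for (int z = 0; z < pZ.length; z++) {
            sum += pZ[z] * pDGivenZ[z][d] * pWGivenZ[z][w];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy model: 2 latent classes, 2 documents, 3 words (made-up numbers).
        double[] pZ = {0.6, 0.4};
        double[][] pDGivenZ = {{0.7, 0.3}, {0.2, 0.8}};
        double[][] pWGivenZ = {{0.5, 0.4, 0.1}, {0.1, 0.2, 0.7}};
        System.out.println("P(d=0, w=0) = "
                + jointProbability(pZ, pDGivenZ, pWGivenZ, 0, 0));
    }
}
```

Because every row of each table is a probability distribution, summing the joint over all document/word pairs gives 1, which is a convenient sanity check on parameters read back from a trained model.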
What is PennAspect?
PennAspect is our own implementation of the two-way aspect model. The
package is distributed without support or warranty under the GNU
public license. The code can be used to train a model
and calculate observation probabilities
without modifications. Included in the package are
Java classes you might find handy for formatting your
datasets in the appropriate layout, and reading
parameters generated by the model. PennAspect uses
flat files for inputs and outputs in anticipation
of unix script pipelining and integration with
external applications. The PennAspect implementation of the
aspect model is described in the Penn CIS technical report
MS-CIS-01-25. Earlier versions of this
code were used in the research article:
Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock.
Methods and Metrics for Cold-Start Recommendations.
Appeared in Proceedings of the 25th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval (SIGIR 2002), pp. 253-260.
PennAspect is distributed as a jar (Java archive)
file containing source and class
files. Download here.
In addition you might find the documentation handy.
What will I need to run PennAspect?
PennAspect requires a Java Runtime Environment that is compatible with
Sun's version 1.2.2 or higher. Our
research group is
fond of IBM's JIT Java compiler that comes with
the IBM Developer Kit for Linux.
You will also need to install several other packages employed by our
code. These packages come with
their own license agreements. We are
linking to the appropriate web pages
rather than providing the packages
on our own page in order to avoid
license violations. Once you have downloaded the relevant jar files,
update your CLASSPATH variable to include them.
1. You will need the freeware sparse matrix package from
Ops Research.com (we use release 1.2.4).
2. We employ sorting routines from the Colt package. More info about
Colt can be obtained on the Colt
web site. We use release 1.0.1.
3. We use the Arguments package (java version). Full details about the
Arguments package can be found on
freshmeat or here. We have a local copy for download. We use release 1.0.
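If you are unsure whether your CLASSPATH picked up all of the jars listed above, a small check like the following sketch can help. It only inspects the standard java.class.path system property. The jar file names in main are placeholders (assumptions), since the actual file names depend on what you downloaded, and the code is written for a current JDK rather than the 1.2.2 baseline.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: a hypothetical helper, not shipped with PennAspect.
final class ClasspathCheckSketch {

    /**
     * Returns the required jar file names that do not appear as the
     * file-name part of any entry in the given classpath string.
     */
    static List<String> missingJars(String classPath, String separator,
                                    String[] requiredJarNames) {
        String[] entries = classPath.split(Pattern.quote(separator));
        List<String> missing = new ArrayList<>();
        for (String jar : requiredJarNames) {
            boolean found = false;
            for (String entry : entries) {
                // Compare only the last path component, e.g. "lib/colt.jar" -> "colt.jar".
                if (new File(entry).getName().equals(jar)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder names: substitute the jar files you actually downloaded.
        String[] required = {"pennaspect.jar", "colt.jar", "sparse-matrix.jar", "arguments.jar"};
        String classPath = System.getProperty("java.class.path");
        System.out.println("Missing from classpath: "
                + missingJars(classPath, File.pathSeparator, required));
    }
}
```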
Feb 3, 2003 - Version 1.01 is released, replacing the initial 1.0 release. The changes are to the user interface: PennAspect now gives improved feedback when bad command-line arguments are given.
Q: PennAspect overfits on the first iteration, even when I play with the
annealing term. What is going on?
A: In our experience the aspect model does not fit well on extremely
sparse data sets. This could explain your difficulty. See the
Popescul et al. UAI-2001 paper.
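For readers wondering what the annealing term does, the sketch below shows a tempered E-step in the style of Hofmann's tempered EM, where the posterior over z is computed as P(z|d,w) proportional to P(z) [P(d|z) P(w|z)]^beta. This is an assumption about the general technique, not a description of PennAspect's internals (see the technical report for those), and all names here are hypothetical. With beta = 1 this is the ordinary EM posterior. Lowering beta toward 0 flattens the posterior toward the prior P(z), which is how annealing damps overfitting.

```java
// Sketch only: illustrates a tempered E-step, not PennAspect's actual code.
final class TemperedEStepSketch {

    /**
     * Posterior over latent classes for one (d, w) observation:
     * P(z|d,w) = P(z) * [P(d|z) * P(w|z)]^beta / normalizer.
     * Assumes at least one class gives the pair positive probability.
     */
    static double[] posterior(double[] pZ, double[][] pDGivenZ,
                              double[][] pWGivenZ, int d, int w, double beta) {
        double[] post = new double[pZ.length];
        double norm = 0.0;
        for (int z = 0; z < pZ.length; z++) {
            post[z] = pZ[z] * Math.pow(pDGivenZ[z][d] * pWGivenZ[z][w], beta);
            norm += post[z];
        }
        if (norm <= 0.0) {
            throw new IllegalArgumentException("observation has zero probability under the model");
        }
        for (int z = 0; z < post.length; z++) {
            post[z] /= norm;
        }
        return post;
    }

    public static void main(String[] args) {
        // Toy parameters (made-up numbers): 2 latent classes, 2 documents, 3 words.
        double[] pZ = {0.6, 0.4};
        double[][] pDGivenZ = {{0.7, 0.3}, {0.2, 0.8}};
        double[][] pWGivenZ = {{0.5, 0.4, 0.1}, {0.1, 0.2, 0.7}};
        for (double beta : new double[] {1.0, 0.5, 0.0}) {
            double[] post = posterior(pZ, pDGivenZ, pWGivenZ, 0, 0, beta);
            System.out.println("beta=" + beta + "  P(z=0|d,w)=" + post[0]);
        }
    }
}
```

On extremely sparse data most (d, w) pairs are seen once or never, so even a heavily tempered posterior has little evidence to average over, which is consistent with the difficulty described in the answer above.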
This page is maintained by
Andrew Schein.