ROCtools: ROC Curve Code
We are releasing our source code for generating ROC curves and
variants for recommender system applications. This code is
provided in source form under the GNU General
Public License. Its
intended audience is experienced software developers who are
knowledgeable in Java.
Earlier versions
of this code were used in:
Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock. Methods and Metrics for Cold-Start Recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002), pp. 253-260. August 11-15, 2002. Tampere, Finland. [.ps.gz] [.pdf]
What is an ROC curve?
An ROC curve shows the tradeoff between true positive rate and
false positive rate in binary classification
problems as the classification
threshold is varied. The method was pioneered in evaluating
radar operators, where it was given the name receiver
operating characteristic curve. The method gained
popularity among statisticians for evaluating
diagnostic tests, and was later adopted by the
machine learning community for general binary
classification evaluation. ROC curves are
similar to the precision/recall curves used in
information retrieval and to the lift curves
used in marketing.
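As an illustration of the threshold sweep described above, here is a minimal sketch in Java. It is not the code shipped in ROCtools: the class name RocSketch and the method rocPoints are hypothetical, and the sketch assumes a modern Java version (8 or later) rather than the 1.2.2 baseline mentioned below. It sorts examples by descending score and records one (false positive rate, true positive rate) point each time the threshold drops past a distinct score value.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of the ROCtools distribution. */
public class RocSketch {

    /**
     * Returns ROC points as rows {falsePositiveRate, truePositiveRate},
     * starting at (0,0) and ending at (1,1). A higher score is assumed
     * to mean "more likely positive".
     */
    public static double[][] rocPoints(double[] scores, boolean[] labels) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        int positives = 0;
        for (int i = 0; i < n; i++) {
            order[i] = i;
            if (labels[i]) {
                positives++;
            }
        }
        int negatives = n - positives;
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException(
                "need at least one positive and one negative example");
        }

        // Sort example indices by descending score.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        double[][] points = new double[n + 1][];
        int count = 0;
        points[count++] = new double[] {0.0, 0.0};

        int tp = 0;
        int fp = 0;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            if (labels[i]) {
                tp++;
            } else {
                fp++;
            }
            // Emit a point only after the last example in a run of tied
            // scores, since no threshold can separate tied examples.
            boolean lastOfTie = (k == n - 1) || scores[order[k + 1]] != scores[i];
            if (lastOfTie) {
                points[count++] = new double[] {
                    (double) fp / negatives, (double) tp / positives
                };
            }
        }
        return Arrays.copyOf(points, count);
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6};
        boolean[] labels = {true, false, true, false};
        for (double[] p : rocPoints(scores, labels)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

Each printed row is one point of the curve; plotting the second column against the first gives the ROC curve for the toy data.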
Our .java files include code for generating empirical ROC curves
(EmpiricalROCMaker.java). An empirical ROC curve is a
curve plotted using the empirical distribution of the
data as opposed to using a smoothing function. We
built this package after trying out several
free packages designed for medical diagnosis
evaluation. These tools had many useful
features including confidence intervals and
curve smoothing, but could not handle large
datasets well. Our code will handle as much
data as can fit in memory, but does not generate
confidence intervals. We found that
curve smoothing was unattractive for our applications due to the large size of our datasets.
Recommender System Evaluation:
To make a GROC curve for evaluating recommender systems, use the
EmpiricalROCMaker class. The files Croc.java and
PerfectCroc.java contain classes to
plot the CROC curve and evaluate the
performance of a hypothetical perfect
recommender system according to CROC criteria.
Changes:
version 1.01: Tue May 21 2002
  EmpiricalROCMaker class now takes a third argument determining
how many points are drawn.
version 1.1: Tue February 11 2003
  CROC curve code has better user feedback. In the previous version there was a slight inconsistency between the paper description and the implementation. The implementation now matches the paper description exactly.
Downloads:
ROCtools is distributed as a jar (java archive)
file containing source
files. Download here.
What will I need to run these tools?
ROCtools requires a Java Runtime Environment that is compatible with
Sun's version 1.2.2 or higher. Our
research group is
fond of IBM's JIT Java compiler, which comes with the IBM
Developer Kit for Linux.
You will also need to install one other package used by our
code. This package comes with
its own license agreement. We
link to the appropriate web page
rather than providing the package
on our own page in order to avoid
license violations. Once you have downloaded the relevant jar file,
add it to your CLASSPATH variable.
1. We employ sorting routines from the Colt package. More information about
Colt can be found on the Colt
web site. We use release 1.0.1.
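One way to confirm that the CLASSPATH update took effect is to ask the JVM whether it can locate a class from the downloaded jar. The sketch below is not part of ROCtools; the class name ClasspathCheck is hypothetical, and the default class name cern.colt.Sorting is an assumption about what the Colt jar contains, so pass a different fully qualified class name as an argument if your release differs. It also assumes Java 7 or later for the multi-catch syntax.

```java
/** Hypothetical helper; not part of the ROCtools distribution. */
public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: cern.colt.Sorting is present in the Colt jar being used.
        String name = args.length > 0 ? args[0] : "cern.colt.Sorting";
        System.out.println(name
            + (isOnClasspath(name) ? " found" : " NOT found on CLASSPATH"));
    }
}
```

Run it with the same CLASSPATH setting you intend to use for ROCtools, so that the check reflects the environment the tools will actually see.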
Where can I learn more about ROC curves in general?
The University of Chicago folks have some tools available.
MedCalc has some interesting references.
This page is maintained by Andrew Schein.
Page created on March 5, 2002.