ROCtools: ROC Curve Code


We are releasing our source code for generating ROC curves and variants for recommender system applications. The code is provided in source form under the GNU General Public License. Its intended audience is experienced software developers who are knowledgeable in Java.

Earlier versions of this code were used in:

Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock. Methods and Metrics for Cold-Start Recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002), pp. 253-260. August 11-15, 2002. Tampere, Finland. [.ps.gz] [.pdf]

What is an ROC curve?
An ROC curve shows the tradeoff between true positive rate and false positive rate in a binary classification problem as the classification threshold is varied. The method was pioneered in evaluating radar operators, where it was given the name receiver operating characteristic curve. It gained popularity among statisticians for evaluating diagnostic tests, and was later adopted by the machine learning community for general binary classification evaluation. ROC curves are similar to the precision/recall curves used in information retrieval and to the lift curves used in marketing.
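As an illustration of the threshold tradeoff described above, here is a minimal sketch that computes the true positive rate and false positive rate at a single threshold. The class name ThresholdRates and the method ratesAt are hypothetical names chosen for this example; they are not part of ROCtools. The sketch assumes a current Java version and treats an item as predicted positive when its score is at or above the threshold.

```java
public class ThresholdRates {

    /**
     * Returns {truePositiveRate, falsePositiveRate} when every item whose
     * score is >= threshold is predicted positive.
     * A rate is NaN if the corresponding class has no examples.
     */
    public static double[] ratesAt(double[] scores, boolean[] labels, double threshold) {
        int tp = 0, fp = 0, pos = 0, neg = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predictedPositive = scores[i] >= threshold;
            if (labels[i]) {
                pos++;
                if (predictedPositive) tp++;
            } else {
                neg++;
                if (predictedPositive) fp++;
            }
        }
        return new double[] { (double) tp / pos, (double) fp / neg };
    }

    public static void main(String[] args) {
        // Toy data, invented for illustration only.
        double[] scores = {0.9, 0.8, 0.6, 0.4, 0.2};
        boolean[] labels = {true, false, true, false, false};
        double[] thresholds = {0.85, 0.5, 0.1};
        for (int t = 0; t < thresholds.length; t++) {
            double[] r = ratesAt(scores, labels, thresholds[t]);
            System.out.println("threshold=" + thresholds[t]
                    + " TPR=" + r[0] + " FPR=" + r[1]);
        }
    }
}
```

Lowering the threshold can only raise both rates, so each threshold yields one point on the curve, and sweeping the threshold traces the whole curve.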

Our .java files include code for generating empirical ROC curves (EmpiricalROCMaker.java). An empirical ROC curve is plotted directly from the empirical distribution of the data rather than from a smoothing function. We built this package after trying out several free packages designed for medical diagnosis evaluation. Those tools had many useful features, including confidence intervals and curve smoothing, but could not handle large data sets well. Our code will handle as much data as can fit in memory, but does not generate confidence intervals. We found curve smoothing unattractive for our applications because of the large size of our data sets.
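To make the idea of an empirical curve concrete, the following is a minimal sketch of one common way to build it: sort the items by descending score, then sweep down the list and record a point each time the score changes. This is not the EmpiricalROCMaker implementation or its API. The class name EmpiricalRocSketch and the method curve are hypothetical, and the sketch sorts with java.util rather than with Colt. It assumes the data contains at least one positive and one negative example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class EmpiricalRocSketch {

    /**
     * Returns the curve as a list of {falsePositiveRate, truePositiveRate}
     * points, starting at (0, 0) and ending at (1, 1).
     */
    public static List<double[]> curve(final double[] scores, boolean[] labels) {
        Integer[] order = new Integer[scores.length];
        int pos = 0;
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
            if (labels[i]) pos++;
        }
        int neg = scores.length - pos;

        // Sort item indices so that the highest score comes first.
        Arrays.sort(order, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return Double.compare(scores[b], scores[a]);
            }
        });

        List<double[]> points = new ArrayList<double[]>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0, fp = 0;
        for (int k = 0; k < order.length; k++) {
            int i = order[k];
            if (labels[i]) tp++; else fp++;
            // Emit a point only after the last item of a group of tied
            // scores, since no threshold can separate tied items.
            boolean lastOfTie = (k == order.length - 1)
                    || scores[order[k + 1]] != scores[i];
            if (lastOfTie) {
                points.add(new double[] {(double) fp / neg, (double) tp / pos});
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Toy data, invented for illustration only.
        double[] scores = {0.9, 0.8, 0.6, 0.4, 0.2};
        boolean[] labels = {true, false, true, false, false};
        List<double[]> points = curve(scores, labels);
        for (int k = 0; k < points.size(); k++) {
            double[] p = points.get(k);
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

Because the cost is dominated by a single sort, this approach scales to any data set that fits in memory, which is the regime described above.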

Recommender System Evaluation:
To make a GROC (Global ROC) curve for evaluating recommender systems, use the EmpiricalROCMaker class. The files Croc.java and PerfectCroc.java contain classes to plot the CROC (Customer ROC) curve and to evaluate the performance of a hypothetical perfect recommender system according to the CROC criteria.

Changes:
version 1.01: Tue May 21 2002
  EmpiricalROCMaker class now takes a third argument determining how many points are drawn.

version 1.1: Tue February 11 2003
  CROC curve code has better user feedback. In the previous version there was a slight inconsistency between the paper description and the implementation. The implementation now matches the paper description exactly.

Downloads:
ROCtools is distributed as a jar (java archive) file containing source files. Download here.


What will I need to run these tools?
ROCtools requires a Java Runtime Environment that is compatible with Sun's version 1.2.2 or higher. Our research group is fond of IBM's JIT Java compiler, which comes with the IBM Developer Kit for Linux.

You will also need to install one other package used by our code. This package comes with its own license agreement. We link to the appropriate web page rather than providing the package on our own page in order to avoid license violations. Once you have downloaded the relevant jar file, update your CLASSPATH variable.

1. We employ sorting routines from the Colt package. More information about Colt can be obtained on the Colt web site. We use release 1.0.1.

Where can I learn more about ROC curves in general?
The University of Chicago folks have some tools available.
MedCalc has some interesting references.


This page is maintained by Andrew Schein.
Page created on March 5, 2002.

