Towards Prioritizing Documentation Effort

This page is the online appendix to our paper on documentation prioritization. If you have any questions,
please contact Paul W. McBurney at

Static Attributes and Textual Comparison Data

The following links contain zip files of the data for our open-source and closed-source projects. Each
archive provides the data in both .csv and .arff formats, and in two variants: numeric (raw) data and data
with a discrete target class. For example, in Top25, which we used in our correctness studies, the target
class is "TOP" for the highest-scoring 25% and "MID" for the remaining 75%. The average scores are the
scores obtained from our user studies. All other columns are data columns.
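
As an illustration of the Top25 labeling described above, the following is a minimal sketch of how a discrete
target class could be derived from numeric scores. The function name, the rounding of the cutoff (rounded up),
and the tie-breaking rule (earlier items win) are assumptions made for this example; they are not taken from
the data files, which already contain the discretized labels.

```python
import math


def label_top_fraction(scores, fraction=0.25):
    """Label the highest-scoring `fraction` of items "TOP" and the rest "MID".

    Assumptions (not from the data set): the number of TOP items is rounded
    up, and ties are broken in favor of the item that appears first.
    """
    n_top = math.ceil(len(scores) * fraction)
    # Rank item indices by descending score; sorted() is stable, so equal
    # scores keep their original relative order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    top = set(ranked[:n_top])
    return ["TOP" if i in top else "MID" for i in range(len(scores))]
```

For example, with eight scores and the default fraction, the two highest-scoring items are labeled "TOP" and the
other six are labeled "MID".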

Open-Source Files
Closed-Source Files

Survey Response Data

We have included our survey response data. Because the surveys differ, see the readme files in both
subfolders. In both surveys, a larger number corresponds to higher importance. In the open-source survey,
4 was the highest score. In the closed-source survey, 5 was the highest score.

Open-Source Responses
Closed-Source Responses

Open-Source Survey Data


Open-Source Class Level Survey Data


Closed-Source Survey Data

Project-B Class Questions Data (Updated)
Project-D Class Questions Data (Updated)

STASIS implementation

This implementation requires the NLTK package. It is a "slow" implementation that does not use caching. It
will take several hours to run on programs as large as jGraphT and jxl.