The files below contain human annotations and automatic classifier predictions indicating whether a sentence is general or specific.
The annotations were collected on Amazon Mechanical Turk for sentences from three corpora: the Wall Street Journal, the Associated Press, and the science section of the New York Times. We also developed an automatic classifier that makes the binary distinction between general and specific sentences with 75% accuracy. The predictions from the best-performing set of features are also included in these data files.
More details about the annotations and classifier can be found in our paper.
Annie Louis and Ani Nenkova, Automatic identification of general and specific sentences by leveraging discourse annotations, Proceedings of IJCNLP, 2011. [pdf]
There is one file below for each corpus in our annotation set. Each file contains approximately 300 sentences, and each sentence was annotated by 5 judges. The files are tab-separated, with the following fields: