This document describes the tagset and language-specific characteristics of the English tagger and noun phrase recognizer. For information about how to use the tagging API, and for general tag information that applies to all languages, see the document LinguistX Tagging that comes with the runtime. For information about the noun phrase recognition service, see LinguistX Noun Phrase Recognition.
The English tagger output contains the following language-specific characteristics:
Proper nouns are identified with three main tags: Prop-Org, Prop-Name, and Prop-Place. Businesses and other organizations are tagged Prop-Org; place names are tagged Prop-Place. People and other things (buildings that aren't considered places, book titles, etc.) are tagged Prop-Name. Titles such as Mr. are tagged Prop-Title.
Hyphenated words, when used as modifiers, are tagged Adj.
Words such as his, her, and its, when used as modifiers, are tagged as possessive determiners (Det-Poss). The possessive pronouns his, hers, and its are tagged as pronouns (Pron).
Past-tense verbs are tagged V-Past. Verbs in past perfect and passive constructions are tagged as past participles (V-PaPart). Past participles that have become adjectives and are used as modifiers are tagged Adj.
Present participles are tagged V-Prog when used as verbs, Nn-Sg when used as nouns, and Adj when used as modifiers.
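The context-dependent rules above can be illustrated with a few hand-annotated examples. The following Python sketch is not tagger output and does not call the tagging API; the `EXAMPLES` list and the `tags_by_word` helper are hypothetical names introduced here, and each tag was assigned by hand from the rules in this section.

```python
from collections import defaultdict

# (context, target word, tag) triples written by hand from the rules above.
# They are illustrations only, not captured tagger output.
EXAMPLES = [
    ("She lost her keys.", "her", "Det-Poss"),       # possessive used as a modifier
    ("The keys are hers.", "hers", "Pron"),          # possessive pronoun
    ("He walked home.", "walked", "V-Past"),         # simple past tense
    ("He had walked home.", "walked", "V-PaPart"),   # past perfect
    ("a broken window", "broken", "Adj"),            # participle used as a modifier
    ("They are running.", "running", "V-Prog"),      # present participle as verb
    ("Running is fun.", "Running", "Nn-Sg"),         # present participle as noun
    ("running water", "running", "Adj"),             # present participle as modifier
    ("a well-known author", "well-known", "Adj"),    # hyphenated modifier
    ("Mr. Jones", "Mr.", "Prop-Title"),              # title
]


def tags_by_word(examples):
    """Group the tags assigned to each word form, ignoring case."""
    grouped = defaultdict(set)
    for _context, word, tag in examples:
        grouped[word.lower()].add(tag)
    return dict(grouped)


if __name__ == "__main__":
    for word, tags in sorted(tags_by_word(EXAMPLES).items()):
        print(f"{word}: {', '.join(sorted(tags))}")
```

Grouping by word form shows that a single form such as "running" can receive several tags, which is why code that consumes tagger output should key on the tag assigned in context rather than on the word alone.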
The following table shows the complete English tag set.
| Tag | Description | Examples |
| --- | --- | --- |
| Abbr | abbreviation that is not a title | i.e. |
| Abbr-Meas | abbreviation of measure | oz. |
| Adj | adjective | big |
| Adj-Comp | comparative adjective | bigger |
| Adj-Sup | superlative adjective | biggest |
| Adv | adverb | quickly |
| Adv-Comp | comparative adverb | earlier |
| Adv-Int | wh-adverb | how, when |
| Adv-Sup | superlative adverb | fastest |
| Aux | auxiliary or modal | has, could |
| Conj-Coord | coordinating conjunction | and |
| Conj-Sub | subordinating conjunction | if, that |
| Det | invariant determiner (singular or plural) | some, no |
| Det-Def | definite determiner | the |
| Det-Indef | indefinite determiner | a |
| Det-Int | wh-determiner | what, which, whose |
| Det-Pl | plural determiner | these, those |
| Det-Poss | possessive determiner | her, his, its |
| Det-Rel | relative determiner | whose |
| Det-Sg | singular determiner | this, that |
| Interj | interjection | oh, hello |
| Letter | letter | a, b, c |
| Markup-SGML | SGML markup | <TITLE> |
| Nn | invariant noun | sheep |
| Nn-Pl | plural noun | computers |
| Nn-Sg | singular noun | table |
| Num | number or numeric expression | 40.5 |
| Num-Money | monetary amount | $12.55 |
| Num-Percent | percentage | 12% |
| Num-Roman | roman numeral | XVII, xvii |
| Onom | onomatopoeia | meow |
| Ord | ordinal number | first, second |
| Part-Inf | infinitive marker | to |
| Part-Neg | negative particle | not |
| Part-Poss | possessive marker | 's, ' |
| Prep | preposition | in, on, to |
| Pron | pronoun | he |
| Pron-Int | wh-pronoun | who |
| Pron-Refl | reflexive pronoun | himself |
| Pron-Rel | relative pronoun | who, whom, that, which |
| Prop-Name | name of a person or thing | Graceland, Aesop |
| Prop-Name-Fam | last name | Jones |
| Prop-Name-Giv | first name | Susan, Jacob |
| Prop-Org | name of an organization | Xerox |
| Prop-Place | place name | Colorado |
| Prop-Title | title | Mr., Gen. |
| Punct | other punctuation | - ; / |
| Punct-Close | closing punctuation | ) ] } |
| Punct-Comma | comma | , |
| Punct-Open | opening punctuation | ( [ { |
| Punct-Quote | quote | ' " '' |
| Punct-Sent | sentence-ending punctuation | . ! ? |
| Time | time expression | 9:00 |
| V-PaPart | verb, past participle | understood |
| V-PaPart-be | past participle of to be | been |
| V-Past | verb, past tense | ran |
| V-Past-Pl-be | verb, past tense plural of to be | were |
| V-Past-Sg-be | verb, past tense singular of to be | was |
| V-Pres | verb, present tense or infinitive | walk |
| V-Pres-3-Sg | verb, present tense, 3rd person singular | runs |
| V-Pres-Pl-be | verb, present tense plural of to be | are |
| V-Pres-Sg-be | verb, present tense singular of to be | is |
| V-Prog | progressive verb | swimming |
| WordPart | part of a multi-word phrase | quo |
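The tag names in the table appear to follow a hyphen-separated pattern: a major category first (for example V, Nn, Prop), followed by zero or more refinements (for example Past, Sg, be). That reading is inferred from the table rather than stated by the tagger documentation, so the Python sketch below should be treated as a convenience for client code, with `split_tag`, `is_be_form`, and `group_by_major` as hypothetical helper names.

```python
from collections import defaultdict


def split_tag(tag):
    """Split a tag such as 'V-Past-Sg-be' into (major category, refinements)."""
    major, *refinements = tag.split("-")
    return major, refinements


def is_be_form(tag):
    """Return True for verb tags reserved for forms of 'to be'."""
    major, refinements = split_tag(tag)
    return major == "V" and bool(refinements) and refinements[-1] == "be"


def group_by_major(tags):
    """Map each major category to the list of full tags that share it."""
    groups = defaultdict(list)
    for tag in tags:
        groups[split_tag(tag)[0]].append(tag)
    return dict(groups)


if __name__ == "__main__":
    sample = ["V-Past", "V-Past-Sg-be", "Nn-Sg", "Nn-Pl", "Prop-Name-Fam", "Adj"]
    for major, members in group_by_major(sample).items():
        print(major, members)
```

Grouping on the major category lets client code treat, say, every Prop tag as a proper noun without listing each refinement.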
The English phrase extractor defines simple noun phrases as:
Prepositions other than of and at are excluded because prepositional phrase binding is ambiguous in English.
Noun phrases with commas are recognized.
Proper-noun groups are kept together during subphrase finding.
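The effect of the preposition rule can be sketched over a list of (word, tag) pairs. This Python example is not the phrase extractor's algorithm: the set of tags allowed inside a phrase (`PHRASE_PREFIXES`) is an assumption made for illustration, the function names are hypothetical, and comma handling and subphrase finding are left out. It shows only that of and at may stay inside a candidate span while any other preposition ends it.

```python
KEPT_PREPOSITIONS = {"of", "at"}

# Assumed for illustration only: major tag categories that may appear
# inside a candidate phrase. The real phrase definition may differ.
PHRASE_PREFIXES = ("Det", "Adj", "Nn", "Prop", "Num", "Ord")


def in_phrase(word, tag):
    """Decide whether a tagged token may belong to a candidate span."""
    if tag == "Prep":
        return word.lower() in KEPT_PREPOSITIONS
    return tag.split("-")[0] in PHRASE_PREFIXES


def candidate_spans(tagged):
    """Return candidate spans, breaking at tokens that cannot be in a phrase."""
    spans, current = [], []
    # A sentinel token that is never in a phrase flushes the final span.
    for token in list(tagged) + [("", "END")]:
        word, tag = token
        if in_phrase(word, tag):
            current.append(token)
            continue
        # Drop prepositions left dangling at either edge of the span.
        while current and current[0][1] == "Prep":
            current.pop(0)
        while current and current[-1][1] == "Prep":
            current.pop()
        if current:
            spans.append(" ".join(w for w, _ in current))
        current = []
    return spans


if __name__ == "__main__":
    tokens = [
        ("the", "Det-Def"), ("history", "Nn-Sg"), ("of", "Prep"),
        ("Colorado", "Prop-Place"), ("in", "Prep"),
        ("the", "Det-Def"), ("book", "Nn-Sg"),
    ]
    print(candidate_spans(tokens))
```

In the sample, "of" keeps "the history of Colorado" together, while "in" separates it from "the book".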