This document describes the tagging services of the LinguistX Platform library, version 2.2.
The LinguistX taggers label each word in a sentence with a tag chosen from a rich tag set. Each tag encodes part-of-speech information plus, where applicable, noun or subject number, verb tense, and fine distinctions of use. Tagger features include:
The tags and language-specific characteristics of each tagger are described in the documentation shipped with each language module. Some characteristics, however, apply to all the taggers; these are described in this and the following sections.
This section describes the abbreviations used for features in tags. A tag consists of feature names separated by hyphens. The first feature name, called the major category, usually specifies the word's part of speech. Languages do not always use a given feature name for exactly the same category, but the categories denoted are very similar.
The following table lists the parts of speech and other major categories used across the supported languages. Not all languages make the same distinctions between major categories, and not every language uses every category listed:
| Category | Description |
| Abbr | abbreviation |
| Adj | adjective |
| Adv | adverb |
| Art | article |
| Aux | auxiliary |
| Aux/V | auxiliary or main verb |
| Cmpd | part of a compound |
| Conj | conjunction |
| Conn | connector (multiple functions) |
| Det | determiner |
| DetPron | determiner or pronoun |
| Foreign | foreign word |
| Func | function word (miscellaneous category) |
| Init | initials |
| Interj | interjection |
| Letter | single letter |
| Markup | formatting markup, e.g., SGML |
| Meas | unit of measure |
| Misc | uncategorized |
| Money | currency expression |
| Nn | noun |
| Num | numeric expression |
| Onom | onomatopoeia |
| Ord | ordinal number |
| Part | particle |
| Prep | preposition |
| Pron | pronoun |
| Prop | proper noun |
| Punct | punctuation |
| Time | time expression |
| Title | title |
| V | verb |
| V/A | verb or adjective |
| WordPart | part of a multi-word phrase |
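Because a tag is just hyphen-separated feature names with the major category first, client code can split it with a plain string operation. The sketch below assumes tags arrive as strings exactly as written in this document; `Tag`, `parse_tag`, `describe_major`, and the partial `MAJOR_CATEGORIES` dictionary are hypothetical helpers for illustration, not part of the LinguistX API.

```python
from dataclasses import dataclass

# Hypothetical subset of the major-category table above, for illustration only.
MAJOR_CATEGORIES = {
    "Adj": "adjective",
    "Aux": "auxiliary",
    "Aux/V": "auxiliary or main verb",
    "Det": "determiner",
    "Nn": "noun",
    "Prep": "preposition",
    "V": "verb",
}


@dataclass(frozen=True)
class Tag:
    major: str                 # first hyphen-separated field: the major category
    features: tuple[str, ...]  # remaining feature names, in their original order


def parse_tag(tag: str) -> Tag:
    """Split a tag string such as 'Aux-3-Sg' into its major category and features."""
    major, *features = tag.split("-")
    if not major:
        raise ValueError(f"malformed tag: {tag!r}")
    return Tag(major=major, features=tuple(features))


def describe_major(tag: Tag) -> str:
    """Look up the major category; categories outside the subset are reported as unknown."""
    return MAJOR_CATEGORIES.get(tag.major, "unknown major category")


if __name__ == "__main__":
    t = parse_tag("Aux-3-Sg")
    print(t.major, t.features, describe_major(t))
```

Note that slash-containing categories such as Aux/V survive the split intact, since only hyphens separate features.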
Additional features are abbreviated as shown in the following table. Two kinds of feature are not listed in the table. First, the single digits 1, 2, and 3 denote 1st, 2nd, and 3rd person, and a group of digits denotes a disjunction: for example, Aux-3-Sg means 3rd person singular auxiliary, and Aux-12 means 1st or 2nd person auxiliary. Second, a feature written entirely in lower case, such as para in the Spanish tag Prep-para, stands for a specific word in that language and indicates that the word's distribution differs in some way from that of other words in its category.
| Feature | Description |
| Acc | accusative (pronoun) |
| Adv | adverbial |
| Art | article |
| Attr | attributive |
| Circ | circumposition |
| Clitic | pronominal clitic |
| Close | closing (punctuation) |
| Comma | comma |
| Comp | comparative (adjective, adverb, or conjunction) |
| Coord | coordinating |
| Def | definite |
| Deg | degree |
| Dem | demonstrative |
| Det | determiner |
| Dig | digit form (of a number, as opposed to words) |
| Fam | family (name) |
| Fin | finite (verb) |
| Gen | genitive |
| Ger | gerund |
| Giv | given (name) |
| Imp | impersonal |
| Impv | imperative |
| Indef | indefinite |
| Indet | indeterminate |
| Inf | infinitive |
| Infin | infinitive |
| Init | initial |
| Int | interrogative |
| IntRel | interrogative or relative |
| Item | item |
| Left | left (part of a compound) |
| Meas | measure |
| Money | money expression |
| Name | name |
| Neg | negative |
| Nom | nominative (pronoun) |
| Open | opening (punctuation) |
| Org | organization (name) |
| PaPart | past participle |
| Part | left or right part of a compound |
| Past | past tense |
| Percent | percent expression |
| Pers | personal (pronoun) |
| PersRefl | personal or reflexive (pronoun) |
| Pl | plural |
| Place | place (name) |
| Poss | possessive |
| Post | postposition |
| PrPart | present participle |
| Pre | occurring before the major category (e.g., a pre-determiner modifies a determiner) |
| PreCoord | conjunctional adverb |
| Pred | predicative (adjective) |
| Prefix | prefix |
| Prep | preposition |
| Pres | present tense |
| Prog | progressive verb |
| Pron | pronoun |
| Quant | quantifier |
| Quote | quote (punctuation) |
| Recip | reciprocal |
| Refl | reflexive |
| Rel | relative |
| Right | right (part of a compound) |
| Roman | Roman (numeral) |
| Sent | sentence (punctuation) |
| SForm | special verbal form |
| Sg | singular |
| SGML | SGML markup tag |
| Slash | slash (punctuation) |
| SP | singular / plural |
| Sub | subordinating (conjunction) |
| Sup | superlative (adjective or adverb) |
| Title | title, used with a name |
| Word | word form (of a number, as opposed to digits) |
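The rules for features outside this table (person digits that combine as a disjunction, and all-lowercase word-specific features) can be applied mechanically. The following is a minimal sketch assuming tags are plain strings; `interpret_feature`, `interpret_tag`, and the partial `FEATURES` dictionary are hypothetical names chosen for illustration.

```python
# Hypothetical subset of the feature table above, for illustration only.
FEATURES = {
    "Dem": "demonstrative",
    "Past": "past tense",
    "Pl": "plural",
    "Sg": "singular",
}

PERSON = {"1": "1st", "2": "2nd", "3": "3rd"}


def interpret_feature(feature: str) -> str:
    # Person: one or more of the digits 1-3; several digits mean "any of these".
    if feature and all(ch in PERSON for ch in feature):
        return " or ".join(PERSON[ch] for ch in feature) + " person"
    # All-lowercase features name a specific word in the language (e.g. Spanish "para").
    if feature.islower():
        return f"word-specific feature '{feature}'"
    return FEATURES.get(feature, f"unlisted feature '{feature}'")


def interpret_tag(tag: str) -> list[str]:
    # Keep the major category as-is and interpret every feature after it.
    major, *features = tag.split("-")
    return [major] + [interpret_feature(f) for f in features]


if __name__ == "__main__":
    for tag in ("Aux-3-Sg", "Aux-12", "Prep-para"):
        print(tag, "->", interpret_tag(tag))
```

The digit check runs first, so a feature like 12 is never mistaken for a table abbreviation.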
Some word classes are treated slightly differently in each language; these differences are described in each language-specific section. Word types that receive particular attention in those sections include demonstratives (e.g., this, that), quantifiers (some, all), interrogatives (where, when, who), and relativizers (that, which), as well as number expressions and proper nouns.
In these taggers, words are marked for number but not for gender: the feature Pl stands for plural, and Sg for singular. If neither feature appears in a tag, or the tag's description calls it "invariant", the word can be used in both singular and plural contexts.
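The number rule reduces to a small check on a tag's features. This sketch returns the set of numbers a tag permits; `possible_numbers` is a hypothetical helper, and treating the SP feature like an unmarked tag is an assumption. A tag that is "invariant" only by its prose description cannot be recognized from the tag string alone.

```python
def possible_numbers(tag: str) -> frozenset[str]:
    """Return the grammatical numbers a tag allows: {'Sg'}, {'Pl'}, or both."""
    features = set(tag.split("-")[1:])  # skip the major category
    has_sg, has_pl = "Sg" in features, "Pl" in features
    if has_sg and not has_pl:
        return frozenset({"Sg"})
    if has_pl and not has_sg:
        return frozenset({"Pl"})
    # No number feature: the word is invariant and fits singular and plural contexts.
    # Assumption: SP ("singular / plural") is handled the same way.
    return frozenset({"Sg", "Pl"})


if __name__ == "__main__":
    for tag in ("Det-Dem-Sg", "Det-Dem-Pl", "Det-Dem"):
        print(tag, sorted(possible_numbers(tag)))
```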
Some descriptions include a statement such as "both uses of demonstratives are tagged Det-Dem." Here Det-Dem may be shorthand for all three of the tags Det-Dem, Det-Dem-Sg, and Det-Dem-Pl, to the extent that each exists in that particular tagger.
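Resolving such shorthand against a specific tagger amounts to generating the unmarked, -Sg, and -Pl variants and keeping those the tagger actually defines. In the sketch below, `expand_shorthand` is a hypothetical helper and the tag set is made up for the example; a real tag inventory would come from the language module's documentation.

```python
def expand_shorthand(shorthand: str, tagset: set[str]) -> list[str]:
    """Expand a shorthand tag to itself plus its -Sg and -Pl variants, keeping only tags in tagset."""
    candidates = (shorthand, f"{shorthand}-Sg", f"{shorthand}-Pl")
    return [tag for tag in candidates if tag in tagset]


if __name__ == "__main__":
    # Made-up tag set standing in for one tagger's inventory.
    tagset = {"Det-Dem", "Det-Dem-Sg", "Det-Dem-Pl", "Nn-Sg"}
    print(expand_shorthand("Det-Dem", tagset))
```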
Each row of the tables in the language-specific sections contains a tag name for the given language, a brief description, and an example to illustrate the tag. Where some context is necessary to illustrate the meaning more clearly, the example word itself appears in bold, while the context words are in plain type. In examples without context, the illustrative word is in plain type.
For details of each language module's behavior, see the documentation shipped with that module.