This document describes the noun-phrase recognition services of the LinguistX Platform library, version 2.2. For the language-specific behavior of an individual language module, see the documentation shipped with that module.
The phrase extractor works on one or more complete sentences at a time and finds the simple noun phrases in the input. (For the purposes of this application, "simple noun phrases" do not include determiners; they are closer to what a linguist might call "N-bar".) Noun-phrase recognition requires that tokenization and part-of-speech tagging be run first, because the input to the phrase extractor is the list of part-of-speech tags output by the tagger.
By default, the phrase extractor finds noun phrases of maximum extent in the input. For example, given the sentence:
The President considered the impact of foreign trade policy on American businesses.
The maximum-extent noun phrases found are:
President
impact of foreign trade policy
American businesses
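The maximum-extent behavior can be illustrated with a small Python sketch. This is not the LinguistX API or its algorithm: the function names, the Penn Treebank-style tags, the hand-tagged sentence, and the pattern itself (optional adjectives followed by one or more nouns, optionally joined to another such group by "of") are all assumptions chosen only so that the sketch reproduces the three phrases listed above.

```python
# Illustrative only: a toy maximum-extent chunker over part-of-speech tags.
# Tag names and the phrase pattern are assumptions, not LinguistX behavior.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ"}


def simple_np_end(tags, start):
    """Return the exclusive end of an adjective* noun+ run beginning at
    `start`, or `start` itself if no such run begins there."""
    pos = start
    while pos < len(tags) and tags[pos] in ADJ_TAGS:
        pos += 1
    first_noun = pos
    while pos < len(tags) and tags[pos] in NOUN_TAGS:
        pos += 1
    return pos if pos > first_noun else start


def maximal_noun_phrases(tokens, tags):
    """Scan left to right and keep only the longest phrase at each position."""
    phrases = []
    i = 0
    while i < len(tokens):
        end = simple_np_end(tags, i)
        if end == i:          # no phrase starts here (determiner, verb, ...)
            i += 1
            continue
        # Extend the phrase across "of" when another noun group follows.
        while end < len(tokens) and tokens[end].lower() == "of":
            next_end = simple_np_end(tags, end + 1)
            if next_end == end + 1:
                break
            end = next_end
        phrases.append(" ".join(tokens[i:end]))
        i = end
    return phrases


# Hand-tagged example sentence (tags assumed, not produced by a real tagger).
TOKENS = ["The", "President", "considered", "the", "impact", "of", "foreign",
          "trade", "policy", "on", "American", "businesses", "."]
TAGS = ["DT", "NNP", "VBD", "DT", "NN", "IN", "JJ",
        "NN", "NN", "IN", "JJ", "NNS", "."]

print(maximal_noun_phrases(TOKENS, TAGS))
```

Because determiners are never part of the assumed pattern, "The" and "the" are skipped, which matches the "N-bar" description of simple noun phrases.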
The phrase extractor can also be directed to return all sub-phrases. For the same sentence, finding all sub-phrases results in:
President
impact of foreign trade policy
American businesses
impact
impact of foreign trade
foreign trade
foreign trade policy
trade
trade policy
policy
businesses
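One way to picture the all-sub-phrases mode is as "every contiguous span that is itself a noun phrase". The Python sketch below works under that assumption; it is not the LinguistX implementation, and the function names, the Penn Treebank-style tags, and the pattern (optional adjectives, one or more nouns, optionally linked to further such groups by "of") are hypothetical. For the example sentence it yields the same eleven phrases as the list above, though in a different order.

```python
import re

# Illustrative only: enumerate every contiguous span that matches an assumed
# noun-phrase pattern. Tags and pattern are assumptions, not LinguistX behavior.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ"}

# One symbol per token: A = adjective, N = noun, O = the word "of", X = other.
NP_PATTERN = re.compile(r"A*N+(?:OA*N+)*")


def symbol(token, tag):
    """Map a (token, tag) pair to a single-character class symbol."""
    if tag in NOUN_TAGS:
        return "N"
    if tag in ADJ_TAGS:
        return "A"
    if token.lower() == "of":
        return "O"
    return "X"


def all_noun_phrases(tokens, tags):
    """Return every contiguous token span whose symbols match NP_PATTERN."""
    symbols = "".join(symbol(tok, tag) for tok, tag in zip(tokens, tags))
    found = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            if NP_PATTERN.fullmatch(symbols[start:end]):
                found.append(" ".join(tokens[start:end]))
    return found


# Hand-tagged example sentence (tags assumed, not produced by a real tagger).
TOKENS = ["The", "President", "considered", "the", "impact", "of", "foreign",
          "trade", "policy", "on", "American", "businesses", "."]
TAGS = ["DT", "NNP", "VBD", "DT", "NN", "IN", "JJ",
        "NN", "NN", "IN", "JJ", "NNS", "."]

for phrase in all_noun_phrases(TOKENS, TAGS):
    print(phrase)
```

Checking every span is quadratic in sentence length, which is acceptable for a sketch over single sentences. Under this pattern, "foreign" and "American" never appear alone because an adjective without a following noun does not match.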