Parsing Engine
java.lang.Object
  danbikel.parser.lang.AbstractTreebank
    danbikel.parser.chinese.Treebank
public class Treebank
Provides data and methods specific to the structures found in the Chinese Treebank or any other treebank that conforms to the same annotation guidelines.
| Field Summary |
|---|
| Fields inherited from class danbikel.parser.lang.AbstractTreebank |
|---|
| augmentationDelimSet, canonicalAugDelimSym, nonterminalExceptionSet |
| Constructor Summary | |
|---|---|
| Treebank() | Constructs a Chinese Treebank object. |
| Method Summary | | |
|---|---|---|
| String | augmentationDelimiters() | Returns a string of the three characters that serve as augmentation delimiters in the Chinese Treebank: "-=\|". |
| Symbol | baseNPLabel() | Returns the symbol with which AbstractTraining.addBaseNPs(Sexp) will relabel base NPs. |
| Symbol | getCanonical(Symbol label) | Returns a canonical mapping for the specified nonterminal label; if label already is in canonical form, it is returned. |
| Symbol | getCanonical(Symbol label, boolean stripAugmentations) | When the stripAugmentations argument is true, this method returns the same value as would be returned by getCanonical(Symbol) when passed the label argument; otherwise, the specified nonterminal is canonicalized unless it contains augmentations, in which case it is returned untouched. |
| boolean | isComma(Symbol word) | Returns true if the specified word is a comma. |
| boolean | isConjunction(Symbol label) | Returns true if label is equal to the symbol whose print name is "CC". |
| boolean | isLeftParen(Symbol word) | Returns true if the specified word is a left parenthesis. |
| boolean | isNP(Symbol label) | Returns true if the canonical version of the specified label is an NP for the Chinese Treebank. |
| boolean | isNullElementPreterminal(Sexp tree) | Returns true if the specified S-expression represents a preterminal whose terminal element is the null element ("-NONE-") for the Chinese Treebank. |
| boolean | isPossessivePreterminal(Sexp tree) | Returns true if the specified S-expression represents a preterminal that is the possessive part of speech. |
| boolean | isPreterminal(Sexp tree) | Returns true if tree represents a preterminal subtree (part-of-speech tag and word). |
| boolean | isPuncToRaise(Sexp preterm) | Returns true if the specified S-expression is a preterminal whose part of speech is "," or ".". |
| boolean | isPunctuation(Symbol tag) | Returns true if the specified part of speech tag is one for which AbstractTreebank.isPuncToRaise(Sexp) would return true. |
| boolean | isRightParen(Symbol word) | Returns true if the specified word is a right parenthesis. |
| boolean | isSentence(Symbol label) | Returns true if the specified nonterminal label represents a sentence in the Penn Treebank, that is, if the canonical version of label is equal to "S". |
| boolean | isVerb(Sexp preterminal) | Returns true if preterminal represents a terminal with one of the following parts of speech: VB, VBD, VBG, VBN, VBP or VBZ. |
| boolean | isVerbTag(Symbol tag) | Returns true if the specified symbol is the part of speech tag of a verb. |
| boolean | isWHNP(Symbol label) | Returns true if the canonical version of the specified label is a WHNP in the Chinese Treebank. |
| Symbol | NPLabel() | Returns the symbol that AbstractTraining.addBaseNPs(Sexp) should add as a parent if a base NP is not dominated by an NP. |
| Nonterminal | parseNonterminal(Symbol label, Nonterminal nonterminal) | Calls AbstractTreebank.defaultParseNonterminal(Symbol, Nonterminal) with the specified arguments. |
| Symbol | sentenceLabel() | Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp). |
| Symbol | subjectAugmentation() | Returns the symbol that is used to augment nonterminals to indicate matrix subjects in this language’s Treebank. |
| Symbol | subjectlessSentenceLabel() | Returns the symbol that relabelSubjectlessSentences will use for sentences that have no subjects. |

| Methods inherited from class danbikel.parser.lang.AbstractTreebank |
|---|
| addAugmentation, canonicalAugDelimiter, constructPreterminal, containsAugmentation, defaultParseNonterminal, getTag, getTraceIndex, isAugDelim, isBaseNP, makeWord, nonTreebankDelimiter, nonTreebankLeftBracket, nonTreebankRightBracket, parseNonterminal, removeAugmentation, removeAugmentation, stripAllButIndex, stripAllButIndex, stripAugmentation, stripIndex, stripIndex |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public Treebank()
Constructs a Chinese Treebank object.
| Method Detail |
|---|
public final boolean isPreterminal(Sexp tree)
Returns true if tree represents a preterminal
subtree (part-of-speech tag and word). Specifically, this method
returns true if tree is an instance of
SexpList, has a length of 2 and has a first list element
of type Symbol.
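The structural test described above can be sketched without the parser's own classes. This is only an illustration under stated assumptions: a `java.util.List` stands in for `SexpList` and a `String` stands in for `Symbol`, and the class name `PreterminalSketch` is made up for this example. It is not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-in sketch: a tree is either a String (a symbol) or a List (a list of trees). */
final class PreterminalSketch {

    /** Mirrors the documented test: a list of length 2 whose first element is a symbol. */
    static boolean isPreterminal(Object tree) {
        if (!(tree instanceof List)) {
            return false; // a bare symbol is not a list, so it cannot be a preterminal
        }
        List<?> list = (List<?>) tree;
        return list.size() == 2 && list.get(0) instanceof String;
    }

    public static void main(String[] args) {
        Object tagAndWord = Arrays.asList("NN", "shu");   // (NN shu)
        Object bareSymbol = "NN";                         // NN
        Object threeItems = Arrays.asList("NP",
                Arrays.asList("NN", "a"),
                Arrays.asList("NN", "b"));                // (NP (NN a) (NN b))
        System.out.println(isPreterminal(tagAndWord));
        System.out.println(isPreterminal(bareSymbol));
        System.out.println(isPreterminal(threeItems));
    }
}
```

Note that the description only constrains the list length and the type of the first element, so the sketch does the same and says nothing about the second element.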
Specified by: isPreterminal in interface Treebank
Overrides: isPreterminal in class AbstractTreebank

public boolean isSentence(Symbol label)
Returns true if the specified nonterminal label represents a
sentence in the Penn Treebank, that is, if the canonical version of
label is equal to "S".
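Specified by: isSentence in interface Treebank
Overrides: isSentence in class AbstractTreebank
See Also: Training.relabelSubjectlessSentences(Sexp)

public Symbol sentenceLabel()
Description copied from class: AbstractTreebank
Returns the canonical label for a sentence, for de-transforming sentences that were transformed via Training.relabelSubjectlessSentences(Sexp).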
Specified by: sentenceLabel in interface Treebank
Overrides: sentenceLabel in class AbstractTreebank

public Symbol subjectlessSentenceLabel()
Returns the symbol that relabelSubjectlessSentences
will use for sentences that have no subjects.
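Specified by: subjectlessSentenceLabel in interface Treebank
Overrides: subjectlessSentenceLabel in class AbstractTreebank

public Symbol subjectAugmentation()
Description copied from class: AbstractTreebank
Returns the symbol that is used to augment nonterminals to indicate matrix subjects in this language’s Treebank.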
Specified by: subjectAugmentation in interface Treebank
Overrides: subjectAugmentation in class AbstractTreebank
See Also: Training.relabelSubjectlessSentences(Sexp)

public boolean isNullElementPreterminal(Sexp tree)
Returns true if the specified S-expression represents a
preterminal whose terminal element is the null element
("-NONE-") for the Chinese Treebank.
N.B.: Some null elements in the Chinese Treebank have indices appended. Consequently, this method simply checks if the print name of the preterminal starts with the string "-NONE-".
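The prefix check in the note above might look like the following. This is a minimal sketch that operates on a plain `String` print name rather than on a `Sexp`. The class name `NullElementSketch` and the indexed name `-NONE--1` are invented for illustration and are not taken from the treebank or the library.

```java
/** Stand-in sketch of a prefix-based null-element test on a print name. */
final class NullElementSketch {

    /** The null-element marker named in the description above. */
    static final String NULL_ELEMENT = "-NONE-";

    /** Prefix match, so a name with an index appended still counts as a null element. */
    static boolean isNullElementName(String printName) {
        return printName.startsWith(NULL_ELEMENT);
    }

    public static void main(String[] args) {
        // "-NONE--1" is a made-up example of a name with an index appended.
        for (String name : new String[] {"-NONE-", "-NONE--1", "NN"}) {
            System.out.println(name + " -> " + isNullElementName(name));
        }
    }
}
```

An exact comparison with `equals` would miss any name that carries an appended index, which is the case the note calls out.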
Specified by: isNullElementPreterminal in interface Treebank
Overrides: isNullElementPreterminal in class AbstractTreebank
See Also: Training.relabelSubjectlessSentences(Sexp)

public boolean isPuncToRaise(Sexp preterm)
Returns true if the specified S-expression is a preterminal
whose part of speech is "," or ".".
Specified by: isPuncToRaise in interface Treebank
Overrides: isPuncToRaise in class AbstractTreebank
Parameters: preterm - the preterminal to test
See Also: Training.raisePunctuation(Sexp)

public boolean isPunctuation(Symbol tag)
Description copied from class: AbstractTreebank
Returns true if the specified part of speech tag is one
for which AbstractTreebank.isPuncToRaise(Sexp) would return true.
Specified by: isPunctuation in interface Treebank
Overrides: isPunctuation in class AbstractTreebank
Parameters: tag - the part of speech to test
See Also: AbstractTreebank.isPuncToRaise(Sexp)

public boolean isPossessivePreterminal(Sexp tree)
Returns true if the specified S-expression represents
a preterminal that is the possessive part of speech. This method is
intended to be used by implementations of AbstractTraining.addBaseNPs(Sexp).
Specified by: isPossessivePreterminal in interface Treebank
Overrides: isPossessivePreterminal in class AbstractTreebank
See Also: Training.addBaseNPs(Sexp)

public boolean isNP(Symbol label)
Returns true if the canonical version of the specified label
is an NP for the Chinese Treebank.
Specified by: isNP in interface Treebank
Overrides: isNP in class AbstractTreebank
Parameters: label - the label to test
See Also: AbstractTraining.addBaseNPs(Sexp)

public Symbol baseNPLabel()
Returns the symbol with which AbstractTraining.addBaseNPs(Sexp) will
relabel base NPs.
Specified by: baseNPLabel in interface Treebank
Overrides: baseNPLabel in class AbstractTreebank
See Also: AbstractTraining.addBaseNPs(danbikel.lisp.Sexp)

public boolean isWHNP(Symbol label)
Returns true if the canonical version of the specified label
is a WHNP in the Chinese Treebank.
Specified by: isWHNP in interface Treebank
Overrides: isWHNP in class AbstractTreebank
See Also: AbstractTraining.addGapInformation(Sexp)

public Symbol NPLabel()
Returns the symbol that AbstractTraining.addBaseNPs(Sexp) should
add as a parent if a base NP is not dominated by an NP.
Specified by: NPLabel in interface Treebank
Overrides: NPLabel in class AbstractTreebank
See Also: Training.addBaseNPs(Sexp)

public boolean isConjunction(Symbol label)
Returns true if label is equal to the symbol
whose print name is "CC".
Specified by: isConjunction in interface Treebank
Overrides: isConjunction in class AbstractTreebank

public boolean isVerb(Sexp preterminal)
Returns true if preterminal represents a
terminal with one of the following parts of speech: VB, VBD, VBG,
VBN, VBP or VBZ. It is an error to call this method
with a Sexp object for which isPreterminal(Sexp)
returns false.
Specified by: isVerb in interface Treebank
Overrides: isVerb in class AbstractTreebank
Parameters: preterminal - the preterminal to test
Returns: true if preterminal is a verb
See Also: HeadTreeNode, Trainer

public boolean isVerbTag(Symbol tag)
Description copied from class: AbstractTreebank
Returns true if the specified symbol is the part of speech
tag of a verb. This method should return true for exactly the same
parts of speech for which AbstractTreebank.isVerb(Sexp) returns true,
and is used to calculate the distance metric while decoding.
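One way to keep a tag-level test and a preterminal-level test in agreement is to have both consult a single tag set. The sketch below illustrates that idea under stated assumptions: a `List<String>` of the form (tag, word) stands in for a preterminal `Sexp`, the tag set is the one listed in the `isVerb(Sexp)` description, and both the class name `VerbTagSketch` and the choice of `IllegalArgumentException` for a non-preterminal argument are made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-in sketch: both verb tests share one tag set so they cannot disagree. */
final class VerbTagSketch {

    /** The parts of speech listed in the isVerb(Sexp) description. */
    private static final Set<String> VERB_TAGS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP", "VBZ")));

    /** Tag-level test. */
    static boolean isVerbTag(String tag) {
        return VERB_TAGS.contains(tag);
    }

    /** Preterminal-level test; delegates to the tag-level test. */
    static boolean isVerb(List<String> preterminal) {
        if (preterminal.size() != 2) {
            // Assumption: signal the documented "error" case with an exception.
            throw new IllegalArgumentException("not a preterminal: " + preterminal);
        }
        return isVerbTag(preterminal.get(0));
    }

    public static void main(String[] args) {
        System.out.println(isVerbTag("VBD"));
        System.out.println(isVerb(Arrays.asList("VBD", "ran")));
        System.out.println(isVerb(Arrays.asList("NN", "run")));
    }
}
```

Because `isVerb` delegates to `isVerbTag`, changing the tag set in one place changes both answers together.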
Specified by: isVerbTag in interface Treebank
Overrides: isVerbTag in class AbstractTreebank
See Also: CKYItem.containsVerb(), Decoder

public boolean isComma(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a comma. This method
is used by the Decoder class when performing the comma
constraint on chart items.
Specified by: isComma in interface Treebank
Overrides: isComma in class AbstractTreebank
Parameters: word - the word to test
See Also: Settings.decoderUseCommaConstraint

public boolean isLeftParen(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a left
parenthesis. This method is used by the Decoder
class when performing the comma constraint on chart items.
Specified by: isLeftParen in interface Treebank
Overrides: isLeftParen in class AbstractTreebank
Parameters: word - the word to test
See Also: Settings.decoderUseCommaConstraint

public boolean isRightParen(Symbol word)
Description copied from class: AbstractTreebank
Returns true if the specified word is a right
parenthesis. This method is used by the Decoder
class when performing the comma constraint on chart items.
Specified by: isRightParen in interface Treebank
Overrides: isRightParen in class AbstractTreebank
Parameters: word - the word to test
See Also: Settings.decoderUseCommaConstraint

public final Symbol getCanonical(Symbol label)
Returns a canonical mapping for the specified nonterminal label; if
label already is in canonical form, it is returned.
The canonical mapping refers to transformations performed on nonterminals
during the training process. Before obtaining a label's canonical form,
it is also stripped of all augmentations (see
AbstractTreebank.stripAugmentation(Symbol)).
Specified by: getCanonical in interface Treebank
Overrides: getCanonical in class AbstractTreebank
Parameters: label - the label to be canonicalized
Returns: a Symbol with the same print name as
label, except that all training transformations and Treebank
augmentations have been undone and stripped
See Also: HeadFinder.findHead(Sexp)

public final Symbol getCanonical(Symbol label,
boolean stripAugmentations)
When the stripAugmentations argument is true, this method returns
the same value as would be returned by getCanonical(Symbol)
when passed the label argument; otherwise, the specified nonterminal
is canonicalized unless it contains augmentations, in which case
it is returned untouched.
Specified by: getCanonical in interface Treebank
Overrides: getCanonical in class AbstractTreebank
Parameters:
  label - the nonterminal label for which a canonical form is to be returned
  stripAugmentations - whether to strip augmentations from the specified nonterminal label before canonicalization
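The relationship between the two getCanonical overloads can be sketched as follows. Everything here is a stand-in under stated assumptions: `String` replaces `Symbol`, the mapping entries (`NPB` to `NP`, `SG` to `S`) are hypothetical examples of training-time relabelings, and augmentation stripping is assumed to cut the label at the first occurrence of one of the delimiter characters "-=|". Labels that legitimately contain a delimiter, such as `-NONE-`, are not handled by this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in sketch of the one- and two-argument canonicalization logic. */
final class CanonicalSketch {

    /** The three augmentation delimiters named in this class's documentation. */
    static final String AUG_DELIMS = "-=|";

    /** Hypothetical training-time relabelings; the real mapping is not shown in this page. */
    static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("NPB", "NP");
        CANONICAL.put("SG", "S");
    }

    /** Index of the first delimiter character in label, or -1 if there is none. */
    static int firstDelimiterIndex(String label) {
        for (int i = 0; i < label.length(); i++) {
            if (AUG_DELIMS.indexOf(label.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    static boolean containsAugmentation(String label) {
        return firstDelimiterIndex(label) >= 0;
    }

    /** Assumed stripping rule: keep everything before the first delimiter. */
    static String stripAugmentation(String label) {
        int i = firstDelimiterIndex(label);
        return i < 0 ? label : label.substring(0, i);
    }

    /** One-argument form: strip augmentations, then undo relabelings. */
    static String getCanonical(String label) {
        String stripped = stripAugmentation(label);
        return CANONICAL.getOrDefault(stripped, stripped);
    }

    /** Two-argument form, following the three cases in the description above. */
    static String getCanonical(String label, boolean stripAugmentations) {
        if (stripAugmentations) {
            return getCanonical(label);          // same result as the one-argument form
        }
        if (containsAugmentation(label)) {
            return label;                        // augmented labels are returned untouched
        }
        return CANONICAL.getOrDefault(label, label);
    }

    public static void main(String[] args) {
        System.out.println(getCanonical("NP-SBJ", true));
        System.out.println(getCanonical("NP-SBJ", false));
        System.out.println(getCanonical("NPB", false));
    }
}
```

With `stripAugmentations` set to false, an augmented label such as `NP-SBJ` passes through unchanged, while an unaugmented label is still mapped to its canonical form.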

public Nonterminal parseNonterminal(Symbol label,
Nonterminal nonterminal)
Calls AbstractTreebank.defaultParseNonterminal(Symbol, Nonterminal) with
the specified arguments.
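Specified by: parseNonterminal in interface Treebank
Overrides: parseNonterminal in class AbstractTreebank
Parameters:
  label - the nonterminal label to parse
  nonterminal - the Nonterminal object to fill with the components of label

public String augmentationDelimiters()
Returns a string of the three characters that serve as augmentation delimiters in the Chinese Treebank: "-=|".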
Specified by: augmentationDelimiters in interface Treebank
Overrides: augmentationDelimiters in class AbstractTreebank
See Also: AbstractTreebank.stripAugmentation(Symbol), AbstractTreebank.defaultParseNonterminal(Symbol, Nonterminal)