Parsing Engine
java.lang.Object
  danbikel.parser.lang.AbstractTraining
    danbikel.parser.arabic.Training

public class Training
extends AbstractTraining
Provides methods for language-specific processing of training parse trees.
Even though this subclass of AbstractTraining is
in the Arabic language package, its primary purpose is simply
to fill in the AbstractTraining.argContexts, AbstractTraining.semTagArgStopSet and
AbstractTraining.nodesToPrune data members using a metadata resource. If this
capability is desired in another language package, this class may be
subclassed.
This class also redefines the method
AbstractTraining.hasPossessiveChild(Sexp).
| Field Summary | |
|---|---|
| protected static String[] | caseMarkers - An array of case markers in Arabic Treebank part-of-speech tags. |
| protected static String[] | definiteMarkers - An array of definite/indefinite markers in Arabic Treebank part-of-speech tags. |
| protected static String[] | detPrefixMarkers - An array of determiner markers in Arabic Treebank part-of-speech tags. |
| protected static String[] | genderMarkers - An array of gender markers in Arabic Treebank part-of-speech tags. |
| protected static String[][] | markers - An array of the various markers arrays. |
| protected static String[] | moodMarkers - An array of verb mood markers in Arabic Treebank part-of-speech tags. |
| protected static String[] | nounSuffixMarkers - An array of noun markers in Arabic Treebank part-of-speech tags. |
| protected static String[] | numberMarkers - An array of number markers in Arabic Treebank part-of-speech tags (Arabic has forms for singular, plural and dual). |
| protected static String[] | personMarkers - An array of person/number markers (indicating information such as “first person singular”) in Arabic Treebank part-of-speech tags. |
| protected static String[] | pronounMarkers - An array of pronoun markers in Arabic Treebank part-of-speech tags. |
| protected static boolean | regularizeVerbs - If regularizeVerbs is true, it indicates that part-of-speech tags that contain any of the patterns in the verbPatterns array should be transformed simply into the pattern itself. |
| protected static boolean[] | remove - Indicates which of the various types of markers should be removed from Arabic Treebank part-of-speech tags during preprocessing (currently unused). |
| protected static Symbol | tagMapSym - The symbol associated with tag map metadata. |
| protected static String[] | verbPatterns - The match patterns used when regularizeVerbs is true. |
| Fields inherited from class danbikel.parser.lang.AbstractTraining |
|---|
addGapInfo, argAugmentations, argContexts, argNonterminals, baseNP, canonicalAugDelimSym, defaultArgAugmentation, delimAndGapStr, delimAndGapStrLen, gapAugmentation, headFinder, headPostSym, headPreSym, headSym, metadataPropertyPrefix, nodesToPrune, NP, prunedPreterms, prunedPunctuation, relabelHeadChildrenAsArgs, repairBaseNPs, semTagArgStopSet, traceTag, treebank, wordsToPrune |
| Constructor Summary | |
|---|---|
| Training() | The default constructor, to be invoked by Language. |
| Method Summary | |
|---|---|
| protected void | canonicalizeNonterminals(Sexp tree) - For Arabic, we do not want to transform preterminals (parts of speech) to their canonical forms, so this method is overridden. |
| protected int | contains(StringBuffer searchBuf, String[] searchPatterns, IntCounter patternIdx) - Helper method used by TagMap.transformTag(Word). |
| protected void | createArgNonterminalsSet() - An overridden version of AbstractTraining.createArgNonterminalsSet() that adds argument nonterminal patterns, such as *-SBJ, to the set of argument nonterminals. |
| protected boolean | hasPossessiveChild(Sexp tree) - We override this method so that it always returns false, so that the default implementation of addBaseNPs(Sexp) never considers an NP to be a possessive NP. |
| boolean | isValidTree(Sexp tree) - If the specified tree has a root label with a print name equal to "X", then this method returns false; otherwise, this method returns the value of the default implementation in the superclass with the specified tree (super.isValidTree(tree)). |
| static void | main(String[] args) - Test driver for this class. |
| Sexp | preProcess(Sexp tree) - The method to call before counting events in a training parse tree. |
| SexpList | preProcessTest(SexpList sentence, SexpList originalWords, SexpList tags) - Preprocesses the specified test sentence and its coordinated list of part-of-speech tags, leaving the original sentence untouched but providing a modified version of the coordinated list of tags, where each tag has been mapped using the value of the original word and the original tag using TagMap.transformTag(Word). |
| protected void | readMetadataHook(Symbol dataType, int metadataLen, SexpList metadata) - Reads the tag map metadata if the specified data type is equal to tagMapSym. |
| Symbol | startSym() - Returns the symbol to indicate hidden nonterminals that precede the first in a sequence of modifier nonterminals. |
| Word | startWord() - Returns the Word object that represents the hidden "head word" of the start symbol. |
| Symbol | stopSym() - Returns the symbol to indicate a hidden nonterminal that follows the last in a sequence of modifier nonterminals. |
| Word | stopWord() - Returns the Word object that represents the hidden "head word" of the stop symbol. |
| Symbol | topSym() - Returns the symbol to indicate the hidden root of all parse trees. |
| Word | topWord() - Returns the Word object that represents the hidden "head word" of the hidden root of all parse trees. |
| protected Symbol | transformTagOld(Word word) - Deprecated. This method is the old mechanism by which to transform the part-of-speech tag associated with an Arabic word; it has been superseded by the method TagMap.transformTag(Word). |
| protected Sexp | transformTags(Sexp tree) - Does an in-place transformation of the part-of-speech tags in the specified tree. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
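protected static final Symbol tagMapSym
The symbol associated with tag map metadata.

protected static final String[] nounSuffixMarkers
An array of noun markers in Arabic Treebank part-of-speech tags.

protected static final String[] detPrefixMarkers
An array of determiner markers in Arabic Treebank part-of-speech tags.

protected static final String[] personMarkers
An array of person/number markers (indicating information such as “first person singular”) in Arabic Treebank part-of-speech tags.

protected static final String[] numberMarkers
An array of number markers in Arabic Treebank part-of-speech tags (Arabic has forms for singular, plural and dual).

protected static final String[] genderMarkers
An array of gender markers in Arabic Treebank part-of-speech tags.

protected static final String[] caseMarkers
An array of case markers in Arabic Treebank part-of-speech tags.

protected static final String[] definiteMarkers
An array of definite/indefinite markers in Arabic Treebank part-of-speech tags.

protected static final String[] pronounMarkers
An array of pronoun markers in Arabic Treebank part-of-speech tags.

protected static final String[] moodMarkers
An array of verb mood markers in Arabic Treebank part-of-speech tags.

protected static final String[][] markers
An array of the various markers arrays, listed in this order:
nounSuffixMarkers,
detPrefixMarkers,
personMarkers,
numberMarkers,
genderMarkers,
caseMarkers,
definiteMarkers,
pronounMarkers,
moodMarkers

protected static final boolean[] remove
Indicates which of the various types of markers should be removed from Arabic Treebank part-of-speech tags during preprocessing (currently unused).

protected static final boolean regularizeVerbs
If regularizeVerbs is true, it indicates that part-of-speech
tags that contain any of the patterns in the verbPatterns array
should be transformed simply into the pattern itself. For example, the
tag IV2D+VERB_IMPERFECT+IVSUFF_SUBJ:D_MOOD:SJ would be
transformed into, simply, VERB_IMPERFECT.

protected static final String[] verbPatterns
The match patterns used when regularizeVerbs is
true.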
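
The verb regularization rule described for regularizeVerbs can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class name, the `regularize` helper and the contents of `VERB_PATTERNS` are assumptions made for illustration (only VERB_IMPERFECT appears in the documentation above).

```java
class RegularizeVerbsSketch {
    // Assumed pattern list; only VERB_IMPERFECT is taken from the
    // documentation's example, VERB_PERFECT is a hypothetical addition.
    static final String[] VERB_PATTERNS = {"VERB_IMPERFECT", "VERB_PERFECT"};

    /**
     * Returns the first pattern contained in the tag, or the tag itself
     * if no pattern matches (hypothetical helper).
     */
    static String regularize(String tag, String[] patterns) {
        for (String pattern : patterns) {
            if (tag.contains(pattern)) {
                return pattern;
            }
        }
        return tag;
    }

    public static void main(String[] args) {
        String tag = "IV2D+VERB_IMPERFECT+IVSUFF_SUBJ:D_MOOD:SJ";
        // Prints VERB_IMPERFECT, matching the example in the field description.
        System.out.println(regularize(tag, VERB_PATTERNS));
    }
}
```

Tags that contain none of the patterns pass through unchanged in this sketch; whether the real class behaves the same way for non-verb tags is not stated here.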
| Constructor Detail |
|---|
public Training()
         throws FileNotFoundException,
                IOException
The default constructor, to be invoked by Language.
This constructor looks for a resource named by the property
metadataPropertyPrefix + language
where metadataPropertyPrefix is the value of
the constant AbstractTraining.metadataPropertyPrefix and language
is the value of Settings.get(Settings.language).
For example, the property for English is
"parser.training.metadata.english".
- Throws:
FileNotFoundException
IOException
| Method Detail |
|---|
protected void readMetadataHook(Symbol dataType,
                                int metadataLen,
                                SexpList metadata)
Reads the tag map metadata if the specified data type is equal to tagMapSym.
- Overrides:
readMetadataHook in class AbstractTraining
- Parameters:
dataType - the data type of the specified metadata resource; if the specified symbol is equal to tagMapSym then this method will read and store the associated tag map metadata
metadataLen - the length of the metadata list
metadata - the metadata resource

public Symbol startSym()
Returns the symbol to indicate hidden nonterminals that precede the first in a sequence of modifier nonterminals.
- Specified by:
startSym in interface Training
- Overrides:
startSym in class AbstractTraining
- See Also:
Trainer

public Word startWord()
Returns the Word object that represents the hidden "head word"
of the start symbol. This method overrides the default implementation so
as to return a Word containing symbols that do not contain a plus
sign (+), which is a nonterminal augmentation delimiter in the
Arabic Treebank.
- Specified by:
startWord in interface Training
- Overrides:
startWord in class AbstractTraining
- See Also:
startSym, Trainer

public Symbol stopSym()
Returns the symbol to indicate a hidden nonterminal that follows the last in a sequence of modifier nonterminals.
- Specified by:
stopSym in interface Training
- Overrides:
stopSym in class AbstractTraining
- See Also:
Trainer

public Word stopWord()
Returns the Word object that represents the hidden "head word"
of the stop symbol. This method overrides the default implementation so as
to return a Word containing symbols that do not contain a plus
sign (+), which is a nonterminal augmentation delimiter in the
Arabic Treebank.
- Specified by:
stopWord in interface Training
- Overrides:
stopWord in class AbstractTraining
- See Also:
stopSym, Trainer

public Symbol topSym()
Returns the symbol to indicate the hidden root of all parse trees.
- Specified by:
topSym in interface Training
- Overrides:
topSym in class AbstractTraining
- See Also:
Trainer

public Word topWord()
Returns the Word object that represents the hidden "head word"
of the hidden root of all parse trees. This method overrides the default
implementation so as to return a Word containing symbols that do
not contain a plus sign (+), which is a nonterminal augmentation
delimiter in the Arabic Treebank.
- Specified by:
topWord in interface Training
- Overrides:
topWord in class AbstractTraining

public Sexp preProcess(Sexp tree)
The method to call before counting events in a training parse tree.
This method executes the following methods in order:
1. transformTags(Sexp)
2. AbstractTraining.prune(Sexp)
3. AbstractTraining.addBaseNPs(Sexp)
4. AbstractTraining.removeNullElements(Sexp)
5. AbstractTraining.raisePunctuation(Sexp)
6. AbstractTraining.identifyArguments(Sexp)
7. AbstractTraining.stripAugmentations(Sexp)
- AbstractTraining.raisePunctuation(Sexp) should be run after
AbstractTraining.removeNullElements(Sexp) because a null element that is a
leftmost or rightmost child can block detection of a punctuation element
that needs to be raised after removal of the null element (if a punctuation
element is the next-to-leftmost or next-to-rightmost child of an interior
node).
- AbstractTraining.stripAugmentations(Sexp) should be run after all methods
that may depend upon the presence of nonterminal augmentations, such as
AbstractTraining.identifyArguments(Sexp).
- Specified by:
preProcess in interface Training
- Overrides:
preProcess in class AbstractTraining
- Parameters:
tree - the parse tree to pre-process
- Returns:
tree having been pre-processed

protected void createArgNonterminalsSet()
An overridden version of AbstractTraining.createArgNonterminalsSet()
that adds argument nonterminal patterns, such as *-SBJ, to the
set of argument nonterminals.
- Overrides:
createArgNonterminalsSet in class AbstractTraining
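
The ordering constraint noted for preProcess above (punctuation raising after null-element removal) can be seen in a toy model. The sketch below does not use the library's Sexp API: a node's children are modeled as a plain list of labels, and the class, the "-NONE-" null-element label and both helper methods are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreProcessOrderSketch {
    // Assumed label for a null element in this toy model.
    static final String NULL_ELEMENT = "-NONE-";

    static boolean isPunctuation(String label) {
        return label.equals(",") || label.equals(".");
    }

    /** Returns a copy of the children with null elements removed. */
    static List<String> removeNullElements(List<String> children) {
        List<String> result = new ArrayList<>(children);
        result.removeIf(label -> label.equals(NULL_ELEMENT));
        return result;
    }

    /**
     * Returns a copy of the children with punctuation stripped from the
     * left and right edges; stripped items are appended to raised.
     */
    static List<String> raisePunctuation(List<String> children, List<String> raised) {
        List<String> result = new ArrayList<>(children);
        while (!result.isEmpty() && isPunctuation(result.get(0))) {
            raised.add(result.remove(0));
        }
        while (!result.isEmpty() && isPunctuation(result.get(result.size() - 1))) {
            raised.add(result.remove(result.size() - 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // The comma is the next-to-leftmost child, hidden behind a null element.
        List<String> children = Arrays.asList(NULL_ELEMENT, ",", "NP", "VP");

        // Documented order: the comma becomes leftmost and is raised.
        List<String> raisedFirst = new ArrayList<>();
        List<String> inOrder = raisePunctuation(removeNullElements(children), raisedFirst);
        System.out.println("documented order: raised " + raisedFirst.size() + ", kept " + inOrder.size());

        // Reversed order: the null element blocks detection of the comma.
        List<String> raisedSecond = new ArrayList<>();
        List<String> reversed = removeNullElements(raisePunctuation(children, raisedSecond));
        System.out.println("reversed order: raised " + raisedSecond.size() + ", kept " + reversed.size());
    }
}
```

In the documented order the comma is raised; in the reversed order it stays inside the node, which is the situation the ordering note warns about.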
public SexpList preProcessTest(SexpList sentence,
                               SexpList originalWords,
                               SexpList tags)
Preprocesses the specified test sentence and its coordinated list of part-of-speech tags, leaving the original sentence untouched but providing a modified version of the coordinated list of tags, where each tag has been mapped using the value of the original word and the original tag using
TagMap.transformTag(Word).
- Specified by:
preProcessTest in interface Training
- Overrides:
preProcessTest in class AbstractTraining
- Parameters:
sentence - the list of words, where a known word is a symbol and
an unknown word is represented by a 3-element list
(see DecoderServerRemote.convertUnknownWords(danbikel.lisp.SexpList))
originalWords - the list of unprocessed words (all symbols)
tags - the list of tag lists, where the list at index
i is the list of possible parts of speech for
the word at that index
- Returns:
a two-element list, the first of which is sentence and the
second of which is a processed version of tags; if
tags is null, then the returned list will
contain only one element (since SexpList objects are
not designed to handle null elements)
- See Also:
TagMap.transformTag(Word)

public boolean isValidTree(Sexp tree)
If the specified tree has a root label with a print name equal to "X", then this method returns
false;
otherwise, this method returns the value of the default implementation in
the superclass with the specified tree
(super.isValidTree(tree)).
- Specified by:
isValidTree in interface Training
- Overrides:
isValidTree in class AbstractTraining
- Parameters:
tree - the tree to test for validity
- Returns:
false if the specified tree's root label is equal to
Symbol.add("X"), or super.isValidTree(tree)
otherwise
- See Also:
AbstractTraining.isAllNodesToPrune(Sexp),
Treebank.isPreterminal(Sexp)
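
The override pattern described for isValidTree (reject trees rooted at "X", otherwise defer to the superclass) can be sketched as follows. The `Tree`, `BaseTraining` and `ArabicLikeTraining` classes are hypothetical stand-ins, and the base check used here (any non-null tree is valid) is an assumption, not the behavior of AbstractTraining.

```java
class IsValidTreeSketch {
    /** Hypothetical stand-in for a parse tree; only the root label matters here. */
    static class Tree {
        final String rootLabel;

        Tree(String rootLabel) {
            this.rootLabel = rootLabel;
        }
    }

    /** Hypothetical base class; its validity check is an assumption. */
    static class BaseTraining {
        boolean isValidTree(Tree tree) {
            return tree != null;
        }
    }

    /** Adds the extra root-label check before deferring to the base class. */
    static class ArabicLikeTraining extends BaseTraining {
        @Override
        boolean isValidTree(Tree tree) {
            if (tree != null && "X".equals(tree.rootLabel)) {
                return false; // trees rooted at X are never valid
            }
            return super.isValidTree(tree);
        }
    }

    public static void main(String[] args) {
        BaseTraining training = new ArabicLikeTraining();
        System.out.println(training.isValidTree(new Tree("X"))); // false
        System.out.println(training.isValidTree(new Tree("S"))); // true
    }
}
```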
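protected int contains(StringBuffer searchBuf,
                       String[] searchPatterns,
                       IntCounter patternIdx)
Helper method used by
TagMap.transformTag(Word).

protected Symbol transformTagOld(Word word)
Deprecated. This method is the old mechanism by which to transform the part-of-speech tag associated with an Arabic word; it has been superseded by the method
TagMap.transformTag(Word).
- Parameters:
word - the word whose part-of-speech tag is to be transformed
- Returns:
the transformed part-of-speech tag of the specified
Word object
- See Also:
TagMap.transformTag(Word)

protected Sexp transformTags(Sexp tree)
Does an in-place transformation of the part-of-speech tags in the specified tree.
- Parameters:
tree - the tree whose part-of-speech tags are to be mapped

protected boolean hasPossessiveChild(Sexp tree)
We override this method so that it always returns
false,
so that the default implementation of addBaseNPs(Sexp)
never considers an NP to be a possessive NP. Thus,
the behavior of addBaseNPs is much simpler: all and only
NPs that do not dominate other NPs will be relabeled
NPB.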
- Overrides:
hasPossessiveChild in class AbstractTraining
- Parameters:
tree - the tree to be tested
- Returns:
false, regardless of the value of the specified tree
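
The simplified relabeling rule stated above (every NP that does not dominate another NP becomes NPB) can be sketched on a toy tree. The `Node` class, the helper methods and the placeholder words w1 to w4 are hypothetical; the real addBaseNPs(Sexp) operates on Sexp objects and may do additional work not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BaseNpSketch {
    /** Hypothetical tree node: a label plus zero or more children. */
    static class Node {
        String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    /** True if any descendant of the node is labeled NP. */
    static boolean dominatesNP(Node node) {
        for (Node child : node.children) {
            if (child.label.equals("NP") || dominatesNP(child)) {
                return true;
            }
        }
        return false;
    }

    /** Relabels, in place, every NP that dominates no other NP as NPB. */
    static void addBaseNPs(Node node) {
        // Decide before recursing, while descendants still carry their NP labels.
        boolean relabel = node.label.equals("NP") && !dominatesNP(node);
        for (Node child : node.children) {
            addBaseNPs(child);
        }
        if (relabel) {
            node.label = "NPB";
        }
    }

    /** Builds (S (NP (NP (N w1)) (PP (P w2) (NP (N w3)))) (VP (V w4))). */
    static Node sampleTree() {
        Node innerNp = new Node("NP", new Node("N", new Node("w1")));
        Node ppNp = new Node("NP", new Node("N", new Node("w3")));
        Node pp = new Node("PP", new Node("P", new Node("w2")), ppNp);
        Node outerNp = new Node("NP", innerNp, pp);
        Node vp = new Node("VP", new Node("V", new Node("w4")));
        return new Node("S", outerNp, vp);
    }

    public static void main(String[] args) {
        Node tree = sampleTree();
        addBaseNPs(tree);
        // Prints (S (NP (NPB (N w1)) (PP (P w2) (NPB (N w3)))) (VP (V w4)))
        System.out.println(tree);
    }
}
```

The outer NP keeps its label because it dominates two other NPs, while both innermost NPs are relabeled. No possessive check appears anywhere, which mirrors the effect of hasPossessiveChild always returning false.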
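protected void canonicalizeNonterminals(Sexp tree)
For Arabic, we do not want to transform preterminals (parts of speech) to their canonical forms, so this method is overridden.
- Overrides:
canonicalizeNonterminals in class AbstractTraining
- Parameters:
tree - the tree for which nonterminals, but not parts of speech,
are to be transformed into their canonical forms
- See Also:
Treebank.getCanonical(Symbol)

public static void main(String[] args)
Test driver for this class.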