edu.upenn.cis.spinal
Class Sentence

java.lang.Object
  extended by edu.upenn.cis.spinal.Sentence
All Implemented Interfaces:
Serializable

public class Sentence
extends Object
implements Serializable

Represents a sentence (an LTAG-spinal derivation tree) in Libin Shen's LTAG-spinal treebank. An example of a typical sentence is shown on page 73 of Libin Shen's thesis.

Author:
Lucas Champollion, Ryan Gabbard
See Also:
Serialized Form

Constructor Summary
Sentence(String representation)
          Creates a new Sentence object from a string representation following the format defined in Libin Shen's thesis.
 
Method Summary
 void computeSpanTable()
          Computes the span table, a directory whose keys are word spans and whose values are the corresponding subtrees (if any).
 boolean containsAdjunction()
          Returns true iff at least one of the elementary trees in this Sentence is an auxiliary tree.
 boolean containsAttachment()
          Returns true iff at least one of the elementary trees in this Sentence is an initial tree.
 boolean containsCoordination()
          Returns true iff at least one of the elementary trees in this Sentence is a conjunction tree.
 ListIterator elemTreesIterator()
          Iterates over the elementary trees of which this Sentence consists, in the order in which they are numbered (left to right in the sentence).
 ElemTree getElemTree(int n)
          Returns the ElemTree associated with the nth word of the sentence.
 List getElemTrees()
          Returns a List of the ElemTrees of which this sentence consists.
 List getElemTrees(int from, int to)
          Returns a List of ElemTrees for the given word span.
 int getFileNumber()
          Returns the number of the Penn Treebank file in which the current sentence occurred, or -1 if the sentence is not a Penn Treebank sentence.
 String getLocation()
          Returns a String representing the location of the current sentence: either a triple of section, file, and sentence number, or simply a sentence number.
 ElemTree getRoot()
          Returns the elementary tree at the root of this Sentence.
 int getSectionNumber()
          Returns the number of the Penn Treebank section in which the current sentence occurred, or -1 if the sentence is not a Penn Treebank sentence.
 int getSentenceNumber()
          Returns the number of the current sentence in the Penn Treebank file or parser output.
 ElemTree getSubTree(int start, int end)
          Returns the unique ElemTree that is the root of a subtree whose yield is the specified word span, or null if there is no such tree.
 ElemTree getSubTree(WordSpan w)
          Returns the unique ElemTree that is the root of a subtree whose yield is the specified word span, or null if there is no such tree.
 boolean isBidirectionalParserOutput()
          Returns true if this Sentence has been read in from the format used in the output of Shen's bidirectional parser.
 boolean isSkipped()
          Returns true iff the annotation for this sentence only consists of the word "skip", indicating that it is contained in the Penn Treebank but not in the LTAG-spinal treebank.
 int length()
          Returns the length of this Sentence, that is, the number of elementary trees in this derivation tree.
 Sentence ofString(String representation)
          Convenience method that calls the constructor, to follow the conventions in the Propbank API.
 String prettyPrintLocation()
          Returns a human-readable string representing the location of the current sentence.
static Sentence readTree(BufferedReader inp)
          Reads a string representation of a derivation tree from the specified BufferedReader.
 ElemTree subTreeForSpan(WordSpan w)
          Returns the ElemTree whose yield is the given word span, or null if there isn't one.
 String toGraphviz(boolean includeSpans, boolean beanPoleStyle, boolean showSpines)
          Returns a visual representation of this sentence in Graphviz format.
 String toGraphviz(int start, int end, boolean includeSpans, boolean beanPoleStyle, boolean showSpines)
          Returns a visual representation of a subpart of this sentence in Graphviz format.
 String toString()
          Returns a string representation of this sentence in LTAG-spinal format.
 void writeGraphvizTo(BufferedWriter b, boolean includeSpans, boolean beanPoleStyle, boolean showSpines)
          Writes a visual representation of this sentence in Graphviz format to the specified BufferedWriter.
 void writeGraphvizTo(BufferedWriter b, int start, int end, boolean includeSpans, boolean beanPoleStyle, boolean showSpines)
          Writes a visual representation of a subpart of this sentence in Graphviz format to the specified BufferedWriter.
 void writeGraphvizTo(Writer w, boolean includeSpans, boolean beanPoleStyle, boolean showSpines)
          Writes a visual representation of this sentence in Graphviz format to the specified Writer.
 void writeGraphvizTo(Writer w, int start, int end, boolean includeSpans, boolean beanPoleStyle, boolean showSpines)
          Writes a visual representation of a subpart of this sentence in Graphviz format to the specified Writer.
 void writeTo(BufferedWriter b)
          Prints this sentence to the specified output in LTAG-spinal format.
 void writeTo(Writer w)
          Prints this sentence to the specified output in LTAG-spinal format.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Sentence

public Sentence(String representation)
         throws ElemTreeFormatException
Creates a new Sentence object from a string representation following the format defined in Libin Shen's thesis.

Parameters:
representation - a String containing a specification of a sentence in LTAG-spinal format
Throws:
ElemTreeFormatException - if an error occurs while parsing the string representation
Method Detail

ofString

public Sentence ofString(String representation)
                  throws ElemTreeFormatException
Convenience method that calls the constructor, to follow the conventions in the Propbank API.

Parameters:
representation - a String containing a specification of a sentence in LTAG-spinal format
Returns:
a new Sentence constructed from the given String
Throws:
ElemTreeFormatException - if an error occurs while parsing the string representation

readTree

public static Sentence readTree(BufferedReader inp)
                         throws ElemTreeFormatException,
                                IOException
Reads a string representation of a derivation tree from the specified BufferedReader.

Parameters:
inp - the BufferedReader from which to read
Returns:
a Sentence object representing the derivation tree, or null if the input was empty or contained only whitespace
Throws:
ElemTreeFormatException - if an error occurs while parsing the string representation
IOException - if an error occurs while reading
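
Because readTree returns null once the input is exhausted, a caller can read a whole file in a loop that stops at the first null. The sketch below shows only that loop shape. It does not use the library: `ReadLoopSketch`, `readRecord`, `readAll`, and `readAllFromString` are hypothetical names, and the stand-in treats each non-blank line as one record, whereas the real method parses LTAG-spinal derivation trees.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the null-at-end-of-input contract of readTree.
final class ReadLoopSketch {

    // Returns the next non-blank line, or null if nothing but whitespace remains.
    static String readRecord(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                return line.trim();
            }
        }
        return null;
    }

    // Reads records until the stand-in reports end of input by returning null.
    static List<String> readAll(BufferedReader in) throws IOException {
        List<String> records = new ArrayList<>();
        String record;
        while ((record = readRecord(in)) != null) {
            records.add(record);
        }
        return records;
    }

    // Convenience wrapper so the loop can be tried on an in-memory string.
    static List<String> readAllFromString(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAllFromString("a\n\n  \nb\n   \n"));
    }
}
```

With the real class, the same loop would presumably call Sentence.readTree(inp) in place of `readRecord` and handle ElemTreeFormatException as well.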

writeTo

public void writeTo(Writer w)
             throws IOException
Prints this sentence to the specified output in LTAG-spinal format.

Parameters:
w - the Writer to which this sentence is to be written
Throws:
IOException - if an error occurs during writing

writeTo

public void writeTo(BufferedWriter b)
             throws IOException
Prints this sentence to the specified output in LTAG-spinal format.

Parameters:
b - the BufferedWriter to which this sentence is to be written
Throws:
IOException - if an error occurs during writing

writeGraphvizTo

public void writeGraphvizTo(BufferedWriter b,
                            int start,
                            int end,
                            boolean includeSpans,
                            boolean beanPoleStyle,
                            boolean showSpines)
                     throws IOException
Writes a visual representation of a subpart of this sentence in Graphviz format to the specified BufferedWriter.

Parameters:
b - the BufferedWriter to which the subsentence is to be written
start - the first word of the sentence to be included in the graphical output
end - the last word of the sentence to be included in the graphical output
includeSpans - if true, the word span of the subtree dominated by a node is appended to that node's representation; otherwise, the representation consists only of the node label
beanPoleStyle - selects one of two very different output styles: if true, the output looks like beanpoles; if false, it looks like tadpoles. See the illustrations on the LTAG-spinal website.
showSpines - if true, shows the internal structure of the elementary trees; otherwise, shows each elementary tree as one single node
Throws:
IOException - if an error occurs during writing

writeGraphvizTo

public void writeGraphvizTo(Writer w,
                            boolean includeSpans,
                            boolean beanPoleStyle,
                            boolean showSpines)
                     throws IOException
Writes a visual representation of this sentence in Graphviz format to the specified Writer.

Parameters:
w - the Writer to which this sentence is to be written
includeSpans - if true, the word span of the subtree dominated by a node is appended to that node's representation; otherwise, the representation consists only of the node label
beanPoleStyle - selects one of two very different output styles: if true, the output looks like beanpoles; if false, it looks like tadpoles. See the illustrations on the LTAG-spinal website.
showSpines - if true, shows the internal structure of the elementary trees; otherwise, shows each elementary tree as one single node
Throws:
IOException - if an error occurs while writing
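
Taking a plain Writer means the output can go to a file, a socket, or an in-memory buffer. The sketch below illustrates that idea with a `StringWriter`, and also shows roughly what appending a word span to a node label could look like. Everything in it is hypothetical: `DotWriterSketch`, `writeDot`, `toDot`, the node labels, and the span notation are invented for illustration, and the library's actual Graphviz output is not reproduced here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in; it only demonstrates writing DOT text to any Writer.
final class DotWriterSketch {

    // Writes a two-node digraph in Graphviz DOT syntax.
    // Labels and span notation are made up for this example.
    static void writeDot(Writer w, boolean includeSpans) throws IOException {
        String root = includeSpans ? "VP [0-2]" : "VP";
        String child = includeSpans ? "NP [0-0]" : "NP";
        w.write("digraph G {\n");
        w.write("  \"" + root + "\" -> \"" + child + "\";\n");
        w.write("}\n");
    }

    // Captures the Writer-based output as a String via StringWriter.
    static String toDot(boolean includeSpans) {
        StringWriter buffer = new StringWriter();
        try {
            writeDot(buffer, includeSpans);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(toDot(true));
    }
}
```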

writeGraphvizTo

public void writeGraphvizTo(Writer w,
                            int start,
                            int end,
                            boolean includeSpans,
                            boolean beanPoleStyle,
                            boolean showSpines)
                     throws IOException
Writes a visual representation of a subpart of this sentence in Graphviz format to the specified Writer.

Parameters:
w - the Writer to which the subsentence is to be written
start - the first word of the sentence to be included in the graphical output
end - the last word of the sentence to be included in the graphical output
includeSpans - if true, the word span of the subtree dominated by a node is appended to that node's representation; otherwise, the representation consists only of the node label
beanPoleStyle - selects one of two very different output styles: if true, the output looks like beanpoles; if false, it looks like tadpoles. See the illustrations on the LTAG-spinal website.
showSpines - if true, shows the internal structure of the elementary trees; otherwise, shows each elementary tree as one single node
Throws:
IOException - if an error occurs while writing

writeGraphvizTo

public void writeGraphvizTo(BufferedWriter b,
                            boolean includeSpans,
                            boolean beanPoleStyle,
                            boolean showSpines)
                     throws IOException
Writes a visual representation of this sentence in Graphviz format to the specified BufferedWriter.

Parameters:
b - the BufferedWriter to which this sentence is to be written
includeSpans - if true, the word span of the subtree dominated by a node is appended to that node's representation; otherwise, the representation consists only of the node label
beanPoleStyle - selects one of two very different output styles: if true, the output looks like beanpoles; if false, it looks like tadpoles. See the illustrations on the LTAG-spinal website.
showSpines - if true, shows the internal structure of the elementary trees; otherwise, shows each elementary tree as one single node
Throws:
IOException - if an error occurs while writing

toString

public String toString()
Returns a string representation of this sentence in LTAG-spinal format.

Overrides:
toString in class Object
Returns:
a string representing this sentence

toGraphviz

public String toGraphviz(boolean includeSpans,
                         boolean beanPoleStyle,
                         boolean showSpines)
Returns a visual representation of this sentence in Graphviz format.

Parameters:
includeSpans - if true, the word span of the subtree dominated by a node is appended to that node's representation; otherwise, the representation consists only of the node label
beanPoleStyle - selects one of two very different output styles: if true, the output looks like beanpoles; if false, it looks like tadpoles. See the illustrations on the LTAG-spinal website.
showSpines - if true, shows the internal structure of the elementary trees; otherwise, shows each elementary tree as one single node
Returns:
a String containing Graphviz format

toGraphviz

public String toGraphviz(int start,
                         int end,
                         boolean includeSpans,
                         boolean beanPoleStyle,
                         boolean showSpines)
Returns a visual representation of a subpart of this sentence in Graphviz format.

Parameters:
start - the first word of the sentence to be included in the graphical output
end - the last word of the sentence to be included in the graphical output
includeSpans - if true, the word span of the subtree dominated by a node is appended to that node's representation; otherwise, the representation consists only of the node label
beanPoleStyle - selects one of two very different output styles: if true, the output looks like beanpoles; if false, it looks like tadpoles. See the illustrations on the LTAG-spinal website.
showSpines - if true, shows the internal structure of the elementary trees; otherwise, shows each elementary tree as one single node
Returns:
a String containing Graphviz format

getLocation

public String getLocation()
Returns a String representing the location of the current sentence: either a triple of section, file, and sentence number as in the LTAG-spinal treebank (following the Penn Treebank conventions), or simply a sentence number if the sentence is not from the LTAG-spinal treebank.

Returns:
a String containing either the section, file, and sentence numbers, or only the sentence number, indicating where this sentence is found in the input

prettyPrintLocation

public String prettyPrintLocation()
Returns a human-readable string representing the location of the current sentence. If the sentence is taken from the LTAG-spinal treebank, the string looks as follows:
 Section: X File: Y Sentence: Z
 
Otherwise, if the sentence only has a sentence number, the string looks like
 Sentence: Z
 

Returns:
a human-readable string indicating where this sentence is found in the input
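
The two layouts above can be illustrated with a small stand-alone sketch. It does not use the library: `LocationFormatSketch` and `pretty` are hypothetical names, and the sketch assumes that a missing section or file number is signalled by -1, as documented for getSectionNumber() and getFileNumber(). The exact spacing of the real output may differ.

```java
// Hypothetical stand-in; not part of edu.upenn.cis.spinal.
final class LocationFormatSketch {

    // Builds a location string in one of the two layouts described above.
    // Assumption: -1 marks a missing section or file number.
    static String pretty(int section, int file, int sentence) {
        if (section < 0 || file < 0) {
            return "Sentence: " + sentence;
        }
        return "Section: " + section + " File: " + file + " Sentence: " + sentence;
    }

    public static void main(String[] args) {
        System.out.println(pretty(2, 15, 7));    // treebank-style location
        System.out.println(pretty(-1, -1, 42));  // sentence number only
    }
}
```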

getSubTree

public ElemTree getSubTree(WordSpan w)
Returns the unique ElemTree that is the root of a subtree whose yield is the specified word span, or null if there is no such tree.

Parameters:
w - the span from the first up to and including the last word
Returns:
an ElemTree or null
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

getSubTree

public ElemTree getSubTree(int start,
                           int end)
Returns the unique ElemTree that is the root of a subtree whose yield is the specified word span, or null if there is no such tree.

Parameters:
start - the first (leftmost) word included in the span
end - the last (rightmost) word included in the span
Returns:
an ElemTree or null
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

isBidirectionalParserOutput

public boolean isBidirectionalParserOutput()
Returns true if this Sentence has been read in from the format used in the output of Shen's bidirectional parser. If this is the case, no information about the spine is present. This is implemented as a simple lookup of the corresponding property of the ElemTree at the root of this sentence.

Returns:
a boolean value
See Also:
ElemTree.isBidirectionalParserOutput()

isSkipped

public boolean isSkipped()
Returns true iff the annotation for this sentence only consists of the word "skip", indicating that it is contained in the Penn Treebank but not in the LTAG-spinal treebank.

Returns:
true iff this sentence is skipped in the LTAG-spinal treebank

getSentenceNumber

public int getSentenceNumber()
Returns the number of the current sentence in the Penn Treebank file or parser output.

Returns:
the sentence number

getSectionNumber

public int getSectionNumber()
Returns the number of the Penn Treebank section in which the current sentence occurred, or -1 if the sentence is not a Penn Treebank sentence.

Returns:
the section number, or -1 if there is no such number

getFileNumber

public int getFileNumber()
Returns the number of the Penn Treebank file in which the current sentence occurred, or -1 if the sentence is not a Penn Treebank sentence.

Returns:
the file number, or -1 if there is no such number

subTreeForSpan

public ElemTree subTreeForSpan(WordSpan w)
Returns the ElemTree whose yield is the given word span, or null if there isn't one.

Parameters:
w - the WordSpan for which the dominating tree is to be returned
Returns:
an ElemTree or null
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

computeSpanTable

public void computeSpanTable()
Computes the span table, a directory whose keys are word spans and whose values are the corresponding subtrees (if any).

Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank
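
One way to picture such a table is as a map from (start, end) pairs to subtree roots. The sketch below is an illustration only: `Span`, `SpanTableSketch`, and the use of plain strings in place of ElemTree objects are all hypothetical, and the library's own WordSpan class and internal table may be organised differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type; the library's own WordSpan class is not used here.
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Two spans are equal when both endpoints match, so they work as map keys.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Span)) {
            return false;
        }
        Span other = (Span) o;
        return start == other.start && end == other.end;
    }

    @Override
    public int hashCode() {
        return 31 * start + end;
    }
}

// Hypothetical table; strings stand in for the subtree roots.
final class SpanTableSketch {
    private final Map<Span, String> table = new HashMap<>();

    void put(int start, int end, String subtreeRoot) {
        table.put(new Span(start, end), subtreeRoot);
    }

    // Returns null when no subtree has exactly this yield.
    String get(int start, int end) {
        return table.get(new Span(start, end));
    }
}
```

A lookup that misses returns null, which matches the "or null if there is no such tree" contract documented for getSubTree and subTreeForSpan.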

containsAttachment

public boolean containsAttachment()
Returns true iff at least one of the elementary trees in this Sentence is an initial tree.

Returns:
true iff there is at least one attachment operation in the present tree
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

containsAdjunction

public boolean containsAdjunction()
Returns true iff at least one of the elementary trees in this Sentence is an auxiliary tree.

Returns:
true iff there is at least one adjunction operation in the present tree
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

containsCoordination

public boolean containsCoordination()
Returns true iff at least one of the elementary trees in this Sentence is a conjunction tree.

Returns:
true iff there is at least one coordination operation in the present tree
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

length

public int length()
Returns the length of this Sentence, that is, the number of elementary trees in this derivation tree.

Returns:
the number of elementary trees in this Sentence
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

getRoot

public ElemTree getRoot()
Returns the elementary tree at the root of this Sentence.

Returns:
the ElemTree in which this derivation tree is rooted
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

elemTreesIterator

public ListIterator elemTreesIterator()
Iterates over the elementary trees of which this Sentence consists, in the order in which they are numbered (left to right in the sentence).

Returns:
a ListIterator
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank
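
Since the method returns a raw ListIterator, callers receive each element as an Object and have to cast it themselves. The sketch below shows the iteration pattern on a plain list of strings standing in for ElemTree objects; `IteratorSketch` and `numbered` are hypothetical names and nothing here calls the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical stand-in showing how a raw ListIterator can be consumed.
final class IteratorSketch {

    // Walks the iterator left to right, pairing each element with its position.
    @SuppressWarnings("rawtypes")
    static List<String> numbered(ListIterator it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            int index = it.nextIndex();   // position of the element about to be returned
            Object element = it.next();   // a raw iterator yields Object; cast as needed
            out.add(index + ":" + element);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "dog", "barks");
        System.out.println(numbered(words.listIterator()));
    }
}
```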

getElemTree

public ElemTree getElemTree(int n)
Returns the ElemTree associated with the nth word of the sentence.

Parameters:
n - a zero-based index, from 0 (inclusive) to the length of the sentence (exclusive)
Returns:
an ElemTree for the nth word of the sentence
Throws:
IndexOutOfBoundsException - if n is out of range (n < 0 || n >= length()).

getElemTrees

public List getElemTrees()
Returns a List of the ElemTrees of which this sentence consists. Unlike getElemTrees(int, int), this overload takes no word span.

Returns:
an ordered list containing the elementary trees of which this sentence consists
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank

getElemTrees

public List getElemTrees(int from,
                         int to)
Returns a List of ElemTrees for the given word span.

Parameters:
from - the first word to be included in the list
to - the last word to be included in the list
Returns:
an ordered list containing some of the elementary trees of which this sentence consists
Throws:
SkippedSentenceException - if the current sentence is a skipped sentence in the LTAG-spinal treebank
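
Both endpoints of the span are described as included, which differs from the half-open ranges used by java.util.List.subList. The sketch below shows the adjustment on an ordinary list. It is an illustration under assumptions: `InclusiveSpanSketch` and `slice` are hypothetical names, and zero-based word positions are assumed by analogy with getElemTree(int).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of edu.upenn.cis.spinal.
final class InclusiveSpanSketch {

    // Returns the elements at positions from..to, with both ends included.
    static <T> List<T> slice(List<T> items, int from, int to) {
        // subList treats its upper bound as exclusive, hence to + 1.
        return new ArrayList<>(items.subList(from, to + 1));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d");
        System.out.println(slice(words, 1, 2));
    }
}
```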