* Lecture notes by Edward Loper
* Course: CIS 630 (NLP Seminar: Structural Representations)
* Professor: Joshi
* Institution: University of Pennsylvania

[01/15/01 05:17 PM]

>> Representationally Oriented Grammars (or Grammars for Analysis, Grammars as Constraint Satisfaction)

Look at a grammar as a set of constraints that the sentence must
satisfy. A sentence is associated with a description:

#   S \to D

The grammar's job is to decide whether a description D is consistent.
It just tells you which representations are licensed; it doesn't say
how to get them.

>> Derivationally Oriented Grammars (or Grammars for Generation)

Say how to derive a sentence, i.e., how to construct a derivation D.
A generative system.

[01/22/01 04:34 PM]

Define finite automata on trees, with transitions of the form:

#   type_node x (state_child1, state_child2, \ldots) \to state_node

e.g.:

#   The   x ()       \to q1
#   table x ()       \to q2
#   NP    x (q1, q2) \to q3

At the top, check whether the root's state is in the set of accepting
states. A "recognizable set" of trees is exactly a set of trees
accepted by some finite automaton on trees.

> to do - send email re what I want to work on

[01/29/01 04:41 PM]

Nested dependencies vs. crossed dependencies: CFGs can only give
nested dependencies.

[02/05/01 04:34 PM]

see Joshi re lexical semantic info after class.. take initiative? ;)

> Unification

Implementing constraints on substitution and adjoining (in particular
adjoining). 3 types of constraints:
- selective adjoining -- feature structures implicitly specify
  constraints.
- null adjoining -- no adjunction may take place at that node
- obligatory adjoining -- at that node, at least one adjunction must
  take place

Adjoining changes already-built structure. It's a higher-order
operation than substitution, and a higher-order abstraction.
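Adjoining's effect on feature structures can be sketched in Python. This is a minimal sketch under assumptions that are not in the notes: feature structures are flat dicts of atomic values (feature-based TAG normally allows recursive structures), every node carries a (top, bottom) pair, and the names `unify`, `adjoin_features`, `node_ok` and the attribute `mode` are hypothetical. It follows the usual top/bottom bookkeeping: the adjunction site's top features unify with the auxiliary root's top, and its bottom features unify with the auxiliary foot's bottom.

```python
from typing import Dict, Optional, Tuple

FS = Dict[str, str]      # hypothetical flat feature structure: attribute -> atom
Node = Tuple[FS, FS]     # (top, bottom)


def unify(a: Optional[FS], b: Optional[FS]) -> Optional[FS]:
    """Unify two flat feature structures; None signals failure."""
    if a is None or b is None:
        return None
    out = dict(a)
    for attr, val in b.items():
        if attr in out and out[attr] != val:
            return None  # conflicting atomic values
        out[attr] = val
    return out


def adjoin_features(site: Node, aux_root: Node, aux_foot: Node):
    """Return the (top, bottom) pairs of the two nodes that replace the
    adjunction site (the auxiliary root above, the foot below), or None
    if unification fails."""
    (t_site, b_site), (t_root, b_root), (t_foot, b_foot) = site, aux_root, aux_foot
    upper = (unify(t_root, t_site), b_root)   # root top  U site top
    lower = (t_foot, unify(b_foot, b_site))   # foot bottom U site bottom
    if upper[0] is None or lower[1] is None:
        return None
    return upper, lower


def node_ok(node: Node) -> bool:
    """At the end of a derivation, a node's top and bottom must unify."""
    top, bottom = node
    return unify(top, bottom) is not None


# A site whose top and bottom clash cannot survive without adjoining.
site = ({"mode": "ind"}, {"mode": "inf"})
root = ({}, {"mode": "ind"})   # auxiliary root
foot = ({"mode": "inf"}, {})   # auxiliary foot
print(node_ok(site))                    # False
upper, lower = adjoin_features(site, root, foot)
print(node_ok(upper), node_ok(lower))   # True True
```

In the usual feature-based TAG formulation, each node's top and bottom features are unified once the derivation is complete, so a top/bottom clash like the one above can only be resolved by adjoining at that node; this is one way feature structures can encode obligatory adjoining.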
#        ^                             ^
#       / \     adjoin   X2    gives  / \
#      / X1\            /  \         / X4\
#     /_/_\ \          /_X3\        /_/_\ \
#                                    /_X5\
#                                    /_\
#
# Where:
#   t(X4) = t(X2) U t(X1)
#   b(X4) = b(X2)
#   t(X5) = t(X3)
#   b(X5) = b(X3) U b(X1)

Where "U" is unification, t and b are a node's top and bottom features,
X1 is the adjunction site, and X2 and X3 are the root and foot of the
auxiliary tree. Substitution is a special case, where the site X is a
leaf and has no bottom features.

>> LTAG

#   G = (I, A)
#   I = initial trees
#   A = auxiliary (adjunction) trees

We don't give rewrite rules, because adjunction and substitution are
language-independent.

#   T(G) = trees
#   L(G) = strings
#   TAL  = { L(G) | G is a TAG }

Theorems:
1. TAGs are more powerful than CFGs: CFL \subset TAL (proper subset).
   Note: you can get crossed dependencies (2 nested dependencies
   sharing elements give crossed dependencies).
2. Even when the string language of a TAG is context-free,
   SDs of CFGs \subset SDs of TAGs (SD = structural description).

#     S
#    /|                 S
#   a S   = crossed    /|\   = nested
#     |\              a S b
#     S b

proper analysis?

[02/07/01 04:37 PM]

Representations should make it clear what the constituents are. What
about discontinuous constituents, etc.? We have to define what we mean
by constituents.

Representations should also make the dependencies clear. What are
dependencies? What are dependencies between constituents?

[02/20/01 05:12 PM]

> Generation talk

>> Formal structure for generation
- Syntax
- Semantics
- Links to context (& goals of conversation)

Consider the LTAG derivation tree:

#            slide
#           /     \
#   coupling-nut   onto
#                    \
#                    elbow

define:
- new
- assertion: move(e, h, n, p) is next(e)
- shared
- presup
- pragmatics

[03/01/01 04:39 PM]

> Dependency and Locality in TAG

What is dependency?
- thematic dependency:
  - relationship between predicates and arguments
  - locality: syntactically realized within some constrained
    structural domain.
  - always local, but sometimes operating on traces, etc.
- structural dependency
  - relationship between 2 elements in a structure
  - e.g., moved element and trace (coindexing)
  - subject to locality constraints

Why? It accounts for the syntactic data.

What *is* the local structural domain? Primary argument of Frank: a
privileged structural domain in which to express the locality of
dependency relations can be defined.

Terms:
- Basis: atomic units out of which structures are built
- Structural Composition: closure of the basis under composition rules
- Transformation: modify an existing representation. Transformations
  create structural dependencies!

Kernel structures from Chomsky (1955): basically simple active
sentences. Can they be the domain of locality? But what about
transformations?? If we interleave transformations and compositions,
things get arbitrarily far apart.

In TAG, all transformations take place prior to the formation of
elementary trees:

#   basis           --(move + merge)\longrightarrow   elementary tree
#   elementary tree --(adjoin + subst)\longrightarrow sentences

[03/06/01 04:47 PM]

> Raising, Superraising, There-Insertion

Raising in GB is defined by a transformational account: raise an
element to a site higher in the tree. Raising attempts to preserve
some sense of locality, via the trace.

In TAG, there's no transformation. Raising is defined by adjoining.
E.g., define "seems" as an auxiliary tree that adjoins between "John"
and "to like broccoli." Locality is preserved because "John" and "to
like broccoli" come from the same elementary tree.

Recursion in GB: successive cyclic movement.
Recursion in TAG: multiple adjoins.

>> Super-Raising
-  John_i seems [IP t_i to be likely [IP t_i to eat broccoli ]]
- *John_i seems [IP it is likely [IP t_i to eat broccoli ]]

Why is the second one bad? In GB:
- representational constraint
- derivational constraint

Either way, we must give an explicit constraint. But in TAG, it comes
for "free."

How to deal with "it is likely"?
No super-raising: you can't combine I'..IP and IP..I' to get I'..I'.

[03/08/01 04:39 PM]

> Bob Frank

- What makes an elementary tree valid for a language?
- What makes a derived tree an acceptable sentence?
- Not all elementary trees are acceptable sentences.

Consider:
-  There [seems] to be a VP in the hospital
- *There [seems] a VP to be in the hospital

The second one is invalid because we don't have the elementary tree:

   *There a VP to be in the hospital

But what if we interpret it as:

   *[There seems] a VP to be in the hospital

So why can't "there seems" be an elementary tree? Maybe because
"seems" wants to take predicates, and "there" isn't an argument.

But what about [it seems]? If our elementary tree for it is:

#   [TP it [T' [T \ldots] [VP [V seems] [CP \ldots]]]]

And what about "it is raining"? And what about sentences (in other
languages) like:
- it was danced by John.

Does the EPP hold on elementary trees? Yes. Otherwise, we'd never
generate the elementary trees for "it is raining." But on the other
hand, we have elementary trees that don't satisfy the EPP.

So why are "there" and "it" different? I.e., why do we get:

#   [TP it [T' [T \ldots] [VP [V seems] [CP \ldots]]]]

but not:

#   [TP there [T' [T \ldots] [VP [V seems] [CP \ldots]]]]

If we assume that subjects always begin in VP, we have to explain
why/how they get to spec/TP.

Define a lexical array (LA) = a list of lexical items with selectional
features. Selectional features require that an item merge with certain
types of object. Then use Merge & Move to construct your elementary
trees from small LAs. LAs must have at most one semantically
contentful element.

When creating elementary trees, keep going until you've dealt with as
many uninterpretable features as you can. E.g., assume T has an
[+EPP] feature. So:

#   [VP DP [V' [V expected] TP]]
#     \to [TP DP [VP t [V' [V expected] TP]]]

But:

#   [VP [V seems] TP]
#     \to [TP [VP [V seems] TP]]

\phi features are agreement features, but they're not selectional.

"It" agrees in \phi features, but "there" doesn't. This means that we
can get:

#   [TP it [T [VP [V seems] TP]]]

because "it" satisfies both the EPP and the \phi features of T. But we
can't get the same thing with "there," because "there" doesn't satisfy
the \phi features:

#   * [TP there [T [VP [V seems] TP]]]

c.f.:

#   it seems that A and B    (note: "seems" is singular)
#   there are A and B        (note: "are" is plural)

Claim: after we've generated our elementary trees, the unchecked
features restrict which adjunctions are allowed. A can only adjoin to
B if doing so satisfies some of B's features.

comments to: rfrank@jhu.edu

[03/27/01 03:07 PM]

> Reduced Constructions

>> Scrambling and TAG

#   that no one dared [ the bike to repair]
#   that the bike no one dared [ to repair]

This movement is unbounded. Not all verbs allow scrambling.

>> Clitic Climbing

A clitic can "climb" to higher clauses, for some verbs.

>> The problem

The moved constituent ends up in the *middle* of the upper clause.

For:

#   John seems to like the pizza

we can just adjoin "seems" into the elementary tree:

#   NP to like NP

But where does "does" belong in:

#   Does John seem to like the pizza?

XTAG makes "does" its own tree. But it seems like "does" is in C, so
we might want:

#   [Cbar [C does] [IP [Ibar [I seem] \ldots]]]

But then what adjoins into what? One option is to use multicomponent
tree-local TAG, and do:

#   [Cbar [C does] \ldots]
and
#   [Ibar [I seem] \ldots]

But you can get unbounded raising, so the components can get
arbitrarily far apart.

[04/03/01 03:37 PM]

> More fun with C/G

>> Reconstructing C/G in the form of LTAG

CG derivation trees parallel LTAG elementary trees.

4 rules:

# Function application:
#   (S/NP) NP           \to S
#   NP (NP\backslash S) \to S
# Function composition:
#   X/Y Y/Z                       \to X/Z
#   X\backslash Y  Y\backslash Z  \to X\backslash Z

You can assume [A] and derive B. This lets us prove that A\to B.
This is "withdrawing the assumption" or "discharging the assumption":

#   [A]       ; make assumption
#    \vdots
#    B
#   ------    ; withdraw assumption
#   A\to B

#   CFG    \longrightarrow LTAG
#   \downarrow             \downarrow
#   CG(AB) \to             CG(PPT)
#
# Where:
#   \to        adds strong lexicalization, EDL (extended domain of
#              locality), FRD (factoring recursion from the domain of
#              dependencies)
#   \downarrow is weak equivalence
#   CG(AB)     = standard, vanilla CG
#   CG(PPT)    = C/G with partial proof trees

This is similar to what Bob Frank was doing with LTAG and minimalism,
but for CG.

Instead of assigning "likes" the type "(NP\backslash S)/NP", assign it
a partial proof tree:

#                likes
#                  |
#   [NP]   (NP\backslash S)/NP   [NP]
#          ------------------------
#               NP\backslash S
#   ---------------------------------
#                  S

Thus, these PPT types have assumptions. Normally, the only way to
satisfy an assumption is to withdraw it. But we will introduce new
ways of satisfying an assumption.

Construct a finite collection of partial proof trees, where each PPT
is a syntactic type associated with a lexical item. These are
analogous to elementary trees. Then introduce a composition method.
These must be inference rules.

>> Inference rule: linking

B(PPT) = the set of basic partial proof trees. How do we construct
them?
* Unfold arguments of types by introducing assumptions.
* No unfolding past an argument that's not an argument of the lexical
  item. E.g., an adverb is VP\to VP, i.e., its type is
  (NP\backslash S)\backslash(NP\backslash S). But how do we keep from
  unfolding the VP? Mark a node that stops unfolding with an asterisk:
  (NP\backslash S)\backslash(NP*\backslash S).
* If a trace assumption is introduced while unfolding, then it must be
  locally discharged, i.e., within the basic PPT which is being
  introduced.
* While unfolding we can interpolate, say, from X to Y, where X is a
  conclusion node and Y is an assumption node.
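The four CG(AB) combination rules listed earlier (two application, two composition) can be made concrete with a small Python sketch. It assumes the notes' Lambek-style notation, where NP\S takes an NP on its left and yields S, and (NP\S)/NP takes an NP on its right. The class `Slash` and the helpers `fwd`, `bwd`, `show`, and `combine` are hypothetical names; the sketch covers vanilla categories only, not partial proof trees, unfolding, or linking.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """A functor category. direction '/' means result/arg (argument on the
    right); direction '\\' means arg\\result (argument on the left)."""
    result: "Cat"
    direction: str
    arg: "Cat"


Cat = Union[str, Slash]  # a basic category such as "NP", or a functor


def fwd(result: Cat, arg: Cat) -> Slash:
    return Slash(result, "/", arg)    # result/arg


def bwd(arg: Cat, result: Cat) -> Slash:
    return Slash(result, "\\", arg)   # arg\result


def show(c: Cat) -> str:
    if isinstance(c, str):
        return c
    if c.direction == "/":
        return f"({show(c.result)}/{show(c.arg)})"
    return f"({show(c.arg)}\\{show(c.result)})"


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Try the application and composition rules on an adjacent pair."""
    ls, rs = isinstance(left, Slash), isinstance(right, Slash)
    # Forward application: X/Y  Y -> X
    if ls and left.direction == "/" and left.arg == right:
        return left.result
    # Backward application: Y  Y\X -> X
    if rs and right.direction == "\\" and right.arg == left:
        return right.result
    # Forward composition: X/Y  Y/Z -> X/Z
    if ls and rs and left.direction == right.direction == "/" and left.arg == right.result:
        return fwd(left.result, right.arg)
    # Backward composition: X\Y  Y\Z -> X\Z
    if ls and rs and left.direction == right.direction == "\\" and left.result == right.arg:
        return bwd(left.arg, right.result)
    return None


NP, S = "NP", "S"
vp_type = bwd(NP, S)            # NP\S
likes = fwd(vp_type, NP)        # (NP\S)/NP
adv = bwd(vp_type, vp_type)     # (NP\S)\(NP\S), as for "passionately"

vp = combine(likes, NP)         # forward application
s = combine(NP, vp)             # backward application
print(show(likes), show(vp), s)     # ((NP\S)/NP) (NP\S) S
print(combine(vp, adv) == vp)       # True

# Type-raised subject S/(NP\S): an assumption for illustration, not from the notes.
john = fwd(S, vp_type)
print(show(combine(john, likes)))   # (S/NP), by forward composition
```

The type-raised subject in the last lines is not in the notes; it only shows forward composition at work, yielding S/NP for "John likes".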
>> Stretching

#   Y = u v w, with X = the single conclusion of v
#   Then we can say Y = u [X] w; X \to v

e.g., stretched likes:

#                likes
#                  |
#   [NP]   (NP\backslash S)/NP   [NP]
#          ------------------------
#               NP\backslash S
#                    :
#                    :
#              [NP\backslash S]
#   ---------------------------------
#                  S

Then we can splice in "passionately":

#                          passionately
#                               |
#   [NP\backslash S]   (NP\backslash S)\backslash(NP*\backslash S)
#   --------------------------------------------------
#                     NP*\backslash S

We are still just using linking here when we combine. Stretching is
used during composition.

>> Traces

Introduce a special assumption, which we'll call a trace assumption,
and then discharge it on the other side. You must discharge it within
one elementary tree.

#                  likes                e
#                    |                  |
#   [NP]   [NP]   (NP\backslash S)/NP   [NP]
#                 ---------------------------
#                       NP\backslash S
#          ----------------------------------
#                         S
#          --------  ; discharge assumption
#                   NP\backslash S
#   ------------------------------------------
#                         S

We must disallow discharges outside of elementary trees, because then
we lose locality: we want to keep dependencies within elementary
trees.

Use a permutation operation to allow assumption and discharge to occur
on different sides?

In normal CG, you have to introduce assumptions at the periphery, so
introducing e in a sentence like "who John saw e yesterday" would be
difficult. But since we can stretch, we can stretch "who John saw e"
and splice in "yesterday".

>> Interpolate

Interpolation is basically introducing a gap in a PPT that must be
filled at the time PPTs are put together (i.e., while constructing
full proof trees for sentences).

How to do something like "John seems to be happy"? We want a "John to
be happy" tree with a spliced-in "seems to". But how do we rule out
"John to be happy"?
(In LTAG, we used features.)

Consider "John tries to walk":

#   [NP]        walk
#    |           |
#    NP    NP\backslash Sinf
#   -----------------------
#            Sinf

#              tries
#                |
#   NP   (NP\backslash S)/Sinf   [Sinf]
#        ------------------------------
#                NP\backslash S
#   -----------------------------------
#                  S

#                         seems
#                           |
#   (NP*\backslash S)/(NP\backslash Sinf)   [NP\backslash Sinf]
#   ------------------------------------------------------
#                     NP*\backslash S

So we need a new tree for walk:

#                walk(inf)
#                   |
#   [NP]    NP\backslash Sinf
#              :    \gets Interpolation. I must have a proof
#              :          tree to splice in here.
#        [NP\backslash S]
#   -------------------------
#              S

You can only interpolate while constructing PPTs, not while using them
to construct sentences (c.f. trace assumptions). Then at run-time, we
can splice things in (e.g., "seems"). This is equivalent to forced
(obligatory) adjunction in LTAG.

How does interpolation relate to multicomponent LTAG?
* Question: discharging and introducing on different sides??

[04/24/01 04:37 PM]

> Synchronous Grammars

- Grammars generate languages (sets of strings).
- Synchronous grammars generate string relations (sets of pairs of
  strings).

Useful for:
- translation
- interpretation (into an internal language)

>> Synchronous CFG (aka Syntax-directed translation schemata)

A set of pairs of rules. E.g.:

#
#
#

In addition, a system of coindexation:

#
#
#

Indices say that generated nonterminals are "linked." Start with a
pair of "top" nonterminals, and rewrite them in pairs, using paired
rules:

#   \Rightarrow

If you rewrite a nonterminal with a given index on the left, you must
rewrite the nonterminal with the same index on the right. In this
case, if we rewrite X1, we must also rewrite B1.

#   \Rightarrow
#   \Rightarrow
#   \Rightarrow

In this case, this is the only string pair generated by the grammar:

#   <x z y, a b c>

Derivation trees:

#   < (S (X x (Z z)) (Y y)), (S (A a) (B b (C c)))>

Property of the result: the structure of the tree doesn't change,
except that sisters may be reversed and nodes may be renamed. This is
(almost?) always a property of synchronous grammars.
Indices are what give us this isomorphism.

>>> Synchronous Grammar \alpha:

#   \to
#   \to
#   \to <\epsilon, \epsilon>
#   \to
#   \to <\epsilon, \epsilon>

>>> Synchronous Grammar \beta:

#   \to
#   \to
#   \to <\epsilon, \epsilon>
#   \to
#   \to <\epsilon, \epsilon>

#   \alpha\circ\beta = a^nb^nc^n
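The paired rewriting of a synchronous CFG can be sketched in Python with a depth-bounded enumerator. It assumes rule pairs of the form ((lhs, rhs), (lhs, rhs)), with a linked nonterminal written as a (label, index) tuple; the names `derive`, `bracket`, `leaves`, `string_pairs`, and `compose` are hypothetical. The notes do not record the actual rules, so `EXAMPLE` is a hypothetical grammar chosen only so that it derives the tree pair shown above, and `ALPHA` and `BETA` are hypothetical grammars (identity relations on a^n b^n c^m and on a^m b^n c^n) chosen to be consistent with the stated \alpha\circ\beta result.

```python
from itertools import product

# RHS symbols: a terminal is a str; a linked nonterminal is a (label, index) tuple.
# Nonterminals sharing an index across the two sides are rewritten together.


def derive(rules, left_nt, right_nt, depth):
    """Yield every (left_tree, right_tree) pair derivable from the linked
    pair (left_nt, right_nt) using at most `depth` levels of rewriting."""
    if depth == 0:
        return
    for (ll, lrhs), (rl, rrhs) in rules:
        if (ll, rl) != (left_nt, right_nt):
            continue
        # index -> [left label, right label]; assumes each index occurs on both sides
        links = {s[1]: [s[0], None] for s in lrhs if isinstance(s, tuple)}
        for s in rrhs:
            if isinstance(s, tuple):
                links[s[1]][1] = s[0]
        idxs = sorted(links)
        # Each linked pair is expanded jointly: both sides share one sub-derivation.
        options = [list(derive(rules, *links[i], depth - 1)) for i in idxs]
        for combo in product(*options):
            sub = dict(zip(idxs, combo))
            left = (ll, [sub[s[1]][0] if isinstance(s, tuple) else s for s in lrhs])
            right = (rl, [sub[s[1]][1] if isinstance(s, tuple) else s for s in rrhs])
            yield left, right


def bracket(tree):
    """Render a tree as a bracketed string, e.g. (Y y)."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    inner = " ".join(bracket(c) for c in children)
    return f"({label} {inner})" if inner else f"({label})"


def leaves(tree):
    """Concatenate the terminal yield of a tree."""
    return tree if isinstance(tree, str) else "".join(leaves(c) for c in tree[1])


# Hypothetical rule pairs that derive the tree pair in the notes.
EXAMPLE = [
    (("S", [("X", 1), ("Y", 2)]), ("S", [("A", 2), ("B", 1)])),
    (("X", ["x", ("Z", 3)]), ("B", ["b", ("C", 3)])),
    (("Y", ["y"]), ("A", ["a"])),
    (("Z", ["z"]), ("C", ["c"])),
]
pairs = [(bracket(l), bracket(r)) for l, r in derive(EXAMPLE, "S", "S", 5)]
print(pairs)  # [('(S (X x (Z z)) (Y y))', '(S (A a) (B b (C c)))')]


def string_pairs(rules, depth):
    return {(leaves(l), leaves(r)) for l, r in derive(rules, "S", "S", depth)}


def compose(r1, r2):
    """Relational composition: (x, z) with (x, y) in r1 and (y, z) in r2."""
    return {(x, z) for x, y in r1 for y2, z in r2 if y == y2}


def ident(lhs, rhs):
    return ((lhs, rhs), (lhs, rhs))  # same rule on both sides


ALPHA = [ident("S", [("P", 1), ("C", 2)]), ident("P", ["a", ("P", 1), "b"]),
         ident("P", []), ident("C", ["c", ("C", 1)]), ident("C", [])]
BETA = [ident("S", [("A", 1), ("Q", 2)]), ident("A", ["a", ("A", 1)]),
        ident("A", []), ident("Q", ["b", ("Q", 1), "c"]), ident("Q", [])]

result = compose(string_pairs(ALPHA, 5), string_pairs(BETA, 5))
print(sorted(result, key=lambda p: len(p[0])))
# [('', ''), ('abc', 'abc'), ('aabbcc', 'aabbcc'), ('aaabbbccc', 'aaabbbccc')]
```

With these hypothetical grammars, composing the two relations keeps only strings in both domains, i.e. a^n b^n c^n (up to the depth bound). Since the domain of a synchronous CFG relation is always context-free, this composed relation cannot itself be generated by a synchronous CFG.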