However, we have not yet addressed one very important issue. It is not sufficient to obey the ranking of the constraints. If two or more output candidates violate the same constraint multiple times we should prefer the candidate or candidates with the smallest number of violations. This does not come for free. The system that we have sketched so far does not make that distinction. If the input form has no perfect outputs, we may get a set of outputs that differ with respect to the number of constraint violations. For example, the transducer in Figure 14 gives three outputs for the string bebop (Figure 15).
Because bebop has no output that meets the Parse constraint, lenient composition allows all outputs that contain a Parse violation regardless of the number of violations. Here the second alternative with just one violation should win but it does not.
Instead of viewing Parse as a single constraint, we need to
reconstruct it as a series of ever more relaxed parse constraints.
The ^>n operator in Figure 16 means ``more than
n iterations''.
| define Parse ~$["X["] ; |
| define Parse1 ~[[$"X["]^>1] ; |
| define Parse2 ~[[$"X["]^>2] ; |
| ... |
| define ParseN ~[[$"X["]^>N] ; |
Our original Parse constraint is violated by a single unparsed element. Parse1 allows one unparsed element. Parse2 allows up to two violations, and ParseN up to N violations.
The single Parse line in Figure 14 must be replaced by the sequence of lenient compositions in Figure 17 up to some chosen N.
If an input string has at least one output form that meets the Parse constraint (no violations), all the competing output forms with Parse violations are eliminated. Failing that, if the input string has at least one output form with just one violation, all the outputs with more violations are eliminated. And so on.
The particular order in which the individual parse constraints apply
actually has no effect here on the final outcome because the constraint
languages are in a strict subset relation: Parse
Parse1
Parse2
ParseN.
For example, if the best candidate incurs two violations, it is in
Parse2 and in all the weaker constraints. The ranking in Figure
17 determines only the order in which the losing candidates
are eliminated. If we start with the strictest constraint, all the
losers are eliminated at once when Parse2 is applied; if we start
with a weaker constraint, some output candidates will be eliminated
earlier than others but the winner remains the same.
As the number of constraints goes up, so does the size of the combined constraint network in Figure 14, from 66 states (no Parse violations) to 248 (at most five violations). It maps bebop to O[b]N[e]O[b]N[o]X[p] and abracadabra to O[]N[a]X[b]O[r]N[a]O[c]N[a]O[d]N[a]X[b]O[r]N[a] correctly and instantaneously.
It is immediately evident that while we can construct a cascade of constraints that prefer n violations to n+1 violations up to any given n, there is no way in a finite-state system to express the general idea that fewer violations is better than more violations. As Frank and Satta [7] point out, finite-state constraints cannot make infinitely many distinctions of well-formedness. It is not likely that this limitation is a serious obstacle to practical optimality computations with finite-state systems as the number of constraint violations that need to be taken into account is generally small.
It is curious that violation counting should emerge as the crucial issue that potentially pushes optimality theory out of the finite-state domain, thus making it formally more powerful than rewrite systems and two-level models. It has never been presented as an argument against the older models that they do not allow unlimited counting. It is not clear whether the additional power constitutes an asset or an embarrassment for OT.