Accurate Unlexicalized Parsing
Although the introduction of lexico-structural dependencies is clearly very important to the performance of advanced lexicalized parsers, it is by no means the only reason that these parsers outperform naive PCFG parsers. To explore which non-lexical dependencies matter for parser performance, Klein & Manning applied a manual hill-climbing approach to develop a sequence of tree transformations that improve upon the performance of a baseline PCFG system. Using this method, they find a sequence of 17 transformations that raises the performance of their unlexicalized parser to a level comparable to that of basic lexicalized parsers.
Their baseline system differs from a simple PCFG in that it begins by decomposing every node with a branching factor greater than 2 into binary-branching nodes. This binary decomposition is centered on the head child, and new node labels are created for the intermediate nodes. These new labels, which Klein & Manning refer to as "intermediate symbols," initially consist of the original node label plus the part of speech of the head word, but they may be modified by the transformation operations described below.
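The head-centered decomposition can be illustrated with a minimal sketch. It makes several assumptions that are not taken from Klein & Manning: trees are `(label, children)` tuples, preterminals are `(tag, word)` pairs, the head position and head tag are passed in by the caller rather than found with head rules, right siblings are attached before left siblings, and the `@LABEL:TAG` spelling of the intermediate symbol is purely illustrative.

```python
def binarize_around_head(label, children, head_index, head_tag):
    """Binarize one local tree outward from its head child.

    label      -- label of the node being decomposed, e.g. "VP"
    children   -- list of child subtrees
    head_index -- position of the head child (supplied by the caller)
    head_tag   -- part of speech of the head word (supplied by the caller)
    """
    if len(children) <= 2:
        # Already unary or binary: nothing to decompose.
        return (label, list(children))

    # Hypothetical spelling of the intermediate symbol:
    # original label plus the head word's part of speech.
    intermediate = f"@{label}:{head_tag}"

    # Siblings in attachment order: right siblings nearest-first,
    # then left siblings nearest-first (an assumed order).
    steps = [(sib, "R") for sib in children[head_index + 1:]]
    steps += [(sib, "L") for sib in reversed(children[:head_index])]

    node = children[head_index]
    for i, (sib, side) in enumerate(steps):
        # The outermost node keeps the original label; all others
        # receive the intermediate symbol.
        new_label = label if i == len(steps) - 1 else intermediate
        pair = [node, sib] if side == "R" else [sib, node]
        node = (new_label, pair)
    return node


example = binarize_around_head(
    "VP",
    [("ADVP", "a"), ("VBZ", "b"), ("NP", "c"), ("PP", "d")],
    head_index=1,
    head_tag="VBZ",
)
```

In this example the four-child VP becomes a chain of three binary nodes: the top node is still labeled VP, and the two nodes beneath it carry the intermediate symbol.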
All of Klein & Manning's transformations consist of splitting selected node labels into two or more specialized labels. The first two transformations relax the conditional independence assumptions of the simple PCFG model by adding contextual information about a node's ancestors or siblings to that node's label. The first of these, vertical-markovization(), augments each non-intermediate node label with the labels of its closest ancestor nodes. This is essentially a generalization of Mark Johnson's parent transformation. The second, horizontal-markovization(), is analogous, except that it applies to intermediate nodes and thus adds information about siblings instead of ancestors. Klein & Manning also consider a variant of these transformations that does not split intermediate states occurring 10 or fewer times in the training corpus. For their overall system, they settle on the same value for both Markovization transformations.
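The two Markovization transformations can be sketched as label-rewriting functions. This is an illustration under stated assumptions, not Klein & Manning's implementation: trees are `(label, children)` tuples with `(tag, word)` preterminals, preterminal tags are left unannotated, an order of `v` means "the node's own label plus `v - 1` ancestors," and the `^` and `_` separators and the `@LABEL:TAG` base symbol are hypothetical spellings.

```python
def vertical_markovize(tree, v, ancestors=()):
    """Append the labels of the v - 1 closest ancestors to each phrasal label.

    v = 1 leaves labels unchanged; v = 2 is parent annotation.
    `ancestors` holds the original ancestor labels, nearest first.
    """
    label, children = tree
    if isinstance(children, str):
        # Preterminal (tag, word): left as-is in this sketch.
        return tree
    context = ancestors[:v - 1]
    new_label = "^".join((label,) + tuple(context))
    new_children = [
        vertical_markovize(child, v, (label,) + tuple(ancestors))
        for child in children
    ]
    return (new_label, new_children)


def horizontal_label(parent, head_tag, generated_siblings, h):
    """Build an intermediate symbol that remembers only the last h siblings.

    generated_siblings -- labels of siblings already attached, oldest first
    """
    kept = generated_siblings[-h:] if h > 0 else []
    return f"@{parent}:{head_tag}" + "".join(f"_{s}" for s in kept)


tree = ("S", [("NP", [("PRP", "He")]), ("VP", [("VBD", "ran")])])
parent_annotated = vertical_markovize(tree, v=2)
```

With `v=2`, the NP and VP under S are relabeled `NP^S` and `VP^S`, which lets the grammar distinguish, for example, subject NPs from NPs in other positions. Lowering `h` merges intermediate symbols that differ only in older siblings, which reduces the number of distinct states.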
Klein & Manning describe fifteen additional transformations, which split node labels based on a variety of contextual features, including both "internal context" (features of the phrase itself) and "external context" (features of the tree outside the phrase). Individually, these transformations improve performance by between 0.17 and 2.52 percentage points; in total, performance improves by 14.4 percentage points, from 72.62% to 87.04%.