Intricacies of Collins' Parsing Model
Computational Linguistics. Volume 30, Issue 4 (December 2004), pp. 479-511.
Daniel Bikel provides a detailed analysis of Collins' parser, offering insight into which aspects of its decomposition are most beneficial to performance. The analysis is based on a flexible re-implementation of Collins' parser, in which various features can be turned on and off, or tweaked in different ways. Bikel evaluates the impact of individual features by examining how performance changes when those features are turned off in different combinations.
Bikel begins by describing a large number of previously unpublished details. Although these details have a significant joint effect on the parser's performance (11% error reduction), their individual contributions are relatively small.
He then analyzes the effect of three features thought to be important to the performance of Collins' parser: bi-lexical dependencies, choice of lexical head words, and lexico-structural dependencies. Somewhat surprisingly, he finds that the performance drop caused by omitting bi-lexical dependencies is relatively minor. He explains this small drop by showing that the bi-lexical dependencies seen in a new sentence are almost never present in the training corpus; in other words, the training corpus is too small for these very sparse features to be much help. Bikel also finds that the head-choice heuristics do not have a major impact on performance. However, he finds that the use of lexico-structural dependencies (i.e., dependencies between a lexical word and a structural configuration) is quite important. Unlike bi-lexical dependencies, lexico-structural dependencies are associated with enough training data to be useful when evaluating novel sentences. And as has been shown before, lexical information is often important in making structural decisions, such as whether a prepositional phrase should attach at the verb phrase or noun phrase level.
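The sparsity argument can be illustrated with a toy coverage check. Everything below (the training pairs, the test pairs) is invented for illustration; the point is only that exact head-modifier word pairs from new sentences are rarely found in training data:

```python
# Toy illustration of Bikel's sparsity argument: how often does a
# head-modifier word pair from a "test" sentence appear in training?
# All of the data here is made up for illustration.

training_pairs = {
    ("ate", "pizza"), ("ate", "quickly"), ("saw", "man"),
    ("man", "tall"), ("pizza", "hot"),
}

test_pairs = [
    ("ate", "pasta"),   # unseen: "pasta" never modified "ate" in training
    ("saw", "man"),     # seen
    ("drove", "car"),   # unseen
    ("man", "old"),     # unseen
]

seen = sum(1 for pair in test_pairs if pair in training_pairs)
coverage = seen / len(test_pairs)
print(f"bilexical coverage: {coverage:.0%}")  # low coverage -> little benefit
```

With realistic corpora the coverage numbers Bikel reports are similarly low, which is why removing the bi-lexical conditioning costs so little.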
Dan Bikel explores some of the details of Collins' Parsing model that were not explicitly mentioned in Collins' thesis, but that nevertheless contributed significantly to the parser's performance. This article also serves as a fairly compact description of Collins' parsing model.
My interest in this article is basically in what it says about how the parsing problem should be broken down into individual models, and what effects those models have. This includes preprocessing steps that transform the tree.
Collins' model performs 11 preprocessing steps. These steps are detailed in the paper, but a few are worth noting. First, base NPs are treated specially (here and elsewhere): they are relabeled with a special new tag (NPB), which is sometimes inserted as a new node, and sometimes substituted for an existing NP node. In addition, certain nonterminals are marked as arguments; gerund clauses are given a special node; traces are stripped; and punctuation is removed or raised.
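A minimal sketch of the base-NP relabeling, assuming trees are nested lists of the form `[label, child, ...]` with `(POS, word)` tuples at the leaves. This only shows the relabeling case; the cases where a new NPB node is inserted, and Collins' special handling of possessive NPs, are omitted:

```python
# Relabel "base" NPs -- NPs that dominate no other NP -- as NPB.
# Tree representation: [label, child1, child2, ...]; leaves are (POS, word).

def is_preterminal(node):
    return isinstance(node, tuple)  # (POS, word) leaf

def dominates_np(node):
    """True if any descendant of node is an NP."""
    if is_preterminal(node):
        return False
    _, *children = node
    return any(not is_preterminal(c) and
               (c[0].startswith("NP") or dominates_np(c))
               for c in children)

def relabel_base_nps(node):
    if is_preterminal(node):
        return node
    label, *children = node
    if label == "NP" and not dominates_np(node):
        label = "NPB"  # base NP: dominates no other NP
    return [label] + [relabel_base_nps(c) for c in children]

tree = ["S",
        ["NP", ("DT", "the"), ("NN", "dog")],
        ["VP", ("VBD", "saw"),
               ["NP", ("DT", "a"), ("NN", "cat")]]]
print(relabel_base_nps(tree))
```

Both NPs in the example dominate only preterminals, so both come back relabeled as NPB.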
Another important preprocessing step is the head-raising, which gives us a lexicalized tree.
This part (section 5) describes how the tree is broken up into 'independent' pieces to yield a generative model. Generation works as follows:
- Given a node, generate its head child.
  - For the root of the tree, first generate the root label and its part of speech (POS); then generate the root's head word, given its label and POS. [sec 5.9]
  - For all other nodes, copy the POS & head word from the parent; and generate the label based on the parent label, head word, and POS. [sec 5.1]
- Given a node with a head child, generate its subcat frames. Separate left and right subcat frames are generated, conditioned on the head label, the parent label, the head word, and the head POS. A subcat frame is a multiset of labels, taken from a small set (NP-A, S-A, etc.).
etc. I'm not going to spell all of this out until I know how it will be relevant to my work. See the paper for more details.