PCFG Models of Linguistic Tree Representations
Early work on probabilistic parsing focused on PCFGs, which assign a probability to each rule in a CFG and compute the probability of a parse as the product of the probabilities of the rules used to build it. Mark Johnson points out that this framework assumes that the form of the probabilistic model for a parse tree must exactly match the form of the tree itself. After showing that this assumption can lead to poor models, Johnson suggests that reversible transformations can be used to construct a probabilistic model whose form differs from the form of the desired output tree. He describes four transformations for prepositional-attachment structures, and evaluates them using both a theoretical analysis based on toy training sets and an empirical analysis based on performance on the Penn Treebank II.
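The core PCFG scoring scheme can be sketched in a few lines. This is a minimal illustration, not code from Johnson's paper: the grammar, rule probabilities, and tree encoding below are all invented for the example.

```python
# Minimal PCFG scoring sketch. A tree is (label, [children]); leaves are
# plain strings. Rule probabilities P(rhs | lhs) are hypothetical.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dogs",)): 0.5,
    ("NP", ("cats",)): 0.5,
    ("VP", ("bark",)): 1.0,
}

def tree_prob(tree):
    """Probability of a parse = product of the probabilities of the
    rules used to build it."""
    label, children = tree
    # The right-hand side of the rule at this node: child labels or words.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)  # multiply in each subtree's rules
    return p

tree = ("S", [("NP", ["dogs"]), ("VP", ["bark"])])
print(tree_prob(tree))  # 1.0 * 0.5 * 1.0 = 0.5
```

Because each rule's probability is conditioned only on its left-hand side, every expansion is independent of the rest of the tree, which is exactly the independence assumption Johnson's transformations manipulate.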
Two of these transformations result in significant improvements to performance: flatten and parent. The flatten transformation replaces select nested tree structures with flat structures, effectively weakening the independence assumptions made by the original structure. The parent transformation augments each node label with the label of its parent node, letting nodes act as "communication channels" that introduce conditional dependencies between a node and its grandparent. Both transformations weaken the model's independence assumptions while increasing the number of parameters that must be estimated (because they enlarge the set of possible productions). Thus, they can be thought of as an example of the classical "bias versus variance" trade-off. Johnson's empirical results show that, for these two transformations, the reduction in bias overcomes the increase in variance.
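The parent transformation is simple enough to sketch directly. The following is an illustrative implementation under assumed conventions (trees as `(label, children)` tuples, annotation written `NP^S`); it is not the exact notation used in the paper.

```python
def parent_annotate(tree, parent=None):
    """Augment each nonterminal label with its parent's (original) label,
    e.g. NP under S becomes NP^S. Leaves (plain strings) are unchanged."""
    label, children = tree
    new_label = f"{label}^{parent}" if parent else label
    new_children = [
        c if isinstance(c, str) else parent_annotate(c, label)
        for c in children
    ]
    return (new_label, new_children)

tree = ("S", [("NP", ["dogs"]), ("VP", ["bark"])])
print(parent_annotate(tree))
# ('S', [('NP^S', ['dogs']), ('VP^S', ['bark'])])
```

After this transformation, a PCFG estimated from the annotated trees conditions each expansion on the parent label as well, so a node's expansion is no longer independent of its grandparent. The transformation is reversible: stripping the `^...` suffixes recovers the original tree.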