
Can Subcategorisation Probabilities Help a Statistical Parser?

Carroll and Minnen present a parse-ranking algorithm that uses subcategorization probabilities to improve performance. It ranks complete derivations produced by a baseline parser by the product of:

- The (structural) derivation probability, according to the baseline probabilistic LR model.
- P(VSUBCAT|V) for each verb in the sentence, where VSUBCAT is the subcategorization frame used by the verb (e.g., NONE, NP, AP, NP_NP, NP_AP, etc.) and V is the actual verb (e.g., "make," "admit," etc.).

Add-1 (aka Laplace) smoothing is used for the lexical entries.
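The ranking scheme can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the frame inventory, the counts, and the function names are all made up for the example, and only the add-1 smoothing and the product form follow the paper's description.

```python
from collections import Counter

# Hypothetical training counts of (verb, frame) pairs -- illustrative only.
frame_counts = Counter({
    ("make", "NP"): 120, ("make", "NP_NP"): 15,
    ("admit", "NONE"): 8, ("admit", "NP"): 40,
})
verb_counts = Counter()
for (verb, frame), n in frame_counts.items():
    verb_counts[verb] += n

# A subset of the frame inventory, used as the vocabulary size for smoothing.
FRAMES = ["NONE", "NP", "AP", "NP_NP", "NP_AP"]

def p_frame_given_verb(frame, verb):
    """Add-1 (Laplace) smoothed estimate of P(VSUBCAT | V)."""
    return (frame_counts[(verb, frame)] + 1) / (verb_counts[verb] + len(FRAMES))

def rank_score(derivation_prob, verb_frame_pairs):
    """Score a complete derivation: the structural derivation probability
    times the smoothed frame probability for each verb it contains."""
    score = derivation_prob
    for verb, frame in verb_frame_pairs:
        score *= p_frame_given_verb(frame, verb)
    return score
```

Derivations from the baseline parser would then be sorted by `rank_score`, so a parse whose verbs use well-attested frames outranks one with the same structural probability but unlikely frame assignments.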

### Evaluation

They first evaluated performance using four standard metrics: bracket recall, bracket precision, zero crossings, and mean crossings per sentence. These metrics are fairly forgiving of incorrect attachments, so the new system did not perform significantly differently from the baseline. They then applied a metric based on grammatical relations, which is essentially precision and recall over dependency-graph links. On this metric, precision rose from 79.2% to 88.2%, with a small decrease in recall (88.6% to 88.1%). See the paper for an error analysis, which breaks the errors into categories.
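Precision and recall over grammatical-relation links can be sketched as below. The triple representation (head, relation, dependent) is an assumption for illustration; the paper defines its own grammatical-relation scheme.

```python
def gr_precision_recall(predicted, gold):
    """Precision and recall over grammatical-relation links, where each
    link is a (head, relation, dependent) triple. Illustrative sketch."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy example: one of two predicted links matches the gold standard.
pred = [("make", "dobj", "cake"), ("make", "nsubj", "I")]
gold = [("make", "dobj", "cake"), ("make", "nsubj", "we")]
p, r = gr_precision_recall(pred, gold)
```

Because each dependency link is scored individually, a single wrong attachment is penalized directly, which is why this metric separates the two systems where bracketing metrics did not.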

### Bibtex

    @InProceedings{carroll1998,
      author    = {John Carroll and Guido Minnen},
      title     = {Can Subcategorisation Probabilities Help a Statistical Parser?},
      booktitle = {The 6th ACL/SIGDAT Workshop on Very Large Corpora},
      year      = 1998,
      address   = {Montreal, Canada}
    }