A Fast, Accurate Deterministic Parser for Chinese
Wang et al define a chinese parser that uses SVM (or other classifiers) to do shift-reduce parsing in a single pass. They also look at combining classifiers. The main goal is to do fast parsing -- since parsing is done in a single pass, it is much faster than PCFG-type algorithms.
Wang et al use a classic shift-reduce parser. At each step, the classifier must decide whether to shift a new token onto the stack, or to perform a reduction. Reduction can be unary or binary, and binary reductions are marked with a head-direction. (This head-selection is used as a feature for later decisions). If the classifier selects an invalid action given the state, then the parse fails.
Wang et all tried several classifiers: SVMs with degree 2 polynomial kernel; maxent; decision trees; and memory-based learning. SVMs give the best performance, decision trees are fastest, and maxent is in the middle. (No paired features for maxent, which might explain its lower performance.)
I'm still a little unclear on the features used. E.g., some of the features below may actually be combined?? See the paper for more details (what little there is), but here is a basic list:
- boolean: are we expecting a close punctuation mark?
- boolean: is queue empty?
- boolean: is there a comman between stack[-1] and stack[-2]
- last action by the classifier
- number of words in stack[-1]
- number of words in stack[-2]
- headword of stack[i], i=[-1, -2, -3, -4]
- part of speech of stack[i], i=[-1, -2, -3, -4]
- headword of queue[i], i=[1, 2, 3, 4]
- part of speech of queue[i], i=[1, 2, 3, 4]
- nonterminal label of stack[-1]
- nonterminal label of stack[-2]
- number of punctuation chars in stack[-1]
- number of punctuation chars in stack[-2]
- rhythmic features of stack[-1] and of stack[-2]
- linear distance between headwords of stack[-1] and stack[-2]
- nonterminal label, pos, and hedadword of left & right child of stack[-1] and stack[-2]