Date: June 1-5, 2009
Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction
Shay Cohen and Noah A. Smith
- theta = parameters
- z = hidden structure
- x = evidence
- beta = hyperparameters
Assume an underlying generative distribution
- Work in a parametric space.
- Use a logistic normal distribution for the prior (not Dirichlet)
Use the prior:
- to tie parameters in the grammar
- to encode linguistic information
- used over a family of multinomials
- used for multilingual learning
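A minimal sketch of what a draw from a logistic normal prior looks like (names here are illustrative, not from the talk): sample a Gaussian vector, then map it onto the probability simplex with the softmax (logistic) transform. The covariance matrix is what lets correlated parameters "move together."

```python
import numpy as np

def sample_logistic_normal(mu, sigma, rng):
    """Draw one multinomial from a logistic normal prior."""
    eta = rng.multivariate_normal(mu, sigma)   # Gaussian draw in R^K
    exp_eta = np.exp(eta - eta.max())          # subtract max for stability
    return exp_eta / exp_eta.sum()             # softmax -> point on simplex

rng = np.random.default_rng(0)
mu = np.zeros(3)
# Off-diagonal covariance correlates the first two parameters,
# something a Dirichlet prior cannot express.
sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
theta = sample_logistic_normal(mu, sigma, rng)
```

Unlike a Dirichlet draw, the components of `theta` here inherit the correlation structure encoded in `sigma`.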
Encoding prior knowledge: let the model know which part-of-speech tags are related to one another. (E.g., NN, NNS, NNP get grouped.)
The logistic normal prior lets us tie parameters within the same multinomial, but it cannot tie parameters that sit in different multinomials. So instead of learning each multinomial on its own, we learn them in adjoined normal spaces.
Treat the entire parameter space as a single unit, with a single prior, rather than dividing it up.
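A simplified sketch of that "single prior over the whole parameter space" idea (this is an illustration of the tying mechanism, not the paper's exact shared logistic normal construction, which combines several normal experts): draw one joint Gaussian vector covering all multinomials, then softmax each partition separately. Covariance entries that span partition boundaries are what tie parameters in different multinomials.

```python
import numpy as np

def sample_shared_logistic_normal(mu, sigma, sizes, rng):
    """Draw a family of multinomials from one joint Gaussian.

    sizes gives the dimension of each multinomial; the blocks of the
    single draw `eta` are softmaxed independently.
    """
    eta = rng.multivariate_normal(mu, sigma)   # ONE draw for all params
    thetas, start = [], 0
    for k in sizes:                            # split into one block per multinomial
        block = eta[start:start + k]
        e = np.exp(block - block.max())
        thetas.append(e / e.sum())
        start += k
    return thetas

rng = np.random.default_rng(1)
sizes = [2, 3]                 # two multinomials, sizes 2 and 3
d = sum(sizes)
mu = np.zeros(d)
sigma = np.eye(d)
# Tie parameter 0 (first multinomial) to parameter 2 (second multinomial):
# a cross-multinomial correlation no per-multinomial prior could encode.
sigma[0, 2] = sigma[2, 0] = 0.7
thetas = sample_shared_logistic_normal(mu, sigma, sizes, rng)
```

The key point the notes make: because `sigma` spans the whole parameter space, soft tying across multinomials falls out of ordinary covariance, rather than requiring separate priors to be coordinated by hand.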