Slide 1: Probabilistic Parsing … and some other approaches

Slide 2: Probabilistic CFGs
- Also known as Stochastic Grammars
- Date back to Booth (1969)
- Have grown in popularity with the growth of Corpus Linguistics

Slide 3: Probabilistic CFGs
- Essentially the same as ordinary CFGs, except that each rule has a probability associated with it:
    S  → NP VP          .80
    S  → aux NP VP      .15
    S  → VP             .05
    NP → det n          .20
    NP → det adj n      .35
    NP → n              .20
    NP → adj n          .15
    NP → pro            .10
- Notice that the probabilities of the rules sharing each left-hand side sum to 1
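As a quick sanity check, here is a minimal Python sketch of the grammar above as a rule table (the representation is ours, not from the slides), verifying the sums-to-1 property:

```python
from collections import defaultdict

# The grammar above as a rule table: (LHS, RHS) -> probability.
pcfg = {
    ("S",  ("NP", "VP")):        0.80,
    ("S",  ("aux", "NP", "VP")): 0.15,
    ("S",  ("VP",)):             0.05,
    ("NP", ("det", "n")):        0.20,
    ("NP", ("det", "adj", "n")): 0.35,
    ("NP", ("n",)):              0.20,
    ("NP", ("adj", "n")):        0.15,
    ("NP", ("pro",)):            0.10,
}

# The probabilities of all rules sharing a left-hand side must sum to 1.
totals = defaultdict(float)
for (lhs, _), p in pcfg.items():
    totals[lhs] += p
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"{lhs} rules sum to {total}, not 1"
```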

Slide 4: Probabilistic CFGs
- Probabilities are used to calculate the probability of a given derivation
  - Defined as the product of the probabilities of the rules used in the derivation (sketched below)
- Can be used to choose between competing derivations
  - As the parse progresses (so, can determine which rules to try first), as an efficiency measure
  - Or at the end, as a way of disambiguating, or of expressing confidence in the results
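A small sketch of that product. The S and NP probabilities come from the slide 3 table; the VP rule and its probability are illustrative assumptions:

```python
import math

# One derivation of "the man shot an elephant", as the list of rules used.
derivation = [
    ("S -> NP VP",  0.80),
    ("NP -> det n", 0.20),
    ("VP -> v NP",  0.60),   # assumed rule, not in the slide 3 grammar
    ("NP -> det n", 0.20),
]

# P(derivation) = product of the probabilities of the rules used.
p = math.prod(prob for _, prob in derivation)
print(p)  # ~0.0192

# In practice, log probabilities are summed instead, to avoid underflow
# on long derivations.
log_p = sum(math.log(prob) for _, prob in derivation)
```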

Slide 5: Where do the probabilities come from?
1) Use a corpus of already-parsed sentences: a "treebank"
   - Best-known example is the Penn Treebank (Marcus et al. 1993)
     - Available from the Linguistic Data Consortium
     - Based on the Brown corpus + 1m words of Wall Street Journal + the Switchboard corpus
   - Count all occurrences of each variant of a rule (e.g. each NP rule) and divide by the total number of NP rules (sketched below)
   - Very laborious, so of course it is done automatically
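The counting step is a relative-frequency (maximum-likelihood) estimate, P(A → β) = count(A → β) / count(A). A sketch over a made-up handful of rule occurrences:

```python
from collections import Counter

# Made-up rule occurrences harvested from a tiny treebank.
observed = [
    ("NP", ("det", "n")),
    ("NP", ("det", "n")),
    ("NP", ("pro",)),
    ("S",  ("NP", "VP")),
]

rule_counts = Counter(observed)
lhs_counts = Counter(lhs for lhs, _ in observed)

# P(A -> beta) = count(A -> beta) / count(A)
probs = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
print(probs[("NP", ("det", "n"))])  # 2/3
print(probs[("NP", ("pro",))])      # 1/3
```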

Slide 6: Where do the probabilities come from?
2) Create your own treebank
   - Easy if all sentences are unambiguous: just count the (successful) rule applications
   - When there are ambiguities, rules which contribute to the ambiguity have to be counted separately and weighted

Slide 7: Where do the probabilities come from?
3) Learn them as you go along
   - Again, assumes some way of identifying the correct parse in case of ambiguity
   - Each time a rule is successfully used, its probability is adjusted (sketched below)
   - You have to start with some estimated probabilities, e.g. all equal
   - Does need human intervention, otherwise rules become self-fulfilling prophecies
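A minimal sketch of the idea, assuming equal starting pseudo-counts of 1 per rule (the smoothing choice is ours, not from the slides) and renormalising per left-hand side on demand:

```python
# Every grammar rule starts with an equal pseudo-count ("all equal").
rules = [("NP", ("det", "n")), ("NP", ("adj", "n")), ("NP", ("pro",))]
counts = {r: 1.0 for r in rules}

def update(rule):
    """Record one successful, human-confirmed use of a rule."""
    counts[rule] += 1.0

def prob(rule):
    """Current estimate: the rule's count over the total for its LHS."""
    lhs_total = sum(c for (lhs, _), c in counts.items() if lhs == rule[0])
    return counts[rule] / lhs_total

update(("NP", ("det", "n")))
print(prob(("NP", ("det", "n"))))   # 0.5 after one observed use
```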

Slide 8: Problems with PCFGs
- PCFGs assume that all rules are essentially independent
  - But, e.g. in English, NP → pro is more likely when the NP is in subject position
- Difficult to incorporate lexical information
  - Pre-terminal rules can inherit important information from words, which helps to make choices higher up the parse; e.g. lexical choice can help determine PP attachment

Slide 9: Probabilistic Lexicalised CFGs
- One solution is to identify, in each rule, one element on the RHS (one daughter) as the most important: the "head"
  - This is quite intuitive, e.g. the n in an NP rule, though often controversial (from a linguistic point of view)
- The head must be a lexical item
- The head value is percolated up the parse tree
- An added advantage is that the PS tree has the feel of a dependency tree

Slide 10: Example: "the man shot an elephant"
- Plain PS tree:
    [S [NP [det the] [n man]] [VP [v shot] [NP [det an] [n elephant]]]]
- Lexicalised PS tree, with heads percolated:
    [S(shot) [NP(man) [det the] [n man]] [VP(shot) [v shot] [NP(elephant) [det an] [n elephant]]]]
- Corresponding dependency tree: shot heads man and elephant; the depends on man, an on elephant
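A sketch of head percolation over trees like those above; which daughter heads each rule is an illustrative assumption:

```python
# Head percolation over (label, children) trees; a leaf is (label, word).
HEAD_DAUGHTER = {"S": "VP", "VP": "v", "NP": "n"}   # assumed head table

def head(tree):
    """Return the head word percolated up from the lexical level."""
    label, rest = tree
    if isinstance(rest, str):               # pre-terminal: its word is the head
        return rest
    for child in rest:                      # descend into the head daughter
        if child[0] == HEAD_DAUGHTER[label]:
            return head(child)

tree = ("S", [("NP", [("det", "the"), ("n", "man")]),
              ("VP", [("v", "shot"),
                      ("NP", [("det", "an"), ("n", "elephant")])])])
print(head(tree))  # shot
```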

Slide 11: Dependency Parsing
- Not much different from PSG parsing
- Grammar rules still need to be stated as A → [B c]*
  - except that one daughter is identified as the head, e.g. A → [x]* h [y]* (sketched below)
  - As structure is built, the trees are headed by "h" rather than "A"
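A sketch of that head-marked rule format; the (word, dependents) representation is an assumption, not from the slides:

```python
# A head-marked rule A -> [x]* h [y]*: when it applies, the resulting
# structure is headed by h's word, with the other daughters as dependents.
def apply_rule(daughters, head_index):
    head_word, head_deps = daughters[head_index]
    others = [d for i, d in enumerate(daughters) if i != head_index]
    return (head_word, head_deps + others)

# NP -> det n with the n as head: "the man" is headed by "man".
np = apply_rule([("the", []), ("man", [])], head_index=1)
print(np)  # ('man', [('the', [])])
```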

Slide 12: Dependency grammar
- Interest in it postdates PSG in CL circles
- But the dependency approach itself predates PSG
  - Tesnière; Helbig & Schenkel; Pāṇini; ancient Greece

Slide 13: Some dependency formalisms
- Constraint Grammar (Karlsson)
- Slot Grammar (McCord)
- Link Grammar (Sleator & Temperley)
- an unnamed formalism (Järvinen & Tapanainen)

Slide 14: Categorial grammars
- Ironically named, because they do away with traditional categories
- The lexicon contains syntactic and semantic information
- There is no grammar as such, just "combinatory rules"
- Categories are of two types: functors and arguments

Slide 15: Functors and arguments
- Arguments have simple categories (taken from a small set of possible categories)
- Functors are expressed as combinations of arguments
- Two operators, X/Y and X\Y, express possibilities of combination

Slide 16: Combination operators
- X/Y is something which combines with a Y to its right to form an X
  - e.g. a determiner is an NP/N; a transitive verb is a VP/NP
- X\Y is something which combines with a Y to its left to form an X
- These can be combined
  - e.g. a ditransitive verb is a (VP/NP)/NP; a VP is an S\NP
- Parsing consists of applying combination rules, e.g. X/Y + Y = X (sketched below)
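A minimal sketch of the two application rules, representing functor categories as tuples (the encoding and the bottom-up order of combination are assumptions):

```python
# Categories: a plain string is an argument; ("/", X, Y) is X/Y and
# ("\\", X, Y) is X\Y.
DET = ("/", "NP", "N")                  # determiner: NP/N
TV  = ("/", ("\\", "S", "NP"), "NP")    # transitive verb: (S\NP)/NP, i.e. VP/NP

def combine(left, right):
    """Try the two application rules on adjacent categories."""
    # Forward application: X/Y + Y = X
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    # Backward application: Y + X\Y = X
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]
    return None

# "the man shot an elephant": the=NP/N, man=N, shot=(S\NP)/NP, an=NP/N, elephant=N
subj = combine(DET, "N")        # the + man      -> NP
obj  = combine(DET, "N")        # an + elephant  -> NP
vp   = combine(TV, obj)         # shot + NP      -> S\NP   (i.e. a VP)
print(combine(subj, vp))        # NP + S\NP      -> S
```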

Slide 17: Conclusion
- Basic parsing approaches (without constraints) are not practical in real applications
- Whatever approach is taken, bear in mind that the lexicon is the real bottleneck
- There's a real trade-off between coverage and efficiency, so it's a good idea either to sacrifice broad coverage (e.g. domain-specific parsers, controlled language) or to use a scheme that minimises the disadvantages (e.g. probabilistic parsing)

