101035 Chinese Information Processing (Chinese NLP) Lecture 9.


1 Chinese Information Processing (Chinese NLP) Lecture 9

2 Sentences: Grammatical Analysis (2)
Syntactic parsing
Parsing as search
Structural ambiguities
Dynamic programming parsing

3 Syntactic Parsing
Basics
The goal of syntactic parsing is to construct a parse tree for a given sentence, based on a grammar or rule system. Parsing is essentially a search through the rule space for combinations of rules. The search succeeds when it finds a combination whose rules generate a parse tree that represents the structure of the sentence.

4 Parsing Methods
Basic Searching Methods: Top-down, Bottom-up
Dynamic Programming Methods: CKY, Earley, Chart Parsing
Statistical-based Methods: Probabilistic parsing

5 Parsing as Search
Top-Down Parsing: A top-down parser builds a tree from the root node S down to the leaves.
Bottom-Up Parsing: A bottom-up parser starts with the words of the input and builds a tree upward until it is rooted in the start symbol S.
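The top-down strategy can be sketched as a depth-first search with backtracking. The grammar and lexicon below are small illustrative assumptions (they are not the grammar used on the slides), and the helper names `expand`, `expand_sequence`, and `top_down_parse` are hypothetical. The grammar deliberately has no left-recursive rules, since a naive depth-first top-down parser would loop forever on them.

```python
# Toy grammar and lexicon (assumed for illustration only).
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {
    "book": {"Noun", "Verb"},
    "that": {"Det"},
    "flight": {"Noun"},
}


def expand(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:
        # Part-of-speech category: it must match exactly one input word.
        if start < len(words) and symbol in LEXICON.get(words[start], ()):
            yield (symbol, words[start]), start + 1
        return
    # Non-terminal: try each rule in turn, starting from the expectation.
    for rhs in GRAMMAR[symbol]:
        yield from expand_sequence(symbol, rhs, words, start, [])


def expand_sequence(lhs, rhs, words, pos, children):
    """Expand the right-hand-side symbols left to right, backtracking on failure."""
    if not rhs:
        yield (lhs, *children), pos
        return
    for child, next_pos in expand(rhs[0], words, pos):
        yield from expand_sequence(lhs, rhs[1:], words, next_pos, children + [child])


def top_down_parse(sentence):
    """Return every S tree that covers the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]
```

With this toy grammar, parsing "Book that flight." first tries S → NP VP, which fails at the first word because "book" is not a determiner, and then succeeds with S → VP. The failed first attempt is an instance of the wasted effort that top-down search spends on trees inconsistent with the input.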

6 Example Book that flight.

7 Example Top-down parsing Book that flight.

8 Example Bottom-up parsing Book that flight.

9 In-Class Exercise Provide the omitted steps of the top-down parsing in order to derive the final correct parse tree. (To save space, terminal nodes can be found in only one step.)

10 Top-Down vs Bottom-Up Top-down parsing does not waste time exploring trees that cannot result in an S, but bottom-up parsing generates many trees unable to lead to an S. Bottom-up parsing always generates trees that are consistent with the input words, but top-down parsing spends considerable effort on S trees that are not consistent with the input.

11 Structural Ambiguities
A Major Challenge One sentence usually corresponds to more than one parse tree, rendering different meanings. Structural ambiguities are a major challenge for syntactic parsing.

12 Ambiguities in English
Attachment ambiguity: I shot an elephant in my pajamas.

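13 Ambiguities in English
Attachment ambiguity: I can see old men and women in the park.
Ambiguities in Chinese
"VP + 的 + 是 + NP" pattern: 反对的是少数人 ("those who object are a minority" or "what is being opposed is a minority")
"N1 + N2 + N3" pattern: 北欧语言研究会 ("society for the study of Nordic languages" or "Nordic society for language research")
"ADJ + N1 + N2" pattern: 小学生词典 ("dictionary for primary-school pupils" or "small dictionary for students")
"VP + N1的 + N2" pattern: 咬死了猎人的狗 ("the dog that bit the hunter to death" or "bit the hunter's dog to death")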

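14 Ambiguities in Chinese
"N1 + 的 + N2 + 和 + N3" pattern: 衣服的袖子和口袋 ("the sleeves and pockets of the garment" or "the garment's sleeves, and pockets in general")
"V + N1 + N2" pattern: 赠意大利图书 ("donate books to Italy" or "donate Italian books")
"MQ + NP1 + 的 + NP2" pattern: 三个学校的实验员 ("lab technicians from three schools" or "three lab technicians of the school")
"VP + MQ + NP" pattern: 发了三天工资 ("paid out three days' wages" or "spent three days paying out wages")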

15 Dynamic Programming Parsing
Features
Dynamic programming parsing methods are efficient because each subtree is discovered once, stored, and then reused in every parse that calls for that constituent. They partially solve the ambiguity problem by storing all possible parses.

16 CKY Parsing
A dynamic programming bottom-up parsing method.
Example sentence: Book the flight through Houston.
Every grammar rule must first be converted to Chomsky Normal Form (CNF).
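One step of the CNF conversion, splitting rules with more than two right-hand-side symbols into binary rules, can be sketched as below. This is only a partial sketch: a full conversion also has to eliminate unit productions and rules that mix terminals with non-terminals. The function name `binarize` and the generated symbol names `X1`, `X2`, ... are assumptions made for illustration.

```python
def binarize(rules):
    """Split rules with more than two right-hand symbols into binary rules.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Fresh symbols X1, X2, ... (hypothetical names) stand for the merged prefixes.
    """
    result = []
    counter = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            new_symbol = f"X{counter}"
            # Replace the two leftmost symbols with one new non-terminal.
            result.append((new_symbol, tuple(rhs[:2])))
            rhs = [new_symbol] + rhs[2:]
        result.append((lhs, tuple(rhs)))
    return result
```

For example, `binarize([("S", ("Aux", "NP", "VP"))])` yields the two rules X1 → Aux NP and S → X1 VP, which together generate the same strings as the original rule.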

17 CKY Parsing
For a sentence of length n, CKY works with the upper-triangular portion of an (n+1)×(n+1) matrix. Each cell [i, j] in this matrix contains the set of non-terminals that represent all the constituents spanning positions i through j of the input.
Example indexing: 0 Book 1 that 2 flight 3, so the cell [0, 3] covers the whole sentence.
CKY parsing amounts to filling in this parse table.

18 CKY Parsing Algorithm
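The table-filling procedure can be sketched as a recognizer in Python. The CNF grammar below is a small assumed fragment, not the grammar from the slides, so it does not reproduce the three parses mentioned in the exercise; the names `cky_table` and `cky_recognize` are hypothetical. The sketch only records which non-terminals span each cell; recovering trees would additionally require backpointers.

```python
from collections import defaultdict

# Assumed toy grammar in CNF: (B, C) -> set of A such that A -> B C.
BINARY_RULES = {
    ("Verb", "NP"): {"S", "VP"},
    ("VP", "PP"): {"S", "VP"},
    ("Det", "Nominal"): {"NP"},
    ("Nominal", "PP"): {"Nominal"},
    ("Prep", "NP"): {"PP"},
}
# Assumed lexical rules: word -> set of A such that A -> word.
LEXICAL_RULES = {
    "book": {"Verb"},
    "the": {"Det"},
    "flight": {"Nominal"},
    "through": {"Prep"},
    "houston": {"NP"},
}


def cky_table(words):
    """Fill the upper-triangular table; table[(i, j)] holds the non-terminals spanning i..j."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):                  # columns, left to right
        table[(j - 1, j)] |= LEXICAL_RULES.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):         # rows, bottom to top
            for k in range(i + 1, j):          # every split point of the span
                for b in table[(i, k)]:
                    for c in table[(k, j)]:
                        table[(i, j)] |= BINARY_RULES.get((b, c), set())
    return table


def cky_recognize(sentence):
    """Return True if the start symbol S spans the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return "S" in cky_table(words)[(0, len(words))]
```

In this sketch the cell [0, 5] for "Book the flight through Houston." receives S in two ways: from Verb + NP (the prepositional phrase attached inside the noun phrase) and from VP + PP (the prepositional phrase attached to the verb phrase). Both analyses share the stored entries for [1, 3] and [3, 5], which is the reuse of subtrees that makes dynamic programming efficient.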

19 CKY Parsing Example
Book the flight through Houston.

20 CKY Parsing
Book the flight through Houston.

21 In-Class Exercise When CKY ends (on the previous page), it generates 3 possible parses at once (S1, S2, S3). Please draw their corresponding parse trees.

22 Earley
A dynamic programming top-down parsing method.
The Earley algorithm makes a single left-to-right pass that fills an array called a chart, which has N+1 entries. Earley's word indexing method is the same as CKY's.
Dotted rule: a state in the chart is a grammar rule marked with a dot (•). The symbols to the left of the dot have been parsed; the symbols to the right are expected.
A state's position with respect to the input is represented by two numbers indicating where the state begins and where its dot lies.
NP → Det • Nominal, [1,2]: the NP begins at position 1 and the dot lies at position 2.
S → α •, [0,N]: a successful parse.

23 Earley
Predictor: It creates new states representing top-down expectations generated during the parsing process. It is applied to a state that has a non-terminal to the right of the dot.
Scanner: When a state has a POS category to the right of the dot, Scanner is called to examine the input and add to the chart a state corresponding to the prediction of a word with that POS.
Completer: It is applied to a state when its dot has reached the right end of the rule. The purpose of Completer is to find, and advance, all previously created states that were looking for the grammatical category that has just been discovered.

24 Earley Algorithm 3 core operations
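The three operations can be sketched as a small Earley recognizer. The grammar, lexicon, and the names `State`, `earley_chart`, and `earley_recognize` are illustrative assumptions rather than the slides' own definitions, and the dummy start rule GAMMA → S is a common convention for seeding the chart. The sketch assumes a grammar without empty productions.

```python
from collections import namedtuple

# A dotted rule plus its span: lhs -> rhs with the dot before rhs[dot], covering [start, end].
State = namedtuple("State", "lhs rhs dot start end")

# Assumed toy grammar and lexicon (left recursion is fine for Earley).
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
    "VP": [("Verb",), ("Verb", "NP")],
}
POS = {"Det", "Noun", "Verb"}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def next_symbol(state):
    """Return the symbol right of the dot, or None if the state is complete."""
    return state.rhs[state.dot] if state.dot < len(state.rhs) else None


def earley_chart(words):
    chart = [[] for _ in range(len(words) + 1)]

    def enqueue(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    enqueue(State("GAMMA", ("S",), 0, 0, 0), 0)   # dummy start state
    for i in range(len(words) + 1):
        for state in chart[i]:                    # chart[i] may grow while we iterate
            symbol = next_symbol(state)
            if symbol is None:
                # Completer: advance every state that was waiting for state.lhs.
                for waiting in chart[state.start]:
                    if next_symbol(waiting) == state.lhs:
                        enqueue(waiting._replace(dot=waiting.dot + 1, end=i), i)
            elif symbol in POS:
                # Scanner: consume the next word if it can have this POS.
                if i < len(words) and symbol in LEXICON.get(words[i], ()):
                    enqueue(State(symbol, (words[i],), 1, i, i + 1), i + 1)
            else:
                # Predictor: add a fresh state for each expansion of the expected non-terminal.
                for rhs in GRAMMAR[symbol]:
                    enqueue(State(symbol, rhs, 0, i, i), i)
    return chart


def earley_recognize(sentence):
    words = sentence.lower().rstrip(".").split()
    chart = earley_chart(words)
    return State("GAMMA", ("S",), 1, 0, len(words)) in chart[len(words)]
```

For "Book that flight.", the chart built by this sketch contains the state NP → Det • Nominal, [1,2] in entry 2, and the completed dummy state spanning [0,3] in the last entry signals a successful parse.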

25 Earley Example
Book that flight.

26 Earley Example
Book that flight.

27 Earley
States that lead to the correct parse. Book that flight.

28 Wrap-Up
Syntactic Parsing: Parsing Methods
Parsing as Search: Top-Down, Bottom-Up
Structural Ambiguities: Ambiguities in English, Ambiguities in Chinese
Dynamic Programming Parsing: Features, CKY Parsing, Earley, Examples

