
1 NLP Midterm Solution 2006

2 #1 bilingual corpora
– parallel corpus (document-aligned, sentence-aligned, word-aligned) (4)
– comparable corpus (4)
Sources of corpora (2):
– Association for Computational Linguistics' Data Collection Initiative (ACL/DCI)
– European Corpus Initiative (ECI)
– International Computer Archive of Modern English (ICAME)
– Linguistic Data Consortium (LDC)
– Consortium for Lexical Research (CLR)
– Electronic Dictionary Research (EDR)
– Text Encoding Initiative (TEI)
– European Language Resources Distribution Agency (ELDA)
– Association for Computational Linguistics and Chinese Language Processing (ROCLING)

3 #2 … (15)

4 #3 Fundamental rule (4)
If the chart contains edges ⟨i, j, A → W1 • B W2⟩ and ⟨j, k, B → W3 •⟩, where A and B are categories and W1, W2 and W3 are (possibly empty) sequences of categories or words, then add edge ⟨i, k, A → W1 B • W2⟩ to the chart.

5 #3 cont. Use the chart to avoid redundancy (2)
Bottom-up rule (4)
If you are adding edge ⟨i, j, C → W1 •⟩ to the chart, then for every rule in the grammar of the form B → C W2, add edge ⟨i, i, B → • C W2⟩ to the chart.
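Taken together, the fundamental rule and the bottom-up rule are enough to drive a small chart parser. A minimal Python sketch, assuming a toy grammar and lexicon (all names and the edge representation here are illustrative, not from the exam):

```python
# Edge representation: (start, end, lhs, found, remaining)
# encodes the dotted rule  lhs -> found . remaining  spanning [start, end).

GRAMMAR = {                       # toy grammar, purely illustrative
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
}

def parse(words, lexicon):
    chart = set()
    agenda = []

    def add(edge):
        if edge in chart:
            return                # the chart suppresses redundant re-derivation
        chart.add(edge)
        agenda.append(edge)

    # lexical edges: one complete edge per word
    for i, w in enumerate(words):
        for cat in lexicon[w]:
            add((i, i + 1, cat, (w,), ()))

    while agenda:
        i, j, a, found, rem = agenda.pop()
        if not rem:
            # bottom-up rule: a complete C edge proposes every rule B -> C W2
            for lhs, rhss in GRAMMAR.items():
                for rhs in rhss:
                    if rhs[0] == a:
                        add((i, i, lhs, (), rhs))
            # fundamental rule: extend active edges that end where this starts
            for (i2, j2, a2, found2, rem2) in list(chart):
                if j2 == i and rem2 and rem2[0] == a:
                    add((i2, j, a2, found2 + (a,), rem2[1:]))
        else:
            # fundamental rule, other direction: absorb complete edges at j
            for (i2, j2, b, found2, rem2) in list(chart):
                if i2 == j and not rem2 and rem[0] == b:
                    add((i, j2, a, found + (b,), rem[1:]))

    return any(e[0] == 0 and e[1] == len(words) and e[2] == "S" and not e[4]
               for e in chart)

LEXICON = {"the": ["Det"], "dog": ["N"], "saw": ["V"], "cat": ["N"]}
print(parse("the dog saw the cat".split(), LEXICON))  # True
```

The `add` helper is where the "use the chart to avoid redundancy" point shows up: an edge already in the chart is never re-derived or re-queued.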

6 #4 Whatever's reasonable… (10)
– Consider the sample questions that are going to be asked
– Decide how to translate each question into SQL
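One "reasonable" approach to the two steps above is a pattern-to-template mapping for the anticipated questions. A hypothetical sketch; the question patterns, table and column names (`employees`, `name`, `salary`) are all made-up assumptions for illustration:

```python
import re

# Each anticipated question pattern maps to a SQL template.
PATTERNS = [
    (re.compile(r"who earns more than (\d+)\??", re.I),
     "SELECT name FROM employees WHERE salary > {0}"),
    (re.compile(r"how many employees are there\??", re.I),
     "SELECT COUNT(*) FROM employees"),
]

def question_to_sql(question):
    for pattern, template in PATTERNS:
        m = pattern.fullmatch(question.strip())
        if m:
            return template.format(*m.groups())
    return None  # question outside the anticipated set

print(question_to_sql("Who earns more than 50000?"))
# SELECT name FROM employees WHERE salary > 50000
```

A real interface would parse the question into a semantic representation first, but for a fixed set of anticipated questions a template table is a defensible minimal answer.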

7 #5
Level | Well-formedness constraints | Types of ambiguity
Morphological analysis (3) | rules of inflection and derivation | Analysis: structural [prefix, stem, suffix], morpheme boundaries, morpheme identity
Lexical (+3) | … | …
Syntactic analysis (3) | grammar rules | Analysis: structural, word category [POS]
Semantic interpretation (3) | selection restrictions | Analysis: word sense, quantifier scope; Generation: synonymy
Pragmatic interpretation (3) | ?principles of cooperative conversation? (speaker, listener, context) | Analysis: ?pragmatic function?; Generation: ?realization of pragmatic function?

8 #6
a (6)
b (4) Smaller H: the approximate model is closer to the correct model.
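Assuming H here denotes the cross entropy of the approximate model m measured against the correct model p (an assumption; the slide does not spell it out), the underlying relation is:

```latex
H(p, m) = -\sum_x p(x)\,\log_2 m(x) \;\ge\; H(p) = -\sum_x p(x)\,\log_2 p(x)
```

with equality exactly when m = p, which is why a smaller H indicates a better approximate model.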

9 #7 a (5) Pointwise Mutual Information is roughly a measure of how much one word tells us about the other.
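The standard definition, for words x and y with joint probability P(x, y) and marginals P(x), P(y):

```latex
\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
```

PMI is zero when the words are independent, positive when they co-occur more often than chance would predict, and negative when they co-occur less often.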

10 #7 cont. b (5) X = word 1, Y = word 2 (3) Physical meaning: higher value = higher dependence. Pointwise Mutual Information is roughly a measure of how much one word tells us about the other. (2)
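A minimal numeric sketch of this interpretation; the probabilities are made-up illustrative values:

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information in bits for a word pair (x, y)."""
    return math.log2(p_xy / (p_x * p_y))

# Independent words: P(x, y) = P(x) P(y), so PMI = 0.
print(pmi(0.01 * 0.02, 0.01, 0.02))   # 0.0

# Dependent words: joint probability well above chance, so PMI is large.
print(pmi(0.005, 0.01, 0.02))         # log2(25) ≈ 4.64
```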

11 #7 cont. c (5) Perfect dependence (3) As perfectly dependent bigrams get rarer, their mutual information increases ⇒ bad measure of dependence (2). With MI, bigrams composed of low-frequency words receive a higher score than bigrams composed of high-frequency words. Higher frequency means more evidence, and a bigram supported by more evidence should be ranked higher.
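This effect is easy to check numerically. Under perfect dependence the two words always co-occur, so P(x) = P(y) = P(x, y) = p and PMI reduces to −log2 p, which grows as the bigram gets rarer (probabilities below are illustrative):

```python
import math

# Perfect dependence: P(x) = P(y) = P(x, y) = p,
# so PMI = log2(p / (p * p)) = -log2(p).
for p in (0.1, 0.01, 0.001):
    print(f"p={p}: PMI={math.log2(p / (p * p)):.2f}")
# p=0.1:   PMI=3.32
# p=0.01:  PMI=6.64
# p=0.001: PMI=9.97
```

The rarest perfectly dependent bigram gets the highest score, even though it is supported by the least evidence.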

12 #8 a (5) sentence pairs b (5)

13 #8 cont. c (5) do for all words

