Special Topics in Computer Science Advanced Topics in Information Retrieval Lecture 10: Natural Language Processing and IR. Syntax and structural disambiguation.


Special Topics in Computer Science Advanced Topics in Information Retrieval Lecture 10: Natural Language Processing and IR. Syntax and structural disambiguation Alexander Gelbukh

2 Previous Chapter: Conclusions. Tagging, word sense disambiguation, and anaphora resolution are cases of disambiguation of meaning. Useful in translation, information retrieval, and text understanding. Dictionary-based methods are good but expensive. Statistical methods are cheap and sometimes imperfect... but not always (if very large corpora are available).

3 Previous Chapter: Research topics. Too many to list. New methods. Lexical resources (dictionaries). = Computational linguistics

4 Contents: Language levels; Syntax; Dependency approach; Constituency-based approach; Head-driven approach; Grammars and parsing; Ambiguity and disambiguation

5 Language levels. Letters are built up into words, words into sentences, sentences into text. Each level has its own representation. This allows for modular processing: a module describes one level or transforms from one level to another.

6 Source of language complexity: 1-D

7

8 Linguistic processor translates between representations

9 General scheme of text processing. The linguistic processor uses linguistic knowledge. The applied system uses other types of knowledge (e.g., Artificial Intelligence).

10 Language levels. Morphological: words. Syntactic: sentences. Semantic: meaning. Pragmatic: intention. ...?

11 Fine structure of linguistic processor

12 Example of text. Science is important for our country. The Government pays it much attention.

13 Textual representation. Text is a sequence of letters: S c i e n c e i s i m p o r t a n t f o r o u r c o u n t r y. T h e G o v e r n m e n t p a y s i t m u c h a t t e n t i o n.

14 Morphological analysis

15 Morphological representation A sequence of words.

16 Syntactic parsing

17 Syntactic representation A sequence of syntactic trees.

18 Syntactic representation. What happened? With whom did it happen? ... and their details.

19 Semantic analysis Next lecture...

20 Syntax. The structure describing the relationships between words in a sentence. Describes the relationships implied by grammatical characteristics, not by meaning. Often allows for simple paraphrasing: John reads the book. The book is read by John.

21 Early approach: Dependency syntax. Tree. Nodes: words. Arcs: "is modified by". "Modifies" means adds details, clarifies, chooses one of many... makes more specific. Arcs are typed; the types are: subject, object, attribute, ... Arc labels shown in the figure: Subject, Object, Recipient, Attribute.

22 ... Dependency syntax. General situation: pay. More specifically, the one where: who pays is Government; what is paid is attention; to whom it is paid is it. More specifically still: attention that is much. Arc labels shown in the figure: Subject, Object, Recipient, Attribute.
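The dependency tree on this slide can be written down as a small data structure. The sketch below is only an illustration: the class name `Node` and the helper `describe` are hypothetical, and the arc types are the four labels shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word in a dependency tree, with typed arcs to the words that modify it."""
    word: str
    modifiers: list = field(default_factory=list)  # list of (arc_type, Node)

    def add(self, arc_type, child):
        """Attach child as a modifier of this word and return the child."""
        self.modifiers.append((arc_type, child))
        return child


def describe(node, depth=0):
    """Return one line per arc, indented by the depth of the modified word."""
    lines = []
    for arc_type, child in node.modifiers:
        lines.append("  " * depth + f"{node.word} --{arc_type}--> {child.word}")
        lines.extend(describe(child, depth + 1))
    return lines


# "The Government pays it much attention."
pay = Node("pays")
pay.add("subject", Node("Government"))
attention = pay.add("object", Node("attention"))
pay.add("recipient", Node("it"))
attention.add("attribute", Node("much"))

tree_lines = describe(pay)
print("\n".join(tree_lines))
```

Note that word order plays no role in this structure, which is one reason the approach suits languages with free word order.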

23 Advantages/disadvantages of Dependency Syntax. Advantages: solid linguistic base; rather direct translation into semantics; easily applicable to languages with free word order (Russian, Latin; Korean?). This is why the linguistic base is solid: good for classical languages! Disadvantages: no nice mathematical base; no simple algorithms.

24 Most popular approach: Constituency (Phrase Structure grammars). Tree. Nodes: nested segments of the phrase; they cannot intersect, only nest; usually they are labeled with part-of-speech names. Arcs: nesting; in the classical approach, arcs are not labeled. [ [ Our Government ] [ pays [ much attention ] [ to it ] ] ]

25 Constituency [ [ Our Government ] [ pays [ much attention ] [ to it ] ] ] Our Government pays much attention to it

26 Constituency [ [ Our R Government N ] NP [ pays V [ much A attention N ] NP [ to P it R ] PP ] VP ] S. Legend: R: pronoun; N: noun; V: verb; A: adjective; NP: noun phrase; VP: verb phrase; PP: prepositional phrase; S: sentence.

27 Constituency: graphical representation. [ [ Our Government ] NP [ pays [ much attention ] NP [ to it ] PP ] VP ] S. (Tree diagram: nodes S, NP, VP, PP over the tags R N V A N P R and the words Our Government pays much attention to it.)
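The labeled bracketing above maps directly onto a nested structure. The following is a minimal sketch, assuming a tuple encoding chosen here for illustration: a constituent is `(label, child, ...)` and a leaf is `(tag, word)`. The function names are hypothetical.

```python
# A constituent is (label, child, child, ...); a leaf is (tag, word).
tree = ("S",
        ("NP", ("R", "Our"), ("N", "Government")),
        ("VP",
         ("V", "pays"),
         ("NP", ("A", "much"), ("N", "attention")),
         ("PP", ("P", "to"), ("R", "it"))))


def is_leaf(node):
    """A leaf pairs a part-of-speech tag with a word."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """Read the words off the tree from left to right."""
    if is_leaf(node):
        return [node[1]]
    result = []
    for child in node[1:]:
        result.extend(words(child))
    return result


def brackets(node):
    """Render the tree in the slide's notation: [ ... ] LABEL."""
    if is_leaf(node):
        return node[1]
    inner = " ".join(brackets(child) for child in node[1:])
    return f"[ {inner} ] {node[0]}"


print(" ".join(words(tree)))
print(brackets(tree))
```

Because constituents can only nest, a plain recursive structure is enough; no cross-links between branches are needed.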

28 Phrase structure grammar. Enumerates possible configurations at nodes. Usually recursive. Rules: S → NP VP; NP → A NP; NP → R NP; PP → P NP; NP → N; VP → VP NP PP; VP → V. (Tree diagram: nodes S, NP, VP, PP over the tags R N V A N P R and the words Our Government pays much attention to it.)

29 Context-independency hypothesis. A configuration is possible or not, regardless of where it is used. Wherever you find VP NP PP, it can be a VP. Wherever you find NP VP, it can be an S. If you can put together an S that covers the whole sentence, it is a grammatically correct description. With this, given a suitable grammar, you can: list all sentences of a language; list only correct sentences of that language; list all and only correct structures. Correctness means a native speaker's intuition.
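The hypothesis means a tree can be checked one node at a time, without looking at where the node sits. The sketch below illustrates this under stated assumptions: the rule set is one reading of the grammar slide, with PP → P NP and NP → R assumed here because the example tree needs them; the tree encoding and the names `GRAMMAR`, `label`, and `licensed` are hypothetical.

```python
# Each rule is (parent, configuration of child labels).
GRAMMAR = {
    ("S", ("NP", "VP")),
    ("NP", ("A", "NP")),
    ("NP", ("R", "NP")),
    ("NP", ("N",)),
    ("NP", ("R",)),             # assumption: needed for the bare pronoun "it"
    ("PP", ("P", "NP")),        # assumption: read from the PP label on "to it"
    ("VP", ("VP", "NP", "PP")),
    ("VP", ("V",)),
}


def label(node):
    """A node is either a part-of-speech tag (str) or (label, [children])."""
    return node if isinstance(node, str) else node[0]


def licensed(node):
    """True if every local configuration in the tree is a rule of GRAMMAR."""
    if isinstance(node, str):   # a bare tag has no configuration to check
        return True
    parent, children = node
    config = tuple(label(child) for child in children)
    # The check uses only the node and its children, never the context.
    return (parent, config) in GRAMMAR and all(licensed(c) for c in children)


# "Our Government pays much attention to it" as tags R N V A N P R.
good = ("S", [
    ("NP", ["R", ("NP", ["N"])]),
    ("VP", [
        ("VP", ["V"]),
        ("NP", ["A", ("NP", ["N"])]),
        ("PP", ["P", ("NP", ["R"])]),
    ]),
])
bad = ("S", [("VP", ["V"]), ("NP", ["N"])])  # VP NP is not a configuration for S

print(licensed(good), licensed(bad))
```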

30 Generative idea. Find a grammar that lists all and only the correct sentences (with their structures) of a language. This is a complete description of that language! How can it be useful in analysis? Reverse the grammar.

31 Parsing. Given a grammar and a sentence, find all possible structures that describe this sentence with this grammar. Many methods; not discussed today. A lot of research. Very fast algorithms. Complexity: cubic in the number of words in the sentence (there are asymptotically better methods, with an exponent of about 2.8). Problem: combinatorics of variants.
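As a rough illustration of "find all possible structures", the sketch below counts the structures a grammar assigns to a sequence of part-of-speech tags by memoizing over spans. It is not one of the fast algorithms the slide alludes to, and it assumes a grammar with no empty rules and no cycles of single-symbol rules. The rule set is an assumed reading of the grammar slide (PP → P NP and NP → R are assumptions), and all names are hypothetical.

```python
from functools import lru_cache

RULES = {
    "S":  [("NP", "VP")],
    "NP": [("A", "NP"), ("R", "NP"), ("N",), ("R",)],  # ("R",) is an assumption
    "PP": [("P", "NP")],                               # assumption
    "VP": [("VP", "NP", "PP"), ("V",)],
}


def count_parses(tags, start="S"):
    """Count the structures rooted in `start` that cover the whole tag sequence."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Number of ways `symbol` can cover tags[i:j]."""
        if symbol not in RULES:  # a terminal, i.e. a part-of-speech tag
            return 1 if j - i == 1 and tags[i] == symbol else 0
        return sum(cover(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def cover(rhs, i, j):
        """Number of ways the symbol sequence `rhs` can cover tags[i:j]."""
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        first, rest = rhs[0], rhs[1:]
        # Every symbol must cover at least one tag, so leave room for `rest`.
        return sum(count(first, i, k) * cover(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(tags))


# Our/R Government/N pays/V much/A attention/N to/P it/R
print(count_parses("R N V A N P R".split()))
```

The memo table plays the role of a parsing chart: each (symbol, start, end) triple is computed once and reused by every larger span that needs it.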

32 Advantages and disadvantages of constituency approach. Advantages: nice mathematics, very well understood; efficient analysis algorithms, very well elaborated; good for languages with fixed word order (English; Chinese?). Disadvantages: difficult translation into semantics; bad when it comes to freer word order, even in English, and worse in other languages.

33 Head-driven approaches. Combine some advantages of the dependency-based and constituency-based approaches. Syntax is still fixed-order, but word dependency information is added. Easier translation into semantics. More linguistically based. How? In each constituent, the main word (head) is marked. It modifies the head of the larger constituent. [ [ Our Government ] [ pays [ much attention ] [ to it ] ] ]

34 Syntactic ambiguity. I see a cat with a telescope. Reading 1: I see [a cat] [with a telescope] = I use a telescope to see a cat. Reading 2: I see [ a cat [with a telescope] ] = I see a cat that has a telescope. Nearly any preposition causes ambiguity. Dozens, thousands, millions of variants for a sentence, because their numbers multiply: I see a cat with a telescope in a garden at the shore of a river.
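How fast the variants multiply can be made concrete by counting. The sketch below assumes a toy grammar that is not taken from the slides: VP → V NP | VP PP, NP → NP PP, PP → P NP. Under that assumption it counts the structures for a verb, its object, and n trailing prepositional phrases; the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def np_readings(n):
    """Structures for a noun phrase followed by n prepositional phrases."""
    if n == 0:
        return 1
    # NP -> NP PP: the left NP keeps the first i PPs; the last attached PP
    # (P + NP) carries the remaining n - 1 - i PPs inside its own NP.
    return sum(np_readings(i) * np_readings(n - 1 - i) for i in range(n))


@lru_cache(maxsize=None)
def vp_readings(n):
    """Structures for verb + object followed by n prepositional phrases."""
    if n == 0:
        return 1
    inside_object = np_readings(n)                        # VP -> V NP
    attached_to_verb = sum(vp_readings(i) * np_readings(n - 1 - i)
                           for i in range(n))             # VP -> VP PP
    return inside_object + attached_to_verb


counts = [vp_readings(n) for n in range(5)]
print(counts)  # [1, 2, 5, 14, 42]
```

With one prepositional phrase the count is 2, matching the two readings of "I see a cat with a telescope"; with the four phrases of the longer sentence it is already 42 under this toy grammar, and other ambiguities multiply on top of that.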

35 Ambiguity resolution. Syntactic means are not enough. Is telescope more related to see or to cat? Statistical methods: is it used more often with see or with cat? Dictionary-based methods: does it share more meaning with see or with cat (e.g., path length in a dictionary of semantic relationships)? Ideally, context should be analyzed and reasoning applied: I see a cat with a telescope. It keeps the telescope in its left paw. Currently there are no good methods for this.
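The statistical idea can be sketched as a co-occurrence comparison. Everything in the example below is an assumption made for illustration: the six sentences are an invented toy corpus, not real statistics, and sentence-level co-occurrence is just one simple choice of association measure. The names `cooccurrence` and `attach` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy corpus invented for this illustration; real systems need large corpora.
CORPUS = [
    "astronomers see stars with a telescope",
    "we see planets with a telescope",
    "you can see the moon with a small telescope",
    "the cat sleeps on the sofa",
    "a cat chased a mouse",
    "he bought a telescope",
]


def cooccurrence(corpus):
    """Count, for each unordered word pair, the sentences containing both."""
    pairs = Counter()
    for sentence in corpus:
        for a, b in combinations(sorted(set(sentence.split())), 2):
            pairs[(a, b)] += 1
    return pairs


def attach(modifier, candidates, pairs):
    """Pick the candidate head that co-occurs most often with the modifier."""
    def score(head):
        return pairs[tuple(sorted((modifier, head)))]
    return max(candidates, key=score)


PAIRS = cooccurrence(CORPUS)
choice = attach("telescope", ["see", "cat"], PAIRS)
print(choice)  # see
```

On this toy data "telescope" shares three sentences with "see" and none with "cat", so the phrase is attached to the verb; with too little data the counts are unreliable, which is why very large corpora matter.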

36 Shallow parsing. Due to the HUGE problems in resolving ambiguity: do not resolve it! Do what you can do well: I see [a cat] [with a telescope] [in a garden] [at the shore] [of a river]. Better than nothing. Can be done well.
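A chunker of this kind can be very small, because it never decides what attaches to what. The sketch below assumes the input is already tagged; the tag D for determiners and the function name `chunk` are assumptions introduced here, and the rule "optional preposition, then determiners and adjectives, then a noun" is a deliberate simplification.

```python
def chunk(tagged):
    """Group tagged words into flat chunks without deciding any attachment."""
    out, current = [], []

    def flush():
        """Close the chunk being built, if any."""
        if current:
            out.append("[" + " ".join(current) + "]")
            current.clear()

    for word, tag in tagged:
        if tag == "P":            # a preposition always opens a new chunk
            flush()
            current.append(word)
        elif tag in ("D", "A"):   # determiners and adjectives extend a chunk
            current.append(word)
        elif tag == "N":          # a noun closes the chunk
            current.append(word)
            flush()
        else:                     # anything else stays outside chunks
            flush()
            out.append(word)
    flush()
    return " ".join(out)


sentence = [
    ("I", "R"), ("see", "V"), ("a", "D"), ("cat", "N"),
    ("with", "P"), ("a", "D"), ("telescope", "N"),
    ("in", "P"), ("a", "D"), ("garden", "N"),
    ("at", "P"), ("the", "D"), ("shore", "N"),
    ("of", "P"), ("a", "D"), ("river", "N"),
]
result = chunk(sentence)
print(result)
```

The output has exactly one analysis regardless of how many prepositional phrases follow, which is the point of the approach.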

37 Evaluation. PARSEVAL international contest. A practical parser usually gives only one variant, which implies disambiguation! Manually built corpora (treebanks). Compare what the program did with what humans did.
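Comparing parser output with a treebank is commonly done by matching labeled brackets. The sketch below is a simplified illustration of that idea, not the full PARSEVAL procedure: constituents are assumed to be given as (label, start, end) triples over word positions, and both example analyses are written by hand for this illustration.

```python
def bracket_scores(gold, predicted):
    """Labeled bracket precision, recall and F1 between two sets of spans."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Word positions: I(0) see(1) a(2) cat(3) with(4) a(5) telescope(6)
# Human annotation: the telescope is the instrument of seeing.
gold = [("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7),
        ("NP", 2, 4), ("PP", 4, 7), ("NP", 5, 7)]
# Parser output: the cat has the telescope, giving one extra NP over 2..7.
predicted = gold + [("NP", 2, 7)]

precision, recall, f1 = bracket_scores(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

A single wrong attachment changes only one bracket here, so bracket scores can stay high even when the chosen reading is wrong.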

38 One of the uses in IR: Lexical ambiguity resolution. Syntactic analysis helps in POS disambiguation: Oil is used well in Mexico. Oil well is used in Mexico. Well = ? But it does not help in WSD: I deposited my money in an international bank. I live on a beautiful bank of the Han River.

39 Research topics. Faster algorithms, e.g., parallel. Handling linguistic phenomena not handled by current approaches. Ambiguity resolution! Statistical methods. A lot can be done.

40 Conclusions. Syntactic structure is one of the intermediate representations of a text for its processing. Helps text understanding, and thus reasoning, question answering, ... Directly helps POS tagging: resolves lexical ambiguity of part of speech, but not WSD-type ambiguities. A big science in itself, with 50 (2000?) years of history.

41 Thank you! Till June 8? 6 pm Semantics