1 Introduction to Natural Language Processing (Lecture for CS410 Text Information Systems) Jan 28, 2011 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

2 Lecture Plan
– What is NLP?
– A brief history of NLP
– The current state of the art
– NLP and text management

3 What is NLP?
… เรา เล่น ฟุตบอล … (Thai for "We play football")
How can a computer make sense of this string?
– What are the basic units of meaning (words)? (Morphology)
– What is the meaning of each word? (Semantics)
– How are words related to each other? (Syntax)
– What is the "combined meaning" of the words? (Semantics)
– What is the "meta-meaning" (speech act)? (Pragmatics)
– Handling a large chunk of text (Discourse)
– Making sense of everything (Inference)

4 An Example of NLP
Sentence: "A dog is chasing a boy on the playground"
– Lexical analysis (part-of-speech tagging): Det Noun Aux Verb Det Noun Prep Det Noun
– Syntactic analysis (parsing): noun phrases, a complex verb, and a prep phrase combine into a verb phrase and a sentence
– Semantic analysis: Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).
– Inference: Scared(x) if Chasing(_,x,_). => Scared(b1)
– Pragmatic analysis (speech act): a person saying this may be reminding another person to get the dog back…
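The inference step at the end of this pipeline can be illustrated with a tiny sketch (the tuple encoding of predicates is my own; the facts and the Scared rule come straight from the slide):

```python
# Facts produced by the semantic analysis step on the slide.
facts = {("Dog", "d1"), ("Boy", "b1"), ("Playground", "p1"),
         ("Chasing", "d1", "b1", "p1")}

# Rule from the slide: Scared(x) if Chasing(_, x, _).
# One forward-chaining step: whoever is being chased is scared.
inferred = {("Scared", args[1]) for (pred, *args) in facts
            if pred == "Chasing"}

print(inferred)  # → {('Scared', 'b1')}
```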

5 If we can do this for all the sentences, then …
BAD NEWS: Unfortunately, we can't.
General NLP = "AI-Complete"

6 NLP is Difficult! Natural language is designed to make human communication efficient. As a result,
– we omit a lot of "common sense" knowledge, which we assume the hearer/reader possesses
– we keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve
This makes EVERY step in NLP hard:
– Ambiguity is a "killer"!
– Common sense reasoning is a prerequisite

7 Examples of Challenges
Word-level ambiguity, e.g.:
– "design" can be a noun or a verb (ambiguous POS)
– "root" has multiple meanings (ambiguous sense)
Syntactic ambiguity, e.g.:
– "natural language processing" (modification)
– "A man saw a boy with a telescope." (PP attachment)
Anaphora resolution: "John persuaded Bill to buy a TV for himself." (himself = John or Bill?)
Presupposition: "He has quit smoking." implies that he smoked before.

8 Despite all the challenges, research in NLP has also made a lot of progress…

9 High-level History of NLP
Early enthusiasm (1950s): machine translation
– Too ambitious
– The Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (dictionary + encyclopedia)
Less ambitious applications (late 1960s & early 1970s): limited success, failed to scale up
– Speech recognition
– Dialogue (Eliza)
– Inference and domain knowledge (SHRDLU = "block world")
Real-world evaluation (late 1970s – now)
– Story understanding (late 1970s & early 1980s)
– Large-scale evaluation of speech recognition, text retrieval, information extraction (1980 – now)
– Statistical approaches enjoy more success (first in speech recognition & retrieval, later elsewhere)
Current trend:
– Heavy use of machine learning techniques
– The boundary between statistical and symbolic approaches is disappearing
– We need to use all the available knowledge
– Application-driven NLP research (bioinformatics, Web, question answering…)
(Timeline labels from the slide: knowledge representation, deep understanding in limited domains; statistical language models, robust component techniques, learning-based NLP applications, shallow understanding.)

10 The State of the Art
For "A dog is chasing a boy on the playground":
– POS tagging: 97% accuracy
– Parsing: partial analysis, >90% (?)
– Semantics: some aspects only (entity/relation extraction, word sense disambiguation, anaphora resolution)
– Speech act analysis: ???
– Inference: ???

11 Technique Showcase: POS Tagging
Training data (annotated text): "This/Det sentence/N serves/V1 as/P an/Det example/N of/P annotated/V2 text/N …"
The trained POS tagger then labels new text, e.g. "This is a new sentence" => This/Det is/Aux a/Det new/Adj sentence/N
– Method 1 (independent assignment): tag each word with its most common tag, independently of the other words
– Method 2 (partial dependency): for words w1 = "this", w2 = "is", …, consider all possible tag sequences t1, t2, … (Det Det Det Det Det; … ; Det Aux Det Adj N; … ; V2 V2 V2 V2 V2) and pick the one with the highest probability
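Method 1 can be sketched in a few lines; the toy training data below mirrors the slide's annotated example (tag names and the unknown-word default are illustrative choices, not part of the slide):

```python
from collections import Counter, defaultdict

# Toy annotated training data, in the spirit of the slide's
# "This/Det sentence/N serves/V1 as/P an/Det example/N ..." text.
training_data = [
    ("this", "Det"), ("sentence", "N"), ("serves", "V1"),
    ("as", "P"), ("an", "Det"), ("example", "N"),
    ("this", "Det"), ("is", "Aux"), ("a", "Det"),
    ("new", "Adj"), ("sentence", "N"),
]

# Method 1: independent assignment -- tag each word with its most
# common tag in the training data, ignoring context.
counts = defaultdict(Counter)
for word, tag in training_data:
    counts[word][tag] += 1

def tag_sentence(words):
    # Unknown words default to "N" here (an arbitrary choice).
    return [counts[w].most_common(1)[0][0] if w in counts else "N"
            for w in words]

print(tag_sentence(["this", "is", "a", "new", "sentence"]))
# → ['Det', 'Aux', 'Det', 'Adj', 'N']
```

Method 2 differs in that it scores whole tag sequences, which is what makes context-dependent disambiguation possible.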

12 Technique Showcase: Parsing
Grammar:
– S → NP VP
– NP → Det BNP
– NP → BNP
– NP → NP PP
– BNP → N
– VP → V
– VP → Aux V NP
– VP → VP PP
– PP → P NP
Lexicon:
– V → chasing; Aux → is; N → dog | boy | playground; Det → the | a; P → on
The grammar generates more than one tree for "A dog is chasing a boy on the playground" (the slide shows two, differing in where the PP "on the playground" attaches). The probability of each tree is computed, and the tree with the highest probability is chosen. Parsing can also be treated as a classification/decision problem…
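The "probability of this tree" idea can be sketched as a product over the rules used in the tree. The rule probabilities below are invented for illustration (the slide's grammar gives none); the grammar symbols are the slide's own:

```python
# Hypothetical PCFG probabilities; each left-hand side's rules sum to 1.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "BNP")): 0.4,
    ("NP", ("BNP",)): 0.3,
    ("NP", ("NP", "PP")): 0.3,
    ("BNP", ("N",)): 1.0,
    ("VP", ("V",)): 0.5,
    ("VP", ("Aux", "V", "NP")): 0.3,
    ("VP", ("VP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
}

def tree_prob(tree):
    """P(tree) = product of the probabilities of the rules used in it.
    A tree is (label, [children]); a leaf word is just a string."""
    if isinstance(tree, str):      # terminal word
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob.get((label, rhs), 1.0)   # lexicon rules treated as 1.0
    for c in children:
        p *= tree_prob(c)
    return p

# "A dog is chasing a boy" under the rules above.
tree = ("S", [
    ("NP", [("Det", ["a"]), ("BNP", [("N", ["dog"])])]),
    ("VP", [("Aux", ["is"]), ("V", ["chasing"]),
            ("NP", [("Det", ["a"]), ("BNP", [("N", ["boy"])])])]),
])
print(tree_prob(tree))  # ≈ 0.048 (= 1.0 * 0.4 * 0.3 * 0.4, up to floating point)
```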

13 Semantic Analysis Techniques
Only successful for VERY limited domains or for SOME aspects of semantics, e.g.:
– Entity extraction (e.g., recognizing a person's name): use rules and/or machine learning
– Word sense disambiguation: addressed as a classification problem with supervised learning
– Sentiment tagging
– Anaphora resolution
– …
In general, exploiting machine learning and statistical language models…
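As a sketch of the disambiguation-as-classification idea, here is a minimal Naive Bayes classifier over context words for the two senses of "root" mentioned earlier. All training examples and sense labels are invented for illustration:

```python
import math
from collections import Counter

# Toy labeled contexts for two senses of "root" (invented data).
examples = [
    ("tree", ["the", "root", "of", "the", "oak", "tree"]),
    ("tree", ["plant", "root", "soil", "water"]),
    ("math", ["square", "root", "of", "the", "number"]),
    ("math", ["root", "of", "the", "equation"]),
]

# Train Naive Bayes: score(sense) = log P(sense) + sum log P(word | sense).
sense_counts = Counter(sense for sense, _ in examples)
word_counts = {sense: Counter() for sense in sense_counts}
for sense, words in examples:
    word_counts[sense].update(words)
vocab = {w for _, ws in examples for w in ws}

def classify(context):
    def score(sense):
        total = sum(word_counts[sense].values())
        s = math.log(sense_counts[sense] / len(examples))
        for w in context:
            # add-one smoothing so unseen words don't zero out a sense
            s += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        return s
    return max(sense_counts, key=score)

print(classify(["water", "the", "plant"]))  # → tree
```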

14 What We Can't Do
100% accurate POS tagging
– "He turned off the highway." vs. "He turned off the fan."
General complete parsing
– "A man saw a boy with a telescope."
Deep semantic analysis
– Will we ever be able to precisely define the meaning of "own" in "John owns a restaurant."?
Robust & general NLP tends to be "shallow", while "deep" understanding doesn't scale up.

15 Major NLP Applications
Speech recognition: e.g., automatic telephone call routing
Text management (our focus):
– Text retrieval/filtering
– Text classification
– Text summarization
– Text mining
– Query answering
Language tutoring:
– Spelling/grammar correction
Machine translation:
– Cross-language retrieval
– Restricted natural language
Natural language user interfaces

16 NLP & Text Management
Better NLP => better text management
Bad NLP => bad text management?
Robust, shallow NLP tends to be more useful than deep but fragile NLP. Errors in NLP can hurt text management performance…

17 How Much NLP is Really Needed?
Tasks ordered from the least to the most dependency on NLP ("easier" tasks with more "workarounds" first):
– Classification / retrieval
– Summarization / extraction / mining
– Translation / dialogue
– Question answering

18 Workaround Example I: ELIZA
ELIZA plays the role of a therapist:
Person: Men are all alike.
ELIZA: In what way?
Person: They're always bugging us about something or other.
ELIZA: Can you think of a specific example?
Person: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here?
Person: He says I'm depressed much of the time.
ELIZA: I am sorry to hear that you are depressed.
Person: It's true. I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
Person: I need some help, that much seems certain.
ELIZA: What would it mean to you if you got some help?
Person: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family
How does it work? Heuristic rules based on keyword matching! E.g.:
– "I remember X" => "Do you often think of X?"
– "always" => "Can you think of a specific example?"
It keeps the dialogue going, but NEVER makes any claim!
A breakdown example:
Person: Failure is the mother of success.
ELIZA: Tell me more about your family
No NLP, but useful. Perhaps we should call this NLP? Statistical NLP often has a similar flavor, with "SOFT" rules LEARNED from data.
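A minimal ELIZA-style responder along the lines of the slide's rules might look as follows; the regex patterns are illustrative, not Weizenbaum's original script:

```python
import re

# Heuristic keyword-matching rules in the spirit of the slide's examples.
rules = [
    (r"i remember (.*)", "Do you often think of {0}?"),
    (r".*\balways\b.*", "Can you think of a specific example?"),
    (r".*\b(mother|father|family)\b.*", "Tell me more about your family"),
]

def respond(utterance):
    text = utterance.lower().strip(" .!?")
    for pattern, template in rules:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # fallback keeps the dialogue going

print(respond("They're always bugging us about something or other."))
# → Can you think of a specific example?
print(respond("Failure is the mother of success."))  # the breakdown case
# → Tell me more about your family
```

Note how the breakdown case falls out naturally: "mother" triggers the family rule regardless of what the sentence actually means, because no analysis of meaning is done at all.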

19 Workaround Example II: Statistical Translation
Learn how to translate Chinese to English from many example translations. Intuitions:
– If we have seen all possible translations, then we simply look them up
– If we have seen a similar translation, then we can adapt it
– If we haven't seen any similar example, we try to generalize from what we've seen
Noisy-channel view: an English speaker generates English words E with probability P(E); the "translator" acts as a noisy channel that turns them into Chinese words C with probability P(C|E); translating means recovering P(E|C) ∝ P(E)P(C|E).
All these intuitions are captured through a probabilistic model.
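The noisy-channel decision rule — choose the E maximizing P(E)·P(C|E) — can be sketched with toy probability tables. Every number and example sentence below is invented for illustration:

```python
# Toy noisy-channel decoder.
p_e = {                      # language model P(E)
    "we play football": 0.6,
    "we played football": 0.4,
}
p_c_given_e = {              # translation model P(C|E)
    ("我们 踢 足球", "we play football"): 0.5,
    ("我们 踢 足球", "we played football"): 0.3,
}

def translate(chinese):
    # argmax over English candidates of P(E) * P(C|E)
    return max(p_e, key=lambda e: p_e[e] * p_c_given_e.get((chinese, e), 0.0))

print(translate("我们 踢 足球"))  # → we play football
```

In a real system both tables are estimated from data (the language model from English text, the translation model from aligned sentence pairs) rather than written down by hand.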

20 So, what NLP techniques are most useful for text management? Statistical NLP in general, and statistical language models in particular The need for high robustness and efficiency implies the dominant use of simple models (i.e., unigram models)
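A unigram model in this sense scores text as a product of independent per-word probabilities. A minimal maximum-likelihood sketch over a toy corpus (no smoothing; the corpus is invented):

```python
import math
from collections import Counter

# Estimate a unigram model from a toy corpus.
corpus = "a dog is chasing a boy on the playground a boy is running".split()
counts = Counter(corpus)
total = len(corpus)          # 13 tokens

def unigram_logprob(text):
    # P(w1..wn) = product of P(wi); the independence assumption
    # is exactly what makes the model "unigram".
    return sum(math.log(counts[w] / total) for w in text.split())

print(unigram_logprob("a dog"))  # log(3/13) + log(1/13) ≈ -4.03
```

The simplicity is the point: such a model is robust and cheap enough to apply to arbitrary text, which is why it dominates in retrieval.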

21 What You Should Know
NLP is the basis for text management:
– Better NLP enables better text management
– Better NLP is necessary for sophisticated tasks
But:
– Bad NLP doesn't mean bad text management
– There are often "workarounds" for a task
– Inaccurate NLP can even hurt the performance of a task
The most effective NLP techniques are often statistical, with the help of linguistic knowledge. The challenge is to bridge the gap between NLP and applications.