Introduction to Natural Language Processing Hongning Wang


What is NLP?
كلب هو مطاردة صبي في الملعب. (Arabic text, roughly "A dog is chasing a boy on the playground.")
How can a computer make sense of this string?
- Morphology: What are the basic units of meaning (words)? What is the meaning of each word?
- Syntax: How are words related to each other?
- Semantics: What is the “combined meaning” of words?
- Pragmatics: What is the “meta-meaning”? (speech act)
- Discourse: Handling a large chunk of text
- Inference: Making sense of everything
Text Mining

An example of NLP
A dog is chasing a boy on the playground.
- Lexical analysis (part-of-speech tagging): A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun
- Syntactic analysis (parsing): Noun Phrase (A dog), Complex Verb (is chasing), Noun Phrase (a boy), Prep Phrase (on the playground); these combine into a Verb Phrase and then a Sentence
- Semantic analysis: Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).
- Inference: adding the rule Scared(x) if Chasing(_,x,_) yields Scared(b1)
- Pragmatic analysis (speech act): A person saying this may be reminding another person to get the dog back…
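The inference step on this slide can be sketched in a few lines of Python. This is only an illustration, assuming facts are stored as (predicate, arguments) tuples and the single rule Scared(x) if Chasing(_,x,_) is hard-coded; the names `facts` and `infer_scared` are hypothetical and do not come from the slides.

```python
# Facts produced by the semantic analysis step, stored as (predicate, args).
facts = {
    ("Dog", ("d1",)),
    ("Boy", ("b1",)),
    ("Playground", ("p1",)),
    ("Chasing", ("d1", "b1", "p1")),
}


def infer_scared(known_facts):
    """Apply the rule Scared(x) if Chasing(_, x, _) once to the known facts."""
    derived = set()
    for predicate, args in known_facts:
        if predicate == "Chasing" and len(args) == 3:
            # The second argument of Chasing is the entity being chased.
            derived.add(("Scared", (args[1],)))
    return derived


print(infer_scared(facts))  # -> {('Scared', ('b1',))}
```

A real inference engine would handle arbitrary rules and variable binding; here the rule is written out by hand to show what "making sense of everything" adds on top of the extracted facts.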

If we can do this for all the sentences in all languages, then …
- Automatically answer our emails
- Translate languages accurately
- Help us manage, summarize, and aggregate information
- Use speech as a UI (when needed)
- Talk to us / listen to us
BAD NEWS: Unfortunately, we cannot do this right now. General NLP = “Complete AI”

NLP is difficult!!!!!!!
Natural language is designed to make human communication efficient. Therefore,
- We omit a lot of “common sense” knowledge, which we assume the hearer/reader possesses
- We keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve
This makes EVERY step in NLP hard
- Ambiguity is a “killer”!
- Common sense reasoning is a prerequisite

An example of ambiguity
Get the cat with the gloves.
(Either use the gloves to get the cat, or get the cat that has the gloves.)

Examples of challenges
Word-level ambiguity
- “design” can be a noun or a verb (Ambiguous POS)
- “root” has multiple meanings (Ambiguous sense)
Syntactic ambiguity
- “natural language processing” (Modification)
- “A man saw a boy with a telescope.” (PP Attachment)
Anaphora resolution
- “John persuaded Bill to buy a TV for himself.” (himself = John or Bill?)
Presupposition
- “He has quit smoking.” implies that he smoked before.

Despite all the challenges, research in NLP has also made a lot of progress…

A brief history of NLP
Early enthusiasm (1950’s): Machine Translation
- Too ambitious
- Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (Dictionary + Encyclopedia)
Less ambitious applications (late 1960’s & early 1970’s): Limited success, failed to scale up
- Speech recognition
- Dialogue (Eliza)
- Inference and domain knowledge (SHRDLU = “block world”)
Real world evaluation (late 1970’s – now)
- Story understanding (late 1970’s & early 1980’s)
- Large scale evaluation of speech recognition, text retrieval, information extraction (1980 – now)
- Statistical approaches enjoy more success (first in speech recognition & retrieval, later others)
Current trend:
- Boundary between statistical and symbolic approaches is disappearing.
- We need to use all the available knowledge
- Application-driven NLP research (bioinformatics, Web, Question answering…)
Slide annotations: Statistical language models; Robust component techniques; Applications; Knowledge representation; Deep understanding in limited domain; Shallow understanding

The state of the art
A dog is chasing a boy on the playground
- POS Tagging: 97% (A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun)
- Parsing: partial >90% (Noun Phrase, Complex Verb, Noun Phrase, Prep Phrase, Verb Phrase, Sentence)
- Semantics: some aspects
  - Entity/relation extraction
  - Word sense disambiguation
  - Anaphora resolution
- Speech act analysis: ???
- Inference: ???

Machine translation

Dialog systems
- Apple’s Siri system
- Google search

Information extraction
- Google Knowledge Graph
- Wiki Info Box

Information extraction
- YAGO Knowledge Base
- CMU Never-Ending Language Learning

Building a computer that ‘understands’ text: The NLP pipeline

Tokenization/Segmentation
Split text into words and sentences
- Task: what is the most likely segmentation/tokenization?
Input: There was an earthquake near D.C. I’ve even felt it in Philadelphia, New York, etc.
Output:
There + was + an + earthquake + near + D.C.
I + ve + even + felt + it + in + Philadelphia, + New + York, + etc.
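To make the difficulty concrete, here is a minimal rule-based tokenizer sketch in Python. It is an assumption-laden illustration, not the statistical "most likely segmentation" approach the slide refers to: the pattern, the hard-coded abbreviation "etc.", and the name `tokenize` are all hypothetical. Unlike the slide's output, it splits commas into separate tokens and keeps the apostrophe on the clitic.

```python
import re

# Alternatives are tried left to right, so abbreviations win over plain words.
TOKEN_PATTERN = re.compile(
    r"""
    (?:[A-Za-z]\.){2,}        # dotted abbreviations such as D.C. or U.S.
    | etc\.                   # one hard-coded abbreviation (an assumption)
    | [A-Za-z]+               # plain words
    | ['\u2019][A-Za-z]+      # clitics after an apostrophe, as in I've
    | \d+(?:\.\d+)?           # integers and decimals
    | [^\sA-Za-z\d]           # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into word, abbreviation, clitic, number and symbol tokens."""
    return TOKEN_PATTERN.findall(text)


sentence = ("There was an earthquake near D.C. "
            "I\u2019ve even felt it in Philadelphia, New York, etc.")
print(tokenize(sentence))
```

Note that the period in "D.C." both ends the abbreviation and ends the first sentence. A fixed pattern like this cannot tell the two roles apart, which is why sentence segmentation is usually treated as a disambiguation problem.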

Part-of-Speech tagging
Marking up a word in a text (corpus) as corresponding to a particular part of speech
- Task: what is the most likely tag sequence?
A + dog + is + chasing + a + boy + on + the + playground
Det Noun Aux Verb Det Noun Prep Det Noun
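One common way to find the "most likely tag sequence" is Viterbi decoding over a hidden Markov model. The sketch below assumes a toy HMM whose transition and emission probabilities are invented for this one sentence (they are not estimated from any corpus and do not come from the slides); `TRANSITION`, `EMISSION`, `FLOOR`, and `viterbi` are hypothetical names.

```python
import math

TAGS = ["Det", "Noun", "Aux", "Verb", "Prep"]
FLOOR = 1e-12  # probability assigned to anything missing from the toy tables

# Invented probabilities, for illustration only.
TRANSITION = {
    "<s>": {"Det": 0.8, "Noun": 0.2},
    "Det": {"Noun": 1.0},
    "Noun": {"Aux": 0.4, "Prep": 0.3, "Verb": 0.2, "Noun": 0.1},
    "Aux": {"Verb": 0.8, "Noun": 0.2},
    "Verb": {"Det": 0.7, "Prep": 0.3},
    "Prep": {"Det": 1.0},
}
EMISSION = {
    "Det": {"a": 0.5, "the": 0.5},
    "Noun": {"dog": 0.3, "boy": 0.3, "playground": 0.3, "chasing": 0.1},
    "Aux": {"is": 1.0},
    "Verb": {"chasing": 1.0},
    "Prep": {"on": 1.0},
}


def log_prob(table, given, outcome):
    return math.log(table.get(given, {}).get(outcome, FLOOR))


def viterbi(words):
    """Return the highest-scoring tag sequence for words under the toy HMM."""
    if not words:
        return []
    # best[i][tag] = (log score of the best path ending in tag at i, previous tag)
    best = [{t: (log_prob(TRANSITION, "<s>", t) + log_prob(EMISSION, t, words[0]), None)
             for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p][0] + log_prob(TRANSITION, p, tag))
            score = (best[i - 1][prev][0] + log_prob(TRANSITION, prev, tag)
                     + log_prob(EMISSION, tag, words[i]))
            column[tag] = (score, prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("a dog is chasing a boy on the playground".split()))
```

In this toy model "chasing" can be emitted by either Noun or Verb. The preceding Aux tag makes Verb the better choice, which is the kind of context a sequence model exploits.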

Named entity recognition
Determine text mapping to proper names
- Task: what is the most likely mapping?
Its initial Board of Visitors included U.S. Presidents Thomas Jefferson, James Madison, and James Monroe.
Entity types: Organization (Board of Visitors), Location (U.S.), Person (Thomas Jefferson, James Madison, James Monroe)

Recap: perplexity
The inverse of the likelihood of the test set as assigned by the language model, normalized by the number of words:
$PP(w_1, \dots, w_N) = P(w_1, \dots, w_N)^{-1/N}$
where $P$ is given by the N-gram language model.
6501: Text Mining
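The definition above can be computed directly in log space. The sketch below assumes the simplest case, a unigram model with add-one smoothing in which all unseen words share one reserved slot; the function names `train_unigram` and `perplexity` and the toy data are hypothetical, and a higher-order N-gram model would only change how each word's probability is obtained.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return a function giving add-one smoothed unigram probabilities."""
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words

    def prob(word):
        return (counts[word] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, test_tokens):
    """PP(W) = P(w_1 ... w_N) ** (-1 / N), evaluated with log probabilities."""
    n = len(test_tokens)
    log_likelihood = sum(math.log(prob(w)) for w in test_tokens)
    return math.exp(-log_likelihood / n)


model = train_unigram("a dog is chasing a boy on the playground".split())
print(perplexity(model, "a boy is chasing a dog".split()))
```

A useful sanity check: a model that assigns every word probability 1/4 has perplexity exactly 4, so perplexity can be read as the effective number of equally likely choices per word.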

Syntactic parsing
Grammatical analysis of a given sentence, conforming to the rules of a formal grammar
- Task: what is the most likely grammatical structure?
A + dog + is + chasing + a + boy + on + the + playground
Det Noun Aux Verb Det Noun Prep Det Noun
Noun Phrase (A dog), Complex Verb (is chasing), Noun Phrase (a boy), Prep Phrase (on the playground)
Verb Phrase, Sentence
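The reason the task asks for the "most likely" structure is that a grammar often licenses more than one. The sketch below uses the CKY chart algorithm to count parses under a small hand-written grammar in Chomsky normal form. The grammar, the lexicon, the `CV` symbol (standing for Complex Verb), and the function name `count_parses` are assumptions made for illustration.

```python
from collections import defaultdict

# Binary rules (parent, left child, right child); a toy grammar, not from the slides.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "Noun"),
    ("NP", "NP", "PP"),    # PP attaches to the noun phrase
    ("VP", "CV", "NP"),
    ("VP", "VP", "PP"),    # PP attaches to the verb phrase
    ("CV", "Aux", "Verb"),
    ("PP", "Prep", "NP"),
]
LEXICON = {
    "a": "Det", "the": "Det", "dog": "Noun", "boy": "Noun",
    "playground": "Noun", "is": "Aux", "chasing": "Verb", "on": "Prep",
}


def count_parses(words):
    """Count the distinct parse trees rooted in S that cover all the words."""
    n = len(words)
    # chart[(i, j)][symbol] = number of trees for symbol spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        tag = LEXICON.get(word)
        if tag is None:
            return 0  # unknown word: this toy grammar cannot parse it
        chart[(i, i + 1)][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get("S", 0)


print(count_parses("a dog is chasing a boy on the playground".split()))  # -> 2
```

The two parses differ only in where "on the playground" attaches: to "a boy" or to the chasing. A probabilistic grammar would score both and return the more likely one.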

Relation extraction
Identify the relationships among named entities
- Shallow semantic analysis
Its initial Board of Visitors included U.S. Presidents Thomas Jefferson, James Madison, and James Monroe.
1. Thomas Jefferson Is_Member_Of Board of Visitors
2. Thomas Jefferson Is_President_Of U.S.
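As a rough illustration of shallow relation extraction, the two triples above can be produced with a hand-written surface pattern. This sketch assumes a single regular expression tailored to this one sentence; the pattern and the name `extract_relations` are hypothetical, and real systems learn such patterns or use parsers and classifiers rather than hard-coding them.

```python
import re

SENTENCE = ("Its initial Board of Visitors included U.S. Presidents "
            "Thomas Jefferson, James Madison, and James Monroe.")

# Hand-written pattern: "<group> included <country> Presidents <names>."
PATTERN = re.compile(
    r"(?P<group>Board of Visitors) included (?P<country>U\.S\.) "
    r"Presidents (?P<names>.+)\."
)


def extract_relations(sentence):
    """Return (entity, relation, entity) triples matched by the pattern."""
    match = PATTERN.search(sentence)
    if match is None:
        return []
    # Split "A, B, and C" into individual names.
    names = re.split(r",\s*(?:and\s+)?", match.group("names"))
    triples = []
    for name in names:
        triples.append((name, "Is_Member_Of", match.group("group")))
        triples.append((name, "Is_President_Of", match.group("country")))
    return triples


for triple in extract_relations(SENTENCE):
    print(triple)
```

The pattern breaks as soon as the wording changes, which illustrates why robust relation extraction depends on the earlier pipeline stages (tagging, named entity recognition, parsing).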

Logic inference
Convert chunks of text into more formal representations
- Deep semantic analysis: e.g., first-order logic structures
Its initial Board of Visitors included U.S. Presidents Thomas Jefferson, James Madison, and James Monroe.

Towards understanding of text
- Who is Carl Lewis?
- Did Carl Lewis break any records?

Major NLP applications
- Speech recognition: e.g., auto telephone call routing
- Text mining (our focus)
  - Text clustering
  - Text classification
  - Text summarization
  - Topic modeling
  - Question answering
- Language tutoring
  - Spelling/grammar correction
- Machine translation
  - Cross-language retrieval
  - Restricted natural language
- Natural language user interface

NLP & text mining
- Better NLP => Better text mining
- Bad NLP => Bad text mining?
Robust, shallow NLP tends to be more useful than deep, but fragile NLP.
Errors in NLP can hurt text mining performance…

How much NLP is really needed?
[Figure: tasks plotted against their dependency on NLP. Tasks: Classification, Clustering, Summarization, Extraction, Topic modeling, Translation, Dialogue, Question Answering. Other labels: Scalability, Inference, Speech Act.]

So, what NLP techniques are the most useful for text mining?
- Statistical NLP in general.
- The need for high robustness and efficiency implies the dominant use of simple models.

What you should know
- Different levels of NLP
- Challenges in NLP
- NLP pipeline