CS598 DNR Fall 2005: Machine Learning in Natural Language
Dan Roth, University of Illinois, Urbana-Champaign


1 CS598 DNR FALL 2005 Machine Learning in Natural Language Dan Roth University of Illinois, Urbana-Champaign

2 Comprehension

(ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.

1. Who is Christopher Robin?
2. When was Winnie the Pooh written?
3. What did Mr. Robin do when Chris was three years old?
4. Where did young Chris live?
5. Why did Chris write two books of his own?

3 What we Know: Ambiguity Resolution

Illinois' bored of education [board]
...Nissan Car and truck plant is... / ...divide life into plant and animal kingdom
(This Art) (can N) (will MD) (rust V) vs. V, N, N
The dog bit the kid. He was taken to a veterinarian / a hospital
Tiger was in Washington for the PGA Tour

4 Comprehension

(The passage and questions from slide 2, revisited.)

5 An Owed to the Spelling Checker

I have a spelling checker, it came with my PC
It plane lee marks four my revue
Miss steaks aye can knot sea.
Eye ran this poem threw it, your sure reel glad two no.
Its vary polished in it's weigh
My checker tolled me sew.
A checker is a bless sing, it freeze yew lodes of thyme.
It helps me right awl stiles two reed
And aides me when aye rime.
Each frays come posed up on my screen
Eye trussed to bee a joule...

6 Intelligent Access to Information

Access free-form text: news articles; reports (maintenance, projects, ...); web data
Mixed-form information: layout-intensive (lists, tables, databases, ...)
As if it were a database: the ability to identify the semantics of the text
Specific tasks: basic recognition, categorization & tagging tasks; semantic analysis; semantic integration; textual entailment, ...
  Done within and across documents
Techniques: machine learning and inference

7 Intelligent Access to Information: Tasks

Named Entity (Semantic Categorization): identifying names of entities and concepts in the text
  [JFK was busy; the parking lots were full] (JFK: LOC)
Semantic Relations: identifying relations between entities in documents
  [Dr. ABC joined Microsoft, Redmond and will lead the SearchIt project.]
Cross-Document Entity Identification and Tracing: robust reading of text; overcoming variability in writing
  [The JFK problem; the Michael Jordan problem]
Temporal Integration of Information: tracking entities over time; information integration; change
  [Dr. ABC joined Google to save the AnswerIt project.]
Question Answering

8 Demo

[Screen shot from a CCG demo]
More work on this problem: scaling up; integration with DBs; temporal integration/inference; ...

9 Understanding Questions

What is the question asking? (different from Googling)
Beyond finding candidate passages: choose the right one.
Q: What is the fastest automobile in the world?
A1: ...will stretch Volkswagen's lead in the world's fastest growing vehicle market. Demand for cars is expected to soar
A2: ...the Jaguar XJ220 is the dearest (415,000 pounds), fastest (217 mph) and most sought after car in the world.
Context: news articles (SJM, LAT, ...)
And, what if the answers require aggregation, ...

10 Not So Easy

11 Question Processing

Requires several levels of analysis:
  Syntactic/functional analysis (+ lexical information)
  Semantic annotation
  Global analysis: determining the type of the question; determining the type of the expected answer; determining properties (constraints) on the answer
  An abstract representation of the question
All of these are viewed as learning problems.

12 Tools

A collection of tools that are essential for any intelligent use of text:
  Robust text analysis tools: tokenization; POS tagging; shallow parsing
  Named entity classifiers: people; locations; organizations; transportation; materials; ...
  Information extraction: functional phrases (e.g., job descriptions; acquisitions)
  Relation/event recognizers: born_in(A,B); capital_of(C,D); killed(A,B)
Role of the Tutorial....
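
As a rough illustration of how such tools compose into a pipeline (tokenize, then tag, then find names), here is a minimal sketch; the tagger and name finder below are hypothetical stubs invented for this example, not the CCG tools.

import re

def tokenize(text):
    # Split off punctuation as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

def pos_tag(tokens):
    # Stub tagger: call capitalized words proper nouns, everything else a noun.
    return [(t, "NNP" if t[0].isupper() else "NN") for t in tokens]

def find_names(tagged):
    # Stub NER: maximal runs of proper nouns are candidate names.
    names, current = [], []
    for tok, tag in tagged:
        if tag == "NNP":
            current.append(tok)
        elif current:
            names.append(" ".join(current)); current = []
    if current:
        names.append(" ".join(current))
    return names

text = "Dr. ABC joined Microsoft, Redmond and will lead the SearchIt project."
print(find_names(pos_tag(tokenize(text))))
# ['Dr', 'ABC', 'Microsoft', 'Redmond', 'SearchIt'] - crude, which is the point:
# real tools need learned models rather than hand-written rules.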

13 Classification: Ambiguity Resolution

Illinois' bored of education [board]
Nissan Car and truck plant; plant and animal kingdom
(This Art) (can N) (will MD) (rust V) vs. V, N, N
The dog bit the kid. He was taken to a veterinarian; a hospital
Tiger was in Washington for the PGA Tour
=> Finance; Banking; World News; Sports
Important or not important; love or hate

14 Classification

The goal is to learn a function f: X -> Y that maps observations in a domain to one of several categories.
Task: decide which of {board, bored} is more likely in the given context:
  X: some representation of "The Illinois' _______ of education met yesterday..."
  Y: {board, bored}
Typical learning protocol:
  Observe a collection of labeled examples (x, y) in X x Y
  Use it to learn a function f: X -> Y that is consistent with the observed examples, and (hopefully) performs well on new, previously unobserved examples.
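
To make the protocol concrete, here is a minimal sketch (not from the slides) of one way to represent the context of the blank as a feature set, i.e., the x in a labeled example (x, y):

def features(sentence, blank_pos):
    # Represent the blank by the words around it; each string is one feature.
    words = sentence.split()
    return {
        f"w-1={words[blank_pos - 1]}",   # word to the left of the blank
        f"w+1={words[blank_pos + 1]}",   # word to the right of the blank
        f"w-2={words[blank_pos - 2]}",   # second word to the left
    }

x = features("The Illinois _ of education met yesterday", 2)
print(x)  # e.g. {'w-1=Illinois', 'w+1=of', 'w-2=The'} (set order varies)

A labeled example pairs such an x with a label y in {board, bored}; the learner's job is to find f that separates the two uses.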

15 Classification is Well Understood

Theoretically: generalization bounds
  How many examples does one need to see in order to guarantee good behavior on previously unobserved examples?
Algorithmically: good learning algorithms for linear representations
  Can deal with very high dimensionality (10^6 features)
  Very efficient in terms of computation and number of examples. On-line.
Key issues remaining:
  Learning protocols: how to minimize interaction (supervision); how to map domain/task information to supervision; semi-supervised learning; active learning; ranking.
  What are the features? No good theoretical understanding here.
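
For instance, a mistake-driven on-line linear learner over sparse features can be sketched in a few lines (a simple perceptron; the toy features and labels below are illustrative, not course data). Because weights live in a dictionary, a very high-dimensional feature space costs only as much as the features that are actually active.

from collections import defaultdict

def perceptron(examples, epochs=10):
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, y in examples:            # y is +1 or -1
            score = sum(w[f] for f in feats)
            if y * score <= 0:               # mistake-driven: update on errors only
                for f in feats:
                    w[f] += y
    return w

examples = [({"w-1=Illinois", "w+1=of"}, +1),   # context suggesting 'board'
            ({"w-1=was", "w+1=by"}, -1)]        # context suggesting 'bored'
w = perceptron(examples)
print(sum(w[f] for f in {"w-1=Illinois", "w+1=of"}))  # positive => 'board'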

16 [Figure: the sentence "Mohammed Atta met with an Iraqi intelligence agent in Prague in April 2001" annotated as a structure over the output data: attributes (node labels), e.g. person name("Mohammed Atta") gender(male); affiliation/nationality name(Iraq); city name(Prague) in country name("Czech Republic"); date month(April) year(2001); and roles (edge labels), e.g. meeting participant/location/time]

Learn this structure (many dependent classifiers; finding the best coherent structure => INFERENCE)
Map structures (determine equivalence or entailment between structures => INFERENCE)
Extract features from this structure => INFERENCE
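
A toy sketch of the "inference with classifiers" idea: local classifiers score each decision independently, and inference picks the best globally coherent assignment under a constraint. The scores and the constraint below are invented for illustration.

from itertools import product

entity_scores = {"Prague": {"city": 0.7, "person": 0.3},
                 "Atta":   {"city": 0.2, "person": 0.8}}

# Constraint: the location argument of the meeting must be labeled a city.
def coherent(assignment):
    return assignment["Prague"] == "city"

# Brute-force inference: best coherent assignment by total classifier score.
best = max((a for a in (dict(zip(entity_scores, labels))
                        for labels in product(["city", "person"], repeat=2))
            if coherent(a)),
           key=lambda a: sum(entity_scores[e][a[e]] for e in a))
print(best)  # {'Prague': 'city', 'Atta': 'person'}

Real structures are far too large for brute force, which is why inference itself (constrained optimization, dynamic programming) becomes a central topic.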

17 [Figure: the same annotated structure for "Mohammed Atta met with an Iraqi intelligence agent in Prague in April 2001": output data with attributes (node labels) and roles (edge labels)]

18 Semantic Parse (Semantic Role Labelling)

[Screen shot from a CCG demo]

19 Textual Entailment

WalMart defended itself in court today against claims that its female employees were kept out of jobs in management because they are women
  => entails (the hypothesis below is subsumed by the text above):
WalMart was sued for sexual discrimination

By "textually entailed" we mean: most people would agree that one sentence implies the other.
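
As a point of contrast, a crude lexical-overlap baseline (a common strawman in entailment work, not the approach advocated here) can be sketched as follows; note how it fails on exactly this example.

def overlap_entails(text, hypothesis, threshold=0.75):
    # Predict "entails" when most hypothesis words also appear in the text.
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    return len(h & t) / len(h) >= threshold

text = ("WalMart defended itself in court today against claims that its "
        "female employees were kept out of jobs in management because "
        "they are women")
hyp = "WalMart was sued for sexual discrimination"
print(overlap_entails(text, hyp))  # False: surface overlap misses the entailment

Recognizing that "defended itself in court against claims of discrimination" implies "was sued for discrimination" requires abstraction over syntactic and semantic variability, which is the point of the task.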

20 Why Textual Entailment?

A fundamental task that can be used as a building block in multiple NLP and information-extraction applications
Has multiple direct applications

21 Examples

A key problem in natural language understanding is to abstract over the inherent syntactic and semantic variability in natural language. Multiple tasks attempt to do just that.

Relation Extraction: Dole's wife, Elizabeth, is a native of Salisbury, N.C. => Elizabeth Dole was born in Salisbury, N.C.
  (You may disagree with the truth of this statement; and you may also infer that the presidential candidate's wife was born in N.C.)
Information Integration (Databases): different database schemas represent the same information under different titles.
Information Retrieval: multiple issues, from variability in the query and target text, to relations.
Summarization; Paraphrasing

Multiple techniques can be applied; all are entailment problems.

22 Question Answering

Given Q: Who acquired Overture?
Determine A: Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year.

The candidate answer entails "Yahoo acquired Overture" (the hypothesis is subsumed by the answer text), and this distinguishes it from other candidates.

23 Direct Application: Semantic Verification

Given: a long contract that you need to ACCEPT
Determine: does it satisfy the 3 conditions that you really care about? ACCEPT?
(and distinguish from other candidates)

24 Role of Learning

"Solving" a natural language problem requires addressing a wide variety of questions.
Learning is at the core of any attempt to make progress on these questions.
Learning has multiple purposes:
  Knowledge acquisition
  Classification / context-sensitive disambiguation
  Integration of various knowledge sources to ensure robust behavior

25 Why Natural Language?

Main channel of communication
Knowledge acquisition
Important from both cognitive and engineering perspectives
A grand application: human-computer interaction
  Language understanding and generation
  Knowledge acquisition
  NL interfaces to complex systems
  Querying NL databases
  Reducing information overload

26 Why Learning in Natural Language?

Challenging from a machine learning perspective: there is no significant aspect of natural language that can be studied without giving learning a principal role.
Language comprehension: a large-scale phenomenon in terms of both knowledge and computation
Requires an integrated approach: need to solve problems in learning, inference, knowledge representation, ...
There is no "cheating": no toy problems.

27 This Course

There are many topics to cover. This is a very active field of research; many ideas are floating around, most of which will not stay.
Rather than covering problems, we will cover some of the main ideas and techniques.
Attempt to abstract away from specific works and understand the main paradigms.
Move towards: beyond classification; knowledge representation and inference.

28 This Course

Ideas and techniques:
  Representation-less approaches: statistics
  Paradigms: generative and discriminative; understanding why things work
  Classification: learning algorithms; generative and discriminative algorithms; the ubiquitousness of linear representations; features and kernels
  Inference: generative models; conditional models; inference with classifiers; constraint satisfaction; structural mappings (translation)

Problems:
  Knowledge acquisition: multi-words?
  Classification: classification of verbs? co-reference?
  Inferring sequential structure
  (Semantic) parsing
  Story comprehension
  Answer selection

29 The Course Plan

Introduction: why learning; learning vs. statistics
Learning paradigms: theory of generalization
Generative models: LSQ (why probabilistic approaches work); power of generative models; modeling: HMM, K-Means, EM; semi-supervised; max entropy; discriminative algorithms
Classification & inference: linear learning algorithms; learning structured representations; inference: putting things together
Representation & inference: sequential and general structures; structure mapping
Features: feature extraction languages; kernels (over structures)
Problems: sequential structures; verb classifications?; parsing; NE; story comprehension; multi-words?
Representation-less approaches: statistics & information theory

30 More Detailed Plan (I)

1. Introduction to natural language learning: Why is it difficult? Statistics vs. learning; when do we need learning? Examples of problems
2. Statistics and information theory: corpus-based work; data and tasks
3. Learning paradigms: PAC learning; Bayesian learning; examples
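
As a reference point for the PAC learning item, recall the classic sample-complexity bound for a finite hypothesis class H in the realizable case (a standard result, not specific to these slides): with probability at least 1 - \delta, any hypothesis in H consistent with m labeled examples has error at most \epsilon, provided

m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)

This is the kind of generalization guarantee behind the "how many examples does one need" question on slide 15.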

31 More Detailed Plan (II)

3. Learning algorithms: examples; general paradigm: feature-based representation; linear functions; on-line algorithms: additive/multiplicative update; decision lists; TBL; memory-based
4. Bayesian methods: Naïve Bayes; HMMs (prediction and model learning); max entropy; LSQ: why do probabilistic algorithms work?
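
As a concrete anchor for item 4, here is a minimal Naïve Bayes text classifier (the standard formulation with add-one smoothing; the toy data is made up for illustration).

import math
from collections import Counter, defaultdict

def train_nb(examples):
    # examples: list of (list_of_words, label)
    class_counts = Counter(y for _, y in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, y in examples:
        word_counts[y].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(words, class_counts, word_counts, vocab):
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y, cy in class_counts.items():
        lp = math.log(cy / total)                            # log prior
        denom = sum(word_counts[y].values()) + len(vocab)    # add-one smoothing
        for w in words:
            lp += math.log((word_counts[y][w] + 1) / denom)  # log likelihood
        if lp > best_lp:
            best, best_lp = y, lp
    return best

data = [("the game was great".split(), "sports"),
        ("stocks fell sharply".split(), "finance")]
model = train_nb(data)
print(predict_nb("the game".split(), *model))  # 'sports'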

32 More Detailed Plan (III)

5. Relaxing supervision: EM; semi-supervised learning: co-learning vs. selection
6. Inference: sequential models: HMMs (prediction and model learning); HMMs (with classifiers); PMMs; constraint satisfaction / ad hoc methods
7. Inference: complex models: inference as constrained optimization; parsing; structural mapping; generative vs. discriminative
8. Projects
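
To preview item 6, here is a minimal sketch of Viterbi decoding for an HMM (the standard algorithm; the toy tagset and probabilities below are invented for illustration).

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s]: probability of the best state sequence ending in s at time t.
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 1e-9) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t-1][p] * trans_p[p][s])
            V[t][s] = V[t-1][prev] * trans_p[prev][s] * emit_p[s].get(obs[t], 1e-9)
            back[t][s] = prev
    # Recover the best path by following back-pointers from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"dog": 0.6, "bit": 0.1}, "V": {"dog": 0.1, "bit": 0.7}}
print(viterbi(["dog", "bit"], states, start_p, trans_p, emit_p))  # ['N', 'V']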

33 Survey: Who Are You?

Undergrads? Ph.D. students? Post-Ph.D.?
Background: natural language; learning; algorithms/theory of computation

34 Expectations

Interaction is important! Please ask questions and make comments.
Read; present; work on projects.
Independence: (do, look for, read) more than surface level.
Rigor: advanced papers will require more math than you know...
Critical thinking: don't simply believe what's written; criticize and offer better alternatives.

35 Next Time

Examine some of the philosophical themes and leading ideas that motivate statistical approaches to linguistics and natural language.
Begin exploring what can be learned by looking at the statistics of texts.