CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information Dan Jurafsky Stanford University Introduction and Course Overview

What this course is about
Automatically extracting meaning and structure from:
  Natural language text
  Speech
  Web pages
  Social networks (and other networks)
  Genome sequences

Commercial World Lots of exciting stuff going on…

Question Answering: IBM’s Watson

Information Extraction and Sentiment Analysis
http://www.bing.com/search?q=canon+powershot&go=&form=QBLH&qs=n
  Sentiment analysis
  Attribute detection
  Relation extraction

Sentiment: Emotional Spell Check
New York Times “10 big ideas of 2010”
http://video.nytimes.com/video/2010/12/15/magazine/1248069422438/emotional-spell-check.html?scp=1&sq=emotional%20spell%20check&st=cse

Blog Analytics
Data-mining of blogs, discussion forums, message boards, user groups, and other forms of user-generated media
  Product marketing information
  Political opinion tracking
  Social network analysis
  Buzz analysis (what’s hot, what topics are people talking about right now)

Livejournal.com: I, me, my on or after Sep 11, 2001
Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science 15, 10: 687-693.
Graph from Pennebaker slides

September 11 LiveJournal.com study: We, us, our
Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science 15, 10: 687-693.
Graph from Pennebaker slides

Machine Translation
  Helping human translators
  Fully automatic
Enter Source Text: 这 不过 是 一 个 时间 的 问题 .
Translation from Stanford’s Phrasal: This is only a matter of time.

Google Translate Fried ripe plantains: http://laylita.com/recetas/2008/02/28/platanos-maduros-fritos/

Information Extraction
Email:
  Subject: curriculum meeting
  Date: January 15, 2012
  To: Dan Jurafsky
  Hi Dan, we’ve now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris
Create new Calendar entry:
  Event: Curriculum mtg
  Date: Jan-16-2012
  Start: 10:00am
  End: 11:30am
  Where: Gates 159
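The email-to-calendar example can be sketched with a few regular expressions. This is only a rough illustration, assuming the email layout shown on the slide; the function name `extract_event` and the patterns are hypothetical, and a real extractor would need far more robust handling of dates, times, and locations.

```python
import re
from datetime import datetime, timedelta

# The email from the slide, with straight quotes for simplicity.
EMAIL = """Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky

Hi Dan, we've now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris"""


def extract_event(email: str) -> dict:
    """Pull a calendar entry out of one email (hypothetical helper)."""
    headers, body = email.split("\n\n", 1)
    subject = re.search(r"^Subject: (.+)$", headers, re.M).group(1)
    sent = datetime.strptime(
        re.search(r"^Date: (.+)$", headers, re.M).group(1), "%B %d, %Y"
    )
    # Resolve the relative expression "tomorrow" against the send date.
    day = sent + timedelta(days=1) if re.search(r"\btomorrow\b", body) else sent
    # A time range written as HH:MM-HH:MM.
    start, end = re.search(r"(\d{1,2}:\d{2})-(\d{1,2}:\d{2})", body).groups()
    # Assumption: a location looks like "in <Capitalized word> <number>".
    where = re.search(r"\bin ([A-Z][a-z]+ \d+)", body).group(1)
    return {
        "event": subject,
        "date": day.strftime("%b-%d-%Y"),
        "start": start,
        "end": end,
        "where": where,
    }


event = extract_event(EMAIL)
print(event)
```

Note that even this tiny case needs world knowledge: "tomorrow" only becomes Jan-16-2012 once it is combined with the Date header.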

Computational Biology: Finding Genes
[Figure: gene structure from 5’ to 3’: start codon ATG, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, stop codon TAG/TGA/TAA, with splice sites marked]
Pictures from Serafim Batzoglou
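As a very rough sketch of the start-codon/stop-codon idea, the code below scans one DNA strand for open reading frames: an ATG followed, in the same reading frame, by TAG, TGA, or TAA. The function name `find_orfs` and the `min_codons` parameter are hypothetical. This deliberately ignores introns and splice sites, so it is not a gene finder, only an illustration of the signals named on the slide.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Return (start, end) index pairs of open reading frames.

    Each ORF starts at an ATG and ends just after the first in-frame
    stop codon. min_codons counts the codons before the stop codon,
    including the ATG itself.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


orfs = find_orfs("CCATGAAATAGGG")
print(orfs)
```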

Computational Biology: Comparing Sequences
Sequences:
  AGGCTATCACCTGACCTCCAGGCCGATGCCC
  TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Alignment:
  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
   || |||||||  ||||x|  || |||  |||||
  TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Sequence comparison is key to:
  Finding genes
  Determining function
  Uncovering the evolutionary processes
Slide stuff from Serafim Batzoglou
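To make the alignment concrete, here is a small sketch that scores the two aligned rows from the slide column by column. The scoring values (+1 match, -1 mismatch, -1 gap) and the function name `score_alignment` are assumptions for illustration; the slide does not specify a scoring scheme.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> dict:
    """Count matches, mismatches, and gap columns in a finished alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    counts["score"] = (counts["match"] * match
                       + counts["mismatch"] * mismatch
                       + counts["gap"] * gap)
    return counts


# The two aligned rows from the slide ("-" marks a gap).
ROW_A = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
ROW_B = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"

result = score_alignment(ROW_A, ROW_B)
print(result)
```

Finding the best such alignment, rather than scoring a given one, is a dynamic programming problem closely related to minimum edit distance.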

Ambiguity Resolving ambiguity is a crucial goal throughout string and language processing

Ambiguity Find at least 5 meanings of this sentence: I made her duck

Ambiguity
Find at least 5 meanings of this sentence: I made her duck
  I cooked waterfowl for her benefit (to eat)
  I cooked waterfowl belonging to her
  I created the (plaster?) waterfowl she owns
  I caused her to quickly lower her head or body
  I waved my magic wand and turned her into undifferentiated waterfowl

Ambiguity is Pervasive
I caused her to quickly lower her head or body
  Syntactic category: “duck” can be a Noun or Verb
I cooked waterfowl belonging to her.
  Syntactic category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
  Word meaning: “make” can mean “create” or “cook”

Ambiguity is Pervasive
Grammar: “make” can be:
  Transitive (verb has a noun direct object): I cooked [waterfowl belonging to her]
  Ditransitive (verb has 2 noun objects): I made [her] (into) [undifferentiated waterfowl]
  Action-transitive (verb has a direct object + verb): I caused [her] [to move her body]

Ambiguity is Pervasive: Phonetics!
  I mate or duck
  I’m eight or duck
  Eye maid; her duck
  Aye mate, her duck
  I maid her duck
  I’m aid her duck
  I mate her duck
  I’m ate her duck
  I’m ate or duck

Why else is natural language understanding difficult?
non-standard English:
  Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
segmentation issues:
  the New York-New Haven Railroad
idioms:
  dark horse
  get cold feet
  lose face
  throw in the towel
neologisms:
  unfriend
  Retweet
  bromance
world knowledge:
  Mary and Sue are sisters.
  Mary and Sue are mothers.
tricky entity names:
  Where is A Bug’s Life playing …
  Let It Be was recorded …
  … a mutation on the for gene …
But that’s what makes it fun!

Making progress on this problem…
The task is difficult! What tools do we need?
  Knowledge about language
  Knowledge about the world
  A way to combine knowledge sources
How we generally do this: probabilistic models built from language data
  P(“maison” → “house”) high
  P(“L’avocat général” → “the general avocado”) low
Luckily, rough text features can often do half the job.
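One simple way to get such probabilities from data is relative-frequency estimation over aligned word pairs. The sketch below assumes the pairs are already aligned; the pair list is a toy example invented for illustration (not real corpus counts), and `estimate_translation_probs` is a hypothetical name.

```python
from collections import Counter, defaultdict

# Toy aligned word pairs, invented for illustration only.
ALIGNED_PAIRS = [
    ("maison", "house"), ("maison", "house"),
    ("maison", "house"), ("maison", "home"),
    ("avocat", "lawyer"), ("avocat", "lawyer"),
    ("avocat", "lawyer"), ("avocat", "avocado"),
]


def estimate_translation_probs(pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    probs = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        probs[source] = {t: c / total for t, c in targets.items()}
    return probs


PROBS = estimate_translation_probs(ALIGNED_PAIRS)
print(PROBS["maison"])   # "house" gets most of the probability mass
print(PROBS["avocat"])   # "avocado" is possible but unlikely
```

With these toy counts, "house" is the likely translation of "maison", while "avocado" remains a low-probability option for "avocat", which mirrors the high/low contrast on the slide.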

Models
  Finite state machines
  Markov models
  Alignment models
    Genome alignment
    Alignment of sentence in L1 to sentence in L2
    Alignment of text to speech
  Vector space model of IR
  Network models

Dynamic Programming
Don’t do the same work over and over. Avoid this by building and making use of solutions to sub-problems that must be invariant across all parts of the space.
  Minimum Edit Distance
  The Viterbi Algorithm
  Baum-Welch/Forward-Backward
  (In parsing: CKY, Earley, charts, etc.)
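Minimum edit distance is a compact illustration of the idea: each table cell reuses the solutions for shorter prefixes instead of recomputing them. The sketch below assumes unit insertion and deletion costs with a configurable substitution cost; the default costs and the function name are assumptions, since the slide does not fix them.

```python
def min_edit_distance(source: str, target: str,
                      ins_cost: int = 1, del_cost: int = 1,
                      sub_cost: int = 1) -> int:
    """Cheapest sequence of edits turning source into target."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of converting source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost      # delete everything
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost      # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))
print(min_edit_distance("intention", "execution", sub_cost=2))
```

The table has (n+1) × (m+1) cells and each is filled once, so the running time is proportional to the product of the two string lengths.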

Machine Learning
Machine learning based classifiers that are trained to make decisions based on features extracted from the context
Simple Classifiers:
  Naïve Bayes
  Decision Trees
Sequence Models:
  Hidden Markov Models
  Maximum Entropy Markov Models
  Conditional Random Fields
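As a minimal sketch of the simplest classifier on this list, here is a multinomial Naïve Bayes text classifier with add-one smoothing, using words as features. The training sentences are toy data invented for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy labeled documents, invented for illustration only.
TRAIN = [
    ("great camera love it".split(), "pos"),
    ("love the sharp pictures".split(), "pos"),
    ("terrible battery hate it".split(), "neg"),
    ("blurry pictures terrible zoom".split(), "neg"),
]


def train_nb(docs):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify_nb(model, words):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)   # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            if w in vocab:   # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train_nb(TRAIN)
print(classify_nb(MODEL, "love this great zoom".split()))
print(classify_nb(MODEL, "terrible blurry zoom".split()))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.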

Course logistics in brief
Instructor: Dan Jurafsky
TAs: Leon Lin, Robin Melnick, Evan Rosen, Alden Timme, Adam Vogel
Time: TuTh 9:30-10:45, Braunlec
Requirements:
  Online Video Lectures with embedded quizzes
  Homeworks: In Java or Python
  Online Review Exercises
  Final Exam
Class sessions:
  Tuesdays: Discussions/Guest Lectures
  Thursdays: Open group working hours

Overview of the course http://cs124.stanford.edu