NLP
Natural Language Processing: Class Logistics
Quiz: Where is this quote from? Dave Bowman: “Open the pod bay doors, HAL.” HAL: “I’m sorry, Dave. I’m afraid I can’t do that.”
Quiz Answer: “2001: A Space Odyssey” (1968), a film by Stanley Kubrick based on a screenplay he co-wrote with Arthur C. Clarke. http://www.imdb.com/title/tt0062622/
Watson Example http://www.geekwire.com/2013/ibm-takes-watson-cloud/
What Is Natural Language Processing? Natural Language Processing (NLP) is the study of the computational treatment of natural (human) language. In other words, it is about teaching computers how to understand (and generate) human language.
How Computers Understand Language
Modern Applications: search engines (Google, Yahoo!, Bing, Baidu); question answering (IBM’s Watson); natural language assistants (Apple’s Siri); translation systems (Google Translate); news digests (Yahoo!); automated earthquake reports (LA Times); automated stock market reports (Narrative Science).
Notes: Computers are confused by (human) language, so specific techniques are needed. NLP draws on research in linguistics, theoretical computer science, mathematics, statistics, artificial intelligence, psychology, databases, etc. Goals of this class: understand that language processing is hard (and why); understand the key problems in NLP; learn about the methods used to address these problems; understand the limitations of these methods.
EECS 595/LING 541/SI 561. Instructor: Dragomir Radev (radev@umich.edu). Class times: Mondays 3:10-5:55 in 133 Chrysler. GSI: Catherine Finegan-Dollak (cfdollak). Grader: TBA.
EECS 595/LING 541/SI 561. Course home page: http://web.eecs.umich.edu/~radev/NLP-fall2015/ Textbook: Speech and Language Processing by Jurafsky and Martin, second edition, 2009 (http://www.cs.colorado.edu/~martin/slp.html); third edition draft at http://web.stanford.edu/~jurafsky/slp3/ Additional readings: www.nltk.org
Other Available Books: Natural Language Understanding by James Allen; Foundations of Statistical Natural Language Processing by Chris Manning and Hinrich Schütze, http://nlp.stanford.edu/fsnlp/
Course Dates: classes meet on Mondays: Sep 14, 21, 28; Oct 5, 12, 26 (no class Mon Oct 19); Nov 2, 9, 16, 23, 30; Dec 7, 14. Midterm (unofficial): Nov 2. Last class: Mon Dec 14. Exams: Dec 16-23.
Structure of the Course. Four major parts: (1) linguistic, mathematical, and computational background; (2) computational models of morphology, syntax, semantics, discourse, and pragmatics; (3) core NLP technology: parsing, part-of-speech tagging, text generation, semantic analysis, etc.; (4) applications: text classification, sentiment analysis, text summarization, question answering, machine translation, information extraction, etc. Three major goals: (1) learn the basic principles and theoretical issues underlying natural language processing; (2) learn techniques and tools used to develop practical, robust systems that can understand text and communicate with users in one or more languages; (3) gain insight into some open research problems in natural language processing.
Syllabus (book sections): Introduction (chapter 1); Words (chapters 2-6); Syntax (chapters 12-16); Semantics and Pragmatics (chapters 17-21); Applications (chapters 22-25)
Draft Syllabus: Introduction; Language Modeling; Part-of-Speech Tagging; Hidden Markov Models; Formal Grammars of English; Syntactic Parsing; Statistical Parsing; Features and Unification; Dependency Parsing; The Representation of Meaning; Computational Semantics; Lexical Semantics; Computational Lexical Semantics; Computational Discourse; Information Extraction; Question Answering and Summarization; Dialog and Conversational Agents; Machine Translation; Sentiment and Subjectivity Analysis; Vector Semantics; Deep Learning for NLP
Grading: assignments, 4 programming projects (60%); midterm (15%); final (20%); class participation (5%)
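The grading weights combine into a single weighted average. The following is a minimal Python sketch of that arithmetic. It assumes two things the slide does not state: every component is scored out of 100, and the four programming projects count equally within their 60% (15% each). The helper name course_grade and the sample scores are hypothetical, for illustration only.

```python
# Grading weights from the slide; they sum to 1.0 (100%).
WEIGHTS = {
    "projects": 0.60,       # 4 programming projects, combined
    "midterm": 0.15,
    "final": 0.20,
    "participation": 0.05,
}


def course_grade(project_scores, midterm, final, participation):
    """Return a weighted course grade on a 0-100 scale.

    Hypothetical helper. Assumptions not stated on the slide:
    every component is scored out of 100, and the four projects
    count equally (15% each of the overall grade).
    """
    if len(project_scores) != 4:
        raise ValueError("expected scores for exactly 4 projects")
    components = {
        "projects": sum(project_scores) / len(project_scores),
        "midterm": midterm,
        "final": final,
        "participation": participation,
    }
    # Weighted sum: each component's score times its share of the grade.
    return sum(WEIGHTS[name] * score for name, score in components.items())


# Hypothetical scores, for illustration only.
grade = course_grade([90, 85, 80, 95], midterm=70, final=80, participation=100)
print(f"{grade:.1f}")
```

With these made-up scores, the projects average 87.5, so the grade is 0.60 × 87.5 + 0.15 × 70 + 0.20 × 80 + 0.05 × 100 = 52.5 + 10.5 + 16 + 5 = 84.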
Programming Projects: (1) Language Modeling and Part-of-Speech Tagging; (2) Dependency Parsing; (3) Vector Semantics for Word Sense Disambiguation; (4) Machine Translation
More Sample Projects: Noun phrase parser; Paraphrase identification; Question answering; NL access to databases; Named entity tagging; Rhetorical parsing; Anaphora resolution; Document and sentence alignment; Using bioinformatics methods; Information extraction; Speech processing; Sentence normalization; Text summarization; Sentence compression; Definition extraction; Crossword puzzle generation; Prepositional phrase attachment; Machine translation; Generation; Semi-structured document parsing; Semantic analysis of short queries; User-friendly summarization; Number classification; Time-dependent fact extraction
Courses at Other Places. Brick-and-mortar: Johns Hopkins University (Jason Eisner); Cornell University (Lillian Lee); Stanford University (Chris Manning, Dan Jurafsky, Richard Socher); U. Maryland (Hal Daumé); Berkeley (Dan Klein); U. Texas (Ray Mooney). Coursera: Manning/Jurafsky (2012, survey); Michael Collins (2013, more advanced).
The Association for Computational Linguistics (ACL) www.aclweb.org
The Alphabet Soup: NLP (Natural Language Processing); CL (Computational Linguistics); IR (Information Retrieval); SP (Speech Processing); HLT (Human Language Technology); NLE (Natural Language Engineering); ML (Machine Learning)
Research in NLP. Conferences: ACL/NAACL, EMNLP, SIGIR, AAAI/IJCAI, Coling, HLT, EACL/NAACL, AMTA/MT Summit, ICSLP/Eurospeech. Journals: Computational Linguistics, TACL, Natural Language Engineering, Information Retrieval, Information Processing and Management, ACM Transactions on Information Systems, ACM TALIP, ACM TSLP. University centers: Berkeley, Columbia, Stanford, CMU, JHU, Brown, UMass, MIT, UPenn, USC/ISI, Illinois, Michigan, UW, Maryland, etc.; Toronto, Edinburgh, Cambridge, Sheffield, Saarland, Trento, Prague, QCRI, NUS, and many others. Industrial research sites: Google, MSR, Yahoo!, FB, IBM, SRI, BBN, MITRE, AT&T Labs. The ACL Anthology: http://www.aclweb.org/anthology and the ACL Anthology Network (AAN): http://clair.eecs.umich.edu/aan/index.php