Oct 2009HLT1 Human Language Technology Overview
Oct 2009HLT2 Acknowledgement Material for some of these slides taken from J. Nivre, University of Gothenburg, Sweden, and D. Jurafsky & J. Martin
Oct 2009HLT3 Human Language Technology HLT sometimes referred to as Natural Language Processing focus on linguistic processing Computational Linguistics focus on understanding language Language Engineering focus on practical tasks and results
Oct 2009HLT4 HLT – Engineering v. Science Engineering NLP is concerned with the design and implementation of effective NL input and output components for computational systems (Robert Dale 2000) Science The use of computers for linguistic research and applications
Oct 2009HLT5 HLT is Interdisciplinary Linguistics Theoretical Applied Computer Science Algorithms Compiling Techniques Artificial Intelligence Understanding, reasoning Intelligent Action
Oct 2009HLT6 HLT is Commercial Lots of exciting stuff going on… Powerset
Oct 2009HLT10 Web Analytics Data-mining of social media: weblogs, discussion forums, message boards, user groups, and other forms of user-generated media Sentiment analysis, social network analysis Product marketing information Opinion tracking over space and time Buzz analysis (what’s hot, what topics are people talking about right now).
Oct 2009HLT11 HLT can help with Understanding how language works by implementing complex theories directly More Natural Communication development of multimodal M/M communication: language, speech, gesture Development of multilingual applications Knowledge Management Language is the fabric of the web
Oct 2009HLT12 Language Enabled Applications What makes an application a language processing application (as opposed to any other piece of software)? An application that requires the use of knowledge about human languages Example: Is Unix wc (word count) an example of a language processing application?
Oct 2009HLT13 Language Enabled Applications Word count? When it counts words: Yes To count words you need to know what a word is. That’s knowledge of language. When it counts lines and bytes: No Lines and bytes are computer artifacts, not linguistic entities
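The point that counting words needs a definition of "word" can be made concrete with a small sketch. It compares two definitions on HAL's line, written with punctuation spaced out. The function names and the regular expression are illustrative assumptions, not the actual implementation of Unix wc; the first function only approximates the whitespace-delimited counting that wc -w performs.

```python
import re

def count_whitespace_words(text: str) -> int:
    """Count maximal runs of non-whitespace characters (roughly wc -w)."""
    return len(text.split())

def count_linguistic_words(text: str) -> int:
    """Count alphabetic tokens, keeping word-internal apostrophes (I'm, can't).

    The pattern is one simplistic assumption about what a word is.
    """
    return len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text))

sentence = "I'm sorry Dave , I'm afraid I can't do that ."

# The two definitions disagree because "," and "." are strings, not words.
whitespace_count = count_whitespace_words(sentence)
linguistic_count = count_linguistic_words(sentence)
```

The whitespace definition counts the two punctuation marks as words; the second definition does not. Neither decides whether "can't" is one word or two, which is itself a linguistic question.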
Oct 2009HLT14 Topics: Applications Small Spelling correction Hyphenation Medium Word-sense disambiguation Named entity recognition Information retrieval Big Question answering Conversational agents Automatic Summarisation Machine translation Stand-alone Enabling applications Funding/Business plans
Oct 2009HLT15 Big Applications These kinds of applications require a tremendous amount of knowledge of language. Consider the following interaction with HAL the computer from 2001: A Space Odyssey
Oct 2009HLT16 HAL from 2001 Dave: Open the pod bay doors, Hal. HAL: I’m sorry Dave, I’m afraid I can’t do that. http://www.youtube.com/watch?v=kkyUMmNl4hk
Oct 2009HLT17 What’s needed? Speech recognition and synthesis Knowledge of the English words involved What they mean How words fit together into groups What the groups mean How the groups relate to each other.
Oct 2009HLT18 What’s needed? Dialog It is polite to respond, even if you’re planning to kill someone. It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
Oct 2009HLT19 Summary of Application Areas Document Processing Classification Summarisation Information Extraction Question Answering Information Retrieval Dialogue Multilinguality Machine Translation Translation tools Multimodality speech intonation image
Oct 2009HLT20 Basic Problems Analysis Conversion of NL input to internal representations Generation Conversion of internal representations to NL output Issues What kind of input/output/representations? Role of learning Supervised v unsupervised What training data is available? System Evaluation
Oct 2009HLT21 Levels of Linguistic Knowledge Phonetics/Phonology: sound structure Morphology: word structure Syntax: sentence structure Semantics: meanings Pragmatics: use of language in context Discourse: paragraphs, texts, dialogues
Oct 2009HLT22 Processing Pipelines Each level of knowledge is associated with an encapsulated set of processes. Interfaces are defined that allow the various levels to communicate. This often leads to a pipeline architecture.
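A pipeline architecture of this kind might be sketched as a list of functions, each consuming the previous stage's output. The stage names and the crude plural rule below are hypothetical and chosen only to keep the example short; real morphological analysis is far more involved.

```python
from typing import Any, Callable, List, Tuple

def tokenize(text: str) -> List[str]:
    """Word level: split raw text on whitespace."""
    return text.split()

def analyse_morphology(tokens: List[str]) -> List[Tuple[str, str]]:
    """Morphology level: split off a plural -s with a deliberately crude rule."""
    result = []
    for token in tokens:
        if len(token) > 3 and token.endswith("s"):
            result.append((token[:-1], "+PL"))
        else:
            result.append((token, ""))
    return result

def run_pipeline(data: Any, stages: List[Callable[[Any], Any]]) -> Any:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        data = stage(data)
    return data

analysis = run_pipeline("open the pod bay doors", [tokenize, analyse_morphology])
```

Each stage only sees the interface (the data structure) produced by the one before it, which is what makes the levels encapsulated. The cost is that an early mistake, such as a wrong token boundary, is passed downstream unchanged.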
Oct 2009HLT23 Ambiguity Computational linguists are obsessed with ambiguity Ambiguity is a fundamental problem of computational linguistics Resolving ambiguity is a crucial goal Ambiguity arises at different levels of analysis
Oct 2009HLT24 Ambiguity – different flavours Lexical I made her duck Syntactic Young men and women Referential She did it Pragmatic Can you pass the salt?
Oct 2009HLT25 Ambiguity Find at least 5 meanings of this sentence: I made her duck I cooked waterfowl for her benefit (to eat) I cooked waterfowl belonging to her I created the (plaster?) duck she owns I caused her to quickly lower her head or body I waved my magic wand and turned her into undifferentiated waterfowl
Oct 2009HLT26 Ambiguity is Pervasive I made her duck I caused her to quickly lower her head or body Lexical category: “duck” can be a N or V I cooked waterfowl belonging to her. Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun I made the (plaster) duck statue she owns Lexical semantics: “make” can mean “create” or “cook”
Oct 2009HLT27 Ambiguity is Pervasive Grammar: Make can be: Transitive: (verb has a noun direct object) I cooked [waterfowl belonging to her] Ditransitive: (verb has 2 noun objects) I made [her] (into) [undifferentiated waterfowl] Action-transitive (verb has a direct object and another verb) I caused [her] [to move her body]
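The structural side of this ambiguity can be illustrated by counting parse trees with a CKY-style chart. The toy grammar and lexicon below are assumptions made up for this sketch (the category names VNP, SC and VB are hypothetical labels), covering only the transitive, ditransitive and causative frames listed above.

```python
from collections import defaultdict

# Toy lexicon: "her" is a pronoun (NP) or a possessive determiner (Det);
# "duck" is a noun (N), a bare noun phrase (NP) or a bare verb (VB).
LEXICON = {
    "I": ["NP"],
    "made": ["V"],
    "her": ["NP", "Det"],
    "duck": ["N", "NP", "VB"],
}

# Binary rules as (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),     # transitive: made [her duck]
    ("VP", "V", "SC"),     # causative: made [her duck] = caused her to duck
    ("VP", "VNP", "NP"),   # ditransitive: [made her] [duck]
    ("VNP", "V", "NP"),
    ("NP", "Det", "N"),
    ("SC", "NP", "VB"),
]

def count_parses(words, start="S"):
    """Return the number of distinct parse trees rooted in `start`."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][start]

parse_count = count_parses("I made her duck".split())
```

Under this grammar the sentence receives three trees, one per verb frame. The remaining readings from the earlier slide (cook versus create, for instance) share a tree and differ only in lexical semantics, so a purely syntactic chart cannot separate them.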
Oct 2009HLT28 Ambiguity is Pervasive Phonetics! I mate or duck I’m eight or duck Eye maid; her duck Aye mate, her duck I maid her duck I’m aid her duck I mate her duck I’m ate her duck I’m ate or duck I mate or duck
Oct 2009HLT29 Dealing with Ambiguity Four possible approaches: 1. Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels. 2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
Oct 2009HLT30 Dealing with Ambiguity 3. Probabilistic approaches based on making the most likely choices 4. Don’t do anything, maybe it won’t matter 1. We’ll leave when the duck is ready to eat. 2. The duck is ready to eat now. Does the “duck” ambiguity matter with respect to whether we can leave?
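Approach 3, making the most likely choice, can be sketched in its simplest form as a most-frequent-tag baseline. The tagged observations below are invented purely for illustration and do not come from any corpus; the tag names and function names are likewise assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical (word, tag) observations; the counts are made up.
TAGGED_SAMPLE = [
    ("duck", "N"), ("duck", "N"), ("duck", "V"),
    ("her", "POSS"), ("her", "POSS"), ("her", "PRON"),
    ("made", "V"),
]

def train(tagged_words):
    """Count how often each word was seen with each tag."""
    model = defaultdict(Counter)
    for word, tag in tagged_words:
        model[word][tag] += 1
    return model

def most_likely_tag(word, model, default="N"):
    """Pick the tag seen most often with this word, ignoring all context."""
    if word not in model:
        return default  # unseen word: fall back to an assumed default tag
    return model[word].most_common(1)[0][0]

model = train(TAGGED_SAMPLE)
duck_tag = most_likely_tag("duck", model)
```

Because the choice ignores context, this baseline tags "duck" the same way in both example sentences above. Practical probabilistic systems condition on neighbouring words or tags as well.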
Oct 2009HLT31 Ways of Studying NLP By Application MT, IE, IR etc. By Approach rational vs. empirical By Linguistic Level morphology, syntax etc. By Algorithm
Oct 2009HLT32 Algorithms State Machines automata and transducers Rule Systems regular and context free grammars Search top-down/bottom-up parsing Probabilistic algorithms
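As a minimal sketch of the first item, a deterministic finite-state automaton can be written as a transition table plus a loop. The language recognised here is the "sheep language" baa+! used as an introductory example in Jurafsky and Martin's textbook; the state numbering and the function name are assumptions of this sketch.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of further a's
    (3, "!"): 4,
}
START_STATE = 0
ACCEPTING_STATES = {4}

def accepts(string: str) -> bool:
    """Return True if the automaton ends in an accepting state."""
    state = START_STATE
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING_STATES

results = {s: accepts(s) for s in ["baa!", "baaaa!", "ba!", "baa"]}
```

A transducer extends the same idea by attaching an output symbol to each transition, which is how finite-state morphology maps surface forms to analyses.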
Oct 2009HLT33 Organisation of Course Module 1: Words Linguistics: Morphological Structure Morphological Processing LAB + Assignment I Module 2: Sentences Linguistics: Syntactic Structure NL Parsing Algorithms LAB + Assignment II Module 3: Texts Statistics Text Classification LAB + Assignment III
Oct 2009HLT34 Course Information Course Website http://staff.um.edu.mt/mros1/hlt Reference Texts D. Jurafsky and J. Martin, Speech and Language Processing, 2nd Edition, Prentice-Hall S. Bird, E. Klein and E. Loper, Natural Language Processing with Python http://www.nltk.org Thank you