Natural Language and Speech Processing Creation of computational models of the understanding and the generation of natural language. Different fields coming together, looking at speech and language processing from different perspectives. –Computational Linguistics (Linguistics) –Natural Language Processing (Computer Science) –Speech Recognition (Electrical Engineering) –Computational Psycholinguistics (Psychology)
Different Levels of Speech and Language Processing Phonetics and Phonology – The study of the sounds of language Morphology – The study of the meaningful components of words Syntax – The study of the structural relationships between words Semantics – The study of meaning Pragmatics – The study of how language is used to accomplish goals Discourse – The study of linguistic units larger than a single utterance
Ambiguity in Language Ambiguity arises at almost every level, and resolving it is one of the main tasks of NLP. "I made her duck" can mean: I cooked waterfowl for her. I cooked waterfowl belonging to her. I created the (plastic?) duck she owns. I caused her to quickly lower her body. I waved my magic wand and turned her into a waterfowl. Time flies like an arrow vs. Fruit flies like a banana
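One way to see structural ambiguity concretely is to count parse trees. The sketch below runs a CKY-style chart over a small grammar in Chomsky normal form and counts the distinct trees for "I made her duck". The grammar, the lexicon, and the names `BINARY_RULES`, `LEXICON`, and `count_parses` are hypothetical, written only for this illustration.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
# Each parent maps to a list of (left child, right child) pairs.
BINARY_RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"),      # made [her duck]
           ("V", "SC"),      # made [her] [duck]   (causative)
           ("VNP", "NP")],   # [made her] [duck]   (double object)
    "VNP": [("V", "NP")],
    "SC": [("NP", "VP")],
    "NP": [("Det", "N")],
}

# Hypothetical lexicon: every category each word can take.
LEXICON = {
    "I": ["NP"],
    "made": ["V"],
    "her": ["Det", "NP"],
    "duck": ["N", "NP", "VP"],
}


def count_parses(words, start="S"):
    n = len(words)
    # chart[i][j][A] = number of distinct trees rooted in A covering words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON[w]:
            chart[i][i + 1][cat] += 1
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, rhs_list in BINARY_RULES.items():
                    for left, right in rhs_list:
                        ways = chart[i][k][left] * chart[k][j][right]
                        if ways:
                            chart[i][j][parent] += ways
    return chart[0][n][start]


print(count_parses("I made her duck".split()))  # prints 3
```

The three trees correspond to made [her duck], the double-object [made her] [duck], and the causative made [her] [duck]. The cooked/created distinctions inside a single tree are lexical (word-sense) ambiguity, which this grammar does not see.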
Models and Algorithms for NLP Taken mainly from Computer Science, Mathematics and Linguistics –State Machines and Automata: Finite-state automata & transducers, weighted automata, Markov models… –Formal Rule Systems: Regular grammars, CFGs, Unification Grammars… –Logic: First-order Calculus, Predicate Logic… –Probability Theory: Statistical Processing, Machine Learning…
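To make "finite-state automata" concrete, here is a minimal deterministic recognizer driven by a transition table. The language it accepts (b, then two or more a's, then !) is a toy chosen for illustration; the state numbers and the names `TRANSITIONS`, `ACCEPTING`, and `accepts` are hypothetical.

```python
# Deterministic FSA for the toy language b a a+ ! (baa!, baaa!, baaaa!, ...).
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # self-loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def accepts(string, start=0):
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no transition defined: reject immediately
        state = TRANSITIONS[key]
    # Accept only if the whole input was consumed in an accepting state.
    return state in ACCEPTING


for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, accepts(s))  # True, True, False, False
```

A table lookup per input symbol means recognition takes time linear in the input length, which is one reason finite-state methods are attractive for morphology and phonology.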
The Turing Test Alan Turing (1950): An empirical test for Artificial Intelligence. A human interrogator puts questions to a human and to a machine through a teletype, and tries to determine which is the human and which is the machine. Q: Please write me a sonnet on the topic of the Forth Bridge. A: Count me out on this one. I never could write poetry. Q: Add 34957 to 70764. A: (Pause for 30 seconds) 105621. (The correct sum is 105721; the slip is in Turing's original dialogue.)
ELIZA Weizenbaum (1966): A program imitating the responses of a psychotherapist. User: You are like my father in some ways. ELIZA: What resemblance do you see? User: You are not very aggressive but I think you don’t want me to notice that. ELIZA: What makes you think I am not very aggressive? User: You don’t argue with me. ELIZA: Why do you think I don’t argue with you? –Used simple pattern matching, without any deeper knowledge of the world or of the conversation. –http://www-ai.ijs.si/cgi-bin/eliza/eliza_script
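The pattern-matching idea can be sketched with a few ordered regular-expression rules plus a pronoun-swapping step. This is a simplified illustration, not Weizenbaum's actual script (which ranked keywords and used decomposition and reassembly rules); the rules and the names `REFLECTIONS`, `RULES`, `reflect`, and `respond` are hypothetical, written only to reproduce the exchange above.

```python
import re

# Pronoun swaps applied to the captured fragment ("me" -> "you", ...).
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "are": "am",
}

# Hypothetical (pattern, response template) rules, tried in order.
RULES = [
    (r".*\blike\b", "What resemblance do you see?"),
    (r"you are (.*?)(?: but .*)?$", "What makes you think I am {0}?"),
    (r"you don['’]?t (.*)", "Why do you think I don't {0}?"),
    (r"(.*)", "Please go on."),  # catch-all fallback
]


def reflect(fragment):
    # Swap first and second person word by word; no grammar, no meaning.
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())


def respond(utterance):
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        m = re.match(pattern, text, re.IGNORECASE)
        if m:
            # Unused captured groups are simply ignored by str.format.
            return template.format(*(reflect(g) for g in m.groups()))
    return "Please go on."


print(respond("You don't argue with me."))
```

Nothing here models the world or the dialogue history: each reply depends only on which surface pattern fires first, which is exactly the limitation the slide points out.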
Foundational Insights: 1940s and 1950s Automata. –Based on Turing’s computational model. –Led to formal language theory (Chomsky). Probabilistic and Information-Theoretic Models. –Transmission of language and communication treated as a noisy-channel decoding problem. –First machine speech recognizers (1952).
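In the noisy-channel view, an observation o is treated as a corrupted version of a source word w, and the decoder picks the w that maximizes P(w | o). By Bayes' rule this is proportional to P(w) * P(o | w), and P(o) can be dropped because it is the same for every candidate. The sketch below applies that rule to a spelling-correction toy; the candidate words, every probability, and the name `noisy_channel_decode` are made up for illustration only.

```python
# Made-up probabilities for illustration only (not estimated from any corpus).
# PRIOR[w]   = P(w): how likely the source word is a priori.
# CHANNEL[w] = P(o | w): how likely the channel garbles w into o = "acress".
PRIOR = {"across": 0.0004, "acres": 0.0002, "actress": 0.0001}
CHANNEL = {"across": 0.00001, "acres": 0.00002, "actress": 0.0001}


def noisy_channel_decode(candidates):
    # argmax over w of P(w) * P(o | w)
    return max(candidates, key=lambda w: PRIOR[w] * CHANNEL[w])


candidates = ["across", "acres", "actress"]
print(max(candidates, key=PRIOR.get))    # prior alone: across
print(noisy_channel_decode(candidates))  # prior times channel: actress
```

With these numbers the prior alone favors "across", but the channel term tips the decision to "actress"; the point is only that both factors enter the decision.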
Two Camps: 1957-1970 Symbolic vs. Stochastic Paradigm. Symbolic –Formal language theory, generative syntax (Chomsky) –Implementation of the first parsers –Artificial Intelligence Stochastic –Bayesian methods, applied to optical character recognition and authorship identification
Four Paradigms: 1970-1983 Stochastic Paradigm –Speech Recognition Algorithms (Hidden Markov Models) Logic-Based Paradigm –Work that led to Prolog, Functional Grammars and Unification Natural Language Understanding –SHRDLU –Question-answering Systems Discourse Modeling –Automatic Reference Resolution
Empiricism and Finite-State Models: 1983-1993 Return of Empiricism and Finite-State Methods. –Both had lost popularity in the preceding decades. Finite-state models: –Phonology and morphology –Syntax Probabilistic models: –Speech recognition –Part-of-speech tagging –Probabilistic parsing
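Probabilistic part-of-speech tagging is commonly done with a hidden Markov model decoded by the Viterbi algorithm, which keeps, for each word position and tag, only the best-scoring tag sequence ending there. The sketch below tags "I made her duck"; the four-tag set, all start, transition, and emission probabilities, and the name `viterbi` are hypothetical toy values, so the chosen tagging reflects those numbers rather than real data.

```python
TAGS = ["PRON", "VERB", "DET", "NOUN"]

# Hypothetical toy HMM parameters (not estimated from any corpus).
START = {"PRON": 0.6, "VERB": 0.1, "DET": 0.2, "NOUN": 0.1}
TRANS = {  # TRANS[prev][next] = P(next tag | previous tag)
    "PRON": {"PRON": 0.1, "VERB": 0.7, "DET": 0.1, "NOUN": 0.1},
    "VERB": {"PRON": 0.4, "VERB": 0.1, "DET": 0.4, "NOUN": 0.1},
    "DET": {"PRON": 0.05, "VERB": 0.1, "DET": 0.05, "NOUN": 0.8},
    "NOUN": {"PRON": 0.15, "VERB": 0.5, "DET": 0.15, "NOUN": 0.2},
}
EMIT = {  # EMIT[tag][word] = P(word | tag); missing words get probability 0
    "PRON": {"I": 0.4, "her": 0.3},
    "VERB": {"made": 0.05, "duck": 0.01},
    "DET": {"her": 0.1},
    "NOUN": {"duck": 0.02},
}


def viterbi(words):
    # best[i][t] = probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: START[t] * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back = [{}]  # back[i][t] = previous tag on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev, score = max(
                ((p, best[i - 1][p] * TRANS[p][t]) for p in TAGS),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * EMIT[t].get(words[i], 0.0)
            back[i][t] = prev
    # Pick the best final tag, then follow back-pointers to recover the path.
    last = max(TAGS, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path)), best[-1][last]


path, prob = viterbi("I made her duck".split())
print(path)  # ['PRON', 'VERB', 'PRON', 'VERB']
```

With these toy numbers the causative reading (her as a pronoun, duck as a verb) wins narrowly over her/DET duck/NOUN; raising the NOUN emission probability for "duck" from 0.02 to 0.03 is enough to flip it.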
The Field Comes Together: 1994- Spread of probabilistic and data-driven methods to all kinds of problems. Increases in computer speed led to the commercial exploitation of speech and language technologies. The growth of the web led to a new emphasis on information retrieval and extraction. Somewhat less emphasis on theoretical work.
Practical Application Areas Information-accessing Systems –Database queries –Information Retrieval –Information Extraction Task-oriented Systems –Text-editors –Robots Educational Systems –Intelligent Tutoring –Student Modelling Translation Systems –Machine Translation –Computer-aided translation
Practical Application Areas System Modality –Text –Speech –Multi-modal applications System Initiatives –Analysis –Generation
Current Research http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html –Spoken Language Input –Written Language Input –Language Analysis and Understanding –Language Generation –Spoken Output Technologies –Discourse and Dialogue –Document Processing –Multilinguality –Multimodality –Transmission and Storage –Mathematical Methods –Language Resources –Evaluation
Course Topics –Computational Morphology –Regular Grammars, Finite-state Automata and Transducers –Corpus Linguistics –N-Grams, Part-of-speech Tagging –Parsing and Context-free Grammars –Unification Grammars –Lexical Semantics and WordNet –Word Sense Disambiguation and Information Retrieval –Machine Translation