Language Technology I © 2005 Hans Uszkoreit Language Technology I 2005/06 Hans Uszkoreit Universität des Saarlandes and German Research Center for Artificial.

Slides:



Advertisements
Similar presentations
GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Advertisements

Introduction to Computational Linguistics
Introduction to Computational Linguistics
Chapter 5: Introduction to Information Retrieval
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING NLP-AI IIIT-Hyderabad CIIL, Mysore ICON DECEMBER, 2003.
For Friday No reading Homework –Chapter 23, exercises 1, 13, 14, 19 –Not as bad as it sounds –Do them IN ORDER – do not read ahead here.
Natural Language and Speech Processing Creation of computational models of the understanding and the generation of natural language. Different fields coming.
CSE111: Great Ideas in Computer Science Dr. Carl Alphonce 219 Bell Hall Office hours: M-F 11:00-11:
The Unreasonable Effectiveness of Data Alon Halevy, Peter Norvig, and Fernando Pereira Kristine Monteith May 1, 2009 CS 652.
1/7 INFO60021 Natural Language Processing Harold Somers Professor of Language Engineering.
Research on Intelligent Information Systems Himanshu Gupta Michael Kifer Annie Liu C.R. Ramakrishnan I.V. Ramakrishnan Amanda Stent David Warren Anita.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
تمرين شماره 1 درس NLP سيلابس درس NLP در دانشگاه هاي ديگر ___________________________ راحله مکي استاد درس: دکتر عبدالله زاده پاييز 85.
Chapter 12: Intelligent Systems in Business
1/23 Applications of NLP. 2/23 Applications Text-to-speech, speech-to-text Dialogues sytems / conversation machines NL interfaces to –QA systems –IR systems.
Science and Engineering Practices
Overview of Search Engines
Natural Language Processing Ellen Back, LIS489, Spring 2015.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Language Technology 2005/06 Hans Uszkoreit Universität des Saarlandes
AQUAINT Kickoff Meeting – December 2001 Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for Multimedia Question Answering.
CAREERS IN LINGUISTICS OUTSIDE OF ACADEMIA CAREERS IN INDUSTRY.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Introduction to NLP.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
University of Dublin Trinity College Localisation and Personalisation: Dynamic Retrieval & Adaptation of Multi-lingual Multimedia Content Prof Vincent.
Parser-Driven Games Tool programming © Allan C. Milne Abertay University v
1 Computational Linguistics Ling 200 Spring 2006.
Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved. Decision Support Systems Chapter 10.
Chapter 10 Language and Computer English Linguistics: An Introduction.
Introduction to CL & NLP CMSC April 1, 2003.
Research Topics CSC Parallel Computing & Compilers CSC 3990.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
NLP ? Natural Language is one of fundamental aspects of human behaviors. One of the final aim of human-computer communication. Provide easy interaction.
Major Disciplines in Computer Science Ken Nguyen Department of Information Technology Clayton State University.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
Introduction to Computational Linguistics
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
CSE467/567 Computational Linguistics Carl Alphonce Computer Science & Engineering University at Buffalo.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Translingual Information Management Stephan Busemann Language Technology Lab German Research Center for Artificial Intelligence.
Contact : Bernadette Bouchon-Meunier, Patrick Gallinari, Jean-Gabriel Ganascia LIP6, UPMC, 8 rue du Capitaine Scott, Paris, France
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
NATURAL LANGUAGE PROCESSING Zachary McNellis. Overview  Background  Areas of NLP  How it works?  Future of NLP  References.
1 An Introduction to Computational Linguistics Mohammad Bahrani.
Chapter 7 Speech Recognition Framework  7.1 The main form and application of speech recognition  7.2 The main factors of speech recognition  7.3 The.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Basics of Natural Language Processing Introduction to Computational Linguistics.
Pattern Recognition. What is Pattern Recognition? Pattern recognition is a sub-topic of machine learning. PR is the science that concerns the description.
INTRODUCTION TO APPLIED LINGUISTICS
Information Retrieval and Web Search
Basque language: is IT right on?
Information Retrieval and Web Search
CSE 635 Multimedia Information Retrieval
Natural Language Processing
Lecture 8 Information Retrieval Introduction
Artificial Intelligence 2004 Speech & Natural Language Processing
Information Retrieval
European Masters Program Language & Communication Technologies
Presentation transcript:

Language Technology I © 2005 Hans Uszkoreit Language Technology I 2005/06 Hans Uszkoreit Universität des Saarlandes and German Research Center for Artificial Intelligence (DFKI)

Language Technology I © 2005 Hans Uszkoreit Overview  What is Language Technology  Some Selected Technologies  Some Selected Applications  Information Extraction  Cross-Linguistic Information Retrieval  Management  Language Checking

© 2000 Hans Uszkoreit CL Motivations engineering cognition linguistics

© 2000 Hans Uszkoreit Motivationen modells of grammar languagetechnologyapplications models of human language processing engineering cognition linguistics

Language Technology I © 2005 Hans Uszkoreit What is a Technology Technology: methods and techniques that together enable some application. In real life usage of the word there is a continuum between methods and applications. method/techniquefinite state transduction component technology tokenizer technology named entity recognition high precision text indexing applicationconcept based search engine

Language Technology I © 2005 Hans Uszkoreit Types of Technologies Communication partners: humans and machines (technology), humans and humans humans and infostructure Modes and media for input and output: text, speech, pictures, gestures Synchronicity: synchronous vs. asynchronous Situatedness: sensitivity to context, location, time, plans Type of linguality: monolingual, multilingual, translingual Type of processing: Categorization, summarization, extraction, understanding, translating, responding Level of linguistic description: phonology, morphology, syntax, semantics,pragmatics

Language Technology I © 2005 Hans Uszkoreit speech technologies text technologies knowledge technologies multimedia & multimodality technologies language technologies Language Technologies

Language Technology I © 2005 Hans Uszkoreit Language Technologies L ANGUAGE T ECHNOLOGIES

Language Technology I © 2005 Hans Uszkoreit Language Technologies Text Technologies L ANGUAGE T ECHNOLOGIES

Language Technology I © 2005 Hans Uszkoreit L ANGUAGE T ECHNOLOGIES Language Technologies Speech TechnologiesText Technologies

Language Technology I © 2005 Hans Uszkoreit Language Technologies Speech TechnologiesText Technologies gathering indexing categorization clustering summarization L ANGUAGE T ECHNOLOGIES

Language Technology I © 2005 Hans Uszkoreit Language Technologies Speech TechnologiesText Technologies text understanding text translation information extraction report generation L ANGUAGE T ECHNOLOGIES

Language Technology I © 2005 Hans Uszkoreit Language Technologies Speech TechnologiesText Technologies Voice Recognition Speech Verification Speech Recognition Voice Modelling Speech Synthesis Speaker Identification Language Indentification L ANGUAGE T ECHNOLOGIES

Language Technology I © 2005 Hans Uszkoreit Language Technologies Speech TechnologiesText Technologies Speech Generation Speech Unterstanding Spoken Dialogue Systems Speech Translation Systems L ANGUAGE T ECHNOLOGIES

Language Technology I © 2005 Hans Uszkoreit Language Technologies Speech TechnologiesText Technologies language understanding language generation dialogue modelling machine translation L ANGUAGE T ECHNOLOGIES

Language Technology I © 2005 Hans Uszkoreit Speech recognition Spoken language is recognized and transformed in into text as in dictation systems, into commands as in robot control systems, or into some other internal representation.

Language Technology I © 2005 Hans Uszkoreit Speech Synthesis (also Speech Generation) Utterances in spoken language are produced from text (text-to-speech systems) or from internal representations of words or sentences (concept-to-speech systems)

Language Technology I © 2005 Hans Uszkoreit Text Categorization This technology assigns texts to categories. Texts may belong to more than one category, categories may contain other categories. Filtering is a special case of categorization with just two categories.

Language Technology I © 2005 Hans Uszkoreit Text Summarization The most relevant portions of a text are extracted as a summary. The task depends on the needed lengths of the summaries. Summarization is harder if the summary has to be specific to a certain query.

Language Technology I © 2005 Hans Uszkoreit Text Indexing As a precondition for document retrieval, texts are are stored in an indexed database. Usually a text is indexed for all word forms or – after lemmatization – for all lemmas. Sometimes indexing is combined with categorization and summarization.

Language Technology I © 2005 Hans Uszkoreit Text Retrieval Texts are retrieved from a database that best match a given query or document. The candidate documents are ordered with respect to their expected relevance. Indexing, categorization, summarization and retrieval are often subsumed under the term information retrieval.

Language Technology I © 2005 Hans Uszkoreit Information Extraction Relevant information pieces of information are discovered and marked for extraction. The extracted pieces can be: the topic, named entities such as company, place or person names, simple relations such as prices, desti- nations, functions etc. or complex relations describing accidents, company mergers or football matches.

Language Technology I © 2005 Hans Uszkoreit Data Fusion and Text Data Mining Extracted pieces of information from several sources are combined in one database. Previously undetected relationships may be discovered.

Language Technology I © 2005 Hans Uszkoreit Question Answering Question Answering Natural language queries are used to access information in a database. The database may be a base of structured data or a repository of digital texts in which certain parts have been marked as potential answers.

Language Technology I © 2005 Hans Uszkoreit Report Generation A report in natural language is produced that describes the essential contents or changes of a database. The report can contain accumulated numbers, maxima, minima and the most drastic changes.

Language Technology I © 2005 Hans Uszkoreit Spoken Dialogue Systems The system can carry out a dialogue with a human user in which the user can solicit information or conduct purchases, reservations or other transactions.

Language Technology I © 2005 Hans Uszkoreit Translation Technologies Technologies that translate texts or assist human trans- lators. Automatic translation is called machine translation. Translation memories use large amounts of texts together with existing translations for efficient look-up of possible translations for words, phrases and sentences.

Language Technology I © 2005 Hans Uszkoreit Formal and Computational Methods Generic CS Methods Programming languages, algorithms for generic data types, and software engineering methods for structuring and organizing software development and quality assurance. Specialized Algorithms Dedicated algorithms have been designed for parsing, generation and translation, for morphological and syntactic processing with finite state automata/transducers and many other tasks. Nondiscrete Mathematical Methods Statistical techniques have become especially successful in speech processing, information retrieval, and the automatic acquisition of language models. Other methods in this class are neural networks and powerful techniques for optimization and search.

Language Technology I © 2005 Hans Uszkoreit Linguistic Methods and Resources Logical and Linguistic Formalisms For deep linguistic processing, constraint based grammar formalisms are employed. Complex formalisms have been developed for the representation of semantic content and knowledge. Linguistic Knowledge Linguistic knowledge resources for many languages are utilized: dictionaries, morphological and syntactic grammars, rules for semantic interpretation, pronunciation and intonation. Corpora and Corpus Tools Large collections of application-specific or generic collections of spoken and written language are exploited for the acquisition and testing of statistical or rule-based language models.

Language Technology I © 2005 Hans Uszkoreit Methods from Cognitive Science (Psychology) Models of Cognitive Systems and their Components The interaction of perception, knowledge, reasoning and action including communication is modelled in cognitive psychology. Such models can be consulted or employed for the design of language processing systems. Formalized models of components such as memory, reasoning and auditive perception are also often utilized for models of language processing. Empirical methods fromn Experimental Psychology Since cognitive psychology investigates the intelligent behavior of human organisms, many methods have been developed for the observation and empirical analysis of language production and comprehension. Such methods can be extremely useful for building computer models of human language processing (Examples: "Wizard of Oz Experiments" and measurements of syntactic and semantic processing complexity.

Language Technology I © 2005 Hans Uszkoreit Maturity of Speech Technologies Voice Control Systems Dictation Systems Text-to-Speech Systems Machine Initiative Spoken Dialogue Systems Identification and Verification Systems Spoken Information Access Mixed Initiative Spoken Dialogue Systems Speech Translation Systems Deployed. On the market Mature or close to maturity Research prototypes in R&D

Language Technology I © 2005 Hans Uszkoreit Maturity of Text Technologies Spell Checkers Machine-Assisted Human Translation Translation Memories Indicative Machine Translation Grammar Checkers Information Extraction Human Assisted Machine Translation Report Generation High Quality Text Translation Text Generation Systems Deployed. On the market Mature or close to maturity Research prototypes in R&D

Language Technology I © 2005 Hans Uszkoreit Maturity of IM Technologies Word-Based Information Retrieval Summarization by Simple Condensation Simple Statistical Categorization Simple Automatic Hyperlinking Cross-Lingual Information Retrieval Automatic Hyperlinking With Disambiguation Simple Information Extraction (Unary, Binary Relations) Complex Information Extraction (Ternary+ Relations) Dense Associative Hyperlinking Concept-Based Information Retrieval Text Understanding Deployed. On the market Mature or close to maturity Research prototypes in R&D

Language Technology I © 2005 Hans Uszkoreit M EGA T RENDS ambient computing ubiquitous computing situated computing pervasive computing disappearing computers global infostructure collective memory collective knowledge learning organizations meta-knowledge repositories personalization adaptation learning ubiquitous access