Hans Uszkoreit, German Research Center for Artificial Intelligence and Saarland University at Saarbruecken

Presentation transcript:

The Rôle of Linguistics for the Future of Language Processing
Hans Uszkoreit, German Research Center for Artificial Intelligence and Saarland University at Saarbruecken

LEITLINIEN FÜR DIE HEIDELBERGER COMPUTERLINGUISTIK © 2003 H. Uszkoreit
Outline
  The development of linguistics
  Linguistics and the computer
  The relevance of CL for theoretical linguistics
  The role of linguistics for language technology
  Current trends and outlook

IT in Science
  Data-Gathering and Maintenance
    automatic handling of large volumes of data
  Scientific Computing
    data and model visualization
    data exploitation, simulation, modelling
  Electronic scientific information
    data on research (centers, people, resources, projects, literature)
  Electronic scientific content
    reports, articles, books, e-journals, e-print archives

Development of Linguistics
  first half of 20th century: linguistics becomes concrete
    structuralist linguistics - ontological concepts (entities and structures)
  second half of 20th century: linguistics becomes formal
    generative linguistics - formalisms for syntax and semantics
  first half of 21st century: linguistics becomes empirical
    empirical linguistics - quantitative models - graded grammaticality

The Rôle of Computation
  formalization led to highly complex systems of formal rules, principles or constraints that cannot be tested, validated and modified without sophisticated information processing
  language data of sufficient size cannot be gathered, searched, and maintained anymore without powerful computing

Empirical Linguistics
  discrete findings
  statistical findings
  replicability
  shared interpretations of data
  connection with data and results

Empirical Linguistics
  [Diagram labels: corpus data; experimental psycholinguistic data; introspective data; DB of relevant data; research]

Driving Forces of CL
  Cognition: models of human language processing
  Engineering: language technology applications
  Linguistics: linguistic theory

Role of Computing in Linguistics
  [Diagram labels: theoretical linguistics; applied linguistics; linguistics without the computer; linguistics with the computer]

Until 1980
  [Diagram: Linguistics, Computational Linguistics]

[Diagram: Linguistics, Computational Linguistics]

[Diagram: Linguistics, Computational Linguistics]

LT Methods
  [Diagram dimensions: discrete, non-discrete, hybrid; shallow, deep]
  Example: HMM-based POS Tagger
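As a rough illustration of the shallow, non-discrete kind of method named on this slide, the following is a minimal sketch of Viterbi decoding for an HMM-based POS tagger. The tag set, all probabilities, the vocabulary and the name `viterbi` are invented toy assumptions; none of them come from the talk.

```python
import math

# Hypothetical toy model: tags, probabilities and vocabulary are invented.
TAGS = ["Det", "N", "V"]
START = {"Det": 0.6, "N": 0.3, "V": 0.1}
TRANS = {
    "Det": {"Det": 0.05, "N": 0.9, "V": 0.05},
    "N": {"Det": 0.1, "N": 0.3, "V": 0.6},
    "V": {"Det": 0.6, "N": 0.3, "V": 0.1},
}
EMIT = {
    "Det": {"the": 0.7, "an": 0.3},
    "N": {"dog": 0.4, "penny": 0.3, "walks": 0.3},
    "V": {"walks": 0.5, "gave": 0.5},
}
UNSEEN = 1e-6  # crude stand-in for smoothing of unseen (tag, word) pairs


def emission(tag, word):
    """Emission probability with a small floor for unseen words."""
    return EMIT[tag].get(word, UNSEEN)


def viterbi(words):
    """Return the most probable tag sequence under the toy HMM."""
    if not words:
        return []
    # Each column maps tag -> (log probability of best path, previous tag).
    columns = [{
        t: (math.log(START[t]) + math.log(emission(t, words[0])), None)
        for t in TAGS
    }]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + math.log(emission(t, word)), prev)
        columns.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "dog", "walks"]))
```

The ambiguous word "walks" can be a noun or a verb in this toy lexicon; the transition probabilities decide between the two readings, which is the statistical disambiguation a tagger of this kind relies on.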

LT Methods
  [Diagram dimensions: discrete, non-discrete, hybrid; shallow, deep]
  Example: HPSG-Parser with MRS

LT Methods
  [Diagram dimensions: discrete, non-discrete, hybrid; shallow, deep]
  Example: PCF Parser
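Assuming "PCF" on this slide stands for probabilistic context-free, the idea can be sketched with a small probabilistic CKY parser. The grammar below is a hypothetical toy grammar in Chomsky normal form (the helper categories `VX` and `NB` and all rule probabilities are invented), built only to cover the example sentence used elsewhere in the talk.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right, prob).
BINARY = [
    ("S", "NP", "VP", 1.0),
    ("VP", "VX", "NP", 1.0),   # VX is an invented helper for verb + first object
    ("VX", "V", "NP", 1.0),
    ("NP", "Det", "NB", 0.4),  # NB is an invented helper for adjective + noun
    ("NB", "A", "N", 1.0),
]
# Lexical rules: word -> [(category, prob)].
LEXICAL = {
    "Sue": [("NP", 0.3)],
    "Paul": [("NP", 0.3)],
    "gave": [("V", 1.0)],
    "an": [("Det", 1.0)],
    "old": [("A", 1.0)],
    "penny": [("N", 1.0)],
}


def cky(words):
    """Return (probability, tree) of the best S parse, or None if there is none."""
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {category: (prob, tree)}
    for i, word in enumerate(words):
        for category, prob in LEXICAL.get(word, []):
            chart[i, i + 1][category] = (prob, (category, word))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right, rule_prob in BINARY:
                    if left in chart[i, k] and right in chart[k, j]:
                        left_prob, left_tree = chart[i, k][left]
                        right_prob, right_tree = chart[k, j][right]
                        prob = rule_prob * left_prob * right_prob
                        best_so_far = chart[i, j].get(parent, (0.0, None))[0]
                        if prob > best_so_far:
                            chart[i, j][parent] = (prob, (parent, left_tree, right_tree))
    return chart[0, n].get("S")


result = cky("Sue gave Paul an old penny".split())
print(result)
```

Each chart cell keeps only the most probable analysis per category, so selection among readings falls out of the probabilities rather than requiring a separate disambiguation step.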

LT Methods
  [Diagram dimensions: discrete, non-discrete, hybrid; shallow, deep]
  Example: syntactic LFG parser with ME selection

LT Methods (Trends)
  [Diagram dimensions: discrete, non-discrete, hybrid; shallow, deep]

Simulation and Modelling
  [Figure: phrase-structure tree for "Sue gave Paul an old penny." with node labels S, NP, VP, V, Det, A, N]

[Figure: phrase-structure trees for the German sentence "Sue gab Paul einen alten Pfennig." (node labels S, S/NP, NP, V, Det, A, N) and the English sentence "Sue gave Paul an old penny." (node labels S, NP, VP, V, Det, A, N), shown with the semantic representation:]
$\exists x\,[(old'(penny'))(x) \wedge Past(give'(sue, paul, x))]$

Applications
  Machine Translation
    e.g. Systran, Logos, METAL-Comprendium, IBM PT
  Access to Databases
    e.g. Core Language Engine
  New: Information Extraction and Text Enrichment
    e.g. WHITEBOARD, DEEPTHOUGHT

Once Upon a Time
  Broad industrial research in deep parsing
    Xerox - LFG
    Siemens - LFG
    IBM Germany - HPSG
    Hewlett Packard - GPSG and HPSG
    IBM USA - PLNLP and Slot Grammar
  Very large projects
    EUROTRA
    LILOG
    LS-GRAM

Grammar Frameworks
  Head-Driven Phrase Structure Grammar (HPSG)
  Lexical Functional Grammar (LFG)
  Tree-Adjoining Grammar (TAG)
  Categorial Grammar (CG)
  Dependency Grammar (DG)
  GB-Minimalist Program

Problems with Deep Analysis
  Coverage (Development Time)
  Robustness (Coping with Out-of-Grammar Input)
  Efficiency (Runtime and Space Efficiency)
  Specificity (Selection among Readings)

Real Grammars
  LinGO - English Resource Grammar
    types
    lines of code
    lexemes
    average feature structure > 300 nodes
  German grammar of equal size
  Japanese grammar is still smaller

Future of Linguistics
  Combination of discrete and nondiscrete methods
    All in one integrated system (such as a blackboard or manager architecture)
    Separate systems annotating the same input with different control schemes (whiteboard or pool architecture)
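The second option, separate systems annotating the same input, can be pictured as a shared pool of standoff annotations. The sketch below is only a loose illustration of that idea: the `Whiteboard` and `Annotation` classes, the layer names and the two toy components are hypothetical and are not the design of any system named in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset into the shared text, inclusive
    end: int     # character offset, exclusive
    layer: str   # which component family produced it, e.g. "token"
    label: str


@dataclass
class Whiteboard:
    """Hypothetical shared pool: the text is never modified, only annotated."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, start, end, layer, label):
        self.annotations.append(Annotation(start, end, layer, label))

    def layer(self, name):
        return [a for a in self.annotations if a.layer == name]


def whitespace_tokenizer(board):
    """Toy shallow component: writes one 'token' annotation per word."""
    position = 0
    for token in board.text.split():
        start = board.text.index(token, position)
        board.add(start, start + len(token), "token", token)
        position = start + len(token)


def capitalized_name_spotter(board):
    """Toy component that reads the 'token' layer and writes a 'name' layer."""
    for token in board.layer("token"):
        if token.label[0].isupper():
            board.add(token.start, token.end, "name", "possible name")


board = Whiteboard("Sue gave Paul an old penny.")
for component in (whitespace_tokenizer, capitalized_name_spotter):
    component(board)
print([board.text[a.start:a.end] for a in board.layer("name")])
```

Because every component only adds annotations over character offsets, a deep parser and a shallow tagger could in principle write to the same pool without knowing about each other; how control is distributed among them is left open here.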

Outlook
  Linguistics will develop hybrid discrete and nondiscrete models of language
  More subareas of linguistics will employ computational modelling
  Computational linguistics will play a central role in the empirical branch of linguistic research
  Computational linguistics methods and results do have a future in language technology
  Language technology will have to get more deeply into semantics
  The field provides some grand challenges

Grand Challenges
  hybrid models of language processing and learning, models of language change
  empirical methodology of language science: large multilevel linguistically interpreted data collections
  ambient computing -- ubiquitous natural access to information and assistance
  turning the WWW as well as personal and collective digital information repositories into digital memories and knowledge bases