Some Thoughts on HPC in Natural Language Engineering
Steven Bird
University of Melbourne & University of Pennsylvania


Sponsorship
Natural Language Engineering: Integrating Parallel and Parametric Processing
Victorian Partnership for Advanced Computing Expertise Grant EPPNME

NLE Application Areas
- Information Extraction
- Information Retrieval
- Authoring Tools
- Language Analysis
- Language Understanding
- Knowledge Representation
- Knowledge Discovery
- Spoken Language Input
- Written Language Input
- Natural Language Generation
- Spoken Output
- Multilinguality
- Multimodality
- Discourse and Dialogue
- Spoken dialogue systems
- Cross-language information retrieval
- Word-sense disambiguation
- Multi-document summarisation
- Natural language database interfaces

Some NLE Applications in Detail
- Information extraction from broadcast news
  - Tokenization, alignment, entity detection, coreference resolution, semantic mapping
- Spoken language dialogue systems (SLDS)
  - Speech recognition, parsing, user modelling, discourse management, generation, synthesis
- Language analysis
  - Interlinear text annotation, lexicon development, morphosyntactic grammar development

Meta Activities
- Discovery
  - What tools work with data in format X?
  - What lexical resources exist for language Y?
- Reuse
  - Diverse implementation frameworks
  - Component integration, wrapping, etc.
- Training and evaluation
  - Parametric and parallel processing
  - Comparing systems running on the same data
  - Gold standard vs. theory comparison
  - Analyzing interaction logs
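The two discovery questions amount to queries over resource metadata. The sketch below is a hypothetical illustration, not a description of any existing repository: the Resource record, its fields, and the catalogue entries are all invented, though the two fields it filters on loosely mirror the Dublin Core "format" and "language" elements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    """Hypothetical metadata record for a tool or a lexical resource."""
    name: str
    kind: str                            # "tool" or "lexicon"
    formats: frozenset = frozenset()     # data formats a tool accepts
    languages: frozenset = frozenset()   # languages a resource covers

def tools_for_format(catalogue, fmt):
    """Answer 'What tools work with data in format X?'"""
    return sorted(r.name for r in catalogue if r.kind == "tool" and fmt in r.formats)

def lexicons_for_language(catalogue, lang):
    """Answer 'What lexical resources exist for language Y?'"""
    return sorted(r.name for r in catalogue
                  if r.kind == "lexicon" and lang in r.languages)

# Illustrative catalogue entries (invented names, not real resources).
catalogue = [
    Resource("aligner", "tool", formats=frozenset({"annotation-graph", "wav"})),
    Resource("tagger", "tool", formats=frozenset({"plain-text"})),
    Resource("converter", "tool", formats=frozenset({"plain-text", "annotation-graph"})),
    Resource("swahili-wordlist", "lexicon", languages=frozenset({"sw"})),
]

print(tools_for_format(catalogue, "annotation-graph"))  # ['aligner', 'converter']
print(lexicons_for_language(catalogue, "sw"))           # ['swahili-wordlist']
```

In a shared setting the catalogue would live in a metadata repository rather than in memory; the filtering logic stays the same.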

Learn about NLE
This department hosts a mirror of the ACL Anthology, the ACL's digital archive: about 50k pages spanning 40 years.

SLDS Architecture

SLDS Components

Another SLDS Architecture

Observations
- Common components, different arrangements
  - Multiple components for doing the same task
- Most NLE components convert between information types
  - Parser: from strings to trees
  - ASR: from speech to text
  - Summariser: from text to selected text
- But:
  - Many processes benefit from other information sources (e.g. exploiting intonation in input)
  - Input and output can be aligned
  - Solution: multilayer annotations
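If each component is described only by the information type it consumes and the type it produces, assembling an application becomes a search for a chain of conversions. This is a minimal sketch under that assumption: the component table follows the slide's examples, while the "tokenizer" entry and the breadth-first search are illustrative additions, not part of the original architecture.

```python
from collections import deque

# Hypothetical component table: name -> (input type, output type).
# ASR, parser and summariser follow the slide; the tokenizer is invented
# so that ASR output (text) can reach the parser's input (strings).
COMPONENTS = {
    "asr": ("speech", "text"),
    "tokenizer": ("text", "strings"),
    "parser": ("strings", "trees"),
    "summariser": ("text", "selected-text"),
}

def find_chain(source, target, components=COMPONENTS):
    """Breadth-first search for the shortest sequence of components that
    converts `source` into `target`; returns None if no chain exists."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current == target:
            return chain
        # Sorted iteration keeps the result deterministic.
        for name, (inp, out) in sorted(components.items()):
            if inp == current and out not in seen:
                seen.add(out)
                queue.append((out, chain + [name]))
    return None

print(find_chain("speech", "trees"))  # ['asr', 'tokenizer', 'parser']
```

A purely type-driven chain ignores the caveat on the slide: a parser that could also use intonation needs the speech signal alongside the text, which a linear chain discards.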

Multilayer Annotations

Multilayer Annotations

Annotation Graphs
- Labelled digraphs with timestamped nodes
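A minimal sketch of the idea, assuming each arc carries a layer name and a label and that timestamps are optional: it is not the AGTK API, and the class and method names are hypothetical. Because layers share nodes, and therefore times, material on one layer can be aligned with another by time span.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationGraph:
    """Hypothetical minimal annotation graph: nodes may carry a timestamp
    (seconds), and each arc joins two nodes with a (layer, label) pair."""
    times: dict = field(default_factory=dict)  # node id -> timestamp or None
    arcs: list = field(default_factory=list)   # (start, end, layer, label)

    def add_node(self, node, time=None):
        self.times[node] = time

    def add_arc(self, start, end, layer, label):
        self.arcs.append((start, end, layer, label))

    def labels_in_span(self, layer, t0, t1):
        """Labels on `layer` whose endpoints are timestamped and lie in [t0, t1]."""
        out = []
        for start, end, lay, label in self.arcs:
            ts, te = self.times.get(start), self.times.get(end)
            if lay == layer and ts is not None and te is not None and t0 <= ts and te <= t1:
                out.append(label)
        return out

# A word layer and a phrase layer sharing the same timestamped nodes.
ag = AnnotationGraph()
for node, t in [(0, 0.0), (1, 0.4), (2, 0.9), (3, 1.5)]:
    ag.add_node(node, t)
for start, end, word in [(0, 1, "the"), (1, 2, "cat"), (2, 3, "sat")]:
    ag.add_arc(start, end, "word", word)
ag.add_arc(0, 2, "phrase", "NP")
ag.add_arc(2, 3, "phrase", "VP")

print(ag.labels_in_span("word", 0.0, 0.9))  # words inside the NP: ['the', 'cat']
```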

Annotation Graphs: complex example
AGTK: Annotation Graph Toolkit
- library, applications
- agtk.sourceforge.net

NLE and Grids
- NLE applications
  - typically constructed out of numerous components
  - each component responsible for a specialised task
  - executed against large data sets
- To use grids in NLE:
  - subscribe to a model which allows automated discovery of data and components
  - flexible design of applications, coordination of execution, storage of results
- Ideally:
  - view the grid as a commodity, hidden from application developers
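One way to keep the grid hidden from application developers is to write applications against an abstract executor interface. This sketch assumes Python's concurrent.futures Executor as that interface, with a local ThreadPoolExecutor standing in for grid resources; a grid backend would be a different Executor implementation, which is hypothetical here. The component count_long_tokens is invented for illustration, and the data-by-parameter sweep reflects the parametric processing named under Meta Activities.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import product

def run_application(component, data_items, param_grid, executor: Executor):
    """Run `component` on every (data item, parameter setting) pair.
    The application only sees the Executor interface; whether jobs run
    on local threads or on remote nodes is the backend's concern."""
    jobs = list(product(data_items, param_grid))
    futures = [executor.submit(component, item, params) for item, params in jobs]
    return [(item, params, f.result()) for (item, params), f in zip(jobs, futures)]

def count_long_tokens(text, params):
    """Hypothetical component: count tokens of at least params['min_len'] characters."""
    return sum(1 for tok in text.split() if len(tok) >= params["min_len"])

data = ["the cat sat", "annotation graphs"]
grid = [{"min_len": 3}, {"min_len": 5}]

with ThreadPoolExecutor(max_workers=4) as pool:  # local stand-in for the grid
    results = run_application(count_long_tokens, data, grid, pool)

for item, params, value in results:
    print(item, params, value)
```

Results come back in submission order, so the caller can line up each output with its data item and parameter setting regardless of which job finished first.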

Architectural Components
- Data
  - Language resources for analysis
  - E.g. Switchboard, 2400 annotated telephone conversations (26 CDs)
- Software components
  - minimal individual functional units, e.g. Annotation Server, Alignment, ASR, Data Source Packaging, Format Conversion, Text Annotation, Lexicon Server, Semantic Mapping
  - common interface specification
- Metadata repositories
  - Dublin Core Application Profile for NLE resources
- Application
  - data + components + processing instructions
  - declarative specification in XML
- Grid service
  - computational and storage resources for application execution
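An application is described as data + components + processing instructions in declarative XML. The schema below is invented for illustration (element and attribute names are assumptions, not a published format); the component names are taken from the broadcast-news information extraction example. The sketch uses the standard xml.etree.ElementTree module to turn such a specification into an ordered execution plan.

```python
import xml.etree.ElementTree as ET

# Hypothetical application specification; the element and attribute
# names are illustrative only, not a published schema.
SPEC = """
<application name="broadcast-news-ie">
  <data source="corpus/news" format="annotation-graph"/>
  <component name="tokenizer"/>
  <component name="entity-detector" model="default"/>
  <component name="coreference"/>
  <output store="results/news-ie"/>
</application>
"""

def plan(spec_xml):
    """Turn a declarative spec into an ordered execution plan:
    input data, then (component, parameters) steps, then output location."""
    root = ET.fromstring(spec_xml.strip())
    data = root.find("data")
    steps = [
        (c.get("name"), {k: v for k, v in c.attrib.items() if k != "name"})
        for c in root.findall("component")  # document order = execution order
    ]
    return {
        "application": root.get("name"),
        "input": (data.get("source"), data.get("format")),
        "steps": steps,
        "output": root.find("output").get("store"),
    }

print(plan(SPEC)["steps"])
```

Keeping the specification declarative leaves scheduling decisions, such as where each step runs, to the grid service rather than to the application author.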

Architecture

Conclusion
Natural Language Engineering:
- an interesting test case for grid services
- many mature component technologies
- applications that are both data- and processor-intensive
- applications for building the multilingual information society of the future...