8 December 1997, Industry Day. Applications of SuperTagging. Raman Chandrasekar.



SuperTagging: Applications
Information Filtering: SuperTagging used to increase retrieval precision
Text Simplification: SuperTagging used to induce rules for text simplification
Word Sense Disambiguation
Machine Translation
Information Extraction
Noun Phrase Chunking

Glean: Document Filtering
Problem: to access only relevant information
Current approaches:
  – Information retrieval (IR) systems use keywords, Boolean operators etc. Problems due to synonymy and polysemy
  – Most Web search engines tend to maximize recall (coverage) and emphasize speed of retrieval, but sacrifice precision ('accuracy' of result)
Our approach: Use syntactic information to increase precision.
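The recall/precision trade-off mentioned on this slide can be made concrete with a small sketch. The document ids and counts below are invented purely for illustration; they are not taken from the Glean evaluation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: an engine returns documents 1..10, of which 4 are
# relevant; one further relevant document (id 42) is never returned.
retrieved = range(1, 11)
relevant = [1, 2, 3, 4, 42]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

A post-processing filter of the kind proposed here aims to raise precision by discarding irrelevant hits, ideally without dropping relevant ones (which would lower recall).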

Glean: The Basics
Underlying ideas:
  – meaning of a word decided by how it is used
  – much information latent in text
  – good to use post-processing filter model
Use SuperTagging to get syntactic labeling
Part-of-Speech tags are not as useful [RIAO '97]

Glean: Architecture

Glean: Query by Example
Input:
  – Search Engine Query Expression: +work +IRCS +"natural language processing" +learning
  – Concept/word of Interest: work
  – Prototypical usage:
      She has been working on problems related to aspect.
      He works in the area of Information Retrieval.
      She works on statistical mechanics.
      Recently he has been working in the area of quantum computing.
Interpretation: get all documents satisfying the query expression, check if they contain sentences with a variant of work, check that these are 'relevant', i.e. structurally similar to the context around work in the prototypical sentences.

Glean: Inducing a Pattern
Prototypical usage: She has been working on problems related to aspect.
Chunked, supertagged version: She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx problems/A_NXN related/A_nx1V to/B_vxPnx aspect/A_NXN ./B_sPU
Context around word of interest: She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx …
Generalized pattern: */NP *working/A_nx0V* on/B_vxPnx
This pattern also matches, for example: "We are also working on type systems for data and knowledge bases."
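A minimal sketch of how such a generalized pattern might be checked against a chunked, supertagged sentence. This is not the Glean implementation (which the slides describe as mainly PERL). The function names are hypothetical, and the sketch assumes that an A_NXN supertag in the preceding chunk can stand in for the "*/NP" slot and that a "variant of work" means any word starting with the stem. The supertags in the second test sentence are illustrative guesses, not SuperTagger output.

```python
def parse_chunks(text):
    """Split 'w/TAG w/TAG~w/TAG ...' into chunks, each a list of (word, supertag)."""
    chunks = []
    for chunk in text.split():
        pairs = []
        for token in chunk.split("~"):  # '~' joins the tokens inside one chunk
            word, _, tag = token.rpartition("/")
            pairs.append((word, tag))
        chunks.append(pairs)
    return chunks


def matches_pattern(chunks, stem="work", verb_tag="A_nx0V",
                    prep="on", prep_tag="B_vxPnx", np_tag="A_NXN"):
    """True if some chunk holds stem*/verb_tag, preceded by an NP chunk
    and followed by a chunk holding prep/prep_tag."""
    for i in range(1, len(chunks) - 1):
        has_verb = any(w.lower().startswith(stem) and t == verb_tag
                       for w, t in chunks[i])
        np_before = any(t == np_tag for _, t in chunks[i - 1])
        prep_after = any(w.lower() == prep and t == prep_tag
                         for w, t in chunks[i + 1])
        if has_verb and np_before and prep_after:
            return True
    return False


# Sentence from the slide (with the final period as its own token).
prototype = parse_chunks(
    "She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx "
    "problems/A_NXN related/A_nx1V to/B_vxPnx aspect/A_NXN ./B_sPU"
)
# Hypothetical sentence where 'work' is a noun; the tags are guesses.
noun_use = parse_chunks(
    "The/B_Dnx work/A_NXN was/B_Vvx~published/A_nx1V on/B_vxPnx Monday/A_NXN"
)
print(matches_pattern(prototype), matches_pattern(noun_use))
```

Matching on supertags rather than on the bare word is what lets the filter reject the noun use of "work" while accepting the verbal "working on" construction.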

The Glean system
Implemented (mainly in PERL) with
  – HTML Form-interfaces, with a variety of options
  – a SuperTagger server
Results: 97% recall and 88% precision in filtering out irrelevant material in a small test. Large scale evaluation in progress.
Demo available
Research collaboration between the National Centre for Software Technology, Bombay, and the Institute for Research in Cognitive Science & Center for Advanced Study of India, University of Pennsylvania.

SuperTagging: Benefits
Right level of granularity
Rich tag set, suitable for a variety of applications
Accurate: over 92% accuracy
Fast: words/sec (interpreted PERL)
Can be easily retrained, if required
Many more applications possible

Automatic Text Simplification
Basic Idea: To process complex text
  – create better tools or
  – simplify the text to be processed!
Initial Prototype of Simplification System (Bombay)
  – Based on Finite State Grammars
  – Rules on strings to map complex sentences to simpler ones
To simplify sentences of the form:
  Talwinder Singh, who masterminded the Air India sabotage, was killed in a shoot-out with police...
we use a rule such as:
  Segment1/NP, who Segment2, Segment3 => Segment1 Segment3. Segment1 Segment2.
to get:
  Talwinder Singh was killed in a shoot-out with police…. Talwinder Singh masterminded the Air India sabotage.
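The string rule above can be approximated with a regular expression. This is only a sketch under simplifying assumptions: the names `RULE` and `simplify` are hypothetical, and the regex does not verify that Segment1 is actually a noun phrase, which the finite-state prototype would have established through chunking.

```python
import re

# Segment1/NP, who Segment2, Segment3
RULE = re.compile(r"(?P<np>[^,]+), who (?P<clause>[^,]+), (?P<rest>.+)")


def simplify(sentence):
    """Apply 'Segment1, who Segment2, Segment3 => Segment1 Segment3. Segment1 Segment2.'

    Returns a list of simpler sentences, or the input unchanged if the rule
    does not apply.
    """
    body = sentence.rstrip(". ")  # drop trailing periods before matching
    m = RULE.fullmatch(body)
    if m is None:
        return [sentence]
    np, clause, rest = m.group("np", "clause", "rest")
    return [f"{np} {rest}.", f"{np} {clause}."]


result = simplify(
    "Talwinder Singh, who masterminded the Air India sabotage, "
    "was killed in a shoot-out with police."
)
for s in result:
    print(s)
```

A purely string-based rule like this breaks easily (for example when Segment2 itself contains a comma), which is one reason identifying constituent spans from supertags is attractive.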

Automatic Text Simplification
SuperTagging is better [Coling96]
  – Constituent spans easier to identify
  – Simplification rules more expressive
Rules can now be induced automatically [KBCS96, KBS]
  – Data: Parallel (aligned) corpus of complex and simple text
  – Induction Procedure:
      Data tagged using SuperTagging and LDA
      Labeled trees for the aligned complex and simple sentences compared
      Tree-to-trees transformations identified
      Reduced to a normal form to get simplification rules.

Noun-Phrase Chunking
Variety of approaches (Hindle, Marcus & Ramshaw, Voutilainen) for Noun-Phrase Chunking
Depending on application, we may need
  – maximal noun phrases
  – basal noun phrases
  – all derivable noun phrases
SuperTagging provides mechanisms for application-specific noun phrase chunking
Can form part of (or basis for) a variety of tools