8 December 1997, Industry Day: Applications of SuperTagging, Raman Chandrasekar.


1 8 December 1997, Industry Day: Applications of SuperTagging, Raman Chandrasekar

2 SuperTagging: Applications
  – Information Filtering: SuperTagging used to increase retrieval precision
  – Text Simplification: SuperTagging used to induce rules for text simplification
  – Word Sense Disambiguation
  – Machine Translation
  – Information Extraction
  – Noun Phrase Chunking

3 Glean: Document Filtering
Problem: to access only relevant information
Current approaches:
  – Information retrieval (IR) systems use keywords, Boolean operators, etc.; problems arise from synonymy and polysemy
  – Most Web search engines tend to maximize recall (coverage) and emphasize speed of retrieval, but sacrifice precision ('accuracy' of result)
Our approach: use syntactic information to increase precision.
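The recall/precision trade-off mentioned on this slide can be made concrete with a small calculation. The following is a minimal sketch using the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the function name and the document ids are hypothetical and do not come from the Glean system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: an engine returns 10 documents, 4 of them relevant,
# while 5 relevant documents exist in total.
retrieved = range(1, 11)          # document ids 1..10
relevant = [1, 2, 3, 4, 42]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

A post-processing filter that discards irrelevant retrieved documents shrinks the retrieved set, which can raise precision while leaving recall unchanged as long as no relevant document is discarded.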

4 Glean: The Basics
Underlying ideas:
  – the meaning of a word is decided by how it is used
  – much information is latent in text
  – good to use a post-processing filter model
Use SuperTagging to get syntactic labeling
Part-of-Speech tags are not as useful [RIAO '97]

5 Glean: Architecture

6 Glean: Query by Example
Input:
  – Search engine query expression: +work +IRCS +"natural language processing" +learning
  – Concept/word of interest: work
  – Prototypical usage:
      She has been working on problems related to aspect.
      He works in the area of Information Retrieval.
      She works on statistical mechanics.
      Recently he has been working in the area of quantum computing.
Interpretation: get all documents satisfying the query expression, check whether they contain sentences with a variant of work, and check that these are 'relevant', i.e. structurally similar to the context around work in the prototypical sentences.
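The first two steps of this interpretation (documents satisfying the query, then sentences containing a variant of the word of interest) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, query terms are matched as case-insensitive substrings, variants are limited to a few English suffixes, and the structural relevance check is not shown here.

```python
import re
import shlex


def required_terms(query):
    """Split a '+term +"a phrase"' query into its required terms."""
    return [tok.lstrip("+") for tok in shlex.split(query)]


def satisfies(query, text):
    """Assumption: a term is satisfied by a case-insensitive substring match."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in required_terms(query))


def candidate_sentences(text, stem="work"):
    """Return sentences containing the stem or a simple suffixed variant."""
    variant = re.compile(rf"\b{re.escape(stem)}(s|ed|ing)?\b", re.IGNORECASE)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if variant.search(s)]


query = '+work +IRCS +"natural language processing" +learning'
doc = ("She works on natural language processing at IRCS. "
       "Her group studies machine learning. The network was down.")

if satisfies(query, doc):
    for sentence in candidate_sentences(doc):
        print(sentence)
```

Only the first sentence is selected as a candidate; "network" is skipped because the variant pattern requires a word boundary before the stem.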

7 Glean: Inducing a Pattern
Prototypical usage: She has been working on problems related to aspect.
Chunked, supertagged version: She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx problems/A_NXN related/A_nx1V to/B_vxPnx aspect/A_NXN ./B_sPU
Context around word of interest: She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx …
Generalized pattern: */NP *working/A_nx0V* on/B_vxPnx
This pattern also matches, for example: "We are also working on type systems for data and knowledge bases."
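One way to read the generalized pattern is as a match over three consecutive chunks: a noun-phrase chunk, a chunk containing working/A_nx0V, and a chunk containing on/B_vxPnx. The sketch below illustrates that reading only; it is not Glean's implementation. The helper names are hypothetical, treating a chunk ending in an A_NXN supertag as "*/NP" is an assumption, and the supertags in the second and third examples are hand-assigned for illustration rather than produced by a SuperTagger.

```python
def parse_chunks(tagged):
    """Split 'w/T w/T~w/T ...' into chunks, each a list of (word, supertag)."""
    return [
        [tuple(item.rsplit("/", 1)) for item in chunk.split("~")]
        for chunk in tagged.split()
    ]


def is_np(chunk):
    # Assumption: a chunk whose last word carries A_NXN stands for "*/NP".
    return chunk[-1][1] == "A_NXN"


def has_pair(chunk, word, tag):
    """True if the chunk contains the given word with the given supertag."""
    return any(w.lower() == word and t == tag for w, t in chunk)


def matches(tagged):
    """Match: NP chunk, chunk with working/A_nx0V, chunk with on/B_vxPnx."""
    chunks = parse_chunks(tagged)
    for i in range(len(chunks) - 2):
        first, second, third = chunks[i:i + 3]
        if (is_np(first)
                and has_pair(second, "working", "A_nx0V")
                and has_pair(third, "on", "B_vxPnx")):
            return True
    return False


prototype = "She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx problems/A_NXN"
# Hand-assigned (hypothetical) supertags for the slide's second example.
other = "We/A_NXN are/B_Vvx~also/B_vxARB~working/A_nx0V on/B_vxPnx type/B_Nn systems/A_NXN"
# Hypothetical non-matching usage: "working" is not followed by "on".
unrelated = "The/B_Dnx device/A_NXN is/B_Vvx~working/A_nx0V again/B_vxARB"

print(matches(prototype), matches(other), matches(unrelated))
```

Matching on supertags rather than on words alone is what lets the filter separate "working on X" (a research topic) from other uses of the same word.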

8 The Glean system
Implemented (mainly in Perl) with
  – HTML form interfaces, with a variety of options
  – a SuperTagger server
Results: 97% recall and 88% precision in filtering out irrelevant material in a small test. Large-scale evaluation in progress.
Demo available
Research collaboration between the National Centre for Software Technology, Bombay, and the Institute for Research in Cognitive Science & Center for Advanced Study of India, University of Pennsylvania.

9 SuperTagging: Benefits
  – Right level of granularity
  – Rich tag set, suitable for a variety of applications
  – Accurate: over 92% accuracy
  – Fast: 31-57 words/sec (interpreted Perl)
  – Can be easily retrained, if required
  – Many more applications possible

10 Automatic Text Simplification
Basic idea: to process complex text
  – create better tools, or
  – simplify the text to be processed!
Initial prototype of simplification system (Bombay)
  – Based on Finite State Grammars
  – Rules on strings to map complex sentences to simpler ones
To simplify sentences of the form:
    Talwinder Singh, who masterminded the Air India sabotage, was killed in a shoot-out with police...
we use a rule such as:
    Segment1/NP, who Segment2, Segment3 => Segment1 Segment3. Segment1 Segment2.
to get:
    Talwinder Singh was killed in a shoot-out with police…. Talwinder Singh masterminded the Air India sabotage.
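The string rule above can be approximated with a regular expression. This is a minimal sketch, not the Bombay prototype: the names are hypothetical, and a plain regex cannot verify that Segment1 is a noun phrase, so the code simply assumes that whatever precedes ", who" is one.

```python
import re

# Segment1, who Segment2, Segment3  =>  Segment1 Segment3. Segment1 Segment2.
# Assumption: the segments contain no commas other than the two delimiters.
RELATIVE_CLAUSE = re.compile(
    r"^(?P<seg1>[^,]+), who (?P<seg2>[^,]+), (?P<seg3>.+?)\.*$"
)


def simplify(sentence):
    """Split a 'who' relative clause into two sentences, if the rule applies."""
    match = RELATIVE_CLAUSE.match(sentence.strip())
    if match is None:
        return [sentence]  # rule does not apply; leave the sentence unchanged
    seg1, seg2, seg3 = match.group("seg1", "seg2", "seg3")
    return [f"{seg1} {seg3}.", f"{seg1} {seg2}."]


complex_sentence = ("Talwinder Singh, who masterminded the Air India sabotage, "
                    "was killed in a shoot-out with police.")
for simple in simplify(complex_sentence):
    print(simple)
```

The need to assume that Segment1 is a noun phrase is one limitation of rules over raw strings; syntactic labels make such constituent boundaries explicit.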

11 Automatic Text Simplification
SuperTagging is better [Coling96]
  – Constituent spans easier to identify
  – Simplification rules more expressive
Rules can now be induced automatically [KBCS96, KBS]
  – Data: parallel (aligned) corpus of complex and simple text
  – Induction procedure:
      Data tagged using SuperTagging and LDA
      Aligned labeled trees for complex & simple text compared
      Tree-to-trees transformations identified
      Reduced to a normal form to get simplification rules.

12 Noun-Phrase Chunking
Variety of approaches (Hindle, Marcus & Ramshaw, Voutilainen) for Noun-Phrase Chunking
Depending on application, we may need
  – maximal noun phrases
  – basal noun phrases
  – all derivable noun phrases
SuperTagging provides mechanisms for application-specific noun phrase chunking
Can form part of (or basis for) a variety of tools

