Natural Language Processing at NYU: the Proteus Project

Natural Language Processing at NYU: the Proteus Project Ralph Grishman September 2009

Proteus Project
Faculty: Ralph Grishman, Satoshi Sekine, Adam Meyers
http://nlp.cs.nyu.edu/

‘Just the Facts’
A vast amount of information is now available on-line in text form, but getting ‘the facts’ can be very hard and slow:
  Where has Secretary Clinton been over the last month?
  Which places on the East Coast have had swine flu outbreaks this month?
To move from search to question answering
  we need more than a bag of words
  we need to figure out who-did-what-to-whom
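The limitation of a bag of words can be seen in a small sketch. The two sentences below are made up for illustration and are not from the slides; they use exactly the same words but describe opposite events, and a bag-of-words representation cannot tell them apart.

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Lowercase, drop periods, and count tokens; word order is discarded."""
    return Counter(sentence.lower().replace(".", "").split())

# Hypothetical example sentences (not from the slides).
a = "Harry replaced Fred as president."
b = "Fred replaced Harry as president."

# Same words, same counts: the representation loses who did what to whom.
same_bag = bag_of_words(a) == bag_of_words(b)
print(same_bag)
```

Recovering who replaced whom requires the subject and object of the verb, which is the kind of structure the patterns in the later slides capture.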

Understanding natural language isn’t easy
The rebels strafed the car …
  … with automatic weapons fire.
  … with the Minister and his deputy.
They …
  … died instantly.
  … were promptly arrested.
Understanding language requires a lot of knowledge.

How to get all this knowledge?
By hand … too expensive
Use weakly supervised learning
  Give a few examples (‘seeds’)
  Use a very large text corpus to learn similar examples

Knowledge Discovery: An Example
Goal: want to keep track of all the hirings and departures of executives
  need to find all the ways such events are described
Method:
  identify a few seed patterns
  retrieve documents containing patterns
  find subject-verb-object pattern with
    high frequency in retrieved documents
    relatively high frequency in retrieved docs vs. other docs
  add pattern to seed and repeat

#1: pick seed pattern
Seed: < person retires >

#2: retrieve relevant documents
Seed: < person retires >
Relevant documents:
  Fred retired. ... Harry was named president.
  Maki retired. ... Yuki was named president.
Other documents: the remainder of the collection

#3: pick new pattern
Seed: < person retires >
< person was named president > appears in several relevant documents:
  Fred retired. ... Harry was named president.
  Maki retired. ... Yuki was named president.

#4: add new pattern to pattern set
Pattern set: < person retires >, < person was named president >
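Steps #1 to #4 can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the Proteus implementation: documents are assumed to be already reduced to lists of subject-verb-object patterns (the parsing step is omitted), the function name `bootstrap`, the `min_relevant` threshold, and the scoring formula are hypothetical choices, and the two "other" documents are invented for the example.

```python
from collections import Counter

def bootstrap(docs, seeds, iterations=1, min_relevant=2):
    """Grow a pattern set from seeds, adding one pattern per iteration."""
    patterns = set(seeds)
    for _ in range(iterations):
        # Step 2: a document is relevant if it contains any known pattern.
        relevant = [d for d in docs if patterns & set(d)]
        other = [d for d in docs if not patterns & set(d)]
        # Count, per candidate pattern, how many documents of each kind contain it.
        in_rel = Counter(p for d in relevant for p in set(d) if p not in patterns)
        in_other = Counter(p for d in other for p in set(d))

        def score(p):
            # Assumed scoring: share of p's documents that are relevant,
            # weighted by how often p occurs in relevant documents.
            return in_rel[p] / (in_rel[p] + in_other[p]) * in_rel[p]

        # Step 3: pick the best-scoring pattern that is frequent enough.
        candidates = sorted(p for p in in_rel if in_rel[p] >= min_relevant)
        if not candidates:
            break
        # Step 4: add it to the pattern set, then repeat.
        patterns.add(max(candidates, key=score))
    return patterns

docs = [
    ["person retires", "person was named president"],   # Fred ... Harry ...
    ["person retires", "person was named president"],   # Maki ... Yuki ...
    ["company reports earnings"],                       # invented "other" document
    ["team wins game", "company reports earnings"],     # invented "other" document
]
result = bootstrap(docs, seeds={"person retires"})
print(sorted(result))
```

On this toy collection a single iteration adds < person was named president >, matching the pattern set on the slide. A second iteration would find no further candidates and stop.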

Results
For some event types, unsupervised learning can do as well as manual pattern development
[Figure: recall and precision as a function of the number of iterations of the learner]

Robust Learning
Quality of learned patterns is uneven
  ambiguity of language leads us to learn incorrect patterns
Need to identify cases of uncertainty
  Potential linguistic ambiguities
  With multiple classifiers using distinct features, cases where they disagree
Query user for selected uncertain examples
Weakly supervised learning + active learning gives robust, rapid knowledge discovery
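The idea of querying the user where classifiers disagree can be sketched as follows. Everything here is an assumption made for illustration: the two toy classifiers, the feature names `subject` and `verb`, the `budget` parameter, and the example data are hypothetical and do not come from the slides.

```python
def by_verb(example):
    """Toy classifier that looks only at the verb."""
    return example["verb"] in {"retire", "resign", "name"}

def by_subject(example):
    """Toy classifier that looks only at the subject type."""
    return example["subject"] == "person"

def select_uncertain(examples, classifiers, budget=2):
    """Return up to `budget` examples on which the classifiers disagree."""
    uncertain = []
    for ex in examples:
        votes = {clf(ex) for clf in classifiers}
        if len(votes) > 1:  # more than one distinct label means disagreement
            uncertain.append(ex)
    return uncertain[:budget]

examples = [
    {"subject": "person", "verb": "retire"},    # both say relevant
    {"subject": "company", "verb": "retire"},   # classifiers disagree
    {"subject": "person", "verb": "run"},       # classifiers disagree
    {"subject": "company", "verb": "report"},   # both say not relevant
]
to_label = select_uncertain(examples, [by_verb, by_subject])
print(to_label)
```

Only the two middle examples would be shown to the user; the cases where both classifiers agree are accepted or rejected without a query, which keeps the labeling effort small.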

For More Information
Project web site: nlp.cs.nyu.edu
Course: G22.2590 - Natural Language Processing (Spring 2010)