1 How to make sense out of unstructured data? Yi Chen Dept. of Computer Science and Engineering Arizona State University.

Presentation transcript:

1 How to make sense out of unstructured data? Yi Chen Dept. of Computer Science and Engineering Arizona State University

2 Databases Have Been a Great Success
- for managing structured data
- But 85% of the world’s data is not in databases!

3 How to Obtain Information from Unstructured Data?
- Efforts have been made by other areas
  - Search engines: Google, Yahoo, MSN, Ask, …
  - Information extraction (IE) [Avatar, TIES, …]
  - Natural language processing (NLP) [Treebank, UIMA, …]
  - Callout on IE and NLP: they produce semi-structured data from texts
- What can databases do for unstructured data?
  - XML provides a good basis for representing semi-structured data
  - However, challenges remain!

4 Querying Data Generated from IE
- Information extraction produces data about specific entities and relationships
- Data generated from information extraction are error prone
  - incomplete data [Imieliński, Koch, …]
  - probabilistic databases [Getoor, Jagadish, Halevy, Subrahmanian, Suciu, Tannen, Widom, …]
  - malleable schemas [Chang, Halevy, Ives, …]
- Queries posed by naïve users are inaccurate
  - keywords [Agrawal, Chaudhuri, Das, Doan, Gravano, Papakonstantinou, Shanmugasundaram, …]
  - over- or under-specified queries [Chaudhuri, …]
  - natural language queries [Jagadish, …]
- QUIC: a system that handles data incompleteness and query imprecision at the same time for autonomous databases [CIDR 07, ICDE 07]
- Collaborated with Subbarao Kambhampati, Garrett Wolf, Hemal Khatri, Bhaumik Chokshi, Jianchun Fan, and Ullas Nambiar
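The incompleteness point on this slide can be illustrated with a small sketch. It is not QUIC's algorithm, which the slides do not describe; it only shows the general idea of returning rows with a missing value as "possible" answers, ranked by a rough estimate of how likely the missing value is to match. The `cars` table, the `answer` function, and the frequency-based estimate are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical extracted table; None marks a value the extractor failed to fill in.
cars = [
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "coupe"},
    {"make": "Honda", "model": "Civic", "body": None},
    {"make": "Honda", "model": "Odyssey", "body": None},
    {"make": "Honda", "model": "Odyssey", "body": "van"},
]


def answer(rows, attr, value, evidence):
    """Answer the selection attr = value over incomplete rows.

    Returns (certain, possible). `certain` holds rows that match outright.
    `possible` holds (score, row) pairs for rows whose `attr` is missing,
    sorted by score, where score is the fraction of complete rows sharing
    the same `evidence` value that have attr = value (an assumed heuristic).
    """
    certain = [r for r in rows if r[attr] == value]

    # Count observed attr values per evidence value, using complete rows only.
    counts = {}
    for r in rows:
        if r[attr] is not None:
            counts.setdefault(r[evidence], Counter())[r[attr]] += 1

    possible = []
    for r in rows:
        if r[attr] is None:
            seen = counts.get(r[evidence], Counter())
            total = sum(seen.values())
            score = seen[value] / total if total else 0.0
            possible.append((score, r))
    possible.sort(key=lambda pair: pair[0], reverse=True)
    return certain, possible


certain, possible = answer(cars, "body", "sedan", evidence="model")
```

Here the Civic with an unknown body style is ranked above the Odyssey with an unknown body style, because the complete Civic rows are mostly sedans while no complete Odyssey row is.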

5 Querying Data Generated from NLP
- Natural language processing generates tree structured data (parse trees)
- Understanding the lexical structure of a sentence helps query answering
  - E.g. find the NP after “Bob” and “with” within an NP
- Demands queries similar to but different from XQuery/XPath queries
[Figure: parse tree with node labels S, NP, VP, V, Det, Prep, PP over the words Bob, saw, a, dog, today, Alice, with]
- LPath: a query language for linguistic annotation data generated from NLP over text documents [ICDE06]
- Collaborated with Susan Davidson, Steven Bird, Haejoong Lee, and Yifeng Zheng
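To make the kind of query on this slide concrete, here is a minimal sketch in plain Python. It does not use LPath syntax or any LPath implementation. The tree below is a hypothetical parse of "Bob saw a dog with Alice", not the tree from the slide's figure, and the names `Node`, `spans`, `words`, and `following` are invented for illustration. The query combines two conditions: adjacency in the word sequence ("right after a word") and containment in a scope node ("within a VP").

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # category, or the word itself for a leaf
    children: List["Node"] = field(default_factory=list)


def spans(node, start=0):
    """Return (end, entries); entries are (node, start, end) in pre-order.

    start and end are word offsets, so a node covers words[start:end].
    """
    if not node.children:                        # a leaf is a single word
        return start + 1, [(node, start, start + 1)]
    entries, pos = [], start
    for child in node.children:
        pos, sub = spans(child, pos)
        entries.extend(sub)
    return pos, [(node, start, pos)] + entries


def words(node):
    """Return the words covered by a node, left to right."""
    if not node.children:
        return [node.label]
    return [w for c in node.children for w in words(c)]


def following(tree, label, word, scope):
    """Find `label` nodes that start right after `word`, inside a `scope` node."""
    _, entries = spans(tree)
    hits = []
    for scope_node, s_start, s_end in entries:
        if scope_node.label != scope:
            continue
        for anchor, a_start, a_end in entries:
            if anchor.children or anchor.label != word:
                continue                         # anchor must be the word leaf
            if not (s_start <= a_start and a_end <= s_end):
                continue                         # anchor must lie inside the scope
            for cand, c_start, c_end in entries:
                if (cand.label == label and cand is not scope_node
                        and c_start == a_end and c_end <= s_end):
                    hits.append(" ".join(words(cand)))
    return hits


# Hypothetical parse of "Bob saw a dog with Alice".
tree = Node("S", [
    Node("NP", [Node("Bob")]),
    Node("VP", [
        Node("V", [Node("saw")]),
        Node("NP", [Node("Det", [Node("a")]), Node("N", [Node("dog")])]),
        Node("PP", [Node("Prep", [Node("with")]), Node("NP", [Node("Alice")])]),
    ]),
])

after_saw = following(tree, "NP", "saw", scope="VP")
after_with = following(tree, "NP", "with", scope="VP")
```

The sketch computes word offsets for every node because "right after" is a property of the word sequence, not of parent and child links alone.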

6 Challenge
- How should we close the loop?
[Figure: loop diagram with labels Documents, Databases, Queries, Revised queries, Result 1, Result 2]