“Here is my data. Where do I start?” Examples of Ad Hoc Databases Automatic Example Queries for Ad Hoc Databases Bill Howe 1, Garret Cole 2, Nodira Khoussainova.

Slides:



Advertisements
Similar presentations
Understanding Tables on the Web Jingjing Wang. Problem to Solve A wealth of information in the World Wide Web Not easy to access or process by machine.
Advertisements

State of Connecticut Core-CT Project Query 8 hrs Updated 6/06/2006.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
DISCOVER: Keyword Search in Relational Databases Vagelis Hristidis University of California, San Diego Yannis Papakonstantinou University of California,
Structure Query Language (SQL) COMSATS INSTITUTE OF INFORMATION TECHNOLOGY, VEHARI.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
Introduction to Structured Query Language (SQL)
Nov 24, 2003Murali Mani SQL B term 2004: lecture 12.
Data Cube and OLAP Server
Slides adapted from A. Silberschatz et al. Database System Concepts, 5th Ed. SQL - part 2 - Database Management Systems I Alex Coman, Winter 2006.
Database Systems More SQL Database Design -- More SQL1.
SQL By: Toan Nguyen. Download Download the software at During the installation –Skip sign up for fast installation.
Microsoft Access 2010 Chapter 7 Using SQL.
State of Connecticut Core-CT Project Query 4 hrs Updated 1/21/2011.
Midterm 1 Concepts Relational Algebra (DB4) SQL Querying and updating (DB5) Constraints and Triggers (DB11) Unified Modeling Language (DB9) Relational.
Information Retrieval in Practice
Databases & Data Warehouses Chapter 3 Database Processing.
Walter Hop Web-shop Order Prediction Using Machine Learning Master’s Thesis Computational Economics.
Alert Correlation for Extracting Attack Strategies Authors: B. Zhu and A. A. Ghorbani Source: IJNS review paper Reporter: Chun-Ta Li ( 李俊達 )
Concepts of Database Management, Fifth Edition
IST722 Data Warehousing Business Intelligence Development with SQL Server Analysis Services and Excel 2013 Michael A. Fudge, Jr.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
SQL 資料庫查詢語言 取材自 EIS, 3 rd edition By Dunn et al..
Relational DBs and SQL Designing Your Web Database (Ch. 8) → Creating and Working with a MySQL Database (Ch. 9, 10) 1.
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
Introduction to Accounting Information Systems
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
Automatically Synthesizing SQL Queries from Input-Output Examples Sai Zhang University of Washington Joint work with: Yuyin Sun.
Access Path Selection in a Relational Database Management System Selinger et al.
Michael Cafarella Alon HalevyNodira Khoussainova University of Washington Google, incUniversity of Washington Data Integration for Relational Web.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
SQL: Data Manipulation Presented by Mary Choi For CS157B Dr. Sin Min Lee.
OLAP : Blitzkreig Introduction 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema :
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
Robin Mullinix Systems Analyst GeorgiaFIRST Financials PeopleSoft Query: The Next Step.
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search The best paper prize at CVPR 2008.
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
OLAP Recap 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema : Hierarchical Dimensions.
IS 230Lecture 6Slide 1 Lecture 7 Advanced SQL Introduction to Database Systems IS 230 This is the instructor’s notes and student has to read the textbook.
SQL SeQueL -Structured Query Language SQL SQL better support for Algebraic operations SQL Post-Relational row and column types,
SQL LANGUAGE and Relational Data Model TUTORIAL Prof: Dr. Shu-Ching Chen TA: Hsin-Yu Ha.
Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.
Lecture 8 – SQL Joins – assemble new views from existing tables INNER JOIN’s The Cartesian Product Theta Joins and Equi-joins Self Joins Natural Join.
Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina.
1/18/00CSE 711 data mining1 What is SQL? Query language for structural databases (esp. RDB) Structured Query Language Originated from Sequel 2 by Chamberlin.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
7 1 Database Systems: Design, Implementation, & Management, 7 th Edition, Rob & Coronel 7.6 Advanced Select Queries SQL provides useful functions that.
LM 5 Introduction to SQL MISM 4135 Instructor: Dr. Lei Li.
MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Sravanthi Lakkimsety Mar 14,2016.
Feature Generation and Selection in SRL Alexandrin Popescul & Lyle H. Ungar Presented By Stef Schoenmackers.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
More SQL: Complex Queries,
Neighborhood - based Tag Prediction
The Database Exercises Fall, 2009.
Geographic Information Systems
Associative Query Answering via Query Feature Similarity
Data Integration for Relational Web
More SQL: Complex Queries, Triggers, Views, and Schema Modification
Chapter 7 Most important: 7.2
SQL: Structured Query Language
One-Pass Algorithms for Database Operations (15.2)
Query Functions.
Presentation transcript:

“Here is my data. Where do I start?” Examples of Ad Hoc Databases Automatic Example Queries for Ad Hoc Databases Bill Howe 1, Garret Cole 2, Nodira Khoussainova 3, Leilani Battle 4 { 1 billhowe, 2 gbc3, 3 nodira, 4 University of Washington, Seattle, WA, USA Tabular data extracted from files, spreadsheets, DBs, the web No schema available No query logs available No DBAs available Unknown relationships, semantics, utility Temporary, time-sensitive applications An ad hoc database is a collection of tables with unknown relationships gathered to serve a specific, often transient, often urgent, purpose. “Ad Hoc Databases” A researcher assembles an ad hoc database of recent experimental results to prepare a paper or proposal. Emergency workers responding to a natural disaster assemble an ad hoc data-base from lists of addresses of nearby schools, locations of resources (e.g., ambulances), and contact information for emergency workers. A consulting business analyst assembles an ad hoc database from a set of spreadsheets provided by management for a short term engagement A security analyst assembles an ad hoc database from a set of application trace logs after an attack Approach 1.Model each operator independently using curated sts of example queries from the web 2.Compose operators to generate a search space of example queries 3.Rank each set using scores derived from configurable patterns called idioms SQLShare: Database-as-a-Service for Ad Hoc Data 1.Streamlined for a single workflow: 2.No DDL; schema inferred from data 3.Views as first-class citizens 4.Unfettered sharing; cloud-hosted 5.Full SQL; no restrictions Upload Query Share Q: Are users willing and able to write SQL? A: Yes! But they need access to high-quality examples (c.f. Gray, Szalay et al. 2005; Howe 2010) Modeling Each Operator JoinProject Finding: Important attributes appear near the far left or far right of the table. Select Finding: Good queries return around 5- 10% of the tuples in the table/join. Group by Grouping column: a column is selected if it has manageable distinct values. Aggregates: In most cases, project count(*). If a separate numeric column is also discovered, demonstrate functions sum, min, max, and avg. Union Idea: two tables are good candidates for a union if they share sequence of columns with matching data types. Finds more matches than just considering column name matches. A histogram of the weighted positional scores of columns from the example queries of the SDSS database. Finding: “Good” joins characterized by linear relationships among a handful of set properties We train a decision tree over these features using existing sets of example queries (with > 80% precision and recall) Buid a graph (V,E) where V is the set of tables and E is the set of “good” joins. Each quey i s a minimum spanning tree of a connected component of this graph. What makes a good set of queries? It is application-dependent. e.g. for a DB class, the queries should demonstrate various SQL concepts vs. if user is familiar with SQL but not with the schema, better to have queries that refer to important tables or views. What is an idiom? A function I : Q  [0, 1]. Takes a query and outputs score between 0 and 1. Examples: - outputs 1 if query includes GROUP BY clause, 0 otherwise - rewards queries with more joins - outputs a score based on which important views the query references - outputs 1 if query demonstrates a self-join, 0 otherwise Evaluating “Good” Queries How to use idioms to select the set of starter queries? Represent queries as vectors of idiom scores We use a greedy algorithm by selecting best query first, then iteratively add best additional query given the queries collected up to this point. I1I1...ImIm q1q1 I 1 (q 1 ) I m (q 1 ) qnqn I 1 (q n ) I m (q n ) Goals: - maximize idiom scores - select diverse set of queries Score of a set of k queries: