Learning Semantic String Transformations from Examples Rishabh Singh and Sumit Gulwani.


FlashFill

Transformations
Syntactic Transformations
– Concatenation of regular expression based substring
– “VLDB2012” → “VLDB”
Semantic Transformations
– More than just characters
– “1/5/2010” → “May 1st 2010”

Semantic Transformations
Semantic information as relational tables
– 1 → January, 2 → February
Learn table lookup queries
– VLOOKUP macro 2nd most problematic
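A minimal sketch of how the date example could combine table lookups with a syntactic split, written in Python for illustration. The table contents beyond "1 → January, 2 → February", the ordinal table, the function name `transform`, and the day/month/year reading of the input are assumptions, not taken from the slides.

```python
# Hypothetical lookup tables. Only the rows 1 -> January and 2 -> February
# appear on the slide; the remaining rows are added for illustration.
MONTH = {"1": "January", "2": "February", "5": "May"}
ORDINAL = {"1": "1st", "2": "2nd", "3": "3rd"}


def transform(date: str) -> str:
    """Turn 'd/m/yyyy' into 'Month dth yyyy' using two table lookups."""
    # Syntactic step: split the input into substrings.
    day, month, year = date.split("/")
    # Semantic step: look each substring up in a table, then concatenate.
    return f"{MONTH[month]} {ORDINAL[day]} {year}"


print(transform("1/5/2010"))
```

The point of the sketch is that no character-level rewriting turns "5" into "May"; the mapping has to come from a table.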

Outline
– Lookup Transformations
– Lookup + Syntactic Transformations
– Case Studies

Table Lookup Transformations Demo

Learning Framework
Input Strings → F → Output String
Candidate programs: F1, …, Fn
1. Domain-specific Language L
2. Algorithm to learn all Fs from (i,o)

Example - Lookup
Emp Record (columns SSN, EmpId, Name)
Name: John Henry, William Johnson, Steve Russell, Ian Jordan, Mary Dina
Input v1, Output: Steve Russell
Select(Name, EmpRecord, (SSN = v1))

Example – Transitive Lookup
ItemRec
ItemId   Item
ST-340   Stroller
BI-567   Bib
DI-328   Diapers
WI-989   Wipes
AS-469   Aspirator
PriceRec
ItemId   Price
ST-340   $
BI-567   $3.56
DI-328   $21.45
WI-989   $5.12
AS-469   $2.56
Input v1: Stroller, Output: $
Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))
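The nested Select expression above can be sketched in Python as follows. This is an illustration under assumptions: the `select` helper, its argument order, and the list-of-dicts table encoding are hypothetical. The Stroller price is incomplete in the slides as transcribed, so that row is left out of `PRICE_REC`.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

ITEM_REC: List[Row] = [
    {"ItemId": "ST-340", "Item": "Stroller"},
    {"ItemId": "BI-567", "Item": "Bib"},
    {"ItemId": "DI-328", "Item": "Diapers"},
    {"ItemId": "WI-989", "Item": "Wipes"},
    {"ItemId": "AS-469", "Item": "Aspirator"},
]

# Rows with a complete price only (see the note above).
PRICE_REC: List[Row] = [
    {"ItemId": "BI-567", "Price": "$3.56"},
    {"ItemId": "DI-328", "Price": "$21.45"},
    {"ItemId": "WI-989", "Price": "$5.12"},
    {"ItemId": "AS-469", "Price": "$2.56"},
]


def select(col: str, table: List[Row], key_col: str,
           key: Optional[str]) -> Optional[str]:
    """Return table[col] from the first row where row[key_col] == key."""
    if key is None:
        return None
    for row in table:
        if row[key_col] == key:
            return row[col]
    return None


def price_of(v1: str) -> Optional[str]:
    # Select(Price, PriceRec, ItemId = Select(ItemId, ItemRec, Item = v1))
    item_id = select("ItemId", ITEM_REC, "Item", v1)
    return select("Price", PRICE_REC, "ItemId", item_id)


print(price_of("Bib"))
```

The inner lookup yields a key that never appears in the input or the output, which is what makes the lookup transitive.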

Learn Query
ItemRec and PriceRec tables as in the transitive lookup example
Input v1: Stroller, Output: $
Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))

Strings reachable from input row
Emp Record (columns SSN, EmpId, Name)
Name: John Henry, William Johnson, Steve Russell, Ian Jordan

Strings in table rows of visited nodes: Steve Russell

Repeat until k steps or fixpoint

Steve Russell

Maintains tree structure
– share common sub-expressions
CNF of Boolean Conditionals
– independent column predicates
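The reachability step from the preceding slides (collect the strings in rows that contain an already known string, and repeat until k steps or a fixpoint) might look roughly like this in Python. It is a simplified sketch: the function name `reachable`, the tuple-based table encoding, and the use of explicit sets instead of the shared tree structure are assumptions, and only a few rows of the item tables are used.

```python
from typing import Dict, Iterable, List, Set, Tuple

Table = List[Tuple[str, ...]]

TABLES: Dict[str, Table] = {
    "ItemRec": [("BI-567", "Bib"), ("WI-989", "Wipes")],
    "PriceRec": [("BI-567", "$3.56"), ("WI-989", "$5.12")],
}


def reachable(inputs: Iterable[str], tables: Dict[str, Table],
              k: int) -> Set[str]:
    """Strings reachable from the inputs in at most k lookup steps."""
    known = set(inputs)
    for _ in range(k):
        found: Set[str] = set()
        for rows in tables.values():
            for row in rows:
                # A row is visited when one of its cells is already known.
                if known.intersection(row):
                    found.update(row)
        if found <= known:  # fixpoint: nothing new was discovered
            break
        known |= found
    return known


print(sorted(reachable({"Bib"}, TABLES, k=2)))
```

With k=1 only the ItemRec row is visited; the price becomes reachable at the second step, through the item id.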

Synthesize Procedure
Synthesize((i1,o1), …, (in,on))
    P = GenerateStr_t(i1,o1)
    for j = 2 to n:
        P’ = GenerateStr_t(ij,oj)
        P = Intersect_t(P’, P)
    return P
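A toy Python rendering of the Synthesize loop, assuming programs are kept as an explicit set of expressions so that intersection is plain set intersection. The slides describe a succinct shared representation instead, so this only illustrates the control flow. `generate`, the tuple encoding of Select expressions, the depth bound, and the `SaleRec` table with its values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Tables = Dict[str, List[Row]]

TABLES: Tables = {
    "ItemRec": [{"ItemId": "BI-567", "Item": "Bib"},
                {"ItemId": "WI-989", "Item": "Wipes"}],
    "PriceRec": [{"ItemId": "BI-567", "Price": "$3.56"},
                 {"ItemId": "WI-989", "Price": "$5.12"}],
    # Hypothetical table, added only to make the first example ambiguous.
    "SaleRec": [{"ItemId": "BI-567", "Price": "$3.56"},
                {"ItemId": "WI-989", "Price": "$4.00"}],
}


def generate(i: str, o: str, tables: Tables, depth: int = 2) -> Set:
    """All lookup expressions of bounded depth that map input i to o."""
    frontier = {"v1": i}          # expression -> value on this input
    values = dict(frontier)
    for _ in range(depth):
        nxt = {}
        for expr, val in frontier.items():
            for name, rows in tables.items():
                for row in rows:
                    for key_col, cell in row.items():
                        if cell != val:
                            continue
                        for col, out in row.items():
                            if col != key_col:
                                # Select(col, name, key_col = expr)
                                nxt[("Select", col, name, key_col, expr)] = out
        values.update(nxt)
        frontier = nxt
    return {e for e, v in values.items() if v == o}


def synthesize(examples: List[Tuple[str, str]], tables: Tables) -> Set:
    (i1, o1), rest = examples[0], examples[1:]
    programs = generate(i1, o1, tables)
    for i, o in rest:
        programs = generate(i, o, tables) & programs  # Intersect
    return programs


print(len(synthesize([("Bib", "$3.56")], TABLES)))
print(synthesize([("Bib", "$3.56"), ("Wipes", "$5.12")], TABLES))
```

One example leaves two consistent programs (via PriceRec or via the hypothetical SaleRec); the second example rules out the SaleRec program.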

Semantic String Transformations Demo

[Gulwani POPL11]

– Syntactic manipulations over lookup outputs
– Syntactic manipulations before indexing

Input: “SSN: ” followed by an SSN value
Emp Record (columns SSN, EmpId, Name)
Name: John Henry, William Johnson, Steve Russell, Ian Jordan
Output: Mr. Steve Russell

Set of reachable strings: { “SSN: ”, “ ”, “1125”, “Steve Russell” }

Output: Mr. Steve Russell

Experiments

Related Work
Matching strings for table joins
– Record Matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
– Schema Matching [Dhamankar et al. SIGMOD04, Warren & Tompa VLDB06]
Query Synthesis
– from representative view [Das Sharma et al. ICDT10, Tran et al. SIGMOD09]
Text-editing by example
– QuickCode [Gulwani POPL11]
– SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]

Thanks!
End-Users, Algorithm Designers, Software Developers
Large potential

Backup slides

Semantic String Transformations
Time (12 Hr)   Time (24 Hr)
0930           9:30 AM
1520           3:20 PM
=TEXT(C,”00 00”)+0
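For comparison with the spreadsheet formula, the same input-output pairs can be reproduced by a short hand-written Python function. This is only an illustration of the intended mapping, not the formula from the slide and not a synthesized program; the name `to_12_hour` is an assumption.

```python
from datetime import datetime


def to_12_hour(hhmm: str) -> str:
    """Convert a four-digit time such as '0930' to '9:30 AM'."""
    t = datetime.strptime(hhmm, "%H%M")
    hour = t.hour % 12 or 12          # 0 and 12 both display as 12
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour}:{t.minute:02d} {suffix}"


print(to_12_hour("0930"))
print(to_12_hour("1520"))
```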

Semantic String Transformations
Columns: Date, Formatted Date
Jun 3rd,

Idea 1: Share sub-expressions
T1: C1 C2 C3 = s1 s2 s3
T2: C1 C2 C3 = s2 s3 s4
T3: C1 C2 C3 = s3 s4 s5
Select(C3, T2, C1=e)
Select(C2, T3, C1=Select(C2,T2,C1=e))

Youtube Videos French Polish Urdu German Serbian Russian

Idea 2: CNF conditionals
T:      C1  C2  C3  …  Cn  Cn+1
        s   s   s   …  s   t
Input:  v1  v2  …  vm      Out
        s   s   …  s       t

No. of Consistent Expressions

Succinct Representation

Performance

Ranking

Idea 2: CNF conditionals

Related Work
Record Matching
– Similarity functions for matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
– Customizable similarity function [Arasu et al. VLDB09]
Learning Schema Matches
– iMAP [Dhamankar et al. SIGMOD04]: concat. of column strings using domain-specific knowledge
– [Warren & Tompa VLDB06]: concatenation of column substrings, single table

Related Work
Query Synthesis [Das Sharma et al. ICDT10, Tran et al. SIGMOD09]
– Infer relation from large representative example view
– no joins or projections
Text-editing using examples
– QuickCode [Gulwani POPL11]: string transformations
– SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]: programming by demonstration

General Framework
A Domain-specific Transformation Language L
– Expressive and succinct
Efficient Data structures for set of expressions
– Version-space algebra
GenerateStr
– All sets of expressions from I-O example
Intersect
– Intersect two sets of expressions

Example – Transitive Lookups
ItemRec and PriceRec tables as in the transitive lookup example
Inputs v1: Stroller, Bib, Aspirator, Wipes
Output (for Stroller): $
Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))

Example
T1: C1 C2 C3 = s1 s2 s3
T2: C1 C2 C3 = s2 s3 s4
…
Ti: C1 C2 C3 = si si+1 si+2
…
Tm
Input v1: s1, Output: sm

Sub-expression Sharing
Ti-1: C1 C2 C3 = si-1 si si+1
Ti-2: C1 C2 C3 = si-2 si-1 si

Current State of the Art: Help forums

Observations
Semantic string transformations
Input-output examples based interaction
– New disambiguating inputs
Add-in with the same interface