SQL Log Analysis An Exploration of XQuery as a Tool to Analyze SQL Parse Trees Nathan Bales, Mary Fernandez, Lukasz Golab, Ted Johnson.



History
- Query logs from an internal at&t source
  - Teradata warehouse
- Ted Johnson asked to analyze logs
  - Data to be migrated to new DBMS
  - Owners want to:
    - Find out which data is 'live'
    - Inform new database design
  - Proprietary tools insufficient
- Ted's approach
  - Parse to in-memory AST, C++ to walk tree
  - Extract components to flat files
Copyright 2007 at&t

Analysis Goals
- Index suggestion
  - Find and count predicates over single tables
  - What comparison operators are used on an attribute?
- Determine usefulness of views
  - Which are used? Not used? Joined?
- Discover hidden schemata
  - Produce various interpretations of join graph
  - Claim facts about structure
- Identify query sources
  - Find queries written using the same tool or template
- etc... many more possible
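As a rough illustration of the index-suggestion goal, the Python sketch below counts which comparison operators are applied to each attribute. It assumes a hypothetical XML encoding of the parse trees (`<predicate op="...">` wrapping `<column>` elements); the deck does not show the actual schema, so every element and attribute name here is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical parse-tree log; element and attribute names are assumptions.
SAMPLE_LOG = """
<log>
  <query id="1">
    <predicate op="="><column>t.a</column><literal>5</literal></predicate>
    <predicate op="LIKE"><column>t.b</column><literal>T%</literal></predicate>
  </query>
  <query id="2">
    <predicate op="="><column>t.a</column><literal>7</literal></predicate>
    <predicate op="&lt;"><column>t.a</column><literal>9</literal></predicate>
  </query>
</log>
"""


def operators_by_column(log_xml):
    """Map each column name to a Counter of the operators applied to it."""
    root = ET.fromstring(log_xml)
    usage = defaultdict(Counter)
    for predicate in root.iter("predicate"):
        op = predicate.get("op")
        # Every column mentioned inside the predicate is credited with its operator.
        for column in predicate.iter("column"):
            usage[column.text][op] += 1
    return dict(usage)


usage = operators_by_column(SAMPLE_LOG)
for column, ops in sorted(usage.items()):
    print(column, dict(ops))
```

A column that is mostly compared with `=` might suggest a different index than one mostly used in range comparisons, but that interpretation is outside what this sketch computes.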

Observations on Query Logs
- Logs are large
  - 50,000 queries per month; several months
- Arbitrary complexity
  - Queries have thousands of terms
  - Hide complexity in views
  - Teradata may not materialize
- Natural tree structure
  - Other analysis methods analyze text or data
  - XML and XQuery are good tools for tree-structured data

Example query from log SELECT FTV_FINANCIAL_TRANSACTION.RESPONSIBILITY_CHARGED_CD, FTV_FINANCIAL_TRANSACTION.PROJECT_NBR, FTV_FINANCIAL_TRANSACTION.TRNSCTN_EXPENDITURE_TYPE_CD, FTV_FINANCIAL_TRANSACTION.PURCHASE_CARD_VENDOR_NM, SUM(FTV_FINANCIAL_TRANSACTION.TRANSACTION_AMT), FTV_VENDOR.VENDOR_NM, FTV_FINANCIAL_TRANSACTION.TRANSACTION_SOURCE_NM, FTV_TRNSCTN_EXPNDTR_TY_FDW.TRNSCTN_EXPENDITURE_TYPE_DESC, FTV_FINANCIAL_TRANSACTION.RESPONSIBILITY_ORIGINATING_CD, FTV_FINCL_TRNSCTN_INVOICE.INVOICE_NBR, FTV_FINCL_TRNSCTN_INVOICE.INVOICE_RECORDED_SBC_USERID, FTV_XC_MR2000_FDW_TL.FIN_CD, FTV_ACCOUNT_SERIES_FDW.ACCOUNT_NM, CASE WHEN (FTV_FINANCIAL_TRANSACTION.SUB_ACCOUNT_CD IS NOT NULL) AND (FTV_FINANCIAL_TRANSACTION.SUB_ACCOUNT_CD <> ' ') THEN FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD || '.' || TRIM (TRAILING FROM FTV_FINANCIAL_TRANSACTION.SUB_ACCOUNT_CD) || TRIM (TRAILING FROM FTV_FINANCIAL_TRANSACTION.ACCOUNT_LETTER_CD) ELSE FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD END, FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD, TRIM (TRAILING FROM FTV_FINANCIAL_TRANSACTION.SUB_ACCOUNT_CD), FTV_ACCOUNT_SERIES_FDW.PLANT_CLASS_CD, FTV_FINANCIAL_TRANSACTION.ACCOUNTED_IND, FTV_FINANCIAL_TRANSACTION.ACTIVITY_CD, FTV_FINANCIAL_TRANSACTION.FINANCIAL_APPLICATION_CD, FTV_FINANCIAL_TRANSACTION.TRANSACTION_COMMENT1_TXT, FTV_FINANCIAL_TRANSACTION.TRANSACTION_COMMENT2_TXT, FTV_FINANCIAL_TRANSACTION.TRANSACTION_DESC, FTV_FINANCIAL_TRANSACTION.EMPLOYEE_SBC_USERID, FTV_FINANCIAL_TRANSACTION.INVENTORY_PRODUCT_ID, FTV_FINANCIAL_TRANSACTION.JOB_ACTIVITY_CD, FTV_FINANCIAL_TRANSACTION.TRANSACTION_ENTRY_TYPE_DESC, FTV_FINANCIAL_TRANSACTION.REFERENCE_NBR, FTV_FINANCIAL_TRANSACTION.TRANSACTION_TYPE_NM, FTV_FINANCIAL_TRANSACTION.DATA_YEAR_MONTH_FMT_DT, CAST(CAST(FTV_FINANCIAL_TRANSACTION.DATA_YEAR_MONTH_DT AS FORMAT 'YYYY') AS CHAR(4)), CAST(CAST(FTV_FINANCIAL_TRANSACTION.DATA_YEAR_MONTH_DT AS FORMAT 'MM') AS CHAR(2)), FTV_GEOGRAPHIC_LOCATION_1.LOCATION_CLLI_CD, FTV_FINANCIAL_TRANSACTION.BUDGET_LOCATION_CD, 
FTV_GEOGRAPHIC_LOCATION_1.WIRE_CENTER_CLLI_CD, FTV_COMPANY_CODE_FDW.REGIONAL_COMPANY, FTV_FINANCIAL_TRANSACTION.COMPANY_CD, CASE WHEN (FTV_FINANCIAL_TRANSACTION.INSTANCE_CD = 'WL') THEN (CASE WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '1') THEN 'AST' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') THEN 'LIB' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '3') THEN 'EQT' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '4') THEN 'REV' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '5') THEN 'CGS' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '6') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '7') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '8') THEN 'INC' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '9') THEN 'EXT' ELSE ' ' END) WHEN (FTV_FINANCIAL_TRANSACTION.INSTANCE_CD = 'TL') THEN (CASE WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '1') THEN 'AST' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '4') THEN 'LIB' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '5') THEN 'REV' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '6') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '7') THEN 'INC' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '8') THEN 'CLR' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '9') THEN 'CLR' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '3') THEN 'DEP' WHEN (FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD IN ('2002','2003','2004')) THEN (CASE WHEN (FTV_FINANCIAL_TRANSACTION.ACTIVITY_CD LIKE '5%') THEN 'OCP' ELSE 'CON' END) WHEN (FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD IN ('2005','2006','2007')) THEN 'OCP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') AND (FTV_FINANCIAL_TRANSACTION.ACTIVITY_CD LIKE '5%') THEN 'OCP' WHEN 
(SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') THEN 'CON' ELSE ' ' END) ELSE ' ' END, CASE WHEN FTV_FINANCIAL_TRANSACTION.SUB_ACCOUNT_CD = ' ' THEN ' ' WHEN FTV_FINANCIAL_TRANSACTION.ACCOUNT_LETTER_CD = ' ' THEN ' ' ELSE TRIM (TRAILING FROM FTV_FINANCIAL_TRANSACTION.SUB_ACCOUNT_CD) || TRIM (TRAILING FROM FTV_FINANCIAL_TRANSACTION.ACCOUNT_LETTER_CD) END, FTV_PROJECT.PROJECT_DESC, FTV_PROJECT.PROJECT_NM, FTV_PROJECT.PROJECT_TYPE_NM FROM FDW_ACCESS_VIEWS.VZFW506_TRNSCTN_EXPNDTR_TY_FDW FTV_TRNSCTN_EXPNDTR_TY_FDW RIGHT JOIN FINANCE_ACCESS_VIEWS.VSTF005_FINANCIAL_TRANSACTION FTV_FINANCIAL_TRANSACTION ON FTV_TRNSCTN_EXPNDTR_TY_FDW.INSTANCE_CD=FTV_FINANCIAL_TRANSACTION.INSTANCE_CD AND FTV_TRNSCTN_EXPNDTR_TY_FDW.TRNSCTN_EXPENDITURE_TYPE_CD=FTV_FINANCIAL_TRANSACTION.TRNSCTN_EXPENDITURE_TYPE_CD LEFT JOIN ACCESS_VIEWS.VCCR038_GEOGRAPHIC_LOCATION FTV_GEOGRAPHIC_LOCATION_1 ON FTV_GEOGRAPHIC_LOCATION_1.INSTANCE_CD=FTV_FINANCIAL_TRANSACTION.INSTANCE_CD AND FTV_GEOGRAPHIC_LOCATION_1.GEOGRAPHIC_LOCATION_CD=FTV_FINANCIAL_TRANSACTION.BUDGET_LOCATION_CD LEFT JOIN FINANCE_ACCESS_VIEWS.VCTF003_FINCL_TRNSCTN_INVOICE FTV_FINCL_TRNSCTN_INVOICE ON FTV_FINCL_TRNSCTN_INVOICE.INVOICE_ID=FTV_FINANCIAL_TRANSACTION.INVOICE_ID AND FTV_FINCL_TRNSCTN_INVOICE.INSTANCE_CD=FTV_FINANCIAL_TRANSACTION.INSTANCE_CD LEFT JOIN ACCESS_VIEWS.VCCR037_PROJECT FTV_PROJECT ON FTV_FINANCIAL_TRANSACTION.INSTANCE_CD=FTV_PROJECT.INSTANCE_CD AND FTV_FINANCIAL_TRANSACTION.PROJECT_NBR=FTV_PROJECT.PROJECT_NBR AND FTV_FINANCIAL_TRANSACTION.VALID_PROJECT_CD=FTV_PROJECT.VALID_PROJECT_CD LEFT JOIN ACCESS_VIEWS.VCCR039_VENDOR FTV_VENDOR ON FTV_VENDOR.VENDOR_ID=FTV_FINANCIAL_TRANSACTION.VENDOR_ID AND FTV_VENDOR.INSTANCE_CD=FTV_FINANCIAL_TRANSACTION.INSTANCE_CD LEFT JOIN FDW_ACCESS_VIEWS.VZFW500_COMPANY_CODE_FDW FTV_COMPANY_CODE_FDW ON FTV_FINANCIAL_TRANSACTION.COMPANY_CD=FTV_COMPANY_CODE_FDW.COMPANY_CD LEFT JOIN FDW_ACCESS_VIEWS.VZFW501_ACCOUNT_SERIES_FDW FTV_ACCOUNT_SERIES_FDW ON 
FTV_FINANCIAL_TRANSACTION.INSTANCE_CD=FTV_ACCOUNT_SERIES_FDW.INSTANCE_CD AND FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD=FTV_ACCOUNT_SERIES_FDW.MAIN_ACCOUNT_CD AND FTV_FINANCIAL_TRANSACTION.SUB_ACCOUNT_CD=FTV_ACCOUNT_SERIES_FDW.SUB_ACCOUNT_CD AND FTV_FINANCIAL_TRANSACTION.ACCOUNT_LETTER_CD=FTV_ACCOUNT_SERIES_FDW.ACCOUNT_LETTER_CD LEFT JOIN FDW_ACCESS_VIEWS.VZFW507_XC_MR2000_FDW FTV_XC_MR2000_FDW_TL ON FTV_FINANCIAL_TRANSACTION.INSTANCE_CD=FTV_XC_MR2000_FDW_TL.INSTANCE_CD AND FTV_FINANCIAL_TRANSACTION.TRNSCTN_EXPENDITURE_TYPE_CD=FTV_XC_MR2000_FDW_TL.XC_CD WHERE (FTV_FINANCIAL_TRANSACTION.INSTANCE_CD = 'TL') AND ((FTV_FINANCIAL_TRANSACTION.COMPANY_CD LIKE ('T%'))) AND (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) ^= '5' ) AND ( FTV_FINANCIAL_TRANSACTION.DATA_YEAR_MONTH_FMT_DT BETWEEN 'JAN-2005' AND 'DEC-2005' AND SUBSTR(FTV_FINANCIAL_TRANSACTION.RESPONSIBILITY_CHARGED_CD,1,3) = 'S0S' AND FTV_FINANCIAL_TRANSACTION.PROJECT_NBR IN (' ', ' ', ' ', ' ', ' ') AND (CASE WHEN (FTV_FINANCIAL_TRANSACTION.INSTANCE_CD = 'WL') THEN (CASE WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '1') THEN 'AST' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') THEN 'LIB' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '3') THEN 'EQT' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '4') THEN 'REV' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '5') THEN 'CGS' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '6') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '7') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '8') THEN 'INC' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '9') THEN 'EXT' ELSE ' ' END) WHEN (FTV_FINANCIAL_TRANSACTION.INSTANCE_CD = 'TL') THEN (CASE WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '1') THEN 'AST' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '4') THEN 'LIB' WHEN 
(SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '5') THEN 'REV' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '6') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '7') THEN 'INC' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '8') THEN 'CLR' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '9') THEN 'CLR' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '3') THEN 'DEP' WHEN (FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD IN ('2002','2003','2004')) THEN (CASE WHEN (FTV_FINANCIAL_TRANSACTION.ACTIVITY_CD LIKE '5%') THEN 'OCP' ELSE 'CON' END) WHEN (FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD IN ('2005','2006','2007')) THEN 'OCP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') AND (FTV_FINANCIAL_TRANSACTION.ACTIVITY_CD LIKE '5%') THEN 'OCP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') THEN 'CON' ELSE ' ' END) ELSE ' ' END = 'CON' OR CASE WHEN (FTV_FINANCIAL_TRANSACTION.INSTANCE_CD = 'WL') THEN (CASE WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '1') THEN 'AST' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') THEN 'LIB' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '3') THEN 'EQT' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '4') THEN 'REV' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '5') THEN 'CGS' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '6') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '7') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '8') THEN 'INC' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '9') THEN 'EXT' ELSE ' ' END) WHEN (FTV_FINANCIAL_TRANSACTION.INSTANCE_CD = 'TL') THEN (CASE WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '1') THEN 'AST' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '4') THEN 'LIB' WHEN 
(SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '5') THEN 'REV' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '6') THEN 'EXP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '7') THEN 'INC' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '8') THEN 'CLR' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '9') THEN 'CLR' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '3') THEN 'DEP' WHEN (FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD IN ('2002','2003','2004')) THEN (CASE WHEN (FTV_FINANCIAL_TRANSACTION.ACTIVITY_CD LIKE '5%') THEN 'OCP' ELSE 'CON' END) WHEN (FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD IN ('2005','2006','2007')) THEN 'OCP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') AND (FTV_FINANCIAL_TRANSACTION.ACTIVITY_CD LIKE '5%') THEN 'OCP' WHEN (SUBSTR(FTV_FINANCIAL_TRANSACTION.MAIN_ACCOUNT_CD,1,1) = '2') THEN 'CON' ELSE ' ' END) ELSE ' ' END = 'EXP') ) GROUP BY 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42 Copyright 2007 at&t

Summer Progress
- Background work
  - Design XML schema for SQL query parse trees
  - Parse logs to XML
  - Give qualified names context
  - Inline views
- Exploration phase
  - Write simple analysis over XML log
    - Poor performance
  - Materialize views of interesting facts
    - Increased complexity
  - Annotate XML log with interesting facts
    - Adding to schema breaks old analyses
- Result: need a sound, robust conceptual model

Example Analysis
{ for $att in $log//column
  where exists($log//predicate [.//column === $att])
    and empty($log//select_expression [empty(./ancestor::subquery)])
  return { $att } }
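The XQuery fragment above lost its element constructors in transcription, so its exact intent is not fully recoverable. The Python sketch below shows one plausible reading only: list columns that occur in some predicate but never in a top-level (non-subquery) select expression. The XML layout is hypothetical, and unlike the `===` node-identity test in the slide, columns are compared here by their text.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse-tree log; the real schema is not shown in the slides.
SAMPLE_LOG = """
<log>
  <query>
    <select_expression><column>r.a</column></select_expression>
    <predicate><column>r.b</column></predicate>
    <subquery>
      <select_expression><column>s.c</column></select_expression>
      <predicate><column>s.c</column></predicate>
    </subquery>
  </query>
</log>
"""


def has_ancestor(node, tag, parents):
    """Return True if any ancestor of node has the given tag."""
    current = parents.get(node)
    while current is not None:
        if current.tag == tag:
            return True
        current = parents.get(current)
    return False


def filter_only_columns(log_xml):
    """Columns used in predicates but absent from top-level select lists."""
    root = ET.fromstring(log_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    in_predicate = {
        c.text for p in root.iter("predicate") for c in p.iter("column")
    }
    top_level_selected = {
        c.text
        for s in root.iter("select_expression")
        if not has_ancestor(s, "subquery", parents)
        for c in s.iter("column")
    }
    return sorted(in_predicate - top_level_selected)


print(filter_only_columns(SAMPLE_LOG))
```

The explicit parent map stands in for the `ancestor::` axis, which ElementTree's limited XPath support does not provide.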

Research Issue 1 of 2
- Conceptual model for query analysis
- Logical level:
  - Identify interesting aspects of queries
  - Aspect defined by XQuery function f(x)
  - Find or group queries with specific aspects
- Physical level:
  - What is an aspect?
    - Concrete component: any sub-parse-tree
    - Abstract component: result of applying f(x) to sub-parse-tree
  - Aspect index
    - Indexes concrete components in log
    - Keyed on abstract component values
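The aspect-index idea can be sketched in a few lines of Python: apply a function f(x) to every concrete component and group the components by the resulting abstract value. This is an illustration under assumptions, not the authors' design. The XML layout, the `build_aspect_index` helper, and the `shape` aspect (the sequence of element tags, ignoring text) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical parse-tree log: q1 and q2 differ only in names and literals.
SAMPLE_LOG = """
<log>
  <query id="q1">
    <select_expression><column>a</column></select_expression>
    <predicate><column>a</column><literal>1</literal></predicate>
  </query>
  <query id="q2">
    <select_expression><column>b</column></select_expression>
    <predicate><column>b</column><literal>2</literal></predicate>
  </query>
  <query id="q3">
    <select_expression><column>a</column><column>b</column></select_expression>
  </query>
</log>
"""


def shape(node):
    """Abstract component: element tags in document order, ignoring text."""
    return tuple(element.tag for element in node.iter())


def build_aspect_index(root, component_tag, abstract_fn):
    """Index concrete components (sub-trees), keyed on abstract_fn(sub-tree)."""
    index = defaultdict(list)
    for component in root.iter(component_tag):
        index[abstract_fn(component)].append(component)
    return dict(index)


root = ET.fromstring(SAMPLE_LOG)
index = build_aspect_index(root, "query", shape)
groups = sorted(sorted(q.get("id") for q in queries) for queries in index.values())
print(groups)
```

Grouping by `shape` puts q1 and q2 in the same bucket, which hints at how an index keyed on abstract values could support finding queries generated from the same template.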

Research Issue 1 of 2 (Example)
- Aspect: join edge
- Concrete part: a predicate
  $predicate in $log//predicate
- Abstract part: join edge
  { for $table in $predicate//table/name/text()
    order by $table asc
    return { $table } }
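A Python approximation of the join-edge aspect is sketched below. As in the XQuery fragment, the abstract part is the sorted list of table names found under `table/name` within a predicate, so `r.id = s.id` and `s.k = r.k` map to the same edge. The nesting of `<table>` inside `<column>` is an assumption, since the deck shows only the path `$predicate//table/name`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical predicates; only the path //table/name comes from the slide.
SAMPLE_LOG = """
<log>
  <predicate>
    <column><table><name>s</name></table><name>id</name></column>
    <column><table><name>r</name></table><name>id</name></column>
  </predicate>
  <predicate>
    <column><table><name>r</name></table><name>k</name></column>
    <column><table><name>s</name></table><name>k</name></column>
  </predicate>
  <predicate>
    <column><table><name>r</name></table><name>k</name></column>
    <column><table><name>t</name></table><name>k</name></column>
  </predicate>
</log>
"""


def join_edge(predicate):
    """Abstract part: table names referenced by the predicate, sorted ascending."""
    return tuple(sorted(name.text for name in predicate.findall(".//table/name")))


root = ET.fromstring(SAMPLE_LOG)
# Count how often each edge occurs across all predicates in the log.
edge_counts = Counter(join_edge(p) for p in root.iter("predicate"))
for edge, count in sorted(edge_counts.items()):
    print(edge, count)
```

Sorting makes the edge independent of the order in which the tables appear in the predicate, which is what lets differently written joins land on the same key.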

System Diagram
[Figure: system diagram]

Conclusion  Will patent  Hope to publish  Exposed open optimization problems  Things I learned:  Vastly different approaches to computer science research can be very successful  How industrial problems motivate research  Let the research motivate the paper, not vice versa  10.5 weeks 3,000 miles from fiancée = not healthy Copyright 2007 at&t

QA

Research Issue 2 of 2
- Extending the model for similarity analysis
  - Leverage structural similarity in addition to textual
  - Use understood properties of SQL to improve score
- Example:
  - SELECT a FROM r
  - Which is more similar?
    - SELECT a, b, c FROM r
    - SELECT a, d FROM r, s
- Consider similarities of multiple aspects in a single query
- Query optimization could break scores
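To make the "which is more similar?" question concrete, the Python sketch below scores the slide's example under one arbitrary choice: Jaccard similarity on two aspects (selected columns and referenced tables), averaged with equal weight. The slides do not say which measure or weighting the authors intended, so the resulting ranking follows only from these assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity(q1, q2, aspects=("columns", "tables")):
    """Average the per-aspect Jaccard scores with equal weight (an assumption)."""
    scores = [jaccard(q1[aspect], q2[aspect]) for aspect in aspects]
    return sum(scores) / len(scores)


# Aspects extracted by hand from the three queries on the slide.
base = {"columns": {"a"}, "tables": {"r"}}                  # SELECT a FROM r
candidate_1 = {"columns": {"a", "b", "c"}, "tables": {"r"}}  # SELECT a, b, c FROM r
candidate_2 = {"columns": {"a", "d"}, "tables": {"r", "s"}}  # SELECT a, d FROM r, s

score_1 = similarity(base, candidate_1)
score_2 = similarity(base, candidate_2)
print(round(score_1, 3), round(score_2, 3))
```

With equal weights the first candidate scores higher because it shares the full table set. Weighting the column aspect more heavily would narrow or reverse the gap, which is one reason combining multiple aspects needs care.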

Related Work  Vendor analysis tools  DB2’s index advisor (others)  Practical Query Analysis (  Ruby tool for MySQL and PostgreSQL  Aggregate text after some normalization  SQL Text Mining (Vik Singh, Jim Gray, Mark Manasse – MSR Tech Report)  Normalize query text  Cluster with known text similarity methods  Goals  Bot detection  Query recommendation Copyright 2007 at&t