High-Performance Querying on RAW data
Anastasia Ailamaki, EPFL



create a database to run queries
Diagram labels: RAW DATA FILES; LOAD INTO DB; QUERY; REPORT RESULTS; APPLICATIONS
data-to-query time too long
data “locked” in vendor
private data: no move, no copy
Source: “An Overview of Business Intelligence Technology”. S. Chaudhuri, U. Dayal, V. Narasayya. CACM, August 2011.

run queries to create a database
Diagram labels: MapReduce Engine; Relational DBMS; Reporting Server; Spreadsheet; Enterprise Search Engine; …
Diagram labels: External Data Sources; Operational Databases
invest only in interesting data
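One way to read "run queries to create a database" is that parsed data accumulates as a side effect of querying, so only the data that queries actually touch is ever converted. The sketch below illustrates that idea in Python under stated assumptions: the class `LazyColumnCache`, its methods, and the sample data are hypothetical names made up for illustration and are not taken from RAW.

```python
import csv
import io


class LazyColumnCache:
    """Parse a column from raw CSV text only when a query first asks for it.

    Hypothetical illustration; not the RAW system's implementation.
    """

    def __init__(self, raw_text):
        self.raw_text = raw_text  # stands in for a raw file on disk
        self.columns = {}         # column name -> list of raw string values
        self.parse_count = 0      # number of scans over the raw data

    def column(self, name):
        # First access scans the raw data; later accesses hit the cache.
        if name not in self.columns:
            reader = csv.DictReader(io.StringIO(self.raw_text))
            self.columns[name] = [row[name] for row in reader]
            self.parse_count += 1
        return self.columns[name]

    def select(self, name, predicate):
        # Return the values of one column that satisfy the predicate.
        return [v for v in self.column(name) if predicate(v)]


RAW = "run,energy,detector\n1,10.5,A\n2,7.25,B\n3,12.0,A\n"
cache = LazyColumnCache(RAW)
high = cache.select("energy", lambda v: float(v) > 10.0)
again = cache.select("energy", lambda v: float(v) > 5.0)
```

After the two queries, only the `energy` column has been extracted and the raw text has been scanned once; the `detector` column, which no query touched, was never materialized.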

easy for you to say
No ETL
Declarative querying is king
Complex data: tables, arrays, hierarchies
Large-scale vertical integration
Flexibility: multiple file formats, no static schemas, …
Efficiency!

Higgs analysis with RAW
CSV: … containing “good” run numbers
ROOT: … containing physics events
SELECT event.jet…
FROM goodruns.CSV, atlas001.root
WHERE csv.RunNumber == root.RunNumber
  AND root.EF_2mu13 == TRUE
  AND …
Query plan labels: join; scan root; scan csv; filter
Code Generate the Access Paths
Code Generate the Query
Build Position and Data Caches
RAW is 100x faster
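The shape of this query (scan the CSV of good runs, scan the events, filter on the trigger flag, join on `RunNumber`) can be sketched in plain Python. This is an illustration under assumptions, not RAW's implementation: it does no code generation, it uses a hash join as one possible join strategy, an in-memory list stands in for the ROOT file, and the function names and sample values are hypothetical. The position map shows one simple form a position cache could take: byte offsets of rows recorded during a scan of the raw bytes.

```python
# Stand-in for goodruns.CSV: one header line, then one run number per line.
GOOD_RUNS_CSV = b"RunNumber\n101\n103\n"

# Stand-in for atlas001.root; a real ROOT file would need a dedicated reader.
EVENTS = [
    {"RunNumber": 101, "EF_2mu13": True, "jet": "j1"},
    {"RunNumber": 102, "EF_2mu13": True, "jet": "j2"},
    {"RunNumber": 103, "EF_2mu13": False, "jet": "j3"},
    {"RunNumber": 103, "EF_2mu13": True, "jet": "j4"},
]


def build_position_map(raw):
    """Record the byte offset of each data row, skipping the header."""
    offsets = []
    pos = raw.index(b"\n") + 1
    while pos < len(raw):
        offsets.append(pos)
        nxt = raw.find(b"\n", pos)
        pos = len(raw) if nxt == -1 else nxt + 1
    return offsets


def scan_csv(raw, offsets):
    """Yield run numbers by jumping to the recorded row offsets."""
    for start in offsets:
        end = raw.find(b"\n", start)
        if end == -1:
            end = len(raw)
        yield int(raw[start:end].decode())


def query(raw_csv, events):
    offsets = build_position_map(raw_csv)    # position cache
    good = set(scan_csv(raw_csv, offsets))   # build side of the hash join
    # Probe side: filter on the trigger flag, then join on RunNumber.
    return [e["jet"] for e in events
            if e["EF_2mu13"] and e["RunNumber"] in good]


offsets = build_position_map(GOOD_RUNS_CSV)
result = query(GOOD_RUNS_CSV, EVENTS)
```

In this sketch the offsets could be kept and reused by later queries over the same file, so that repeated scans skip the work of locating row boundaries.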