Physics Analysis inside the Oracle DB: Progress report, 10 October 2013



Performance with parallel scaling
 The original result showed the DB version of the Higgs+Z benchmark was faster than serial execution of the root-ntuple analysis on the test3 setup.
 Parallelism with ROOT can be mimicked by running multiple simultaneous jobs, each on a subset of the data.
 The ntuple version improves more with parallel execution, as the DB is limited by I/O: more data needs to be read by the DB compared to the column storage in the ntuples.
 The test3 setup has its datafiles stored on NFS, resulting in a relatively slow I/O speed of ~250 MB/s.
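For reference, the DB-side parallelism compared above is Oracle's parallel query execution, which is requested with a hint. A minimal sketch, assuming illustrative table and column names (not the actual benchmark schema):

```sql
-- Hypothetical sketch: run an event selection with degree-of-parallelism 40,
-- analogous to launching 40 simultaneous root jobs on subsets of the data.
-- Table and column names ("electron", "pt", "eta") are assumptions.
SELECT /*+ PARALLEL(e, 40) */ COUNT(*)
FROM   electron e
WHERE  e.pt > 25000          -- pT cut (units assumed to be MeV)
AND    ABS(e.eta) < 2.47;    -- eta acceptance cut
```

The hint sets the degree of parallelism per query, which is what makes the "SQL parallel 40" timings on the later slides directly comparable to 40 simultaneous root jobs.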

Ttbar cutflow analysis
 A cutflow analysis for the top-pair production cross-section measurement was implemented as a new benchmark.
 The original "RootCore" packages used by the ATLAS top physics group are compared to a modified set of packages that retrieve data from the DB via an SQL query.
 More realistic than the Higgs+Z benchmark:
  - uses 262 variables (compared to 40 in Higgs+Z)
  - also uses data from photon and primary-vertex objects (the photon table is 114 GB!)
  - three more external libraries were added, used to call functions for corrections on electron and photon objects
  - the selection of electron and photon objects cannot be done as a single-table select: the corrections depend on the number of vertices in the event (pile-up), so an inner join with the vertex table is required
 Only the "electron" channel is implemented so far, but the "muon" channel is very similar.
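The pile-up-dependent selection described above can be sketched as a join against a per-event vertex count. This is a minimal sketch with hypothetical table, column, and function names; the actual schema and correction function are not shown in the slides:

```sql
-- Hypothetical sketch: select electrons whose corrected isolation passes a cut,
-- where the correction depends on the number of primary vertices in the event.
-- All identifiers here are assumptions, not the actual analysis schema.
SELECT e.event_id, e.pt, e.eta
FROM   electron e
INNER JOIN (
         SELECT event_id, COUNT(*) AS n_vtx   -- pile-up estimate per event
         FROM   primary_vertex
         WHERE  n_tracks >= 2
         GROUP BY event_id
       ) v
ON     v.event_id = e.event_id
WHERE  corrected_isolation(e.etcone20, v.n_vtx) < 0.15;
-- corrected_isolation would be a PL/SQL wrapper around the external
-- correction library (PL/SQL calling Java calling C++, as on the summary slide).
```

The inner join is what prevents the object selection from being a simple single-table scan.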

Performance with parallel scaling
 The ttbar cutflow analysis from root-ntuples is very slow in serial!
 But when running multiple simultaneous root jobs, it again becomes faster than the DB version.
 This is on the test3 setup, where I/O is relatively slow.

Mapred-cluster
 The mapred cluster (described in the previous report) has better I/O reads: 5 nodes connected to 5 disk arrays with a total of 60 disks, giving up to 2500 MB/s.
 As shown last time, the root-ntuple analysis is faster on this cluster for the Higgs+Z benchmark:
  - 40 root jobs: 71 seconds
  - SQL parallel 40: 135 seconds

Mapred-cluster
 But the root-ntuple analysis is slower than the DB for the ttbar cutflow analysis:
  - 40 root jobs: 588 seconds
  - SQL parallel 40: 372 seconds
 Optimistic conclusion: the Oracle DB beats the root-ntuple analysis for a realistic physics analysis, when given fast enough I/O reads!

ttbar cutflow: why so slow?
 But WHY is the ttbar cutflow analysis so much slower than Higgs+Z for the ntuple analysis on the "mapred" cluster?
 As a test, I rewrote the ttbar-analysis packages to return the result only for a specific sub-selection, and similarly broke the SQL version down to produce the same result.
 This allows comparing the timing results for DB vs ntuple for different types of selections.
 The DB analysis was always faster, even though far fewer branches need to be loaded for the separate object selection than for the full ttbar cutflow (262 branches).
 This looked suspicious to me!

ttbar cutflow: why so slow?
 I wrote a single root macro to reproduce the same results as the single-electron select from the ttbar packages.
 This macro was much faster: 55 seconds instead of 214 seconds.
 I think the difference in time is due to the ttbar packages disabling branches and calling Tree->GetEntry(), while my root macro calls branch->GetEntry(); this turns out to make a large difference in CPU:
  - good-electron, simple macro w. tree->GetEntry(), 40 root jobs: Time = 156 seconds
  - good-electron, simple macro w. load branches, 40 root jobs: Time = 156 seconds

Summary
 Real analysis code is not always optimized to run as fast as possible!
 The ttbar cutflow is an interesting case study, and it uses many different object selections that can be studied separately.
 Experience from converting the ttbar cutflow to SQL:
  - Using a materialized view for the goodrunlist selection.
  - Adding external libraries (PL/SQL calling Java calling C++) is easy once you know how!
  - It is more difficult when a selection requires cross-table selection with an INNER JOIN (photon/electron selection requiring info from the vertex table for the pile-up correction), but not impossible.
  - I'm using lots of MATERIALIZE hints; I don't trust the SQL optimizer, as it goes crazy with all those JOINs.
  - I still have a single analysis query that runs the entire cutflow; eventually it might be better/easier to write the code by explicitly creating tables to hold intermediate selections (using the NOLOGGING option and parallel DML hints to speed up table creation).
 To do:
  - Finish the ttbar cutflow; there are still some cuts requiring cross-table matching that are not yet implemented.
  - Add the muon channel: the ntuple analysis produces two cutflows, for the muon and electron channels, in a single analysis, as they reuse the results after good-object selection; I can do the same in SQL, but I will need to create intermediate tables to hold the temporary selection.
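The MATERIALIZE hint and the NOLOGGING intermediate-table approach mentioned above can be sketched as follows. All table and column names here are hypothetical, not the actual analysis schema:

```sql
-- Sketch 1 (assumed names): force the optimizer to materialize a subquery
-- instead of merging it into the surrounding JOINs.
WITH good_runs AS (
  SELECT /*+ MATERIALIZE */ run_number, lumi_block
  FROM   goodrunlist
)
SELECT COUNT(*)
FROM   electron e
JOIN   good_runs g
ON     g.run_number = e.run_number
AND    g.lumi_block = e.lumi_block;

-- Sketch 2 (assumed names): store an intermediate selection in its own table,
-- skipping redo logging and using parallel execution to speed up creation.
CREATE TABLE good_electron NOLOGGING PARALLEL 40 AS
SELECT /*+ PARALLEL(e, 40) */ e.*
FROM   electron e
WHERE  e.pt > 25000;
```

Materializing the goodrunlist subquery keeps the optimizer from folding it into the many-way join of the full cutflow query, while the NOLOGGING create-table-as-select is the natural building block if the single analysis query is ever split into explicit intermediate tables.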