Next-Generation Databases Miguel Branco on behalf of the RAW team.

1 Next-Generation Databases Miguel Branco on behalf of the RAW team

2 Trends
More complex hardware
–Multicores, GPUs, Cloud, NUMA*, PoP+SoC**, …
More complex questions
–“Last month sales” → “Next month sales”
More complex apps
–Distributed, Service-oriented, Rack-aware, ...
More data analysts
–Ease-of-use, Interactivity, Collaboration, ...
More data
–Volume, File Formats, ...
* Non-uniform memory architectures
** Package on Package, System on a Chip

4
No data loading
–No “physical” data copy: support existing file formats
No database tuning
–Instead, self-tuned based on actual usage patterns
Not restricted to tables
–Add support for trees, vectors, matrices, …
Not just SQL
–Instead, enable domain-specific languages

5 Traditional Database
Data adapts to the query engine
[Diagram labels: DBMS, SQL, CSV, XML, JSON]

6 RAW
Query engine adapts to the data
[Diagram labels: DBMS, SQL, CSV, XML, JSON, RAW lang, “DSL”]

7 How RAW adapts to data
Adapt to format, file instance and query just-in-time
Inputs:
–CSV … containing “good” run numbers
–ROOT … containing physics events
Query:
  SELECT event.jet…
  FROM csv, root
  WHERE csv.RunNumber = root.RunNumber AND root.EF_2mu13 == TRUE AND …
Query plan (diagram): join, scan root, scan csv, filter
Diagram callouts:
–Code Generate the Access Paths
–Code Generate the Query
–Build Position and Data Caches
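The "Build Position and Data Caches" callout can be illustrated with a small sketch. This is not RAW's implementation; it is one plausible reading in which a first scan of a raw CSV buffer records where each row starts, so that later queries jump directly to a row. The names `buildRowOffsets` and `rowAt` are hypothetical, and the file is assumed to be held in memory as a `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: record the byte offset at which each row of an
// in-memory CSV buffer starts. The first full scan pays this cost once.
std::vector<std::size_t> buildRowOffsets(const std::string& csv) {
    std::vector<std::size_t> offsets;
    if (!csv.empty()) offsets.push_back(0);
    // A newline that is the very last byte does not start a new row.
    for (std::size_t i = 0; i + 1 < csv.size(); ++i) {
        if (csv[i] == '\n') offsets.push_back(i + 1);
    }
    return offsets;
}

// Later queries use the cached offsets to jump straight to a row
// instead of re-tokenizing the buffer from byte 0.
std::string rowAt(const std::string& csv,
                  const std::vector<std::size_t>& offsets,
                  std::size_t row) {
    assert(row < offsets.size());
    std::size_t begin = offsets[row];
    std::size_t end = csv.find('\n', begin);
    if (end == std::string::npos) end = csv.size();
    return csv.substr(begin, end - begin);
}
```

A cache like this trades a little memory for avoiding repeated tokenizing of the raw file; a real system would presumably also track field positions and cache converted values.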

8 Adapting to schema & query
Panels: GENERAL-PURPOSE, JUST-IN-TIME
  readInt(); skipField(); readFloat(); skipRestLine();
Remove overhead of generic operators
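To make the contrast between a general-purpose and a just-in-time scan concrete, here is a minimal sketch. It assumes a comma-separated, newline-terminated buffer and a query that needs the first field as an integer and the third as a float. The `Cursor`, `Row` and `Action` types are hypothetical; only the four method names come from the slide. The specialized function is written by hand here to stand in for what a code generator could emit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cursor over an in-memory CSV buffer. Method names follow the
// slide; the implementations are assumptions made for this sketch.
struct Cursor {
    const std::string& buf;
    std::size_t pos;
    explicit Cursor(const std::string& b) : buf(b), pos(0) {}

    // Index of the delimiter (',' or '\n') that ends the current field.
    std::size_t fieldEnd() const {
        std::size_t e = buf.find_first_of(",\n", pos);
        return e == std::string::npos ? buf.size() : e;
    }
    // Step over a trailing ',' but leave '\n' for skipRestLine().
    void finishField(std::size_t end) {
        pos = (end < buf.size() && buf[end] == ',') ? end + 1 : end;
    }
    int readInt() {
        std::size_t e = fieldEnd();
        int v = std::stoi(buf.substr(pos, e - pos));
        finishField(e);
        return v;
    }
    double readFloat() {
        std::size_t e = fieldEnd();
        double v = std::stod(buf.substr(pos, e - pos));
        finishField(e);
        return v;
    }
    void skipField() { finishField(fieldEnd()); }
    void skipRestLine() {
        std::size_t e = buf.find('\n', pos);
        pos = (e == std::string::npos) ? buf.size() : e + 1;
    }
};

struct Row { int id; double value; };
enum class Action { ReadInt, ReadFloat, Skip };

// General-purpose scan: consults a per-field plan for every field of every row.
std::vector<Row> scanGeneric(const std::string& csv,
                             const std::vector<Action>& plan) {
    assert(!plan.empty());
    std::vector<Row> out;
    Cursor c(csv);
    while (c.pos < csv.size()) {
        Row r{0, 0.0};
        for (Action a : plan) {
            switch (a) {
                case Action::ReadInt:   r.id = c.readInt(); break;
                case Action::ReadFloat: r.value = c.readFloat(); break;
                case Action::Skip:      c.skipField(); break;
            }
        }
        c.skipRestLine();
        out.push_back(r);
    }
    return out;
}

// Specialized scan: the straight-line call sequence for this one schema and
// query, with no per-field dispatch left in the inner loop.
std::vector<Row> scanSpecialized(const std::string& csv) {
    std::vector<Row> out;
    Cursor c(csv);
    while (c.pos < csv.size()) {
        Row r{0, 0.0};
        r.id = c.readInt();
        c.skipField();
        r.value = c.readFloat();
        c.skipRestLine();
        out.push_back(r);
    }
    return out;
}
```

Both functions return the same rows; the specialized one simply has the plan baked into its control flow, which is the overhead the slide refers to removing.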

9 Adapting to format
Unroll Columns
  readInt(); skipField(); readFloat(); skipRest();
Free navigation in files
-fieldLength: 10
-tupleLength: 100
-Need fields 2 & 5 of 2nd row
  moveTo(110); readInt(); moveTo(140); readFloat();
Embedded indexes/existing APIs
-Bitmaps, R-Trees etc.
-readNextField() vs. readField(filename,id)
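The `moveTo(110)` and `moveTo(140)` targets follow from simple arithmetic on the fixed lengths: with 1-based rows and fields, the offset is (row - 1) * tupleLength + (field - 1) * fieldLength. The sketch below works this out; `fieldOffset` and `readFixedField` are hypothetical names, and the file is assumed to be an in-memory string with no separators between fixed-width fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte offset of a field in a fixed-width file. Rows and fields are 1-based,
// matching the slide's "fields 2 & 5 of 2nd row".
constexpr std::size_t fieldOffset(std::size_t row, std::size_t field,
                                  std::size_t tupleLength,
                                  std::size_t fieldLength) {
    return (row - 1) * tupleLength + (field - 1) * fieldLength;
}

// With the slide's parameters: (2-1)*100 + (2-1)*10 = 110 and
// (2-1)*100 + (5-1)*10 = 140.
static_assert(fieldOffset(2, 2, 100, 10) == 110, "field 2 of row 2");
static_assert(fieldOffset(2, 5, 100, 10) == 140, "field 5 of row 2");

// Hypothetical reader: jump directly to the field, touching nothing else.
std::string readFixedField(const std::string& file, std::size_t row,
                           std::size_t field, std::size_t tupleLength,
                           std::size_t fieldLength) {
    std::size_t off = fieldOffset(row, field, tupleLength, fieldLength);
    assert(off + fieldLength <= file.size());
    return file.substr(off, fieldLength);
}
```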

10 Ad-Hoc Operators for Raw Data
CSV file: SELECT col9 WHERE col1 < [X]
JIT – OPTION 1: Scan CSV Columns 1,9; Filter; Tuple Construction
JIT – OPTION 2: Scan CSV Column 1; Filter; Scan CSV Column 9; Tuple Construction
–Processes 3/5 raw col9 fields
Fine-grained, raw-data-aware decisions
[Diagram residue: Col1, Col9; 5 4 3 1 2]
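The difference between scanning both columns up front and scanning col9 only after the filter can be sketched as follows. This is an illustration, not RAW's operators: rows are assumed to be already split into raw text fields, "processing" a field means converting it from text to a number, and the function and type names are hypothetical. With assumed col1 values 5, 4, 3, 1, 2 and an assumed threshold X = 4, the late scan converts 3 of the 5 col9 fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical raw row: both fields are still unparsed text.
struct RawRow { std::string col1; std::string col9; };

struct ScanResult {
    std::vector<double> col9;   // col9 values of qualifying rows
    std::size_t col9Parsed;     // how many raw col9 fields were converted
};

// Convert both columns for every row, then filter on col1.
ScanResult scanBothThenFilter(const std::vector<RawRow>& rows, double x) {
    ScanResult r{{}, 0};
    for (const RawRow& row : rows) {
        double c1 = std::stod(row.col1);
        double c9 = std::stod(row.col9);
        ++r.col9Parsed;
        if (c1 < x) r.col9.push_back(c9);
    }
    return r;
}

// Convert col1, filter, and convert col9 only for rows that qualify.
ScanResult filterThenScanCol9(const std::vector<RawRow>& rows, double x) {
    ScanResult r{{}, 0};
    for (const RawRow& row : rows) {
        double c1 = std::stod(row.col1);
        if (c1 < x) {
            r.col9.push_back(std::stod(row.col9));
            ++r.col9Parsed;
        }
    }
    return r;
}
```

Which variant wins depends on selectivity and on how expensive it is to locate and convert the second column, which is presumably why the slide frames it as a fine-grained, raw-data-aware decision.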

11 HEP analysis: Data
ROOT - C++:
  class Event {
    class Muon { float pt, eta; … };
    class Electron { float pt, eta; … };
    class Jet { float pt, eta; … };
    int runNumber;
    vector<Muon> muons;
    vector<Electron> electrons;
    vector<Jet> jets;
  };
RAW:
  Event (eventID INT, runNumber INT)
  Muon (eventID INT, eta FLOAT, pt FLOAT)
  Electron (eventID INT, eta FLOAT, pt FLOAT)
  Jet (eventID INT, eta FLOAT, pt FLOAT)

12 HEP analysis: Queries
“Identify events of interest → Filter out background events → Plot aggregated results in a histogram”
RAW:
  SELECT event
  FROM root:/data1/ATLAS/*.root, csv:/data1/ATLAS/events.csv
  WHERE ( csv.id = event.id AND event.EF_e24vhi_medium1 OR event.EF_e60_medium1
          OR event.EF_2e12Tvh_loose1 OR event.EF_mu24i_tight
          OR event.EF_mu36_tight OR event.EF_2mu13 )
    AND event.muon.mu_ptcone20 < 0.1 * event.muon.mu_pt
    AND event.muon.mu_pt > 20000.
    AND ABS(event.muon.mu_eta) < 2.4
    AND …
ROOT - C++ (1000+ lines of C++):
  for (unsigned int imuon = 0; imuon < …->size(); imuon++) {
    if (((*curr_entries)[jentry].mu_ptcone20)->at(imuon) < 0.1 * ((*curr_entries)[jentry].mu_pt)->at(imuon)
        && ((*curr_entries)[jentry].mu_pt)->at(imuon) > 20000.
        && fabs(((*curr_entries)[jentry].mu_eta)->at(imuon)) < 2.4
        && …
  }
  ...
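For readers who want the muon cuts in isolation, the three conditions shared by the query and the C++ loop can be written as a single predicate. This is a simplified sketch, not the ATLAS analysis code: the `Muon` struct and function names are hypothetical, the fields stand in for `mu_ptcone20`, `mu_pt` and `mu_eta`, and the trigger flags and elided conditions are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat muon record holding only the three quantities used below.
struct Muon {
    double ptcone20;
    double pt;
    double eta;
};

// The three muon cuts from the slide: isolation relative to pt, a minimum pt,
// and a pseudorapidity window.
bool passesMuonCuts(const Muon& m) {
    return m.ptcone20 < 0.1 * m.pt
        && m.pt > 20000.0
        && std::fabs(m.eta) < 2.4;
}

// Count the muons of one event that survive the cuts.
std::size_t countSelectedMuons(const std::vector<Muon>& muons) {
    std::size_t n = 0;
    for (const Muon& m : muons) {
        if (passesMuonCuts(m)) ++n;
    }
    return n;
}
```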

13 RAW vs. the ROOT framework
[Xeon CPU E7-28867 @ 2.13GHz, 1TB HDD - 7200RPM, 192GB RAM]
ROOT: 900 GB in 127 files
CSV: 1 “table” of IDs
Declarative queries + up to 90x improvement

14 RAW for High-Energy Physics
End-users:
–Performance (JIT, codegen, vectorwise, …)
–Easy-to-use (declarative) query language
Infrastructure Providers:
–Data kept in original location & file format
–Declarative query language → More optimization opportunities
“Event” caches
http://dias.epfl.ch/RAW
Thank You!

