Here are my Data Files. Here are my Queries. Where are my Results?
Stratos Idreos*, Ioannis Alagiannis, Ryan Johnson§, Anastasia Ailamaki§
University of Toronto; *CWI, Amsterdam; École Polytechnique Fédérale de Lausanne
CERN ($20B physics experiment)
Last year: 35 PB of data (experiments, simulation, user data), all stored in flat files. The database stores only metadata; everything else is handled by custom solutions and scripts, almost never by a DBMS. Why?
Why don't people use a DBMS?
The traditional workflow is requirements analysis, then defining a schema, loading the data, and tuning the system, iterating until the setup converges. With evolving requirements, it never converges.
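Data import & tuning
Flat files must first be massaged into the right format and loaded as tuples into the database; from then on, the DBMS owns the data. Why a complete load? Do I need to hire a DB expert? Which format? Why wait? For many users, it is not worth the startup cost.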
Avoiding up-front overheads
[Figure: a flat file with attributes a1, a2, a3, …, a10, …, with its hot data highlighted]
DBMS actions are driven by the workload: queries run directly over flat files, hot data is loaded adaptively, and tuning happens in the background. Flat files become an integral part of the system.
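To make "DBMS actions driven by workload" concrete, here is a minimal sketch, assuming the flat file is CSV text holding integers and that a column counts as "hot" once it has been queried a fixed number of times. The class name `WorkloadDrivenStore` and the `hot_threshold` knob are hypothetical illustrations, not the authors' design, and background tuning is not modeled.

```python
import csv
import io
from collections import Counter


class WorkloadDrivenStore:
    """Hypothetical sketch: answer queries straight from a CSV flat file and
    load a column into memory once the workload shows it is hot."""

    def __init__(self, flat_file_text, hot_threshold=2):
        self.flat_file_text = flat_file_text  # raw CSV text stands in for the flat file
        self.hot_threshold = hot_threshold    # assumed tuning knob, not from the slides
        self.access_counts = Counter()        # per-column query counts (the workload)
        self.loaded = {}                      # column index -> values held "in the DBMS"

    def _scan_flat_file(self, col):
        # Query over the flat file: re-parse the raw text on every call.
        reader = csv.reader(io.StringIO(self.flat_file_text))
        return [int(row[col]) for row in reader]

    def column(self, col):
        self.access_counts[col] += 1
        if col in self.loaded:
            return self.loaded[col]
        values = self._scan_flat_file(col)
        if self.access_counts[col] >= self.hot_threshold:
            # Hot data: keep it so later queries skip the flat file entirely.
            self.loaded[col] = values
        return values


data = "1,10,100\n2,20,200\n3,30,300\n"
store = WorkloadDrivenStore(data, hot_threshold=2)
print(sum(store.column(0)))  # 6, answered from the flat file
print(sum(store.column(0)))  # 6, column 0 is now hot and gets loaded
print(sorted(store.loaded))  # [0]
```

Nothing is loaded up front; only columns the queries keep touching ever move into the store, which is the sense in which the flat file stays part of the system.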
Dynamic file adaptation
[Figure: the original flat file (a1, a2, a3, …) is rewritten into new flat files (a1, a2, …, a4, …)]
a) Parse only the columns a query needs. b) Create a new flat file per attribute. Analyze non-tokenized attributes.
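The two adaptation steps can be sketched as follows, assuming comma-separated lines and zero-based column indexes. Selective parsing here means splitting each line only as far as the highest needed column, so the rest of the line stays one untokenized string; that is one possible reading of the slide. The function names and the `a<i>.col` file naming are hypothetical.

```python
import tempfile
from pathlib import Path


def parse_needed_columns(lines, needed):
    """Selective tokenizing: split each CSV line only up to the highest needed
    column; everything after it remains a single untokenized string."""
    max_col = max(needed)
    out = {c: [] for c in needed}
    for line in lines:
        # maxsplit stops tokenizing once the needed columns are separated
        fields = line.rstrip("\n").split(",", max_col + 1)
        for c in needed:
            out[c].append(fields[c])
    return out


def write_attribute_files(columns, directory):
    """Write one new flat file per parsed attribute (hypothetical a<i>.col naming)."""
    paths = {}
    for c, values in columns.items():
        path = Path(directory) / f"a{c + 1}.col"
        path.write_text("\n".join(values) + "\n")
        paths[c] = path
    return paths


original = ["1,x,7,9,a,b\n", "2,y,8,4,c,d\n"]
cols = parse_needed_columns(original, needed=[0, 3])
with tempfile.TemporaryDirectory() as d:
    paths = write_attribute_files(cols, d)
    print(paths[3].read_text().split())  # ['9', '4']
```

A later query that touches only a1 or a4 can read the small per-attribute file instead of re-tokenizing the wide original.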
Adaptive loading in practice
Amortize the loading cost over the query sequence. Q1: loading cost + first query, then constant performance for all queries. Alternatives: a) on-the-fly load, b) cache data, with filtering on the fly. Q1: half the cost; Q11: load from the flat file (FF). Example query: select sum(a1), avg(a2) from R where a1
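The on-the-fly load with caching can be sketched as below, assuming R is CSV text of integers with a1 and a2 in the first two columns. The slide's filter is cut off, so the predicate `a1 < threshold` is a hypothetical stand-in, as is the class name `AdaptiveLoader`. The counter `flat_file_scans` shows the cost being amortized: only the first query parses the file.

```python
import csv
import io


class AdaptiveLoader:
    """Sketch of on-the-fly loading: the first query that needs a column parses
    it from the flat file and caches it; later queries reuse the cached column."""

    def __init__(self, flat_file_text):
        self.flat_file_text = flat_file_text
        self.cache = {}           # column index -> list[int]
        self.flat_file_scans = 0  # how many times the flat file had to be parsed

    def _load(self, cols):
        missing = [c for c in cols if c not in self.cache]
        if not missing:
            return  # everything needed is already cached
        self.flat_file_scans += 1
        loaded = {c: [] for c in missing}
        for row in csv.reader(io.StringIO(self.flat_file_text)):
            for c in missing:
                loaded[c].append(int(row[c]))
        self.cache.update(loaded)

    def sum_avg(self, threshold):
        """select sum(a1), avg(a2) from R where a1 < threshold
        (hypothetical predicate; the slide's filter is truncated)."""
        self._load([0, 1])  # a1 and a2 are assumed to be columns 0 and 1
        a1, a2 = self.cache[0], self.cache[1]
        rows = [i for i, v in enumerate(a1) if v < threshold]
        if not rows:
            return 0, None  # SQL would give NULL for avg over no rows
        return sum(a1[i] for i in rows), sum(a2[i] for i in rows) / len(rows)


R = "1,10,5\n2,20,6\n3,30,7\n4,40,8\n"
db = AdaptiveLoader(R)
print(db.sum_avg(threshold=3))  # (3, 15.0), first query pays the load
print(db.sum_avg(threshold=5))  # (10, 25.0), served from the cache
print(db.flat_file_scans)       # 1
```

Only the columns the query touches are loaded, so the unused third column never leaves the flat file.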
Invisible DBMS: towards a fully autonomous system
Give me your data as is. Give me your queries. Get your results! The challenge is to make this invisible: the system supports SQL plus your existing tools (grep, awk), built on an adaptive kernel, adaptive loading, and an adaptive data store.