Lightning Queries
Miguel Branco
Obs. 1: Eating our own (dog) food
  How many of you use databases to store your own data?

Obs. 2: Data Deluge
  [Figure labels: Data, Database]
  Which one are we going to “move”?
“Lightning Fast” Queries?
  [Figure: Time axis showing Loading “Overhead” and Preparation “Overhead”]
In-situ databases!
  Databases that operate directly on raw data files

  Large collections of files
  Integration with existing tools
  Multiple data formats
  Changing areas of interest
  … lack of trust in database vendors
  … databases “forever owning” the data
Great, but what about …
  … performance?
  Competitive
Components: Positional Maps, Caching, File System integration, Indices

Raw file example (CSV):
  Year,Make,Model,Description,Price
  1997,Ford,E350,"ac, abs, moon",
  ,Chevy,"Venture ""Extended Edition""","",
  ,Chevy,"Venture ""Extended Edition, Very Large""","",
  ,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",

Caching
  != Data Loading
  User does not need to control when, what, how or where data is cached
  Blocks of raw files
  Row-store
  Col-store
  Best suited for …
    raw file format
    user queries

File System integration
  Trap FS calls to maintain caches
  Scan FS buffer to build caches or maps

Indices
  Indices over …
    raw files
    cached data

[Figure labels: In-Situ Database, Usability, Data Deluge]

Positional Maps + Caching + Indices + File System Integration
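The positional-map and caching ideas can be illustrated with a minimal sketch. It is not the system presented here: the class `PositionalMap`, its methods, and the in-memory `SAMPLE` bytes are all hypothetical names introduced for illustration, and the sketch assumes one record per line (no newlines inside quoted fields). A first scan records the byte offset of each row in the raw file; later queries jump to those offsets instead of rescanning, and parsed columns are kept in a cache that the user never manages explicitly.

```python
import csv
import io

# Illustrative raw file held in memory (an assumption; a real system would
# read the file from disk). Rows follow the shape of the slide's CSV sample.
SAMPLE = (
    b'Year,Make,Model,Description,Price\n'
    b'1997,Ford,E350,"ac, abs, moon",\n'
    b',Chevy,"Venture ""Extended Edition""","",\n'
    b',Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",\n'
)


class PositionalMap:
    """Hypothetical helper: byte offsets of rows in a raw CSV, plus a column cache."""

    def __init__(self, raw):
        self.raw = raw            # the raw file contents, never converted or loaded
        self.header = []          # column names from the first line
        self.row_offsets = []     # byte offset at which each data row starts
        self.column_cache = {}    # column index -> list of parsed values

    def build(self):
        """One sequential scan that records where each data row begins."""
        buf = io.BytesIO(self.raw)
        self.header = next(csv.reader([buf.readline().decode("utf-8")]))
        while True:
            pos = buf.tell()
            line = buf.readline()
            if not line:
                break
            if line.strip():      # skip blank lines
                self.row_offsets.append(pos)

    def row(self, i):
        """Parse only row i, jumping straight to its recorded offset."""
        start = self.row_offsets[i]
        end = self.raw.find(b"\n", start)
        if end == -1:
            end = len(self.raw)
        # csv.reader handles quoted commas and doubled quotes within the line.
        return next(csv.reader([self.raw[start:end].decode("utf-8")]))

    def column(self, name):
        """Return one column, parsing it on first use and caching the result."""
        idx = self.header.index(name)
        if idx not in self.column_cache:
            self.column_cache[idx] = [
                self.row(i)[idx] for i in range(len(self.row_offsets))
            ]
        return self.column_cache[idx]
```

To try it, create `PositionalMap(SAMPLE)`, call `build()` once, then call `row(2)` or `column("Model")`. The cache fills as a side effect of queries, which is the sense in which caching differs from an explicit loading step.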
EPFL