Intel “Big Data” Science and Technology Center Michael Stonebraker.


1 Intel “Big Data” Science and Technology Center Michael Stonebraker

2 Context
- Intel held a national “beauty contest” to locate their next S&T center
- MIT won, with a “Big Data” proposal
  - 160 proposals
- $2.5M per year for 3-5 years, plus 5 Intel scientists
- 20 PIs, half at MIT

3 Big Data Means What?
- Volume too large
  - Stupid analytics (i.e. SQL) solved by commercial data warehouse products
  - Smart analytics (predictive modelling, machine learning, …)
- Velocity too big
  - Drink from a firehose
- Variety too large
  - Data integration problem
- And what does this mean to computer architecture?

4 Big Data Means What?
- Volume too large: smart analytics
  - Array data bases
  - Parallel algorithms
  - Integration of linear algebra
  - Scalable visualization
- Velocity too big
  - Main memory DBs
- And what does this mean to computer architecture?
  - Many core
  - Son-of-flash
  - Xeon Phi

5 Array Data Bases
- Elasticity in SciDB
- Query optimizer for SciDB
- Genomics benchmark
  - Run on SciDB, SciDB + Phi, column stores, row stores, MADlib, Hadoop
- Graphs as sparse arrays
- EarthDB
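The "graphs as sparse arrays" item can be illustrated with a small sketch. This is not SciDB code: it assumes a plain dictionary that stores only the nonzero cells of the adjacency array, and the names `build_sparse_adjacency` and `bfs_levels` are hypothetical. Breadth-first search then looks like a repeated product of a sparse frontier vector with that array.

```python
from collections import defaultdict

def build_sparse_adjacency(edges):
    """Keep only the nonzero cells (src, dst) of the adjacency array."""
    rows = defaultdict(set)
    for src, dst in edges:
        rows[src].add(dst)
    return rows

def bfs_levels(adj, start):
    """Breadth-first search as repeated sparse vector-times-array steps.

    The frontier is a sparse vector (a set of node ids); each step
    expands it through the nonzero cells of the adjacency array.
    """
    levels = {start: 0}
    frontier = {start}
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        for node in frontier:
            for nbr in adj.get(node, ()):
                if nbr not in levels:      # first visit fixes the level
                    levels[nbr] = depth
                    next_frontier.add(nbr)
        frontier = next_frontier
    return levels

# Illustrative toy graph (made-up data).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(bfs_levels(build_sparse_adjacency(edges), 0))
```

Storage is proportional to the number of edges rather than to the square of the number of nodes, which is the main reason a sparse array representation is attractive for graphs.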

6 Scalable Algorithms
- Parallelizing locality sensitive hashing
- Other algorithms people are going to work in other areas
  - Pick your favorite algorithm, parallelize it and make it scale
- Scalable Julia
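As a rough illustration of why locality sensitive hashing parallelizes well, here is a minimal sketch of the random-hyperplane variant (one common LSH family, chosen here as an assumption; the slides do not say which scheme the project uses). Each vector is hashed independently, so the hashing step is a plain parallel map. All function names are hypothetical, and a thread pool is used only to show the structure, not to claim a speedup in CPython.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(vectors, planes, workers=4):
    """Hash every vector independently, then group vector ids by signature."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sigs = list(pool.map(lambda v: signature(v, planes), vectors))
    buckets = defaultdict(list)
    for i, sig in enumerate(sigs):
        buckets[sig].append(i)
    return buckets

# Illustrative data: vectors 0 and 1 point the same way, vector 2 is opposite.
planes = make_hyperplanes(dim=3, n_bits=8, seed=1)
vectors = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
index = build_index(vectors, planes)
print(index[signature(vectors[0], planes)])
```

Vectors with a small angle between them agree on most bits, so candidates for a similarity search come from the same or nearby buckets instead of from a scan over all vectors.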

7 Integration of Linear Algebra
- Hardly anybody can beat BLAS/LAPACK/ScaLAPACK
  - 10^5 difference between Python and Intel-optimized C++
  - If you write operation X, chances are you will lose to Jack Dongarra by an order of magnitude
  - Don’t fight the wizard

8 Integration of Linear Algebra
- DBMS + ScaLAPACK
  - Federation required
  - Resource manager required
  - Recoverable ScaLAPACK required
- Someday
  - A common storage format
  - Would make ACID much easier, …

9 Visualization
- Resolution reduction
  - Using “explain”
- Choose the rendering automatically
  - Decision tree
- Smart prefetch
- Integrate with SciDB backend and Stanford visualizer front end

10 High Velocity
- Big pattern, little state
  - Find me a “banana” followed within 10 msec by a “strawberry”
  - Historically CEP
- Big state, little pattern
  - Assemble my global real-time risk
  - Main memory DBMS
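The "big pattern, little state" case can be sketched in a few lines. This is a toy matcher, not a CEP engine: it assumes events arrive in timestamp order as `(timestamp_ms, symbol)` pairs, and it pairs each "strawberry" with the most recent "banana" only. The function name and the matching policy are assumptions made for illustration.

```python
def find_followed_by(events, first, second, window_ms):
    """Return (t_first, t_second) pairs where `second` arrives at most
    window_ms after the most recent `first`.

    The only state kept between events is one timestamp, which is the
    "little state" side of the trade-off.
    """
    last_first = None
    matches = []
    for ts, sym in events:
        if sym == first:
            last_first = ts
        elif (sym == second and last_first is not None
              and ts - last_first <= window_ms):
            matches.append((last_first, ts))
    return matches

# Illustrative stream (made-up data).
stream = [
    (0, "banana"), (5, "strawberry"),     # 5 ms apart: match
    (20, "banana"), (40, "strawberry"),   # 20 ms apart: too late
    (41, "banana"), (50, "apple"), (51, "strawberry"),  # 10 ms apart: match
]
print(find_followed_by(stream, "banana", "strawberry", window_ms=10))
```

The "big state, little pattern" case is the opposite shape: the per-event logic is trivial, but it must update a large amount of stored data, which is where a main memory DBMS fits.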

11 High Velocity
- Lots of commonality between CEP and MM DBMS
- We are adding queues/windows to H-Store
- It’s clear we will do ACID – CEP as fast as CEP
- I predict the death of CEP

12 High Velocity: Other Predictions
- Death of ARIES
  - Command logging much faster than data logging
- Death of disk-oriented OLTP data bases
  - H-Store with anti-caching is wildly faster than MySQL with or without Memcached
- Trying an emulator for “son of flash”
  - Will make MM DBMSs even more attractive
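To make the command logging versus data logging contrast concrete, here is a minimal sketch under stated assumptions: transactions are deterministic stored procedures, a command log records only the procedure name and its arguments, and a data log records one before/after image per modified record. `ToyStore`, the two procedures, and the account data are all hypothetical; this shows the log shapes only and does not measure speed.

```python
def transfer(db, src, dst, amount):
    db[src] -= amount
    db[dst] += amount

def add_interest(db, rate_percent):
    for key in db:
        db[key] += db[key] * rate_percent // 100

# Deterministic stored procedures, looked up by name during replay.
PROCEDURES = {"transfer": transfer, "add_interest": add_interest}

class ToyStore:
    def __init__(self, snapshot):
        self.db = dict(snapshot)
        self.command_log = []  # one entry per transaction
        self.data_log = []     # one entry per modified record

    def execute(self, name, *args):
        before = dict(self.db)
        PROCEDURES[name](self.db, *args)
        self.command_log.append((name, args))
        for key, new in self.db.items():
            if before[key] != new:
                self.data_log.append((key, before[key], new))

def recover_from_commands(snapshot, command_log):
    """Rebuild state by re-running the logged procedures on a snapshot."""
    db = dict(snapshot)
    for name, args in command_log:
        PROCEDURES[name](db, *args)
    return db

snapshot = {"a": 100, "b": 50, "c": 0}
store = ToyStore(snapshot)
store.execute("transfer", "a", "b", 30)
store.execute("add_interest", 10)
print(len(store.command_log), len(store.data_log))
print(recover_from_commands(snapshot, store.command_log) == store.db)
```

The command log grows with the number of transactions, while the data log grows with the number of records each transaction touches. The price is that recovery has to re-execute the procedures, which only works if they are deterministic.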

13 Many Core
- 1000 cores will give major heartburn to all system software
  - Traditional DBMSs will collapse
- DBMSs cannot have shared data structures
  - H-Store approach
- Move the computation
  - Hardware-supported “move”
  - New concurrency control algorithms (revival of DORA?)
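The "no shared data structures, move the computation" idea can be sketched as follows. This is a loose illustration in the spirit of partitioned, single-threaded execution, not H-Store's actual design: each partition is owned by exactly one thread, and an operation is shipped to the owning partition as a message instead of locking shared state. `Partition` and `PartitionedStore` are hypothetical names, and the message queues stand in for whatever "move" mechanism real hardware or software would provide.

```python
import queue
import threading

class Partition(threading.Thread):
    """Owns a private dict. Only this thread reads or writes it while
    running, so the data itself needs no locks or latches."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.data = {}

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:            # shutdown signal
                break
            op, key, value, reply = msg
            if op == "put":
                self.data[key] = value
                reply.put(True)
            else:                      # "get"
                reply.put(self.data.get(key))

class PartitionedStore:
    def __init__(self, n_partitions):
        self.parts = [Partition() for _ in range(n_partitions)]
        for part in self.parts:
            part.start()

    def _send(self, op, key, value=None):
        # Route the operation to the partition that owns the key.
        reply = queue.Queue(maxsize=1)
        owner = self.parts[hash(key) % len(self.parts)]
        owner.inbox.put((op, key, value, reply))
        return reply.get()

    def put(self, key, value):
        return self._send("put", key, value)

    def get(self, key):
        return self._send("get", key)

    def close(self):
        for part in self.parts:
            part.inbox.put(None)
        for part in self.parts:
            part.join()

store = PartitionedStore(4)
store.put("risk:desk1", 42)
print(store.get("risk:desk1"))
store.close()
```

Single-partition operations never contend with each other. Operations that span partitions are the hard part, which is where new concurrency control algorithms would come in.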

