The State of the Art in Supporting “Big Data” by Michael Stonebraker.

Slides:



Advertisements
Similar presentations
Distributed Data Processing
Advertisements

Introducing WatchGuard Dimension. Oceans of Log Data The 3 Dimensions of Big Data Volume –“Log Everything - Storage is Cheap” –Becomes too much data –
ProDeaf Breaks the Communication Barrier Between Deaf and Hearing, Translating Portuguese into Brazilian Sign Language through a Platform Built on Microsoft.
Setting Big Data Capabilities Free How to Make Business on Big Data? Stig Torngaard, Partner Platon.
Don’t Let Anybody Slip into Your Network! Using the Login People Multi-Factor Authentication Server Means No Tokens, No OTP, No SMS, No Certificates MICROSOFT.
Observation Pattern Theory Hypothesis What will happen? How can we make it happen? Predictive Analytics Prescriptive Analytics What happened? Why.
MyCloudIT Removes the Complexity of Moving Cloud Customers’ Entire IT Infrastructures to Microsoft Azure – Including the Desktop MICROSOFT AZURE ISV: MYCLOUDIT.
Introduction to Introduction to Database Systems Rose-Hulman Institute of Technology Curt Clifton.
Chapter 1: Data Models and DBMS Architecture Title: What Goes Around Comes Around Authors: M. Stonebraker, J. Hellerstein Pages: 2-40.
Chapter 4: Database Management. Databases Before the Use of Computers Data kept in books, ledgers, card files, folders, and file cabinets Long response.
With the Help of the Microsoft Azure Platform, Devbridge Group Provides Powerful, Flexible, and Scalable Responsive Web Solutions MICROSOFT AZURE ISV PROFILE:
SPRING 2011 CLOUD COMPUTING Cloud Computing San José State University Computer Architecture (CS 147) Professor Sin-Min Lee Presentation by Vladimir Serdyukov.
PARALLEL DBMS VS MAP REDUCE “MapReduce and parallel DBMSs: friends or foes?” Stonebraker, Daniel Abadi, David J Dewitt et al.
Data Management Capabilities and Past Performance Dr. Srinivas Kankanahalli.
Copyright © 2014 Pearson Education, Inc. 1 It's what you learn after you know it all that counts. John Wooden Key Terms and Review (Chapter 6) Enhancing.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
Presentation to the Housing Technology Conference Tim Cowland- Senior Consultant 27 th February 2014 The Rise of the Housing Cloud.
IPNexus Briefing Instant Messaging and Collaboration.
Introduction To Windows Azure Cloud
Facebook (stylized facebook) is a Social Networking System and website launched in February 2004, operated and privately owned by Facebook, Inc. As.
MapReduce With a SQL-MapReduce focus by Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2
DECISION SUPPORT SYSTEM ARCHITECTURE: The data management component.
With the Help of the Microsoft Azure Platform, Awingu’s Web-Based Workspace Aggregator Enables Concrete and Easy Mobility Scenarios MICROSOFT AZURE ISV.
What Does ‘Big Data’ Mean and Who Will Win? Michael Stonebraker.
© 2012 Datameer, Inc. All rights reserved. Page 1 © 2012 Datameer, Inc. All rights reserved. Hadoop in Financial Services Adam Gugliciello, Solutions Engineer.
Agenda Data Storage Database Management Systems Chapter 3 AOL Time-Warner Case.
Creating New Business Value with Big Data Attivio Active Intelligence Engine®
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
OpenField Consolidates Stadium Data, Provides CRM and Analysis Functions for an Intelligent, End-to-End Solution COMPANY PROFILE : OPENFIELD Founded by.
Datalayer Notebook Allows Data Scientists to Play with Big Data, Build Innovative Models, and Share Results Easily on Microsoft Azure MICROSOFT AZURE ISV.
 Cachet Technologies 1998 Cachet Technologies Technology Overview February 1998.
1 Melanie Alexander. Agenda Define Big Data Trends Business Value Challenges What to consider Supplier Negotiation Contract Negotiation Summary 2.
PANEL SENIOR BIG DATA ARCHITECT BD-COE
Big Data: Electronic Gold And why Oreus should invest in Big Data Thomas Snuverink.
Information Systems in Organizations
CloudWay.ro Gives Clients Fast Invoicing, Stock Management, and Resource Planning via Microsoft Azure and Azure SQL Database MICROSOFT AZURE ISV PROFILE:
CISC 849 : Applications in Fintech Namami Shukla Dept of Computer & Information Sciences University of Delaware iCARE : A Framework for Big Data Based.
Connect Applications and Business Partners in Integration Cloud, the Reliable and Transparent Integration Environment Built on Microsoft Azure MICROSOFT.
Microsoft Azure and DataStax: Start Anywhere and Scale to Any Size in the Cloud, On- Premises, or Both with a Leading Distributed Database MICROSOFT AZURE.
Data-Centric Security and User Access Controls for Hadoop on Microsoft Azure MICROSOFT AZURE APP BUILDER PROFILE: BLUETALON BlueTalon provides data-centric.
Axis AI Solves Challenges of Complex Data Extraction and Document Classification through Advanced Natural Language Processing and Machine Learning MICROSOFT.
Built on the Microsoft Azure Platform, UberCloud Helps Engineers and Software Providers to Offer and Deploy Powerful Cloud Services On Demand MICROSOFT.
Big Data Analytics with Excel Peter Myers Bitwise Solutions.
Information Systems in Organizations Managing the business: decision-making Growing the business: knowledge management, R&D, and social business.
Big Data Analytics Are we at risk? Dr. Csilla Farkas Director Center for Information Assurance Engineering (CIAE) Department of Computer Science and Engineering.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
BIG DATA. The information and the ability to store, analyze, and predict based on that information that is delivering a competitive advantage.
 Cloud Computing technology basics Platform Evolution Advantages  Microsoft Windows Azure technology basics Windows Azure – A Lap around the platform.
Dato Confidential 1 Danny Bickson Co-Founder. Dato Confidential 2 Successful apps in 2015 must be intelligent Machine learning key to next-gen apps Recommenders.
© 2016 Catalyze, Inc. Go-To-Market Services HIPAA Compliance in the Cloud: Catalyze Provides Microsoft Azure Customers with a HITRUST Certified Platform-as-a-Service.
An Introduction To Big Data For The SQL Server DBA.
Improve the Performance, Scalability, and Reliability of Applications in the Cloud with jetNEXUS Load Balancer for Microsoft Azure MICROSOFT AZURE ISV.
Intel “Big Data” Science and Technology Center Michael Stonebraker.
What Does ‘Big Data’ Mean and Who Will Win? Michael Stonebraker
Couchbase Server is a NoSQL Database with a SQL-Based Query Language
Process Improvement Process Identification
Veeam Backup Repository
What is business intelligence?
Marketing Operations Leverages Scalable and Secure Machine Learning, Big Data from Azure “We deal with large streams of sensitive data from our users,
Partner Logo Azure Provides a Secure, Scalable Platform for ScheduleMe, an App That Enables Easy Meeting Scheduling with People Outside of Your Company.
Crypteron is a Developer-Friendly Data Breach Solution that Allows Organizations to Secure Applications on Microsoft Azure in Just Minutes MICROSOFT AZURE.
Carl Data Solutions Collects Utility Sensor and Meter Data to Provide Advanced Reporting, Alarming, and Analytics with Microsoft Azure MICROSOFT AZURE.
Ch 4. The Evolution of Analytic Scalability
Clouds & Containers: Case Studies for Big Data
XtremeData on the Microsoft Azure Cloud Platform:
Big Data Young Lee BUS 550.
McGraw-Hill Technology Education
Big DATA.
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
Presentation transcript:

The State of the Art in Supporting “Big Data” by Michael Stonebraker

2 What is “Big Data” Too much Volume (I have too much data) Too much Velocity (Its coming at me too fast) Too much Variety (Its coming at me from too many places in too many formats)

3 Too Much Data -- The Data Warehouse World (structured data) Mature (and large) commercial market with several well-regarded vendors I know of a couple dozen of these in production use on petabytes of data — E.g. Zynga, E-Bay — That is about 20 Mbytes for every person in the US! No reason why this technology won’t scale as customers want larger installations — Expect data warehouses to get larger by 1-2 orders of magnitude over the rest of this decade.

4 Too Much Data -- The Hadoop/Hive World (semi-structured data) I know over another 20 or so petascale Hadoop installations — E.g. Facebook No reason this technology won’t continue to scale And probably converge with the data warehouse world

5 Too Much Data –- The Data Scientist World Predictive Modelling, data mining, data clustering, recommendation engines, …. Complex analytics – not in SQL Not well understood — World of research, start-ups, … My prediction: — As the world moves from simple analytics to complex analytics, the server side technology will mature to meet the need

6 Too Fast Often a legacy problem — Rise in stock market volume breaking the legacy real-time infrastructure of investment banks Usually solvable by throwing money/hardware at the problem Usually amenable to aggregation in the sensor network to knock down the velocity — E.g. car insurance sensors

7 Too Fast Some problems yet to be solved (query languages, integration of storage with “on-the-wire processing) — But I see no showstoppers here Technology is capable of handling “the firehose” that will result from “the internet of things”

8 Too Many Places Mature technology for integrating 20 data sources — Extract-Transform and Load (ETL) vendors But how to integrate 10,000? — Novartis has 10,000 bench chemists and biologists, each with an (independently constructed) data set of experimental results — Company wants to integrate these 10,000 data sources — And add additional ones from the public web

9 Too Many Places Research problem! — Killing most CIO’s that I know Very active area of investigation Startups in this space If there is any achilles heel in big data, this is it!

10 DBMS Security Works well — i.e. I have never heard of the DBMS screwing up in this area.

11 Encryption Can be entrusted to the DBMS — Appropriate when there are many clients sharing data — Don’t want the encryption key to be on 500 desktops Can be entrusted to the client — Appropriate when there is personal (single user) data — See Nickolai’s talk this afternoon

12 Leaks Usually insiders (think Edward Snowdon) Or unguarded desktops (my password on a post-it note on my PC) No possible way for the DBMS to prevent this

13 However DBMS can write a “command log” (everything everybody did) — Enables after-the-fact auditing — Sniff the log for suspicious behavior (unusual activity) — Would be a nice DBMS add-on But it is a human management problem to actually use it!!!