© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Big Data Directions
Greg Battas, Big Data Chief Technologist

Several shifts beginning in Big Data Architecture
- Big Data is growing up
- Big Data cluster consolidation
- Software defined storage taking root
- Software organizing around a common base
- Purpose-built hardware for Big Data

Comparing Big Data and CI architectures
[Diagram: Converged Infrastructure vs. Big Data; labels: Ethernet switches, blades, SAN switches, shared storage, Argos]
Converged Infrastructure:
- Ethernet designed for flexibility
- Blades allow dense compute nodes
- Storage arrays are shared over a SAN designed to be accessible to any node, so that storage can be dynamically allocated
Big Data:
- Network designed for low cost and high cross-sectional bandwidth
- Argos allows maximum density with mediocre CPU power
- Direct attached storage with minimal hardware resiliency is used for cost and cultural reasons

Big Data Architecture Principles and Pitfalls
Principles:
- Began with a movement away from proprietary storage and databases
- Parallel programming and distributed filesystems on industry-standard hardware
- "Move" compute closer to the data/disk to reduce overhead
- Direct attached storage with software resiliency
- Strong open source culture
- Major ecosystems with rapidly evolving, mix-and-match functionality
Pitfalls:
- Provisioning servers means moving data
- Difficult to quickly "re-slice" a configuration
- No simple sharing of data amongst clusters
- Big Data must be copied to each cluster to leverage various hardware and software
[Diagram: nodes sliced between Hadoop batch processing, HBase event processing, Vertica analytics and SAS VA across two time windows, 12am to 6am and 6am to 12am]
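The "direct attached storage with software resiliency" principle means that durability comes from copying each block to several nodes instead of from a RAID array or SAN. The sketch below is a minimal illustration of that idea. It assumes a replication factor of 3 and uses hypothetical helper names (`place_replicas`, `readable_after_failure`). It picks distinct nodes at random and deliberately ignores the rack awareness a real distributed filesystem would apply.

```python
import random


def place_replicas(block_ids, nodes, replication=3, seed=0):
    """Put each block on `replication` distinct nodes (hypothetical helper).

    Random choice of distinct nodes is a simplification: it ignores rack
    awareness and disk balancing that a real distributed filesystem applies.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    rng = random.Random(seed)  # seeded so the placement is reproducible
    return {block: rng.sample(nodes, replication) for block in block_ids}


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk0", "blk1", "blk2"], nodes)
for node in nodes:
    # With 3 distinct replicas per block, losing any single node keeps data readable.
    print(node, "lost ->", readable_after_failure(placement, node))
```

Resiliency here is purely a software decision. Because the replicas sit on cheap, non-redundant local disks, the cluster tolerates node loss without shared storage hardware.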

Common Wisdom: "Take the Processing to the Data"
- Unlike other apps, big data depends on massive IO to read huge amounts of data from disk
- Traditional SAN approaches, where every block must be shipped over a SAN, do not scale cost-effectively
- Big data scales because the processing happens close to the data, by using internal DAS and shipping work to each node
[Diagram: Traditional IT vs. Big Data App; labels: Ethernet switches, SAN switches, shared storage]
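"Shipping work to each node" is usually done by a scheduler that prefers to run a task on a node already holding a copy of its input block. The sketch below shows a greedy, locality-aware assignment and counts how many megabytes still have to cross the network. It is a simplified illustration, not any particular scheduler's algorithm. The function name, the slot model and the 128 MB block size are all assumptions.

```python
def assign_tasks(block_locations, slots_per_node, block_size_mb):
    """Greedy locality-aware task assignment (hypothetical, simplified).

    block_locations: dict block_id -> list of nodes holding a replica
    slots_per_node:  dict node -> number of free task slots
    Returns (assignment dict block_id -> node, MB read over the network).
    """
    free = dict(slots_per_node)
    assignment = {}
    remote_mb = 0
    for block, holders in block_locations.items():
        local = [n for n in holders if free.get(n, 0) > 0]
        if local:
            node = local[0]  # data-local: read from the node's own disks
        else:
            # No replica holder has capacity: run elsewhere and ship the block.
            candidates = [n for n, s in free.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free slots left")
            node = candidates[0]
            remote_mb += block_size_mb
        free[node] -= 1
        assignment[block] = node
    return assignment, remote_mb


block_locations = {"b0": ["n1", "n2"], "b1": ["n1", "n3"], "b2": ["n1"]}
slots = {"n1": 1, "n2": 1, "n3": 1}
assignment, remote_mb = assign_tasks(block_locations, slots, block_size_mb=128)
print(assignment)  # {'b0': 'n1', 'b1': 'n3', 'b2': 'n2'}
print(remote_mb)   # 128
```

In this toy case only one 128 MB block crosses the network. In a design where every block is read from shared storage over a SAN, all three blocks (384 MB) would.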

Reality: "Take the Processing to the Data"
- Only a portion of the processing can be done locally
  - Shuffles redistribute data across the grid
  - Replication pushes inserts and updates to multiple nodes
- MPP RDBMSs have spent years optimizing this problem
  - They learned that operations should be pushed down if they:
    - are data reducing
    - have complete locality of data
  - They learned that the majority of the CPU power is still needed for work that can't be pushed down
[Diagram label: Reduce]
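The "push down only what is data reducing" rule can be illustrated with a word count. Pre-aggregating on each node (a combiner, in MapReduce terms) is data reducing and purely local, so it can be pushed down. The final merge still needs a shuffle and reduce step after the data moves. This is a minimal simulation with hypothetical names: each inner list stands in for one node's local data, and `zlib.crc32` is used as a stable partitioning hash.

```python
import zlib
from collections import Counter


def shuffle(partitions, num_reducers, combine):
    """Simulate a hash-partitioned shuffle for a word count.

    partitions: one list of words per node (that node's local data).
    combine: if True, pre-aggregate on each node before shipping, i.e.
             push a data-reducing operation down to where the data lives.
    Returns (final counts, number of records sent across the network).
    """
    reducer_inputs = [Counter() for _ in range(num_reducers)]
    records_sent = 0
    for words in partitions:
        if combine:
            records = Counter(words).items()   # one record per distinct word
        else:
            records = [(w, 1) for w in words]  # one record per occurrence
        for word, count in records:
            # Stable hash so the same word always lands on the same reducer.
            target = zlib.crc32(word.encode("utf-8")) % num_reducers
            reducer_inputs[target][word] += count  # reducer sums partial counts
            records_sent += 1
    # Gather the reducers' outputs into one result.
    totals = Counter()
    for partial in reducer_inputs:
        totals.update(partial)
    return totals, records_sent


partitions = [["a", "b", "a", "a"], ["b", "b", "c"]]
plain_totals, plain_sent = shuffle(partitions, 2, combine=False)
combined_totals, combined_sent = shuffle(partitions, 2, combine=True)
print(plain_sent, combined_sent)  # 7 4
```

Both runs produce the same counts. The pushed-down pre-aggregation only shrinks what the shuffle has to move, and the reduce work after the shuffle is still required.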

Software Defined Storage: A Different Approach
- Big Data is often deployed with distributed file systems on industry-standard hardware
- The largest data stores in the world chose to move to industry-standard servers running parallel file systems rather than traditional storage arrays or databases
- HDFS, S3, Swift and Cinder are becoming the most significant interfaces
- Today: a mix of proprietary and open source technologies
- Big data has accelerated the adoption of SDS in other areas

HDFS becoming the common substrate for many Big Data software vendors
[Diagram: Hadoop MapReduce, MPP DBMS, data integration, analytic tools and frameworks, enterprise security and unstructured analytics, all built on HDFS]

Fueling a Shift to Open Source: NoSQL products being adopted by software vendors
- The first wave of Big Data was around batch Hadoop for analytics and ETL offload
  - Often coupled with interactive SQL co-processors
- Now we are seeing growing interest in NoSQL products
  - Commercial ISVs are the canary in the coal mine
  - Some very aggressive projects to port to NoSQL
  - HBase seems to be preferred by ISVs
- Challenges of moving commercial products to NoSQL:
  - SQL language
  - Transactions
  - Joins
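Joins appear among the porting challenges because a key-value store such as HBase offers lookups and scans by row key rather than a SQL join. The join logic therefore moves into the application. The sketch below shows that pattern with plain Python dicts standing in for two tables keyed by row key. It is not the HBase client API; the table layouts and names are hypothetical. In a real store, each customer lookup would be a get (or a batched multi-get) against the cluster.

```python
def client_side_join(orders, customers):
    """Inner-join order rows to customer rows in application code.

    orders:    dict order_id -> {"customer_id": ..., "amount": ...}
    customers: dict customer_id -> {"name": ...}
    Both dicts are hypothetical stand-ins for key-value tables.
    """
    joined = []
    for order_id, order in sorted(orders.items()):
        customer = customers.get(order["customer_id"])  # one key lookup per row
        if customer is None:
            continue  # inner-join semantics: drop orders with no matching customer
        joined.append((order_id, customer["name"], order["amount"]))
    return joined


orders = {
    "o1": {"customer_id": "c1", "amount": 10},
    "o2": {"customer_id": "c2", "amount": 5},
    "o3": {"customer_id": "c9", "amount": 7},
}
customers = {"c1": {"name": "Ann"}, "c2": {"name": "Bo"}}
print(client_side_join(orders, customers))  # [('o1', 'Ann', 10), ('o2', 'Bo', 5)]
```

Everything a SQL engine would handle here (join order, missing keys, batching lookups) becomes application code. That is part of why porting commercial SQL products to NoSQL is hard.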

The Shift to Optimized Hardware
- System on a Chip creates a new model for servers
- The significance of Moonshot goes far beyond packaging
- The power of purpose-built hardware
- The economics of dark silicon
- Acceleration
- Open source opens the door

Where we are working in Big Data
- Allow customers to converge big data clusters
  - Leverage shared resources for multiple big data environments
  - Allow rapid elasticity and provisioning without moving data
  - Ability to store data once and operate on it with different types of compute nodes
- Bring big data software together into a common framework
  - Hadoop, unstructured analytics, MPP DBMS, enterprise security, analytic tools and data integration tools
  - Aligned around a common distributed filesystem (HDFS-compliant)
  - Support multi-temperature data
- Assist ISVs and customers moving to NoSQL
  - Leverage HP intellectual property in databases
- Use Moonshot to leverage the shift to optimized hardware
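"Multi-temperature data" means treating frequently accessed (hot) data differently from rarely accessed (cold) data, for example by placing the tiers on different media. The sketch below classifies datasets by time since last access. The 7-day and 90-day thresholds, the tier names and the dataset names are all hypothetical; real tiering policies depend on the workload and the storage available.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; real tiering policies vary by site and workload.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def classify_temperature(last_access, now):
    """Map a dataset's last-access time to a hypothetical storage tier."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


now = datetime(2013, 6, 1)
for name, last in [("clicks", datetime(2013, 5, 30)),
                   ("orders_q1", datetime(2013, 4, 1)),
                   ("logs_2012", datetime(2012, 1, 1))]:
    print(name, classify_temperature(last, now))
# clicks hot
# orders_q1 warm
# logs_2012 cold
```

When data is stored once on a shared, HDFS-compliant layer, a policy like this can move data between tiers without copying it into separate clusters.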
