Presentation is loading. Please wait.

Presentation is loading. Please wait.

© 2009 VMware Inc. All rights reserved Big Data’s Virtualization Journey Andrew Yu Sr. Director, Big Data R&D VMware.

Similar presentations


Presentation on theme: "© 2009 VMware Inc. All rights reserved Big Data’s Virtualization Journey Andrew Yu Sr. Director, Big Data R&D VMware."— Presentation transcript:

1 © 2009 VMware Inc. All rights reserved Big Data’s Virtualization Journey Andrew Yu Sr. Director, Big Data R&D VMware

2 2 Big Data: Not Just for the Web Giants – Now the Intelligent Enterprise

3 3 Real-time analysis allows instant understanding of market dynamics. Retailers can have intimate understanding of their customers needs and use direct targeted marketing. Market Segment Analysis  Personalized Customer Targeting`

4 4 The Emerging Pattern of Big Data Systems: Retail Example Real-Time Streams Exa-scale Data Store Parallel Data Processing Parallel Data Processing Real-Time Processing Machine Learning Data Science Cloud Infrastructure Analytics

5 5 A single GE Jet Engine produces 10 Terabytes of data in one hour – 90 Petabytes per year. Enabling early detection of faults, common mode failures, product engineering feedback. Post Mortem  Proactively Maintained Connected Product

6 6 Storage: Plan for Peta-scale Data Storage and Processing PB of Data Analytics Rapidly Outgrows Traditional Data Size by 100x

7 7 Cloud Infrastructure Supports Mixed Big Data Workloads Machine Learning Hadoop Real-Time Analytics Change workload types to Real-time Analytics, Machine Learning, Hadoop above cloud infra, too Cloud Infrastructure Machine Learning Hadoop Real-Time Analytics Management Network/Security Storage/Availability Compute

8 8 Cloud Infrastructure Supports Multiple Tenants Change workload types to Real-time Analytics, Machine Learning, Hadoop above cloud infra, too Cloud Infrastructure Management Network/Security Storage/Availability Compute Web User Analytics Financial Analysis Historical Customer Behavior

9 9 Software-defined Datacenter: Compute Agility / Rapid deployment Lower Capex Isolation for resource control and security Operational efficiency Management The Core Values of Virtualization Apply to Big Data Network/Security Storage/Availability Compute

10 10 Strong Isolation between Workloads is Key Hungry Workload 1 Reckless Workload 2 Nosy Workload 3 Cloud Infrastructure

11 11 Consolidation of workloads: Higher Utilization Hadoop 1 Hadoop 2 HBase Without virtualization independent Hadoop clusters each have access to fraction of total physical resources Consolidate and virtualize, -Consolidated cluster has access to entire pool of physical resources -For common use cases, reduce latency on priority jobs on consolidated cluster -Multiple HDFS striped across all physical hosts

12 12 Hadoop batch analysis Big Data Mix of Workloads File System/Data Store Host HBase real-time queries NoSQL Cassandra, Mongo, etc Big SQL Impala, Pivotal HawQ Compute layer Virtualization Host Other Spark, Shark, Solr, Platfora, Etc,…

13 13 Management Software-defined Datacenter: Storage Requirements of Next Generation Storage Network/Security Storage/Availability Compute 10x lower cost of storage Handle explosive data growth Support a variety of application types Solve the privacy and security issues

14 14 Software-defined Storage Enables Fundamental Economics Petabytes Deployed Traditional SAN/NAS Distributed Object Storage HDFS MAPR CEPH Scale-out NAS Isilon, NTAP

15 15 Big-Data using Local Disks Host Top of Rack Switch Servers with Local Disks 16-24 core server 12-24 SATA 2-4TB Disks 10 GbE adapter iSCSI/NFS for Shared Storage for vMotion etc,… High Performance 10GBE Switch per Rack

16 16 Big Data Storage Scale-out Network Storage Elastic Compute Scale-out Network Storage Hadoop Protocol Snapshots Posix Apps Full NFS Access Replication Erasure Coding

17 17 Customer Success: Hadoop as a Service at FedEx Scale-out Isilon Cluster -Shared Data -NAS + Hadoop Elastic vSphere Cluster -Mixed Workloads -vSphere -Existing Rack Mount Servers

18 18 Hadoop Virtual Node 2 NN data node Isilon Storage Configuration for Data/Compute Separation With Isilon Virtualization Host VMDK OS Image – VMDK Shared storage SAN/NAS OS Image – VMDK Hadoop Virtual Node 1 Ext4 Job- tracker Ext4 Temp OS Image – VMDK Ext4 Task- tracker Ext4 Hadoop Virtual Node 3 Ext4 Task- tracker Ext4

19 19 Agile Big Data at FedEx Trusted Isolation Well known auditable platform Security Deploy in minutes Optimize for shift in workload characteristics Agility Create true multi- tenancy Mixed workloads Elasticity

20 20 Breakthrough Use Cases  Web Log Analysis  Initial exploration was around detection of mobile devices accessing the website.  Analysis of 570 billion web server log entries took approximately 9 minutes to complete on a small cluster.  ZIP code Analysis  Analysis of data to determine which ZIP codes are the highest source or destination for shipments.  Shipment Analysis  Analysis of shipment information to determine patterns that may delay a package.

21 21 Cloud Infrastructure is Ready for Big Data – Are you? Cloud Infrastructure

22 22 Q&A


Download ppt "© 2009 VMware Inc. All rights reserved Big Data’s Virtualization Journey Andrew Yu Sr. Director, Big Data R&D VMware."

Similar presentations


Ads by Google