
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Big Decision HPS Performance.


1 Big Decision: HPS Performance CoE. Jimmy ZHAO, June 10, 2013

2 HP’s Big Data Benchmark Strategy

3 HP’s Big Data Community Engagement. HP has led in BI performance for a long time, and we are interested in working with the WBDB to extend that leadership to Big Data. HP is the only company that has ever held the #1 non-clustered TPC-H results at 100GB, 300GB, 1TB, 3TB, 10TB, and 30TB (see attached slide). Today HP continues to lead in non-clustered high-end TPC-H: #1 x86 at 3TB, #1 at 10TB, and #1 at 30TB. HP has more TPC-H publications than any other vendor.

4 Business Intelligence (BI) Performance Leadership: TPC-H non-clustered results.* Sustained leadership in BI performance over several years. Multi-OS proof points: HP-UX, Windows, and Linux. Multi-DB proof points: Oracle, SQL Server, and Sybase. [Chart: results on HP ProLiant (DL380 G7, DL585 G7, DL980 G7) and HP Integrity (Superdome, Superdome 2) systems.] *Results as of July 15, 2010. Cannot be shared externally without additional TPC data.

5 HAVEn: Big Data Analytics Platform. Data sources: social media, IT/OT, images, audio, video, transactional data, mobile, search engine, email, texts, and documents. H, Hadoop/HDFS: catalog massive volumes of distributed data. A, Autonomy IDOL: process and index all information. V, Vertica: analyze at extreme scale in real time. E, Enterprise Security: collect and unify machine data. n, Apps: powering HP Software + your apps.

6 Big Data Benchmarking Problem Statement

7 New Business Model. Ecosystem analytics: know more from the business, partners, and customers, using more business, customer, behavior, effect, and efficiency data from existing systems. SoMolized business (Social & Mobile): social marketing, advertising, and promotion; popularity ranking (Like/Unlike). Mobile Internet: anytime and anywhere, adding time + location data. Data-driven business model: efficiency data (business process tracking and re-engineering) and effectiveness data (marketing effectiveness, customer understanding, customer satisfaction).

8 Requirements for Big Data Benchmarking. Model real-world applications: handle huge data volumes, variable data, and multiple types of analysis. Model real-world infrastructure and technologies. Demonstrate the new business model: changes in business systems, Cloud & Big Data, and the SoMo (Social & Mobile) model. Low cost: reuse the same infrastructure for varied analysis workloads, with a simple framework. High velocity: different kinds of queries, including interactive/ad-hoc queries. Support business growth: in both the number of analysis jobs and the size of data.

9 Decision Support System Definition

10 Traditional Data Mining Process. Diagram: structured and semi-structured data pass through ETL into a data warehouse and data marts, which feed analytics.
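The traditional flow on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not part of Big Decision: the source records, field names, and the per-item data mart are all hypothetical, and plain lists and dicts stand in for the warehouse and the mart.

```python
from collections import defaultdict

# Hypothetical sources: one structured (CSV-like) and one semi-structured (dict) feed.
STRUCTURED_ROWS = [
    "2013-06-01,store_1,item_a,3,9.99",
    "2013-06-01,store_2,item_b,1,19.50",
]
SEMI_STRUCTURED_ROWS = [
    {"date": "2013-06-02", "channel": "web", "item": "item_a", "qty": 2, "price": 9.99},
]

def extract():
    """Extract: pull raw records from every source into one common shape."""
    for line in STRUCTURED_ROWS:
        date, channel, item, qty, price = line.split(",")
        yield {"date": date, "channel": channel, "item": item, "qty": qty, "price": price}
    yield from SEMI_STRUCTURED_ROWS

def transform(record):
    """Transform: normalize types and derive the sales amount."""
    qty = int(record["qty"])
    price = float(record["price"])
    return {"date": record["date"], "channel": record["channel"],
            "item": record["item"], "qty": qty, "amount": round(qty * price, 2)}

def load(records):
    """Load: the cleaned rows become the (in-memory) warehouse fact table."""
    return [transform(r) for r in records]

def build_item_mart(warehouse):
    """Data mart: a subject-specific aggregate (total sales amount per item)."""
    mart = defaultdict(float)
    for row in warehouse:
        mart[row["item"]] += row["amount"]
    return {item: round(total, 2) for item, total in mart.items()}

warehouse = load(extract())
item_mart = build_item_mart(warehouse)   # analytics would query this mart
print(item_mart)
```

Each stage runs as a separate batch step before any analysis can start, which is the separation the later Big Decision slides move away from.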

11 CRISP-DM vs. NG-DM. CRISP-DM: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment (BU-DU-P-M-E-D). NG-DM: Business Understanding, Data Preparation, Modeling, Evaluation, with Machine Learning (BU-P-M-E-ML). NG-DM handles larger data volumes and more complicated analysis, with faster deployment and faster analytics.

12 DSS System: Scope. Diagram: sources are extracted, transformed, and loaded into the DSS system; queries and machine learning run on the data inside the system. Many mining cycles (BU-DU-P-M-E-D, and P-M-E with ML) run mixed and in parallel.

13 DSS System Scope. Data: structured, semi-structured, and unstructured; larger databases; larger and hybrid data warehouses. Analytics: visual and interactive, AI, continuous integration, and more in-database analytics. Annotations: keep growing; slowly shrink.

14 Benchmark Design: Big Decision

15 Why & How? A benchmark for DSS/data mining solutions: everything runs in the same system, which acts as the analytics engine. Reflect the real business model: huge data volumes, with data from social networks, web logs, and comments. Broader data support: semi-structured and unstructured data. Continuous data integration: ETL is just a normal job of the system, and data is integrated whenever it arrives. Big Data analytics: Big Decision = Big TPC-DS! TPC-DS: a mature, proven BI workload with mixed workloads and well-defined scale factors. SoMoized TPC-DS: additional data and dimensions from new data sources; semi-structured and unstructured data; support from TB to PB, or even zettabytes. New TPC-DS generator (Agile ETL): continuous data generation and injection, treated as part of the workload. New massively parallel processing technologies: convert queries to SQL-like queries; include both interactive and regular queries; include machine learning jobs.
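To illustrate what converting a query for a massively parallel processing framework involves, here is a minimal sketch of a SQL aggregation expressed as map, shuffle, and reduce phases in plain Python. The `web_sales` rows and the two-split layout are hypothetical; a real job would run these phases on a framework such as Hadoop MapReduce rather than in one process.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical web_sales rows: (item id, sales amount). Not the TPC-DS schema.
WEB_SALES = [("item_a", 10.0), ("item_b", 5.0), ("item_a", 2.5), ("item_c", 7.0)]

def map_phase(split):
    """Map: emit (key, value) pairs, here (item, amount), for one input split."""
    for item, amount in split:
        yield item, amount

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values, i.e. SUM(amount) ... GROUP BY item."""
    return {key: sum(values) for key, values in groups.items()}

# Treat the input as two splits (e.g. two HDFS blocks), map each, then shuffle and reduce.
splits = [WEB_SALES[:2], WEB_SALES[2:]]
mapped = chain.from_iterable(map_phase(s) for s in splits)
result = reduce_phase(shuffle(mapped))
print(sorted(result.items()))
# SQL equivalent: SELECT item, SUM(amount) FROM web_sales GROUP BY item;
```

Because each split is mapped independently, the map phase parallelizes across data nodes; only the shuffle needs to move data between them.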

16 Big Decision Block Diagram. Sources: TPC-DS sales, web pages, web logs, mobile logs, item reviews, social messages, customer social feedback, SNS marketing, and search & social advertising. Agile ETL (extraction, transformation, load) feeds these sources into the system for marketing analytics.
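The transformation step for a semi-structured source such as a web log can be sketched as parsing each line into a structured row that can join to date and time dimensions. The slide does not give the log format, so this assumes an Apache-style access log line; the regular expression and the output field names are hypothetical.

```python
import re
from datetime import datetime

# Assumed Apache-style access log format; the actual Big Decision log format is not given.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def transform_log_line(line):
    """Turn one semi-structured log line into a structured row, or None if it does not parse."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # malformed lines are skipped rather than failing the whole load
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    size = m.group("size")
    return {
        "ip": m.group("ip"),
        "user": None if m.group("user") == "-" else m.group("user"),
        "date": ts.date().isoformat(),   # key for a date dimension
        "time": ts.time().isoformat(),   # key for a time dimension
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
        "bytes": 0 if size == "-" else int(size),
    }

line = '10.0.0.7 - alice [10/Jun/2013:14:03:27 +0800] "GET /item?id=42 HTTP/1.1" 200 5120'
print(transform_log_line(line))
```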

17 SoMolized Retail Data Model Design. Almost the same data model as TPC-DS, with more data injected: networking data, behavior data, tracking data, and preference data. More complicated data dimensions: time + location. Tables: SM_Web_Sales, SM_Sites, SM_Customer, SM_Promotion, Date_Dim, Time_Dim, Item, Customer_Demographics, Customer_Address, Household_Demographics, Income_Band, Ship_Mode, Web_Page, Warehouse, plus Web & Mobile Log.
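A minimal sketch of part of this star schema as Python dataclasses, with a star join over it. Only the table names come from the slide; every column name (`site_sk`, `latitude`, `net_paid`, and so on) is a hypothetical illustration of how an SM_ fact row might carry the extra site and location dimensions.

```python
from dataclasses import dataclass
from typing import Optional

# Table names follow the slide; every column name here is a hypothetical illustration.

@dataclass(frozen=True)
class DateDim:
    d_date_sk: int
    d_date: str

@dataclass(frozen=True)
class SMSite:
    site_sk: int
    site_name: str              # e.g. a social network or mobile app that referred the sale

@dataclass(frozen=True)
class SMWebSale:
    sold_date_sk: int           # -> Date_Dim
    sold_time_sk: int           # -> Time_Dim
    item_sk: int                # -> Item
    customer_sk: int            # -> SM_Customer
    site_sk: int                # -> SM_Sites
    promo_sk: Optional[int]     # -> SM_Promotion; None when no promotion applied
    latitude: float             # buyer location: the added "location" dimension
    longitude: float
    net_paid: float

def sales_by_site_and_date(sales, dates, sites):
    """Star join: resolve surrogate keys through the dimensions, then sum net_paid."""
    date_by_sk = {d.d_date_sk: d.d_date for d in dates}
    site_by_sk = {s.site_sk: s.site_name for s in sites}
    totals = {}
    for sale in sales:
        key = (site_by_sk[sale.site_sk], date_by_sk[sale.sold_date_sk])
        totals[key] = totals.get(key, 0.0) + sale.net_paid
    return totals

dates = [DateDim(1, "2013-06-10")]
sites = [SMSite(1, "social_a"), SMSite(2, "mobile_app_b")]
sales = [
    SMWebSale(1, 100, 7, 3, 1, None, 31.2, 121.5, 20.0),
    SMWebSale(1, 101, 8, 4, 1, 5, 31.2, 121.5, 5.0),
    SMWebSale(1, 102, 7, 5, 2, None, 39.9, 116.4, 12.5),
]
print(sales_by_site_and_date(sales, dates, sites))
```

Keeping the new social and location attributes as extra keys and columns on the fact table is what lets the model stay "almost the same" as TPC-DS.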

18 Workload Design. Diagram: sources are extracted, transformed, and loaded into the DSS system, where queries and machine learning run on the data in place. Agile ETL: structured, semi-structured, and unstructured DIs handle huge volumes. Mixed, parallel analytics workloads: 90 SQL-like/MR jobs, 9 SQL queries, and 9 ML jobs. Percentages shown: 20%, 30%, 50%. DI: data injectors; MR: MapReduce jobs; ML: machine learning algorithms.
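One way a driver could turn these job counts into concurrent analytics streams is sketched below, assuming each stream runs every job exactly once in its own seeded random order (similar in spirit to TPC-DS throughput streams). The stream construction is a hypothetical illustration; the 20%/30%/50% shares are left out because the slide does not say which workload each applies to.

```python
import random
from collections import Counter

# Job counts as read from the slide: 90 SQL-like/MR jobs, 9 SQL queries, 9 ML jobs.
JOB_COUNTS = {"SQL-like/MR": 90, "SQL": 9, "ML": 9}

def build_stream(job_counts, seed):
    """One analytics stream: every job appears exactly once, in a seeded random order."""
    jobs = [f"{kind}#{i}" for kind, n in job_counts.items() for i in range(1, n + 1)]
    rng = random.Random(seed)   # a per-stream seed keeps benchmark runs reproducible
    rng.shuffle(jobs)
    return jobs

def build_streams(job_counts, n_streams):
    """Several concurrent streams, each a different permutation of the same job set."""
    return [build_stream(job_counts, seed=s) for s in range(n_streams)]

streams = build_streams(JOB_COUNTS, n_streams=4)
print(len(streams[0]), Counter(job.split("#")[0] for job in streams[0]))
```

Giving each stream a different order means concurrent streams hit the system with a genuine mix of SQL, MR, and ML work at any moment, rather than running the same job type in lockstep.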

19 Driver Architecture: Deployment. Flume injectors, an MR driver, and a query driver, coordinated by a batch controller, drive the Big Decision system, which consists of a NameNode, a Secondary NameNode, and data nodes (up to Data Node N).
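The batch controller's role of starting the drivers together and waiting for all of them can be sketched with `concurrent.futures`. The three driver functions are hypothetical stand-ins that only sleep; a real deployment would push data through Flume, submit MapReduce jobs, and send queries to the cluster instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical driver stand-ins: each sleeps per unit of work instead of touching a cluster.
def injector_driver(batch_id, n_chunks):
    for _ in range(n_chunks):
        time.sleep(0.01)        # pretend to push one data chunk into HDFS
    return ("DI", batch_id, n_chunks)

def mr_driver(batch_id, jobs):
    for _ in jobs:
        time.sleep(0.01)        # pretend to run one MapReduce job
    return ("MR", batch_id, len(jobs))

def query_driver(batch_id, queries):
    for _ in queries:
        time.sleep(0.01)        # pretend to run one query
    return ("SQL", batch_id, len(queries))

def run_batch(batch_id):
    """Batch controller: start all drivers for one batch together, wait for all, time the batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(injector_driver, batch_id, 5),
            pool.submit(mr_driver, batch_id, ["mr1", "mr2", "mr3"]),
            pool.submit(query_driver, batch_id, ["q1", "q2"]),
        ]
        results = [f.result() for f in futures]   # re-raises any driver exception
    return results, time.perf_counter() - start

for batch_id in range(2):
    results, elapsed = run_batch(batch_id)
    print(batch_id, results, f"{elapsed:.3f}s")
```

Injection and analytics run in the same batch, so the measured batch time reflects ETL competing with queries for the same resources.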

20 Flume Agile ETL + MR/Query Architecture. Injector domain: a modified, scale-factor-based TPC-DS generator feeds structured, semi-structured, and unstructured DIs. Mining domain: MR drivers and query drivers run MR jobs and queries. A batch controller coordinates both domains.
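A scale-factor-based generator that injects data continuously, rather than in one bulk load, might look like the sketch below. `ROWS_PER_SF`, the row fields, and linear scaling are all assumptions; TPC-DS itself scales fact tables linearly and some dimension tables sub-linearly, and this is not the modified generator itself.

```python
import random
from itertools import islice

# Hypothetical base cardinality: fact rows generated per unit of scale factor.
ROWS_PER_SF = 1_000

def rows_for_scale_factor(sf):
    """Total fact rows for a given scale factor (linear scaling assumed)."""
    return int(ROWS_PER_SF * sf)

def generate_rows(sf, seed=0):
    """Deterministic row generator: the same sf and seed always yield the same data."""
    rng = random.Random(seed)
    for row_id in range(rows_for_scale_factor(sf)):
        yield {"id": row_id, "item_sk": rng.randint(1, 100), "qty": rng.randint(1, 10)}

def inject(sf, chunk_size, seed=0):
    """Agile-ETL style: hand the data to the system chunk by chunk, not as one bulk load."""
    rows = generate_rows(sf, seed)
    while chunk := list(islice(rows, chunk_size)):
        yield chunk

chunks = list(inject(sf=0.5, chunk_size=128))
print([len(c) for c in chunks])
```

Because generation is lazy, each chunk can be injected while earlier chunks are already being queried, which is what treating ETL as part of the workload requires.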

21 Benchmarking Metrics: Peak, Scaling, Consistency. Diagram: the batch controller drives repeated groups of DI, SQL, SQL-like/MR, and ML workloads (some groups without DI). DI: data injectors; MR: MapReduce jobs; ML: machine learning algorithms.
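The slide names the metrics but not their formulas. The sketch below uses one plausible set of definitions, all of them assumptions: peak as the best batch throughput, scaling as throughput gain divided by concurrency gain relative to the first batch, and consistency as one minus the coefficient of variation across repeated runs.

```python
from statistics import mean, pstdev

# Hypothetical metric definitions; the slide gives metric names only.

def throughput(jobs_done, elapsed_s):
    """Jobs completed per second in one batch."""
    return jobs_done / elapsed_s

def peak(throughputs):
    """Peak: best throughput reached across all batches."""
    return max(throughputs)

def scaling_efficiency(concurrency, throughputs):
    """Scaling: throughput gain over the first batch divided by the concurrency gain.
    1.0 means perfectly linear scaling; lower means diminishing returns."""
    base_c, base_t = concurrency[0], throughputs[0]
    return [(t / base_t) / (c / base_c) for c, t in zip(concurrency, throughputs)]

def consistency(repeat_throughputs):
    """Consistency: 1 - coefficient of variation over repeated runs; 1.0 means identical runs."""
    return 1.0 - pstdev(repeat_throughputs) / mean(repeat_throughputs)

concurrency = [1, 2, 4]
tps = [throughput(j, s) for j, s in zip([36, 72, 144], [60, 60, 72])]
print(peak(tps), [round(x, 3) for x in scaling_efficiency(concurrency, tps)],
      round(consistency([1.9, 2.0, 2.1]), 3))
# prints: 2.0 [1.0, 1.0, 0.833] 0.959
```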

22 Thank you. Questions?

