Experience with HiBench: From Micro-Benchmarks toward End-to-End Pipelines. WBDB 2013 Workshop Presentation, Lan Yi, Senior Software Engineer.


1 Experience with HiBench: From Micro-Benchmarks toward End-to-End Pipelines
WBDB 2013 Workshop Presentation
Lan Yi (lan.yi@intel.com), Senior Software Engineer, Intel China Software Center
2013.07.16

2 HiBench
–Micro Benchmarks: Sort, WordCount, TeraSort
–Web Search: Nutch Indexing, PageRank
–Machine Learning: Bayesian Classification, K-Means Clustering
–HDFS: Enhanced DFSIO
See our paper “The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis” in the ICDE’10 workshops (WISS’10).
1. Different from GridMix and SWIM?
2. Micro-benchmarks?
3. Isolated components?
4. End-to-end benchmarks?
5. We need an ETL-Recommendation pipeline.
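To make the micro-benchmark category concrete, here is a minimal single-process sketch of what the WordCount workload computes: a map phase that emits (word, 1) pairs and a reduce phase that sums them per word. It is only an illustration under the assumption of whitespace tokenization; HiBench itself runs the Hadoop MapReduce implementation over data in HDFS, and the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Hypothetical mapper: emit (word, 1) for every whitespace-separated token
    return [(word, 1) for word in line.split()]


def reduce_phase(pairs):
    # Hypothetical reducer: sum the counts per word
    # (grouping in a dict stands in for Hadoop's shuffle/sort step)
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    # Run the map phase over every input line, then reduce all emitted pairs
    return reduce_phase(chain.from_iterable(map_phase(line) for line in lines))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

On a real cluster the map and reduce phases run as separate tasks, which is why WordCount is typically used to stress the shuffle path rather than computation.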

3 ETL-Recommendation (hammer)
[Figure: pipeline diagram with phases ETL, Pref, and TestCF on a Hive-Hadoop cluster (data warehouse). Inputs: TPC-DS sales tables with sales updates; a log table (partitions h1, h2, ..., h24; fields ip, agent, retcode, cookies, WP cookies) with cookie updates. Stages: ETL-sales, ETL-logs, Pref-sales (sales preferences), Pref-logs (browsing preferences), Pref-comb (user-item preferences), item-based collaborative filtering (Mahout, producing an item-item similarity matrix), and an offline test on test data (statistics and measurements).]
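The core of the pipeline turns combined user-item preferences into an item-item similarity matrix. Below is a minimal in-memory sketch of that step, assuming cosine similarity over preference values; the slides do not say which similarity measure the Mahout job used, and the input format and function name are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarity(prefs):
    """prefs: dict mapping (user, item) -> preference value (hypothetical format).

    Returns a dict mapping (item_a, item_b), with item_a < item_b,
    to the cosine similarity of the two items' user-preference vectors.
    """
    # Build one sparse vector per item: user -> preference value
    item_vectors = defaultdict(dict)
    for (user, item), value in prefs.items():
        item_vectors[item][user] = value

    # Precompute the Euclidean norm of each item vector
    norms = {item: sqrt(sum(v * v for v in vec.values()))
             for item, vec in item_vectors.items()}

    sims = {}
    items = sorted(item_vectors)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            va, vb = item_vectors[a], item_vectors[b]
            # Dot product over users who expressed a preference for both items
            dot = sum(va[u] * vb[u] for u in va.keys() & vb.keys())
            # Skip pairs with a zero dot product (e.g. no users in common)
            if dot and norms[a] and norms[b]:
                sims[(a, b)] = dot / (norms[a] * norms[b])
    return sims


if __name__ == "__main__":
    prefs = {("u1", "i1"): 5.0, ("u1", "i2"): 3.0,
             ("u2", "i1"): 4.0, ("u2", "i2"): 4.0, ("u2", "i3"): 1.0}
    for pair, score in sorted(item_similarity(prefs).items()):
        print(pair, round(score, 4))
```

The pairwise loop is quadratic in the number of items, which is exactly why a job like this is pushed onto Hadoop at TPC-DS scale rather than run in a single process.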

4 ETL-Recommendation (hammer): Task Dependencies
[Figure: task dependency graph over ETL-sales, ETL-logs, Pref-sales, Pref-logs, Pref-comb, item-based collaborative filtering, and offline test.]
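The dependency graph determines which tasks can run concurrently. The sketch below groups tasks into "waves" using Python's standard `graphlib.TopologicalSorter`. The edges are an assumption inferred from the stage names (each ETL task feeds its Pref task, both Pref tasks feed Pref-comb, and so on), not a transcription of the original figure.

```python
from graphlib import TopologicalSorter

# Assumed edges, inferred from stage names: each task maps to the set of
# tasks it depends on. The original figure may differ.
DEPENDENCIES = {
    "ETL-sales": set(),
    "ETL-logs": set(),
    "Pref-sales": {"ETL-sales"},
    "Pref-logs": {"ETL-logs"},
    "Pref-comb": {"Pref-sales", "Pref-logs"},
    "Item-based CF": {"Pref-comb"},
    "Offline test": {"Item-based CF"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave do not depend on
    each other, so they could be submitted to the cluster concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose inputs are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(i, wave)
```

Under these assumed edges the sales and log branches proceed in parallel until Pref-comb, after which the pipeline is strictly sequential.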

5 Empirical Data (hammer)
–4-node cluster, RHEL 6.2, CDH 4.1.2
–Intel Xeon E5-2600 @ 2.2 GHz (Sandy Bridge), 2 x 8 x HT = 32 logical cores
–192 GB memory
–WD 7200 rpm disks, 0.3 TB x 12 x 4 = 14.4 TB
–1000M network, 300M~400M/s
–HiBench etl-recomm branch, HiTune 0.9
–Data: sales ~14 GB (TPC-DS scale factor 100), logs ~105 GB

6 Empirical Data (hammer)

7

8 LinkBench
Benchmark for social graph services, originally developed by Facebook on top of MySQL
–Simulates social graph workloads similar to Facebook’s online service
–Key workload properties match Facebook’s real production workload
Different from analytical workloads
Our work
–Port LinkBench to HBase
–On top of Phoenix (SQL support over HBase)

9 Resources
–HiBench: https://github.com/intel-hadoop/HiBench
–HiBench ETL-Recomm branch: https://github.com/intel-hadoop/HiBench/tree/etl-recomm
–LinkBench: https://github.com/intel-hadoop/linkbench
–HiTune: https://github.com/intel-hadoop/HiTune
–Phoenix: https://github.com/intel-hadoop/phoenix

