1 A Characterization of Big Data Benchmarks Wen Xiong, Zhibin Yu, Zhendong Bei, Juanjuan Zhao, Fan Zhang, Yubin Zou, Xue Bai, Ye Li, Chengzhong Xu Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences


1 A Characterization of Big Data Benchmarks Wen.Xiong Zhibin Yu, Zhendong Bei, Juanjuan Zhao, Fan Zhang, Yubin Zou, Xue Bai, Ye Li, Chengzhong Xu Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences

2 Agenda: Background, Motivation, Methodology, Evaluation, Conclusion, Future work

24/05/2015 ETI Confidential 3 Background: requirements of a benchmark suite; characteristics of different workload-input pairs; spatio-temporal data in a real-world system

4 Background (1/3) Requirements of a benchmark suite –a benchmark suite should contain workloads that represent a wide range of application domains. –workloads in a benchmark suite should be as diverse as possible. –a benchmark suite should not contain redundant workloads, keeping simulation or measurement time as short as possible.

5 Background (1/3) Simulation time for different numbers of workload-input pairs: after removing redundancy, the number of workload-input pairs decreases by 30% and simulation time by 40%.

6 Background (2/3) Characteristics of different workload-input pairs –characteristics of a workload as the size of its input data set changes: stable or unstable.

7 Background (3/3) Spatio-temporal data in the Shenzhen Transportation System –GPS trajectory data of taxicabs: 90 million GPS points per day. –Smart card data in the metro transportation system: 15+ million smart cards, 12+ million transaction records per day.

8 Background (3/3) (1) 2000 square kilometers, 18 million people. (2) The road network of Shenzhen is represented by vertices and road segments.

9 Motivation Remove redundancy from a typical benchmark suite; provide a benchmark suite for spatio-temporal data.

10 Motivation (1/2) Remove redundancy from a typical benchmark suite –to decrease the time spent benchmarking the target system by minimizing the number of typical workload-input pairs.

11 Motivation (2/2) Provide a benchmark suite for spatio-temporal data –Representative workloads in our benchmark suite: transaction count (hotregion), spatiotemporal origin-destination (sztod), map matching, hotspot monitoring, spatiotemporal secondary sort.

12 Methodology Typical MapReduce-based workloads; microarchitecture-level metrics; principal component analysis (PCA); hierarchical clustering and K-means clustering.

13 Methodology Typical MapReduce-based workloads (1/2):

index  workload        source
1      sort            HiBench
2      wordcount       HiBench
3      terasort        HiBench
4      bayes           HiBench
5      k-means         HiBench
6      nutch indexing  HiBench
7      pagerank        HiBench
8      hive-join       HiBench
9      hive-aggregate  HiBench
10     grep            DCBench
11     svm             DCBench

14 Methodology Typical MapReduce-based workloads (2/2):

index  workload   source
12     ibcf       DCBench
13     fpg        DCBench
14     hmm        DCBench
15     sztod      our internal program for trajectory data
16     hotregion  our internal program for trajectory data

15 Methodology Microarchitecture-level metrics: –instructions per cycle (IPC) –L1 instruction cache miss ratio –L2 instruction cache miss ratio –last-level cache miss ratio –branch predictions per instruction –branch mispredictions per instruction –off-chip bandwidth utilization
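The metrics above are ratios derived from raw hardware event counts. A minimal sketch of that derivation follows; the counter names and sample values are hypothetical, chosen only for illustration (they are not the paper's measurements).

```python
# Sketch: deriving the microarchitecture-level metrics from raw event counts.
# Counter names and values are hypothetical, not taken from the paper.

def derive_metrics(c):
    """Compute ratio metrics from a dict of raw hardware counter values."""
    return {
        "ipc": c["instructions"] / c["cycles"],
        "l1i_miss_ratio": c["l1i_misses"] / c["l1i_accesses"],
        "l2i_miss_ratio": c["l2i_misses"] / c["l2i_accesses"],
        "llc_miss_ratio": c["llc_misses"] / c["llc_accesses"],
        "branches_per_instr": c["branches"] / c["instructions"],
        "branch_mispred_per_instr": c["branch_misses"] / c["instructions"],
    }

counters = {
    "instructions": 96_000_000, "cycles": 100_000_000,
    "l1i_accesses": 30_000_000, "l1i_misses": 2_670_000,
    "l2i_accesses": 2_670_000, "l2i_misses": 1_200_000,
    "llc_accesses": 1_500_000, "llc_misses": 300_000,
    "branches": 19_200_000, "branch_misses": 518_400,
}
m = derive_metrics(counters)
print(round(m["ipc"], 2))             # 0.96
print(round(m["l1i_miss_ratio"], 3))  # 0.089
```

Each workload-input pair yields one such vector of metrics, which becomes a row of the matrix fed to PCA and clustering.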

16 Methodology Principal component analysis (PCA): –reduces the number of program characteristics (dimensions) while controlling the amount of information that is discarded.
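The PCA step can be sketched as follows: center a workload-by-metric matrix, form its covariance matrix, and extract the dominant eigenvector (the first principal component) by power iteration. This hand-rolled version is purely illustrative, with made-up data; in practice a library routine would be used.

```python
# Sketch of PCA's core idea: find the dominant principal component of a
# workload-by-metric matrix via power iteration. Data below is made up.

def dominant_component(rows, iters=500):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]  # center columns
    # sample covariance matrix of the centered data
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):  # power iteration converges to the top eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    return v

# Toy data: metrics vary strongly along the first dimension, weakly along
# the second, so the first component should point along dimension 0.
data = [[0.0, 0.1], [1.0, 0.0], [2.0, 0.1], [3.0, 0.0], [4.0, 0.1]]
pc1 = dominant_component(data)
print(abs(pc1[0]) > 0.99)  # True: the first metric dominates the component
```

Projecting each workload-input pair onto the leading components gives the low-dimensional vectors that the clustering step consumes.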

17 Methodology Hierarchical clustering –a "bottom-up" approach: each workload-input pair starts in its own cluster, and clusters are merged as one moves up the hierarchy. It allows looking at multiple clustering possibilities simultaneously, and a dendrogram can be used to select the desired number of clusters. K-means clustering –partitions n workload-input pairs into k clusters such that each pair belongs to the cluster with the nearest mean, where k is specified by the user.
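The K-means step can be sketched on one-dimensional "PCA score" values. The toy data, crude initialization, and choice of k=2 are illustrative only; a real run would cluster the measured workload-input pairs in the reduced PCA space.

```python
# Sketch of K-means on 1-D PCA scores. Toy data and k=2 are illustrative.

def kmeans_1d(points, k, iters=50):
    # crude init: spread initial centers across the sorted points
    centers = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest center
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # move each center to the mean of its cluster (keep it if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [0.9, 1.0, 1.1, 4.0, 4.2, 4.4]  # two visibly separated groups
centers, clusters = kmeans_1d(points, k=2)
print(sorted(round(c, 1) for c in centers))  # [1.0, 4.2]
```

Hierarchical clustering differs only in direction: instead of iterating assignments against k fixed centers, it repeatedly merges the two closest clusters and records the merge order for the dendrogram.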

18 Evaluation (instructions per cycle) The IPC values of these sixteen workloads range from 0.72 to 0.96. Wordcount has the lowest IPC and hotregion the highest among these workloads.

19 Evaluation (L1 ICache miss ratio) The L1 instruction cache miss ratios of these typical workloads range from 3.9% to 19.8%, with an average of 8.9%. svm has the lowest L1 instruction cache miss ratio and hive-aggre the highest.

20 Evaluation (L2 ICache miss ratio) The L2 instruction cache miss ratios of these workloads range from 23.7% to 64.9%. On average, the DCBench workloads on the right have a higher L2 instruction miss rate than the HiBench workloads on the left. Overall, the L2 cache is ineffective on our experimental platform.

21 Evaluation (branch predictions per instruction) These values range from 0.18 to 0.23. Hotregion has the lowest number of branch predictions per instruction while nutchindexing has the highest.

22 Evaluation (branch misprediction ratio) These ratios range from 1.5% to 5.6%, with an average of 2.7%. Pagerank has the lowest branch misprediction ratio while nutch indexing has the highest. The results show that our processor's branch predictor matches these typical MapReduce-based applications well.

23 Evaluation (off-chip bandwidth utilization) Among the workloads we evaluated, terasort has the highest off-chip bandwidth utilization, at 14%. Overall, on our experimental platform, processors significantly over-provision off-chip bandwidth for these typical workloads.

24 Evaluation (hierarchical clustering)

25 Evaluation (hierarchical clustering) (1) strong cluster: three workload-input pairs of the same workload clustered together. (2) weak cluster: two workload-input pairs of the same workload clustered together. (3) non-cluster: no workload-input pairs of the same workload clustered together.

index  cluster type    workloads
1      strong cluster  wordcount, sort, terasort
2      weak cluster    sztod, hotregion
3      non-cluster     svm, ibcf
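The strong/weak/non classification above can be sketched as a small function over cluster assignments (pair name mapped to cluster id). The assignments below are hypothetical, chosen only to exercise the three cases.

```python
# Sketch of the strong/weak/non-cluster classification from the slide:
# count how many input pairs of one workload landed in the same cluster.
# The assignment dict is hypothetical, not the paper's actual result.

from collections import Counter

def cluster_type(workload, assignment):
    ids = [c for pair, c in assignment.items()
           if pair.rsplit("-", 1)[0] == workload]  # "sort-60G" -> "sort"
    largest = max(Counter(ids).values())  # biggest same-workload group
    if largest >= 3:
        return "strong"
    if largest == 2:
        return "weak"
    return "non"

assignment = {
    "sort-15G": 1, "sort-30G": 1, "sort-60G": 1,     # all three together
    "sztod-24G": 2, "sztod-49G": 2, "sztod-98G": 3,  # only two together
    "svm-20G": 4,                                    # no same-workload peers
}
print(cluster_type("sort", assignment))   # strong
print(cluster_type("sztod", assignment))  # weak
print(cluster_type("svm", assignment))    # non
```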

26 Evaluation (K-means clustering) Selecting 8 workload-input pairs via K-means clustering:

cluster  workloads                                            representative
1        sztod-98G, hotregion-17G, hmm-16G                    hmm-16G
2        fpg, ibcf-2G                                         fpg
3        sztod-24G, sztod-49G                                 sztod-49G
4        wordcount-15G, wordcount-30G, wordcount-60G,
         svm-20G                                              wordcount-30G
5        nutchindexing
6        hotregion-35G, hotregion-70G, bayes, hive-aggre      hotregion-35G
7        sort-15G, sort-30G, sort-60G, terasort-25G,
         terasort-50G, terasort-100G, hive-join, pagerank     sort-60G
8        kmeans
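The slides do not spell out how each cluster's representative is picked; a common choice, assumed here, is the workload-input pair closest to the cluster centroid in the PCA space. The pair names and coordinates below are hypothetical.

```python
# Assumed selection rule (not stated in the slides): the representative of a
# cluster is the pair nearest its centroid. Coordinates are hypothetical.

def representative(cluster):
    """cluster: dict of pair name -> feature vector (list of floats)."""
    d = len(next(iter(cluster.values())))
    n = len(cluster)
    centroid = [sum(v[j] for v in cluster.values()) / n for j in range(d)]
    def dist2(v):  # squared Euclidean distance to the centroid
        return sum((a - b) ** 2 for a, b in zip(v, centroid))
    return min(cluster, key=lambda name: dist2(cluster[name]))

cluster7 = {
    "sort-15G": [0.2, 1.0],
    "sort-60G": [0.5, 1.1],       # sits near the middle of the group
    "terasort-100G": [0.9, 1.2],
}
print(representative(cluster7))  # sort-60G
```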

27 Evaluation (K-means clustering) sort-60G can be taken as the representative workload-input pair of its group, which includes eight members.

28 Conclusion Redundancy exists in these pioneering benchmark suites –e.g., sort and terasort. The behavior of trajectory data analysis workloads is dramatically affected by their input data sets.

29 Future work Conduct similarity analysis of workload-input pairs at a larger scale –more metrics and larger input sizes. Fully implement a big data benchmark suite for spatio-temporal data –data model, data generator, and typical workload-input pairs.

30 Thank You!