
Predicting performance of applications and infrastructures Tania Lorido 27th May 2011

Problem definition. Objective: predict the utilization of resources (memory, CPU, ...) on different computing systems in order to determine application behavior; to predict performance if the available resources change; and to change the available resources in elastic infrastructures. Three scenarios: benchmark traces on a simulator (INSEE), using the NAS Parallel Benchmarks; real applications on real systems (data from the U. of Florida); and applications running in the cloud (Arsys).

First scenario: INSEE

What is INSEE? The Interconnection Network Simulation and Evaluation Environment. Input: traces containing the messages sent among nodes. Output: execution time, plus many other network-related figures.

Objectives: obtain a dataset by running several traces on the simulator; build different models for execution-time prediction; and learn about ML techniques.

Input traces: the NAS Parallel Benchmark suite, scientific codes implemented in Fortran + MPI that can run on systems of different sizes; tested with 16 or 64 tasks. The benchmarks were run on a real system (a Kalimero-like cluster), capturing the whole list of point-to-point messages sent between every pair of tasks.

Topologies: 2D mesh and 2D torus (figures).

We have... a set of tasks (16 or 64) and a set of nodes (256, a 16x16 torus). How do we assign tasks to nodes?

Partitioning: selecting a set of nodes. Three options: random, band and quadrant. An example: we need 4 nodes, on a mesh topology.

Figures: random, band and quadrant partitions of the mesh.
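As an illustration only (not the code used with INSEE), the three partitioning strategies could be sketched as follows, assuming the nodes of an N x N mesh are numbered row by row; the exact band/quadrant definitions used in the study may differ.

```python
# Hypothetical sketch of the three partitioning strategies on an N x N mesh;
# node (r, c) is identified by the integer r * N + c.
import random

def random_partition(n_nodes, N):
    # Pick any n_nodes nodes of the mesh at random.
    return random.sample(range(N * N), n_nodes)

def band_partition(n_nodes, N):
    # Take nodes in row-major order, i.e. a horizontal strip of the mesh.
    nodes = [r * N + c for r in range(N) for c in range(N)]
    return nodes[:n_nodes]

def quadrant_partition(n_nodes, N):
    # Take a compact square block in one corner of the mesh.
    side = 1
    while side * side < n_nodes:
        side += 1
    block = [r * N + c for r in range(side) for c in range(side)]
    return block[:n_nodes]

print(band_partition(4, 16))      # the first four nodes of the first row
print(quadrant_partition(4, 16))  # a 2x2 block in the corner
```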

Mapping: assigning each task to one of the nodes in the selected set. Two options: random and consecutive. Example with band partitioning:

Figures: random and consecutive mappings onto a band partition.
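Continuing the hypothetical sketch above, the two mapping options could look like this; the node list is a stand-in for whatever the partitioning step returned.

```python
# Hypothetical sketch of the two mapping options: given the node set selected by
# a partitioning strategy, assign each task a node.
import random

def consecutive_mapping(n_tasks, nodes):
    # Task i runs on the i-th node of the partition.
    return {task: nodes[task] for task in range(n_tasks)}

def random_mapping(n_tasks, nodes):
    # Tasks are assigned to the partition's nodes in a random order.
    shuffled = random.sample(nodes, n_tasks)
    return {task: shuffled[task] for task in range(n_tasks)}

nodes = [0, 1, 2, 3]   # e.g. the band partition from the previous sketch
print(consecutive_mapping(4, nodes))
print(random_mapping(4, nodes))
```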

Background noise: in a real environment, several applications compete for the network. We emulate that with random messages sent among the nodes (background noise), at different levels.
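The mechanism could be pictured as in the toy sketch below; the noise "level", message size and generation rule are all assumptions for illustration, not the actual INSEE implementation.

```python
# Toy sketch: background noise as extra messages between random node pairs,
# where `level` controls how many noise messages are added per application message.
import random

def background_noise(n_nodes, n_app_messages, level, size_bytes=1024):
    """Return a list of (src, dst, size) noise messages."""
    noise = []
    for _ in range(int(level * n_app_messages)):
        src, dst = random.sample(range(n_nodes), 2)   # two distinct nodes
        noise.append((src, dst, size_bytes))
    return noise

print(len(background_noise(n_nodes=256, n_app_messages=10_000, level=0.5)))  # 5000 noise messages
```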

Predictive Variables

Experiment: a model for each trace type (7 types). Class variable: execution time, discretized into 3 bins, either by equal width or by equal frequency (height). Classifiers: KNN, Naive Bayes and the J48 decision tree. Evaluation: 10 repetitions of 5-fold cross-validation, measuring accuracy.
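A rough scikit-learn analogue of this experimental setup is sketched below (the study itself used other tooling, e.g. J48 is a Weka classifier, for which DecisionTreeClassifier stands in); the feature matrix and execution times are placeholders.

```python
# Sketch: discretize execution time into 3 bins, then evaluate three classifiers
# with 10 repetitions of 5-fold cross-validation, scoring accuracy.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                             # placeholder predictive variables
y_time = X @ rng.normal(size=5) + rng.normal(size=200)    # placeholder execution times

# 'uniform' = equal-width bins, 'quantile' = equal-frequency (height) bins.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
y = binner.fit_transform(y_time.reshape(-1, 1)).ravel().astype(int)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
for name, clf in [("KNN", KNeighborsClassifier()),
                  ("Naive Bayes", GaussianNB()),
                  ("Decision tree (J48-like)", DecisionTreeClassifier())]:
    acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {acc.mean():.2f} +/- {acc.std():.2f}")
```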

Results (I)

Results (II)

Interpretation of the results: quite good accuracy (80-100%). Background noise does not affect the prediction (its information gain with respect to the class is close to zero). And along the way, we learned about ML techniques.

Second scenario: parallel application data from the U. of Florida

What have they done? They ran a couple of real applications on real systems to obtain datasets, applied several regression techniques (KNN, LR, DT, SVM, ...) to predict execution time and other parameters related to resource usage, and proposed a new algorithm that they compare with the "classical" ones.

Objectives: repeat the experiment (same results?); discretize the variables and apply classification techniques; multidimensional prediction.

Real applications: two bioinformatics codes, BLAST (Basic Local Alignment Search Tool) and RAxML (Randomized Axelerated Maximum Likelihood).

… running on real systems

Datasets are available. BLAST: 6592 data points, with two class variables, execution time (seconds) and output size (bytes). RAxML: 487 data points, with two class variables, execution time (seconds) and Resident Set Size, RSS (bytes).

Predictive variables - RAxML

Attribute selection: different sets chosen by the authors.

Testing different classifiers…

First experiment - Regression. Evaluation: 10 repetitions of 10-fold cross-validation. Evaluation criterion: percentage error, PE_i = 100 * |f_i - a_i| / a_i, where f_i is the forecast value and a_i the actual value; the Mean Percentage Error (MPE) averages PE_i over all test points.
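A minimal sketch of this metric, assuming the absolute-value form of the percentage error given above:

```python
# Mean Percentage Error: average of 100 * |f_i - a_i| / a_i over the test points.
import numpy as np

def mean_percentage_error(forecast, actual):
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.mean(100.0 * np.abs(forecast - actual) / actual)

print(mean_percentage_error([110, 95, 210], [100, 100, 200]))  # -> 6.67 (approx.)
```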

Results

Second experiment - Classification. The output variable is discretized into 4 bins, either by equal width or by equal frequency (height). The predictive variables are discretized with the Fayyad-Irani method, which forms groups that try to minimize entropy. Same classifiers as before, except Linear Regression and SVM. Classifier evaluation criterion: accuracy.
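A tiny numpy-only illustration of the difference between the two ways of cutting the output variable into 4 bins; the vector of execution times is a placeholder.

```python
# Equal-width vs equal-frequency ("height") discretization of a skewed variable.
import numpy as np

y = np.array([1.0, 1.2, 1.3, 2.0, 2.1, 5.0, 9.0, 20.0])

# Equal width: 4 intervals of the same length between min(y) and max(y).
width_edges = np.linspace(y.min(), y.max(), 5)[1:-1]
width_bins = np.digitize(y, width_edges)

# Equal frequency: 4 intervals containing roughly the same number of points.
freq_edges = np.quantile(y, [0.25, 0.5, 0.75])
freq_bins = np.digitize(y, freq_edges)

print(width_bins)  # [0 0 0 0 0 0 1 3]: most points fall into the first bin
print(freq_bins)   # [0 0 1 1 2 2 3 3]: points spread evenly across the bins
```

This toy example also hints at why width-based binning can yield deceptively high accuracy on skewed data: when most points land in a single bin, even a trivial classifier scores well (the width-based discretization is indeed discarded later in the talk).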

Results

Interpretation: height-based discretization gives 65-75% accuracy; width-based discretization gives 92-96% accuracy... BUT...

Attribute selection: the information gain with respect to the class is 0 (or close to it) for some variables, and the previous attribute selection was based only on the authors' criterion. So we apply automatic selection: attribute evaluator CfsSubsetEval, search method BestFirst. And the results...
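CfsSubsetEval and BestFirst are Weka components; as a rough, hypothetical analogue in scikit-learn, one can rank attributes by information gain (mutual information with the class) and keep the informative ones. Note this is not the same algorithm as CFS, which also penalizes redundancy between attributes.

```python
# Sketch: flag low-information attributes and keep the k best by mutual information.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, SelectKBest

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))                    # placeholder attributes
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # class depends only on the first two

scores = mutual_info_classif(X, y, random_state=0)
print(np.round(scores, 3))             # near-zero scores flag uninformative attributes

selector = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)
print(selector.get_support())          # which attributes are kept
```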

Conclusions: the regression experiment was repeated with the same results; width-based discretization was discarded; and we obtained "the same results" after attribute selection. Next step: multidimensional prediction, i.e. execution time and output size for BLAST, and execution time and memory size (RSS) for RAxML.

Third scenario: prediction of resource demands in cloud computing (this is future work).

What does Arsys offer? (I) Traditional application and web hosting, and an IaaS cloud computing platform.

What does Arsys offer? (II) A tool for clients to create and manage their own VMs (RAM, number of cores, disk space). Theoretically there are no limits on resource usage, and resources can be changed dynamically: elasticity.

What do they want? A tool that monitors resource utilization by a user's VM and predicts future utilization, in order to proactively modify resource reservations and so optimize application performance and cost. Initially we will focus on the prediction part.

Variables to predict (an example): used RAM (MB), used swap (MB), free disk space (MB), disk performance (KB/s), processor load (MHz), processor use (%), and network bandwidth usage (Kb/s).

Approaches: 1/0 predictions based on a threshold (will a variable reach a certain value?); interval-based predictions; regression; time series; prediction based on trends. A toy sketch of the threshold idea follows.
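Since this scenario is future work, the sketch below is purely illustrative: it fits a linear trend to recent samples of one monitored variable and answers the 1/0 question of whether it will cross a threshold within a given horizon. The variable, threshold and horizon are assumptions.

```python
# Toy threshold predictor: extrapolate a linear trend over recent samples.
import numpy as np

def will_exceed(samples, threshold, horizon):
    """samples: recent values (e.g. used RAM in MB), one per monitoring interval."""
    t = np.arange(len(samples))
    slope, intercept = np.polyfit(t, samples, deg=1)   # linear trend
    forecast = slope * (len(samples) + horizon) + intercept
    return forecast >= threshold

ram_mb = [610, 640, 660, 700, 735, 760]                  # placeholder monitoring data
print(will_exceed(ram_mb, threshold=1024, horizon=10))   # True if the trend reaches 1 GB in 10 steps
```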

Questions?

Predicting performance of applications and infrastructures Tania Lorido 27th May 2011