Automatic Configuration of Internet Services Wei Zheng, Ricardo Bianchini, and Thu Nguyen Department of Computer Science Rutgers University.

Motivation
Downtime and management are expensive for services. Downtime can cost millions of dollars per hour [IW00], and management can account for 50-80% of IT budgets [Brown05]. Operator mistakes are a major source of downtime, causing up to 36% of service failures [Oppenheimer03]. Misconfiguration is the most common type of mistake: more than 50% of service failures [Oppenheimer03] and 24 out of 42 mistakes in a multi-tier service [Nagaraja04].

Understanding Misconfigurations
Misconfigurations occur frequently for two main reasons. First, servers and services are becoming increasingly complex. Second, operators modify server configs often as the service evolves via software and hardware upgrades and node additions and removals. Example (Apache): one config file has 240+ lines relating to performance, structure, modules, and others, and config files must change every time a node is added or removed in the 2nd tier.

Our Work
Idea: automatically generate server config files as the system and its workload evolve. Goals: reduce config complexity and operator effort, and generate optimized config files efficiently. Contributions: an infrastructure for automatic configuration of services, heuristics for efficient parameter tuning, and a quantitative evaluation for a realistic service.

Outline
Motivation; Approach and Contributions; Automatic Configuration Infrastructure (ACI); Tuning Parameters; Evaluation; Conclusions.

Key Observations Behind ACI
Most of each server's config doesn't change, so ACI uses config file templates and generation scripts. Node additions and removals require tedious and mistake-prone reconfigurations, so ACI uses a network of membership daemons. Changes to a config parameter affect only a few other parameters, so ACI uses a parameter dependency graph and parameter-tuning heuristics.
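The template idea can be illustrated with a minimal sketch. It assumes the only part of a config file that changes is a list of back-end nodes supplied by the membership daemon. The template text, the `backend.*` keys, and the function name `render_backends` are hypothetical and are not taken from ACI or from any real server's config syntax.

```python
from string import Template

# Hypothetical template: everything is fixed text except the two placeholders,
# which are filled in from the current membership list.
BACKEND_TEMPLATE = Template(
    "# generated file, do not edit by hand\n"
    "backend.list=$names\n"
    "$entries\n"
)


def render_backends(hosts):
    """Render the membership-dependent config fragment for the given hosts."""
    names = [f"app{i}" for i in range(1, len(hosts) + 1)]
    entries = "\n".join(
        f"backend.{name}.host={host}" for name, host in zip(names, hosts)
    )
    return BACKEND_TEMPLATE.substitute(names=",".join(names), entries=entries)


if __name__ == "__main__":
    # Regenerate the fragment after a node joins or leaves.
    print(render_backends(["10.0.0.11", "10.0.0.12"]))
```

Because the file is regenerated from the membership list, adding or removing a node never requires a hand edit, which is the point of the first two observations.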

Global View of ACI (3-tier Service)

Local View of ACI
[Figure labels: Node i; Daemon; Scripts + templates; Config files; Global membership info + tuned parameter values]

Tuning Parameters
Tune parameters for some metric as the service evolves. Evolution may cause the current config to behave poorly. Ex: when throughput is the metric, more memory may allow more threads and higher throughput. Problem: it is hard to know which parameters are affected, and hard to select new values for them.

Parameter Dependency Graph
Idea: constrain the search space with a parameter dependency graph and value ranges. Vertex = parameter, directed edge = dependency. After each change, ACI traverses the graph to find the affected parameters and tunes them experimentally. Ex: if an Apache node is upgraded, make the Apache parameters sources and tune the reachable parameters. The overhead of generating the graph can be amortized.
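Finding the affected parameters amounts to a reachability search from the source parameters. The sketch below assumes an edge from B to A means "A depends on B", so a change at B can reach A. The slide does not state the edge direction, and the example graph and function name are invented for illustration.

```python
from collections import deque


def affected_parameters(edges, sources):
    """Return every parameter reachable from the source parameters.

    edges maps a parameter to the parameters that depend on it
    (assumed direction: parent -> dependents).
    """
    seen = set(sources)
    queue = deque(sources)
    while queue:
        param = queue.popleft()
        for dependent in edges.get(param, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    # Made-up dependencies between placeholder parameters.
    graph = {"web.p1": ["app.p1"], "app.p1": ["db.p1"], "web.p2": []}
    print(sorted(affected_parameters(graph, ["web.p1"])))
```

Only the returned set needs to be re-tuned after the change; parameters outside it keep their current values.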

Generating the Graph
1st step: find important parameters using CART. 2nd step: find dependencies between important parameters. Pair-wise test: parameter A depends on B iff different settings of B lead to different best values for A; check min, mid, max values. [Figure panels: "A depends on B"; "B doesn't depend on A"]
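The pair-wise test can be sketched as follows. This is a minimal illustration under two assumptions: both parameters are probed at their min, mid, and max values, and `measure(a, b)` is a hypothetical callback that runs one experiment and returns the metric (higher is better). The slides do not specify either detail.

```python
def probe_points(lo, hi):
    """Return the min, mid, and max of an integer value range."""
    return [lo, (lo + hi) // 2, hi]


def best_value(measure, a_values, b_setting):
    """Return the value of A that maximizes the metric with B held fixed."""
    return max(a_values, key=lambda a: measure(a, b_setting))


def depends_on(measure, a_values, b_values):
    """A depends on B iff A's best value differs across the probed B settings."""
    bests = {best_value(measure, a_values, b) for b in b_values}
    return len(bests) > 1


def synthetic_metric(a, b):
    """Toy metric: the best A tracks B, but the best B is always its maximum."""
    return 100 * b - (a - b) ** 2


if __name__ == "__main__":
    points = probe_points(1, 3)
    print(depends_on(synthetic_metric, points, points))
    # Swap the roles of the two parameters to test the other direction.
    print(depends_on(lambda b, a: synthetic_metric(a, b), points, points))
```

The toy metric shows that the relation is not symmetric: the test has to be run in both directions for each pair.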

Generating the Graph (cont.)
An evolution may change the bottleneck tier, and dependencies may be affected by the provisioning of tiers. Ex: if throughput is the metric, different bottleneck tiers may lead to different dependencies. Run dependency checking once per tier, each time forcing that tier to become the bottleneck. The dependency graph is the union of these results.
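The per-tier union can be sketched in a few lines. `find_dependencies(tier)` is a hypothetical callback standing in for "force this tier to be the bottleneck and run the dependency checks"; it is assumed to return (parent, child) edges.

```python
def build_dependency_graph(tiers, find_dependencies):
    """Union the dependency edges observed with each tier as the bottleneck."""
    graph = set()
    for tier in tiers:
        # One round of dependency checking per forced bottleneck tier.
        graph |= set(find_dependencies(tier))
    return graph


if __name__ == "__main__":
    # Made-up per-tier results, for illustration only.
    observed = {
        "web": [("x", "y")],
        "app": [("x", "y"), ("y", "z")],
        "db": [],
    }
    print(sorted(build_dependency_graph(observed, observed.__getitem__)))
```

Taking the union is conservative: an edge seen under any bottleneck is kept, so the graph remains valid if a later evolution moves the bottleneck.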

Tuning = Traversing the Graph
In a cycle, tune all member vertices at the same time. In an acyclic chain, tune parents first, assuming best values for ancestors and current values for descendants. A singleton is tuned in isolation. Select values using the Simplex algorithm and value sets. The number of experiments increases with connectivity and value set size.
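One way to derive such a tuning order is to group the graph into strongly connected components and order the groups parents-first. The sketch below uses Tarjan's algorithm for this; the slides do not say how ACI computes the order, so this is a substituted technique. It covers only the ordering, not the Simplex-based value selection, and it assumes edges point from a parent to the parameters that depend on it.

```python
def tuning_groups(nodes, edges):
    """Return groups of parameters in tuning order (parents before children).

    Each group is one strongly connected component: the members of a cycle
    are tuned together, and any other vertex forms a group of its own.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            group = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                group.add(w)
                if w == v:
                    break
            groups.append(group)

    for v in nodes:
        if v not in index:
            visit(v)
    # Tarjan emits components children-first; reverse to get parents first.
    groups.reverse()
    return groups


if __name__ == "__main__":
    example = {"a": ["b"], "b": ["c"], "c": ["b", "d"]}
    print(tuning_groups(["a", "b", "c", "d", "e"], example))
```

In the example, `b` and `c` form a cycle and land in one group, `a` precedes them, `d` follows them, and the unconnected `e` is a singleton.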

Summary
[Figure labels: Find important parameters; Find dependencies; Tune parameters; Normal execution; Evolution]

Evaluation: Performance Tuning
3-tier auction service (Apache, Tomcat, MySQL) on 9 nodes; 24 performance parameters. Management complexity and mistake-elimination results are in the paper. Efficiency: number of throughput-tuning experiments after a DB machine upgrade and after the removal of 2 application servers.

Online Auction Dependencies

Performance Tuning Approaches
Exhaustive: O(R^N) runs, where R = parameter range and N = # parameters = 24; infeasible. Dependency graph + exhaustive: N = 9; infeasible. Simplex. ACI = dependency graph + Simplex.
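A quick calculation shows why both exhaustive variants are infeasible. The slides do not give R, so the sketch assumes just R = 3 candidate values per parameter (for example min, mid, max); real ranges would be larger.

```python
def exhaustive_runs(values_per_parameter, num_parameters):
    """Number of experiments needed to try every combination: R ** N."""
    return values_per_parameter ** num_parameters


if __name__ == "__main__":
    R = 3  # assumed for illustration; not stated in the slides
    print(exhaustive_runs(R, 24))  # all 24 parameters
    print(exhaustive_runs(R, 9))   # the N = 9 case from the slide
```

Even with this small assumed R, 24 parameters require 282,429,536,481 runs and 9 parameters require 19,683 runs, each of which is a full experiment on the service.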

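Performance Tuning Efficiency
ACI needs 2 service changes to amortize the graph overhead.
Evolution | Tuning Approach | Throughput (reqs/sec) | Number of Experiments
DB node upgrade | Best prior values | 343 | -
DB node upgrade | Simplex search | |
DB node upgrade | ACI | |
Removal of 2 application servers | Best prior values | 91 | -
Removal of 2 application servers | Simplex search | |
Removal of 2 application servers | ACI | |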

Amortizing the Graph Overhead
Does the dependency graph change when the service evolves? DB node upgrade: graph does not change. Removal of 2 application server nodes: graph does not change. Upgrade of MySQL server version: graph is a subset of the original one.

Conclusions
Proposed an automatic configuration infrastructure. Proposed the notion of a parameter dependency graph. Quantified benefits experimentally for a realistic service. ACI reduces config complexity, eliminates misconfigs, and produces high-performance services efficiently.

Questions?