Extensible Scalable Monitoring for Clusters of Computers Eric Anderson U.C. Berkeley Summer 1997 NOW Retreat.

Presentation transcript:

Extensible Scalable Monitoring for Clusters of Computers Eric Anderson U.C. Berkeley Summer 1997 NOW Retreat

2  15-Jun-15  Overall Problem
Monitoring a cluster of cooperating computers
  – Different from client-server, where only the servers matter
  – Requires substantial information from all machines
  – 100s to 1000s of nodes
  – Client-server becomes a subset of this problem

3  Problems & Solutions
Cluster software and hardware are constantly evolving
  – Monitoring software must be extensible and flexible
    → Use relational tables
Failures will occur in the cluster
  – Monitoring software must detect and recover from failures
    → Use timestamps for weak synchronization
Scalability is needed to hundreds of nodes
  – Need to efficiently transfer data from sources to sinks
    → Use hierarchy & hybrid push-pull protocol
  – Need to display statistics and information from all nodes
    → Use statistical aggregation + color and shade to minimize information loss

4  Overview
Details of solutions
  – Handling evolving software
  – Detecting and recovering from failures
  – Scaling data management
  – Scaling visualization
Implementation
  – Architecture
  – Programs
  – Snapshot
  – Experience
Conclusion & Future Work

5  Problem: Clusters Evolve
Solution: Relational tables
Increases flexibility by decoupling data users from data providers
Increases extensibility by structuring data into independent tables
Increases extensibility by allowing additional columns in tables without breaking old programs
Retains performance through transparent use of indices
Improvement over the tree structures used in previous systems
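
The extensibility claim on this slide can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the system's own database, and the table and column names (node_load, load_avg, mem_free_kb) are hypothetical, not taken from the talk. It shows an old program that names its columns continuing to work after a provider adds a column, and an index being added without touching any query.

```python
import sqlite3

# Assumption: sqlite3 stands in for the cluster's database; the schema
# below is invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_load (node TEXT, load_avg REAL, updated INTEGER)")
conn.execute("INSERT INTO node_load VALUES ('u1', 0.75, 100)")

def old_reader(db):
    """A data user written before the schema changed; it names its columns."""
    query = "SELECT node, load_avg FROM node_load ORDER BY node"
    return db.execute(query).fetchall()

before = old_reader(conn)

# A data provider later starts reporting an extra statistic.
conn.execute("ALTER TABLE node_load ADD COLUMN mem_free_kb INTEGER")
conn.execute("INSERT INTO node_load VALUES ('u2', 1.5, 101, 20480)")

# The old reader is unchanged and still works; it simply ignores the new column.
after = old_reader(conn)

# An index is transparent to data users: no query text changes.
conn.execute("CREATE INDEX node_load_by_node ON node_load (node)")
indexed = old_reader(conn)
```

The decoupling comes from readers selecting columns by name; a reader that relied on column positions or on `SELECT *` would not get the same protection.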

6  Problem: Failures Occur
Solution: Use timestamps
1. Loss of periodic updates to timestamps allows remote nodes to detect failures
2. Timestamps allow weak synchronization between databases
  – Better availability during failures, simpler recovery
3. Timestamps allow stale data to be eliminated
  – Only requires that purges run every so often, rather than relying on programs to clean up after themselves
Reasons 2 & 3 are useful even in normal operation
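
The three uses of timestamps listed on this slide can be sketched in a few lines. This is a minimal illustration, not the system's code: the function names, the row layout (a dict with an "updated" field), and the newest-copy-wins merge rule are all assumptions made for the example.

```python
def failed_nodes(last_update, now, timeout):
    """Reason 1: a node whose periodic timestamp update is overdue is presumed failed."""
    return sorted(node for node, ts in last_update.items() if now - ts > timeout)

def merge_newest(local, remote):
    """Reason 2: weak synchronization; for each key keep the copy with the newer timestamp."""
    merged = dict(local)
    for key, row in remote.items():
        if key not in merged or row["updated"] > merged[key]["updated"]:
            merged[key] = row
    return merged

def purge_stale(rows, now, max_age):
    """Reason 3: a periodic purge drops rows older than max_age seconds."""
    return [row for row in rows if now - row["updated"] <= max_age]

# Hypothetical data: times are plain integers in seconds.
now = 105
down = failed_nodes({"u1": 100, "u2": 70, "u3": 98}, now, timeout=10)

local = {"u1": {"load_avg": 0.5, "updated": 100}}
remote = {"u1": {"load_avg": 0.9, "updated": 104},
          "u3": {"load_avg": 0.1, "updated": 98}}
merged = merge_newest(local, remote)

fresh = purge_stale([{"node": "u1", "updated": 100},
                     {"node": "u2", "updated": 70}], now, max_age=30)
```

None of the three functions needs the databases to coordinate with each other, which is why the same mechanism helps during failures and in normal operation.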

7  Problem: Scalable Data Access
Solution: Hierarchy + efficient protocol
Hierarchy allows
  – Batching of data from different nodes (all data from routers)
  – Specialization to particular data (all data on processes)
Efficient protocol (hybrid of push/pull)
  – Sink sends (SQL select command, interval, count) to source
  – Changed data is extracted via SQL every interval seconds and forwarded to the sink count times
  – Sink can cancel requests at any time
  – Achieves the best of pull and push protocols in terms of wasted data transfers, freshness, and network bandwidth
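
The hybrid protocol described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the forwarder's actual code: sqlite3 stands in for the database, the clock is simulated by passing `now` explicitly, the class and method names are hypothetical, and the sketch assumes that the sink's select takes one `?` parameter (the newest timestamp already sent) and returns the timestamp as its last column. It also assumes that each interval consumes one unit of count, even if nothing changed.

```python
import sqlite3

class Request:
    """One sink subscription: (SQL select, interval, count)."""
    def __init__(self, select, interval, count):
        self.select = select      # assumed to take one "?" = newest timestamp sent
        self.interval = interval  # seconds between extractions
        self.remaining = count    # extractions left before the request expires
        self.next_due = 0
        self.last_seen = -1
        self.cancelled = False    # the sink sets this to cancel at any time

class Source:
    def __init__(self, db):
        self.db = db
        self.requests = []

    def subscribe(self, select, interval, count):
        req = Request(select, interval, count)
        self.requests.append(req)
        return req

    def tick(self, now, send):
        """Push changed rows for every request that is due at time `now`."""
        for req in self.requests:
            if req.cancelled or req.remaining <= 0 or now < req.next_due:
                continue
            rows = self.db.execute(req.select, (req.last_seen,)).fetchall()
            req.next_due = now + req.interval
            req.remaining -= 1
            if rows:
                # Assumption: the timestamp is the last selected column.
                req.last_seen = max(row[-1] for row in rows)
                send(rows)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node_load (node TEXT, load_avg REAL, updated INTEGER)")
db.execute("INSERT INTO node_load VALUES ('u1', 0.5, 1)")

source = Source(db)
received = []
req = source.subscribe(
    "SELECT node, load_avg, updated FROM node_load WHERE updated > ?",
    interval=5, count=3)

source.tick(2, received.append)    # due: pushes the initial row
source.tick(4, received.append)    # not due yet: nothing happens
db.execute("UPDATE node_load SET load_avg = 0.9, updated = 6 WHERE node = 'u1'")
source.tick(7, received.append)    # due: pushes only the changed row
source.tick(12, received.append)   # due, but nothing changed: nothing is sent
source.tick(17, received.append)   # count exhausted: the request has expired
```

The sink pulls once (the subscription) and the source pushes afterwards, so unchanged data is never transferred and the sink does not have to poll. Because count is bounded, a sink that disappears without cancelling stops costing the source anything once the request expires.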

8  Problem: Scalable Visualization
Solution: Statistical aggregation + use of shade & color to minimize information loss
Aggregate across similar variables (average load of 10 machines); show dispersion (std. dev.) as shade
Aggregate across variables from one node (utilization = max{disk, network, cpu})
Both forms of aggregation at the same time: hierarchical aggregation
Use color to draw attention to special things (nodes down) to limit visual overload
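
The two forms of aggregation on this slide, and their combination, can be worked through with a short sketch. The function names, the sample numbers, and the linear mapping from standard deviation to shade are assumptions for illustration; the slide only says that dispersion is shown as shade.

```python
from statistics import mean, pstdev

def node_utilization(disk, network, cpu):
    """Aggregate across variables from one node: utilization = max{disk, network, cpu}."""
    return max(disk, network, cpu)

def group_summary(values):
    """Aggregate across similar variables: the average, plus dispersion (std. dev.)."""
    return mean(values), pstdev(values)

def shade(std_dev, full_scale):
    """Hypothetical mapping of dispersion to a 0..1 shade intensity."""
    return min(std_dev / full_scale, 1.0)

# Hypothetical per-node readings: (disk, network, cpu), each in 0..1.
nodes = {
    "u1": (0.25, 0.5, 0.75),
    "u2": (0.25, 0.125, 0.0625),
}

# Hierarchical aggregation: first per node, then across the group.
utilizations = [node_utilization(*readings) for readings in nodes.values()]
average, dispersion = group_summary(utilizations)
intensity = shade(dispersion, full_scale=0.5)
```

Showing the dispersion next to the average is what limits the information loss: a group averaging 0.5 because every node is at 0.5 looks different from one where half the nodes are idle and half are saturated.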

9  Implementation Architecture
[Figure: architecture diagram. Labels as they appear: gather, node-level DB, forwarder (four node-level DBs shown, two gather processes); mid-level DB, joinpush, forwarder (two shown); top-level DB, joinpush; javaserver; Java applet.]

10  Implementation Details
Databases are MiniSQL
  – Freely available with source code
  – Implements a subset of SQL
Forwarder implements the source part of the hybrid protocol
  – Uses polling to get data from the database
Joinpush implements the merging part of the hierarchy
  – Control of merge sources is external to the program
Both forwarder & joinpush are implemented in threaded C
  – Simpler implementation for blocking operations
  – Could be merged in with the database

11  Implementation Details, cont.
Gather implemented in Perl
  – Simpler to add new data sources, but would like threading
  – Somewhat inefficient; might re-implement in C
Javaserver implemented in Perl
  – Easier to extend with additional aggregation forms
  – Application-level proxy, because the Java applet can't access the network directly
Javaclient implemented in Java
  – Allows clients to run in a browser anywhere in the world
  – Weak feedback to javaserver to control the information displayed

12  Implementation Snapshot

13  Experience
Configuration information should be in the database
  – Had it in random files; the database collects it together
Reset-world operation is very important
  – Puts the system in a known state
Useful for the default destination of statistics to be a remote database
  – Minimizes load on monitored nodes
  – Potentially reduces fault tolerance
Browser user interface very useful
  – Limitations of Java very obnoxious

14  Conclusion
Four problems & solutions are important for any cluster monitoring system
  – Evolution is inherent in uses of clusters
  – Independent failures occur in all clusters
  – Scalability of data management is needed for large clusters
  – Scalability of visualization is also needed for large clusters
Implementation works and is initially useful; further deployment needed
Experience identified problems and places for improvement

15  Future Work
Automatic identification of statistics relevant to problems
  – Expect to be able to use Boolean disjunction learning algorithms
Tracking of long-term trends and statistical measures
Self-tuning of specialized databases based on usage
Addition of notification and repair components
Gathering of more statistics (via SNMP, for example)
Distribution of system to external sites