Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project.

Presentation transcript:

Overview
- Introduction
  - What is a distributed system?
  - What is robustness?
  - Aims of study
- Method
  - Development of simulated model
  - Investigation of model
- Results & Findings

Introduction: What is a distributed system?
- Network of connected entities (agents)
- Entities communicate via message passing
  - Agents can engage with any other agent
- Decentralised control
  - In contrast to P2P with centralised indexes
- Behaviour of individual agents conforms to the goals of the system

Introduction: Example: Web services
- Services are offered by a collection of remote computers in a network
- Services are combined to build complex super-services
- Super-services appear to be provided by a single interface agent (web server)

Introduction: Example figure: insurance policy comparison service (Insurance Broker)

Introduction: What is robustness?
- Correct operation under varied conditions
- High tolerance of failure (extreme conditions)
- Graceful in defeat

Introduction: Aim of study
- To improve robustness in distributed systems
- Implementation and comparison of two alternative systems
- How is robustness achieved?
  - Redundancy
  - Regeneracy (adaptation)
  - Load balancing

Method: Simulation model
- Offers services (tasks)
- Responds to service requests
- Services are highly coupled
- Agent-based network consisting of 20 agents
- Measures
  - Success
  - Response time

Method: Model framework
- Components
- Network mechanics
- Sequence of execution
- Representation of time
- Agent communication
  - Messages

Method: Components
- Object oriented
- Simulator object
  - Controls timing of events
- Agent objects
  - Provide services
  - Communicate with other agents
- Message objects
  - Method of communication

Method Components

Method: Network mechanics
- Underlying interconnection network (Internet)
- An agent can engage any other agent
- Agents form a subset of all possible relationships
- Routing and propagation latencies are abstracted by the Simulator
- Message types
  - Service
  - Capability sharing
  - Agent information

Method Network mechanics (underlying network)

Method Network mechanics (agent relationships)

Method: Sequence of execution
- Simulation is sliced into a sequence of time steps (an abstraction of real time)
- In each time step:
  - Messages are forwarded by the Simulator
  - Agents are prompted sequentially (in any order) to execute the time step
    - Execute scheduled services
    - Respond to received messages
  - Messages are received and ordered by the Simulator
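The three phases of a time step can be sketched roughly as follows. This is a minimal illustration, not the project's code: the class names `Agent` and `Simulator`, the `ping`/`pong` messages, and the method names are all assumptions made for the example, and scheduled services are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    inbox: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def execute_step(self, step):
        # Respond to received messages (scheduled services are omitted here).
        outgoing = []
        for sender, text in self.inbox:
            self.log.append((step, sender, text))
            if text == "ping":  # hypothetical request/response pair
                outgoing.append((self.name, sender, "pong"))
        self.inbox.clear()
        return outgoing


class Simulator:
    def __init__(self, agents):
        self.agents = {a.name: a for a in agents}
        self.pending = []  # messages waiting to be forwarded

    def send(self, sender, receiver, text):
        self.pending.append((sender, receiver, text))

    def run_step(self, step):
        # 1. Forward the messages that are waiting.
        for sender, receiver, text in self.pending:
            self.agents[receiver].inbox.append((sender, text))
        self.pending = []
        # 2. Prompt each agent in turn to execute the time step.
        for agent in self.agents.values():
            self.pending.extend(agent.execute_step(step))
        # 3. Order the newly received messages for the next step.
        self.pending.sort()
```

A two-agent run would be `sim = Simulator([Agent("a"), Agent("b")])`, then `sim.send("a", "b", "ping")` followed by `sim.run_step(0)` and `sim.run_step(1)`.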

Method Sequence of execution

Method: Representation of time
- Floating point number
  - Global time (GT): current time step
  - Local time (LT): function of the total number of processing cycles used prior to the event
  - Result: Time = GT.LT
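One way to read Time = GT.LT is as a float whose integer part is the time step and whose fractional part comes from the cycles consumed so far. The sketch below assumes a fixed cycle budget per step (`cycles_per_step`, a hypothetical parameter); the slides do not say which function of the cycle count was actually used.

```python
def event_time(global_step, cycles_used, cycles_per_step=1000):
    """Combine global time (GT) and local time (LT) into one float, GT.LT."""
    if not 0 <= cycles_used < cycles_per_step:
        raise ValueError("cycles_used must fit inside one time step")
    # LT is assumed to be the fraction of the step's cycle budget already used.
    local_time = cycles_used / cycles_per_step
    return global_step + local_time
```

With this encoding, events within the same step are ordered by cycles used, and any event in step 13 sorts after every event in step 12.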

Method: Agent communication
- Messages
  - Passed between agents
  - Forwarded via the Simulator
  - Types
    - Request, Response, Forward, Receipt
    - 11 message types across the 3 areas: Service, Capability sharing, Agent information

Method: Agent communication: role of Simulator
- Accepts messages and calculates delivery time
  - Applies latency
- Orders messages according to time of delivery
- Forwards all messages that reach their destination within the current time step
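The Simulator's role described above maps naturally onto a priority queue keyed by delivery time. The following is a sketch under assumptions: `MessageQueue`, `accept` and `forward` are invented names, latency is passed in by the caller, and "within the current time step" is taken to mean a delivery time below `step + 1`.

```python
import heapq
import itertools


class MessageQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal delivery times

    def accept(self, send_time, latency, receiver, payload):
        """Accept a message, apply latency, and queue it by delivery time."""
        delivery_time = send_time + latency
        entry = (delivery_time, next(self._counter), receiver, payload)
        heapq.heappush(self._heap, entry)
        return delivery_time

    def forward(self, step):
        """Pop, in delivery order, every message due before the end of `step`."""
        due = []
        while self._heap and self._heap[0][0] < step + 1:
            delivery_time, _, receiver, payload = heapq.heappop(self._heap)
            due.append((delivery_time, receiver, payload))
        return due
```

Messages whose delivery time falls in a later step simply stay in the heap until that step is forwarded.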

Method Agent communication (sending)

Method: Agent communication (receiving)
- Messages in the inbox are processed according to delivery time
  - At the start of a time step the inbox contains all messages for that time step
  - An agent only reads a message when the agent's time reaches the time of delivery
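The receiving rule can be illustrated with a small helper. It assumes each inbox entry is a `(delivery_time, payload)` tuple and that the agent's own clock is a float in the same GT.LT form; both the function name and the tuple layout are hypothetical.

```python
def readable_messages(inbox, agent_time):
    """Split an inbox into messages readable now and messages still in flight."""
    ordered = sorted(inbox)  # process according to delivery time
    ready = [m for m in ordered if m[0] <= agent_time]
    waiting = [m for m in ordered if m[0] > agent_time]
    return ready, waiting
```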

Method: Service
- Unit of work provided by an agent
- Can be requested by external users or by agents within the system
- Complex workflows (dependencies)
  - A dependence represents the results of one service being used by another service
- Virtual knowledge communities

Method: Service
Example, task 1:
  1: do part1
  2: do part2
  3: request task2, task3, task4
     wait task2
  4: do part3
     wait task3, task4
  5: do finish
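The task 1 example can be encoded as a small program that runs until it reaches a `wait` on results that have not arrived yet. This is only an illustrative encoding: the tuple format and the function `run_until_blocked` are assumptions, not the representation used in the project.

```python
TASK1 = [
    ("do", "part1"),
    ("do", "part2"),
    ("request", ["task2", "task3", "task4"]),
    ("wait", ["task2"]),
    ("do", "part3"),
    ("wait", ["task3", "task4"]),
    ("do", "finish"),
]


def run_until_blocked(program, pc, completed, trace):
    """Advance from position `pc`; stop at the first wait that is not satisfied.

    `completed` is the set of dependent tasks whose results have arrived.
    Returns the new position; len(program) means the service has finished.
    """
    while pc < len(program):
        op, arg = program[pc]
        if op == "wait":
            if not set(arg) <= completed:
                return pc  # blocked: the service would be suspended here
        else:
            trace.append((op, arg))
        pc += 1
    return pc
```

Calling it repeatedly with a growing `completed` set mimics a service being suspended and later resumed.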

Method: Virtual knowledge communities
- Groups of agents with similar interests
  - Coupled services
- Benefits
  - More advantageous agent relationships
  - Priority treatment for agents within a community
  - Improvements in reliability and response times

Method Virtual knowledge communities

Method: Agent implementation
- Service advertising
- Knowledge
  - Validation
- Scheduling
- Failure
- Performance metrics
- Entropy
- Strategy

Method: Service advertising (indexing)
- Performance metrics mitigate the risk of inefficient routing

Method: Knowledge
- Two forms
  - Neighbour relationships
  - Capabilities (services)
- Fixed storage allocation
  - Services require twice the storage of a relationship
  - Initialised to:
    - 8 relationships (50%)
    - 4 services (50%)

Method: Knowledge
- Neighbour relationships
  - Services offered by each neighbour are recorded
  - Service directory is stored by each agent
    - Ranked collection of agents that provide a service
    - Ranking based on weighted average of past results
  - Performance and utilization metrics are recorded
- Capabilities (services)
  - Space not used by simulation
  - Allocation represents the demands of data carried by a service (e.g. databases)
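A ranked service directory could look something like the sketch below. The slides only say the ranking uses a weighted average of past results; the exponentially weighted average, the `weight` parameter, and the scoring of results as 1.0 (success) or 0.0 (failure) are assumptions chosen for illustration.

```python
class ServiceDirectory:
    def __init__(self, weight=0.5):
        self.weight = weight  # assumed weight given to the newest result
        self.scores = {}      # service -> {agent: weighted average score}

    def record(self, service, agent, result):
        """Fold a new result (1.0 = success, 0.0 = failure) into the average."""
        by_agent = self.scores.setdefault(service, {})
        old = by_agent.get(agent)
        if old is None:
            by_agent[agent] = result
        else:
            by_agent[agent] = self.weight * result + (1 - self.weight) * old

    def ranked(self, service):
        """Agents providing `service`, best score first (ties broken by name)."""
        by_agent = self.scores.get(service, {})
        return sorted(by_agent, key=lambda a: (-by_agent[a], a))
```

An agent routing a request would then try `ranked(service)` from the front, falling back to later entries if earlier ones fail.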

Method: Knowledge: validation
- Occurs at set intervals
- Updates services advertised by neighbours
- Updates neighbour utilization

Method: Routing
- Agents have multiple options
- Choose the best route based on knowledge
- Rerouting requests upon failure
  - Investigations
    - Limiting number of hops
    - Limiting number of routing options
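Rerouting with both limits can be sketched as a bounded depth-first search. Everything here is hypothetical scaffolding: `neighbours` is assumed to hold each agent's neighbours already ranked best-first, `providers` maps agents to the services they offer, and `alive` is the set of agents currently responding.

```python
def route(agent, service, providers, neighbours, alive, max_hops, max_options):
    """Return a path of agents ending at a live provider, or None."""
    if agent not in alive:
        return None  # failed agent: the caller reroutes to its next option
    if service in providers.get(agent, ()):
        return [agent]
    if max_hops == 0:
        return None  # hop limit reached
    # Only the first `max_options` ranked neighbours are explored.
    for nxt in neighbours.get(agent, [])[:max_options]:
        path = route(nxt, service, providers, neighbours, alive,
                     max_hops - 1, max_options)
        if path is not None:
            return [agent] + path
    return None
```

Because the hop limit decreases on every forward, the search terminates even when the relationship graph contains cycles.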

Method Routing

Method: Scheduling
- Agent receives a request for a service it offers
- Agent schedules the service by appending it to its service schedule
- Services are suspended if blocked
- Resumed services are prepended to the service schedule
  - Fairness
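The append, suspend, and prepend-on-resume behaviour fits a double-ended queue. The class and method names below are assumptions for illustration only.

```python
from collections import deque


class ServiceSchedule:
    def __init__(self):
        self.queue = deque()    # runnable services, front runs first
        self.suspended = set()  # services blocked on a dependency

    def request(self, service):
        self.queue.append(service)  # new requests go to the back

    def suspend(self, service):
        self.queue.remove(service)
        self.suspended.add(service)

    def resume(self, service):
        self.suspended.remove(service)
        self.queue.appendleft(service)  # resumed services go to the front

    def next_service(self):
        return self.queue.popleft() if self.queue else None
```

Prepending resumed services means a service that was already waiting is not pushed behind requests that arrived while it was blocked, which is one reading of the fairness point above.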

Method: Failure
- Period of time during which an agent is non-responsive
- Randomly generated for each time step
- Based on the reliability (a parameter) of the network
- Implemented as a failure schedule
  - 100 time steps, looped
  - Ensures identical conditions for comparison of the systems under analysis
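A looped failure schedule might be generated as follows. The seeded generator and the reading of reliability as the per-step probability of being responsive are assumptions; the slides give only the 100-step length and the looping.

```python
import random


def make_failure_schedule(reliability, length=100, seed=0):
    """Return a list of booleans; True means the agent is failed in that step."""
    rng = random.Random(seed)  # fixed seed: every system sees the same failures
    return [rng.random() >= reliability for _ in range(length)]


def is_failed(schedule, step):
    """Look up a step in the schedule, looping once the end is reached."""
    return schedule[step % len(schedule)]
```

Reusing one schedule for both strategies is what keeps the comparison fair: any difference in results then comes from the strategy, not from different random failures.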

Method: Performance metrics
- Weighted measures of ability
  - Response times
  - Failures
- Used to assess and rank neighbours
- Usage records
  - Popularity of
    - Relationships
    - Services
  - Used to reallocate storage

Method: Entropy
- Perceived unreliability of the system
- Maintained by each agent
- Formed through interactions with neighbours and subsequent analysis
- Evolves over time
- Triggers strategic response
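One simple way an agent could maintain such a value is a running average over the outcomes of its interactions. The update rule, the `rate` parameter and the 0-to-1 scale below are assumptions; the slides do not specify how entropy was computed.

```python
def update_entropy(entropy, failed, rate=0.25):
    """Move entropy towards 1.0 after a failed interaction, towards 0.0 otherwise."""
    observation = 1.0 if failed else 0.0
    return (1 - rate) * entropy + rate * observation
```

An agent would call this after each interaction with a neighbour and compare the result against a threshold to decide whether a strategic response is due.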

Method: Strategy
- Response to environmental conditions
- Aims to improve service delivery
- Two alternatives implemented and tested
  - Standard approach
  - Adaptive memory approach

Method: Standard approach
- Fixed allocations of storage
- Aims to store the most frequently used
  - Relationships
  - Services
- Weighted usage records
- Swap least popular allocated knowledge with most popular unallocated knowledge
- Swapping occurs at set intervals
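The swap step can be sketched for a single kind of knowledge (say, services). The function name and data layout are hypothetical, and the guard that only swaps when the unallocated item is strictly more popular is an added assumption, since the slides do not say what happens otherwise.

```python
def swap_least_popular(allocated, usage):
    """Swap the least-used stored item for the most-used item not stored.

    `allocated` is the set of items currently stored; `usage` maps every
    known item to its weighted usage record.
    """
    unallocated = [item for item in usage if item not in allocated]
    if not allocated or not unallocated:
        return set(allocated)
    worst = min(allocated, key=lambda i: (usage.get(i, 0), i))
    best = max(unallocated, key=lambda i: (usage[i], i))
    # Assumption: swap only when it actually improves the stored set.
    if usage[best] > usage.get(worst, 0):
        return (set(allocated) - {worst}) | {best}
    return set(allocated)
```

The size of the stored set never changes, which matches the fixed allocation of the standard approach.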

Method: Adaptive memory approach
- Dynamic allocations of storage
- Manipulation
  - Triggered by entropy
  - Tied to strategy
    - Low reliability = increase neighbours
    - High reliability = increase services
  - Limited to avoid extreme reactions
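The reallocation can be worked through with the storage figures given earlier in the Knowledge slides: 8 relationships plus 4 services, where a service costs twice a relationship, gives 8 × 1 + 4 × 2 = 16 units in total. The sketch below moves one service at a time in response to entropy; the threshold, the one-step adjustment and the `min_services`/`max_services` bounds are assumptions standing in for the unspecified limits.

```python
RELATIONSHIP_COST = 1
SERVICE_COST = 2     # a service needs twice the storage of a relationship
TOTAL_STORAGE = 16   # 8 relationships + 4 services in the initial split


def adapt(services, entropy, threshold=0.5, min_services=1, max_services=7):
    """Return (relationships, services) after one bounded adjustment."""
    if entropy > threshold:
        # Low reliability: give up a service to make room for more neighbours.
        services = max(min_services, services - 1)
    elif entropy < threshold:
        # High reliability: trade neighbours for an extra service.
        services = min(max_services, services + 1)
    relationships = (TOTAL_STORAGE - services * SERVICE_COST) // RELATIONSHIP_COST
    return relationships, services
```

Starting from 4 services, a high-entropy reading yields 10 relationships and 3 services, while a low-entropy reading yields 6 relationships and 5 services.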

Method: Adaptive memory approach
- Expectations
  - High reliability
    - More services offered by agents
    - Shorter response times for global requests
    - Agents specialise in services according to their employment
  - Low reliability
    - More contingencies or routing options maintained by agents
    - Higher probability of success
  - Brokers and workers
- Run-time evolution

Method: Investigation of model
- Effect of varying indexing depth
- Effect of limiting hops from the source of a request
- Effect of limiting routing options explored by agents
- Effect of enforcing a minimum level of redundancy

Method Effect of varying indexing depth

Results & Findings Effect of varying indexing depth on success (Depth 2)

Results & Findings Effect of varying indexing depth on success (Depth 3)

Results & Findings Effect of varying indexing depth on success (Depth 4)

Results & Findings Effect of varying indexing depth on time (Depth 2)

Results & Findings Effect of varying indexing depth on time (Depth 3)

Results & Findings Effect of varying indexing depth on time (Depth 4)

Results & Findings: Conclusions
- Comparison of the two systems indicates occasional improvements made by the adaptive memory technique
  - Improved through optimisations
- Results indicate that increasing indexing depth does not improve robustness
- Influencing factors
  - Limitations of hops and routing options
  - Service dependencies

Questions