Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project.

Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project

Overview Introduction What is a distributed system What is robustness Aims of study Method Development of simulated model Investigation of model Results & Findings

Introduction What is a distributed system? Network of connected entities (agents) Entities communicate via message passing  Agents can engage with any other agent Decentralised control  Contrast to P2P with centralised indexes Behavior of individual agents conform to the goals of the system

Introduction Example: Web services Services are offered by a collection remote computers in a network Services are combined to build complex super-services Appearance that super-services are provided by a single interface agent (web- server)

Introduction Insurance policy comparison service Insurance Broker

Introduction What is robustness? Correct operation under varied conditions High tolerance of failure (extreme conditions) Graceful in defeat

Introduction Aim of study To improve robustness in distributed systems Implementation and comparison of two alternative systems How is robustness achieved? Redundancy Regeneracy (Adaptation) Load balancing

Method Simulation model Offers services (tasks) Responds to service requests Services are highly coupled Agent based network consisting of 20 agents Measures  Success  Response time

Method Model framework Components Network mechanics Sequence of execution Representation of time Agent communication  Messages

Method Components Object oriented Simulator object  Controls timing of events Agent objects  Provide services  Communicate with other agents Message objects  Method of communication

Method Components

Method Network mechanics Underlying interconnection network (Internet) An agent can engage any other agent Agents form a subset of all possible relationships Routing and propagation latencies are abstracted by the Simulator Message types  Service  Capability sharing  Agent information

Method Network mechanics (underlying network)

Method Network mechanics (agent relationships)

Method Sequence of execution Simulation is sliced into a sequence of time steps (abstraction of real time) In each time step:  Messages are forwarded by Simulator  Agents are prompted sequentially (any ordering) to execute the time step Execute scheduled services Respond to received messages  Messages are received and ordered by Simulator

Method Sequence of execution

Method Representation of time Floating point number  Global time (GT) Current time step  Local time (LT) Function of the total number of processing cycles used prior to event  Result: Time = GT.LT

Method Agent communication Messages  Passed between agents  Forwarded via Simulator  Types Request, Response, Forward, Receipt 11 across the 3 areas: Service Capability sharing Agent information

Method Agent communication Role of Simulator  Accepts messages and calculates delivery time Applies latency  Orders messages according to time of delivery  Forwards all messages that reach their destination within the current time step

Method Agent communication (sending)

Method Agent communication (receiving) Messages in inbox are processed according to delivery time  At start of time step inbox contains all messages for that time step  Agent only reads a message when agent time reaches time of delivery

Method Service Unit of work provided by an agent Can be requested by external users or agents within the system Complex workflows (dependencies)  Dependence represents the results of a service being used by another service Virtual knowledge communities

Method Service Example, task 1 1: do part1 2: do part2 3: request task2,task3,task4 wait task2 4: do part3 wait task3,task4 5: do finish

Method Virtual knowledge communities Groups of agents with similar interests  Coupled services Benefits  More advantageous agent relationships  Priority treatment for agents within a community  Improvements in reliability and response times

Method Virtual knowledge communities

Method Agent implementation Service advertising Knowledge  Validation Scheduling Failure Performance metrics Entropy Strategy

Method Service advertising (indexing) Performance metrics mitigate risk of inefficient routing

Method Knowledge Two forms  Neighbour relationships  Capabilities (services) Fixed storage allocation  Services require twice the storage of a relationship  Initialised to: 8 relationships (50%) 4 services (50%)

Method Knowledge Neighbour relationships  Services offered by neighbour is recorded  Service directory is stored by each agent Ranked collection of agents that provide a service Ranking based on weighted average of past results  Performance and utilization metrics are recorded Capabilities (services)  Space not used by simulation  Allocation represents the demands of data carried by a service (i.e. databases)

Method Knowledge Validation  Occurs at set intervals  Updates services advertised by neighbours  Updates neighbour utilization

Method Routing Agents have multiple options Choose the best route based on knowledge Rerouting requests upon failure  Investigations Limiting number of hops Limiting number of routing options

Method Routing

Method Scheduling Agent receives service request for service offered by agent Agent schedules service by appending to service schedule Services are suspended if blocked Services resumed are pre-pended to service schedule  Fairness

Method Failure Period of time an agent is non-responsive Randomly generated for each time step Based on the reliability (parameter) of network Implemented as a failure schedule  100 time steps, looped  Ensures identical conditions for comparison of the systems under analysis

Method Performance metrics Weighted measures of ability  Response times  Failures Used to assess and rank neighbours Usage records  Popularity of Relationships Services  Used to reallocate storage

Method Entropy Perceived unreliability of system Maintained by each agent Formed through interactions with neighbours and subsequent analysis Will evolve over time Triggers strategic response

Method Strategy Response to environmental conditions Aims to improve service delivery Two alternatives implemented/tested Standard approach Adaptive memory approach

Method Standard approach Fixed allocations of storage Aims to store most frequently used  Relationships  Services Weighted usage records Swap least popular allocated knowledge with most popular unallocated knowledge Swapping occurs at set intervals

Method Adaptive memory approach Dynamic allocations of storage Manipulation  Triggered by entropy  Tied to strategy Low reliability = Increase neighbours High reliability = Increase services  Limited to avoid extreme reactions

Method Adaptive memory approach Expectations  High reliability More services offered by agents Less time of response to global requests Agents specialise in services according to its employment.  Low reliability More contingencies or routing options maintained by agents Higher probability of success  Brokers and workers Run-time evolution

Method Investigation of model Effect of varying indexing depth Effect of limiting hops from source of a request Effect of limiting routing options explored by agents Effect of enforcing a minimum level of redundancy

Method Effect of varying indexing depth

Results & Findings Effect of varying indexing depth on success (Depth 2)

Results & Findings Effect of varying indexing depth on time (Depth 2)

Results & Findings Conclusions Comparison of two systems indicates occasional improvements made by adaptive memory technique  Improved through optimisations Results indicate that increasing indexing depth does not improve robustness Affecters  Limitations of hops and routing options  Service dependencies

Questions

Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project.

Similar presentations

Presentation on theme: "Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project.

Similar presentations

Presentation on theme: "Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project."— Presentation transcript:

Similar presentations

About project

Feedback