Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project.


1 Improving Robustness in Distributed Systems Jeremy Russell Software Engineering Honours Project

2 Overview Introduction What is a distributed system What is robustness Aims of study Method Development of simulated model Investigation of model Results & Findings

3 Introduction What is a distributed system? Network of connected entities (agents) Entities communicate via message passing  Agents can engage with any other agent Decentralised control  Contrast to P2P with centralised indexes Behaviour of individual agents conforms to the goals of the system

4 Introduction Example: Web services Services are offered by a collection of remote computers in a network Services are combined to build complex super-services Appearance that super-services are provided by a single interface agent (web server)

5 Introduction Insurance policy comparison service Insurance Broker

6 Introduction What is robustness? Correct operation under varied conditions High tolerance of failure (extreme conditions) Graceful in defeat

7 Introduction Aim of study To improve robustness in distributed systems Implementation and comparison of two alternative systems How is robustness achieved? Redundancy Regeneracy (Adaptation) Load balancing

8 Method Simulation model Offers services (tasks) Responds to service requests Services are highly coupled Agent based network consisting of 20 agents Measures  Success  Response time

9 Method Model framework Components Network mechanics Sequence of execution Representation of time Agent communication  Messages

10 Method Components Object oriented Simulator object  Controls timing of events Agent objects  Provide services  Communicate with other agents Message objects  Method of communication
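The three component types on this slide could be sketched as follows. This is a minimal illustration, not the project's actual code: the class fields and the method names `send`, `accept`, `add_agent` and `deliver_all` are assumptions introduced here, and latency and timing are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Unit of communication between agents (field names are hypothetical)."""
    sender: int
    recipient: int
    kind: str            # e.g. "request" or "response"
    payload: str = ""


@dataclass
class Agent:
    """Provides services and communicates with other agents."""
    agent_id: int
    services: set = field(default_factory=set)
    inbox: list = field(default_factory=list)

    def send(self, simulator, recipient, kind, payload=""):
        # Agents never deliver directly; every message goes via the simulator.
        simulator.accept(Message(self.agent_id, recipient, kind, payload))


@dataclass
class Simulator:
    """Holds the agents and moves messages between them."""
    agents: dict = field(default_factory=dict)
    in_transit: list = field(default_factory=list)

    def add_agent(self, agent):
        self.agents[agent.agent_id] = agent

    def accept(self, message):
        self.in_transit.append(message)

    def deliver_all(self):
        # Simplification: deliver everything at once, ignoring timing.
        for message in self.in_transit:
            self.agents[message.recipient].inbox.append(message)
        self.in_transit.clear()
```

Routing every message through the Simulator object keeps timing decisions in one place, which matches the slide's description of the Simulator controlling the timing of events.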

11 Method Components

12 Method Components

13 Method Network mechanics Underlying interconnection network (Internet) An agent can engage any other agent Agents form a subset of all possible relationships Routing and propagation latencies are abstracted by the Simulator Message types  Service  Capability sharing  Agent information

14 Method Network mechanics (underlying network)

15 Method Network mechanics (agent relationships)

16 Method Sequence of execution Simulation is sliced into a sequence of time steps (abstraction of real time) In each time step:  Messages are forwarded by Simulator  Agents are prompted sequentially (any ordering) to execute the time step Execute scheduled services Respond to received messages  Messages are received and ordered by Simulator
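One way to picture a single time step is the sketch below. It assumes messages are plain `(delivery_time, recipient_id, text)` tuples and that each agent is a handler function returning the messages it sends; both choices, and the name `run_time_step`, are made up for illustration.

```python
def run_time_step(step, pending, agents):
    """Run one time step and return the messages still in transit.

    pending: list of (delivery_time, recipient_id, text) tuples.
    agents:  dict mapping agent id to a handler(step, inbox) that returns
             a list of new (delivery_time, recipient_id, text) tuples.
    """
    # 1. Forward every message whose delivery time falls inside this step.
    due = sorted(m for m in pending if m[0] < step + 1)
    later = [m for m in pending if m[0] >= step + 1]
    inboxes = {agent_id: [] for agent_id in agents}
    for message in due:
        inboxes[message[1]].append(message)

    # 2. Prompt the agents one after another; the order is arbitrary.
    outgoing = []
    for agent_id, handler in agents.items():
        outgoing.extend(handler(step, inboxes[agent_id]))

    # 3. Receive the new messages and order everything by delivery time.
    return sorted(later + outgoing)
```

Because new messages are only collected after every agent has run, the order in which agents are prompted does not change what each agent sees within the step.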

17 Method Sequence of execution

18 Method Representation of time Floating point number  Global time (GT) Current time step  Local time (LT) Function of the total number of processing cycles used prior to event  Result: Time = GT.LT
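The GT.LT notation can be read as a floating point number whose integer part is the time step and whose fractional part is the local time. The slide does not give the function that maps processing cycles to local time; the sketch below assumes a simple linear scaling against a hypothetical per-step cycle budget `cycles_per_step`.

```python
def event_time(time_step, cycles_used, cycles_per_step=1000):
    """Combine global time (GT) and local time (LT) into one float.

    cycles_per_step is an assumed budget; the fraction must stay below 1
    so that an event never spills into the next time step.
    """
    if not 0 <= cycles_used < cycles_per_step:
        raise ValueError("cycles_used must lie in [0, cycles_per_step)")
    return time_step + cycles_used / cycles_per_step
```

For example, an event that occurs after 250 cycles in time step 3 would be stamped 3.25 under this assumption.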

19 Method Agent communication Messages  Passed between agents  Forwarded via Simulator  Types: Request, Response, Forward, Receipt; 11 across the 3 areas: Service, Capability sharing, Agent information

20 Method Agent communication Role of Simulator  Accepts messages and calculates delivery time Applies latency  Orders messages according to time of delivery  Forwards all messages that reach their destination within the current time step
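The Simulator's role described here (apply latency, order by delivery time, forward what is due) maps naturally onto a priority queue. The sketch below uses Python's `heapq`; the class name `MessageQueue`, the fixed `latency` value and the tuple layout are assumptions for illustration only.

```python
import heapq
import itertools


class MessageQueue:
    """Orders messages by delivery time and releases them per time step."""

    def __init__(self, latency=0.5):
        self.latency = latency               # assumed constant latency
        self._heap = []
        self._counter = itertools.count()    # tie-breaker for equal times

    def accept(self, send_time, recipient, body):
        """Accept a message and return its calculated delivery time."""
        delivery_time = send_time + self.latency
        heapq.heappush(
            self._heap, (delivery_time, next(self._counter), recipient, body)
        )
        return delivery_time

    def forward(self, time_step):
        """Pop every message that arrives within the given time step."""
        delivered = []
        while self._heap and self._heap[0][0] < time_step + 1:
            delivery_time, _, recipient, body = heapq.heappop(self._heap)
            delivered.append((delivery_time, recipient, body))
        return delivered
```

A message sent at time 0.25 with latency 0.5 is delivered at 0.75 and therefore released in time step 0, while one sent at 0.75 arrives at 1.25 and waits until time step 1.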

21 Method Agent communication (sending)

22 Method Agent communication (sending)

23 Method Agent communication (receiving) Messages in inbox are processed according to delivery time  At start of time step inbox contains all messages for that time step  Agent only reads a message when agent time reaches time of delivery
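The receiving rule can be sketched as a filter over the inbox: a message becomes readable only once the agent's own clock has reached its delivery time. The function name and the `(delivery_time, body)` tuple layout are hypothetical.

```python
def read_ready(inbox, agent_time):
    """Split an inbox into messages readable now and messages still waiting.

    inbox: list of (delivery_time, body) tuples for the current time step.
    agent_time: the agent's current time in GT.LT form.
    """
    ordered = sorted(inbox)  # process according to delivery time
    ready = [m for m in ordered if m[0] <= agent_time]
    waiting = [m for m in ordered if m[0] > agent_time]
    return ready, waiting
```

This is why a busy agent, whose local time advances quickly, can see a message earlier in its own work sequence than an idle agent would.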

24 Method Service Unit of work provided by an agent Can be requested by external users or agents within the system Complex workflows (dependencies)  Dependence represents the results of a service being used by another service Virtual knowledge communities

25 Method Service Example, task 1
1: do part1
2: do part2
3: request task2,task3,task4
   wait task2
4: do part3
   wait task3,task4
5: do finish
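The task 1 workflow can be modelled as a list of operations that an agent executes until it reaches a wait on an unfinished dependency. The encoding below (tuples of operation and argument, with each wait as its own entry) and the function `run_until_blocked` are assumptions made for this sketch.

```python
TASK1 = [
    ("do", "part1"),
    ("do", "part2"),
    ("request", ("task2", "task3", "task4")),
    ("wait", ("task2",)),
    ("do", "part3"),
    ("wait", ("task3", "task4")),
    ("do", "finish"),
]


def run_until_blocked(steps, position, completed, log):
    """Advance through steps from position and return the new position.

    Execution stops at a wait whose dependencies are not all in completed,
    which corresponds to the service being suspended.
    """
    while position < len(steps):
        op, arg = steps[position]
        if op == "wait" and not set(arg) <= completed:
            break  # blocked: a required result has not arrived yet
        if op == "do":
            log.append(arg)
        elif op == "request":
            log.append("request " + ",".join(arg))
        position += 1
    return position
```

Note that part3 runs as soon as task2 has returned, even if task3 and task4 are still outstanding; only the final step depends on all three.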

26 Method Virtual knowledge communities Groups of agents with similar interests  Coupled services Benefits  More advantageous agent relationships  Priority treatment for agents within a community  Improvements in reliability and response times

27 Method Virtual knowledge communities

28 Method Agent implementation Service advertising Knowledge  Validation Scheduling Failure Performance metrics Entropy Strategy

29 Method Service advertising (indexing) Performance metrics mitigate risk of inefficient routing

30 Method Knowledge Two forms  Neighbour relationships  Capabilities (services) Fixed storage allocation  Services require twice the storage of a relationship  Initialised to: 8 relationships (50%) 4 services (50%)

31 Method Knowledge Neighbour relationships  Services offered by each neighbour are recorded  Service directory is stored by each agent Ranked collection of agents that provide a service Ranking based on weighted average of past results  Performance and utilization metrics are recorded Capabilities (services)  Space not used by simulation  Allocation represents the demands of data carried by a service (e.g. databases)
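A service directory ranked by a weighted average of past results might look like the sketch below. The slides do not specify the weighting scheme; an exponentially weighted moving average is swapped in here as one plausible choice, and the class name, the `weight` parameter and the score scale (higher is better) are all assumptions.

```python
class ServiceDirectory:
    """Per-agent directory: for each service, neighbours ranked by score."""

    def __init__(self, weight=0.5):
        self.weight = weight   # share given to the newest result (assumed)
        self.scores = {}       # service name -> {agent_id: score}

    def record(self, service, agent_id, result):
        """Fold a new result (e.g. 1.0 success, 0.0 failure) into the score."""
        by_agent = self.scores.setdefault(service, {})
        if agent_id in by_agent:
            by_agent[agent_id] = (
                self.weight * result + (1 - self.weight) * by_agent[agent_id]
            )
        else:
            by_agent[agent_id] = result

    def ranked(self, service):
        """Return the agents offering the service, best score first."""
        by_agent = self.scores.get(service, {})
        return sorted(by_agent, key=lambda a: by_agent[a], reverse=True)
```

Weighting recent results more heavily lets the ranking react when a previously reliable neighbour starts failing.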

32 Method Knowledge Validation  Occurs at set intervals  Updates services advertised by neighbours  Updates neighbour utilization

33 Method Routing Agents have multiple options Choose the best route based on knowledge Rerouting requests upon failure  Investigations Limiting number of hops Limiting number of routing options
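Rerouting on failure with the two limits under investigation could be sketched as below. The function name, the `attempt` callback and the default limits are hypothetical; the candidate list is assumed to be already ranked best first.

```python
def route_request(candidates, attempt, hops_left=3, max_options=3):
    """Try ranked candidates in order and return the first that succeeds.

    candidates:  agent ids ranked best first.
    attempt:     callable(agent_id) -> bool, True if the agent accepted.
    hops_left:   remaining hops allowed from the source of the request.
    max_options: how many routing options may be explored.
    """
    if hops_left <= 0:
        return None  # hop limit reached; give up rather than forward
    for agent_id in candidates[:max_options]:
        if attempt(agent_id):
            return agent_id
        # Failure: fall through and reroute to the next best option.
    return None
```

Tightening `max_options` trades success probability for response time, which is the kind of effect the investigations on this slide set out to measure.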

34 Method Routing

35 Method Scheduling Agent receives service request for service offered by agent Agent schedules service by appending to service schedule Services are suspended if blocked Services resumed are pre-pended to service schedule  Fairness
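The scheduling rules on this slide (append new requests, suspend blocked services, prepend resumed ones) fit a double-ended queue. The following is a minimal sketch with made-up class and method names.

```python
from collections import deque


class ServiceSchedule:
    """Service schedule in which resumed services jump to the front."""

    def __init__(self):
        self.ready = deque()
        self.suspended = set()

    def schedule(self, service):
        self.ready.append(service)        # new requests go to the back

    def suspend(self, service):
        self.ready.remove(service)        # blocked on a dependency
        self.suspended.add(service)

    def resume(self, service):
        self.suspended.remove(service)
        self.ready.appendleft(service)    # resumed services go to the front

    def next_service(self):
        return self.ready.popleft() if self.ready else None
```

Prepending on resume is what gives the fairness noted on the slide: a service that had already started is not pushed behind requests that arrived while it was blocked.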

36 Method Failure Period of time an agent is non-responsive Randomly generated for each time step Based on the reliability (parameter) of network Implemented as a failure schedule  100 time steps, looped  Ensures identical conditions for comparison of the systems under analysis
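A looped failure schedule that gives both systems identical conditions can be sketched with a seeded random generator. The function names and the `seed` parameter are assumptions; `reliability` is taken here to be the probability that an agent is responsive in a given time step.

```python
import random


def make_failure_schedule(reliability, length=100, seed=0):
    """Return a list of booleans, True where the agent is non-responsive.

    A fixed seed makes the schedule reproducible, so two systems under
    comparison can be exposed to exactly the same failures.
    """
    rng = random.Random(seed)
    return [rng.random() >= reliability for _ in range(length)]


def is_failed(schedule, time_step):
    """Look up a time step, looping once the schedule is exhausted."""
    return schedule[time_step % len(schedule)]
```

Generating the schedule once and replaying it, rather than drawing failures during each run, removes random variation as an explanation for differences between the two approaches.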

37 Method Performance metrics Weighted measures of ability  Response times  Failures Used to assess and rank neighbours Usage records  Popularity of Relationships Services  Used to reallocate storage

38 Method Entropy Perceived unreliability of system Maintained by each agent Formed through interactions with neighbours and subsequent analysis Will evolve over time Triggers strategic response

39 Method Strategy Response to environmental conditions Aims to improve service delivery Two alternatives implemented/tested Standard approach Adaptive memory approach

40 Method Standard approach Fixed allocations of storage Aims to store most frequently used  Relationships  Services Weighted usage records Swap least popular allocated knowledge with most popular unallocated knowledge Swapping occurs at set intervals
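The swap performed at each interval under the standard approach could be sketched as follows. The function name and the rule that a swap only happens when the unallocated item is strictly more popular are assumptions for illustration.

```python
def swap_knowledge(allocated, unallocated, usage):
    """Swap the least popular stored item with the most popular unstored one.

    allocated, unallocated: sets of knowledge items (relationships or services).
    usage: dict mapping item -> weighted usage count.
    Returns new (allocated, unallocated) sets; inputs are not modified.
    """
    if not allocated or not unallocated:
        return set(allocated), set(unallocated)
    least = min(allocated, key=lambda k: usage.get(k, 0))
    most = max(unallocated, key=lambda k: usage.get(k, 0))
    if usage.get(most, 0) <= usage.get(least, 0):
        return set(allocated), set(unallocated)  # nothing to gain
    return (allocated - {least}) | {most}, (unallocated - {most}) | {least}
```

Since one item leaves for each item that enters, the amount of storage in use stays fixed, which is the defining property of this approach.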

41 Method Adaptive memory approach Dynamic allocations of storage Manipulation  Triggered by entropy  Tied to strategy Low reliability = Increase neighbours High reliability = Increase services  Limited to avoid extreme reactions
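The entropy-triggered reallocation might be sketched as below. The total of 16 storage units follows from the initial allocation given earlier (8 relationships plus 4 services at twice the cost); the thresholds `low` and `high`, the one-service step size and the bounds are assumptions standing in for the unspecified limits that avoid extreme reactions.

```python
TOTAL_UNITS = 16        # 8 relationships + 4 services * 2 units each
SERVICE_COST = 2        # a service needs twice the storage of a relationship
RELATIONSHIP_COST = 1


def adapt_allocation(services, entropy, low=0.3, high=0.7,
                     min_services=1, max_services=7):
    """Return (relationships, services) after one adaptation step.

    entropy is the agent's perceived unreliability in [0, 1] (assumed scale).
    High entropy (low reliability) frees space for more neighbours;
    low entropy (high reliability) frees space for more services.
    """
    if entropy > high:
        services = max(min_services, services - 1)
    elif entropy < low:
        services = min(max_services, services + 1)
    relationships = (TOTAL_UNITS - services * SERVICE_COST) // RELATIONSHIP_COST
    return relationships, services
```

Changing by at most one service per step, within fixed bounds, is one simple way to keep an agent from swinging between extremes on a single bad observation.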

42 Method Adaptive memory approach Expectations  High reliability More services offered by agents Shorter response times to global requests Agents specialise in services according to their employment  Low reliability More contingencies or routing options maintained by agents Higher probability of success  Brokers and workers Run-time evolution

43 Method Investigation of model Effect of varying indexing depth Effect of limiting hops from source of a request Effect of limiting routing options explored by agents Effect of enforcing a minimum level of redundancy

44 Method Effect of varying indexing depth

45 Results & Findings Effect of varying indexing depth on success (Depth 2)

46 Results & Findings Effect of varying indexing depth on success (Depth 3)

47 Results & Findings Effect of varying indexing depth on success (Depth 4)

48 Results & Findings Effect of varying indexing depth on time (Depth 2)

49 Results & Findings Effect of varying indexing depth on time (Depth 3)

50 Results & Findings Effect of varying indexing depth on time (Depth 4)

51 Results & Findings Conclusions Comparison of the two systems indicates occasional improvements made by the adaptive memory technique  Improved through optimisations Results indicate that increasing indexing depth does not improve robustness Influencing factors  Limitations of hops and routing options  Service dependencies

52 Questions

