1 Protocol-level Reconfigurations for Autonomic Management of Distributed Network Services
K. Ravindran and M. Rabby
Department of Computer Science, City University of New York (City College)
16th April 2012

2 Organization of presentation
- Service model to accommodate application adaptations when network and environment conditions change
- Protocol-level control of QoS provisioning for applications
- Dynamic protocol switching for adaptive network services
- Meta-level management model for protocol switching
- Case study of distributed network applications: replica voting for adaptive QoS of information assurance
- Open research issues

3 OUR BASIC MODEL OF SERVICE-ORIENTED NETWORKS

4 Adaptive distributed applications
Applications have the ability to:
^^ Determine the QoS received from the system infrastructure
^^ Adjust their operational behavior by changing QoS expectations
[Figure: a service-oriented protocol mediates between the application and the system infrastructure; the application adjusts its QoS expectation, the infrastructure notifies its QoS offering and resource changes, and the external environment injects hostile conditions (e.g., airborne police networks, edge-managed Internet paths).]

5 Service-oriented distributed protocols: run-time structure
A protocol P(S) exports only an interface behavior to client applications, hiding its internal operations on the infrastructure resources from clients.
[Figure: asynchronous processes p-1, p-2, p-3 implement protocol P(S), exchanging signaling messages over a distributed realization of infrastructure resources; agents implementing the service interface for S map the protocol's internal state onto service-interface state and exercise resources; the application accesses service S through the QoS parameters {q-a, q-b, ...}.]
{q-a, q-b, ...}: QoS parameter space (e.g., content access latency in a CDN)
{rA, rB, rC, ...}: resource control capabilities (e.g., placement of mirror sites in a CDN)

6 What is our granularity of network service composition? PROTOCOL !!
A protocol exports only an interface behavior to client applications, hiding its internal operations on the infrastructure resources from clients.
Examples:
1. reliable data transfer service: TCP is the underlying protocol
2. data fusion service: multi-sensor voting is the underlying protocol
3. wide-area content distribution: content push/pull across mirror sites is the underlying protocol
Given a network application, different types/variants of protocols are possible (they exercise network resources in different ways, while providing a given service).
A protocol good in one operating region of the network may not be good in another region: one size does not fit all. Choose an appropriate protocol based on the currently prevailing resource and environment conditions (dynamic protocol switching).
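The dynamic-protocol-switching idea can be sketched as a small selector over a registry in which each candidate protocol advertises an operating-region predicate. This is a minimal sketch, not the authors' mechanism; the protocol names come from the reliable-data-transfer example used later in the talk, and the loss-rate cutoff is an invented illustration.

```python
# Hypothetical sketch of dynamic protocol switching: each candidate
# protocol for a service declares the operating region in which it
# performs well, and a selector re-evaluates the choice as observed
# conditions change. The 1% loss-rate cutoff is an assumed threshold.

def select_protocol(candidates, conditions):
    """Return the first protocol whose operating-region predicate
    accepts the currently observed resource/environment conditions."""
    for name, suited in candidates:
        if suited(conditions):
            return name
    return candidates[-1][0]  # fall back to the last (most general) variant

# Reliable-data-transfer example: go-back-N at low packet-loss rates,
# selective repeat at higher ones.
candidates = [
    ("go-back-N",        lambda c: c["loss_rate"] < 0.01),
    ("selective-repeat", lambda c: True),
]

assert select_protocol(candidates, {"loss_rate": 0.001}) == "go-back-N"
assert select_protocol(candidates, {"loss_rate": 0.05}) == "selective-repeat"
```

A real selector would also weigh switching overhead before changing protocols, a point the talk returns to under design issues.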

7 Management view of distributed protocol services
P1(S), P2(S): protocols capable of providing service S
pi1, pi2, pi3, ...: distributed processes of protocol Pi(S), exercising the infrastructure resources (i = 1, 2)
a: desired QoS parameters; a': QoS achieved
[Figure: the client application invokes service S(a) at the service interface (realized by agents) of the network service provider; a service-level management module (SMM), loaded with reconfiguration policies and adaptation rules, matches the QoS achieved (a') against the desired QoS (a) and handles protocol selection, QoS-to-resource mapping, etc.; through service binding, the invoked protocol P1(S) or P2(S) exercises the infrastructure resources as r = F(a,e), under a hostile external environment (e).]

8 Modeling of environment
QoS specs a, protocol parameters par, and network resource allocation R are usually controllable inputs. In contrast, environment parameters e ∈ E* are often uncontrollable and/or unobservable, but they do impact the service-level performance (e.g., component failures, network traffic fluctuations, etc.).
Environment parameter space: E* = E(yk) ∪ E(nk) ∪ E(ck)
E(yk): parameters that the designer knows about
E(nk): parameters that the designer does not currently know about
E(ck): parameters that the designer can never know about
Protocol-switching decisions face this uncertainty.

9 What is the right protocol to offer a sustainable service assurance?
Service goals:
- Robustness against hostile environment conditions
- Max. performance with currently available resources
These two goals often conflict with each other !!
A highly robust protocol is heavy-weight, because it makes pessimistic assumptions about the environment conditions: the protocol is geared to operate as if system failures are going to occur at any time, and is hence inefficient under normal cases of operation.
A protocol that makes optimistic assumptions about environment conditions achieves good performance under normal cases, but is less robust to failures: the protocol operates as if failures will never occur, and is only geared to recover from a failure after-the-fact (so, recovery time may be unbounded).
We need both types of protocols, to meet the performance and robustness requirements.
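The pessimistic/optimistic trade-off can be made concrete with a toy expected-cost calculation. All the cost numbers below are invented for illustration; the point is only that neither protocol type dominates across the whole range of failure probabilities.

```python
# Toy model: a pessimistic protocol pays a fixed heavy-weight overhead
# every round; an optimistic protocol is cheap in failure-free rounds
# but pays a large after-the-fact recovery cost. Numbers are invented.

def expected_cost(normal_cost, failure_cost, p_fail):
    return (1 - p_fail) * normal_cost + p_fail * failure_cost

PESSIMISTIC = dict(normal_cost=10.0, failure_cost=12.0)   # robust, heavy-weight
OPTIMISTIC  = dict(normal_cost=2.0,  failure_cost=100.0)  # efficient, slow recovery

def better_protocol(p_fail):
    ep = expected_cost(p_fail=p_fail, **PESSIMISTIC)
    eo = expected_cost(p_fail=p_fail, **OPTIMISTIC)
    return "optimistic" if eo < ep else "pessimistic"

assert better_protocol(0.01) == "optimistic"   # benign environment
assert better_protocol(0.30) == "pessimistic"  # hostile environment
```

The crossover point moves with the environment parameter e, which is exactly why protocol switching (rather than a fixed choice) is advocated here.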

10 EXAMPLE APPLICATION 1: CONTENT DISTRIBUTION NETWORK

11 Content distribution network: layered view
Application layer [latency spec, content publish-subscribe, adaptation logic]
Service layer [adaptive algorithm for content push/pull to/from proxy nodes]
Network infrastructure [overlay tree as distribution topology, node/network resources]
[Figure: content server R connects through proxy-capable nodes to clients 1-3 over local access links; nodes are either content push/pull-capable proxy nodes or content-forwarding proxy nodes; U({x}) denotes an update message for content pages {x} propagating down the tree; a latency monitor compares L (the latency specs given to the CDN system) with the latency monitored as system output; control logic exercises the resources; the environment (E*) comprises client traffic & mobility, content dynamics, etc.]

12 Management-oriented control of CDN, exercisable at three levels (control dimensions):
1. application-level reporting & matching of QoS attributes (e.g., client-level latency adaptation, server-level content scaling)
2. adjusting parameters of content access protocols (e.g., proxy placement, choosing a push or pull protocol) [our study]
3. infrastructure resource adjustment (e.g., allocating more link bandwidth, increasing proxy storage capacity, increasing physical connectivity)

13 Client-driven update scheme (PULL protocol: time-stamps without server query)
[Message-sequence chart: server S, proxy X(S), client. The server's global time-stamp GTS advances (1, 2, 3) as pages change; the proxy holds a local copy of page p with local time-stamp LTS. Client request(p) is answered with content(p) from the proxy's local copy; when the server's update_TS(p,2) notification reveals LTS < GTS, the proxy issues get_page(p)/update_page(p) to refresh its copy before serving subsequent requests.]
c: client access rate; s: server update rate; the chart depicts the case c >> s.
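The time-stamp machinery of the client-driven scheme can be sketched as below. This is a minimal sketch under stated assumptions: the class and method names (`Server.update`, `Proxy.notify_update_ts`, ...) are illustrative stand-ins for the slide's messages, not names from the protocol itself.

```python
# Sketch of the PULL scheme with time-stamps: the server keeps a global
# time-stamp GTS per page, the proxy keeps the local time-stamp LTS of
# its cached copy, and the proxy re-fetches a page only after learning
# that GTS has moved past LTS.

class Server:
    def __init__(self):
        self.pages, self.gts = {}, {}
    def update(self, p, content):            # page changes at the server
        self.pages[p] = content
        self.gts[p] = self.gts.get(p, 0) + 1
    def get_page(self, p):                   # proxy's get_page(p)
        return self.pages[p], self.gts[p]

class Proxy:
    def __init__(self, server):
        self.server, self.cache, self.lts = server, {}, {}
    def notify_update_ts(self, p, gts):      # server's update_TS(p, GTS)
        if gts > self.lts.get(p, 0):
            self.cache.pop(p, None)          # mark the local copy stale
    def request(self, p):                    # client's request(p)
        if p not in self.cache:              # pull from the server on a miss
            self.cache[p], self.lts[p] = self.server.get_page(p)
        return self.cache[p]

srv = Server()
srv.update("p", "v1")
px = Proxy(srv)
assert px.request("p") == "v1"   # first access pulls from the server
srv.update("p", "v2")
assert px.request("p") == "v1"   # stale local copy served until notified
px.notify_update_ts("p", srv.gts["p"])
assert px.request("p") == "v2"   # re-fetch after update_TS
```

When c >> s, most requests hit the proxy's cached copy, which is why the slide deck characterizes PULL as light-weight in that regime.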

14 Server-driven update scheme (PUSH protocol)
[Message-sequence chart: server S, proxy X(S), client. On each page change, the server pushes update_page(p) to the proxy; every client request(p) is answered with content(p) from the proxy's up-to-date local copy. The chart depicts the case c << s.]

15 CDN service provider goals
- Minimal service in the presence of resource depletions (say, fewer proxy nodes due to link congestion)
- Max. revenue margin under normal operating conditions
The server-driven protocol (PUSH) and the client-driven protocol (PULL) differ in their underlying premise about how current a page content p is when a client accesses p.
PUSH is heavy-weight (due to its pessimistic assumptions): it operates as if client-level accesses on p are going to occur at any time, and hence is inefficient when c << s.
PULL is light-weight (due to its optimistic assumptions): it operates as if p is always up-to-date, and hence incurs low overhead under normal cases, i.e., c >> s.

16 [Simulation results: plots of normalized message overhead per read and latency incurred per read (msec) for the push and pull protocols, over a simulated content distribution topology of content-forwarding and content-distributing nodes rooted at server R, with content updates flowing down and read requests flowing up; content size: 2 Mbytes; link bandwidths: set between 2 Mbps and 10 Mbps.]

17 Situational-context based proxy protocol control
[Block diagram: a context & situational assessment module (fed with client demographics, cloud leases, QoE, node/link status, the set of nodes & interconnects [G(V,E), costs, policy/rules, ...], and a plug-in CDN model) drives a controller; the controller compares observed QoS against the specs [L,O] (error = desired - observed), schedules tasks onto resources at proxy nodes, and places proxies V' ⊆ V to reduce the error [choosing a tree T(V,E) ⊆ G(V,E) via an algorithm A]; model-based estimation of overhead/latency uses a CDN simulator driven by parametric descriptions of client workloads & QoS specs, with task planning & scheduling based on the combined client request-arrival specs; state feedback (node/link usage) reports node/link outages and traffic bursts; the control signal settles to a stable operating point.]
A: optimization algorithm employed for computing proxy placement [optimal methods for facility placement (greedy, evolutionary, ...)]
[Plot: normalized cost-measure (overhead) versus percentage of nodes used as content-distributing proxies (5%, 10%, 20%, 30%), for (a) A = greedy and (b) A = genetic. Base topology (from network maps of US carriers): |V| = 280 nodes; 226 client clusters; average # of hops traversed by a client request: 4.]

18 EXAMPLE APPLICATION 2: MULTI-SENSOR DATA FUSION

19 Fault-tolerance in sensor data collection: layered view
Data fusion application [raw data collected from the external world by sensors (e.g., radar units)]
Voting service [data delivery service interface (data integrity & availability); replica voting protocol (fault detection, asynchrony control); maintenance of device replicas (device heterogeneity, message security)]
Infrastructure [sensor devices, message-transport network]
N: degree of replication
f_m: max. # of devices that are assumed as vulnerable to failure (1 <= f_m < N/2)
f_a: # of devices that actually fail (0 <= f_a <= f_m)
QoS-oriented spec (data miss rate): how often is the observed time-to-deliver data (TTC) larger than the timeliness constraint on data?
[Figure: voters 1..N propose data d-1..d-N to a vote collator; YES/NO consent/dissent votes decide delivery to the USER (a faulty voter's proposal is rejected; say, d-2 is delivered later); the protocol is a modified 2-phase commit (M2PC); the environment (E*) includes device attacks/faults, network message loss, device asynchrony, ...]

20 Control dimensions for replica voting
Protocol-oriented: 1. How many devices to involve 2. How long the messages are 3. How long to wait before asking for votes
QoS-oriented: 1. How much information quality to attain 2. How much energy in the wireless voting devices
System-oriented: 1. How good the devices are (e.g., fault-severity) 2. How accurate and resource-intensive the algorithms are

21 A voting scenario under faulty behavior of data collection devices
devices = {v1,v2,v3,v4,v5,v6}; faulty devices = {v3,v5}; f_a = 2; f_m = 2
[Timing chart, from START to data delivery:
- attempt 1 (data ready at v6 but not at v2,v4): v1 writes good data in the buffer; v2, v4 dissent; v6 and v5 consent; omission failure at v3 (random behavior of v3 and v5). Had v3 also consented, good-data delivery would have occurred at time-point A.
- attempt 2 (data ready at v2,v4 as well): v3 writes bad data in the buffer; v1, v2, v4, v6 dissent, v5 consents (collusion-type failure by v3 and v5 to deliver bad data). Had v3 proposed good data, correct data delivery would have occurred at time-point B.
- attempt 3: v6 writes good data in the buffer; v3, v5 dissent; v1, v2, v4 consent (collusion-type failure by v3 and v5 to prevent delivery of good data); good data is delivered from the buffer.]
K: # of voting iterations (4 in this scenario); TTC: time-to-complete the voting round; message overhead (MSG): [3 data, 14 control] messages.
Malicious collusions among faulty devices:
- lead to an increase in TTC (and hence reduce data availability)
- incur a higher MSG (and hence expend more network bandwidth B)

22 Observations on the M2PC scenario
^^ Large # of control-message exchanges: worst-case overhead = (2f_m + 1)·N [too high when N is large, as in sensor networks]. Not desirable in wireless network settings, since excessive message transmissions incur a heavy drain on the battery power of voter terminals. In the earlier scenario of N=6 and f_a=2: # of YES messages = 7, # of NO messages = 12.
^^ Integrity of data delivery is guaranteed even under severe failures (i.e., bad data is never delivered).
We need solutions that reduce the number of control messages generated.
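The worst-case bound above is a straight product, which makes the scaling problem easy to see: the control-message count grows linearly in N for a fixed f_m.

```python
# Worst-case control-message count for M2PC as stated on the slide:
# up to (2*f_m + 1) voting iterations, each soliciting votes from all
# N voters.

def m2pc_worst_case_control_msgs(n, f_m):
    assert 1 <= f_m < n / 2          # the slide's constraint on f_m
    return (2 * f_m + 1) * n

assert m2pc_worst_case_control_msgs(n=6, f_m=2) == 30
assert m2pc_worst_case_control_msgs(n=100, f_m=3) == 700  # linear in N
```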

23 Solution 1: selective solicitation of votes
Poll only f_m voters at a time (B specifies the voter list as a bit-map).
Sample scenario for M2PC: N=5, f_m=1 (need YES from 2 voters, including the proposer); actual # of faulty voters: f_a=1.
[Message-sequence charts: buffer manager B and voters v1..v5, with the data d proposed by v2.
ALLV protocol (pessimistic scheme): B solicits votes from all non-proposers at once, vote(d,{1,3,4,5}); three Y replies and the faulty voter's N come back, making some messages wasteful; expends 5 messages total, K=1 iteration.
SELV protocol (optimistic scheme): B solicits vote(d,{4}) first, then vote(d,{3}) in a second iteration; once a Y arrives, d is delivered to the end-user; expends 4 messages total, K=2 iterations.]
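The two solicitation strategies can be simulated on the slide's sample scenario. The message-counting convention below is an assumption (one vote-request per iteration, carrying the voter bit-map, plus one reply per polled voter), chosen so the counts match the slide's totals; voter numbering and the position of the faulty voter in the polling order are also illustrative.

```python
# Sketch comparing ALLV (poll all other voters at once) with SELV
# (poll only f_m voters per iteration, polling more only if needed),
# for N = 5, f_m = 1, one actually-faulty voter. Faulty voters dissent;
# good voters consent. Counting convention is an assumption.

def run_voting(n, f_m, faulty, strategy):
    voters = [v for v in range(1, n)]      # voter 0 is the proposer
    needed = f_m                            # consents needed beyond proposer
    msgs, iters, consents, i = 0, 0, 0, 0
    batch = len(voters) if strategy == "ALLV" else f_m
    while consents < needed and i < len(voters):
        polled = voters[i:i + batch]
        i += len(polled)
        iters += 1
        msgs += 1 + len(polled)             # one vote-request + the replies
        consents += sum(1 for v in polled if v not in faulty)
    return msgs, iters

# Faulty voter happens to be polled first under SELV:
assert run_voting(n=5, f_m=1, faulty={1}, strategy="ALLV") == (5, 1)
assert run_voting(n=5, f_m=1, faulty={1}, strategy="SELV") == (4, 2)
```

The totals reproduce the slide's figures: 5 messages in K=1 iteration for ALLV versus 4 messages in K=2 iterations for SELV, i.e., SELV trades extra iterations (latency) for fewer messages.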

24 Analytical results: mean number of voting iterations per round
[Plots: mean number of voting iterations of SELV versus ALLV as a function of f_m, for N = 5, 6, 7, 8, and 9.]

25 Solution 2: employ implicit forms of vote inference
Implicit Consent Explicit Dissent (IC-M2PC) mode of voting: NO NEWS IS GOOD NEWS !!
A voter consents by keeping quiet; it dissents by sending a NO message (in the earlier scenario, a saving of 7 YES messages).
The IC-M2PC mode lowers control-message overhead significantly when:
^^ the spread of data-generation times (T_p) is small: many voters generate data at around the same time T_p
^^ f_m << N/2: only a very few voters are bad (but we don't know who they are !!)
Worst-case control-message overhead: O(f_m · N^c) for 0 < c < 1.0, where c depends on the choice of vote solicitation time.
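A back-of-envelope comparison shows where the saving comes from: under implicit consent, a failure-light round generates only the (at most f_a) explicit dissents, instead of one explicit vote from every non-proposer. The per-round model below is an illustrative simplification (faulty voters dissent, no message loss), not the slides' exact accounting.

```python
# Approximate control messages per voting round, assuming faulty voters
# dissent and no messages are lost (simplified model, not the authors'
# exact accounting).

def control_msgs_per_round(n, f_a, mode):
    if mode == "M2PC":
        return n - 1    # one explicit YES/NO from each non-proposer
    return f_a          # IC-M2PC: consent = silence, only dissents cost

assert control_msgs_per_round(10, 1, "M2PC") == 9
assert control_msgs_per_round(10, 1, "IC-M2PC") == 1
```

With N=10 and a single bad voter, the round-level control traffic drops by roughly an order of magnitude, which is the effect the slide's f_m << N/2 condition captures.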

26 Protocol-level performance and correctness issues
Under strenuous failure conditions, the basic form of IC-M2PC entails safety risks (i.e., the possibility of delivering incorrect data): correctness problems may occasionally occur !!
Normal-case performance is meaningless unless the protocols are augmented to handle them.

27 [Message-sequence charts contrasting the two modes, with voter 1 (good), voter 2 (good), voter 3 (bad):
M2PC mode (reference protocol): after VOT_RQ(d), explicit YES/NO votes reach the buffer manager, which decides not to deliver d to the user; no safety violation; message overhead O(N^2); TTC is low.
IC-M2PC mode: the buffer manager waits 2·T_net after VOT_RQ(d) (T_net: maximum message transfer delay) and treats silence as consent; a lost or late dissent can lead it to decide to deliver d to the user: a safety violation !!]
The optimistic protocol (i.e., NO NEWS IS GOOD NEWS):
^^ is very efficient when message loss is small, delays have low variance, and f_m << N/2 (as in normal cases)
^^ needs voting-history checks after every M rounds before actual data delivery, where M > 1
Message overhead: O(N·f_m/M); TTC is somewhat high.

28 Dealing with message loss in IC-M2PC mode
How to handle sustained message loss that prevents voter dissents from reaching the vote collator??
^^ Make tentative decisions on commit, based on the implicitly perceived consenting votes.
^^ Use an aggregated `voting history' of the voters for the last M rounds to sanitize results before the final commit (M > 1):
1. If the voting history (obtained as a bit-map) does not match the implicitly perceived voting profile of the voters, B suspects a persistent message loss and hence switches to the M2PC mode.
2. When YES/NO messages start getting received without persistent loss, B switches back to IC-M2PC mode.
Batched delivery of M good results to the user; a bad result never gets delivered (integrity goal).

29 History-vector based sanitization of results: a sample run
[Timing chart: buffer manager, voters V-x, V-y (faulty), V-z; d-i denotes the result tentatively decided in round i under IC-M2PC mode; history vectors use y: YES, n: NO, *: voter was unaware of the voting (due to message loss); X marks an omission failure; crossed entries depict incorrect decisions.
IC-M2PC, M=4: rounds 1-4 (in round 2, the dissent from V-x was lost); the history vectors [y,n,y,y] and [*,y,n,n] reveal a sporadic message loss, so B delivers d1, d3, d4 and discards d2.
IC-M2PC, M=2: rounds 5-6 (dissents from V-x and V-z lost); the history vector [n,n] makes B suspect a persistent message loss, discard d5 and d6, and switch modes.
M2PC: rounds 7-9; consent and dissent messages are not lost (so the message-loss rate has reduced).]
Non-delivery of data in a round, such as d2, is compensated by data deliveries in subsequent rounds (`liveness' of the voting algorithm in real-time contexts).

30 Control actions during voting
num_YES/num_NO: # of voters from which YES/NO responses are received for the data in proposal buffer tbuf
M2PC mode:
  if (num_YES > f_m): deliver data from tbuf to the user
IC-M2PC mode, upon timeout 2T since the start of the current voting iteration:
  if (num_NO < N - f_m): optimistically treat the data in tbuf as (tentatively) deliverable
  if (# of rounds completed so far = M): invoke the history-vector based check for the last M rounds
Both M2PC and IC-M2PC:
  if (num_NO >= N - f_m): discard the data in tbuf;
    if (# of iterations completed so far < 2·f_m): proceed to the next iteration
    else: declare a data miss in the current round
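These control actions can be transliterated into a single decision routine for the buffer manager. This is a sketch, not the authors' implementation: the exact comparison operators (num_YES > f_m to deliver, num_NO >= N - f_m to discard) are assumed readings, chosen to agree with the earlier slide's "need YES from f_m + 1 voters, including the proposer".

```python
# One plausible decision routine for the buffer manager, following the
# slide's control actions. Thresholds are assumed readings (see lead-in).

def voting_step(mode, num_yes, num_no, n, f_m,
                iterations_done, rounds_done, M):
    """Return the action taken at this decision point of a voting round."""
    if num_no >= n - f_m:                    # both modes: data rejected
        if iterations_done < 2 * f_m:
            return "discard-and-next-iteration"
        return "data-miss"
    if mode == "M2PC" and num_yes > f_m:     # f_m + 1 consents incl. proposer
        return "deliver"
    if mode == "IC-M2PC":                    # evaluated on the 2T timeout
        if rounds_done == M:                 # sanitize the last M rounds
            return "history-vector-check"
        return "tentatively-deliverable"     # no news is good news
    return "wait"

assert voting_step("M2PC", num_yes=3, num_no=1, n=6, f_m=2,
                   iterations_done=1, rounds_done=1, M=4) == "deliver"
assert voting_step("IC-M2PC", num_yes=0, num_no=1, n=6, f_m=2,
                   iterations_done=1, rounds_done=4, M=4) == "history-vector-check"
assert voting_step("M2PC", num_yes=1, num_no=4, n=6, f_m=2,
                   iterations_done=4, rounds_done=1, M=4) == "data-miss"
```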

31 Experimental study to compare M2PC and IC-M2PC
Setup: N=10; # of YES votes needed = 6; data size = 30 kbytes; control-message size = 50 bytes; (T_p) = 50 msec.
f_m: assumed # of faulty devices.
[Plots: TTC (in msec), DAT overhead (# of data proposals), and CNTRL overhead (votes, data/vote requests, etc.) versus f_m, for M2PC and IC-M2PC, under network loss l = 0%, 2%, and 4%.]

32 Analytical results of IC-M2PC from probabilistic estimates
Setup: N=10, Q=5, (T_p) = 50 msec, Tw = 125 msec (Q: # of YES votes awaited in IC-M2PC mode).
[Plot (a): data miss rate at the end-user level versus message-loss rate l, for f_m = 1, 2, 3, 4. To keep the miss rate < 2%, f_m = 1-3 requires l < 4%; f_m = 4 requires l < 1.75%.]
[Plot (b): sample switching between the M2PC (EXPLICIT) and IC-M2PC (IMPLICIT) modes as the network state changes over time; the message-loss rate in the network rises to 10% under sustained attacks, triggering the EXPLICIT mode, and the system returns to the IMPLICIT mode as the loss subsides.]
This establishes the mapping of the agent-observed parameter (the data miss rate) onto the infrastructure-internal parameters l and f_a (f_a: actual # of failed devices; we assume f_a = f_m).

33 Situational-context based replica voting control
[Block diagram: the user of the IA application observes the data miss rate; the data delivery rate (= 1 - miss rate) is the system output. A controller adjusts the system inputs (SI) to the replica voting protocol (buffer manager B and voters v1..vN), guided by a voting QoS manager [fault-severity, IC-M2PC/M2PC mode] and a situation-assessment module loaded with scripts & rules from the protocol designer; a global application manager arbitrates against the QoS of other applications; external environment parameters (E*, including f_m) and data-oriented parameters (size, ...) act on the protocol.]

34 OUR MANAGEMENT MODEL FOR AUTONOMIC PROTOCOL SWITCHING

35 Resource-cost based view of protocol behavior (MACROSCOPIC VIEW)
For an external event parameter e, protocol p1 is good in one region and protocol p2 in another. E.g., for a reliable data transfer service with e = packet-loss rate in the network: the go-back-N protocol is better at a lower packet-loss rate; the selective-repeat protocol is better at a higher packet-loss rate.
F_p(a,e): policy function embodied in protocol p to support QoS a for service S, i.e., resource allocation r = F_p(a,e); a higher value of e means a more hostile environment; a': actual QoS achieved with resource allocation r (a' <= a).
Observations:
^^ The resource allocation r = F_p(a,e) increases monotonically and is convex w.r.t. e.
^^ The cost function φ(a) is based on the resource allocation r under environment condition e [assume φ(a) = k·r for k > 0].
[Plot: normalized cost incurred by protocols p1(S(a)) and p2(S(a)) versus e.]

36 Penalty measure for service degradation
[Plot: utility value of the network service u(a') versus the service-level QoS enforced a', over the range Amin .. Areq .. Amax; a higher value of a' means better QoS; the penalty is measured as user-level dissatisfaction, i.e., user displeasure due to the actual QoS a' being lower than the desired QoS a.]
The infrastructure resource cost φ(a') for providing the service-level QoS a' follows from r = F(a',e). The degree of service (un)availability is also modeled as a cost.
Net penalty assigned to the service = k1·[1 - u(a,a')] + k2·φ(a') for k1, k2 > 0.
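The net-penalty formula can be exercised numerically. The slide only sketches the utility curve, so the piecewise-linear shape of u below (zero up to Amin, one beyond Amax, linear in between) and the linear resource-cost term are stated assumptions.

```python
# Net penalty = k1*[1 - u(a')] + k2*phi(a'), with an assumed
# piecewise-linear utility u and an assumed linear cost phi(a') = r
# (i.e., k = 1 in the slide's phi(a) = k*r). Constants are illustrative.

def utility(a_actual, a_min, a_max):
    """Piecewise-linear utility of the achieved QoS a' (assumed shape)."""
    if a_actual <= a_min:
        return 0.0
    if a_actual >= a_max:
        return 1.0
    return (a_actual - a_min) / (a_max - a_min)

def net_penalty(a_actual, a_min, a_max, resource_r, k1=1.0, k2=0.1):
    return k1 * (1.0 - utility(a_actual, a_min, a_max)) + k2 * resource_r

# Midway between Amin and Amax with 2 units of resource spent:
assert abs(net_penalty(a_actual=5.0, a_min=0.0, a_max=10.0,
                       resource_r=2.0) - 0.7) < 1e-9
```

The k1/k2 weights encode the provider's trade-off between user displeasure and resource cost, which is exactly the balance the next slide's optimization formalizes.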

37 Optimal QoS control problem
Consider N applications (some of them mission-critical) sharing an infrastructure-level resource R with split allocations r1, r2, ..., rN.
Minimize: the total resource costs (split across the N applications) plus the displeasure of the i-th application due to QoS degradation, where a'_i is the QoS achieved for the i-th application with resource allocation ri, and a_i is the desired QoS for the i-th application.
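A toy instance of this optimization can be solved by brute force. Everything below (the linear cost model, the shortfall-based displeasure term, QoS assumed proportional to allocation, and the exhaustive search) is an illustrative assumption, not the authors' formulation, which is stated only at the level of the objective.

```python
# Toy split-allocation problem: divide integer units of a shared
# resource R across applications so as to minimize resource cost plus
# QoS-degradation displeasure. Models and weights are invented.
from itertools import product

def total_penalty(alloc, desired, w_cost=0.1):
    cost = w_cost * sum(alloc)
    # displeasure grows with the shortfall between desired and achieved
    # QoS; achieved QoS is assumed proportional to the allocation r_i
    displeasure = sum(max(0.0, d - r) for r, d in zip(alloc, desired))
    return cost + displeasure

def best_split(R, desired, step=1):
    grids = [range(0, R + 1, step)] * len(desired)
    feasible = (a for a in product(*grids) if sum(a) <= R)
    return min(feasible, key=lambda a: total_penalty(a, desired))

# Two applications wanting QoS 3 and 5 from a pool of R = 8 units:
assert best_split(8, desired=[3, 5]) == (3, 5)
```

With the displeasure weight (1 per unit of shortfall) exceeding the cost weight (0.1 per unit allocated), the optimum satisfies both applications exactly; shrinking R below the total demand would force the solver to trade one application's displeasure against the other's, which is where mission-criticality weights would enter.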

38 Policy-based realizations in our management model
Outsourced implementation: the network application requests policy-level decisions from the management module (say, a business marketing executive or a military commander may be a part of the management module).
User-interactive implementation: the application-level user interacts with the management module to load and/or modify policy functions.

39 Design issues in supporting our management model
^^ Prescription of cost relations to estimate the projected resource costs of various candidate protocols
^^ Development of protocol stubs that map the internal states of a protocol onto the service-level QoS parameters
^^ Strategies to decide on protocol selection to provide a network service
^^ Engineering analysis of protocol adaptation/reconfiguration overheads and control-theoretic stability during service provisioning (i.e., QoS jitter at end-users)
^^ QoS and security considerations, wireless vs wired networks, etc.

