Efficient monitoring of Web resources Avigdor Gal (joint work with Haggai Roitman and Louiqa Raschid) IFIP 2.6 meeting 24/6/2009, Nicosia, Cyprus.

Profile-Based Online Data Delivery
Data delivery: the delivery of data of interest (specified in profiles) from servers (data providers) to clients (data consumers).
- Push vs. pull
- Server capabilities vs. client requirements
Profiles: specify what data should be delivered, when, and how, together with its delivery value.
Online: decisions about what to deliver and when are usually made without complete knowledge of the entire "stream" of future requirements or capabilities in the system, while accounting for the sources' dynamic behavior.

Example: Monitoring RSS Feeds
[Figure: RSS feed monitoring, with pull and push delivery]
Other example applications:
- E-Commerce & E-Markets
- Grid
- Mashups & Portals
- Continuous Queries (CQ)
- Cache Management
- …

Research Goals
Proposed a generic model for profile-based online data delivery.
- Allows negotiation over the dynamic nature of resources and the use of time-based constraints.
Considered both server capabilities and user requirements.
- Allows the generation of a hybrid push-pull solution.
Handled various data delivery aspects:
- Dual approach for targeted data delivery.
- Hybrid push-pull framework and data delivery solution.
- Capturing data delivery tradeoffs.
- Complex data delivery under bandwidth constraints.

Related Work
Push:
- Systems: BlackBerry, JMS, Google Alerts
- Web Caching & Synchronization
- Publish/Subscribe
- Stream processing/CQ & CEP
- Broadcast systems
- CDNs (e.g., RSS aggregation)
Pull:
- Update Models
- Web Crawling/Monitoring (WIC)
- Sensor-nets
- Grid
- Web Services
- Mashups
- Web Caching (LR-Profiles, Prefetching)
- PDCM
Hybrid: Pop-Pap, Data Gerrymandering, Ajax, RSS

Data delivery model: Data and Architecture

ProMo Proxy - Overview

Data delivery model: Profiles
We propose a novel profile model based on execution intervals.
Execution interval (EI): an association of a time interval with some resource.
- Complex execution intervals can also be specified.
- EIs can be specified either explicitly or implicitly (using EI-patterns that are further derived using an update model).
- EIs have some unique properties that affect scheduling.
Profiles: a set of execution intervals, which also includes:
- Notification rules that associate utility values with execution intervals.
- The profile owner's role (either a client or a server).
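The profile model above can be pictured with a small data-structure sketch. This is only an illustration: the class names (ExecutionInterval, NotificationRule, Profile), the integer "chronon" time unit, and the feed names are assumptions made here, not the representation used in the presented work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExecutionInterval:
    """A time interval [start, end] (inclusive, in chronons) tied to one resource."""
    resource: str
    start: int
    end: int

    def contains(self, t: int) -> bool:
        # True when a probe at time t falls inside the interval.
        return self.start <= t <= self.end


@dataclass
class NotificationRule:
    """Associates a utility value with a group of execution intervals."""
    intervals: List[ExecutionInterval]
    utility: float


@dataclass
class Profile:
    """A set of execution intervals plus the owner's role (hypothetical layout)."""
    owner_role: str  # "client" or "server"
    rules: List[NotificationRule] = field(default_factory=list)

    def intervals(self) -> List[ExecutionInterval]:
        # Flatten the intervals of all notification rules.
        return [ei for rule in self.rules for ei in rule.intervals]


# A client profile with two rules over two hypothetical feeds.
profile = Profile(
    owner_role="client",
    rules=[
        NotificationRule([ExecutionInterval("feed-A", 2, 5)], utility=1.0),
        NotificationRule([ExecutionInterval("feed-B", 4, 4)], utility=0.5),
    ],
)
```

Because servers and clients would share this structure, matching a client requirement against a server capability reduces to comparing intervals on the same resource.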

Execution intervals - Example

Example Client Profile
[Figure: an example client profile, annotated with the profile owner role, notification rules, a complex-EI pattern, and local and global utilities]

Schedules, Constraints, and Data Delivery Metrics
Schedule: A mapping
Constrained schedules: a limited budget for the different data delivery tasks (e.g., "politeness" constraints or an upper bound on parallel monitoring/listening tasks).
Data delivery metrics:
- Completeness (max)
- Data latency (min)
- System resource utilization (probes) (min)
- Execution time (min)
- Gained utility (satisfiability) (strict)
Data delivery objectives and performance evaluation are based on these metrics.
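As a rough illustration of two of these metrics, the sketch below computes completeness and mean delay for a schedule. The definitions are assumptions made for the example: a schedule is taken to be a set of (resource, time) probes, an EI counts as captured when some probe of its resource falls inside it, and delay is measured from the start of the EI to its first capturing probe.

```python
from typing import List, Optional, Set, Tuple

EI = Tuple[str, int, int]   # (resource, start, end), inclusive chronons
Probe = Tuple[str, int]     # (resource, probe time)


def first_capture(ei: EI, schedule: Set[Probe]) -> Optional[int]:
    """Earliest probe time that captures the EI, or None if it is missed."""
    resource, start, end = ei
    times = [t for (r, t) in schedule if r == resource and start <= t <= end]
    return min(times) if times else None


def completeness(eis: List[EI], schedule: Set[Probe]) -> float:
    """Fraction of EIs captured by at least one probe."""
    if not eis:
        return 1.0
    captured = sum(1 for ei in eis if first_capture(ei, schedule) is not None)
    return captured / len(eis)


def mean_delay(eis: List[EI], schedule: Set[Probe]) -> Optional[float]:
    """Average (capture time - EI start) over captured EIs (assumed definition)."""
    delays = []
    for ei in eis:
        t = first_capture(ei, schedule)
        if t is not None:
            delays.append(t - ei[1])
    return sum(delays) / len(delays) if delays else None


eis = [("A", 1, 3), ("A", 5, 6), ("B", 2, 2)]
schedule = {("A", 3), ("B", 4)}
print(completeness(eis, schedule))  # one of three EIs is captured
print(mean_delay(eis, schedule))    # the captured EI waited 2 chronons
print(len(schedule))                # probes used (system resource utilization)
```

The probe of B at time 4 is wasted because it falls outside B's only interval, which is how a schedule can spend budget without gaining completeness.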

A Dual Approach for Targeted Data Delivery
Instead of maximizing utility under a (strict) system resource constraint, minimize system resource utilization while (strictly) satisfying (all) user profiles.
Main motivation: dynamic allocation according to user profiles may benefit both objectives.
- We propose an optimal static algorithm, SUP, for the dual problem.
- Under some conditions, SUP is even optimal for both objectives!
- We further present adaptive versions of SUP, fbSUP and fbSUP(λ), that handle non-static situations using feedback.
- Overall, results show that the dual approach can dominate the traditional approach and has good utility/budget performance in the non-static case.
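To make the dual objective concrete, here is a minimal sketch of the textbook greedy for covering intervals with the fewest points, applied to the EIs of a single resource. It is not the SUP algorithm, whose details are not given in this transcript; it only shows what "fewest probes while capturing every interval" looks like in the simplest setting. The function name and the sample intervals are made up for the example.

```python
from typing import List, Tuple


def min_probes_single_resource(intervals: List[Tuple[int, int]]) -> List[int]:
    """Fewest probe times that hit every [start, end] interval of one resource.

    Classic greedy: process intervals by increasing end time and probe at the
    end of the first interval that the latest probe does not already cover.
    """
    probes: List[int] = []
    last = None
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if last is None or start > last:
            last = end          # probe as late as possible inside this interval
            probes.append(end)
    return probes


# Four overlapping EIs on one resource need only two probes.
print(min_probes_single_resource([(1, 4), (2, 6), (5, 8), (7, 9)]))  # [4, 8]
```

Delaying each probe to the end of its interval lets one probe capture as many overlapping intervals as possible, which is why utilization drops without any interval being missed.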

ProMo: Hybrid Framework for Online Data Delivery
Idea: mediate between clients and servers while considering both client requirements and server capabilities.
Solution: use the same profile structure for both servers and clients. As a result:
- Matching clients and servers becomes easy.
- Hybrid schedules are easy to generate.
We provide a taxonomy of server capabilities and data delivery patterns.
The algorithm supports various capability patterns (e.g., pull-only, push-only, hybrid, push-filter, and conditional-pull).

Capturing Approximate Data Delivery Tradeoffs ("The Proxy Dilemma")
Completeness: higher completeness means more alternatives for drivers.
Delay: lower delay means more time to react.
- High completeness may result in delayed delivery, leaving less time to react.
- Low delay may result in missed updates, leaving fewer alternatives to consider.
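One way to picture this tradeoff is to treat each candidate schedule as a (completeness, delay) pair and keep only the pairs that no other candidate beats on both counts. The sketch below is a generic Pareto filter under that assumption; the function name and the sample values are hypothetical and are not results from the talk.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (completeness, delay)


def pareto_set(points: List[Point]) -> List[Point]:
    """Keep points not dominated by another point.

    q dominates p when q has completeness >= p's and delay <= p's,
    and q differs from p (so it is strictly better in at least one metric).
    """
    result = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            result.append(p)
    return result


candidates = [(0.9, 5.0), (0.7, 2.0), (0.6, 3.0), (0.9, 6.0), (0.4, 1.0)]
print(pareto_set(candidates))  # [(0.9, 5.0), (0.7, 2.0), (0.4, 1.0)]
```

The surviving points are exactly the choices where gaining completeness costs delay and vice versa, which is the dilemma the proxy faces.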

Bandwidth Constrained Complex Profile Satisfaction
Example 1: Arbitrage Monitoring
Example 2: Mashups

Future Work
Dual approach:
- Consider more constrained settings (e.g., a lower bound on gained utility).
- Adaptive switching between the OptMon1 and OptMon2 solutions.
- A more general probabilistic adaptive framework.
- Non-uniform probing costs.
ProMo hybrid push-pull:
- Consider a constrained setting with ProMo (e.g., minimizing push-pull costs, politeness constraints).
- Develop a more refined server commitment model and server selection and ranking techniques.

Future Work (cont.)
Tradeoffs:
- Find an offline approximation for the general case.
- Find online policies with competitive guarantees for the general case.
- Use Pareto sets as a design tool for online policies.
- Tradeoffs based on private profiles.
- Consider complex profiles.
Complex:
- A general cost-benefit model (e.g., consider utility gain vs. monitoring costs).
- Use other complex profile semantics (e.g., OR, SUBSET).
- Develop update models for complex monitoring.

Backup Slides

Schedules (cont.)
[Figure labels: delay, T_j, r_i]

Model: Feasible Schedules

Execution Intervals – Properties and Effects on Scheduling
Inter-resource overlap
- Directly affects the probing congestion.
Intra-resource overlap
- Allows more than a single EI to be captured by a single probe.
Rank
- Affects the difficulty of satisfying a single client requirement or finding a suitable server capability (thus, some pull will be required).
- Can cause a skew in resource access patterns.
Explicit vs. implicit
- Implicit EIs may require update models to derive explicit EIs and therefore introduce noise into the model.
Utility
- Affects the relative importance of capturing an EI.
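The effect of inter-resource overlap on congestion can be illustrated by counting, for each chronon, how many distinct resources have an open EI. The measure below is an assumption chosen for the example (the slides do not define congestion formally), and the function name and sample intervals are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

EI = Tuple[str, int, int]  # (resource, start, end), inclusive chronons


def probing_congestion(eis: List[EI]) -> Dict[int, int]:
    """Number of distinct resources with an active EI at each chronon."""
    active = defaultdict(set)
    for resource, start, end in eis:
        for t in range(start, end + 1):
            active[t].add(resource)
    return {t: len(resources) for t, resources in sorted(active.items())}


eis = [("A", 0, 2), ("B", 1, 3), ("A", 2, 4), ("C", 2, 2)]
congestion = probing_congestion(eis)
print(congestion)                # {0: 1, 1: 2, 2: 3, 3: 2, 4: 1}
print(max(congestion.values()))  # 3 resources compete at chronon 2
```

The two EIs on resource A overlap at chronon 2 but count once there, since a single probe of A can capture both (intra-resource overlap), whereas A, B, and C compete with one another for the probing budget.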

SUP – Dual Optimality
[Figure label: max clique]

fbSUP (SUP with feedback)

fbSUP(λ)

SUP vs. TTL & WIC
Static case - FPN(1.0):
- #probes(SUP) = 2,462
- max #probes(WIC) > 65,000
- max #probes(TTL) > 65,000
Dynamic case - Poisson:
- #probes(SUP) = 3,904
- max #probes(WIC) > 20,000
- max #probes(TTL) > 7,000

fbSUP vs. fbSUP(λ)
- Both adaptive versions improve on SUP (with a moderate probe budget increase).
- fbSUP(λ) improves even for X=1.
- fbSUP(λ) is the dominant version.
- Up to X<4, fbSUP(λ) requires slightly more budget than fbSUP.
- For X≥4, fbSUP(λ) completely dominates fbSUP.

ProMo – Server Capabilities vs. Data Delivery Patterns

ProMo Middleware - Example

Pareto Sets and Approximation

Efficient Offline Optimal Solution (case with no intra-resource overlaps)

Offline Optimal Algorithm Correctness
From the algorithm construction:
Pareto optimality: by induction, let S_j be the j-th schedule that is added to S:
[Figure labels: S_P, S, S_j, S']

Efficient Online Policies (case with no intra-resource overlaps)
Look Ahead:
Look Back:
Policies (completeness / latency / tradeoff):
- LA: completeness optimal; latency ?; tradeoff ?
- LB: completeness 2-approx.; tradeoff 4-approx.
- LAB: completeness optimal; latency less than LA; tradeoff ?
- LBA: completeness 2-approx.; latency optimal; tradeoff 2-approx.

LA (EDF) Optimal Completeness (no intra-resource overlap)
[Figure: two cases, each showing resources r_i and r_i' at time T_j]
Case 1 (gain):
- Gain.
- Another resource r_i' was selected (but not by LA): apply Lemma 42 (preserve or gain).
Case 2 (preserve):
- Preserve.
- Another resource r_i' was selected by LA (but not by Ŝ): apply Lemma 42 (preserve or gain).
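The slide identifies LA with EDF (earliest deadline first). The sketch below is a generic EDF probing policy under assumptions made for this example: time advances in integer chronons, at most `budget` resources may be probed per chronon, and an EI's deadline is its end time. The precise definition of LA is not preserved in this transcript, so this should be read as an illustration of EDF rather than as the authors' policy.

```python
from typing import List, Set, Tuple

EI = Tuple[str, int, int]  # (resource, start, end), inclusive chronons


def edf_schedule(eis: List[EI], budget: int, horizon: int):
    """At each chronon, probe the resources whose open EIs expire soonest."""
    captured: Set[int] = set()             # indices of captured EIs
    schedule: List[Tuple[str, int]] = []   # (resource, probe time)
    for t in range(horizon + 1):
        # Open, not yet captured EIs, ordered by deadline (end time).
        active = sorted(
            (end, i)
            for i, (_, start, end) in enumerate(eis)
            if i not in captured and start <= t <= end
        )
        probed: List[str] = []
        for _, i in active:
            resource = eis[i][0]
            if resource in probed:
                continue
            if len(probed) >= budget:
                break
            probed.append(resource)
        for resource in probed:
            schedule.append((resource, t))
        # A probe captures every EI of that resource that is open at time t.
        for i, (resource, start, end) in enumerate(eis):
            if resource in probed and start <= t <= end:
                captured.add(i)
    return schedule, captured


eis = [("A", 0, 1), ("B", 0, 0), ("C", 0, 2), ("A", 2, 3)]
schedule, captured = edf_schedule(eis, budget=1, horizon=3)
print(schedule)   # [('B', 0), ('A', 1), ('C', 2), ('A', 3)]
print(captured)   # all four EIs are captured
```

With one probe per chronon, serving the tightest deadline first captures all four intervals here, whereas probing C first at chronon 0 would lose B's only chance.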

LAB Dominates LA
- The proof follows from Lemma 42 (completeness preservation) and Lemma 47: each local change from LA to LAB would result in less delay.
- Lemma 42: the proof follows from the definitions of the LAB potential and the ordering operator.

LB Tradeoff 4-Approximation (no intra-resource overlap)
Completeness 2-approximation:
- Basic idea: given any schedule S and LA, if we change S into LA, each change might improve the performance by at most 1, so S has at most 2 times less completeness than LA.
Latency 4-approximation:
- T_k^s: the k-th first time that OPT didn't probe but LB did (and T_k^f the last such time); T_k^s' and T_k^f' are the corresponding times when both did.
- Best case: LB and OPT act the same (black circles).
- Worst case (triangles): OPT has: while LB has at most:
- Thus we get approximation ratio = 4.
- Whenever the EIs have uniform width W, LB is a 2-approximation.

LBW Tradeoff 2-Approximation (no intra-resource overlap)
Completeness 2-approximation: same as in LB.
Optimal latency: LB's greedy "delay traps":
[Figure: delay trap, case 1 and case 2, with schedules S and S' and index j]

Online policies vs. Optimal Pareto Set

Online policies vs. Optimal Pareto Set: Runtime Scalability Analysis

Online policies: Budget Impact

Online policies: Workload Impact (no intra-resource overlap)

Online policies: Workload Impact (with intra-resource overlap)

Proposed Offline Approximation
As A we use the algorithm of Bar-Yehuda et al. for scheduling split intervals.
- C=1: A provides a 2k-approximation, so we get a (2k+2)-approximation.
- C>1: A provides a (2k+1)-approximation, so we get a (2k+3)-approximation.
Drawbacks:
- The transformation may be quite expensive.
- A doesn't scale (it requires an LP solution for the fractional version of the problem).

Greedy Online Policies (no intra-resource overlaps)
Policy: property
- S-EDF: optimal for simple profiles
- MRSF: l-competitive where
- M-EDF: similar to MRSF for problem instances with profiles

MRSF: l-Competitive (case with no intra-resource overlap)
[Figure: "good guys" vs. "bad guys"]
- Pick good guys: gain 3 + 2 (length(I))
- Pick bad guys: gain 1
- Ratio = length(I); in the worst case every CEI has equal length, giving comp. ratio:

Online Policies vs. Offline Approx.
- For rank(P)=1 both WIC and EDF are optimal.
- For any rank(P) the worst-case optimal upper bound is OPT rank(1)/rank(k).
- Simple policies (i.e., WIC, EDF) do not fit problems with complex profiles.
- Here COMP_MRSF ≥ COMP_off ≥ OPT/2k.
- The offline policy doesn't scale.
- Online policies scale quite well.
