Final Exam, May 25, 2007 Quality Management in Multimedia Databases and Data Stream Management Systems Yicheng Tu Department of Computer Sciences Purdue.

1 Final Exam, May 25, 2007 Quality Management in Multimedia Databases and Data Stream Management Systems Yicheng Tu Department of Computer Sciences Purdue University Advisor: Prof. Sunil Prabhakar

2 Quality?
"The nature, kind, or character (of something). Hence, the degree or grade of excellence, etc. possessed by a thing. Restricted to cases in which there is comparison (expressed or implied) with other things of the same kind." - Oxford English Dictionary
"character with respect to fineness, or grade of excellence …" - Dictionary.com

3 Our Definition
A series of parameters that describe the characteristics of data processing and lead to different degrees of user satisfaction
Overlaps with the concept of Quality-of-Service (QoS)
Not data quality

4 Problems
Two types of problems
–Determine the quality of concurrent applications for maximal user satisfaction
–Maintain the quality of applications under highly dynamic environments
Problems are system- and application-specific
Various techniques/solutions are involved
–Resource reservation
–Application adaptation

5 Roadmap
Introduction
Controlling delays in data stream management systems (DSMSs)
Quality-aware (media) data replication
Other works

6 Data Stream Management Systems
Data-active query-passive model
Continuous query
Continuous data, discarded after being processed
Applications
–Financial analysis
–Mobile services
–Sensor networks
–Network monitoring

7 Load Shedding
Data processing in DSMS is quality-critical
–Tuple processing delay
–Data loss
–Sampling rate, window size, …
Overloading during spikes -> degraded quality (processing delay)
Solution: load shedding (i.e., adjust data loss)
–Eliminating excessive load by dropping data items
–Users tolerate approximate query results

8 Load Shedding: Challenges
Constantly discarding most packets would work
–What happens to query accuracy?
The real (and hard) problem is: how to maintain processing delays while minimizing data loss?
Specifically
–When?
–How much?
–For how long?
–Which ones to discard?

9 State-of-the-Art
Data triage (Reiss & Hellerstein, ICDE06)
–Put data into a fast-track analyzer upon overloading
LoadStar (Chi et al., VLDB05)
Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04)
QoS-driven load shedding (Tatbul et al., VLDB03, 06)
All utilize intuitive rule-of-thumb algorithms to decide when, how much, and how long
They do not work under bursty arrival patterns and variable tuple processing cost

10 Our Approach
Insight: treat load shedding as a control problem
Control: manipulation of system states (outputs) by adjusting input(s) to system
In our problem
–processing delay -> output
–amount of load injected -> input
Problem reformulation: let the output track the desirable value by changing the amount of load discarded
(Figure: delay vs. time)

11 Feedback Control
Suitable for rejecting the effects of disturbances
Main components form a feedback control loop
(Block diagram: reference value y_d, summing junction producing e(k) = y_d - y(k), controller, actuator, plant, disturbance)
Plant: DSMS engine
Actuator: load shedder
y: average data processing delay
y_d: desired processing delay
e: control error
u: allowed load into DSMS
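The loop on this slide can be sketched as a short simulation. This is a minimal illustration and not the controller from the talk: it assumes a single-queue plant with a fixed capacity per control period, estimates delay as queue length divided by capacity, and uses a plain proportional controller. The function name, the gain, and the workload numbers are all hypothetical.

```python
def simulate_load_shedding(arrivals, capacity, target_delay, gain=0.5):
    """Simulate proportional load shedding on an assumed single-queue plant."""
    queue = 0.0                      # tuples waiting in the (assumed) single queue
    delays, shed = [], []
    for offered in arrivals:
        y = queue / capacity         # y(k): estimated delay, in control periods
        e = target_delay - y         # e(k) = y_d - y(k)
        # u(k): load allowed into the plant for this period (never negative)
        u = max(0.0, capacity * (1.0 + gain * e))
        admitted = min(offered, u)   # actuator: the load shedder drops the excess
        shed.append(offered - admitted)
        queue = max(0.0, queue + admitted - capacity)
        delays.append(queue / capacity)
    return delays, shed


if __name__ == "__main__":
    # Sustained overload: 200 tuples offered per period, capacity 100 per period.
    delays, shed = simulate_load_shedding([200.0] * 60, capacity=100.0, target_delay=2.0)
    print(round(delays[-1], 3), round(shed[-1], 1))
```

Under sustained overload, the error in this toy model shrinks by a factor of (1 - gain) each period, so the gain sets how quickly the delay settles at the reference value.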

12 Issues
System modeling
–Critical for control loop design
–Analytical models desirable but not currently available
–Experimental methods can be used
Controller design
Database-specific challenges
–Lack of real-time measurement of output signal y
–Actuator may not be able to implement control signal correctly

13 Modeling Borealis
Interestingly, system identification of Borealis shows a first-order model with single-queue characteristics
In other words (block diagram)

14 Controller Design
Design based on pole placement
–Locations of pole(s) determine how fast/well the system responds
Guaranteed performance targets
–Convergence rate - responsiveness
–Damping - smoothness
The controller:

15 DSMS-specific challenges
A database system is different from a traditional control system in many ways
–Lack of real-time measurement of output signal y
–Actuator may not be able to implement control signal correctly
Solutions are provided in the context of DSMS
Need more systematic study from a control viewpoint

16 Experiments
Controller and load shedder implemented in a real DSMS - Borealis
Synthetic (“Pareto”) and real (“Web”) data streams
Query network with variable average processing cost
Experiments for comparison
–Aurora - open loop
–Baseline - primitive feedback control

17 Experiments: Inputs

18 Main Results - Synthetic Data

19 Main Results - Real Data

20 Main Results - Data Loss

21 Summary on Load Shedding
Load shedding is an effective quality adaptation method in DSMSs
Ad hoc solutions do not work well under dynamic load
A load shedding approach based on feedback control theory shows promising results in a real-world DSMS
Control theory could provide solutions to other database problems
However, we need to address new challenges that are unique to database problems

22 Roadmap
Introduction
Controlling delays in data stream management systems (DSMSs)
Quality-aware (media) data replication
Other works

23 Quality-Aware Queries in Multimedia DBMS
Quality = QoS
Querying the DB with quality parameters
    SELECT vid:[s]
    FROM VidLib1
    WHERE (vid, s) IN FindVideoWithObject( Someone )
    QUALITY Resolution = High, Color_depth = Low

24 Quality-aware Data Retrieval
Quality (QoS) critical for media data
Varieties of user quality requirements
–Determined by user preference and resource availability
–Large number of quality combinations
Adaptation techniques to satisfy quality needs
–Dynamic adaptation: online transcoding
–Static adaptation: retrieve precoded replica from disk

25 Dynamic Adaptation
Transcoding is very expensive in terms of CPU cost
Situation may improve in the future
Layered coding
–Not standardized yet
–Less popular than people expected

26 Static Adaptation
Little CPU cost
Choice of many commercial service providers
What about storage cost?
–On the order of total number of quality points
–Ignored in previous research, assuming very few quality profiles and that storage is dirt cheap
–Excessively high for service providers

27 Quality-Aware Replication
Replicas are of different “quality”
Destination: point(s) in a metric quality space
Costs of transformation among different qualities are very high
Applications
–Multimedia
–Materialized view
–Biological structure
Good news: read-only
Bad news: too much storage needed

28 Two Quality Models
Hard-Quality: Users are strict in their quality needs
–Quality A cannot serve a request for quality B
–Online transcoding is needed
Soft-Quality: Users are willing to negotiate/compromise
–Quality A can serve a request for quality B
–With some penalties (quantified by utility functions)

29 Hard-Quality Systems
Problem is to minimize reject rate (probability) P under an overall storage constraint C, given
–f_k: query rate for quality k
–u_k: service time for quality k
–s_k: storage consumption for quality k
–c_k: CPU consumption for quality k
Map system to a multi-rate Erlang loss system
Reduce the problem to a 0-1 Knapsack
A (good) heuristic solution:
–Sort all qualities by their f_k/s_k values and fill in the storage C
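The density-ordering heuristic on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: qualities are sorted by f_k/s_k in descending order, and a quality that does not fit in the remaining storage is skipped while the scan continues. The slide does not say how non-fitting items are handled, and the function name and sample numbers are hypothetical.

```python
def select_replicas_by_density(qualities, storage):
    """Pick replicas by descending f_k / s_k under a storage budget.

    qualities: iterable of (name, query_rate f_k, storage s_k) tuples
    storage:   total storage budget C
    """
    chosen, used = [], 0.0
    # Highest f_k / s_k first: most request rate served per unit of storage.
    by_density = sorted(qualities, key=lambda q: q[1] / q[2], reverse=True)
    for name, rate, size in by_density:
        # Assumption: skip a quality that does not fit and keep scanning.
        if used + size <= storage:
            chosen.append(name)
            used += size
    return chosen


if __name__ == "__main__":
    sample = [("high", 10, 8), ("mid", 9, 3), ("low", 4, 1)]
    print(select_replicas_by_density(sample, storage=5))
```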

30 Soft-quality system: the fixed-storage replica selection (FSRS) Problem
An optimization: get the highest utility given the popularity (f_k) and storage cost (s_k) of all quality points under total storage S
–u(j,k): the utility when a request on quality j is served by quality k
Utility is given as a function of distance in quality space
–Requests served by the closest replica

31 The FSRS Algorithms (I)
Problem is NP-hard: a variation of k-mean
We propose a heuristic algorithm named Greedy
–Aggressively selects replicas based on the ratio of marginal utility gain (∆u) to cost (s_k)
–Time complexity: O(m^2 I), where I is the # of replicas selected and m the total # of possible replicas
    selected replica set P := Φ
    available storage s’ := S
    while s’ > 0
        add the quality point that yields the largest ∆u/s_k value to P
        decrease s’ by s_k
    return P
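The Greedy loop above can be written out as runnable code. This is a sketch under assumptions that the slide does not state: total utility is the popularity-weighted utility of the best selected replica for each requested quality, a candidate is considered only if it fits in the remaining storage, and the loop stops when no candidate adds utility. All names and the sample data are hypothetical.

```python
def total_utility(selected, freq, utility):
    """Popularity-weighted utility when each request uses its best selected replica."""
    if not selected:
        return 0.0
    return sum(f * max(utility(j, k) for k in selected) for j, f in freq.items())


def greedy_fsrs(freq, size, storage, utility):
    """freq[j]: popularity f_j; size[k]: storage s_k; storage: budget S."""
    selected, remaining = [], storage
    while True:
        base = total_utility(selected, freq, utility)
        best, best_ratio = None, 0.0
        for k in size:
            if k in selected or size[k] > remaining:
                continue  # already chosen, or does not fit (assumed rule)
            gain = total_utility(selected + [k], freq, utility) - base
            ratio = gain / size[k]  # marginal utility gain per unit of storage
            if ratio > best_ratio:
                best, best_ratio = k, ratio
        if best is None:  # nothing fits or nothing improves utility
            return selected
        selected.append(best)
        remaining -= size[best]


if __name__ == "__main__":
    # Four quality points on a line; utility decays linearly with distance.
    freq = {1: 5, 2: 1, 3: 1, 4: 6}
    size = {1: 1, 2: 2, 3: 3, 4: 4}
    linear = lambda j, k: max(0.0, 1.0 - abs(j - k) / 3.0)
    print(greedy_fsrs(freq, size, storage=5, utility=linear))
```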

32 The FSRS Algorithms (II)
Greedy could pick some bad replicas, especially the earlier selections
Remedy: remove those bad choices and re-select
The Iterative Greedy algorithm:
    P ← a solution given by Greedy
    while there exists solution P’ s.t. U(P’) > U(P) do
        P ← P’
    return P
Time complexity: same as Greedy with a larger coefficient
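One way to realize the "remove and re-select" step is sketched below. The slide does not specify how the improved solution P’ is found, so the neighborhood used here is an assumption: drop one selected replica, forbid it, refill the freed storage greedily, and accept the result only if total utility strictly increases. Function names and the sample data are hypothetical.

```python
def total_utility(selected, freq, utility):
    """Popularity-weighted utility when each request uses its best selected replica."""
    if not selected:
        return 0.0
    return sum(f * max(utility(j, k) for k in selected) for j, f in freq.items())


def greedy_fill(selected, freq, size, storage, utility, banned=()):
    """Extend `selected` greedily by marginal utility per unit of storage."""
    selected = list(selected)
    remaining = storage - sum(size[k] for k in selected)
    while True:
        base = total_utility(selected, freq, utility)
        best, best_ratio = None, 0.0
        for k in size:
            if k in selected or k in banned or size[k] > remaining:
                continue
            ratio = (total_utility(selected + [k], freq, utility) - base) / size[k]
            if ratio > best_ratio:
                best, best_ratio = k, ratio
        if best is None:
            return selected
        selected.append(best)
        remaining -= size[best]


def iterative_greedy(freq, size, storage, utility):
    """Start from Greedy, then repeatedly swap out one replica if that helps."""
    current = greedy_fill([], freq, size, storage, utility)
    improved = True
    while improved:
        improved = False
        for r in current:
            rest = [k for k in current if k != r]
            candidate = greedy_fill(rest, freq, size, storage, utility, banned=(r,))
            if total_utility(candidate, freq, utility) > total_utility(current, freq, utility):
                current, improved = candidate, True
                break  # restart the scan from the improved solution
    return current


if __name__ == "__main__":
    # Greedy takes the small, dense replica "A" first, which blocks "B".
    freq = {"A": 2, "B": 10}
    size = {"A": 1, "B": 10}
    exact = lambda j, k: 1.0 if j == k else 0.0
    print(greedy_fill([], freq, size, 10, exact), iterative_greedy(freq, size, 10, exact))
```

The loop terminates because every accepted swap strictly increases total utility and the number of possible replica sets is finite.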

33 Other Extensions
Our FSRS algorithms can be easily extended to handle
–Multiple media objects
–Further user-specified constraints on replicas to be selected
–Multiple servers

34 Dynamic Replication
Popularity f of replicas could change over time
We only consider the situation where popularity of all replicas of a media object changes together
–Reasonable assumption in many systems
–Competition for storage among media objects
Desirable dynamic replication algorithms:
–Find solutions as optimal as those by static FSRS algorithms
–Fast enough to make online decisions
Naïve solution: run Greedy every time a change of f occurs

35 Replication Roadmap (RR)
Consider the order in which replicas are selected by Greedy: it follows a predefined path (RR) for each media object
RRs are all convex
Exchanges of storage may happen between two media objects, triggered by the increase/decrease of f
–The one that becomes more popular takes storage from the least popular one
–The one that becomes less popular gives up storage to the most popular one
–It is efficient to make exchanges at the frontiers of the RRs, no need to look inside

36 Replication Roadmap (continued)
Storage exchanges, example: Media A should take storage from media B as the slope of its current segment in RR is greater than that of B’s

37 Dynamic FSRS algorithm
Based on the RR idea
Proved performance: results given are as optimal as those chosen by Greedy
Preprocess phase:
–Build the RRs
Online phase:
–Perform exchanges until total utility converges
–Time complexity: O(I log V), where I is the # of storage exchanges that occur and V is the # of media objects
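The online exchange phase can be illustrated with a simplified sketch. It is not the algorithm from the talk: it assumes every roadmap step costs one unit of storage, represents each roadmap as a list of non-increasing per-step utility gains, scales them by the object's popularity f, and finds the frontier segments by a linear scan rather than the priority structure that an O(I log V) bound would suggest. All names and numbers are hypothetical.

```python
def rebalance(roadmaps, popularity, position):
    """Exchange storage units between media objects at their roadmap frontiers.

    roadmaps[v]:   per-step utility gains in Greedy order (non-increasing)
    popularity[v]: current popularity f of media object v
    position[v]:   number of roadmap steps (storage units) v currently holds
    """
    position = dict(position)
    exchanges = 0

    def next_slope(v):   # gain from giving v one more unit
        return popularity[v] * roadmaps[v][position[v]]

    def last_slope(v):   # loss from taking v's most recent unit away
        return popularity[v] * roadmaps[v][position[v] - 1]

    while True:
        takers = [v for v in roadmaps if position[v] < len(roadmaps[v])]
        givers = [v for v in roadmaps if position[v] > 0]
        if not takers or not givers:
            break
        taker = max(takers, key=next_slope)
        giver = min(givers, key=last_slope)
        # Stop once no exchange would raise total utility.
        if taker == giver or next_slope(taker) <= last_slope(giver):
            break
        position[taker] += 1
        position[giver] -= 1
        exchanges += 1
    return position, exchanges


if __name__ == "__main__":
    roadmaps = {"A": [5, 3, 1], "B": [4, 2, 1]}
    # B has just become three times as popular as A.
    print(rebalance(roadmaps, {"A": 1, "B": 3}, {"A": 2, "B": 1}))
```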

38 Effectiveness of FSRS Algorithms
For comparison:
–The optimal solution (by CPLEX)
–Random selections
–Local popularity-based

39 Efficiency of FSRS Algorithms
CPLEX < Iterative Greedy < Greedy < Random < Local
Results on a P4 2.4 GHz CPU:

40 Dynamic Replication Results
Randomly generated changes of f
Compare with Greedy
Results with (almost) the same optimality as Greedy
Reason: small number of storage exchanges

41 Summary on media replication
Storage cost in static adaptation prohibits replication of all qualities
Optimize toward the lowest reject rate (hard-quality) or the highest utility (soft-quality) given storage constraints
Two heuristics are proposed for static replication that give near-optimal choices
An online algorithm for a dynamic replication problem

42 Other Works
VDBMS - a multimedia DBMS
–Quality-of-Service Aware Query Processing [EDBT04]
–System architecture [MMSJ03, DMS03, ICDE03]
Peer-to-peer media streaming
–Performance analysis [MMCN04, TOMCCAP05]
Genetic algorithms [JEC07]
Other topics in data stream systems
–Entity-based query processing [VLDB05]
–Stream data compression [GSN06]
Signal processing [JMASM07, CSC05]

43 Ongoing and Future Research
Further investigate load shedding problem
–Handle actuator uncertainty
–Other control targets
–Is the optimal achievable?
Quality-aware replication:
–General case of dynamic replication, why is a random solution not so bad?
–Conjecture: Greedy is 4/3-competitive?
Application of control theory in other database topics
–Self-tuning databases

44 Publications-1
[TKDE07] Y. Tu, J. Yan, G. Shen and S. Prabhakar. Multi-Quality Data Replication in Multimedia Databases. IEEE Transactions on Knowledge and Data Engineering (TKDE). 19(5):679-694, May 2007.
[JMASM07] L. Qu and Y. Tu. Change Point Estimation of Bi-Level Functions. Journal of Modern Applied Statistical Methods. 5(2), May 2007.
[JEC] H. Fang, Q. Wang, Y. Tu and M.F. Horstemeyer. An Efficient Non-Dominated Sorting Algorithm for Evolutionary Algorithms. Accepted to Journal of Evolutionary Computation.
[ICDE07] Y. Tu, S. Liu, S. Prabhakar, B. Yao, and W. Schroeder. Using Control Theory for Load Shedding in Data Stream Management. In Procs. of ICDE, pp.490-491, Istanbul, Turkey, April 2007.
[GSN06] Y. Xia, Y. Tu, M. Atallah, and S. Prabhakar. Efficient Data Compression in Location Based Services. In Procs. of 2nd International Conference on Geosensor Networks, Boston, MA, October 2006.
[VLDB06] Y. Tu, S. Liu, S. Prabhakar, and B. Yao. Load Shedding in Stream Databases - A Control-Based Approach. In Proceedings of VLDB, pp.787-798, September 2006.
[TOMCCAP05] Y. Tu, J. Sun, M. Hefeeda, and S. Prabhakar. An Analytical Study of Peer-to-Peer Media Streaming Systems. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP). 1(4):354-376, November 2005.

45 Publications-2
[VLDB05] R. Cheng, B. Kao, S. Prabhakar, A. Kwan, and Y. Tu. Adaptive Stream Filters for Entity-Based Queries with Non-Value Tolerance. In Proceedings of VLDB, pp.37-48, August 2005.
[DEXA05a] Y. Tu, J. Yan, and S. Prabhakar. Quality-Aware Replication of Multimedia Data. In Proceedings of DEXA, pp. 240-249, August 2005.
[DEXA05b] Y. Tu, M. Hefeeda, Y. Xia, S. Prabhakar, and S. Liu. Control-based Quality Adaptation in Data Stream Management Systems. In Proceedings of DEXA, pp. 746-755, August 2005.
[CSC05] L. Qu and Y. Tu. Change Point Estimation of Bar Code Signals. In Proceedings of International Conference on Scientific Computing, pp.109-114, Las Vegas, USA, June 2005.
[MMJS04] W. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, Y. Tu and X. Zhu. VDBMS: A Testbed Facility for Research in Video Database Benchmarking. ACM/Springer Multimedia Systems. 9(6):575-585, June 2004.
[EDBT04] Y. Tu, S. Prabhakar, A. Elmagarmid and R. Sion. QuaSAQ: An Approach to Enabling End-to-End QoS for Multimedia Databases. In Proceedings of Extending Database Technology (EDBT), pp.694-711, Heraklion, Greece, March 2004.
[MMCN04] Y. Tu, J. Sun and S. Prabhakar. Performance Analysis of A Hybrid Media Streaming System. In Proceedings of ACM/SPIE Conf. on Multimedia Computing and Networking (MMCN), pp.69-82, San Jose, CA, January 2004.

46 Publications-3
[DMS03] W. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, Y. Tu and X. Zhu (alphabetical order). VDBMS: A Testbed Facility for Research in Video Database Benchmarking. In Proceedings of Intl. Conf. on Distributed Multimedia Systems (DMS) 2003, pp.160-166.
[ICDE02] W. Aref, A. Elmagarmid, J. Fan, J. Guo, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, A. Rezgui, A. Teoh, E. Terzi, Y. Tu, A. Vakali, X. Zhu (alphabetical order). A Distributed Database Server for Continuous Media. Procs. of ICDE, pp.490-491, San Jose, CA, March 2002.
[ICDE06] Y. Tu and S. Prabhakar. Control-Based Load Shedding in Data Stream Management Systems. PhD Workshop, in conjunction with ICDE 2006.
Submitted: Using control theory for self-tuning databases. Submitted to journal.

47 Thank you! Questions?

48 QuaSAQ
Quality-of-Service-Aware Query processing
Users do not need to know low-level details
Cost evaluation toward global optimization goals
–Throughput
Utilizing current system/network QoS support to deliver the query results
Theory first presented in Bertino et al., 2003
Prototyping is essential

49 QuaSAQ Architecture
Our approach:
–Augment the query evaluation and optimization modules to directly take QoS into account
Major components
–Offline multimedia processor: transcode media objects into copies with different QoS/formats; estimate resource use
–Online components: QoS Browser, Quality Manager, QoS APIs

