Research in Data Broadcasting


1 Research in Data Broadcasting
Michael Franklin, University of Maryland
Joint work with D. Aksoy, M. Altinel, R. Bose, U. Cetintemel, J. Wang, and S. Zdonik
November 1998 © Michael J. Franklin

2 Data Dissemination
Many emerging applications involve large-scale data distribution:
- stock and sports tickers
- traffic information systems
- software distribution
- news and entertainment delivery

3 How Well Does It Scale?
Result servers (elections, the Oscars, Deep Blue, etc.):
- use a WWW server to disseminate results
- Scalability? Server crashes, intolerable delays, ...
A better screen saver (e.g., PointCast):
- a "push-based" interface for customized news over the Internet
- Scalability? Large companies restrict usage due to heavy LAN traffic

4 "Push" is a Potential Answer
- Broadcast media: newspapers, radio, TV, junk mail, ...
- Unicast: mail, telephone, e-mail, ...
- Data push: Teletext, BCIS, Datacycle, etc.
- Publish/subscribe: webcasting, Internet push

5 Push or Pull? - Webcasting (e.g., PointCast)
"Web push technology is exploding --- even though there's no such thing." (Byte, 8/97)
(Diagram: the client's converter pulls from the server, then pushes to the user interface.)

6 Push with Periodic Broadcast (e.g., Broadcast Disks)
Let's look at another kind of push-based system. Teletext is a good example of this; Broadcast Disks is our own work. Repetition creates a revolving disk. Good for intermittent connections, limited memory, high turnover, or a huge client population.

7 Broadcast Disks
- Broadcast the hottest data most frequently
- Superimpose multiple "disks" on the broadcast (see the sketch below)
- A flexible, tunable memory hierarchy
- Client storage resources can be exploited to mitigate scheduling mismatch
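To make the "multiple disks" idea concrete, here is a minimal sketch in Python of flat-schedule generation in the style of Broadcast Disks; the example pages and relative frequencies are made up, not taken from the deck. Each disk is split into chunks so that a disk with relative frequency f repeats f times per major cycle.

```python
from functools import reduce
from math import lcm

def broadcast_schedule(disks, rel_freqs):
    """Interleave several 'disks' (lists of pages) into one flat
    broadcast schedule; a disk with relative frequency f repeats
    f times per major cycle."""
    max_chunks = reduce(lcm, rel_freqs)
    chunked = []
    for pages, f in zip(disks, rel_freqs):
        n = max_chunks // f               # chunks this disk is split into
        size = -(-len(pages) // n)        # ceiling division
        chunked.append([pages[k * size:(k + 1) * size] for k in range(n)])
    schedule = []
    for i in range(max_chunks):           # one major cycle...
        for chunks in chunked:            # ...of max_chunks minor cycles
            schedule.extend(chunks[i % len(chunks)])
    return schedule

# Hot page A repeats 4x per cycle, B/C 2x, D-G once:
# ['A','B','D', 'A','C','E', 'A','B','F', 'A','C','G']
print(broadcast_schedule([["A"], ["B", "C"], ["D", "E", "F", "G"]],
                         [4, 2, 1]))
```

The tunability mentioned on the slide comes from the two knobs: how pages are partitioned across disks, and the relative frequencies assigned to each disk.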

8 Push or Pull? - Broadcast Disks
(Diagram: the server pushes the broadcast to clients; the user interface pulls from the client's cache.)

9 What's going on here?
1) Push vs. pull is just one dimension along which to compare data delivery mechanisms. We focus on three.
2) Different mechanisms for data delivery can (and should) be used across different links. Enabled by network transparency.

10 Delivery Options

              Aperiodic                          Periodic
              Unicast          1-to-n            Unicast          1-to-n
  Pull        request/         request/response  polling          polling
              response         w/snoop                            w/snoop
  Push        e-mail lists     publish/          e-mail list      broadcast
                               subscribe         digests          disks

All of these combinations make sense. There are likely finer-grained divisions in this hierarchy.

11 Network Transparency
(Diagram: Clients - Brokers - Sources, connected by links of varying types.)
The type of a link matters only to the nodes at each end.

12 Using Network Transparency
An example: a DB server delivering data through a proxy cache; then a slightly more complex example. The delivery mechanism used on each link can vary dynamically.

13 Large-Scale On-Demand Broadcast
(Diagram: clients send requests to the server over the uplink; the server broadcasts data to all clients over the downlink.)

14 Push vs. Pull Scalability
(Graph: performance of pull-unicast, pull-broadcast, and push-broadcast as the population grows from few clients to many clients.)

15 Previous Algorithms
- First Come First Served (FCFS)
- Longest Wait First (LWF) [Dykeman et al.]
- Most Requests First (MRF) [Dykeman et al.]
- MRF-Lowest (MRFL) [Dykeman et al.]
- Priority Index Policy (PIP) [Su and Tassiulas] [Vaidya and Hameed]

16 Previous Algorithms - Performance
(Graph: average wait under a Zipf distribution; scheduling overhead ignored.)
LWF wins, but it is impractical.

17 Looking Deeper
(Graphs: request behavior of hot pages vs. cold pages.)

18 RxW - A Scalable, Tunable Scheduling Algorithm
- The wait queue (W-list) is FIFO
- The request queue (R-list) is sorted by number of requests
- Choose the item with the maximal RxW value (see the sketch below)
- Pruning is effective (72% savings), but...
- An approximate version can be as cheap as O(1)
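As a rough illustration, here is a minimal sketch of the exhaustive RxW decision; the class and its in-memory structures are illustrative, not the paper's pruned implementation. Each page's priority is its number of pending requests (R) times the wait of its oldest outstanding request (W).

```python
import time

class RxWScheduler:
    """Minimal sketch of exhaustive RxW: broadcast next the page whose
    R (pending request count) times W (wait of its oldest request)
    is largest. One broadcast satisfies all waiting requests."""
    def __init__(self):
        self.pending = {}                    # page -> [count, first-request time]

    def request(self, page):
        entry = self.pending.setdefault(page, [0, time.time()])
        entry[0] += 1                        # R grows; W keeps the oldest arrival

    def next_page(self):
        if not self.pending:
            return None
        now = time.time()
        page = max(self.pending,
                   key=lambda p: self.pending[p][0] * (now - self.pending[p][1]))
        del self.pending[page]               # everyone waiting is served at once
        return page
```

The pruned and approximate variants avoid this full scan by walking the sorted R-list and FIFO W-list alternately and stopping once no remaining entry can beat the best RxW value seen so far.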

19 Average Waiting Time (without scheduling overhead)

20 Scheduling Overhead
(Graphs: scheduling overhead with increasing arrival rate and increasing database size.)

21 Prototype Implementation
The server broadcasts to clients over an IP-multicast downlink (100 Mbps); clients send requests over a UDP uplink (10 Mbps).
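For flavor, a minimal sketch of such a downlink sender in Python; the multicast group, port, and TTL are made-up values, not those of the prototype.

```python
import socket

MCAST_GROUP, MCAST_PORT = "239.1.1.1", 5007   # illustrative group and port

def broadcast_page(payload: bytes):
    """Send one page to every listening client with a single
    IP-multicast UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    try:
        sock.sendto(payload, (MCAST_GROUP, MCAST_PORT))
    finally:
        sock.close()
```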

22 Responsiveness (Prototype)
(Graph: average wait.)

23 Data Staging
Problem: scheduling algorithms assume that all data are readily available for broadcast. In reality, data may reside on disk, tertiary storage, or even at remote (e.g., WWW) sites. We have integrated RxW with:
- server cache management
- disk prefetching
- opportunistic scheduling ("postfetching")

24 I. Caching - Love/Hate Hints
Goal: reduce the need to fetch data (i.e., misses). Skewed access ⇒ many cold pages; Least Recently Used handles cold pages poorly.
Approach: love hot pages, hate cold ones (a sketch of the eviction side follows). A scheduled page is considered hot only if both:
- it is encountered on the R-list first, and
- it is within the top "hot range" R-list pages.
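A minimal sketch of the eviction policy implied by these hints; the class and its structure are illustrative, assuming pages arrive already tagged hot or cold by the rule above.

```python
from collections import OrderedDict

class LoveHateCache:
    """Sketch: 'loved' (hot) pages are kept; 'hated' (cold) pages are
    evicted first, so one-shot cold pages cannot flush the hot set
    the way they would under plain LRU."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.loved = OrderedDict()           # LRU order within each class
        self.hated = OrderedDict()

    def insert(self, page, data, love):
        self.loved.pop(page, None)           # a page lives in one class only
        self.hated.pop(page, None)
        (self.loved if love else self.hated)[page] = data
        while len(self.loved) + len(self.hated) > self.capacity:
            pool = self.hated if self.hated else self.loved
            pool.popitem(last=False)         # oldest hated page goes first

    def get(self, page):
        for pool in (self.loved, self.hated):
            if page in pool:
                pool.move_to_end(page)       # refresh recency
                return pool[page]
        return None
```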

25 Caching Performance (RxW.90)

26 II. Prefetching
Prefetching is a common data staging technique (e.g., video-on-demand, WWW). Since hot pages are cached, prefetch the cold ones: the top "prefetch_window" pages in the W-list are prefetched (sketched below).
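A one-function sketch of that candidate selection; the parameter and object names are illustrative.

```python
def prefetch_candidates(w_list, cache, prefetch_window):
    """Hot pages tend to be cached already, so stage the cold ones:
    take the top prefetch_window entries of the W-list (the oldest
    outstanding requests) that are not yet in the cache."""
    return [p for p in w_list[:prefetch_window] if cache.get(p) is None]
```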

27 III. Opportunistic Scheduling
More importantly: keep the broadcast busy! "Postfetch" the missed page and send out another in the meantime (see the sketch below). For data broadcast, this is best:
- square-root rule
- small latency penalty
- no guesswork
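A minimal sketch of the postfetch decision; the scheduler, cache, and disk objects are hypothetical stand-ins, not the prototype's interfaces.

```python
def next_broadcast(scheduler, cache, disk):
    """If the top RxW choice misses the cache, start fetching it in
    the background ('postfetch') and broadcast the best cached page
    instead, so the channel never idles waiting on the disk."""
    ranked = scheduler.pages_by_rxw()        # best candidates first
    top = ranked[0]
    if cache.get(top) is not None:
        return top
    disk.fetch_async(top)                    # postfetch the miss
    for page in ranked[1:]:
        if cache.get(page) is not None:      # keep the broadcast busy
            return page
    return disk.fetch(top)                   # nothing cached: just wait
```

The "no guesswork" point falls out of the structure: unlike speculative prefetching, every fetch here is for a page that was actually chosen by the scheduler.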

28 Prefetch vs. Postfetch

29 On-Demand Broadcast Summary
- For scalability, real solutions must take scheduling overhead and data staging into account.
- RxW is 4 times faster than previous algorithms at the same scheduling quality, and can be as cheap as O(1).
- RxW provides hints for data staging: love/hate caching plus "postfetching" obviate the need for speculative prefetching.

30 The DBIS Toolkit for Network Data Management
(Diagram: Clients - Brokers - Sources.)

31 Information Broker Architecture
(Diagram: inside an information broker, a Broker Manager coordinates a Data Source Manager, Catalog/Profile Manager, Mapper, Scheduler, Broadcast Manager, Client Manager, Network Manager, and HD cache. Profiles, pull requests, catalog updates, data items, and tune-information acknowledgements flow among clients, data sources, other information brokers, and the IB Master over the broadcast medium.)

32 Map Dissemination Application
(Screenshots: the IB Master, a first IB with a data source and a client; a second IB and client.)

33 DBIS Status
Level 0 prototype constructed. Supports publish/subscribe only; broadcast disks and on-demand scheduling to be added.
A key research issue is profiles:
- How to express them (is XML helpful)?
- How to manage and search hundreds of thousands of them?
- How to automatically learn and maintain them (clustering and categorization)?

34 Summary
- Dissemination-based applications require new solutions.
- Multiple types of data delivery can be combined easily thanks to network transparency.
- We have developed scheduling and data staging techniques and are creating a toolkit.
- Communication is important, but a data management perspective is also essential: Databases ⇒ Network Data Management.

35 An Analytical Treatment for RxW
(Equation: closed form for the average waiting time W_b as a function of the per-item request probabilities u_i, i = 1..N.)
- Average waiting time is bounded: inherent scalability in the number of users
- Square-root ratio (optimal for push-based broadcast); see below
- MRF uses a straight ratio, giving too much bandwidth to hot pages
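For context, the square-root rule referenced above is the classical optimum for push-based broadcast. The LaTeX below states it under the usual assumptions of equal-length items and access probabilities p_i; the symbols are standard ones, not taken from this deck.

```latex
% Square-root rule: over a push-based (periodic) broadcast with access
% probabilities p_i and equal-length items, mean waiting time is
% minimized when each item's broadcast frequency satisfies
f_i \propto \sqrt{p_i},
% or equivalently, when the spacing between consecutive broadcasts of
% item i satisfies
s_i \propto \frac{1}{\sqrt{p_i}}.
% MRF weights items by p_i directly (a straight ratio), which
% over-serves hot pages relative to this optimum.
```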

36 Worst Case Waiting Time

38 Extra graphs

42 Caching Policies
Goal: keep hot pages in cache.

45 Profiles [Cetintemel, Franklin, Giles 98]
Push requires models of user interests. The accuracy of these models determines the perceived usefulness of the system. The management of these profiles determines performance and scalability. We have developed a novel, multi-modal technique for profile representation and learning.

46 Properties of On-Demand Broadcast
- Broadcast is inherently scalable
- Can do better than FIFO
- Should approach optimal push (assuming an infinite server and no request overhead)
- Need to balance average and worst-case delay
- Need algorithms that scale with database size, bandwidth, and workload intensity

47 DBIS Application Architecture
(Diagram: toolkit parts (DS library, client library, IB executable, IB Master executable) vs. application-developer implementations (DS code, client code), composing data sources, information brokers, and an IB Master that holds data catalog info and links to other IB executables.)

