1
Research in Data Broadcasting
Michael Franklin, University of Maryland, November 1998
Joint work with D. Aksoy, M. Altinel, R. Bose, U. Cetintemel, J. Wang, and S. Zdonik
© Michael J. Franklin
2
Data Dissemination
Many emerging applications involve large-scale data distribution:
stock and sports tickers
traffic information systems
software distribution
news and/or entertainment delivery
3
How Well Does It Scale?
Election, Oscar, Deep Blue, etc. result servers
use a WWW server to disseminate results
Scalability? Server crashes, intolerable delays, …
A better screen saver (e.g., PointCast)
“push-based” interface for customized news over the Internet
Scalability? Large companies restrict usage due to heavy LAN traffic
4
“Push” is a Potential Answer
Broadcast media: newspapers, radio, TV, junk mail, …
Unicast: mail, telephone, …
Data push: Teletext, BCIS, Datacycle, etc.
Publish/subscribe: webcasting, Internet push
5
Push or Pull? - Webcasting (e.g., PointCast)
“Web push technology is exploding --- even though there’s no such thing.” Byte, 8/97
[Diagram: the client’s converter PULLs from the server and PUSHes to the user interface]
6
Push with Periodic Broadcast (e.g., Broadcast Disks)
Let’s look at another kind of push based system. Teletext is a good example of this. Broadcast disks is our own work. Repetition creates a revolving disk. Good for intermittent connection, limited memory, high turn-over, or huge client population. November 1998 © Michael J. Franklin
7
Broadcast Disks
Broadcast the hottest data more frequently
Superimpose multiple “disks” on the broadcast
A flexible, tunable memory hierarchy
Client storage resources can be exploited to mitigate scheduling mismatch
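The superimposed-disks idea above can be sketched as follows. This is a minimal reconstruction of the chunk-interleaving schedule generation from the Broadcast Disks work; the function and parameter names here are illustrative, not from the original implementation.

```python
from math import lcm

def broadcast_schedule(disks, freqs):
    """Interleave `disks` (lists of pages) so that the pages of disk j
    appear freqs[j] times as often as those of a frequency-1 disk.
    Sketch of the Broadcast Disks chunk-interleaving algorithm."""
    max_chunks = lcm(*freqs)
    # Split each disk into max_chunks // freq equal-size chunks.
    chunked = []
    for pages, f in zip(disks, freqs):
        n = max_chunks // f
        assert len(pages) % n == 0, "disk must split evenly into chunks"
        size = len(pages) // n
        chunked.append([pages[i * size:(i + 1) * size] for i in range(n)])
    # One pass over max_chunks slots yields one full broadcast period.
    schedule = []
    for i in range(max_chunks):
        for chunks in chunked:
            schedule.extend(chunks[i % len(chunks)])
    return schedule

# Hot page "A" is broadcast twice as often as the cold pages:
print(broadcast_schedule([["A"], ["B", "C"]], [2, 1]))
# → ['A', 'B', 'A', 'C']
```

Repeating this period forever gives the revolving-disk broadcast; tuning the frequency ratios trades hot-page latency against cold-page latency.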
8
Push or Pull? - Broadcast Disks
[Diagram: the server PUSHes the broadcast to client(s); the user interface PULLs from the client CACHE]
9
What’s going on here?
1) Push vs. Pull is just one dimension along which to compare data delivery mechanisms. We focus on three.
2) Different mechanisms for data delivery can (and should) be used across different links. Enabled by network transparency.
10
Delivery Options
Pull, aperiodic: request/response (unicast); request/response w/snoop (1-to-n)
Pull, periodic: polling (unicast); polling w/snoop (1-to-n)
Push, aperiodic: lists (unicast); publish/subscribe (1-to-n)
Push, periodic: list digests (unicast); broadcast disks (1-to-n)
All of these combinations make sense. There are likely finer-grained divisions in this hierarchy.
11
Network Transparency
[Diagram: clients, brokers, sources]
The type of a link matters only to the nodes at each end
12
Using Network Transparency
[Diagrams: an example with a DB server and a proxy cache; a slightly more complex example. Link types can vary dynamically.]
13
Large-Scale On-Demand Broadcast
[Diagram: clients send requests to the server over the uplink; the server broadcasts data over the downlink]
14
Push vs. Pull Scalability
[Graph comparing pull-unicast, pull-broadcast, and push-broadcast, from few clients to many clients]
15
Previous Algorithms
First Come First Served (FCFS)
Longest Wait First (LWF) [Dykeman et al.]
Most Requests First (MRF) [Dykeman et al.]
MRF-Lowest (MRFL) [Dykeman et al.]
Priority Index Policy (PIP) [Su and Tassiulas] [Vaidya and Hameed]
16
Previous Algorithms - Performance
[Graph; Zipf distribution, scheduling overhead ignored]
LWF wins, but it is impractical.
17
Looking Deeper
[Graphs: hot pages vs. cold pages]
18
RxW - A Scalable, Tunable Scheduling Algorithm
Wait queue (W-list) is FIFO
Request queue (R-list) is sorted by number of requests
Choose the item with the maximal R×W value
Pruning is effective (72% savings), but...
The approximate version can be as cheap as O(1).
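The selection rule above can be sketched as a tiny server loop. This is a minimal sketch of the exhaustive R×W variant only (the pruned and approximate O(1) versions are omitted); the class and method names are assumptions, not the paper’s API.

```python
import time

class RxWServer:
    """Minimal sketch of exhaustive RxW on-demand broadcast scheduling:
    broadcast the page maximizing (# pending requests) x (wait of the
    oldest pending request)."""

    def __init__(self):
        self.pending = {}  # page -> (num_requests, oldest_request_time)

    def request(self, page):
        # A duplicate request for a queued page bumps R but keeps the
        # arrival time of the oldest request, which determines W.
        r, t = self.pending.get(page, (0, time.monotonic()))
        self.pending[page] = (r + 1, t)

    def next_broadcast(self):
        # Exhaustive scan: pick the page with maximal R x W, then
        # clear its pending requests (one broadcast satisfies all).
        now = time.monotonic()
        page = max(self.pending,
                   key=lambda p: self.pending[p][0] * (now - self.pending[p][1]))
        del self.pending[page]
        return page

# server = RxWServer()
# server.request("A"); server.request("B"); server.request("A")
# server.next_broadcast()  # "A": more requests and the longest wait
```

The real algorithm avoids the full scan by walking the sorted R-list and the FIFO W-list alternately and pruning once neither list can beat the best candidate seen so far.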
19
Average Waiting Time (without scheduling overhead)
20
Scheduling Overhead
[Graphs: increasing arrival rate; increasing dbSize]
21
Prototype Implementation
[Diagram: server; IP-multicast downlink (100 Mbps); UDP uplink (10 Mbps); clients]
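The prototype’s transport split (UDP requests up, IP-multicast data down) can be sketched with standard sockets. The multicast group, port handling, and function names here are illustrative assumptions, not the prototype’s actual configuration.

```python
import socket

MCAST_GRP = "224.1.1.1"  # assumed multicast group for the downlink

def make_downlink_sender():
    # Server side: IP-multicast UDP socket for the broadcast downlink.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    return s

def make_uplink_receiver(port):
    # Server side: plain UDP socket receiving client requests on the uplink.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", port))  # port 0 lets the OS pick a free port
    return s

# A client request is a single UDP datagram naming the wanted page; the
# server schedules it (e.g., via RxW) and multicasts the page to MCAST_GRP,
# so one transmission serves every waiting client.
```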
22
Responsiveness (Prototype)
[Graph: average wait]
23
Data Staging
Problem: scheduling algorithms assume that all data are readily available for broadcast. In reality, data may reside on disk, tertiary storage, or even at remote (e.g., WWW) sites.
We have integrated RxW with:
Server cache management
Disk prefetching
Opportunistic scheduling (“postfetching”)
24
I. Caching - Love/Hate Hints
Goal: reduce the need to fetch data (i.e., misses).
Skewed access ⇒ many cold pages. Least Recently Used handles cold pages poorly.
Approach: love hot pages, hate cold ones. A scheduled page is considered hot if both:
It is encountered on the R-list first.
It is within the top “hot range” R-list pages.
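The eviction side of the hint scheme above can be sketched as a two-class cache. This is a minimal sketch under the assumption that “hated” (cold) pages are evicted before “loved” (hot) ones, with LRU order within each class; the class name and interface are mine.

```python
from collections import OrderedDict

class LoveHateCache:
    """Sketch of a Love/Hate-hinted server cache: hated (cold) pages
    are evicted before loved (hot) ones; LRU within each class."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.loved = OrderedDict()  # hot pages, oldest first
        self.hated = OrderedDict()  # cold pages, oldest first

    def insert(self, page, data, loved):
        # The scheduler supplies the love/hate hint at insertion time.
        target = self.loved if loved else self.hated
        target[page] = data
        target.move_to_end(page)
        while len(self.loved) + len(self.hated) > self.capacity:
            # Evict hated pages first; fall back to loved ones only
            # when no hated page remains.
            victim = self.hated if self.hated else self.loved
            victim.popitem(last=False)

    def get(self, page):
        for d in (self.loved, self.hated):
            if page in d:
                d.move_to_end(page)
                return d[page]
        return None
```

Because cold pages are the first evicted, the cache keeps hot pages resident, which is exactly what a plain LRU fails to do under skewed access.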
25
Caching Performance (RxW.90)
26
II. Prefetching
Prefetching is a common data staging technique (e.g., video-on-demand, WWW). Since hot pages are cached, prefetch the cold ones: the top “prefetch_window” pages in the W-list are prefetched.
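The W-list prefetch rule above amounts to a one-line filter; this helper is a sketch with assumed names, not the prototype’s code.

```python
def pages_to_prefetch(w_list, cached, prefetch_window):
    """Sketch: speculatively stage the top `prefetch_window` pages of
    the FIFO W-list (the longest-waiting, typically cold pages) that
    are not already cache-resident."""
    return [p for p in w_list[:prefetch_window] if p not in cached]

# With W-list ["a", "b", "c", "d"], page "b" cached, window 3:
print(pages_to_prefetch(["a", "b", "c", "d"], {"b"}, 3))
# → ['a', 'c']
```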
27
III. Opportunistic Scheduling
More importantly -- keep the broadcast busy! “Postfetch” the missed page and send out another.
For data broadcast, this is best: square-root rule, small latency penalty, no guesswork.
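The postfetch decision above can be sketched as a small selection step; the function signature and data shapes are assumptions made for illustration.

```python
def next_to_send(rxw_ranked, cached, fetch_queue):
    """Sketch of opportunistic ("postfetch") scheduling. `rxw_ranked`
    lists pages in decreasing RxW order; `cached` is the set of
    cache-resident pages. The broadcast never idles waiting on disk."""
    best = rxw_ranked[0]
    if best in cached:
        return best
    # Cache miss: postfetch the scheduled page in the background ...
    fetch_queue.append(best)
    # ... and broadcast the best cache-resident page right now.
    return next((p for p in rxw_ranked[1:] if p in cached), None)

# "X" has the top RxW value but misses the cache, so it is queued for
# fetching and the best resident page "Y" goes out instead:
queue = []
print(next_to_send(["X", "Y", "Z"], {"Y", "Z"}, queue))  # → Y
print(queue)                                             # → ['X']
```

Unlike speculative prefetching, nothing is fetched on a guess: only pages the scheduler has actually chosen are staged, at the cost of a small latency penalty for the missed page.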
28
Prefetch vs. Postfetch
29
On-Demand Broadcast Summary
For scalability, real solutions must take scheduling overhead and data staging into account.
RxW is 4 times faster than previous algorithms at the same scheduling quality, and can be as cheap as O(1).
RxW provides hints for data staging: Love/Hate caching plus “postfetching” obviate the need for speculative prefetching.
30
The DBIS Toolkit for Network Data Management
[Diagram: clients, brokers, sources]
31
Information Broker Architecture
[Architecture diagram: an information broker containing a Data Source Manager, Broker Manager, Catalog/Profile Manager, Mapper, HD Cache, and a Broadcast Manager (Scheduler, Client Manager, Network Manager), coordinated by an IB Master. Data sources register and deliver data items; profiles and pull requests arrive from clients and other information brokers; filtered data and tune information flow out over the broadcast medium.]
32
Map Dissemination Application
[Screenshots: IB Master, first IB, a data source, and a client; second IB and client]
33
DBIS Status
Level 0 prototype constructed. Supports publish/subscribe only; broadcast disks and on-demand scheduling are to be added.
A key research issue is profiles:
How to express them (is XML helpful?)
How to manage and search hundreds of thousands of them?
How to automatically learn and maintain them (clustering and categorization)?
34
Summary
Dissemination-based applications require new solutions.
Multiple types of data delivery can be combined easily due to network transparency.
We have developed scheduling and data staging techniques and are creating a toolkit.
Communication is important, but a data management perspective is also essential: Databases ⇒ Network Data Management.
35
An Analytical Treatment for RxW
[Equation: closed-form expression for the average waiting time W_b in terms of the per-page request probabilities u_i]
Average waiting time is bounded ⇒ inherent scalability in the number of users
Square-root ratio (optimal for push-based)
MRF: straight ratio, gives too much weight to hot pages
36
Worst Case Waiting Time
38
Extra graphs
42
Caching Policies
Goal: keep hot pages in cache.
45
Profiles [Cetintemel, Franklin, Giles 98]
Push requires models of user interests. The accuracy of these models determines the perceived usefulness of the system; the management of these profiles determines performance and scalability.
We have developed a novel, multi-modal technique for profile representation and learning.
46
Properties of On-Demand Broadcast
Broadcast is inherently scalable
Can do better than FIFO
Should approach optimal push (assuming an infinite server and no request overhead)
Need to balance average and worst-case delay
Need algorithms that scale with the database size, bandwidth, and workload intensity
47
DBIS Application Architecture
[Diagram: toolkit parts (DS Library, IB Executable, IB Master Executable, Client Library) vs. application-developer implementations (DS Code, Client Code); data sources and information brokers exchange data and catalog info with the IB Master]