Research in Data Broadcasting


1 Research in Data Broadcasting
Michael Franklin, University of Maryland
Joint work with D. Aksoy, M. Altinel, R. Bose, U. Cetintemel, J. Wang, and S. Zdonik
November 1998 © Michael J. Franklin

2 Data Dissemination
Many emerging applications involve large-scale data distribution:
- stock and sports tickers
- traffic information systems
- software distribution
- news and entertainment delivery

3 How Well Does It Scale?
Result servers (elections, the Oscars, Deep Blue, etc.):
- use a WWW server to disseminate results
- Scalability? Server crashes, intolerable delays, ...
A better screen saver (e.g., PointCast):
- a "push-based" interface for customized news over the Internet
- Scalability? Large companies restrict usage due to heavy LAN traffic

4 "Push" is a Potential Answer
- Broadcast media: newspapers, radio, TV, junk mail, ...
- Unicast: mail, telephone, e-mail, ...
- Data push: Teletext, BCIS, Datacycle, etc.
- Publish/subscribe: webcasting, Internet push

5 Push or Pull? - Webcasting (e.g., PointCast)
"Web push technology is exploding --- even though there's no such thing." (Byte, 8/97)
(Diagram: the client's converter pulls from the server, then pushes to the user interface.)

6 Push with Periodic Broadcast (e.g., Broadcast Disks)
Let's look at another kind of push-based system. Teletext is a good example of this; Broadcast Disks is our own work. Repetition creates a revolving disk. Good for intermittent connections, limited memory, high turnover, or a huge client population.

7 Broadcast Disks
- Broadcast the hottest data most frequently
- Superimpose multiple "disks" on the broadcast (see the sketch below)
- A flexible, tunable memory hierarchy
- Client storage resources can be exploited to mitigate scheduling mismatch
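To make the "multiple disks" idea concrete, here is a minimal sketch in Python of flat-schedule generation in the style of Broadcast Disks; the example pages and relative frequencies are made up, not taken from the deck. Each disk is split into chunks so that a disk with relative frequency f repeats f times per major cycle.

```python
from functools import reduce
from math import lcm

def broadcast_schedule(disks, rel_freqs):
    """Interleave several 'disks' (lists of pages) into one flat
    broadcast schedule; a disk with relative frequency f repeats
    f times per major cycle."""
    max_chunks = reduce(lcm, rel_freqs)
    chunked = []
    for pages, f in zip(disks, rel_freqs):
        n = max_chunks // f               # chunks this disk is split into
        size = -(-len(pages) // n)        # ceiling division
        chunked.append([pages[k * size:(k + 1) * size] for k in range(n)])
    schedule = []
    for i in range(max_chunks):           # one major cycle...
        for chunks in chunked:            # ...of max_chunks minor cycles
            schedule.extend(chunks[i % len(chunks)])
    return schedule

# Hot page A repeats 4x per cycle, B/C 2x, D-G once:
# ['A','B','D', 'A','C','E', 'A','B','F', 'A','C','G']
print(broadcast_schedule([["A"], ["B", "C"], ["D", "E", "F", "G"]],
                         [4, 2, 1]))
```

The tunability mentioned on the slide comes from the two knobs: how pages are partitioned across disks, and the relative frequencies assigned to each disk.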

8 Push or Pull? - Broadcast Disks
(Diagram: the server pushes the broadcast to clients; the user interface pulls from the client's cache.)

9 What's going on here?
1) Push vs. pull is just one dimension along which to compare data delivery mechanisms. We focus on three.
2) Different mechanisms for data delivery can (and should) be used across different links. Enabled by network transparency.

10 Delivery Options

              Aperiodic                          Periodic
              Unicast          1-to-n            Unicast          1-to-n
  Pull        request/         request/response  polling          polling
              response         w/snoop                            w/snoop
  Push        e-mail lists     publish/          e-mail list      broadcast
                               subscribe         digests          disks

All of these combinations make sense. There are likely finer-grained divisions in this hierarchy.

11 Network Transparency
(Diagram: Clients - Brokers - Sources, connected by links of varying types.)
The type of a link matters only to the nodes at each end.

12 Using Network Transparency
An example: a DB server delivering data through a proxy cache; then a slightly more complex example. The delivery mechanism used on each link can vary dynamically.

13 Large-Scale On-Demand Broadcast
(Diagram: clients send requests to the server over the uplink; the server broadcasts data to all clients over the downlink.)

14 Push vs. Pull Scalability
(Graph: performance of pull-unicast, pull-broadcast, and push-broadcast as the population grows from few clients to many clients.)

15 Previous Algorithms
- First Come First Served (FCFS)
- Longest Wait First (LWF) [Dykeman et al.]
- Most Requests First (MRF) [Dykeman et al.]
- MRF-Lowest (MRFL) [Dykeman et al.]
- Priority Index Policy (PIP) [Su and Tassiulas] [Vaidya and Hameed]

16 Previous Algorithms - Performance
(Graph: average wait under a Zipf distribution; scheduling overhead ignored.)
LWF wins, but it is impractical.

17 Looking Deeper
(Graphs: request behavior of hot pages vs. cold pages.)

18 RxW - A Scalable, Tunable Scheduling Algorithm
- The wait queue (W-list) is FIFO
- The request queue (R-list) is sorted by number of requests
- Choose the item with the maximal RxW value (see the sketch below)
- Pruning is effective (72% savings), but...
- An approximate version can be as cheap as O(1)
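As a rough illustration, here is a minimal sketch of the exhaustive RxW decision; the class and its in-memory structures are illustrative, not the paper's pruned implementation. Each page's priority is its number of pending requests (R) times the wait of its oldest outstanding request (W).

```python
import time

class RxWScheduler:
    """Minimal sketch of exhaustive RxW: broadcast next the page whose
    R (pending request count) times W (wait of its oldest request)
    is largest. One broadcast satisfies all waiting requests."""
    def __init__(self):
        self.pending = {}                    # page -> [count, first-request time]

    def request(self, page):
        entry = self.pending.setdefault(page, [0, time.time()])
        entry[0] += 1                        # R grows; W keeps the oldest arrival

    def next_page(self):
        if not self.pending:
            return None
        now = time.time()
        page = max(self.pending,
                   key=lambda p: self.pending[p][0] * (now - self.pending[p][1]))
        del self.pending[page]               # everyone waiting is served at once
        return page
```

The pruned and approximate variants avoid this full scan by walking the sorted R-list and FIFO W-list alternately and stopping once no remaining entry can beat the best RxW value seen so far.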

19 Average Waiting Time (without scheduling overhead)

20 Scheduling Overhead
(Graphs: scheduling overhead with increasing arrival rate and increasing database size.)

21 Prototype Implementation
The server broadcasts to clients over an IP-multicast downlink (100 Mbps); clients send requests over a UDP uplink (10 Mbps).
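For flavor, a minimal sketch of such a downlink sender in Python; the multicast group, port, and TTL are made-up values, not those of the prototype.

```python
import socket

MCAST_GROUP, MCAST_PORT = "239.1.1.1", 5007   # illustrative group and port

def broadcast_page(payload: bytes):
    """Send one page to every listening client with a single
    IP-multicast UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    try:
        sock.sendto(payload, (MCAST_GROUP, MCAST_PORT))
    finally:
        sock.close()
```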

22 Responsiveness (Prototype)
(Graph: average wait.)

23 Data Staging
Problem: scheduling algorithms assume that all data are readily available for broadcast. In reality, data may reside on disk, tertiary storage, or even at remote (e.g., WWW) sites. We have integrated RxW with:
- server cache management
- disk prefetching
- opportunistic scheduling ("postfetching")

24 I. Caching - Love/Hate Hints
Goal: reduce the need to fetch data (i.e., misses). Skewed access ⇒ many cold pages; Least Recently Used handles cold pages poorly.
Approach: love hot pages, hate cold ones (a sketch of the eviction side follows). A scheduled page is considered hot only if both:
- it is encountered on the R-list first, and
- it is within the top "hot range" R-list pages.
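A minimal sketch of the eviction policy implied by these hints; the class and its structure are illustrative, assuming pages arrive already tagged hot or cold by the rule above.

```python
from collections import OrderedDict

class LoveHateCache:
    """Sketch: 'loved' (hot) pages are kept; 'hated' (cold) pages are
    evicted first, so one-shot cold pages cannot flush the hot set
    the way they would under plain LRU."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.loved = OrderedDict()           # LRU order within each class
        self.hated = OrderedDict()

    def insert(self, page, data, love):
        self.loved.pop(page, None)           # a page lives in one class only
        self.hated.pop(page, None)
        (self.loved if love else self.hated)[page] = data
        while len(self.loved) + len(self.hated) > self.capacity:
            pool = self.hated if self.hated else self.loved
            pool.popitem(last=False)         # oldest hated page goes first

    def get(self, page):
        for pool in (self.loved, self.hated):
            if page in pool:
                pool.move_to_end(page)       # refresh recency
                return pool[page]
        return None
```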

25 Caching Performance (RxW.90)

26 II. Prefetching
Prefetching is a common data staging technique (e.g., video-on-demand, WWW). Since hot pages are cached, prefetch the cold ones: the top "prefetch_window" pages in the W-list are prefetched (sketched below).
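A one-function sketch of that candidate selection; the parameter and object names are illustrative.

```python
def prefetch_candidates(w_list, cache, prefetch_window):
    """Hot pages tend to be cached already, so stage the cold ones:
    take the top prefetch_window entries of the W-list (the oldest
    outstanding requests) that are not yet in the cache."""
    return [p for p in w_list[:prefetch_window] if cache.get(p) is None]
```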

27 III. Opportunistic Scheduling
More importantly: keep the broadcast busy! "Postfetch" the missed page and send out another in the meantime (see the sketch below). For data broadcast, this is best:
- square-root rule
- small latency penalty
- no guesswork
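A minimal sketch of the postfetch decision; the scheduler, cache, and disk objects are hypothetical stand-ins, not the prototype's interfaces.

```python
def next_broadcast(scheduler, cache, disk):
    """If the top RxW choice misses the cache, start fetching it in
    the background ('postfetch') and broadcast the best cached page
    instead, so the channel never idles waiting on the disk."""
    ranked = scheduler.pages_by_rxw()        # best candidates first
    top = ranked[0]
    if cache.get(top) is not None:
        return top
    disk.fetch_async(top)                    # postfetch the miss
    for page in ranked[1:]:
        if cache.get(page) is not None:      # keep the broadcast busy
            return page
    return disk.fetch(top)                   # nothing cached: just wait
```

The "no guesswork" point falls out of the structure: unlike speculative prefetching, every fetch here is for a page that was actually chosen by the scheduler.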

28 Prefetch vs. Postfetch

29 On-Demand Broadcast Summary
- For scalability, real solutions must take scheduling overhead and data staging into account.
- RxW is 4 times faster than previous algorithms at the same scheduling quality, and can be as cheap as O(1).
- RxW provides hints for data staging: love/hate caching plus "postfetching" obviate the need for speculative prefetching.

30 The DBIS Toolkit for Network Data Management
(Diagram: Clients - Brokers - Sources.)

31 Information Broker Architecture
(Diagram: inside an information broker, a Broker Manager coordinates a Data Source Manager, Catalog/Profile Manager, Mapper, Scheduler, Broadcast Manager, Client Manager, Network Manager, and HD cache. Profiles, pull requests, catalog updates, data items, and tune-information acknowledgements flow among clients, data sources, other information brokers, and the IB Master over the broadcast medium.)

32 Map Dissemination Application
(Screenshots: the IB Master, a first IB with a data source and a client; a second IB and client.)

33 DBIS Status
Level 0 prototype constructed. Supports publish/subscribe only; broadcast disks and on-demand scheduling to be added.
A key research issue is profiles:
- How to express them (is XML helpful)?
- How to manage and search hundreds of thousands of them?
- How to automatically learn and maintain them (clustering and categorization)?

34 Summary
- Dissemination-based applications require new solutions.
- Multiple types of data delivery can be combined easily thanks to network transparency.
- We have developed scheduling and data staging techniques and are creating a toolkit.
- Communication is important, but a data management perspective is also essential: Databases ⇒ Network Data Management.

35 An Analytical Treatment for RxW
(Equation: closed form for the average waiting time W_b as a function of the per-item request probabilities u_i, i = 1..N.)
- Average waiting time is bounded: inherent scalability in the number of users
- Square-root ratio (optimal for push-based broadcast); see below
- MRF uses a straight ratio, giving too much bandwidth to hot pages
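For context, the square-root rule referenced above is the classical optimum for push-based broadcast. The LaTeX below states it under the usual assumptions of equal-length items and access probabilities p_i; the symbols are standard ones, not taken from this deck.

```latex
% Square-root rule: over a push-based (periodic) broadcast with access
% probabilities p_i and equal-length items, mean waiting time is
% minimized when each item's broadcast frequency satisfies
f_i \propto \sqrt{p_i},
% or equivalently, when the spacing between consecutive broadcasts of
% item i satisfies
s_i \propto \frac{1}{\sqrt{p_i}}.
% MRF weights items by p_i directly (a straight ratio), which
% over-serves hot pages relative to this optimum.
```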

36 Worst Case Waiting Time

38 Extra graphs

42 Caching Policies
Goal: keep hot pages in cache.

45 Profiles [Cetintemel, Franklin, Giles 98]
Push requires models of user interests. The accuracy of these models determines the perceived usefulness of the system. The management of these profiles determines performance and scalability. We have developed a novel, multi-modal technique for profile representation and learning.

46 Properties of On-Demand Broadcast
- Broadcast is inherently scalable
- Can do better than FIFO
- Should approach optimal push (assuming an infinite server and no request overhead)
- Need to balance average and worst-case delay
- Need algorithms that scale with database size, bandwidth, and workload intensity

47 DBIS Application Architecture
(Diagram: toolkit parts (DS library, client library, IB executable, IB Master executable) vs. application-developer implementations (DS code, client code), composing data sources, information brokers, and an IB Master that holds data catalog info and links to other IB executables.)

