Pytheas: Enabling Data-Driven Quality of Experience Optimization Using Group-Based Exploration-Exploitation Junchen Jiang (CMU) Shijie Sun (Tsinghua Univ.)

Presentation transcript:

Pytheas: Enabling Data-Driven Quality of Experience Optimization Using Group-Based Exploration-Exploitation Junchen Jiang (CMU) Shijie Sun (Tsinghua Univ.) Vyas Sekar (CMU) Hui Zhang (CMU, Conviva Inc.)

Key points in one minute: Data-driven QoE optimization shows promising quality improvement. Data-driven optimization should use real-time exploration-exploitation. The question is how to make decisions with fresh data from geo-distributed sessions at scale. Pytheas: design and implementation of group-based exploration-exploitation.

Quality of Experience (QoE) today is not ideal [Source: Conviva]

Data-driven approach is promising. Classic approaches: local data of a single device. Data-driven approach: global data of many devices. Examples: CFA [NSDI’16], Footprint [NSDI’16], VIA [SIGCOMM’16], CS2P [SIGCOMM’16], C3 [NSDI’15], SPAND [INFOCOM’00].

Status quo: Prediction-based workflow. Data collected from the Internet feeds a QoE predictor, which answers the question: which CDN and bitrate?

Limitations of prediction-based workflow. Data collection = F(prior decisions), and the QoE predictor answers: which CDN and bitrate? Limitation #1: Prediction bias. There is less data on historically worse decisions. Limitation #2: Slow reaction. Predictions are updated on coarse timescales.
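The prediction-bias limitation can be illustrated with a small, noise-free sketch. It is not taken from the talk: the CDN names, QoE values, and the purely greedy policy are all assumptions chosen to show how a decision that looked bad once never gets another sample.

```python
def greedy_choice(estimates):
    """Pick the decision with the highest estimated QoE (no exploration)."""
    return max(estimates, key=estimates.get)


def run_greedy(true_qoe, initial_estimates, num_sessions):
    """Serve sessions greedily and update only the chosen decision's estimate."""
    estimates = dict(initial_estimates)
    counts = {decision: 1 for decision in estimates}  # one prior sample each
    for _ in range(num_sessions):
        decision = greedy_choice(estimates)
        observed = true_qoe[decision]  # noise-free measurement, for clarity
        counts[decision] += 1
        estimates[decision] += (observed - estimates[decision]) / counts[decision]
    return estimates, counts


# Hypothetical values: cdn_b is actually better, but its single
# historical sample happened to be bad.
true_qoe = {"cdn_a": 0.6, "cdn_b": 0.8}
initial_estimates = {"cdn_a": 0.6, "cdn_b": 0.2}

estimates, counts = run_greedy(true_qoe, initial_estimates, num_sessions=100)
print(estimates, counts)
```

Because cdn_b is never chosen again, its estimate stays at the stale value and the data collected remains a function of prior decisions.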

Outline: What’s the right abstraction? Why is it challenging? How to implement it in network contexts? Evaluation.

Ideal abstraction: Real-time exploration-exploitation (Real-time E2). [Diagram: real-time E2 logic performing decision making and data collection against the Internet.]

Drawing a parallel from ML. Slot machines correspond to the decision space, pulls by a gambler correspond to sessions, and reward corresponds to QoE. Bandit goal: maximize mean reward given a limited number of pulls. Our goal: optimize mean QoE for a limited number of sessions.
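One way to write the two goals side by side is sketched below. The notation is an assumption introduced here, not the talk's: $T$ is the number of pulls, $a_t$ the arm pulled at step $t$, $r(a_t)$ its reward, $N$ the number of sessions, and $d_s$ the decision (for example, a CDN and bitrate) assigned to session $s$.

```latex
% Bandit goal: maximize mean reward over a limited number of pulls
\max_{a_1,\dots,a_T} \; \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[ r(a_t) \big]

% QoE goal: optimize mean QoE over a limited number of sessions
\max_{d_1,\dots,d_N} \; \frac{1}{N} \sum_{s=1}^{N} \mathbb{E}\big[ \mathrm{QoE}(d_s) \big]
```

In both cases each choice yields a sample only for the option that was chosen, which is what creates the tension between exploring and exploiting.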

Outline: What’s the right abstraction? Real-time E2. Why is it challenging? How to implement it in network contexts? Evaluation.

Challenge #1: Application sessions are different. Running E2 per geolocation? It doesn’t capture complex factors. [Figure: a single real-time E2 logic serving sessions that differ by city, ISP, and player: NYC Comcast iOS, NYC AT&T Flash, Chicago Comcast iOS, Chicago AT&T Flash.]

Challenge #2: E2 with fresh data of geo-distributed sessions. The backend has global but stale data; each frontend (Frontend A, Frontend B) has fresh but local data. Running E2 in the backend? It doesn’t have fresh data. Running E2 in a frontend? It doesn’t have global data.

Outline: What’s the right abstraction? Real-time E2. Why is it challenging? Applying E2 in networking contexts. How to implement it in network contexts? Evaluation.

Pytheas: Group-based E2. Run real-time E2 at a per-group granularity. [Figure: a backend and Frontends A and B serving session groups such as NYC Comcast VoD, NYC Comcast Live, NYC AT&T Live, Chicago Comcast VoD, Chicago AT&T Live, and Chicago AT&T VoD.]

Idea #1: Grouping sessions by Critical Features. Critical Features [NSDI’2016]: a subset of features ultimately determines video quality. Example with features (City, ISP, Content): F(NYC, Comcast, VoD) ≈ F(NYC, Comcast, *). Sessions in the same group share the best decision. [Figure: session groups such as NYC Comcast VoD, NYC Comcast Live, NYC AT&T Live, Chicago Comcast VoD, Chicago AT&T Live, and Chicago AT&T VoD.]
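A minimal sketch of grouping by critical features follows. It assumes, for simplicity, a single set of critical features shared by all sessions; the feature names, the wildcard convention, and the helper functions are hypothetical and not Pytheas's actual API.

```python
from collections import defaultdict

FEATURES = ("city", "isp", "content")  # assumed feature order
WILDCARD = "*"


def group_key(session, critical_features):
    """Keep the critical features and wildcard the rest."""
    return tuple(
        session[name] if name in critical_features else WILDCARD
        for name in FEATURES
    )


def group_sessions(sessions, critical_features):
    """Bucket sessions that agree on every critical feature."""
    groups = defaultdict(list)
    for session in sessions:
        groups[group_key(session, critical_features)].append(session)
    return dict(groups)


sessions = [
    {"city": "NYC", "isp": "Comcast", "content": "VoD"},
    {"city": "NYC", "isp": "Comcast", "content": "Live"},
    {"city": "Chicago", "isp": "AT&T", "content": "Live"},
]

# Assumption for this example: city and ISP are critical, content is not.
groups = group_sessions(sessions, critical_features={"city", "isp"})
for key, members in groups.items():
    print(key, len(members))
```

Here the two NYC Comcast sessions land in the same group regardless of content type, mirroring F(NYC, Comcast, VoD) ≈ F(NYC, Comcast, *).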

Idea #1: Grouping sessions by Critical Features (continued). Each group runs its own per-group E2 logic, using the Upper Confidence Bound algorithm. [Figure: session groups such as NYC Comcast VoD, NYC Comcast Live, NYC AT&T Live, Chicago Comcast VoD, Chicago AT&T Live, and Chicago AT&T VoD.]
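The per-group logic can be sketched with the textbook UCB1 rule, which may differ from the exact variant Pytheas uses. The group name, decision names, and noise-free QoE values are assumptions, and rewards are taken to lie in [0, 1].

```python
import math


class UCB1:
    """Textbook UCB1 over a fixed set of decisions (rewards assumed in [0, 1])."""

    def __init__(self, decisions):
        self.counts = {d: 0 for d in decisions}
        self.means = {d: 0.0 for d in decisions}

    def choose(self):
        # Try every decision once before applying the confidence bound.
        for decision, count in self.counts.items():
            if count == 0:
                return decision
        total = sum(self.counts.values())
        return max(
            self.counts,
            key=lambda d: self.means[d]
            + math.sqrt(2 * math.log(total) / self.counts[d]),
        )

    def update(self, decision, reward):
        self.counts[decision] += 1
        self.means[decision] += (reward - self.means[decision]) / self.counts[decision]


# One independent UCB1 instance per session group (hypothetical group name).
logic_by_group = {}


def logic_for(group, decisions):
    if group not in logic_by_group:
        logic_by_group[group] = UCB1(decisions)
    return logic_by_group[group]


# Noise-free, made-up QoE per decision for a single group.
true_qoe = {"cdn_a": 0.3, "cdn_b": 0.9}
group = ("NYC", "Comcast", "*")
for _ in range(200):
    logic = logic_for(group, list(true_qoe))
    decision = logic.choose()
    logic.update(decision, true_qoe[decision])

counts = logic_by_group[group].counts
print(counts)
```

Unlike a purely greedy policy, the confidence term keeps sending an occasional session to the less-sampled decision, so a decision that looked bad early is still revisited.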

Idea #2: Per-group sessions share network locality. In 90+% of groups, the sessions are from the same ISP and city. The per-group E2 logic (Upper Confidence Bound algorithm) runs at the frontends (Frontend A, Frontend B), where it is updated with fresh data. [Figure: session groups such as NYC Comcast VoD, NYC AT&T Live, Chicago Comcast VoD, and Chicago AT&T Live mapped to frontends.]

Idea #3: Session grouping is persistent. The session-grouping logic runs in the backend and is updated every tens of minutes. The per-group E2 logic runs in the frontends (Frontend A, Frontend B) and is updated with fresh data. [Figure: session groups such as NYC Comcast VoD, NYC AT&T Live, Chicago Comcast VoD, and Chicago AT&T Live mapped to frontends.]
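The two timescales can be sketched as a frontend that receives a grouping table occasionally and updates per-group statistics on every measurement. The class, method names, session keys, and the choice to keep statistics across grouping updates are all assumptions made for illustration, not the Pytheas implementation.

```python
class Frontend:
    """Hypothetical frontend: slow grouping updates, fast per-group updates."""

    def __init__(self):
        self.group_of = {}  # session key -> group id, published by the backend
        self.stats = {}     # group id -> {decision: (count, mean QoE)}

    def on_grouping_published(self, group_of):
        # Slow path: the backend republishes the grouping (e.g. every tens of minutes).
        self.group_of = dict(group_of)

    def on_measurement(self, session_key, decision, qoe):
        # Fast path: fold each fresh QoE measurement into its group's running mean.
        group = self.group_of.get(session_key, "default")
        per_group = self.stats.setdefault(group, {})
        count, mean = per_group.get(decision, (0, 0.0))
        count += 1
        mean += (qoe - mean) / count
        per_group[decision] = (count, mean)


frontend = Frontend()
frontend.on_grouping_published({("NYC", "Comcast"): "g1"})
frontend.on_measurement(("NYC", "Comcast"), "cdn_a", 1.0)
frontend.on_measurement(("NYC", "Comcast"), "cdn_a", 0.5)
frontend.on_measurement(("Chicago", "AT&T"), "cdn_b", 0.5)  # not in the table yet
print(frontend.stats)
```

Because the grouping changes slowly, the frontend can keep serving decisions from its local per-group state between backend updates.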

Pytheas implementation. Components, from backend to client: history storage and session-grouping logic (backend); publish/subscribe; per-group logic (frontend); publish/subscribe; client-facing servers; HTTP POST; client (e.g., video player).

More in our paper: cross-frontend E2, fault tolerance, Pytheas API, throughput optimization.

Outline: What’s the right abstraction? Real-time E2. Why is it challenging? Applying E2 in networking contexts. How to implement it in network contexts? Pytheas (Group-based E2). Evaluation.

QoE improvement over a prediction-based baseline. Real-world trace: 8.5 million video sessions from a major content provider over 24 hours. Prediction-based baseline: CFA [NSDI 2016]. Metrics: join time and buffering ratio. Better QoE: Pytheas improves over CFA by 6-30% on the mean and by 24-78% at the 90th percentile. [Figures: CDFs of reduction in join time over CFA (%) and reduction in buffering ratio over CFA (%).]

Microbenchmarks. CloudLab instance: 8 cores (2.4 GHz), 64 GB RAM. Message per client: 400 B. Scalability: Pytheas throughput is almost horizontally scalable. Real scale: 30 CloudLab nodes can handle a YouTube workload (5B sessions/day) with sub-second feedback delay. [Figures: frontend and backend throughput, # of sessions per sec (K) vs. # of instances.]
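A back-of-the-envelope check of the stated scale is sketched below. It assumes load is spread uniformly over the day and across nodes and that each session sends a single 400 B message; these simplifications are ours and are not stated in the talk.

```python
SESSIONS_PER_DAY = 5_000_000_000   # YouTube workload quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60
NODES = 30                         # CloudLab nodes quoted on the slide
MESSAGE_BYTES = 400                # message per client quoted on the slide

# Assumption: uniform arrival rate over the day.
sessions_per_sec = SESSIONS_PER_DAY / SECONDS_PER_DAY

# Assumption: load is split evenly across all nodes.
sessions_per_sec_per_node = sessions_per_sec / NODES

# Assumption: one message per session.
ingest_mb_per_sec = sessions_per_sec * MESSAGE_BYTES / 1e6

print(f"{sessions_per_sec:,.0f} sessions/s overall")
print(f"{sessions_per_sec_per_node:,.0f} sessions/s per node")
print(f"{ingest_mb_per_sec:.1f} MB/s aggregate ingest")
```

Under these assumptions the workload is on the order of tens of thousands of sessions per second overall, or roughly two thousand per second per node.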

Conclusion. Motivation: the data-driven approach shows promising QoE improvement, but prior prediction-based systems have fundamental limitations. This talk: Right abstraction: real-time E2 (real-time exploration-exploitation). Challenge: respond to geo-distributed clients with fresh data at scale. Solution: Pytheas realizes real-time E2 in networking contexts with group-based E2. Result: improves video QoE over a prediction-based baseline by up to 30% (mean) and up to 78% (90th percentile).