A Quest for an Internet Video Quality-of-Experience Metric
A. Balachandran, V. Sekar, A. Akella, S. Seshan, I. Stoica and H. Zhang
In Proceedings of the 11th ACM Workshop on Hot Topics in Networks, Seattle, WA, USA, October 29-30, 2012

Introduction
- Content delivery costs are down and subscription-based services are up → Internet video traffic is predicted to keep growing, possibly surpassing television-based viewership in the future
- Many players: content providers, content delivery networks (CDNs), video player designers, and users
- All face a common challenge: the lack of a standardized approach to measuring Quality of Experience (QoE)
- A standard QoE metric is needed to allow objective comparison of competing designs

New Notions of Quality for Internet Video
- Measuring quality
  - Internet video runs over HTTP, delivered by CDNs
  - Delivery is largely reliable, so loss-oriented measures (e.g., PSNR) are not so relevant
  - Instead: buffering, bitrate, frame rate, bitrate switching, startup delay
- Measuring experience
  - With ads and subscriptions, opinion scores from controlled studies != the engagement that matters to the business
  - Instead: fraction of video played, number of visits to the provider
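To make these notions concrete, here is a minimal sketch of a per-session record combining the quality metrics and engagement measures above; the Python field names are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass

@dataclass
class VideoSession:
    """One playback session; field names are illustrative, not from the paper."""
    # Quality metrics collected by the player
    join_time_s: float          # startup delay before playback begins
    buffering_ratio: float      # fraction of session time spent rebuffering
    avg_bitrate_kbps: float     # mean bitrate delivered
    rendering_rate_fps: float   # average frame rate rendered
    switches_per_min: float     # rate of bitrate switches
    # Engagement measures that matter to the business
    play_time_s: float          # how long the user actually watched
    fraction_viewed: float      # play time / total video length
```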

Today, Metrics Fall Short
- Adaptive video players make ad hoc tradeoffs between bitrate, startup delay, and buffering [16, 20, 32]
- Frameworks for multi-CDN optimization use primitive QoE metrics that capture only buffering, not bitrate switching [28, 29]
- Content providers have no systematic way to evaluate the cost-performance tradeoffs of different CDNs [1]

Robust QoE Measure
- Need a unified understanding
  - The set of quality metrics affects engagement together, rather than each metric in isolation
  - Natural, since individual metrics trade off against one another, e.g., a lower bitrate means less buffering but reduces picture quality
- Need a quantitative understanding
  - Beyond the simple "metric M impacts engagement"
  - Instead: "changing metric M from x to y changes engagement from a to b"

Key Factors for Internet Video QoE
- Complex relationships: the relationship between metrics and user experience is complex, even counterintuitive
  - E.g., a higher bitrate does not always mean the highest quality of experience
- Metric dependencies: metrics have subtle interdependencies and tradeoffs
  - E.g., switching quality levels reduces buffering, but can annoy users
- Impact of content: the nature of the content can confound these factors
  - E.g., live and video-on-demand (VoD) content have different viewing patterns
  - E.g., users' interest in the content affects their tolerance for quality problems

Goal
- Identify a feasible roadmap to a robust, unified, quantitative QoE metric
- Cast the QoE measure as a machine learning problem
  - Build an appropriate model to predict engagement (e.g., play time) as a function of the quality metrics
- Address content-induced effects using domain-specific measurements

Preliminary Results
- A decision-tree-based classifier provides 50% accuracy in predicting engagement
- Carefully setting up inputs and features could lead to a 25% gain in accuracy

Outline
- Introduction (done)
- Use Cases for Video QoE (next)
- Challenges in Measuring QoE
- Predictive Model for QoE Inference
- Preliminary Results
- Discussion
- Conclusion

Use Cases for Video QoE
- Content providers (e.g., Netflix) can objectively evaluate CDNs; so can multi-CDN optimizers
- CDNs can distribute resources efficiently across users
- Video players can make principled tradeoffs (e.g., bitrate vs. buffering)
- Users can make choices beyond content; also relevant since some ISPs have bandwidth quotas
- Industry agreement?
  - On a set of quality metrics
  - Needs "in the wild" data, not controlled studies
(Figure: the Internet video ecosystem)

Outline
- Introduction (done)
- Use Cases for Video QoE (done)
- Challenges in Measuring QoE (next)
- Predictive Model for QoE Inference
- Preliminary Results
- Discussion
- Conclusion

Challenges in Measuring QoE
- Approach: examples from two large content providers
  - One serves TV episodes
  - One serves live sports events
- Industry-standard QoE metrics [6]: errors, delays, video quality
- Mini-outline: complex relationships, interaction between metrics, externalities

Complex Relationships
- Counterintuitive effects
  - Higher quality should mean higher engagement
  - But → lower quality led to longer play times
  - Why? Users leave live sports running in the background: low quality means low CPU load, while high quality drove CPU load up and users terminated the session
- Non-monotonic effects
  - A higher average bitrate does not always mean higher quality
  - Bitrate values come in discrete steps; an average "between" steps implies switching, which annoys users
- Threshold effects
  - Rates of up to 0.5 switches/minute had no effect on engagement
  - At higher rates, users quit early

Interaction Between Metrics
- Switching versus buffering
  - Players should switch bitrates proactively to avoid buffering
  - But switching can annoy users (see previous graph)
- Join time versus bitrate
  - A higher bitrate implies higher quality
  - But it takes longer to start playback (the buffer must fill)

Externalities (1 of 2)
- Confounding external factors also affect user engagement
- Genre
  - Live content looks similar to VoD in terms of quality (right)
  - But user engagement differs (left)

Externalities (2 of 2)
- User interest
  - Users sample videos and quit, independent of quality issues
  - Regional effects for live sports:
    - Quality is the same (right)
    - But local viewers watch on average 10 minutes longer (left)

Outline
- Introduction (done)
- Use Cases for Video QoE (done)
- Challenges in Measuring QoE (done)
- Predictive Model for QoE Inference (next)
- Preliminary Results
- Discussion
- Conclusion

Towards a Predictive Model for QoE Inference
- Engagement = f({QualityMetric_i})
  - Engagement: e.g., play time, visits to the website
  - QualityMetric_i: e.g., buffering ratio, bitrate
- Dependencies and hidden relationships are handled through machine learning
  - As long as there are sufficiently large datasets; fortunately, content providers (e.g., Netflix) already gather them
- Confounding effects are tackled with domain-specific insights
  - Either select the input data before feeding it in
  - Or identify confounding features and let the algorithm handle them
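As a rough illustration, casting this as supervised learning might look like the following sketch; the file name, feature columns, target column, and model choice are assumptions for illustration, not the paper's exact setup.

```python
# Sketch: Engagement = f({QualityMetric_i}) as a supervised learning problem.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

sessions = pd.read_csv("sessions.csv")  # one row per video session (assumed file)
features = ["join_time_s", "buffering_ratio", "avg_bitrate_kbps",
            "rendering_rate_fps", "switches_per_min"]
X = sessions[features]                  # quality metrics
y = sessions["engagement_class"]        # e.g., binned fraction of video viewed

model = DecisionTreeClassifier(max_depth=10, random_state=0).fit(X, y)
```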

Outline
- Introduction (done)
- Use Cases for Video QoE (done)
- Challenges in Measuring QoE (done)
- Predictive Model for QoE Inference (done)
- Preliminary Results (next)
- Discussion
- Conclusion

Confirm Intuition of Approach
- Dataset: 1 month of video viewership, 10 million video sessions
- 10-fold cross-validation
  - Divide the data into 10 equal-sized pieces
  - Train on 9 pieces, test on the remaining 1
  - Repeat 10 times
- Two solutions
  - Strawman
  - Domain-specific refinement
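The 10-fold procedure above is the standard one; a minimal sketch with scikit-learn, reusing model, X, and y from the earlier sketch:

```python
# 10-fold cross-validation as described: 10 splits, train on 9 pieces,
# test on the held-out piece, repeat 10 times.
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=10)  # one accuracy score per fold
print(f"mean accuracy: {scores.mean():.2%} (+/- {scores.std():.2%})")
```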

Strawman Solutions (1 of 2)
- Use "play time" as engagement
- Classes based on the fraction of the video viewed
  - E.g., 5 classes: [0-20%, 20-40%, 40-60%, 60-80%, 80-100%]
- Varied (standard) learning algorithms
  - Naïve Bayes
  - Simple regression
  - Classic binary decision tree
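A sketch of this setup, reusing sessions and features from the earlier sketch: bin the fraction viewed into the five classes and compare the three learner families. The column names are assumptions, and since the target here is a class label, the slide's "simple regression" is stood in for by logistic regression.

```python
# Bin fraction-of-video-viewed into five engagement classes, then compare
# the three learner families named on the slide.
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression  # stand-in for "simple regression"
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

sessions["engagement_class"] = pd.cut(
    sessions["fraction_viewed"],
    bins=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    labels=["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"],
    include_lowest=True)

for clf in (GaussianNB(), LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(max_depth=10)):
    acc = cross_val_score(clf, sessions[features],
                          sessions["engagement_class"], cv=10).mean()
    print(f"{type(clf).__name__}: {acc:.2%}")
```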

Strawman Solutions (2 of 2)
- Decision trees perform best
- Naïve Bayes tends to do better when the features are independent
- Simple regression does poorly when relationships are not linear, e.g., non-monotonic
- All do worse with more classes (finer granularity)

Domain-specific Solutions (1 of 3)
- Decision trees can capture some of the complexity, but not confounding effects
  - Refine with domain-specific measurements (sketched below)
- Genre-specific refinement: live and VoD behave differently, so segment the data into two parts and run separately
- User interest-based refinement: since users tend to "sample" videos, ignore early quitters (sessions < 5 minutes)
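Both refinements reduce to simple data operations; a sketch, assuming genre and play_time_s columns on the sessions frame from the earlier sketch:

```python
# Genre split: train a separate model on each segment.
live_sessions = sessions[sessions["genre"] == "live"]
vod_sessions = sessions[sessions["genre"] == "vod"]

# Early-quitter filter: drop "sampling" sessions shorter than 5 minutes.
committed = sessions[sessions["play_time_s"] >= 5 * 60]
```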

Domain-specific Solutions (2 of 3)
- About a 20% increase in accuracy

Domain-specific Solutions (3 of 3)
- About an additional 5% increase in accuracy

Outline
- Introduction (done)
- Use Cases for Video QoE (done)
- Challenges in Measuring QoE (done)
- Predictive Model for QoE Inference (done)
- Preliminary Results (done)
- Discussion (next)
- Conclusion

Discussion (1 of 2)
- Metrics: for engagement, we need more than play time
  - Ad impressions, user loyalty (likelihood to return), total number of videos viewed
  - Past work [19] suggests quality affects engagement differently depending on the metric, e.g., delay may not affect a specific viewing session, but may hurt the likelihood to return
  - May need to weight different engagement metrics
- Externalities: is everything covered?
  - The user's ISP or viewing device may have an impact
  - Individual user preferences may have an impact
  - Motivates more measurement studies
  - May need more feature selection
  - May need user profile information

Discussion (2 of 2)
- Intuitive models
  - Inferring the cause of lower engagement is tough given confounding factors (user interest, tolerance for low quality)
  - But designers and practitioners need an intuitive model to make sense of the tradeoffs
  - And machine learning models can be black boxes (e.g., Principal Component Analysis, PCA)
  - Fortunately, there are techniques to turn decision trees into more intuitive explanations [27] and equations [25]
- Validation: how do we validate that the metric is useful?
  - Could run a test group with a system driven by the metric
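For the decision-tree case, one readily available option, sketched here with scikit-learn's export_text and reusing the fitted model and features from the earlier sketches, is to print the learned rules as readable text:

```python
# Print the fitted tree's decision rules as text, one way to make the
# learned model intuitive for designers and practitioners.
from sklearn.tree import export_text

print(export_text(model, feature_names=features))
```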

Conclusions
- Many industries suffer when a lack of understanding leads to deceptive marketing
  - Vendors quote individual metrics to look good, without relating them to the bigger picture (e.g., clock speed for CPUs, or megapixels for cameras)
- With its proliferation of quality factors, Internet video could suffer a similar fate
- Goal: a robust, unified, quantitative QoE metric
- Preliminary results give reason to be hopeful

Future Work?

Future Work
- Additional measures of engagement, e.g., return visits
- Explanations of decision trees
- Accounting for end devices, e.g., PC versus tablet versus phone
- Accounting for the last-mile connection, e.g., WiFi versus 4G versus fiber