A CASE FOR A COORDINATED INTERNET VIDEO CONTROL PLANE


1 A CASE FOR A COORDINATED INTERNET VIDEO CONTROL PLANE
Xi Liu+, Florin Dobrian+, Henry Milner+, Junchen Jiang$, Vyas Sekar*, Ion Stoica+#, Hui Zhang$+ Conviva+, CMU$, Intel Labs*, UC Berkeley# Presented by Rajath Subramanyam and Gopalakrishna Holla for CS538 Fall 2014 Hi! My name is Rajath. I will be presenting the paper titled “A Case for a Coordinated Internet Video Control Plane”. This paper is the work of researchers from Conviva, CMU, Intel Labs and UC Berkeley.

2 Introduction Video traffic has become the dominant fraction of Internet data traffic. Netflix: 20% of US Internet traffic Very soon 90% of Internet traffic will be video content User expectations for a HQ viewing experience are increasing Traditional traffic Latency vs. completion time Video traffic Sustained quality over extended periods of time Video traffic represents a significant fraction of today’s Internet traffic. A study shows that Netflix contributes 20% of US Internet traffic at peak hours, and another study suggests that by 2014, 90% of Internet traffic will be video. At the same time, user expectations for a high-quality viewing experience (low startup delay, low buffering and high bitrates) are continuously rising. The other worrying thing is that video traffic is very different from traditional workloads. Traditional traffic: for interactive web browsing, latency is critical; if you are downloading a large file, transfer completion time / throughput is critical. Video traffic: latency is less critical in streaming video, since application data units are large enough to amortize latency effects. Similarly, overall completion time doesn’t reflect actual user experience, as it does not capture rebuffering-induced interruptions. Video traffic requires sustained quality over extended periods of time. In fact, a study shows that a 1% increase in buffering time can lead to more than a 3-minute reduction in expected viewing time. *Disclaimer: All images from the World Wide Web

3 Introduction HTTP Adaptive streaming protocols
Content providers leverage existing HTTP CDN infrastructure to deliver content to end users Mismatch: Video streaming vs. HTTP-based delivery infrastructure Another reason for the explosion in video traffic is the shift from specialized streaming protocols like RTMP and RTSP to ubiquitous HTTP chunk-based streaming protocols. Examples of such streaming protocols are MPEG-DASH (Dynamic Adaptive Streaming over HTTP) and Apple’s HLS (HTTP Live Streaming). The use of a commodity service like HTTP has lowered the barrier to entry for content providers, who can leverage existing HTTP CDN infrastructure to deliver content to a wide audience. It also means end users can view the content on multiple viewing platforms. Unfortunately, there is a mismatch between the requirements of video streaming and the architecture of today’s HTTP-based video content delivery infrastructures, both at the ISP and CDN level.

4 Motivation To achieve a HQ viewing experience despite an unreliable video delivery infrastructure How do we achieve this? What parameters can we adapt? When to optimize these parameters? Who is in charge? Using measurement-driven insights, they make a case for a video control plane The authors take up the goal of providing a HQ viewing experience despite an unreliable video delivery infrastructure. They aim to answer the following questions through this work: What parameters can we adapt? Can we adapt the bitrate? Can we switch among multiple CDNs? When do we optimize these parameters? Is it better to optimize at startup time or midstream? Who is in charge of adapting these parameters? Does the client or the server take charge? Using measurement-driven insights, the authors make a case for a video control plane that can use a global view of client and network conditions to dynamically optimize video delivery, providing a HQ viewing experience to end users despite an unreliable delivery infrastructure. The control plane can dynamically adapt the CDN and the bitrate based on global knowledge of the network.

5 Video Quality today Dataset Metrics: Average bitrate
Re-buffering ratio Startup time Failure rate Exits before video start The authors first examine the performance of today’s delivery infrastructure and highlight potential sources of inefficiency. The dataset used in the paper is based on one week of client-side measurements from over 200 million viewing sessions or views (this includes both successful and failed views), over 50 million viewers, and 91 popular content providers. The content includes both live streaming content and video-on-demand content. They focus on industry-standard metrics: Average bitrate. Re-buffering ratio: buffering time divided by buffering + playing time (excluding paused or stopped time and buffering time before video start). Startup time: buffering time before a video starts. Failure rate: % of views that failed to start. Exits before video start: % of views that exited before the video started playing, without a fatal error.
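To make the metric definitions above concrete, here is a minimal Python sketch of the re-buffering ratio and average bitrate computations. The function and field names are illustrative, not the paper's or Conviva's actual code.

```python
def rebuffering_ratio(buffering_s: float, playing_s: float) -> float:
    """Buffering time divided by buffering + playing time.
    Excludes paused/stopped time and pre-start buffering (startup time)."""
    total = buffering_s + playing_s
    return buffering_s / total if total > 0 else 0.0

def average_bitrate(chunk_bitrates_kbps, chunk_durations_s):
    """Time-weighted average of the bitrates played during a session."""
    total_time = sum(chunk_durations_s)
    weighted = sum(b * d for b, d in zip(chunk_bitrates_kbps, chunk_durations_s))
    return weighted / total_time if total_time > 0 else 0.0

# A session that buffered 6 s during 594 s of playback has a 1% rebuffering ratio:
print(rebuffering_ratio(6.0, 594.0))
# A session that played 10 s at 1000 kbps and 10 s at 500 kbps averages 750 kbps:
print(average_bitrate([1000, 500], [10, 10]))
```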

6 Video Quality today The results of the experiments they ran on the dataset show that: Re-buffering ratio (RR) 40% of the views experience at least 1% RR 20% of the views experience at least 10% RR Video startup time 23% of views wait more than 5 s before the video starts 14% wait more than 10 s Average bitrate 28% of views have an avg bitrate less than 500 kbps 74% of views have an avg bitrate less than 1 Mbps The implications of this are significant. As mentioned before, a 1% increase in rebuffering ratio leads to a reduction in play time of 3 minutes. Viewers with a low join time are likely to return, and viewers with a high average bitrate are likely to watch the video longer. In an ad-based commercial service, all of this has serious business implications.

7 Source of quality issues
1) Client-side variability Source of quality issues Next, the authors identify and analyze three potential issues that could result in poor video quality. The first is client-side variability. The first figure shows the distribution of the standard deviation of intra-session estimated bandwidth; the second figure shows the distribution of the standard deviation of inter-session estimated bandwidth. For intra-session, we compute the std dev of all the bandwidth samples across the entire lifetime of a view. For inter-session, we compute the average of each session and then compute the std dev across the different sessions. As you can see, there is significant variability in client-side conditions. For intra-session, for views less than 1 Mbps, 20% of the viewers have a deviation of 400 kbps. For inter-session, for views less than 1 Mbps, 25% of users have a deviation of 250 kbps. This is a general phenomenon across all ISPs. This means that an optimal bitrate must be chosen on the client side for a smooth viewing experience.
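The intra- vs. inter-session variability computation described above can be sketched as follows; the bandwidth samples are invented for illustration.

```python
from statistics import pstdev, mean

# Bandwidth samples (kbps) for three sessions of the same client (invented data).
sessions = [
    [800, 1200, 400, 900],   # session 1
    [600, 650, 700],         # session 2
    [1500, 1400, 1450],      # session 3
]

# Intra-session: std dev of all bandwidth samples within one view's lifetime.
intra = [pstdev(s) for s in sessions]

# Inter-session: average each session first, then take the std dev
# across the per-session averages.
inter = pstdev([mean(s) for s in sessions])

print(intra, inter)
```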

8 Source of quality issues
2) CDN variability across space and time Source of quality issues The performance of CDN infrastructure for delivering video can vary significantly both spatially (across ISPs or geographical regions) and temporally. The graphs show the average rebuffering ratio, video startup time and video start failure rate across different geographical regions. From the graphs we can see that performance can vary within a city; for example, in city 1 the RR of CDN1 is twice the RR of CDN2. For each metric, there is no single best CDN across all cities. Why such variability? Such variations can happen due to load, content missing on CDN edge servers, or other network conditions. These graphs show the same metrics for one of the cities. As we can see, for all the metrics, no single CDN is the best all the time. This implies that content providers should use multiple CDNs to optimize delivery of content to different regions over time.

9 Source of quality issues
3) AS under stress Source of quality issues The graphs show the RR of one AS from all three CDNs during a 4-hour flash-crowd period. The normalized load on the x-axis is the current number of users divided by the maximum number of clients observed over time. This implies that a heavy load can lead to ISP congestion and in turn affect content delivery. One way content providers can deal with such hotspots is by reducing the bitrate, and providing a higher bitrate to “premium customers”.

10 Framework At one end of the spectrum (row 1), we can think of static selection of both CDN and bitrate when the player is launched. Row 2 is the de-facto approach seen in most video delivery infrastructures. Once a control plane is deployed by the content provider, other options become available in the design space. In the ideal case, the control plane can dynamically adapt both CDN and bitrate midstream based on global knowledge of the network. The authors present all possible options in the design space of a framework to optimize video delivery; the options are summarized in the table shown on the slide. Row 1 shows the simplest design that can be adopted by content providers: both the CDN and the bitrate are chosen by the client at startup time. However, we have already seen that such a method may not be optimal due to variability. Row 2 is the de-facto method used by content providers today: the CDN is chosen at startup time, but the bitrate can be adapted dynamically midstream by the client. The authors propose a control plane which can be deployed by content providers, or by a third party on their behalf. Once such a control plane is introduced, more options become available in the design space. In the most ideal case, the control plane can dynamically adapt both the CDN and the bitrate midstream based on global knowledge of the network. Rows 4 and 5 show some hybrid designs.

11 Video Control Plane The video control plane comprises:
Measurement component Performance Oracle Global Optimization The notion of a centralized control plane is not new; it has been used in CDNs and ISPs to optimize server selection and content placement. However, a video control plane introduces the following aspects: the capability to adapt both CDN and bitrate, and the capability to adapt them midstream (CDN control-plane optimizations happen only at startup time). The authors make some simplifying assumptions; they assume that one control plane exists per content provider. The video control plane comprises three components. Measurement component: the measurement engine periodically collects quality statistics for currently active users. This can be achieved by the client video player reporting statistics periodically. In addition to quality statistics, it also collects information about the session and the user, such as the ISP, location and the current CDN being used. The challenge, of course, is choosing which attributes to collect and at what frequency. Performance Oracle: this component can answer what-if style questions to predict the performance a user could achieve by choosing a different combination of CDN and bitrate. By design, the oracle will have to EXTRAPOLATE based on current and past measurements. Global optimization: at a high level, this component solves a resource allocation problem. We want to assign each user a CDN and bitrate that maximizes some notion of global utility for the content providers and end users, while operating within the provider’s cost constraints and the CDN capacities. The challenges here are: first, we have to choose a utility and policy objective; second, the optimization must be fast enough to re-optimize assignments in response to network dynamics. The authors aim to make a case for such a framework and present initial steps towards a practical realization, rather than prescribe specific utility or policy functions.

12 Potential Improvement
Assume each session makes an optimal choice Cluster clients using similar attributes E.g.: ISP, location, device, time-of-day This approach has two logical stages: Estimation Extrapolation a: client’s attributes S_a: set of clients sharing the same a S_{a,p}: set of clients with the same choice of parameters as well PerfDist_{a,p}: empirical distribution of re-buffering ratio Before attempting to design a specific control plane, the authors want to establish the improvement in video quality that can be achieved. They start by building a model after making some assumptions. They first analyze the potential improvement that clients could achieve by choosing only a better CDN, ignoring the effects of CDN load. The goal is to determine the potential performance improvement assuming each session makes the best possible choice. Estimation: in the estimation step we calculate the empirical performance of each combination of attribute and parameter values. For example, let a denote a set of values of a client’s attributes, e.g., ISP = Comcast, City = Chicago, Device = Xbox. Further, let S_a denote the set of clients sharing the same attribute values a, and let S_{a,p} denote the set of clients with attribute values a that have made the same choice of parameter p (i.e., CDN). An example would be Xbox devices of Comcast’s subscribers located in Chicago that stream content from Akamai. For each S_{a,p} we calculate the empirical distribution of the metric of interest, e.g., rebuffering ratio.
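The estimation step (grouping sessions into S_{a,p} buckets and collecting the empirical metric distribution for each) can be sketched like this; the session records are invented examples, not real measurements.

```python
from collections import defaultdict

# Each record: (attribute vector a, parameter p = CDN, rebuffering ratio in %).
# Values are invented for illustration.
sessions = [
    (("Comcast", "Chicago", "Xbox"), "CDN1", 2.0),
    (("Comcast", "Chicago", "Xbox"), "CDN1", 4.0),
    (("Comcast", "Chicago", "Xbox"), "CDN2", 0.5),
    (("AT&T", "Boston", "PC"), "CDN1", 1.0),
]

# perf_dist[a][p] = empirical distribution of the metric for S_{a,p}.
perf_dist = defaultdict(lambda: defaultdict(list))
for attrs, cdn, rebuf in sessions:
    perf_dist[attrs][cdn].append(rebuf)

print(dict(perf_dist[("Comcast", "Chicago", "Xbox")]))
# {'CDN1': [2.0, 4.0], 'CDN2': [0.5]}
```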

13 Potential Improvement
Extrapolation: the extrapolation step selects the parameter with the best performance distribution for this specific value of the attributes a. We use p*_a = argmin_p { MEAN(PerfDist_{a,p}) } to denote the parameter with the best performance distribution for the attribute values a. The best possible performance that can be achieved by a session with attribute values a is obtained by selecting parameter p*_a and assuming that the performance experienced by the session is randomly drawn from the distribution PerfDist_{a,p*_a}. Such an estimation suffers from the curse of dimensionality as the attributes become more fine-grained. To overcome this, the authors suggest a hierarchical estimation and extrapolation technique.
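The extrapolation step is essentially an argmin over mean performance. A minimal sketch, with invented PerfDist values (lower rebuffering ratio is better):

```python
from statistics import mean

# PerfDist_{a,p} for one attribute vector a (invented sample values, in %).
perf_dist_a = {
    "CDN1": [2.0, 4.0, 3.0],   # mean 3.0
    "CDN2": [0.5, 1.5],        # mean 1.0  <- best
    "CDN3": [5.0],             # mean 5.0
}

def best_parameter(perf_dist_a):
    """p*_a = argmin_p MEAN(PerfDist_{a,p})."""
    return min(perf_dist_a, key=lambda p: mean(perf_dist_a[p]))

print(best_parameter(perf_dist_a))  # CDN2
```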

14 Improvement Analysis Average Improvement
Compute average improvement in video quality over a one-week period for two providers using the above extrapolation method. The authors show the improvement in the average case in video quality over a period of one week for two content providers using the above extrapolation method. The graphs show the average improvement for three video quality metrics: buffering ratio, join time and failure rate. Provider 1 sees a 2x improvement in buffering ratio, and also shows significant improvement in join time and failure rate. In contrast, the delivery quality of provider 2 was already good, and thus the scope for improvement is smaller.

15 Improvement Analysis Improvement under stress
The authors expect the room for improvement to be significantly higher under more extreme scenarios. The results are dramatic: there is a 10x improvement in buffering ratio and a 32x reduction in failure rate for Provider 1, and up to a 100x improvement in failure rate for Provider 2.

16 Practical Design Impact of bitrate on performance Effect of CDN load
Additional attribute Effect of CDN load Threshold-based Past estimates to predict future performance Tractability of global optimization Specific utility function Having shown the non-trivial improvement in video quality achievable with a control plane, the authors discuss a practical design. The earlier study made some simplistic assumptions. The practical design should also account for the impact of bitrate on video quality (bitrate becomes an additional attribute) and the effect of CDN load on video quality. The load effect is assumed to follow a threshold-based model: load has no effect on performance up to a particular threshold, after which performance starts to decrease linearly; beyond a later, higher threshold, performance falls sharply. The design also relies on past estimates to predict future performance, and must keep the global optimization tractable under a specific utility function.
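The threshold-based load model described above might be sketched as follows; the thresholds and constants are assumptions chosen for illustration, not values from the paper.

```python
def cdn_performance(load, t1=0.7, t2=0.9, base=1.0, slope=2.0, cliff=0.1):
    """Normalized performance score for a given normalized CDN load
    (current clients / capacity). Flat up to t1, linear degradation
    between t1 and t2, severe degradation past t2."""
    if load <= t1:
        return base                        # no effect below the first threshold
    if load <= t2:
        return base - slope * (load - t1)  # linear degradation
    return cliff                           # sharp drop past the second threshold

# Light, moderate, and overloaded conditions:
print(cdn_performance(0.5), cdn_performance(0.8), cdn_performance(0.95))
```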

17 Optimization The practical version of the algorithm aims to define a policy objective and a utility function The policy goal is to achieve both fairness and efficiency Two-phase algorithm: Assign clients a fair share of CDN resources using average sustainable bitrate Incrementally improve total utility for efficiency Utility function: The practical design of the algorithm aims to define a policy objective and a utility function. The policy goal tries to achieve both fairness and efficiency, using a simple two-phase algorithm: first, achieve fairness by assigning all clients a fair share of CDN resources using the average sustainable bitrate; then, use a greedy algorithm to incrementally improve total utility, i.e., repeatedly pick the combination of client and CDN/bitrate setting that provides the largest incremental contribution to the global utility function. The utility function is shown in the equation above; BuffRatio is in percent and bitrate is in kbps.
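A toy sketch of the two-phase idea: a linear utility consistent with the units stated above (BuffRatio in percent, bitrate in kbps), a fair-share phase, and one greedy-improvement step. The weights alpha and beta and the data shapes are illustrative assumptions, not values prescribed by the slide.

```python
def utility(buff_ratio_pct, bitrate_kbps, alpha=3.7, beta=1 / 20):
    """Linear utility trading off rebuffering (%) against bitrate (kbps).
    alpha and beta are illustrative weights, not the slide's constants."""
    return -alpha * buff_ratio_pct + beta * bitrate_kbps

# Phase 1: fairness -- split a CDN's sustainable capacity evenly.
def fair_share_bitrate(cdn_capacity_kbps, n_clients):
    return cdn_capacity_kbps / n_clients

# Phase 2: greedily pick the single (client, CDN, bitrate) change that adds
# the most to global utility; repeat until no change improves it.
def greedy_step(candidates, total_utility):
    """candidates: list of (client, cdn, bitrate, new_total_utility).
    Return the best improving move, or None if nothing improves."""
    best = max(candidates, key=lambda c: c[3], default=None)
    return best if best is not None and best[3] > total_utility else None

print(utility(1.0, 1000))           # 1% buffering at 1000 kbps
print(fair_share_bitrate(10000, 10))
```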

18 Simulation Trace-driven Strategies: Scenarios: Baseline
Global coordination Hybrid Scenarios: Average case CDN performance degradation Flash crowd Finally, the authors study the qualitative benefits of a practical video control plane over other designs through a trace-driven simulation. They built a custom simulation framework and used the same dataset described earlier. Three strategies are compared: Baseline: each client chooses a CDN and bitrate randomly. Global coordination: the control plane algorithm. Hybrid: each client is assigned the CDN with the lowest load and a bitrate by the global optimization when it first arrives, but subsequent adaptation is limited to client-driven bitrate adaptation, i.e., start-time CDN selection and client-side mid-stream bitrate adaptation. The experiment simulates three scenarios: Average case: normal client arrival pattern. CDN performance degradation: normal client arrival pattern with sudden degradation of one CDN’s performance. Flash crowd: a large number of clients arrive simultaneously.

19 Result Metrics: Average case Average utility Failure rate
We observe that global coordination significantly outperforms the baseline strategy in terms of average utility. The failure rate for global coordination is zero.

20 Result Metrics: CDN degradation Average utility Failure rate
In this scenario, in epochs 20-60, the previously best CDN experiences a huge degradation, with the average re-buffering ratio going to 13%, before eventually recovering at epoch 60. Global coordination still maintains a zero failure rate, and has higher average utility than the other two strategies.

21 Result Metrics: Flash Crowd Average utility Failure rate
In this scenario, in epochs 20-60, a large number of clients join. The global coordination algorithm lowers the bitrate for many users in order to accommodate the new arrivals. The failure rate of global coordination is 0, and its average utility is the best of the three strategies.

22 Discussion Scalability Switching tolerance Interaction with CDNs
Multiple controllers 90% of Internet traffic is video content, but isn’t the remaining 10% also important? Net neutrality Scalability: a concern with global optimization is scalability with the number of clients and the time to respond to network events. One mitigation is to logically partition the control plane across geographical regions. Also, the network topology of their experiments is not discussed; will it scale to the Internet? Switching tolerance: how much bitrate switching can users tolerate? Interaction with CDNs: are CDNs already doing this optimization? Are they only optimizing for latency? Eventually, CDNs may open up and provide APIs; how does this work with federated CDNs? Multiple controllers: controllers could talk to each other and share information; so far they are assumed to be independent and not to influence each other. Net neutrality: who will be in charge of the control plane, and how will it affect net neutrality?

23 Thank you !

