ShadowStream: Performance Experimentation as a Capability in Production Internet Live Streaming Networks. Presented by Chen Alexandre Tian (HUST), Richard Alimi (Google), Richard Yang (Yale), and David Zhang (PPLive).
Live Streaming Is Widely Used. Many recent major events have been live streamed on the Internet, and many daily events are streamed as well (Justin.tv, livestream, …).
State of the Art in Live Streaming Systems. Hybrid systems (e.g., Adobe Flash 10.1 and later): CDN seeding plus P2P with BitTorrent-like protocols.
Performance of Live Streaming Systems Has Become Difficult to Understand and Predict. System software is becoming more complex.
Internet Environment Complexity. ADSL modem buffers, PowerBoost, inter-ISP throttling, … Evaluations that ignore real network features can produce misleading results.
Need for Evaluation at the Right Scale. Evaluations that ignore the target scale can produce misleading results.
Key Idea of ShadowStream. The production system provides an ideal evaluation platform: real users, real networks, at scale.
Starting Point: Run the Experiment Algorithm on Real Users. First challenge: how can we achieve both accuracy and user protection? (Figure: two seconds later.)
Issues with CDN Protection. Scale: 100,000 clients at a 1 Mbps stream rate require 100 Gbps, and concurrent test channels add further demand. Network bottlenecks: there can be bottlenecks on the paths from CDN edge servers to streaming clients.
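The scale figure is a simple product of client count and stream rate. The sketch below reproduces it and extends it to several concurrent test channels. It assumes, as the slide's arithmetic does, that the CDN must be able to carry every client's full stream rate; the function name and the `channels` parameter are illustrative only.

```python
def cdn_capacity_gbps(clients_per_channel, rate_mbps, channels=1):
    """Worst-case CDN capacity (Gbps) if the CDN alone must be able to deliver
    the full stream rate to every client in every concurrent test channel."""
    total_mbps = clients_per_channel * rate_mbps * channels
    return total_mbps / 1000.0  # 1 Gbps = 1000 Mbps


print(cdn_capacity_gbps(100_000, 1.0))              # 100.0, matching the slide
print(cdn_capacity_gbps(100_000, 1.0, channels=3))  # 300.0 with three test channels
```

Because demand grows linearly with both clients and channels, CDN-only protection becomes expensive quickly.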
New Idea: Scaling Up with Stable Protection. Observation: a stable version with reasonable performance already exists. Issue: loss of experiment accuracy.
Converging to a Balance Point. We should observe $m(\theta_0)$, but instead we actually observe $m(\theta)$.
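To see why the system settles at a balance point, one can iterate the feedback loop numerically. The sketch below treats θ as the supply ratio available to the Experiment engine and assumes a hypothetical linear miss-ratio curve m(θ) = max(0, 1 − θ). It also assumes that P2P protection consumes a fixed fraction `cost` of the missed demand, so the Experiment sees θ = θ0 − cost · m(θ). Neither the curve nor the cost model comes from the slides. They only show how the observed θ drifts below θ0.

```python
def miss_ratio(theta):
    """Hypothetical miss-ratio curve: pieces are missed only when supply < demand."""
    return max(0.0, 1.0 - theta)


def balance_point(theta0, cost, iterations=1000, tol=1e-12):
    """Fixed-point iteration of theta = theta0 - cost * m(theta).

    theta0: supply ratio the Experiment would see without protection traffic.
    cost:   hypothetical protection bandwidth consumed per unit of miss ratio.
    """
    theta = theta0
    for _ in range(iterations):
        nxt = theta0 - cost * miss_ratio(theta)
        if abs(nxt - theta) < tol:
            return nxt
        theta = nxt
    return theta


theta = balance_point(theta0=0.9, cost=0.5)
print(round(theta, 6), round(miss_ratio(theta), 6))  # 0.8 0.2 (versus m(0.9) = 0.1)
```

With `cost < 1` the map is a contraction, so the iteration converges. Under these assumptions the observed miss ratio (0.2) is twice the value the Experiment would show in isolation (0.1).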
Putting It Together: Cascading Protection for Accuracy and Scalability. Q: Are there any remaining challenges?
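The following is a per-piece sketch of cascading protection. It assumes that a piece the Experiment engine fails to fetch before its deadline is handed to Stable (P2P protection), and that only the pieces Stable also misses go to the CDN. Each engine's outcome is modeled as a hypothetical set of pieces fetched in time. The Experiment's miss ratio is counted from its own attempts, while the viewer is served by whichever engine delivered the piece.

```python
def cascade(pieces, experiment_ok, stable_ok):
    """Assign each due piece to the first engine in the cascade that delivered it.

    experiment_ok: pieces the Experiment engine fetched before its deadline.
    stable_ok:     pieces Stable (P2P protection) fetched before its deadline.
    The CDN is assumed to deliver every piece that reaches it.
    Returns (source per piece, Experiment miss ratio).
    """
    source = {}
    for p in pieces:
        if p in experiment_ok:
            source[p] = "experiment"
        elif p in stable_ok:
            source[p] = "stable"
        else:
            source[p] = "cdn"
    # The Experiment's miss ratio counts its own misses, however they were rescued.
    misses = sum(1 for s in source.values() if s != "experiment")
    return source, misses / len(source)


src, miss = cascade(range(10), experiment_ok=set(range(8)), stable_ok={8})
print(miss, src[8], src[9])  # 0.2 stable cdn
```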
Real User Behaviors Differ from Testing Behaviors. Idea: transparently orchestrate experimental scenarios from existing, already-playing clients through virtual arrivals and virtual departures. Components: test specification, triggering, virtual arrival control, and virtual departure control.
Independent Arrivals Achieving a Global Arrival Pattern. Each peer generates its arrival time by independently drawing a random number according to the same cumulative distribution function.
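A standard way to turn one shared CDF into independent per-peer arrival times is inverse-transform sampling. This technique is a plausible reading of the slide; the slide does not name it. The sketch below assumes a hypothetical arrival-rate function λ(t) = t over a 100-second test, discretizes its CDF, and lets every peer draw with its own generator. No coordination between peers is needed, yet the aggregate follows the global pattern.

```python
import bisect
import random


def build_cdf(rate, horizon, steps=1000):
    """Discretize an arrival-rate function rate(t) on [0, horizon] into a
    normalized CDF (trapezoidal integration on a uniform grid)."""
    dt = horizon / steps
    times = [i * dt for i in range(steps + 1)]
    cum = [0.0]
    for i in range(steps):
        cum.append(cum[-1] + 0.5 * (rate(times[i]) + rate(times[i + 1])) * dt)
    total = cum[-1]
    return times, [c / total for c in cum]


def draw_arrival(times, cdf, rng):
    """Inverse-transform sampling: return the time at which the CDF reaches u ~ U[0, 1)."""
    u = rng.random()
    i = bisect.bisect_left(cdf, u)
    if i == 0:
        return times[0]
    lo, hi = cdf[i - 1], cdf[i]
    frac = (u - lo) / (hi - lo) if hi > lo else 0.0  # interpolate within the grid cell
    return times[i - 1] + frac * (times[i] - times[i - 1])


# Every peer draws on its own, with its own random generator.
times, cdf = build_cdf(rate=lambda t: t, horizon=100.0)  # hypothetical lambda(t) = t
arrivals = [draw_arrival(times, cdf, random.Random(peer)) for peer in range(10_000)]
print(sum(a < 50.0 for a in arrivals) / len(arrivals))  # close to F(50) = 0.25
```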
From Idea to System. Challenge: how can developers' engineering effort be minimized?
Streaming Hypervisor. The Hypervisor API needed by each streaming engine: getSysTime(); getLagRange(), getMaxStartupDelay(); writePiece(), getPieceMap().
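The slide lists the calls but not which side implements each one. The later slides indicate that the Hypervisor calls getLagRange() on an engine and that engines call getSysTime() on the Hypervisor. The sketch below follows that split and assumes the rest: getMaxStartupDelay() sits on the engine side, while writePiece() and getPieceMap() sit on the Hypervisor side. All signatures, return types, and the toy classes are hypothetical; only the method names come from the slide.

```python
import time
from abc import ABC, abstractmethod


class StreamingEngine(ABC):
    """Calls assumed to be made BY the Hypervisor ON an engine."""

    @abstractmethod
    def getLagRange(self) -> tuple[float, float]:
        """(min_lag, max_lag) behind live, in seconds, that the engine accepts."""

    @abstractmethod
    def getMaxStartupDelay(self) -> float:
        """Longest time, in seconds, the engine may need before it can play."""


class StreamingHypervisor:
    """Toy Hypervisor serving calls assumed to be made BY an engine."""

    def __init__(self, time_shift: float = 0.0, clock=time.monotonic):
        self._time_shift = time_shift  # seconds this engine's clock lags real time
        self._clock = clock            # injectable clock, handy for testing
        self._pieces = {}              # piece index -> payload bytes

    def getSysTime(self) -> float:
        """Virtual system time: real clock minus this engine's time shift."""
        return self._clock() - self._time_shift

    def writePiece(self, index: int, data: bytes) -> None:
        """Store a downloaded piece in the shared buffer."""
        self._pieces[index] = data

    def getPieceMap(self) -> set[int]:
        """Indices of the pieces currently held in the buffer."""
        return set(self._pieces)


class ToyEngine(StreamingEngine):
    def getLagRange(self):
        return (2.0, 10.0)

    def getMaxStartupDelay(self):
        return 3.0


hv = StreamingHypervisor(time_shift=5.0, clock=lambda: 100.0)
hv.writePiece(7, b"payload")
print(hv.getSysTime(), hv.getPieceMap(), ToyEngine().getLagRange())  # 95.0 {7} (2.0, 10.0)
```

Keeping the engine-facing calls behind a small interface means that an existing engine only needs thin adapters to run under the Hypervisor, which fits the goal of minimizing engineering effort.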
Computing Window Bounds. The Hypervisor calls getLagRange().
Compositional Software Framework. Example: adding an admission-control component.
Evaluation: Experiment Accuracy and Protection. (Figure panels: only CDN as the protection; cascaded protection.)
Evaluation: Experimental Opportunities. (Figure: SH Sports channel and HN Satellite channel, PPLive, September 6, 2010.)
Evaluation: Accuracy of Distributed Arrivals. The arrival function is taken from Sachin Agarwal, Jatinder Pal Singh, Aditya Mavlankar, Pierpaolo Baccichet, and Bernd Girod, "Performance and Quality-of-Service Analysis of a Live P2P Video Multicast Session on the Internet," in Proceedings of IWQoS 2008, Springer, June 2008.
Take-Home Ideas. Many Internet-scale systems are unique systems that are difficult to build and test. The ShadowStream scheme consists of the following key ideas: conduct shadow experiments using the real system and real users; recognize that protection and accuracy present dual challenges; use Stable for scalable protection; introduce external resources (CDN) to remove interference on competing resources; and create shadow behaviors from real users.
Virtual Sliding Window. A streaming engine has two sliding windows: an upload window (P2P) and a download window (CDN and P2P). Each engine calls getSysTime() on the Hypervisor, which assigns each engine a virtual system time based on the real system time and a time-shift value. Each engine calculates x(left) and x(right) of its download window and advances its sliding window at the channel rate of $\mu$ pieces per second.
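Here is a minimal sketch of the virtual sliding window. It rests on three assumptions: pieces are numbered from 0 at virtual time 0, the Hypervisor's time shift is a per-engine constant subtracted from real time, and the download window has a fixed length in pieces. The helper names and the numbers in the usage lines are hypothetical.

```python
import math


def virtual_sys_time(real_time, time_shift):
    """Hypothetical getSysTime(): real system time shifted back by a per-engine offset (s)."""
    return real_time - time_shift


def download_window(virtual_time, mu, window_len):
    """Return (x_left, x_right) for an engine whose virtual clock reads virtual_time.

    Assumes piece i is produced at virtual time i / mu, so x_right is the newest
    piece, and both edges move forward by mu pieces for every second of time.
    """
    x_right = math.floor(virtual_time * mu)
    x_left = max(0, x_right - window_len + 1)
    return x_left, x_right


mu, window_len, shift = 10, 50, 5.0  # hypothetical channel rate, window size, time shift
for real_time in (100.0, 101.0):
    print(download_window(virtual_sys_time(real_time, shift), mu, window_len))
# (901, 950)
# (911, 960)
```

One second of real time moves both edges forward by exactly μ = 10 pieces. A different time shift moves the whole window without changing how the engine computes it.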
The Reasoning Behind It. CDN protection sees the original miss-ratio/supply-ratio curve; P2P protection sees the curve minus $\delta$.
Specification. Define multiple classes of clients (e.g., cable or DSL, estimated upload-capacity class, or network location), with a class-wide arrival rate function $\lambda_j(t)$ for each class $j$. Client lifetimes are determined by the distribution $L_x$.
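A test specification can be written down as plain data: one arrival-rate function λ_j(t) per client class plus a lifetime distribution L_x. The sketch below uses hypothetical classes and rates. It models L_x as exponential, although the slides do not say which distribution is used, and it computes the expected number of arrivals per class by integrating λ_j(t) over the test.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClientClass:
    name: str                               # e.g. "cable" or "DSL"
    arrival_rate: Callable[[float], float]  # lambda_j(t), arrivals per second


@dataclass
class ExperimentSpec:
    duration: float       # test length in seconds
    classes: list         # list of ClientClass
    mean_lifetime: float  # hypothetical: L_x modeled as exponential with this mean

    def expected_arrivals(self, steps=1000):
        """Integrate each lambda_j(t) over [0, duration] with the trapezoid rule."""
        dt = self.duration / steps
        result = {}
        for c in self.classes:
            f = [c.arrival_rate(i * dt) for i in range(steps + 1)]
            result[c.name] = dt * (sum(f) - 0.5 * (f[0] + f[-1]))
        return result

    def draw_lifetime(self, rng):
        """Sample one client lifetime (seconds) from the assumed L_x."""
        return rng.expovariate(1.0 / self.mean_lifetime)


spec = ExperimentSpec(
    duration=600.0,
    classes=[ClientClass("cable", lambda t: 2.0), ClientClass("DSL", lambda t: t / 100.0)],
    mean_lifetime=300.0,
)
print(spec.expected_arrivals())  # about 1200 cable and 1800 DSL arrivals
```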
Local Replacement for Uncontrolled Early Departures. Capturing client state; substitution.
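One possible shape for local replacement is sketched below. It assumes that the captured "client state" is the test engine's playpoint and piece map, and that a pool of already-playing clients not yet in the experiment is available. It also assumes the substitute is picked from the same client class. All three are illustrative assumptions, as are the class and function names.

```python
from dataclasses import dataclass, field


@dataclass
class ClientState:
    client_id: str
    client_class: str                            # e.g. "cable" or "DSL"
    playpoint: int = 0                           # next piece index to play
    piece_map: set = field(default_factory=set)  # pieces held by the test engine


def substitute(departed, idle_pool):
    """Replace an uncontrolled early departure with an idle client of the same class.

    The departed client's captured state (playpoint and piece map) is handed to
    the substitute, which is removed from the pool. Returns None if no idle
    client of that class is available.
    """
    for i, candidate in enumerate(idle_pool):
        if candidate.client_class == departed.client_class:
            idle_pool.pop(i)
            candidate.playpoint = departed.playpoint
            candidate.piece_map = set(departed.piece_map)
            return candidate
    return None


pool = [ClientState("c1", "cable"), ClientState("d1", "DSL")]
gone = ClientState("d0", "DSL", playpoint=420, piece_map={418, 419, 420})
sub = substitute(gone, pool)
print(sub.client_id, sub.playpoint, len(pool))  # d1 420 1
```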
Triggering Condition. Predict(t) uses the autoregressive integrated moving average (ARIMA) method on both recent testing-channel states and the past history of the same program.
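The sketch below is a deliberately simplified stand-in for Predict(t): an ARIMA(1,1,0) model with no constant, fitted by ordinary least squares on the once-differenced series of testing-channel sizes. It omits the moving-average terms and the program's past history that the slide mentions. The trigger rule, which starts once the forecast reaches the number of clients the test needs, is a hypothetical illustration. A production version would more likely use a full ARIMA implementation such as the one in statsmodels.

```python
def forecast_arima110(series):
    """One-step-ahead forecast with ARIMA(1,1,0) and no constant term.

    The series is differenced once (the "I"), an AR(1) coefficient is fitted to
    the differences by ordinary least squares, and the predicted next difference
    is added back to the last observation.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den if den else 0.0
    return series[-1] + phi * diffs[-1]


def should_trigger(channel_sizes, required_clients):
    """Hypothetical trigger: start the test once Predict(t) reaches the needed size."""
    return forecast_arima110(channel_sizes) >= required_clients


print(forecast_arima110([100, 120, 130]))    # 135.0
print(should_trigger([100, 120, 130], 140))  # False
```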
CDN Capacity and Window Length. The CDN window is set to 4 seconds: the TCP retransmission timeout for a lost piece is 3 seconds, plus 1 extra second to wait for the retransmitted piece.
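The 4-second figure is the sum of the retransmission timeout and the extra wait. Converting it into a window length in pieces requires the channel rate μ. The default of 10 pieces per second below is a hypothetical value, not one given in the slides.

```python
def cdn_window(rto_s=3.0, retransmit_wait_s=1.0, mu=10.0):
    """CDN window length in seconds and in pieces at channel rate mu (pieces/s)."""
    seconds = rto_s + retransmit_wait_s  # TCP RTO plus time to receive the resent piece
    return seconds, int(round(seconds * mu))


print(cdn_window())  # (4.0, 40)
```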
Starting Up the Engine. When starting a streaming engine x, the Streaming Hypervisor gives x pointers to its download and upload windows. At time a(s), the client joins the test channel and the Stable engine starts. At time a(e) > a(s), the client joins testing, and the Experiment engine and the CDN Protection engine start. After starting, an engine downloads pieces from the target playpoint to the end of its download window. Pieces needed before startup should be protected by the CDN and are counted in the CDN capacity calculation.
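The start-up rule can be read as a split of piece indices. The sketch below assumes three things: "pieces before startup" means the pieces between the client's current playpoint and the new engine's target playpoint, all three positions are piece indices, and the target lies inside the download window. The function name and return shape are hypothetical.

```python
def startup_plan(current_playpoint, target_playpoint, window_right):
    """Split pieces when an engine starts mid-stream (assumed model).

    Pieces [current_playpoint, target_playpoint) will be played before the new
    engine can deliver anything, so they go to CDN protection (and count
    toward CDN capacity); the engine itself downloads
    [target_playpoint, window_right].
    """
    if not current_playpoint <= target_playpoint <= window_right:
        raise ValueError("expected current <= target <= window_right")
    cdn = list(range(current_playpoint, target_playpoint))
    engine = list(range(target_playpoint, window_right + 1))
    return cdn, engine


cdn, engine = startup_plan(100, 130, 160)
print(len(cdn), engine[0], engine[-1])  # 30 130 160
```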
ShadowStream Outline. Motivation and challenge; experiment protection and accuracy; experiment orchestration; implementation; evaluation.
Client Substitution. (Figure: client substitution delay under client dynamics.)
Sec. 8: Limitation Discussion (do we really need this?). What if the Experiment engine consumes resources but receives no pieces at all? (Give priority to Protection?) The case where download links are the bottleneck.
Modeling P2P Protection. Given experiment engine $e$ and target rate $R$, the miss ratio is $m_{R,e}(\theta)$, or simply $m_e(\theta)$. Given protection engine $p$, whose target rate is $m_e(\theta)$, the required rescue bandwidth is $\eta(e,p,\theta) = \Theta_k(m_e(\theta), p) \cdot m_e(\theta)$.
P2P Protection: No Accurate Result. If P1 is the protection, balance point(s) exist; if P2 is the protection, there is a negative feedback loop. In either case, the result is not accurate at all.
Example: PPLive (from PPLive's presentation). Founded by graduate students from Huazhong University of Science & Technology, PPLive is an online video broadcasting and advertising network that provides an online viewing experience comparable to TV, as well as an efficient P2P technology platform and test bench.
Estimated global installed base: 75 million
Monthly active users*: 20 million
Daily active users: 3.5 million
Peak concurrent users: 2.2 million
Monthly average concurrent users: 1.5 million
Weekly average usage time: 11 hours
Not Yet!
Three Issues. Information flow control: although piece 91 is downloaded by the Protection engine, it should not be labeled as downloaded in the Experiment engine. Duplicate avoidance: since both the Experiment and Protection engines are running, they may download the same piece if their download windows overlap. Experiment feasibility: the lag from real time is determined when client i joins the test channel with the Protection engine, so that both experiment and protection are feasible.
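The first two issues can be handled with asymmetric bookkeeping, sketched below. The sketch assumes that the player uses every delivered piece while each engine also records what it fetched itself. The Experiment engine then consults only its own record, so a piece rescued by protection, like piece 91, still counts as missing for it. Protection engines consult everything already delivered, so they never re-fetch a piece the Experiment has. The class name, method names, and engine labels are hypothetical.

```python
class PieceStore:
    """Shared piece buffer with per-engine bookkeeping (hypothetical design)."""

    def __init__(self, experiment="experiment"):
        self.experiment = experiment
        self.delivered = {}  # piece -> first engine that delivered it (what the player uses)
        self.own = {}        # engine -> pieces that engine fetched itself

    def has(self, engine, piece):
        """Whether `engine` should treat `piece` as already downloaded."""
        if engine == self.experiment:
            # Information flow control: pieces rescued by protection stay
            # "missing" from the Experiment's point of view.
            return piece in self.own.get(engine, set())
        # Duplicate avoidance: protection skips anything already delivered.
        return piece in self.delivered

    def write(self, engine, piece):
        self.own.setdefault(engine, set()).add(piece)
        self.delivered.setdefault(piece, engine)


store = PieceStore()
store.write("protection", 91)
store.write("experiment", 92)
print(store.has("experiment", 91), store.has("protection", 92))  # False True
```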