Presentation on theme: "A case study: IPTV SLA Monitoring"— Presentation transcript:

1 A case study: IPTV SLA Monitoring
Corso di Reti di Calcolatori II (Computer Networks II). A case study: IPTV SLA Monitoring. Giorgio Ventre, The COMICS Research Group @ The University of Napoli Federico II

2 Outline
The general problem: SLA, who cares? A business case for QoS. Defining Service Level Agreements. A Real-Life SLA monitoring service. A case study: IPTV SLA Monitoring.

3 Recent trends in the industry
New emerging multimedia services in both fixed and wireless networks. Traditional voice carriers are moving to NGN: essential to control costs and drive up revenues. Triple-play services: Voice, Video, Data. Video represents a key element of the service portfolio: the price/quality balance must attract/retain users, and TV quality must compete with satellite and cable.

4 Challenges and quality issues
Users are conditioned to expect high-quality TV pictures and are unlikely to tolerate poor/fair quality pictures in IPTV. Early delivery of broadband services is constrained by the limited bandwidth available compared to cable and satellite; the compulsory data compression can potentially degrade quality. There is a need for robust transmission to minimize data loss and delay.

5 Why is Quality Assurance a major issue?
Because otherwise we wouldn't be here. Quality Assurance adds a new perspective to the flatness of the current market of triple-play services: quality measurement for service assurance, end-to-end quality monitoring, SLAs based on the quality delivered to the end user, new business models and scenarios. To put it more formally: “QoS assurance adds a new dimension in the space of the current market of triple-play services.”

6 QoS vs QoE Quality of Service (QoS) refers to the capability of a network to provide better service to selected network traffic over various technologies. QoS is a measure of performance at the packet level from the network perspective. Quality of Experience (QoE) describes the performance of a device, system, service, or application (or any combination thereof) from the user’s point of view. QoE is a measure of end-to-end performance at the service level from the user perspective.

7 From QoS to MOS
MOS: Mean Opinion Score. Used in POTS to attach a quantitative value to a “qualitative” evaluation: how do you evaluate the quality you perceived during your last service usage/access? Very easy for simple services (telephony); very complex for complex services (multimedia: sound vs video vs data vs a mix of them); even more complex when the quality of service depends on the distribution network AND terminals AND servers.

8 QoS evaluation For an end-to-end (E2E) evaluation of video QoS, measurements must be taken at the end systems.

9 Requirements
Identify the parameters contributing to a satisfactory QoE. Define the network performance requirements to achieve the target QoE. Design measurement methods to verify the QoE.

10 Performance parameters
IPTV service is highly sensitive to packet loss. The impact of packet loss depends on several factors: compression algorithm (MPEG-2, H.264), GOP structure, type of information lost (I, P, B frame), codec performance (coding, decoding), complexity of the video content, error concealment at the STB.

11 Quality Measurement
Objective (purely computational) measurements capture network performance; objective perceptual measurements aim to be representative of human perception. Traditional metrics such as PSNR, PLR, BER are inadequate, hence the requirements for objective perceptual metrics.

12 Why is Quality Monitoring hard?
Measures have to be time-based, remote, distributed and sharp, in highly heterogeneous environments (codecs, CPEs, media types, ...). Sampled measures? SLAs are not sampled: in order to ensure quality, measures have to be carried out with quality.

13 Why is Quality Monitoring hard?
There is also a high impact of content-based factors: MPEG performance depends on the content “pattern” and on scene changes, and highly variable scenes (movements, colours, lights) generate more data. Stallone vs. Bergman, or better, Rambo vs. The Seventh Seal.

14 Methods: state of the art
Full-Reference, Reduced-Reference, No-Reference.

15 Full-reference Measures are performed at both the input to the encoder and the output of the decoder: both the source and the processed video sequences are available. Requires a reliable communication channel in order to collect measurement data.
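To make the full-reference idea concrete, here is a minimal sketch (not the method used in the slides) that computes PSNR, a purely computational full-reference metric, between aligned source and decoded frames; the deck itself argues such metrics are perceptually inadequate, so this only illustrates where the two sequences enter the computation. The frame pairing and array representation are assumptions.

```python
import numpy as np

def psnr(reference: np.ndarray, processed: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between a source frame and the decoded frame.

    A purely computational full-reference metric: both the original and the
    processed sequence must be available at the measurement point.
    """
    mse = np.mean((reference.astype(np.float64) - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

def full_reference_score(source_frames, decoded_frames) -> float:
    """Average PSNR over a sequence; frame pairs are assumed to be aligned."""
    scores = [psnr(src, dec) for src, dec in zip(source_frames, decoded_frames)]
    return sum(scores) / len(scores)
```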

16 Reduced-Reference Extracts only a (meaningful) sub-set of features from both the source video and the received video. A perceptual objective assessment of the video quality is made. The transmitter needs to send the extracted features in addition to the video data.

17 No-Reference Perceptual video quality evaluation is made based solely on the processed video sequence: there is no need for the source sequence. Measurement results are intrinsically based on a predictive model.
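By contrast, a no-reference meter sees only the received stream, so it has to predict quality from features it can observe there. The sketch below is a toy parametric predictor driven by a slice loss rate and a freeze duration; the feature set, weights, and 1-5 output scale are illustrative assumptions, not the model behind these slides.

```python
def no_reference_estimate(slice_loss_rate: float, freeze_seconds: float,
                          duration_seconds: float) -> float:
    """Predict a 1..5 quality score from features of the received stream only.

    The coefficients below are placeholders: a real no-reference model is
    calibrated against subjective tests.
    """
    freeze_ratio = freeze_seconds / max(duration_seconds, 1e-9)
    score = 5.0 - 40.0 * slice_loss_rate - 8.0 * freeze_ratio
    return max(1.0, min(5.0, score))
```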

18 Standards for voice quality assessment
ITU-T P.862 (Feb. 2001): full-reference perceptual model (PESQ); signal-based measurement; narrow-band telephony and speech codecs; ITU-T P.862.1 provides the output mapping for prediction on the MOS scale. ITU-T P.563 (May 2004): no-reference perceptual model; narrow-band telephony applications.

19 Standards for voice quality assessment
ITU-T P.862.2 (Nov. 2005): extension of ITU-T P.862 to wide-band telephony and speech codecs (up to 7 kHz). ITU-T P.VQT (ongoing): targeted at VoIP applications; uses P.862 as a reference measurement; models analyze packet statistics, the speech payload is assumed.

20 Standards for video quality assessment
ITU-T J.144 and ITU-R BT.1683 (2004): full-reference perceptual model; digital TV; Rec. 601 image resolution (PAL/NTSC); bit rates from 768 kbps to 5 Mbps; compression errors.

21 Standards for video quality assessment
IETF RFC 4445 (April 2006): a proposed Media Delivery Index (MDI). MDI can be used as a quality indicator for monitoring a network intended to deliver applications such as streaming media, MPEG video, Voice over IP, or other information sensitive to arrival time and packet loss. It provides an indication of traffic jitter, a measure of deviation from nominal flow rates, and an at-a-glance data-loss measure for a particular video flow.
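As a rough illustration of the MDI described in RFC 4445, the sketch below computes a Delay Factor (the spread of a virtual buffer that fills with arriving bytes and drains at the nominal media rate) and a Media Loss Rate (lost packets per second, inferred from sequence gaps) over a single measurement interval. The packet record layout and the simplified sequence-gap handling are assumptions, so treat this as a sketch rather than a conforming implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Packet:
    arrival_time: float   # seconds
    size_bytes: int
    sequence: int         # e.g. an RTP sequence number or a derived counter

def media_delivery_index(packets: List[Packet], media_rate_bps: float,
                         interval: float = 1.0) -> str:
    """Compute a single-interval MDI, reported as "DF:MLR".

    DF: spread of a virtual buffer that fills with arriving bytes and drains
        at the nominal media rate, expressed in milliseconds of media time.
    MLR: lost (or out-of-order) packets per second, from sequence gaps.
    """
    if not packets:
        return "0.00:0.00"
    t0 = packets[0].arrival_time
    drain_bytes_per_s = media_rate_bps / 8.0
    received = 0
    deltas = []
    expected_seq = packets[0].sequence
    lost = 0
    for p in packets:
        received += p.size_bytes
        drained = (p.arrival_time - t0) * drain_bytes_per_s
        deltas.append(received - drained)          # virtual buffer level (bytes)
        if p.sequence != expected_seq:
            lost += max(p.sequence - expected_seq, 1)
        expected_seq = p.sequence + 1
    df_ms = (max(deltas) - min(deltas)) / drain_bytes_per_s * 1000.0
    mlr = lost / interval
    return f"{df_ms:.2f}:{mlr:.2f}"
```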

22 Our research
Objectives: real-time computation of the achieved quality level; “quality” as perceived by the user; per-single-user measurements; light computation (about +5% overhead). Approach: media playout and measurement are both part of an integrated process; the measurement subsystem exposes a consistent abstract interface; measurement results are high-level quality indicators.
Speaker note: this slide presents the objectives of our measurement system. The second is purely a philosophical choice. The first and the third imply a high measurement granularity along two dimensions (time and users), introducing potential scalability problems. The fourth adds the further complication that the CPE is not very powerful. The high-level choices we made to address these issues are the following. The measurement system cannot be conceived as detached from the rendering system, treating it as a black box; rather, the two systems must be integrated and cooperate. This enables a number of important optimizations (at the cost of replicating the design for the various cases) and spreads the quality-measurement infrastructure widely across the network, which benefits scalability. The measurement system exposes its results through a high-level abstract interface (the MOS is a simple example, but not the only one) that abstracts away implementation details (codec, network parameters, etc.): after all, a video is still a video, and what matters is that the user is satisfied. This second point is crucial: not only can the measurement results, now homogeneous, be conveyed to a control center (enabling comparisons, statistics, ...), but they can also be aggregated following the hierarchical structure of the distribution chain. This too works in favour of scalability, which is therefore not compromised (aggregated measures, for instance, take little bandwidth towards the control center), and it remains possible to drill down to finer levels of detail (single user) wherever the aggregated measures reveal problems. The measurement results consist of a few sufficiently representative indicators: the aim is not to gather enough information to diagnose what the problem was, but only whether there was a problem (in other words, troubleshooting is not done in production). This keeps the impact of the measurements on the system low.
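To make the "consistent abstract interface" idea more tangible, here is a minimal sketch of how a player-integrated meter could expose codec-independent, high-level indicators that a control center can then aggregate along the distribution hierarchy. All class and method names are invented for illustration; they are not the actual system's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class QualityIndicator:
    """High-level, codec-independent result exposed towards the control center."""
    user_id: str
    timestamp: float
    mos: float            # 1..5 estimate of perceived quality

class QualityMeter(ABC):
    """Abstract interface implemented by each player-integrated meter."""

    @abstractmethod
    def on_media_unit(self, headers: dict) -> None:
        """Fed by the playout pipeline with lightweight header-level information."""

    @abstractmethod
    def current_indicator(self) -> QualityIndicator:
        """Return the latest high-level indicator (e.g. every few seconds)."""

def aggregate(indicators) -> float:
    """Control-center side: average homogeneous per-user indicators
    (e.g. per access node or per region) to keep reporting traffic small."""
    return sum(i.mos for i in indicators) / len(indicators)
```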

23 VQM (1/2)
No-Reference. Evaluates the video quality as perceived by the user: QoS → QoE. Based on MPEG-2 light parsing: it does not parse motion vectors, DCT coefficients, or other macroblock-specific information; the degradation due to packet losses is estimated using only the high-level information contained in Group of Pictures, frame, and slice headers.

24 VQM (2/2)
Does not need to make assumptions concerning how the decoder deals with corrupted information, i.e. what kind of error concealment strategy it uses. Based on the parsed header information it determines exactly which slices are lost: GoP loss rate, frame loss rate, slice loss rate, differentiated per frame type (I, P, B). It computes how the error from missing slices propagates spatially and temporally into other slices. Appropriate for measuring video quality in real time within a network.
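A sketch of the kind of counters such a light parser can maintain from header-level information alone, namely loss rates differentiated per frame type; it assumes each parsed slice record carries its frame type and a received/lost flag, which is a simplification of the real VQM bookkeeping.

```python
from collections import defaultdict

def loss_indicators(slices):
    """`slices` is an iterable of (frame_type, received) tuples, where
    frame_type is 'I', 'P' or 'B' and received is a bool derived from
    header-level parsing (no motion vectors or DCT coefficients needed)."""
    total = defaultdict(int)
    lost = defaultdict(int)
    for frame_type, received in slices:
        total[frame_type] += 1
        if not received:
            lost[frame_type] += 1
    per_type = {t: lost[t] / total[t] for t in total}
    overall = sum(lost.values()) / max(sum(total.values()), 1)
    return {"slice_loss_rate": overall, "per_frame_type": per_type}
```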

25 Parsing method (1/2)
Diagram: a GOP (I B B P B B P B B P B B) shown frame by frame, with an X marking a lost frame.
Speaker note: this slide shows how errors on pictures (the X) propagate to the dependent pictures (the red lines under the pictures), and, below, how an error (e.g. the loss of a packet) affects the rendering of a picture (slices not displayed correctly, freezing, etc.).
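A minimal sketch of the temporal propagation the diagram illustrates: with a classic IBBP GOP, an error in a reference picture (I or P) affects every following picture until the next I frame, while an error in a B frame affects only that picture. The dependency rule is deliberately simplified and is not the exact model used by the tool.

```python
def propagated_errors(gop: str, lost_index: int) -> set:
    """Return the indices of pictures affected by the loss of `gop[lost_index]`.

    Simplified dependency model: I and P frames are references, so an error in
    one of them propagates to all later pictures in the GOP until the next I;
    an error in a B frame affects only that picture.
    """
    affected = {lost_index}
    if gop[lost_index] in ("I", "P"):
        for i in range(lost_index + 1, len(gop)):
            if gop[i] == "I":
                break                 # the next intra picture stops propagation
            affected.add(i)
    return affected

# Example: losing the 4th picture (a P frame) corrupts the rest of the GOP.
print(sorted(propagated_errors("IBBPBBPBBPBB", 3)))
```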

26 Parsing method (2/2)
Diagram: the decoding chain, starting from the received stream: the MPEG-2 video bitstream enters the DECODER, which produces the decoded video stream for RENDERING; the HEADERS feed the Quality Measurement module.

27 QoE vs. MOS
Mapping between the Quality of Experience evaluation and the MOS (Mean Opinion Score, ITU-T P.800) value. Diagram: QoE axis (up to QMAX) mapped onto the MOS scale (1 to 5).
Speaker note: the only point here is that we defined a heuristic mapping technique between the (subjective) perceived quality and an (objective) indicator, the MOS.
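As a sketch of what such a heuristic mapping could look like, the function below maps a normalized quality value onto the 1-5 MOS scale through piecewise thresholds; the threshold values are placeholders, not the calibrated ones behind this slide.

```python
def qoe_to_mos(q: float, q_max: float = 1.0) -> int:
    """Heuristically map a quality estimate in [0, q_max] to a MOS value (1..5).

    Threshold values are illustrative; in practice they are calibrated
    against subjective tests (ITU-T P.800 style)."""
    x = max(0.0, min(q / q_max, 1.0))
    thresholds = [(0.85, 5), (0.70, 4), (0.50, 3), (0.30, 2)]
    for limit, mos in thresholds:
        if x >= limit:
            return mos
    return 1
```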

28 MOS vs SLAs
Knowledge of the function MOS(t) directly enables SLA monitoring. Diagram: MOS (1 to 5) plotted over TIME against the SLA THRESHOLD; the interval spent below the threshold is the DOWN TIME.
Speaker note: keeping the temporal evolution of the MOS under control makes it possible to verify whether an SLA has been violated. It is particularly interesting to note that only a small amount of data needs to flow towards the control center: think, for instance, of the time instants at which the MOS changes level (conceptually nothing more than a compression technique for the temporal signal). This is why the scalability of the system is preserved.
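A sketch of the two operations described in the note: compressing a sampled MOS(t) into level-change events (so that little data flows towards the control center) and computing the down time spent below the SLA threshold. The (timestamp, MOS) sample representation is an assumption.

```python
def level_changes(samples):
    """Compress a sampled MOS(t) into (time, level) events, keeping only the
    instants at which the (integer) MOS level changes."""
    events, last = [], None
    for t, mos in samples:
        level = round(mos)
        if level != last:
            events.append((t, level))
            last = level
    return events

def downtime(samples, threshold: float) -> float:
    """Total time during which MOS(t) stayed below the SLA threshold.
    `samples` is a time-ordered list of (timestamp, mos) pairs."""
    total = 0.0
    for (t0, m0), (t1, _) in zip(samples, samples[1:]):
        if m0 < threshold:
            total += t1 - t0
    return total
```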

29 Experimental testbed
Video Server, Controlled-Loss Router (dropped packets), Video Client + Quality Meter. Video characteristics: MPEG2-TS, Constant Bit Rate: 3.9 Mbps.
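The controlled-loss element of the testbed can be approximated in simulation by an independent (Bernoulli) dropper placed between server and client, as in the sketch below; the drop probability and the packets-per-second figure are illustrative, and this is not the actual testbed tooling.

```python
import random

def controlled_loss(packets, drop_probability: float, seed: int = 42):
    """Yield the packets that survive an i.i.d. (Bernoulli) loss process,
    emulating the controlled-loss router between video server and client."""
    rng = random.Random(seed)
    for pkt in packets:
        if rng.random() >= drop_probability:
            yield pkt

# Example: a 3.9 Mbps CBR MPEG2-TS stream carried as 7x188-byte TS packets per
# IP datagram gives roughly 370 datagrams per second; drop 1% of them.
stream = range(370)                       # one second worth of datagrams
received = list(controlled_loss(stream, drop_probability=0.01))
print(f"delivered {len(received)} of 370 datagrams")
```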

30 High Quality Throughput: 5.0 Mbps

31 Medium Quality Throughput: 3.9 Mbps

32 Low Quality Throughput: 3.0 Mbps

33 From SLA to PLA: Provisioning Level Agreements
Scuola di Dottorato in Ingegneria Informatica, Palermo, September 2007. From SLA to PLA: Provisioning Level Agreements. Giorgio Ventre, The COMICS Research Group @ The University of Napoli Federico II, & ITEM Laboratory, Italian University Consortium on Informatics

34 A service model for resilient networks
We are moving from Quality of Service to a more complex concept of quality+resiliency

35 Quality of future distributed services
The most important QoS characteristic for future distributed services is arguably going to be resilience. Resilience is the property of a system to restore services to normal after a failure (as fast as the service users need). However, an investigation into resilience reveals the importance of considering risk when developing our future research agenda.

36 The need for resilience
We are increasingly reliant on the Internet and on networked systems in general (including of course the Web). This is happening in businesses and indeed in every walk of life, including the home. The EU is promoting and developing the Information Society, which is based on communication technologies and systems.

37 Interdependence of networks (1)
Not only are we dependent on networks, but all sorts of other networks are, too: electricity, water, gas; corporate networks; banking networks; health networks; ... Information networks are crucial to the successful operation of other networks.

38 Interdependence of networks (2)
Interdependencies of critical infrastructures Power nets and information nets: The virtual utility “The introduction of proper supporting ICT of power nets forming a virtual utility is an important instance of networked enabled capabilities (NEC) systems. Furthermore, by pursuing this task we can gain experiences and develop models and technologies that besides addressing societal critical systems also can be useful in other efforts on development and maintenance of complex systems.” In Italy, Report del Comitato sulla Protezione delle Infrastrutture Critiche, Presidenza del Consiglio dei Ministri, 2004

39 Internet meltdown? Article in The Independent (UK) 8 September 2004:
“The internet is becoming a utility” [Karl Auerbach]. As a utility, the net will have to live up to different, more stringent standards than its previous uses as an academic and research playground, and then a mainstream experiment. People are building billion-dollar businesses, governments are turning themselves digital, and in the meantime there isn't so much as a service-level agreement to guarantee that the most basic level of connectivity will be there tomorrow. If the technologists no longer believe they can fix it by themselves, the Internet really has hit a meltdown.

40 Vulnerabilities The Internet was originally designed to withstand basic link and switch failures, but it was never envisaged as a utility (i.e. offering near-perfect availability), supporting commercial initiatives and acting as a vital infrastructure. Whatever vulnerabilities are present in the infrastructure may be inherited by the applications it aims to support.

41 Attacks
Complex, well-engineered systems should be built with faults in mind. Today we also need to take other sources of disruption into account: network attacks of all sorts are increasing in variety and number (spam/junk, viruses, worms, DDoS attacks, physical attacks, ...). These cause huge costs in time and energy, yet there is no coherent approach to a solution.
42 Multiple levels This is of course a multi-level problem
Physical layer; Networking / IP; Middleware layer / O.S.; Web / applications. A solution to achieving resilience needs to apply at all levels: this is a grand challenge for future networked systems infrastructure.

43 Complexity This is a distributed computing problem
According to Leonard Kleinrock, we have no suitable theory to handle this, because of its inherent complexity. This is compounded by nomadicity. [Complicated = difficult to study but fit for purpose, static; whereas complex = growing, evolving.]

44 Complexity not simplicity
In spite of all the hype on global network architectures, today we face a complex, heterogeneous reality. Fixed access networks: POTS, xDSL, CATV, MetroLAN. Mobile, wireless access networks: GPRS, UMTS, WiFi, WiMAX. Interoperability with terrestrial digital broadcasting. Additional complexity issues: new, diverse terminals (Symbian cell phones, PDAs, smart set-top-boxes) and the dynamic creation of novel services and applications.

45 Complexity as an opportunity
The availability of a multiplicity of networks, devices and services should be seen as an opportunity: no single infrastructure of critical importance; ease of access for all players (government, companies, common people); availability of a multitude of sources of information; availability of a multitude of computing resources; availability of a multitude of communication media/networks ... provided that such a rich scenario can be managed as a system.

46 Some recent events
We learned some lessons recently: the 9/11 attacks (2001), the US East Coast blackout and the Italy blackout (2003), and a series of network attacks: worms (NIMDA, Witty, Slammer, ...), DDoS, routing attacks. We probably need to re-discover the values typical of traditional engineering practice.

47 Lessons from 9/11
“The Internet under Crisis Conditions”, a report by a Committee of the National Research Council of the National Academies. Findings of the Committee: the attacks had very limited effects on the Internet as a global, best-effort communication system; Internet technology appears to be robust per se, but considerable efforts are needed to protect Internet-based systems; many critical interdependencies were discovered only after the attacks.

48 Known and less known effects
Dependency of the Internet on other telecommunication systems (fixed, wireless, cellular). Obvious: co-location of sites, tubes, cables; running out of diesel... Not so obvious: e.g. communications between NYC ISPs and TelCos hampered by problems with toll-free numbers. Facility disaster planning is a rare expertise/culture in the Internet world. Very limited backup power generation capacity even in major ISP sites/POPs. Other issues: e.g. the DNS for the .za domain was hosted on a server in NYC; the WiFi LANs of two major Manhattan hospitals were operated in outsourcing over the Internet.

49 Lessons from 9/11
Anticipated by the US East Coast blackout: much larger in scale than the WTC attack, but apparently more limited damage. Different effects and impacts: the POTS infrastructure was capable of enduring very long power outages (practically no effects); cellular networks were locally in deep crisis; national TV and radio broadcasters were OK, local players generally in crisis; “global” and VoIP operators were knocked out. What about the Internet? All IT-based services were affected: AAA, CDN, servers.

50 Lessons in ATC systems
Press release ( detail.aspx?id=394), 07/19/2006, Bob Marks: MASSIVE POWER, COMMUNICATIONS FAILURE AT MAJOR AIR TRAFFIC CONTROL CENTER PUTS CONTROLLERS IN DARK, FLIGHTS IN JEOPARDY. PALMDALE, Calif. – A massive power and communications failure late Tuesday at the Los Angeles Air Route Traffic Control Center left air traffic controllers scrambling to deal with a nightmare scenario – how to keep dozens of flights away from each other above a large swath of the Southwestern United States despite the inability to see them, talk to them or relay crucial instructions for 15 excruciatingly long minutes. Every ounce of skill, heart and determination that controllers bring into the control room every day was put to the test during one of the worst outages to ever hit the facility. It was so bad, controllers say, that the only thing they had of use to aid the situation that actually worked was their cell phones – devices which the Federal Aviation Administration, inexplicably, has barred from control rooms, further impeding the safety of the system. More details in

51 Issues for research (1) Forget OSI-type layering/abstractions
Services depend not only on peer and adjacent layers: resiliency is a system-wide issue, with vertical and horizontal dependencies. Start speaking about networked systems and not only about networks: IT-based services must be considered as part of the whole picture. Contributions from several disciplines; multi-level approach; cross-layer approach.

52 Issues for research (2) Monitoring of services and infrastructures
We can’t trust what we can’t control. Robustness of services: to unexpected situations (faults, misconfigurations, excessive demand, soft attacks such as DDoS); to expected but complex situations (tools/methodologies for proper dimensioning of services, i.e. Service Engineering). Resiliency of infrastructures: focus on the survivability of communication systems to hard attacks (terrorist hits, natural disasters); reconfigurability of communication systems; make different networks/systems a single infrastructure.

53 Issues for research (3) Towards a GRID of communication infostructures
Connect them all physically. Make them resilient separately. Allow for services to migrate. Prepare for interconnecting them if needed. From the computational GRID to the communication GRID, but try to make it with an autonomic communication flavour.

54 Issues for research Resiliency of infrastructures
Focus on the survivability of communication systems to hard attacks (terrorist hits, natural disasters); reconfigurability of communication systems; make different networks a single infrastructure. Resiliency of services: to unexpected situations (faults, excessive demand, soft attacks such as D-DOS); to expected but complex situations (tools/methodologies for proper dimensioning of services, i.e. Service Engineering).

55 From QoS to Resiliency to …
We should not forget the past: QoS is as important as resiliency, and is back (“Value of supporting Class-of-Services in IP Backbones”, M. Yuksel et al., IWQOS 2007), also because QoS is a good mechanism to improve the resiliency of a distributed system. So, we should probably talk about QoSiliency.

56 User-Centered Architectures
Service Directories, Commercial Service Access Controllers, User Service Controllers, Resource Controllers, QoS-capable Networks.

57 SLAs are the triggers
Diagram: User, SLA, Service Directories (info about content, i.e. metadata), Access Controllers, Service Controller, SLS, policy rules, Resource Controllers, QoS-capable Networks. The SLA subscribed by the user is mapped onto an SLS and onto policy rules that drive the Resource Controllers over the QoS-capable networks.

58 A change of perspective
One of the major problems with SLA-based architectures was their limited capability to scale with the number of users and services. We therefore introduce the concept of a Provisioning Level Agreement (PLA): a PLA is a contract between a service provider and the owner of the Infrastructure defining the level of service to be guaranteed to final users during the provisioning of a service on top of that Infrastructure.

59 A change of perspective (cont.)
In a PLA it is the service provider who defines: the type of service; the treatment the service needs to get from the network (QoS, resiliency needs, security and privacy requirements); the classes of possible SLAs that can be subscribed by the users. A PLA is signed at service deployment time, and can be dynamically modified and updated any time the service characteristics and requirements change. Once a PLA is signed, Provisioning Level Specifications (PLS) are produced to allow the infrastructure to be properly configured to accommodate the new service and future service subscriptions by final users.
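To make the PLA concept more concrete, here is a sketch of how a PLA might be represented when signed at deployment time and translated into Provisioning Level Specifications for the resource controllers; every field and function name is illustrative, since the slides do not define a concrete schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SLAClass:
    name: str                 # e.g. "premium", "best-effort"
    min_mos: float            # quality level promised to subscribing users

@dataclass
class PLA:
    """Contract between a service provider and the infrastructure owner,
    signed at service deployment time and updatable when the service changes."""
    service_type: str                       # e.g. "IPTV"
    network_treatment: dict                 # QoS, resiliency, security needs
    sla_classes: List[SLAClass] = field(default_factory=list)

def derive_pls(pla: PLA) -> List[dict]:
    """Translate a PLA into Provisioning Level Specifications, i.e. the
    concrete configuration directives handed to the resource controllers."""
    return [{"service": pla.service_type,
             "class": c.name,
             "target_mos": c.min_mos,
             **pla.network_treatment} for c in pla.sla_classes]
```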

60 Service Centered Architectures
Diagram: Service Provider, PLA, Service Directories (info about content, i.e. metadata), PLS, policy rules, Resource Controllers, QoS-configurable Networks. PLAs are the triggers: the PLA signed by the service provider is mapped onto a PLS and onto policy rules that drive the Resource Controllers over the QoS-configurable networks.

