1 A case study: IPTV SLA Monitoring. Computer Networks II course (Corso di Reti di Calcolatori II). Giorgio Ventre, The COMICS Research Group @ The University of Napoli Federico II
2 Outline. The general problem: SLA, who cares? A business case for QoS. Defining Service Level Agreements. A real-life SLA monitoring service. A case study: IPTV SLA monitoring.
3 Recent trends in the industry. New multimedia services are emerging in both fixed and wireless networks. Traditional voice carriers are moving to NGN: this is essential to control costs and drive up revenues. Triple-play services: voice, video, data. Video represents a key element of the service portfolio. The price/quality balance must attract and retain users. TV quality must compete with satellite and cable.
4 Challenges and quality issues. Users are conditioned to expect high-quality TV pictures: they are unlikely to tolerate poor or fair quality pictures in IPTV. Early delivery of broadband services is unfeasible because of the limited bandwidth compared with cable and satellite. Compulsory data compression can potentially degrade quality. Robust transmission is needed to minimize data loss and delay.
5 Why is quality assurance a major issue? Because otherwise we wouldn’t be here. Quality assurance adds a new perspective to the flatness of the current triple-play services market. Quality measurement for service assurance. End-to-end quality monitoring. SLAs based on the quality delivered to the end user. New business models and scenarios. Speaker notes: to put it more formally, “QoS assurance adds a new dimension in the space of the current market of triple-play services.”
6 QoS vs QoE. Quality of Service (QoS) refers to the capability of a network to provide better service to selected network traffic over various technologies. QoS is a measure of performance at the packet level from the network perspective. Quality of Experience (QoE) describes the performance of a device, system, service, or application (or any combination thereof) from the user’s point of view. QoE is a measure of end-to-end performance at the service level from the user perspective.
7 From QoS to MOS. MOS: Mean Opinion Score. Used in POTS to obtain a quantitative value for a “qualitative” evaluation: How would you rate the quality you perceived the last time you used or accessed the service? Very easy for simple services (telephony). Very complex for complex services: multimedia (sound vs. video vs. data vs. a mix). Even more complex when quality of service depends on the distribution network AND the terminals AND the servers.
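As a concrete reading of the definition above: in an ITU-T P.800 absolute category rating (ACR) test, each participant rates a sample on a five-point scale (5 Excellent, 4 Good, 3 Fair, 2 Poor, 1 Bad), and the MOS is the arithmetic mean of those ratings. The minimal Python sketch below assumes exactly that setup; the function name and the panel ratings are illustrative only.

```python
# Five-point Absolute Category Rating (ACR) scale used in ITU-T P.800 tests.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}

def mean_opinion_score(ratings):
    """MOS: the arithmetic mean of individual ratings on the ACR scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in ACR_LABELS:
            raise ValueError(f"rating {r!r} is not on the 1-5 ACR scale")
    return sum(ratings) / len(ratings)

panel = [4, 5, 3, 4, 4]            # hypothetical answers from five viewers
print(mean_opinion_score(panel))   # 4.0
```

This is trivial for a single telephone call; the difficulty the slide points to is deciding what to ask, and whom, when sound, video, and data are mixed.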
8 QoS evaluation. For an end-to-end evaluation of video QoS, measurements must be taken at the end systems.
9 Requirements. Identify the parameters contributing to a satisfactory QoE. Define the network performance requirements needed to achieve the target QoE. Design measurement methods to verify QoE.
10 Performance parameters. IPTV service is highly sensitive to packet loss. The impact of packet loss depends on several factors: compression algorithm (MPEG-2, H.264); GOP structure; type of information lost (I, P, or B frame); codec performance (coding, decoding); complexity of the video content; error concealment at the STB.
11 Quality measurement. Objective, purely computational: network performance. Objective, perceptual: measurements representative of human perception. Traditional metrics such as PSNR, PLR, and BER are inadequate. Requirements for objective perceptual metrics.
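To make the “purely computational” category concrete, here is a minimal Python sketch of PSNR between two equally sized 8-bit luma frames, PSNR = 10·log10(255² / MSE). Representing frames as lists of pixel rows is an assumption made for illustration. The score depends only on the pixel-wise mean squared error: it ignores where the errors fall and how visible they are, which is one reason such metrics track perceived quality poorly.

```python
import math

def psnr(reference, received, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames (lists of rows)."""
    if len(reference) != len(received) or any(
        len(a) != len(b) for a, b in zip(reference, received)
    ):
        raise ValueError("frames must have the same dimensions")
    errors = [
        (a - b) ** 2
        for row_a, row_b in zip(reference, received)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(errors) / len(errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)

ref = [[100, 100], [100, 100]]
rx = [[100, 100], [100, 110]]    # one pixel off by 10, so MSE = 25
print(round(psnr(ref, rx), 2))   # 34.15
```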
12 Why is quality monitoring hard? Measurements have to be: time-based, remote, distributed, sharp. Highly heterogeneous environments (codecs, CPEs, media types, …). Sampled measurements? SLAs are not sampled. To ensure quality, the measurements themselves have to be carried out with quality.
13 Why is quality monitoring hard? Content-based factors also have a high impact: MPEG performance depends on the content “pattern” and on scene changes. Highly variable scenes (movement, colours, lighting) generate more data. Stallone vs. Bergman, or better, Rambo vs. The Seventh Seal.
14 Methods: state of the art. Full-Reference; Reduced-Reference; No-Reference.
15 Full-Reference. Measurements are performed at both the input of the encoder and the output of the decoder. Both the source and the processed video sequences are available. A reliable communication channel is required to collect the measurement data.
16 Reduced-Reference. Only a (meaningful) subset of features is extracted from both the source video and the received video, and a perceptual objective assessment of video quality is made from them. The transmitter needs to send the extracted features in addition to the video data.
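A reduced-reference scheme can be sketched as follows, under deliberately simplified assumptions: the transmitter computes two per-frame features (mean luminance, and luminance standard deviation as a crude stand-in for spatial activity), sends them alongside the video, and the receiver computes the same features on the decoded frames and compares them. The feature choice, the class and function names, and the distance measure are all hypothetical; real RR metrics use richer, perceptually motivated features.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev

@dataclass(frozen=True)
class FrameFeatures:
    mean_luma: float
    luma_stddev: float   # crude proxy for spatial activity (hypothetical choice)

def extract_features(frame):
    """Computed identically at the sender (source frame) and the receiver (decoded frame)."""
    pixels = [p for row in frame for p in row]
    return FrameFeatures(fmean(pixels), pstdev(pixels))

def feature_distance(sent, measured):
    """Mean absolute feature difference over a sequence of frames."""
    if len(sent) != len(measured) or not sent:
        raise ValueError("need the same, non-zero number of frames on both sides")
    return fmean(
        abs(s.mean_luma - m.mean_luma) + abs(s.luma_stddev - m.luma_stddev)
        for s, m in zip(sent, measured)
    )

source = [[0, 10], [0, 10]]
decoded = [[0, 0], [0, 0]]                 # e.g. a badly concealed region
side_info = [extract_features(source)]     # sent by the transmitter with the video
print(feature_distance(side_info, [extract_features(decoded)]))
```

Only the two floats per frame cross the network, which is the point of the reduced-reference approach compared with shipping the whole source sequence.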
17 No-Reference. Perceptual video quality is evaluated solely on the basis of the processed video sequence. There is no need for the source sequence. Measurement results are intrinsically based on a predictive model.
18 Standards for voice quality assessment. ITU-T P.862 (Feb. 2001): full-reference perceptual model (PESQ); signal-based measurement; narrow-band telephony and speech codecs; P provides an output mapping for prediction on the MOS scale. ITU-T P.563 (May 2004): no-reference perceptual model; narrow-band telephony applications.
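ITU-T P.862.1 specifies a logistic function that maps raw P.862 (PESQ) scores onto the MOS-LQO scale; that this is the output mapping referred to in the slide above is our assumption. A minimal Python sketch of that function (the function name is ours):

```python
import math

def pesq_to_mos_lqo(raw_score):
    """Map a raw P.862 (PESQ) score to MOS-LQO using the P.862.1 logistic function."""
    return 0.999 + (4.999 - 0.999) / (1 + math.exp(-1.4945 * raw_score + 4.6607))

for raw in (-0.5, 1.0, 2.5, 4.5):   # raw PESQ scores span roughly -0.5 to 4.5
    print(raw, round(pesq_to_mos_lqo(raw), 2))
```

The mapping is monotonic and bounded, so the extremes of the raw PESQ range land near 1.02 and 4.55 on the MOS-LQO scale rather than exactly on 1 and 5.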
19 Standards for voice quality assessment. ITU-T P (Nov. 2005): extension of ITU-T P.862; wide-band telephony and speech codecs (5 ~ 7 kHz). ITU-T P.VQT (ongoing): targeted at VoIP applications; uses P.862 as a reference measurement; models analyze packet statistics, and the speech payload is assumed.
20 Standards for video quality assessment. ITU-T J.144 and ITU-R BT.1683 (2004): full-reference perceptual model; digital TV; Rec. 601 image resolution (PAL/NTSC); bit rates: 768 kbps ~ 5 Mbps; compression errors.
21 Standards for video quality assessment. IETF RFC 4445 (April 2006): A Proposed Media Delivery Index (MDI). The MDI can be used as a quality indicator for monitoring a network intended to deliver applications such as streaming media, MPEG video, Voice over IP, or other information sensitive to arrival time and packet loss. It provides an indication of traffic jitter, a measure of deviation from nominal flow rates, and an at-a-glance measure of data loss for a particular video flow.
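RFC 4445 expresses the MDI as two numbers, the Delay Factor and the Media Loss Rate, written DF:MLR. DF comes from a virtual buffer that is filled by arriving packets and drained at the nominal media rate: the spread of the buffer level over the interval (maximum minus minimum), divided by the media rate, is the time a receiver buffer must absorb. MLR counts lost or out-of-order media packets per second. The Python sketch below follows that reading of the RFC; the `Arrival` type and the function names are hypothetical, and loss detection (e.g. from MPEG-TS continuity counters) is assumed to happen elsewhere.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arrival:
    time_s: float        # arrival time, seconds from the start of the interval
    payload_bytes: int   # media bytes carried by the packet

def delay_factor(arrivals, media_rate_bps):
    """DF in seconds: spread of a virtual buffer filled by arrivals and drained at the media rate."""
    if not arrivals:
        raise ValueError("need at least one arrival")
    drain = media_rate_bps / 8.0          # nominal drain rate, bytes per second
    received = 0
    levels = []
    for a in arrivals:
        before = received - drain * a.time_s    # VB(pre): level just before this packet
        after = before + a.payload_bytes         # VB(post): level just after it
        levels += [before, after]
        received += a.payload_bytes
    return (max(levels) - min(levels)) / drain

def media_loss_rate(lost_or_out_of_order, interval_s):
    """MLR: media packets lost or out of order per second."""
    return lost_or_out_of_order / interval_s

def mdi(arrivals, media_rate_bps, lost_or_out_of_order, interval_s):
    """MDI in the 'DF:MLR' form, with DF in milliseconds."""
    df_ms = delay_factor(arrivals, media_rate_bps) * 1000
    return f"{df_ms:.1f}:{media_loss_rate(lost_or_out_of_order, interval_s):.2f}"

# Ten 100-byte packets, evenly paced at 8 kbit/s over one second, no losses.
paced = [Arrival(0.1 * i, 100) for i in range(10)]
print(mdi(paced, 8000, 0, 1.0))
```

For a perfectly paced stream DF equals one packet's drain time (here 100 ms); bursty arrivals push it higher, which is how DF exposes jitter.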
22 Our research. Objectives: real-time computation of the achieved quality level; “quality” as perceived by the user; per-user measurements; light computation (about +5% overhead). Approach: media playout and measurement are both part of an integrated process; the measurement subsystem exposes a consistent abstract interface; measurement results are high-level quality indicators. Speaker notes: This slide presents the objectives of our measurement system. The second is merely a philosophical choice. The first and the third imply a high granularity of measurements along two dimensions (time and users), introducing potential scalability problems. The fourth adds a further complication, since the CPE is not very powerful. The high-level choices we made to address these issues are listed below. The measurement system cannot be conceived as separate from the rendering system, treating it as a black box; instead, the two systems must be integrated and cooperate. This enables a number of important optimizations (at the cost of a design that must be replicated for each case), and it spreads the quality-measurement infrastructure pervasively, which benefits scalability. The measurement system exposes its results through a high-level abstract interface (MOS is a simple example, but not the only one) that is independent of implementation details (codec, network parameters, etc.): after all, a video is still a video, and what matters to us is that the user is satisfied. This second point is crucial because it makes it possible not only to forward the measurement results, which are now homogeneous, to a control center (enabling comparisons, statistics, …), but also to aggregate them along the hierarchical structure of the distribution chain.
This also favors scalability, which is therefore not compromised (aggregated measurements, for example, take up little bandwidth toward the control center). Moreover, it is still possible to drill down to a fine level of detail (a single user) wherever the aggregated measurements reveal critical issues. The measurement results consist of a few sufficiently representative indicators. The aim is therefore not to collect information that allows diagnosing what the problem was, but only whether there was a problem (in other words, troubleshooting is not done in production). This is what keeps the impact of the measurements on the system low.
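The aggregation and drill-down idea in the notes can be sketched as follows: per-user MOS values are averaged up the distribution chain, so only a few numbers travel to the control center, and per-user detail is fetched only under a node whose aggregate falls below a threshold. The topology, node names, threshold, and the use of an unweighted mean are all assumptions for illustration; a real deployment might weight by user count or also track the minimum.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical distribution chain: users hang off access nodes, access nodes off a region.
USER_TO_ACCESS = {"u1": "dslam-A", "u2": "dslam-A", "u3": "dslam-B", "u4": "dslam-B"}
ACCESS_TO_REGION = {"dslam-A": "region-1", "dslam-B": "region-1"}

def aggregate(mos_by_child, parent_of):
    """Unweighted mean MOS per parent node, computed from its children's MOS values."""
    groups = defaultdict(list)
    for child, mos in mos_by_child.items():
        groups[parent_of[child]].append(mos)
    return {parent: fmean(values) for parent, values in groups.items()}

def drill_down(node, mos_by_child, parent_of, threshold):
    """Children of `node` whose MOS is below the threshold; detail is fetched only here."""
    return sorted(c for c, m in mos_by_child.items() if parent_of[c] == node and m < threshold)

user_mos = {"u1": 4.2, "u2": 4.0, "u3": 2.1, "u4": 4.1}     # hypothetical per-user MOS
access_mos = aggregate(user_mos, USER_TO_ACCESS)             # dslam-A ~4.1, dslam-B ~3.1
region_mos = aggregate(access_mos, ACCESS_TO_REGION)         # region-1 ~3.6
suspect_nodes = drill_down("region-1", access_mos, ACCESS_TO_REGION, threshold=3.5)  # ['dslam-B']
suspect_users = drill_down("dslam-B", user_mos, USER_TO_ACCESS, threshold=3.5)       # ['u3']
```

The region-level figure alone looks acceptable; it is the drill-down that isolates the single affected user, matching the “detail only where aggregates show problems” principle.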
23 VQM (1/2). No-Reference: evaluates the video quality as perceived by the user (from QoS to QoE). Based on MPEG-2. Light parsing: it does not parse motion vectors, DCT coefficients, or other macroblock-specific information; degradation due to packet loss is estimated using only the high-level information contained in the Group of Pictures, frame, and slice headers.
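Header-only parsing of this kind needs little more than MPEG-2 video start codes (ISO/IEC 13818-2): 0x000001B8 opens a GOP header, 0x00000100 a picture header (whose 3-bit picture_coding_type follows the 10-bit temporal_reference and gives I, P, or B), and 0x00000101 to 0x000001AF open slices, the last byte giving the slice's vertical position. The Python sketch below assumes the transport stream has already been demultiplexed into a video elementary stream and ignores the slice_vertical_position_extension used for very tall pictures; it illustrates the idea and is not the VQM parser itself.

```python
MPEG2_GOP_START = 0xB8
MPEG2_PICTURE_START = 0x00
MPEG2_SLICE_FIRST, MPEG2_SLICE_LAST = 0x01, 0xAF
PICTURE_CODING_TYPES = {1: "I", 2: "P", 3: "B"}

def scan_headers(es):
    """Yield header events from an MPEG-2 video elementary stream (bytes)."""
    i = 0
    while True:
        i = es.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= len(es):
            return
        code = es[i + 3]
        if code == MPEG2_GOP_START:
            yield ("gop",)
        elif code == MPEG2_PICTURE_START and i + 5 < len(es):
            # picture header: 10-bit temporal_reference, then 3-bit picture_coding_type
            coding_type = (es[i + 5] >> 3) & 0x07
            yield ("picture", PICTURE_CODING_TYPES.get(coding_type, "?"))
        elif MPEG2_SLICE_FIRST <= code <= MPEG2_SLICE_LAST:
            # last start-code byte = slice_vertical_position (1-based macroblock row)
            yield ("slice", code)
        i += 4  # move past this start code

stream = (
    b"\x00\x00\x01\xb8" + b"\x00\x08\x00\x40"     # GOP header
    + b"\x00\x00\x01\x00" + b"\x00\x08\xff\xf8"   # picture header, coding type 1 (I)
    + b"\x00\x00\x01\x01" + b"\x12\x34"           # slice in macroblock row 1
    + b"\x00\x00\x01\x02" + b"\x56\x78"           # slice in macroblock row 2
)
print(list(scan_headers(stream)))
# [('gop',), ('picture', 'I'), ('slice', 1), ('slice', 2)]
```

Because only start codes and two header bytes are examined, the cost per picture stays small, which is consistent with the low-overhead objective on the CPE.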
24 VQM (2/2). It does not need to make assumptions about how the decoder deals with corrupted information, i.e., what kind of error concealment strategy it uses. Based on this information, it determines exactly which slices are lost: GoP loss rate, frame loss rate, slice loss rate, differentiated per frame type (I, P, B). It computes how the error from missing slices propagates spatially and temporally into other slices. Appropriate for measuring video quality in real time within a network.
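Once the parser knows, for each picture, how many slices were expected and how many arrived, the per-type rates listed above follow directly. In this Python sketch the expected slice count is an input (assumed known, e.g. one slice per macroblock row), and pictures whose headers were lost entirely are assumed to have been detected upstream, for instance from gaps in temporal_reference. The function name is hypothetical, and GoP-level rates, which could be computed the same way, are omitted.

```python
from collections import defaultdict

def loss_rates_by_type(pictures):
    """pictures: iterable of (frame_type, expected_slices, received_slices) tuples.

    Returns {frame_type: (frame_loss_rate, slice_loss_rate)}, where a frame counts
    as lost when none of its slices arrived.
    """
    frame_counts = defaultdict(lambda: [0, 0])   # type -> [frames, frames fully lost]
    slice_counts = defaultdict(lambda: [0, 0])   # type -> [slices expected, slices missing]
    for frame_type, expected, received in pictures:
        if expected <= 0 or not 0 <= received <= expected:
            raise ValueError("need expected > 0 and 0 <= received <= expected")
        frame_counts[frame_type][0] += 1
        frame_counts[frame_type][1] += 1 if received == 0 else 0
        slice_counts[frame_type][0] += expected
        slice_counts[frame_type][1] += expected - received
    return {
        t: (lost / total, slice_counts[t][1] / slice_counts[t][0])
        for t, (total, lost) in frame_counts.items()
    }

# 720x576 frames with one slice per macroblock row give 36 slices per picture.
gop = [("I", 36, 36), ("B", 36, 0), ("B", 36, 36), ("P", 36, 30)]
print(loss_rates_by_type(gop))
```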
25 Parsing method (1/2). [Figure: a GOP in display order, I B B P B B P B B P B B, with errored pictures marked X; below it, a single frame.] Speaker notes: this slide shows how errors in pictures (the Xs) propagate to other dependent pictures (the red lines below the pictures). Below, it shows how an error (e.g., the loss of a packet) affects the rendering of a picture (slices that are not displayed properly, freezing, etc.).
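The temporal propagation in the figure can be approximated as follows, under a strong simplifying assumption that the VQM description does not make: damaged slice rows stay in the same rows in dependent frames (no motion). A P frame inherits the damage of the previous anchor (I or P) frame, a B frame inherits that of the anchors on either side, and B frames are never used as references, so their own losses do not spread. Frames are given in display order, as in the GOP pattern above, and each GOP is treated in isolation (trailing B frames of an open GOP would also depend on the next GOP's I frame). The function name and inputs are hypothetical.

```python
def propagate_errors(gop_types, lost_slices, slices_per_frame):
    """Estimate the impaired fraction of each frame in one GOP.

    gop_types: frame types in display order, e.g. "IBBPBBPBBPBB".
    lost_slices: {frame_index: set of slice rows lost in transmission}.
    Assumes zero motion: a damaged row stays in the same row in dependent frames.
    """
    impaired = [set(lost_slices.get(i, ())) for i in range(len(gop_types))]
    anchors = [i for i, t in enumerate(gop_types) if t in "IP"]

    # P frames are predicted from the previous anchor (I or P) and inherit its damage.
    for prev, cur in zip(anchors, anchors[1:]):
        if gop_types[cur] == "P":
            impaired[cur] |= impaired[prev]

    # B frames are predicted from the anchors on either side; they are not references,
    # so their own losses never spread further.
    for i, t in enumerate(gop_types):
        if t == "B":
            before = [a for a in anchors if a < i]
            after = [a for a in anchors if a > i]
            if before:
                impaired[i] |= impaired[before[-1]]
            if after:
                impaired[i] |= impaired[after[0]]

    return [len(rows) / slices_per_frame for rows in impaired]

# Two of four slice rows lost in the first P frame (index 3).
print(propagate_errors("IBBPBBP", {3: {0, 1}}, 4))   # [0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```

The example reproduces the qualitative point of the slide: a loss in an anchor frame damages the B frames before it and every frame after it in the GOP, while a loss in a B frame stays local.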
26 Parsing method (2/2). [Figure: blocks labelled MPEG-2 video bitstream, DECODER, HEADERS, Quality Measurement, Decoded video stream, RENDERING.] Speaker notes: the decoding chain, starting from the received stream.
27 QoE vs. MOS. Mapping between the Quality of Experience evaluation and the MOS (Mean Opinion Score, ITU-T P.800) value. [Figure: MOS levels 5 to 1 versus QoE, with QMAX marked.] Speaker notes: here it only needs to be said that we defined a heuristic technique for mapping the perceived (subjective) quality to an (objective) indicator, the MOS.
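The heuristic mapping itself is not described in these slides. Purely to show the shape of such a function, the Python sketch below uses a hypothetical linear mapping from a quality score in [0, QMAX] onto the 1 to 5 MOS range, plus rounding to the five integer levels on the chart; it is not the authors' heuristic.

```python
def qoe_to_mos(q, q_max):
    """Hypothetical linear mapping of a quality score in [0, q_max] onto the 1-5 MOS scale."""
    if q_max <= 0:
        raise ValueError("q_max must be positive")
    q = min(max(q, 0.0), q_max)      # clamp scores outside [0, q_max]
    return 1.0 + 4.0 * q / q_max

def mos_level(mos):
    """Round to one of the five integer MOS levels (Python's round sends ties to even)."""
    return min(5, max(1, round(mos)))

print(qoe_to_mos(7.5, 10), mos_level(qoe_to_mos(7.5, 10)))
```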
28 MOS vs. SLAs. Knowledge of the function MOS(t) directly enables SLA monitoring. [Figure: MOS (1 to 5) versus time, with an SLA threshold and the resulting down time marked.] Speaker notes: keeping track of how MOS evolves over time makes it possible to verify whether an SLA has been violated. Notably, very little data needs to flow to the control center: one can, for example, send only the instants at which the MOS changes level (conceptually, this is just a compression technique for the time signal). This is why the scalability of the system is preserved.
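The notes suggest sending the control center only the instants at which the MOS level changes. A minimal Python sketch of that compression, and of computing down time against an SLA threshold from the compressed events, could look like this; the threshold, the one-sample-per-second input, and the SLA rule (a maximum fraction of down time over the period) are hypothetical.

```python
def to_change_events(samples):
    """Keep only the (time, level) samples where the MOS level changes."""
    events = []
    for t, level in samples:
        if not events or events[-1][1] != level:
            events.append((t, level))
    return events

def down_time(events, end_time, threshold):
    """Total time with MOS below the SLA threshold, from time-sorted change events."""
    boundaries = [t for t, _ in events[1:]] + [end_time]
    return sum(nxt - t for (t, level), nxt in zip(events, boundaries) if level < threshold)

def sla_met(events, end_time, threshold, max_down_fraction):
    """Hypothetical SLA rule: down time may not exceed a fraction of the monitored period."""
    period = end_time - events[0][0]
    return down_time(events, end_time, threshold) <= max_down_fraction * period

samples = [(0, 4), (1, 4), (2, 2), (3, 2), (4, 4), (5, 4)]   # one MOS level per second
events = to_change_events(samples)                  # [(0, 4), (2, 2), (4, 4)]
print(down_time(events, end_time=6, threshold=3))   # 2
```

Six samples shrink to three events here; for a long, mostly stable session the reduction is far larger, which is what keeps the control-center traffic small.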
29 Experimental testbed. [Figure: Video Server, Controlled-Loss Router (dropped packets), Video Client + Quality Meter.] Video characteristics: MPEG-2 TS, constant bit rate 3.9 Mbps.
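Some testbed arithmetic: at 3.9 Mbps the stream needs about 2,593 MPEG-2 TS packets per second (3.9·10⁶ / (188·8)), or about 370 UDP datagrams per second if, as is common in IPTV but assumed here, seven TS packets are carried per datagram. The slides do not say which loss model the controlled-loss router applies; the Python sketch below swaps in a two-state Gilbert model, a common choice for bursty loss, with hypothetical parameter values.

```python
import random

TS_PACKET_BITS = 188 * 8   # an MPEG-2 TS packet is 188 bytes

def ts_packets_per_second(bit_rate_bps):
    """TS packets per second needed to carry a constant bit rate."""
    return bit_rate_bps / TS_PACKET_BITS

def gilbert_drops(n_packets, p_good_to_bad, p_bad_to_good, seed=None):
    """Drop decisions (True = dropped) from a two-state Gilbert loss model.

    Packets are dropped only in the 'bad' state; the long-run loss rate is
    p_good_to_bad / (p_good_to_bad + p_bad_to_good).
    """
    rng = random.Random(seed)
    bad = False
    drops = []
    for _ in range(n_packets):
        if bad:
            bad = rng.random() >= p_bad_to_good   # leave the bad state with prob. p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad    # enter the bad state with prob. p_good_to_bad
        drops.append(bad)
    return drops

rate = ts_packets_per_second(3.9e6)          # about 2593 TS packets per second
drops = gilbert_drops(10_000, 0.01, 0.3, seed=1)
print(f"{rate:.0f} TS pkt/s, measured loss {sum(drops) / len(drops):.3%}")
```

Compared with independent (Bernoulli) drops at the same average rate, the Gilbert model clusters losses into bursts, which tends to hit several consecutive slices or an entire picture at once.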
33 From SLA to PLA: Provisioning Level Agreements. Scuola di Dottorato in Ingegneria Informatica (Doctoral School in Computer Engineering), Palermo, September 2007. Giorgio Ventre, The COMICS Research Group @ The University of Napoli Federico II & ITEM Laboratory, Italian University Consortium on Informatics
34 A service model for resilient networks. We are moving from Quality of Service to a more complex concept: quality + resiliency.
35 Quality of future distributed services. The most important QoS characteristic for future distributed services is arguably going to be resilience. Resilience is the property of a system to restore services to normal after a failure (as fast as the service users need). However, an investigation into resilience reveals the importance of considering risk when developing our future research agenda.
36 The need for resilience. We are increasingly reliant on the Internet and on networked systems in general (including, of course, the Web). This is happening in businesses and indeed in every walk of life, including the home. The EU is promoting and developing the Information Society, which is based on communication technologies and systems.
37 Interdependence of networks (1). Not only are we dependent on networks, but all sorts of other networks are, too: electricity, water, gas; corporate networks; banking networks; health networks … Information networks are crucial to the successful operation of other networks.
38 Interdependence of networks (2). Interdependencies of critical infrastructures. Power nets and information nets: the virtual utility. “The introduction of proper supporting ICT of power nets forming a virtual utility is an important instance of networked enabled capabilities (NEC) systems. Furthermore, by pursuing this task we can gain experiences and develop models and technologies that besides addressing societal critical systems also can be useful in other efforts on development and maintenance of complex systems.” In Italy: Report del Comitato sulla Protezione delle Infrastrutture Critiche (Report of the Committee on the Protection of Critical Infrastructures), Presidenza del Consiglio dei Ministri (Presidency of the Council of Ministers), 2004.
39 Internet meltdown? Article in The Independent (UK), 8 September 2004: “The internet is becoming a utility” [Karl Auerbach]. As a utility, the net will have to live up to different, more stringent standards than its previous uses as an academic and research playground, and then a mainstream experiment. People are building billion-dollar businesses, governments are turning themselves digital, and in the meantime there isn't so much as a service-level agreement to guarantee that the most basic level of connectivity will be there tomorrow. If the technologists no longer believe they can fix it by themselves, the Internet really has hit a meltdown.
40 Vulnerabilities. The Internet was originally designed to withstand basic link and switch failures, but it was never envisaged as a utility (i.e., offering near-perfect availability), supporting commercial initiatives and acting as a vital infrastructure. Whatever vulnerabilities are present in the infrastructure may be inherited by the applications it aims to support.
41 Attacks. Complex, well-engineered systems should be built with faults in mind. Today, we also need to take other sources of disruption into account. Network attacks of all sorts are increasing in variety and number: spam/junk; viruses, worms, etc.; DDoS attacks; physical … These cause huge costs in time and energy, yet there is no coherent approach to a solution.
42 Multiple levels. This is, of course, a multi-level problem: physical layer; networking/IP; middleware layer/O.S.; Web/applications. A solution for achieving resilience needs to apply at all levels: this is a grand challenge for future networked systems infrastructure.
43 Complexity. This is a distributed computing problem. According to Leonard Kleinrock, we have no suitable theory to handle it, because of its inherent complexity. This is compounded by nomadicity. [complicated = difficult to study but fit for purpose, static; complex = growing, evolving]
44 Complexity, not simplicity. In spite of all the hype about global network architectures, today we face a complex, heterogeneous reality: fixed access networks (POTS, xDSL, CATV, MetroLAN); mobile and wireless access networks (GPRS, UMTS, WiFi, WiMAX); interoperability with terrestrial digital broadcasting. Additional complexity issues: new, diverse terminals (Symbian cell phones, PDAs, smart set-top boxes); dynamic creation of novel services and applications.
45 Complexity as an opportunity. The availability of a multiplicity of networks, devices, and services should be seen as an opportunity: no single infrastructure of critical importance; ease of access for all players (government, companies, ordinary people); availability of a multitude of sources of information; availability of a multitude of computing resources; availability of a multitude of communication media/networks … provided that such a rich scenario can be managed as a system.
46 Some recent events. We learned some lessons recently: 9/ attacks; US East Coast blackout; Italy blackout; series of attacks: worms (NIMDA, Witty, Slammer …), DDoS, routing attacks. We probably need to rediscover the values of traditional engineering practice.
47 Lessons from 9/. “The Internet under Crisis Conditions”, a Committee of the National Research Council of the National Academies (www.nap.edu). Findings of the Committee: the attacks had very limited effects on the Internet as a global, best-effort communication system; Internet technology appears to be robust per se, but considerable effort is needed to protect Internet-based systems; many critical interdependencies were discovered only after the attacks.
48 Known and less-known effects. Dependency of the Internet on other telecommunication systems (fixed, wireless, cellular). Obvious: co-location of sites, ducts, cables; running out of diesel … Not so obvious: e.g., communications between NYC ISPs and telcos were hampered by problems with toll-free numbers. Facility disaster planning is a rare expertise/culture in the Internet world. Very limited backup power generation capacity, even at major ISP sites/POPs. Other issues, e.g.: the DNS for the .za domain was hosted on a server in NYC; the WiFi LANs of two major Manhattan hospitals were outsourced and operated via the Internet.
49 Lessons from 9/. Anticipated by the US East Coast blackout: a much larger scale than the WTC, but apparently more limited damage. Different effects and impacts: the POTS infrastructure, capable of enduring very long power outages, showed practically no effects; cellular networks were locally in deep crisis; national TV and radio broadcasters were OK, while local players were generally in crisis; “global” and VoIP operators were knocked out. What about the Internet? All IT-based services were affected: AAA, CDN, servers.
50 Lessons in ATC systems. Press release (http://www.natca.org/mediacenter/press-release-detail.aspx?id=394): MASSIVE POWER, COMMUNICATIONS FAILURE AT MAJOR AIR TRAFFIC CONTROL CENTER PUTS CONTROLLERS IN DARK, FLIGHTS IN JEOPARDY. 07/19/2006, Bob Marks. PALMDALE, Calif. – A massive power and communications failure late Tuesday at the Los Angeles Air Route Traffic Control Center left scrambling air traffic controllers to deal with a nightmare scenario – how to keep dozens of flights away from each other above a large swath of the Southwestern United States despite the inability to see them, talk to them or relay crucial instructions for 15 excruciatingly long minutes. Every ounce of skill, heart and determination that controllers bring into the control room every day was put to the test during one of the worst outages to ever hit the facility. It was so bad, controllers say, that the only thing they had of use to aid the situation that actually worked was their cell phones – devices which the Federal Aviation Administration, inexplicably, has barred from control rooms, further impeding the safety of the system. More details in
51 Issues for research (1). Forget OSI-type layering/abstractions: services depend not only on peer and adjacent layers. Resiliency is a system-wide issue, with vertical and horizontal dependencies. Start speaking about networked systems, not only about networks: IT-based services must be considered part of the whole picture. Contributions from several disciplines. Multi-level approach. Cross-layer approach.
52 Issues for research (2). Monitoring of services and infrastructures: we can’t trust what we can’t control. Robustness of services: to unexpected situations (faults, misconfigurations, excessive demand, soft attacks such as DDoS); to expected but complex situations (tools/methodologies for proper dimensioning of services: Service Engineering). Resiliency of infrastructures: focus on the survivability of communication systems to hard attacks (terrorist attacks, natural disasters); reconfigurability of communication systems; make different networks/systems a single infrastructure.
53 Issues for research (3). Towards a GRID of communication infostructures: connect them all physically; make them resilient separately; allow services to migrate; prepare for interconnecting them if needed. From the computational GRID to the communication GRID, but try to do it with an autonomic communication flavour.
54 Issues for research. Resiliency of infrastructures: focus on the survivability of communication systems to hard attacks (terrorist attacks, natural disasters); reconfigurability of communication systems; make different networks a single infrastructure. Resiliency of services: to unexpected situations (faults, excessive demand, soft attacks such as DDoS); to expected but complex situations (tools/methodologies for proper dimensioning of services: Service Engineering).
55 From QoS to Resiliency to … We should not forget the past: QoS is as important as resiliency, and it is back (“Value of supporting Class-of-Services in IP Backbones”, M. Yuksel et al., IWQOS 2007), also because QoS is a good mechanism for improving the resiliency of a distributed system. So we should probably talk about QoSiliency.
57 SLAs are the triggers. [Figure: User, Service Directories (info about content: metadata), Access Controllers, Service Controller, Resource Controllers (policy rules), and QoS-capable networks, linked by SLAs and SLSs.]
58 A change of perspective. One of the major problems with SLA-based architectures was their limited ability to scale with the number of users and services. We therefore introduce the concept of a Provisioning Level Agreement (PLA): a contract between a service provider and the owner of the infrastructure, defining the level of service to be guaranteed to final users during the provisioning of a service on top of that infrastructure.
59 A change of perspective (cont.). In a PLA, it is the service provider who defines: the type of service; the treatment the service needs to get from the network (QoS, resiliency needs, security and privacy requirements); the classes of possible SLAs that users can subscribe to. A PLA is signed at service deployment time and can be dynamically modified and updated whenever the service characteristics and requirements change. Once a PLA is signed, Provisioning Level Specifications are produced so that the infrastructure can be properly configured to accommodate the new service and future service subscriptions by final users.
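The division of responsibilities described above can be sketched as a small data model: the provider declares the service type, the network treatment, and the SLA classes users may subscribe to; the PLA can be updated at any time; and a Provisioning Level Specification is derived from it for the infrastructure. Every class, field, and method name here, including the shape of the PLS, is a hypothetical illustration rather than a defined format.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SLAClass:
    name: str
    min_mos: float             # hypothetical quality target for subscribers of this class
    max_down_fraction: float   # hypothetical availability target

@dataclass
class PLA:
    service_type: str
    network_treatment: dict    # e.g. QoS, resiliency, security and privacy requirements
    sla_classes: dict = field(default_factory=dict)

    def offer(self, sla_class):
        """Add or update an SLA class; a PLA may change whenever the service does."""
        self.sla_classes[sla_class.name] = sla_class

    def admits(self, class_name):
        """A user subscription is valid only for a class the provider declared."""
        return class_name in self.sla_classes

    def to_pls(self):
        """Hypothetical Provisioning Level Specification for the resource controllers."""
        return {
            "service": self.service_type,
            "treatment": dict(self.network_treatment),
            "classes": sorted(self.sla_classes),
        }

pla = PLA("IPTV", {"qos": "low-loss", "resiliency": "fast-reroute"})
pla.offer(SLAClass("gold", min_mos=4.0, max_down_fraction=0.001))
pla.offer(SLAClass("silver", min_mos=3.5, max_down_fraction=0.01))
print(pla.admits("gold"), pla.admits("bronze"))   # True False
print(pla.to_pls()["classes"])                    # ['gold', 'silver']
```

The scalability argument shows up in the model: the infrastructure is configured once per PLA, and individual user SLAs only need a membership check against the declared classes.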
60 Service Centered Architectures. PLAs are the triggers. [Figure: Service Provider, Service Directories (info about content: metadata), Resource Controllers (policy rules), and QoS-figurable networks, linked by PLAs and PLSs.]