
Multimedia conferencing
Raphael Coeffic (rco@iptel.org)
Based partly on slides of Ofer Hadar, Jon Crowcroft

Which Applications?
- Conferencing:
  - Audio/video communication and application sharing
  - First multicast session: IETF 1992
  - Many-to-many scenarios
- Media broadcast:
  - Internet TV and radio
  - One-to-many scenarios
- Gaming:
  - Many-to-many scenarios

What is needed?
- Efficient transport:
  - Enable real-time transmission.
  - Avoid sending the same content more than once.
  - The best transport depends on the available bandwidth and technology.
- Audio processing:
  - How to ensure audio/video quality?
  - How to mix the streams?
- Conference setup:
  - Who is allowed to start a conference?
  - How fast can a conference be initiated?
- Security and privacy:
  - How to prevent unwanted people from joining?
  - How to secure the exchanged content?
- Floor control:
  - How to maintain some talking order?

How to Realize? Centralized
- All participants register at a central point.
- All send to the central point.
- The central point forwards to the others.
- Simple to implement.
- Single point of failure.
- High bandwidth consumption at the central point:
  - Must receive N flows.
- High processing overhead at the central point:
  - Must decode N flows, mix them, and encode N flows.
  - Without mixing, the central point would have to send N×(N−1) flows.
- Appropriate for small to medium-sized conferences.
- Simple to manage and administer:
  - Allows access control and secure communication.
  - Allows usage monitoring.
  - Supports floor control.
- Most widely used scenario.
- No need to change end systems.
- Tightly coupled: some instances know all information about all participants at all times.
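The flow counts on this slide can be checked with a little arithmetic. The helper below is a hypothetical sketch (`central_flows` is not from the slides), assuming N participants that each send exactly one stream to the central point.

```python
def central_flows(n, mixing):
    """Return (incoming, outgoing) flow counts at the central point
    for a conference of n participants."""
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    incoming = n                 # every participant sends one stream to the center
    if mixing:
        outgoing = n             # one individually mixed stream back to each participant
    else:
        outgoing = n * (n - 1)   # forward every stream to every other participant
    return incoming, outgoing


for n in (3, 10, 50):
    print(n, central_flows(n, mixing=True), central_flows(n, mixing=False))
```

The quadratic N×(N−1) term is why a pure forwarding center scales poorly, and why the slide pairs "decode, mix, encode" with the central point despite the processing cost.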

How to Realize? Full Mesh
- All participants establish a connection to each other.
- All can send directly to the others.
- Each host must maintain N−1 connections.
- Outgoing bandwidth:
  - Send N−1 copies of each packet.
  - A simple 64 kbit/s voice session translates to 64×(N−1) kbit/s.
- Incoming bandwidth:
  - If silence suppression is used, only active speakers send data.
  - With video, a lot of bandwidth may be consumed, unless only active speakers send video.
- Floor control is only possible with cooperating users.
- Security: simple! Do not send data to members you do not trust.
- End systems need to mix the traffic: more complex end systems.
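A minimal sketch of the per-host bandwidth arithmetic in a full mesh, assuming a constant 64 kbit/s stream per talking participant and ignoring RTP/UDP/IP header overhead. `mesh_bandwidth_kbps` and its `other_active_speakers` parameter are hypothetical names used only for illustration.

```python
def mesh_bandwidth_kbps(n, rate_kbps=64.0, other_active_speakers=None):
    """Per-host (outgoing, incoming) bandwidth in a full mesh of n hosts."""
    peers = n - 1
    outgoing = rate_kbps * peers          # one copy per peer while this host talks
    if other_active_speakers is None:
        senders = peers                   # no silence suppression: every peer sends
    else:
        # with silence suppression only the peers that are talking send data
        senders = min(other_active_speakers, peers)
    incoming = rate_kbps * senders
    return outgoing, incoming


for n in (3, 5, 10):
    print(n, mesh_bandwidth_kbps(n), mesh_bandwidth_kbps(n, other_active_speakers=1))
```

Silence suppression only helps the incoming side; a talking host still pays for N−1 outgoing copies.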

How to Realize? End point based
- All participants establish a connection to the chosen mixer (one of the end points).
- Outgoing bandwidth at the mixing end point:
  - Send N−1 streams, one to each other participant.
  - A simple 64 kbit/s voice session translates to 64×(N−1) kbit/s.
- Incoming bandwidth:
  - If silence suppression is used, only active speakers send data.
  - With video, a lot of bandwidth may be consumed, unless only active speakers send video.
- One of the end systems needs to mix the traffic: a more complex end system.
- The most commonly used solution for three-way conferencing.

How to Realize? Peer-to-Peer
- Mixing is done at the end systems.
- Increases processing overhead at the end systems.
- Increases overall delay:
  - Streams are possibly mixed multiple times.
- If central points leave the conference, the conference is dissolved.
- Security: must trust all members.
  - Any member could send all data to untrusted users.
- Access control: must trust all members.
  - Any member can invite new members.
- Floor control: requires cooperating users.

Transport considerations
- Transport layer:
  - Most group communication systems run on top of unicast sessions.
  - Very popular in the past: multicast.
- Application layer:
  - RTP over UDP.
  - Why not TCP?
    - Better NAT traversal capabilities (used by Skype as a last resort).
    - But not really suitable for real-time feedback (why?).
- Control protocol:
  - Interactive conferencing: SIP, H.323, Skype, etc.
  - Webcast: RTSP, RealAudio and other flavours.
- Session description:
  - SDP (Session Description Protocol).
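To make "RTP over UDP" concrete, the sketch below packs and parses the 12-byte fixed RTP header from RFC 3550 (version, marker, payload type, sequence number, timestamp, SSRC). It deliberately leaves out CSRC lists, header extensions and padding; the function names are hypothetical.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte RTP fixed header (no CSRCs, extension or padding)."""
    byte0 = RTP_VERSION << 6                               # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)     # M bit + 7-bit PT
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the first 12 bytes of an RTP packet into a dict."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

For example, a 20 ms G.711 µ-law frame (static payload type 0, 160 samples at 8 kHz) would carry this header followed by 160 payload bytes in one UDP datagram, with the timestamp advancing by 160 per packet. Because each datagram stands alone, a lost packet simply leaves a gap for the receiver to conceal instead of stalling later packets behind a retransmission, as TCP would.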

IP Multicast
- Why?
  - Most group communication applications run on top of unicast sessions.
  - With unicast, each packet has a single recipient.
- How?
  - Enhance the network with support for group communication.
  - Optimal distribution is delegated to the network routers instead of the end systems.
  - Receivers inform the network that they wish to receive the data of a communication session.
  - Senders send a single copy, which is distributed to all receivers.

Multicast vs. Unicast
[Figure: file transfer from C to A, B, D and E. Unicast: multiple copies; multicast: single copy.]

IP Multicast
- True N-way communication:
  - Any participant can send at any time, and everyone receives the message.
- Unreliable delivery:
  - Based on UDP. Why?
  - Avoids hard problems (e.g., ACK implosion).
- Efficient delivery:
  - Packets traverse each network link only once (i.e., tree delivery).
- Location-independent addressing:
  - One IP address per multicast group.
- Receiver-oriented service model:
  - Receivers can join/leave at any time.
  - Senders do not know who is listening.
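The receiver-oriented model shows up directly in the socket API: a receiver asks its own kernel to join a group, and a sender just addresses a datagram to the group. The sketch below uses Python's standard `socket` module on IPv4; the group 239.1.2.3 and port 5004 in the usage note are hypothetical example values, and the function names are illustrative.

```python
import socket
import struct


def build_mreq(group, interface="0.0.0.0"):
    """struct ip_mreq: 4-byte group address followed by 4-byte local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group, port, interface="0.0.0.0"):
    """Join `group` and receive on `port`. The join is purely receiver-side:
    the kernel reports the membership to local routers (IGMP); senders are not told."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, interface))
    return sock


def leave(sock, group, interface="0.0.0.0"):
    """Leave the group again; other members are unaffected."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    build_mreq(group, interface))
    sock.close()


def send_once(group, port, payload, ttl=1):
    """A sender transmits one datagram to the group address; the network
    replicates it towards all current members. TTL 1 keeps it on the local link."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.sendto(payload, (group, port))
    sock.close()
```

Usage: `rx = open_receiver("239.1.2.3", 5004)` on each listener, `send_once("239.1.2.3", 5004, b"hello")` on the sender. Note that the sender code never enumerates receivers, matching "senders do not know who is listening".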

IP Multicast addresses
- Reserved IP addresses:
  - Special IP addresses (class D): 224.0.0.0 through 239.255.255.255.
  - Class D: prefix 1110 + 28 bits → about 268 million groups (plus scoping for address reuse).
  - 224.0.0.x: local network only.
  - 224.0.0.1: all hosts.
  - Static addresses for popular services (e.g., SAP, the Session Announcement Protocol).
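The address arithmetic above can be checked with Python's standard `ipaddress` module: class D is the /4 prefix 224.0.0.0/4 (leading bits 1110), which leaves 2^28 group addresses, and 224.0.0.0/24 is the block that is not forwarded beyond the local network. The `classify` helper is a hypothetical illustration.

```python
import ipaddress

CLASS_D = ipaddress.ip_network("224.0.0.0/4")          # prefix 1110 + 28 bits
LOCAL_ONLY = ipaddress.ip_network("224.0.0.0/24")      # 224.0.0.x: local network only


def classify(addr):
    """Rough classification of an IPv4 address along the lines of this slide."""
    ip = ipaddress.IPv4Address(addr)
    if ip not in CLASS_D:
        return "not multicast"
    if ip in LOCAL_ONLY:
        return "multicast, local network only"
    return "multicast"


print(CLASS_D.num_addresses)   # 268435456, i.e. 2**28
print(classify("224.0.0.1"))   # the all-hosts group
```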

Alternatives to Multicast
- Use application-level multicast:
  - Multicast routing is done by the end hosts.
  - Hosts build multicast routing tables and act as multicast routers (but at the application level).
  - Users request content using unicast.
  - Content is distributed over unicast to the final users.
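As a toy illustration of hosts acting as multicast routers at the application level, the sketch below attaches hosts to a distribution tree in join order, giving each node at most `fanout` children. Placing hosts by join order and the fanout of 2 are purely illustrative assumptions; a real overlay would typically also take measured delay or bandwidth into account. `build_alm_tree` is a hypothetical name.

```python
from collections import deque


def build_alm_tree(source, hosts, fanout=2):
    """Attach hosts (in join order) to the first node with spare capacity,
    breadth-first. Returns {node: [children]}; every node forwards at most
    `fanout` unicast copies of each packet."""
    children = {source: []}
    candidates = deque([source])        # nodes that can still accept children
    for host in hosts:
        parent = candidates[0]
        children[parent].append(host)
        children[host] = []
        candidates.append(host)
        if len(children[parent]) == fanout:
            candidates.popleft()        # parent is full
    return children


tree = build_alm_tree("S", ["A", "B", "C", "D", "E"], fanout=2)
print(tree)
```

With plain unicast the source would send five copies of every packet; in this tree it sends two, and A and B forward the rest over unicast.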

Application level Multicast vs. unicast
[Figure: traditional unicast distribution from a content source vs. application-level multicast from a content source.]

Conference mixer architecture
- Main components of a centralized conference mixer:
  - Coder/decoder (+ quality-ensuring components).
  - Synchronization.
  - Mixer.
- Processing pipeline:

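Audio Mixing
[Figure: streams A, B and C arrive encoded with different codecs (G.711, G.729, GSM) and are decoded (D: decoder). Driven by a periodic timer, the mixer computes X = A+B+C and sends each participant the mix without its own signal (X−A = B+C, X−B = A+C, X−C = A+B), re-encoded (E: encoder) with that participant's codec.]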
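The "X minus own signal" step can be sketched as follows, assuming the frames have already been decoded to 16-bit linear PCM for one timer tick and all have the same length. Codec decoding and encoding are left out, and `mix_minus` is a hypothetical name.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clip(sample):
    """Saturate a sample to the 16-bit PCM range."""
    return max(INT16_MIN, min(INT16_MAX, sample))


def mix_minus(frames):
    """frames: {participant: [decoded 16-bit samples]} for one timer tick.
    Returns {participant: X minus that participant's own samples}, clipped."""
    length = len(next(iter(frames.values())))
    # X = A + B + C, kept unclipped so the subtraction below stays exact
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    return {
        who: [clip(total[i] - own[i]) for i in range(length)]
        for who, own in frames.items()
    }


out = mix_minus({"A": [100, -200], "B": [1000, 0], "C": [30000, 5]})
print(out["A"])   # B + C
```

Computing X once and subtracting each participant's own signal costs O(N) additions per sample instead of building N separate (N−1)-way sums; clipping only at the end avoids subtracting a signal from an already saturated total.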

Audio Quality
- Mostly based on "best effort" networks:
  - No guarantees at all.
  - Packets get lost and/or delayed depending on the congestion status of the network.
- Depending on the codec, different quality can be reached:
  - Mostly reducible to a "needed bandwidth vs. quality" tradeoff.
  - Desired properties: loss resilience, low complexity (easy to implement in embedded hardware).
- Audio data has to be played out at the same rate it was sampled:
  - Different buffering techniques have to be considered, depending on the application.
  - Pure streaming (radio/TV) is not interactive and thus not sensitive to delay: quality is everything.
  - Interactive conferencing needs short delays to guarantee the real-time property. Users experience delay as "very annoying" in such applications.

Codec quality measurements
- Mean Opinion Score (MOS) measurements for common codecs:

Codecs: loss resilience

Codecs: complexity

Audio quality: packet loss
- Packet loss:
  - The impact on voice quality depends on many factors:
    - Average rate: loss rates under 2 to 5% (depending on the codec) are almost inaudible. Above 15% (depending heavily on burstiness), most calls are experienced as unintelligible.
    - Burstiness: depending on the loss distribution, the impairment can range from small artifacts due to packet loss concealment (PLC) to really annoying quality loss.
  - Modern codecs like iLBC, which are designed exclusively for VoIP, are much more loss-resistant and should thus be preferred to PSTN-based low-bitrate codecs.
  - For media servers, and especially conference bridges, we should concentrate on receiver-based methods, as any other method would not be compatible with the customers' phones.
  - Solutions: support appropriate codecs, ensure a minimal link quality and implement a reasonable PLC algorithm.
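The simplest receiver-based PLC is to repeat the last good frame and fade it out over consecutive losses, so a burst decays towards silence instead of producing a repeating buzz. The sketch below shows only that basic idea; the 0.5 attenuation per lost frame is an arbitrary assumption, and `conceal` is a hypothetical name. More elaborate schemes, such as the one in ITU-T G.711 Appendix I, repeat pitch periods rather than whole frames.

```python
def conceal(frames, frame_len, attenuation=0.5):
    """Packet loss concealment by repetition with fade-out.

    frames: list whose entries are lists of PCM samples, or None for a lost
    packet. Each lost frame is the last good frame scaled by `attenuation`
    once more per consecutive loss. A loss before any good frame becomes silence.
    """
    out = []
    last_good = [0] * frame_len
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0                  # reset the fade after a good frame
            out.append(list(frame))
        else:
            gain *= attenuation         # longer bursts fade further
            out.append([int(s * gain) for s in last_good])
    return out


print(conceal([[100, 200], None, None, [10, 20]], 2))
```

This also shows why burstiness matters: an isolated loss is replaced by a close copy, while a long burst inevitably ends in silence.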

Audio quality: jitter
- Delay variation (jitter):
  - Why?
    - Varying buffering times at the routers along the packets' path.
    - Inherent to the transmission medium (e.g., WiFi).
  - Depending on the buffering algorithm, quality impairments are mostly caused by too high a mouth-to-ear delay or by late loss.
- Mouth-to-ear delay:
  - Whereas delays under 100 ms are not noticeable, values over 400 ms make a natural conversation very difficult.
- Late loss:
  - If the buffering delay is smaller than the actual delay, some packets arrive after their scheduled playout time. This effect is called "late loss".
- Delivering good voice quality means, apart from packet loss concealment, minimizing both delay and late loss.
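A receiver needs a running jitter estimate before it can size its buffer. One standard estimator is the RTP interarrival jitter of RFC 3550 (section 6.4.1): J += (|D| − J)/16, where D is the change in relative transit time between consecutive packets, in RTP timestamp units. It is shown here as a sketch; the slides do not say which estimator they have in mind.

```python
class JitterEstimator:
    """RFC 3550 interarrival jitter: J += (|D| - J) / 16, in timestamp units."""

    def __init__(self):
        self.jitter = 0.0
        self._prev_transit = None

    def update(self, arrival, rtp_timestamp):
        """arrival: receive time, already converted to RTP timestamp units."""
        transit = arrival - rtp_timestamp          # relative transit time
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)  # |D(i-1, i)|
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter


est = JitterEstimator()
# 8 kHz clock, one packet every 160 ticks (20 ms); the 4th packet is 80 ticks late
for arrival, ts in [(1000, 0), (1160, 160), (1320, 320), (1560, 480)]:
    est.update(arrival, ts)
print(est.jitter)
```

The constant offset between sender and receiver clocks cancels out in D, so only delay variation is measured. The 1/16 gain smooths the estimate at the cost of reacting slowly.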

Jitter: example

Adaptive playout
- Static buffer:
  - Playout is delayed by a fixed value.
  - The buffer size is computed once for the rest of the call.
  - Some clients implement a panic mode, increasing the buffer size dramatically (×2) if the late loss rate is too high.
- Advantages:
  - Very low complexity.
- Drawbacks:
  - High delay.
  - Performs poorly if the jitter is too high.
  - Does not solve the clock skew problem.
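A minimal sketch of a static playout buffer with panic mode, working on measured network delays in milliseconds. The late-loss threshold and the window of packets over which it is evaluated are illustrative assumptions, as is the class name.

```python
class StaticPlayoutBuffer:
    """Fixed playout delay with a 'panic mode'.

    A packet whose network delay exceeds playout_delay_ms arrives after its
    playout time and counts as a late loss. After every `window` packets, if
    the late-loss rate exceeded `panic_threshold`, the delay is doubled; the
    counters then restart.
    """

    def __init__(self, playout_delay_ms, panic_threshold=0.05, window=50):
        self.playout_delay_ms = playout_delay_ms
        self.panic_threshold = panic_threshold
        self.window = window
        self._late = 0
        self._seen = 0

    def on_packet(self, network_delay_ms):
        """Return True if the packet is in time for playout, False if late."""
        late = network_delay_ms > self.playout_delay_ms
        self._seen += 1
        self._late += late
        if self._seen == self.window:
            if self._late / self._seen > self.panic_threshold:
                self.playout_delay_ms *= 2      # panic mode: double the buffer
            self._late = self._seen = 0
        return not late


buf = StaticPlayoutBuffer(60, panic_threshold=0.1, window=10)
results = [buf.on_packet(d) for d in [40] * 8 + [90] * 2]
print(results, buf.playout_delay_ms)
```

The drawbacks listed on the slide are visible here: the delay only ever grows, and nothing compensates for a sender clock that drifts relative to the receiver's.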

Adaptive playout (2)
- Dynamic buffer: talk-spurt based.
  - During a phone call, a speaker is rarely active all the time, so it is possible to distinguish between voiced and unvoiced segments.
  - Adjusting the buffering delay within unvoiced segments has no negative impact on voice quality.
  - Using a delay prediction algorithm on the previous packets, we then try to calculate the appropriate buffering delay for the next voiced segment.
- Advantages:
  - Low complexity.
  - Solves the clock skew problem.
- Drawbacks:
  - Needs Voice Activity Detection (VAD), either at the sender or at the receiver.
  - High delay.
  - Performs poorly if the jitter varies quickly (within a voice segment).
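One common delay predictor for talk-spurt based playout keeps an exponentially weighted mean delay d and mean deviation v, and applies the playout delay d + k·v only at the start of a talk spurt. The sketch below assumes that estimator with α = 0.998 and k = 4 as illustrative values; the slides do not fix the algorithm, and the start-of-spurt flag is assumed to come from VAD.

```python
class TalkspurtPlayout:
    """Per packet, with measured network delay n:
        d = a*d + (1-a)*n
        v = a*v + (1-a)*|d - n|
    The playout delay d + k*v is only changed when a new talk spurt starts."""

    def __init__(self, alpha=0.998, k=4.0):
        self.alpha = alpha
        self.k = k
        self.d = None              # smoothed delay estimate
        self.v = 0.0               # smoothed deviation estimate
        self.current_delay = None  # playout delay used for the current spurt

    def on_packet(self, network_delay_ms, starts_talkspurt):
        if self.d is None:
            self.d = network_delay_ms
        else:
            a = self.alpha
            self.d = a * self.d + (1 - a) * network_delay_ms
            self.v = a * self.v + (1 - a) * abs(self.d - network_delay_ms)
        if starts_talkspurt or self.current_delay is None:
            # adjust only in silence, i.e. before the first packet of a spurt
            self.current_delay = self.d + self.k * self.v
        return self.current_delay


p = TalkspurtPlayout(alpha=0.5, k=4.0)   # small alpha only to make the example visible
print(p.on_packet(100, True), p.on_packet(120, False), p.on_packet(120, True))
```

Because the delay changes only between spurts, the silence gap is stretched or shortened instead of the speech, which is also how this method absorbs clock skew.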

Adaptive playout (3)
- Dynamic buffer: packet based.
  - Based on Waveform Similarity Overlap-Add (WSOLA) time-scale modification.
  - Enables packet scaling without pitch distortion.
  - Very good voice quality: scaling factors from 0.5 to 2.0 are mostly inaudible if applied locally.
  - But: high processing complexity.

WSOLA: how does it work?
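A basic WSOLA scheme works as follows. It cuts overlapping windowed segments from the input and overlap-adds them at a fixed output hop. For each new segment, it searches within a small tolerance around the nominal input position for the candidate most similar to the natural continuation of the previous segment. The search keeps the waveform phase-continuous, so the output is longer or shorter but keeps its pitch. The sketch below is a plain-Python illustration under assumed parameters (256-sample frames, 50 % overlap, ±64-sample search, cross-correlation as the similarity measure). It is not an optimized or production implementation, and the function name is hypothetical.

```python
import math


def wsola(x, factor, frame=256, tolerance=64):
    """Time-scale the sample list x by `factor` (output length is about
    factor * len(x)) while keeping the pitch. Assumes len(x) >= frame."""
    hop_out = frame // 2                  # synthesis hop: 50 % overlap
    hop_in = hop_out / factor             # nominal analysis hop
    # Hann window at half-sample offsets: overlapping copies sum to 1
    window = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / frame)
              for i in range(frame)]
    n_out = int(len(x) * factor)
    out = [0.0] * (n_out + frame)
    norm = [0.0] * (n_out + frame)
    last_start = len(x) - frame           # last valid segment start

    prev = 0                              # input start of the previous segment
    k = 0
    while k * hop_out < n_out:
        nominal = int(round(k * hop_in))
        if k == 0:
            best = 0
        else:
            # natural continuation of the previously copied segment
            cont = min(prev + hop_out, last_start)
            target = x[cont:cont + frame]
            lo = max(0, nominal - tolerance)
            hi = min(last_start, nominal + tolerance)
            if lo > hi:                   # near the end: clamp to the last segment
                lo = hi = min(nominal, last_start)
            best, best_score = lo, -math.inf
            for start in range(lo, hi + 1):   # maximize cross-correlation
                score = sum(a * b for a, b in zip(x[start:start + frame], target))
                if score > best_score:
                    best, best_score = start, score
        base = k * hop_out
        for i in range(frame):            # windowed overlap-add
            out[base + i] += window[i] * x[best + i]
            norm[base + i] += window[i]
        prev = best
        k += 1
    return [o / n if n > 1e-12 else 0.0 for o, n in zip(out[:n_out], norm[:n_out])]
```

For example, stretching a 200 Hz tone sampled at 8 kHz by 1.5 yields 1.5 times as many samples, still at roughly 200 Hz. The inner correlation search is what makes the method computationally expensive, as the slide notes.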
