
Towards High-Availability for IP Telephony using Virtual Machines Devdutt Patnaik, Ashish Bijlani and Vishal K Singh.


1 Towards High-Availability for IP Telephony using Virtual Machines Devdutt Patnaik, Ashish Bijlani and Vishal K Singh

2 Outline
Virtualization
High Availability (HA) in virtualized platforms
– Xen and Remus (HA solution for Xen)
Remus applied to IP Telephony (IPT) applications
– Scalability and reliability of IPT applications using virtualization
Experimental results
Conclusion

3 Virtualization and its Benefits
Abstraction layer (hypervisor) between the physical hardware and the OS.
A single physical machine can host multiple virtual machines, each running a different OS + application stack.
VMMs – Xen, VMware, Microsoft Hyper-V
Benefits
– Server consolidation
– Green computing
– Cost savings – space and power
– High Availability
Reliability solutions, ease of upgrades with near-zero downtime

4 Virtualized hosting for IP Telephony
Virtualized hosting for IP Telephony is already available – Avaya, Cisco, Asterisk etc.
IP Telephony in the cloud
– Scalability: ability to elastically add/remove servers while supporting High Availability for all servers
– Reliability: protection against hardware and software failures
HA features in virtualization platforms
Memory state checkpointing

5 Virtualization and High Availability
Seamless fail-over; efficient and transparent migration of a VM to another physical machine
– Live migration with very small downtimes
– Minimal or no impact on client nodes
Asynchronous checkpointing
– Continuously syncs the state between the primary and secondary host
We use
– Remus: a High Availability solution for Xen

6 Remus on Xen
Remus is a High Availability solution available on the Xen VMM.
Remus uses continuous checkpointing and keeps a consistent client view of network state.
The secondary machine hosts a paused replica of the primary VM.
Uses a heartbeat mechanism
– Failure to receive the periodic heartbeat on the secondary un-pauses the backup VM
– Heartbeat time-out can be configured
Fig. 1. Image: http://osnet.cs.nchu.edu.tw/powpoint/seminar/2008/Remus.pdf
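The heartbeat rule on this slide can be illustrated with a small model. This is only a sketch of the idea, not Remus code: the class `BackupMonitor`, its fields, and the time values are hypothetical names chosen for illustration, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class BackupMonitor:
    """Hypothetical model of the secondary host's heartbeat watchdog."""
    timeout_s: float              # configurable heartbeat time-out
    last_heartbeat_s: float = 0.0
    backup_paused: bool = True    # the replica VM starts in the paused state

    def on_heartbeat(self, now_s: float) -> None:
        # Record the arrival time of a heartbeat from the primary.
        # Heartbeats are ignored once the backup has taken over.
        if self.backup_paused:
            self.last_heartbeat_s = now_s

    def tick(self, now_s: float) -> bool:
        # Un-pause the backup if no heartbeat arrived within the time-out.
        # Returns True when the backup VM is running.
        if self.backup_paused and now_s - self.last_heartbeat_s > self.timeout_s:
            self.backup_paused = False
        return not self.backup_paused


monitor = BackupMonitor(timeout_s=1.0)
monitor.on_heartbeat(0.2)
still_paused = not monitor.tick(1.0)   # 0.8 s of silence: within the time-out
took_over = monitor.tick(1.5)          # 1.3 s of silence: backup resumes
```

A shorter time-out detects failures sooner but makes a false takeover more likely when heartbeats are merely delayed, which is why the value is left configurable.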

7 Remus on Xen (contd.)
Remus modes of operation
– Net Mode – highly reliable
– No-Net Mode – better performance, with negligible packet loss in case of failure
– Tunable for reliability vs. performance
Net Mode: buffers outgoing network packets until the execution state is synced with the backup VM (on the secondary host); reliability at the cost of performance.
Fig. 2. Disk writes and network writes. Image: http://osnet.cs.nchu.edu.tw/powpoint/seminar/2008/Remus.pdf
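The Net Mode behavior described above (hold outgoing packets until the backup has the matching state) can be sketched as a queue tagged by checkpoint epoch. This is an illustrative model under stated assumptions, not the Remus implementation: `NetModeBuffer`, its methods, and the packet strings are hypothetical.

```python
from collections import deque


class NetModeBuffer:
    """Hypothetical model of Net Mode output buffering on the primary host."""

    def __init__(self) -> None:
        self.epoch = 0            # current checkpoint epoch
        self.pending = deque()    # (epoch, packet) pairs not yet released
        self.released = []        # packets that clients are allowed to see

    def send(self, packet: str) -> None:
        # Outgoing packets are held rather than transmitted immediately.
        self.pending.append((self.epoch, packet))

    def checkpoint(self) -> int:
        # Close the current epoch; its state is now being copied to the backup.
        closed = self.epoch
        self.epoch += 1
        return closed

    def ack(self, epoch: int) -> None:
        # The backup confirmed the checkpoint: release packets up to that epoch.
        while self.pending and self.pending[0][0] <= epoch:
            self.released.append(self.pending.popleft()[1])


buf = NetModeBuffer()
buf.send("200 OK (REGISTER 1)")
first = buf.checkpoint()
buf.send("200 OK (REGISTER 2)")   # belongs to the next epoch, stays buffered
buf.ack(first)
```

In this model, No-Net Mode would correspond to appending directly to `released` in `send`, which removes the buffering delay but lets clients see output whose state the backup may not yet hold.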

8 Remus applied to IP Telephony - Scale with Reliability
Our work using HA in Xen extends the “architecture for fail-over and load sharing for IP Telephony” proposed by Kundan Singh et al.
Challenges:
– Overheads of virtualization on IP Telephony performance
– Co-hosted/co-located media server causes interference because of its heavy I/O workload

9 Reliability and Scalability using Virtual Machines
Scalability using a load balancer (LB)
– LB can elastically add more VMs as demand grows
Reliability using Remus in Xen
Stateless load balancer
Reliability architecture using virtual machines
For every primary virtual machine there is a backup VM in the paused state.
Since the backup VM is paused, other running VMs can be placed on the same physical machine.
Provides an N-to-M elastic/backup model (m backups for n primaries).
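One way to picture the stateless load balancer and the N-to-M backup model is the sketch below. The slides do not say how the load balancer selects a server or how backups are assigned to hosts, so the hash-based selection and round-robin placement here are assumptions, and all function, VM, and host names are hypothetical.

```python
import hashlib


def pick_server(user: str, servers: list[str]) -> str:
    # Stateless choice (assumed hashing scheme): the same user maps to the
    # same server while the server list is unchanged, with no per-user table.
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def place_backups(primaries: list[str], backup_hosts: list[str]) -> dict[str, str]:
    # N-to-M model: spread the paused replicas of n primary VMs over m hosts
    # in round-robin order (assumed policy). Because the replicas are paused,
    # one host can hold several of them.
    return {vm: backup_hosts[i % len(backup_hosts)] for i, vm in enumerate(primaries)}


primaries = ["sip-vm-1", "sip-vm-2", "sip-vm-3"]
placement = place_backups(primaries, ["host-a", "host-b"])
chosen = pick_server("sip:alice@example.com", primaries)
```

Note that plain modulo hashing remaps many users whenever the server list grows or shrinks; a deployment that adds VMs elastically would likely need a scheme that limits this remapping.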

10 Reliability and Scalability using Virtual Machines (contd.)
Reliability
– Provided by Xen + Remus
– Failure of the primary starts execution of the secondary, with IP address takeover
– Clients continue to execute unaffected
Signaling and media server:
– Co-located on the same VM – allows better utilization, no overhead of inter-VM communication
– Placed on different VMs – elastic scaling of media and signaling VMs

11 Studying Performance Implications
Experimental setup
– Primary/backup servers – Intel Core 2 Quad processors, 2.5 GHz, 8 GB RAM, 4 MB L2 cache
– Hypervisor – Xen 3.2.1 + Remus
– Default Credit Scheduler configuration
– Guest OS: paravirtualized Linux 2.6.18
IP Telephony workload
– Modeled our workload using SIPStone
Measured % success of registrations during failover
Used UDP and TCP as transport for registrations
– Used OpenSIPS as SIP server
– RTPProxy as media server
– SIPp for generating signaling and media traffic

12 Analysis and Results: Signaling
Guest VM and Domain 0 both have high CPU utilization with tcp_n (new TCP connection for each REGISTER).
UDP and tcp_1 (one TCP connection for all REGISTERs) have similar overhead.
CPU utilization (in guest VM, Dom0): udp means UDP transport, tcp_1 means the same connection for all calls, tcp_n means a new connection for each call.
Registration overhead with Remus Net Mode.

13 Analysis and Results: Signaling
CPU overhead increases proportionately with signaling load.
Dom0 has significant overhead due to checkpointing.
Net Mode gives good results for signaling.
A failure was induced at 1400 registrations/sec; 100% of registrations completed after failover to the backup.

14 Analysis and Results: Media
Media loads with Net Mode give poor results.
Media with No-Net Mode gives good performance even with 400 streams, with 2% losses
– This can be further reduced by tweaking scheduler parameters
100% fail-over of all calls in progress during media experiments
No-Net Mode: 100, 200, 400, 600 and 800 streams
Net Mode: 100, 200, 400, 600 and 800 streams

15 Conclusion
Using No-Net Mode for media streams gives a balance between performance (loss and delay) and reliability (failover), while still migrating 100% of all calls in progress (using TCP), which is a significant result.
Net Mode for signaling is a good configuration, with 100% registration completion with failover.
No-Net Mode for the media server deployment provides a significant improvement in performance: loss and delay are reduced significantly
– While the No-Net configuration performs better for media, it may not provide call completion guarantees during the fail-over operation for signaling
Migration of user registration and call setup operations was 100% successful.

16 Contributions
Extended load sharing and failover architecture using virtualization
Proposed use of the high availability feature in virtualized platforms to achieve reliability in IP Telephony
Proposed placement scheme of signaling and media applications for scale (elasticity) and efficiency (utilization)
Systematic evaluation of overheads involved in the use of virtualization for IP Telephony applications
Demonstrated that High Availability using virtual machines can be deployed for medium-scale IP Telephony infrastructure

17 Future Work
More detailed analysis of overheads
– Overhead because of checkpointing in the virtualization platform
– Overhead because of I/O in Domain 0
Propose solutions to improve performance
– Improve I/O handling in the Xen VMM
Propose a better VM placement algorithm for IP Telephony applications
– Utilizing fine-grained overhead measurements for resource allocation
– Considering I/O (media) vs. memory (signaling state replication) optimizations
– Elasticity with co-location of media and signaling server on the same VM

18 Questions vs2140@columbia.edu

