Presentation on theme: "Optimizing XenServer Deployments to Best Support XenDesktop"— Presentation transcript:
1 Optimizing XenServer Deployments to Best Support XenDesktop
Daniel Lazar, Senior Escalation Engineer
May 8, 2012

Hello and welcome to the “Optimizing XenServer Deployments to Best Support XenDesktop” session. My name is Dan Lazar, and I am a Senior Escalation Engineer with the North America XenServer Escalation team. I have been working with Citrix Technical Support for over 4 years. I started as a PVS escalation engineer, and for about the last 2.5 years I have been supporting XenServer and its related products.

This session is intended to provide you with a deeper understanding of how XenServer and XenDesktop work, and sometimes don’t work, together. In particular, we will talk about some of the common issues that can crop up in large virtual desktop environments running on XenServer and how to monitor for these issues. We will also discuss how to troubleshoot and resolve many of these issues. My goal today is for you to leave here with a high degree of confidence in administering large virtual desktop solutions leveraging XenDesktop on XenServer.

Please hold your questions for the end, or feel free to meet with me after the presentation.
2 Tweet about this session with hashtag #SUM302 and #CitrixSummit
3 Agenda
XenServer and XenDesktop Interoperability Overview
Monitoring XenServer to Identify Problems
Best Practices and Troubleshooting
Resources
Questions

Our agenda today will first include an overview of how XenServer and XenDesktop run and affect each other in large deployments. We will then move on to discuss how to monitor XenServer to identify issues in large-scale XenDesktop environments. Then we will discuss best practices for configuring XenServer and XenDesktop to work together, and also how to troubleshoot some common issues. At the end of the presentation I have included slides with links to articles and resources mentioned throughout the presentation. And finally, after the presentation, we will have a few minutes to answer your questions.
4 XenServer and XenDesktop Interoperability Overview Let’s first review how these two powerful products function together in the same environment.
5 Interoperability Overview
[Diagram labels: Resource Pool (Pool Master, Slaves); Shared Storage; Desktop Controller; Hosting Management; Hypervisor Communication Library (HCL); VDA; Database (SQL Server); Active Directory; Connection to XAPI on pool master via HTTP port 80; Virtual Desktops running Receiver; Windows Communication Foundation (WCF)]

First let’s take a couple of minutes to review how XenServer and XenDesktop communicate with each other, and how these interactions might present certain bottlenecks that could compromise performance and stability in larger deployments.

A XenServer resource pool is a group of machines running XenServer on similar hardware that are bound together into a single managed entity which can host virtual machines. One machine within the resource pool always acts as the pool master, while the rest of the machines are slaves. Though any machine can take the master role, only one host at a time is allowed to assume it. The XenServer host designated as the pool master is responsible for receiving, queuing and processing tasks issued to the XenAPI from the outside world. The XenAPI, or XAPI, is the service stack that runs on the XenServer hosts and is responsible for handling management functions for the hypervisor.

A resource pool can be combined with shared storage to facilitate live migration, or XenMotion, of virtual machines between hosts with minimal downtime.

On the XenDesktop side we have a suite of services that handle communications with the hypervisor, in this case XenServer, and also with the virtual desktops themselves.

Hosting management is responsible for queuing operations to be sent to the hypervisor, such as starting virtual desktops. This communication is facilitated by the hypervisor communication library, or HCL, which understands which protocols and communication methods to initiate when logging into and requesting services from the XenServer hosts. XenDesktop communicates directly with the XenAPI on the pool master and thus can queue tasks in the pool automatically.

When virtual desktops are booted and running on the XenServer pool, the virtual machines launch Receiver, which is an agent that communicates directly back to the VDA management services on the desktop controller and facilitates registration of the virtual desktops, notifying the controller that they are ready and waiting for user connections.

In large-scale XenDesktop deployments some of these components can present bottlenecks which can degrade performance and cause instability.

If enough virtual desktops are started up on a single host we might start to see performance issues on that host and its virtual machines. This can happen gradually over time as additional virtual desktops are booted to accommodate user demand for desktops, or over a short period of time if many virtual desktops are booted or rebooted all at once. We’ll talk more about this in a minute.

With technologies like PVS and IntelliCache, the shared storage could pose an issue if many storage operations are occurring at once. We will discuss this in more detail later in the presentation.

And lastly, the number and frequency of tasks being sent to the pool master from the desktop controller can also create performance issues in certain circumstances.
6 Interoperability Overview
How do large-scale XenDesktop implementations impact XenServer?
1. Large number of concurrently running VMs per host
2. Boot/reboot storms
3. PVS/IntelliCache can add storage management overhead
4. The XAPI task queue

So, let’s quickly review what was just discussed in the last slide. How does a large-scale XenDesktop implementation impact XenServer?

1. A large number of VMs running concurrently on each host. This can cause performance degradation and stability issues, and lead to problems with the Desktop Delivery Controller managing the desktop groups.

2. Boot and reboot storms. A common business requirement in environments leveraging XenDesktop is to “reset” or “recycle” desktop sessions, for example between shift changes at a hospital or between classes at a university. This would normally involve shutting down or rebooting the virtual desktops to revert them to their original state using Provisioning Services or MCS. If too many desktops are instructed by the XenDesktop Controller to reboot at once it can cause a surge in load on the XenServer pool as the hosts are busy processing all of the shutdown or restart requests. This can result in a cascading failure condition where the desktops take too long to restart and register with the Desktop Controller, so it attempts to start additional VMs to meet the idle pool count requirements for the desktop group.

3. Storage management overhead from PVS and IntelliCache. In XenDesktop environments leveraging either PVS or IntelliCache it is important to understand how these features make use of XenServer shared storage to avoid possible problems. Typically virtual desktops will create and attach to temporary write cache disks on the fly, and depending on how the storage is configured in the XenServer pool this could produce enough load to compromise the stability of the hosts. I will come back to this a bit more when we discuss how to monitor the XenServer pool for specific issues.

4. The XAPI task queue. If not optimized correctly it is possible for the XenDesktop Controller to push too many tasks into the queue, or push them too frequently. The pool master may then be unable to process the queue efficiently, which could lead to performance issues for both the XenServer hosts and the virtual desktops.
7 Monitoring XenServer to Identify Problems When hosting an enterprise XenDesktop solution on XenServer, it is important to understand how to monitor the XenServer environment to identify issues before or as they are happening. Let’s talk about some things we might want to keep an eye on in such environments.
8 Monitoring XenServer to Identify Problems
Online Monitoring
Load Average
# top
XAPI Task Queue
# xe task-list
General storage and network monitoring
# iostat, hdparm, dd
# tcpdump, netstat, ifconfig

Typically we monitor XenServer hosts in two ways: either online, while logged into the hosts in an active session, or offline, reviewing log and configuration data that was extracted from the machines sometime in the past. First we will focus on some commands and utilities that we can use while actively logged into the machines to give us a picture of their health. Typically, when monitoring XenServer hosts in an online fashion, the command-line environment is the most effective way to interact with the hosts.

XenCenter can be used to log into the XenServer hosts via the command line, though terminal emulators, such as PuTTY, can also be used to connect in this manner.

In the case of XenDesktop, the thing we will normally concentrate on monitoring is the load that the pool master is experiencing. There are a few load indicators we will want to check to give us a picture of overall load:
Load averages using ‘top’
The task queue using the XenAPI
Overall storage and network health using a number of commands that we will review in a minute

For a complete guide to the XenServer command-line environment, refer to Appendix A of the XenServer Administrator’s Guide, which I have included a link to in the resources section at the end of the presentation.
9 Monitoring XenServer to Identify Problems
Online Monitoring: Load Average
Use ‘top’ to get real-time information relating to load

top - 13:35:11 up 2 days, 19:02, 4 users, load average: 36.27, 23.64, 14.73
Tasks: 435 total, 27 running, 408 sleeping, 0 stopped, 0 zombie
Cpu0 : 13.6%us, 60.5%sy, 0.0%ni, 4.7%id, 18.4%wa, 0.0%hi, 0.0%si, 2.5%st
Cpu1 : 14.2%us, 60.4%sy, 0.0%ni, 2.8%id, 19.9%wa, 0.0%hi, 0.0%si, 2.4%st
Cpu2 : 13.0%us, 60.7%sy, 0.0%ni, 4.6%id, 18.9%wa, 0.0%hi, 0.0%si, 2.5%st
Cpu3 : 13.3%us, 60.4%sy, 0.0%ni, 6.1%id, 17.5%wa, 0.0%hi, 0.0%si, 2.5%st
Mem: k total, k used, k free, k buffers
Swap: k total, k used, k free, k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8722 root m 46m 4320 S :09.34 xapi
R :12.56 qemu-dm
R :48.05 qemu-dm
R :38.39 qemu-dm
R :12.09 qemu-dm
R :21.79 qemu-dm

[Callouts on the load average: 1-Minute Avg, 5-Minute Avg, 15-Minute Avg]

If you have worked with Linux before then the ‘top’ utility is probably familiar to you. It can give a comprehensive snapshot of various utilization statistics, such as memory and CPU consumption, but it also provides other useful information such as the load average for the machine.

When monitoring a XenServer host that is running XenDesktop workloads the load average is an important statistic to keep an eye on, since it indicates the trend in demand for CPU resources on that machine, as opposed to the CPU percentage utilization, which only gives an instantaneous reading of what CPU resources are being consumed at a specific moment. The load average also factors in processes that are queued and waiting for CPU cycles, so together with actual CPU utilization, and trended over time, it gives a fairly complete picture of the overall demand for CPU resources on the host. In the case of XenServer hosts running many workloads and dealing with a high rate of management functions, the load average can show when the host is maxed out and is causing processes to “queue up” waiting for processing cycles.

With XenServer it is important to understand that when ‘top’ is issued on the CLI it is being run inside the control domain, or Domain 0, on the host. The control domain is itself a virtual machine with privileged access to the host hardware. It is tasked with brokering resources to the rest of the VMs running on the machine, handling all of the host management functions controlled by the XenAPI, or xapi, and managing the backend storage and network connections.

In the current release of XenServer, the control domain is aware of and makes use of 4 CPUs. Even if the host itself has more than 4 CPUs installed, the control domain will only utilize CPUs 0 to 3; the rest are left solely for the virtual machines running on that host. This means that when ‘top’ is issued inside the control domain, the utilization statistics will only reference 4 CPUs (or fewer if the host has fewer than 4 CPUs).

Note the load average indicated in the example output. There are three numbers separated by commas. So how do we interpret these? They represent the 1-, 5- and 15-minute load averages and so indicate the load average trend for the previous 15 minutes of machine uptime. The point of perfect utilization, meaning that the CPUs are always busy and yet no process ever waits for one, is a load average that matches the number of CPUs available to the system. So in a Dom0 that has 4 CPUs available to it, if the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds. This understanding can be extrapolated to the 5- and 15-minute averages.

Note in this example that all three load averages are significantly higher than 4, with the 15-minute load the lowest and trending up to the present. This indicates that this host was experiencing heavy load when the top command was issued, and that this load had been gradually increasing for at least the last 15 minutes. With large XenDesktop pools this trend is common when large groups of virtual desktops are all booting or rebooting at the same time, perhaps during a shift change. A machine with a sustained high load average like this will become sluggish and unresponsive and, if the load gets high enough, could even become unstable and crash. If this happens to be the pool master, then the entire pool could be affected, with the XenDesktop Controller being unable to communicate with the hosts and therefore also unable to manage the virtual desktops. Tuning the XenDesktop Controller to limit the number of tasks issued to the pool master, and adjusting the idle pool counts, can mitigate this issue. We will talk more about that in a bit.
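The interpretation rule described above (a load average equal to the number of Dom0 CPUs means full utilization with nothing waiting) can be sketched as a small shell helper. This is only an illustration: the function name `load_status` and the "twice the CPU count" cut-off between "busy" and "overloaded" are assumptions made for the example, not thresholds from the presentation or from Citrix.

```shell
#!/bin/sh
# Classify a load average against the number of CPUs visible to Dom0.
# Usage: load_status <load_average> <cpu_count>
# Prints "ok" when load <= CPUs, "busy" when load <= 2 x CPUs, else "overloaded".
# The 2x boundary is an illustrative assumption, not an official threshold.
load_status() {
    awk -v avg="$1" -v ncpu="$2" 'BEGIN {
        if (avg <= ncpu)          print "ok"
        else if (avg <= 2 * ncpu) print "busy"
        else                      print "overloaded"
    }'
}

# On a live host the inputs could be read from /proc inside Dom0, for example:
#   load_status "$(cut -d" " -f1 /proc/loadavg)" "$(grep -c "^processor" /proc/cpuinfo)"
```

Applied to the one-minute value in the example output, `load_status 36.27 4` reports the host as overloaded, while `load_status 4.00 4` reports the point of perfect utilization as ok.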
10 Monitoring XenServer to Identify Problems
Online Monitoring: Task Queue
You can count the number of running tasks:
# xe task-list | grep 'VM.<type>' | wc -l
Example task types could be start, shutdown or migrate.
Try to tune the XenDesktop Controller to minimize the number of tasks concurrently being processed on the pool master.

A second important metric to watch in large XenDesktop environments is the xapi task queue. When the desktop delivery controller needs to boot or shut down VMs, it sends a request to the XenServer pool master, which gets queued in the xapi task queue. Only a certain number of these tasks can be processed at once, and if too many are pushed onto the queue it can cause increased load on the pool master, which can degrade pool responsiveness.

The command ‘xe task-list’ will list all of the tasks queued on the host. Restricting the output to specific task types and counting those tasks, as shown in the command example here, can give an indication of whether the host is keeping up with its queue of tasks, or whether it is falling behind and might get into trouble.

As with controlling the load average, the XenDesktop Controller can be tuned to limit the number and frequency of tasks sent to the pool master. Finding the right balance for the pool master, the size of the desktop groups, and the idle pool requirements for the environment is important to maintaining a stable and optimal virtual desktop solution.
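As a sketch of the counting approach on this slide, the pipeline can be wrapped in a function that takes the task type as an argument. The function name `count_vm_tasks` is hypothetical, and the match assumes that each task appears in the `xe task-list` output with a `name-label` line ending in `VM.<type>` (optionally prefixed, for example with `Async.`); check the output on your own hosts before relying on it.

```shell
#!/bin/sh
# Count queued xapi tasks of one VM operation type.
# Reads `xe task-list` output on stdin; $1 is the operation, e.g. start, shutdown, migrate.
# Assumes each task has a "name-label" line that ends in "VM.<type>".
count_vm_tasks() {
    # grep -c prints 0 (and exits non-zero) when nothing matches; "|| true" keeps the exit status clean.
    grep -c "name-label.*VM\.$1[[:space:]]*\$" || true
}

# Example on the pool master:
#   xe task-list | count_vm_tasks start
```

Running this every few seconds during a shift change shows whether the count is draining or growing, which is the "keeping up versus falling behind" signal described above.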
11 Monitoring XenServer to Identify Problems
Online Monitoring: Storage
# iostat # Reports basic I/O stats for devices and partitions
# hdparm # Performs timed sequential reads
# dd # Simple, common block device copy utility
See CTX for more information on how to monitor storage in XenServer.

Regarding storage, three useful CLI commands for monitoring storage performance in XenServer are iostat, hdparm and dd. These commands report on I/O throughput, giving an indication of overall storage performance. iostat reports average disk throughput over time, while hdparm and dd perform active benchmarking of I/O throughput via timed disk reads and writes.
12 Monitoring XenServer to Identify Problems
Online Monitoring: Network
# tcpdump # Dumps traffic on a network
# netstat # Display network interface statistics
# ifconfig # Display and configure network interfaces
See CTX for more information on monitoring the network in XenServer.

XenServer also has some CLI tools that can be used to monitor and troubleshoot network performance.

Tcpdump will dump all packets on a specified network interface. The output can be filtered and/or redirected to an output file and opened in a trace analysis tool, like Wireshark, for detailed analysis.

Netstat can be useful for verifying whether network interfaces are active and functional. It will tell you which ports are listening or have established connections, it can list the routing tables on the host, and it can help troubleshoot incoming and outgoing packet errors.

Ifconfig can be used to validate and potentially reconfigure individual network interfaces. It will tell you whether a NIC is connected or disconnected, what speed and duplex are established, basic transmit and receive statistics, and a host of other information relating to the network interfaces.

Refer to CTX for detailed information about how to troubleshoot network issues in XenServer.

TIP: You can always type ‘man’ followed by a Linux command name (e.g., ‘man netstat’) to get detailed help for the command.
13 Monitoring XenServer to Identify Problems
Offline Monitoring: System Logs
System Status Reports
XAPI Connection Limit Exceeded
Tools as a Service (TaaS)
System Activity Reporting (SAR): part of the CentOS base

If you don’t have direct access to the systems experiencing issues, you can still assess the load they have experienced recently by reviewing log data extracted from Dom0.

XenServer can generate a diagnostic data package called a system status report. In addition to the system logs, it contains a complete record of the host and pool configuration, including hardware information, storage and network configurations, and even a complete copy of the pool database, which describes all of the virtual machines and their states when the status report was generated. It can be parsed for specific error conditions, such as xapi exceeding its connection limit, which we will discuss in a moment.

The status report can also be uploaded to Citrix Tools as a Service, or TaaS, for automated analysis. TaaS is a great utility for visualizing the state of the pool when the report was generated and can be leveraged to determine load metrics for the hosts.

Another set of useful data is System Activity Reporting, or SAR. It is a component of XenServer which has been adopted directly from the CentOS Linux distribution and keeps track of specific load-related data over a historical period.

Let’s talk a bit more about both of these troubleshooting and monitoring resources.
14 Monitoring XenServer to Identify Problems
Status reports are available via XenCenter, or from the command line by running:
# xen-bugtool --yestoall
See CTX for detailed instructions.

The status report can be generated using either XenCenter or the command line. In either case a compressed archive is created which can be copied, extracted and analyzed on a separate system.

Refer to CTX for detailed instructions on how to collect log data from XenServer.
15 Monitoring XenServer to Identify Problems
Offline Monitoring: XAPI Connection Limit
XAPI and the control domain (Dom0) can only maintain 200 concurrent connections (per host).
The limit can be reached more easily in XenDesktop environments because the number of tasks being queued on the pool master will often be high.
You can parse /var/log/xensource.log* for "db_gc] Session.destroy" to give an indication of whether connection limits are being met. A simple bash script can do this quickly:

    #!/bin/bash
    # Find the xensource logs under the current directory and print the matching entries.
    find . -name 'xensource.lo*' | while read -r i; do
        grep -h "db_gc] Session.destroy" "$i"
    done

When monitoring for interoperability issues with XenServer and XenDesktop, one condition to look for is whether the XAPI connection limit has been exceeded. If this limit is met then connections to the pool master will time out and fail, and management operations will not be successful. If left unchecked this can eventually cause the pool master to become unresponsive, requiring a reboot to recover it. The limit is 200 concurrent connections.

This limit can be reached more easily in XenDesktop environments because the number of tasks being queued on the pool master will often be high.

The xensource log located under /var/log in the status report keeps track of connections made to XAPI, and if the limit has been hit it will begin to log errors when additional connections are attempted. Parsing these logs from a status report can identify this condition and allow an administrator to take corrective action.

A simple bash script can automate this monitoring, and can be run on a live system or used to parse offline logs too.
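A variation on the script from this slide counts the matching entries per log file instead of printing them, which makes it easier to see which log rotation contains a burst. This is a sketch under a few assumptions: the function name `count_session_destroys` is made up for the example, the search string is the one given on the slide, and compressed rotations (if any exist on the host) are not handled and would need `zgrep` instead.

```shell
#!/bin/sh
# Print "<count> <file>" for every xensource.log* file under a directory.
# $1 is the directory to search (defaults to the current directory),
# e.g. /var/log on a live host or the extracted status report.
count_session_destroys() {
    find "${1:-.}" -type f -name 'xensource.log*' | sort | while IFS= read -r f; do
        # -F treats the search string literally, so "]" and "." need no escaping.
        printf '%s %s\n' "$(grep -cF 'db_gc] Session.destroy' "$f")" "$f"
    done
}

# Example:
#   count_session_destroys /var/log
```

A count that climbs sharply from one rotation to the next is a prompt to look at how many tasks the XenDesktop Controller is sending to the pool master.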
16 At this time I would like to take us through a brief demo of how we can use Tools as a Service (TaaS) to help identify performance issues relating to load on a XenServer host.
17 Monitoring XenServer to Identify Problems
Offline Monitoring: SAR Logs
Located under /var/log/sa
They are NOT included in the host system status report and therefore need to be collected manually for analysis:
# tar -cvzf /tmp/$HOSTNAME-$(date +%F-%H-%M-%S)-SARlogs.tar.gz /var/log/sa/*
Can give a historical picture of average load on the host and indicate when and if there are periods of high load in the environment.

The SAR logs can give a unique picture of load in the environment that is difficult to attain via other monitoring methods.

The logs are located under /var/log/sa on each host.

They are not included in the host system status report, so they need to be collected manually. I have included a command that can be issued on each host in the pool which will create a compressed archive of the SAR logs under the /tmp directory. This archive can then be manually retrieved using a utility such as WinSCP.

Once collected, the logs give a historical picture of average load on the host and indicate when and if there are periods of high load in the environment.
18 Monitoring XenServer to Identify Problems
Offline Monitoring: SAR Logs

00:00: runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
13:10:
13:20:
13:30:
13:40:
13:50:
14:00:
14:10:
14:20:

Looking at the example SAR data in this slide you might see some familiar data points. SAR keeps a historical record of the very same load average that top reports, which we discussed earlier. As SAR logs have a longer retention than the 15 minutes that top gives us, they can provide a critical picture of the historical load average trends going back days, or possibly weeks. This can help to isolate usage patterns in the pool where load might exceed acceptable thresholds, such as morning and evening shift changes at a hospital, or hourly student changeovers in a classroom at a university. Understanding and anticipating these periods of increased load can help to tune and optimize the XenDesktop and XenServer environment to accommodate the users.

One last point about SAR logging. There are third-party, open source tools available that will accept SAR logs as input and produce graphical representations of historical load trends. XenServer does not include any of these tools out of the box, but a quick Internet search will return multiple options.

TIP: Third-party tools are available to graph and analyze SAR data files!
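Short of a graphing tool, the load-average columns shown on this slide can be filtered with awk to list only the intervals where the one-minute load exceeded a chosen limit. The sketch below assumes the text layout produced by `sar -q` (a header line naming the columns, one line per interval, and an optional `Average:` line); the function name `high_load_samples` is hypothetical, and the column is located from the header rather than by position because the layout can differ between sysstat versions.

```shell
#!/bin/sh
# List the sampling times at which the 1-minute load average exceeded a limit.
# Reads `sar -q` text output on stdin; $1 is the limit (for Dom0, typically its CPU count).
high_load_samples() {
    awk -v limit="$1" '
        # Header line: remember which column holds ldavg-1.
        /ldavg-1/ {
            for (i = 1; i <= NF; i++) if ($i == "ldavg-1") col = i
            next
        }
        # Data lines after the header: print time and load when over the limit.
        col && $1 != "Average:" && $col + 0 > limit + 0 { print $1, $col }
    '
}

# Example, reading one daily SAR file on the host (the file name pattern is saDD):
#   sar -q -f /var/log/sa/sa08 | high_load_samples 4
```

Run against several days of files, the output makes recurring peaks, such as shift changes, easy to spot by their timestamps.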
19 Monitoring XenServer to Identify Problems
VHD link dependencies with PVS and IntelliCache
Provisioning Services (PVS) and IntelliCache leverage disk fast-cloning to quickly provision many virtual desktops.
Fast-clones create new VDIs which are linked in parent-child relationships.
Large XenDesktop environments can create many of these links, and this can cause issues.

The last topic I will discuss in this section is a problem that can occur in larger XenDesktop environments relating to XenServer virtual disk image (or VDI) management. Two powerful features available to XenDesktop administrators are Provisioning Services and Machine Creation Services with IntelliCache. In the context of virtual desktop management the two solutions provide similar functionality, though they do it very differently. Details regarding how these features work are outside the scope of this presentation, but what is important and relevant is that they both utilize XenServer fast-cloning to quickly create local storage for use with the virtual desktops.

Fast-clones create new VDIs which are linked in parent-child relationships. Large XenDesktop environments can create many of these links, and this can cause issues.
20 [Diagram labels: XenServer Host; Virtual Machine; VBD; PBD; VDI; Storage Repository (SR); Parent VDI (Base Copy); Child VDI (fast-clone/diff-disk): VDA001-diff, VDA002-diff, VDA003-diff ... VDAxxx-diff]

To understand what can go wrong here, let’s review how these features leverage XenServer storage management technologies.

With XenServer shared storage there are Virtual Disk Images, or VDIs, which are objects containing virtual hard disks, or VHDs, that the virtual machines see as their local hard disks. In the case of LVM-based storage, these VDIs are stored as logical volumes and are managed by the Logical Volume Manager (LVM), a Linux facility for abstracting physical storage devices into flexible storage containers. With file-based ext or NFS shared storage the VHDs are files stored directly on the file system and referred to directly by XAPI when attaching them to the virtual machines.

When a fast-clone is initiated, new VDIs are created that are linked in a parent-child relationship. By default, IntelliCache and PVS (through the Machine Creation Wizard) create fast-clone copies of the parent disk, or template. The fast clones store only the delta, or differences, between the parent VDI and the child. As the VM writes data to the child node, the parent node remains unchanged.

The more of these links that exist, the more overhead is put on the storage management services of the host to manage the “delta” between the parent and child nodes.

Therefore, if many VMs are fast-cloned during initialization of a desktop group, the result is an equal number of child nodes attached to the parent disk.
21 Monitoring XenServer to Identify Problems
MCS/PVS: Why might this be a problem?
MCS profile creation and management can take a very long time, or fail completely.
Host storage operations can be affected.
Can cause poor performance, or even instability and XenServer host crashes!

So why might this be a problem?

MCS profile creation and management can take a very long time, or fail completely.

Host storage operations can be affected, which can impact the startup and shutdown processes for VMs, or make it difficult to manage VM storage.

It can cause poor performance, or even instability and XenServer host crashes.

So how can we identify if the environment is susceptible to this issue?
22 Monitoring XenServer to Identify Problems
Uploading a status report from the pool to TaaS can give a good graphical representation of the VDI link dependencies.
taas.citrix.com

TaaS can be a good tool for helping to identify this condition. Uploading a status report from the pool gives a good graphical representation of the VDI link dependencies. Simply point your browser to taas.citrix.com and begin exploring its many powerful troubleshooting capabilities.
23 Monitoring XenServer to Identify Problems
MCS/PVS: How to Monitor
For LVM-based storage:
# vhd-util scan -f -c -p -m 'VHD-*' -l VG_XenStorage-<SR UUID>
For NFS-based storage:
# vhd-util scan -f -c -p -m /var/run/sr-mount/<SR UUID>/*.vhd

# vhd-util scan -f -c -p -m -v 'VHD-*' -l VG_XenStorage-8021d7b1-0b4d-03ff-d ef6eaf01
vhd=VHD-759d484d-2bc9-44c2-8d40-c84a b hidden=1 parent=none
   vhd=VHD a5-bd42-4d89-a172-a31387ed1c7a hidden=0 parent=VHD-759d484d-2bc9-44c2-...
   vhd=VHD-1daf350c fb d hidden=0 parent=VHD-759d484d-2bc9-44c2-...
   vhd=VHD c7d-4b03-a dfb2e5f67 hidden=0 parent=VHD-759d484d-2bc9-44c2-...
   vhd=...

[Callouts: Parent VDI (base copy); Child VDIs (diff disks)]

In addition to TaaS, the ‘vhd-util’ command can be issued on the host CLI to output VDI parent-child dependencies.

The command is issued differently depending on the type of shared storage that is in use. LVM-based storage requires the ‘-l’ argument, passing the volume group name of the SR as a parameter, whereas for NFS file-based SRs you simply need to point to the location of the VHD files under the SR directory in /var/run/sr-mount.

Looking at the example output, note the indentation indicating the parent VDI and its dependent children.
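To turn the scan output into a quick measure of how many fast-clones hang off each base copy, the `parent=` fields shown in the example can be tallied with awk. This is a sketch only: the function name `count_vhd_children` is invented for the example, and it assumes each VHD is printed on one line with a `parent=<name>` field (or `parent=none` for a base copy), as in the output on this slide.

```shell
#!/bin/sh
# Tally child VHDs per parent from `vhd-util scan` output read on stdin.
# Prints "<children> <parent>" with the most heavily shared parent first.
count_vhd_children() {
    awk '{
        for (i = 1; i <= NF; i++)
            # "parent=" is 7 characters, so the parent name starts at position 8.
            if ($i ~ /^parent=/ && $i != "parent=none") n[substr($i, 8)]++
    }
    END { for (p in n) print n[p], p }' | sort -rn
}

# Example for LVM-based storage (the SR UUID placeholder is left as on the slide):
#   vhd-util scan -f -c -p -m 'VHD-*' -l VG_XenStorage-<SR UUID> | count_vhd_children
```

A parent with a very large child count is a candidate for the storage overhead discussed in the previous slides.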
24 Monitoring XenServer to Identify Problems
MCS/PVS: How to Monitor

top - 13:35:11 up 2 days, 19:02, 4 users, load average: 36.27, 23.64, 14.73
Tasks: 435 total, 27 running, 408 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.2%us, 67.2%sy, 0.0%ni, 3.1%id, 10.1%wa, 0.0%hi, 1.8%si, 2.7%st
Mem: k total, k used, k free, k buffers
Swap: k total, k used, k free, k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8722 root m 46m 4320 S :09.34 xapi
20438 root S :00.24 tapdisk2
20571 root S :02.19 tapdisk2
...

[Callout: Check ‘tapdisk2’ process utilization]

In an online fashion, ‘top’ can be used to check for this condition as well. If storage management is consuming a majority of resources on the host, the ‘tapdisk2’ process will have sustained high CPU consumption.
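For a non-interactive version of this check, the CPU share of all ‘tapdisk2’ processes can be summed from `ps` output. Treat this as a rough sketch: the function name `tapdisk_cpu_total` is hypothetical, and the `pcpu` value reported by `ps` is an average over each process’s lifetime rather than the instantaneous figure that ‘top’ shows, so it is better suited to spotting sustained consumption than short spikes.

```shell
#!/bin/sh
# Sum the %CPU of all tapdisk2 processes.
# Reads two-column "pcpu comm" lines on stdin, as produced by `ps -eo pcpu,comm`.
tapdisk_cpu_total() {
    awk '$2 == "tapdisk2" { total += $1; n++ }
         END { printf "%d processes, %.1f%% CPU\n", n, total }'
}

# Example on a host:
#   ps -eo pcpu,comm | tapdisk_cpu_total
```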
25 Monitoring XenServer to Identify Problems
MCS/PVS: How to Monitor

00:00: CPU %user %nice %system %iowait %steal %idle
00:10: all
00:10:
00:10:
...
19:10: all
19:10:
19:10:
19:10:
19:10:

In an offline manner the SAR logs can provide key data points to indicate there is a VHD chaining issue as well.

Abnormally high iowait times when provisioning and starting new VMs are a key indicator of this issue, as highlighted in the example SAR logging on this slide.
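The iowait check described here can be sketched as a filter over the CPU report, printing only the all-CPU intervals whose %iowait is above a chosen limit. The function name `high_iowait_samples` is made up for the example, no particular limit is implied by the presentation, and the columns are located from the header line because their order can vary between sysstat versions.

```shell
#!/bin/sh
# List the sampling times at which %iowait for all CPUs combined exceeded a limit.
# Reads `sar -u` text output on stdin; $1 is the limit in percent.
high_iowait_samples() {
    awk -v limit="$1" '
        # Header line: remember the positions of the CPU and %iowait columns.
        /%iowait/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "%iowait") col = i
                if ($i == "CPU")     cpu = i
            }
            next
        }
        # Data lines: keep only the "all" rows that are over the limit.
        col && $cpu == "all" && $1 != "Average:" && $col + 0 > limit + 0 { print $1, $col }
    '
}

# Example, reading one daily SAR file on the host (the file name pattern is saDD):
#   sar -u -f /var/log/sa/sa08 | high_iowait_samples 20
```

Comparing the reported times with when desktops were provisioned or started helps confirm whether the iowait peaks line up with fast-clone activity.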
26 Monitoring XenServer to Identify Problems
MCS/PVS: How to Monitor
For extended monitoring, performance capture scripts can also be deployed on the XenServer hosts to collect process and memory utilization statistics over time.
For more information see CTX and CTX
27 Best Practices and Troubleshooting Let’s spend some time talking about what we can do to address some of the issues we discussed in the last section.
28 Best Practices and Troubleshooting
Tweaking XenServer
1. Increase Dom0 memory allocation. See CTX for instructions.
2. Limit the number of hosts in the pool, or create multiple pools.
3. Optimize shared storage for PVS or IntelliCache. See CTX for more information.
4. Ensure XenServer and XenDesktop are at compatible versions and have all critical hotfixes and updates applied.

There are certain configuration changes and design decisions that can greatly improve performance and reliability for XenServer pools running XenDesktop. Here are a few tweaks that can make a significant difference:

1. Increase the Dom0 memory allocation. By increasing the amount of memory available to Dom0 it is possible to run more virtual machines on a single host than with the default value configured out of the box. With Dom0 memory increased, up to 130 desktop VMs on a single host are supported.

2. Limit the number of hosts in the pool, or create multiple pools. Officially, 16 hosts in a single resource pool are supported. However, with very large XenDesktop environments where thousands of desktops are being managed, it may be necessary to split a large pool like this into two or more smaller pools. The reason for this simply comes down to the load placed on the pool master. In a large 16-host pool the pool master bears an increased load, as it is managing pool operations for 15 other machines in addition to itself. Splitting the pool into several pools means that more machines with a master role participate in the management of the hosts, and each master is under less load because it manages a smaller pool. While this can add some complexity to resource pool management, in very large environments it might be critical to maintaining stable host operations.

3. Optimize shared storage for PVS or IntelliCache. Though all types of shared storage are supported with PVS and IntelliCache, Citrix recommends using NFS storage, as it supports true thin-provisioning of VDIs, which minimizes storage overhead and consumption when handling the large VDI chains that result from using these technologies. Refer to CTX for details on best practices for configuring storage with XenDesktop and XenServer.

4. Ensure XenServer and XenDesktop are at compatible versions and have all critical hotfixes and updates applied.
29 Best Practices and Troubleshooting: Tweaking XenDesktop
Max active actions
Max new actions per minute
Max power actions as percentage of desktops
Just as there are settings on the XenServer hosts that help with scaling up large XenDesktop environments, the same is true on the XenDesktop side. In Desktop Studio there are some optimizations that control how it sends requests to the XenServer pool for managing the desktop VMs. [reveal]
Read 1. This controls the maximum number of concurrent tasks the XenDesktop Controller can have pending with the XenServer pool at any given time. The default is 100; this should be reduced for larger XenServer pools.
Read 2. This controls the number of tasks the XenDesktop Controller can send to the XenServer pool master within a 60-second period. The default is 10, which is OK for most environments, and it can possibly be increased if load on the pool master is nominal but desktops are still taking longer than desired to boot, reboot or shut down.
Read 3. This setting dictates the number of power-state-change requests the XenDesktop Controller can issue to the XenServer pool as a percentage of the total desktops being managed. The default is 20, which might be a bit high for larger pools and can be reduced.
The correct values for these settings will differ from one XenDesktop environment to another, and different values should be tried to find the settings that work best.
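To get a feel for what the defaults above mean at scale, here is a small arithmetic sketch. The function names are hypothetical, the 2000-desktop figure is an invented example, and rounding the percentage down is an assumption. How the Controller actually combines the three limits is not described in the talk, so this is plain arithmetic on the stated defaults, not a model of the Controller.

```shell
# Power actions allowed by the percentage setting (rounded down; an assumption).
power_action_cap() {  # usage: power_action_cap DESKTOPS PERCENT
    echo $(( $1 * $2 / 100 ))
}

# Minutes needed just to issue ACTIONS requests at PER_MINUTE new actions
# per minute (rounded up).
minutes_to_issue() {  # usage: minutes_to_issue ACTIONS PER_MINUTE
    echo $(( ($1 + $2 - 1) / $2 ))
}

power_action_cap 2000 20    # prints 400
minutes_to_issue 400 10     # prints 40
```

In this example the 20 percent default works out to 400 power actions, four times the default of 100 for max active actions, which may be one way to read the remark that 20 is a bit high for larger pools.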
30 Best Practices and Troubleshooting: Tweaking XenDesktop (continued)
Power management and idle pool configuration.
Also in Desktop Studio, within the Desktop Group properties, is the ability to tune the power management settings for the virtual desktops. Controlling the number of virtual desktops that are powered up, idle and waiting for user connections is an important feature of XenDesktop: it lets administrators keep a pool of desktops available at all times for users who might be connecting. It is also important for an administrator to understand how these settings affect the XenServer hosts with regard to load and the number of VMs running per host during peak and off-peak hours. During periods of high utilization, if the XenServer hosts are unable to meet the power management demands of the desktop group, it may be necessary to reduce the number of idle VMs running in the pool.
31 Best Practices and Troubleshooting: Decoupling VDIs
When the VHD chains get too long, the VDIs need to be decoupled, i.e., fully copied to remove the parent-child links.
Can be tedious and time-consuming.
Contact Citrix Technical Support for expert assistance!
So I have talked about the VHD chaining issue a few times, but what can we actually do to address this problem? [reveal]
Read point 1.
Read point 2.
Read the last point, if you are experiencing this issue and need a hand.
32 Best Practices and Troubleshooting: General Storage and Network Troubleshooting
When using NFS, monitor for connection drops with the storage:
# grep 'kernel: nfs: server [0-9.]* not responding, timed out' /var/log/messages
Try to isolate storage, VM and host management network traffic.
Ensure all the virtual desktops can communicate directly with the XenDesktop Controller.
Here are a few general troubleshooting tips with regard to storage and networking with XenServer and XenDesktop. [reveal]
Read point 1. You can use grep to search the system logs on the hosts for errors relating to NFS connection drops.
Read point 2. In large XenServer pools running XenDesktop it is important to isolate communication by type as much as possible. Generally there are three network communication paths that can contend with each other if they share the same network interfaces and network segments: XenServer management communication, XenServer shared storage communication (in the case of iSCSI or NFS), and VM network communication. Isolating these three types of traffic from each other reduces contention for any one of these services and provides better performance, easier management and better reliability of overall pool functions.
Read point 3. If the virtual desktops are unable to communicate with the XenDesktop Controller, they will be unable to register with the desktop pool and will not be accessible to inbound users trying to make virtual desktop connections. Monitoring the VM-to-controller communication is important for ensuring optimal virtual desktop and XenServer host operations.
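The grep check from point 1 can be extended to show which storage server is timing out and how often. The sketch below reuses the exact pattern from the slide; the function name `nfs_timeouts_by_server` and the optional log-file argument are illustrative additions, and it assumes the log lines carry the server address in the position the slide's pattern implies.

```shell
# Count NFS "not responding, timed out" messages per storage server.
# Prints one "SERVER COUNT" line per server, most frequent first.
nfs_timeouts_by_server() {  # usage: nfs_timeouts_by_server [LOGFILE]
    grep 'kernel: nfs: server [0-9.]* not responding, timed out' "${1:-/var/log/messages}" |
        sed 's/.*nfs: server \([0-9.]*\) not responding.*/\1/' |
        sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}
```

Running `nfs_timeouts_by_server` with no argument reads /var/log/messages, as on the slide; passing a rotated log file as the first argument checks older entries. No output means no matching timeouts were logged in that file.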
33 Best Practices and Troubleshooting: Other Common Issues
Virtual desktops disconnect or hang when connecting
Check the virtual desktop to see if 3rd-party apps are interfering with logins.
Confirm the virtual desktop is not having issues communicating with the XenDesktop Controller.
Ensure there are no GPOs or other Active Directory policies enabled on the virtual desktops that would interfere with logon/logoff behavior.
In the next couple of slides I want to touch on some other common issues that might arise in XenDesktop environments running on XenServer, and how we might address those issues.
Read problem.
Read solutions.
34 Best Practices and Troubleshooting: Other Common Issues
VMs fail to register in Desktop Studio
Make sure the VMs are booting up OK and the XenServer hosts are not under heavy load preventing the VMs from operating normally.
Also might be due to communication issues between the virtual desktops and the XenDesktop Controller; ensure there is network connectivity between them.
Make sure DNS is configured correctly in the environment.
Read problem.
Read solutions.
35 Best Practices and Troubleshooting: Other Common Issues
XenServer pool master crashes or becomes unresponsive
Ensure there are not too many VMs running on the master. Offloading VMs to the slaves and/or placing desktop groups into maintenance mode can mitigate this in the short term.
Monitor the load average in the pool and confirm that the master is not overburdened with specific tasks, such as storage management or XAPI task management.
Check for long VHD chains.
In large pools this could indicate the need to split the pool to decrease load on the pool master.
Read problem.
Read solutions.
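One of the checks above, monitoring the load average on the master, can be sketched as a small shell function run in Dom0. It reads the standard Linux /proc/loadavg file, whose first field is the 1-minute load average. The function name `load_status` and the idea of a fixed threshold are illustrative assumptions; the talk does not give a specific load value to alert on, so any threshold has to come from observing the environment.

```shell
# Print "HIGH <load>" if the 1-minute load average exceeds THRESHOLD,
# otherwise "ok <load>". Reads a /proc/loadavg-style file.
load_status() {  # usage: load_status THRESHOLD [LOADAVG_FILE]
    awk -v limit="$1" '{
        if ($1 + 0 > limit + 0) print "HIGH " $1
        else                    print "ok " $1
    }' "${2:-/proc/loadavg}"
}
```

For example, `load_status 8` on the pool master would flag a 1-minute load above 8; the value 8 is a placeholder, not a recommendation. The second argument exists only so the function can be pointed at a saved copy of the file.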
36 Best Practices and Troubleshooting: Other Common Issues
Takes a long time for all the virtual desktops to boot and register with the Desktop Controller
Check the Advanced Host Configuration in Desktop Studio and compare it to the XAPI task queue and master load average to determine if the XenDesktop Controller is sending too many tasks at once.
Check the idle pool settings to determine if the pool is trying to maintain too high an idle pool count, and if so reduce the idle pool count to bring the pool into balance.
Issues with registration that relate to load in large pools might also indicate the need to split the pool into multiple smaller pools.
Read problem.
Read solutions.
37 Resources: Monitoring and Troubleshooting
CTX XenServer performance: reality and myths
CTX Memco.sh - Memory Data Collection Script for XenServer Dom0 or Linux Systems
CTX Perfco.sh - Performance Data Collection Script for XenServer Dom0
CTX Troubleshooting XenServer Deployments
CTX Troubleshooting XenDesktop, Provisioning Services & XenServer Integration
38 Resources: Storage and Networking
CTX XenServer Storage Management and Troubleshooting
CTX Introduction to Storage Technologies
CTX Overview of XenServer Distributed Virtual Switch/Controller and Troubleshooting Network Issues
CTX Introduction to XenServer Networking
CTX XenDesktop Planning Guide - Storage Best Practices
39 Resources: Configuration and Reference
CTX XenServer 6.0 Administrator's Guide
CTX XenDesktop Planning Guide – XenServer Integration
CTX How to Collect Diagnostic Information for Citrix XenServer
CTX How to Configure Dom0 Memory in XenServer 5.6 or later
40 Tools as a Service: Find out how to rev up environment maintenance
See your Citrix pit crew in the expo hall with the checkered racing shoes.
The TaaS team can be found at one of the Citrix booths in the Expo hall. Please go see them to take a closer look at how TaaS can help your environment.
41 Questions
At this time I would like to open the forum to your questions, and after that I would like to leave you with a final thought before we part ways.
42 We value your feedback!
Take a survey of this session now in the mobile app:
Click 'Sessions' button
Click on today's tab
Find this session
Click 'Surveys'
So, one final thought as you leave here today. Bear in mind that while XenDesktop and XenServer together provide the most powerful, flexible and cost-effective virtual desktop solution, they are still complex products that can experience issues working together, especially in large-scale enterprise deployments. Remember that tuning and optimizing the products to work well together is the key to maintaining stable operations, and when things go wrong there are many ways to monitor and determine what is going on right out of the box. And, at the end of the day, Citrix Technical Support is here to help, so please give us a call and we will be happy to lend a hand.
Thank you and enjoy the rest of your time here at Summit and Synergy!
43 Before you leave…
Conference surveys are available online at starting Thursday, May 10.
Provide your feedback and pick up a complimentary gift at the registration desk.
Download presentations starting Monday, May 21, from your My Organizer tool located in your My Account.