2 Purpose of This Document
There is a lot of talk about private cloud. But what does it look like at the technical level?
- How do we really assure SLA and offer 3 tiers of service?
- If I’m a small company with just 50 servers, what does my architecture look like? If I have 2000 VMs, what does it look like?
For existing VMware customers, I go around and do a lot of “health checks” at customer sites. The No 1 question is around design best practice.
- So this doc serves as a quick reference for me. I can pull a slide from here for discussion.
Folks, some disclaimers:
- I am an employee of VMware, but this is my personal opinion.
  - Please don’t take it as an official and formal VMware recommendation. I’m not authorised to give one.
  - Also, generally we should judge the content rather than the organisation or person behind it. A technical fact is a technical fact, regardless of whether an intern or a 50-year IT engineer said it.
- Technology changes.
  - 10 Gb Ethernet, Flash, SSD, FCoE, Converged Infrastructure, SDN, NSX, storage virtualisation, etc. will impact the design.
  - A lot of new innovation is coming within the next 2 years, and some has already been featured at VMworld.
  - New modules/products from VMware’s Ecosystem Partners will also impact the design.
- This is a guide.
  - Not a Reference Architecture, let alone a Detailed Blueprint. Don’t print it and follow it to the dot. It is for you to think about and tailor.
- It is written for hands-on vSphere Admins who have attended Design Workshop & ICM.
  - A lot of the design considerations are covered in vSphere Design Workshop.
  - It complements vCAT 3.0.
  - You should be at least a VCP 5, preferably VCAP-DCD 5.
  - No explanation of features. Sorry, it’s already >100 slides.
With that, let’s have a professional* discussion.
* Not an emotional, religious or political discussion. Let’s not get angry over technical stuff. Not worth your health.
Use it like a book, not a slide deck.
3 Table of Contents
- Introduction
- Architecting in vSphere
- Application with special consideration
- Requirements & Assumptions
- Design Summary
- vSphere Design: Datacenter
  - Datacenter, Cluster (DRS, HA, DPM, Resource Pool)
- vSphere Design: Server
  - ESXi, physical host
- vSphere Design: Network
- vSphere Design: Storage
- vSphere Design: Security & Compliance
  - vCenter roles/permission, config management
- vSphere Design: VM
- vSphere Design: Management
  - See this deck: http://communities.vmware.com/docs/DOC-17841
- Disaster Recovery
  - See this deck: http://communities.vmware.com/docs/DOC-19992
- Additional Info: email me for the Appendix slides
Some slides have speaker notes for details.
Refer to Module 1 and Module 2 of vSphere Design Workshop. I’m going straight into more technical material here.
Topics only cover items that have major design impact. Non-design items are not covered.
This focuses on vCloud Suite 5.1 (infrastructure). Application-specific material (e.g. Database) is not covered.
5 vCloud Suite Architecture: what do I consider
Architecting a vSphere-based DC is very different from architecting a physical Data Center.
- It breaks best practice, as virtualisation is a disruptive technology and it changes the paradigm. Do not apply physical-world paradigms to the virtual world. Many “best practices” in the physical world are caused by physical-world limitations. Once the limitation is removed, the best practice is no longer valid.
- Adopt emerging technology, as virtualisation is still innovating rapidly. Best practice means proven practice, and that might mean outdated practice.
- Consider unrequested requirements, as the business expects cloud to be agile. You have experienced VM sprawl, right?
My personal principle: do not design something you cannot troubleshoot.
- A good IT Architect does not set up potential risk for the Support Person down the line.
- I tend to keep things simple and modular. Cost will go up a bit, but it is worth the benefits.
What I consider in a vSphere-based architecture
- No 1: Upgradability
  - This is unique to the virtual world, and a key component of cloud that people have not talked about much. After all, my apps run on virtual infrastructure, so how do I upgrade the virtualisation layer itself? How do you upgrade SRM?
  - Based on historical data, VMware releases a major upgrade every 2 years.
  - Your architecture will likely span 3 years, so check with your VMware rep for an NDA roadmap presentation.
- No 2: Debug-ability
  - Troubleshooting in a virtual environment is harder than in a physical one, as boundaries are blurred and physical resources are shared.
  - 3 types of troubleshooting:
    - Configuration. This does not normally happen in production, as once something is configured it is not normally changed.
    - Stability. Stability problems mean something hangs, crashes (BSOD, PSOD, etc.) or gets corrupted.
    - Performance. This is the hardest of the 3, especially if the slow performance is short-lived and the system performs well most of the time. This is why the design has extra server and storage capacity, so we can isolate some VMs while doing joint troubleshooting with the App team.
- Supportability
  - This is related to, but not the same as, Debug-ability. Support relates to things that make day-to-day support easier: monitoring counters, reading logs, setting up alerts, etc. For example, centralising the logs via syslog and providing intelligent search improves Supportability.
  - A good design makes it harder for the Support team to make human errors. Virtualisation makes tasks easy, sometimes way too easy relative to the physical world. Consider this operational/psychological impact in your design.
6 vCloud Suite Architecture: what do I consider
Consideration: Cost
- You will notice that the “Small” Design example has a lot more limitations than the “Large” Design.
- An even bigger cost is ISV licensing. Some ISVs, like Oracle, charge for the entire cluster. Dedicating a cluster to them is cheaper.
- The DR Site serves 3 purposes to reduce cost.
- VMs from different Business Units are mixed in 1 cluster. If they can share the same Production LAN and SAN, the same reasoning can apply to the hypervisor.
- Windows, Linux and Solaris VMs are mixed in 1 cluster. In a large environment, separate them to maximise your OS licences.
- DMZ and non-DMZ are mixed in 1 cluster.
Consideration: Security & Compliance
- The vSphere Security Hardening Guide splits security into 3 levels: Production, DMZ and SSLF.
- Prod and Non-Prod don’t share the same cluster, storage or network.
  - Easy to make a mistake. Easy to move in and out of the Production environment.
  - Production is more controlled and secure.
  - Non-Prod may spike (e.g. when doing load testing).
Consideration: Availability
- Software has bugs. Hardware has faults. We mostly cater for hardware faults. What about software bugs?
- I try to cater for software bugs, which is why the design has 2 VMware clusters with 2 vCenters. This lets you test cluster-related features in one cluster while keeping your critical VMs on another cluster.
- A cluster is always sized for 1 host failure. In a small cluster, the overhead can be high (50% in a 2-node cluster).
Consideration: Reliability
- Related to availability, but not the same. Availability is normally achieved by redundancy. Reliability is normally achieved by keeping things simple, using proven components, separating things and standardising.
- For example, the solution for the Small Design is simpler (a lot fewer features relative to the Large Design). It also uses 1 vSwitch for 1 purpose, as opposed to a big vSwitch with many port groups and a complex NIC fail-over policy.
- You will notice a lot of standardisation in all 3 examples. The drawback of standardisation is overhead, as we have to round up to the next bracket. A VM with 24 GB RAM ends up getting 30 GB.
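The rounding overhead that comes with standardisation can be made concrete with a short calculation. This is a minimal sketch: the bracket list below is hypothetical (the deck only implies that 30 GB is one of the standard sizes), and the function names are my own.

```python
from bisect import bisect_left

# Hypothetical standard RAM sizes in GB. Only the 30 GB bracket is implied by the deck.
RAM_BRACKETS_GB = [4, 8, 15, 30, 60]


def round_up_to_bracket(requested_gb, brackets=RAM_BRACKETS_GB):
    """Return the smallest standard size that is >= the requested size."""
    ordered = sorted(brackets)
    index = bisect_left(ordered, requested_gb)
    if index == len(ordered):
        raise ValueError(f"{requested_gb} GB exceeds the largest bracket")
    return ordered[index]


def overhead_pct(requested_gb, granted_gb):
    """Extra capacity granted, as a percentage of what was requested."""
    return (granted_gb - requested_gb) * 100 / requested_gb


granted = round_up_to_bracket(24)
print(granted, overhead_pct(24, granted))  # 30 25.0
```

With these assumed brackets, the 24 GB request from the slide lands in the 30 GB bracket, which is a 25% overhead for that VM.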
7 vCloud Suite Architecture: what do I consider
Consideration: Performance (1 and Many)
- 2 types:
  - How fast can we do 1 transaction? Latency and clock speed matter here.
  - How many transactions can we do within SLA? Throughput and scalability matter here.
- Storage, Network, VMkernel, VMM, Guest OS, etc. are considered.
- We are aiming for <1% CPU Ready Time and near 0 Memory Ballooning in Tier 1. In Tier 3, we can and should have higher ready time and some ballooning, so long as it still meets SLA.
- Some techniques to address this: add ESX hosts, add clusters, add spindles, etc.
  - Includes both horizontal and vertical scaling. Includes both hardware and software.
Consideration: Skills of IT team
- Especially the SAN vs NAS skill. This is more important than the protocol itself.
- Skills include both internal and external (a preferred vendor who complements the IT team).
  - In a Small/Medium environment, it is impossible to be expert in all areas. Consider complementing the internal team by establishing a long-term partnership with an IT vendor. Having a vendor/vendee relationship saves cost initially, but in the long run there is a cost.
Consideration: Existing environment
- How does the new component fit into the existing environment? E.g. adding a new Brand A server into a data center full of Brand B servers needs to take into account management and compatibility with common components.
- Most customers do not have a separate network for DR tests. In other words, they test their DR in the production network.
Consideration: Improvement
- Besides meeting current requirements, can we improve things?
- Almost all companies need more servers, especially in non-production. So when virtualisation happens, we get VM sprawl. As such, the design has headroom.
- Moving toward “1 VM, 1 OS, 1 App”. In the physical world, some servers may serve multiple purposes. In the virtual world, companies can afford to, and should, run 1 App per VM.
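To check the <1% CPU Ready target against vCenter charts, the CPU Ready "summation" value (milliseconds) has to be converted into a percentage of the sampling interval. A minimal sketch follows; it assumes the real-time chart interval of 20 seconds, and dividing by the vCPU count to get a per-vCPU figure is my own normalisation choice rather than something stated in this deck.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation (ms) into a per-vCPU percentage.

    interval_s: length of the sampling interval in seconds
                (20 s is assumed here for real-time charts).
    vcpus:      number of vCPUs the summation value covers.
    """
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


def meets_tier1_target(ready_ms, interval_s=20, vcpus=1, target_pct=1.0):
    """True when ready time is below the Tier 1 target of 1%."""
    return cpu_ready_percent(ready_ms, interval_s, vcpus) < target_pct


print(cpu_ready_percent(150))          # 0.75
print(meets_tier1_target(150))         # True
print(meets_tier1_target(800, vcpus=2))  # False, 2.0% per vCPU
```

Historical charts use longer intervals, so pass the matching `interval_s` when reading anything other than real-time data.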
8 First Thing First: the applications
Your cloud’s purpose is to run apps. We must know what types of VMs we are running, as they impact the design or operation.
Type of VM, followed by its impact on design:
- Microsoft NLB (network load balancer). Typical apps: IIS, VPN, ISA
  - VMware recommends Multicast.
  - Needs its own port group.
  - This port group needs Forged Transmits enabled (as NLB will change the MAC address).
- MSCS. Consider Symantec VCS instead, as it has none of the restrictions listed here.
  - Needs FC. iSCSI, NFS and FCoE are not supported. Also, the array must be explicitly certified on vSphere.
  - Needs an Anti-Affinity Rule (Host-to-VM mapping, not VM-VM, as VMware HA does not obey VM-VM affinity rules). As such, needs 4 nodes in a cluster.
  - Needs RAM to be 100% reserved. Impacts HA Slot Size if you use default settings.
  - Disks have to be eagerzeroedthick, so they are full size. Thin Provisioning at the array will not help, as we zeroed all the disks.
  - Needs 2 extra NIC ports per ESX for heartbeat.
  - Needs RDM disks in Physical-Compatibility mode, so the VM can’t be cloned or converted to a template. vMotion is not supported as at vSphere 5. This is not due to physical RDM.
  - Impact on ESX upgrade, as the ESX version must be the same.
  - With native multipathing (NMP), the path policy can’t be round robin.
  - It uses Microsoft NLB.
  - Impacts SRM 5. It works, but needs scripting. Preferably keep the same IP, so create a stretched VLAN if possible.
- Microsoft Exchange
  - If you need CCR (clustered continuous replication), then you need MSCS.
- Oracle software
  - Oracle charges per cluster (or sub-cluster, if you configure host-VM affinity).
  - I’m not 100% sure if Oracle still charges per cluster if we do not configure automatic vMotion for the VM (set DRS to manual for this VM, so just Active/Passive HA, like the physical world). It looks like they will charge per host in this case, based on their document dated 13 July 2010. But the interpretation from Gartner is that Oracle charges for the entire cluster.
- Apps that are licensed per cluster
  - Similar to Oracle. I’m not aware of any other apps.
- Apps that are not supported
  - While ISVs support VMware in general, they may only support certain versions. SAP, for example, only supports from SAP NetWeaver 2004 (SAP Kernel 6.40) and only on Windows and Linux 64-bit (not on Solaris, for example).
9 VMs with additional consideration
Type of VM, followed by its impact on design:
- Peer applications (apps that scale horizontally; examples: Web Servers, App Servers)
  - They need to exist on different ESX hosts in a cluster, so you need to set up an Anti-Affinity Rule.
  - You need to configure this per peer set. So if you have 5 sets of Web servers from 5 different systems (5 pairs, 10 VMs), you need to create 5 Anti-Affinity rules.
  - Too many rules will create complexity, more so when the number of nodes is less than 4.
- Pair applications (apps that protect each other for HA; examples: AD, DHCP Server)
  - As above.
- Security VM or network packet capture tool
  - Need to create another port group to separate VMs that are being monitored from those that are not.
  - Need to use Distributed vSwitch to turn on port mirroring or NetFlow.
- App that depends on MAC address for licence
  - Needs its own port group.
  - May need to have MAC Address Change set to Yes.
- App that holds sensitive data
  - Should encrypt the data or the entire file system. vSphere 5 can’t encrypt the vmdk file yet. If you encrypt the Guest OS, the backup product may not be able to do file-level backup.
  - Should ensure no access by the MS AD Administrators group. Find out how it is backed up, and who has access to the tape. If IT does not even have access to the system, then vSphere may not pass the audit requirement.
  - Check partner products like Intel TXT and HyTrust.
- Fault Tolerance requirements
  - Impacts HA Slot Size (if we use this one), as it uses full reservation.
  - Impacts Resource Pool. Make sure we cater for the VM overhead (small).
- App on Fault Tolerance hardware
  - FT is still limited to 1 core. Consider Stratus to complement vSphere 5.
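The "one Anti-Affinity rule per peer set" bookkeeping above can be sketched as plain data. This is an illustration only: the rules are ordinary dictionaries, not vSphere API objects, and the function name, rule naming scheme and `satisfiable` flag are hypothetical.

```python
def build_anti_affinity_rules(peer_sets, host_count):
    """Create one keep-apart rule per peer set.

    peer_sets:  mapping of system name -> list of VM names that must not share a host.
    host_count: number of ESX hosts in the cluster.
    A rule is flagged unsatisfiable when the set has more VMs than there are hosts.
    """
    rules = []
    for system, vms in sorted(peer_sets.items()):
        if len(vms) < 2:
            continue  # a single VM has nothing to be separated from
        rules.append({
            "name": f"separate-{system}",
            "vms": list(vms),
            "satisfiable": len(vms) <= host_count,
        })
    return rules


# 5 web systems with 2 peers each: 5 pairs, 10 VMs, so 5 rules.
web_sets = {f"web{i}": [f"web{i}-a", f"web{i}-b"] for i in range(1, 6)}
rules = build_anti_affinity_rules(web_sets, host_count=4)
print(len(rules))  # 5
```

The rule count grows with the number of systems, not the number of VMs, which is why many small peer sets add complexity quickly on a cluster with few hosts.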
10 VMs with additional consideration
Type of VM, followed by its impact on design:
- App that requires a hardware dongle
  - The dongle must be attached to 1 ESX host. vSphere 4.1 adds this support.
  - Best to use a network dongle.
  - In the DR site, the same dongle must be provided too.
- App with high IOPS
  - Needs proper sizing. There is no point having dedicated datastores if the underlying spindles are shared among multiple datastores.
- Apps that use a very large block size
  - SharePoint uses a 256 KB block size, so a mere 400 IOPS will already saturate a GE link. For such applications, FC or FCoE will be a better protocol.
  - Any application with a 1 MB block size can easily saturate a 1 GE link.
- App with very large RAM (>64 GB)
  - This will impact DRS when an HA event occurs, as there needs to be a host that can house the VM.
  - It will still boot, so long as the reservation is not set to a high number.
- App that needs Jumbo Frames
  - This must be configured end to end (guest OS, port group, vSwitch, physical switch). Not all devices support 9000, so do a ping test and find the value.
- App with >95% CPU utilisation in the physical world and a high run queue
  - Find out first why it is so high. We should not virtualise an app whose performance characteristics we are blind to.
- App that is very sensitive to time accuracy
  - Time drift is a possibility in the virtual world. Find out the business or technical impact if time deviates by 10 seconds.
- A group of apps with a complex power-on sequence and dependency
  - Be aware of the impact on the application during an HA event. If 1 VM is shut down by HA and then powered on, the other VMs in the chain may need a restart too. This should be discussed with the App Owner.
- App that takes advantage of a specific CPU instruction set
  - Mixing with an older CPU architecture is not possible. This is a small problem if you are buying new servers.
  - EVC will not help, as it’s only a mask. See speaker notes.
- App that needs <0.01 ms end-to-end latency
  - Use a separate cluster, as the tuning is not suitable for a “normal” cluster.
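The large-block-size point is simple arithmetic, sketched below. Assumptions: 1 KB is taken as 1024 bytes, the link is a nominal 1000 Mbps, and protocol overhead (Ethernet, IP, iSCSI/NFS headers) is ignored, so real links fill up somewhat earlier than these numbers suggest.

```python
def throughput_mbps(iops, block_kb):
    """Payload throughput in megabits per second (1 KB = 1024 bytes)."""
    return iops * block_kb * 1024 * 8 / 1_000_000


def link_utilisation(iops, block_kb, link_mbps=1000):
    """Fraction of the nominal link rate consumed by the payload alone."""
    return throughput_mbps(iops, block_kb) / link_mbps


def iops_to_fill_link(block_kb, link_mbps=1000):
    """IOPS at which the payload alone equals the nominal link rate."""
    return link_mbps * 1_000_000 / (block_kb * 1024 * 8)


print(throughput_mbps(400, 256))          # about 839 Mbps
print(round(link_utilisation(400, 256), 2))  # 0.84
print(round(iops_to_fill_link(1024)))     # 119
```

So 400 IOPS at 256 KB already uses roughly 84% of a GE link before any overhead, and at a 1 MB block size about 119 IOPS is enough to fill it.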
11 This entire deck does not cover Mission Critical Applications
The deck focuses on designing a generic platform for most applications.
- In the 80/20 concept, it focuses on the easiest 80.
Special apps have unique requirements. They differ in the following areas:
- Size is much larger, so the S, M, L sizes for VM or ESXi host do not apply to them.
- The VM has unique properties.
- They might get a dedicated cluster.
The picture on the right shows a VM with 12 vCPU, 160 GB vRAM, 3 SCSI controllers, usage of PVSCSI, 18 vDisks and 2 vNICs. This is an exceptional case.
There are basically 2 overall architectures in vCloud Suite 5.1:
- One for the easiest 80%
- One for the hardest 20%
The management cluster described later still applies to both architectures.
12 3 Sizes: Assumptions
Assumptions are needed to avoid the infamous “It depends…” answer.
- The architecture for 50 VMs differs from that for 500 VMs, which in turn differs from that for 5000 VMs. It is the same vSphere, but you design it very differently.
- A design for large VMs (20 vCPU, 200 GB vRAM) differs from a design for small VMs (1 vCPU, 1 GB).
- The workload for an SME is smaller than for a Large Enterprise. Exchange handling 100 staff vs 10000 staff results in a different architecture.
- A design for a Server farm differs from one for a Desktop farm.
I provide 3 sizes in this document: 50, 500 and 1500 VMs.
- The table below shows the definitions.
- I try to make each choice as real as possible. 3 sizes give you choice and show the reasoning used.
- Take the size closest to your needs, then tailor it to the specific customer (not project). Do not tailor to a project, as it is a subset of the entire data center. Always architect for the entire datacenter, not a subset.
Size means the size of the entire company or branch, not the size of Phase 1 of the journey.
- A large company starting small should not use the “Small” option below; it should use the “Large” option but reduce the number of ESX hosts.
- I believe in “begin with the end in mind”, projecting around 2 years. Longer than 3 years is rather hazy, as private cloud is not fully matured yet. I expect major innovation until 2015.
- A large company can use the Small Cloud example for their remote office, but this needs further tailoring (VSA & ROBO).
Table, with values in the order Small VDC | Medium VDC | Large VDC:
- Company: Small Company or Remote Branch | Medium | Large
- IT Staff: 1-2 persons doing everything | 4 persons doing infra, 2 persons doing desktop, 10 persons doing apps | Different teams for each. Matrix reporting. Lots of politics & little kingdoms
- Data Center: 1 or none (hosted) or just a corner rack | 2, but no array-level replication | 2 with private connectivity. 5 satellite DCs
13 Assumptions, Requirements, Constraints for our Architecture
Values are in the order Small VDC | Medium VDC | Large VDC. Where a row has fewer than three values, cells are merged across sizes.
- # Servers currently: 25 servers, all are production | ~150 servers, 70% is production | 700 servers, 55% is production
- # Servers in 2 years: Prod 30 servers, Non Prod 15 servers (50%) | Prod 250 servers, Non Prod 250 servers (100%) | Prod 500 servers, Non Prod 1000 servers (200%)
- # Server VM that our design needs to cater for: 50 | 500 | 1500
- # View VM or Laptop: 500, with remote access, no need for offline VDI | 5000, with remote access, need offline VDI | 15000, with remote access + 2FA, need offline VDI
- DR Requirements: Yes
- Storage expertise: Minimal. Also keeping cost low by using IP Storage. No SAN. | Yes. RDM will be used as some DB may be large.
- DMZ Zone / SSLF Zone: Yes/No | Yes/No. Intranet also zoned
- Back up: Disk | Tape
- Network standard: No standard | Cisco
- ITIL Compliance: Not applicable | A few are in place | Some are in place
- Change Management: Mostly not in place | A few are in place | Some are in place
- Overall System Mgmt SW (BMC, CA, etc): No | Needs to have tools
- Configuration Management: No | Needs to have tools
- Oracle RAC: No | Yes
- Audit Team: No | External | External & Internal
- Capacity Planning: No | Needs to have tools
- Oracle software (BEA, DB, etc): No | Yes
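The three design sizes follow from the 2-year server projections in the table above. The sketch below reproduces that arithmetic; the function names are hypothetical, and picking the smallest design that covers the projected count is my own reading of "take the closest size".

```python
# (label, number of server VMs the design caters for), from the three examples in this deck.
DESIGN_SIZES = [("Small", 50), ("Medium", 500), ("Large", 1500)]


def projected_servers(prod, non_prod_pct):
    """Production servers plus non-production, where non-prod is a percentage of prod."""
    return prod + prod * non_prod_pct // 100


def smallest_covering_design(server_count):
    """Label of the smallest design that covers the count, or None if it exceeds Large."""
    for label, size in DESIGN_SIZES:
        if server_count <= size:
            return label
    return None


for prod, pct in [(30, 50), (250, 100), (500, 200)]:
    total = projected_servers(prod, pct)
    print(prod, pct, total, smallest_covering_design(total))
```

The Small case projects to 45 servers and is rounded up to a 50-VM design, while the Medium and Large cases land exactly on 500 and 1500.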
14 3 Sizes: Design Summary
The table below provides the overall comparison, so you can easily see what was taken out of the Small or Medium design.
- Just like any other design, there is no 1 perfect answer. Example: you may use FC or iSCSI for Small.
- This assumes 100% virtualised. It is easier to have 1 platform than 2.
  - For certain things in a company, you should only have 1 (email, directory, office suite, backup). Something as big as a “platform” should be standardised. That’s why it is called a platform.
- The design for Medium will be in between Small and Large.
Table, with values in the order Small | Large:
- # FT VM: 0 to 3 (in Prod Cluster only) | 0 to 6 (Prod Cluster only)
- VMware products: vSphere Standard, SRM Standard, vCloud Security & Networking, Horizon View Enterprise, vCenter Operations Standard, vSphere Storage Appliance | vCloud Suite Enterprise, vCenter Server Standard, vCenter Server Heartbeat, Horizon Suite
- VMware certification & skill: 1 VCP | 1 VCAP DCA, 1 VCAP DCD, VMware Mission Critical Support
- Storage: iSCSI, vSphere replication | FC + iSCSI with snapshot, vSphere + Array replication
- Server: 2x Xeon 5650, 72 GB RAM | 2x Xeon (8-10 core/socket), 128 GB RAM
- Back up: VMware Data Protection to Array 2 | VADP + 3rd party to Tape
15 Other Design possibilities
What if you need to architect for a larger environment?
- Take the Large Cloud sample as the starting point. It can be scaled to 10,000 VMs.
- Above 1000 VMs, you should consider a Pod approach.
- Upsize it by:
  - Adding larger ESXi hosts. I’m using an 8-core socket, based on Xeon 5600. You should use 10-core Xeon 7500 to fit larger VMs. Take note of cost.
  - Adding more ESX hosts to the existing cluster. Keep it to a maximum of 10 nodes per cluster.
  - Adding more clusters. For example, you can have multiple Tier 1 Clusters.
  - Adding Fault Tolerant hardware from Stratus. Make this Stratus server a member of the Tier 1 Cluster. It appears as 1 ESX host, although there are 2 physical machines. Stratus has its own hardware, so ensure consistency in your cluster design.
  - Splitting the IT Datastore into multiple datastores. Group by function or criticality.
- If you are using Blade servers and have filled 2 chassis, put the IT Cluster outside the blades and use rack mount servers. Separating the Blades from the server managing them minimises the chance of human error, as we avoid the “Managing Itself” complexity.
Migrating between clusters
- vSphere 5.1 supports live migration between clusters that don’t have a common datastore.
- I don’t advocate live migration from/to the Production environment. It should be part of Change Control.
The Large Cloud is not yet architected for vCloud Director
- vCloud Director has its own best practices for vSphere design.
- Adding vCloud + SRM on the DR site requires proper design by itself. And this deck is already 100+ slides…
16 Design Methodology
Architecting a Private Cloud is not a sequential process.
- There are 8 components. Application is driving infrastructure.
- The components are inter-linked, like a mesh.
- In the >1000 VM category, where it takes >2 years to virtualise >1000 VMs, a new vSphere release will change the design.
Even the Bigger Picture is not sequential.
- Sometimes you may even have to leave Design and go back to Requirements or Budgeting.
There is no perfect answer. Below is one example.
This entire document is about Design only. Operation is another big space.
- I have not taken into account Audit, Change Control, ITIL, etc.
The steps are more like this
17 Data Center Design Data Center, DR, Cluster, Resource Pool
18 Just what is a software-defined datacenter anyway?
[Diagram labels: Virtual Datacenter; Physical Datacenter 1 and Physical Datacenter 2, each with a Physical Compute Function (Compute Vendor 1, Compute Vendor 2), a Physical Network Function (Network Vendor 1, Network Vendor 2) and a Physical Storage Function (Storage Vendor 1, Storage Vendor 2).]
- Shared Nothing Architecture. Not stretched between 2 physical DC. Production might be 10.10.x.x. DR might be 20.20.x.x
- Shared Nothing Architecture. No replication between 2 physical DC. Production might be FC. DR might be iSCSI.
- Shared Nothing Architecture. No stretched cluster between 2 physical DC. Each site has its own vCenter.
19 2 Distinct Layers: Consumer and Producer
Separation & Abstraction (done by the Hypervisor or DC OS)
2 distinct layers
- Supporting the principle of Consumer and Producer.
  - The VM is the Consumer. It does not care about the underlying technology. Its sole purpose is to run the application.
  - The DC Infra is the Producer. It provides common services.
- The VM is freed from (or independent of) the underlying technology. These technologies can change without impacting the VM:
  - Storage protocol (iSCSI, NFS, FC, FCoE)
  - Storage file system (NFS, VMFS, VVOL)
  - Storage multi-pathing (VMware, EMC, etc)
  - Storage replication
  - Network teaming
- The Datacenter OS provides a lot of services, such as:
  - Security: Firewall, IDS, IPS, Virtual Patching
  - Networking: LB, NAT
  - Availability: backup, cluster, HA, DR, FT
  - Management & Monitoring
- A lot of agents are removed from the VM, resulting in a simpler server.
[Diagram labels: DC Services, Datacenter Technologies, Datacenter Implementation]
23 Methodology
- Define how many physical data centers are required.
  - DR requirements normally dictate 2.
- For each Physical DC, define how many vCenters are required.
  - Desktop and Server should be separated by vCenter.
    - Connected to the same SSO server, fronted by the same Web Client VM.
    - View comes with bundled vSphere (unless you are buying the add-on).
    - Ease of management.
  - In some cases (Hybrid Active/Active), a vCenter may span multiple physical DCs.
- For each vCenter, define how many virtual data centers are required.
  - A Virtual Data Center serves as a name boundary.
  - A good way to separate IT (Provider) and Business (Consumer).
- For each vDC, define how many Clusters are required.
  - In a large setup, there will be multiple clusters for each Tier.
- For each Cluster, define how many ESXi hosts are required.
  - Preferably 4 to 12. 2 is too small a size.
  - Standardise the host spec across clusters. While each cluster can have its own host type, this adds complexity.
[Diagram: hierarchy Physical DC > vCenter > Virtual DC > Cluster > ESXi. Example: Physical DC containing vCenter (Server pool) with Virtual DC (IT) and Virtual DC (Biz), Tier 1 Cluster, Tier 2 Cluster, ESXi; and vCenter (Desktop pool) with its own Virtual DC.]
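The top-down methodology above maps naturally onto a nested data structure. The sketch below models the hierarchy and flags clusters outside the preferred 4 to 12 hosts. The class names, the helper function and all host counts in the sample layout are hypothetical and only mirror the labels on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    name: str
    esxi_hosts: int


@dataclass
class VirtualDC:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class VCenter:
    name: str
    virtual_dcs: List[VirtualDC] = field(default_factory=list)


@dataclass
class PhysicalDC:
    name: str
    vcenters: List[VCenter] = field(default_factory=list)


def clusters_outside_range(dc, low=4, high=12):
    """Names of clusters whose host count falls outside the preferred range."""
    return [
        cluster.name
        for vc in dc.vcenters
        for vdc in vc.virtual_dcs
        for cluster in vdc.clusters
        if not low <= cluster.esxi_hosts <= high
    ]


# Hypothetical layout: host counts are invented for illustration.
primary = PhysicalDC("Physical DC 1", [
    VCenter("vCenter (Server pool)", [
        VirtualDC("Virtual DC (IT)", [Cluster("IT Cluster", 4)]),
        VirtualDC("Virtual DC (Biz)", [
            Cluster("Tier 1 Cluster", 8),
            Cluster("Tier 2 Cluster", 2),
        ]),
    ]),
    VCenter("vCenter (Desktop pool)", [
        VirtualDC("Virtual DC", [Cluster("Desktop Cluster", 10)]),
    ]),
])

print(clusters_outside_range(primary))  # ['Tier 2 Cluster']
```

Writing the layout down this way makes it easy to review the count at each level before any hardware is ordered.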
24 Large: The need for Non Prod Cluster
This is unique to the virtual data center.
- We don’t have a “Cluster” to begin with in a physical DC, as cluster means a different thing there.
The Non-Prod Cluster serves multiple purposes:
- Run Non-Production VMs.
  - In our design, all Non-Production VMs run on the DR Site to save cost.
  - A consequence of our design is that migrating from/to Production can mean copying large amounts of data across the WAN.
- Disaster Recovery.
- Test-bed for infrastructure patching or updates.
- Test-bed for infrastructure upgrade or expansion.
- Evaluating or implementing new features.
  - In a Virtual Data Centre, a lot of enhancements can impact the entire data centre, e.g. Distributed Switch, Nexus 1000V, Fault Tolerance, vShield.
  - All of the above need proper testing. The Non-Prod Cluster should provide a sufficiently large scope to make testing meaningful.
- Upgrade of the core virtual infrastructure, e.g. from vSphere 4 to 5 (major release).
  - This needs extensive testing and a roll-back plan.
Even with all the above…
- How are you going to test SRM upgrades & updates properly?
  - In Singapore, MAS TRM guidelines require Financial Institutions to test before updating production.
  - An SRM test needs 2 vCenters, 2 arrays and 2 SRM servers. If all are used in production, then where is the test environment for SRM?
- What happens when you are upgrading SRM?
  - You will lose protection during this period.
[Diagram labels: Business, IT. Callout: This new layer does not exist in the physical world. It is software, hence it needs its own Non-Prod environment.]
25 Large: The need for IT Cluster
Special purpose cluster
- More than a Management Cluster. It also runs non-Management VMs that are not owned by the Business. Examples:
  - Active Directory
  - File Server
  - Email & Collaboration (in the Large example, this might warrant its own cluster)
- Runs all the IT VMs used to manage the virtual DC or provide core services.
- The Central Management will reside here too.
- Separated for ease of management & security.
The next page shows the list of VMs that reside on the IT Cluster. Each line represents a VM.
- This shows the Production Site. The DR Site will have a subset of this.
  - Except for vCloud Director, which is only deployed on the DR Site.
Explanation of some of the servers below:
- Security Management Server = VM to manage security (e.g. TrendMicro Deep Security)
This separation keeps the Business Cluster clean, “strictly for business”.
26 Large: IT Cluster (part 1)
The table provides samples of VMs that should run on the IT cluster.
- 4 ESXi hosts should be sufficient, as most VMs are not demanding. They are mostly management tools.
- The relatively more demanding VMs are vCenter Operations.
- There are many databases here. Standardise on 1.
- I will not put these databases together with DBs running business workload. Keep Business and IT separate.
Table: Category, followed by the VMs in the Large Cloud
- Base Platform
  - vCenter (for Server Cloud), active node
  - vCenter (for Server Cloud), passive node
  - vCenter (for Server Cloud) DB, active node
  - vCenter (for Server Cloud) DB, passive node
  - vCenter Web Server
  - vCenter Inventory Server
  - vCenter SSO Server x2 with Global HA
  - vCenter Heartbeat
  - Auto-Deploy + Authentication Proxy (1 per vCenter)
  - vCenter Update Manager + DB (1 per vCenter)
  - vCenter Update Manager Download Service (in DMZ)
  - Auto-Deploy + vSphere Authentication Proxy
  - vCloud Director (Non Prod) + DB
  - Certificate Server
- Storage
  - Storage Mgmt tool (needs physical RDM to get fabric info)
  - VSA Manager
  - Back up Server
- Network
  - Network Management Tool (needs a lot of bandwidth)
  - Nexus 1000V Manager (VSM) x 2
- Sys Admin Tools
  - Admin client (1 per Sys Admin) with PowerCLI
  - VMware Converter
  - vMA (Management Assistant)
  - vCenter Orchestrator + DB
27 Large Cloud: IT Cluster (page 2)
Continued from previous page.
IT apps that are not in this Cluster:
- View Security Servers. These servers reside in the DMZ zone and are directly accessible from the Internet. Putting them in the management cluster means the management cluster needs to support an Internet-facing network.
Table: Category, followed by the VMs in the Large Cloud
- Application Mgmt
  - AppDirector
  - Hyperic
- Advance vDC Services (Security, Availability)
  - Site Recovery Manager + DB
  - SRM Replication Mgmt Server + DB
  - vSphere Replication Servers (1 per 1 Gbps bandwidth, 1 per site)
  - AppHA Server (e.g. Symantec)
  - Security Management Server (e.g. TrendMicro Deep Security)
  - vShield Manager
- Management (Performance, Capacity, Configuration)
  - vCenter Operations Enterprise (2 VM)
  - vCenter Infrastructure Navigator
  - vCloud Automation Center (5 VM)
  - VCM: Web + App + DB (3 VM)
  - Chargeback + DB, Chargeback Data Collector (2)
  - Help Desk system
  - CMDB
  - Change Management system
- Desktop as a Service
  - View Managers + DB
  - View Security Servers (sitting in DMZ zone!)
  - ThinApp Update Server
  - vCenter (for Desktop Cloud) + DB
  - vCenter Operations for View
  - Horizon Suite
  - Mirage Server
- Core Infra
  - MS AD 1, AD 2, AD 3, etc.
  - DNS, DHCP, etc
  - Syslog server + Core Dump server
  - File Server (FTP Server) for IT
  - File Server (FTP Server) for Business (multiple)
  - Print Server
- Core Services
  - Email & Collaboration
28 Cluster Size I recommend 6-10 nodes per cluster, depending on the Tier. Why not 4 or 12 or 16 or 32? A balance between too small (4 hosts) and too large (>12 hosts) DRS: 8 give DRS sufficient host to “maneuver”. 4 is rather small from DRS scheduler point of view. With “sub cluster” ability introduced in 4.1, we can get the benefit of small cluster without creating one Best practice for cluster is same hardware spec with same CPU frequency. Eliminates risk of incompatibility Complies with Fault Tolerant & VMware View best practices So more than 8 means it’s more difficult/costly to keep them all the same. You need to buy 8 hosts a time. Upgrading >8 servers at a time is expensive ($$) and complex. A lot of VMs will be impacted when you upgrade > 8 hosts. Manageability Too many hosts are harder to manage (patch, performance troubleshooting, too many VMs per cluster, HW upgrade) Allow us to isolate 1 host for VM-troubleshooting purpose. At 4 node, we can’t afford such ”luxury” VM Restart priority is simpler when you don’t have too many VM Too many paths to a LUN can be complex to manage and troubleshoot Normally, a LUN is shared by 2 clusters, which are “adjacent” cluster. 1 ESX is 4 paths. So 8 ESX is 32 paths. 2 clusters is 64 paths. This is a rather high number (if you compare with physical world) N+2 for Tier 1 and N+1 for others With 8 host, you can withstand 2 host failures if you design it to. At 4 nodes, it is too expensive as payload is only 50% at N+2 Small Cluster size In a lot of cases, the cluster size is just 2 – 4 nodes. From Availability and Performance point of view, this is rather risky. Say you have 3-node cluster…. You are doing maintenance on Host 1 and suddenly Host 2 goes down… you are exposed with just 1 node. Assuming HA Admission Control is enabled (which you should), the affected VM may not even boot. When a host is placed into maintenance mode, or disconnected for that matter, it is taken out of the admission control calculation. 
Cost: Too few hosts result in overhead (the “spare” host)
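The path count and N+2 payload figures on this slide are simple arithmetic, sketched below in Python. This is only an illustration: the function names are hypothetical, and the default of 4 paths per host is the figure quoted on the slide.

```python
def payload_fraction(hosts: int, spare_hosts: int) -> float:
    """Share of cluster capacity left for workload after setting aside spare hosts."""
    if not 0 <= spare_hosts < hosts:
        raise ValueError("spare_hosts must be between 0 and hosts - 1")
    return (hosts - spare_hosts) / hosts


def paths_to_lun(hosts_per_cluster: int, clusters: int = 1, paths_per_host: int = 4) -> int:
    """Total paths to a LUN shared by `clusters` clusters of equal size."""
    return hosts_per_cluster * clusters * paths_per_host


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(f"{n} hosts at N+2: {payload_fraction(n, 2):.0%} payload")
    print("8 hosts, 1 cluster:", paths_to_lun(8), "paths")
    print("8 hosts, 2 clusters:", paths_to_lun(8, clusters=2), "paths")
```

At 4 nodes, N+2 leaves half the cluster for payload; at 8 nodes it leaves three quarters, which is why the spare-host overhead shrinks as the cluster grows.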
29 Small Cloud: Cluster Design
30 Small Cloud: Design Limitation
It is important to document the design limitations clearly. It is perfectly fine for a design to have limitations; after all, you have a limited budget. Inform the CIO and the Business clearly about the limitations.
It is based on vSphere Standard edition: No Storage vMotion. No DRS and DPM. No Distributed Switch. Can’t use 3rd party multi-pathing.
Does not support MSCS. Veritas VCS does not have this restriction. vSphere 5.1 only supports FC for now, and I use iSCSI in this design.
For a 30-server environment, HA with VM monitoring should be sufficient. In vSphere 5.1 HA, a script can be added that checks whether the application (service) is active on its given port/socket. Alternatively, a script within the Guest OS checks whether the process is up. If not, it sends an alert.
Only 1 cluster in the primary data center. Production, DMZ and IT all run on the same cluster. Networks are segregated as they use different networks. Storage is separated as they use different datastores.
31 Small Cloud: Scaling to 100 VM
The next slide shows an example where the requirement is for 100 VM instead of 50. We have 7 hosts in DC 1 instead of 3 hosts. We have 3 hosts in DC 2 instead of 2 hosts.
Only 1 cluster in the primary data center. Production, DMZ and IT all run on the same cluster. Networks are segregated as they use different networks. Storage is separated as they use different datastores.
Since we have more hosts, we can do sub-clusters. We will place the following as sub-clusters: Host 1 – 2: Oracle BEA sub-cluster. Host 6 – 7: Oracle DB sub-cluster. Production is a soft cluster, so on a host failure it can use Host 1 – 2 too.
Complex Affinity and Host/VM rules: Be careful in designing VM Anti-Affinity rules. We are using Group Affinity as we have sub-clusters, so we have an extra constraint.
32 Small Cloud: Scaling to 100 VM
Certainly, there are possible variations. 2 are described below.
If we can add 1 more ESX host, we can create 2 clusters of 4 nodes each. This will simplify the Affinity Rules.
We can use 1-socket ESX hosts instead of 2-socket: Save on VMware licences. Extra cost on servers. Extra cooling/power operational cost.
Diagram labels: Oracle BEA, DMZ LAN, Production LAN, Management LAN, Oracle DB, Rest of VMs.
33 Small Cloud: Scaling to 150 VM
We have more “room” to design, but it is still too small. Production needs 7 hosts. IT needs 2 hosts. DMZ needs 2 hosts.
Putting IT with DMZ is a design trade-off. vShield is used to separate IT and DMZ. If that is deemed not enough, we can add VLANs. If it is still not enough, use different physical cables or switches. The more you separate physically, the more you defeat the purpose of virtualisation.
Diagram labels: Oracle BEA, Rest of VMs, Production LAN, Oracle DB.
34 Small: Scaling to 150 VM
35 Large: Overall Architecture
37 Large: DataCenter and Cluster
In our design, we will have 2 Datacenters only, separating the IT Cluster from the Business Clusters.
Certain objects can go across Clusters, but not across Datacenters. You can vMotion from one cluster to another within a datacenter, but not to another datacenter. Networking: Distributed Switch, VXLAN and vShield Edge can’t go across Datacenters as at vCloud Suite 5.1. Datastore names are per Datacenter. So network and storage are per Datacenter.
You can still clone a VM within a datacenter and to a different datacenter.
38 Large: Cluster Design
39 Large: Tiered Cluster
The 3 tiers become the standard offering that the Infra team provides to the app team. If Tier 3 is charged $X/VM, then Tier 2 is priced at 2x and Tier 1 is priced at 4x. The Apps team can then choose based on their budget.
Cluster size varies, depending on criticality. A test/dev cluster might have 10 nodes, while a Tier 0 cluster might have just 2 nodes.
The Server Cluster also maps 1:1 to the Storage Cluster. This keeps things simple. If a VM is so important that it is on a Tier 1 cluster, then its storage should be on a Tier 1 cluster too.
This excludes Tier 0, which is special and handled per application. Tier 0 means the cost of infra is very low relative to the value & cost of the apps to the business.
Tier “SW” is a dedicated cluster running a particular software. Normally, this is Oracle, MS SQL or Exchange. While we can have “sub-clusters”, it is simpler to dedicate an entire cluster.
Tier | # Host | Node Spec? | Failure Tolerance | MSCS | Max #VM | Monitoring | Remarks
Tier 1 | Always 6 | Always Identical | 2 hosts | Yes | 25 | Application level. Extensive Alert | Only for Critical App. No Resource Overcommit.
Tier 2 | 4-8 | Maybe | 1 host | Limited | 75 | | App can be vMotioned to Tier 1 during critical run
Tier 3 | 4-10 | No | 1 host | No | 150 | Infrastructure level. Minimal Alert. | Some Resource Overcommit
SW | 2-10 | Maybe | 1-3 hosts | No | 25 | Application specific | Running expensive software. Oracle, SQL are the norms as part of DB as a Service
40 Large: Example 1 Goal is to provide 500 Prod VM and 1000 Non Prod VM As you scale >1000 VM, keep in mind the number of clusters & hosts. As you scale >10 clusters, consider using 4 socket hosts. This example does have Large VM cluster, which is an exception cluster. Large VM in this case is > 8 vCPU and > 64 GB vRAM.
41 Large: Example 2 Same goal as previous, but we’re going for higher consolidation ratio (and hence using 40-core box)
42 Large: Example Pod (with Converged Hardware)
Rack 1 (42 RU) and Rack 2 (42 RU) each contain:
-Compute + Storage Converged Block. 32 RU
-Management Block. 2 RU
-Network Block. 5 RU
Switches per rack: 4x 48 ports, 10 GE (total 192 ports). 1x 48 ports, 1 GE (for Management).
Each ESXi host has:
-4x 10 GE for network and storage
-1x 1 GE for iLO
-1x Flash for performance
-2x SSD for performance
-4x SAS for capacity
Total ports requirements per rack:
-34 x 4 = 136 10GE ports
-34 x 1 = 34 1GE ports
-ISL & uplinks = 6 GE ports
Total compute per Pod: 2 racks x 32 x 16 cores = 1024 cores
IT Cluster. It’s a 4-node cluster.
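The port and core totals on this slide can be reproduced with a short Python sketch. The split of the 34 hosts per rack into 32 compute hosts plus 2 management hosts is my assumption (it matches the 32 RU compute block, the 2 RU management block and the 4-node IT cluster across 2 racks); the names are hypothetical.

```python
COMPUTE_HOSTS_PER_RACK = 32   # from "2 racks x 32 x 16 cores"
MGMT_HOSTS_PER_RACK = 2       # assumption: 34 hosts per rack minus 32 compute hosts
CORES_PER_HOST = 16
RACKS = 2


def rack_port_demand(hosts: int, ten_ge_per_host: int = 4, one_ge_per_host: int = 1) -> dict:
    """Switch ports needed by the hosts in one rack (ISL and uplinks excluded)."""
    return {"10GE": hosts * ten_ge_per_host, "1GE": hosts * one_ge_per_host}


def pod_cores(racks: int = RACKS, hosts: int = COMPUTE_HOSTS_PER_RACK,
              cores: int = CORES_PER_HOST) -> int:
    """Compute cores in the pod, counting compute hosts only."""
    return racks * hosts * cores


if __name__ == "__main__":
    demand = rack_port_demand(COMPUTE_HOSTS_PER_RACK + MGMT_HOSTS_PER_RACK)
    print(demand, "against 4 x 48 = 192 10GE ports and 48 1GE ports per rack")
    print("Cores per pod:", pod_cores())
```

With these numbers, each rack keeps 192 - 136 = 56 spare 10 GE ports for ISL, uplinks and growth.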
43 Resource Pool: Best Practices
What they are not: A way to organise VMs. Use folders for this. A way to segregate admin access for VMs. Use folders for this.
For a Tier 1 cluster, where all the VMs are critical to the business: Architect for Availability first, Performance second. Translation: do not over-commit. So resource pools, reservations, etc. are immaterial as there is enough for everyone. But size each VM accordingly. No oversizing, as it might slow the VM down.
For a Tier 3 cluster, use carefully: Tier 3 = overcommit. Use Reservation sparingly, even at VM level. A reservation guarantees resources, so it impacts the cluster slot size. Naturally, you can’t boot additional VMs if your guarantee is fully used. Take note of the extra complexity in performance troubleshooting.
Use as a mechanism to reserve at “group of VMs” level: If Department A pays for half the cluster, then creating an RP with 50% of the cluster resources will guarantee them the resources in the event of contention. They can then put in as many VMs as they need. But as a result, you cannot overcommit at cluster level, as you have guaranteed at RP level.
Introduce a scheduled task which sets the shares per resource pool, based on the number of VMs/vCPUs they contain. E.g. a PowerCLI script which runs daily and takes corrective actions. Just google it.
Don’t put VMs and RPs as “siblings” at the same level.
See my Resource Management slide for details.
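The reason for the scheduled shares task can be shown with a little arithmetic. The sketch below is plain Python, not PowerCLI, and does not call any vSphere API; the function names and the `shares_per_vcpu` value of 1000 are assumptions chosen for illustration.

```python
from typing import Dict


def per_vcpu_share(pool_shares: Dict[str, int], pool_vcpus: Dict[str, int]) -> Dict[str, float]:
    """Shares each vCPU effectively gets when its pool's shares are split evenly."""
    return {name: pool_shares[name] / pool_vcpus[name] for name in pool_shares}


def shares_by_vcpu(pool_vcpus: Dict[str, int], shares_per_vcpu: int = 1000) -> Dict[str, int]:
    """Pool shares proportional to the number of vCPUs each pool contains."""
    return {name: count * shares_per_vcpu for name, count in pool_vcpus.items()}


if __name__ == "__main__":
    vcpus = {"Dept A": 10, "Dept B": 40}
    equal = {"Dept A": 4000, "Dept B": 4000}
    # Equal pool shares: a vCPU in the fuller pool gets a quarter of the weight.
    print(per_vcpu_share(equal, vcpus))
    # Shares recalculated from vCPU count: every vCPU carries the same weight.
    print(per_vcpu_share(shares_by_vcpu(vcpus), vcpus))
```

A daily task would recompute the second form as VMs are added to or removed from each pool.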
44 VM-level Reservation & Limit
CPU reservation: Guarantees a certain level of resources to a VM. Influences admission control (PowerOn). CPU reservation isn’t as bad as often claimed: it doesn’t claim the CPU when the VM is idle (it is refundable).
CPU reservation caveats: CPU reservation does not always equal priority. If a VM is using the processors and the “Reserved VM” claims those CPUs, the Reserved VM has to wait until the threads/tasks are finished. Active threads can’t be “de-scheduled”; doing so would result in a Blue Screen / Kernel Panic.
Memory reservation: Guarantees a certain level of resources to a VM. Influences admission control (PowerOn). Memory reservation is as bad as often claimed: it is “non-refundable” once allocated. Windows zeroes out every bit of memory during startup…
Memory reservation caveats: Will drop the consolidation ratio. May waste resources (idle memory can’t be reclaimed). Introduces higher complexity (capacity planning).
Do not configure high CPU or RAM, then use Limit. E.g. configure with 4 vCPU, then use Limit to make it “2” vCPU. It can result in unpredictable performance as the Guest OS does not know. High CPU or high RAM has higher overhead. Limit is used when you need to force a VM to slow down. Using Shares won’t achieve the same result.
45 Fault Tolerance Design Consideration
Still limited to 1 vCPU in vSphere 5.1.
FT impacts Reservation. It will auto-reserve at 100%. Reservation impacts HA Admission Control as the slot size is bigger. HA does not check Slot Size nor actual utilisation when booting up. It checks the Reservation of the affected VM.
FT impacts Resource Pool. Make sure the RP includes the RAM overhead.
Cluster size is minimum 3, recommended 4.
Tune the application and Windows HAL to use 1 CPU. In Win2008 this no longer matters [e1: need to verify].
General guides: Assuming a 10:1 consolidation ratio, I’d cap FT usage to just 10% of Production VMs. So 80 VMs means around 8 ESX hosts, which means around 8 FT VMs. This translates to 1 Primary VM + 1 Secondary VM per host. Small clusters (<5 nodes) are more affected when there is an HA event. See picture for a 3-node example.
Limitation: Turn off FT before doing Storage vMotion. FT protects infra, not apps. Use Symantec ApplicationHA to protect apps.
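The FT rule of thumb on this slide works out as follows. This is a minimal sketch with a hypothetical function name; the 10:1 ratio and 10% cap are the slide's own figures, and the per-host result assumes the Primary and Secondary VMs are spread evenly.

```python
import math


def ft_budget(prod_vms: int, consolidation_ratio: int = 10, ft_pct: int = 10) -> dict:
    """Host count, FT-protected VM count and FT instances per host for a cluster."""
    hosts = math.ceil(prod_vms / consolidation_ratio)
    ft_vms = prod_vms * ft_pct // 100
    # Every FT-protected VM runs as one Primary plus one Secondary instance.
    instances_per_host = 2 * ft_vms / hosts
    return {"hosts": hosts, "ft_vms": ft_vms, "ft_instances_per_host": instances_per_host}


if __name__ == "__main__":
    print(ft_budget(80))
```

For 80 VMs this gives 8 hosts, 8 FT VMs and 2 FT instances per host, i.e. 1 Primary + 1 Secondary on each host.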
46 Branch or remote sites
Some small sites may not warrant their own vCenter, and have no expertise to manage it either. Consider vSphere Essentials Plus ROBO edition. You need 10 sites for the best financial return, as it is sold in units of 10.
Features that are network heavy should be avoided. Auto Deploy means sending around 150 MByte. If the link is 10 Mbit shared, it will add up.
Best practices: Install a copy of the template at the remote site. If not, use OVF as it is compressed. Increase vCenter Server and vSphere host timeout values to ~3 hours. Consider manual vCenter agent installs prior to connecting ESXi hosts. Use RDP/SSH instead of Remote Console for VM console access. If absolutely needed, reduce remote console displays to smaller values, e.g. 800x600/16-bit.
vCenter 5.1 improvement over 4.1 on remote ESX: Use the web client if vCenter is remote, as it uses less bandwidth. No other significant changes.
Certain vCenter operations involve a heavier payload, e.g. Add Host, vCenter agent upgrades, HA enablement, Update Manager based host patching.
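To see how the Auto Deploy payload "adds up" on a shared WAN link, a rough transfer-time estimate helps. The sketch below assumes 1 MByte = 8 Mbit, ignores protocol overhead, and uses a hypothetical `usable_fraction` parameter for the share of the link actually available.

```python
def transfer_seconds(size_mbyte: float, link_mbit: float, usable_fraction: float = 1.0) -> float:
    """Seconds to move size_mbyte over a link of link_mbit per second."""
    return size_mbyte * 8 / (link_mbit * usable_fraction)


if __name__ == "__main__":
    # One ESXi host booting via Auto Deploy over a dedicated 10 Mbit link.
    print(transfer_seconds(150, 10), "seconds")
    # Same boot when only half of the shared link is available.
    print(transfer_seconds(150, 10, usable_fraction=0.5), "seconds")
```

That is 2 minutes per host at best and 4 minutes at half the link, per boot, before counting any other traffic on the same link.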
47 Server Design ESXi Host
48 Approach
General guidelines as at Q3 2013: Use 2 sockets, 16-20 cores, Xeon 2820 with 128 GB RAM. For large VMs, use 4 sockets, 40 cores, Xeon 4820 with 256 GB RAM. 8 GB RAM per core. A 12-core ESX box should have 96 GB. This should be enough to cater for VMs with large RAM.
Considerations when deciding the size of the ESXi host:
Look at the overall cost, not just the cost of the ESX host: cost of network equipment, cost of management, power cost, space cost. A larger host can take larger VMs or more VMs per host.
Think of the cluster, not 1 ESX host, when sizing the ESXi host. The cluster is the smallest logical building block in this Pod approach. Plan for 1 fiscal year, not just the next 6 months. You should buy hosts per cluster. This ensures they are from the same batch. Standardising the host spec makes management easier.
Know the #VM you need to host and their size. This gives you an idea of how many ESX hosts you need.
Define 2 VM sizes: Common and Large. If your largest VM needs >8 cores, go for a >8-core pCPU. Ideally, a VM should fit inside a socket to minimise the NUMA effect. This happens in the physical world too. If your largest VM needs 64 GB of RAM, then each socket should have 72 GB, as I consider RAM overhead. Note that extra RAM = slower boot, as ESXi creates a swap file that matches the RAM size. You can use reservation to reduce this, so long as you use “% based” in the Cluster setting. The ESXi host should be >2x the largest VM.
Decide: Blade or Rack or Converged.
Decide: IP or FC storage. If you use Converged, then it’s either NFS or iSCSI.
49 ESXi Host: CPU
CPU performance has improved drastically, something like 1800%. No need to buy the highest-end CPU as the premium is too high. Use the savings to buy more hosts instead, unless: the # of hosts is becoming a problem; you need to run high-performance single-threaded workloads; you need to run more VMs per host.
The 2 tables below show VMmark results. The first table shows the improvement from 2005-2010, based on VMmark 1.x. The second table shows 2010 to May 2013, based on VMmark 2.x.
50 ESXi Host: CPU Sizing
Buffer the following:
Agent VM or vmkernel module: Hypervisor-based firewall such as vShield App. Hypervisor-based IDS/IPS such as TrendMicro Deep Security. vSphere Replication. Distributed Storage.
HA event. Performance isolation. Hardware maintenance.
Peak: month end, quarter end, year end.
Future requirements within the same fiscal year.
DR, if your cluster needs to run VMs from the Production site.
The table below is before we take HA into account, so it is purely from a performance point of view. When you add the HA host, the day-to-day ratio will drop, so the utilisation will be lower as you have a “spare” host.
Doing 2 vCPU per 1 physical core is around 1.6x over-subscription, as there is the benefit of Hyper-Threading.
Tier | vCPU Ratio | VM Ratio | Total vCPU (2 sockets, 16 cores) | Average VM size
Tier 1 | 2 vCPU per core | 5:1 | 32 vCPU | 32/5 = 6.4 vCPU each
Tier 2 | 4 vCPU per core | 10:1 – 15:1 | 64 vCPU | 64/10 = 6.4 vCPU each
Tier 3 or Dev | 6 vCPU per core | 20:1 – 30:1 | 96 vCPU | 96/30 = 3.2 vCPU each
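The sizing table follows directly from cores per host, the vCPU ratio and the VM ratio. A minimal Python sketch (the function name is hypothetical) reproduces the three rows for a 16-core host:

```python
def tier_sizing(cores_per_host: int, vcpu_per_core: int, vms_per_host: int):
    """Return (total vCPU per host, average vCPU per VM) for one tier."""
    total_vcpu = cores_per_host * vcpu_per_core
    return total_vcpu, total_vcpu / vms_per_host


if __name__ == "__main__":
    # (tier, vCPU per core, VMs per host) as used in the table's last column.
    for tier, ratio, vms in (("Tier 1", 2, 5), ("Tier 2", 4, 10), ("Tier 3 or Dev", 6, 30)):
        total, average = tier_sizing(16, ratio, vms)
        print(f"{tier}: {total} vCPU per host, {average:.1f} vCPU per VM on average")
```

Changing `cores_per_host` to 20 or 40 shows how the same ratios scale on the larger hosts mentioned earlier in the deck.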
51 UNIX X64 migration: Performance Sizing When migrating from UNIX to X64, we can use industry standard benchmark where both platforms participate. Benchmarks like SAP and SPEC are established benchmark, so we can easily get data from older UNIX machines (which are common source of migration as they have reached 5 years and hence have high maintenance cost). Based on SPEC-int2006 rate benchmark results published July 2012: HP Integrity Superdome (1.6GHz/24MB Dual-Core Intel Itanium 2) 128 cores SPEC-int2006 rate result: 1650 Fujitsu / Oracle SPARC Enterprise M9000 256 cores SPEC-int2006 rate result: 3150 IBM Power 780 (3.44 GHz, 96 core) SPEC-int2006 rate result: 3520 IBM result per core is higher than X64 as it uses MCM module. In Power series CPU and Software are priced at core basis, not socket. Bull Bullion E7-4870 (160 cores - 4TB RAM) SPEC-int2006 rate result : 4110 Sizing of RAM, Disk and Network are much easier as we can ignore the speed/generation. We simply match it. For example, if the UNIX apps need 7000 IOPS and 100 GB of RAM we simply match it. The higher speed of RAM is a bonus. With Flash and SSD, IOPS is no longer concern. The vCPU is the main factor as UNIX partition can be large (e.g. 48 cores), and we need to reduce the vCPU.
52 ESXi Host: RAM sizing
How much RAM? It depends on the # of cores in the previous slide. It is not so simple anymore, as each vendor is different. An 8 GB DIMM is cheaper than 2x 4 GB DIMM. 8 GB per core, so 12 cores means around 96 GB.
Consider the channel best practice. Don’t leave some channels empty; populating them all brings the benefit of memory interleaving. Check with the server vendor on the specific model. Some models now come with 16 slots per socket, so you might be able to use a lower DIMM size. Some vendors, like HP, have similar prices for 4 GB and 8 GB.
Dell R710 has 18 DIMM slots (?)
IBM x3650 M3 has 18 DIMM slots
HP DL 360/380 G8 has 24 DIMM slots
HP DL 380 G7 and BL490cG6/G7 have 18 DIMM slots
Cisco has multiple models. B200 M3 has 24 slots.
VMkernel has the Home Node concept in NUMA systems. For ideal performance, fit a VM within 1 CPU-RAM “pair” to avoid the “remote memory” effect: # of vCPUs + 1 <= # of cores in 1 socket (so running a 5 vCPU VM on a quad-core will force a remote memory situation), and VM memory <= memory of one node.
Turn on Large Pages, especially for Tier 1. This needs application-level support.
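The two NUMA rules on this slide can be written as a simple check. This sketch only encodes the slide's rule of thumb; it does not query a real host, and the function and parameter names are hypothetical.

```python
def fits_numa_node(vcpus: int, vm_mem_gb: float, cores_per_socket: int, node_mem_gb: float) -> bool:
    """True if the VM satisfies both rules: vCPUs + 1 <= cores and memory <= node memory."""
    return vcpus + 1 <= cores_per_socket and vm_mem_gb <= node_mem_gb


if __name__ == "__main__":
    # The slide's example: a 5 vCPU VM on a quad-core socket does not fit.
    print(fits_numa_node(vcpus=5, vm_mem_gb=16, cores_per_socket=4, node_mem_gb=64))
    # A 6 vCPU, 48 GB VM on an 8-core socket with 64 GB per node fits.
    print(fits_numa_node(vcpus=6, vm_mem_gb=48, cores_per_socket=8, node_mem_gb=64))
```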
53 ESXi Host: IO & Management
IO requirements will increase in 2014. The table provides an estimate. It is a prediction based on tech previews or VMworld 2012. Actual results may vary. Converged Infrastructure needs high-bandwidth IO cards. I personally prefer 4x 10 GE NIC. Not supported: mixing hardware iSCSI and software iSCSI.
Management: Lights-out management, so you don’t have to be in front of the physical server to do certain things (e.g. go into the CLI as requested by VMware Support). Make sure the hardware agent is properly configured. It is very important to monitor hardware health due to the many VMs in 1 box.
PCI slot on the motherboard: Since we are using 8 Gb FC HBAs, make sure the physical PCI-E slot has sufficient bandwidth. A single dual-port FC HBA makes more sense if the saving is high and you need the slot. But there is a risk of bus failure. Also, double check to ensure the chip can handle the throughput of both ports. If you are using blades, and have to settle for a single 2-port HBA (instead of two 1-port HBAs), then ensure the PCI slot has bandwidth for 16 Gb. When using a dual-port HBA, ensure the chip & bus in the HBA can handle the peak load of 16 Gb.
Estimated ESXi IO bandwidth in early 2014:
Purpose | Bandwidth | Remarks
VM | 4 Gb | For ~20 VM. vShield Edge VM needs a lot of bandwidth as all traffic passes through it.
FT | 10 Gb | Based on VMworld 2012 presentation.
Distributed Storage | 10 Gb | Based on Tech Preview in 5.1 and Nutanix.
vMotion | 8 Gb | vSphere 5.1 introduces shared-nothing live migration. This increases the demand as the vmdk is much larger than vRAM. Include multi-NIC vMotion for faster vMotion when there are multiple VMs to be migrated.
Management | 1 Gb | Copying a powered-off VM to another host without a shared datastore takes this bandwidth.
IP Storage | 6 Gb | NFS or iSCSI. Not the same as the Distributed Storage, as DS is not serving VMs. No need for 10 Gb as the storage array is likely shared by 10-50 hosts. The array may only have 40 Gb total for all these hosts.
vSphere Replication | 1 Gb | Should be sufficient as the WAN link is likely the bottleneck.
Total | 40 Gb |
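The total in the table, and the preference for 4x 10 GE NICs, can be checked with a short Python sketch. The figures are the table's estimates; the names are hypothetical, and the NIC count covers raw bandwidth only, with no allowance for redundancy.

```python
import math

# Estimated bandwidth per traffic type, in Gb, copied from the table.
BANDWIDTH_GB = {
    "VM": 4,
    "FT": 10,
    "Distributed Storage": 10,
    "vMotion": 8,
    "Management": 1,
    "IP Storage": 6,
    "vSphere Replication": 1,
}


def total_bandwidth(estimates: dict) -> int:
    """Sum of all per-purpose bandwidth estimates."""
    return sum(estimates.values())


def nics_needed(total_gb: int, nic_gb: int = 10) -> int:
    """Minimum number of NIC ports that covers the total bandwidth."""
    return math.ceil(total_gb / nic_gb)


if __name__ == "__main__":
    total = total_bandwidth(BANDWIDTH_GB)
    print(f"Total: {total} Gb, 10 GE ports needed: {nics_needed(total)}")
```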
54 Large: Sample ESXi host specification Estimated Hardware Cost: US$ 10K per ESXi. Configuration included in the above price: 2 Xeon X5650. The E series has different performance & price attributes 128 GB RAM 4x 10 GE NIC. No HBA 2x 100 GB SSD. Swap to host-cache feature in ESXi 5 Running agent VM that is IO intensive Could be handy during troubleshooting. Only need 1 HD as it’s for troubleshooting purpose. No installation disk. We will use auto-deploy, except for management cluster. Light-Out Management. Avoid using WoL. Uses IPMI or HP iLO. Costs not yet included LAN switches. Around S$15 K for a pair of 48-port GE switch (total 96 ports) SAN switches.
55 Blade or Rack or Converged Both are good. Both have pro and cons. Table below is relative comparison, not absolute. Consult principal for specific model. Below is just for guidelines. Comparison below is only for vSphere purpose. Not for other use case, say HPC or non VMware. There is a 3 rd choice, which is converged infrastructure. Example is Nutanix. BladeRack Relative Advantages Some blades come with built-in 2x 10 GE port. To use it, you just need to get 10 GE switch. Less cabling, less problem. Easier to replace a blade Better power efficiency. Better rack space efficiency. Better cooling efficiency. The larger fan (4 RU) is better than the small fan (2 RU) used in rack Some blade can be stateless. The management software can clone 1 ESX to another. Better management, especially when you have many ESXi hosts. Typical 1RU rack server normally comes with 4 built-in ports. Better suited for <20 ESX per site More local storage Relative Disadvantages More complex management, both on Switch and Chassis. Proprietary too. Need to learn the rules of the chassis/switches. Positioning of the switch matters in some model. Some blade virtualise the 10 GE NIC and can slice it. This adds another layer + complexity. Some replacement or major upgrade may require entire chassis to be powered off Some have 2 PCI slots only. Might not support if you need >20 GE per ESXi. Best practice recommends 2 enclosures. The enclosure is passive, it does not contain electronic. There is initial cost as each chassis/enclosure needs to have 2 switches. Ownership of the SAN/LAN switches in the chassis needs to be made clear. The common USB port in the enclosure may not be accessible by ESX. Need to check with respective blade vendor. USB dongle (which you should not use) can only be mounted in front. Make sure it’s short enough that you can still close the rack door. The 1 RU rack server has very small fan, which is not as good as larger fan. 
Less suited when each DC is big enough to have 2 chassis Cabling & rewiring
56 ESXi boot options
4 methods of ESXi boot:
Need installation: Local Compact Flash. Local Disk. SAN Boot.
No need for installation: LAN Boot (PXE) with Auto Deploy.
Auto Deploy: Environments with >30 ESXi should consider Auto Deploy. Best practice is to put the IT Cluster on non-Auto Deploy. An ideal ESX is just pure CPU and RAM. No disk, no PCI card, no identity. Auto Deploy is also good for environments where you need to prove to the security team that your ESXi has not been tampered with (you can simply boot it and it is back to “normal”). Centralised image management. Consideration when the Auto Deploy infrastructure is also on VMs: keep the IT Cluster on local install.
Advantages of Local Disk over SAN boot: No SAN complexity. With SAN boot, you need to label the LUNs properly.
Disadvantages of Local Disk compared with SAN boot: Need 2 local disks, mirrored. Certain organisations do not like local disks. A disk is a moving part, with lower MTBF. SAN boot saves power/cooling.
57 Storage Design
58 Methodology Most app team will not know their IOPS and Latency requirement. Make it as part of Storage Tiering, so they consider the bigger picture Turn on Storage IO Control Storage IO Control is per datastore. If underlying LUN shares spindles with all other LUN, then it may not achieve the result. Consult with storage vendor on this as they have entire array visibility/control. SLA Datastore Cluster VM inputMappingMonitor Define the standard (Storage Driven Profile) Map each VM to each datastore Create another DS if insufficient (either capacity or performance) See next slide for detail For each VM, ask the owner to choose: Capacity (GB) Which Tier they want to buy. Let them decide as they know their own app Define the Datastore profile. Map Cluster to Datastore Cluster
59 SLA: 3 Tier pools of Storage Profile
Create 3 Tiers of Storage with Storage DRS. This becomes the type of Storage Pool presented to clusters or VMs. Implement VASA so the profiles are automatically presented and compliance checks can be performed. This paves the way for standardisation. Choose 1 size for each Tier. Keep it consistent. Choose an easy number (e.g. 1000 vs 800). Tier 3 is also used in non-production.
Map the ESX Cluster tier to the Datastore Tier. If a VM is on a Tier 1 Production cluster, then it will be placed on a Tier 1 Datastore, not a Tier 2 datastore. The strict mapping reduces the #paths drastically.
Example: Based on the Large Cloud scenario. Small Cloud will have a simpler and smaller design. Snapshot means protected with array-level snapshot for fast restore. VMDKs larger than 1.5 TB will be provisioned as RDM. RDM will be used sparingly. Virtual-compatibility mode is used unless the App team says otherwise. Tier 2 and 3 can have large Datastores as replication is done at the vSphere layer. Interface will be FC for all. This means Storage vMotion can be done with VAAI.
Consult the storage vendor for array-specific design. I don’t think the array has a Shares & Reservation concept for IOPS. The array can’t guarantee or control latency per Tier.
Tier | Price | Min IOPS | Max Latency | RAID | RPO | RTO | Size | Limit | Replicated Method | Array Snapshot | # VM
1 | 4x/GB | 6000 | 10 ms | 10 | 15 minute | 1 hour | 2 TB | 70% | Array level | Yes | ~10 VM. EagerZeroedThick
2 | 2x/GB | 4000 | 20 ms | 10 | 2 hour | 4 hour | 3 TB | 80% | vSphere level | No | ~20 VM. Normal Thick
3 | 1x/GB | 2000 | 30 ms | 10 | 4 hour | 8 hour | 4 TB | 80% | vSphere level | No | ~30 VM. Thin Provision
60 Arrangement within an array
Below is a sample diagram, showing disk grouping inside an array. The array has 48 disks. Hot Spares are not shown for simplicity. This example only has 1 RAID Group (2+2) for simplicity.
Design considerations: Datastore 1 and Datastore 2 performance can impact one another, as they share physical spindles. Each datastore spans 16 spindles. IOPS is only 2800 max (based on 175 IOPS for a 15K RPM FC disk). Because of RAID, the effective IOPS will be lower. The only way they don’t impact each other is if there are “Share” and “Reservation” concepts at the “meta slice” level. Datastore 3, 4, 5, 6 performance can impact one another. DS 1 and DS 3 can impact each other since they share the same Controller (or SP). This contention happens if the shared component becomes a bottleneck (e.g. cache, RAM, CPU). The only way to prevent it is to implement “Share” or “Reservation” at SP level.
For Storage IO Control to be effective, it should be applied to all datastores sharing the same physical spindles. So if we enable it for Datastore 3, then Datastore 4, 5, 6 should be enabled too.
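The 2800 IOPS ceiling, and why RAID lowers it, can be sketched in Python. The raw figure is the slide's (16 spindles at 175 IOPS each). The effective figure uses the common write-penalty formula with a RAID 10 penalty of 2 and a 50% write ratio; both are assumptions taken from the RAID slide later in this deck, and the function names are hypothetical.

```python
def raw_iops(spindles: int, iops_per_spindle: int = 175) -> int:
    """Aggregate back-end IOPS of the spindles behind a datastore."""
    return spindles * iops_per_spindle


def effective_iops(raw: float, write_pct: int, write_penalty: int) -> float:
    """Front-end IOPS the spindles can sustain once each write costs write_penalty IOs."""
    read_pct = 100 - write_pct
    return raw * 100 / (read_pct + write_pct * write_penalty)


if __name__ == "__main__":
    raw = raw_iops(16)
    print("Raw:", raw)
    print("Effective at 50% write on RAID 10:", round(effective_iops(raw, 50, 2)))
```

Since Datastore 1 and Datastore 2 share the same 16 spindles, this ceiling is shared between them, not available to each.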
61 Storage IO Control Storage IO Control is at the Datastore level There is no control at RDM level. ??? But array normally share spindles. In the example below, the array has 3 volumes. Each volume is configured the same way. Each has 32 spindles in RAID10 configuration (8 units of 2+2 disk groups). There are non vSphere sharing the same spindles Best practices Unless the array has “Shares” or “Reservation” concept, then avoid sharing spindles between each Storage Profile. Datastore ADatastore B SIOC
62 Storage DRS, Storage IO Control and physical array
The array is not aware of VMs inside VMFS. It only sees LUNs. Moving a VM from 1 datastore to another will look like a large IO operation to the array. One LUN will decrease in size, while the other one increases drastically.
With an array capable of auto-tiering: VMware recommends configuring Storage DRS in Manual Mode with the I/O metric disabled. Use Storage DRS for its initial placement and out-of-space avoidance features.
Whitepaper on Storage DRS interoperability with Storage Technologies: http://www.vmware.com/resources/techresources/10286
Feature or Product | Initial Placement | Migration Recommendations
Array-based replication (SRDF, MirrorView, SnapMirror, etc) | Supported | Moving a VM from one datastore to another can cause a temporary lapse in SRM protection (?) and increase the size of the next replication transfer.
Array-based snapshots | Supported | Moving a VM from one datastore to another can cause an increase in space usage in the destination LUN, so the snapshot takes longer.
Array-based Dedupe | Supported | Moving a VM from one datastore to another can cause a temporary increase in space usage, so the dedupe takes longer.
Array-based thin provisioning | Supported | Supported on VASA-enabled arrays only [e1: reason??]
Array-based auto-tiering (EMC FAST, Compellent Data Progression, etc) | Supported | Do not use IO-based balancing. Just use Space-based.
Array-based I/O balancing (Dell Equallogic) | n/a as it is controlled by the array | Do not use IO-based balancing. Just use Space-based.
63 RAID type
In this example, I’m using just RAID10. Generally speaking, I see a rather high Write ratio (around 50%). RAID5 will result in higher cost, as it needs more spindles. More spindles give the impression that we have enough storage. It is difficult to say no to a request when you don’t have a storage issue. More spindles also mean you’re burning the environment more.
vCloud Suite introduces additional IOPS outside the guest OS: A VM boot results in writing the vRAM to disk. Storage vMotion and Storage DRS. Snapshot.
Mixing RAID5 and RAID10: This increases complexity. RAID5 was used for capacity. But nowadays each disk is huge (2 TB). I’d go for mixing SSD and Disk, rather than mixing RAID types. So it is: SSD RAID 10 for performance & IOPS. Disk RAID 10 for capacity. I’d go for just 2 tiers instead of 3. This minimises the movement. Each movement costs both a read and a write.
Sample below is based on 150 IOPS per spindle. Need to achieve 1200 IOPS.
RAID Level | # Disks required (20% Write) | # Disks required (80% Write)
6 | 16 | 40
5 | 13 | 27 (nearly 2X of RAID10)
10 | 10 | 14
RAID Type | Write IO Penalty
5 | 4
6 | 6
10 | 2
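The disk counts in the table follow from the write penalty: each front-end write costs several back-end IOs, so the spindles must deliver reads + writes x penalty. The Python sketch below uses the slide's inputs (1200 IOPS target, 150 IOPS per spindle, penalties of 4, 6 and 2); the function names are hypothetical. The slide's figures appear to be rounded to the nearest disk, so the sketch does the same.

```python
WRITE_PENALTY = {"RAID 5": 4, "RAID 6": 6, "RAID 10": 2}


def backend_iops(frontend_iops: float, write_pct: int, penalty: int) -> float:
    """Back-end IOs per second needed to serve the front-end load."""
    reads = frontend_iops * (100 - write_pct) / 100
    writes = frontend_iops * write_pct / 100
    return reads + writes * penalty


def disks_required(frontend_iops: float, write_pct: int, raid: str,
                   iops_per_disk: int = 150) -> float:
    """Unrounded number of spindles needed for the given RAID type."""
    return backend_iops(frontend_iops, write_pct, WRITE_PENALTY[raid]) / iops_per_disk


if __name__ == "__main__":
    for raid in ("RAID 6", "RAID 5", "RAID 10"):
        low = disks_required(1200, 20, raid)
        high = disks_required(1200, 80, raid)
        print(f"{raid}: {round(low)} disks at 20% write, {round(high)} disks at 80% write")
```

A conservative design would round up with `math.ceil` instead, which turns the RAID 10 figure at 80% write into 15 disks and the RAID 5 figure into 28.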
64 Cluster Mapping: Host to Datastore Always know which ESX cluster mounts what datastore cluster Keep the diagram simple. Main purpose is to communicate with other team. Not too many info. The idea is to have a mental picture that they can understand If your diagram has too many lines, too many datastores, too many clusters, then it is too complex. Create a Pod when such thing happens. Modularisation makes management and troubleshooting easier.
65 Mapping: Datastore Replication
66 Type of Datastores
Types of datastore:
Business VM: Tier 1 VM, Tier 2 VM, Tier 3 VM, Single VM. Each Tier may have multiple datastores.
IT VM
Staging VM: From the P2V process, or moving from Non-Prod to Prod.
Isolated VM
Template & ISO
Desktop VM: Mostly local datastore on ESXi host, backed by SSD.
SRM Placeholder
Datastore Heartbeat: Pro: Dedicated DS so we don’t accidentally impact it while offlining a datastore. Cons: another 2 DS to manage per cluster. Increases scanning time. Can we use the SRM placeholder as heartbeat?
Always know where a key VM is stored. Datastore corruption, while rare, is possible.
1 datastore = 1 LUN. Relative to “1 LUN = Many VMFS”, it gives better performance due to fewer SCSI reservations.
Other guides: Use Thin Provisioning at array level, not ESX level. Separate Production and Non-Production. Add a process to migrate into Prod. You can’t guarantee Production performance if VMs move in and out without control. RAID level does not matter so much if the array has sufficient cache (battery-backed, naturally). Keep 20% free capacity for VM swap files, snapshots, logs, thin volume growth, and Storage vMotion (inter-tier).
67 Special Purpose Datastore
1 low-cost datastore for ISO and Templates: Need 1 per vCenter data center. Need 1 per physical Data Center, else you will transfer GBs of data across the WAN. Around 1 TB. ISO directory structure:
\ISO\
\OS\Windows
\OS\Linux
\Non OS\ (store things like anti virus, utilities, etc)
1 staging/troubleshooting datastore: To isolate a VM. Proof to the Apps team that the datastore is not affected by other VMs. For storage performance study or issue. Makes it easier to correlate with data from the array. The underlying spindles should have enough IOPS & size for the single VM. Our sizing: Small Cloud: 1 TB. Large Cloud: 1 TB.
1 SRM Placeholder datastore: So you always know where it is. Sharing with another datastore may confuse others. Used in SRM 5 to place the VMs’ metadata so it can be seen in vCenter. 10 GB is enough. Low performance.
68 Storage Capacity Planning
Theory and Reality can differ. Theory is the initial, high-level planning you do. Reality is what it is after 1-2 years.
Theory or Initial Planning: For green field deployment, use the Capacity Planner. The info on actual usage is useful as the utilisation can be low. The IOPS info is a good indicator too. For brown field deployment, use the existing VMs as an indicator. If you have virtualised 70%, this 70% will be a good indicator as it’s your actual environment. You can also use rules of thumb, such as:
100 IOPS per normal VM. 100 IOPS per VM is low, but this is a concurrent average. If you have 1000 VM, this will be 100K IOPS.
500 IOPS per database VM
20 GB per C:\ drive (or where you store OS + Apps)
50 GB per data drive for small VM
500 GB per data drive for Database VM
2 TB per data drive for File server
Actual or Reality: Use a tool, such as VC Ops 5.6, for actual measurement. VC Ops 5.6 needs to be configured (read: tailored) to your environment. Create custom groups. For each group, adjust the buffer accordingly. You will need at least 3 groups, 1 per tier. I’d not use a spreadsheet or rules of thumb for a >100 VM environment.
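For a first-pass estimate, the rules of thumb above can be turned into a small calculator. This is a sketch under stated assumptions: one OS drive and one data drive per VM, a file server treated as a normal VM for IOPS, and 2 TB counted as 2000 GB. None of these are stated on the slide, and the names are hypothetical.

```python
# Per-VM rules of thumb: IOPS, OS drive size and data drive size in GB.
RULES = {
    "normal": {"iops": 100, "os_gb": 20, "data_gb": 50},
    "database": {"iops": 500, "os_gb": 20, "data_gb": 500},
    # Assumption: file servers use the normal-VM IOPS figure; 2 TB = 2000 GB.
    "file_server": {"iops": 100, "os_gb": 20, "data_gb": 2000},
}


def estimate(vm_counts: dict) -> tuple:
    """Return (total concurrent IOPS, total capacity in GB) for the given VM mix."""
    iops = sum(RULES[kind]["iops"] * count for kind, count in vm_counts.items())
    capacity_gb = sum(
        (RULES[kind]["os_gb"] + RULES[kind]["data_gb"]) * count
        for kind, count in vm_counts.items()
    )
    return iops, capacity_gb


if __name__ == "__main__":
    print(estimate({"normal": 1000}))
    print(estimate({"normal": 100, "database": 10, "file_server": 2}))
```

As the slide says, this is only suitable for initial planning; beyond roughly 100 VMs, measured data should replace it.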
69 Multi-Pathing
Different protocols have different technologies
  NFS, iSCSI and FC all have different solutions.
  NFS uses a single path for a given datastore. No multi-pathing, so use multiple datastores to spread the load.
In this design, I do not go for a high-end array due to cost
  A high-end array gives Active/Active, so we don’t have to do regular load balancing.
  Most mid-range arrays are Active/Passive (ALUA). Always ensure the LUNs are balanced between the 2 SPs. This is done manually within the array.
Choose an ALUA array instead of plain Active/Passive
  Less manual work on balancing and selecting the optimal path.
  Both controllers can receive IO requests/commands, although only 1 owns the LUN. The path from the managing controller is the optimized path.
  Better utilization of the array storage processors (minimizes unnecessary SP failover).
  vSphere will show both paths as Active, but the preferred one is marked “Active (I/O)”.
  Round Robin will issue IO across all optimized paths and will use non-optimized paths only if no optimized paths are available.
  See http://www.yellow-bricks.com/2009/09/29/whats-that-alua-exactly/
Array Type | My selection
Active/Active | Round Robin or Fixed
ALUA | Round Robin or MRU
Active/Passive | Round Robin or MRU
EMC | PowerPath/VE 5.4 SP2
Dell EqualLogic | EqualLogic MMP
HP/HDS | PowerPath/VE 5.4 SP2?
70 FC: Multi-pathing
VMware recommends 4 paths
  A path is point to point. The switch in the middle is not part of the path as far as vSphere is concerned.
  Ideally, they are all active-active for a given datastore.
  Fixed means 1 path active, 3 idle.
1 zone per HBA port. The zone should see all the target ports.
If you are buying new SAN switches, consider the direction for the next 3 years.
  Whatever you choose will likely be in your data center for the next 5 years.
  If you are buying a Director-class switch, then consider the next 5 years. Upgrading a Director is major work, so plan for 5 years of usage.
  Consider both the EOL and EOSL dates.
  Discuss with the SAN switch vendors and understand their roadmap. 8 Gb and FCoE are becoming common.
Round Robin
  It is per datastore, not per HBA. 1 ESX host typically has multiple datastores. 1 array certainly has multiple datastores. All these datastores share the same SP, cache, ports, and possibly spindles.
  It is active/passive for a given datastore.
  Leave the default setting of 1000. No need to set iooperationslimit=1.
  Choose this over MRU. MRU needs manual fail back after a path failure.
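The point that Round Robin is "active/passive for a given datastore" is easier to see with a toy model. The sketch below is a simplified simulation I wrote for illustration, not how the ESXi path selection plug-in is implemented: it sends I/O down one path at a time and moves to the next path after a fixed number of I/Os (1000 by default, matching the default mentioned on the slide). The function and path names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def simulate_round_robin(paths, total_ios, io_operation_limit=1000):
    """Count I/Os per path when the active path changes every `io_operation_limit` I/Os.

    Simplified model: at any instant exactly one path carries I/O for the datastore.
    """
    counts = Counter({path: 0 for path in paths})
    path_cycle = cycle(paths)
    current = next(path_cycle)
    sent_on_current = 0
    for _ in range(total_ios):
        if sent_on_current == io_operation_limit:
            current = next(path_cycle)  # rotate to the next path
            sent_on_current = 0
        counts[current] += 1
        sent_on_current += 1
    return dict(counts)


if __name__ == "__main__":
    # Four hypothetical paths (2 HBA ports x 2 target ports).
    paths = ["hba1:spa0", "hba1:spb0", "hba2:spa1", "hba2:spb1"]
    print(simulate_round_robin(paths, total_ios=4000))
```

With many datastores on a host, each rotating independently, the aggregate load still spreads across all paths, which is why the default of 1000 is usually left alone.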
71 FC: Zoning & Masking
Implement zoning
  Do it before going live, or during a quiet maintenance window, due to the high risk potential.
  1 zone per HBA port. 1 HBA port does not need to know about the existence of the others.
  This eliminates Registered State Change Notifications (RSCN) between initiators.
Use soft zoning, not hard zoning
  Hard zone: zone based on the SAN switch port. Any HBA that connects to this switch port gets this zone. So this is more secure. But be careful when recabling things into the SAN switch!
  Soft zone: zone based on the HBA port. The switch port is irrelevant.
  Situations that need rezoning in a soft zone: changing an HBA, replacing an ESX server (which comes with a new HBA), upgrading an HBA.
  Situations that need rezoning in a hard zone: reassigning the ESX host to another zone, port failure in the SAN switch.
  Virtual HBA can further reduce cost and offer more flexibility.
Implement LUN masking
  It complements zoning. Zoning is about path segregation; masking is about access.
  Mask on the array, not on each ESXi host. Masking done at the ESXi host level is often based on controller, target, and LUN numbers, all of which can change with the hardware configuration.
72 FC: Zoning & Masking
See the figure. There are 3 zones.
  Zone A has 1 initiator and 1 target. A single-initiator zone is good.
  Zone B has two initiators and targets. This is bad.
  Zone C has 1 initiator and 1 target.
Both SAN switches are connected via an Inter-Switch Link.
If Host X reboots and its HBA in Zone B logs out of the SAN, an RSCN will be sent to Host Y’s initiator in Zone B. This causes all I/O going to that initiator to halt momentarily and recover within seconds.
Another RSCN will be sent to Host Y’s initiator in Zone B when Host X’s HBA logs back in to the SAN, causing another momentary halt in I/O.
Initiators in Zone A and Zone C are protected from these events because there are no other initiators in these zones.
Most recent SAN switches provide RSCN suppression methods. But suppressing RSCNs is not recommended, since RSCNs are the primary way for initiators to determine that an event has occurred and to act on it, such as loss of access to targets.
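The Zone A/B/C example can be checked mechanically. The Python sketch below is an illustration under my own assumptions: the zone layout mirrors the slide, but the initiator and target names are hypothetical, and the model only captures "an RSCN reaches the other initiators that share a zone", not real fabric behaviour.

```python
# Hypothetical zone layout mirroring the slide: A and C are single-initiator, B is not.
ZONES = {
    "A": {"initiators": ["hostW_hba0"], "targets": ["array_spa0"]},
    "B": {"initiators": ["hostX_hba1", "hostY_hba1"], "targets": ["array_spa1", "array_spb1"]},
    "C": {"initiators": ["hostZ_hba0"], "targets": ["array_spb0"]},
}


def rscn_recipients(zones, initiator):
    """Other initiators that share a zone with `initiator` and so see its login/logout."""
    affected = set()
    for members in zones.values():
        if initiator in members["initiators"]:
            affected.update(i for i in members["initiators"] if i != initiator)
    return affected


def multi_initiator_zones(zones):
    """Zones that break the 1-zone-per-HBA-port rule."""
    return sorted(name for name, members in zones.items()
                  if len(members["initiators"]) > 1)


if __name__ == "__main__":
    print(rscn_recipients(ZONES, "hostX_hba1"))  # Host Y's initiator is disturbed
    print(multi_initiator_zones(ZONES))
```

Running a check like `multi_initiator_zones` against an exported zone configuration is a cheap way to spot zones that need splitting before go-live.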
73 Large: Reasons for FC (partial list)
A network issue does not create a storage issue
  Troubleshooting storage does not mean troubleshooting the network too.
FC vs IP
  The FC protocol is more efficient and scalable than the IP protocol for storage.
  Path failover is <30 seconds, compared with <60 seconds for iSCSI.
Lower CPU cost
  See the chart. FC has the lowest CPU hit to process the IO, followed by hardware iSCSI.
Storage vMotion is best served with 10 GE
FC considerations
  Need SAN skills. Troubleshooting skills, not just Install/Configure/Manage.
  Need to be aware of WWWWW. This can impact upgrades later on, as a new component may not work with an older component.
74 Large: Backup with VADP
1 backup job per ESX host, so the impact to production is minimized.
75 Backup Server
A backup server is an "I/O machine"
  By far, the majority of work done is I/O related.
  Performance of disk is key.
  A fast internal bus is key. Multiple internal buses are desirable.
  No shared path: 1 port from ESX (source) and 1 port to tape (target).
  Lots of data comes in from clients and goes out to disk or tape.
  Not much CPU usage. 1 socket of 4-core Xeon 5600 is more than sufficient.
  Not much RAM usage. 4 GB is more than enough.
But deduplication uses CPU and RAM
  Deduplication relies on the CPU to compare segments (or blocks) of data to determine if they have been previously backed up or if they are unique.
  This comparison is done in RAM. Consider 32 GB RAM (64-bit Windows).
Size the concurrency properly
  Too many simultaneous backups can actually slow the overall backup speed.
  Use a backup policy to control the number of backups that occur against any datastore. This minimizes the I/O impact on the datastore, as it must still serve production usage.
2 ways of backing up:
  Mount the VMDK file as a virtual disk (with a drive letter). The backup software can then browse the directory.
  Mount the VM as an image file.
76 Network Design
77 Methodology
Plan how VXLAN and SDN impact your architecture
Define how vShield will complement your VLAN-based network
Decide if you will use 10 GE or 1 GE
  I’d go for 10 GE for the Large Cloud example.
  If you use 10 GE, define how you will use Network IO Control.
Decide if you will use IP storage or FC storage
Decide the vSwitch to use: local, distributed, Nexus
Decide when to use Load Based Teaming
Select blade or rack mount
  This has an impact on NIC ports and switches.
Define the detailed design with the vendor of choice
78 VXLAN
Complete isolation of the network layer
  Overlay networks are isolated from each other and from the physical network.
Separation of the virtualization and network layers
  The physical network has no knowledge of the virtual networks.
  Virtual networks are spun up automatically as needed for VDCs.
Loss of visibility, as all overlay traffic is now UDP tunneled
  Can’t isolate virtual network traffic from the physical network.
  Virtual networks can have overlapping address spaces.
  Today’s network management tools are useless in VXLAN environments.
79 Network Architecture (still VLAN-based, not vCNS-based)
80 ESXi Network configuration
81 Design Consideration
Design considerations for 10 GE
  We only have 2-4 physical ports. This means we only have 1-2 vSwitches.
  Some customers have gone with 4 physical ports, as 20 GE may not be enough for both storage and network.
Distributed Switch relies on vCenter
  Database corruption on vCenter will impact it.
  vCenter availability becomes more critical.
Use Load Based Teaming
  This prevents one burst from impacting Production. For example, a large vMotion can send a lot of traffic.
Some best practices
  Enable jumbo frames.
  Disable STP on ESXi-facing ports on the physical switch.
  Enable PortFast mode on ESXi-facing ports.
  Do not use DirectPath IO, unless there is real proof that the app needs it.
82 Network IO Control
2x 10 GE is much preferred to 12x 1 GE
  10 GE ports give flexibility. For example, vMotion can exceed 1 GE when the physical cable is not used by other traffic.
  But a link failure means losing 10 GE.
  External communication can still be 1 GE. Not an issue if most communication is among VMs.
Use ingress traffic shaping to control the traffic types into the ESX host?

Shares / Bandwidth (per pNIC) | Function | vShield | Remarks
20% | VM – Production; VM – Non Production; VM – Admin Network; VM – Backup LAN (agent) | Yes | A good rule of thumb is ~8 VMs per Gigabit. The Admin Network is used for basic network services like DNS server and AD server. Use vShield App to separate it from Production. This complements the existing VLANs; no need to create more VLANs. The Infra VMs are not connected to the Production LAN; rather, they are connected to the Management LAN.
10% | Management LAN: VMware Management; VMware Cluster Heartbeat | No | In some cases, the Nexus Control & Nexus Packet need to be physically separated from Nexus Management.
20% | vMotion | No | Non-routable, private network
15% | Fault Tolerance | No | Non-routable, private network
0 – 10% | VM – Troubleshooting | Yes | Same as Production. Used when we need to isolate the networking performance.
5% | Host-Based Replication | No? | Only for ESXi hosts that are assigned to do vSphere Replication. From a throughput point of view, if the inter-site link is only 100 Mb, then you only need 0.1 GE max.
20% | Storage | Yes |
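Shares only matter under contention, and they are relative to whichever traffic types are active at that moment. The sketch below shows that arithmetic for one 10 GE pNIC using the share values from the table (taking Troubleshooting at its 10% upper bound). The traffic-type keys and function name are illustrative, and this models proportional sharing only; it does not model limits or bursts.

```python
LINK_GBPS = 10.0

# Share values from the table above; keys are illustrative names.
SHARES = {
    "vm": 20,
    "management": 10,
    "vmotion": 20,
    "fault_tolerance": 15,
    "troubleshooting": 10,  # table gives 0 - 10%; upper bound assumed here
    "replication": 5,
    "storage": 20,
}


def bandwidth_under_contention(active, shares=SHARES, link_gbps=LINK_GBPS):
    """Split the link among the *active* traffic types in proportion to their shares."""
    total = sum(shares[traffic] for traffic in active)
    return {traffic: link_gbps * shares[traffic] / total for traffic in active}


if __name__ == "__main__":
    # Everything busy at once: vMotion is held to its proportional slice.
    print(bandwidth_under_contention(list(SHARES)))
    # Only vMotion and storage are busy: they split the whole link.
    print(bandwidth_under_contention(["vmotion", "storage"]))
```

This is why vMotion can use far more than its nominal 20% when the link is otherwise idle, which is the flexibility argument for 10 GE made above.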
83 Large: IP Address scheme
The example below is based on 1500 server VMs and 10000 desktop VMs, which is around 125 ESX and 125 ESX respectively.
Do we separate the network between Server and Desktop farms?
Since we are using the x.x.x.1 address, the basic network address (gateway) will be on x.x.x.254.

Purpose | IP Address | Total Segments | Remarks
ESX iLO | 1 per ESX | 1 | Out-of-band management & console access
ESX Mgmt | 1 per ESX | 1 |
ESX iSCSI | 1-2 per ESX | 1 | Need 2 (1 address per active path) if we don’t use LBT and do static mapping
ESX vMotion | 2 per ESX | 1 | Multi-NIC vMotion
ESX FT | 1 | 1 | Cannot multi-path?
Agent VMs | 5 per ESX | 3 | vShield App, TrendMicro DS, Distributed Storage, etc.
Mgmt VMs | 1 per DC | 1 | vCenter, SRM, Update Manager, vCloud, etc. Group in 20 so similar VMs have sequential IP addresses, easier to remember

Address | ESXi #001 | ESXi #125 | Remarks
iLO | 10.10.10.1 | 10.10.10.125 | 10.10.10.x for Server farm. Enough for 254 ESX. 10.10.11.x for Desktop farm. 10.10.12.x for non ESX (e.g. network switch, array, etc.)
Mgmt | 10.10.13.1 | 10.10.13.125 | 10.10.13.x for Server farm. Enough for 254 ESX. 10.10.14.x for Desktop farm
iSCSI | 10.10.15.1, 10.10.16.1 | 10.10.15.125, 10.10.16.125 | This is for ESX only. Other devices should be on 10.10.17.x. VSA will have many addresses when it scales beyond 3 ESX.
vMotion | 10.10.17.1, 10.10.18.1 | 10.10.17.125, 10.10.18.125 |
Fault Tolerance | 10.10.19.1 | 10.10.19.125 |
Agent VMs | 10.10.20.1, 10.10.21.1, 10.10.22.1 | 10.10.20.125, 10.10.21.125, 10.10.22.125 |
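The benefit of this scheme is that the last octet always equals the ESXi host number, so every address can be derived rather than looked up. Here is a minimal Python sketch for the Server farm. The /24 masks and the 1 to 253 host range (leaving .254 for the gateway) are my assumptions; the subnets are the ones in the table.

```python
import ipaddress

# Server-farm subnets from the table; /24 masks are an assumption.
SERVER_FARM_SUBNETS = {
    "iLO": ["10.10.10.0/24"],
    "Mgmt": ["10.10.13.0/24"],
    "iSCSI": ["10.10.15.0/24", "10.10.16.0/24"],
    "vMotion": ["10.10.17.0/24", "10.10.18.0/24"],
    "Fault Tolerance": ["10.10.19.0/24"],
    "Agent VMs": ["10.10.20.0/24", "10.10.21.0/24", "10.10.22.0/24"],
}


def host_addresses(host_number, subnets=SERVER_FARM_SUBNETS):
    """Return every address for ESXi host #N; the last octet is the host number."""
    if not 1 <= host_number <= 253:  # .254 is reserved for the gateway
        raise ValueError("host number must be between 1 and 253")
    return {
        purpose: [str(ipaddress.ip_network(net).network_address + host_number)
                  for net in nets]
        for purpose, nets in subnets.items()
    }


if __name__ == "__main__":
    for purpose, addresses in host_addresses(125).items():
        print(f"{purpose:16} {', '.join(addresses)}")
```

The same function can feed a host-profile or scripted-install answer file, so nobody types these addresses by hand.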
84 Security & Compliance Design
85 Areas to consider
Source & tools
  vSphere hardening guide
  VMware Configuration Manager
  Other industry requirements like PCI-DSS
Take advantage of vCNS
  Changing the paradigm in security: from “Hypervisor as another point to secure” to “Hypervisor to give unfair advantage for security team”.
  vShield App for firewall and vShield Endpoint for anti-virus (only Trend Micro had the product as of Sep 2011).
  No need to throw away the physical firewall first. Complement it by adding “object-based” rules that follow the VM.
VM: Guest OS, vmdk & Prevent DoS, Log review, VMware Tools
Server: Lockdown mode, Firewall, SSH, Log review
Storage: Zoning and LUN masking, VMFS & LUN, iSCSI CHAP, NFS storage
Network: VLAN & PVLAN, Management LAN, No air gap with vShield, Virtual appliance
Management: vSphere roles, Separation of duty
86 Separation of Duties with vSphere
VMware Admin vs AD Admin
  In a small setup, it’s the same person doing both.
  The AD Admin has access to NTFS. This can be too powerful if it has data.
Segregate the virtual world
  Split vSphere access into 3: Storage, Server, Network.
  Give Network to the Network team. Give Storage to the Storage team.
  A role with all access to vSphere should be rarely used.
  The VM owner can be given some access that they don’t have in the physical world. They will like the empowerment (self service).
[Figure: roles in the Enterprise IT space and the vSphere space. Labels: VMware Admin, Networking Admin, Server Admin, Storage Admin, Operator, VM Owner, MS AD Admin, Storage Admin, Network Admin, DBA, Apps Admin]
87 Folder
Use it properly
  Do not use Resource Pools to organise VMs.
  Caveat: the Host/Cluster view + VM is the only view where you can see both ESX hosts and VMs.
Study the hierarchy on the right
  It is Folder everywhere.
  Folder is the way to limit access.
  Certain objects don’t have their own access control. They rely on folders. E.g. you cannot set permissions directly on a vNetwork Distributed Switch. To set permissions, create a folder on top of it.
88 Storage related access
A non-Storage Admin should not have the following access
  Initiate Storage vMotion
  Rename or Move Datastore
  Create
  Low level file operations
Different ways of controlling access
  Network level. The ESXi host will not be able to access the array at all, as it can’t even see it on the network.
  Array level. Control which ESXi hosts can or cannot see the storage.
    For iSCSI, we can configure per target using CHAP.
    For FC, we can use Fibre Channel zoning or LUN masking.
  vCenter level. Using vCenter permissions (folder level or datastore level). Most granular.
89 Network related access
The Server Admin should not have the following access
  Move network (this can be a security concern)
  Configure network
  Remove network
The Server Admin should have
  Assign network (to assign a network to a VM)
90 Roles and Groups
Create new groups for vCenter Server users. Avoid using MS AD built-in groups or other existing groups.
Do not use the default user “Administrator” in any operation
  Each vCenter plug-in should have its own user, so you can differentiate among all the plug-ins.
  Disable the default user “Administrator”.
  Use your own personal ID. The idea is that security should be traceable to an individual.
  Do not create another generic user (e.g. VMware Admin). This defeats the purpose, and is practically no different from “Administrator”.
  Creating a generic user increases the risk of sharing, since it has no personal data.
Create 3 roles (not users) in MS AD
  Network Admin
  Storage Admin
  Security Admin
Create a unique ID for each of the vSphere plug-ins that you use
  SRM, Update Manager, Chargeback, CapacityIQ, vShield Zone, Converter, Nexus, etc.
  E.g. SRM Admin, Chargeback Admin.
  This is the ID that the product will use to log in to vCenter. This is not the ID you use to log in to the product. Use your personal ID for that purpose.
  This helps in troubleshooting. Otherwise there are too many “Administrator” entries and you are not sure who they _really_ are.
  Also, if the Administrator password has to change, you don’t have to change it everywhere.
91 VM Design
92 Standard VM sizing: Follow McDonald
1 VM = 1 App = 1 purpose. No bundling of services.
  Having multiple applications or services in 1 OS tends to create more problems. The Apps team knows this better.
Start with the Small size, especially for CPU & RAM.
  Use as few virtual CPUs (vCPUs) as possible.
    CPU count impacts the scheduler, hence performance.
    Hard to take back once you give them. Also, the app might be configured to match the processor count (you will not know unless you ask the application team).
    Maintaining a consistent memory view among multiple vCPUs consumes resources.
    There is a licensing impact if you assign more CPUs. vSphere 4.1 multi-core can help (always verify with the ISV).
    Virtual CPUs that are not used still consume timer interrupts and execute the idle loops of the guest OS.
    In the physical world, CPU tends to be oversized. Right-size it in the virtual world.
  RAM
    RAM starts at 1 GB, not 512 MB. Patches can be large (330 MB for XP SP3) and need RAM.
    Size impacts vMotion, ballooning, etc., so you want to trim the fat.
    Tier 1 Cluster should use Large Pages.
Anything above XL needs to be discussed case by case.
Utilise Hot Add to start small (needs DC edition).
See speaker notes for more info.

Item | Small VM | Medium VM | Large | Custom
CPU | 1 | 2 | 4 | 8 – 32
RAM | 1 GB | 2 GB | 4 GB | 8, 12, 16 GB, etc.
Disk | 50 GB | 100 GB | 200 GB | 300, 400, etc. GB
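The "start small" rule can be expressed as picking the smallest standard size that satisfies a request and pushing everything else into a case-by-case discussion. A minimal Python sketch follows, using the three standard sizes from the table; the function name and the "fits on all three dimensions" rule are my assumptions.

```python
# (name, vCPU, RAM in GB, disk in GB) from the sizing table, smallest first.
STANDARD_SIZES = [
    ("Small", 1, 1, 50),
    ("Medium", 2, 2, 100),
    ("Large", 4, 4, 200),
]


def pick_size(vcpu, ram_gb, disk_gb):
    """Return the smallest standard size that covers the request, else 'Custom'."""
    for name, max_vcpu, max_ram_gb, max_disk_gb in STANDARD_SIZES:
        if vcpu <= max_vcpu and ram_gb <= max_ram_gb and disk_gb <= max_disk_gb:
            return name
    return "Custom"  # needs a case-by-case discussion


if __name__ == "__main__":
    print(pick_size(vcpu=1, ram_gb=1.5, disk_gb=40))
    print(pick_size(vcpu=8, ram_gb=16, disk_gb=300))
```

A request that lands on "Custom" because of one dimension only (for example disk) is a good prompt to ask the application team whether the other dimensions were padded too.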
93 SMP and UP HAL
Does not apply to recent OSes such as Windows Vista, Win7, Win2008.
Design principle
  Going from 1 vCPU to many is OK. Windows XP and Windows Server 2003 automatically upgrade to the ACPI Multiprocessor HAL.
  Going from many to 1 is not OK.
To change from 1 vCPU to 2 vCPU
  Must change the kernel to SMP.
  "In Windows 2000, you can change to any listed HAL type. However, if you select an incorrect HAL, the computer may not start correctly. Therefore, only compatible HALs are listed in Windows Server 2003 and Windows XP. If you run a multiprocessor HAL with only a single processor installed, the computer typically works as expected, and there is little or no affect on performance." http://support.microsoft.com/default.aspx?scid=kb;EN-US;811366
  Steps to change: http://support.microsoft.com/kb/237556/
To change from many vCPUs to 1
  The steps are simple, but MS recommends a reinstall.
  “In this scenario, an easier solution is to create the image on the ACPI Uniprocessor computer.”
  http://kb.vmware.com/kb/1003978
  http://support.microsoft.com/kb/309283
94 MS Windows: Standardisation
Data Center edition is cheaper on >6 VMs per box.
MS licensing is complex. The table below may not apply in your case.
Source: http://www.microsoft.com/windowsserver2008/en/us/hyperv-calculators.aspx
Licensed per VM: 10 VMs means 10 licences.
Licensed per 4 VMs: 10 VMs means 3 licences.
Licensed per socket: 2 sockets means 2 licences. Unlimited VMs per box.
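The three licence counts above follow simple arithmetic, sketched below in Python. The function names are mine, and no prices are included because the slide gives none; to find your own break-even point, multiply each count by the price you are actually quoted.

```python
import math


def licences_per_vm(vm_count):
    """One licence for every VM."""
    return vm_count


def licences_per_four_vms(vm_count):
    """One licence covers up to 4 VMs, so round up."""
    return math.ceil(vm_count / 4)


def licences_per_socket(socket_count):
    """One licence per physical socket, regardless of VM count."""
    return socket_count


if __name__ == "__main__":
    vms, sockets = 10, 2
    print(licences_per_vm(vms), licences_per_four_vms(vms), licences_per_socket(sockets))
```

Only the per-socket count stays flat as consolidation ratios rise, which is the reasoning behind standardising on it for dense hosts.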
95 Guest OS
Use 64-bit if possible
  Access to > 3 GB RAM.
  The performance penalty is generally negligible, or even negative.
  In a Linux VM, Highmem can show significant overhead with 32-bit. 64-bit guests can offer better performance.
  Workloads with a large memory footprint will benefit more from 64-bit guests.
  Some Microsoft & VMware products have dropped support for 32-bit.
  Increased scalability in the VM. Example, for Update Manager 4:
    If it is installed on 64-bit Windows, it can concurrently scan 4000 VMs. But if it’s installed on 32-bit, the concurrency drops to 200.
    Powered‐on Windows VM scans per VUM server is 72.
    Most other numbers are not as drastic as the above example.
Disable unnecessary devices in the Guest OS
Choose the right SCSI controller
Set the right IO timeout
  On a Windows VM, increase the value of the SCSI TimeoutValue parameter to allow Windows to better tolerate delayed I/O resulting from path failover.
For Windows VMs, stagger the anti-virus scans. Performance will degrade significantly if you scan all VMs simultaneously.
97 vCenter
Run vCenter Server as a VM
vCenter Server VM best practices:
  Disable DRS on all vCenter VMs. Move them to the first ESXi host in your farm.
  Always remember where you run your vCenter. Remember both the host name and IP address of that first ESXi host.
  Start in this order: Active Directory, DNS, vCenter DB, vCenter.
  Set HA to high priority.
Limitations
  Windows patching of the vCenter VM can’t be done via Update Manager.
  Can’t cold clone the VM. Use hot clone instead.
  A VM-level operation that requires the VM to be powered off can be done via ESX. Log in directly to the ESXi host that has the vCenter VM. Make the changes, then boot the VM.
  Not connected to the Production LAN. Connect to the Management LAN, so VLAN trunking is required as vSwitches are shared (assuming you do not have a dedicated IT Cluster).
Security
  Protect the special-purpose local vSphere administrator account from regular usage. Instead, rely on accounts tied to specific individuals for clearer accountability.
Other configuration
  Keep the Statistics Level at Level 1, but use vCenter Operations to complement it. Level 3 is a big jump in terms of data collected.
98 Naming convention
Object | Standard | Examples | Remarks
Data center | Purpose | Production | This is the virtual data center in vCenter. Normally, a physical data center has 1 or many virtual data centers. As you will only have a few of these, no need to create a cryptic naming convention. Avoid renaming it.
Cluster | Purpose | | As above.
ESXi host name | esxi_locationcode_##.domain.name | esxi_SGP_01.vmware.com, esxi_KUL_01.vmware.com | Don’t include the version no. as it may change. No space.
VM | Project_Name Purpose ## | Intranet WebServer 01 | Don’t include the OS name. Can include spaces.
Datastore | Environment_type_## | PROD_FC_01, TEST_iSCSI_01, DEV_NFS_01, Local_ESXname_01 | Type is useful when we have multiple types. If you have 1 type but multiple vendors, you can use the vendor name (EMC, IBM, etc.) instead. Prefix all local datastores with Local so they are separated easily in the dialog boxes.
“Admin ID” for plug-ins | ProductName-Purpose | VCOps-Collector, Chargeback- | All the various plug-ins to vSphere need Admin access.
Folder | | |
Avoid special characters, as you (or other VMware and 3rd party products or plug-ins) may need to access them programmatically.
If you are using VC Ops to manage multiple vCenters, then the naming convention should ensure names are unique across vCenters.
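Because these names will be accessed programmatically, it helps to check them programmatically as well. The Python sketch below validates ESXi host and datastore names against the standards in the table. The exact patterns are my assumptions (a 3-letter upper-case location code, a 2-digit sequence number, and the environment and type values taken from the examples); adjust them to your own standard.

```python
import re

# Assumed pattern for esxi_locationcode_##.domain.name
ESXI_HOST = re.compile(r"esxi_[A-Z]{3}_\d{2}\.[A-Za-z0-9.-]+")

# Assumed pattern for Environment_type_## plus the Local_ESXname_## form
DATASTORE = re.compile(
    r"(?:(?:PROD|TEST|DEV)_(?:FC|iSCSI|NFS)|Local_[A-Za-z0-9-]+)_\d{2}"
)


def is_valid_esxi_host(name):
    """True if the whole name matches the ESXi host standard."""
    return ESXI_HOST.fullmatch(name) is not None


def is_valid_datastore(name):
    """True if the whole name matches the datastore standard."""
    return DATASTORE.fullmatch(name) is not None


if __name__ == "__main__":
    for host in ["esxi_SGP_01.vmware.com", "esxi SGP 01.vmware.com"]:
        print(host, is_valid_esxi_host(host))
    for datastore in ["PROD_FC_01", "Local_ESXname_01", "prod-fc-1"]:
        print(datastore, is_valid_datastore(datastore))
```

A check like this can run as part of a provisioning script, so a non-conforming name is rejected before it spreads into monitoring and chargeback tools.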
99 vCenter Server: HA
Many vSphere features depend on vCenter
  Distributed Switch
  Auto-Deploy
  HA (management)
  DRS and vMotion
  Storage vMotion
  Licensing
Many add-ons depend on vCenter
  vShield
  vCenter SRM
  VCM
  vCenter Operations
  vCenter Chargeback
  vCloud Director
  View + Composer
Implement vCenter Heartbeat
  Automated recovery from hardware, OS, application and network failures
  Awareness of all vCenter Server components
  Only solution fully supported by VMware
  Can protect the database (SQL Server) and vCenter plug-ins: View Composer, Update Manager, Converter, Orchestrator
100 vMA: Centralised Logging
Benefits
  Ability to search across ESX hosts
  Convenience
Best practices
  One vMA per 100 hosts with vilogger
  Place the vMA on the Management LAN
  Use a static IP address, FQDN and DNS
  Limit the use of resxtop (used for real-time troubleshooting, not monitoring)
Enable remote system logging for targets
  vilogger (enable/disable/updatepolicy/list)
  Rotation default is 5
  Maximum file size defaults to 5 MB
  Collection period is 10 seconds
  ESX/ESXi log files go to /var/log/vmware/
  vpxa logs are not sent to syslog
See KB1017658