Presentation on theme: "VCAP-DCD, TOGAF Certified"— Presentation transcript:
1 Software Defined Datacenter: Sample Architectures based on vCloud Suite 5.1
Iwan 'e1' Rahabok (VCAP-DCD, TOGAF Certified)
Singapore, Q2 2013
linkedin.com/in/e1ang | tinyurl.com/SGP-User-Group
This is a sample architecture, not The architecture. There is no one size fits all; it all depends.
2 Purpose of This Document
Use it like a book, not a slide deck.
There is a lot of talk about private cloud, but what does it look like at the technical level? How do we really assure SLAs and offer 3 tiers of service? If I'm a small company with just 50 servers, what does my architecture look like? If I have 2000 VMs, what does it look like?
For existing VMware customers, I go around and do a lot of "health checks" at customer sites. The No. 1 question is around design best practice, so this doc serves as a quick reference for me; I can pull a slide from here for discussion.
I am an employee of VMware, but this is my personal opinion. Please don't take it as an official, formal VMware recommendation; I'm not authorised to give one. Also, we should generally judge the content rather than the organisation or person behind it. A technical fact is a technical fact, regardless of whether an intern said it or a 50-year IT engineer said it.
Technology changes. 10 Gb Ethernet, Flash, SSD disks, FCoE, Converged Infrastructure, SDN, NSX, storage virtualisation, etc. will impact the design. A lot of new innovation is coming within the next 2 years, and some of it was already featured at VMworld. New modules/products from VMware's ecosystem partners will also impact the design.
This is a guide, not a Reference Architecture, let alone a Detailed Blueprint. Don't print it and follow it to the dot; it is for you to think with and tailor. It is written for hands-on vSphere Admins who have attended the Design Workshop and ICM; a lot of the design considerations are covered in the vSphere Design Workshop. It complements vCAT 3.0. You should be at least a VCP 5, preferably a VCAP-DCD 5. There is no explanation of features; sorry, it's already >100 slides.
With that, let's have a professional* discussion.
* Not an emotional, religious or political discussion. Let's not get angry over technical stuff; it is not worth your health.
Folks, some disclaimers: this deck builds on the vSphere Design Workshop. I've attended the course and highly recommend it.
3 Table of Contents
- Introduction: Architecting in vSphere; applications with special consideration; Requirements & Assumptions; Design Summary
- vSphere Design: Datacenter (Datacenter, Cluster: DRS, HA, DPM, Resource Pool)
- vSphere Design: Server (ESXi, physical host)
- vSphere Design: Network
- vSphere Design: Storage
- vSphere Design: Security & Compliance (vCenter roles/permissions, config management)
- vSphere Design: VM
- vSphere Design: Management (see this deck)
- Disaster Recovery (see this deck)
- Additional Info: contact me for the Appendix slides
Refer to Module 1 and Module 2 of the vSphere Design Workshop; I go straight into more technical material here. Topics only cover items that have major design impact; non-design items are not covered. The focus is vCloud Suite 5.1 (infrastructure); application specifics (e.g. databases) are not covered. Some slides have speaker notes with details.
Specifically, the focus here is the vSphere layer, not the vCloud Director layer. It is important to get the foundation right. For vCD, I recommend vCAT.
5 vCloud Suite Architecture: what I consider
Architecting a vSphere-based DC is very different from architecting a physical data center:
- It breaks best practice, as virtualisation is a disruptive technology that changes the paradigm. Do not apply physical-world paradigms to the virtual world. Many "best practices" in the physical world are caused by physical-world limitations; once the limitation is removed, the best practice is no longer valid.
- Adopt emerging technology, as virtualisation is still innovating rapidly. Best practice means proven practice, and that might mean outdated practice.
- Consider unrequested requirements, as the business expects cloud to be agile. You have experienced VM sprawl, right?
My personal principle: do not design something you cannot troubleshoot. A good IT Architect does not set up potential risk for the Support person down the line. I tend to keep things simple and modular. Cost will go up a bit, but it is worth the benefits.
What I consider in a vSphere-based architecture:
No 1: Upgradability
This is unique to the virtual world, and a key component of cloud that people have not talked about much. After all my apps run on the virtual infrastructure, how do I upgrade the virtualisation layer itself? How do you upgrade SRM? Based on historical data, VMware releases a major upgrade every 2 years. Your architecture will likely span 3 years, so check with your VMware rep for an NDA roadmap presentation.
No 2: Debug-ability
Troubleshooting in a virtual environment is harder than in a physical one, as boundaries are blurred and physical resources are shared. There are 3 types of troubleshooting:
- Configuration. This does not normally happen in production: once it is configured, it is not normally changed.
- Stability. Something hangs, crashes (BSOD, PSOD, etc.) or is corrupted.
- Performance.
This is the hardest among the 3, especially if the slow performance is short-lived and the app performs well most of the time. This is why the design has an extra server and storage, so we can isolate some VMs while doing joint troubleshooting with the App team.
No 3: Supportability
This is related to, but not the same as, Debug-ability. Support relates to things that make day-to-day support easier: monitoring counters, reading logs, setting up alerts, etc. For example, centralising logs via syslog and providing intelligent search improves supportability.
A good design makes it harder for the Support team to make human errors. Virtualisation makes tasks easy, sometimes way too easy relative to the physical world. Consider this operational/psychological impact in your design.
Physical-world thinking or best practice: a lot of it does not make sense in the virtual world. Architecting a virtual DC based on the physical-DC school of thought will result in an inferior design.
Supportability also means using components that are supported by the vendors. This should be obvious, as we should not deploy unsupported configurations. For example, SAP support starts from certain versions onward.
vSphere 4.0 was released in May 2009; 5.0 in Sep 2011.
6 vCloud Suite Architecture: what I consider
Cost
You will notice that the "Small" design example has a lot more limitations than the "Large" design. An even bigger cost is ISV licensing. Some vendors, like Oracle, charge for the entire cluster; dedicating a cluster to them is cheaper. The DR site serves 3 purposes to reduce cost. VMs from different business units are mixed in 1 cluster: if they can share the same production LAN and SAN, the same reasoning applies to the hypervisor. Windows, Linux and Solaris VMs are mixed in 1 cluster; in a large environment, separate them to maximise your OS licences. DMZ and non-DMZ are mixed in 1 cluster.
Security & Compliance
The vSphere Security Hardening Guide splits security into 3 levels: Production, DMZ and SSLF. Prod and Non-Prod don't share the same cluster, storage or network: it is easy to make a mistake, and easy to move in and out of the Production environment. Production is more controlled and secure; Non-Prod may spike (e.g. during load testing).
Availability
Software has bugs; hardware has faults. We cater mostly for hardware faults. What about software bugs? I try to cater for software bugs, which is why the design has 2 VMware clusters with 2 vCenters. This lets you test cluster-related features in one cluster while keeping your critical VMs on the other. Cluster sizing is always based on 1 host failure; in a small cluster the overhead can be high (50% in a 2-node cluster).
Reliability
Related to availability, but not the same. Availability is normally achieved by redundancy; reliability is normally achieved by keeping things simple, using proven components, separating things, and standardising. For example, the solution for the Small design is simpler (a lot fewer features relative to the Large design). It also uses 1 vSwitch for 1 purpose, as opposed to one big vSwitch with many port groups and a complex NIC fail-over policy. You will notice a lot of standardisation in all 3 examples. The drawback of standardisation is overhead, as we have to round up to the next bracket.
A VM with 24 GB RAM ends up getting 30 GB.
Specialized Security Limited Functionality (SSLF) recommendations are applicable to specialized environments that have some unique aspect that makes them especially vulnerable to sophisticated attacks. Recommendations at this level might result in loss of functionality. For example, VMware recommends separate vSwitches for management and data (the mgmt vSwitch handles vMotion, management, heartbeat, and IP storage).
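The two rules of thumb above, the 1-host-failure overhead in a small cluster and the round-up-to-the-next-bracket cost of standardisation, can be sketched in a few lines of Python. The bracket sizes here are illustrative assumptions chosen to match the 24 GB to 30 GB example, not VMware sizing guidance:

```python
def ha_overhead(n_hosts: int, host_failures: int = 1) -> float:
    """Fraction of cluster capacity held back for HA failover."""
    return host_failures / n_hosts

def round_up_to_bracket(ram_gb: int, brackets=(4, 8, 16, 30, 64)) -> int:
    """Round a VM's RAM request up to the next standard size bracket."""
    for size in brackets:
        if ram_gb <= size:
            return size
    raise ValueError("request exceeds the largest standard size")

print(ha_overhead(2))           # 0.5 -> 50% of a 2-node cluster is failover reserve
print(ha_overhead(8))           # 0.125 -> the overhead shrinks as the cluster grows
print(round_up_to_bracket(24))  # 30 -> a 24 GB request lands in the 30 GB bracket
```

The point of the sketch: failover overhead falls as 1/N, which is why very small clusters pay a steep availability tax, while standardisation overhead is bounded by the gap between adjacent brackets.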
7 vCloud Suite Architecture: what I consider
Performance (1 and Many)
2 types:
- How fast can we do 1 transaction? Latency and clock speed matter here.
- How many transactions can we do within SLA? Throughput and scalability matter here.
Storage, network, VMkernel, VMM, Guest OS, etc. are all considered. We aim for <1% CPU Ready Time and near-zero memory ballooning in Tier 1. In Tier 3 we can, and should, have higher ready time and some ballooning, as long as it still meets the SLA. Some techniques to address performance: add an ESXi host, add a cluster, add spindles, etc. This includes both horizontal and vertical scaling, and both hardware and software.
Skills of the IT team
Especially the SAN vs NAS skill; this is more important than the protocol itself. Skills include both internal and external (a preferred vendor who complements the IT team). In a Small/Medium environment it is impossible to be expert in all areas; consider complementing the internal team by establishing a long-term partnership with an IT vendor. A purely transactional vendor relationship saves cost initially, but in the long run there is a cost.
Existing environment
How does the new component fit into the existing environment? E.g. adding a new Brand A server into a data center full of Brand B servers needs to take into account management and compatibility with common components. Most customers do not have a separate network for DR tests; in other words, they test their DR in the production network.
Improvement
Besides meeting current requirements, can we improve things? Almost all companies need more servers, especially in non-production, so when virtualisation happens we get VM sprawl. As such, the design has headroom. Move toward "1 VM, 1 OS, 1 App": in the physical world some servers serve multiple purposes; in the virtual world they can afford to, and should, run 1 app per VM.
vSphere Essentials is not used, as it can't be scaled to a higher edition or more ESXi hosts. In some cases it is still a viable choice.
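The <1% CPU Ready target above can be checked against vCenter's real-time statistics, which report CPU Ready as milliseconds accumulated over a 20-second sample interval, summed across the VM's vCPUs. A minimal conversion sketch (the 20-second interval and per-vCPU handling are the usual vCenter defaults, stated here as assumptions):

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    # Ready time as a percentage of the total CPU time available to the VM
    # during the sample interval (interval_s seconds x 1000 ms x vCPU count).
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0

print(cpu_ready_percent(150.0))            # 0.75 -> inside the 1% Tier-1 target
print(cpu_ready_percent(600.0, vcpus=2))   # 1.5  -> breaching the target
```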
8 First Things First: the applications
Your cloud's purpose is to run apps. We must know what types of VMs we are running, as they impact the design or operations.
Microsoft NLB (network load balancing; typical apps: IIS, VPN, ISA)
- VMware recommends multicast mode. Needs its own port group, and that port group needs Forged Transmits allowed (as NLB changes the MAC address).
MSCS
- Consider Symantec VCS instead, as it has none of the restrictions listed here.
- Needs FC; iSCSI, NFS and FCoE are not supported. Also, the array must be explicitly certified on vSphere.
- Needs an anti-affinity rule (Host-to-VM mapping, not VM-VM, as VMware HA does not obey VM-VM affinity rules). As such, needs 4 nodes in a cluster.
- Needs RAM to be 100% reserved. Impacts the HA slot size if you use default settings.
- Disk has to be eagerzeroedthick, so it is full-size. Thin provisioning at the array will not help, as all the disk blocks are zeroed.
- Needs 2 extra NIC ports per ESXi host for heartbeat.
- Needs RDM disks in physical compatibility mode, so the VM can't be cloned or converted to a template.
- vMotion is not supported as at vSphere 5. This is not due to the physical RDM.
- Impacts ESXi upgrades, as the ESXi version must be the same across hosts.
- With native multipathing (NMP), the path policy can't be Round Robin.
- Impacts SRM 5. It works, but needs scripting. Preferably keep the same IP, so create a stretched VLAN if possible.
Microsoft Exchange
- It uses Microsoft NLB. If you need CCR (clustered continuous replication), then you need MSCS.
Oracle software
- Oracle charges per cluster (or sub-cluster, if you configure host-VM affinity). I'm not 100% sure whether Oracle still charges per cluster if we do not configure automatic vMotion for the VM (so just active/passive HA, like the physical world, with DRS set to manual for this VM). It looks like they will charge per host in this case, based on their document dated 13 July 2010, but the interpretation from Gartner is that Oracle charges for the entire cluster.
Apps licensed per cluster
- Similar to Oracle.
I'm not aware of any other apps.
Apps that are not supported
- While ISVs support VMware in general, they may only support certain versions. SAP, for example, only supports from SAP NetWeaver 2004 (SAP Kernel 6.40) and only on Windows and Linux 64-bit (not on Solaris, for example).
Unicast mode works seamlessly with all routers and Layer 2 switches. However, this mode induces switch flooding, a condition in which all switch ports are flooded with Network Load Balancing traffic, even ports to which servers not involved in Network Load Balancing are attached. Since all hosts in the cluster have the same IP address and the same MAC address, no inter-host communication is possible between hosts configured in unicast mode; therefore a second NIC is needed for other host communication. Unicast also requires you to modify the vSwitches in an ugly way.
The switch looks at the source MAC address in the Ethernet frame header in order to learn which MAC addresses are associated with its ports. NLB creates a bogus MAC address and assigns that bogus MAC address to each server in the NLB cluster, a different one per server based on the host ID of the member. This address appears in the Ethernet frame header.
In addition to an initial MAC address, each virtual adapter has an effective MAC address. The effective MAC address filters out incoming network traffic with a destination MAC address different from the effective MAC address. A virtual adapter's effective MAC address and initial MAC address are the same when they are created, but the VM's operating system might alter the effective MAC address to another value at any time. If the VM's operating system changes the MAC address, the operating system can send frames with an impersonated source MAC address at any time.
This allows an operating system to stage malicious attacks on the devices in a network by impersonating a network adapter authorized by the receiving network. System administrators can use virtual switch security profiles on ESX/ESXi to protect against this.
SRM and MSCS
- MSCS needs 2 networks.
- MSCS needs RDM.
- The manual is silent on whether a physical RDM can be vMotioned. It looks like it is possible, else the manual would have said so in the RDM limitations section.
- No partition mapping: RDM requires the mapped device to be a whole LUN; mapping to a partition is not supported.
- RDM uses a SCSI serial number to identify the mapped device. Because block devices and some direct-attach RAID devices do not export serial numbers, they cannot be used with RDMs.
From the Oracle document:
Hard partitioning physically segments a server, by taking a single large server and separating it into distinct smaller systems. Each separated system acts as a physically independent, self-contained server, typically with its own CPUs, operating system, separate boot area, memory, input/output subsystem and network resources. Examples of such partitioning include: Dynamic System Domains (DSD), enabled by Dynamic Reconfiguration (DR); Solaris 10 Containers (capped Containers only); LPAR (adds DLPAR with AIX 5.2); Micro-Partitions (capped partitions only); vPar; nPar; Integrity VM (capped partitions only); Secure Resource Partitions (capped partitions only); Static Hard Partitioning; Fujitsu's PPAR. Oracle VM can also be used as a hard partitioning technology, only as described in the following document.
My analysis:
- Oracle does not acknowledge vSphere as hard partitioning. Their logic is that the VM can use any of the physical cores; the fact that the VM will only use what is configured for it does not matter.
- The document, dated 13 July 2010, does not even mention that VMware has a "Cluster" concept. Most VMware is deployed as a VMware HA or DRS cluster, not just a single ESXi host.
In a single ESXi host, Oracle charges for the entire host. But what about the entire cluster? The document does not mention it. Based on the fact that the VM can't move to another host, by definition this should be hard partitioned. Always get a legally binding statement from your vendor (not an email from the Account Manager) on this.
9 VMs with additional consideration
Peer applications (apps that scale horizontally; example: web servers, app servers)
- They need to run on different ESXi hosts in a cluster, so set up anti-affinity rules. You need to configure this per peer set: if you have 5 sets of web servers from 5 different systems (5 pairs, 10 VMs), you need to create 5 anti-affinity rules. Too many rules create complexity, more so when the number of nodes is less than 4.
Paired applications (apps that protect each other for HA; example: AD, DHCP server)
- As above.
Security VMs or network packet capture tools
- Need another port group to separate the VMs being monitored from those that are not. Need a Distributed vSwitch to turn on port mirroring or NetFlow.
Apps that depend on MAC address for licensing
- Need their own port group. May need MAC Address Changes set to accept.
Apps that hold sensitive data
- Should encrypt the data or the entire file system. vSphere 5 can't encrypt the vmdk file yet. If you encrypt the Guest OS, the backup product may not be able to do file-level backup.
- Should ensure no access by the MS AD Administrators group. Find out how it is backed up, and who has access to the tape. If IT does not even have access to the system, then vSphere may not pass the audit requirement. Check partner products like Intel TXT and HyTrust.
Fault Tolerance requirements
- Impact the HA slot size (if we use it), as FT uses full reservation. Impact Resource Pools: make sure we cater for the (small) VM overhead.
Apps on fault-tolerant hardware
- FT is still limited to 1 vCPU. Consider Stratus to complement vSphere 5.
Certain apps do not support NAS. For example, MS Exchange 2010 does not support the following:
- NAS storage for Exchange files (mailbox database, HT queue, logs)
- Thin virtual disks
- Virtual machine snapshots (what about backups?)
MS TechNet, Understanding Exchange 2010 Virtualization: http://technet.microsoft.com/en-us/library/jj126252
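The "impact HA slot size" point above is worth a worked example. Under the default "host failures" admission-control policy, the slot size is driven by the largest reservation in the cluster, so one fully reserved VM (MSCS, FT) shrinks the slot count for everyone. The host and reservation figures below are hypothetical, and the 32 MHz / 0 MB defaults follow vSphere 5 behaviour:

```python
def slot_counts(hosts, reservations, default_cpu_mhz=32, default_mem_mb=0):
    """Slots per host under vSphere HA 'host failures' admission control.

    hosts: list of (cpu_mhz, mem_mb) capacities.
    reservations: list of (cpu_mhz, mem_mb) per powered-on VM.
    """
    slot_cpu = max([cpu for cpu, _ in reservations] + [default_cpu_mhz])
    slot_mem = max([mem for _, mem in reservations] + [default_mem_mb]) or 1
    return [min(cpu // slot_cpu, mem // slot_mem) for cpu, mem in hosts]

hosts = [(24000, 131072)] * 4           # 4 hosts: 24 GHz CPU, 128 GB RAM each
modest = [(500, 2048), (500, 2048)]     # two VMs with small reservations
with_mscs = modest + [(500, 65536)]     # plus one VM with 64 GB fully reserved

print(slot_counts(hosts, modest))       # [48, 48, 48, 48]
print(slot_counts(hosts, with_mscs))    # [2, 2, 2, 2]
```

One 64 GB reservation collapses each host from 48 slots to 2, which is the practical reason the deck suggests either avoiding default slot sizing or isolating such VMs.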
10 VMs with additional consideration
Apps that require a hardware dongle
- The dongle must be attached to 1 ESXi host; vSphere 4.1 added this support. Best to use a network dongle. At the DR site, the same dongle must be provided too.
Apps with high IOPS
- Need to size properly. There is no point having dedicated datastores if the underlying spindles are shared among multiple datastores.
Apps that use a very large block size
- SharePoint uses a 256 KB block size, so a mere 400 IOPS will already saturate a GE link. For such applications, FC or FCoE is a better protocol. Any application with a 1 MB block size can easily saturate a 1 GE link.
Apps with very large RAM (>64 GB)
- This impacts DRS when an HA event occurs, as there must be a host that can house the VM. It will still boot as long as the reservation is not set to a high number.
Apps that need Jumbo Frames
- Must be configured end to end (Guest OS, port group, vSwitch, physical switch). Not everything supports 9000 bytes, so do a ping test and find the value.
Apps with >95% CPU utilisation in the physical world and a high run queue
- Find out first why it is so high. We should not virtualise an app whose performance characteristics we are blind to.
Apps that are very sensitive to time accuracy
- Time drift is a possibility in the virtual world. Find out the business or technical impact if time deviates by 10 seconds.
A group of apps with a complex power-on sequence and dependencies
- Be aware of the impact on the application during an HA event. If 1 VM is shut down by HA and then powered on, the other VMs in the chain may need a restart too. This should be discussed with the App Owner.
Apps that take advantage of a specific CPU instruction set
- Mixing with an older CPU architecture is not possible. This is a small problem if you are buying new servers. EVC will not help, as it's only a mask.
See speaker notes.
Apps that need < 0.01 ms end-to-end latency
- Use a separate cluster, as the tuning is not suitable for a "normal" cluster.
An ill-behaved application is one that does not use CPU-vendor-recommended methods of detecting features supported on a CPU. The recommended method is to run the CPUID instruction and look for the correct feature bits for the capabilities the application is expected to use. Unsupported methods used by ill-behaved applications include try-catch-fail, or inferring the features present from the CPU version information. When unsupported methods are used, an application might detect features on a host in an EVC cluster that are being masked from the VMs. The CPUID-masking MSRs provided by CPU vendors do not disable the actual features, so an application can still use masked features. If a VM running such an application is then migrated with vMotion to a host that does not physically support those features, the application might fail. VMware is not aware of any commercially available ill-behaved applications. See KB (http://kb.vmware.com/kb/ ) for details.
From virtualgeek: with small-block I/O (like 8K), this is 12,500 IOPS, or roughly the performance of 70 15K spindles. On the other end, a SharePoint VM (or a guest-level backup) tends to do I/O sizes of 256K or larger. With 256K I/O sizes, that's 390 IOPS, or the performance of roughly 2 15K spindles, and likely not enough.
This white paper summarizes our findings and recommends best practices to tune the different layers of an application's environment for similar latency-sensitive workloads. By latency-sensitive, we mean workloads that are optimizing for a few microseconds to a few tens of microseconds of end-to-end latency, not workloads in the hundreds of microseconds to tens of milliseconds of end-to-end latency.
In fact, many of the recommendations in this paper that help with microsecond-level latency can actually end up hurting the performance of applications that are tolerant of higher latency.
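The block-size arithmetic on this slide is simple enough to verify directly: throughput is just IOPS times block size, which is why a few hundred large-block IOPS can saturate a 1 GbE link that comfortably carries thousands of small-block IOPS. A quick sketch:

```python
def throughput_mbit(iops: float, block_kb: float) -> float:
    """Line rate in Mbit/s generated by a given IOPS rate and block size."""
    return iops * block_kb * 1024 * 8 / 1_000_000

# SharePoint-style 256 KB blocks: ~400 IOPS is already near 1 GbE line rate
print(round(throughput_mbit(400, 256)))    # 839 Mbit/s
# The same pipe carries far more small-block I/O before filling up
print(round(throughput_mbit(12500, 8)))    # 819 Mbit/s for 12,500 x 8 KB IOPS
```

This matches the virtualgeek figures quoted above: roughly 12,500 IOPS at 8K or a few hundred IOPS at 256K consume about the same bandwidth.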
11 This entire deck does not cover Mission Critical Applications
The deck focuses on designing a generic platform for most applications. In the 80/20 concept, it focuses on the easiest 80. Special apps have unique requirements. They differ in the following areas:
- Size is much larger, so the S, M, L sizes for VMs or ESXi hosts do not apply to them.
- The VM has unique properties.
- They might get a dedicated cluster.
The picture on the right shows a VM with 12 vCPUs, 160 GB vRAM, 3 SCSI controllers, use of PVSCSI, 18 vDisks and 2 vNICs. This is an exceptional case.
There are basically 2 overall architectures in vCloud Suite 5.1: one for the easiest 80%, and one for the hardest 20%. The management cluster described later applies to both architectures.
Mission Critical Applications (MCA) are of a different nature. They need to be handled one by one, meaning per instance. If there are 5 MCAs and all of them require Oracle 11g, we need to look at it 5 times, not once. Yes, it's the same Oracle 11g, but we need to look at each instance, as they may have different patterns.
Below is an example of things to consider for a database. As you can see, there are a lot of them. Things to consider for MS SQL Server:
- SLAs, RPOs, RTOs
- Baseline the current workload, over at least 1 business cycle
- Baseline the existing vSphere implementation
- Estimated growth rates
- I/O requirements (I/O per sec, throughput, latency)
- Storage (disk type/speed, RAID, flash cache solution, etc.)
- Software versions (vSphere, Windows, SQL)
- Product keys
- Licensing (may determine architecture)
- Workload type (OLTP, Batch, Warehouse)
- Accounts needed for installation / service accounts
- High Availability strategy
- Backup & Recovery strategy
Source: Ntirety presentation at VMworld, titled "Virtualizing SQL Server 2012: Doing IT Right".
12 3 Sizes: Assumptions
Assumptions are needed to avoid the infamous "It depends…" answer. The architecture for 50 VMs differs from that for 500 VMs, which in turn differs from that for 5000 VMs. It is the same vSphere, but you design it very differently. A design for large VMs (20 vCPU, 200 GB vRAM) differs from a design for small VMs (1 vCPU, 1 GB). The workload for an SME is smaller than for a large enterprise: Exchange handling 100 staff vs staff results in a different architecture. A design for a server farm differs from one for a desktop farm.
I provide 3 sizes in this document: 50, 500 and 1500 VMs. The table below shows the definitions. I try to make each choice as real as possible; 3 sizes give you choice and show the reasoning used. Take the closest size to your needs, then tailor it to the specific customer (not the project). Do not tailor to a project, as it is a subset of the entire data center; always architect for the entire datacenter, not a subset.
Size means the size of the entire company or branch, not the size of Phase 1 of the journey. A large company starting small should not use the "Small" option below; it should use the "Large" option but reduce the number of ESXi hosts. I believe in "begin with the end in mind", projecting around 2 years. Longer than 3 years is rather hazy, as private cloud is not fully matured yet; I expect major innovation until 2015. A large company can use the Small Cloud example for a remote office (VSA & ROBO), but this needs further tailoring.
- Small VDC: a small company or remote branch. IT staff: 1-2 persons doing everything. Data center: 1, or none (hosted), or just a corner rack.
- Medium VDC: a medium company. IT staff: 4 persons doing infra, 2 doing desktop, 10 doing apps. Data centers: 2, but no array-level replication.
- Large VDC: a large company. IT staff: different teams for each area, matrix reporting, lots of politics and little kingdoms. Data centers: 2 with private connectivity, plus 5 satellite DCs.
VDC = Virtual Datacenter = Software-Defined Datacenter = Private Cloud. I'm trying to avoid the word Cloud, as the Asia CIO of a regional bank, a man I respect, told me cloud is something in the sky.
What he has is a virtual datacenter. And yes, it is based on vCloud Suite Enterprise.
13 Assumptions, Requirements, Constraints for our Architecture
- # Servers currently. Small: 25 servers, all production. Medium: ~150 servers, 70% production. Large: 700 servers, 55% production.
- # Servers in 2 years. Small: 30 prod + 15 non-prod (50%). Medium: 250 prod + 250 non-prod (100%). Large: 500 prod + 1000 non-prod (200%).
- # Server VMs the design needs to cater for. Small: 50. Medium: 500. Large: 1500.
- # View VMs or laptops. Small: 500, with remote access; no need for offline VDI. Medium: 5000, with remote access; needs offline VDI. Large: 15000, with remote access + 2FA.
- DR requirements: Yes for all sizes.
- Storage expertise. Small: minimal; keeping cost low with IP storage, no SAN. Large: yes; RDM will be used, as some DBs may be large.
- DMZ / SSLF zone. Small: Yes / No. Medium: Yes / No, with the intranet also zoned.
- Backup. Small: disk. Large: tape.
- Network standard. Small: no standard. Large: Cisco.
- ITIL compliance. Small: not applicable. Medium: a few practices in place. Large: some in place, including Change Management (mostly not in place in the smaller environments).
- Overall system management software (BMC, CA, etc.). Small: no. Large: needs tools, including Configuration Management and Capacity Planning.
- Oracle RAC and Oracle software (BEA, DB, etc.): present in the larger environments.
- Audit team. Medium: external. Large: external & internal.
The above is an example of the Assumptions, Requirements and Constraints that dictate our Architecture. It's nice to know they form the first 3 letters of ARChitecture, a reminder for us as Architects to get them right if we are to have the right architecture at the end.
Oracle RAC is now supported on VMware. However, have a good understanding of how to troubleshoot before you virtualise something so critical (I'm assuming you deploy RAC, as plain clustering is not acceptable).
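The 2-year server projections in the table imply steep growth rates, which is worth making explicit when you size headroom. A quick sketch, assuming the "~150" current count is exactly 150 and simple compound growth (both assumptions are mine, not from the deck):

```python
def implied_annual_growth(current: int, in_two_years: int) -> float:
    """Compound annual growth rate implied by a 2-year server projection."""
    return (in_two_years / current) ** 0.5 - 1

# totals from the assumptions table: current count vs prod + non-prod in 2 years
for name, now, later in [("Small", 25, 45), ("Medium", 150, 500), ("Large", 700, 1500)]:
    print(f"{name}: {implied_annual_growth(now, later):.1%} per year")
```

Even the Small example implies roughly a third more servers every year, which is the quantitative case for the "design with headroom" advice earlier in the deck.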
14 3 Sizes: Design Summary
The table below provides the overall comparison, so you can easily see what was taken out in the Small or Medium design. Just like any other design, there is no one perfect answer; for example, you may use FC or iSCSI for Small. This assumes 100% virtualised: it is easier to have 1 platform than 2. Certain things in a company you should only have 1 of (directory, office suite, backup). Something as big as a "platform" should be standardised; that's why it is called a platform. The design for Medium will be in between Small and Large.
- # FT VMs. Small: 0-3 (Prod Cluster only). Large: 0-6 (Prod Cluster only).
- VMware products. Small: vSphere Standard, SRM Standard, vCloud Security & Networking, Horizon View Enterprise, vCenter Operations Standard, vSphere Storage Appliance. Large: vCloud Suite Enterprise, vCenter Server Standard, vCenter Server Heartbeat, Horizon Suite.
- VMware certification & skills. Small: 1 VCP. Large: 1 VCAP-DCA, 1 VCAP-DCD, VMware Mission Critical Support.
- Storage. Small: iSCSI, with vSphere Replication. Large: FC + iSCSI with snapshots, vSphere + array replication.
- Server. Small: 2x Xeon 5650, 72 GB RAM. Large: 2x Xeon (8-10 cores/socket), 128 GB RAM.
- Backup. Small: VMware Data Protection to Array 2. Large: VADP + 3rd party to tape.
Why not FC for the Small/Medium Cloud? For most virtualization environments, NFS and iSCSI provide suitable I/O performance. The comparison has been the subject of many papers and projects. One posted on VMTN is located at:
The general conclusion reached by the above paper is that for most workloads the performance is similar, with a slight increase in ESX Server CPU overhead per transaction for NFS, and a bit more for software iSCSI. For most virtualization environments, the end user might not even be able to detect the performance delta from one VM running on IP-based storage vs. another on FC storage.
15 Other Design Possibilities
What if you need to architect for a larger environment? Take the Large Cloud sample as a starting point; it can be scaled to 10,000 VMs. Above 1000 VMs, you should consider a Pod approach. Upsize it by:
- Adding larger ESXi hosts. I'm using an 8-core socket, based on Xeon. You could use a 10-core Xeon 7500 to fit larger VMs; take note of cost.
- Adding more ESXi hosts to the existing cluster. Keep it to a maximum of 10 nodes per cluster.
- Adding more clusters. For example, you can have multiple Tier 1 clusters.
- Adding fault-tolerant hardware from Stratus. Make this Stratus server a member of the Tier 1 cluster; it appears as 1 ESXi host, although there are 2 physical machines. Stratus has its own hardware, so ensure consistency in your cluster design.
- Splitting the IT datastore into multiple datastores, grouped by function or criticality.
If you are using blade servers and have filled 2 chassis, put the IT Cluster outside the blades and use rack mount. Separating the blades from the servers managing them minimises the chance of human error, as we avoid the "managing itself" complexity.
Migrating inter-cluster: vSphere 5.1 supports live migration between clusters that don't have a common datastore. I don't advocate live migration from/to the Production environment; it should be part of Change Control.
The Large Cloud is not yet architected for vCloud Director. vCloud Director has its own best practices for vSphere design. Adding vCloud + SRM at the DR site requires a proper design by itself, and this deck is already 100+ slides.
iSCSI: performance is relatively similar to FC, but iSCSI can do multi-pathing and has lower CPU cost. Some servers, like HP blades, have built-in hardware iSCSI initiators. Some backup/DR solutions can be achieved cheaply on iSCSI vs FC, enabling low-cost DR via iSCSI.
16 Design Methodology
Architecting a Private Cloud is not a sequential process. There are 8 components: VM, Server, Storage, Network, Security, Data Center, Management and DR, with the Application driving the infrastructure. The components are inter-linked, like a mesh. In the >1000 VM category, where it takes >2 years to virtualise >1000 VMs, new vSphere releases will change the design.
Even the bigger picture is not sequential. Sometimes you may even have to leave Design and go back to Requirements or Budgeting. There is no perfect answer; below is one example. This entire document is about Design only; Operations is another big space. I have not taken into account Audit, Change Control, ITIL, etc.
17 Data Center Design: Data Center, DR, Cluster, Resource Pool
18 Just what is a software-defined datacenter anyway?
[Diagram: a Virtual Datacenter spanning Physical Datacenter 1 and Physical Datacenter 2. Each physical DC provides a Compute Function, Network Function and Storage Function, each backed by 2 vendors.]
Shared-Nothing Architecture:
- Not stretched between the 2 physical DCs. Production might be x.x; DR might be x.x.
- No replication between the 2 physical DCs. Production might be FC; DR might be iSCSI.
- No stretched cluster between the 2 physical DCs. Each site has its own vCenter.
A software-defined datacenter, or virtual DC, differs radically from the physical datacenter that we know. The fundamental difference is that we no longer do the architecture in the physical layer. The physical layer is just there to provide resources. These resources are not aware of one another. The intelligence is in the software, which defines the whole datacenter.
A datacenter consists of 3 functions:
- Compute (normally called server; I don't use "Server" since with Converged Infrastructure a "server" does storage too)
- Network. There are 2 sub-functions here: core network (e.g. routing) and network services (e.g. firewall, DHCP, LB, IDS)
- Storage
Each of these physical functions is supported, or shall I say instantiated into the physical world, by the respective hardware vendors. For compute, you might have Nutanix, HP, IBM, Dell, etc. that you trust and know. I draw 2 vendors to convey the message that they do not define the architecture. They are there to support the Function of that layer (e.g. the Compute Function). So you can have 10 clusters, and 3 could be Vendor A and 7 could be Vendor B.
The same approach is then implemented in Physical Datacenter 2, but without the mindset that it has to be the same vendor. Take the Storage Function for example. You might have Vendor A on Production, and Vendor B on Non-Production. You are no longer bound by hardware compatibility (e.g.
storage replication normally requires the same model and the same protocol). You can do this because the physical datacenters are completely independent of each other. They are not connected and stretched. You might decide to keep the same vendor, but that's for a different reason.
As you can see here, there are very fundamental differences:
- Storage is not replicated.
- Network is not stretched.
- Compute is not stretched.
The Shared-Nothing Architecture, operationally speaking, is the only architecture that guarantees a failure in 1 DC does not propagate to the other DC. In a large datacenter with 1000s of VMs, where different people work on different parts of the datacenter, we need to avoid disaster due to human error. Here is a good read on the Network component:
So how do we achieve DR then? Well, this is where the software comes in. This is a virtual datacenter, so all servers are VMs. A VM is entirely defined in software. In fact, most of a VM's files are stored in 1 folder. This folder can be replicated, then booted in the other physical datacenter.
vSphere 5.1 has built-in host-based replication. It can replicate individual VMs, and provides finer granularity than LUN-based replication. Replication can be done independently of storage protocol (FC, iSCSI, NFS) and vmdk type (thick, thin).
As at vSphere 5.1, here are the main limitations that you need to be aware of:
- Nicira is not yet integrated into vSphere as at Q. Hence true software-defined networking cannot be achieved yet. In the meantime, use solutions such as vCNS or F5 to isolate and create virtual networks on top of the physical network.
- vSphere Replication has an RPO of 15 minutes. So if you need real time, you need array-based replication or an Active/Active application.
19 2 Distinct Layers: Consumer and Producer
Supporting the principle of Consumer and Producer:
- The VM is the Consumer. It does not care about the underlying technology. Its sole purpose is to run the application.
- The DC Infra is the Producer. It provides common services.
The VM is freed from (or independent of) the underlying technology. These technologies can change without impacting the VM:
- Storage protocol (iSCSI, NFS, FC, FCoE)
- Storage file system (NFS, VMFS, VVOL)
- Storage multi-pathing (VMware, EMC, etc.)
- Storage replication
- Network teaming
The Datacenter OS provides a lot of services, such as:
- Security: firewall, IDS, IPS, virtual patching
- Networking: LB, NAT
- Availability: backup, cluster, HA, DR, FT
- Management & monitoring
A lot of agents are removed from the VM, resulting in a simpler server. The separation & abstraction is done by the hypervisor (or DC OS).
The following agents are removed from the VM, and are provided by the Datacenter OS:
- Management (performance, configuration, etc.)
- Anti-virus
- Backup (in most cases)
- Clustering (in most cases)
The added benefit is security. For example, by not allowing the VM to see the NFS server or the iSCSI server, we are not exposing the storage network to the VM. From the overall datacenter point of view, the VM is not trusted. There are 1000s of VMs running in the datacenter, and we cannot guarantee that they are secure, as we don't even have access to the VM. So we need to hide as much as possible from a VM. This is done by removing as many things as possible from inside the VM.
20 Large: A closer look at Active/Active Datacenter
[Diagram: Site 1 runs 250 Prod VMs (Prod Clusters) + 500 Test/Dev VMs (T/D Clusters) with its own vCenter; Site 2 runs 500 Prod VMs + 1000 Test/Dev VMs with its own vCenter. Lots of traffic between Prod-to-Prod and T/D-to-T/D.]
- Which one is simpler? Active/Passive is by far simpler.
- Which one takes a lot less inter-site bandwidth? A/A takes a lot more WAN bandwidth.
- Which one gives a bigger pool to be shared? Active/Passive gives a lot more room to play.
With Active/Active, both vCenters become Production too. Where do you test your vCenter patches/upgrades? vCenter is a complex and large component, not merely a passive management and monitoring tool. It is an application, not just a tool.
There is another challenge with this so-called Active/Active: network packets from outside these 2 datacenters need to come in via 1 of them. These are the options:
1. Stretched VLAN via Metro Ethernet
- Traffic ingresses and egresses one data center.
- Does not utilise WAN links from both data centers.
2. VM mobility using Cisco LISP
- Traffic ingresses and egresses at the local data center. But this is not a common solution.
- Requires two Nexus 7Ks, which is cost-prohibitive.
3. Stretched VLAN with FHRP isolation
- Traffic ingresses one data center and egresses the local data center.
- Use VLAN ACLs to block HSRP traffic between data centers.
- Use any Cisco switch capable of VLAN ACLs (e.g. Cisco 2960, 3560) at either data center.
- The VLAN ACL is applied inline on the Layer 2 Metro Ethernet link.
Source: VMworld 2012 presentation "Deploying an Active/Active Datacenter with SRM 5", by Michael Bailess (American National Bank) and Joe Kelly (Varrow).
21 Large: Adding Active/Active to a mostly Active/Passive vSphere
In this slide we added a small cluster for those "critical" systems that require active/active, so we don't have to make the _entire_ datacenter active/active. This is an example of applying the 80/20 principle: we keep the 80% simple.
I recommend Active/Passive over Active/Active, as A/A is not practical in reality:
- Not all applications can have 2 independent instances. How do you synchronise the data? Can they take advantage of a replicated database like GemFire? Do the apps work with a global load balancer? Are you going to front all applications with a global load balancer?
- If you cannot have a solution that covers _all_ applications, you need another solution as well. So you end up needing both.
[Diagram: 2 sites, each with its own vCenter and Global LB; 500 Prod VMs (Prod Clusters), 50 VMs on 1 small A/A cluster, and 1000 Test/Dev VMs (T/D Clusters).]
22 Large: Levels of Availability
Tier   | Technology                         | RPO    | RTO
Tier 0 | Active/Active at Application Layer | 0      | 0 hours
Tier 1 | Array-based Replication            | 0      | 2 hours
Tier 2 | vSphere Replication                | 15 min | 1 hour
Tier 3 | No replication. Backup & Restore   | 1 day  | 8 hours
Here is an example of the levels of availability that the Infra team can provide to the Application (Business) team. While it shows 4 tiers, Tier 0 is a solution at the application layer. It is not provided at the Infra layer, other than the global load balancer. At Tier 0, the traditional DR concept does not apply, as the application always runs on both sides. There is no need to do "recovery" as it is active/active. So there are actually 3 tiers provided by the Infra team.
Tier 1:
- RPO is 0 to demonstrate the unique capability of array-based replication. It can replicate immediately, albeit asynchronously, so it doesn't really impact performance. Hence no data loss, which is appealing to the business.
- RTO is 2 hours, as it still takes time to mount LUNs, add VMs to inventory, and boot them in the right order. Databases need to run consistency checks. Linux VMs need to run fsck. Since there are multiple VMs, especially in a 3-tier application, it can take a while to boot.
Tier 2:
- RPO is 15 minutes, as that's the limit of vSphere Replication. It cannot be lower than this.
- RTO is 1 hour, to show that restore is generally faster: no LUN to scan or mount, as the datastore is already mounted, and no need to run consistency checks, especially on Windows VMs, as VSS is supported.
Tier 3:
- I'm using no replication here to show that not all applications need DR.
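The tier table above can be turned into a simple lookup. This is a hypothetical helper of my own (names and structure are not from the deck): given an application's required RPO and RTO, pick the cheapest tier that still meets both.

```python
# Availability tiers from the table above, cheapest first.
# Values are (name, rpo_minutes, rto_minutes).
TIERS = [
    ("Tier 3", 24 * 60, 8 * 60),  # backup & restore: RPO 1 day, RTO 8 hours
    ("Tier 2", 15, 60),           # vSphere Replication: RPO 15 min, RTO 1 hour
    ("Tier 1", 0, 120),           # array-based replication: RPO 0, RTO 2 hours
    ("Tier 0", 0, 0),             # active/active at the application layer
]

def cheapest_tier(required_rpo_min, required_rto_min):
    """Return the first (cheapest) tier whose RPO and RTO both fit."""
    for name, rpo, rto in TIERS:
        if rpo <= required_rpo_min and rto <= required_rto_min:
            return name
    return None  # no tier meets the requirement
```

Note the ordering matters: an app that needs RPO 0 but can tolerate a 2-hour RTO lands on Tier 1, while RPO 0 with a tighter RTO forces Tier 0 (application-layer active/active).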
23 Methodology: Physical DC, vCenter, Virtual DC, Cluster, ESXi
Define how many physical data centers are required.
- DR requirements normally dictate 2.
For each physical DC, define how many vCenters are required.
- Desktop and Server should be separated by vCenter, connected to the same SSO server and fronted by the same Web Client VM. View comes with a bundled vSphere (unless you are buying the add-on). Ease of management.
- In some cases (Hybrid Active/Active), a vCenter may span multiple physical DCs.
For each vCenter, define how many virtual data centers are required.
- A Virtual Data Center serves as a name boundary.
- A good way to separate IT (Provider) and Business (Consumer).
For each vDC, define how many clusters are required.
- In a large setup, there will be multiple clusters for each tier.
For each cluster, define how many ESXi hosts are required.
- Preferably 4 – is too small a size.
- Standardise the host spec across clusters. While each cluster can have its own host type, this adds complexity.
[Diagram: Physical DC → vCenter (Server pool) → Virtual DC (IT) / Virtual DC (Biz) → Tier 1 Cluster / Tier 2 Cluster → ESXi; plus vCenter (Desktop pool) → Virtual DC.]
For highly sensitive apps, you need to think about whether you trust your Storage Admin, vCenter Admin, Windows Admin, AD Admin and Network Admin. If you don't, or you would have to kill them if you told them, then you have to separate the vCenter and the array.
24 Large: The Need for a Non-Prod Cluster
This is unique to the virtual data center. We don't have a "Cluster" to begin with in a physical DC, as cluster means a different thing there.
A Non-Prod Cluster serves multiple purposes:
- Run Non-Production VMs. In our design, all Non-Production runs on the DR site to save cost. A consequence of our design is that migrating from/to Production can mean copying large data across the WAN.
- Disaster Recovery.
- Test bed for infrastructure patching or updates.
- Test bed for infrastructure upgrade or expansion.
- Evaluating or implementing new features. In a virtual data centre, a lot of enhancements can impact the entire data centre, e.g. Distributed Switch, Nexus 1000V, Fault Tolerance, vShield. All of the above need proper testing.
The Non-Prod Cluster should provide a sufficiently large scope to make testing meaningful, e.g. an upgrade of the core virtual infrastructure from vSphere 4 to 5 (a major release). This needs extensive testing and a rollback plan.
Even with all the above… how are you going to test SRM upgrades & updates properly?
- In Singapore, the MAS TRM guidelines require Financial Institutions to test before updating production.
- An SRM test needs 2 vCenters, 2 arrays and 2 SRM servers. If all are used in production, then where is the test environment for SRM?
- What happens while you are upgrading SRM? You will lose protection during this period.
This new layer does not exist in the physical world. It is software, hence it needs its own Non-Prod environment.
25 Large: The Need for an IT Cluster
A special-purpose cluster. More than a Management Cluster: it also runs non-management VMs that are not owned by the Business. Examples:
- Active Directory
- File Server
- & Collaboration (in the Large example, this might warrant its own cluster)
It runs all the IT VMs used to manage the virtual DC or provide core services. The central management will reside here too. It is separated for ease of management & security.
The next page shows the list of VMs that reside on the IT Cluster. Each line represents a VM. This shows the Production site; the DR site will have a subset of this, except for vCloud Director, which is only deployed on the DR site in this example architecture.
Explanation of some of the servers below: Security Management Server = a VM to manage security (e.g. TrendMicro Deep Security).
This separation keeps the Business Cluster clean, "strictly for business".
26 Large: IT Cluster (part 1)
The table provides samples of VMs that should run on the IT cluster. 4 ESXi hosts should be sufficient, as most VMs are not demanding; they are mostly management tools. The relatively more demanding VMs are vCenter Operations. There are many databases here; standardise on 1. I would not put these databases together with DBs running business workloads. Keep Business and IT separate.
Base Platform:
- vCenter (for Server Cloud) – active node
- vCenter (for Server Cloud) – passive node
- vCenter (for Server Cloud) DB – active node
- vCenter (for Server Cloud) DB – passive node
- vCenter Web Server
- vCenter Inventory Server
- vCenter SSO Server x2 with Global HA
- vCenter Heartbeat
- Auto-Deploy + Authentication Proxy (1 per vCenter)
- vCenter Update Manager + DB (1 per vCenter)
- vCenter Update Manager Download Service (in DMZ)
- Auto-Deploy + vSphere Authentication Proxy
- vCloud Director (Non-Prod) + DB
- Certificate Server
Storage:
- Storage management tool (needs a physical RDM to get fabric info)
- VSA Manager
- Backup Server
Network:
- Network management tool (needs a lot of bandwidth)
- Nexus 1000V Manager (VSM) x2
Sys Admin Tools:
- Admin client (1 per Sys Admin) with PowerCLI
- VMware Converter
- vMA (Management Assistant)
- vCenter Orchestrator + DB
Notes:
- 1 vSphere Replication "replication server" appliance can process up to 1 Gbps of sustained throughput using approximately 95% of 1 vCPU. 1 Gbps is much larger than most WAN bandwidth.
- For a VM protected by VR, the impact on application performance is a 2–6% throughput loss.
- vCD and VSM data collectors: deploy at least 2 data collectors each for vCD and VSM, for high availability.
- A Chargeback Manager instance can be installed/upgraded at the time of vCD install/upgrade, or later.
- The vCenter Server, the vSphere Web Client Server, and the vCenter Inventory Service can all be installed on the same system, or split across multiple systems, depending on their resource needs and the available hardware resources.
If you have < 32 hosts and < 4000 VM, install all three modules on the same system.
27 Large Cloud: IT Cluster (part 2)
Continued from the previous page. IT apps that are not in this cluster: the View Security Servers. These reside in the DMZ zone and are directly accessible from the Internet. Putting them in the management cluster would mean the management cluster needs to support an Internet-facing network.
Application Mgmt:
- AppDirector
- Hyperic
Advanced vDC Services (Security, Availability):
- Site Recovery Manager + DB
- SRM Replication Mgmt Server + DB
- vSphere Replication Servers (1 per 1 Gbps of bandwidth, 1 per site)
- AppHA Server (e.g. Symantec)
- Security Management Server (e.g. TrendMicro Deep Security)
- vShield Manager
Management (Performance, Capacity, Configuration):
- vCenter Operations Enterprise (2 VMs)
- vCenter Infrastructure Navigator
- vCloud Automation Center (5 VMs)
- VCM: Web + App + DB (3 VMs)
- Chargeback + DB, Chargeback Data Collector (2)
- Help Desk system
- CMDB
- Change Management system
Desktop as a Service:
- View Managers + DB
- View Security Servers (sitting in the DMZ zone!)
- ThinApp Update Server
- vCenter (for Desktop Cloud) + DB
- vCenter Operations for View
- Horizon Suite
- Mirage Server
Core Infra:
- MS AD 1, AD 2, AD 3, etc.
- DNS, DHCP, etc.
- Syslog server + core-dump server
- File Server (FTP Server) for IT
- File Server (FTP Server) for Business (multiple)
- Print Server
- Core Services & Collaboration
Note: vSphere Replication for the ROBO case does not need 2 vCenters. Only 1 VR appliance is required too.
28 Cluster Size
I recommend 6–10 nodes per cluster, depending on the tier. Why not 4 or 12 or 16 or 32? It is a balance between too small (4 hosts) and too large (>12 hosts):
- DRS: 8 gives DRS sufficient hosts to "maneuver". 4 is rather small from the DRS scheduler's point of view. With the "sub-cluster" ability introduced in 4.1, we can get the benefit of a small cluster without creating one.
- Best practice for a cluster is the same hardware spec with the same CPU frequency. This eliminates the risk of incompatibility and complies with Fault Tolerance & VMware View best practices. So more than 8 means it's more difficult/costly to keep them all the same: you need to buy 8 hosts at a time, and upgrading >8 servers at a time is expensive ($$) and complex. A lot of VMs will be impacted when you upgrade >8 hosts.
- Manageability: too many hosts are harder to manage (patching, performance troubleshooting, too many VMs per cluster, HW upgrades). A smaller cluster allows us to isolate 1 host for VM-troubleshooting purposes; at 4 nodes, we can't afford such "luxury". VM restart priority is simpler when you don't have too many VMs.
- Too many paths to a LUN can be complex to manage and troubleshoot. Normally, a LUN is shared by 2 "adjacent" clusters. 1 ESXi host is 4 paths, so 8 hosts is 32 paths, and 2 clusters is 64 paths. This is a rather high number (if you compare with the physical world).
- N+2 for Tier 1 and N+1 for the others: with 8 hosts, you can withstand 2 host failures if you design for it. At 4 nodes, N+2 is too expensive, as the payload is only 50%.
Small cluster size:
In a lot of cases, the cluster size is just 2–4 nodes. From the availability and performance points of view, this is rather risky. Say you have a 3-node cluster: you are doing maintenance on Host 1 and suddenly Host 2 goes down. You are exposed with just 1 node. Assuming HA Admission Control is enabled (which it should be), the affected VMs may not even boot.
When a host is placed into maintenance mode, or disconnected for that matter, it is taken out of the admission control calculation.
- Cost: too few hosts result in overhead (the "spare" host).
- Some cluster changes in the Advanced Attributes require the cluster to be disabled and re-enabled. This is harder and takes longer when there are many hosts.
- Mgmt: 8 is an easy number to remember. And a lucky one, if you believe. And we all know that production needs luck, not just experience.
So: 8 hosts per cluster.
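The sizing arithmetic in this slide can be sketched as a couple of tiny helpers (my own function names, not from the deck): the path count per LUN, and the usable payload fraction once you reserve spare hosts for N+1 or N+2.

```python
def lun_paths(hosts_per_cluster, clusters_sharing, paths_per_host=4):
    """Total paths a LUN presents: 1 ESXi host sees it via 4 paths,
    and a LUN is normally shared by 2 adjacent clusters."""
    return hosts_per_cluster * clusters_sharing * paths_per_host

def payload_fraction(hosts, spares):
    """Fraction of cluster capacity left for workloads after
    reserving `spares` hosts (N+1 -> spares=1, N+2 -> spares=2)."""
    return (hosts - spares) / hosts
```

This reproduces the numbers above: an 8-node cluster is 32 paths, two clusters sharing the LUN is 64; and N+2 on a 4-node cluster leaves only 50% payload, versus 75% on an 8-node cluster.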
30 Small Cloud: Design Limitations
It is important to document the design limitations clearly. It is perfectly fine for a design to have limitations; after all, you have a limited budget. Inform the CIO and the Business clearly about the limitations.
It is based on vSphere Standard edition:
- No Storage vMotion
- No DRS and DPM
- No Distributed Switch
- Can't use 3rd-party multi-pathing
- Does not support MSCS. Veritas VCS does not have this restriction. vSphere 5.1 only supports FC for now, and I use iSCSI in this design.
For a 30-server environment, HA with VM monitoring should be sufficient.
- In vSphere 5.1 HA, a script can be added that pings the application (service) on its given port/socket.
- Alternatively, a script within the Guest OS checks whether the process is up. If not, it sends an alert.
Only 1 cluster in the primary data center:
- Production, DMZ and IT all run on the same cluster.
- Networks are segregated as they use different networks.
- Storage is separated as they use different datastores.
31 Small Cloud: Scaling to 100 VM
The next slide shows an example where the requirement is 100 VM instead of 50.
- We have 7 hosts in DC 1 instead of 3.
- We have 3 hosts in DC 2 instead of 2.
Only 1 cluster in the primary data center:
- Production, DMZ and IT all run on the same cluster.
- Networks are segregated as they use different networks.
- Storage is separated as they use different datastores.
Since we have more hosts, we can do sub-clusters. We will place the following as sub-clusters:
- Host 1–2: Oracle BEA sub-cluster
- Host 6–7: Oracle DB sub-cluster
Production is a soft cluster, so on a host failure it can use Hosts 1–2 too.
Complex Affinity and Host/VM rules: be careful in designing VM anti-affinity rules. We are using Group Affinity as we have sub-clusters, so we have an extra constraint.
32 Small Cloud: Scaling to 100 VM
Certainly, there can be variations. 2 are described below.
- If we can add 1 more ESXi host, we can create 2 clusters of 4 nodes each. This will simplify the affinity rules.
- We can use 1-socket ESXi hosts instead of 2-socket. This saves on VMware licences, at extra cost for servers and extra cooling/power operational cost.
[Diagram: Rest of VMs, Oracle DB, Oracle BEA; DMZ LAN, Production LAN, Management LAN.]
33 Small Cloud: Scaling to 150 VM
We have more "room" to design, but it is still too small:
- Production needs 7 hosts
- IT needs 2 hosts
- DMZ needs 2 hosts
Putting IT with DMZ is a design trade-off. vShield is used to separate IT and DMZ. If that is deemed not enough, we can add VLANs. If it is still not enough, use different physical cables or switches. But the more you separate physically, the more you defeat the purpose of virtualisation.
[Diagram: Rest of VMs, Oracle BEA, Oracle DB; Production LAN.]
35 Large: Overall Architecture
The diagram shows the key layers in the SDDC (based on vCloud Suite 5.1).
- The overall Global Management provides visibility across the 2 datacenters. This is based on vC Ops 5.7, vCAC 5.2 and vCO 5.1. The products will be installed on Site 1, but protected by SRM. They will live on the IT Cluster, managed by the vCenter for Server VMs.
- The diagram uses 2 physical datacenters. I'm not a big fan of a 3-datacenter design as it increases complexity drastically. It is better to do it at the application layer, as most apps will not require 3 DCs.
- Storage is replicated via vSphere Replication (for most apps) and array replication (for Tier 1 apps). Array-based replication is minimised, following the shared-nothing principle described in an earlier slide.
- Network is not stretched, following the shared-nothing principle.
The green layer is the management (software-defined). So we have:
- vCenter. I separate the vCenters for Server and for Desktop. This enables independent upgrades, and even independent operations. Plus the Server workload will be integrated with SRM, while View does not need SRM.
- SSO is a key component. Because the VCs are separated, it makes sense to separate the SSO also.
- vCNS is shown here as we need to show how basic network services (firewall, NAT, private network) are provided. Also, 1 vCNS can only serve 1 VM, hence the mapping in the diagram.
The blue layer is the ESXi clusters. A large farm will have many clusters. There will be 1 cluster for each purpose. For example, a cluster might contain only Oracle DB and nothing else.
The orange layer is the storage layer. I tend to map the cluster to the datastore, in a 1:1 mapping. This keeps things simple. With the shared-nothing vMotion in 5.1, there is no need for a shared datastore anymore.
36 This diagram continues from the previous one, showing more detailed components. I've taken out EUC (End User Computing) to focus on the server workload.
The diagram still shows the 2 sites; as you can see, the diagram is symmetrical. As more and more products support multiple vCenters, the diagram will change and get simpler.
- The cream colour shows the operational management or monitoring components. This is essentially vC Ops + vCAC and vCO. They are not part of the core architecture; rather, they provide monitoring. A failure of these components does not impact your infrastructure.
- The green colour shows the Availability components. A failure at this layer also does not mean your infrastructure is down, but this time you lose protection. This layer, especially SRM 5.1, impacts your overall architecture as it needs 2 vCenters.
- The red colour shows the core architecture. They are forming, or rather defining, your architecture. A failure at this layer impacts your infrastructure. For example, if vShield Edge fails you lose networking. I do not draw the connecting lines for SSO and AD, as almost all components talk to them.
- The grey colour shows the physical or base layer. This is where the resources (compute, storage, network) are provided. I show vShield App in this layer as I consider it part of the hypervisor (1 vShield App per hypervisor).
Limitation in vCloud Suite 5.1: vShield Manager can only manage 1 vCenter. So if your VMs are fronted by vShield Edge, you need to ensure the rules are replicated to the DR site.
37 Large: Datacenter and Cluster
In our design, we will have 2 datacenter objects only, separating the IT Cluster from the Business Clusters.
Certain objects can go across clusters, but not across datacenters:
- You can vMotion from one cluster to another within a datacenter, but not to another datacenter.
- Networking: Distributed Switch, VXLAN and vShield Edge can't go across DCs as at vCloud Suite 5.1.
- Datastore names are per datacenter. So network and storage are per datacenter.
- You can still clone a VM within a datacenter and to a different datacenter.
The datacenter defines the namespace for networks and datastores. The names for these objects must be unique within a datacenter. For example, you cannot have two datastores with the same name within a single datacenter, but you can have two datastores with the same name in two different datacenters. VMs, templates, and clusters need not be unique within the datacenter, but must be unique within their folder.
Objects with the same name in two different datacenters are not necessarily the same object. Because of this, moving objects between datacenters can create unpredictable results. For example, a network named networkA in datacenterA might not be the same network as a network named networkA in datacenterB. Moving a VM connected to networkA from datacenterA to datacenterB results in the VM changing the network it is connected to.
39 Large: Tiered Clusters
The 3 tiers become the standard offering that the Infra team provides to the app teams. If Tier 3 is charged $X/VM, then Tier 2 is priced at 2x and Tier 1 at 4x. App teams can then choose based on their budget.
- Cluster size varies, depending on criticality. A test/dev cluster might have 10 nodes, while a Tier 0 might have just 2.
- The Server Cluster also maps 1:1 to the Storage Cluster. This keeps things simple. If a VM is so important that it is on a Tier 1 cluster, then its storage should be on a Tier 1 cluster too.
- This excludes Tier 0, which is special and handled per application. Tier 0 means the cost of infra is very low relative to the value & cost of the apps to the business.
- Tier "SW" is a dedicated cluster running a particular software. Normally, this is Oracle, MS SQL or Exchange. While we could use a "sub-cluster", it is simpler to dedicate an entire cluster.
Tier   | # Hosts  | Node Spec?       | Failure Tolerance | MSCS    | Max #VM | Monitoring                           | Remarks
Tier 1 | Always 6 | Always identical | 2 hosts           | Yes     | 25      | Application level. Extensive alerts  | Only for critical apps. No resource overcommit.
Tier 2 | 4-8      | Maybe            | 1 host            | Limited | 75      |                                      | App can be vMotioned to Tier 1 during a critical run
Tier 3 | 4-10     | No               |                   |         | 150     | Infrastructure level. Minimal alerts | Some resource overcommit
SW     | 2-10     |                  | 1-3 hosts         |         |         | Application specific                 | Running expensive software. Oracle and SQL are the norm, as part of DB as a Service
- Tier 1: 6 hosts, 25 VM. Effective hosts: 4, which is 25:4 or ~6:1 consolidation ratio.
- Typical Tier 2: 8 hosts, 75 VM.
- Typical Tier 3: 10 hosts, 150 VM. At 1.5 vCPU average, that's 225 vCPU. 10 ESXi hosts of 12 cores each = 120 pCores. So this is 225:120, or around 1.9x CPU oversubscription. This is OK for Tier 3.
The above is suitable for a <1000 VM private cloud. For >1000 VM, we need a higher consolidation ratio and 4-socket hosts. Keep the cluster size below 10.
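The ratio arithmetic above is easy to get wrong in a spreadsheet, so here it is as two small functions (my own helper names, just modelling the slide's numbers):

```python
def cpu_oversubscription(vms, avg_vcpu, hosts, cores_per_host):
    """vCPU:pCore ratio for a cluster, e.g. 225 vCPU on 120 pCores."""
    return (vms * avg_vcpu) / (hosts * cores_per_host)

def effective_consolidation(vms, hosts, spare_hosts):
    """VMs per *effective* host, i.e. after excluding N+1/N+2 spares."""
    return vms / (hosts - spare_hosts)
```

Tier 3 gives 150 × 1.5 / (10 × 12) = 1.875, the "around 1.9x" on the slide; Tier 1 gives 25 VMs on 4 effective hosts, the "~6:1" consolidation ratio.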
40 Large: Example 1
The goal is to provide 500 Prod VMs and 1000 Non-Prod VMs.
- As you scale past 1000 VM, keep in mind the number of clusters & hosts. As you scale past 10 clusters, consider using 4-socket hosts.
- This example does have a Large VM cluster, which is an exception cluster. A Large VM in this case is >8 vCPU and >64 GB vRAM.
- Virtualisation has gone beyond the low-hanging fruit; the average VM size has gone beyond 1 vCPU and 2 GB of vRAM. This example gives 3.4 vCPU and 19 GB as the average VM size.
- As most VMs are not using their resources 100%, the model over-subscribes. I'm using 1.5x for Production and 2x for Non-Production.
- I have also taken into account that each host will need a dedicated 2 cores and 6–7 GB of RAM for the following: the hypervisor, vMotion, a hypervisor-based firewall like vCNS, hypervisor-based AV, hypervisor-based IDS/IPS, hypervisor-based storage (e.g. VMware Distributed Storage (tech preview) or Nutanix), and vSphere Replication.
The above is an example, based on a rather conservative approach. Notice the end consolidation ratio is rather low, which means more cabling and overhead.
The number of datastores might grow because of array-based replication constraints, where an entire LUN must be migrated as 1. In that case, we might need more datastores. Keep the total <10 per cluster.
We should stop discussing the _overall_ consolidation ratio. Prod and Non-Prod are very different, and should be treated differently.
- Standard = 2-socket, 16-core ESXi host. For vCPU VM
- Large = 4-socket, 40-core ESXi host. For vCPU VM.
- An extra-large VM (e.g. 20 vCPU, 128 GB vRAM) can be placed on the Large cluster. But take note of the impact on performance.
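A minimal sketch of the sizing approach described above, with assumed inputs: reserve 2 cores and 7 GB per host for hypervisor services (the slide says 6–7 GB; I pick 7), oversubscribe CPU only, and size RAM without overcommit. The function name and the idea of sizing on the binding dimension are my own, not VMware's method.

```python
import math

def hosts_needed(vms, avg_vcpu, avg_vram_gb, cores_per_host,
                 ram_per_host_gb, cpu_oversub,
                 core_overhead=2, ram_overhead_gb=7):
    """Hosts required for a cluster: the larger of the CPU-bound and
    RAM-bound counts, after subtracting the per-host overhead."""
    usable_cores = cores_per_host - core_overhead
    usable_ram = ram_per_host_gb - ram_overhead_gb
    by_cpu = math.ceil(vms * avg_vcpu / (usable_cores * cpu_oversub))
    by_ram = math.ceil(vms * avg_vram_gb / usable_ram)  # RAM not oversubscribed
    return max(by_cpu, by_ram)
```

With the slide's averages (3.4 vCPU, 19 GB) and a 2-socket 16-core, 256 GB host at 1.5x CPU oversubscription, 500 Prod VMs come out CPU-bound; doubling the cores per host (or the oversubscription) flips the binding dimension to RAM, which is why Example 2 moves to 40-core boxes.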
41 Large: Example 2
Same goal as before, but going for a higher consolidation ratio (and hence using a 40-core box). In this example, we take a different approach; there is no The Best approach, it all depends on your situation. Here we focus on a higher consolidation ratio to keep the number of ESXi hosts manageable.
- Notice that Tier 1 has _more_ datastores than Tier 2 or Tier 3. This is because Tier 1 primarily uses array-based replication for DR, not vSphere Replication. As a result, we hit the per-LUN replication limitation.
- We are also using 1 type of ESXi host for ease of management: 4-socket, 40-core, 256 GB RAM.
vSphere Replication vSCSI filter:
- Runs in the ESXi kernel.
- Attached to the virtual device; intercepts all I/O to the disk.
- Each replica corresponds to a lightweight snapshot.
- A bitmap of changed blocks is maintained between replications (backed by an on-disk state file).
vSphere Replication agent:
- Runs in the host agent.
- Implements the configuration of replication on the primary site.
- Manages the VM replication process.
- Interposes on operations that impact replication.
The amount of CPU reservation depends on the number of vMotion NICs and their speeds: 10% of a processor core for each 1 Gb network interface, 100% of a processor core for each 10 Gb network interface, and a minimum total reservation of 30% of a processor core.
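The vMotion CPU-reservation rule quoted above is a simple formula, so here it is as a one-liner (my own function name; the percentages are the ones stated on the slide):

```python
def vmotion_cpu_reservation(nics_1g=0, nics_10g=0):
    """CPU reservation in processor cores for vMotion:
    10% of a core per 1 GbE NIC, 100% per 10 GbE NIC,
    with a minimum total of 30% of a core."""
    return max(0.30, 0.10 * nics_1g + 1.00 * nics_10g)
```

So a host with two 1 GbE vMotion NICs still reserves 0.3 of a core (the minimum kicks in), while two 10 GbE NICs reserve 2 full cores.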
42 Large: Example Pod (with Converged Hardware)
Each pod is 2 identical racks (42 RU each):
- Network block (.5 RU per rack): 4x 48-port 10 GE switches (192 ports in total) and 1x 48-port 1 GE switch (for management).
- Management block (2 RU per rack): the IT Cluster, a 4-node cluster.
- Compute + storage converged block (32 RU per rack). Each ESXi host has: 4x 10 GE for network and storage, 1x 1 GE for iLO, 1x flash for performance, 2x SSD for performance, 4x SAS for capacity.
Total port requirements per rack:
- 34 x 4 = 136 x 10 GE ports
- 34 x 1 = 34 x 1 GE ports
- ISL & uplinks = 6 GE ports
Total compute per pod: 2 racks x 32 hosts x 16 cores = 1024 cores.
This particular example uses converged compute + storage hardware. For the more common approach, we need to add rack space for the array (typically around 20 RU).
We are keeping the design simple for better manageability. This means the hardware cost is not 100% optimised:
- We keep each rack identical. We could have used fewer network switches, with cables going across racks, as the network ports are not fully utilised.
- We are not fully populating the rack space. We still have 3 RU free in each rack.
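The pod arithmetic above can be captured in a couple of lines, which makes it easy to re-run when the rack layout changes (helper names are my own; the inputs are the slide's figures):

```python
def pod_cores(racks, hosts_per_rack, cores_per_host):
    """Total compute per pod, e.g. 2 racks x 32 hosts x 16 cores."""
    return racks * hosts_per_rack * cores_per_host

def rack_ports(hosts, ports_per_host):
    """Switch ports needed per rack for a given NIC count per host."""
    return hosts * ports_per_host
```

This reproduces the slide: 1024 cores per pod, and with 34 hosts per rack (compute block plus management block), 136 x 10 GE ports and 34 x 1 GE ports per rack, before ISL & uplinks.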
43 Resource Pool: Best Practices
What resource pools are not:
- A way to organise VMs. Use folders for that.
- A way to segregate admin access to VMs. Use folders for that too.
For a Tier 1 cluster, where all VMs are critical to the business:
- Architect for availability first, performance second. Translation: do not over-commit.
- Resource pools, reservations, etc. are then immaterial, as there is enough for everyone.
- But size each VM properly. No oversizing, as it can slow the VM down.
For a Tier 3 cluster, use resource pools carefully:
- Tier 3 = overcommit.
- Use Reservation sparingly, even at VM level. It guarantees resources, so it impacts the cluster slot size. Naturally, you can't boot additional VMs once your guarantee is fully used. Take note of the extra complexity in performance troubleshooting.
- Use resource pools as a mechanism to reserve at the "group of VMs" level. If Department A pays for half the cluster, an RP with 50% of the cluster resources guarantees them that share in the event of contention; they can then run as many VMs as they need. But as a result, you cannot overcommit at cluster level, as you have guaranteed at RP level.
- Introduce a scheduled task that sets the shares per resource pool based on the number of VMs/vCPUs they contain, e.g. a PowerCLI script that runs daily and takes corrective actions. Just google it.
- Don't put VMs and RPs as "siblings" at the same level.
- DRS load balancing occurs every 5 minutes, not every few seconds. See yellow-bricks.
See my Resource Management slide for details.
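The "shares proportional to vCPU count" idea above can be sketched as pure logic. The real-world version would be a PowerCLI script against vCenter; this Python sketch only shows the arithmetic, with made-up pool data and my own names:

```python
def shares_per_pool(pools: dict, shares_per_vcpu: int = 100) -> dict:
    """Given {pool name: total vCPUs in that pool}, return the custom
    share value each resource pool should be set to, so that per-vCPU
    shares stay equal as pools grow or shrink."""
    return {name: vcpus * shares_per_vcpu for name, vcpus in pools.items()}

# Department A runs 40 vCPUs worth of VMs, Department B runs 10.
print(shares_per_pool({"DeptA": 40, "DeptB": 10}))
# {'DeptA': 4000, 'DeptB': 1000} -> DeptA gets 4x the shares for 4x the vCPUs
```

The point of the daily recalculation is to avoid the classic pitfall where a pool with few VMs gets the same shares as a pool with many, giving its VMs a disproportionately large slice per vCPU during contention.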
44 VM-level Reservation & Limit
CPU reservation:
- Guarantees a certain level of resources to a VM.
- Influences admission control (power-on).
CPU reservation isn't as bad as often claimed: it doesn't claim the CPU when the VM is idle (it is "refundable").
CPU reservation caveats:
- A CPU reservation does not always equal priority. If a VM is using processors and a "reserved" VM claims those CPUs, the reserved VM has to wait until the running threads/tasks finish. Active threads can't simply be de-scheduled; forcing that would mean a Blue Screen / kernel panic.
Memory reservation:
- Memory reservation is as bad as often claimed: it is "non-refundable" once allocated. Windows zeroes out every bit of memory during startup, so it touches all of it.
Memory reservation caveats:
- Drops the consolidation ratio.
- May waste resources (idle memory can't be reclaimed).
- Introduces higher complexity (capacity planning).
Do not configure high CPU or RAM and then use Limit:
- E.g. configuring 4 vCPU and then using a limit to make it behave like "2" vCPU can result in unpredictable performance, as the Guest OS does not know.
- High CPU or high RAM also carries higher overhead.
- Limit is for when you need to force-slow a VM; Shares won't achieve the same result.
45 Fault Tolerance Design Consideration
Limitations:
- Still limited to 1 vCPU in vSphere 5.1.
- FT impacts Reservation: it auto-reserves at 100%. Reservation impacts HA Admission Control, as the slot size becomes bigger. HA does not check slot size or actual utilisation when booting a VM; it checks the Reservation of that affected VM.
- FT impacts Resource Pools: make sure the RP includes the RAM overhead.
- Minimum cluster size is 3; recommended is 4.
- Tune the application and Windows HAL to use 1 CPU. In Win2008 this no longer matters [e1: need to verify].
- Turn off FT before doing Storage vMotion.
- FT protects the infrastructure, not the app. Use Symantec ApplicationHA to protect the app.
General guides:
- Assuming a 10:1 consolidation ratio, I'd cap FT usage at just 10% of production VMs. So 80 VMs means around 8 ESX hosts and around 8 FT VMs, i.e. 1 Primary VM + 1 Secondary VM per host.
- Small clusters (<5 nodes) are more affected when there is an HA event. See the picture for a 3-node example.
When a host fails and the VMs need to be booted on other ESX hosts, I think/guess this is what happens:
1. Say there are 4 nodes in the cluster. Node 4 is the largest (more GHz and more RAM).
2. The cluster setting is "Host failures", set to tolerate 1 host failure. So Node 4 is not included in the slot size; HA is conservative here, catering for the "largest node fails" situation.
3. Host 1 suddenly dies. It has 10 VMs.
4. HA looks at Hosts 2, 3, and 4 (yes, 4 too) to see which host has capacity. Slot size is not considered. Note that HA uses reservation, not utilisation.
5. Say Host 4 has enough capacity to handle the total reservation of 6 VMs. Those 6 VMs boot on Host 4.
6. Does HA consider affinity rules? HA does consider "VM compatibility", but I am not sure what that covers.
7. After Host 4, say Host 2 has the largest remaining unreserved capacity, enough to handle the total reservation of the remaining 4 VMs. Those 4 VMs boot on Host 2. Host 3 is not used.
8. DRS then kicks in to load-balance.
46 Branch or remote sites
Some small sites may not warrant their own vCenter, and may have no local expertise to manage it either. Consider the vSphere Essentials Plus ROBO edition; you need 10 sites for the best financial return, as it is sold in packs of 10.
Features that are network-heavy should be avoided. Auto Deploy, for example, sends around 150 MByte; on a shared 10 Mbit link, that adds up.
Best practices:
- Install a copy of the template at the remote site. If not, use OVF, as it is compressed.
- Increase vCenter Server and vSphere host timeout values to ~3 hours.
- Consider manual vCenter agent installs prior to connecting ESXi hosts.
- Use RDP/SSH instead of Remote Console for VM console access. If the console is absolutely needed, reduce the remote console display to smaller values, e.g. 800x600/16-bit.
vCenter 5.1 improvements over 4.1 for remote ESXi:
- Use the Web Client if vCenter is remote; it uses less bandwidth.
- No other significant changes. Certain vCenter operations still involve a heavier payload, e.g. Add Host, vCenter agent upgrades, HA enablement, Update Manager based host patching.
vCenter 4.1 improvements over vCenter 4.0:
- 1.5x to 4.5x improvement in operational time for typical vCenter management tasks.
- All traffic between vCenter and ESXi hosts is now compressed.
- Statistics data between hosts and vCenter Server flows over TCP, not UDP, eliminating lost metrics.
- Most vCenter operations fare well over 64 Kbps links.
48 Approach
General guidelines as at Q3 2013:
- Use 2 sockets, 16 cores, Xeon 2820 class, with 128 GB RAM. For large VMs, use 4 sockets, 40 cores, Xeon 4820 class, with 256 GB RAM.
- 8 GB RAM per core. A 12-core ESXi box should have 96 GB. This should be enough to cater for VMs with large RAM.
Considerations when deciding the size of the ESXi host:
- Look at the overall cost, not just the cost of the ESXi host: network equipment, management, power, and space.
- A larger host can take larger VMs, or more VMs per host.
- Think of the cluster, not 1 ESXi host, when sizing. The cluster is the smallest logical building block in this pod approach.
- Plan for 1 fiscal year, not just the next 6 months.
- Buy hosts per cluster; this ensures they are from the same batch. Standardising the host spec makes management easier.
- Know the number of VMs you need to host and their sizes. This tells you how many ESXi hosts you need. Define 2 VM sizings: Common and Large.
- If your largest VM needs >8 cores, go for >8-core pCPUs. Ideally a VM should fit inside one socket to minimise the NUMA effect; this happens in the physical world too.
- If your largest VM needs 64 GB of RAM, each socket should have 72 GB; I factor in the RAM overhead. Note that extra RAM = slower boot, because ESXi creates a swap file matching the RAM size. You can use reservation to reduce this, so long as you use the "%-based" cluster setting.
- An ESXi host should be >2x the largest VM.
- Decide: blade, rack, or converged. Decide: IP or FC storage. If you use converged, it's either NFS or iSCSI.
All Tier 1 vendors (HP, Dell, IBM, Cisco, etc.) make great ESXi hosts, so the following guidelines are relatively minor compared to the base spec. Additional guidelines for selecting an ESXi server:
- Does it have Embedded ESXi?
- How much local SSD (capacity and IOPS) can it handle? This is useful for stateless desktop architecture, and when using local SSD as cache or virtual storage.
- Does it have built-in 2x 10 GE ports? Does the built-in NIC have hardware iSCSI capability?
- Memory cost. Most ESXi servers have around 128 GB of RAM.
- What are the server's unique features for ESXi?
- Management integration. Most server vendors have integrated management with vCenter, usually free. Dell's is not free, although it has more features(?).
- DPM support.
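The sizing rules of thumb above (8 GB RAM per core; host at least 2x the largest VM) can be sketched as a small calculator. This is my own illustration of the slide's guidelines, not a VMware tool:

```python
def host_ram_gb(cores_per_host: int, gb_per_core: int = 8) -> int:
    """Rule of thumb from the slide: 8 GB RAM per core."""
    return cores_per_host * gb_per_core

def host_big_enough(host_cores: int, host_ram_gb_total: int,
                    largest_vm_vcpus: int, largest_vm_ram_gb: int) -> bool:
    """Slide guideline: an ESXi host should be at least 2x the largest VM,
    in both cores and RAM."""
    return (host_cores >= 2 * largest_vm_vcpus
            and host_ram_gb_total >= 2 * largest_vm_ram_gb)

print(host_ram_gb(12))                    # 96 GB for a 12-core host
print(host_big_enough(16, 128, 8, 64))    # True: 16 cores/128 GB covers an 8 vCPU/64 GB VM
print(host_big_enough(8, 64, 8, 64))      # False: host is only 1x the VM
```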
49 ESXi Host: CPU
CPU performance has improved drastically — something like 1800%.
No need to buy the highest-end CPU, as the premium is too high. Use the savings to buy more hosts instead, unless:
- the number of hosts is becoming a problem;
- you need to run high-performance single threads;
- you need to run more VMs per host.
The two tables below show VMmark results. The first shows improvement based on VMmark 1.x; the second covers 2010 to May 2013, based on VMmark 2.x.
- Intel Xeon E-series (2.3GHz/6-core/15MB/95W) priced at US$879.
- Xeon 5600 delivers a 9x improvement over Xeon 5100, while clock speed only improves 0.1x and core count 3x.
- Fujitsu delivered a VMmark result at 27 tiles in July 2010.
Recommendation (for Intel, 2-socket box):
- Use Xeon 2803 or the equivalent E-series if budget is the constraint and you don't need to run >6 vCPU VMs.
- Use Xeon 2820 or E5-2650L if you need to run 8 vCPU VMs.
- Use Xeon 2850 if you need to run 10 vCPU VMs.
Recommendation (for Intel, 4-socket box): use the 4807, then 4820, then 4850.
AMD Opteron at 2 sockets / 24 cores posts an impressive number too. Xeon is around 18% faster, not a huge margin any more. But Xeon does it with 12 cores while Opteron uses 24, so each Xeon core is around 2x faster.
50 ESXi Host: CPU Sizing
Buffer for the following:
- Agent VMs or vmkernel modules: a hypervisor-based firewall such as vShield App; hypervisor-based IDS/IPS such as Trend Micro Deep Security; vSphere Replication; Distributed Storage.
- HA events.
- Performance isolation.
- Hardware maintenance.
- Peaks: month end, quarter end, year end.
- Future requirements within the same fiscal year.
- DR, if your cluster needs to run VMs from the production site.
The table below is before taking HA into account, so it is purely from a performance point of view. When you add the HA host, the day-to-day ratio drops, so utilisation will be lower as you have a "spare" host.
Doing 2 vCPU per physical core is around 1.6x over-subscription, as there is some benefit from Hyper-Threading.

Tier          | vCPU ratio      | VM ratio    | Total vCPU (2 sockets, 16 cores) | Average VM size
Tier 1        | 2 vCPU per core | 5:1         | 32 vCPU                          | 32/5 = 6.4 vCPU each
Tier 2        | 4 vCPU per core | 10:1 – 15:1 | 64 vCPU                          | 64/10 = 6.4 vCPU each
Tier 3 or Dev | 6 vCPU per core | 20:1 – 30:1 | 96 vCPU                          | 96/30 = 3.2 vCPU each
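The table's arithmetic can be reproduced with a short sketch. The ratios are the slide's; the function and names are my own illustration:

```python
# Per-tier vCPU over-subscription ratios from the slide's table.
TIERS = {
    "Tier 1": {"vcpu_per_core": 2, "vms_per_host": 5},
    "Tier 2": {"vcpu_per_core": 4, "vms_per_host": 10},
    "Tier 3": {"vcpu_per_core": 6, "vms_per_host": 30},
}

def tier_capacity(cores: int, tier: str):
    """Return (total vCPU, average VM size in vCPU) for a host with
    `cores` physical cores at the given tier's ratios."""
    t = TIERS[tier]
    total_vcpu = cores * t["vcpu_per_core"]
    avg_vm_size = total_vcpu / t["vms_per_host"]
    return total_vcpu, avg_vm_size

print(tier_capacity(16, "Tier 1"))  # (32, 6.4)  -> 32 vCPU, ~6.4 vCPU per VM
print(tier_capacity(16, "Tier 3"))  # (96, 3.2)  -> 96 vCPU, ~3.2 vCPU per VM
```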
51 UNIX to x64 migration: Performance Sizing
When migrating from UNIX to x64, we can use industry-standard benchmarks in which both platforms participate. SAP and SPEC are established benchmarks, so we can easily get data for older UNIX machines (a common migration source, as they have passed 5 years and carry high maintenance costs).
Based on SPECint_rate2006 results published July 2012:
- HP Integrity Superdome (1.6GHz/24MB dual-core Itanium 2), 128 cores: 1650.
- Fujitsu / Oracle SPARC Enterprise M-series: 3150.
- IBM Power 780 (3.44 GHz, 96 cores): 3520. IBM's per-core result is higher than x64 as it uses MCM modules; on the Power series, CPU and software are priced per core, not per socket.
- Bull Bullion E (160 cores, 4 TB RAM): 4110.
Sizing RAM, disk and network is much easier, as we can ignore speed/generation and simply match them. For example, if the UNIX app needs 7000 IOPS and 100 GB of RAM, we simply match it; the higher speed of newer RAM is a bonus.
With Flash and SSD, IOPS is no longer the main concern. vCPU is the main factor, as UNIX partitions can be large (e.g. 48 cores) and we need to reduce the vCPU count.
Source: Bull presentation at VMworld 2012.
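The benchmark-ratio sizing described above can be sketched as a per-core throughput comparison. This is my own illustration; the x64 score used in the example (600 over 16 cores) is a hypothetical figure, not from the slide, and real sizing must also weigh per-thread speed, RAM and I/O:

```python
def cores_needed(source_rate: float, source_cores: int,
                 target_rate: float, target_cores: int,
                 used_cores: int) -> float:
    """Estimate x64 cores needed to replace `used_cores` of a UNIX box,
    by comparing per-core SPECint_rate2006 throughput."""
    per_core_src = source_rate / source_cores   # UNIX throughput per core
    per_core_tgt = target_rate / target_cores   # x64 throughput per core
    return used_cores * per_core_src / per_core_tgt

# A 48-core partition of the Superdome (1650 over 128 cores) onto a
# hypothetical 2-socket x64 host scoring 600 over 16 cores:
print(round(cores_needed(1650, 128, 600, 16, 48), 1))  # 16.5 cores
```

The large UNIX partition shrinks considerably because each x64 core delivers far more benchmark throughput, which is exactly the slide's point about needing to reduce the vCPU count.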
52 ESXi Host: RAM sizing
How much RAM? It depends on the core count from the previous slide, but it is not so simple any more; each vendor is different.
- 8 GB DIMMs are cheaper than 2x 4 GB DIMMs. Some vendors, like HP, price 4 GB and 8 GB similarly.
- 8 GB per core, so 12 cores means around 96 GB.
- Consider the channel best practice: don't leave channels empty. This brings the benefit of memory interleaving. Check with the server vendor on the specific model.
- Some models now come with 16 slots per socket, so you might be able to use lower DIMM sizes.
DIMM slot counts: Dell R710 has 18 (?); IBM x3650 M3 has 18; HP DL360/380 G8 has 24; HP DL380 G7 and BL490c G6/G7 have 18; Cisco has multiple models — the B200 M3 has 24.
VMkernel has a Home Node concept on NUMA systems. For ideal performance, fit a VM within one CPU-RAM "pair" to avoid the "remote memory" effect:
- # of vCPUs + 1 <= # of cores in 1 socket. So running a 5 vCPU VM on a quad-core socket forces a remote-memory situation.
- VM memory <= memory of one node.
Turn on Large Pages, especially for Tier 1. This needs application-level support.
Input from HP:
- We only put a premium on special 8 GB or 4 GB DIMMs that draw much lower power. This is only for unique customers who want to reduce the thermal envelope of their environment.
- Memory performance drops as you put more DIMMs per channel. HP worked with Intel to boost 2 DIMMs per channel, so on HP servers there is a feature to keep 2 DIMMs per channel at 1333 MHz (i.e. 12 DIMM slots populated but still running at 1333 MHz). This caters nicely to the sweet-spot memory config of 96 GB RAM per 2-socket CPU.
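The two NUMA fit rules above can be captured in a small checker. A sketch of the slide's rules of thumb, with my own function name:

```python
def fits_numa_node(vm_vcpus: int, vm_ram_gb: int,
                   cores_per_socket: int, ram_per_node_gb: int) -> bool:
    """Slide rules of thumb: (# vCPUs + 1) <= cores per socket, and
    VM memory <= memory of one NUMA node, to avoid remote memory."""
    return (vm_vcpus + 1 <= cores_per_socket
            and vm_ram_gb <= ram_per_node_gb)

# 5 vCPU on a quad-core socket forces remote memory:
print(fits_numa_node(5, 32, 4, 64))   # False
# 6 vCPU, 60 GB on an 8-core socket with 64 GB per node fits:
print(fits_numa_node(6, 60, 8, 64))   # True
```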
53 ESXi Host: IO & Management
IO requirements will increase. The table below provides an estimate; it is a prediction based on tech previews and VMworld sessions, so actual results may vary. Converged infrastructure needs high bandwidth.
IO card: I personally prefer 4x 10 GE NICs. Not supported: mixing hardware iSCSI and software iSCSI.
Management:
- Lights-out management, so you don't have to be in front of the physical server for certain tasks (e.g. going into the CLI as requested by VMware Support).
- Ensure the hardware agent is properly configured. Monitoring hardware health is very important with many VMs in one box.
PCI slots on the motherboard:
- Since we are using 8 Gb FC HBAs, make sure the physical PCIe slot has sufficient bandwidth.
- A single dual-port FC HBA makes more sense if the saving is high and you need the slot, but there is a risk of bus failure. If you are using blades and have to settle for a single 2-port HBA (instead of two 1-port HBAs), ensure the PCI slot has bandwidth for 16 Gb, and double-check that the HBA's chip and bus can handle the peak load of both ports.

Estimated ESXi IO bandwidth in early 2014:

Purpose             | Bandwidth | Remarks
VM                  | 4 Gb      | For ~20 VMs. A vShield Edge VM needs a lot of bandwidth, as all traffic passes through it.
FT                  | 10 Gb     | Based on a VMworld 2012 presentation.
Distributed Storage | –         | Based on the Tech Preview in 5.1 and Nutanix.
vMotion             | 8 Gb      | vSphere 5.1 introduces shared-nothing live migration, which increases demand as the vmdk is much larger than the vRAM. Include multi-NIC vMotion for faster migration when there are multiple VMs to move.
Management          | 1 Gb      | Copying a powered-off VM to another host without a shared datastore takes this bandwidth.
IP Storage          | 6 Gb      | NFS or iSCSI. Not the same as Distributed Storage, as DS is not serving VMs. No need for 10 Gb, as the storage array is likely shared by many hosts; the array may only have 40 Gb total for all of them.
vSphere Replication | –         | Should be sufficient, as the WAN link is likely the bottleneck.
Total               | 40 Gb     |

Use a NIC that supports:
- Checksum offload.
- High-memory DMA (64-bit DMA addresses).
- Multiple scatter/gather elements per Tx frame.
- Jumbo frames. The benefit of jumbo frames is not conclusive, but I think it's a good practice as it prepares for VXLAN.
We need NetQueue to achieve 10 GE inside the VM; without it, a VM gets only 3-4 Gbps.
Good consideration on PCI and bandwidth: it would be interesting to measure how much Ethernet and FC bandwidth is required for each tier of cluster for a given VM config, e.g. a box with 10 PROD GOLD TIER VMs should have x Gb Ethernet and y Gb FC bandwidth. This would ensure no oversubscription from a network and FC I/O point of view, and end-to-end performance. A lot of people spend big $$ on CPU and forget about end-to-end scaling for performance. Many customers simply mix production Tier 1 and Tier 3 without knowing how much LAN and SAN bandwidth Tier 1 needs.
Good reading on remote management and hardware management via iLO:
54 Large: Sample ESXi host specification
Estimated hardware cost: US$10K per ESXi host. Configuration included in that price:
- 2x Xeon X5650. (The E series has different performance and price attributes.)
- 128 GB RAM.
- 4x 10 GE NIC. No HBA.
- 2x 100 GB SSD, for the swap-to-host-cache feature in ESXi 5 and for running IO-intensive agent VMs. Could also be handy during troubleshooting; only 1 disk is needed for that purpose.
- No installation disk. We will use Auto Deploy, except for the management cluster.
- Lights-out management. Avoid using WoL; use IPMI or HP iLO.
Costs not yet included:
- LAN switches. Around S$15K for a pair of 48-port GE switches (96 ports total).
- SAN switches.
In the world of virtualisation, the use of 1 GE has both pragmatic and flexibility issues; look into cable management, proliferation of access switches, and integration into the data centre architecture. Look into 10G LOM solutions.
Memory performance drops as you put more DIMMs per channel:
- 1 DIMM per channel = 1333 MHz (6 DIMMs populated)
- 2 DIMMs per channel = 1066 MHz (12 DIMMs populated)
- 3 DIMMs per channel = 800 MHz (18 DIMMs populated)
However, HP worked with Intel to boost 2 DIMMs per channel, so on HP servers 12 populated slots can still run at 1333 MHz, catering to the sweet-spot config of 96 GB RAM per 2-socket CPU.
55 Blade or Rack or Converged
Both blade and rack are good; both have pros and cons. The comparison below is relative, not absolute. Consult the principal for the specific model; this is just a guideline, and only for vSphere purposes, not for other use cases such as HPC or non-VMware workloads. There is a third choice, converged infrastructure; an example is Nutanix.
Blade advantages:
- Some blades come with built-in 2x 10 GE ports; to use them, you just need a 10 GE switch.
- Less cabling, fewer problems. Easier to replace a blade.
- Better power, rack-space and cooling efficiency. The larger fan (4 RU) is better than the small fan (2 RU) used in a rack server.
- Some blades can be stateless; the management software can clone 1 ESXi to another.
- Better management, especially when you have many ESXi hosts.
Rack advantages:
- A typical 1 RU rack server comes with 4 built-in ports.
- Better suited for <20 ESXi hosts per site.
- More local storage.
Blade disadvantages:
- More complex management, on both the switches and the chassis, and proprietary too. You need to learn the rules of the chassis/switches; the positioning of the switch matters in some models.
- Some blades virtualise the 10 GE NIC and can slice it. This adds another layer and more complexity.
- Some replacements or major upgrades may require the entire chassis to be powered off.
- Some have only 2 PCI slots, which might not be enough if you need >20 GE per ESXi host.
- Best practice recommends 2 enclosures. (The enclosure is nominally passive, containing no electronics — but see the note below.)
- There is an initial cost, as each chassis/enclosure needs 2 switches.
- Ownership of the SAN/LAN switches in the chassis needs to be made clear.
- The common USB port in the enclosure may not be accessible by ESXi. Check with the respective blade vendor.
Rack disadvantages:
- A USB dongle (which you should not use anyway) can only be mounted in front. Make sure it's short enough that you can still close the rack door.
- The 1 RU rack server has very small fans, not as good as larger fans.
- Less suited when each DC is big enough to have 2 chassis.
- Cabling and rewiring.
Feedback received:
- "I have also seen customers with 2 blade chassis with blades in each. A firmware issue affected all switch modules simultaneously, instantly isolating all blades in the same chassis. Because they were the first 6 blades built, it took down all 5 primary HA agents. The VMs powered down and never powered back up. Because of this I recommend using two chassis and limiting cluster size to 8 nodes, to ensure that the 5 primary nodes can never all reside on the same chassis."
- Not all blade enclosures are passive devices; some contain many active components.
- On rack-mount servers, fan size does not really matter, as each fan is designed to suit its server.
56 ESXi boot options
4 methods of ESXi boot:
- Installation needed: local Compact Flash, local disk, SAN boot.
- No installation needed: LAN boot (PXE) with Auto Deploy.
Auto Deploy:
- Environments with >30 ESXi hosts should consider Auto Deploy. Best practice is to keep the IT Cluster on non-Auto-Deploy (local install), since the Auto Deploy infrastructure itself runs as VMs.
- An ideal ESXi host is just pure CPU and RAM: no disk, no PCI card, no identity.
- Auto Deploy is also good when you need to prove to the security team that your ESXi has not been tampered with (simply reboot it and it is back to "normal").
- Centralised image management.
Advantages of local disk over SAN boot:
- No SAN complexity. (With SAN boot, you need to label the LUNs properly.)
Disadvantages of local disk versus SAN boot:
- Needs 2 local disks, mirrored.
- Certain organisations do not like local disks.
- A disk is a moving part, with lower MTBF.
- SAN boot saves power/cooling.
SAN boot is part of the trend toward stateless computing. It further enables abstraction of the ESXi host in the cloud computing world, by decoupling logical resources from physical at the host level.
58 Methodology
Flow: SLA → Datastore Cluster → VM input → Mapping → Monitor
1. Define the standard (storage-driven profile). Define the datastore profile.
2. Map each cluster to a datastore cluster.
3. For each VM, ask the owner to choose: capacity (GB), and which tier they want to buy. Let them decide, as they know their own app. Most app teams will not know their IOPS and latency requirements; make this part of the storage tiering so they consider the bigger picture.
4. Map each VM to a datastore.
5. Create another datastore if capacity or performance is insufficient. See the next slide for details.
6. Turn on Storage IO Control. SIOC is per datastore; if the underlying LUN shares spindles with all the other LUNs, it may not achieve the desired result. Consult your storage vendor on this, as they have visibility/control of the entire array.
59 SLA: 3 Tiers of Storage Profile
Create 3 tiers of storage with Storage DRS. These become the types of storage pool presented to clusters or VMs. Implement VASA so the profiles are automatically presented and compliance checks can be performed.
This paves the way for standardisation:
- Choose 1 datastore size for each tier and keep it consistent. Choose an easy number (e.g. 1000 vs 800).
- Tier 3 is also used for non-production.
- Map the ESXi cluster tier to the datastore tier. If a VM is on a Tier 1 production cluster, it is placed on a Tier 1 datastore, not a Tier 2 datastore. This strict mapping reduces the number of paths drastically.
Example (based on the Large Cloud scenario; a Small Cloud will have a simpler, smaller design):
- "Snapshot" means protected with array-level snapshots for fast restore.
- VMDKs larger than 1.5 TB will be provisioned as RDM. RDM will be used sparingly; virtual-compatibility mode is used unless the app team says otherwise.
- Tiers 2 and 3 can have large datastores, as replication is done at the vSphere layer.
- The interface will be FC for all tiers. This means Storage vMotion can be done with VAAI.
- Consult the storage vendor for array-specific design. I don't think the array has a Shares & Reservation concept for IOPS, and the array can't guarantee or control latency per tier.

Tier | Price | Min IOPS | Max latency | RAID | RPO       | RTO    | Size | Limit | Replication   | Array snapshot | # VM / format
1    | 4x/GB | 6000     | 10 ms       | 10   | 15 minute | 1 hour | 2 TB | 70%   | Array level   | Yes            | ~10 VM, EagerZeroedThick
2    | 2x/GB | 4000     | 20 ms       | –    | 2 hour    | 4 hour | 3 TB | 80%   | vSphere level | No             | ~20 VM, normal thick
3    | 1x/GB | 2000     | 30 ms       | –    | 8 hour    | –      | 4 TB | –     | –             | –              | ~30 VM, thin provisioned

1 TB was selected as it provides the best balance between performance and manageability for the number of VMs and virtual disks per volume. For manageability, it allows an adequately large pool of disks to better use resources and limit storage sprawl. A smaller size maintains a reasonable RTO and reduces the risks associated with losing a single LUN. In addition, the size limits the number of VMs on a single LUN.
I got different results when checking per-disk IOPS figures:
- Commercial SSD (multi-level cell): 1000-2000 IOPS.
- Enterprise flash drive (single-level cell, much higher speed and buffer): 6000-30K IOPS.
- 15k rpm: some say 250, some say 150-200. I take 150 as it's easy to remember.
- 10k rpm: some say 100-150. I take 100 as it's easy to remember.
- 7200 rpm: some say 120. I don't use this in my design; too big a capacity encourages more VMs.
- SAS drives are a good compromise (cost vs capacity vs speed).
From virtualgeek: even if you do thin provisioning at the array level, eagerzeroedthick VMDKs "cancel it out" because they pre-zero every portion of the VMDK. Production storage dedupe or compression can solve this (since all the zeros can be eliminated), but array-level thin provisioning on its own cannot.
The data we have collected reveals:
- Both thin and thick disks perform similarly on various workloads.
- Thin-provisioned disks show similar performance trends to thick disks when scaled across different hosts.
- External fragmentation has negligible impact on the performance of thin-provisioned disks.
- There is insignificant performance impact on existing thick disks if thin provisioning is implemented on a shared array.
Storage DRS offers five key features: resource aggregation, initial placement, datastore maintenance mode, load balancing, and affinity rules.
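The "let the owner buy a tier" step can be sketched as a lookup against the tier profiles on this slide. A hypothetical helper — the names and the "cheapest tier that still qualifies" policy are my own, not from the deck:

```python
# Tiers sorted cheapest first: (name, price multiplier, guaranteed IOPS,
# latency ceiling in ms), taken from this slide's tier table.
TIERS = [
    ("Tier 3", 1, 2000, 30),
    ("Tier 2", 2, 4000, 20),
    ("Tier 1", 4, 6000, 10),
]

def cheapest_tier(needed_iops: int, max_latency_ms: int) -> str:
    """Pick the cheapest tier that still meets the VM owner's IOPS
    floor and latency ceiling."""
    for name, _price, iops, latency in TIERS:
        if iops >= needed_iops and latency <= max_latency_ms:
            return name
    raise ValueError("no tier meets the requirement")

print(cheapest_tier(3000, 25))  # Tier 2
print(cheapest_tier(5000, 10))  # Tier 1
```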
60 Arrangement within an array
Below is a sample diagram showing disk grouping inside an array. The array has 48 disks (hot spares not shown for simplicity). This example has only one RAID group type (2+2), for simplicity.
Design considerations:
- Datastore 1 and Datastore 2 can impact each other's performance, as they share physical spindles. Each datastore spans 16 spindles, so IOPS is only 2800 max (based on 175 IOPS for a 15K RPM FC disk); because of RAID, the effective IOPS will be lower. The only way they would not impact each other is if there were a "Share" and "Reservation" concept at the "meta slice" level.
- Datastores 3, 4, 5 and 6 can impact one another.
- DS 1 and DS 3 can impact each other, since they share the same controller (SP). This contention happens if the shared component becomes the bottleneck (e.g. cache, RAM, CPU). The only way to prevent it is to implement "Share" or "Reservation" at the SP level.
- For Storage IO Control to be effective, it should be applied to all datastores sharing the same physical spindles. So if we enable it for Datastore 3, then Datastores 4, 5 and 6 should be enabled too.
- Avoid different settings for datastores sharing underlying resources (spindles, controller, cache, ports). Use the same congestion threshold for them, and comparable share values (e.g. use Low / Normal / High everywhere).
- Avoid using extents. SIOC is not supported on datastores with multiple extents, so Storage DRS cannot be used for I/O load balancing on them.
61 Storage IO Control
Storage IO Control operates at the datastore level. There is no control at the RDM level (???), and arrays normally share spindles.
In the example below, the array has 3 volumes, each configured the same way: 32 spindles in a RAID 10 configuration (8 units of 2+2 disk groups). There are also non-vSphere workloads sharing the same spindles.
Best practice: unless the array has a "Shares" or "Reservation" concept, avoid sharing spindles between Storage Profiles.
62 Storage DRS, Storage IO Control and the physical array
The array is not aware of the VMs inside a VMFS; it only sees LUNs. Moving a VM from one datastore to another looks like a large IO operation to the array: one LUN shrinks while the other grows drastically.
With arrays capable of auto-tiering, VMware recommends configuring Storage DRS in manual mode with the I/O metric disabled; use Storage DRS for its initial-placement and out-of-space-avoidance features.
From the whitepaper on Storage DRS interoperability with storage technologies:
- Array-based replication (SRDF, MirrorView, SnapMirror, etc.): initial placement supported. Moving a VM between datastores can cause a temporary lapse in SRM protection (?) and increase the size of the next replication transfer.
- Array-based snapshots: moving a VM between datastores increases space usage on the destination LUN, so the snapshot takes longer.
- Array-based dedupe: moving a VM between datastores causes a temporary increase in space usage, so the dedupe takes longer.
- Array-based thin provisioning: supported on VASA-enabled arrays only [e1: reason??].
- Array-based auto-tiering (EMC FAST, Compellent Data Progression, etc.): do not use IO-based balancing; just use space-based.
- Array-based I/O balancing (Dell EqualLogic): n/a, as it is controlled by the array.
For array-based replication, SDRS initial placement is supported, but we recommend manual mode for its migration recommendations, so they can be approved with their impact on protection and replication transfer size in mind. Similarly, for array-based snapshots and dedupe, use manual mode so the potential increase in space usage can be considered by the admin.
Storage DRS uses the construct "DrmDisk":
- A DrmDisk represents a consumer of datastore resources and is the smallest entity Storage DRS can migrate.
- VMDK = DrmDisk. A snapshot is part of its DrmDisk. VM files (swap, VMX, logs) are contained in one DrmDisk (the VM configuration).
- Storage DRS can load-balance per DrmDisk.
63 RAID type
In this example I'm using just RAID 10. Generally speaking, I see a rather high write ratio (around 50%), and RAID 5 then results in higher cost, as it needs more spindles.
- More spindles give the impression we have enough storage; it is difficult to say no to requests when you don't have a storage problem.
- More spindles mean you're burning the environment more.
vCloud Suite introduces additional IOPS outside the guest OS: a VM boot writes the vRAM to disk; Storage vMotion and Storage DRS; snapshots.
Mixing RAID 5 and RAID 10 increases complexity. RAID 5 was used for capacity, but nowadays each disk is huge (2 TB). I'd rather mix SSD and disk than mix RAID types:
- SSD RAID 10 for performance and IOPS.
- Disk RAID 10 for capacity.
I'd also go for just 2 tiers instead of 3. This minimises movement; each movement costs both a read and a write.
The sample below is based on 150 IOPS per spindle, with 1200 IOPS to achieve. (The table comes from Patrick Carmichael's VMware presentation at VMworld, session INF-STO1807.)
How many IOPS can I achieve with a given number of disks?
- Total raw IOPS = disk IOPS * number of disks
- Functional IOPS = (raw IOPS * write%) / (RAID penalty) + (raw IOPS * read%)
How many disks are required to achieve a required IOPS value?
- Disks required = ((read IOPS) + (write IOPS * RAID penalty)) / disk IOPS

RAID level | # disks required (20% write) | # disks required (80% write)
RAID 6     | 16                           | 40
RAID 5     | 13                           | 27 (nearly 2x of RAID 10)
RAID 10    | 10                           | 14

RAID type | Write IO penalty
RAID 5    | 4
RAID 6    | 6
RAID 10   | 2
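The disks-required formula above can be checked with a short sketch. Function names are my own; results are rounded to the nearest disk, which reproduces the slide's table:

```python
# Write penalty per RAID type, from the slide's second table.
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def disks_required(target_iops: float, write_pct: float,
                   raid: str, disk_iops: float = 150) -> int:
    """Disks = (read IOPS + write IOPS * RAID penalty) / per-disk IOPS,
    per the VMworld formula quoted above."""
    read_iops = target_iops * (1 - write_pct)
    write_iops = target_iops * write_pct
    return round((read_iops + write_iops * RAID_WRITE_PENALTY[raid]) / disk_iops)

# Reproduce the slide's table: 1200 IOPS target, 150 IOPS per spindle.
print(disks_required(1200, 0.20, "RAID10"))  # 10
print(disks_required(1200, 0.80, "RAID5"))   # 27
print(disks_required(1200, 0.80, "RAID6"))   # 40
```

Note how the write ratio dominates: at 80% writes, RAID 5 needs nearly twice the spindles of RAID 10, which is the slide's argument for RAID 10 in write-heavy environments.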
64 Cluster Mapping: Host to Datastore
Always know which ESXi cluster mounts which datastore cluster. Keep the diagram simple; its main purpose is to communicate with other teams, so not too much information. The idea is to give them a mental picture they can understand. If your diagram has too many lines, datastores or clusters, it is too complex; create a pod when that happens. Modularisation makes management and troubleshooting easier.
This diagram clearly shows the mapping between datastores and clusters. You need to draw something like this.
Good read (from a community discussion):
Q: Why should a zoned LUN not be shared between clusters? I tried to find an official stance from VMware on the communities section, and the only information I could find was that you could share a LUN between clusters, or more accurately between all ESX hosts in multiple clusters. I would rather not share the LUNs between clusters, but a team member seems bent on doing it. What reasons — personal, political, or technical — led to the statement in the book about not having LUNs cross environments?
A: There are several reasons I would never do it in an environment I was designing:
1. You want to limit the number of hosts connecting to any particular storage volume, primarily for performance reasons. If you have, say, 20 hosts, each with 2 data paths, all talking to the same LUN for multiple VMs, you likely have a lot of disparate I/O going on between hosts and array. VMFS as a file system cannot effectively handle too many hosts all actively talking to the same LUN at the same time (although the cluster limit was bumped from 16 to 32). The idea is that you will never have all 32 hosts of a cluster communicating with a single LUN at the same time, due to how you spread your VM load.
2. Assigning all LUNs to all hosts is a management nightmare. Services such as vMotion, DRS, and HA cannot function across multiple clusters, so the primary driver of "shared storage" no longer exists. Forcing the ESX admin into ugly management of "where can I put particular VMs for a particular cluster without impacting another's performance?" becomes extremely tedious. Simply not zoning a LUN to a group of servers is something a storage admin should be familiar with, and it should have no impact on the backend infrastructure. Managing a nasty spider web of data paths is hard enough within a single cluster, let alone multiple. I consider this on a par with why you would VLAN a network: just because you can create a giant class A for every node in your network, why in the world would you?
3. Data security. If your clusters serve particular purposes, such as DMZ vs internal vs partner/customer clusters, or simply prod vs dev, you don't want someone to be able to accidentally map storage of an internal or partner system onto a DMZ server. While controls should be in place to prevent this from a process perspective, you will never be able to fix "human error" unless you simply make it impossible to do.
65 Mapping: Datastore Replication

You should also have a datastore replication diagram. Just like the previous diagram, it serves the same purpose, so keep it simple.
66 Type of Datastores

Types of datastore:
- Business VM: Tier 1 VM, Tier 2 VM, Tier 3 VM, Single VM. Each tier may have multiple datastores.
- IT VM
- Staging VM: from the P2V process, or moving from Non-Prod to Prod.
- Isolated VM
- Template & ISO
- Desktop VM: mostly local datastore on the ESXi host, backed by SSD.
- SRM Placeholder
- Datastore Heartbeat. Pro: a dedicated DS so we don't accidentally impact it while offlining a datastore. Cons: another 2 DS to manage per cluster, and increased scanning time. Can we use the SRM placeholder as heartbeat?
Always know where a key VM is stored. Datastore corruption, while rare, is possible.
1 datastore = 1 LUN. Relative to "1 LUN = many VMFS", it gives better performance due to less SCSI reservation.
Other guides:
- Use thin provisioning at the array level, not the ESX level.
- Separate Production and Non-Production. Add a process to migrate into Prod. You can't guarantee Production performance if VMs move in and out without control.
- RAID level does not matter so much if the array has sufficient cache (battery-backed, naturally).
- Keep 20% free capacity for VM swap files, snapshots, logs, thin volume growth, and Storage vMotion (inter-tier).
VMware, NetApp, and EMC all recommend that an application with high I/O requirements, or one which is sensitive to latency variation, requires a storage design that focuses on that particular VM, isolated from other datasets. Ideally, the data will reside on a VMDK stored on a datastore that is connected to multiple ESX servers, yet is only accessed by a single VM. The name of the game with these workloads isn't scale in terms of VMs per datastore, but scaling the performance of one VM.
67 Special Purpose Datastore

1 low-cost datastore for ISOs and Templates
- Need 1 per vCenter data center.
- Need 1 per physical data center. Else you will transfer GBs of data across the WAN.
- Around 1 TB.
- ISO directory structure:
  \ISO\OS\Windows
  \ISO\OS\Linux
  \ISO\Non OS\ (store things like anti-virus, utilities, etc.)
1 staging/troubleshooting datastore
- To isolate a VM. Proof to the Apps team that the datastore is not affected by other VMs.
- For storage performance studies or issues. Makes it easier to correlate with data from the array.
- The underlying spindles should have enough IOPS & size for the single VM.
- Our sizing: Small Cloud: 1 TB. Large Cloud: 1 TB.
1 SRM Placeholder datastore
- So you always know where it is. Sharing with another datastore may confuse others.
- Used in SRM 5 to place the VM metadata so it can be seen in vCenter.
- 10 GB is enough. Low performance.
68 Storage Capacity Planning

Theory and reality can differ. Theory is the initial, high-level planning you do. Reality is what it is after 1-2 years.
Theory, or initial planning:
- For green-field deployment, use the Capacity Planner. The info on actual usage is useful, as utilisation can be low. The IOPS info is a good indicator too.
- For brown-field deployment, use the existing VMs as the indicator. If you have virtualised 70%, this 70% will be a good indicator, as it's your actual environment.
- You can also use rules of thumb, such as:
  - 100 IOPS per normal VM. 100 IOPS per VM is low, but this is a concurrent average. If you have 1000 VMs, this will be 100K IOPS.
  - 500 IOPS per database VM
  - 20 GB per C:\ drive (or wherever you store OS + Apps)
  - 50 GB per data drive for a small VM
  - 500 GB per data drive for a database VM
  - 2 TB per data drive for a file server
Actual, or reality:
- Use a tool, such as VC Ops 5.6, for actual measurement. VC Ops 5.6 needs to be configured (read: tailored) to your environment. Create custom groups; for each group, adjust the buffer accordingly. You will need at least 3 groups, 1 per tier.
- I'd not use a spreadsheet or rules of thumb for a >100 VM environment.
- Use tools to find out the peak period of a VM. The built-in chart in vCenter requires you to check each VM's disk usage manually, which can be time consuming for 100 VMs.
FLASH is becoming the storage of choice for performance-sensitive environments. It creates a new tier of storage between system memory and disk. FLASH is faster than disk, and FLASH in the server is faster than FLASH in the SAN.
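The rules of thumb above can be turned into a first-cut estimate. This is a sketch only, using the slide's planning numbers (100/500 IOPS, 20 GB OS drive, 50/500 GB data drives, 20% free capacity buffer); the `RULES` dict and function names are my own, and the output is a planning input, not a measurement.

```python
# Per-VM rules of thumb from the slide: IOPS (concurrent average) and
# capacity (OS drive + data drive) per VM type.
RULES = {
    "normal": {"iops": 100, "gb": 20 + 50},
    "db":     {"iops": 500, "gb": 20 + 500},
}

def estimate(counts):
    """Return (total concurrent-average IOPS, total capacity in GB).

    counts maps a VM type to how many VMs of that type you plan for.
    Capacity includes the slide's 20% free-space buffer for swap files,
    snapshots, logs, and thin-volume growth.
    """
    iops = sum(RULES[t]["iops"] * n for t, n in counts.items())
    gb = sum(RULES[t]["gb"] * n for t, n in counts.items())
    return iops, round(gb * 1.2)

print(estimate({"normal": 1000, "db": 20}))
```

With 1000 normal VMs alone this reproduces the slide's "1000 VM = 100K IOPS" figure; past roughly 100 VMs, though, the slide's advice stands: switch from rules of thumb to measured data.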
69 Multi-Pathing

Different protocols have different technology. NFS, iSCSI and FC all have different solutions. NFS uses a single path for a given datastore; there is no multi-pathing, so use multiple datastores to spread the load.
In this design, I do not go for a high-end array, due to cost. A high-end array gives Active/Active, so we don't have to do regular load balancing. Most mid-range is Active-Passive (ALUA). Always ensure the LUNs are balanced among the 2 SPs; this is done manually within the array.
Choose an ALUA array instead of plain Active/Passive:
- Less manual work on balancing and selecting the optimal path.
- Both controllers can receive IO requests/commands, although only 1 owns the LUN. The path from the managing controller is the optimized path.
- Better utilization of the array storage processors (minimizes unnecessary SP failover).
- vSphere will show both paths as Active, but the preferred one is marked "Active IO".
- Round Robin will issue IO across all optimized paths, and will use non-optimized paths only if no optimized paths are available.
See the table below.

Array type / my selection:
- Active/Active: Round Robin or Fixed
- ALUA: Round Robin or MRU
- Active/Passive:
Third-party multipathing options:
- EMC: PowerPath/VE 5.4 SP2
- Dell EqualLogic: EqualLogic MMP
- HP/HDS: PowerPath/VE 5.4 SP2?

ALUA allows hosts to determine the states of target ports and prioritize paths. The host uses some of the active paths as primary while others are secondary.
vSphere 4.0 changed the naming pattern from vmhbaN:N:N:N by adding the channel. The channel is useful in iSCSI, although not all iSCSI array vendors use it. The VMware software iSCSI initiator is derived from the Cisco iSCSI initiator (see the Cisco iSCSI Initiator Command Reference).
Dell/EqualLogic PSP:
- Uses a "least deep queue" algorithm rather than basic round robin.
- Can redirect IO to different peer storage nodes.
Asymmetric Logical Unit Access (ALUA): ALUA-compliant storage systems provide different levels of access per port.
The host uses some of the active paths as primary and others as secondary. Picking between MRU and Fixed is easy in my opinion: as MRU is aware of optimized and unoptimized paths, it is less static and error-prone than Fixed. When using MRU, however, be aware of the fact that your LUNs need to be equally balanced between the storage processors; if they are not, you might be overloading one storage processor while the other is doing absolutely nothing. This might be something you want to make your storage team aware of. The other option, of course, is Round Robin. With RR, 1000 commands will be sent down a path before it switches over to the next one. Although theoretically this should lead to higher throughput, I haven't seen any data to back this "claim" up. Would I recommend using RR? Yes I would, but I would also recommend performing benchmarks to ensure you are making the right decision.
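The path-policy choice discussed above can be summarised as a lookup. This sketch mirrors the deck's table for Active/Active and ALUA; the deck leaves the plain Active/Passive cell blank, so the MRU entry for it below is my assumption (MRU is the commonly cited default there, since Fixed risks path thrashing on such arrays).

```python
def recommended_psp(array_type):
    """Return the deck's suggested path selection policy per array type.

    Active/Passive -> MRU is an assumption, not from the deck's table.
    """
    policy = {
        "active-active": "Round Robin (or Fixed)",
        "alua": "Round Robin (or MRU)",
        "active-passive": "MRU",  # assumption; verify with your array vendor
    }
    return policy[array_type.lower()]

print(recommended_psp("ALUA"))
```

As the text stresses, treat this as a starting point and benchmark: with MRU you must also keep LUN ownership balanced across the two storage processors yourself.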
70 FC: Multi-pathing

VMware recommends 4 paths. A path is point to point; the switch in the middle is not part of the path as far as vSphere is concerned. Ideally, they are all active-active for a given datastore. Fixed means 1 path active, 3 idle.
1 zone per HBA port. The zone should see all the target ports.
If you are buying new SAN switches, consider the direction for the next 3 years. Whatever you choose will likely be in your data center for the next 5 years. If you are buying a director-class switch, then consider the next 5 years: upgrading a director is major work, so plan for 5 years of usage. Consider both the EOL and EOSL dates. Discuss with SAN switch vendors and understand their roadmaps. 8 Gb and FCoE are becoming common.
Round-Robin:
- It is per datastore, not per HBA. 1 ESX host typically has multiple datastores, and 1 array certainly has multiple datastores. All these datastores share the same SP, cache, ports, and possibly spindles.
- It is active/passive at a given datastore.
- Leave the default settings. No need to set iooperationslimit=1.
- Choose this over MRU. MRU needs manual failback after a path failure.
When you start your ESX/ESXi host or rescan your storage adapter, the host discovers all physical paths to storage devices available to the host. Based on a set of claim rules defined in the /etc/vmware/esx.conf file, the host determines which multipathing plug-in (MPP) should claim the paths to a particular device and become responsible for managing the multipathing support for the device. By default, the host performs a periodic path evaluation every 5 minutes, causing any unclaimed paths to be claimed by the appropriate MPP.
The claim rules are numbered. For each physical path, the host runs through the claim rules starting with the lowest number first. The attributes of the physical path are compared to the path specification in the claim rule. If there is a match, the host assigns the MPP specified in the claim rule to manage the physical path.
This continues until all physical paths are claimed by corresponding MPPs, either third-party multipathing plug-ins or the native multipathing plug-in (NMP).
Belmont: Note that the FC switches need not be physically separated. Especially when you have multiple zones, the complexity in management and lack of flexibility can be an issue. Look into VSANs when you have many zones, and leverage FCoE-capable switches for investment protection, since you are not going to deploy this for just 1-2 years.
71 FC: Zoning & Masking

Implement zoning
- Do it before going live, or during a quiet maintenance window, due to the high risk potential.
- 1 zone per HBA port. 1 HBA port does not need to know of the existence of others. This eliminates the Registered State Change Notification (RSCN).
- Use soft zoning, not hard zoning.
  - Hard zone: zone based on the SAN switch port. Any HBA connected to this switch port gets this zone, so this is more secure. But be careful when recabling things into the SAN switch!
  - Soft zone: zone based on the HBA port (WWN). The switch port is irrelevant.
  - Situations that need rezoning in a soft zone: changing an HBA, replacing an ESX server (which comes with a new HBA), upgrading an HBA.
  - Situations that need rezoning in a hard zone: reassigning the ESX to another zone, port failure in the SAN switch.
- Virtual HBA can further reduce cost and offer more flexibility.
Implement LUN masking
- It complements zoning. Zoning is about path segregation; masking is about access.
- Do it at the array level, not the ESX level. Mask on the array, not on each ESXi host. Masking done at the ESXi host level is often based on controller, target, and LUN numbers, all of which can change with the hardware configuration.
Port zoning allows devices attached to particular ports on the switch to communicate only with devices attached to other ports in the same zone. The SAN switch keeps a table indicating which ports are allowed to communicate with each other. Port zoning is more secure than WWN zoning, but it creates a number of problems because it limits the flow of data to connections between specific ports on the fabric.
We use QLogic and Emulex open-source Linux drivers, which are already FC-SW2 compliant (tested/validated with switch vendors).
72 FC: Zoning & Masking

See the figure: there are 3 zones.
- Zone A has 1 initiator and 1 target. A single-initiator zone is good.
- Zone B has two initiators and targets. This is bad.
- Zone C has 1 initiator and 1 target.
Both SAN switches are connected via an Inter-Switch Link.
If Host X is rebooted and its HBA in Zone B logs out of the SAN, an RSCN will be sent to Host Y's initiator in Zone B, causing all I/O going to that initiator to halt momentarily and recover within seconds. Another RSCN will be sent out to Host Y's initiator in Zone B when Host X's HBA logs back in to the SAN, causing another momentary halt in I/O. Initiators in Zone A and Zone C are protected from these events because there are no other initiators in those zones.
Most modern SAN switches provide RSCN suppression methods. But suppressing RSCNs is not recommended, since RSCNs are the primary way for initiators to determine that an event has occurred and to act on it, such as loss of access to targets.
Zoning is implemented in the switches and controls which HBA ports have access to which storage processor ports.
Source:
It is important to follow established SAN best practices, such as single-initiator zones, in order to avoid the situations described above and others not listed.
73 Large: Reasons for FC (partial list)

A network issue does not create a storage issue. Troubleshooting storage does not mean troubleshooting the network too.
FC vs IP:
- The FC protocol is more efficient and scalable than the IP protocol for storage.
- Path failover is <30 seconds, compared with <60 seconds for iSCSI.
- Lower CPU cost. See the chart: FC has the lowest CPU hit to process the IO, followed by hardware iSCSI.
- Storage vMotion is best served with 10 GE.
FC considerations:
- Need SAN skills. Troubleshooting skills, not just Install/Configure/Manage.
- Need to be aware of component compatibility. This can impact upgrades later on, as a new component may not work with an older component.
We need to look at TCO versus per-device cost; the 1G complexity can lead to outages due to L2 complexity in highly virtualized environments. Further, if you work out the bandwidth and HA requirements for clusters and VM mobility, the access switch design may not be cheap.
In order to use VMware Storage vMotion, your storage infrastructure must provide sufficient available storage bandwidth. For the best Storage vMotion performance, you should make sure that the available bandwidth is well above the minimum required.
For iSCSI and NFS, make sure that your network topology does not contain Ethernet bottlenecks, where multiple links are routed through fewer links, potentially resulting in oversubscription and dropped network packets. Any time a number of links transmitting near capacity are switched to a smaller number of links, such oversubscription is a possibility.
Recovering from dropped network packets results in large performance degradation. In addition to the time spent determining that data was dropped, the retransmission uses network bandwidth that could otherwise be used for new transactions.
Applications or systems that write large amounts of data to storage, such as data acquisition or transaction logging systems, should not share Ethernet links to a storage device.
These types of applications perform best with multiple connections to storage devices.
Before using VMware Storage vMotion, make sure you have sufficient storage bandwidth between the ESX host where the VM is running and both the source and destination storage arrays. This is necessary because the VM will continue to read from and write to the source storage array, while at the same time the virtual disk to be moved is being read from the source storage array and written to the destination storage array. While this is happening, both storage arrays may have other traffic (from other VMs on the same ESX host, from other ESX hosts, etc.) that can further reduce the effective bandwidth.
With insufficient storage bandwidth, Storage vMotion can fail. With barely sufficient bandwidth, while Storage vMotion will succeed, its performance might be poor.
When possible, schedule Storage vMotion operations during times of low storage activity, when available storage bandwidth is highest. This way Storage vMotion will have the highest performance.
74 Large: Backup with VADP

1 backup job per ESX, so the impact on production is minimized. Because there were four ESX server/datastore pairs, four NetBackup policies were configured, one policy per ESX server. This allowed us to limit the number of simultaneous backups that occurred against each ESX server. Using this method, the backup I/O load on each ESX datastore was similar, and backup performance and reliability were optimized.
Notes for NFS (Small and Medium cloud): With NFS and array-based snapshots, one has the greatest ease and flexibility in what level of granularity can be restored. With an array-based snapshot of an NFS datastore, one can quickly mount a point-in-time copy of the entire NFS datastore, and then selectively extract any level of granularity they want. Although this does open up a bit of a security risk, NFS does provide one of the most flexible and efficient restore-from-backup options available today. For this reason, NFS earns high marks for ease of backup and restore capability.
The vSphere 4 vStorage APIs for Data Protection have been designed so that no additional holding tank or staging is required. This also means that the concept of a backup proxy no longer applies. VM backups can be configured using a standard NetBackup master server, media server, or clients. Special-purpose backup proxy systems designed specifically for VM backups, and additional staging-area storage, no longer need to be purchased.
Another feature of the vStorage API is changed block tracking. This is a block-level incremental backup implementation.
After the initial full backup is performed, subsequent block-level incremental backups transfer to the backup system only the blocks that have changed since the previous full or incremental backup. This shortens backup windows while retaining full disaster-recovery restore functionality.
VMware Consolidated Backup did not support an incremental backup technology where the entire VM could be automatically restored from both full and incremental (vmdk) backups to a specific point in time.
75 Backup Server

A backup server is an "I/O machine". By far, the majority of the work done is I/O related.
- Performance of disk is key.
- A fast internal bus is key. Multiple internal buses are desirable.
- No shared path: 1 port from ESX (source) and 1 port to tape (target).
- Lots of data in from clients and out to disk or tape.
- Not much CPU usage. 1 socket of a 4-core Xeon 5600 is more than sufficient.
- Not much RAM usage. 4 GB is more than enough.
But deduplication uses CPU and RAM. Deduplication relies on CPU to compare segments (or blocks) of data to determine if they have been previously backed up or if they are unique. This comparison is done in RAM. Consider 32 GB RAM (64-bit Windows).
Size the concurrency properly. Too many simultaneous backups can actually slow the overall backup speed. Use backup policy to control the number of backups that occur against any datastore. This minimizes the I/O impact on the datastore, as it must still serve production usage.
2 ways of backing up:
- Mount the VMDK file as a virtual disk (with a drive letter). The backup software can then browse the directory.
- Mount the VM as an image file.
Source: Symantec public presentation at VMworld 2009, and VMware documentation.
Deduplication relies heavily on both memory (RAM) and CPU resources. Today's CPUs are so powerful that for traditional backups it is common for the backup system CPU to be underutilized. But deduplication changed this significantly. Deduplication relies heavily on CPU power to compare segments (or blocks) of data to determine if they have been previously backed up or if they are unique. More and faster CPUs can improve overall deduplication performance, which in turn improves backup performance.
Deduplication technologies are particularly suited to take advantage of large amounts of RAM. Before backup data is committed to disk, it is compared with data that has been previously backed up. This comparison process is performed in RAM, instead of constantly comparing backup data that exists on disk.
77 Methodology

- Plan how VXLAN and SDN impact your architecture.
- Define how vShield will complement your VLAN-based network.
- Decide if you will use 10 GE or 1 GE. I'd go for 10 GE for the Large Cloud example. If you use 10 GE, define how you will use Network IO Control.
- Decide if you use IP storage or FC storage.
- Decide the vSwitch to use: local, distributed, Nexus.
- Decide when to use Load Based Teaming.
- Select blade or rack mount. This has an impact on NIC ports and switches.
- Define the detailed design with the vendor of choice.
78 VXLAN

Complete isolation of the network layer:
- Overlay networks are isolated from each other and from the physical network.
- Separation of the virtualization and network layers: the physical network has no knowledge of virtual networks.
- Virtual networks are spun up automatically as needed for VDCs.
- Virtual networks can have overlapping address spaces.
Challenges:
- Loss of visibility, as all overlay traffic is now UDP tunneled. You can't isolate virtual network traffic from the physical network.
- Today's network management tools are useless in VXLAN environments. Capabilities required: network fault management, configuration management, SNMP MIB polling, protocol analysis, capacity planning/modeling.
The physical network, while excellent at forwarding packets, is an inflexible, complex and costly barrier to realizing the full agility required by cloud services. Networking is bogged down in a 20-year-old operational model originally designed for manual provisioning on a device-by-device basis.
Challenges Edge solves for Peak 10 (based on VMworld presentation INF-NET2166; they are an SP provider):
- Reduce physical firewall sprawl.
- Reduce Ethernet cross-connects.
- Dramatically reduce provisioning time for new customers.
- Ability to offer customers a VPN/SSL-VPN solution without the need for a dedicated hardware appliance.
- Load balancing solution for smaller customers.
- Physical firewall inventory will be greatly reduced.
- Ease of deployment (first-level support can deploy new firewalls).
Things to share with the Network team regarding VXLAN:
- IP multicast forwarding is required (based on the IETF draft). More multicast groups are better. Multiple segments can be mapped to a single multicast group.
- If VXLAN transport is contained to a single VLAN, IGMP Querier must be enabled on that VLAN. If VXLAN transport traverses routers, multicast routing must be enabled.
- Increased MTU is needed to accommodate the VXLAN encapsulation overhead. The physical infrastructure must carry 50 bytes more than the VM vNIC MTU size.
E.g. a 1500-byte MTU on the vNIC needs a 1550-byte MTU on the switches and routers.
- Leverage 5-tuple hash distribution for uplink and inter-switch LACP.
- If VXLAN traffic is traversing a router, proxy ARP must be enabled on the first-hop router.
- Prepare for more traffic between L2 domains.
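The 50-byte rule above is easy to get wrong in change requests, so a one-line check helps. The 50-byte figure is the VXLAN encapsulation overhead cited in the deck (per the IETF draft); the function name is my own.

```python
# VXLAN encapsulation overhead per the IETF draft, as cited in the deck.
VXLAN_OVERHEAD = 50

def underlay_mtu(vnic_mtu):
    """Minimum MTU the physical switches/routers must carry for a given vNIC MTU."""
    return vnic_mtu + VXLAN_OVERHEAD

print(underlay_mtu(1500))  # standard 1500-byte vNIC needs a 1550-byte underlay
```

The same rule applies with jumbo frames: a 9000-byte vNIC MTU needs a 9050-byte underlay, which some switches round up to a 9216-byte system MTU.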
79 Network Architecture (still VLAN-based, not vCNS-based)

The above shows a fully virtualised network, where the network appliances are virtualised. From here you can show the network or security team that it is indeed possible to mix all the VLANs in the same ESX cluster. It is not using vCNS yet; this is still the traditional VLAN-based solution.
80 ESXi Network configuration

LBT is per port group, not per switch!
The diagram does not include vShield App yet. It adds 1 hidden vSwitch per vSwitch for VMs. The management network does not require vShield Zone protection.
When using Cross-host Storage vMotion (shared-nothing vMotion) to migrate a virtual machine with snapshots, you should provision at least a 1 Gbps management network. This is because vMotion uses the Network File Copy (NFC) service to transmit the virtual machine's base disk and inactive snapshot points to the destination datastore. Because NFC traffic traverses the management network, the performance of your management network will determine the speed at which such snapshot content can be moved during a vMotion migration. As in the non-snapshot disk-copy case, however, if the source host has access to the destination datastore, vMotion will preferentially use the source host's storage interface to make the file copies, rather than using the management network.
81 Design Consideration

Design considerations for 10 GE:
- We only have 2-4 physical ports. This means we only have 1-2 vSwitches. Some customers have gone with 4 physical ports, as 20 GE may not be enough for both storage and network.
- Distributed Switch relies on vCenter. Database corruption on vCenter will impact it; vCenter availability is more critical.
- Use Load Based Teaming. This prevents one burst from impacting Production. For example, a large vMotion can send a lot of traffic.
Some best practices:
- Enable jumbo frames.
- Disable STP on ESXi-facing ports on the physical switch.
- Enable PortFast mode on ESXi-facing ports.
- Do not use DirectPath IO, unless the app really has proof that it needs it.
vSphere 4.1 onward can do 8 concurrent vMotions on 10 GE.
LACP in 5.1:
- With LACP, there is no longer a need to do static-mode EtherChannels for link aggregation.
- LACP cannot be used with either the default Port ID or Load-based Teaming. Both the physical and the virtual switch using LACP must allow all MAC addresses to be sent over all interfaces simultaneously, and the only vSphere NIC Teaming policy that is compatible with this is IP Hash.
- LACP has a few advantages in the way it handles link failures and cabling mistakes.
82 Network IO Control

2x 10 GE is much preferred to 12x 1 GE. 10 GE ports give flexibility; for example, vMotion can exceed 1 GE when the physical cable is not used by other traffic. But a link failure means losing 10 GE. External communication can still be 1 GE; not an issue if most communication is among VMs.
Use ingress traffic shaping to control traffic types into the ESX?

Shares (bandwidth per pNIC) / function / vShield / remarks:
- 20%: VM (Production, Non-Production, Admin Network, Backup LAN agent). vShield: Yes. A good rule of thumb is ~8 VMs per gigabit. The Admin Network is used for basic network services like DNS servers and AD servers; use vShield App to separate it from Production, complementing the existing VLANs with no need to create more. The Infra VMs are not connected to the Production LAN; rather, they are connected to the Management LAN.
- 10%: Management LAN (VMware management, VMware cluster heartbeat). vShield: No. In some cases, the Nexus Control & Nexus Packet networks need to be physically separated from Nexus Management.
- vMotion: non-routable, private network.
- 15%: Fault Tolerance.
- 0-10%: VM Troubleshooting. Same as Production; used when we need to isolate networking performance.
- 5%: Host-Based Replication. vShield: No? Only for the ESXi that is assigned to do vSphere Replication. From a throughput point of view, if the inter-site link is only 100 Mb, then you only need 0.1 GE max.
- Storage.

Why I don't use 1 GE:
- Consider 1-2 years ahead while looking at NIC ports. You may need to give more network to VMs as you run more VMs, or run network-demanding VMs.
- Once wired, it is hard (and expensive) to rewire.
All cables are already connected and labelled properly to each physical switch port.
For blades, the choice is pretty much 10 GE, as it is built in. For real physical deployments, our best practice is using 2x 10G (standard 10G or FCoE 10G).
Use of jumbo frames is recommended for best vMotion performance. The physical switch must support jumbo frames too. Notes:
- Jumbo frames add complexity to network design and maintenance over time.
- Performance gains are marginal with common block sizes (4 KB, 8 KB, 32 KB); vMotion uses a large block size.
- IEEE 802 standards do not recognize jumbo frames.
- While a jumbo frame has 9000 bytes, we need to deduct 20 bytes for the IP header and 8 bytes for the ICMP header. In an OS like Windows, a ping with a payload larger than 8972 bytes will therefore fail.
- In some apps like SQL Server, the default packet size should be set to 8192, as 8192/64 = 128.
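A point that is easy to miss about the NIOC shares in the table above is that they only bite under contention: each active traffic type gets link bandwidth in proportion to its shares. A minimal sketch, using the table's percentages as relative weights (the traffic-type labels and the 10 Gbps link are illustrative):

```python
def nioc_allocation(shares, link_gbps=10):
    """Bandwidth (Gbps) each traffic type gets when the link is saturated.

    Each type receives link_gbps * its_shares / total_active_shares.
    Idle types drop out of the total, so active ones absorb the slack.
    """
    total = sum(shares.values())
    return {k: round(link_gbps * v / total, 2) for k, v in shares.items()}

# Weights taken from the slide's table (VM 20, Mgmt 10, FT 15, HBR 5).
print(nioc_allocation({"VM": 20, "Mgmt": 10, "FT": 15, "HBR": 5}))
```

For example, if vMotion is idle and only these four types are active, the VM traffic gets 20/50 of the pipe (4 Gbps on a 10 GE link), not a flat 2 Gbps.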
83 Large: IP Address scheme

The example below is based on 1500 server VMs and desktop VMs, which is around 125 ESX and 125 ESX respectively.
Do we separate the network between the Server and Desktop farms?
Since we are using the x.x.x.1 address, the basic network address (gateway) will be on x.x.x.254.

Purpose / IP addresses / total segments / remarks:
- ESX iLO: 1 per ESX, 1 segment. Out-of-band management & console access.
- ESX Mgmt: 1 per ESX.
- ESX iSCSI: 1-2 per ESX. Need 2 (1 address per active path) if we don't use LBT and do static mapping.
- ESX vMotion: 2 per ESX. Multi-NIC vMotion.
- ESX FT: cannot multi-path?
- Agent VMs: 5 per ESX, 3 segments. vShield App, Trend Micro DS, Distributed Storage, etc.
- Mgmt VMs: 1 per DC. vCenter, SRM, Update Manager, vCloud, etc. Group in 20s so similar VMs have sequential IP addresses, easier to remember.

Address ranges (ESXi #001 to ESXi #125):
- iLO: x for the Server farm (enough for 254 ESX), x for the Desktop farm, x for non-ESX (e.g. network switches, arrays, etc.)
- Mgmt: x for the Server farm (enough for 254 ESX), x for the Desktop farm.
- iSCSI: this is for ESX only. Other devices should be on x. VSA will have many addresses when it scales beyond 3 ESX.
- vMotion
- Fault Tolerance
- Agent VMs

For iSCSI and vMotion to utilise >1 physical path, there has to be 1 IP address per path. The NIC teaming must be set to
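The "sequential, easy to remember" idea above can be sketched as a simple convention: ESXi host N gets the .N address in each function's subnet, so a /24 per function comfortably covers the 125 hosts per farm. The subnets below are placeholders of my own; the deck elides its actual prefixes (the "x" ranges).

```python
import ipaddress

# Placeholder subnets, one per function; substitute your real ranges.
FUNCTIONS = {
    "mgmt":    ipaddress.ip_network("10.10.1.0/24"),
    "vmotion": ipaddress.ip_network("10.10.2.0/24"),
    "ilo":     ipaddress.ip_network("10.10.3.0/24"),
}

def host_ip(function, host_number):
    """ESXi #host_number gets the .host_number address in that function's subnet."""
    net = FUNCTIONS[function]
    return str(net.network_address + host_number)

print(host_ip("mgmt", 1), host_ip("mgmt", 125))
```

With this scheme, knowing a host's number tells you every one of its addresses, which is exactly the operational benefit the slide is after.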
84 Security & Compliance Design Example: tracking configuration changes in vSphere & patching are part of compliance, not security.
85 Areas to consider

- VM: Guest OS, vmdk, prevent DoS, log review, VMware Tools.
- Server: lockdown mode, firewall, SSH, log review.
- Storage: zoning and LUN masking, VMFS & LUN, iSCSI CHAP, NFS storage.
- Network: VLAN & PVLAN, Management LAN, no air gap with vShield, virtual appliances.
- Management: vSphere roles, separation of duty.
Sources & tools: vSphere hardening guide, VMware Configuration Manager, other industry requirements like PCI-DSS.
Take advantage of vCNS:
- It changes the paradigm in security: from "the hypervisor is another point to secure" to "the hypervisor gives an unfair advantage to the security team".
- vShield App for firewall, and vShield Endpoint for anti-virus (only Trend Micro has the product as at Sep 2011).
- No need to throw away the physical firewall first. Complement it by adding "object-based" rules that follow the VM.
VM security:
- vmdk encryption is not currently provided by vSphere, so we need to encrypt within the Guest OS instead. Need to ensure that the global AD administrator does not have access to mount the drive or log in to any Windows. But strictly speaking this is outside the VMware security scope, as it's normally done by the AD team.
- If a VM is compromised, it can be used to launch a denial-of-service attack. It can also impact the performance of other shared components. Areas that can be impacted:
  - CPU and RAM: the VM can max out its vCPU & vRAM, or keep writing to its RAM, inducing load on the vmkernel to serve it.
  - Network: the VM can saturate the shared vmnic. Network IO Control can help here.
  - Storage IOPS: the VM can saturate the IOPS, easily done with tools like Iometer. Storage IO Control can help here.
  - Storage space: the VM can generate lots of logs or fill up its vmdk.
86 Separation of Duties with vSphere
VMware Admin vs AD Admin: in a small setup, the same person does both. The AD Admin has access to NTFS; this can be too powerful if the file system holds sensitive data.
Segregate the virtual world. Split vSphere access into 3: Storage, Server, Network. Give Network to the Network team and Storage to the Storage team. A role with all access to vSphere should be rarely used.
VM owners can be given some access that they do not have in the physical world. They will like the empowerment (self service).
Roles in the vSphere space: VMware Admin, Networking Admin, Server Admin, Operator, VM Owner, Storage Admin. Roles in the enterprise IT space: MS AD Admin, Network Admin, DBA, Apps Admin.
87 Folders: use them properly
Do not use Resource Pools to organise VMs. Caveat: the Hosts/Clusters view is the only view where you can see both ESXi hosts and VMs.
Study the hierarchy on the right: it is folders everywhere. Folders are the way to limit access. Certain objects do not have their own access control; they rely on folders. E.g. you cannot set permissions directly on a vNetwork Distributed Switch. To set permissions, create a folder on top of it.
I tried the following:
1. Create a folder and put all local datastores into it.
2. Specify no access for a user.
3. Log in as that user.
4. Go to any ESXi host, then to the Configuration tab.
The local datastores are not visible!
88 Storage related access
Non-Storage Admins should not have the following access: Initiate Storage vMotion; Rename or Move Datastore; Create; Low level file operations.
Different ways of controlling access:
Network level. The ESXi host cannot access the array at all, as it cannot even see it on the network.
Array level. Control which ESXi hosts can or cannot see the storage. For iSCSI, we can configure per target using CHAP. For FC, we can use Fibre Channel zoning or LUN masking.
vCenter level. Using vCenter permissions (at folder level or datastore level). Most granular.
You can now manage datastores like other inventory objects. That means you can configure and manage datastores from a centralised view, organise datastores into folders, and set permissions per folder or per datastore. For example, it is now possible to block the creation of VMs or of snapshots per datastore or through the use of folders.
To set a permission on an individual datastore, select the datastore and then the Permissions tab. Right-click anywhere on the Permissions tab and select Add Permission. To set a permission on a folder, right-click the folder and select Add Permission.
89 Network related access
Server Admins should not have the following access: Move network (this can be a security concern); Configure network; Remove network.
Server Admins should have: Assign network (to assign a network to a VM).
90 Roles and Groups
Create new groups for vCenter Server users. Avoid using MS AD built-in groups or other existing groups.
Do not use the default user "Administrator" in any operation; disable it.
Each vCenter plug-in should have its own user, so you can differentiate among the plug-ins.
Use your own personal ID. The idea is that security actions should be traceable to an individual. Do not create another generic user (e.g. "VMware Admin"): this defeats the purpose and is practically no different from "Administrator". A generic user also increases the risk of password sharing, since it carries no personal data.
Create 3 roles (not users) in MS AD: Network Admin, Storage Admin, Security Admin.
Create a unique ID for each vSphere plug-in that you use: SRM, Update Manager, Chargeback, CapacityIQ, vShield Zones, Converter, Nexus, etc. E.g. "SRM Admin", "Chargeback Admin". This is the ID the product uses to log in to vCenter; it is not the ID you use to log in to the product itself (use your personal ID for that). This helps in troubleshooting; otherwise there are too many "Administrator" sessions and you are not sure who they _really_ are. Also, if the Administrator password has to change, you do not have to change it everywhere.
92 Standard VM sizing: follow McDonald's
1 VM = 1 App = 1 purpose. No bundling of services. Having multiple applications or services in 1 OS tends to create more problems; the Apps team knows this best.
Start with the Small size, especially for CPU & RAM.
Use as few virtual CPUs (vCPUs) as possible:
vCPU count impacts the scheduler, hence performance.
It is hard to take vCPUs back once you give them. Also, the app might be configured to match the processor count (you will not know unless you ask the application team).
Maintaining a consistent memory view among multiple vCPUs consumes resources.
There is a licensing impact if you assign more CPUs. vSphere 4.1 multi-core can help (always verify with the ISV).
Unused virtual CPUs still consume timer interrupts and execute the idle loops of the guest OS.
In the physical world, CPU tends to be oversized. Right-size it in the virtual world.
RAM: start with 1 GB, not 512 MB. Patches can be large (330 MB for XP SP3) and need RAM. Size impacts vMotion, ballooning, etc., so you want to trim the fat. A Tier 1 cluster should use Large Pages.
Anything above XL needs to be discussed case by case. Utilise Hot Add to start small (needs Datacenter edition).
See the speaker notes for more info.
Even if some vCPUs are not used, configuring VMs with them still imposes some small resource requirements on ESX: some older guest operating systems execute idle loops on unused vCPUs, thereby consuming resources that might otherwise be available for other uses (other VMs, the VMkernel, the console, etc.).
The guest scheduler might migrate a single-threaded workload amongst multiple vCPUs, thereby losing cache locality. The per-virtual-machine memory space overhead includes space reserved for the VM devices (e.g., the SVGA frame buffer and several internal data structures maintained by the VMware software stack). These amounts depend on the number of vCPUs, the configured memory for the guest operating system, and whether the guest operating system is 32-bit or 64-bit.
To check whether we have enough RAM:
a. Look at the value of Memory Balloon (Average) in the vSphere Client Performance Chart. An absence of ballooning suggests that ESX is not under heavy memory pressure and thus memory overcommitment is not affecting performance. (Note that some ballooning is quite normal and not indicative of a problem.)
b. Check for guest swap activity within that VM. This can indicate that ballooning might be starting to impact performance (though swap activity can also be related to issues entirely within the guest).
c. Look at the value of Memory Swap Used (Average) in the vSphere Client Performance Chart. Memory swapping at the host level indicates more significant memory pressure.
Large pages: in addition to the usual 4 KB memory pages, ESX also makes 2 MB memory pages available (commonly referred to as "large pages"). By default ESX assigns these 2 MB machine memory pages to guest operating systems that request them, giving the guest operating system the full advantage of using large pages. The use of large pages reduces memory-management overhead and can therefore increase hypervisor performance. If an operating system or application can benefit from large pages on a native system, it can potentially achieve a similar performance improvement on a VM backed with 2 MB machine memory pages.
Consult the documentation for your operating system and application to determine how to configure each of them to use large memory pages. More information can be found in the performance study entitled Large Page Performance.
Large page support in Windows/Linux requires application-level support and is not enabled by default. Within Windows, enable PAE extensions and grant the service account running the application permission to leverage the additional usable memory space. In Linux, you must pre-allocate the large pages.

Item | Small VM | Medium VM | Large | Custom
CPU | 1 | 2 | 4 | 8 – 32
RAM | 1 GB | 2 GB | 4 GB | 8, 12, 16 GB, etc.
Disk | 50 GB | 100 GB | 200 GB | 300, 400, etc. GB
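The sizing table above can be encoded as data so that provisioning scripts always pick the smallest standard size that fits. A minimal sketch; the function name and the "None means go custom" convention are my own, not part of the slide:

```python
# The S/M/L table above as data; Custom sizes are negotiated case by case,
# so only the three standard sizes are encoded.
VM_SIZES = {
    "small":  {"vcpu": 1, "ram_gb": 1, "disk_gb": 50},
    "medium": {"vcpu": 2, "ram_gb": 2, "disk_gb": 100},
    "large":  {"vcpu": 4, "ram_gb": 4, "disk_gb": 200},
}

def smallest_size(vcpu_needed: int, ram_gb_needed: float):
    """Pick the smallest standard size that fits; None means discuss a custom size."""
    for name, spec in VM_SIZES.items():  # dicts preserve insertion order (small first)
        if spec["vcpu"] >= vcpu_needed and spec["ram_gb"] >= ram_gb_needed:
            return name
    return None  # above Large: case-by-case, per the slide
```

Starting the search at "small" enforces the "start with the Small size" principle: a request only moves up a tier when it genuinely exceeds the smaller spec.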
93 SMP and UP HAL
This does not apply to recent OSes such as Windows Vista, Windows 7, and Windows Server 2008.
There are two types of hardware abstraction layers (HALs) and kernels: UP and SMP. UP historically stood for "uniprocessor", but should now be read as "single-core". SMP historically stood for "symmetric multi-processor", but should now be read as "multi-core".
Design principle: going from 1 vCPU to many is OK; going from many to 1 is not.
To change from 1 vCPU to 2 vCPUs, the kernel must change to SMP. Windows XP and Windows Server 2003 automatically upgrade to the ACPI Multiprocessor HAL. "In Windows 2000, you can change to any listed HAL type. However, if you select an incorrect HAL, the computer may not start correctly. Therefore, only compatible HALs are listed in Windows Server 2003 and Windows XP. If you run a multiprocessor HAL with only a single processor installed, the computer typically works as expected, and there is little or no effect on performance."
To change from many vCPUs to 1, the steps are simple, but Microsoft recommends a reinstall: "In this scenario, an easier solution is to create the image on the ACPI Uniprocessor computer."
Microsoft does not support running a HAL other than the HAL that Windows Setup would typically install on the computer. For example, running a PIC HAL on an APIC computer is not supported. Although this configuration may appear to work, Microsoft does not test it and you may have performance and interrupt issues. Microsoft also does not support swapping out the files used by the HAL to manually change HAL types. Microsoft recommends switching HALs only for troubleshooting purposes or to work around a hardware problem.
94 MS Windows: Standardisation
Datacenter edition is cheaper at more than ~6 VMs per box. MS licensing is complex; the table below may not apply in your case.
Per VM: 10 VMs means 10 licences.
Per 4 VMs: 10 VMs means 3 licences (round up).
Per socket: 2 sockets means 2 licences, with unlimited VMs per box.
Source:
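The three counting models above differ only in what you round up over. A hedged sketch of just the arithmetic; function names are mine, and edition names and prices are deliberately omitted since the slide warns the table may not apply to your agreement:

```python
import math

def licences_per_vm(vms: int) -> int:
    """1 licence per VM: 10 VMs -> 10 licences."""
    return vms

def licences_per_4_vms(vms: int) -> int:
    """Each licence covers up to 4 VMs on a host: 10 VMs -> 3 licences."""
    return math.ceil(vms / 4)

def licences_per_socket(sockets: int) -> int:
    """Per-socket licensing with unlimited VMs: 2 sockets -> 2 licences."""
    return sockets
```

The per-socket model is why the slide calls Datacenter cheaper past roughly 6 VMs per box: its cost is fixed per host while the other models grow with VM count.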
95 Guest OS
Use 64-bit if possible:
Access to >3 GB RAM. The performance penalty is generally negligible, and sometimes even negative (64-bit can be faster). In a Linux VM, Highmem can show significant overhead with 32-bit; 64-bit guests can offer better performance. Workloads with a large memory footprint benefit more from 64-bit guests.
Some Microsoft & VMware products have dropped support for 32-bit.
Increased scalability in the VM. Example: Update Manager 4 installed on 64-bit Windows can concurrently scan 4,000 VMs; on 32-bit Windows the concurrency drops to 200. Concurrent powered-on Windows VM scans per VUM server: 72. Most other numbers are not as drastic as this example.
Disable unnecessary devices in the Guest OS.
Choose the right SCSI controller.
Set the right IO timeout: on Windows VMs, increase the value of the SCSI TimeoutValue registry parameter to allow Windows to better tolerate delayed I/O resulting from path failover.
For Windows VMs, stagger anti-virus scans. Performance degrades significantly if you scan all VMs simultaneously.
Unload ESXi host USB drivers if not required.
97 vCenter: run vCenter Server as a VM
vCenter Server VM best practices:
Disable DRS on the vCenter VM and place it on the first ESXi host in your farm. Always remember where you run your vCenter: note both the host name and IP address of that first ESXi host.
Start in this order: Active Directory, DNS, vCenter DB, vCenter.
Set HA restart priority to high.
Limitations:
Windows patching of the vCenter VM cannot be done via Update Manager.
Cannot cold-clone the VM; use hot clone instead.
VM-level operations that require the VM to be powered off can be done via ESXi: log in directly to the ESXi host that runs the vCenter VM, make the changes, then boot the VM.
Not connected to the Production LAN. Connect it to the management LAN, so VLAN trunking is required as vSwitches are shared (assuming you do not have a dedicated IT cluster).
Security: protect the special-purpose local vSphere administrator account from regular usage. Instead, rely on accounts tied to specific individuals for clearer accountability. You can use the Microsoft Windows built-in system account or a user account to run vCenter Server; the built-in system account has more permissions and rights on the server than vCenter Server needs, which can contribute to security problems.
Other configuration: keep the statistics level at Level 1, and use vCenter Operations to complement it. Level 3 is a big jump in the amount of data collected.
Level 2 vs Level 1: as an example, an average quad-processor ESX server has 6 metrics collected at Level 1 during a sample interval, while Level 2 collects about 20 (+/- a few, based on the number of devices in the host). Level 2 is used most often in environments that do capacity planning and chargeback on VMs. It gives a fairly granular look at the core four resources without grabbing Level 3 counters (which is a big jump in the number of metrics monitored).
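The start order above is really a dependency chain. A tiny topological sort makes it explicit; the dependency edges below encode the slide's ordering and are my assumption about what depends on what, not an official VMware statement:

```python
# Each service lists what must be running before it starts.
DEPS = {
    "AD": [],
    "DNS": ["AD"],
    "vCenter DB": ["DNS"],
    "vCenter": ["vCenter DB"],
}

def boot_order(deps):
    """Depth-first topological sort: dependencies come out before dependents."""
    order, seen = [], set()
    def visit(name):
        if name in seen:
            return
        for dep in deps[name]:
            visit(dep)
        seen.add(name)
        order.append(name)
    for name in deps:
        visit(name)
    return order
```

Encoding the chain as data rather than a hard-coded list means that adding, say, an SSO service later only requires one new entry, and the safe order is recomputed.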
98 Naming convention
Data center: name by purpose, e.g. "Production". This is the virtual data center in vCenter; a physical data center normally has 1 or more virtual data centers. As you will only have a few of these, there is no need for a cryptic naming convention. Avoid renaming it.
Cluster: as above.
ESXi host name: esxi_locationcode_##.domain.name, e.g. esxi_SGP_01.vmware.com, esxi_KUL_01.vmware.com. Do not include the version number as it may change. No spaces.
VM: Project_Name Purpose ##, e.g. "Intranet WebServer 01". Do not include the OS name. Can include spaces.
Datastore: Environment_type_##, e.g. PROD_FC_01, TEST_iSCSI_01, DEV_NFS_01, Local_ESXname_01. The type is useful when you have multiple types; if you have 1 type but multiple vendors, use the vendor name (EMC, IBM, etc.) instead. Prefix all local datastores with "Local" so they are separated easily in the dialog boxes.
"Admin ID" for plug-ins: ProductName-Purpose, e.g. VCOps-Collector, Chargeback-. All the various plug-ins to vSphere need Admin access.
Folder: avoid special characters, as you (or other VMware and 3rd-party products or plug-ins) may need to access them programmatically.
If you are using VC Ops to manage multiple vCenters, the naming convention should ensure names are unique across vCenters.
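A convention only helps if it is enforced; a short script can validate names before objects are created. The regular expressions below are inferred from the examples above (3-letter location code, 2-digit counter) and are an assumption, not an official rule:

```python
import re

# Patterns inferred from the slide's examples; adjust to your own standard.
ESXI_NAME = re.compile(r"^esxi_[A-Za-z]{3}_\d{2}\.[\w.-]+$")       # esxi_SGP_01.vmware.com
DATASTORE_NAME = re.compile(r"^(PROD|TEST|DEV|Local)_[\w-]+_\d{2}$")  # PROD_FC_01

def valid_esxi_name(name: str) -> bool:
    """True when the host name matches esxi_<loc>_<##>.<domain>."""
    return ESXI_NAME.match(name) is not None

def valid_datastore_name(name: str) -> bool:
    """True when the datastore matches Environment_type_##."""
    return DATASTORE_NAME.match(name) is not None
```

Such a check is easy to drop into a provisioning script or a periodic audit, which matters when plug-ins access objects programmatically as the slide warns.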
99 vCenter Server: HA
Many vSphere features depend on vCenter: Distributed Switch, Auto Deploy, HA (management), DRS and vMotion, Storage vMotion, licensing.
Many add-ons depend on vCenter: vShield, vCenter SRM, VCM, vCenter Operations, vCenter Chargeback, vCloud Director, View + Composer.
Implement vCenter Heartbeat:
Automated recovery from hardware, OS, application, and network failures.
Awareness of all vCenter Server components.
The only solution fully supported by VMware.
Can protect the database (SQL Server) and vCenter plug-ins: View Composer, Update Manager, Converter, Orchestrator.
SRM, vCD, and View utilise vCenter to manage VMs (power on, provision, etc.).
Loss of data in Chargeback could mean loss of revenue for the IT organisation.
CapacityIQ information and models may be skewed if vCenter is down frequently and/or for long periods.
Configuration Manager and Operations rely on vCenter Server for data.
vSphere Storage Appliance (VSA) requires vCenter to manage it.
Some effects of vCenter Server downtime:
Management requires a direct connection to each host.
Can't provision new VMs from templates.
No host profiles.
Historical records will have gaps during the outage.
vMotion and DRS (incl. Storage DRS) unavailable.
Distributed Switches: unable to deploy and manage.
vCenter plug-ins (e.g. VUM) unavailable.
HA failover works but can't be reconfigured.
100 vMA: Centralised Logging
Benefits: ability to search across ESXi hosts; convenience.
Best practices:
One vMA per 100 hosts with vilogger.
Place the vMA on the management LAN.
Use a static IP address, FQDN, and DNS.
Limit use of resxtop (it is for real-time troubleshooting, not monitoring).
Enable remote system logging for targets with vilogger (enable/disable/updatepolicy/list):
Rotation default is 5 files.
Max file size defaults to 5 MB.
Collection period is 10 seconds.
ESX/ESXi log files go to /var/log/vmware/<hostname>.
vpxa logs are not sent to syslog; see KB
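The vilogger defaults quoted above (5 rotations x 5 MB per file) give a simple upper bound on disk usage, worth checking when sizing the vMA appliance. A back-of-envelope sketch; the "one log stream per host" simplification is my assumption:

```python
def max_log_mb(hosts: int, rotations: int = 5, max_file_mb: int = 5) -> int:
    """Worst-case megabytes of retained logs, assuming one log stream per host."""
    return hosts * rotations * max_file_mb

print(max_log_mb(100))  # a vMA serving the recommended 100 hosts
```

At the recommended ratio of one vMA per 100 hosts this stays modest, but hosts with multiple log files multiply the figure, so treat it as a floor rather than a budget.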