
Presentation on theme: "VCAP-DCD, TOGAF Certified"— Presentation transcript:

1 VCAP-DCD, TOGAF Certified
Software-Defined Datacenter Sample Architectures based on vCloud Suite 5.1. Singapore, Q2 2013. VCAP-DCD, TOGAF Certified. Iwan ‘e1’ Rahabok | M: | This is a sample architecture, not the definitive architecture. There is no one-size-fits-all design; it all depends.

2 Purpose of This Document
Use it like a book, not a slide deck. There is a lot of talk about private cloud, but what does it look like at the technical level? How do we really assure SLAs and offer 3 tiers of service? If I’m a small company with just 50 servers, what does my architecture look like? What if I have 2000 VMs? I go around doing a lot of “health checks” at existing VMware customers’ sites, and the No. 1 question is about design best practice. So this doc serves as a quick reference for me; I can pull a slide from here for discussion. I am a VMware employee, but this is my personal opinion. Please don’t take it as an official, formal VMware recommendation; I’m not authorised to make one. Also, we should judge the content rather than the organisation or person behind it. A technical fact is a technical fact, whether an intern said it or an IT engineer with 50 years of experience said it. Technology changes: 10 Gb Ethernet, Flash, SSD disks, FCoE, Converged Infrastructure, SDN, NSX, storage virtualisation, etc. will all impact the design. A lot of new innovation is coming within the next 2 years, and some of it was already featured at VMworld. New modules and products from VMware’s ecosystem partners will also impact the design. This is a guide. It is not a Reference Architecture, let alone a Detailed Blueprint. Don’t print it and follow it to the letter; it is for you to think about and tailor. It is written for hands-on vSphere admins who have attended the Design Workshop & ICM. A lot of the design considerations are covered in the vSphere Design Workshop. It complements vCAT 3.0. You should be at least a VCP 5, preferably a VCAP-DCD 5. There is no explanation of features. Sorry, it’s already >100 slides. With that, let’s have a professional* discussion. * Not an emotional, religious or political discussion. Let’s not get angry over technical stuff; it’s not worth your health. Folks, some disclaimers: this deck builds on the vSphere Design Workshop. I’ve attended the course and highly recommend it.

3 Table of Contents
Introduction: Architecting in vSphere, Applications with special considerations, Requirements & Assumptions, Design Summary
vSphere Design: Datacenter (Datacenter, Cluster: DRS, HA, DPM, Resource Pool)
vSphere Design: Server (ESXi, physical host)
vSphere Design: Network
vSphere Design: Storage
vSphere Design: Security & Compliance (vCenter roles/permissions, config management)
vSphere Design: VM
vSphere Design: Management. See this deck:
Disaster Recovery. See this deck:
Additional Info: me for Appendix slide
Refer to Module 1 and Module 2 of the vSphere Design Workshop; I’m going straight into more technical material here. Topics only cover items that have a major design impact; non-design items are not covered. This focuses on vCloud Suite 5.1 (infrastructure). Application-specific topics (e.g. databases) are not covered. Some slides have speaker notes with details. Specifically, the focus here is the vSphere layer, not the vCloud Director layer. It is important to get the foundation right. For vCD, I recommend vCAT.

4 Introduction

5 vCloud Suite Architecture: what do I consider
Architecting a vSphere-based DC is very different from architecting a physical data center. Virtualisation is a disruptive technology; it breaks best practices and changes paradigms. Do not apply the physical-world paradigm to the virtual world. Many “best practices” in the physical world exist because of physical-world limitations; once a limitation is removed, the best practice is no longer valid. Adopt emerging technology, as virtualisation is still innovating rapidly. Best practice means proven practice, and that might mean outdated practice. Consider unrequested requirements, as the business expects the cloud to be agile. You have experienced VM sprawl, right? My personal principle: do not design something you cannot troubleshoot. A good IT architect does not set up potential risks for the support person down the line. I tend to keep things simple and modular. Cost will go up a bit, but it is worth the benefits. What I consider in a vSphere-based architecture: No 1: Upgradability. This is unique to the virtual world, and a key component of cloud that people have not talked about much. After all my apps run on virtual infrastructure, how do I upgrade the virtualisation layer itself? How do I upgrade SRM? Based on historical data, VMware releases a major upgrade every 2 years. Your architecture will likely span 3 years, so check with your VMware rep for an NDA roadmap presentation. No 2: Debug-ability. Troubleshooting in a virtual environment is harder than in a physical one, as boundaries are blurred and physical resources are shared. There are 3 types of troubleshooting: Configuration. This does not normally happen in production, as once something is configured, it is not normally changed. Stability. Stability means something hangs, crashes (BSOD, PSOD, etc.) or gets corrupted. Performance. This is the hardest of the 3, especially if the slow performance is short-lived and the system performs well most of the time. This is why the design has extra server and storage capacity, so we can isolate some VMs while doing joint troubleshooting with the App team.
Supportability. This is related to, but not the same as, Debug-ability. Support relates to things that make day-to-day support easier: monitoring counters, reading logs, setting up alerts, etc. For example, centralising logs via syslog and providing intelligent search improves supportability. A good design makes it harder for the support team to make human errors. Virtualisation makes tasks easy, sometimes way too easy relative to the physical world. Consider this operational/psychological impact in your design. Physical-world thinking or best practice: a lot of it does not make sense in the virtual world. Architecting a virtual DC based on the physical-DC school of thought will result in an inferior design. Supportability also means using components that are supported by the vendors. This should be obvious, as we should not deploy unsupported configurations. For example, SAP is supported only from certain versions onward. vSphere 4.0 was released in May 2009, and 5.0 in Sep 2011.
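The upgradability point is simple arithmetic, sketched below in Python. It assumes a fixed 24-month cadence for major releases (the "every 2 years" rule of thumb above, not a VMware commitment) and uses the two release dates quoted above; the function names are illustrative only.

```python
# Sketch: how many major releases can land inside an architecture's lifespan,
# ASSUMING a fixed release cadence (a rule of thumb, not a vendor commitment).


def months_between(start: tuple, end: tuple) -> int:
    """Whole months between two (year, month) pairs."""
    (y1, m1), (y2, m2) = start, end
    return (y2 - y1) * 12 + (m2 - m1)


def releases_during(lifespan_months: int, cadence_months: int) -> tuple:
    """(min, max) major releases that can fall inside a window of this length."""
    fewest = lifespan_months // cadence_months
    most = -(-lifespan_months // cadence_months)  # ceiling division
    return fewest, most


# Dates quoted in the text: vSphere 4.0 (May 2009), vSphere 5.0 (Sep 2011).
print("4.0 -> 5.0 gap (months):", months_between((2009, 5), (2011, 9)))  # 28
# A 3-year architecture with a 2-year cadence sees 1 or 2 major releases,
# depending on where in the cycle the design starts.
print("Major releases in 36 months:", releases_during(36, 24))  # (1, 2)
```

In other words, a 3-year design should plan for at least one major vSphere upgrade, and possibly two.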

6 vCloud Suite Architecture: what do I consider
Consideration: Cost. You will notice that the “Small” design example has a lot more limitations than the “Large” design. An even bigger cost is ISV licensing. Some ISVs, like Oracle, charge for the entire cluster, so dedicating a cluster to them is cheaper. The DR site serves 3 purposes to reduce cost. VMs from different business units are mixed in 1 cluster: if they can share the same production LAN and SAN, the same reasoning applies to the hypervisor. Windows, Linux and Solaris VMs are mixed in 1 cluster; in a large environment, separate them to maximise your OS licences. DMZ and non-DMZ VMs are mixed in 1 cluster. Security & Compliance: The vSphere Security Hardening Guide splits security into 3 levels: Production, DMZ and SSLF. Prod and Non-Prod don’t share the same cluster, storage or network, because it is easy to make mistakes and easy to move VMs in and out of the production environment; production is more controlled and secure; and non-prod may spike (e.g. during load testing). Availability: Software has bugs. Hardware has faults. We mostly cater for hardware faults. What about software bugs? I try to cater for software bugs, which is why the design has 2 VMware clusters with 2 vCenters. This lets you test cluster-related features in one cluster while keeping your critical VMs on another. A cluster is always sized for 1 host failure. In a small cluster, the overhead can be high (50% in a 2-node cluster). Reliability: Related to availability, but not the same. Availability is normally achieved through redundancy. Reliability is normally achieved by keeping things simple, using proven components, separating things and standardising. For example, the solution for the Small design is simpler (a lot fewer features than the Large design). It also uses 1 vSwitch for 1 purpose, as opposed to one big vSwitch with many port groups and a complex NIC failover policy. You will notice a lot of standardisation in all 3 examples. The drawback of standardisation is overhead, as we have to round up to the next bracket: a VM that needs 24 GB RAM ends up getting 30 GB.
Specialized security limited functionality (SSLF) recommendations apply to specialized environments with some unique aspect that makes them especially vulnerable to sophisticated attacks. Recommendations at this level might result in loss of functionality. For example, VMware recommends separate vSwitches for management and data (the mgmt vSwitch handles vMotion, management, heartbeat and IP storage).
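Two overheads from the slide above can be made concrete: the N+1 host reserve (1 host out of N, hence 50% for 2 nodes) and the cost of rounding a request up to a standard size. The bracket list below is a hypothetical service catalogue, chosen only so that 24 GB rounds up to 30 GB as in the text; substitute your own.

```python
import bisect

# HYPOTHETICAL standard RAM sizes (GB) for a VM service catalogue.
RAM_BRACKETS_GB = [1, 2, 4, 8, 12, 16, 20, 30, 40, 60]


def ha_overhead(hosts: int, failures_tolerated: int = 1) -> float:
    """Fraction of cluster capacity held back to survive host failures."""
    if hosts <= failures_tolerated:
        raise ValueError("cluster needs more hosts than tolerated failures")
    return failures_tolerated / hosts


def round_up_to_bracket(requested: float, brackets=RAM_BRACKETS_GB) -> int:
    """Smallest standard size that is >= the requested size."""
    i = bisect.bisect_left(brackets, requested)
    if i == len(brackets):
        raise ValueError(f"{requested} GB exceeds the largest bracket")
    return brackets[i]


for n in (2, 4, 8, 10):
    print(f"{n}-node cluster: {ha_overhead(n):.1%} reserved for 1 host failure")

granted = round_up_to_bracket(24)
print(f"24 GB requested -> {granted} GB granted ({granted - 24} GB overhead)")
```

The HA reserve shrinks quickly as the cluster grows (50% at 2 nodes, 10% at 10 nodes), which is one reason very small clusters are expensive.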

7 vCloud Suite Architecture: what do I consider
Consideration: Performance (1 and Many). There are 2 types: How fast can we do 1 transaction? Latency and clock speed matter here. How many transactions can we do within the SLA? Throughput and scalability matter here. Storage, network, VMkernel, VMM, guest OS, etc. are all considered. We are aiming for <1% CPU Ready Time and near-zero memory ballooning in Tier 1. In Tier 3, we can and should have higher ready time and some ballooning, as long as it still meets the SLA. Some techniques to address performance: add ESX hosts, add clusters, add spindles, etc. This includes both horizontal and vertical scaling, and both hardware and software. Skills of the IT team: Especially SAN vs NAS skills, which matter more than the protocol itself. Skills include both internal and external (a preferred vendor who complements the IT team). In a small/medium environment, it is impossible to be an expert in all areas. Consider complementing the internal team by establishing a long-term partnership with an IT vendor. A plain vendor/vendee relationship saves cost initially, but in the long run there is a cost. Existing environment: How does the new component fit into the existing environment? E.g. adding a new Brand A server into a data center full of Brand B servers needs to take into account management and compatibility with common components. Most customers do not have a separate network for DR tests; in other words, they test their DR in the production network. Improvement: Besides meeting current requirements, can we improve things? Almost all companies need more servers, especially in non-production. So when virtualisation happens, we get VM sprawl. As such, the design has headroom. Moving toward “1 VM, 1 OS, 1 App”: in the physical world, some servers serve multiple purposes. In the virtual world, we can afford to, and should, run 1 app per VM. vSphere Essentials is not used, as it can’t be scaled to a higher edition or more ESXi hosts. In some cases, it is a viable choice.
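The <1% CPU Ready target is usually checked against vCenter's CPU Ready counter, which is reported as a summation in milliseconds per sampling interval (20 seconds for real-time charts; longer rollups such as the 300-second past-day interval need that value passed in instead). A minimal conversion sketch, assuming the VM-level value is the sum across its vCPUs, so dividing by the vCPU count gives a per-vCPU average; the function names are illustrative.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms per interval) into a per-vCPU percentage."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval and vCPU count must be positive")
    return ready_ms / (interval_s * 1000.0) / vcpus * 100.0


def meets_tier1(ready_pct: float, target_pct: float = 1.0) -> bool:
    """Tier 1 target from the text: CPU Ready below 1%."""
    return ready_pct < target_pct


# Example: 400 ms of ready time in one 20-second real-time sample.
one_vcpu = cpu_ready_percent(400)            # 2.0% -> misses the Tier 1 target
four_vcpu = cpu_ready_percent(400, vcpus=4)  # 0.5% per vCPU -> meets it
print(one_vcpu, meets_tier1(one_vcpu))
print(four_vcpu, meets_tier1(four_vcpu))
```

The same raw number can pass or fail the target depending on the vCPU count, so always normalise before comparing VMs.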

8 First Thing First: the applications
Your cloud’s purpose is to run apps. We must know what types of VMs we are running, as they impact the design or operation. Type of VM / Impact on Design: Microsoft NLB (Network Load Balancing). Typical apps: IIS, VPN, ISA. VMware recommends multicast mode. It needs its own port group, and this port group needs Forged Transmits enabled (as NLB changes the MAC address). MSCS: Consider Symantec VCS instead, as it has none of the restrictions listed on the right. Needs FC; iSCSI, NFS and FCoE are not supported. Also, the array must be explicitly certified on vSphere. Needs an Anti-Affinity Rule (host-to-VM mapping, not VM-VM, as VMware HA does not obey VM-VM affinity rules). As such, you need 4 nodes in a cluster. Needs RAM to be 100% reserved, which impacts HA slot size if you use default settings. Disks have to be eagerzeroedthick, so they are full size. Thin provisioning at the array will not help, as we zeroed the whole disk. Needs 2 extra NIC ports per ESX host for heartbeat. Needs RDM disks in physical compatibility mode, so the VM can’t be cloned or converted to a template. vMotion is not supported as of vSphere 5. This is not due to physical RDM. Impacts ESX upgrades, as the ESX version must be the same. With native multipathing (NMP), the path policy can’t be round robin. It uses Microsoft NLB. Impacts SRM 5: it works, but needs scripting. Preferably keep the same IP, so create a stretched VLAN if possible. Microsoft Exchange: If you need CCR (cluster continuous replication), then you need MSCS. Oracle software: Oracle charges per cluster (or per sub-cluster, if you configure host-VM affinity). I’m not 100% sure whether Oracle still charges per cluster if we do not configure automatic vMotion for the VM (i.e. set DRS to manual for this VM, so it is just Active/Passive HA, like the physical world). It looks like they will charge per host in this case, based on their document dated 13 July 2010. But Gartner’s interpretation is that Oracle charges for the entire cluster. App that is licensed per cluster: Similar to Oracle.
I’m not aware of any other apps. App that is not supported: While ISVs support VMware in general, they may only support certain versions. SAP, for example, only supports it from SAP NetWeaver 2004 (SAP Kernel 6.40), and only on Windows and Linux 64-bit (not on Solaris, for example). Unicast mode works seamlessly with all routers and Layer 2 switches. However, this mode induces switch flooding, a condition in which all switch ports are flooded with Network Load Balancing traffic, even ports to which servers not involved in Network Load Balancing are attached. Since all hosts in the cluster have the same IP address and the same MAC address, no inter-host communication is possible between hosts configured in unicast mode; therefore a second NIC is needed for other host communication. Unicast requires you to modify the vSwitches in an ugly way. The switch looks at the source MAC address in the Ethernet frame header to learn which MAC addresses are associated with its ports. NLB creates a bogus MAC address and assigns that bogus MAC address to each server in the NLB cluster. NLB assigns each NLB server a different bogus MAC address based on the host ID of the member. This address appears in the Ethernet frame header. In addition to an initial MAC address, each virtual adapter has an effective MAC address. The effective MAC address filters out incoming network traffic with a destination MAC address different from the effective MAC address. A virtual adapter’s effective MAC address and initial MAC address are the same when they are created. But the VM’s operating system might alter the effective MAC address to another value at any time. If the VM operating system changes the MAC address, the operating system can send frames with an impersonated source MAC address at any time. This allows an operating system to stage malicious attacks on the devices in a network by impersonating a network adapter authorized by the receiving network.
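The NLB MAC behaviour described above follows a fixed pattern. The sketch below derives the addresses as commonly documented for Windows NLB: cluster MAC 02-BF plus the cluster IP in unicast mode, 03-BF plus the IP in multicast mode, 01-00-5E-7F plus the last two IP octets in IGMP multicast mode, and a per-host masked source MAC of 02 plus the host ID plus the IP. Treat it as an illustration to check against Microsoft's documentation for your Windows version; the function name and the example IP are hypothetical.

```python
import ipaddress


def nlb_macs(cluster_ip: str, host_id: int) -> dict:
    """Derive Windows NLB MAC addresses for a cluster IP and member host ID."""
    if not 1 <= host_id <= 32:
        raise ValueError("NLB host IDs range from 1 to 32")
    ip = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes of the IP

    def fmt(raw: bytes) -> str:
        return "-".join(f"{b:02X}" for b in raw)

    return {
        "unicast_cluster": fmt(bytes([0x02, 0xBF]) + ip),
        "multicast_cluster": fmt(bytes([0x03, 0xBF]) + ip),
        "igmp_multicast_cluster": fmt(bytes([0x01, 0x00, 0x5E, 0x7F]) + ip[2:]),
        # In unicast mode each member masks its source MAC with its host ID,
        # so the switch never learns the shared cluster MAC on a single port
        # and keeps flooding cluster traffic to every port.
        "masked_source": fmt(bytes([0x02, host_id]) + ip),
    }


for name, mac in nlb_macs("10.1.1.50", host_id=1).items():
    print(f"{name:24} {mac}")
```

Because the guest presents a MAC that differs from the vNIC's initial MAC, the port group's MAC-related security settings have to allow it, which is why NLB VMs get their own port group.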
System administrators can use virtual switch security profiles on ESX/ESXi. SRM and MSCS: MSCS needs 2 networks. MSCS needs RDM. MSCS needs The manual is silent on whether a physical RDM can be vMotioned. It looks like it is possible; otherwise the manual would have said so in the RDM Limitations section. No partition mapping: RDM requires the mapped device to be a whole LUN. Mapping to a partition is not supported. RDM uses a SCSI serial number to identify the mapped device. Because block devices and some direct-attach RAID devices do not export serial numbers, they cannot be used with RDMs. From the Oracle document at: Hard partitioning physically segments a server, by taking a single large server and separating it into distinct smaller systems. Each separated system acts as a physically independent, self-contained server, typically with its own CPUs, operating system, separate boot area, memory, input/output subsystem and network resources. Examples of such partitioning type include: Dynamic System Domains (DSD) -- enabled by Dynamic Reconfiguration (DR), Solaris 10 Containers (capped Containers only), LPAR (adds DLPAR with AIX 5.2), Micro-Partitions (capped partitions only), vPar, nPar, Integrity VM (capped partitions only), Secure Resource Partitions (capped partitions only), Static Hard Partitioning, Fujitsu’s PPAR. Oracle VM can also be used as hard partitioning technology only as described in the following document. My analysis: Oracle does not acknowledge vSphere as hard partitioning. Their logic is that the VM can use any of the physical cores; the fact that the VM will only use the cores configured for it does not matter. The document, dated 13 July 2010, does not even mention that VMware has a “Cluster” concept. Most VMware deployments use a VMware HA or DRS cluster, not just a single ESXi host. On a single ESX host, Oracle charges for the entire host. But what about the entire cluster? The document does not mention it.
Based on the fact that the VM can’t move to another host, by definition this should count as hard partitioning. Always get a legally binding statement from your vendor (not from an account manager) on this.
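To see why dedicating a cluster to per-cluster-licensed software can be cheaper, here is a rough sketch of processor-licence counts under the per-cluster interpretation discussed above. It assumes Oracle's processor metric (physical cores multiplied by a core factor, rounded up) and a core factor of 0.5 for Intel Xeon as listed in Oracle's Processor Core Factor Table; verify both against your contract. The cluster sizes are hypothetical.

```python
import math


def oracle_processor_licences(hosts: int, sockets_per_host: int,
                              cores_per_socket: int, core_factor: float = 0.5) -> int:
    """Processor licences if every physical core in scope must be licensed.

    ASSUMPTION: core_factor=0.5 (Intel Xeon); the result is rounded up.
    """
    cores = hosts * sockets_per_host * cores_per_socket
    return math.ceil(cores * core_factor)


# Hypothetical hosts: 2 sockets x 8 cores, like the Large design's servers.
shared = oracle_processor_licences(hosts=8, sockets_per_host=2, cores_per_socket=8)
dedicated = oracle_processor_licences(hosts=2, sockets_per_host=2, cores_per_socket=8)
print(f"Oracle VM in a shared 8-host cluster: {shared} licences")        # 64
print(f"Oracle VM in a dedicated 2-host cluster: {dedicated} licences")  # 16
```

If the per-cluster interpretation holds, a single Oracle VM in a general-purpose 8-host cluster would be licensed like four dedicated 2-host clusters.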

9 VMs with additional consideration
Type of VM / Impact on Design: Peer applications (apps that scale horizontally, e.g. web servers, app servers): They need to run on different ESX hosts in a cluster, so you need to set up an Anti-Affinity Rule. You need to configure this per peer group. So if you have 5 sets of web servers from 5 different systems (5 pairs, 10 VMs), you need to create 5 Anti-Affinity Rules. Too many rules create complexity, more so when the number of nodes is less than 4. Pair applications (apps that protect each other for HA, e.g. AD, DHCP servers): As above. Security VM or network packet capture tool: Need to create another port group to separate VMs being monitored from those that are not. Need to use a Distributed vSwitch to turn on port mirroring or NetFlow. App that depends on MAC address for licensing: Needs its own port group. May need MAC Address Changes set to Accept. App that holds sensitive data: Should encrypt the data or the entire file system. vSphere 5 can’t encrypt the vmdk file yet. If you encrypt the guest OS, the backup product may not be able to do file-level backup. Should ensure no access by the MS AD administrator group. Find out how it is backed up, and who has access to the tape. If IT does not even have access to the system, then vSphere may not pass the audit requirement. Check partner products like Intel TXT and HyTrust. Fault Tolerance requirements: Impacts HA slot size (if we use that admission policy), as FT uses full reservation. Impacts resource pools; make sure we cater for the VM overhead (small). App on fault-tolerant hardware: FT is still limited to 1 vCPU. Consider Stratus to complement vSphere 5. Certain apps do not support NAS. For example, MS Exchange 2010 does not support the following: NAS storage for Exchange files (mailbox database, HT queue, logs); thin virtual disks; virtual machine snapshots (what about backups?). MS TechNet – Understanding Exchange 2010 Virtualization:
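The rule arithmetic for peer applications (one anti-affinity rule per peer group) and the concern about clusters with fewer than 4 nodes can be checked with a small planner. This is a hypothetical offline sketch, not the vSphere/DRS API: it staggers each group's members round-robin across hosts so no two members share a host, and checks whether every rule can still be honoured after one host fails.

```python
def place_with_anti_affinity(groups: dict, hosts: list) -> dict:
    """Map each VM to a host so members of one peer group never share a host."""
    placement = {}
    for g_index, (group, vms) in enumerate(groups.items()):
        if len(vms) > len(hosts):
            raise ValueError(f"{group}: {len(vms)} VMs but only {len(hosts)} hosts")
        for i, vm in enumerate(vms):
            # Offset by group index to spread load; hosts stay distinct within
            # a group because i < len(hosts).
            placement[vm] = hosts[(g_index + i) % len(hosts)]
    return placement


def survives_host_failure(groups: dict, host_count: int, failures: int = 1) -> bool:
    """True if every anti-affinity rule is still satisfiable after failures."""
    return all(len(vms) <= host_count - failures for vms in groups.values())


# Example from the text: 5 systems, each with a pair of web servers.
web_pairs = {f"web{s}": [f"web{s}-a", f"web{s}-b"] for s in range(1, 6)}
hosts = ["esx1", "esx2", "esx3"]

plan = place_with_anti_affinity(web_pairs, hosts)
print("Anti-affinity rules needed:", len(web_pairs))  # 5
print("Rules hold after 1 failure on 3 hosts:", survives_host_failure(web_pairs, 3))  # True
print("Rules hold after 1 failure on 2 hosts:", survives_host_failure(web_pairs, 2))  # False
```

With only 2 hosts, any host failure leaves every pair violating its rule, which is the complexity the slide warns about.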

10 VMs with additional consideration
Type of VM / Impact on Design: App that requires a hardware dongle: The dongle must be attached to 1 ESX host. vSphere 4.1 adds this support. Best to use a network dongle. In the DR site, the same dongle must be provided too. App with high IOPS: Needs to be sized properly. There is no point having dedicated datastores if the underlying spindles are shared among multiple datastores. Apps that use a very large block size: SharePoint uses a 256 KB block size, so a mere 400 IOPS will already saturate a 1 GbE link. For such applications, FC or FCoE is a better protocol. Any application with a 1 MB block size can easily saturate a 1 GbE link. App with very large RAM (>64 GB): This will impact DRS when an HA event occurs, as it needs a host that can house the VM. It will still boot as long as the reservation is not set to a high number. App that needs Jumbo Frames: This must be configured end to end (guest OS, port group, vSwitch, physical switch). Not all devices support 9000, so do a ping test to find the supported value. App with >95% CPU utilisation in the physical world and a high run queue: Find out first why it is so high. We should not virtualise an app whose performance characteristics we are blind to. App that is very sensitive to time accuracy: Time drift is a possibility in the virtual world. Find out the business or technical impact if time deviates by 10 seconds. A group of apps with a complex power-on sequence and dependencies: Be aware of the impact on the application during an HA event. If 1 VM is shut down by HA and then powered on, the other VMs in the chain may need a restart too. This should be discussed with the App Owner. App that takes advantage of a specific CPU instruction set: Mixing with an older CPU architecture is not possible. This is a small problem if you are buying new servers. EVC will not help, as it’s only a mask. See speaker notes. App that needs < 0.01 ms end-to-end latency: Separate cluster, as the tuning is not suitable for a “normal” cluster.
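The block-size and jumbo-frame points reduce to arithmetic. The sketch assumes about 100 MB/s of usable throughput on a 1 GbE link (roughly 125 MB/s raw minus protocol overhead, which is what the virtualgeek figures in the speaker notes imply) and decimal units (1 MB = 1000 KB). For jumbo frames, the largest ping payload that fits unfragmented is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header; on ESXi that is the size you would pass to `vmkping -d -s` (`-d` sets don't-fragment). The function names are illustrative.

```python
def iops_to_saturate(io_size_kb: float, usable_mb_per_s: float = 100.0) -> float:
    """IOPS at which I/Os of the given size fill the link (1 MB = 1000 KB)."""
    return usable_mb_per_s * 1000.0 / io_size_kb


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    ip_header, icmp_header = 20, 8
    return mtu - ip_header - icmp_header


for size_kb in (8, 64, 256, 1000):
    print(f"{size_kb:>5} KB I/O saturates 1 GbE at ~{iops_to_saturate(size_kb):,.0f} IOPS")

print("Payload to test MTU 9000:", max_ping_payload(9000))  # 8972
print("Payload to test MTU 1500:", max_ping_payload(1500))  # 1472
```

At 256 KB per I/O the link is full at roughly 390 IOPS, which matches the "mere 400 IOPS" warning for SharePoint above.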
An ill-behaved application is one that does not use CPU-vendor-recommended methods of detecting features supported on a CPU. The recommended method is to run the CPUID instruction and look for the correct feature bits for the capabilities the application is expected to use. Unsupported methods used by ill-behaved applications include try-catch-fail or inferring the features present from the CPU version information. When unsupported methods are used, an application might detect features on a host in an EVC cluster that are being masked from the VMs. The CPUID-masking MSRs provided by CPU vendors do not disable the actual features. Therefore, an application can still use masked features. If a VM running such an application is then migrated with vMotion to a host that does not physically support those features, the application might fail. VMware is not aware of any commercially-available ill-behaved applications. See KB ( ) for details. From virtualgeek: With small block I/O (like 8K) – this is 12,500 IOPS – or put differently, roughly the performance of 70 15K spindles. But, on the other end, if you have a SharePoint VM (or are doing a guest-level backup) – they tend to do IO sizes of 256K or larger. With 256K IO sizes, that’s 390 IOPS – or the performance of roughly 2 15K spindles – and likely not enough. This white paper summarizes our findings and recommends best practices to tune the different layers of an application’s environment for similar latency-sensitive workloads. By latency-sensitive, we mean workloads that are looking at optimizing for a few microseconds to a few tens of microseconds end-to-end latencies; we don’t mean workloads in the hundreds of microseconds to tens of milliseconds end-to-end latencies. In fact, many of the recommendations in this paper that can help with the microsecond level latency can actually end up hurting the performance of applications that are tolerant of higher latency.
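The notes recommend detecting CPU features through CPUID feature bits rather than inferring them. Python cannot issue CPUID directly, but on a Linux guest the kernel exposes the CPUID-derived feature bits it sees as the `flags` line of `/proc/cpuinfo`; inside a VM on an EVC cluster these reflect the masked view. A minimal sketch, assuming a Linux guest; the required-feature set and function names are illustrative.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text: str, required: set) -> set:
    """Features the application needs that this (possibly EVC-masked) vCPU lacks."""
    return set(required) - cpu_flags(cpuinfo_text)


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes\n"
print(sorted(missing_features(sample, {"aes", "avx", "sse4_2"})))  # ['avx']

# On a real Linux guest:
#     with open("/proc/cpuinfo") as f:
#         print(missing_features(f.read(), {"aes", "avx"}))
```

An application that checks this way only uses features the EVC baseline exposes, so it keeps working after vMotion to an older host in the cluster.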

11 This entire deck does not cover Mission Critical Applications
The deck focuses on designing a generic platform for most applications. In 80/20 terms, it focuses on the easiest 80%. Special apps have unique requirements. They differ in the following areas: Size is much larger, so the S, M, L sizes for VMs or ESXi hosts do not apply to them. The VM has unique properties. They might get a dedicated cluster. The picture on the right shows a VM with 12 vCPUs, 160 GB vRAM, 3 SCSI controllers, PVSCSI, 18 vDisks and 2 vNICs. This is an exceptional case. There are basically 2 overall architectures in vCloud Suite 5.1: one for the easiest 80% and one for the hardest 20%. The management cluster described later still applies to both architectures. Mission Critical Applications (MCA) are of a different nature. They need to be handled one by one, i.e. per instance. If there are 5 MCAs and all of them require Oracle 11g, we need to look at it 5x, not 1x. Yes, it’s the same Oracle 11g, but we need to look at each instance, as they may have different patterns. Below is an example of things to consider for a database. As you can see, there are a lot of things. Things to consider on MS SQL Server: SLAs, RPOs, RTOs; baseline the current workload over at least 1 business cycle; baseline the existing (workload) vSphere implementation; estimated growth rates; I/O requirements (I/O per sec, throughput, latency); storage (disk type/speed, RAID, flash cache solution, etc.); software versions (vSphere, Windows, SQL); product keys; licensing (may determine architecture); workload type (OLTP, batch, warehouse); accounts needed for installation / service accounts; high availability strategy; backup & recovery strategy. Source: Ntirety presentation at VMworld, titled Virtualizing SQL Server 2012: Doing IT Right.

12 3 Sizes: Assumptions
Assumptions are needed to avoid the infamous “It depends…” answer. The architecture for 50 VMs differs from that for 500 VMs, which in turn differs from that for 5000 VMs. It is the same vSphere, but you design it very differently. A design for large VMs (20 vCPU, 200 GB vRAM) differs from a design for small VMs (1 vCPU, 1 GB). The workload for an SME is smaller than for a large enterprise: Exchange handling 100 staff vs staff results in a different architecture. A design for a server farm differs from one for a desktop farm. I provide 3 sizes in this document: 50, 500 and 1500 VMs. The table below shows the definitions. I try to make each choice as real as possible. 3 sizes give you choice and show the reasoning used. Take the closest size to your needs, then tailor it to the specific customer (not project). Do not tailor to the project, as it is a subset of the entire data center. Always architect for the entire datacenter, not a subset. Size means the size of the entire company or branch, not the size of Phase 1 of the journey. A large company starting small should not use the “Small” option below; it should use the “Large” option but reduce the number of ESX hosts. I believe in “begin with the end in mind”, projecting around 2 years. Longer than 3 years is rather hazy, as private cloud is not fully mature yet. I expect major innovation until 2015. A large company can use the Small Cloud example for its remote offices, but this needs further tailoring.
Small VDC (VSA & ROBO) | Medium VDC | Large VDC
Company: Small company or remote branch | Medium | Large
IT Staff: 1-2 people doing everything | 4 people doing infra, 2 people doing desktop, 10 people doing apps | Different teams for each. Matrix reporting. Lots of politics & little kingdoms
Data Center: 1 or none (hosted), or just a corner rack | 2, but no array-level replication | 2 with private connectivity, plus 5 satellite DCs
VDC = Virtual Datacenter = Software-Defined Datacenter = Private Cloud.
I’m trying to avoid the word Cloud because the Asia CIO of a regional bank, a man I respect, told me cloud is something in the sky. What he has is a virtual datacenter. And yes, it is based on vCloud Suite Enterprise.

13 Assumptions, Requirements, Constraints for our Architecture
Small VDC | Medium VDC | Large VDC
# Servers currently: 25 servers, all production | ~150 servers, 70% production | 700 servers, 55% production
# Servers in 2 years: Prod 30, Non-Prod 15 (50%) | Prod 250, Non-Prod 250 (100%) | Prod 500, Non-Prod 1000 (200%)
# Server VMs our design needs to cater for: 50 | 500 | 1500
# View VMs or laptops: 500, with remote access, no need for offline VDI | 5000, with remote access, offline VDI needed | 15000, with remote access + 2FA
DR requirements: Yes
Storage expertise: Minimal; also keeping cost low by using IP storage, no SAN. Yes; RDM will be used as some DBs may be large.
DMZ Zone / SSLF Zone: Yes/No. Yes/No; intranet also zoned.
Backup: Disk. Tape.
Network standard: No standard. Cisco.
ITIL compliance: Not applicable | A few are in place | Some are in place
Change Management: Mostly not in place
Overall system mgmt SW (BMC, CA, etc.): No. Needs to have tools.
Configuration Management
Oracle RAC
Audit Team: External. External & Internal.
Capacity Planning
Oracle software (BEA, DB, etc.)
The above is an example of the Assumptions, Requirements and Constraints that dictate our architecture. It’s nice to know they form the first 3 letters of ARChitecture. It’s a reminder for us as architects to get them right if we want the right architecture at the end. Oracle RAC is now supported on VMware. However, have a good understanding of how to troubleshoot it before you virtualise something so critical (I’m assuming you deploy RAC because clustering is not acceptable).

14 3 Sizes: Design Summary
The table below provides the overall comparison, so you can easily see what was taken out in the Small or Medium design. Just like any other design, there is no 1 perfect answer. Example: you may use FC or iSCSI for Small. This assumes 100% virtualisation. It is easier to have 1 platform than 2. Certain things in a company you should only have 1 of ( , directory, office suite, backup). Something as big as a “platform” should be standardised; that’s why it is called a platform. The design for Medium will be in between Small and Large.
Small | Large
# FT VMs: 0 – 3 (in Prod Cluster only) | 0 – 6 (Prod Cluster only)
VMware products: vSphere Standard, SRM Standard, vCloud Security & Networking, Horizon View Enterprise, vCenter Operations Standard, vSphere Storage Appliance | vCloud Suite Enterprise, vCenter Server Standard, vCenter Server Heartbeat, Horizon Suite
VMware certification & skill: 1 VCP | 1 VCAP DCA, 1 VCAP DCD, VMware Mission Critical Support
Storage: iSCSI, vSphere Replication | FC + iSCSI with snapshot, vSphere + array replication
Server: 2x Xeon 5650, 72 GB RAM | 2x Xeon (8-10 cores/socket), 128 GB RAM
Backup: VMware Data Protection to Array 2 | VADP + 3rd party to tape
Why not FC for Small/Medium Cloud? For most virtualization environments, NFS and iSCSI provide suitable I/O performance. The comparison has been the subject of many papers and projects. One posted on VMTN is located at: The general conclusion reached by the above paper is that for most workloads the performance is similar, with a slight increase in ESX Server CPU overhead per transaction for NFS and a bit more for software iSCSI. For most virtualization environments, the end user might not even be able to detect the performance delta between one VM running on IP-based storage and another on FC storage.

15 Other Design possibilities
What if you need to architect for a larger environment? Take the Large Cloud sample as a starting point. It can be scaled to 10,000 VMs. Above 1000 VMs, you should consider a Pod approach. Upsize it by: Adding larger ESXi hosts. I’m using an 8-core socket, based on Xeon You should use the 10-core Xeon 7500 to fit larger VMs. Take note of cost. Adding more ESX hosts to the existing cluster. Keep it to a maximum of 10 nodes per cluster. Adding more clusters. For example, you can have multiple Tier 1 clusters. Adding fault-tolerant hardware from Stratus. Make the Stratus server a member of the Tier 1 cluster. It appears as 1 ESX host, although there are 2 physical machines. Stratus has its own hardware, so ensure consistency in your cluster design. Splitting the IT datastore into multiple datastores, grouped by function or criticality. If you are using blade servers and have filled 2 chassis, put the IT cluster outside the blades and use rack-mount servers. Separating the blades from the servers managing them minimises the chance of human error, as we avoid the “managing itself” complexity. Migrating between clusters: vSphere 5.1 supports live migration between clusters that don’t have a common datastore. I don’t advocate live migration from/to the production environment; it should be part of Change Control. The Large Cloud is not yet architected for vCloud Director. vCloud Director has its own best practices for vSphere design. Adding vCloud + SRM on the DR site requires a proper design by itself. And this deck is already 100+ slides…. iSCSI: Performance is relatively similar, but iSCSI can have multipathing and lower CPU overhead. Some servers, like HP blades, have built-in hardware iSCSI initiators. Some backup/DR solutions can be achieved more cheaply on iSCSI than on FC ==> low-cost DR via iSCSI.
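Scaling the Large design to 10,000 VMs under the 10-nodes-per-cluster cap and the 1-host-failure rule is a quick calculation. The consolidation ratio of 30 VMs per host below is purely hypothetical; plug in your own sizing.

```python
import math


def clusters_needed(total_vms: int, vms_per_host: int,
                    hosts_per_cluster: int = 10, spare_hosts: int = 1) -> int:
    """Clusters required when each cluster keeps spare_hosts for HA (N+1)."""
    usable_hosts = hosts_per_cluster - spare_hosts
    if usable_hosts <= 0 or vms_per_host <= 0:
        raise ValueError("need at least one usable host and a positive ratio")
    return math.ceil(total_vms / (usable_hosts * vms_per_host))


# HYPOTHETICAL consolidation ratio: 30 VMs per host.
for vms in (1500, 10_000):
    n = clusters_needed(vms, vms_per_host=30)
    print(f"{vms:>6} VMs -> {n} clusters of 10 hosts ({n * 10} hosts in total)")
```

At that ratio, 10,000 VMs means dozens of clusters, which is where a Pod approach (repeatable blocks of clusters plus their management) becomes easier to operate than one ever-growing environment.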

16 Design Methodology
Architecting a private cloud is not a sequential process. There are 8 components, and the application drives the infrastructure. The components are inter-linked, like a mesh. In the >1000 VM category, where it takes >2 years to virtualise >1000 VMs, a new vSphere release will change the design. Even the bigger picture is not sequential. Sometimes you may even have to leave Design and go back to Requirements or Budgeting. There is no perfect answer; below is one example. This entire document is about design only. Operations is another big space; I have not taken into account audit, change control, ITIL, etc.
[Diagram: “The steps are more like this”. The Application drives 8 inter-linked components: VM, Server, Storage, Network, Security, Data Center, Mgmt, DR.]

17 Data Center, DR, Cluster, Resource Pool
Data Center Design: Data Center, DR, Cluster, Resource Pool

18 Just what is a software-defined datacenter anyway?
[Diagram: a Virtual Datacenter on top of Physical Datacenter 1 and Physical Datacenter 2. Each physical DC has a Physical Compute Function, Physical Network Function and Physical Storage Function, each served by Vendor 1 and Vendor 2. Shared-nothing architecture, not stretched between the 2 physical DCs (Production might be x.x, DR might be x.x). No replication between the 2 physical DCs (Production might be FC, DR might be iSCSI). No stretched cluster between the 2 physical DCs. Each site has its own vCenter.]
A software-defined datacenter, or virtual DC, differs radically from the physical datacenter that we know. The fundamental difference is that we no longer do the architecture in the physical layer. The physical layer is just there to provide resources. These resources are not aware of one another. The intelligence is in the software, which defines the whole datacenter.
A datacenter consists of 3 functions:
Compute (normally called server. I don't use “server” because with Converged Infrastructure a “server” does storage too)
Network. There are 2 sub-functions here: core network (e.g. routing) and network services (e.g. firewall, DHCP, LB, IDS)
Storage
Each of these physical functions is supported, or shall I say instantiated into the physical world, by the respective hardware vendors. For servers, you might have Nutanix, HP, IBM, Dell, etc. that you trust and know. I draw 2 vendors to show that they do not define the architecture. They are there to support the function of that layer (e.g. the Compute function). So you can have 10 clusters, and 3 could be Vendor A and 7 could be Vendor B.
The same approach is then implemented in Physical Datacenter 2, but without the mindset that it has to be the same vendor. Take the Storage function, for example. You might have Vendor A on Production, and Vendor B on Non-Production. You are no longer bound by hardware compatibility (e.g. storage replication normally requires the same model and the same protocol). You can do this because the physical datacenters are completely independent of each other. They are not connected and not stretched. You might decide to keep the same vendor, but that's for a different reason.
As you can see here, there are very fundamental differences: storage is not replicated, the network is not stretched, and compute is not stretched. The shared-nothing architecture, operationally speaking, is the only architecture that guarantees a failure in 1 DC does not propagate to the other DC. In a large datacenter with 1000s of VMs, where different people work on different parts of the datacenter, we need to avoid disasters due to human error. Here is a good read on the Network component:
So how do we achieve DR then? Well, this is where the software comes in. This is a virtual datacenter, so all servers are VMs. A VM is entirely defined in software. In fact, most of a VM's files are stored in 1 folder. This folder can be replicated, then booted in another physical datacenter. vSphere 5.1 has built-in host-based replication. It can replicate individual VMs, and provides finer granularity than LUN-based replication. Replication can be done independently of the storage protocol (FC, iSCSI, NFS) and the vmdk type (thick, thin).
As at vSphere 5.1, here are the main limitations that you need to be aware of:
Nicira is not yet integrated into vSphere as at Q… Hence true software-defined networking cannot be achieved yet. In the meantime, use solutions such as vCNS or F5 to isolate and create virtual networks from the physical network.
vSphere Replication has a minimum RPO of 15 minutes. So if you need real time, you need array-based replication or an Active/Active application.
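One way to keep the shared-nothing rule honest is to audit the two sites' inventories for anything they have in common. The sketch below assumes a hypothetical `Site` record holding the vCenter name, VLANs, cluster names and array LUN IDs exported from each site; it is not a VMware API, and it deliberately ignores VM-level vSphere Replication, which the design above allows.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical inventory of one physical datacenter."""
    name: str
    vcenter: str
    vlans: set[str] = field(default_factory=set)
    clusters: set[str] = field(default_factory=set)
    luns: set[str] = field(default_factory=set)  # array LUN IDs presented at this site


def shared_components(a: Site, b: Site) -> list[str]:
    """List everything the two sites share; an empty list means shared-nothing."""
    shared: list[str] = []
    if a.vcenter == b.vcenter:
        shared.append(f"vCenter {a.vcenter}")
    shared += [f"stretched VLAN {v}" for v in sorted(a.vlans & b.vlans)]
    shared += [f"stretched cluster {c}" for c in sorted(a.clusters & b.clusters)]
    shared += [f"replicated LUN {lun}" for lun in sorted(a.luns & b.luns)]
    return shared
```

Run against each pair of sites after every change window; any non-empty result is a dependency through which a failure in one DC could propagate to the other.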

19 2-distinct Layer: Consumer and Producer
2 distinct layers, supporting the principle of Consumer and Producer.
The VM is the Consumer. It does not care about the underlying technology. Its sole purpose is to run the application.
DC Infra is the Producer. It provides common services.
The VM is freed from (or independent of) the underlying technology. These technologies can change without impacting the VM: storage protocol (iSCSI, NFS, FC, FCoE), storage file system (NFS, VMFS, VVOL), storage multi-pathing (VMware, EMC, etc.), storage replication, network teaming.
The Datacenter OS provides a lot of services, such as: Security (Firewall, IDS, IPS, Virtual Patching); Networking (LB, NAT); Availability (backup, cluster, HA, DR, FT); Management & Monitoring.
A lot of agents are removed from the VM, resulting in a simpler server.
[Diagram: DC Services on top; Separation & Abstraction (done by the Hypervisor or DC OS); Datacenter Technologies and Datacenter Implementation below.]
The following agents are removed from the VM, and are provided by the Datacenter OS: Management (Performance, Configuration, etc.), Anti-Virus, Backup (in most cases), Clustering (in most cases). The added benefit is security. For example, by not allowing the VM to see the NFS server or the iSCSI server, we are not exposing the storage network to the VM. From the overall datacenter point of view, the VM is not trusted. There are 1000s of VMs running in the datacenter, and we cannot guarantee that they are secured, as we don't even have access to the VM. So we need to hide as much as possible from a VM. This is done by removing as many things as possible from inside the VM.

20 Large: A closer look at Active/Active Datacenter
[Diagram: Active/Active, with 250 Prod VMs and 500 Test/Dev VMs on Prod and T/D clusters under a vCenter, and lots of traffic between Prod and Prod and between T/D and T/D; versus 500 Prod VMs and 1000 Test/Dev VMs on Prod and T/D clusters under a vCenter.]
Which one is simpler? Active/Passive is by far simpler.
Which one takes a lot less inter-site bandwidth? A/A takes a lot more WAN bandwidth.
Which one gives a bigger pool to be shared? Active/Passive gives a lot bigger room to play.
With Active/Active, both vCenters become Production too. Where do you test your vCenter for patches/upgrades? vCenter is a complex & large component, not merely a passive management and monitoring tool. It is an application, not just a tool.
There is another challenge with this so-called Active/Active. Network packets from outside these 2 datacenters need to come in through 1 of them. These are the options:
1. Stretched VLAN via Metro Ethernet. Traffic ingresses and egresses one data center. It does not utilize the WAN links from both data centers.
2. VM Mobility using Cisco LISP. Traffic ingresses and egresses at the local data center. But this is not a common solution. It requires two Nexus 7Ks, which is cost prohibitive.
3. Stretched VLAN with FHRP Isolation. Traffic ingresses one data center and egresses the local data center. Use a VLAN ACL to block HSRP traffic between data centers. Use any Cisco switch that is capable of VLAN ACLs (e.g. Cisco 2960, 3560) at either data center. The VLAN ACL is applied inline on the Layer 2 Metro Ethernet link.
Source: VMworld 2012 presentation “Deploying an Active/Active Datacenter with SRM 5”. Speakers: Michael Bailess, American National Bank, and Joe Kelly, Varrow.

21 Large: Adding Active/Active to a mostly Active/Passive vSphere
[Diagram: 2 vCenters with 500 Prod VMs on Prod Clusters and 1000 Test/Dev VMs on T/D Clusters, plus 1 cluster of 50 VMs behind a Global LB at each site.]
In this slide we added a small cluster for those “critical” systems that require active/active. So we don't have to make the _entire_ datacenter active/active. This is an example of how we apply the 80/20 principle: we keep the 80% simple.
I recommend Active/Passive over Active/Active, as A/A is not practical in reality:
Not all applications can have 2 independent instances. How do you synchronise the data? Can they take advantage of a replicated database like GemFire?
Do the apps work with a global load balancer? Are you going to front all applications with a global load balancer?
If you cannot have a solution that covers _all_ applications, you need another solution anyway. So you need both.

22 Large: Level of Availability
Tier     Technology                            RPO      RTO
Tier 0   Active/Active at Application Layer    0        0 hours
Tier 1   Array-based Replication               0        2 hours
Tier 2   vSphere Replication                   15 min   1 hour
Tier 3   No replication. Backup & Restore      1 day    8 hours
Here is an example of the levels of availability that the Infra team can provide to the Application (Business) team. While it shows 4 tiers, Tier 0 is a solution at the application layer. It is not something provided at the Infra layer, other than a global load balancer. At Tier 0, the traditional DR concept does not apply, as the application always runs on both sides. There is no need to do “recovery” as it is active/active. So there are actually 3 tiers provided by the Infra team.
Tier 1: RPO is 0 to demonstrate the unique capability of array-based replication. It replicates near-immediately, albeit asynchronously, so it doesn't really impact performance. Hence there is little or no data loss, which is appealing to the business. RTO is 2 hours as it still takes time to mount the LUN, add VMs to the inventory, and boot them in the right order. Databases need to run a consistency check. Linux VMs need to run fsck. Since there are multiple VMs, especially in a 3-tier application, it can take a while to boot.
Tier 2: RPO is 15 minutes as that's the limit of vSphere Replication. It cannot be lower than this. RTO is 1 hour to show that restore is generally faster. There is no LUN to scan and mount, as the datastore is already mounted. There is no need to run a consistency check, especially for Windows VMs, as vSphere Replication supports VSS.
Tier 3: I'm using no replication here to show that not all applications need DR.
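The tier table can be read as a lookup: given an application's required RPO and RTO, pick the cheapest Infra tier that meets both, or fall back to Tier 0 (an application-layer design) when none does. The sketch assumes the values from the table above expressed in minutes (1 day = 1440 min) and assumes Tier 3 is cheapest and Tier 1 most expensive, in line with the tier pricing on the Tiered Cluster slide; `cheapest_tier` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    rpo_min: int  # recovery point objective, minutes
    rto_min: int  # recovery time objective, minutes


# Infra-provided tiers from the table (Tier 0 is app-layer, so excluded),
# ordered from cheapest to most expensive.
INFRA_TIERS = [
    Tier("Tier 3", rpo_min=24 * 60, rto_min=8 * 60),
    Tier("Tier 2", rpo_min=15, rto_min=60),
    Tier("Tier 1", rpo_min=0, rto_min=2 * 60),
]


def cheapest_tier(required_rpo_min: int, required_rto_min: int) -> Optional[str]:
    """Return the cheapest infra tier meeting both objectives, or None (needs Tier 0)."""
    for tier in INFRA_TIERS:
        if tier.rpo_min <= required_rpo_min and tier.rto_min <= required_rto_min:
            return tier.name
    return None
```

Note that RPO 0 with RTO 1 hour returns None: Tier 1 meets the RPO but not the RTO, and Tier 2 the reverse, so such an application needs a Tier 0 design.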

23 Methodology
Methodology
Define how many physical data centers are required. DR requirements normally dictate 2.
For each physical DC, define how many vCenters are required. Desktop and Server should be separated by vCenter, connected to the same SSO server and fronted by the same Web Client VM. View comes with a bundled vSphere (unless you are buying the add-on). This also eases management. In some cases (Hybrid Active/Active), a vCenter may span multiple physical DCs.
For each vCenter, define how many virtual data centers are required. A virtual data center serves as a namespace boundary. It is a good way to separate IT (Provider) and Business (Consumer).
For each vDC, define how many clusters are required. In a large setup, there will be multiple clusters for each tier.
For each cluster, define how many ESXi hosts are required. Preferably 4 – is too small a size. Standardise the host spec across clusters. While each cluster can have its own host type, this adds complexity.
[Diagram: Physical DC > vCenter > Virtual DC > Cluster > ESXi. Example: Physical DC > vCenter (Server pool) > Virtual DC (IT) and Virtual DC (Biz) > Tier 1 Cluster, Tier 2 Cluster > ESXi; vCenter (Desktop pool) > Virtual DC.]
For highly sensitive apps, you need to think about whether you trust your Storage Admin, vCenter Admin, Windows Admin, AD Admin and Network Admin. If you don't, or if you'd have to kill them after telling them, then you have to separate the vCenter and the array.
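The five questions above form a strict hierarchy: Physical DC, vCenter, Virtual DC, Cluster, ESXi. A minimal sketch of that hierarchy as plain Python dataclasses follows; the class names and the example site (cluster names and host counts) are hypothetical, chosen to mirror the diagram, with the Tier 1 and Tier 2 sizes taken from the Tiered Cluster slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    esxi_hosts: int


@dataclass
class VirtualDC:
    name: str
    clusters: list[Cluster] = field(default_factory=list)


@dataclass
class VCenter:
    name: str
    virtual_dcs: list[VirtualDC] = field(default_factory=list)


@dataclass
class PhysicalDC:
    name: str
    vcenters: list[VCenter] = field(default_factory=list)


def host_count(site: PhysicalDC) -> int:
    """Walk Physical DC > vCenter > Virtual DC > Cluster and total the ESXi hosts."""
    return sum(
        cluster.esxi_hosts
        for vc in site.vcenters
        for vdc in vc.virtual_dcs
        for cluster in vdc.clusters
    )


# Hypothetical site mirroring the diagram: separate vCenters for servers and desktops.
example_site = PhysicalDC("DC1", [
    VCenter("vCenter (Server pool)", [
        VirtualDC("IT", [Cluster("IT", 4)]),
        VirtualDC("Biz", [Cluster("Tier 1", 6), Cluster("Tier 2", 8)]),
    ]),
    VCenter("vCenter (Desktop pool)", [
        VirtualDC("Desktop", [Cluster("Desktop", 8)]),
    ]),
])
```

With these assumed sizes the example site totals 26 ESXi hosts. Answering the questions top-down simply means filling in this tree one level at a time.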

24 Large: The need for Non Prod Cluster
This is unique to the virtual data center. We don't have a “Cluster” to begin with in a physical DC, as cluster means a different thing there.
The Non-Prod Cluster serves multiple purposes:
Run Non-Production VMs. In our design, all Non-Production runs on the DR site to save cost. A consequence of our design is that migrating from/to Production can mean copying large amounts of data across the WAN.
Disaster Recovery.
Test-bed for infrastructure patching or updates.
Test-bed for infrastructure upgrade or expansion.
Evaluating or implementing new features. In a virtual data centre, a lot of enhancements can impact the entire data centre, e.g. Distributed Switch, Nexus 1000V, Fault Tolerance, vShield. All the above need proper testing. The Non-Prod Cluster should provide a sufficiently large scope to make testing meaningful.
Upgrade of the core virtual infrastructure, e.g. from vSphere 4 to 5 (major release). This needs extensive testing and a rollback plan.
Even with all the above… how are you going to test SRM upgrades & updates properly? In Singapore, the MAS TRM guidelines require Financial Institutions to test before updating production. An SRM test needs 2 vCenters, 2 arrays and 2 SRM servers. If all are used in production, then where is the test environment for SRM? What happens when you are upgrading SRM? You will lose protection during this period.
[Diagram labels: Business, IT.] This new layer does not exist in the physical world. It is software, hence it needs its own Non-Prod environment.

25 Large: The need for IT Cluster
Special purpose cluster. It is more than a Management Cluster: it also runs non-management VMs that are not owned by the Business. Examples: Active Directory; File Server & Collaboration (in the Large example, this might warrant its own cluster).
It runs all the IT VMs used to manage the virtual DC or provide core services. The Central Management will reside here too.
It is separated for ease of management & security. This separation keeps the Business Cluster clean, “strictly for business”.
The next page shows the list of VMs that reside on the IT Cluster. Each line represents a VM. This shows the Production site. The DR site will have a subset of this, except for vCloud Director, which in this example architecture is only deployed on the DR site.
Explanation of some of the servers below: Security Management Server = VM to manage security (e.g. TrendMicro Deep Security).

26 Large: IT Cluster (part 1)
The table provides samples of VMs that should run on the IT cluster. 4 ESXi hosts should be sufficient, as most VMs are not demanding. They are mostly management tools. The relatively more demanding VMs are vCenter Operations.
There are many databases here. Standardise on 1. I will not put these databases together with DBs running business workloads. Keep Business and IT separate.
Category: Large Cloud
Base Platform: vCenter (for Server Cloud) – active node; vCenter (for Server Cloud) – passive node; vCenter (for Server Cloud) DB – active node; vCenter (for Server Cloud) DB – passive node; vCenter Web Server; vCenter Inventory Server; vCenter SSO Server x2 with Global HA; vCenter Heartbeat; Auto-Deploy + vSphere Authentication Proxy (1 per vCenter); vCenter Update Manager + DB (1 per vCenter); vCenter Update Manager Download Service (in DMZ); vCloud Director (Non Prod) + DB; Certificate Server
Storage: Storage Mgmt tool (needs physical RDM to get fabric info); VSA Manager; Backup Server
Network: Network Management Tool (needs a lot of bandwidth); Nexus 1000V Manager (VSM) x2
Sys Admin Tools: Admin client (1 per Sys Admin) with PowerCLI; VMware Converter; vMA (Management Assistant); vCenter Orchestrator + DB
Notes:
1 vSphere Replication “replication server” appliance can process up to 1 Gbps of sustained throughput using approximately 95% of 1 vCPU. 1 Gbps is much larger than most WAN bandwidth. For a VM protected by VR, the impact on application performance is a 2-6% throughput loss.
vCD and VSM data collectors: deploy at least 2 data collectors each for vCD and VSM, for high availability. The Chargeback Manager instance can be installed/upgraded at the time of the vCD install/upgrade, or later.
The vCenter Server, the vSphere Web Client Server, and the vCenter Inventory Service can all be installed on the same system, or can be split across multiple systems, depending on their resource needs and the available hardware resources. If you have < 32 hosts and < 4000 VMs, install all three modules on the same system.
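The sizing notes above translate into two small rules: 1 vSphere Replication server per 1 Gbps of replication bandwidth (with at least 1 per site), and co-locating vCenter, the Web Client and the Inventory Service when the environment is under 32 hosts and 4000 VMs. The helpers below are hypothetical and simply encode those thresholds; the link speeds in the usage note are illustrative.

```python
import math

VR_GBPS_PER_APPLIANCE = 1.0  # sustained throughput per "replication server" appliance


def vr_servers_needed(replication_gbps: float) -> int:
    """Replication servers per site: 1 per 1 Gbps of bandwidth, and at least 1."""
    return max(1, math.ceil(replication_gbps / VR_GBPS_PER_APPLIANCE))


def colocate_vcenter_modules(hosts: int, vms: int) -> bool:
    """True when vCenter, Web Client and Inventory Service can share one system."""
    return hosts < 32 and vms < 4000
```

A 2.5 Gbps replication link would need 3 replication servers per site, while a 200 Mbps link still needs 1.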

27 Large Cloud: IT Cluster (page 2)
Continued from the previous page. Which IT apps are not in this cluster? The View Security Servers. These servers reside in the DMZ zone and are directly accessible from the Internet. Putting them in the management cluster means the management cluster needs to support an Internet-facing network.
Category: Large Cloud
Application Mgmt: AppDirector; Hyperic
Advance vDC Services (Security, Availability): Site Recovery Manager + DB; SRM Replication Mgmt Server + DB; vSphere Replication Servers (1 per 1 Gbps bandwidth, 1 per site); AppHA Server (e.g. Symantec); Security Management Server (e.g. TrendMicro DeepSecurity); vShield Manager
Management (Performance, Capacity, Configuration): vCenter Operations Enterprise (2 VMs); vCenter Infrastructure Navigator; vCloud Automation Center (5 VMs); VCM: Web + App + DB (3 VMs); Chargeback + DB, Chargeback Data Collector (2); Help Desk system; CMDB; Change Management system
Desktop as a Service: View Managers + DB; View Security Servers (sitting in DMZ zone!); ThinApp Update Server; vCenter (for Desktop Cloud) + DB; vCenter Operations for View; Horizon Suite; Mirage Server
Core Infra: MS AD 1, AD 2, AD 3, etc.; DNS, DHCP, etc.; Syslog server + Core Dump server; File Server (FTP Server) for IT; File Server (FTP Server) for Business (multiple); Print Server
Core Services & Collaboration
Note: vSphere Replication for the ROBO case does not need 2 vCenters. And yes, only 1 VR appliance is required too.

28 Cluster Size
I recommend 6-10 nodes per cluster, depending on the Tier. Why not 4, 12, 16 or 32? It is a balance between too small (4 hosts) and too large (>12 hosts).
DRS: 8 gives DRS sufficient hosts to “maneuver”. 4 is rather small from the DRS scheduler's point of view. With the “sub-cluster” ability (DRS host groups) introduced in 4.1, we can get the benefit of a small cluster without creating one.
Best practice for a cluster is the same hardware spec with the same CPU frequency. This eliminates the risk of incompatibility, and complies with Fault Tolerance & VMware View best practices. So more than 8 means it's more difficult/costly to keep them all the same: you need to buy 8 hosts at a time. Upgrading >8 servers at a time is expensive ($$) and complex. A lot of VMs will be impacted when you upgrade >8 hosts.
Manageability: too many hosts are harder to manage (patching, performance troubleshooting, too many VMs per cluster, HW upgrades). 8 allows us to isolate 1 host for VM-troubleshooting purposes. At 4 nodes, we can't afford such a ”luxury”. VM restart priority is simpler when you don't have too many VMs.
Too many paths to a LUN can be complex to manage and troubleshoot. Normally, a LUN is shared by 2 clusters, which are “adjacent” clusters. 1 ESXi host has 4 paths, so 8 hosts have 32 paths, and 2 clusters have 64 paths. This is a rather high number (if you compare it with the physical world).
N+2 for Tier 1 and N+1 for others. With 8 hosts, you can withstand 2 host failures if you design it to. At 4 nodes, it is too expensive, as the payload is only 50% at N+2.
Small cluster size: in a lot of cases, the cluster size is just 2-4 nodes. From an availability and performance point of view, this is rather risky. Say you have a 3-node cluster. You are doing maintenance on Host 1 and suddenly Host 2 goes down… you are exposed with just 1 node. Assuming HA Admission Control is enabled (which it should be), the affected VMs may not even boot. When a host is placed into maintenance mode, or disconnected for that matter, it is taken out of the admission control calculation.
Cost: too few hosts result in overhead (the “spare” host).
8 hosts per cluster: some cluster changes in the Advanced Attributes require the cluster to be disabled and re-enabled. This is harder/longer to do when there are many hosts.
Mgmt: 8 is an easy number to remember. And a lucky one, if you believe. And we all know that production needs luck, not just experience.
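Two of the numbers above are easy to recompute for other designs: the path count to a LUN shared by adjacent clusters, and the payload left after reserving N+K spare hosts. The defaults (4 paths per host, 2 clusters per LUN) come from the text; the function names are hypothetical.

```python
def paths_per_lun(hosts_per_cluster: int, paths_per_host: int = 4,
                  clusters_sharing: int = 2) -> int:
    """Total paths to one LUN that is presented to several adjacent clusters."""
    return hosts_per_cluster * paths_per_host * clusters_sharing


def usable_fraction(hosts: int, spares: int) -> float:
    """Share of cluster capacity left for workload when `spares` hosts are held back (N+spares)."""
    if spares >= hosts:
        raise ValueError("spares must be fewer than hosts")
    return (hosts - spares) / hosts
```

8 hosts give 64 paths across 2 clusters, and N+2 leaves 75% payload on 8 hosts but only 50% on 4, which is the cost argument against small Tier 1 clusters.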

29 Small Cloud: Cluster Design

30 Small Cloud: Design Limitation
It is important to document the design limitations clearly. It is perfectly fine for a design to have limitations; after all, you have a limited budget. Inform the CIO and the Business clearly about the limitations.
It is based on vSphere Standard edition: no Storage vMotion, no DRS and DPM, no Distributed Switch, and you can't use 3rd-party multi-pathing.
It does not support MSCS. Veritas VCS does not have this restriction. vSphere 5.1 only supports FC for MSCS shared storage for now, and I use iSCSI in this design. For a 30-server environment, HA with VM monitoring should be sufficient. In vSphere 5.1 HA, a script can be added that checks that the application (service) is active on its given port/socket. Alternatively, a script within the Guest OS checks whether the process is up or not. If not, it sends an alert.
Only 1 cluster in the primary data center. Production, DMZ and IT all run on the same cluster. Networks are segregated as they use different networks. Storage is separated as it uses different datastores.
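The in-guest check described above can be as simple as a TCP connect to the service's port. This sketch uses only Python's standard `socket` module; it does not use VMware's HA Application Monitoring interface, and the `alert` callback is a hypothetical hook (mail, syslog, or whatever your monitoring uses). Wiring the result into vSphere HA is out of scope here.

```python
import socket
from typing import Callable


def service_is_up(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_and_alert(host: str, port: int, alert: Callable[[str], None]) -> bool:
    """Probe the service once and call the (hypothetical) alert hook if it is down."""
    up = service_is_up(host, port)
    if not up:
        alert(f"service on {host}:{port} is not responding")
    return up
```

Scheduled every minute or so inside the guest, this gives the “is the process listening?” signal; a port check only proves the socket is open, not that the application behind it is healthy.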

31 Small Cloud: Scaling to 100 VM
The next slide shows an example where the requirement is for 100 VMs instead of 50. We have 7 hosts in DC 1 instead of 3, and 3 hosts in DC 2 instead of 2.
Only 1 cluster in the primary data center. Production, DMZ and IT all run on the same cluster. Networks are segregated as they use different networks. Storage is separated as it uses different datastores.
Since we have more hosts, we can do sub-clusters. We will place the following as sub-clusters: Host 1-2: Oracle BEA sub-cluster; Host 6-7: Oracle DB sub-cluster.
Production is a soft (“should”) rule. So a host failure means it can use Host 1-2 too.
Complex Affinity and Host/VM rules: be careful in designing VM anti-affinity rules. We are using group affinity as we have sub-clusters, so we have extra constraints.
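The difference between a “should” (soft) and a “must” (hard) VM-Host rule is what happens when the preferred hosts are unavailable. The sketch below is a deliberate simplification, not DRS or HA logic: a soft rule falls back to any healthy host only when no host in the group is available, while a hard rule never leaves the group. Host names and group membership are hypothetical.

```python
from __future__ import annotations


def candidate_hosts(all_hosts: list[str], rule_hosts: set[str],
                    healthy: set[str], must: bool) -> list[str]:
    """Hosts a VM may start on under a VM-Host group rule.

    must=True models a "must run on" (hard) rule: only healthy hosts in the group.
    must=False models a "should run on" (soft) rule: prefer the group, but fall
    back to any healthy host when none of the group's hosts are available.
    """
    preferred = [h for h in all_hosts if h in rule_hosts and h in healthy]
    if preferred or must:
        return preferred
    return [h for h in all_hosts if h in healthy]
```

With Production preferring hosts 3-5, losing all three leaves a soft rule free to use hosts 1-2 and 6-7, while a hard rule would leave those VMs down.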

32 Small Cloud: Scaling to 100 VM
Certainly, there can be variations. 2 are described below.
If we can add 1 more ESXi host, we can create 2 clusters of 4 nodes each. This will simplify the affinity rules.
We can use 1-socket ESXi hosts instead of 2-socket. This saves on VMware licences, but adds extra cost on servers and extra cooling/power operational cost.
[Diagram labels: Rest of VMs, Oracle DB, Oracle BEA, DMZ LAN, Production LAN, Management LAN]

33 Small Cloud: Scaling to 150 VM
We have more “room” to design, but it is still too small. Production needs 7 hosts, IT needs 2 hosts and DMZ needs 2 hosts.
Putting IT with DMZ is a design trade-off. vShield is used to separate IT and DMZ. If the above is deemed not enough, we can add VLANs. If it is still not enough, use different physical cables or switches. The more you separate physically, the more you defeat the purpose of virtualisation.
[Diagram labels: Rest of VMs, Oracle BEA, Oracle DB, Production LAN]

34 Small: Scaling to 150 VM

35 Large: Overall Architecture
The diagram shows the key layers in an SDDC (based on vCloud Suite 5.1).
The overall Global Management provides visibility across the 2 datacenters. This is based on VC Ops 5.7, vCAC 5.2 and vCO 5.1. The products will be installed on Site 1, but protected by SRM. They will live on the IT Cluster, managed by the vCenter for Server VMs.
The diagram uses 2 physical datacenters. I'm not a big fan of a 3-datacenter design, as it increases complexity drastically. It is better to do it at the application layer, as most apps will not require 3 DCs.
Storage is replicated via vSphere Replication (for most apps) and array replication (for Tier 1 apps). Array-based replication is minimised, following the shared-nothing principle described in an earlier slide.
The network is not stretched, following the shared-nothing principle.
The green layer is the management (software-defined) layer. So we have:
vCenter. I separate the vCenter for Server and for Desktop. This enables independent upgrades, and even independent operations. Plus, the Server workload will be integrated with SRM, while View does not need SRM.
SSO is a key component. Because the VC is separated, it makes sense to separate the SSO also.
vCNS is shown here as we need to show how basic network services (firewall, NAT, private network) are provided. Also, 1 vCNS Manager can only serve 1 vCenter, hence the mapping in the diagram.
The blue layer is the ESXi cluster layer. A large farm will have many clusters, with 1 cluster for each purpose. For example, a cluster might only contain Oracle DB and nothing else.
The orange layer is the storage layer. I tend to map the cluster to the datastore in a 1:1 mapping. This keeps things simple. With shared-nothing vMotion in 5.1, there is no need for a shared datastore anymore.

36 This diagram continues from the previous diagram, showing more detailed components. I've taken out EUC (End User Computing) to focus on the server workload. The diagram still shows the 2 sites; as you can see, it is symmetrical. As more and more products support multiple vCenters, the diagram will change and get simpler.
The cream color shows the operational management or monitoring components. This is essentially VC Ops + vCAC and vCO. They are not part of the core architecture; rather, they provide monitoring. A failure in these components does not impact your infrastructure.
The green color shows the Availability components. A failure at this layer also does not mean your infrastructure is down, but this time around you lose protection. This layer, especially SRM 5.1, impacts your overall architecture as it needs 2 vCenters.
The red color shows the core architecture. These components form, or rather define, your architecture. A failure at this layer impacts your infrastructure. For example, if vShield Edge fails you lose networking. I do not draw the connecting lines for SSO and AD, as almost all components talk to them.
The grey color shows the physical or base layer. This is where the resources (compute, storage, network) are provided. I am showing vShield App in this layer as I consider it part of the hypervisor (1 vShield App per hypervisor).
Limitation in vCloud Suite 5.1: vShield Manager can only manage 1 vCenter. So if your VMs are fronted by vShield Edge, you need to ensure the rules are replicated to the DR site.

37 Large: DataCenter and Cluster
In our design, we will have only 2 datacenters, separating the IT Cluster from the Business Clusters.
Certain objects can go across clusters, but not across datacenters. You can vMotion from one cluster to another within a datacenter, but not to another datacenter.
Networking: Distributed Switch, VXLAN and vShield Edge can't go across DCs as at vCloud Suite 5.1. Datastore names are per datacenter. So network and storage are per datacenter.
You can still clone a VM within a datacenter and to a different datacenter.
The datacenter defines the namespace for networks and datastores. The names for these objects must be unique within a datacenter. For example, you cannot have two datastores with the same name within a single datacenter, but you can have two datastores with the same name in two different datacenters. VMs, templates, and clusters need not be unique within the datacenter, but must be unique within their folder. Objects with the same name in two different datacenters are not necessarily the same object. Because of this, moving objects between datacenters can create unpredictable results. For example, a network named networkA in datacenterA might not be the same network as a network named networkA in datacenterB. Moving a VM connected to networkA from datacenterA to datacenterB results in the VM changing the network it is connected to.
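The naming rules above can be checked before moving or creating objects. This hypothetical validator takes (kind, datacenter, folder, name) tuples, for example exported from an inventory report, and flags networks or datastores that repeat within a datacenter and other objects that repeat within a folder. Treating all folder-scoped kinds (VMs, templates, clusters) as one shared namespace is an assumption of the sketch.

```python
from __future__ import annotations

from collections import Counter

DC_SCOPED = {"network", "datastore"}  # must be unique per datacenter


def namespace_conflicts(objects: list[tuple[str, str, str, str]]) -> list[str]:
    """Return the names that break the uniqueness rules, as readable keys."""
    counts: Counter[str] = Counter()
    for kind, dc, folder, name in objects:
        if kind in DC_SCOPED:
            counts[f"{dc}: {kind} '{name}'"] += 1
        else:  # VMs, templates, clusters: unique per folder only
            counts[f"{dc}/{folder}: '{name}'"] += 1
    return sorted(key for key, n in counts.items() if n > 1)
```

Two datastores called ds01 in different datacenters pass, as do two VMs called app01 in different folders, but two networks called networkA in the same datacenter are flagged.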

38 Large: Cluster Design

39 Large: Tiered Cluster
The 3 tiers become the standard offering that the Infra team provides to the App team. If Tier 3 is charged $X/VM, then Tier 2 is priced at 2x and Tier 1 at 4x. The App team can then choose based on their budget.
Cluster size varies depending on criticality. A test/dev cluster might have 10 nodes, while a Tier 0 cluster might have just 2 nodes.
The Server Cluster also maps 1:1 to the Storage Cluster. This keeps things simple. If a VM is so important that it is on the Tier 1 cluster, then its storage should be on the Tier 1 storage cluster too.
This excludes Tier 0, which is special and handled per application. Tier 0 means the cost of infra is very low relative to the value & cost of the apps to the business.
Tier “SW” is a dedicated cluster running a particular software. Normally, this is Oracle, MS SQL or Exchange. While we can have “sub-clusters”, it is simpler to dedicate an entire cluster.
Tier   | # Hosts  | Identical node spec? | Failure tolerance | MSCS    | Max # VMs | Monitoring                           | Remarks
Tier 1 | Always 6 | Always identical     | 2 hosts           | Yes     | 25        | Application level. Extensive alerts  | Only for critical apps. No resource overcommit.
Tier 2 | 4-8      | Maybe                | 1 host            | Limited | 75        | -                                    | Apps can be vMotioned to Tier 1 during critical runs
Tier 3 | 4-10     | No                   | -                 | -       | 150       | Infrastructure level. Minimal alerts | Some resource overcommit
SW     | 2-10     | -                    | 1-3 hosts         | -       | -         | Application specific                 | Running expensive software. Oracle, SQL are the norm as part of DB as a Service
Typical Tier 1: 6 hosts, 25 VMs. The effective host count is 4, which is 25:4, or around a 6:1 consolidation ratio.
Typical Tier 2: 8 hosts, 75 VMs.
Typical Tier 3: 10 hosts, 150 VMs. At 1.5 vCPU on average, 150 VMs = 225 vCPU. 10 ESXi hosts with 12 cores each = 120 pCores. So this is 225:120, or around 1.9x CPU oversubscription. This is OK for Tier 3.
The above is suitable for a <1000 VM private cloud. For >1000 VMs, we need a higher consolidation ratio and 4-socket hosts. Keep the cluster size below 10.
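The ratios quoted above are straightforward to recompute when the host count, core count or average VM size changes. These helper functions are hypothetical; `PRICE_MULTIPLIER` encodes the $X / 2x / 4x pricing from the text, with the base price left as a parameter.

```python
def consolidation_ratio(vms: int, hosts: int, spare_hosts: int) -> float:
    """VMs per effective (non-spare) host."""
    return vms / (hosts - spare_hosts)


def cpu_oversubscription(vms: int, avg_vcpu: float, hosts: int, cores_per_host: int) -> float:
    """Total vCPU divided by total physical cores."""
    return (vms * avg_vcpu) / (hosts * cores_per_host)


# Price per VM relative to Tier 3's $X.
PRICE_MULTIPLIER = {"Tier 3": 1, "Tier 2": 2, "Tier 1": 4}


def charge(tier: str, vm_count: int, base_price_x: float) -> float:
    """Charge for vm_count VMs on a tier, given Tier 3's per-VM price $X."""
    return PRICE_MULTIPLIER[tier] * base_price_x * vm_count
```

25 VMs on 6 hosts with 2 spares gives 6.25 VMs per effective host, and 150 VMs at 1.5 vCPU on 10 x 12-core hosts gives 1.875x, the “around 1.9x” quoted above.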

40 Large: Example 1 Goal is to provide 500 Prod VM and 1000 Non Prod VM
As you scale beyond 1000 VMs, keep in mind the number of clusters & hosts. As you scale beyond 10 clusters, consider using 4-socket hosts.
This example does have a Large VM cluster, which is an exception cluster. A Large VM in this case is > 8 vCPU and > 64 GB vRAM.
Virtualisation has gone beyond the low-hanging fruit; the average VM size has gone beyond 1 vCPU and 2 GB of vRAM. In this example, the average VM size is 3.4 vCPU and 19 GB.
As most VMs are not using 100% of their resources, the model uses over-subscription: 1.5x for Production and 2x for Non-Production.
In this example, I have also taken into account that each host needs 2 dedicated cores and 6-7 GB of RAM for the following purposes: hypervisor; vMotion; hypervisor-based firewall like vCNS; hypervisor-based AV; hypervisor-based IDS/IPS; hypervisor-based storage (examples are VMware Distributed Storage (tech preview) or Nutanix); vSphere Replication.
The above is an example. It is based on a rather conservative approach. Notice the end consolidation ratio is rather low, which means more cabling and overhead.
The number of datastores might grow because of the array-based replication constraint, where an entire LUN must be migrated as 1 unit. In that case, we might need more datastores. Keep the total <10 per cluster.
We should stop discussing the _overall_ consolidation ratio. Prod and Non-Prod are very different, and should be treated differently.
Standard = 2-socket, 16-core ESXi. For vCPU VM
Large = 4-socket, 40-core ESXi. For vCPU VM.
An extra-large VM (e.g. 20 vCPU, 128 GB vRAM) can be placed on the Large cluster. But take note of its impact on performance.
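The per-host overhead and over-subscription figures above determine how many average VMs each host can carry. This sketch assumes the averages from the text (3.4 vCPU, 19 GB), reserves 2 cores and 7 GB per host (the upper end of the 6-7 GB quoted), oversubscribes CPU only, and uses a 192 GB Standard host, which is a hypothetical figure since the text does not give its RAM. `vms_per_host` is a hypothetical helper.

```python
import math


def vms_per_host(host_cores: int, host_ram_gb: float, oversub: float,
                 avg_vcpu: float = 3.4, avg_vram_gb: float = 19.0,
                 reserved_cores: int = 2, reserved_ram_gb: float = 7.0) -> int:
    """VMs that fit on one host, limited by whichever of CPU or RAM runs out first.

    CPU is oversubscribed by `oversub`; RAM is not overcommitted in this sketch.
    """
    vcpu_budget = (host_cores - reserved_cores) * oversub
    ram_budget = host_ram_gb - reserved_ram_gb
    by_cpu = math.floor(vcpu_budget / avg_vcpu)
    by_ram = math.floor(ram_budget / avg_vram_gb)
    return min(by_cpu, by_ram)
```

With those assumptions a 16-core, 192 GB host carries about 6 Production VMs (CPU-bound) or 8 Non-Production VMs, while a 40-core, 256 GB host is RAM-bound at 13 VMs.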

41 Large: Example 2
Same goal as the previous example, but we're going for a higher consolidation ratio (and hence using a 40-core box).
In this example, we take a different approach. There is no single best approach; it all depends on your situation. Here we're focusing on a higher consolidation ratio to keep the number of ESXi hosts manageable.
Notice that Tier 1 has _more_ datastores than Tier 2 or Tier 3. This is because Tier 1 primarily uses array-based replication for DR. It is not using vSphere Replication. As a result, we hit the limitation of per-LUN replication.
We are also using 1 type of ESXi host for ease of management: 4-socket, 40-core ESXi, 256 GB RAM.
vSphere Replication vSCSI filter: runs in the ESXi kernel; attached to the virtual device, it intercepts all I/O to the disk; each replica corresponds to a lightweight snapshot; a bitmap of changed blocks is maintained between replications (backed by an on-disk state file).
vSphere Replication Agent: runs in the host agent; implements the configuration of replication in the primary site; manages the VM replication process; interposes on operations that impact replication.
The amount of CPU reservation for vMotion thus depends on the number of vMotion NICs and their speeds: 10% of a processor core for each 1 Gb network interface, 100% of a processor core for each 10 Gb network interface, and a minimum total reservation of 30% of a processor core.
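The vMotion CPU reservation rule quoted above is a simple formula. The function below encodes it, working in percent of a core to avoid floating-point rounding; the function name is hypothetical.

```python
def vmotion_cpu_reservation(nics_1g: int = 0, nics_10g: int = 0) -> float:
    """Processor cores reserved for vMotion: 10% per 1 Gb NIC, 100% per 10 Gb NIC, 30% minimum."""
    percent = 10 * nics_1g + 100 * nics_10g
    return max(percent, 30) / 100
```

A host with a single 1 GbE vMotion NIC gets the 30% floor, while two 10 GbE vMotion NICs reserve 2 full cores.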

42 Large: Example Pod (with Converged Hardware)
[Diagram: Rack 1 and Rack 2 (42 RU each). Each rack: Network Block, 5 RU (4x 48-port 10 GE switches, 192 ports in total; 1x 48-port 1 GE switch for management); Management Block, 2 RU (the IT Cluster, a 4-node cluster); Compute + Storage Converged Block, 32 RU.]
Each ESXi host has: 4x 10 GE for network and storage; 1x 1 GE for iLO; 1x Flash for performance; 2x SSD for performance; 4x SAS for capacity.
Total port requirements per rack: 34 x 4 = 136 10 GE ports; 34 x 1 = 34 1 GE ports; ISL & uplinks = 6 GE ports.
Total compute per Pod: 2 racks x 32 x 16 cores = 1024 cores.
This particular example uses converged Compute + Storage hardware. For a more common approach, we need to add rack space for an array (typically around 20 RU).
We are keeping the design simple for better manageability. This means the hardware cost is not 100% optimised. We keep each rack identical. We could have used fewer network switches and had cables going across racks, as the network ports are not fully utilised. We are not fully populating the rack space: we still have 3 RU free in each rack.
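The rack arithmetic above (RU budget, switch ports and Pod core count) can be checked in a few lines. The `RackPlan` defaults below are taken from the slide; treating the 6 ISL/uplink ports as 10 GE and each host as 1 RU are assumptions, and the class itself is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RackPlan:
    rack_ru: int = 42
    network_ru: int = 5
    mgmt_ru: int = 2
    compute_ru: int = 32
    hosts: int = 34                    # 32 compute + 2 management, 1 RU each (assumed)
    ten_ge_per_host: int = 4
    one_ge_per_host: int = 1
    ten_ge_switch_ports: int = 4 * 48
    one_ge_switch_ports: int = 48
    uplink_ports: int = 6              # ISL & uplinks, assumed to be 10 GE

    def free_ru(self) -> int:
        return self.rack_ru - self.network_ru - self.mgmt_ru - self.compute_ru

    def ten_ge_needed(self) -> int:
        return self.hosts * self.ten_ge_per_host + self.uplink_ports

    def fits(self) -> bool:
        """True when the rack space and both switch tiers have enough capacity."""
        return (self.free_ru() >= 0
                and self.ten_ge_needed() <= self.ten_ge_switch_ports
                and self.hosts * self.one_ge_per_host <= self.one_ge_switch_ports)


def pod_cores(racks: int = 2, compute_hosts_per_rack: int = 32, cores_per_host: int = 16) -> int:
    return racks * compute_hosts_per_rack * cores_per_host
```

The defaults give 3 free RU, 142 of 192 10 GE ports used, and 1024 cores per 2-rack Pod; changing `hosts` or `ten_ge_per_host` shows quickly when a rack would need another switch.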

43 Resource Pool: Best Practices
What resource pools are not: a way to organise VMs (use folders for this), or a way to segregate admin access to VMs (use folders for this too).
For a Tier 1 cluster, where all the VMs are critical to the business: architect for availability first, performance second. Translation: do not over-commit. Resource pools, reservations, etc. are then immaterial, as there is enough for everyone. But size each VM accordingly; do not oversize, as an oversized VM might run slower.
For a Tier 3 cluster, use them carefully. Tier 3 = overcommit. Use reservations sparingly, even at VM level. A reservation guarantees resources, so it impacts the cluster slot size; naturally, you can't power on additional VMs once your guarantees are fully used. Also take note of the extra complexity in performance troubleshooting.
Use resource pools as a mechanism to reserve at the “group of VMs” level. If Department A pays for half the cluster, creating an RP with 50% of the cluster resources guarantees them those resources in the event of contention, and they can then put as many VMs in it as they need. But as a result, you cannot overcommit at the cluster level, as you have already guaranteed at the RP level.
Introduce a scheduled task which sets the shares per resource pool based on the number of VMs/vCPUs they contain, e.g. a PowerCLI script which runs daily and takes corrective action. Just google it.
Don't put VMs and RPs as “siblings” at the same level.
DRS load balancing runs every 5 minutes, not every few seconds. See Yellow Bricks.
See my Resource Management slides for details.
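Why the shares need periodic correction: a resource pool's entitlement is split among the VMs inside it, so a pool holding many vCPUs gets less per vCPU than a sparse pool with the same share value. The sketch below shows the kind of calculation such a scheduled task might do. It is written in Python rather than PowerCLI, and the helper names and the 1000-shares-per-vCPU constant are hypothetical.

```python
def rp_cpu_shares(vcpus_per_pool: dict[str, int], shares_per_vcpu: int = 1000) -> dict[str, int]:
    """Set each pool's shares in proportion to the vCPUs it contains (hypothetical policy)."""
    return {pool: vcpus * shares_per_vcpu for pool, vcpus in vcpus_per_pool.items()}


def effective_shares_per_vcpu(shares: dict[str, int], vcpus_per_pool: dict[str, int]) -> dict[str, float]:
    """How many of its pool's shares each vCPU effectively gets."""
    return {pool: shares[pool] / vcpus_per_pool[pool] for pool in shares}


if __name__ == "__main__":
    vcpus = {"DeptA": 40, "DeptB": 10}
    equal = {"DeptA": 4000, "DeptB": 4000}  # both pools left at the same share value
    print(effective_shares_per_vcpu(equal, vcpus))  # DeptA's vCPUs get a quarter of DeptB's
    fixed = rp_cpu_shares(vcpus)
    print(fixed, effective_shares_per_vcpu(fixed, vcpus))  # equal per-vCPU shares again
```

Running this daily keeps per-vCPU priority roughly even as VMs are added to or removed from each pool.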

44 VM-level Reservation & Limit
CPU reservation: guarantees a certain level of resources to a VM and influences admission control (power-on). CPU reservation isn't as bad as often claimed: it doesn't claim the CPU when the VM is idle (it is refundable). CPU reservation caveats: a reservation does not always equal priority. If other VMs are using the processors when the reserved VM claims those CPUs, the reserved VM has to wait until their threads/tasks are finished. Active threads can't be de-scheduled mid-task; doing so would cause a blue screen / kernel panic.
Memory reservation: memory reservation is as bad as often claimed. It is “non-refundable” once allocated, and Windows zeroes out every bit of memory during startup, so the reserved memory is touched right away. Memory reservation caveats: it drops the consolidation ratio, may waste resources (idle memory can't be reclaimed), and introduces higher complexity (capacity planning).
Do not configure a high vCPU count or a large RAM size and then use a Limit, e.g. configure 4 vCPUs and then use a limit to make it behave like “2” vCPUs. This can result in unpredictable performance, as the Guest OS does not know about the limit. A high vCPU count or large RAM also carries higher overhead. Use a Limit only when you need to force a VM to slow down; Shares won't achieve the same result.

45 Fault Tolerance Design Consideration
Limitations: FT is still limited to 1 vCPU in vSphere 5.1. FT impacts reservation: it automatically reserves 100%. That reservation impacts HA admission control, as the slot size becomes bigger. HA does not check slot size or actual utilisation when booting VMs up; it checks the reservation of the affected VM. FT also impacts resource pools: make sure the RP includes the RAM overhead. Cluster size is minimum 3, recommended 4. Tune the application and the Windows HAL to use 1 CPU. In Win2008 this no longer matters [e1: need to verify].
General guides: assuming a 10:1 consolidation ratio, I'd cap FT usage at just 10% of production VMs. So 80 VMs means around 8 ESX hosts, which means around 8 FT VMs. This translates to 1 Primary VM + 1 Secondary VM per host. Small clusters (<5 nodes) are more affected when there is an HA event. See the picture for a 3-node example.
Limitation: turn off FT before doing Storage vMotion. FT protects infrastructure, not applications; use Symantec ApplicationHA to protect the app.
When a host fails and its VMs need to be booted on other ESX hosts, I think/guess this is what happens: say there are 4 nodes in the cluster, and Node 4 is the largest (more GHz and more RAM). The cluster setting is “Host failures”, tolerating 1 host failure, so Node 4 is not included in the slot size. HA is being conservative here, as it needs to cater for the “largest node fails” situation. Host 1 suddenly dies; it has 10 VMs. HA looks at Hosts 2, 3, and 4 (yes, Host 4 too) and sees which host has the most unreserved capacity. Slot size is not considered. Say Host 4 has enough capacity to handle the total reservation of 6 VMs; those 6 VMs boot on Host 4. Does HA consider whether the VMs have affinity rules??? HA does consider “VM compatibility”, but I'm not sure what this means. After Host 4, say Host 2 has the largest remaining unreserved capacity (note that HA uses reservation, not utilisation) and can handle the total reservation of the remaining 4 VMs; those 4 VMs boot on Host 2. Host 3 is not used. DRS then kicks in to load balance.
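The restart behaviour described above is the author's guess, not documented HA logic. The sketch below models that guess as a greedy placement so the 6 + 4 + 0 outcome can be reproduced; the `Host` class, the single capacity dimension (real hosts have both CPU and memory reservations) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    unreserved: int  # capacity not yet claimed by reservations (one dimension only)


def restart_failed_vms(reservations: dict[str, int], survivors: list[Host]):
    """Greedy model of the guess above: fill the surviving host with the most
    unreserved capacity first, then the next one. Not VMware's actual algorithm."""
    placement: dict[str, str] = {}
    pending = list(reservations.items())
    for host in sorted(survivors, key=lambda h: h.unreserved, reverse=True):
        still_pending = []
        for vm, res in pending:
            if res <= host.unreserved:
                host.unreserved -= res
                placement[vm] = host.name
            else:
                still_pending.append((vm, res))
        pending = still_pending
    return placement, [vm for vm, _ in pending]


if __name__ == "__main__":
    failed = {f"vm{i}": 1000 for i in range(1, 11)}  # Host 1 dies with 10 VMs
    hosts = [Host("host2", 5000), Host("host3", 3000), Host("host4", 6000)]
    placed, unplaced = restart_failed_vms(failed, hosts)
    print({h: list(placed.values()).count(h) for h in ("host2", "host3", "host4")}, unplaced)
```

Because only reservations are compared, a host that is busy but lightly reserved can still be chosen, which is why DRS has to rebalance afterwards.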

46 Branch or remote sites
Some small sites may not warrant their own vCenter, and there is no expertise on site to manage one either. Consider the vSphere Essentials Plus ROBO edition; you need 10 sites for the best financial return, as it is sold in units of 10. Avoid network-heavy features: Auto Deploy means sending around 150 MB, and if the link is a shared 10 Mbit, it will add up.
Best practices: Install a copy of the template at the remote site; if not, use OVF, as it is compressed. Increase the vCenter Server and vSphere host timeout values to ~3 hours. Consider manual vCenter agent installs prior to connecting ESXi hosts. Use RDP/SSH instead of Remote Console for VM console access; if Remote Console is absolutely needed, reduce its display to smaller values, e.g. 800x600/16-bit.
vCenter 5.1 improvements over 4.1 for remote ESX: use the Web Client if vCenter is remote, as it uses less bandwidth. No other significant changes. Certain vCenter operations involve a heavier payload, e.g. Add Host, vCenter agent upgrades, HA enablement, and Update Manager-based host patching.
vCenter 4.1 improvements over vCenter 4.0: 1.5x to 4.5x improvement in operational time for typical vCenter management tasks. All traffic between vCenter and ESXi hosts is now compressed. Statistics data sent between hosts and vCenter Server flows over TCP, not UDP, which eliminates lost metrics. Most vCenter operations fare well over 64 Kbps links.
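To see how the Auto Deploy payload "adds up" on a thin link, here is an idealised transfer-time sketch. It assumes each host boot pulls the full ~150 MB, uses decimal units (MB = 10^6 bytes, Mbit = 10^6 bits), and ignores protocol overhead, compression and retries; the function name and the 8-host scenario are hypothetical.

```python
def transfer_seconds(payload_mb: float, link_mbit: float, link_share: float = 1.0) -> float:
    """Idealised transfer time for payload_mb megabytes over link_mbit megabits per
    second, when only link_share of the link is available (e.g. 0.5 = half)."""
    return payload_mb * 8 / (link_mbit * link_share)


if __name__ == "__main__":
    one_host = transfer_seconds(150, 10)
    print(f"one host, dedicated 10 Mbit: {one_host:.0f} s")  # 120 s
    print(f"8 hosts in sequence: {8 * one_host / 60:.0f} min")  # 16 min
    print(f"one host, half the link: {transfer_seconds(150, 10, 0.5):.0f} s")  # 240 s
```

Real-world numbers will be worse once protocol overhead and other branch traffic are included, which reinforces the advice to avoid network-heavy features at small sites.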

47 Server Design ESXi Host

48 Approach Consideration when deciding the size of ESXi host
General guidelines as at Q3 2013: use 2 sockets of Xeon 2820 with 128 GB RAM. For large VMs, use 4 sockets, 40 cores (Xeon 4820) with 256 GB RAM. Plan 8 GB RAM per core, so a 12-core ESX box should have 96 GB; this should be enough to cater for VMs with large RAM.
Considerations when deciding the size of the ESXi host:
Look at overall cost, not just the cost of the ESX host: network equipment, management, power, and space. A larger host can take larger VMs or more VMs per host.
Think of the cluster, not one ESX host, when sizing the ESXi host. The cluster is the smallest logical building block in this Pod approach.
Plan for one fiscal year, not just the next 6 months. Buy hosts per cluster, so they come from the same batch. Standardising the host spec makes management easier.
Know the number of VMs you need to host and their sizes; this tells you how many ESX hosts you need. Define 2 VM sizes: Common and Large.
If your largest VM needs >8 cores, go for a >8-core pCPU. Ideally, a VM should fit inside a socket to minimise the NUMA effect; this happens in the physical world too.
If your largest VM needs 64 GB of RAM, then each socket should have 72 GB, as I account for RAM overhead. Note that extra RAM = slower boot, because ESXi creates a swap file that matches the VM's RAM size. You can use a reservation to reduce this, as long as you use the “% based” admission control setting in the cluster.
The ESXi host should be >2x the largest VM.
Decide: Blade, Rack, or Converged. Decide: IP or FC storage. If you use Converged, then it's either NFS or iSCSI.
All Tier 1 vendors (HP, Dell, IBM, Cisco, etc.) make great ESXi hosts, so the following guidelines are relatively minor compared with the base spec. Additional guidelines for selecting an ESXi server:
Does it have embedded ESXi?
How much local SSD (capacity and IOPS) can it handle? This is useful for stateless desktop architecture, and when using local SSD as cache or virtual storage.
Does it have built-in 2x 10 GE ports?
Does the built-in NIC have hardware iSCSI capability?
Memory cost. Most ESXi servers have around 128 GB of RAM.
What are the server's unique features for ESXi?
Management integration: the majority of server vendors have integrated management with vCenter, and most are free. Dell's is not free, although it has more features?
DPM support.
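Three of the rules of thumb above can be expressed as quick checks. This is a sketch under stated assumptions: the `HostSpec` class and `sizing_checks` helper are hypothetical, and the 12.5% RAM overhead is simply what reproduces the slide's "64 GB VM needs a 72 GB socket" example.

```python
from dataclasses import dataclass


@dataclass
class HostSpec:
    sockets: int
    cores_per_socket: int
    ram_gb: int


def sizing_checks(host: HostSpec, largest_vm_vcpu: int, largest_vm_ram_gb: int,
                  ram_overhead: float = 0.125) -> dict[str, bool]:
    """Apply three rules of thumb from the slide (hypothetical helper)."""
    cores = host.sockets * host.cores_per_socket
    ram_per_socket = host.ram_gb / host.sockets
    return {
        # roughly 8 GB of RAM per physical core
        "ram_per_core_ok": host.ram_gb / cores >= 8,
        # largest VM's RAM plus overhead fits in one socket's local memory
        "vm_ram_fits_socket": largest_vm_ram_gb * (1 + ram_overhead) <= ram_per_socket,
        # host should be more than 2x the largest VM, for both CPU and RAM
        "host_2x_largest_vm": cores > 2 * largest_vm_vcpu and host.ram_gb > 2 * largest_vm_ram_gb,
    }


if __name__ == "__main__":
    host = HostSpec(sockets=2, cores_per_socket=8, ram_gb=128)
    print(sizing_checks(host, largest_vm_vcpu=6, largest_vm_ram_gb=48))  # all pass
    print(sizing_checks(host, largest_vm_vcpu=8, largest_vm_ram_gb=64))  # too big for this host
```

If the second case matters to you, the slide's answer is a larger host (more cores per socket or more RAM per socket), not a larger VM limit.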

49 ESXi Host: CPU
CPU performance has improved drastically, something like 1800%. There is no need to buy the highest-end CPU, as the premium is too high. Use the savings to buy more hosts instead, unless: the number of hosts is becoming a problem, you need to run high-performance single-threaded workloads, or you need to run more VMs per host.
The 2 tables below show VMmark results. The first table shows the improvement based on VMmark 1.x; the second shows 2010 to May 2013 based on VMmark 2.x. Intel® Xeon® E (2.3GHz/6-core/15MB/95W) processor priced at US$ 879. Xeon 5600 delivers a 9x improvement over Xeon 5100, while clock speed improved by only 0.1x and the number of cores by 3x. Fujitsu delivered a VMmark result of 27 tiles in July 2010.
Recommendation (for Intel): use Xeon 2803 or E if budget is the constraint and you don't need to run VMs with >6 vCPUs. Use Xeon 2820 or E5-2650L if you need to run 8-vCPU VMs. Use Xeon 2850 if you need to run 10-vCPU VMs.
Recommendation (for Intel, 4-socket box): use 4807, then 4820, then 4850.
AMD Opteron at 2 sockets, 24 cores scores tiles, an impressive number too. Xeon is around 18% faster, which is not a huge margin anymore. But Xeon uses 12 cores while Opteron uses 24 cores, so each Xeon core is around 2x faster.
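The per-core comparison at the end of this slide is a one-line calculation: if system A scores 1.18x system B overall with half as many cores, each A core delivers 1.18 x 24 / 12, roughly 2.4x a B core, which the slide rounds to "around 2x". The helper below is a hypothetical sketch of that arithmetic.

```python
def per_core_ratio(aggregate_speedup: float, cores_a: int, cores_b: int) -> float:
    """How much faster one core of system A is than one core of system B,
    given that A's aggregate benchmark score is aggregate_speedup times B's."""
    return aggregate_speedup * cores_b / cores_a


if __name__ == "__main__":
    # Xeon: ~18% higher aggregate score with 12 cores; Opteron: 24 cores
    print(round(per_core_ratio(1.18, cores_a=12, cores_b=24), 2))
```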

50 ESXi Host: CPU Sizing
Buffer the following:
Agent VMs or vmkernel modules: hypervisor-based firewalls such as vShield App, hypervisor-based IDS/IPS such as TrendMicro Deep Security, vSphere Replication, distributed storage.
HA events.
Performance isolation.
Hardware maintenance.
Peaks: month end, quarter end, year end.
Future requirements within the same fiscal year.
DR, if your cluster needs to run VMs from the Production site.
The table below does not yet take HA into account, so it is purely from a performance point of view. When you add the HA host, the day-to-day ratio drops, so utilisation will be lower as you have a “spare” host. Doing 2 vCPUs per physical core is around 1.6x over-subscription, as there is a benefit from Hyper-Threading.
Tier | vCPU ratio | VM ratio | Total vCPU (2 sockets, 16 cores) | Average VM size
Tier 1 | 2 vCPU per core | 5:1 | 32 vCPU | 32/5 = 6.4 vCPU each
Tier 2 | 4 vCPU per core | 10:1 – 15:1 | 64 vCPU | 64/10 = 6.4 vCPU each
Tier 3 or Dev | 6 vCPU per core | 20:1 – 30:1 | 96 vCPU | 96/30 = 3.2 vCPU each
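The table's last two columns follow from two multiplications and a division, sketched below. The VM counts are the ones the table uses in its average-size column (5, 10 and 30); the dictionary and function names are hypothetical.

```python
CORES_PER_HOST = 16  # 2 sockets x 8 cores, as in the table

# tier: (vCPU per physical core, VMs per host used in the average-size column)
TIERS = {"Tier 1": (2, 5), "Tier 2": (4, 10), "Tier 3 or Dev": (6, 30)}


def tier_capacity(vcpu_per_core: int, vms_per_host: int, cores: int = CORES_PER_HOST):
    """Return (total vCPU per host, average vCPU per VM)."""
    total_vcpu = vcpu_per_core * cores
    return total_vcpu, total_vcpu / vms_per_host


if __name__ == "__main__":
    for tier, (ratio, vms) in TIERS.items():
        total, avg = tier_capacity(ratio, vms)
        print(f"{tier}: {total} vCPU, average VM {avg:.1f} vCPU")
```

Plugging in the other end of each VM-ratio range (e.g. 15 VMs for Tier 2) shows how quickly the average VM size shrinks as consolidation rises.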

51 UNIX to X64 migration: Performance Sizing
When migrating from UNIX to X64, we can use industry-standard benchmarks in which both platforms participate. SAP and SPEC are established benchmarks, so we can easily get data for older UNIX machines (a common source of migration, as they have reached 5 years of age and hence have high maintenance costs). Based on SPECint2006 rate results published July 2012:
HP Integrity Superdome (1.6GHz/24MB Dual-Core Intel Itanium 2), 128 cores: 1650
Fujitsu / Oracle SPARC Enterprise M cores: 3150
IBM Power 780 (3.44 GHz, 96 cores): 3520. IBM's result per core is higher than X64 as it uses MCM modules. In the Power series, CPU and software are priced per core, not per socket.
Bull Bullion E (160 cores, 4 TB RAM): 4110
Sizing RAM, disk and network is much easier, as we can ignore speed/generation and simply match the requirement. For example, if the UNIX app needs 7000 IOPS and 100 GB of RAM, we simply provide that; the higher RAM speed is a bonus. With Flash and SSD, IOPS is no longer a concern. vCPU is the main factor, as a UNIX partition can be large (e.g. 48 cores) and we need to reduce the vCPU count.
Source: Bull presentation at VMworld 2012
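Because both platforms publish SPECint2006 rate results, the vCPU reduction can be estimated by comparing per-core throughput. This is a sketch: the results are the ones quoted above, while the x64 per-core score of 40 is a hypothetical placeholder you would replace with a published result for your target CPU.

```python
import math

# SPECint2006 rate results and core counts quoted on the slide
UNIX_RESULTS = {
    "HP Integrity Superdome": (1650, 128),
    "IBM Power 780": (3520, 96),
    "Bull Bullion E": (4110, 160),
}


def per_core_score(rate: float, cores: int) -> float:
    """Benchmark throughput delivered by one core."""
    return rate / cores


def x64_cores_needed(source_rate: float, x64_per_core: float) -> int:
    """Cores needed on the x64 target to match the source's total throughput."""
    return math.ceil(source_rate / x64_per_core)


if __name__ == "__main__":
    for system, (rate, cores) in UNIX_RESULTS.items():
        print(f"{system}: {per_core_score(rate, cores):.1f} per core")
    # hypothetical x64 per-core score of 40, for illustration only
    print(x64_cores_needed(1650, 40))
```

This matches the whole box at full load; if the UNIX partition is lightly used, the real requirement will be lower still.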

52 ESXi Host: RAM sizing
How much RAM? It depends on the number of cores (see the previous slide). It is not so simple anymore, as each vendor is different. An 8 GB DIMM is cheaper than 2x 4 GB DIMMs. Plan 8 GB per core, so 12 cores means around 96 GB.
Consider the memory channel best practice: don't leave some channels empty, as populating all of them brings the benefit of memory interleaving. Check with the server vendor on the specific model. Some models now come with 16 slots per socket, so you might be able to use a smaller DIMM size. Some vendors, like HP, have similar prices for 4 GB and 8 GB DIMMs. Dell R710 has 18 DIMM slots (?). IBM x3650 M3 has 18 DIMM slots. HP DL 360/380 G8 has 24 DIMM slots. HP DL 380 G7 and BL490c G6/G7 have 18 DIMM slots. Cisco has multiple models; B200 M3 has 24 slots.
VMkernel has a Home Node concept on NUMA systems. For ideal performance, fit a VM within one CPU-RAM “pair” to avoid the “remote memory” effect: the number of vCPUs + 1 should be <= the number of cores in 1 socket, so running a 5-vCPU VM on a quad-core will force a remote memory situation; and VM memory should be <= the memory of one node.
Turn on Large Pages, especially for Tier 1. This needs application-level support.
Input from HP: we only put a premium on special 8 GB or 4 GB DIMMs which are much lower power. This is only for unique customers who want to reduce the thermal envelope of their environment. Memory speed drops as you put more DIMMs per channel. HP has worked with Intel to boost 2 DIMMs per channel, so HP servers have a feature to run 2 DIMMs per channel at 1333 MHz (i.e. 12 DIMM slots populated but still running at 1333 MHz). This is good for the sweet-spot memory config of 96 GB RAM per 2-socket server.
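The NUMA rule of thumb above translates directly into a predicate. This sketch uses the slide's own thresholds (vCPUs + 1 <= cores per socket, VM memory <= one node's memory); the function name and the example host (6 cores and 48 GB per node, i.e. a 96 GB 2-socket box) are hypothetical.

```python
def fits_numa_node(vm_vcpus: int, vm_ram_gb: float, cores_per_socket: int, ram_per_node_gb: float) -> bool:
    """Rule of thumb from the slide: vCPUs + 1 <= cores in one socket,
    and VM memory <= memory of one NUMA node."""
    return vm_vcpus + 1 <= cores_per_socket and vm_ram_gb <= ram_per_node_gb


if __name__ == "__main__":
    print(fits_numa_node(5, 16, cores_per_socket=4, ram_per_node_gb=48))  # 5 vCPU on quad-core: False
    print(fits_numa_node(4, 32, cores_per_socket=6, ram_per_node_gb=48))  # fits one node: True
    print(fits_numa_node(4, 64, cores_per_socket=6, ram_per_node_gb=48))  # RAM spills to remote node: False
```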

53 ESXi Host: IO & Management
IO requirements will increase. The table below provides an estimate; it is a prediction based on tech previews and VMworld sessions, and actual results may vary. Converged Infrastructure needs high-bandwidth IO cards. I personally prefer 4x 10 GE NICs. Not supported: mixing hardware iSCSI and software iSCSI.
Management: lights-out management, so you don't have to be in front of the physical server to do certain things (e.g. go into the CLI as requested by VMware Support). Make sure the hardware agent is properly configured; it is very important to monitor hardware health, as there are many VMs in one box.
PCI slots on the motherboard: since we are using 8 Gb FC HBAs, make sure the physical PCI-E slot has sufficient bandwidth. A single dual-port FC HBA makes more sense if the saving is high and you need the slot, but there is a risk of bus failure. Also double check that the chip can handle the throughput of both ports. If you are using blades and have to settle for a single 2-port HBA (instead of two 1-port HBAs), ensure the PCI slot has bandwidth for 16 Gb, and that the chip & bus in the HBA can handle the peak load of 16 Gb.
Estimated ESXi IO bandwidth in early 2014 (Purpose: Bandwidth. Remarks):
VM: 4 Gb. For ~20 VMs. A vShield Edge VM needs a lot of bandwidth, as all traffic passes through it.
FT: 10 Gb. Based on a VMworld 2012 presentation.
Distributed Storage: based on the Tech Preview in 5.1 and Nutanix.
vMotion: 8 Gb. vSphere 5.1 introduces shared-nothing live migration. This increases demand, as a vmdk is much larger than vRAM. Includes multi-NIC vMotion for faster vMotion when multiple VMs need to be migrated.
Management: 1 Gb. Copying a powered-off VM to another host without a shared datastore takes this bandwidth.
IP Storage: 6 Gb. NFS or iSCSI. Not the same as Distributed Storage, as DS is not serving VMs. No need for 10 Gb, as the storage array is likely shared by many hosts and may only have 40 Gb in total for all of them.
vSphere Replication: should be sufficient, as the WAN link is likely the bottleneck.
Total: 40 Gb.
Use a NIC that supports the following: checksum offload; high-memory DMA (64-bit DMA addresses); multiple scatter/gather elements per Tx frame; jumbo frames. The benefit of jumbo frames seems to be inconclusive, but I think it's good practice as it prepares for VXLAN. NetQueue is required to achieve 10 GE inside the VM; without it, a VM gets only 3–4 Gbps.
Good consideration on PCI and bandwidth: it would be interesting to measure how much Ethernet and FC bandwidth each tier of cluster requires for a given VM config, e.g. a box with 10 VMs for the PROD GOLD tier should have x Gb Ethernet and y Gb FC bandwidth. This ensures no oversubscription from the network and FC I/O point of view, and ensures end-to-end performance. A lot of people spend big $$ on CPUs and forget about end-to-end scaling for performance. Many customers simply mix production Tier 1 and Tier 3 without knowing how much LAN and SAN bandwidth Tier 1 needs.
Good reading on remote management and hardware management via iLO:

54 Large: Sample ESXi host specification
Estimated hardware cost: US$ 10K per ESXi host. Configuration included in the above price:
2x Xeon X5650. The E series has different performance & price attributes.
128 GB RAM.
4x 10 GE NIC. No HBA.
2x 100 GB SSD, for the swap-to-host-cache feature in ESXi 5 and for running IO-intensive agent VMs. It could also be handy during troubleshooting; only 1 HD is needed, as it's for troubleshooting purposes.
No installation disk. We will use Auto Deploy, except for the management cluster.
Lights-out management. Avoid using WoL; use IPMI or HP iLO.
Costs not yet included: LAN switches (around S$15K for a pair of 48-port GE switches, 96 ports in total) and SAN switches.
In the world of virtualization, using 1 GE has both pragmatic and flexibility issues: look into cable management, proliferation of access switches, and integration into the data center architecture. Look into 10G LOM solutions.
Memory speed drops as you put more DIMMs per channel:
1 DIMM per channel = 1333 MHz (6 DIMMs populated in total)
2 DIMMs per channel = 1066 MHz (12 DIMMs populated in total)
3 DIMMs per channel = 800 MHz (18 DIMMs populated in total)
However, HP has worked with Intel to boost 2 DIMMs per channel, so HP servers have a feature to run 2 DIMMs per channel at 1333 MHz (i.e. 12 DIMM slots populated but still running at 1333 MHz). This is good for the sweet-spot memory config of 96 GB RAM per 2-socket server.
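The DIMMs-per-channel trade-off can be tabulated with a small lookup. This is a sketch: the speeds come from the list above, while the 6 channels (2 sockets x 3 channels) are inferred from "1 DIMM per channel = 6 DIMMs" and may not hold for other server generations; the function name is hypothetical.

```python
# DIMM speed by DIMMs per channel (DPC), as listed on the slide
SPEED_MHZ = {1: 1333, 2: 1066, 3: 800}
CHANNELS = 6  # assumption: 2 sockets x 3 channels, implied by "1 DPC = 6 DIMMs"


def memory_config(dpc: int, dimm_gb: int, hp_2dpc_at_1333: bool = False) -> tuple[int, int, int]:
    """Return (speed_mhz, dimms_populated, total_gb) for a fully populated config."""
    speed = SPEED_MHZ[dpc]
    if dpc == 2 and hp_2dpc_at_1333:
        speed = 1333  # HP feature described on the slide
    dimms = dpc * CHANNELS
    return speed, dimms, dimms * dimm_gb


if __name__ == "__main__":
    print(memory_config(2, 8))                        # 96 GB with 8 GB DIMMs, slower
    print(memory_config(2, 8, hp_2dpc_at_1333=True))  # 96 GB at full speed on HP
    print(memory_config(1, 16))                       # 96 GB at full speed with 16 GB DIMMs
```

It shows why 96 GB is the sweet spot: it is reachable at 1333 MHz either with 16 GB DIMMs at 1 DPC or with cheaper 8 GB DIMMs on servers that support 2 DPC at full speed.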

55 Blade or Rack or Converged
Both are good; both have pros and cons. The comparison below is relative, not absolute. Consult the principal for the specific model; this is just a guideline, and only for vSphere purposes, not for other use cases such as HPC or non-VMware workloads. There is a 3rd choice: converged infrastructure, e.g. Nutanix.
Blade, relative advantages: Some blades come with built-in 2x 10 GE ports; to use them, you just need a 10 GE switch. Less cabling, fewer problems. Easier to replace a blade. Better power efficiency, rack space efficiency and cooling efficiency; the larger fan (4 RU) is better than the small fan (2 RU) used in a rack server. Some blades can be stateless: the management software can clone one ESX to another. Better management, especially when you have many ESXi hosts.
Rack, relative advantages: a typical 1 RU rack server normally comes with 4 built-in ports. Better suited for <20 ESX per site. More local storage.
Blade, disadvantages: More complex management, both on the switches and the chassis, and proprietary too; you need to learn the rules of the chassis/switches. Positioning of the switch matters in some models. Some blades virtualise the 10 GE NIC and can slice it, which adds another layer of complexity. Some replacements or major upgrades may require the entire chassis to be powered off. Some have only 2 PCI slots, which might not be enough if you need >20 GE per ESXi host. Best practice recommends 2 enclosures. The enclosure is passive; it does not contain electronics. There is an initial cost, as each chassis/enclosure needs 2 switches. Ownership of the SAN/LAN switches in the chassis needs to be made clear. The common USB port in the enclosure may not be accessible by ESX; check with the respective blade vendor. A USB dongle (which you should not use) can only be mounted in front; make sure it's short enough that you can still close the rack door.
Rack, disadvantages: the 1 RU rack server has a very small fan, which is not as good as a larger fan. Less suited when each DC is big enough to have 2 chassis. Cabling & rewiring.
Quoted from another source: I have also seen customers with 2 Blade Chassis in a C Blades in each. A firmware issue affected all switch modules simultaneously, instantly isolating all blades in the same chassis. Because they were the first 6 blades built, it took down all 5 primary HA agents. The VMs powered down and never powered back up. Because of this, I recommend using two chassis and limiting cluster size to 8 nodes, to ensure that the 5 primary nodes never all reside on the same chassis.
NOT all blade enclosures are passive devices; some blade enclosures contain many active components. On rack-mount servers, the size of the fan does not really matter, as each fan is designed to suit its server.
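The 8-node limit in the anecdote is a pigeonhole argument: with hosts spread evenly over 2 chassis, 8 nodes means at most 4 per chassis, so 5 HA primary nodes (the pre-vSphere 5.0 HA model described above) cannot all sit in one chassis. This sketch assumes an even spread; the function name is hypothetical, and an uneven spread can break the guarantee even for small clusters.

```python
import math


def primaries_may_share_chassis(cluster_size: int, chassis: int = 2, primaries: int = 5) -> bool:
    """True if, with hosts spread as evenly as possible across chassis, one chassis
    holds enough hosts to contain every HA primary node."""
    largest_chassis_share = math.ceil(cluster_size / chassis)
    return largest_chassis_share >= primaries


if __name__ == "__main__":
    for size in (8, 9, 10, 16):
        print(size, "nodes: at risk" if primaries_may_share_chassis(size) else "nodes: safe")
```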

56 ESXi boot options
4 methods of ESXi boot. Needing installation: local Compact Flash, local disk, SAN boot. Needing no installation: LAN boot (PXE) with Auto Deploy.
Auto Deploy: environments with >30 ESXi hosts should consider Auto Deploy. Best practice is to keep the IT Cluster on non-Auto Deploy (local install), which also matters when the Auto Deploy infrastructure itself runs as VMs. An ideal ESX is just pure CPU and RAM: no disk, no PCI card, no identity. Auto Deploy is also good for environments where you need to prove to the security team that your ESXi has not been tampered with (you can simply reboot it and it is back to “normal”). It also gives centralised image management.
Advantages of local disk over SAN boot: no SAN complexity; no need to label the boot LUNs properly.
Disadvantages of local disk compared with SAN boot: needs 2 local disks, mirrored. Certain organisations do not like local disks. A disk is a moving part, so lower MTBF. SAN boot saves power/cooling.
SAN boot is the new trend toward stateless computing. It further enables abstraction of the ESX host in the cloud computing world by decoupling logical resources from physical ones at the host level.

57 Storage Design

58 Methodology: SLA, Datastore Cluster, VM input, Mapping, Monitor
Define the standard (Storage Driven Profile). Define the datastore profile. Map each cluster to a datastore cluster. For each VM, ask the owner to choose the capacity (GB) and which tier they want to buy; let them decide, as they know their own app. Map each VM to a datastore, and create another datastore if there is insufficient capacity or performance (see the next slide for details). Most app teams will not know their IOPS and latency requirements; make it part of storage tiering, so they consider the bigger picture. Turn on Storage IO Control. Storage IO Control is per datastore: if the underlying LUN shares spindles with all the other LUNs, it may not achieve the result. Consult the storage vendor on this, as they have visibility and control over the entire array.

59 SLA: 3 Tier pools of Storage Profile
Create 3 tiers of storage with Storage DRS. These become the types of storage pool presented to clusters or VMs. Implement VASA so the profiles are automatically presented and compliance checks can be performed. This paves the way for standardisation.
Choose 1 datastore size for each tier and keep it consistent. Choose an easy number (e.g. vs 800). Tier 3 is also used for non-production.
Map the ESX cluster tier to the datastore tier. If a VM is on the Tier 1 Production cluster, it will be placed on a Tier 1 datastore, not a Tier 2 datastore. The strict mapping reduces the number of paths drastically.
Example, based on the Large Cloud scenario (a Small Cloud will have a simpler and smaller design):
Tier 1: price 4x/GB; min 6000 IOPS; max latency 10 ms; RAID 10; RPO 15 minutes; RTO 1 hour; size 2 TB; limit 70%; array-level replication; array snapshot: yes; ~10 VMs; EagerZeroedThick.
Tier 2: price 2x/GB; min 4000 IOPS; max latency 20 ms; RPO 2 hours; RTO 4 hours; size 3 TB; limit 80%; vSphere-level replication; array snapshot: no; ~20 VMs; normal thick.
Tier 3: price 1x/GB; min 2000 IOPS; max latency 30 ms; 8 hours (RPO/RTO); size 4 TB; ~30 VMs; thin provisioned.
Snapshot means protected with array-level snapshots for fast restore. VMDKs larger than 1.5 TB will be provisioned as RDM. RDM will be used sparingly, in virtual compatibility mode unless the app team says otherwise. Tier 2 and 3 can have large datastores, as replication is done at the vSphere layer. The interface will be FC for all, which means Storage vMotion can be done with VAAI. Consult the storage vendor for array-specific design. I don't think the array has a shares & reservation concept for IOPS, so the array can't guarantee or control latency per tier.
1 TB was selected as it provides the best balance between performance and manageability, with approximately VMs and virtual disks per volume. For manageability, it allows an adequately large portion of disks to better use resources and limit storage sprawl. A smaller size maintains a reasonable RTO and reduces the risks associated with losing a single LUN. In addition, the size limits the number of VMs that remain on a single LUN.
I got different results when checking IOPS:
Commercial SSD, with Multi-Level Cell: 1000 – 2000 IOPS.
Enterprise Flash Drive, with Single-Level Cell & much higher speed & buffer: 6000 – 30K IOPS.
15k rpm: IOPS. Some say 250; some say 150 – 200. So I take 150, as it's easy to remember.
10k rpm: IOPS. Some say 100 – 150. So I take 100, as it's easy to remember.
7200 rpm: IOPS. Some say 120. I don't use these in my design; too big a capacity will encourage more VMs.
5400 rpm: IOPS.
SAS drives are a good compromise (cost vs capacity vs speed) at IOPS.
From virtualgeek: Even if you do thin provisioning at the array level, eagerzeroedthick VMDKs “cancel it out”, because they “pre-zero” every portion of the VMDK. Note that production storage dedupe or compression techniques can solve this (since all the zeros can be eliminated), but thin provisioning at the array level on its own cannot.
The data we have collected reveals: both thin and thick disks perform similarly on various workloads. Thin-provisioned disks show similar performance trends as thick disks when scaled across different hosts. External fragmentation has negligible impact on the performance of thin-provisioned disks. There is an insignificant performance impact on existing thick disks if thin provisioning is implemented on a shared array.
There are five key features that Storage DRS offers: resource aggregation, initial placement, datastore maintenance mode, load balancing, and affinity rules.
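The "map each VM to a datastore, create another one if insufficient" step from the methodology can be sketched with the tier sizes and limits from the example table. Assumptions: the limit column is read as a fill limit, TB is taken as 1024 GB, Tier 3's limit (not given) is set to 80%, and the class and function names are hypothetical. Only capacity is checked here, not IOPS.

```python
from dataclasses import dataclass

# tier: (datastore size in GB, fill limit in %) from the example table.
# Tier 3's limit is not given on the slide; 80% here is an assumption.
TIER_SPECS = {1: (2048, 70), 2: (3072, 80), 3: (4096, 80)}


@dataclass
class Datastore:
    name: str
    tier: int
    size_gb: int
    limit_pct: int
    used_gb: int = 0

    def headroom_gb(self) -> int:
        """Space left before the datastore reaches its fill limit."""
        return self.size_gb * self.limit_pct // 100 - self.used_gb


def place_vm(vm_gb: int, tier: int, datastores: list[Datastore]) -> str:
    """Place a VM on the first datastore of its tier with enough headroom,
    otherwise create another datastore of that tier (hypothetical helper)."""
    for ds in datastores:
        if ds.tier == tier and ds.headroom_gb() >= vm_gb:
            ds.used_gb += vm_gb
            return ds.name
    size_gb, limit_pct = TIER_SPECS[tier]
    count = sum(1 for ds in datastores if ds.tier == tier)
    new_ds = Datastore(f"T{tier}-DS{count + 1}", tier, size_gb, limit_pct)
    if new_ds.headroom_gb() < vm_gb:
        raise ValueError("VM too large for one datastore of this tier; consider RDM")
    new_ds.used_gb = vm_gb
    datastores.append(new_ds)
    return new_ds.name


if __name__ == "__main__":
    pool: list[Datastore] = []
    print(place_vm(1000, 1, pool), place_vm(500, 1, pool), place_vm(400, 1, pool))
```

The strict tier match (a Tier 1 VM never lands on Tier 2) mirrors the cluster-to-datastore mapping rule above.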

60 Arrangement within an array
Below is a sample diagram showing disk grouping inside an array. The array has 48 disks. Hot spares are not shown, for simplicity, and this example only has 1 RAID Group (2+2), also for simplicity.
Design considerations: Datastore 1 and Datastore 2 performance can impact one another, as they share physical spindles. Each datastore spans 16 spindles, so IOPS is only 2800 max (based on 175 IOPS for a 15K RPM FC disk); because of RAID, the effective IOPS will be lower. The only way they won't impact each other is if there are “Share” and “Reservation” concepts at the “meta slice” level. Datastores 3, 4, 5 and 6 can impact one another in the same way. DS 1 and DS 3 can impact each other since they share the same controller (or SP). This contention happens if a shared component (e.g. cache, RAM, CPU) becomes the bottleneck; the only way to prevent it is to implement “Share” or “Reservation” at the SP level.
For Storage IO Control to be effective, it should be applied to all datastores sharing the same physical spindles. So if we enable it for Datastore 3, then Datastores 4, 5 and 6 should be enabled too. Avoid different settings for datastores sharing underlying resources (spindles, controller, cache, port): use the same congestion threshold on them, and use comparable share values (e.g. use Low / Normal / High everywhere). Avoid using extents: SIOC is not supported on datastores with multiple extents, and therefore Storage DRS cannot be used for I/O load balancing on them.

61 Storage IO Control
Storage IO Control works at the datastore level; there is no control at the RDM level (?). But arrays normally share spindles. In the example below, the array has 3 volumes, each configured the same way: 32 spindles in a RAID 10 configuration (8 units of 2+2 disk groups). Non-vSphere workloads also share the same spindles. Best practice: unless the array has a “Shares” or “Reservation” concept, avoid sharing spindles between storage profiles.

62 Storage DRS, Storage IO Control and physical array
The array is not aware of the VMs inside VMFS; it only sees LUNs. Moving a VM from one datastore to another looks like a large IO operation to the array: one LUN decreases in size, while the other one increases drastically.
With arrays capable of auto-tiering, VMware recommends configuring Storage DRS in Manual Mode with the I/O metric disabled, and using Storage DRS for its initial placement and out-of-space avoidance features.
From the whitepaper on Storage DRS interoperability with storage technologies (feature or product: initial placement / migration recommendations):
Array-based replication (SRDF, MirrorView, SnapMirror, etc.): initial placement supported. Moving a VM from one datastore to another can cause a temporary lapse in SRM protection (?) and increase the size of the next replication transfer.
Array-based snapshots: moving a VM from one datastore to another can increase space usage in the destination LUN, so the snapshot takes longer.
Array-based dedupe: moving a VM from one datastore to another can cause a temporary increase in space usage, so the dedupe takes longer.
Array-based thin provisioning: supported on VASA-enabled arrays only [e1: reason??].
Array-based auto-tiering (EMC FAST, Compellent Data Progression, etc.): do not use IO-based balancing; just use space-based.
Array-based I/O balancing (Dell EqualLogic): n/a, as it is controlled by the array.
For array-based replication, SDRS initial placement is supported, but we recommend using manual mode for its migration recommendations, so that the recommendations can be approved with their impact on protection and replication transfer size in mind. Similarly, for array-based snapshots & dedupe, SDRS initial placement is supported, but we recommend using manual mode for its migration recommendations, so the admin can consider the potential increase in space usage.
Storage DRS uses the construct “DrmDisk”. A DrmDisk represents a consumer of datastore resources and is the smallest entity Storage DRS can migrate. A VMDK = a DrmDisk, and its snapshots are part of that DrmDisk. The VM files (swap, VMX, log) are contained in one DrmDisk (the VM configuration). Storage DRS load balances per DrmDisk.

63 RAID type
In this example, I'm using just RAID 10. Generally speaking, I see a rather high write ratio (around 50%). RAID 5 would result in higher cost, as it needs more spindles. More spindles give the impression that we have enough storage; it is difficult to say no to requests when you don't have a storage issue, and more spindles mean you're burning the environment more. vCloud Suite also introduces additional IOPS outside the guest OS: VM boots result in writing the vRAM to disk, plus Storage vMotion, Storage DRS, and snapshots.
Mixing RAID 5 and RAID 10 increases complexity. RAID 5 was used for capacity, but nowadays each disk is huge (2 TB). I'd rather mix SSD and disk than mix RAID types, so it is: SSD RAID 10 for performance & IOPS, and disk RAID 10 for capacity. I'd go for just 2 tiers instead of 3; this minimises movement, and each movement costs both a read and a write.
The sample below is based on 150 IOPS per spindle and a target of 1200 IOPS. The table comes from Patrick Carmichael (VMware)'s presentation at VMworld, session ID INF-STO1807.
How many IOPS can I achieve with a given number of disks?
Total Raw IOPS = Disk IOPS × Number of disks
Functional IOPS = (Raw IOPS × Write %) / (RAID Penalty) + (Raw IOPS × Read %)
How many disks are required to achieve a required IOPS value?
Disks Required = ((Read IOPS) + (Write IOPS × RAID Penalty)) / Disk IOPS
RAID write IO penalty: RAID 5 = 4; RAID 6 = 6; RAID 10 = 2.
Disks required (20% write / 80% write):
RAID 6: 16 / 40
RAID 5: 13 / 27 (nearly 2x of RAID 10)
RAID 10: 10 / 14
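The two formulas above can be turned into a small calculator. This is a sketch with hypothetical function names; it uses integer arithmetic for the disk count and rounds up, which is the conservative choice. The table appears to round to the nearest disk instead, so for 80% write it shows 27 (RAID 5) and 14 (RAID 10) where rounding up gives 28 and 15.

```python
RAID_WRITE_PENALTY = {"RAID 5": 4, "RAID 6": 6, "RAID 10": 2}


def functional_iops(disk_iops: int, disks: int, write_pct: int, penalty: int) -> float:
    """Functional IOPS = (Raw x Write%) / penalty + (Raw x Read%)."""
    raw = disk_iops * disks
    return raw * write_pct / 100 / penalty + raw * (100 - write_pct) / 100


def disks_required(target_iops: int, write_pct: int, penalty: int, disk_iops: int) -> int:
    """Disks = (read IOPS + write IOPS x penalty) / disk IOPS, rounded up.
    Percentages are kept as integers to avoid floating-point surprises."""
    numerator = target_iops * (100 - write_pct) + target_iops * write_pct * penalty
    denominator = 100 * disk_iops
    return -(-numerator // denominator)  # ceiling division


if __name__ == "__main__":
    for write_pct in (20, 80):
        for level, penalty in RAID_WRITE_PENALTY.items():
            print(f"{write_pct}% write, {level}: {disks_required(1200, write_pct, penalty, 150)} disks")
    print(functional_iops(150, 8, 20, 2))  # 8 disks of RAID 10 at 20% write
```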

64 Cluster Mapping: Host to Datastore
Always know which ESX cluster mounts which datastore cluster.
Keep the diagram simple. Its main purpose is to communicate with other teams, so don't put too much info in it. The idea is to give them a mental picture they can understand. If your diagram has too many lines, too many datastores, or too many clusters, then it is too complex. Create a Pod when that happens; modularisation makes management and troubleshooting easier.
This diagram shows clearly the mapping between datastores and clusters. You need to draw something like this. Keep it simple, as the idea is to have a mental picture.
Good read: Why should a zoned LUN not be shared between clusters?
I tried to find an official stance from VMware on the communities section, and the only information I could find was that you could share a LUN between clusters, or more accurately between all ESX hosts in multiple clusters. I would rather not share the LUNs between clusters, but a team member seems to be bent on doing it. What reasons, personal, political, or technical, led to the statement in the book about not having LUNs cross environments?
There are several reasons I would never do it in an environment that I was designing.
1) You want to limit the number of hosts connecting to any particular storage volume. This is primarily for performance reasons. If you have, say, 20 hosts, each with 2 data paths, all talking to the same LUN for multiple VMs, you likely have a lot of disparate I/O going on between host and array. VMFS as a file system cannot effectively handle more than hosts all talking to the same LUN at the same time actively (although the cluster limit was bumped from 16 to 32). The idea here is that you will never have all 32 hosts of a cluster communicating with a single LUN at the same time, due to how you spread your VM load.
2) Assigning all LUNs to all hosts is a management nightmare. Services such as VMotion, DRS, and HA cannot function across multiple clusters, so the primary driver of "shared storage" no longer exists. Forcing the ESX admin into the ugly management task of determining "Where can I put particular VMs for a particular cluster without impacting another's performance?" will become extremely tedious. Simply not zoning a LUN to a group of servers is something a storage admin SHOULD be familiar with, and it should have no impact on the backend infrastructure. Managing a nasty spider web of data paths is hard enough within a single cluster, let alone multiple. I consider this on par with why you would VLAN a network. Just because you CAN create a giant class A for every node in your network, why in the world would you?
3) Data security is another reason. If your clusters serve particular purposes, such as DMZ vs Internal vs Partner/Customer clusters, or simply Prod vs Dev, you don't want to give someone the ability to accidentally map the storage of an internal or partner system onto a DMZ server. While controls should be in place to prevent it from a process perspective, you will never be able to fix "human error" unless you simply make it impossible.
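If the cluster-to-datastore mapping is kept as data (for example, exported from your inventory), a few lines of code can flag every datastore mounted by more than one cluster. A minimal Python sketch; the cluster and datastore names are hypothetical. A shared ISO/template datastore may be a deliberate exception; the point is that each shared datastore should be a conscious decision.

```python
from collections import defaultdict


def shared_datastores(cluster_mounts):
    """Return datastores mounted by more than one ESX cluster.

    cluster_mounts: dict mapping cluster name -> iterable of datastore names.
    """
    owners = defaultdict(set)
    for cluster, datastores in cluster_mounts.items():
        for ds in datastores:
            owners[ds].add(cluster)
    return {ds: sorted(clusters) for ds, clusters in owners.items() if len(clusters) > 1}


mapping = {
    "Prod-Cluster": ["Tier1-DS01", "Tier1-DS02", "ISO-Templates"],
    "NonProd-Cluster": ["Tier3-DS01", "ISO-Templates"],
    "DMZ-Cluster": ["DMZ-DS01"],
}
print(shared_datastores(mapping))  # {'ISO-Templates': ['NonProd-Cluster', 'Prod-Cluster']}
```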

65 Mapping: Datastore Replication
You should also have a datastore replication diagram. Like the previous diagram, it serves the same purpose, so keep it simple.

66 Types of Datastores
Business VM: Tier 1 VM, Tier 2 VM, Tier 3 VM, Single VM. Each tier may have multiple datastores.
IT VM
Staging VM: from the P2V process, or when moving from Non-Prod to Prod.
Isolated VM
Template & ISO
Desktop VM: mostly local datastores on the ESXi host, backed by SSD.
SRM Placeholder
Datastore Heartbeat. Pro: a dedicated DS, so we don't accidentally impact it while offlining a datastore. Cons: another 2 DS to manage per cluster, and increased scanning time. Can we use the SRM placeholder as the heartbeat?
Always know where a key VM is stored. Datastore corruption, while rare, is possible.
1 datastore = 1 LUN. Relative to “1 LUN = Many VMFS”, it gives better performance due to fewer SCSI reservations.
Other guides:
Use Thin Provisioning at the array level, not the ESX level.
Separate Production and Non-Production. Add a process to migrate into Prod. You can't guarantee Production performance if VMs move in and out without control.
RAID level does not matter so much if the array has sufficient cache (battery-backed, naturally).
Keep 20% free capacity for VM swap files, snapshots, logs, thin volume growth, and Storage vMotion (inter-tier).
VMware, NetApp, & EMC all recommend that an application with high I/O requirements, or one which is sensitive to latency variation, gets a storage design that focuses on that particular VM and is isolated from other datasets. Ideally, the data will reside on a VMDK stored on a datastore that is connected to multiple ESX servers, yet is only accessed by a single VM. The name of the game with these workloads isn't scale in terms of VMs per datastore, but scaling the performance of one VM.
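The 20% free-capacity guideline above is simple arithmetic, but it is easy to forget once thin-provisioned VMs start growing. A minimal Python sketch, assuming sizes in GB; the function names are hypothetical.

```python
def usable_capacity_gb(datastore_gb, free_pct=20):
    """Space available to VM disks while keeping free_pct% of the datastore free."""
    return datastore_gb * (100 - free_pct) // 100


def reserve_breached(datastore_gb, used_gb, free_pct=20):
    """True once used space eats into the free-space reserve."""
    return used_gb > usable_capacity_gb(datastore_gb, free_pct)


print(usable_capacity_gb(2000))      # 1600
print(reserve_breached(2000, 1700))  # True
```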

67 Special Purpose Datastore
1 low-cost datastore for ISO and Templates
Need 1 per vCenter data center. Need 1 per physical data center, else you will transfer GBs of data across the WAN.
Around 1 TB.
ISO directory structure:
\ISO\
\OS\Windows
\OS\Linux
\Non OS\ (store things like anti-virus, utilities, etc.)
1 staging/troubleshooting datastore
To isolate a VM. Prove to the Apps team that the datastore is not affected by other VMs.
For storage performance studies or issues. Makes it easier to correlate with data from the array.
The underlying spindles should have enough IOPS & size for the single VM.
Our sizing: Small Cloud: 1 TB. Large Cloud: 1 TB.
1 SRM Placeholder datastore
So you always know where it is. Sharing it with another datastore may confuse others.
Used in SRM 5 to place the VMs' metadata so they can be seen in vCenter.
10 GB is enough. Low performance.

68 Storage Capacity Planning
Theory and reality can differ. Theory is the initial, high-level planning you do. Reality is what it is after 1-2 years.
Theory or Initial Planning
For green field deployment, use the Capacity Planner. The info on actual usage is useful, as the utilisation can be low. The IOPS info is a good indicator too.
For brown field deployment, use the existing VMs as the indicator. If you have virtualised 70%, this 70% will be a good indicator, as it's your actual environment.
You can also use rules of thumb, such as:
100 IOPS per normal VM. 100 IOPS per VM is low, but this is a concurrent average. If you have 1000 VMs, this will be 100K IOPS.
500 IOPS per database VM
20 GB per C:\ drive (or wherever you store OS + apps)
50 GB per data drive for a small VM
500 GB per data drive for a database VM
2 TB per data drive for a file server
Actual or Reality
Use a tool, such as VC Ops 5.6, for actual measurement. VC Ops 5.6 needs to be configured (read: tailored) to your environment. Create custom groups, and for each group, adjust the buffer accordingly. You will need at least 3 groups, 1 per tier.
I would not use a spreadsheet or rules of thumb for a >100 VM environment.
Use tools to find out the peak period of a VM. The built-in chart in vCenter requires you to look at each VM's disk usage manually, which can be time-consuming for 100 VMs.
FLASH is becoming the storage of choice for performance-sensitive environments. It creates a new tier of storage between system memory and disk. FLASH is faster than disk, and FLASH in the server is faster than FLASH in the SAN.
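For the "theory" phase, the rules of thumb above can be rolled up per VM profile. A minimal Python sketch; the profile names are hypothetical, 2 TB is taken as 2048 GB, and the file server IOPS figure is an assumption (the slide gives only its data-drive size), so it reuses the normal-VM value. Beyond roughly 100 VMs, replace these estimates with measured data as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    iops: int     # concurrent average IOPS per VM
    os_gb: int    # C:\ drive (OS + apps)
    data_gb: int  # data drive


# Rules of thumb from the slide; file_server IOPS is an assumption.
PROFILES = {
    "normal": Profile(iops=100, os_gb=20, data_gb=50),
    "database": Profile(iops=500, os_gb=20, data_gb=500),
    "file_server": Profile(iops=100, os_gb=20, data_gb=2048),
}


def estimate(vm_counts):
    """Return (total IOPS, total GB) for a dict of profile name -> VM count."""
    total_iops = sum(PROFILES[name].iops * n for name, n in vm_counts.items())
    total_gb = sum((PROFILES[name].os_gb + PROFILES[name].data_gb) * n
                   for name, n in vm_counts.items())
    return total_iops, total_gb


print(estimate({"normal": 1000}))  # (100000, 70000)
```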

69 Multi-Pathing Different protocols have different technologies
NFS, iSCSI, and FC all have different solutions.
NFS uses a single path for a given datastore; there is no multi-pathing. So use multiple datastores to spread the load.
In this design, I do not go for a high-end array due to cost. A high-end array gives Active/Active, so we don't have to do regular load balancing.
Most mid-range arrays are Active/Passive (ALUA). Always ensure the LUNs are balanced between the 2 SPs. This is done manually within the array.
Choose an ALUA array instead of a plain Active/Passive one:
Less manual work on balancing and selecting the optimal path.
Both controllers can receive I/O requests/commands, although only 1 owns the LUN. The path from the managing controller is the optimized path.
Better utilization of the array storage processors (minimizes unnecessary SP failover).
vSphere will show both paths as Active, but the preferred one is marked “Active I/O”.
Round Robin will issue I/O across all optimized paths and will use non-optimized paths only if no optimized paths are available.
Array Type | My selection
Active/Active | Round Robin or Fixed
ALUA | Round Robin or MRU
Active/Passive | EMC PowerPath/VE 5.4 SP2
Dell EqualLogic | EqualLogic MMP
HP/HDS | PowerPath/VE 5.4 SP2?
Asymmetric Logical Unit Access (ALUA): ALUA-compliant storage systems provide different levels of access per port. ALUA allows hosts to determine the states of target ports and prioritize paths. The host uses some of the active paths as primary and others as secondary.
vSphere 4.0 changes the naming pattern from vmhbaN:N:N:N by adding the channel. The channel is useful in iSCSI, although not all iSCSI array vendors use it. VMware software iSCSI is part of the Cisco iSCSI Initiator Command Reference.
Dell/EqualLogic PSP: uses a “least deep queue” algorithm rather than basic round robin, and can redirect I/O to different peer storage nodes.
Picking between MRU and Fixed is easy in my opinion: as MRU is aware of optimized and unoptimized paths, it is less static and less error-prone than Fixed. When using MRU, however, be aware that your LUNs need to be equally balanced between the storage processors; if they are not, you might be overloading one storage processor while the other is doing absolutely nothing. This might be something you want to make your storage team aware of. The other option, of course, is Round Robin. With RR, 1000 commands will be sent down a path before it switches over to the next one. Although theoretically this should lead to higher throughput, I haven't seen any data to back this “claim” up. Would I recommend using RR? Yes, I would, but I would also recommend performing benchmarks to ensure you are making the right decision.
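The Round Robin behaviour described above (a fixed number of commands per path, optimized paths only, non-optimized paths as a fallback) can be modelled in a few lines. This is a toy Python sketch under those stated rules, not the VMware NMP code; the class name and path runtime names are hypothetical.

```python
from itertools import cycle


class RoundRobinPaths:
    """Toy model of Round Robin path selection over ALUA paths.

    Sends `commands_per_path` I/Os down one path before switching to the next,
    and only falls back to non-optimized paths when no optimized path exists.
    """

    def __init__(self, paths, commands_per_path=1000):
        # paths: dict of path name -> "optimized" or "non-optimized"
        optimized = [p for p, state in paths.items() if state == "optimized"]
        fallback = [p for p, state in paths.items() if state == "non-optimized"]
        candidates = optimized or fallback
        if not candidates:
            raise ValueError("no usable path to the LUN")
        self._paths = cycle(candidates)
        self._limit = commands_per_path
        self._current = next(self._paths)
        self._sent = 0

    def next_path(self):
        """Return the path the next I/O command should use."""
        if self._sent == self._limit:
            self._current = next(self._paths)
            self._sent = 0
        self._sent += 1
        return self._current


paths = {
    "vmhba1:C0:T0:L10": "optimized",      # via the SP that owns the LUN
    "vmhba1:C0:T1:L10": "non-optimized",  # via the peer SP
    "vmhba2:C0:T0:L10": "optimized",
    "vmhba2:C0:T1:L10": "non-optimized",
}
rr = RoundRobinPaths(paths, commands_per_path=2)
print([rr.next_path() for _ in range(6)])
```

With the default of 1000 commands per path, each optimized path carries long bursts of I/O; the small value here only makes the alternation visible.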

70 FC: Multi-pathing VMware recommends 4 paths
A path is point to point. The switch in the middle is not part of the path as far as vSphere is concerned.
Ideally, they are all active-active for a given datastore. Fixed means 1 path active, 3 idle.
1 zone per HBA port. The zone should see all the target ports.
If you are buying new SAN switches, consider the direction for the next 3 years. Whatever you choose will likely be in your data center for the next 5 years. If you are buying a Director-class switch, consider the next 5 years; upgrading a Director is major work, so plan for 5 years of usage. Consider both EOL and EOSL dates. Discuss with SAN switch vendors and understand their roadmap. 8 Gb and FCoE are becoming common.
Round-Robin:
It is per datastore, not per HBA. 1 ESX host typically has multiple datastores, and 1 array certainly has multiple datastores. All these datastores share the same SP, cache, ports, and possibly spindles.
It is active/passive at a given datastore.
Leave the default settings; there is no need to set iooperationslimit=1.
Choose this over MRU. MRU needs manual fail back after a path failure.
When you start your ESX/ESXi host or rescan your storage adapter, the host discovers all physical paths to storage devices available to the host. Based on a set of claim rules defined in the /etc/vmware/esx.conf file, the host determines which multipathing plug-in (MPP) should claim the paths to a particular device and become responsible for managing the multipathing support for the device. By default, the host performs a periodic path evaluation every 5 minutes, causing any unclaimed paths to be claimed by the appropriate MPP. The claim rules are numbered. For each physical path, the host runs through the claim rules starting with the lowest number first. The attributes of the physical path are compared to the path specification in the claim rule. If there is a match, the host assigns the MPP specified in the claim rule to manage the physical path. This continues until all physical paths are claimed by corresponding MPPs, either third-party multipathing plug-ins or the native multipathing plug-in (NMP).
Belmont: Note that the FC switches need not be physically separated. Especially when you have multiple zones, the complexity in management and lack of flexibility can be an issue. Look into VSANs when you have many zones, and leverage FCoE-capable switches for investment protection, since you are not going to deploy this for just 1-2 years.
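The claim-rule evaluation described above (lowest rule number first, first match wins) is essentially an ordered first-match lookup. A toy Python sketch of that logic; the rule numbers, attribute names, and the "VendorA-MPP" plug-in are hypothetical, and real claim rules live in the host configuration, not in code like this.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class ClaimRule:
    number: int                                                # evaluated lowest number first
    plugin: str = field(compare=False)                         # MPP that claims matching paths
    match: dict = field(compare=False, default_factory=dict)  # attribute -> required value


def claim_path(path_attrs, rules):
    """Return the plugin of the first (lowest-numbered) rule matching the path."""
    for rule in sorted(rules):
        if all(path_attrs.get(key) == value for key, value in rule.match.items()):
            return rule.plugin
    return None  # no rule matched: the path stays unclaimed in this model


rules = [
    ClaimRule(65535, "NMP"),                               # catch-all: empty match
    ClaimRule(250, "VendorA-MPP", {"vendor": "VendorA"}),  # hypothetical third-party MPP
]
print(claim_path({"vendor": "VendorA", "transport": "fc"}, rules))  # VendorA-MPP
print(claim_path({"vendor": "VendorB", "transport": "fc"}, rules))  # NMP
```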

71 FC: Zoning & Masking
Implement zoning
Do it before going live, or during a quiet maintenance window, due to the high risk potential.
1 zone per HBA port. 1 HBA port does not need to know of the existence of the others. This eliminates Registered State Change Notification (RSCN) disruption from other initiators.
Use soft zoning, not hard zoning.
Hard zone: zone based on the SAN switch port. Any HBA connected to this switch port gets this zone, so this is more secure. But be careful when recabling things into the SAN switch!
Soft zone: zone based on the HBA port. The switch port is irrelevant.
Situations that need rezoning with soft zones: changing an HBA, replacing an ESX server (which comes with a new HBA), upgrading an HBA.
Situations that need rezoning with hard zones: reassigning the ESX to another zone, a port failure in the SAN switch.
Virtual HBA can further reduce cost and offer more flexibility.
Implement LUN Masking
It complements zoning. Zoning is about path segregation; masking is about access.
Do it at the array level, not the ESX level: mask on the array, not on each ESXi host. Masking done at the ESXi host level is often based on controller, target, and LUN numbers, all of which can change with the hardware configuration.
Port zoning allows devices attached to particular ports on the switch to communicate only with devices attached to other ports in the same zone. The SAN switch keeps a table indicating which ports are allowed to communicate with each other. Port zoning is more secure than WWN zoning, but it creates a number of problems because it limits the flow of data to connections between specific ports on the fabric.
We use QLogic and Emulex open source Linux drivers, which are already FC-SW2 compliant (tested / validated with switch vendors).
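The "1 zone per HBA port, and the zone sees all target ports" rule is mechanical enough to generate. A minimal Python sketch, assuming a single fabric for simplicity (in a dual-fabric design each HBA port would only see the targets on its own fabric); the zone naming scheme, host names, and WWPNs are hypothetical.

```python
def single_initiator_zones(host_hbas, target_ports):
    """One zone per HBA port; each zone holds that initiator plus every target port.

    host_hbas:    dict host name -> list of initiator WWPNs (one per HBA port)
    target_ports: list of storage-processor target WWPNs
    """
    zones = {}
    for host in sorted(host_hbas):
        for index, initiator in enumerate(host_hbas[host]):
            zones[f"z_{host}_hba{index}"] = [initiator, *target_ports]
    return zones


hosts = {
    "esx01": ["10:00:00:00:c9:00:00:01", "10:00:00:00:c9:00:00:02"],
    "esx02": ["10:00:00:00:c9:00:00:03", "10:00:00:00:c9:00:00:04"],
}
targets = ["50:06:01:60:00:00:00:01", "50:06:01:68:00:00:00:01"]
for name, members in single_initiator_zones(hosts, targets).items():
    print(name, members)
```

Because every zone contains exactly one initiator, a host rebooting only generates RSCNs toward targets, never toward other hosts' HBAs.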

72 FC: Zoning & Masking See the figure, there are 3 zones.
Zone A has 1 initiator and 1 target. A single-initiator zone is good.
Zone B has two initiators and targets. This is bad.
Zone C has 1 initiator and 1 target.
Both SAN switches are connected via an Inter-Switch Link.
If Host X is rebooted and its HBA in Zone B logs out of the SAN, an RSCN will be sent to Host Y's initiator in Zone B and cause all I/O going to that initiator to halt momentarily and recover within seconds. Another RSCN will be sent to Host Y's initiator in Zone B when Host X's HBA logs back in to the SAN, and cause another momentary halt in I/O. Initiators in Zone A and Zone C are protected from these events because there are no other initiators in these zones.
Most recent SAN switches provide RSCN suppression methods. But suppressing RSCNs is not recommended, since RSCNs are the primary way for initiators to determine that an event has occurred and to act on it, such as loss of access to targets.
Zoning is implemented in the switches and controls which HBA ports have access to which storage processor ports.
It is important to follow established SAN best practices, such as single-initiator zones, in order to avoid the situations described and others not listed.
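The Zone A/B/C example can be expressed as a small model: when a port logs out or back in, every other initiator sharing a zone with it gets disturbed. A toy Python sketch of that simplified reasoning (it ignores targets, which also receive RSCNs); Host W and Host Z, and all port names, are hypothetical stand-ins for the unnamed hosts in the figure.

```python
def rscn_recipients(zones, initiators, changed_port):
    """Initiators disturbed when `changed_port` logs out of or back in to the fabric.

    Simplified model of the figure: every *other* initiator that shares a zone
    with the changed port receives the RSCN; targets are ignored here.
    """
    affected = set()
    for members in zones.values():
        if changed_port in members:
            affected |= (members & initiators) - {changed_port}
    return affected


zones = {
    "Zone A": {"hostW_hba", "sp_a0"},
    "Zone B": {"hostX_hba", "hostY_hba", "sp_a1", "sp_b1"},
    "Zone C": {"hostZ_hba", "sp_b0"},
}
initiators = {"hostW_hba", "hostX_hba", "hostY_hba", "hostZ_hba"}
print(rscn_recipients(zones, initiators, "hostX_hba"))  # {'hostY_hba'}
print(rscn_recipients(zones, initiators, "hostW_hba"))  # set()
```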

73 Large: Reasons for FC (partial list)
Network issues do not create storage issues. Troubleshooting storage does not mean troubleshooting the network too.
FC vs IP: the FC protocol is more efficient & scalable than the IP protocol for storage. Path failover is <30 seconds, compared with <60 seconds for iSCSI.
Lower CPU cost: see the chart. FC has the lowest CPU hit to process the I/O, followed by hardware iSCSI.
Storage vMotion is best served with 10 GE.
FC considerations: you need SAN skills, meaning troubleshooting skills, not just Install/Configure/Manage. You need to be aware of WWWWW. This can impact upgrades later on, as new components may not work with older components.
We need to look at TCO vs per-device cost. The 1G complexity can lead to outages due to L2 complexity in a highly virtualized environment. Further, if you work out the bandwidth and HA requirements for clusters and VM mobility, the access switch design may not be cheap.
In order to use VMware Storage VMotion, your storage infrastructure must provide sufficient available storage bandwidth. For the best Storage VMotion performance, you should make sure that the available bandwidth will be well above the minimum required.
For iSCSI and NFS, make sure that your network topology does not contain Ethernet bottlenecks, where multiple links are routed through fewer links, potentially resulting in oversubscription and dropped network packets. Any time a number of links transmitting near capacity are switched to a smaller number of links, such oversubscription is a possibility. Recovering from dropped network packets results in large performance degradation. In addition to the time spent determining that data was dropped, the retransmission uses network bandwidth that could otherwise be used for new transactions.
Applications or systems that write large amounts of data to storage, such as data acquisition or transaction logging systems, should not share Ethernet links to a storage device. These types of applications perform best with multiple connections to storage devices.
Before using VMware Storage VMotion, make sure you have sufficient storage bandwidth between the ESX host where the VM is running and both the source and destination storage arrays. This is necessary because the VM will continue to read from and write to the source storage array, while at the same time the virtual disk to be moved is being read from the source storage array and written to the destination storage array. While this is happening, both storage arrays may have other traffic (from other VMs on the same ESX host, from other ESX hosts, etc.) that can further reduce the effective bandwidth. With insufficient storage bandwidth, Storage VMotion can fail. With barely sufficient bandwidth, Storage VMotion will succeed, but its performance might be poor. When possible, schedule Storage VMotion operations during times of low storage activity, when available storage bandwidth is highest. This way Storage VMotion will have the highest performance.

74 Large: Backup with VADP
1 backup job per ESX, so the impact on production is minimized. Because there were four ESX server/datastore pairs, four NetBackup policies were configured, one policy per ESX server. This allowed us to limit the number of simultaneous backups that occurred against each ESX server. Using this method, the backup I/O load on each ESX datastore was similar, and backup performance and reliability were optimized.
Notes for NFS (Small and Medium cloud): With NFS and array-based snapshots, one has the greatest ease and flexibility in what level of granularity can be restored. With an array-based snapshot of an NFS datastore, one can quickly mount a point-in-time copy of the entire NFS datastore, and then selectively extract any level of granularity they want. Although this does open up a bit of a security risk, NFS does provide one of the most flexible and efficient restore-from-backup options available today. For this reason, NFS earns high marks for ease of backup and restore capability.
The vSphere 4 vStorage APIs for Data Protection have been designed so that no additional holding tank or staging is required. This also means that the concept of a backup proxy no longer applies. VM backups can be configured using a standard NetBackup master server, media server, or clients. Special-purpose backup proxy systems designed specifically for VM backups, and additional staging area storage, no longer need to be purchased.
Another feature of the vStorage API is changed block tracking. This is a block-level incremental backup implementation. After the initial full backup is performed, subsequent block-level incremental backups transfer to the backup system only the blocks that have changed since the previous full or incremental backup. This shortens backup windows while retaining full disaster recovery restore functionality. VMware Consolidated Backup did not support an incremental backup technology where the entire VM could be automatically restored from both full and incremental (vmdk) backups to a specific point in time.
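The idea behind changed block tracking (record which blocks were written, copy only those on the next backup, and rebuild a point in time from the full backup plus incrementals) can be shown with a toy model. This Python sketch is not the VMware CBT implementation or the VADP API; the class and function names are hypothetical.

```python
class ChangedBlockTracker:
    """Toy model of changed block tracking for one virtual disk."""

    def __init__(self, num_blocks):
        self.disk = [b""] * num_blocks
        self.changed = set(range(num_blocks))  # first backup must be a full copy

    def write(self, block, data):
        self.disk[block] = data
        self.changed.add(block)  # remember the block, however often it is rewritten

    def backup(self):
        """Return {block: data} for blocks changed since the previous backup."""
        delta = {b: self.disk[b] for b in sorted(self.changed)}
        self.changed.clear()
        return delta


def restore(backups, num_blocks):
    """Rebuild the disk by applying the full backup, then each incremental in order."""
    disk = [b""] * num_blocks
    for delta in backups:
        for block, data in delta.items():
            disk[block] = data
    return disk


vdisk = ChangedBlockTracker(4)
full = vdisk.backup()          # all 4 blocks
vdisk.write(2, b"x")
vdisk.write(2, b"y")           # rewriting a block still transfers it only once
vdisk.write(0, b"z")
incremental = vdisk.backup()   # only blocks 0 and 2
print(len(full), incremental)
```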

75 Backup Server A backup server is an "I/O Machine"
By far, the majority of the work done is I/O-related:
Performance of the disk is key.
A fast internal bus is key; multiple internal buses are desirable.
No shared path: 1 port from ESX (source) and 1 port to tape (target).
Lots of data comes in from clients and goes out to disk or tape.
Not much CPU usage. 1 socket 4-core Xeon 5600 is more than sufficient.
Not much RAM usage. 4 GB is more than enough.
But deduplication uses CPU and RAM. Deduplication relies on the CPU to compare segments (or blocks) of data to determine if they have been previously backed up or if they are unique. This comparison is done in RAM. Consider 32 GB RAM (64-bit Windows).
Size the concurrency properly. Too many simultaneous backups can actually slow the overall backup speed. Use backup policy to control the number of backups that occur against any datastore. This minimizes the I/O impact on the datastore, as it must still serve production usage.
2 ways of backing up:
Mount the VMDK file as a virtual disk (with a drive letter). The backup software can then browse the directory.
Mount the VM as an image file.
Source: Symantec public presentation at VMworld 2009 and VMware documentation.
Deduplication relies heavily on both memory (RAM) and CPU resources. Today's CPUs are so powerful that, for traditional backups, it is common for the backup system CPU to be underutilized. But deduplication changed this significantly. More and faster CPUs can improve overall deduplication performance, which in turn improves backup performance. Deduplication technologies are particularly suited to take advantage of large amounts of RAM. Before backup data is committed to disk, it is compared with data that has been previously backed up. This comparison process is performed in RAM instead of constantly comparing against backup data that exists on disk.
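Why deduplication shifts load onto the CPU and RAM is easiest to see in code: every segment is hashed (CPU) and looked up in an index held in memory (RAM). A minimal Python sketch, assuming fixed-size 4 KB segments and SHA-256 digests; real backup products use their own chunking and fingerprinting schemes, so treat this only as an illustration of the principle.

```python
import hashlib


def dedupe_backup(data, store, segment_size=4096):
    """Split data into fixed-size segments; keep only segments not seen before.

    store: dict acting as the in-RAM index, SHA-256 hex digest -> segment bytes.
    Returns the backup "recipe" (ordered digests) and the count of new segments.
    """
    recipe, new_segments = [], 0
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        digest = hashlib.sha256(segment).hexdigest()  # CPU cost: hashing every segment
        if digest not in store:                        # RAM cost: index lookup
            store[digest] = segment
            new_segments += 1
        recipe.append(digest)
    return recipe, new_segments


index = {}
payload = b"A" * 8192 + b"B" * 4096
first_recipe, first_new = dedupe_backup(payload, index)
second_recipe, second_new = dedupe_backup(payload, index)
print(first_new, second_new)  # 2 0
```

The second backup of unchanged data stores nothing new, but it still has to hash and look up every segment, which is where the extra CPU and RAM go.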

76 Network Design

77 Methodology
Plan how VXLAN and SDN impact your architecture.
Define how vShield will complement your VLAN-based network.
Decide if you will use 10 GE or 1 GE. I'd go for 10 GE for the Large Cloud example.
If you use 10 GE, define how you will use Network IO Control.
Decide if you use IP storage or FC storage.
Decide the vSwitch to use: local, distributed, Nexus.
Decide when to use Load Based Teaming.
Select blade or rack mount. This has an impact on NIC ports and switches.
Define the detailed design with the vendor of choice.

78 IP Multicast forwarding is required (based on IETF draft)
VXLAN
Complete isolation of the network layer: overlay networks are isolated from each other and from the physical network.
Separation of virtualization and network layers: the physical network has no knowledge of virtual networks. Virtual networks are spun up automatically as needed for VDCs.
Loss of visibility, as all overlay traffic is now UDP tunneled: you can't isolate virtual network traffic from the physical network, virtual networks can have overlapping address spaces, and today's network management tools are useless in VXLAN environments.
The physical network, while excellent at forwarding packets, is an inflexible, complex and costly barrier to realizing the full agility required by cloud services. Networking is bogged down in a 20-year-old operational model originally designed for manual provisioning on a device-by-device basis.
Capability required: network fault management, configuration management, SNMP MIB polling, protocol analysis, capacity planning / modeling.
Challenges Edge solves for Peak 10 (based on VMworld presentation INF-NET2166; Peak 10 is a service provider):
Reduce physical firewall sprawl
Reduce Ethernet cross connects
Dramatically reduce provisioning time for new customers
Ability to offer customers a VPN/SSL-VPN solution without the need for a dedicated hardware appliance
Load balancing solution for smaller customers
Physical firewall inventory will be greatly reduced
Ease of deployment (first-level support can deploy new firewalls)
Things to share with the Network team regarding VXLAN:
IP multicast forwarding is required (based on IETF draft).
More multicast groups are better. Multiple segments can be mapped to a single multicast group.
If VXLAN transport is contained in a single VLAN, IGMP Querier must be enabled on that VLAN.
If VXLAN transport is traversing routers, multicast routing must be enabled.
Increased MTU is needed to accommodate the VXLAN encapsulation overhead. The physical infrastructure must carry 50 bytes more than the VM vNIC MTU size, e.g. 1500 MTU on the vNIC -> 1550 MTU on switches and routers.
Leverage 5-tuple hash distribution for uplink and inter-switch LACP.
If VXLAN traffic is traversing a router, proxy ARP must be enabled on the first-hop router.
Prepare for more traffic between L2 domains.
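Two of the points for the network team are easy to check with code: the transport MTU is the vNIC MTU plus 50 bytes, and several segments can share one multicast group. A minimal Python sketch; the round-robin mapping of segment IDs onto a group pool is a hypothetical scheme for illustration, not how vCNS allocates groups, and the addresses are made up.

```python
import ipaddress

VXLAN_OVERHEAD_BYTES = 50  # from the slide: physical MTU = vNIC MTU + 50


def transport_mtu(vnic_mtu=1500):
    """Minimum MTU the switches and routers must carry for VXLAN traffic."""
    return vnic_mtu + VXLAN_OVERHEAD_BYTES


def group_for_segment(segment_id, groups):
    """Hypothetical round-robin mapping of segment IDs onto a multicast group pool."""
    for g in groups:
        if not ipaddress.ip_address(g).is_multicast:
            raise ValueError(f"{g} is not a multicast address")
    return groups[segment_id % len(groups)]


pool = ["239.1.1.1", "239.1.1.2", "239.1.1.3"]
print(transport_mtu())                # 1550
print(group_for_segment(5000, pool))  # 239.1.1.3
print(group_for_segment(5003, pool))  # 239.1.1.3 (shares a group with segment 5000)
```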

79 Network Architecture (still VLAN-based, not vCNS-based)
The above shows a fully virtualised network, where the network appliances are virtualised. From here you can show the network or security team that it is indeed possible to mix all the VLANs in the same ESX cluster. It is not using vCNS yet; this is still the traditional VLAN-based solution.

80 ESXi Network configuration
LBT is per port group, not per switch! The diagram does not include vShield App yet; it adds 1 hidden vSwitch per vSwitch for VMs. The management network does not require vShield Zone protection. When using cross-host Storage vMotion (shared-nothing vMotion) to migrate a virtual machine with snapshots, you should provision at least a 1 Gbps management network. This is because vMotion uses the Network File Copy (NFC) service to transmit the virtual machine's base disk and inactive snapshot points to the destination datastore. Because NFC traffic traverses the management network, the performance of your management network will determine the speed at which such snapshot content can be moved during vMotion migration. As in the non-snapshot disk-copy case, however, if the source host has access to the destination datastore, vMotion will preferentially use the source host's storage interface to make the file copies, rather than using the management network.

81 Design Considerations for 10 GE
We only have 2-4 physical ports. This means we only have 1-2 vSwitches. Some customers have gone with 4 physical ports, as 20 GE may not be enough for both storage and network.
Distributed Switch relies on vCenter. Database corruption on vCenter will impact it, so vCenter availability is more critical.
Use Load Based Teaming. This prevents one burst from impacting Production. For example, a large vMotion can send a lot of traffic.
Some best practices:
Enable jumbo frames.
Disable STP on ESXi-facing ports on the physical switch.
Enable PortFast mode on ESXi-facing ports.
Do not use DirectPath I/O unless there is proof that the app really needs it.
vSphere 4.1 onward can do 8 concurrent vMotions on 10 GE.
LACP in 5.1: with LACP, you no longer need static-mode EtherChannels for link aggregation. LACP could not be used with either default Port ID or Load-Based Teaming. Both the physical and the virtual switch using LACP must allow all MAC addresses to be sent over all interfaces simultaneously, and the only vSphere NIC teaming policy that is compatible with this is IP Hash. LACP has a few advantages in the way it handles link failures and cabling mistakes.

82 Network IO Control 2x 10 GE is much preferred to 12x 1 GE
10 GE ports give flexibility. For example, vMotion can exceed 1 GE when the physical cable is not used by other traffic. But a link failure means losing 10 GE.
External communication can still be 1 GE. This is not an issue if most communication is among VMs.
Use ingress traffic shaping to control the traffic type into the ESX?
Shares (bandwidth per pNIC) | Function | vShield | Remarks
20% | VM – Production, VM – Non Production, VM – Admin Network, VM – Back up LAN (agent) | Yes | A good rule of thumb is ~8 VMs per Gigabit. The Admin Network is used for basic network services like DNS and AD servers. Use vShield App to separate it from Production; this complements the existing VLANs, so there is no need to create more VLANs. The Infra VMs are not connected to the Production LAN; rather, they are connected to the Management LAN.
10% | Management LAN: VMware Management, VMware Cluster Heartbeat | No | In some cases, the Nexus Control & Nexus Packet need to be physically separated from Nexus Management.
- | vMotion | - | Non-routable, private network.
15% | Fault Tolerant | - | -
0–10% | VM – Troubleshooting | - | Same as Production. Used when we need to isolate the networking performance.
5% | Host-Based Replication | No? | Only for ESXi hosts assigned to do vSphere Replication. From a throughput point of view, if the inter-site link is only 100 Mb, then you only need 0.1 GE max.
- | Storage | - | -
Why I don't use 1 GE:
Consider 1-2 years ahead when looking at NIC ports. You may need to give more network bandwidth to VMs as you run more VMs, or run network-demanding VMs.
Once wired, it is hard (and expensive) to rewire, as all cables are already connected and labelled properly to each physical switch port.
For blades, the choice is pretty much 10 GE, as it is built in.
But for real physical deployments, our best practice is to use 2x 10G (standard 10G or FCoE 10G).
Use of jumbo frames is recommended for best vMotion performance. The physical switch must support jumbo frames too.
Jumbo frames add complexity to network design and maintenance over time. Performance gains are marginal with common block sizes (4KB, 8KB, 32KB); vMotion uses a large block size.
Note: IEEE 802 standards do not recognize jumbo frames.
While a jumbo frame has 9000 bytes, we need to deduct 20 bytes for the IP header and 8 bytes for the ICMP header. In OS like Windows, the MTU should be will fail at ping.
In some apps like SQL Server, the default packet size should be set to 8192, as 8192/64 = 128.
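The header deduction above is what determines the payload size for an end-to-end jumbo frame test. A minimal Python sketch of that arithmetic, assuming IPv4 with no IP options (20-byte header) and a standard 8-byte ICMP echo header; the function name is hypothetical.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu=9000):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(max_ping_payload())      # 8972
print(max_ping_payload(1500))  # 1472
```

For example, Windows `ping -f -l 8972 <host>` and ESXi `vmkping -d -s 8972 <host>` send such a payload with fragmentation disabled; a reply suggests jumbo frames are working along the whole path.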

83 Large: IP Address scheme
The example below is based on 1500 server VMs and desktop VMs, which is around 125 ESX hosts for each farm.
Do we separate the network between the Server and Desktop farms?
Since we are using the x.x.x.1 address, the basic network address (gateway) will be on x.x.x.254.
Purpose: IP addresses, total segments, remarks
ESX iLO: 1 per ESX; 1 segment. Out-of-band management & console access.
ESX Mgmt: 1 per ESX.
ESX iSCSI: 1-2 per ESX. Need 2 (1 address per active path) if we don't use LBT and do static mapping.
ESX vMotion: 2 per ESX. Multi-NIC vMotion.
ESX FT: cannot multi-path?
Agent VMs: 5 per ESX; 3 segments. vShield App, TrendMicro DS, Distributed Storage, etc.
Mgmt VMs: 1 per DC. vCenter, SRM, Update Manager, vCloud, etc. Group in 20s so similar VMs have sequential IP addresses, which are easier to remember.
Address plan (ESXi #001 to ESXi #125, remarks):
iLO: x for Server farm (enough for 254 ESX); x for Desktop farm; x for non-ESX (e.g. network switch, array, etc.).
Mgmt: x for Server farm (enough for 254 ESX); x for Desktop farm.
iSCSI: this is for ESX only. Other devices should be on x. VSA will have many addresses when it scales beyond 3 ESX.
vMotion
Fault Tolerance
Agent VMs
For iSCSI and vMotion to utilise >1 physical path, each has to have 1 IP address per path. The NIC teaming must be set to
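The convention above (hosts numbered upward from x.x.x.1, gateway on x.x.x.254) can be generated rather than typed. A minimal Python sketch using the standard `ipaddress` module; the 10.10.1.0/24 subnet and the "esxiNNN" naming are hypothetical, and a /24 per purpose and farm is assumed.

```python
import ipaddress


def plan_subnet(cidr, host_count, gateway_last_octet=254):
    """Assign sequential addresses from x.x.x.1 and reserve x.x.x.254 as the gateway."""
    net = ipaddress.ip_network(cidr)
    gateway = net.network_address + gateway_last_octet
    hosts = [h for h in net.hosts() if h != gateway]
    if host_count > len(hosts):
        raise ValueError(f"{cidr} cannot hold {host_count} hosts")
    return gateway, {f"esxi{n:03d}": hosts[n - 1] for n in range(1, host_count + 1)}


gw, plan = plan_subnet("10.10.1.0/24", 125)
print(gw, plan["esxi001"], plan["esxi125"])  # 10.10.1.254 10.10.1.1 10.10.1.125
```

With the gateway taking .254, a /24 leaves 253 usable host addresses in this sketch.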

84 Security & Compliance Design
Example: tracking configuration changes in vSphere & patching are part of compliance, not security.

85 Areas to consider
VM: Guest OS, vmdk, Prevent DoS, Log review, VMware Tools
Server: Lockdown mode, Firewall, SSH, Log review
Storage: Zoning and LUN masking, VMFS & LUN, iSCSI CHAP, NFS storage
Network: VLAN & PVLAN, Management LAN, No air gap with vShield, Virtual appliance
Management: vSphere roles, Separation of duty
Sources & tools: vSphere hardening guide, VMware Configuration Manager, other industry requirements like PCI-DSS.
Take advantage of vCNS. It changes the paradigm in security from “Hypervisor as another point to secure” to “Hypervisor to give the security team an unfair advantage”. Use vShield App for firewalling and vShield Endpoint for anti-virus (only Trend Micro has the product as at Sep 2011). You do not need to throw away the physical firewall first; complement it by adding “object-based” rules that follow the VM.
VM security: vmdk encryption is not currently provided by vSphere, so we need to encrypt within the Guest OS instead. Ensure that the global AD administrator does not have access to mount the drive or log in to any Windows VM. Strictly speaking, this is outside the VMware security scope, as it's normally done by the AD team.
If a VM is compromised, it can be used to launch a denial-of-service attack. It can also impact the performance of other shared components. Areas that can be impacted:
- CPU and RAM: the VM can max out its vCPU & vRAM, or keep writing to its RAM, inducing load on the vmkernel to serve it.
- Network: the VM can saturate the shared vmnic. Network IO Control can help here.
- Storage IOPS: the VM can saturate the IOPS, easily done with tools like Iometer. Storage IO Control can help here.
- Storage space: the VM can generate lots of logs or fill up its vmdk.

86 Separation of Duties with vSphere
VMware Admin >< AD Admin. In a small setup, it's the same person doing both. The AD Admin has access to NTFS, which can be too powerful if the VM holds sensitive data.
Segregate the virtual world. Split vSphere access into 3: Storage, Server, Network. Give Network to the Network team. Give Storage to the Storage team. The role with all access to vSphere should be rarely used.
VM owners can be given some access that they don't have in the physical world. They will like the empowerment (self-service).
(Diagram: roles across the Enterprise IT space and the vSphere space: VMware Admin, Networking Admin, Server Admin, Operator, VM Owner, Storage Admin, MS AD Admin, Network Admin, DBA, Apps Admin.)

87 Folder: use it properly. Do not use Resource Pools to organise VMs.
Caveat: the Hosts and Clusters view is the only view where you can see both ESX hosts and VMs. Study the hierarchy on the right: it is folders everywhere. Folders are the way to limit access. Certain objects don't have their own access control and rely on folders. E.g. you cannot set permissions directly on a vNetwork Distributed Switch; to set permissions, create a folder on top of it.
I tried the following:
1. Create a folder and put all local datastores into it.
2. Specify no access for a user.
3. Log in as that user.
4. Go to any ESX, then to the Configuration tab.
The local datastores are not visible!

88 Storage related access
Non-Storage Admins should not have the following access:
- Initiate Storage vMotion
- Rename or Move Datastore
- Create
- Low level file operations
Different ways of controlling access:
- Network level. The ESXi host will not be able to access the entire array, as it can't even see it on the network.
- Array level. Control which ESXi hosts can or cannot see it. For iSCSI, we can configure per target using CHAP. For FC, we can use Fibre Channel zoning or LUN masking.
- vCenter level. Use vCenter permissions (folder level or datastore level). This is the most granular.
You can now manage datastores like other inventory objects. That means you can configure and manage datastores from a centralized view, organize datastores into folders, and set permissions on a per-folder or per-datastore level. For example, it is now possible to block the creation of VMs or the creation of snapshots on a per-datastore basis or through the use of folders. To set a permission on an individual datastore, select the datastore and then the Permissions tab. Right-click anywhere on the Permissions tab and select Add Permission. To set a permission on a folder, right-click the folder and select Add Permission.
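The vCenter-level control above can be audited by comparing each role's privilege list against the privileges reserved for the Storage Admin. The following is a minimal sketch. The privilege IDs are my assumption of how vSphere names these privileges, so verify them against your vCenter version. "Initiate Storage vMotion" and "Create" are not mapped here, and the role names are hypothetical.

```python
from typing import Iterable, Set

# Assumed vSphere privilege IDs for the restricted items on this slide.
STORAGE_ADMIN_ONLY = {
    "Datastore.Rename",          # Rename datastore
    "Datastore.Move",            # Move datastore
    "Datastore.FileManagement",  # Low level file operations
}


def storage_violations(role_name: str, privileges: Iterable[str]) -> Set[str]:
    """Privileges a non-storage role holds that should be reserved for the Storage Admin."""
    if role_name == "Storage Admin":  # hypothetical role name
        return set()
    return set(privileges) & STORAGE_ADMIN_ONLY


server_role = ["VirtualMachine.Interact.PowerOn", "Datastore.AllocateSpace", "Datastore.Rename"]
print(storage_violations("Server Admin", server_role))  # {'Datastore.Rename'}
```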

89 Network related access
Server Admins should not have the following access:
- Move network (this can be a security concern)
- Configure network
- Remove network
Server Admins should have:
- Assign network, to assign a network to a VM
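The rule for the Server Admin has two sides: one privilege must be present, and three must be absent. The sketch below checks both sides of a role definition. The `Network.*` privilege IDs are assumed vSphere names for Assign, Move, Configure and Remove network, so confirm them in your version's privilege reference.

```python
from typing import Dict, Iterable, Set

# Assumed vSphere privilege IDs for the Network privileges on this slide.
REQUIRED_FOR_SERVER_ADMIN = {"Network.Assign"}  # Assign network
FORBIDDEN_FOR_SERVER_ADMIN = {"Network.Move", "Network.Config", "Network.Delete"}  # Move / Configure / Remove


def audit_server_admin(privileges: Iterable[str]) -> Dict[str, Set[str]]:
    """Report required network privileges that are missing and forbidden ones that are granted."""
    granted = set(privileges)
    return {
        "missing": REQUIRED_FOR_SERVER_ADMIN - granted,
        "excess": FORBIDDEN_FOR_SERVER_ADMIN & granted,
    }


print(audit_server_admin(["Network.Assign", "Network.Config"]))
# {'missing': set(), 'excess': {'Network.Config'}}
```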

90 Roles and Groups
Create new groups for vCenter Server users. Avoid using MS AD built-in groups or other existing groups.
Do not use the default user “Administrator” in any operation:
- Each vCenter plug-in should have its own user, so you can differentiate among all the plug-ins.
- Disable the default user “Administrator”.
- Use your own personal ID. The idea is that security should be traceable to an individual.
- Do not create another generic user (e.g. VMware Admin). This defeats the purpose and is practically no different from “Administrator”. Creating a generic user increases the risk of sharing, since it has no personal data.
Create 3 roles (not users) in MS AD: Network Admin, Storage Admin, Security Admin.
Create a unique ID for each vSphere plug-in that you use: SRM, Update Manager, Chargeback, CapacityIQ, vShield Zones, Converter, Nexus, etc. E.g. SRM Admin, Chargeback Admin.
- This is the ID that the product uses to log in to vCenter. It is not the ID you use to log in to the product; use your personal ID for that.
- This helps in troubleshooting. Otherwise there are too many “Administrator” entries and you are not sure who they _really_ are.
- Also, if the Administrator password has to change, you don't have to change it everywhere.

91 VM Design

92 Standard VM sizing: Follow McDonald
1 VM = 1 App = 1 purpose. No bundling of services. Having multiple applications or services in 1 OS tends to create more problems. The Apps team knows this better.
Start with a Small size, especially for CPU & RAM.
CPU: use as few virtual CPUs (vCPUs) as possible.
- vCPU count impacts the scheduler, hence performance.
- vCPUs are hard to take back once you give them. Also, the app might be configured to match the processor count (you will not know unless you ask the application team).
- Maintaining a consistent memory view among multiple vCPUs consumes resources.
- There is a licensing impact if you assign more CPUs. vSphere 4.1 multi-core can help (always verify with the ISV).
- Virtual CPUs that are not used still consume timer interrupts and execute the idle loops of the guest OS.
- In the physical world, CPUs tend to be oversized. Right-size them in the virtual world.
RAM: start with 1 GB, not 512 MB.
- Patches can be large (330 MB for XP SP3) and need RAM.
- Size impacts vMotion, ballooning, etc., so you want to trim the fat.
- Tier 1 Clusters should use Large Pages.
Anything above XL needs to be discussed case by case. Utilise Hot Add to start small (needs DC edition). See speaker notes for more info.
Even if some vCPUs are not used, configuring VMs with them still imposes some small resource requirements on ESX:
- Some older guest operating systems execute idle loops on unused vCPUs, thereby consuming resources that might otherwise be available for other uses (other VMs, the VMkernel, the console, etc.).
- The guest scheduler might migrate a single-threaded workload amongst multiple vCPUs, thereby losing cache locality.
The per-virtual-machine memory space overhead includes space reserved for the VM devices (e.g., the SVGA frame buffer and several internal data structures maintained by the VMware software stack). These amounts depend on the number of vCPUs, the configured memory for the guest operating system, and whether the guest operating system is 32-bit or 64-bit.
To check if we don’t have enough RAM:
a. Look at the value of Memory Balloon (Average) in the vSphere Client Performance Chart. An absence of ballooning suggests that ESX is not under heavy memory pressure and thus memory overcommitment is not affecting performance. (Note that some ballooning is quite normal and not indicative of a problem.)
b. Check for guest swap activity within that VM. This can indicate that ballooning might be starting to impact performance (though swap activity can also be related to other issues entirely within the guest).
c. Look at the value of Memory Swap Used (Average) in the vSphere Client Performance Chart. Memory swapping at the host level would indicate more significant memory pressure.
Large pages: In addition to the usual 4KB memory pages, ESX also makes 2MB memory pages available (commonly referred to as “large pages”). By default, ESX assigns these 2MB machine memory pages to guest operating systems that request them, giving the guest operating system the full advantage of using large pages. The use of large pages results in reduced memory management overhead and can therefore increase hypervisor performance. If an operating system or application can benefit from large pages on a native system, that operating system or application can potentially achieve a similar performance improvement on a VM backed with 2MB machine memory pages. Consult the documentation for your operating system and application to determine how to configure each of them to use large memory pages. More information about large page support can be found in the performance study entitled Large Page Performance (available at
Large page support in Windows/Linux requires application-level support and is not enabled by default. Within Windows, enable PAE extensions and grant permission to the service account running the application to leverage the additional usable memory space.
In Linux, you must pre-allocate the large pages.

Item | Small VM | Medium VM | Large | Custom
CPU | 1 | 2 | 4 | 8 – 32
RAM | 1 GB | 2 GB | 4 GB | 8, 12, 16 GB, etc.
Disk | 50 GB | 100 GB | 200 GB | 300, 400, etc. GB
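The "start small" rule can be applied mechanically. Pick the smallest standard size that covers every dimension of a request, and fall back to Custom otherwise. The sketch below encodes the table above. It makes two assumptions that go beyond the slide. First, any request that does not fit Large but stays within 32 vCPUs is treated as Custom. Second, anything larger is treated as "discuss case by case".

```python
from typing import NamedTuple


class VmSize(NamedTuple):
    name: str
    vcpu: int
    ram_gb: int
    disk_gb: int


# Standard sizes from the table above.
STANDARD_SIZES = [
    VmSize("Small", 1, 1, 50),
    VmSize("Medium", 2, 2, 100),
    VmSize("Large", 4, 4, 200),
]
MAX_CUSTOM_VCPU = 32  # Custom goes up to 32 vCPU


def pick_size(vcpu: int, ram_gb: int, disk_gb: int) -> str:
    """Smallest standard size that covers every dimension of the request."""
    for size in STANDARD_SIZES:
        if vcpu <= size.vcpu and ram_gb <= size.ram_gb and disk_gb <= size.disk_gb:
            return size.name
    if vcpu <= MAX_CUSTOM_VCPU:
        return "Custom"
    return "Discuss case by case"


print(pick_size(1, 1, 40))    # Small
print(pick_size(2, 3, 80))    # Large (3 GB RAM does not fit Medium)
print(pick_size(8, 16, 300))  # Custom
```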

93 SMP and UP HAL
Does not apply to recent OSes such as Windows Vista, Win7, Win2008.
Design principle:
- Going from 1 vCPU to many is OK. Windows XP and Windows Server 2003 automatically upgrade to the ACPI Multiprocessor HAL.
- Going from many to 1 is not OK.
To change from 1 vCPU to 2 vCPUs, you must change the kernel to SMP. "In Windows 2000, you can change to any listed HAL type. However, if you select an incorrect HAL, the computer may not start correctly. Therefore, only compatible HALs are listed in Windows Server 2003 and Windows XP. If you run a multiprocessor HAL with only a single processor installed, the computer typically works as expected, and there is little or no effect on performance."
To change from many vCPUs to 1, the step is simple, but MS recommends a reinstall: “In this scenario, an easier solution is to create the image on the ACPI Uniprocessor computer.”
Microsoft does not support running a HAL other than the HAL that Windows Setup would typically install on the computer. For example, running a PIC HAL on an APIC computer is not supported. Although this configuration may appear to work, Microsoft does not test this configuration and you may have performance and interrupt issues. Microsoft also does not support swapping out the files that are used by the HAL to manually change HAL types. Microsoft recommends that you switch HALs for troubleshooting purposes only or to work around a hardware problem.
There are two types of hardware abstraction layers (HALs) and kernels: UP and SMP. UP historically stood for “uniprocessor,” but should now be read as “single-core.” SMP historically stood for “symmetric multi-processor,” but should now be read as multi-core.

94 MS Windows: Standardisation
Data Center edition is cheaper on >6 VMs per box. MS licensing is complex; the table below may not apply in your case.
- Licensed per VM: 10 VMs means 10 licences.
- Licensed per 4 VMs: 10 VMs means 3 licences.
- Licensed per socket (Data Center edition): 2 sockets means 2 licences, with unlimited VMs per box.
Source:
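The three licensing models above can be compared by counting licences for a given host. The sketch below only counts licences. The slide does not give prices, so the ">6 VMs" break-even point cannot be reproduced without your own price list.

```python
import math
from typing import Dict


def licence_counts(vms: int, sockets: int) -> Dict[str, int]:
    """Licences needed on one host under each model in the table above."""
    return {
        "per_vm": vms,                    # one licence per VM
        "per_4_vms": math.ceil(vms / 4),  # one licence covers up to 4 VMs
        "per_socket": sockets,            # unlimited VMs per box
    }


print(licence_counts(10, 2))  # {'per_vm': 10, 'per_4_vms': 3, 'per_socket': 2}
```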

95 Guest OS
Use 64-bit if possible:
- Access to > 3 GB RAM.
- The performance penalty is generally negligible, or even negative (i.e. 64-bit can be faster). In Linux VMs, Highmem can show significant overheads with 32-bit. 64-bit guests can offer better performance, and workloads with a large memory footprint benefit more from 64-bit guests.
- Some Microsoft & VMware products have dropped support for 32-bit.
- Increased scalability in the VM. Example: Update Manager 4 installed on 64-bit Windows can concurrently scan 4000 VMs, but if it's installed on 32-bit, the concurrency drops to 200. Powered-on Windows VM scans per VUM server: 72. Most other numbers are not as drastic as this example.
Disable unnecessary devices in the Guest OS.
Choose the right SCSI controller.
Set the right IO timeout. On Windows VMs, increase the value of the SCSI TimeoutValue parameter to allow Windows to better tolerate delayed I/O resulting from path failover.
For Windows VMs, stagger anti-virus scans. Performance will degrade significantly if you scan all VMs simultaneously.
Unload ESX host USB drivers if not required.
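One simple way to stagger anti-virus scans is to rotate the VMs on each host through a fixed number of scan windows, so that VMs sharing a host do not scan at the same time. The sketch below does this with round-robin assignment per host. The VM and host names are hypothetical, and it does not balance load across hosts or account for scan duration.

```python
from collections import defaultdict
from typing import Dict


def stagger_scans(vm_to_host: Dict[str, str], windows: int) -> Dict[str, int]:
    """Assign each VM a scan window, rotating per host so co-located VMs differ."""
    next_window: Dict[str, int] = defaultdict(int)
    schedule = {}
    for vm in sorted(vm_to_host):  # sorted for a deterministic plan
        host = vm_to_host[vm]
        schedule[vm] = next_window[host] % windows
        next_window[host] += 1
    return schedule


vms = {"app01": "esx01", "app02": "esx01", "app03": "esx01", "db01": "esx02"}
print(stagger_scans(vms, windows=2))  # {'app01': 0, 'app02': 1, 'app03': 0, 'db01': 0}
```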

96 Management Design: Performance, Capacity, Configuration

97 vCenter
Run vCenter Server as a VM. vCenter Server VM best practices:
- Disable DRS on all vCenter VMs. Move them to the first ESXi host in your farm. Always remember where you run your vCenter: remember both the host name and the IP address of that first ESXi host.
- Start in this order: Active Directory → DNS → vCenter DB → vCenter.
- Set the HA restart priority to high.
Limitations:
- Windows patching of the vCenter VM can't be done via Update Manager.
- You can't cold clone the VM; use hot clone instead.
- VM-level operations that require the VM to be powered off can be done via ESX. Log in directly to the ESXi host that has the vCenter VM, make the changes, then boot the VM.
Not connected to the Production LAN. Connect it to the management LAN, so VLAN trunking is required, as vSwitches are shared (assuming you do not have a dedicated IT Cluster).
Security: Protect the special-purpose local vSphere administrator account from regular usage. Instead, rely on accounts tied to specific individuals for clearer accountability.
Other configuration: Keep the Statistics Level at Level 1, but use vCenter Operations to complement it. Level 3 is a big jump in terms of data collected.
You can use the Microsoft Windows built-in system account or a user account to run vCenter Server. The built-in system account has more permissions and rights on the server than the vCenter Server system needs, which can contribute to security problems.
Level 2 vs Level 1: As an example, an average quad-processor ESX server will have 6 metrics collected at level 1 during a sample interval, while level 2 collects a total of about 20 (+/- a few, based on the number of devices in the host). Level 2 is used most often in environments that do capacity planning and chargeback on VMs. It gives a fairly granular look at the core four (CPU, memory, disk, network) without grabbing level 3 counters, which are a big jump in the number of metrics monitored.
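The Level 1 vs Level 2 comparison can be turned into a rough estimate of how many host-level samples vCenter stores per day. This is a sketch under stated assumptions. It uses a 5-minute rollup interval (assumed here to be the past-day interval) and borrows the 125-host farm from the IP address slide. It also ignores VM-level and device-level counters, so real volumes will be higher.

```python
MINUTES_PER_DAY = 24 * 60


def samples_per_day(hosts: int, metrics_per_host: int, interval_minutes: int = 5) -> int:
    """Host-level metric samples stored per day at a given rollup interval."""
    return hosts * metrics_per_host * (MINUTES_PER_DAY // interval_minutes)


level1 = samples_per_day(125, 6)   # ~6 metrics per host at Level 1
level2 = samples_per_day(125, 20)  # ~20 metrics per host at Level 2
print(level1, level2)              # 216000 720000
```

Even before adding Level 3 counters, moving from Level 1 to Level 2 more than triples the host-level volume in this example.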

98 Naming convention (Object: Standard | Examples | Remarks)
- Data center: Purpose | Production | This is the virtual data center in vCenter. Normally, a physical data center has 1 or many virtual data centers. As you will only have a few of these, there is no need to create a cryptic naming convention. Avoid renaming it.
- Cluster: as above.
- ESXi host name: Don't include the version number, as it may change. No spaces.
- VM: Project_Name Purpose ## | Intranet WebServer 01 | Don't include the OS name. Can include spaces.
- Datastore: Environment_type_## | PROD_FC_01, TEST_iSCSI_01, DEV_NFS_01, Local_ESXname_01 | Type is useful when we have multiple types. If you have 1 type but multiple vendors, you can use the vendor name (EMC, IBM, etc.) instead. Prefix all local datastores with Local so they are easily separated in the dialog boxes.
- “Admin ID” for plug-ins: ProductName-Purpose | VCOps-Collector, Chargeback- | All the various plug-ins to vSphere need Admin access.
- Folder: Avoid special characters, as you (or other VMware and 3rd-party products or plug-ins) may need to access them programmatically.
If you are using VC Ops to manage multiple vCenters, the naming convention should ensure names are unique across vCenters.
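Naming conventions are easier to enforce when they can be checked by a script. The regular expressions below are inferred from the examples in the table above. The environment list (PROD, TEST, DEV) and the characters allowed in an ESXi name are assumptions, so adjust them to your own standard. `fullmatch` is used so that trailing characters are rejected.

```python
import re

# Patterns inferred from the slide's examples; adjust the environment list to yours.
SHARED_DATASTORE = re.compile(r"(PROD|TEST|DEV)_[A-Za-z]+_\d{2}")  # Environment_type_##
LOCAL_DATASTORE = re.compile(r"Local_[A-Za-z0-9-]+_\d{2}")         # Local_ESXname_##
ADMIN_ID = re.compile(r"[A-Za-z]+-[A-Za-z]+")                      # ProductName-Purpose


def is_valid_datastore(name: str) -> bool:
    return bool(SHARED_DATASTORE.fullmatch(name) or LOCAL_DATASTORE.fullmatch(name))


def is_valid_admin_id(name: str) -> bool:
    return bool(ADMIN_ID.fullmatch(name))


for name in ["PROD_FC_01", "TEST_iSCSI_01", "DEV_NFS_01", "Local_esx-sg-01_01", "prod fc 1"]:
    print(name, is_valid_datastore(name))
```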

99 vCenter Server: HA
Many vSphere features depend on vCenter: Distributed Switch, Auto-Deploy, HA (management), DRS and vMotion, Storage vMotion, Licensing.
Many add-ons depend on vCenter: vShield, vCenter SRM, VCM, vCenter Operations, vCenter Chargeback, vCloud Director, View + Composer.
Implement vCenter Heartbeat:
- Automated recovery from hardware, OS, application and network failures
- Awareness of all vCenter Server components
- Only solution fully supported by VMware
- Can protect the database (SQL Server) and vCenter plug-ins: View Composer, Update Manager, Converter, Orchestrator
SRM, vCD and View utilize vCenter to manage VMs (power on, provision, etc.). Loss of data in Chargeback could mean loss of revenue for the IT organization. CapacityIQ information and models may be skewed if vCenter is down frequently and/or for long periods of time. Configuration Manager and Operations rely on vCenter Server for data. vSphere Storage Appliance (VSA) requires vCenter to manage it.
Some effects of vCenter Server downtime:
- Management requires a direct connection to the host
- Can't provision new VMs from templates
- No host profiles
- Historical records will have gaps during outages
- vMotion, DRS (incl. Storage) unavailable
- Distributed Switches: unable to deploy and manage
- vCenter plug-ins (e.g. VUM) unavailable
- HA failover works but can't be reconfigured

100 vMA: Centralised Logging
Benefits: the ability to search across ESX hosts; convenience.
Best practices:
- One vMA per 100 hosts with vilogger
- Place vMA on the management LAN
- Use a static IP address, FQDN and DNS
- Limit use of resxtop (used for real-time troubleshooting, not monitoring)
- Enable remote system logging for targets
vilogger (enable/disable/updatepolicy/list):
- Rotation defaults to 5
- Max file size defaults to 5 MB
- Collection period is 10 seconds
ESX/ESXi log files go to /var/log/vmware/<hostname>.
vpxa logs are not sent to syslog. See KB
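The "one vMA per 100 hosts" rule and the vilogger defaults above can be combined into a rough sizing estimate. The sketch assumes that each collected log keeps at most `rotations` files of `max_file_mb` each. The number of logs collected per host is a hypothetical parameter, because the slide does not say which logs are collected.

```python
import math

HOSTS_PER_VMA = 100  # best practice from the slide


def vma_appliances(hosts: int) -> int:
    """Number of vMA appliances needed at 100 hosts each."""
    return math.ceil(hosts / HOSTS_PER_VMA)


def log_space_mb(hosts: int, logs_per_host: int, rotations: int = 5, max_file_mb: int = 5) -> int:
    """Rough upper bound on log storage, assuming `rotations` files of `max_file_mb` per log."""
    return hosts * logs_per_host * rotations * max_file_mb


print(vma_appliances(250))   # 3
print(log_space_mb(100, 3))  # 7500
```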

101 Thank You
