4 Microsoft IT Data
Single Instance SAP (1.5 TB database)
120,000 server accounts
300,000+ PCs and devices
Apps: Single Instance SAP; 1,371 LOB apps
Devices: 300K PCs; 10K data centre servers; 10K network devices
Incident Mgmt: 90K help desk calls/month; 7K infrastructure service requests/month; 6K changes/month
Monthly Remote Access: 45K RAS; 49K OWA; 18K RPC over HTTP
Map labels: Dublin, Redmond, Silicon Valley, Singapore
92,000 end users; 89 countries
MSIT supports over 400 sites globally; 25% Internet connected only
3M+ messages per day internally, 10M externally; 8M filtered out
99.99% availability
7,000,000 remote connections/month

The Function of ITG
The IT organization at Microsoft is unique in the broad variety of its responsibilities. The first role, as in any other large organization, is to run the IT service: developing and maintaining information technology systems and solutions that help Microsoft employees do their jobs efficiently and effectively. Another aspect is providing a robust application and infrastructure architecture as the foundation for building line of business applications. We provide services ranging from end user support and telecommunications management to server and network operations. This includes managing the connectivity for over 150,000 PCs worldwide. We ensure that the nearly 50,000 employees, 5,000 contractors and 17,000 vendors in over 400 Microsoft locations around the world are able to access the corporate network 24 hours a day, 7 days a week.

Operating Environment
The environment is:
Dynamic. The rate of change is very high. Microsoft continues to grow, and within the past year Microsoft has also seen an increase in mergers and acquisitions activity.
Large. It includes four enterprise data centers and 450 locations worldwide. IT-managed infrastructure exists at over 200 of those sites.
A real-world environment. Users expect production quality services, even when deploying beta software.
5 Before (Live Communications Server 2005)
So now let’s take a quick look at the before picture of the architecture, when we were running on Live Communications Server (LCS).
8 After (Office Communications Server 2K7)
The highly centralized architecture we used with LCS 2K5 served us well when our focus was primarily IM and presence. However, things changed with OCS 2K7.
9 After
Now, with the broadened focus on new technologies in OCS 2K7, we’ve found that moving to a regional deployment makes a lot more sense and provides a much better user experience, mostly because of voice and video. We wanted to ensure the best user experience possible, so we wanted to get the infrastructure closer to the users. Interestingly, this also gave us a better disaster recovery story.

In this diagram you’ll notice our 3 main data centers for Microsoft: the 2 larger circles on either side of the upper quadrant (Dublin and Singapore) and the middle circle, which is our largest data center, in Redmond, where the corporate headquarters are. In each of these larger data centers you’ll find a PABX and the OCS 2K7 server pool, along with our VOIP gateways, which interface with the Mediation servers. Finally, you’ll notice we also have our Exchange UM servers deployed in these regional data centers.

The two smaller circles in the bottom right and left corners of this diagram represent what we would call tail sites or branch offices. These are sites that are not major data centers yet still have UC Telephony deployed to them. Sydney fits this description. In these smaller branch offices, all that is needed to provide UC Telephony is a VOIP gateway. For other sites in Australia where UC Telephony hasn’t been specifically deployed, we are still able to provide key users with UC Telephony by assigning them a number from the Sydney dial plan and then having them forward their current local number to this new Sydney extension, so that local callers don’t have to dial long distance to a Sydney number. We’ve deployed to numerous users in this configuration and it works just great!

Branch office call flow: From the smaller branch offices the call flow goes like this. The user initiates the call from their Communicator client or VOIP-enabled phone (Tanjay). The call is passed to the VOIP gateway located at the branch office, which translates it into a TDM voice call (traditional voice) and hands it off to the PABX via a PRI (Primary Rate Interface) line. The call is sent out over the PSTN to Singapore, where the combination of the VOIP gateway and Mediation server transforms it back into a VOIP call and hands it off to the OCS server, and it is sent out over the network as an IP call. In the future the goal is to retire the legacy PABXs, which will help us realize some substantial cost savings.
10 Regional Pool Topology
This diagram depicts more specifically how we’ve deployed OCS and UC Telephony in our Singapore regional data center. On the left you’ll see the NET.com Shout/VX IP gateways, which are connected to all of the PBXs at the various sites via a PRI (Primary Rate Interface) line. All of that infrastructure is located on premises at the branch offices. Just to the right of the gateways are the Mediation servers. Note that there’s a 1:1 mapping of gateways to Mediation servers. On the right-hand side you’ll see a network load balancer that front-ends the OCS pool (we currently have 4 OCS servers in Singapore). We also have a clustered SQL Server that houses the contact information for our users. The OCS archiving server is primarily used to house metric data. We do not archive conversations; however, you can do that if you wish.
11 MS IT OCS Topology
LCS Corp deployment was a single EE Pool for all global users.
Chose Regional model for OCS (Redmond, Dublin, Singapore)
Improved performance for regional users
  Especially for Audio and Video
  Web Components and Conferencing
  Remote Access
Provisioning still automated
Also lays foundation for global business continuance and disaster recovery strategy

So, to summarize: with LCS 2005 we deployed a single Enterprise Edition Pool for all global users. No other LCS infrastructure was deployed anywhere else in the world. We chose to move to a regional model with OCS because we knew we were moving to a broader set of functionality, such as voice and video as well as web conferencing and Remote Access. Despite moving to a regional model for our deployment, provisioning of the accounts is still automated and centralized with our Identity Access Management team, which owns the AD in Redmond. One very important thing to add here is that moving to a regional model also lays the foundation for a much better BC/DR strategy.
13 MS IT Deployment Overview
Regional deployments
Exchange UM for voicemail
EE, CE and SE topologies
Parallel OCS and LCS deployment
Not just the Corp deployment
  Converting our MMS LCS customers to OCS
  New MMS customers will be hosted on OCS

As for the actual deployment to our users, IT made the following decisions:
We chose to go with regional deployments vs. centralized, as I’ve already covered.
We made the decision to deploy all regional UC users on a dedicated Unified Messaging deployment for our voicemail solution. This meant users needed to use a different access number.
We currently have 3 different topologies deployed: Enterprise Edition in Redmond, Consolidated Edition in both Dublin and Singapore, and the legacy Standard Edition in Redmond, which we use for testing purposes to ensure that things like security patches function properly.
We chose to roll this out to users with a phased approach, which meant we had both LCS and OCS running in parallel for a period of time. This decision was made mostly to make sure we didn’t overwhelm our Helpdesk teams. In all cases, as users were migrated to UC Telephony they were also migrated to OCS 2K7, unless they had already been migrated to that platform.
One other consideration our teams had to take into account is that we also have an offering called MMS (Microsoft Managed Solutions), in which we host email, SharePoint and IM services for external customers, so we also had to make sure we had plans in place for those users.
14 UC Planning Considerations
Regional site PBX requirements
Local dial plan interrogation
Gateway requirements
User Communication
Exchange UM Integration
UC Routing (Location Profiles)
Network traffic planning
Mediation Server placement

PBX requirements: We’ve standardized on the Nortel PBX platform outside of Redmond. Because of the standardization we knew exactly what we needed to implement UC Telephony out in the regions, which was very helpful.

Local dial plans: It is very important to understand how users dial today so you can duplicate that behavior as closely as possible with your new UC dial plans. You need to pay special attention to any unique dialing processes at regional sites, for things such as jumping on alternate carriers. You really need to have a good understanding of this.

Gateway requirements: Microsoft has standardized on 2 different gateway products across all of our deployments:
1: In Redmond and other sites in the world we deployed AudioCodes Mediant 2000s.
2: In many of the regional deployments we chose to deploy a product from NET.com called a Shout gateway. We used 2 different models: VX 900 or 1200.
With each of these we can accommodate as many as 8-10 users per trunk.

User comms: A solid communication plan is important for any rollout. But given the change in how users will now dial their customers, partners, friends and family (especially in the true softphone scenario with the Communicator client), it was important to give them as much guidance as possible on the most effective ways to dial their contacts. There were also some user scenarios that we don’t currently support at Microsoft that we needed to weed out. Examples would be ACD users and multiple line appearance users, such as the boss/admin configuration we see a lot within Microsoft. We’ve been able to handle this scenario with the simul-ring functionality in UC, however.

UM integration: We chose a separate deployment for UC users, which required a new TUI access number.

UC Routing: We had to do some special planning to accommodate the short dialing scenario (by extension).

Network traffic: It’s very important to try to understand how many concurrent calls to expect at each site. We handled this by reviewing historical PBX data where possible, but sometimes it was frankly a guess.

Mediation Server placement: Lastly, we had to think about where to place the Mediation servers that are needed for converting traditional voice calls to IP. I’ll talk more in depth on Mediation server placement in a few minutes.
15 Site Selection Considerations
Deploy to countries where regulatory and homologation hurdles are cleared for gateway and VoIP deployments.
Site has adequate bandwidth for added UC users (Peer to Peer), and Client to Mediation Server.
Device availability (Catalina or Tanjay)
PBX has spare QSIG T1/E1 ports for gateway connectivity
Users
  Basic phone users with PSTN phone number

Regulatory: It can be very difficult in countries like India, for example, to deploy things like UM and UC VOIP telephony solutions because of regulatory concerns with their governments, such as toll bypass. We also had issues where we couldn’t purchase or install gateways through our vendors.

Homologation: You need to make sure devices are approved for deployment in a specific country. We have to make sure things like the frequencies used by devices don’t walk all over each other. Power is another concern here. There are others.

Bandwidth: Once again you have to try to predict traffic patterns and concurrent users. We used historical data in part, and sometimes it was a SWAG. We made sure that most of the sites currently deployed have sufficient bandwidth.

Devices: Were the proper devices available? We began with the Catalina devices as the default for each office, then we added the Tanjays. There are also wireless and Bluetooth devices available. We allowed the users to self-select and fund these other devices as they wished.

Users: With users we had challenges with things like making sure all users had their phone numbers entered in the Global Address List, which is fed by the AD, in a standard and proper format. If this doesn’t happen, some of the functionality is impacted, such as the ability to dial users from the Communicator client by calling them up as a contact.
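To make the phone number point concrete, here is a minimal Python sketch of a pre-enablement audit over directory entries. It assumes the target format is E.164 (a leading + followed by up to 15 digits); the slides only say "standard and proper format", so that choice, the function name and the sample entries are all hypothetical.

```python
import re

# Assumed target format: E.164, i.e. "+" then 2 to 15 digits, no leading zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")

def audit_numbers(entries):
    """Return the names whose number is not E.164 once separators are stripped."""
    bad = []
    for name, number in entries.items():
        # Spaces, hyphens, dots and parentheses are treated as harmless separators.
        cleaned = re.sub(r"[\s\-().]", "", number or "")
        if not E164.match(cleaned):
            bad.append(name)
    return sorted(bad)

if __name__ == "__main__":
    sample = {  # hypothetical directory export
        "alice": "+1 (425) 555-0100",
        "bob": "x55501",
        "carol": "",
    }
    print(audit_numbers(sample))
```

Running an audit like this before enabling a site would surface the entries that need cleanup while there is still time to fix them.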
16 UC Telephony Scenarios
Tanjay: Best for deskbound workers; HIGH phone usage; Device controls calls; Telephone independent of PC
Catalina/Softphone: Best for deskbound, remote or mobile workers; LOW phone usage; Communicator controls calls; Must be logged in to use your telephone, or forwarded to another number, or using simultaneous ring
Headset compatible

Here is a comparison of the two main UC devices we deployed. The Catalina device is the device that will work for anyone, whether you’re a mobile worker, a remote worker or primarily at your desk. We see the Tanjay device being more targeted to high phone users, so when that is available, we’ll be recommending it for that kind of user.

On the Catalina/Softphone device, remember that Communicator is controlling your calls. This means you must be logged in on your PC, or forwarded to another number, in order to receive your calls. You can also choose to have your work calls ring on another number using the “simultaneously ring” feature.

Wireless headsets: These are growing in popularity. IT is not deploying these or providing them free to users; users can purchase and use them if they so choose. Personally, I believe most remote users or users who travel a lot will opt for a “package”, if you will, of the Tanjay or Catalina for their desk phone and a wireless headset for when they are traveling. This is how I’ve chosen to have mine set up.
3/25/2017
17 Gateway + Mediation Servers
SIP Gateway
  Translate SIP/RTP to/from circuit switched telephony protocols
Mediation Server
  Interfaces with a SIP gateway
  Intermediates SIP signaling interactions: TLS/SRTP <-> SIP/RTP
  Transcoding of codec: RT Audio <-> G.711
  Media flows between OCS 2007 network and the SIP gateway
Deploy Gateway and Mediation Server at a 1:1 ratio
Install Reskit / Admin tools
Useful to have Netmon installed on server for troubleshooting

Gateway: Translates SIP/RTP to and from traditional circuit switched telephony protocols. Session Initiation Protocol (SIP), which is similar to the HyperText Transfer Protocol (HTTP), is a text-based application-layer signaling and call control protocol. SIP is used to create, modify, and terminate SIP sessions. RTP is a real-time transport protocol. SIP makes use of RTP for transferring digitized audio and video data between the various parties participating in a call.

TLS: Transport Layer Security. An industry standard designed to protect the privacy of information communicated over the Internet.

Mediation servers: These are located in the 3 regional DCs and sit between the VOIP gateway and the OCS server pool. Their purpose is to convert calls from TLS to SIP and to transcode RT Audio into G.711 for the highest voice quality.

We strongly recommend that you deploy VOIP gateways and Mediation servers in a 1:1 ratio. This helps to ensure all mappings are correct and makes troubleshooting much easier. The bottom line is to keep the environment as simple as possible.

Netmon: Make sure to install Netmon 3 on the Mediation servers. This is especially important for our implementation, as we have IPSec installed and Netmon 3 allows you to peek inside the IPSec.
18 Mediation Server Deployment: Datacenter vs Branch Office
Data Center a good choice when…
  High bandwidth w/ QoS between DC and Branch Office
  Low latency between DC and Branch Office
  No server hardware support at Branch Office
Branch Office a good choice when…
  120 Kbps per call network bandwidth not available
  High number of users on system

We’ve had a great deal of debate among the groups deploying UC Telephony about where to locate the Mediation servers to provide the best voice quality experience in our branch offices. We chose to deploy them in our 3 main data centers; however, we’ve learned a great deal over the past few months. While the data center is a good choice when branch offices have big pipes to the data center with QoS deployed, the latency can still become problematic. Obviously, whether the branch office has onsite server hardware support is an important consideration. In general, if we were to do things over again, I feel we would very likely decide to deploy the Mediation servers at the branch offices to provide the best user experience. Incidentally, there are now VOIP gateway products coming to market that handle Mediation server tasks, so the separate Mediation server may become a thing of the past.
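The criteria on this slide can be sketched as a simple decision helper. This is only an illustration in Python: the 120 Kbps per call figure comes from the slide, while the 150 ms latency cutoff, the field names and the order in which the checks are applied are assumptions, and "high number of users" is approximated by peak concurrent calls.

```python
from dataclasses import dataclass

PER_CALL_KBPS = 120  # per-call bandwidth figure from the slide

@dataclass
class BranchSite:
    name: str
    wan_kbps_for_voice: int        # WAN capacity toward the data center usable for voice
    qos_enabled: bool              # QoS deployed between branch and data center
    mean_latency_ms: float         # measured branch <-> data center latency
    peak_concurrent_calls: int     # estimate, e.g. from historical PBX data
    has_onsite_server_support: bool

def suggest_mediation_placement(site, max_latency_ms=150.0):
    """Suggest 'data center' or 'branch office' for the Mediation server.

    max_latency_ms is a hypothetical cutoff, not a documented limit.
    """
    # Nobody on site to look after server hardware: keep it in the data center.
    if not site.has_onsite_server_support:
        return "data center"
    # Not enough WAN bandwidth for the expected calls: keep media local.
    if site.peak_concurrent_calls * PER_CALL_KBPS > site.wan_kbps_for_voice:
        return "branch office"
    # A big pipe with QoS and low latency makes the data center workable.
    if site.qos_enabled and site.mean_latency_ms <= max_latency_ms:
        return "data center"
    return "branch office"

if __name__ == "__main__":
    site = BranchSite("ExampleBranch", 4000, True, 90.0, 20, True)
    print(site.name, "->", suggest_mediation_placement(site))
```

A helper like this does not settle the debate described above, but it forces each site's bandwidth, latency and support situation to be written down before the choice is made.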
19 Helpful UC Tools
OCSUMUtil.exe
  Creates UC/UM integration
  Reads ExUM info
    Dial plan name
    Phone number
  Creates an AD contact
  UC enables contact
  Sets phone number
  Sets ExUM integration attributes in contact
RouteHelper.exe (ResKit)
  UC routing tool
  Provides a telecom-manager-friendly interface for configuring UC routing
  Bridges many of the UC routing objects into one easy interface
  Reduces routing complexity by abstracting the telecom manager from RegEx code
  Provides a test interface for all sites in the forest
  Manages gateways

Here are a couple of very useful tools. OCSUMUtil.exe ships with the base product, while RouteHelper.exe is included in the OCS Resource Kit. The test interface is particularly helpful.
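To give a feel for the kind of RegEx code that a routing tool hides from the telecom manager, here is a minimal Python sketch of dial string normalization, including the short dialing (by extension) case. The rules, the Australian prefixes and the 5550 number range are hypothetical examples, not the rules used in this deployment and not RouteHelper's own format.

```python
import re

# Hypothetical normalization rules, tried in order: (pattern, translation).
RULES = [
    (r"^\+(\d+)$", r"+\g<1>"),            # already in international format
    (r"^0011(\d+)$", r"+\g<1>"),          # international access code
    (r"^0(\d{9})$", r"+61\g<1>"),         # national number with trunk prefix 0
    (r"^(\d{4})$", r"+6125550\g<1>"),     # 4-digit extension (short dialing)
]

def normalize(dialed):
    """Return the normalized number, or None if no rule matches."""
    cleaned = re.sub(r"[\s\-().]", "", dialed)  # drop common separators
    for pattern, translation in RULES:
        match = re.match(pattern, cleaned)
        if match:
            return match.expand(translation)
    return None

if __name__ == "__main__":
    for dialed in ["1234", "02 5550 1234", "0011 1 425 555 0100", "99"]:
        print(repr(dialed), "->", normalize(dialed))
```

Keeping the rules in an ordered table also makes them easy to run against a list of known dialing scenarios, which is the same idea as the test interface mentioned above.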
20 Per user calculation
Media Type    Bandwidth Needed
Audio         45 Kbps
Video         250 Kbps
Data          ~45 Kbps
Signaling     10 Kbps
Total         350 Kbps per direction
Type of usage is important when planning
Consider the whole path end to end

I get asked a lot, “Jonathan, just how much additional bandwidth do we need to plan for here?” I can’t answer that question for each and every case for each and every customer; however, I can provide you with the average figures we came up with internally here at Microsoft. <Read info from chart>
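The arithmetic behind the chart can be sketched in a few lines of Python. The per-media figures are the ones on the slide; the function names and the idea of multiplying by a concurrent user estimate are illustrative assumptions rather than a documented sizing method.

```python
# Per-user, per-direction averages from the slide (Kbps).
PER_USER_KBPS = {"audio": 45, "video": 250, "data": 45, "signaling": 10}

def per_user_kbps(media=("audio", "video", "data", "signaling")):
    """Bandwidth per user per direction for the selected media types."""
    return sum(PER_USER_KBPS[m] for m in media)

def site_kbps(concurrent_users, media=("audio", "video", "data", "signaling")):
    """Rough per-direction estimate for a site: users times per-user total."""
    return concurrent_users * per_user_kbps(media)

if __name__ == "__main__":
    print("All media, one user:", per_user_kbps(), "Kbps")
    print("Audio only, one user:", per_user_kbps(("audio", "signaling")), "Kbps")
    print("20 concurrent users, all media:", site_kbps(20), "Kbps")
```

Because the type of usage matters, the media argument lets you size an audio-only site separately from one that makes heavy use of video.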
21 Other Network Considerations
Delay: Engineer to less than a mean of 150 ms
Loss: Up to 10% can be handled without significant problems
Connectivity: The clients can connect through pretty well all common networks
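As an illustration, the delay and loss targets above can be checked against measurements with a short Python sketch. The 150 ms and 10% thresholds are the ones on the slide; how the delay samples and packet counts are collected is left open, and the function and field names are hypothetical.

```python
from statistics import mean

def check_path(delays_ms, packets_sent, packets_received,
               max_mean_delay_ms=150.0, max_loss=0.10):
    """Compare measured delay samples and packet counts with the slide's targets."""
    mean_delay = mean(delays_ms)
    loss = (packets_sent - packets_received) / packets_sent
    return {
        "mean_delay_ms": mean_delay,
        "loss": loss,
        "delay_ok": mean_delay < max_mean_delay_ms,  # "less than a mean of 150 ms"
        "loss_ok": loss <= max_loss,                 # "up to 10%"
    }

if __name__ == "__main__":
    print(check_path([100, 120, 140], packets_sent=1000, packets_received=950))
```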
22 Troubleshooting Serverside
Post Install Server Validation Wizard
OCS MOM Packs for Operations Manager 2005 and 2007
OCS Logging Tool
  Replacement for flatfile logging
Best Practice Analyzer
Perfmon for trending and quick health checks

Make sure you run the Post Install Server Validation Wizard, which will help you double check all of your configurations.

Microsoft uses MOM, or what is now called Operations Manager under the System Center umbrella, to monitor the health of our environment and get advance notification of troubling trends.

The OCS logging tool has simplified the troubleshooting process immensely. In LCS 2K5 we had flatfile logging, which was very effective for troubleshooting; however, it produced a tremendous amount of data to sift through. The new logging tool has simplified this process.

Best Practice Analyzer: This is a script-based approach designed by our internal COE teams that checks your environment from every angle and provides best practices for how you can ensure your entire platform is set up optimally.

Perfmon is essential for trending and quick health checks.
23 Troubleshooting Clientside
Install client with logging enabled where possible, especially during pilot
Very useful for understanding client issues
Some privacy and compliance concerns

We’ve used client-side logging extensively during our pilot and early rollout phases to effectively identify and resolve a number of issues.
25 Lessons Learned
Important to drive synergies between all teams (Networking, Telephony, Messaging and UC) early.
Lack of telephone number standardization caused delays in enabling users.
Wireless can be problematic.
Live Meeting without wired power can cause issues.
Using voice from other enterprise locations causes RTP to go over TCP: firewalls typically allow only ports 443 or 80, so audio goes over 443 via TCP.
Legacy network hubs/switches can cause poor audio.
Slower laptop CPUs can be problematic with UC audio and RoundTable, especially when recording.
26 Gateway Lessons Learned
Choose good partners
Provide site deployment plans early and often to the gateway vendor so that delays in homologation don’t hinder deployments.
Bring up T1/E1 2 weeks prior to deployment
Standardize on tie line interfaces between the PBX and UC (i.e. QSIG)
27 Deployment Challenges
Dial plans are located in UC, PBX, gateways and Exchange UM
Not all telco carriers are created equal
Each country is different for T1/E1 configuration
Variable length phone numbers
Outbound Caller ID variants
Inconsistent inbound Caller ID; it can impersonate internal users if it matches the length and range of an internal extension
Users that require advanced PBX features are impacted

Dial plans: These are located in multiple areas and need to match for this to work successfully. This can be a challenge to manage.
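The inbound Caller ID issue can be illustrated with a small Python check that flags a caller ID arriving from the PSTN when it has the same length and falls in the same range as an internal extension. The extension range and function name are hypothetical, and where such a check would actually run (gateway, PBX or elsewhere) is not something the slides specify.

```python
# Hypothetical internal extension ranges (inclusive), all 5 digits long.
INTERNAL_RANGES = [(70000, 79999)]

def looks_like_internal_extension(caller_id, ranges=INTERNAL_RANGES):
    """True if an inbound caller ID matches the length and range of an internal extension."""
    if not (caller_id.isascii() and caller_id.isdigit()):
        return False  # "+..." numbers or empty IDs cannot be bare extensions
    for low, high in ranges:
        if len(caller_id) == len(str(low)) and low <= int(caller_id) <= high:
            return True
    return False

if __name__ == "__main__":
    for cid in ["71234", "14255550100", "+71234"]:
        print(cid, "->", looks_like_internal_extension(cid))
```

Flagged calls could then be shown with the raw number rather than resolved to an internal user's name, so that an external caller is not presented as a colleague.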
28 Best Practices
Infrastructure
  Standardize gateway hardware
  Deploy in phases
  Ensure a crisp and well thought out enablement process
End User
  Create strong communications package
  Standardize user devices
  Introduce users to the softphone concept
  End user training/preparation
  Helpdesk preparation

Lastly, I wanted to leave you with some final thoughts around best practices:

Standardize on gateway hardware: Microsoft used 3 different gateway products: Intel’s TIMG and PIMG models, AudioCodes Mediant 1K and 2K models, and a product from NET.com called a SHOUT gateway, using models VX1200 and VX950. By utilizing a standard set of gateways you can drastically simplify maintenance, admin and support.

Deploy in phases: We began our deployment with only hardcore dogfood users from the product group and IT. Once we had a lot of learnings under our belt, we broadened the deployment to more users in Redmond and once again captured a lot of lessons learned. We then broadened this out to 14 regional sites, of which Sydney is one. In these sites we began with a small user base of local IT personnel. This allowed us to run through a stringent test plan to make sure all key dialing scenarios passed our tests. Once we were confident it all worked as planned, we began to roll it out to selected groups who were savvy on the technology and were good dogfooders. We knew we’d get great feedback from these users. By phasing our rollout in this manner we could be confident the user experience would be good, and we also wouldn’t inundate our helpdesk.

Enablement process: This requires a lot of coordination between multiple groups. <Describe our process>

End user: Communication is key when you are rolling out something that will change the way end users work. We standardized on 2 primary devices (Tanjay and Catalina) to keep support simple and end user training to a manageable size. We spent a great deal of time introducing users to the softphone concept, which meant teaching them tips and tricks for dialing from the OC client. Once users caught on they loved it and would not go back to traditional dialing. Through our EPE program we did a massive wave of end user training and prep. Helpdesk prep is always key.
September 07