Best Practices for Load Balancing Your GlobalSearch Installation


1 Best Practices for Load Balancing Your GlobalSearch Installation
Dan Tascher, Software Support Manager

2 Main components of GlobalSearch
Introduction. Main components of GlobalSearch: application services (IIS, Capture Workflow, Document Workflow, etc.), the database (document metadata), and document storage (the physical documents). Out of the box, GlobalSearch is designed to be easy to deploy. It comes with an all-in-one installer that installs every component needed to run the software on a single server. GlobalSearch was also built to scale out for enterprise-level use, with deployments spanning multiple servers and using load balancing technology. This presentation goes beyond the small business user who would only require one or two servers. There are three main components to consider when load balancing GlobalSearch: the application services, the database engine, and the storage of physical documents.

3 Microsoft Internet Information Services (IIS)
Responsible for all communication between server and clients. Main bottleneck in high-user environments. Magic number: 100 users per web server ("Application Server"). Microsoft Internet Information Services (IIS) is the point of communication between the GlobalSearch server and the GlobalSearch clients. When a client connects to GlobalSearch, it sends a series of HTTP requests to the server, and the server performs the corresponding actions. Think of GlobalSearch as a normal website: as the number of visitors increases, the number of servers required to handle the requests needs to increase as well. In GlobalSearch, the web server component is the central hub of the application, which lets us break the different components out onto separate servers to ensure a good user experience. The diagram shown here is a typical installation where all of the components are on one server: the GlobalSearch client machine connects to the server via Microsoft IIS, and IIS in turn tells the other components what to do. The "magic number" for GlobalSearch is 100 users. Our testing found that a new instance of IIS is required for every 100 users, and since only one instance of IIS can run on each server, an additional "Application Server" is required.

4 Storage of all data within GlobalSearch
Database Services. Storage of all data within GlobalSearch: paths to documents, index information, security, archive structure. Magic number: 40 users. Microsoft SQL Server is the information repository for GlobalSearch. All of the information about the documents (their metadata, security, archive structure, etc.) is stored in a few SQL databases. Microsoft SQL Server is capable of handling thousands of connections, so unless we are setting up an environment with SQL replication, we will only need one SQL Server. In a typical GlobalSearch installation, like the diagram on the last slide, all of these components reside on one server. However, once we reach around 40 GlobalSearch users we will want to break the SQL component out onto its own server. This is because, while SQL Server can handle thousands of simultaneous connections, the load it puts on the server will cause performance issues in other areas. In this diagram you can see 40 users connecting to a GlobalSearch Server that has been configured to connect to a dedicated SQL Server. It is important to note that at this many users we require a non-Express edition of SQL Server, because the Express edition limits database size and server performance; an Express edition with this many users would be severely degraded.

5 200 Users The diagram you see here is one configuration you could have with 200 GlobalSearch users. In this instance we have two web servers running GlobalSearch pointing to a central dedicated SQL Server. The first 100 users would point to the first GlobalSearch Server and the other 100 users to the second GlobalSearch Server; both servers are configured to use the dedicated SQL server. This is a very basic example of a load balanced environment. This type of setup would only work where there is minimal capture workflow and document workflow activity. If there were heavy capture requirements, we would have to break the workflow components out onto additional "Capture Servers". This example is a very manual way to load balance GlobalSearch. We are now getting outside the scope of GlobalSearch itself, but there are technologies that allow all users to point to the same web address on a load balancing server, which then distributes the requests across the GlobalSearch server farm.

6 1,000 Users Using Load Balancing Technology
This diagram very simplistically shows how a GlobalSearch server farm could be load balanced using a load balancing server between the clients and the servers. All 1,000 users would be pointing to the load balancing server as their endpoint and the load balancer would be configured to send 100 users to each server as they become available. All 10 GlobalSearch servers would be connected to another load balancing server that would load balance the database servers as necessary. As I said previously, this type of setup is now outside the scope of GlobalSearch, but a configuration like this is possible.
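The "send 100 users to each server" idea above can be sketched as a simple round-robin assignment. This is an illustrative sketch only, not GlobalSearch or load balancer code; the server names are hypothetical placeholders:

```python
# Sketch: round-robin distribution of user sessions across a farm
# of application servers, ~100 users per server as described above.
from collections import Counter
from itertools import cycle

def assign_sessions(users, servers):
    """Map each user to a server in round-robin order."""
    assignment = {}
    pool = cycle(servers)
    for user in users:
        assignment[user] = next(pool)
    return assignment

servers = [f"gs-app-{n:02d}" for n in range(1, 11)]   # 10 app servers (hypothetical names)
users = [f"user{n}" for n in range(1, 1001)]          # 1,000 users
assignment = assign_sessions(users, servers)

# Count sessions per server: 1,000 users over 10 servers is 100 each.
load = Counter(assignment.values())
```

A real load balancer would use health checks and session affinity rather than a static mapping, but the arithmetic (1,000 users, 10 servers, 100 sessions each) is the same.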

7 6,000 Users! Just to show the possibilities of scaling with GlobalSearch, I have put together this diagram, which shows a distributed, load balanced environment for 6,000 users. Here the 6,000 users point to a cluster of 6 load balancers configured together to handle the load and pass it on to the 60 GlobalSearch servers. The GlobalSearch servers in turn use a cluster of 2 load balancers pointing to 6 load balanced SQL servers.

8 Capture Workflow – automated import of documents
Capture Services. Capture Workflow: automated import of documents. Up to three instances on each capture server. 4 GB RAM per instance of capture workflow. With OCR: 4 cores per instance; without OCR: 1 core per instance. Capture Workflow services allow for the automated import and indexing of documents into GlobalSearch. Each capture server can run up to three instances of the capture workflow service, so with a primary Application Server and a secondary Capture Workflow Server we can have up to six capture workflow services running simultaneously. The diagram here shows what a capture server running OCR would look like: one server with three instances of capture workflow. Since each instance requires 4 GB RAM and 4 CPU cores, we would need a minimum of 12 GB RAM and 12 CPU cores, not including the amount required for the base operating system. If our capture workflow services did not require any OCR, we would only need one CPU core per instance. In that scenario we would still need 12 GB RAM, but only 3 CPU cores, not including the base OS requirement.
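The sizing rule on this slide (4 GB RAM per instance; 4 cores with OCR, 1 core without, base OS excluded) can be written as a small helper. This is just a restatement of the slide's figures, not an official sizing tool:

```python
# Minimum capture-server resources per the figures quoted above:
# 4 GB RAM per capture workflow instance; 4 CPU cores per instance
# with OCR, 1 core without. Base-OS overhead is not included.
def capture_server_minimums(instances, ocr):
    """Return (ram_gb, cpu_cores) for a capture server."""
    ram_gb = 4 * instances
    cores = (4 if ocr else 1) * instances
    return ram_gb, cores

# Three OCR instances:     12 GB RAM, 12 cores
# Three non-OCR instances: 12 GB RAM, 3 cores
```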

9 Load Balanced Environment Example
200 user environment. Heavy capture requirements. Load Balancer Tool. Here is an example of a load balanced environment with two hundred client users. Each group of 100 has its own application server connecting to the dedicated SQL server. Additionally, we have three capture servers with three instances of capture workflow running on each, giving a total of 9 capture workflows that can be processing simultaneously. In a situation like this, each capture workflow server can have its own set of capture workflows, or they can all process the same set of capture workflows by using our workflow load balancing utility.

10 Load Balancing Utility
The load balancing utility for capture workflows allows the same capture workflow to run on multiple capture instances across multiple capture servers. The load balancer works by monitoring one specific hot folder and distributing the files evenly across a set of defined folders. In the example here, 10 users are all scanning the same document types to a single hot folder. The load balancer utility monitors that hot folder and distributes the files to our three capture workflow servers, each running three instances of capture workflow. In essence, this allows the same capture workflow to run simultaneously nine times, cutting processing time roughly nine-fold in a heavy capture environment. If you have 9 batches that take 1 minute each, a single-workflow environment takes 9 minutes to process them all; in a distributed load balanced environment like the one shown here, all 9 batches finish processing at the same time, so all are done in one minute. The load balancing utility can also be used for more than capture workflow; it can drive the processing of documents. For example, consider a team of people indexing documents from inboxes. In a typical scenario, all documents flow into a shared inbox where users grab a document and process it. Using the load balancer, we can instead feed documents to a particular user based on how many documents are in their inbox. Rather than the processing team waiting for documents to come through a capture workflow and grabbing one when available, documents are pushed to users. This changes the whole dynamic: if each user is expected to keep 5 documents in their inbox while processing, then as each user removes a document a new one is pushed in, creating a steady stream of documents to process.

11 More Advanced Load Balanced Environments
Multiple office locations. So far the load balancing discussion has focused on one office location. When you bring multiple offices into the equation, there are other things to take into consideration. If all locations have very high speed internet connections and we are not dealing with large files, we can simply point all of the client machines to a central server located in one of the offices or in a data center. If we want a local copy of the data at each office, so we don't have to worry about internet connection speeds, we can set up SQL Server replication and file system replication for each location. Any change made in one location is then copied to all other locations, so a client in Brazil can talk to a local server in Brazil instead of pulling all of its data from servers in Australia.

12 Single Instance Server Up to 40 Users (low capture requirements)
Server Requirements. Single Instance Server, up to 40 users (low capture requirements): 8 GB RAM, 4 core CPU. Single instance with dedicated SQL Server, up to 100 users: GlobalSearch Server 8 GB RAM, 4 core CPU; dedicated SQL Server (non-Express) 8 GB RAM, 4 core CPU. Additional Application Servers. Additional Capture Servers, per instance: 4 GB RAM, 1 or 4 core CPU. In a single server environment with up to 40 users and low capture workflow requirements, the recommended configuration is 8 GB RAM and a 4 core CPU. Once you hit the 40 user mark, SQL needs to be broken out onto a dedicated SQL Server with 8 GB RAM and 4 CPU cores. 8 GB RAM and 4 cores on an application server will support up to 100 users per machine. Each capture workflow instance on additional capture servers requires 4 GB RAM, with 1 CPU core for non-OCR workflows and 4 CPU cores for workflows that contain OCR.

13 Capture Workflow Considerations
How many pages are being processed? What is capture workflow going to be doing? What are the customer's expectations on processing time? When setting up a capture workflow load balanced environment, it is very important to understand a few things before deciding on the best approach. We can't determine how many capture workflow instances we are going to need without understanding the customer's workflow process. How many pages are they processing a day? What actions will capture workflow perform on those documents? A customer processing 100,000 pages a month with OCR and PDF conversion is going to have very different requirements than a customer processing 100,000 pages a month with only page count separation.

14 Benchmarks 4.1.1.0 Dell M2800, i5-4310M CPU @ 2.7GHz 8 GB RAM
Windows 8.1 Pro 64-bit. To help understand how many capture services will be needed, we created some benchmarks for the capture workflow service. The machine we used was a Dell M2800 laptop running GlobalSearch, with an Intel i5 CPU clocked at 2.7 GHz, 8 GB RAM, and Windows 8.1 Pro 64-bit.

15 Benchmarks We used 10 different files for the capture workflow benchmarks, as you can see here. Each file name corresponds to the type of document. For example, 1pg2bBW.pdf is a 1-page, 2-bit black and white PDF, and 50pg24bColor.docx is a 50-page Word document in 24-bit color.

16 Benchmarks The results of the capture workflow tests are shown here for each batch. With this chart you can see exactly how long each capture workflow action took on each of the sample documents, and from that calculate how long your capture workflows are going to take. For example, this benchmark shows that a 50-page 2-bit black and white document took 54 seconds to run through the text PDF creator. A 50-page 24-bit PDF took 1 minute 17 seconds, while a TIFF file of the same size took 1 minute 9 seconds. This means that if we are scanning 1,000 pages a day in batches of 50 pages, it would take one capture workflow process 18 minutes to process all pages if they were 2-bit black and white PDFs. If they were 24-bit color PDFs, it would take a little over 25 and a half minutes. If we add barcode separation into the capture workflows, that adds 31 minutes and 22 minutes to each workflow respectively, for totals of 49 minutes and about 47 minutes. We can round up and say 1,000 pages will take one hour to process with barcode separation and text PDF creation, which is probably fine for a small or medium sized business. But for a customer processing 1,000 pages an hour, over the span of an 8 hour work day their capture workflow will take 8 hours to process all of those documents, which may or may not be acceptable depending on how quickly the customer needs access to them. Adding a second capture workflow process turns that 8 hours into 4 hours, and a third into roughly 2.7 hours. Keep in mind that these numbers will vary from server to server depending on hardware specifications.
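The arithmetic above (batches per day times seconds per batch, divided by the number of workflow instances) is easy to check with a small helper. This is a back-of-envelope sketch using only the benchmark figures quoted on this slide:

```python
# Estimate capture processing time from the per-batch benchmark
# figures above: 54 s for a 50-page 2-bit B&W PDF through the text
# PDF creator, 77 s for a 50-page 24-bit color PDF.
def processing_minutes(pages, batch_pages, secs_per_batch, instances=1):
    """Total processing time in minutes, split across N instances."""
    batches = pages / batch_pages
    return batches * secs_per_batch / instances / 60

# 1,000 pages/day in 50-page batches, one workflow instance:
bw_pdf = processing_minutes(1000, 50, 54)     # 18.0 minutes
color_pdf = processing_minutes(1000, 50, 77)  # ~25.7 minutes
```

The same function reproduces the parallel-instance figures: 1,000 pages per hour over an 8-hour day drops from 8 hours with one instance to 4 with two and about 2.7 with three.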

17 Professional Edition/SMB Additional Capture Servers
Licensing. Corporate Edition: 3 capture workflow instances, deployed as 1 core server with 3 instances of capture workflow; 2 servers (1 core server and up to 3 instances on a second server); or 1+N servers (1 core server and capture workflow distributed across multiple servers). Professional Edition/SMB: 1 capture workflow instance. Additional Capture Servers: blocks of 3 capture workflow instances. Load Balancing Utility. So how about licensing all of this? We've made it easy to license exactly what you need to get the right environment. All Corporate Edition licenses can run up to 3 instances of capture workflow without any added license charges. As mentioned earlier in the presentation, each capture server can run a maximum of 3 capture workflow instances, and these can be deployed in any server configuration. For example, you could have one server containing all of the GlobalSearch core application components plus 3 instances of capture workflow on that same server. Alternatively, this can be deployed across two or more servers: one core GlobalSearch server and additional servers, with either all three capture workflow instances on one server or one instance each spread across multiple servers, up to a total of 3 instances. It is important to note that the Corporate Edition of GlobalSearch only comes with 1 license of OCR/PDF Creator; if you want OCR or PDF creation on the additional instances, a license must be purchased for each instance. The Professional Edition of GlobalSearch is licensed for only one instance of capture workflow. This means you could have everything contained on a single core server, or two servers: one for the core GlobalSearch services and one for the capture workflow instance.
For both editions of GlobalSearch, additional capture servers can be purchased in blocks of 3 capture workflow instances, which can be deployed in any server configuration with a maximum of 3 instances per server. The load balancing utility discussed earlier is an add-on for all editions of GlobalSearch and must be purchased separately. (Does this require multiple purchases for multiple utilities?)

