Diagnosing, Troubleshooting, and Improving Ignition Gateway Performance

Dave Fogle, Director of Support, Inductive Automation
James Hunt, Support Manager, Inductive Automation

Overview
Most Common Problem Areas: high tag counts; database connections & throughput; project size, project count, and client connections; infrastructure issues.
Java Process & JVM Tuning: native and heap memory; garbage collection.

Tags & Performance: As tag numbers increase, so will memory usage
Certain aspects of every tag are stored in active memory (heap). UDTs vs. non-UDT tags: increased memory usage vs. faster load times; with large systems, fast load times are important. Startup times: large tag counts lead to slower startup times, but UDTs help lessen the startup time.
As with anything, large numbers of tags on a gateway can start to hamper performance in a variety of ways. Usually the first thing you’ll notice is the need to increase the amount of memory allocated to the Java heap. Certain aspects of every tag are stored in active memory, which is provided by the Java heap, so the more tags you have, the more memory these tag definitions take up. It’s also worth noting that UDTs use more memory than non-UDT tags; however, UDTs offer a positive trade-off in tag loading performance. It all comes down to finding the balance that works best for your current situation. Switching all of your tags away from UDTs may initially seem like a good idea, because you think reducing memory consumption will address what you feel is the root of your performance issues. While your memory usage may decrease, you have also drastically increased the number of entries in the internal database that represent your tag structure. As a result, you may run into longer system startup times, because it takes longer to load all your tags from the internal database. If you find yourself slowly accumulating large numbers of tags on one gateway, eventually you’ll want to consider moving some of those tags onto other gateways and using remote tag provider connections to make them available.
The main idea here is to be aware that every new tag you add requires some amount of memory, and as you approach tag counts in the low hundreds of thousands, you have to think about memory usage differently than you did with a system of only twenty-five thousand tags.

Tags & Performance: Scan Classes
Large quantities of tags in a scan class are not inherently bad, but… consider the types of tags grouped in a scan class, and be aware of exactly how many tags you have grouped together. Mostly applicable to UDTs: what’s actually in them?
Another aspect of a high tag count is that it usually leads to large scan class groups or a large number of scan classes. Most of the time, merely having a large number of tags in a scan class isn’t going to cause many issues, as long as you’re not overloading your devices. Where we usually see users run into problems is when the number of tags on their systems has slowly increased over time, or when they use UDTs to quickly create a large number of UDT instances and don’t pay close attention to the types of tags they are grouping together in their scan classes, specifically expression tags and SQL query tags. For OPC tags, scan classes really only dictate the rate at which Ignition subscribes to the tags.

Tags & Performance: Why Tag Types Are Important
SQL query tags; expression tags; total execution time.
Expression and query tags, however, depend on the execution of their attached expression or query, which takes a certain amount of time. Once you accumulate many of these tags, their combined execution time adds up, and you can reach a point where the time needed for all the tags to execute is longer than the rate at which your scan class is set to run. In these situations you’ll see that tags aren’t updating at the rate you expect. It is necessary to separate your query and expression tags into their own scan class, away from OPC tags you may be storing history on, and to make sure you don’t group more tags in a scan class than can execute within the scan class rate.
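A quick way to sanity-check a scan class is to add up the execution times of its expression and query tags and compare the total against the scan class rate. The following is a minimal Java sketch under the simplifying assumption that the tags execute one after another; the class, methods, and per-tag timings are hypothetical and not part of Ignition.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: checks whether the combined execution time of the
// expression/query tags in one scan class fits inside the scan class rate.
// Assumes the tags execute sequentially (worst case).
class ScanClassBudget {

    // Sum of the estimated per-tag execution times, in milliseconds.
    static long totalExecutionMs(List<Long> perTagMs) {
        long total = 0;
        for (long ms : perTagMs) {
            total += ms;
        }
        return total;
    }

    // True if one full pass over the tags finishes before the next scan is due.
    static boolean fitsInRate(List<Long> perTagMs, long scanRateMs) {
        return totalExecutionMs(perTagMs) < scanRateMs;
    }

    public static void main(String[] args) {
        // 200 query tags at roughly 15 ms each (made-up figures).
        List<Long> queryTags = Collections.nCopies(200, 15L);
        long scanRateMs = 1000;
        System.out.println("Total: " + totalExecutionMs(queryTags) + " ms");
        System.out.println("Fits in " + scanRateMs + " ms: " + fitsInRate(queryTags, scanRateMs));
    }
}
```

With these figures one pass takes 3000 ms, three times the 1000 ms rate, so moving the query tags to a slower, dedicated scan class would be the natural fix.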

Databases & Performance
When to look at your database: sluggish charts that used to be fast; query timeout errors in the client; active connections are nearly always maxed; throughput is no longer sufficient and store & forward is backing up.
Databases are another area where users begin to have issues after a system has been running for a while. Because Ignition makes it so easy to create database connections, store tag data, query tag data, and display that data on project windows, users often focus solely on a project’s functionality and pay less attention to how the data is stored, how much data is being stored, and how often queries are sent to the underlying database. This sort of naivety doesn’t lead to many immediate problems, but it can start a snowball effect that leads to future problems that are not only difficult to identify but also quite difficult to remedy after the fact. Where will you first see these problems manifest? Charts that were once quite responsive start taking a long time to load. You may see query timeout errors throughout the project, leading to charts and tables that fail to load and windows that seem to take a long time to open. On the database connection status page in the gateway, you may see more and more active connections consumed by longer and longer running queries. You may also see that your throughput to the database has decreased, or that the number of active connections is no longer sufficient to handle the number of queries you’re trying to push over the connection.

3 things to consider when the database is “slow”: Maximum number of active connections. Active clients have increased over the years, and project evolution has led to more data reporting. The default is 8; increase the pool to handle your increased demand.
When it comes to database connections causing slowdowns, there are three things to consider. 1) You may need to increase the maximum number of active connections in the database connection settings. More queries from clients, as well as more active clients on the plant floor, both increase the query load on the connection. You can check the status of your active connections on the database status page in the gateway. If you see the connections regularly maxed out or close to it, raise the number of active connections available to the pool. By default, the max active connections setting is 8, and it is an “advanced” setting of the data source connection in the gateway. Increase it until you find a value that lets all your queries be handled without timing out while waiting for an available connection to the database.
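One rough way to judge whether the pool is large enough, which is our addition rather than part of the original talk, is Little’s law: the average number of busy connections equals the query arrival rate multiplied by the average time each query holds a connection. The sketch below assumes you have estimated both numbers yourself; the class, methods, and load figures are hypothetical.

```java
// Hypothetical sketch: estimates average pool utilization with Little's law
// (busy connections = arrival rate x mean time a query holds a connection).
class PoolSizing {

    // queriesPerSecond: average rate at which clients issue queries.
    // meanQueryMs: average time a query holds a pooled connection.
    static double busyConnections(double queriesPerSecond, double meanQueryMs) {
        return queriesPerSecond * (meanQueryMs / 1000.0);
    }

    // Fraction of the pool busy on average; values near 1.0 mean queries
    // will regularly wait for a free connection.
    static double utilization(double queriesPerSecond, double meanQueryMs, int maxActive) {
        return busyConnections(queriesPerSecond, meanQueryMs) / maxActive;
    }

    public static void main(String[] args) {
        int maxActive = 8;   // default max active connections
        double qps = 40;     // made-up load
        double meanMs = 180; // made-up mean query time
        System.out.printf("Busy: %.1f of %d (%.0f%%)%n",
            busyConnections(qps, meanMs), maxActive,
            100 * utilization(qps, meanMs, maxActive));
    }
}
```

An average of about 7.2 busy connections out of 8 leaves almost no headroom for bursts, which is the “nearly always maxed” symptom described above.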

3 things to consider when the database is “slow”: Is your database remote? You are limited by network bandwidth and latency. If you are actually maxed on throughput, consider a remote history provider.
2) If you’re connecting to a remote database, keep in mind that you’re limited by the latency of your network connection. It’s becoming more and more popular to keep data in the “cloud” at a remote location. While convenient, and not ill-advised, it adds to the list of things to consider when dealing with throughput issues, because of the inherent bandwidth restrictions of whatever network you’re on. You may find yourself in a position where you physically can’t push data through your connection any faster, and in that case you’ll likely need to either rethink your architecture or adjust your expectations. If you are using the tag historian, consider remote history providers to reduce the amount of transmitted data where possible, and let a gateway that is on the same machine as the database, or in much closer proximity to it, send the actual queries to the database.
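Latency alone caps throughput, because a pooled connection runs one query at a time and every query pays at least one network round trip. This hypothetical sketch computes that upper bound; it ignores bandwidth limits and assumes a fixed execution time per query, so treat the results as best-case figures.

```java
// Hypothetical sketch: best-case queries per second for a connection pool when
// each query costs one network round trip plus its execution time.
class LatencyBound {

    // Each connection completes at most 1000 / (rtt + exec) queries per second.
    static double maxQueriesPerSecond(int connections, double rttMs, double execMs) {
        return connections * (1000.0 / (rttMs + execMs));
    }

    public static void main(String[] args) {
        // Same pool and query, local vs. remote database (latencies are made up).
        System.out.printf("Local  (1 ms RTT):  %.0f queries/s%n", maxQueriesPerSecond(8, 1, 20));
        System.out.printf("Remote (60 ms RTT): %.0f queries/s%n", maxQueriesPerSecond(8, 60, 20));
    }
}
```

Moving the same 20 ms query from a 1 ms link to a 60 ms link cuts the ceiling from about 381 to 100 queries per second, with no change on the database itself.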

3 things to consider when the database is “slow”: The database needs to be optimized (memory; table structure, indexes, etc.). More data means longer query times, so keep scalability in mind from the beginning.
3) Many times when users call in with slow queries and database issues, the problem has less to do with Ignition and more to do with the database itself. Over time the database grows, and the amount of data that must be sifted through when querying gets larger and larger. This is a fairly straightforward point, but it’s also something many people forget to plan for when initially setting up their gateways and projects. Database health, maintenance, performance, backup, and hardware requirements for scalability tend to be afterthoughts for new Ignition users, largely because the software makes it easy to connect and store data without much database knowledge to begin with. If you don’t have a DBA whose job is to oversee the health of your database, then those responsibilities fall to you. Learning what the preferences and settings for your database of choice do, and when and how they should be adjusted, is very important. Allocated memory, table growth, hard disk space, user permissions, and much more can affect your performance and may need to be adjusted over time depending on your needs. Even if you do have a DBA, it will likely still be your responsibility to describe your needs accurately, such as the average amount of data stored per day or month, the average number of queries per second, or at least a general idea of the load you’re going to put on the database. A solid understanding of what you need now, and how your system may grow in the future, will help your DBA plan accordingly.

Projects & Clients: Natural growth of projects and scope over time
Three areas of growth happen simultaneously: size of projects (number of windows, number of users, etc.); number of projects on the gateway; number of active clients across all projects. It’s not about knowing a magic number: growth in these areas has a cumulative effect on performance. It’s impossible to say “100 clients need X specific hardware,” but you CAN say “100+ clients means I should evaluate the load on my gateway and think about more powerful hardware.”
Another area where users run into unexpected performance issues is a growing number of projects, growing project size, or more client connections to the gateway. As is a common theme, these issues usually creep up seemingly out of nowhere after things have been running without issue for quite a while. Usually all three go hand in hand. The more complex the projects, in terms of the data displayed and the queries run, the sooner you’ll see issues arise as your client count increases. We usually say you can expect to run anywhere from 50 to several hundred clients on a single gateway, but once you want to run more than 100 clients on a gateway, you’ll have to start considering more powerful server hardware as well as making some significant project design choices.

Projects & Clients: Too many clients?
Symptoms: slow client response; slow updates to subscribed tags; slow responses to gateway requests; synchronous actions to the gateway block for a long time.
When too many clients start to stress your gateway, you’ll see clients respond slowly and/or subscribed tags update slowly. By default the client update rate is 250 ms, so if you have 400 clients running, you’re sending 1600 requests to the gateway every second. The gateway has to process and respond to these requests while also performing all of its other normal gateway-side work. When there are too many requests, the gateway has a hard time keeping up and you’ll see delays in updates to the clients. For example, tag writes may appear to take longer than you would expect. This is likely not because the write itself takes longer (although that can also occur), but because it takes a long time for the gateway to report back that the tag you’re looking at on your screen has indeed changed value. Any synchronous action to the gateway, such as blocking script functions on buttons, will also start taking an excessive amount of time.
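The request arithmetic above can be written out as a small sketch. As in the text, it treats each client update as one request to the gateway; the class and method names are hypothetical.

```java
// Hypothetical sketch: client-to-gateway request rate, assuming one request
// per client per update interval.
class ClientLoad {

    static double requestsPerSecond(int clients, int updateRateMs) {
        return clients * (1000.0 / updateRateMs);
    }

    public static void main(String[] args) {
        System.out.println(requestsPerSecond(400, 250));  // 1600.0 at the default 250 ms rate
        System.out.println(requestsPerSecond(400, 1000)); // 400.0 with a 1 s update rate
    }
}
```

Because the rate scales linearly with both the client count and the inverse of the update interval, slowing the update rate is one of the few levers that cuts load without removing clients.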

Projects & Clients: Too many queries?
Symptoms: excessive memory usage by the gateway, or memory exhaustion; query results take a long time to return to the client; query timeout errors and dropped connections to the gateway.
If a high number of query requests from clients is to blame for the slow performance, you’ll likely see the gateway using an excessively high amount of memory or running out entirely, or query results taking a long time to return to the client. Query timeouts, client connections dropping and reconnecting, and similar symptoms occur more and more frequently when the gateway is stressed by an excessive number of SQL queries from running clients. Too many queries can also result from an increased client count, so this situation may exhibit the same symptoms as running too many clients.

Projects & Clients: Plan for the future from the beginning
Tips: be aware of client update rates; keep tag subscription numbers in mind; avoid polling queries; communication with the gateway should be on demand. Finding scaling problems late usually results in a partial, if not complete, redesign of projects and architecture.
Again, all of these issues come from the gateway being flooded by requests from running clients. It’s imperative to keep potential future growth in mind when developing your projects and putting together system architectures. If you’re going to build a single project (or a couple of projects) that will be accessed by 400 clients, you need to put a lot of thought into how that project is designed and what sort of performance to expect. You’ll want to adjust the client update rate, stay away from client tag subscriptions if you can help it, and definitely avoid polling queries. Ideally, don’t communicate with the gateway unless the user presses a button and is specifically asking for certain information to be updated or delivered. If you find yourself in these situations because of slow expansion over time, fixing it will require a painful overhaul of your project. Remedying slow performance caused by any of the issues above takes a lot of time and development effort, and potentially a decent amount of downtime. Once you reach this point, you usually have to go through a complete redesign of your projects to streamline things and reduce the load on the system. That’s why it’s always best to develop a plan before starting your project design, to help ensure you can scale as your install ages and grows over time.

Infrastructure, Architecture, & IT
Know your VMs and VM hardware, and have access to both. VMs add a layer of complexity to troubleshooting: under-provisioning; an overloaded VM host machine; the VM snapshot/backup schedule.
Ignition isn’t the only place to look when you notice slow performance on your gateways. With very large architectures, you have to be hyper-aware of your entire IT infrastructure and how it can affect your system performance. The following items come from our experience helping users troubleshoot their system performance, and knowing them can save a tremendous amount of time when tracking down seemingly vague and ambiguous problems. Know the details of your VMs and VM hardware, or have quick access to members of your IT team who can get you that information, as well as access to the VM host machines. Virtualizing your architecture is not necessarily bad, but it does introduce many more complexities when troubleshooting problems. Things we see causing problems on a regular basis include under-provisioned VMs, overloaded VM host machines, and poorly timed VM snapshot and backup schedules.

IT schedule: Java updates; database backups and maintenance. Any change to any system that Ignition connects to can cause an unforeseen problem.
It’s also important to be aware of your IT department’s schedule for maintenance on any of the systems that interact with the Ignition gateway: network maintenance, Windows updates, Java updates, database maintenance and backups, etc. Any time you change a system that interconnects with your gateway, you risk causing unforeseen issues that may not be immediately noticeable or easily diagnosable. Being aware of all the changes IT regularly implements goes a long way toward quickly identifying vague performance issues when they arise.

Variation of load across infrastructure: large file transfers; offsite backups; higher than normal DB activity. Poor scheduling of network traffic can impact performance.
Beyond maintenance, it’s also good to know what the variation of load across your infrastructure looks like at different times of the day, week, and month. Network activity across the plant may spike at certain times of day due to the number of active people in the plant, scheduled backups across multiple machines, large file transfers, and higher than normal database activity. The more you know about your network activity from day to day, the better you can anticipate longer response times due to network latency. This may not initially seem like a big deal, but when your architecture gets large enough, and you’re dealing with a large company-wide network, poor scheduling of bandwidth-intensive tasks can cause noticeable slowdowns in client-to-gateway communication and result in poor performance.

What causes slow performance? Stuff. Lots of stuff. Solutions come from coordinated troubleshooting efforts.
The main takeaway from all these examples is that the definition and cause of “slow performance” differ from install to install, and there is rarely a checklist you can follow to quickly track down what is causing your performance issues. Most of these performance issues aren’t solved because the tech support rep you’re working with knows something inherent to the software that others aren’t aware of. We have definitely seen systems fail in some very specific ways over the years, but the solutions to those underlying issues usually came from coordinated gathering and analysis of information about the systems Ignition was connected to.

Part 2: Java Process & JVM Tuning

Ignition and the Java Process
Understanding your Java process is just as important as understanding your server’s hardware: ensure stability; optimize for the performance needs of your project; analyze your performance to better scale your projects going forward.
As your Ignition system grows, one of the key parts you will need to manage is the Java process. I refer to the Java process rather than the JVM because, as your Ignition system grows, you will need to be aware of more than just what the JVM is doing to ensure the best performance from your gateway. Adapting how the Java process runs to the scope and complexity of your projects is critical to Ignition’s performance. The goal here is to help you understand the different parts of the Java process and how they affect performance in Ignition, and to offer advice on optimizing your Java process to better fit your gateway’s performance needs.

Intro to the Java Process
The Java process has 3 defined areas of memory:
Java Heap Space: the best known of the three, home of the Ignition service and where garbage collection reigns supreme.
Native Memory: memory allocated to the Java process that exists outside of the heap, such as the garbage collector and references to host OS resources.
Direct Memory: similar to native memory but serving a different function; it is where buffers for shared hardware resources needed by the JVM, such as network adapters and graphics cards, reside.
Java heap space is the core of your Java process and where most of the changes you make to optimize the Ignition service take place. Native memory is where the garbage collection process exists and where OS-interfacing Java libraries, such as JNI and NIO, find their references. Direct memory is where direct hardware references are reserved with the OS so that Java can reference them within the JVM.

Anatomy of the Java Process
A visual breakdown of the Java process that may help show the relationships at play.

Java Heap Space Key points in understanding the Java Heap Space:
This is the first place you should consider making changes to improve the performance of your JVM. Not giving the Java heap enough memory to run the Ignition Gateway service is the most common source of performance issues. The type of garbage collector used by the JVM can lead to performance problems as you increase the size of the heap space. More memory doesn’t always mean better performance.
The Java heap is the first place to look when assessing your gateway’s performance. This memory space is home to the Ignition service, and garbage collection reigns supreme here. It is also where you can make simple but powerful changes to improve the performance of your gateway. The first of these is the size of the heap, which will most likely be the first change you make. Any alterations you want to make to the JVM are made in the Ignition.conf file. This file is read when the Ignition service launches, so changes to it do not take effect until the Ignition service is restarted. The file is located in the data folder of your Ignition install.
The second biggest change you can make to your JVM is the garbage collector. On Java 8, the collector in Ignition’s default configuration is Concurrent Mark Sweep (CMS), which works well with smaller heaps but starts to show its age when your heap is 4 GB or bigger. Once the size and scope of your projects require a heap of 4 GB or more, you need a garbage collector that can scale with those memory requirements while still performing its collections without reducing the performance of the gateway. To address larger heaps, Java introduced a newer collector, G1GC (the Garbage-First collector), which made some fundamental changes to how the Java heap is laid out and managed, with larger heaps in mind.
While G1GC is far more efficient and stable at managing larger heaps, there are still limits to what it can handle in a single heap. At this time, once a heap grows beyond about 30 GB, you start to see performance degrade when using G1GC. This highlights that increasing your heap space isn’t always the best way to improve the performance of your gateway, and it is why looking at the Java process as a whole is vital to managing performance and optimizing your gateway as it grows in complexity and scope.
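After changing the collector settings and restarting, it helps to confirm which collector is actually running. This sketch uses the standard java.lang.management API to list the collectors and JVM arguments of the JVM it runs in; to inspect the gateway’s own JVM you would read the same MXBeans remotely, for example through a JMX console such as JConsole. The collector names in the comments are the usual HotSpot names, not a guarantee, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: reports the garbage collectors and startup flags of the current JVM.
class ActiveCollectors {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // With G1 these are typically "G1 Young Generation" and "G1 Old Generation";
        // with CMS, "ParNew" and "ConcurrentMarkSweep".
        System.out.println("Collectors: " + collectorNames());
        // Shows flags such as -XX:+UseG1GC if they were passed at startup.
        System.out.println("JVM args:   " + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```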

Java Heap Space (cont.): Configuration changes to the JVM are made in the Ignition.conf file, located in:
Windows: C:\Program Files\Inductive Automation\Ignition\data
Linux: /var/lib/ignition/data
OS X: /usr/local/ignition/data/
You can give the JVM additional execution parameters in the Ignition.conf file that change how the JVM is loaded. About midway through the file you will find the JVM settings that affect how it runs. This file is only read when the Ignition service starts, so any changes require a restart.

Important JVM Parameters
1. Set minimum and maximum heap space (values in MB):
wrapper.java.initmemory=2048
wrapper.java.maxmemory=4096
2. Set the garbage collector (GC) for the JVM. These two lines are alternatives: enable only one, since each wrapper.java.additional.N entry needs its own unique index.
wrapper.java.additional.3=-XX:+UseG1GC
wrapper.java.additional.3=-XX:+UseConcMarkSweepGC
3. Set diagnostic logging of GC metrics for the JVM (Java 8 flags; Java 9 and later deprecate them in favor of -Xlog:gc*):
wrapper.java.additional.4=-XX:+PrintGCDetails
wrapper.java.additional.5=-Xloggc:C:\Java_GC\diag_Check.log
Setting the heap memory is one of the most powerful and simplest changes you can make to improve JVM performance. The second is the garbage collector, and that choice should be based on your maximum heap setting. The default GC is currently ConcMarkSweep, which is a good collector but only scales well with a maximum memory footprint under 4 GB. Once your max heap is 4 GB or bigger, you will want to use the G1 (Garbage-First) collector, because it was built from the ground up to handle large heaps and to scale more dynamically within the JVM. The last two parameters are in some ways the most important, because they become key if you want to profile your garbage collection to better optimize it.
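Because both GC lines above share an index, it is easy to leave two conflicting entries in place. The hypothetical helper below (not an Ignition or wrapper tool) scans Ignition.conf-style lines and reports any wrapper.java.additional.N key defined more than once; it assumes simple key=value lines with # comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: finds wrapper.java.additional.N keys that appear more
// than once in uncommented key=value lines.
class AdditionalIndexCheck {

    static List<String> duplicateKeys(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> dupes = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments are ignored
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value line
            }
            String key = line.substring(0, eq).trim();
            // seen.add returns false when the key was already recorded.
            if (key.startsWith("wrapper.java.additional.") && !seen.add(key) && !dupes.contains(key)) {
                dupes.add(key);
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        List<String> conf = Arrays.asList(
            "wrapper.java.additional.3=-XX:+UseG1GC",
            "wrapper.java.additional.3=-XX:+UseConcMarkSweepGC",
            "#wrapper.java.additional.4=-XX:+PrintGCDetails");
        System.out.println(duplicateKeys(conf)); // [wrapper.java.additional.3]
    }
}
```

In practice you would feed it the lines of the real file (for example with java.nio.file.Files.readAllLines) and comment out whichever GC line you are not using.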

Optimizing Garbage Collection
Optimizing your garbage collection is an important part of tuning JVM performance. Depending on your performance needs, you can tune for throughput, latency, or memory footprint.
Note: When optimizing your JVM, there will be trade-offs as you tune the GC for a certain performance metric. For example, if you need to keep your memory footprint small, you will see lower overall throughput and higher latency due to limited resources.
Throughput, latency, and memory footprint are the three areas you can adjust to get performance gains for your Ignition service. As your project scales, you will need to prioritize which of these metrics matter most for how your gateway needs to run at optimum levels.
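GC throughput is commonly measured as the share of elapsed time the JVM spends outside garbage collection. The sketch below estimates it from the standard GarbageCollectorMXBean counters of the JVM it runs in; for concurrent collectors the reported collection time may not equal pause time, so treat the result as a rough indicator. The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: GC throughput = percentage of uptime not spent in garbage collection.
class GcThroughput {

    // gcMillis: accumulated collection time; uptimeMillis: JVM uptime.
    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0; // nothing measured yet
        }
        return 100.0 * (1.0 - (double) gcMillis / uptimeMillis);
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time %d ms of %d ms uptime -> %.2f%% throughput%n",
            gcMillis, uptime, throughputPercent(gcMillis, uptime));
    }
}
```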

Garbage Collection Parameters
Recommended parameters for optimizing your GC:
-XX:InitiatingHeapOccupancyPercent: affects how often the heap is evaluated and unused objects in memory are marked for one of the collection cycles. By default this threshold is 45% of the heap.
-XX:G1HeapWastePercent: the total amount of fragmentation (garbage) you allow in the heap. The default is 5% of the total heap.
-XX:InitiatingHeapOccupancyPercent is important if you see your heap growing and garbage collection taking longer to kick in than you would like. Lowering the initiating occupancy means the marking phase happens more often, which in turn means mixed collections happen more often. This is good for systems that want to increase throughput but don’t need the memory to stay allocated for long-term use, such as when bringing in large amounts of data from a database query.
-XX:G1HeapWastePercent sets how fragmented you allow the heap to get. Increasing this number means mixed collections won’t happen as often, because G1 does not start a mixed collection cycle while the reclaimable share of the heap is below this percentage.
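To see what the two percentages mean in megabytes, here is a small sketch for a hypothetical 4096 MB heap, using the defaults above plus a lowered IHOP of 35%. It assumes the Java 8 behaviour, where the occupancy threshold is a fixed percentage of the whole heap; newer JDKs adjust it adaptively by default. The class name is hypothetical.

```java
// Hypothetical sketch: converts the G1 percentage flags into MB for a given heap.
class G1Thresholds {

    static double mbAtPercent(double heapMb, double percent) {
        return heapMb * percent / 100.0;
    }

    public static void main(String[] args) {
        double heapMb = 4096;
        // Concurrent marking starts once heap occupancy passes this level.
        System.out.printf("IHOP 45%%: marking starts above %.1f MB%n", mbAtPercent(heapMb, 45));
        System.out.printf("IHOP 35%%: marking starts above %.1f MB%n", mbAtPercent(heapMb, 35));
        // No mixed collection cycle starts while reclaimable space is below this amount.
        System.out.printf("HeapWaste 5%%: tolerates up to %.1f MB of garbage%n", mbAtPercent(heapMb, 5));
    }
}
```

Lowering IHOP from 45% to 35% on this heap starts marking roughly 410 MB earlier (about 1433.6 MB instead of 1843.2 MB).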

Recommended parameters for optimizing your GC (cont’d):
-XX:G1MixedGCCountTarget: sets the target number of mixed collections over which the old regions chosen after a marking cycle are spread, which effectively sets the minimum number of old regions included in each mixed collection’s collection set. The default is 8 mixed collections.
-XX:G1OldCSetRegionThresholdPercent: sets the maximum number of old regions that can be included in the collection set of a mixed collection. The default is 10% of the total heap.
These parameters work as a pair and are essentially the minimum and maximum settings for how many old-generation regions can be included in a mixed collection cycle.
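The pairing can be illustrated with a rough sketch of the bounds on old regions per mixed collection. The region count (a 4 GB heap in 2 MB regions), the number of candidate regions, and the round-up behaviour are assumptions for illustration; G1 chooses the actual region size and candidates at runtime. The class name is hypothetical.

```java
// Hypothetical sketch: lower and upper bounds on old regions per mixed GC.
// Both bounds round up (an assumption made for this illustration).
class MixedGcBounds {

    // Lower bound: spread the candidate old regions over the target number of mixed GCs.
    static long minOldRegions(long candidateRegions, int mixedGcCountTarget) {
        return (candidateRegions + mixedGcCountTarget - 1) / mixedGcCountTarget;
    }

    // Upper bound: a percentage of all regions in the heap.
    static long maxOldRegions(long totalRegions, int oldCSetRegionThresholdPercent) {
        return (totalRegions * oldCSetRegionThresholdPercent + 99) / 100;
    }

    public static void main(String[] args) {
        long totalRegions = 2048; // e.g. a 4 GB heap split into 2 MB regions
        long candidates = 400;    // old regions judged worth collecting after marking (made up)
        System.out.println("min per mixed GC: " + minOldRegions(candidates, 8));    // 50
        System.out.println("max per mixed GC: " + maxOldRegions(totalRegions, 10)); // 205
    }
}
```

Raising the count target lowers the minimum, spreading the work over more, shorter mixed collections; the percentage caps how much any single mixed collection can take on.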

Recommended parameters for optimizing your GC (cont’d):
-XX:G1MixedGCLiveThresholdPercent: the maximum percentage of live data a region can contain and still be included in a collection set. The default is 85% of a region.
This sets how much live data can be present in a region for it to be added to a collection set. It matters because lowering this number reduces how many regions are eligible for collection. Raising it makes more regions eligible, which causes the collector to run more often; that helps with latency in terms of freeing resources and keeping the heap free of fragmentation.
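As an illustration of the threshold, this hypothetical sketch filters made-up per-region liveness figures and returns the regions that would qualify. It treats “qualify” as live data strictly below the threshold, which is an assumption about the boundary case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which old regions qualify for a mixed collection under a
// given live-data threshold. Liveness values are invented for illustration.
class LiveThresholdFilter {

    // Returns the indices of regions whose live-data percentage is below the threshold.
    static List<Integer> eligible(int[] livePercentByRegion, int thresholdPercent) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < livePercentByRegion.length; i++) {
            if (livePercentByRegion[i] < thresholdPercent) {
                ids.add(i);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] live = {20, 50, 70, 86, 95};
        System.out.println("85%: " + eligible(live, 85)); // [0, 1, 2]
        System.out.println("65%: " + eligible(live, 65)); // [0, 1]
    }
}
```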

Native Memory: Characteristics of native memory:
Exists outside of the heap space; its memory space is managed by the host OS; has live object references to resources within the heap; is where the garbage collection process resides; can run out of memory resources; can be affected by limited memory resources; is not included in garbage collection but can be affected by it.

Direct Memory: Characteristics of direct memory:
Where byte buffers exist for external communication; has direct references to host machine hardware resources; resources within the heap can drive the amount of memory used; is not part of the garbage-collected heap, although a direct buffer’s memory is released only after the buffer object that owns it is garbage collected; usage needs to be actively tracked and assessed when developing your project; the OS manages the allocated memory space.
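The link between heap objects and direct memory can be observed with the standard BufferPoolMXBean: allocating a direct ByteBuffer grows the “direct” buffer pool, not the heap. A minimal sketch meant to run in its own JVM; the class name is hypothetical. The total size of this pool can be capped with the -XX:MaxDirectMemorySize JVM option.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Sketch: direct ByteBuffers are reported through the "direct" buffer pool
// MXBean rather than through heap usage.
class DirectBufferUsage {

    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("no direct buffer pool reported");
    }

    public static void main(String[] args) {
        BufferPoolMXBean pool = directPool();
        System.out.println("before: " + pool.getTotalCapacity() + " bytes in " + pool.getCount() + " buffers");
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB outside the heap
        System.out.println("after:  " + pool.getTotalCapacity() + " bytes in " + pool.getCount() + " buffers");
        // Keep the buffer reachable; its memory is freed only after it is garbage collected.
        System.out.println("capacity held: " + buf.capacity());
    }
}
```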

Tracking Non-Heap memory
Note: While we may not have direct control over its growth and usage, non-heap memory still needs to be actively tracked because of its effect on the performance of the JVM process.
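A starting point for tracking is the standard MemoryMXBean, which reports heap and non-heap usage separately. “Non-heap” here covers JVM-managed areas such as Metaspace and the code cache, not every native allocation, so it understates the total memory outside the heap. HotSpot’s Native Memory Tracking (-XX:NativeMemoryTracking=summary, read with jcmd <pid> VM.native_memory summary) gives a fuller picture. The class below is a hypothetical sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: prints heap vs. non-heap usage of the current JVM in MB.
class MemoryReport {

    static String describe(String label, MemoryUsage u) {
        long mb = 1024 * 1024;
        return String.format("%s: used=%d MB committed=%d MB max=%s",
            label, u.getUsed() / mb, u.getCommitted() / mb,
            u.getMax() < 0 ? "undefined" : (u.getMax() / mb) + " MB"); // -1 means no limit defined
    }

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("heap", mem.getHeapMemoryUsage()));
        System.out.println(describe("non-heap", mem.getNonHeapMemoryUsage()));
    }
}
```

Logging these two lines periodically, alongside the GC log, makes it easier to tell whether growth is happening inside the heap or outside it.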
