Linux, and Windows and VMWare, oh my! By Ron Kaminski

1 Linux, and Windows and VMWare, oh my! By Ron Kaminski

2 Introduction Over the past 20 years, I’ve started and expanded capacity planning groups at dozens of firms, my most recent is now 15 months old – You learn things in that process – CMG is the place to share this information – I look forward to your presentation on this topic in a few years! Today’s goal is to give you “planning and audit points” that you can use to review how you do capacity planning, and maybe persuade you that other methods might be more productive, or at least worth a shot! There will also be “How to” information, that may have you adding some “to do” items to your list If you have a question, ask it! – I like nothing better than surfing off on a tangent that helps the class Story Times! New risks 2© Ron Kaminski 2013, All Rights Reserved

3 Introduction In the next few hours, we will cover – Defining your mission – Picking the right vendor partners – Going “Extra-Product” – Avoiding the “IT Mindset Traps” – The politics of capacity planning in organizations, the key factor in your eventual success, or failure – Reporting, what you should and surprisingly should not do – Classic capacity planning question descriptions and proper answering techniques © Ron Kaminski 2013, All Rights Reserved3

4 Introduction In the next few hours, we will cover – How clouds and “software as a service” will still need capacity tracking and planning tools, and what new kinds you will need – Modeling when all of the cards are stacked against you, or “Tricks of the trade” – Goals to work towards – An audit list to compare to your systems Capacity planning done well can change the fortunes of a company and help all of our careers. Come sharpen your methods and learn tricks that will make you part of your firm’s future productive assets, and not an expense to be controlled © Ron Kaminski 2013, All Rights Reserved4

5 Ron’s Rules You can ask anything, at any time – Sometimes the answer is coming up soon in the examples, and in that case I’ll tell you so Quick Survey – Does anyone here already have… A network queuing theory based modeling package? Regular, automated process and workload pathology detection? Fast web reporting of resource consumption by business useful workloads? By the end of this talk, I hope that you will realize that workload characterized views of consumption, web accessible, over business useful time spans are a must-have part of the best-run IT shops – Let’s see why…

6 Defining your mission Every site has its own “hot button” issues – “We are buying a new $23 million computer room every 6 months!” Attack server sprawl with data, not words – “I don’t know why we hired a capacity planner, we just…” – “Our critical applications are slowing down!” Use relative response times and historical information to show why – Chargeback used to be a big draw, but it has really faded away in the post-dot-com world It shows you when you are talking to an old vendor – The ITIL push and reality when facing outsourcing or “ZOG” ITIL takes a back seat to cost control, at least in the States – “We need better reporting!” Be careful to be holistic in what you deliver: cover everything that they can buy, historically and ideally with business cycle peaks When you start hearing terms like “focus on business priority” and “really look at travel expenses”, realize that cost cutting is in your future and report in ways that enable them to cut power and machines

7 Defining your mission You might think that all that variation would lead to very different solutions, and you’d be wrong! – All effective capacity planning systems are based on having: Efficient data collection, regrouping, reduction and storage Effective graphical reporting of business meaningful spans of time Components of workload response time that lead to diagnosis Solving the desire for answers to “What if…?” questions Problematic consumption diagnosis, reporting and ticketing – Some capacity planning product “features” marketed by vendors to the naïve are actually seldom used in the real world, and for good reasons Linear Trending, when what you really need is business cycle discovery and planning – The retail cycle at grocery chains and web payment system vendors Real Time Monitors, when you might want to go home or on vacation some day. Remember, problems happen 24 X 7, and humans won’t be watching “twitch monitors” that consistently. - The mission control room story Top 10 is often used to focus a newbie on peak consumption, which may all be valid © Ron Kaminski 2013, All Rights Reserved7

8 Defining your mission Who is doing the reporting? – Vendor supplied reports Tend to be single metric Often don’t include contextual information Are often “generate on demand” and therefore any useful span of time takes beyond the allowable attention span Often have serious contextual clarity problems – Workloads change colors as » the number present changes » You switch machines » Use black outlines that swamp the colors for small workloads – The “I’m only using vendor reports this time” and hit count story Can take unimaginable resources to produce – Set yourself a consumption budget and manage to it – You want to trade more bonds? Stop looking at it! May focus on reporting “right now” data rather than long term useful decision support information Seldom contain “disturbance to the status quo” notation capabilities © Ron Kaminski 2013, All Rights Reserved8

9 Defining your mission Who is doing the reporting? – Write your own reports Can be anything that you dream up (and can deliver the code for) There are multiple “free” languages and infrastructure to pick from – We’ve used perl, PHP, java and a whole lot more Can be tailored for your firm’s decision maker’s specific needs Can use “generate ahead” and other techniques to speed web reporting Writing your own can also have “down sides” – Staff turnover and the “Who is going to maintain this ___?” issues – Some staff are not gifted visual communicators – If the information used changes formats, (and over time they all do) someone is going to have to maintain that stuff © Ron Kaminski 2013, All Rights Reserved9

10 Defining your mission What do you want to present? – Workload characterized subdivisions of consumption over time? – Long term historical context for decision makers over multiple natural business cycles? – Information subdivided into audience specific groupings for ease of use by subgroups – Integration into your firm’s CMDB Ticketing systems Software development life cycle – Totals over time The spark lines counter-argument © Ron Kaminski 2013, All Rights Reserved10

11 Why sparklines of totals can be really useful These are sparklines of total CPU used, Average CPU used and the average CPU used by all nodes in that O/S Is there one in particular that draws your eyes to it, that wants you to probe deeper? © Ron Kaminski 2013, All Rights Reserved11

12 Why sparklines of totals can be really useful If you are like me, ustca102 has you wondering, “What made it step up like that? On our system, clicking on the tiny sparkline brings up a “zoomed in” image, which really gets you wondering: Clicking on that graphic brings up our normal web reporting system: © Ron Kaminski 2013, All Rights Reserved12

13 Why sparklines of totals can be really useful © Ron Kaminski 2013, All Rights Reserved13

14 Why sparklines of totals can be really useful OK, sometimes totals are useful – Sometimes they can draw your eye to issues – They can quickly dispel rumors that “All of our machines are maxed out!” For example, our applications specialists were consistently maintaining that all of their machines were barely big enough to make month end, and they would argue mightily whenever we might suggest that there was room for consolidation I brought the chart on the next slide to the next meeting, and suddenly their tune changed… © Ron Kaminski 2013, All Rights Reserved14

15 Why sparklines of totals can be really useful © Ron Kaminski 2013, All Rights Reserved15

16 Why sparklines of totals can be really useful What happened after the meeting? – In the next 9 months, using extremely conservative criteria, we Virtualized 230 machines ($1,521,000) Retired 55 machines ($ 390,553) – “Oh! You can just turn that off!”, or, “See steam come out of the operations folk’s ears” stories Planned 10 machines ($ 40,000) Potential 28 machines ($ 112,000) – We then plan on going back over with slightly less conservative criteria and finding a couple million more – We will also be doing more “application stacking” where it makes more sense Sort of makes capacity planning tools look cheap, doesn’t it? © Ron Kaminski 2013, All Rights Reserved16
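The figures on this slide can be totaled in a few lines. This is only a sketch of the arithmetic: the category names and dollar amounts are copied from the slide, while the dictionary layout and the `totals` helper are illustrative assumptions.

```python
# Figures copied from the slide: category -> (machines, dollars listed).
savings = {
    "Virtualized": (230, 1_521_000),
    "Retired": (55, 390_553),
    "Planned": (10, 40_000),
    "Potential": (28, 112_000),
}


def totals(rows):
    """Return (total machines, total dollars) across all categories."""
    machines = sum(count for count, _ in rows.values())
    dollars = sum(amount for _, amount in rows.values())
    return machines, dollars


machines, dollars = totals(savings)
print(f"{machines} machines, ${dollars:,}")
```

Summed this way, the four categories come to 323 machines and $2,063,553.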

17 Why sparklines of totals can be really useful A DBA pal of mine asked for a review of memory on a box, asking for an increase to add caching and improve performance – I didn’t really detect a memory shortage:

18 Why sparklines of totals can be really useful Still, people don’t usually mention issues unless there is an underlying cause. So, as a capacity planner, you have to always look deeper and always check all of the following: – CPU – Disk I/O – Memory – Network – Response time for key workloads If you don’t always check everything, something can sneak by – Here is what I found when I followed the “always check everything” rule When I looked at CPU, I saw: © Ron Kaminski 2013, All Rights Reserved18

19 Why sparklines of totals can be really useful © Ron Kaminski 2013, All Rights Reserved19

20 Why sparklines of totals can be really useful © Ron Kaminski 2013, All Rights Reserved20

21 Why sparklines of totals can be really useful Now you see several reasons why longer term sparklines can be pretty useful – Do you currently have ways to generate them? – If not, do you want to get ways to generate them? – Don’t you all think that your vendor ought to provide them, in group and zoomed in formats? So let’s start asking them to… Do you also see why you should always check everything and then sit back and ask yourself: – “If I had asked that question and then got this response, what would I ask next?”
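If you have no way to generate sparklines today, a text-only version is a cheap place to start. The sketch below is an assumption-laden illustration, not the reporting system described in these slides: it maps a series of totals onto eight Unicode block characters, and the sample CPU numbers are made up to show a step up like the one discussed earlier.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the series range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A perfectly flat series gets the lowest bar everywhere.
        return BARS[0] * len(values)
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)


# Hypothetical daily CPU totals with a sudden step up.
cpu_totals = [10, 10, 11, 10, 40, 41, 40]
print(sparkline(cpu_totals))
```

Because each series is scaled to its own range, a step change stands out even in a very small display; comparing nodes against each other would need a shared scale instead.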

22 Defining your mission Anticipate the “next questions” and always answer them before being asked –The unanswered “next question” can be a huge time waster often a stall technique used by the politically astute –It raises temporary doubt in your findings, and builds their case for swift purchase, before you answer their question –often a way for the old guard to show that they still are the “top dogs” to management Impatient or frightened management might run off and buy something! The undeclared war between Project Managers and Capacity Planners The “project manager weasel who never lost” story © Ron Kaminski 2013, All Rights Reserved 22

23 Defining your mission – If you are going to shoot down someone’s hypothesis that lack of CPU was the cause of a problem, you’d better find out what really caused the problem before the meeting! – Your goal: One meeting or phone call per issue! – They may say “We just want a quick and dirty answer” but they never really do! – Always cover at least: CPU Memory Disk I/O Workload response time changes For web-centric systems, network distances and loads

24 Defining your mission Cultural differences are real and might affect your workload choices –Some cultures avoid direct blame or information that would cause someone to “lose face” –Any workloads are better than none –The “No personal pronouns” story Be consistent! –Always use the same groupings on all similar nodes Use the same colors if you can! –Reduce the burden on your audience –Multiply the value of your workload creation efforts –Use consistent precedence order to decide where to put a process that meets the criteria to be in several different workloads © Ron Kaminski 2013, All Rights Reserved 24
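One way to get a consistent precedence order is to keep the workload rules in a single ordered list and let the first match win. The sketch below assumes that approach; the workload names, the process fields (`user`, `command`) and the matching rules are all hypothetical.

```python
# Hypothetical workload rules, in precedence order: the first match wins.
RULES = [
    ("backup", lambda p: "backup" in p["command"]),
    ("database", lambda p: p["user"] == "oracle"),
    ("web", lambda p: p["command"].startswith("httpd")),
]


def classify(process, rules=RULES, default="other"):
    """Return the first workload whose rule matches the process."""
    for workload, matches in rules:
        if matches(process):
            return workload
    return default


# This process meets both the backup and database criteria;
# precedence puts it in "backup" on every node that uses the same list.
print(classify({"user": "oracle", "command": "backup_agent"}))
```

Reusing the same ordered list on all similar nodes keeps the groupings, and therefore the report colors, consistent.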

25 Defining your mission Whatever you decide: – Track your own tools’ usage! There are multiple great freeware web usage reports that will tell you if folks are using or snoozing your data (we use Webalizer) Unviewed information is wasted time and effort – Use speed tests If there are multiple ways to do something (CSV files versus a performance database), code for both and have a race – Will your web users want the slower one? – The capacity planning reporting challenge story – Don’t settle, always seek new audiences and better reports Add new functions – Sadly, there is no shortage of bad vendor reporting on expensive infrastructure » Anyone here ever seen a great graphical historical display in business useful terms of SAN information or LAN usage by segment? – Your firm may have business specific information that might be really useful to decision makers if overlaid on or graphically reported near IT resource consumption

26 Our site’s web usage: © Ron Kaminski 2013, All Rights Reserved26

27 Our site’s web usage: © Ron Kaminski 2010, All Rights Reserved27

28 Our site’s web usage: © Ron Kaminski 2013, All Rights Reserved28

29 Our shared long term mission When you innovate and come up with new report ideas, share them at CMG! – Or at least send me examples in mail and I’ll do it for you! – Share code in this or other user groups that make sense We should all work together in user groups, public forums, on the web, etc., to push all of our vendor partners to address these needs – The more they do for us, the less we carry the “home brew code” weight We should also all work to reduce the volume, impact and long term storage requirements of our solutions – I have yet to encounter a vendor that isn’t carrying around a lot of extra metrics in the bowels of their systems that will never be used We should have a CMG sponsored “help wanted” section for capacity planning specialist positions in the various countries © Ron Kaminski 2013, All Rights Reserved29

30 Picking the right vendor partners I believe that all capacity planning efforts should have tools that include: – Efficient resource usage and process consumption collectors – Network queuing theory based “what if…?” modeling based on workloads, not total consumption The bulge trap – Efficient, speedy web-based historical consumption data display Ideally your chosen vendor would – support most or all of your differing operating systems and devices – have ample training and consultants available, there is nothing better than a co-pilot when you are starting out – participates in and supports CMG! © Ron Kaminski 2013, All Rights Reserved30

31 Picking the right vendor partners In the not too distant future, the best vendors should be: – Offering efficient “low impact” “cloud deployable wrappers” that run with your applications in a cloud – “We don’t have to worry, it’s in a cloud” is nonsensical Are you going to generate fake transactions and time them? When you get a long time back, or significant variance, are you going to have enough information to know why? I think that in time people will realize this need, and want it in their contracts Don’t you want to know the overhead of encryption and decryption in the process, and its response time effects? Stupidity is infinitely scalable, as long as you aren’t getting the bill – If nobody cares to make their code efficient, because they just send it to the cloud, how good is that code going to be? – Will it be running on the same machine as you tested? – Will it impact your users?

32 Picking the right vendor partners In the not too distant future, the best vendors should be: – Offering efficient “low impact” “cloud deployable wrappers” that run with your applications in a cloud (continued) The internet will continue to grow exponentially – So those clouds could get mighty full, mighty quick – How do you want to find out that it is too full? » Do you want your customers telling you? » Or do you want your own reports based on scientifically accurately collected consumption data? Social media sites are becoming valuable business tools – Businesses “tweet” and have Facebook pages! – Do you think that a free application originally designed to let 14 year olds share photos is designed for high performance business needs? – How will you be sure?

33 Picking the right vendor partners In the not too distant future, the best vendors should be: – Thinking about SaaS user tools as well Sure, SaaS vendors maintain the code and pay if it is a hog, but are they: Running maintenance activities like backups and virus scans that slow things down right during prime time for Australia in your globally distributed firm? Suffering from office hours peaks of consumption that impact your users’ response times? Taking outages to horizontally scale that might impact your firm’s ability to ship product? – Without your own data, you will never know What responsibility do you have to your firm’s users? Why is this network queuing theory based modeling stuff so important? – Let’s understand what it means and then see an example…

34 Modeling Norms Most modeling packages assume a Poisson or Chi-squared distribution of the arrival rate of transactions Some simpler, yet often quite elegant systems like Dr. Neil Gunther’s PDQ modeling just use a quadratic and forget the tails – They aren’t all that different despite what we modeling junkies might say! Don’t focus on the distribution selected, focus on whether they use queuing theory models and give you relative response times

35 Why network queuing theory based modeling? These concepts are also often illustrated with simple queue graphics like the one at the right An important implied assumption is that all requests are served, none are lost Response time is the sum of Queuing Time plus Service Time

36 Why network queuing theory based modeling? Methods do differ, but queues for interactive workloads are usually computed based on load percentage using a formula like: Q = U/(1-U) – where: – Q = Expected Queue – U = Utilization (as a fraction between 0 and 1) Response time is the sum of Queuing Time plus Service Time
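The formula is easy to try out. In this minimal sketch, `expected_queue` is the slide’s Q = U/(1-U); turning the queue into a time by multiplying it by the service time is an assumption (it holds for a simple single-server queue), and the function names are illustrative.

```python
def expected_queue(utilization):
    """Q = U / (1 - U), with U as a fraction in [0, 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)


def response_time(service_time, utilization):
    """Queuing time plus service time.

    Assumption: queuing time is the expected queue times the service
    time, as in a simple single-server queue.
    """
    queuing_time = expected_queue(utilization) * service_time
    return queuing_time + service_time


for u in (0.25, 0.5, 0.75, 0.9):
    print(f"U={u:.2f}  Q={expected_queue(u):.2f}  R={response_time(10, u):.1f} ms")
```

The loop shows why busy resources hurt so much: the queue grows slowly at first and then very quickly as utilization approaches 1.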

37 Why network queuing theory based modeling? So, as a workload competes for resources throughout a day, its response time is likely to vary Computed relative response times show us both the variations and the reason The Y Axis metric does not matter! – Just pick a basis, the ratio is the important part!

38 © Ron Kaminski 2013, All Rights Reserved38 Why network queuing theory based modeling? A workload’s typical transaction is likely to rely on several resources Imagine a workload running on a machine with four CPUs, six disks and some network IO on one card Note that when technologies differ, service times can differ

39 Why network queuing theory based modeling? Now do you see where a graph like this can come from? If the warehouse folks are complaining about response times at 3:00 AM, should you upgrade the CPU? – When do you suspect that the backups are running? – Would a CPU upgrade help daytime response? But it also might make the demand for I/Os arrive faster and really slow down the warehouse at 3:00 AM too, so you had better address the I/O issue!
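A rough sketch of where the components of response time can come from: each resource contributes its service demand stretched by 1/(1-U), and the parts are summed. Everything here is an assumption for illustration: the demands and utilizations are invented, and each resource is treated as a single queue, which ignores the multi-server effects of four CPUs and six disks.

```python
def component_times(demands, utilizations):
    """Per-resource response component: service demand stretched by 1 / (1 - U)."""
    return {name: demands[name] / (1 - utilizations[name]) for name in demands}


# Hypothetical service demands (seconds per transaction) and utilizations.
demands = {"cpu": 0.2, "disk": 0.25, "network": 0.05}
daytime = {"cpu": 0.5, "disk": 0.2, "network": 0.2}
backup_window = {"cpu": 0.2, "disk": 0.9, "network": 0.2}

for label, util in (("daytime", daytime), ("3:00 AM", backup_window)):
    parts = component_times(demands, util)
    worst = max(parts, key=parts.get)
    print(f"{label}: total {sum(parts.values()):.2f}s, largest component: {worst}")
```

With these made-up numbers the CPU is the biggest component during the day, but disk dominates during the backup window, so a CPU upgrade would not fix the 3:00 AM complaint.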

40 Picking the right vendor partners In my experience, network queuing theory based tools move folks quickest to actionable answers – Once you understand relative response times, most issues are quick and easy to diagnose If a new vendor harps on linear “trending” graphics and projections, don’t expect them to be around for very long If a monitoring or other product vendor keeps adding “and you can use this for capacity studies” it is probably because the salesperson heard that you were looking for capacity planning tools! – Stick with network queuing theory based packages and you won’t go wrong! – Dozens of “And we can do capacity planning too!” stories © Ron Kaminski 2013, All Rights Reserved40

41 Ron Goes Off on VMware VMware is the single biggest indictment of the poor way most firms have done capacity planning in the Windows space – The lack of workload characterized views of consumption is why folks bought a server for each functional part – “We don’t want to stack multiple applications on one server! So we VMware them!” …which is just stacking with the added joy of paying not only for extra copies of the OS and tools, but $900+ for VMware as well And in the end, the code is running on the same box! – VMware’s “so called” capacity planning tool is proof that they never attended a CMG! It is as near useless as any marketed tool that I have ever seen, but at least it is expensive…

42 Going “Extra-Product” Once you get used to your vendor’s product, if you are like me, you’ll start wishing for more functions tailored to your specific needs – In the old days, a grey haired expert would whip out a spreadsheet or other mathematical package and start creating some “home-brew” solution – I use Perl and GD:Graphics, PHP, JavaScript and anything else that I can think of; you can use what makes sense to you – Check out old CMG papers, they are laced with great ideas In other words, don’t feel limited to what your vendor does “out of the box” – Find buddies that use the same vendor and start sharing ideas and code – Things that you will see later in this presentation are shared among dozens of firms and they wouldn’t live without them – You don’t have to agree 100%, take what fits best and leave the rest

43 Going “Extra-Product” There are a whole group of us running many of the extensions that we’ve developed over time – Some of our extensions have made it into some products, but nowhere near enough of them! We probably get 50% of our firm’s benefit from the tools from our own extensions We regularly meet with the vendors and implore them to add the features that we like Having more singing from the same hymnal might just get through to them! Come join us! The best ideas might be in your head! Share! © Ron Kaminski 2013, All Rights Reserved43

44 The politics of capacity planning in organizations Organizational politics are often the key factor in your capacity planning group’s eventual success or failure Long experience has taught many of us the importance of – Friends in high places Try to get the capacity planning issue instigated by a knowledgeable VP or at least a director Often a major initial stumbling block is even getting permission to install collectors on production systems, much less the physics of actually doing it, and there is nothing better than having their boss’s boss saying, “Yes, you must do this, it is a priority” – Determining and rating the skills and power balances in your organization, usually by O/S – Managerial chaos can be a severe issue – Diagnosing and surmounting the barriers to success Describing the type Their common barriers and techniques to surmount them

45 Identifying and surmounting barriers Barrier: The “not invented here” über-geek – Identification clues Often are early members of a firm Usually position themselves as masters of several related technologies, but can be rather sparse on details The younger the firm, the more often you find them, internet firms in high growth areas are full of them They are convinced that “If we didn’t need it then, we don’t need it now!” – Their typical barrier methods “This is not an organizational priority” “This collector code is not proven on our sensitive production systems” – Techniques to surmount their barriers Friends in high places compel them Share credit for successes with them to their management Involve them in the model setup, ideally model along side them, letting them suggest probable growth steps © Ron Kaminski 2013, All Rights Reserved45

46 Identifying and surmounting barriers Barrier: “The high priests of the old tool set” – Identification clues They like “twitch monitoring” and often have built an extensive installation of them with impressive sounding names like “The war room” or “mission control” – Whenever you enter it during non-emergencies, notice how few people are actually using the displays They prefer current “totals” like total CPU because they’ve never had consumption by business identifiable sub-groupings They react to brief workload peaks by demanding upgrades – Their typical barrier methods Stalling. They ask streams of technical questions, and each answer that you give prompts another Requests to integrate, new capacity tools must feed information to their “war room” – Techniques to surmount their barriers Ask them to put long term, workload characterized consumption on their displays Have them tasked to help address pathologies automatically detected (that their monitors did not seem to surface) © Ron Kaminski 2013, All Rights Reserved46

47 Identifying and surmounting barriers Barrier: The application architects – Identification clues They rigorously defend their current multi-node spread as vital for – The organization – Uptime – Scalability 90% of their machines will be empty or nearly so The architecture was set in stone a decade ago, and is designed to solve the issues of that time: minuscule PCs – Their typical barrier methods Lecturing you on how their way is the “only way” – “Don’t you realize that these are business critical systems?” is used to justify all manner of excessive purchasing – They will lecture you on availability and scalability at the drop of a hat – Techniques to surmount their barriers Show them the serious speedups possible by collapsing application layers onto fewer machines and removing network time from chatty applications Ask them for estimates on just how much more their application will need to scale, given that it is 7 years old and already in use firm wide

48 Identifying and surmounting barriers Barrier: The entrenched fire fighting squad – Identification clues They offer to work with you, but not today as there is an emergency They position themselves as “the experts” in an application They are hyper-sensitive to any changes in the environment; they view them as “dangerous” “Our conference rooms are named after comic book super heroes!” revisited: when you fly in to interview, everyone is fighting a fire – Their typical barrier methods They position themselves as “must have” team members and then are never available Beware their commitments to make data or specifics available, they will often be “too busy” later to do it in a timely manner, if at all – Techniques to surmount their barriers Agree to work with them as valued members of the team, then ignore them in your plans as they will always be too busy to help anyway Never trust them to come through with a key item, always plan for another way to get what they promise that does not involve them Over time, train them that many of the “time consuming fires” that they fight are simple pile ups of multiple pathologies that won’t bite if addressed in a timely manner

49 Identifying and surmounting barriers Barrier: The overwhelmed, outsource-able and scared – Identification clues They have single functions, often somewhat amorphous, and difficult to tag a dollar value on They are not in politically savvy management’s structures – Their typical barrier methods They stall, seemingly frightened to take on any task without exact instructions from their management They view tasks related to capacity planning as “Not their priority” They view all new functions as threats They seem to ignore all information not generated by their own function – Techniques to surmount their barriers These are politically weak people in politically weak areas, stay away from them so as not to have to rely on them If forced to work with them, work with their manager to emphasize that capacity planning is an important priority that they cannot stall Help the good ones get out of that group

50 Identifying and surmounting barriers Barrier: “This is a database server only” DBAs – Identification clues They claim that “In order to save the firm database license money, we are concentrating the databases from multiple applications on just a few servers” and “nothing else can run on these servers” – Their typical barrier methods Outright refusal to try collapsing micro-applications onto database servers Claim remaining capacity on the 1/3 used database server is “for growth” but are real hard to pin down for specifics, usually because there aren’t any – Techniques to surmount their barriers Try to get them to allow/install only a certain small percentage of application code on their machines due to “a network emergency”. That seems tiny and reasonable. – Use a number like 10% to 20%. They don’t need to know that that was all of the applications that you ever dreamed of doing. Show them how your automated process pathology code works, to ease their fears about rogue applications eating their machines alive and harming other applications Praise them to their boss as “innovative and balanced problem solvers” © Ron Kaminski 2013, All Rights Reserved50

51 Identifying and surmounting barriers Barrier: Lying, manipulative project leaders – Identification clues You are originally asked to model 400 users from a sample of 30. Later they say, “Oh no! We meant 1000 users!” – Their typical barrier methods Some project leaders view themselves as risk minimizers. Sadly, they often feel that 60% excess hardware is a proper sized “cushion”, so they inflate their usage estimate 60% to make the modelers justify excess hardware for them They took 3 extra months to get all these whacky features in, way past their deadline, but now time is an emergency and they need their results immediately or they just need to buy hardware right away because they have no time to test properly – Techniques to surmount their barriers Speed. You can model this stuff far faster than they can get a load test to work without half of those whacky features blowing up Ask more people for how many users really are going to be there © Ron Kaminski 2013, All Rights Reserved51

52 Identifying and surmounting barriers Barrier: Enthusiastic but “We went to Load Runner Class and we absolutely have to run huge saturation load tests” drones – Identification clues They don’t understand that mesa tests and modeling are all that is needed. Even if you can get a decent mesa test out of them, they still want to do a saturated load test anyway They REALLY BELIEVE two seemingly counter intuitive things: 1. Your operations group must run out and buy exactly the machine and memory that they dreamed up from dubious research for their tests 2. They do not have to run against realistic data volumes with similar indexes and size as intended production. They will NEVER create a statistically relevant data source. They will frankly state: “It is impossible!”

53 Identifying and surmounting barriers Barrier: Enthusiastic but “We went to Load Runner Class and we absolutely have to run huge saturation load tests” drones – Their typical barrier methods No matter how many times you say not to, they will always strive to ramp up users at the start and ramp down afterward. Get ready to lose your first and last measurement periods If you can get a realistic transaction mix from them, they will still strive to run them too fast – The 30 second contract review, 8 hours a day story – Techniques to surmount their barriers Always question their user think times, then adjust your model to deal with the silliness that you uncover. Maybe 20% of the samples that I get have realistic transaction arrival rates, so beware Be consistent, over a series of tests you will wear them down, or get them fired
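To see how much an unrealistic think time inflates a test, you can use the interactive response time law, X = N / (R + Z), where N is the number of users, R the response time and Z the think time. The sketch below is illustrative only; the user count, response time and both think times are hypothetical.

```python
def arrival_rate(users, think_time_s, response_time_s):
    """Interactive response time law: X = N / (R + Z), in transactions per second."""
    return users / (response_time_s + think_time_s)


# Hypothetical test: 20 simulated users, 2 second response time.
realistic = arrival_rate(users=20, think_time_s=300, response_time_s=2)
too_fast = arrival_rate(users=20, think_time_s=30, response_time_s=2)

print(f"realistic: {realistic:.3f} tx/s, scripted: {too_fast:.3f} tx/s")
print(f"load inflated {too_fast / realistic:.1f}x by the short think time")
```

The same number of scripted users can represent a far heavier load than real users would produce, which is why the think times in a sample are worth checking before it feeds a model.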

54 A mail message to a new fleet of “Load Runner” enthused contractor drones The purpose of load tests can be manifold, to test functionality, capacity, and “feel”. Modeling based on a sample does the same things and more, and usually much faster and cheaper. If you choose to run a load test, be sure to run a “realistic transaction mix” with the expected blend of all commands, not just one kind. If you are limited to simulating a subset of intended loads by physics (we don’t recommend simulating above 20 users per load running PC for accuracy) we can then take that load and model much higher ones and any alternate hardware that you might dream of. We have these caveats to improve accuracy: 1.Perform the tests on real, not virtual, servers for measurement accuracy 2.Run a proper “mesa test” for sampling which includes: A.Make sure that the CPP group has a collector on your intended test machine days before the test B.Start your test precisely on an hour boundary C.Do not, repeat, DO NOT “ramp up” or “ramp down” users. Just start and go, 20 users per load runner box will not overwhelm anything. Ramping is not required for models, indeed it is wrong to do it. D.Stop precisely on an hour boundary E.Send mail to us telling us I.how many users you simulated II.The precise timings III.How many more users we should add in the models IV.Anything else pertinent © Ron Kaminski 2013, All Rights Reserved54

55 A mail message to a new fleet of “Load Runner” enthused contractor drones 3.The purpose of the test is to produce a flat topped “mesa” of usage that depicts your users acting normally. A graph of CPU consumption should look like a rectangle with a flat steady top, nowhere near saturated. We then take that sample of happy users unconstrained and model what hardware is needed for more happy unconstrained users. 4.Do a “practice run” several days before your real test to flush out issues and tell us so we can see how well you followed mesa instructions 5.DO NOT do any of the following, which will waste your time, ruin the data and cause rework A.DO NOT “ramp up” or “ramp down” usage at the start or end of your tests. It just makes us throw out that data B.DO NOT try to “saturate the machine”. The models will find that saturation load, don’t waste your time. Concentrate on producing an unsaturated load of happy users getting great response times C.DO NOT try to simulate hundreds of users from one PC with one network card. It will fail or worse, produce incorrect data leading to massive errors D.DO NOT create loads with unrealistically fast “think times”. If the user is likely to do a transaction, then wait 5 minutes reading it or processing it, then set the inter-transaction wait time to 5 minutes, not 30 seconds. Remember, your goal is to be realistic, not to have high unrealistic loads. Mesa tests may seem odd at first, but in time you will learn to love mesa tests and their time and cost savings to projects. After a few of them, you’ll never load test the old way ever again. Questions? Please ask, or invite us to your team meetings for a confab! © Ron Kaminski 2013, All Rights Reserved55
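The mesa rules in the message above can be checked automatically once the collector data arrives. This is a minimal sketch, not the tooling used at any firm: the 15% flatness tolerance and the 70% "nowhere near saturated" limit are assumptions chosen for illustration, as are the function names.

```python
from datetime import datetime
from statistics import mean


def on_hour_boundary(ts):
    """True when a test start or stop time falls exactly on the hour."""
    return ts.minute == 0 and ts.second == 0 and ts.microsecond == 0


def looks_like_mesa(cpu_samples, flat_tolerance=0.15, saturation_limit=70.0):
    """Check that CPU % samples form a flat, unsaturated plateau.

    cpu_samples: CPU % per measurement interval during the test.
    flat_tolerance: allowed fractional deviation from the mean (assumed).
    saturation_limit: CPU % treated as too close to saturation (assumed).
    """
    if not cpu_samples:
        return False
    avg = mean(cpu_samples)
    if avg <= 0 or max(cpu_samples) >= saturation_limit:
        return False
    return all(abs(s - avg) <= flat_tolerance * avg for s in cpu_samples)


print(looks_like_mesa([40, 42, 41, 39]))              # steady load
print(looks_like_mesa([10, 25, 40, 41, 40, 25, 10]))  # ramp up and down
print(on_hour_boundary(datetime(2013, 1, 1, 14, 0, 0)))
```

A ramped test fails the flatness check because its first and last intervals sit far below the mean, which is exactly the data that would otherwise be thrown out.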

56 The politics of capacity planning in organizations How to win friends and influence people in the operations group – Set up “being on the capacity planning team” as an aspirational goal, a promotion path, for the operations folks – Try to find an operations or O/S expert at the top of their game and get them assigned to the capacity planning effort These are often the best acolytes and really take well to capacity planning – As the operations staff start to use the capacity planning reporting and pathology detection systems Praise their efforts and successes to management Coach their failures privately – Get them (and their management) to realize that keeping process pathology counts down reduces emergencies and call-outs, and greatly contributes to system stability – Train them on the tools so they start to use them and build new skills If the only users of the capacity planning reports are on the capacity planning team, you are doing something wrong!

57 The politics of capacity planning in organizations How to win friends and influence people in the application development group – In addition to the barriers presented previously, you may also encounter The earnest improver, who takes the time to learn about new technologies and tries to integrate their benefits into their software development lifecycle The non-technical manager, who may never understand all of the math and formulas, but who will be far better at the political skill required for success External vendors whose future profits hinge on success – Try to become an asset to each of these groups make sure that they see you as a willing partner in their success work late on their models help them succeed and get the resources that they need when they need them – Send mail when you work early, late or on the weekends (and CC your boss of course), it shows that you are really trying to help © Ron Kaminski 2013, All Rights Reserved57

58 The politics of capacity planning in organizations How to win over and influence your boss – There are several types of bosses The experienced true believer The unbeliever The unconvinced cost counter – There are techniques to deal with each Your goal is to convert the last two into the first one! – Keeping all happy will involve deploying collectors, generating workload characterized historical consumption web pages and “What if..?” models of future consumption The key is to survive long enough to – get a proper network queuing theory model based software purchased in sufficient quantity to make a difference – Get some applications leadership on your side – keep the last two from canning you before you start to get meaningful results on a large scale © Ron Kaminski 2013, All Rights Reserved58

59 The experienced true believer Usually you have worked with or for this boss before, so they already know – How expensive the tools can be, so they are not shocked – What a reasonable time for results is – How to help enable your success – What battles to fight, and what battles to avoid My last 4 gigs have been for someone who I had either consulted for or worked for – Delivering results delivers career options for you! Characteristics of the experienced believer – Patience – Helps get the software quickly – Helps break through organizational politics to get your collectors quickly deployed – Projects confidence in meetings with other management © Ron Kaminski 2013, All Rights Reserved59

60 The unbeliever These folks (often with a development background) are distrustful of fancy methods like network queuing theory – This is often based on an insecurity, they don’t understand complex tools and thus distrust them – Have made their career by betting on simple solutions and extrapolating linearly – Are often in their position due to management turmoil In several gigs I’ve had non-believers in the management structure above me Characteristics of the non-believer – Initial open contempt of scientific capacity planning methods – Demand results before they help you get collectors in place to answer it with a historical basis – Often will throw CPU and memory at disk I/O slowness – Can be turned, but wow, it sure takes patience! © Ron Kaminski 2013, All Rights Reserved60

61 The unconvinced cost counter These can be great bosses in time, because like scientists, they demand proof before supporting you, but once they have it, they will be true believers They either have no experience with sophisticated capacity planning, or have had running the group forced on them by higher ups who have Characteristics of the unconvinced cost counter – Repeated references early in the process to how much your group and your software costs, and lots of implying that savings results had better surpass that soon – Caution early on, so they will spend the time with other departments getting them to go along with you – Thrive on informational updates, so show steady progress You don’t have to be perfect, just constantly getting better – You’ll know when they switch to true believers when They start buying you more licenses! They stop complaining about costs The “We need to show results!” to “Do you need more licenses?” conversion © Ron Kaminski 2013, All Rights Reserved61

62 Reporting There are a lot of tragically bad business graphics and especially capacity planning reports out there. Issues include: – Graphics that distort the viewer’s perceptions Quasi-3d Black outlines around bar charts Non-calendar displays of long spans of time No color consistency – Foolish consistency may be the hobgoblin of little minds, but it is also the key to getting management to use your site for decision making (don’t pay attention to “little minds” and “management” appearing in the same sentence…) Lots of chrome, little content – Tufte: “Question every pixel. Basically, any pixel that isn’t conveying new data, get rid of it!”

63 Reporting Other issues that limit effectiveness – Multi-page reports that nobody ever reads If your answer is so complex that it requires that much evidence, start over on a new one They paid $10,000! It has to hit the desk with a thud! The “same thud” lives on! – Relying on the untrained user to wade in and find the answers themselves Some you can train, most you can’t If any correlation of graphics requiring memory is needed, forget it – Ron’s Position: Non-web presentations in general are useless relics of a bygone age. Most of your readers’ data comes in hyperlinked form, so get with it or be left behind – Web reports of all nodes in the firm Most users really appreciate ways to see only their span of control

64 Reporting There are also some “Must-haves” – Automated context that graphically highlights when something is out of the ordinary (managers love this stuff) – Automated business and hardware context, ideally driven by your CMDB, that includes Hardware and software specifics Business Purpose Business owner Primary and backup technical contacts Ideally a text description of its business function Other helpful links
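One simple way to produce the "out of the ordinary" highlight is to compare the latest value with history for the same hour of the week. This is a sketch under assumptions, not the author's method: the three standard deviation rule and the function name are illustrative choices.

```python
from statistics import mean, stdev


def out_of_ordinary(history, latest, sigmas=3.0):
    """Flag a value that is far from its own history.

    history: earlier values for the same hour of the week.
    latest: the newest value for that hour.
    sigmas: how many standard deviations count as unusual (assumed).
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return latest != mu
    return abs(latest - mu) > sigmas * sd


history = [20, 22, 21, 19, 20]       # CPU % on previous Mondays at 0900
print(out_of_ordinary(history, 21))  # normal Monday
print(out_of_ordinary(history, 60))  # worth highlighting on the graph
```

The flag can drive a colored band or marker on the chart so managers see the exception without reading numbers.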

65 The Zen of Great Reporting Seek minimalism in all parts of it Reduce graphic clutter Reduce user perceived complexity – Workload color consistency is a simple “must-have” Reduce user choices and actions – If the user needs to know 4 things to make a decision, they had better be close on the same web page Add extra information that lets the user more fully understand odd behaviors and situations – Sorting it by date is nice too Don’t restrict yourself to measured quantities – Workload response time detail is one of the most powerful graphics that you can use © Ron Kaminski 2013, All Rights Reserved65
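Workload color consistency is easy to automate if the color is derived from the workload name rather than from plotting order. A minimal sketch, assuming a fixed palette (the palette values and function name here are hypothetical):

```python
import hashlib

# Hypothetical palette; a real site would choose its own colors.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728",
           "#9467bd", "#8c564b", "#e377c2", "#7f7f7f"]


def workload_color(workload_name):
    """Return the same color for the same workload on every graph.

    A cryptographic digest is used instead of Python's built-in hash()
    because hash() of a string can differ between interpreter runs.
    """
    digest = hashlib.sha256(workload_name.encode("utf-8")).digest()
    return PALETTE[digest[0] % len(PALETTE)]


for name in ("sqlservr", "jlaunch", "lsass"):
    print(name, workload_color(name))
```

Two workloads can land on the same palette entry with this scheme, so a site with many workloads may prefer an explicit name to color table kept alongside the CMDB data.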

66 Reporting Examples © Ron Kaminski 2013, All Rights Reserved66

67 Reporting Examples (UNIX) © Ron Kaminski 2013, All Rights Reserved67

68 Reporting Examples (Windows) © Ron Kaminski 2013, All Rights Reserved68

69 Reporting Examples © Ron Kaminski 2013, All Rights Reserved69

70 Reporting Examples © Ron Kaminski 2013, All Rights Reserved70

71 New! Reporting Examples Windows © Ron Kaminski 2013, All Rights Reserved71

72 New! Reporting Examples UNIX © Ron Kaminski 2013, All Rights Reserved72

73 Reporting Examples © Ron Kaminski 2013, All Rights Reserved73

74 Classic capacity planning question descriptions and proper answering techniques Capacity issues are usually an emergency to someone Roughly 93% of the requests for upgrades are nonsensical if you have any historical workload based resource consumption information – So you have to say no in a way that makes the evidence clear Always give them a way to succeed along with your “no”, remember that they may still have a real problem! – “No, you don’t need CPU or memory, but you are doing 5500 I/Os a second to your slow, locally attached C drive” Can you turn down logging? Can you send those I/Os to fast SAN or RAM drives? Can you get help from your DBA pals? – “No, you don’t need more CPUs, you need to fix those looping processes.”

75 Classic capacity planning question descriptions and proper answering techniques Here is the pattern for this next section: – Real quotes from the users (disguised, slightly) – The evidence – The answer – What happened I want some interaction on these, if you did it better, speak up! Share! That is what CMG is for! These graphs used in the examples are all homebrew perl and GD:Graphics, and they are used at several firms – Yes I will share the code if you want it, but sheesh, you can do better! You are going to want some form of screen graphics capture tool – I use freeware ZScreen, downloadable from many sources, it is fabulous © Ron Kaminski 2013, All Rights Reserved75

76 Classic capacity planning question descriptions and proper answering techniques User quote – “We are keeping these machines rather heavily loaded.” but they won’t tell you why The evidence © Ron Kaminski 2013, All Rights Reserved76

77 Classic capacity planning question descriptions and proper answering techniques The answer – It turns out that this application was on three nodes, two heavily used and one lightly used They wanted a review of each – Is ustca027 too empty? – Is ustwa007 too full? – Is ustca031 too full? Let’s use Relative Response Time by hour to answer them © Ron Kaminski 2013, All Rights Reserved77

78 Is ustwa007 too full? © Ron Kaminski 2013, All Rights Reserved78

79 Is ustca031 too full? © Ron Kaminski 2013, All Rights Reserved79

80 Classic capacity planning question descriptions and proper answering techniques What happened – The users were initially shocked to see that the capacity planners, whom they view as machine stealers for VMs, were recommending that they get more hardware! – Once they started to understand relative response time graphs, they became quite sophisticated at moving workloads around – You’ll know that you’ve converted them when they start asking you if their IO_Wait could be solved if they split them over more drives or better RAID choices The morals of the story – Any vendor can show totals – Favor vendors that show workload characterized historical views of consumption – Favor vendors that can show you workload relative response times, so that your answers make sense to the business

81 Classic capacity planning question descriptions and proper answering techniques We started getting warnings from our automated checks: 10/03/23 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used up to % of an available 400% from 2010/03/23 at 0200 until /03/26 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used up to % of an available 400% from 2010/03/26 at 0000 until /03/27 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used up to % of an available 400% from 2010/03/27 at 0000 until /03/28 CPU_SATURATION_WARNING: Windows2003 node in04sqp001 used up to % of an available 400% from 2010/03/23 at 0300 until The evidence (here’s what the sparkline looked like): © Ron Kaminski 2013, All Rights Reserved81
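An automated check that produces warnings of this kind can be quite small. The sketch below finds contiguous hours where peak CPU is at or above a fraction of what the node has available (400% for a four-CPU node). The 90% threshold, the data layout and the function name are assumptions for illustration, not the checks that generated the warnings above.

```python
def saturation_runs(hourly_cpu, available_pct, threshold=0.90):
    """Find contiguous runs of hours with CPU near saturation.

    hourly_cpu: list of (hour_label, peak_cpu_pct) in time order.
    available_pct: total CPU available, e.g. 400.0 for 4 CPUs.
    threshold: fraction of available CPU treated as saturated (assumed).
    Returns a list of (first_hour, last_hour, peak_pct) tuples.
    """
    limit = threshold * available_pct
    runs = []
    current = None
    for label, pct in hourly_cpu:
        if pct >= limit:
            if current is None:
                current = [label, label, pct]
            else:
                current[1] = label
                current[2] = max(current[2], pct)
        elif current is not None:
            runs.append(tuple(current))
            current = None
    if current is not None:
        runs.append(tuple(current))
    return runs


# Made-up sample data for a 4 CPU node.
sample = [("0100", 120.0), ("0200", 380.0), ("0300", 395.0),
          ("0400", 200.0), ("0500", 370.0)]
for first, last, peak in saturation_runs(sample, available_pct=400.0):
    print("saturated from", first, "until", last, "peak", peak)
```

Reporting one line per run, rather than one line per hour, keeps the morning mail short enough that people actually read it.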

82 Classic capacity planning question descriptions and proper answering techniques More evidence: © Ron Kaminski 2013, All Rights Reserved82

83 Classic capacity planning question descriptions and proper answering techniques My initial suspicions were “Code improvement opportunities” so I contacted my DBA pals:

84 Classic capacity planning question descriptions and proper answering techniques Those CPU graphs with response time increases due to CPU_Wait when they hit the “knee in the curve”: © Ron Kaminski 2013, All Rights Reserved84
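The "knee in the curve" can be illustrated with the simplest queuing formula. This sketch uses a single-server M/M/1 approximation, which is an assumption made for illustration and not the network queuing model behind the graphs; on a multi-CPU node the knee arrives at higher utilization.

```python
def relative_response_time(utilization):
    """Response time divided by service time for an M/M/1 queue.

    R / S = 1 / (1 - U), where U is utilization between 0 and 1.
    A value of 1.0 means no queuing delay at all.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


for u in (0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"{u:.0%} busy -> {relative_response_time(u):.1f}x service time")
```

Under this approximation, going from 50% to 75% busy doubles relative response time, and the last few percent before saturation cost far more than all the utilization that came before.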

85 Classic capacity planning question descriptions and proper answering techniques The answer from my DBA pals: © Ron Kaminski 2013, All Rights Reserved85

86 Classic capacity planning question descriptions and proper answering techniques What happened (the changes went in on Mar 29th):

87 Classic capacity planning question descriptions and proper answering techniques What about the charts Ron? © Ron Kaminski 2013, All Rights Reserved87

88 Classic capacity planning question descriptions and proper answering techniques Things to learn from this example: – Not all code “innovations” work as efficiently as desired SQL developed in far flung places for even farther flung places is especially suspect “When the answer is correct, the code is done”, well maybe not… – Not all innovations will go through a rigid capacity planning review You need either automated warnings or to take the time to scan thousands of graphs often to detect and correct these You need fast graphical evidence to get fast reactions – You need to go out of your way to be nice to DBAs, they will save your firm millions if you let them, and if you only ring them up when there is real evidence of mayhem Always ask their boss to praise their efforts, those memos come in handy at review time © Ron Kaminski 2013, All Rights Reserved88

89 Classic capacity planning question descriptions and proper answering techniques Many of you will be deploying virtual terminal environments to hundreds of users – What if something goes a little wrong? The evidence: © Ron Kaminski 2013, All Rights Reserved89

90 Classic capacity planning question descriptions and proper answering techniques The answer: – We started ticketing suspicious CPU consuming VMware slices on Feb 3rd – Most of it was Bezier curve screen savers! We banned them What happened: – We got back more than half of our VMware farm!

91 Classic capacity planning question descriptions and proper answering techniques User quote: I was wondering if we could get the memory increased on our Exchange 2007 CAS servers USTCAX100 and USTWAX100? Right now both servers are running 4.25GB and I would like to move them to 8GB. We are seeing performance issues with those servers and we are noticing that RAM usage is at 80%-90% or higher all of the time. Users are starting to notice this with Communicator. Due to the fact that it can’t get a response quick enough from CAS, it is putting an exclamation point on the communicator alerting them to address book issues. If we are not able to increase the memory, the only other option would be to add more CAS servers in the environment to balance the load. We also are going to be increasing the load on these servers with the 2000 users we will be adding to the North America environment from the XYZ Co. acquisition and moving South American users to North America servers. Please let me know if this is feasible or not? © Ron Kaminski 2013, All Rights Reserved91

92 Classic capacity planning question descriptions and proper answering techniques The evidence: First, look to see if anything has gone wrong recently They might be reacting to a recent problem, but don’t stop there © Ron Kaminski 2013, All Rights Reserved92

93 Classic capacity planning question descriptions and proper answering techniques The evidence: Looking deeper, we don’t see a memory shortage, (there is evidence of a slight leak) paging is very low, CommitBytes isn’t anywhere near CommitLimit, but … CPU seems in short supply, and the CPU Wait component of relative response time is huge Their short term performance issue is due to CPU shortage, not memory! © Ron Kaminski 2013, All Rights Reserved93

94 Classic capacity planning question descriptions and proper answering techniques The Answer: Along with the graphs from the previous page (and getting them to address the lsass loop) we added two virtual processors to this VMware slice Note that if you disagree with their solution, give them an alternative that fixes present issues We may give them more memory later, when they’ve earned it © Ron Kaminski 2013, All Rights Reserved94

95 Classic capacity planning question descriptions and proper answering techniques What happened: The CPU Wait disappeared immediately The user’s immediate issues were solved The users now know that decisions will be based on evidence, the results will be real, and they like it! Hardware in use for a growing application will grow, but slowly © Ron Kaminski 2013, All Rights Reserved95

96 Classic capacity planning question descriptions and proper answering techniques Hey folks, there is still one more issue, with imjpmig process, the Input Method Editor, which lets you use Japanese characters. It is looping regularly: 10/01/15 LOOP_PROBLEM: 3444 running imjpmig CPU looped from Jan 15 04:59:54 until Jan 15 23:54:53 and may still be looping. 10/01/16 LOOP_PROBLEM: 3444 running imjpmig CPU looped from Jan 16 00:07:48 until Jan 16 23:54:58 and may still be looping. 10/01/21 LOOP_PROBLEM: 5344 running imjpmig CPU looped from Jan 21 13:59:59 until Jan 21 23:54:58 and may still be looping. 10/01/22 LOOP_PROBLEM: 5344 running imjpmig CPU looped from Jan 22 00:01:27 until Jan 22 23:54:56 and may still be looping. 10/01/23 LOOP_PROBLEM: 5344 running imjpmig CPU looped from Jan 23 00:01:25 until Jan 23 23:54:53 and may still be looping. I changed the workload to just highlight Input Method Editor by itself. I also found a bunch of patches available: ownloads&mkt=en- US&FORM=QBME1&l=1&refradio=0&qsc0=0 ownloads&mkt=en- US&FORM=QBME1&l=1&refradio=0&qsc0=0 © Ron Kaminski 2013, All Rights Reserved96 Sometimes your own systems detect problems, so answer in a way that provides all required information

97 Classic capacity planning question descriptions and proper answering techniques Eventually they got the fix migrated to production and everything worked fine from then on – Don’t get discouraged if folks don’t always do what you want immediately – Change controls, priority conflicts and other issues may stall the fix – With enough graphical evidence, eventually you will win! © Ron Kaminski 2013, All Rights Reserved97 What happened?

98 Classic capacity planning question descriptions and proper answering techniques Ron logs in on a Saturday to work on slides for UKCMG (“Again! And what do you get paid to do this?” asks my dear wife) and sees the following: The evidence (from my pathology detection code’s morning mail) CPU saturation found: CPU_SATURATION_WARNING: Windows2000 node ustca337 used up to % of an available 100% from 2010/03/12 at 0400 until CPU_SATURATION_WARNING: Windows2003 node ustwasbx16 used up to % of an available 100% from 2010/03/12 at 1400 until CPU_SATURATION_WARNING: Windows2003 node uktcas06 used up to % of an available 100% from 2010/03/12 at 0300 until CPU_SATURATION_WARNING: Windows2003 node ustca227 used up to % of an available 100% from 2010/03/12 at 0400 until CPU_SATURATION_WARNING: Windows2003 node ustca724 used up to % of an available 100% from 2010/03/12 at 0400 until CPU_SATURATION_WARNING: Windows2003 node ustcas44 used up to % of an available 100% from 2010/03/12 at 0400 until CPU_SATURATION_WARNING: Windows2003 node ustcas54 used up to % of an available 100% from 2010/03/12 at 0400 until CPU_SATURATION_WARNING: Windows2003 node ustca088 used up to % of an available 100% from 2010/03/12 at 0800 until © Ron Kaminski 2013, All Rights Reserved98

99 Classic capacity planning question descriptions and proper answering techniques The evidence continued – Whenever a whole bunch of bad things happen synchronized over many machines, think global tool © Ron Kaminski 2013, All Rights Reserved99

100 Classic capacity planning question descriptions and proper answering techniques The evidence continued – Whenever a whole bunch of bad things happen synchronized over many machines, think global tool © Ron Kaminski 2013, All Rights Reserved100
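The "think global tool" rule can be applied mechanically by grouping warnings by start time and looking for many nodes that share one. A minimal sketch, using the node names and start hours from the morning mail above; the minimum of three nodes and the function name are assumptions.

```python
from collections import defaultdict


def synchronized_warnings(warnings, min_nodes=3):
    """Group warnings by start time to spot a likely common cause.

    warnings: iterable of (node, start) pairs.
    min_nodes: distinct nodes needed to call it synchronized (assumed).
    Returns {start: sorted list of nodes} for the suspicious start times.
    """
    by_start = defaultdict(set)
    for node, start in warnings:
        by_start[start].add(node)
    return {start: sorted(nodes)
            for start, nodes in by_start.items()
            if len(nodes) >= min_nodes}


morning_mail = [
    ("ustca337", "2010/03/12 0400"), ("ustwasbx16", "2010/03/12 1400"),
    ("uktcas06", "2010/03/12 0300"), ("ustca227", "2010/03/12 0400"),
    ("ustca724", "2010/03/12 0400"), ("ustcas44", "2010/03/12 0400"),
    ("ustcas54", "2010/03/12 0400"), ("ustca088", "2010/03/12 0800"),
]
print(synchronized_warnings(morning_mail))
```

Five of the eight nodes started saturating in the same hour, which points at something deployed everywhere rather than at eight separate application problems.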

101 Classic capacity planning question descriptions and proper answering techniques © Ron Kaminski 2013, All Rights Reserved101  This is really bad news, a critical Business Sensitive / Critical production server doing its normal real sqlservr workload with a Tool process going on a CPU binge and causing excessive response times due to CPU_Wait

102 Classic capacity planning question descriptions and proper answering techniques The answer – A new piece of monitoring code was installed BREAKING THE NO NEW CODE INSTALLS ON A FRIDAY rule! What happened – The code creator had deployed a new script, and he reviewed it after getting mail about all of the warnings: ”This was a bug in a script update that I made; we should be seeing this behavior on most of the attached server list. ______ is pushing out an update to the script now; once this is done we’ll have to log into each of the affected servers, verify the looping process is running sqlcheck.vbs, and kill it.” – We were able to swiftly detect and fix the issue How would your site do this? © Ron Kaminski 2013, All Rights Reserved102

103 Classic capacity planning question descriptions and proper answering techniques What we saw: – We started getting Commit_Bytes approaching Commit_Limit warnings: 10/04/05 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 5 18:00:00 until Apr 5 23:59:00 and may still be. 10/04/06 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be. 10/04/07 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 7 00:00:00 until Apr 7 23:59:00 and may still be. 10/04/09 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 9 00:00:00 until Apr 9 23:59:00 and may still be. 10/04/10 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 10 00:00:00 until Apr 10 23:59:00 and may still be. 10/04/11 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 11 00:00:00 until Apr 11 23:59:00 and may still be. 10/04/12 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 12 00:00:00 until Apr 12 23:59:00 and may still be. 10/04/13 COMMIT_BYTES_PROBLEM: Commit Bytes were within 80% of Commit Limit from Apr 13 00:00:00 until Apr 13 23:59:00 and may still be. © Ron Kaminski 2013, All Rights Reserved103
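The commit check itself is a simple ratio. This sketch reads "within 80% of Commit Limit" as committed bytes at or above 80% of the limit, which is an interpretation rather than something the warnings state; the function names are hypothetical.

```python
def commit_fraction(commit_bytes, commit_limit):
    """Committed bytes as a fraction of the commit limit."""
    if commit_limit <= 0:
        raise ValueError("commit limit must be positive")
    return commit_bytes / commit_limit


def commit_pressure(commit_bytes, commit_limit, warn_fraction=0.80):
    """True when committed memory reaches the warning fraction (assumed 80%)."""
    return commit_fraction(commit_bytes, commit_limit) >= warn_fraction


GIB = 1024 ** 3
print(commit_pressure(7 * GIB, 8 * GIB))  # 87.5% of the limit
print(commit_pressure(4 * GIB, 8 * GIB))  # 50% of the limit
```

Running the check per measurement interval and then merging adjacent hits gives the "from ... until ..." style of daily warning.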

104 Classic capacity planning question descriptions and proper answering techniques We investigated, seeing rising total memory: © Ron Kaminski 2013, All Rights Reserved104

105 Classic capacity planning question descriptions and proper answering techniques The evidence, memory by user: © Ron Kaminski 2013, All Rights Reserved105

106 Classic capacity planning question descriptions and proper answering techniques The evidence, memory by leaking process: © Ron Kaminski 2013, All Rights Reserved106

107 Classic capacity planning question descriptions and proper answering techniques The evidence, for the spreadsheet inclined: © Ron Kaminski 2013, All Rights Reserved107

108 Classic capacity planning question descriptions and proper answering techniques The answer: – Clearly this application has a jlaunch process (run by the SAPServicePRG user) memory leak – You have two options: Get them to patch/fix the application, or Get them to reboot the machine periodically so that you don’t start paging hard and affect performance – So you notify the project leader: Hi all, If you look at memory usage over the last few months on these three servers, you’ll see steady and/or repeating ramps. This leads eventually to warnings like these: COMMIT_BYTES_PROBLEM: On ustca146, Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be. COMMIT_BYTES_PROBLEM: On ustca147, Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be. COMMIT_BYTES_PROBLEM: On ustca148, Commit Bytes were within 80% of Commit Limit from Apr 6 00:00:00 until Apr 6 23:59:00 and may still be. …and after that, when commit bytes hits commit limit, you can experience rather severe application slowdowns. In every case, the major rising memory consumer seems to be jlaunch processes run by SAPServicePRG. Most recently: PID 6160 on ustca146 started Mar 2 20:54:58 PID 3772 on ustca147 started Mar 2 20:54:50 PID 8032 on ustca148 started Mar 2 20:54:56 Could someone take a look at these to see if a fix is possible? If not, could we recycle these jlaunch processes, perhaps weekly, to keep memory usage down? Thanks for looking!

109 Classic capacity planning question descriptions and proper answering techniques What happened: Hi Ron, Thank you for keeping an eye on these servers! You are right, there is a steady growth of memory usage by the SAP PRG processes on these application servers. This is not a surprise. There are several known issues regarding memory leaks with the current version of the Java hibernate libraries being used in the fake_name application and old fake_product. We have worked with the application vendor, fake_name, to resolve some of the more significant issues that were causing regular outages. Fake_vendor has not resolved some of the less-severe issues. There are plans to upgrade the entire application suite and change the underlying application execution platform from fake_product to new fake product. The application upgrade includes new libraries for hibernate, and the memory leak issues related to hibernate with fake_product have not appeared in new fake product. The landscape upgrade is currently scheduled for June. We will go ahead and schedule a recycle of the old fake product to recycle the Jlaunch processes you mentioned below. We will schedule regular process recycles until the system is upgraded. Please let me know if you have any additional questions or concerns. Thank you! © Ron Kaminski 2013, All Rights Reserved109

110 Classic capacity planning question descriptions and proper answering techniques What happened: – Memory leaks, key points to remember Graphics help get their attention, CSV files are there for the whackos who demand the real data – Sometimes they say that they need it “to prove to the vendor” » Believe me, the vendor usually knows all too well… – It is easy to do and nips their evasions in the bud – Remember the “stall techniques”? Sometimes they can’t, or aren’t, going to fix it – Welcome to big corporations and “priorities” – Then you need to get them to reboot periodically to get the leaked memory back Do you have the graphs and data quickly available to discover, document and communicate this? © Ron Kaminski 2013, All Rights Reserved110
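A steady memory ramp can be found without anyone looking at a graph by fitting a straight line to a process's daily memory figures. This is a sketch with assumed inputs: one sample per day in MB, a hypothetical growth threshold, and hypothetical function names.

```python
def growth_per_day(daily_values):
    """Least-squares slope for equally spaced daily samples."""
    n = len(daily_values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2.0
    y_mean = sum(daily_values) / n
    num = sum((i - x_mean) * (y - y_mean)
              for i, y in enumerate(daily_values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def looks_like_leak(daily_mb, min_growth_mb_per_day=1.0):
    """True when memory grows at least min_growth_mb_per_day (assumed)."""
    return growth_per_day(daily_mb) >= min_growth_mb_per_day


print(growth_per_day([100, 110, 120, 130]))   # steady ramp
print(looks_like_leak([500, 500, 501, 499]))  # flat, just noise
```

A single slope will understate a sawtooth caused by periodic recycles, so in practice the fit is better applied to the samples since the process last started.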

111 We have this really cool way to see all of the server’s disk space for the last 90 days © Ron Kaminski 2013, All Rights Reserved111

112 Classic capacity planning question descriptions and proper answering techniques The evidence: Subject: Possible disk space issue looming on ustca479 Hi All, Here is a view of total disk space and disk space used on ustca479: Perhaps some purge/delete/cleanup is in order? Ron Kaminski © Ron Kaminski 2013, All Rights Reserved112

113 Classic capacity planning question descriptions and proper answering techniques The answer: Subject: RE: Possible disk space issue looming on ustca479 Ron, Thank you for the heads up. The increased disk space utilization is partially due to enhanced logging that we have enabled over the past few months. I have cleaned up some old logs and we will continue to monitor the disk utilization to determine if additional disk space is required. Thanks, Matt © Ron Kaminski 2013, All Rights Reserved113

114 Classic capacity planning question descriptions and proper answering techniques What happened: Well, It was a start! But alas, note the inexorable rise beginning again after the clean up. © Ron Kaminski 2013, All Rights Reserved114
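When the rise resumes after a cleanup, a rough "days until full" number gets more attention than a graph alone. The sketch below extrapolates linearly from the samples since the last cleanup; linear growth is an assumption, and the function name and sample figures are made up for illustration.

```python
import math


def days_until_full(used_gb, total_gb):
    """Estimate days until a file system fills, by linear extrapolation.

    used_gb: daily used-space samples since the last cleanup, oldest first.
    total_gb: total size of the file system.
    Returns None when usage is flat or shrinking.
    """
    if len(used_gb) < 2:
        raise ValueError("need at least two samples")
    rate = (used_gb[-1] - used_gb[0]) / (len(used_gb) - 1)
    if rate <= 0:
        return None
    remaining = total_gb - used_gb[-1]
    return max(0, math.ceil(remaining / rate))


print(days_until_full([60, 62, 64, 66], total_gb=100))  # growing 2 GB a day
print(days_until_full([70, 70, 69], total_gb=100))      # not growing
```

Putting that number in the subject line of the next mail makes the follow-up conversation about additional disk space or log retention much shorter.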

115 Classic capacity planning question descriptions and proper answering techniques The best way to deal with these is to avoid them proactively by making great, workload characterized consumption information available to all – Train your firm to use the capacity reporting and pathology detection systems You have automated pathology detection, all the way through ticketing issues, haven’t you? – Think graphics, not tables of numbers – If only a secret club know the capacity data, you are making a big mistake – Train OS support folks to use the “What if…?” models © Ron Kaminski 2013, All Rights Reserved115

116 Summary Proper capacity planning isn’t just running a product that produces screens of output to check off against a requirements list – Proper capacity planning data, properly presented, is useful to all strata of IT and even the end users While it does require investment and knowledge, the rewards can be immense Keep coming to CMG to stay current and see past the hype! – Get better at this than me and get up here and share how you do it with everyone else! – Write a paper for next year’s conference!

117 The Book Shelf The Visual Display of Quantitative Information, by Edward R. Tufte, published by The Graphics Press – You will never look at vendor resource consumption graphics products the same way again – Join me in the “Ban 3-D and bouncing lines with oversize dots to represent sampled quantities” movement! – If you don’t already have it, run from the room right now and go buy it, it is that good Envisioning Information, by Edward R. Tufte, published by The Graphics Press – More great ways to think about human perception of numbers Handouts from any of Tufte’s lectures

118 The Book Shelf Learning Perl, by Schwartz & Christiansen, published by O’Reilly & Associates Inc., the llama book (Learn Perl in a few cross-country flights!) Programming Perl, by Wall, Christiansen & Orwant, published by O’Reilly & Associates Inc., the camel book (This book is a must-have) Graphics Programming with Perl, by Verbruggen, published by Manning Publications (This book really helped me a lot when I decided to make my own reporting system, check out the online sample chapters)

119 The Book Shelf Cascading Style Sheets, The Definitive Guide, by Meyer, published by O’Reilly & Associates Inc., the salmon book HTML & XHTML, The Definitive Guide, by Musciano & Kennedy, published by O’Reilly & Associates Inc., the koala book Perl Graphics Programming, by Wallace, published by O’Reilly & Associates Inc., the colobus monkey book CGI Programming with Perl, by Guelich, Gundavaram & Birznieks, published by O’Reilly & Associates Inc., the mouse book

120 The Book Shelf New Operating Systems on your plate? – Windows® Internals: Including Windows Server 2008 and Windows Vista, Fifth Edition, by Russinovich, Solomon & Ionescu, published by Microsoft Press; 5th edition (June 17, 2009) – Unix Power Tools, Third Edition, by Powers, Peek, O’Reilly & Loukides, published by O’Reilly & Associates Inc., the power drill book

121 General Questions? Rules: – No “Which vendor…” questions! All vendors do great things, often in different ways, and in ways that change over time. Effective use comes from deep understanding of their methods. CMG is a great place to ask the vendors those questions and keep current! That is what the nightly drinking time is for! ;^)) – No “Which client did that…” questions! They may be in the audience!

122 Stump The Modeler! I’ve dumped a lot of material on you today, and some among you have some great questions that may bring a nuance into focus – Ask them! Or you might see things differently than I do – Write your own course! I’ll come and see it! Make sure to use a lot of graphs!

123 Oh Boy! Legalese Any process names, product names, lyrics, trademarks or commercial products mentioned are the property of their respective owners All opinions expressed are those of the author, not any of the author’s present or past employers Any ideas from this paper implemented by the reader are done at their own risk. The author and/or his present or past employers assume no liability or risk arising from activities suggested in this paper. Work safe, and have a good time!

124 Give Blood! I have, regularly, since I turned 16, every 8 weeks, it is good for you and society!

125 Thank You So Much For Listening! Write A Paper For CMG!

