Presentation transcript: Scalability Tools: Automated Testing (30 minutes)

1 Scalability Tools: Automated Testing (30 minutes) Overview Hooking up your game  external tools  internal game changes Applications & Gotchas  engineering, QA, operations  production & management Summary & Questions

2 Review: controlled tests & actionable results are useful for many purposes. (1) Repeatable tests, using N synchronized game clients. (2) High-level, actionable reports for many audiences (Programmer, Development Director, Executive). [Diagram: the test harness drives the game; reports flow to each audience.]

3 Handout notes: automated testing is a strong tool for large-scale games!  Pushbutton, large-scale, repeatable tests  Benefits  Accurate, repeatable, measurable tests during development and operations  Stable software, faster, measurable progress  Base key decisions on fact, not opinion  Augment your team’s ability to do their jobs, find problems faster  Measure / change / measure: repeat  Increased developer efficiency is key  Get the game out the door faster, with higher stability & less pain

4 Handout notes: more benefits of automated testing  Comfort and confidence level  Managers/Producers can easily judge how development is progressing  Just like bug count reports, test reports indicate overall quality of current state of the game  Frequent, repeatable tests show progress & backsliding  Investing developers in the test process helps prevent QA vs. Development shouting matches  Smart developers like numbers and metrics just as much as producers do  Making your goals – you will ship cheaper, better, sooner  Cheaper – even though initial costs may be higher, issues get exposed when it’s cheaper to fix them (and developer efficiency increases)  Better – robust code  Sooner – “it’s ok to ship now” is based on real data, not supposition

5 Automated testing accelerates large-scale game development & helps predictability. [Chart: TSO case study of developer efficiency, plotting % complete vs. time from project start to target launch; strong test support (autoTest) reaches the target and gives a better game earlier, while weak test support slips past the ship date ("oops").]

6 Measurable targets & projected trends give you actionable progress metrics, early enough to react. [Chart: any test metric (e.g. # clients) vs. time, plotting the first passing test, the value now, and the projected trend against the target at any milestone (e.g. Alpha); a projected miss ("oops") shows up early enough to react.]

7 Success stories  Many game teams work with automated testing  EA, Microsoft, any MMO, …  Automated testing has many highly successful applications outside of game development  Caveat: there are many ways to fail…

8 How to succeed  Plan for testing early  Non-trivial system  Architectural implications  Fast, cheap test coverage is a major change in production, be willing to adapt your processes  Make sure the entire team is on board  Deeper integration leads to greater value  Kearneyism: “make it easier to use than not to use”

9 Automated testing components. Test Manager: test selection/setup, startup & control of N clients, RT probes into any game. Scriptable Test Client(s): emulated user play session(s), multi-client synchronization, repeatable, sync’ed test I/O. Report Manager (collection & analysis): raw data collection, aggregation / summarization, alarm triggers.
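A minimal sketch of how these three pieces fit together, assuming a Python test harness; the scripted-client steps, the in-memory results queue, and every function name here are illustrative rather than anything prescribed by the talk:

```python
# Illustrative sketch only: Test Manager + scripted clients + Report Manager.
import threading
import queue
import time

results = queue.Queue()

def scripted_client(client_id, script):
    """Emulated play session: run each scripted step and report pass/fail."""
    for step in script:
        ok = step(client_id)          # a step returns True on success
        results.put((client_id, step.__name__, ok, time.time()))

def step_login(client_id):            # placeholder command step
    return True

def step_check_inventory(client_id):  # placeholder validation step
    return True

def test_manager(num_clients, script):
    """Start N synchronized clients, wait for completion, aggregate results."""
    clients = [threading.Thread(target=scripted_client, args=(i, script))
               for i in range(num_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    # Report Manager: aggregate raw data into a high-level summary.
    rows = []
    while not results.empty():
        rows.append(results.get())
    failures = [r for r in rows if not r[2]]
    print(f"{len(rows)} steps run, {len(failures)} failures")

if __name__ == "__main__":
    test_manager(num_clients=4, script=[step_login, step_check_inventory])
```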

10 Input systems for automated testing: algorithmic, recorders, scripted, all driving game code. Multiple test applications are required, but each input type differs in value per application. Scripting gives the best coverage.

11 Hierarchical automated testing: unit, subsystem, and full-system levels. Multiple levels of testing give you faster ways to work with each level of code; incremental testing avoids noise & speeds defect isolation.

12 Handout notes: Input systems for automated testing  Multiple forms of input sources  Multiple sets of test types & requirements  Make sure the input technology you pick matches the test types you need  Cost of systems, types of testing required, support, cross-team needs, …  A single, data-driven autoTest system is usually the best option

13 Handout notes: Input sources (algorithmic)  Powerful & low development cost  Exploits game semantics for test coverage  Highly useful for some test types, but limited verification  E.g.: for each …: CreateNewAvatar; for each …: BuyAndPlaceAllObjects; for each …: UseAllActions  Broad, shallow test of all object-based content (see the sketch below)  Combine with automated errorManagers and/or currentObject.selfTest() to increase verification
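A sketch of what such an algorithmic sweep can look like in Python; game_db, buy_and_place, self_test and error_manager are hypothetical hooks standing in for whatever your engine actually exposes:

```python
# Hypothetical hooks: exploit game semantics for broad, shallow coverage
# of all object-based content, with limited verification per step.
def broad_shallow_content_sweep(game_db, error_manager):
    for avatar_type in game_db.avatar_types():
        avatar = game_db.create_new_avatar(avatar_type)
        for obj in game_db.all_objects():
            placed = game_db.buy_and_place(avatar, obj)
            for action in placed.all_actions():
                placed.use(avatar, action)
                # Verification is limited; pair the sweep with an automated
                # errorManager and/or the object's own self-test.
                if not placed.self_test():
                    error_manager.report(avatar_type, obj, action)
```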

14 Handout notes: Input (recorders)  Internal event pump / external UI actions  Both are brittle to maintain  Neither can effectively support load or multi-client synchronization, and are limited for regression testing  Best use: capturing defects that are hard to reproduce, effective in overnight random testing of builds & some play testing  Semantic recorders are much less brittle and more useful

15 Input (Scripted Test Clients): a pseudo-code script of how a user plays the game, and what the game should do in response. Command steps: createAvatar [sam], enterLevel 99, buyObject knife, attack [opponent]. Validation steps: checkAvatar [sam exists], checkLevel 99 [loaded], checkInventory [knife], checkDamage [opponent].

16 Handout notes: scripted test clients  Scripts are emulated play sessions: just like somebody plays the game  Command steps: what the player does to the game  Validation steps: what the game should do in response  Scripted clients are flexible & powerful  Use for many different test types  Quick & easy to write tests  Easy for non-engineers to understand & create
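As a sketch, the slide's pseudo-script could be expressed as data plus a small driver; the run_script() helper and the client.do()/client.check() calls are assumptions, not a real API:

```python
# Sketch: each entry is a command step (what the player does) or a
# validation step (what the game should do in response). The client
# interface used here is assumed.
EXAMPLE_SCRIPT = [
    ("command",  "createAvatar",   {"name": "sam"}),
    ("command",  "enterLevel",     {"level": 99}),
    ("command",  "buyObject",      {"object": "knife"}),
    ("command",  "attack",         {"target": "opponent"}),
    ("validate", "checkAvatar",    {"name": "sam", "exists": True}),
    ("validate", "checkLevel",     {"level": 99, "loaded": True}),
    ("validate", "checkInventory", {"object": "knife"}),
    ("validate", "checkDamage",    {"target": "opponent"}),
]

def run_script(client, script):
    """Execute command steps, then assert each validation step."""
    failures = []
    for kind, step, args in script:
        if kind == "command":
            client.do(step, **args)
        else:
            if not client.check(step, **args):
                failures.append((step, args))
    return failures   # empty list == the script passed
```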

17 Handout notes: scripted test clients  Scriptable test clients  Lightweight subset of the shipping client  Instrumented – spits out lots of useful information  Repeatable  Embedded “automated debugging” support helps you understand the test results  Log both server and client output (common format), w/timestamps!  Automated metrics: collection & aggregation  High level “at a glance” reports with detail drill down  Build in support for hung clients & triaging failures

18 Handout notes: scripted test client  Support costs: one (data driven) client better than N test systems  Tailorable validation output is a very powerful construct  Each test script contains required validation steps (flexible, tunable, …)  Minimize state to regress against == fewer false positives  Presentation layer tip: build a spreadsheet of key word/actions used by your manual testers, automate the most common/expensive

19 Scripted players: implementation. [Diagram: the Script Engine issues commands and reads state through a Presentation Layer; a Test Client (null view) loads only the game logic, a Game Client loads the game GUI as well, or load both.]

20 Test-specific input & output via a data-driven test client gives maximum flexibility. [Diagram: reusable scripts & data feed the test client's Input API (regression, load); its Output API produces script-specific logs & metrics (key game states, pass/fail, responsiveness).]

21 A Presentation Layer is often unique to a game (NullView client)  Some automation scripts should read just like QA test scripts for your game  TSO examples  routeAvatar, useObject  buyLot, enterLot  socialInteraction (makeFriends, chat, …)
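A sketch of such a game-specific presentation layer in Python: high-level verbs that read like the QA script keywords above are translated into lower-level engine commands; the engine interface shown is hypothetical:

```python
# Hypothetical presentation layer for TSO-style verbs; engine.send() stands
# in for whatever command channel the null-view or GUI client exposes.
class TsoPresentationLayer:
    def __init__(self, engine):
        self.engine = engine                      # null-view or GUI client

    def routeAvatar(self, avatar, destination):
        self.engine.send("route", avatar=avatar, dest=destination)

    def buyLot(self, avatar, lot_id):
        self.engine.send("buy_lot", avatar=avatar, lot=lot_id)

    def enterLot(self, avatar, lot_id):
        self.engine.send("enter_lot", avatar=avatar, lot=lot_id)

    def socialInteraction(self, avatar, other, kind="chat"):
        self.engine.send("social", avatar=avatar, target=other, kind=kind)
```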

22 Handout notes: scriptable clients are tailorable for many applications: engineering, QA and management  Unit testing: 1 feature = 1 script  Load testing: representative play session, times 1,000s  Make sure your servers work, before the players do  Integration: test code changes for catastrophic failures  Build stability: quickly find problems and verify the fix  Content testing: exhaustive analysis of game play to help tuning, ensure all assets are correctly hooked up, and explore edge cases  Multi-player testing: engineers and QA can test multi-player game code without requiring multiple manual testers  Performance & compatibility testing: repeatable tests across a broad range of hardware give you a precise view of where you really are  Project completeness: how many features pass their core functionality tests; what are our current FPS, network lag and bandwidth numbers, …

23 Input (data sets)  Mock data: repeatable tests in development, faster load, edge conditions  Real data: the unpredictable user element finds different bugs  Random: edge cases & real-world performance  Repeatable: debugging & benchmarking
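One way to get both repeatable and random runs from the same data source is to drive selection with an explicit seed; this small Python sketch is an illustration, not part of the original material:

```python
# Illustrative only: a fixed seed makes a run repeatable (debugging &
# benchmarking); no seed gives random coverage (different bugs each run).
import random

def pick_test_accounts(accounts, count, seed=None):
    rng = random.Random(seed)          # seed=None -> different data each run
    return rng.sample(accounts, count)

repeatable = pick_test_accounts(range(10_000), 50, seed=42)   # same every run
exploratory = pick_test_accounts(range(10_000), 50)           # varies per run
```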

24 Input (client synchronization)  RemoteCommand(x): ordered actions to clients; most realistic & flexible  waitUntil(localStateChange)  waitFor(time): brittle, less reproducible
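A sketch contrasting the two styles: a coordinator pushing ordered RemoteCommand steps (with an acknowledgement barrier) versus a timing-based waitFor. The client transport methods are assumptions:

```python
# Sketch only: c.send() and c.wait_for_ack() are assumed transport calls.
import time

class Coordinator:
    def __init__(self, clients):
        self.clients = clients

    def remote_command(self, command, **args):
        """Ordered action: every client runs this step before the next one."""
        for c in self.clients:
            c.send(command, **args)
        for c in self.clients:
            c.wait_for_ack(command)        # barrier: all clients finished

def wait_for(seconds):
    """waitFor(time): brittle and less reproducible, because timing drifts
    between machines and runs; prefer the ordered style above."""
    time.sleep(seconds)
```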

25 Common Gotchas  Not designing for testability  Retrofitting is expensive  Blowing the implementation  Brittle code  Addressing perceived needs, not real needs  Using automated testing incorrectly  Testing the wrong thing @ the wrong time  Not integrating with your processes  Poor testing methodology

26 Testing the wrong thing at the wrong time  Applying detailed testing while the game design is still shifting and the code is still incomplete introduces noise and the need to keep re-writing tests  Build Acceptance Tests (BAT)  Stabilize the critical path for your team  Keep people working by keeping critical things from breaking  Final Acceptance Tests (FAT)  Detailed tests to measure progress against milestones  “Is the game done yet?” tests need to be phased in

27 More gotchas: poor testing methodology & tools  Case 1: recorders  Load & regression were needed; not understanding maintenance cost  Case 2: completely invalid test procedures  Distorted view of what really worked (GIGO)  Case 3: poor implementation planning  Limited usage (nature of tests led to high test cost & programming skill required)  Case 4: not adapting development processes  Common theme: no senior engineering analysis committed to the testing problem

28 Handout notes: more gotchas  Automating too late, or too much detail too early  No ability to change the development process of the game  Not having ways to measure the effects compared to no automation  People and processes are funny things  Sometimes the process is changed, and sometimes your testing goals have to shift  Games differ: a lot  autoTest approaches will vary across games

29 Handout notes: BAT vs FAT  Feature drift == expensive test maintenance  Code is built incrementally: reporting failures nobody is prepared to deal with yet wastes everybody’s time  Automated testing is a new tool, new concept: focus on a few areas first, then measure, improve, iterate

30 Automated Testing for Online Games Overview Hooking up your game  external tools  internal game changes Applications  engineering, QA, operations  production & management Summary & Questions

31 Handout notes: Applying automated testing  Know what automation is good / not good at & play to its strengths  Change your processes around it  Establish clear measures, iteratively improve  Make sure everybody can use it & has bought into it  Tests become a form of communication

32 Automated testing: strengths  “The difference between us and a computer is that the computer is blindingly stupid, but it is capable of being stupid many, many millions of times a second.” Douglas Adams (1997 SCO Forum)  Repeat massive numbers of simple, easily measurable tasks  Mine the results  Do all the above, in parallel, for rapid iteration

33 Handout notes: autoTest complexity  Automation breaks down as individual test complexity increases  Repeating simple tests hundreds of times and combining the results is far easier to maintain and analyze than using long, complex tests, and parallelism allows a dramatically accelerated test cycle

34 Semi-automated testing is best for game development  Testing requirements split between automation and manual testing  Automation: rote work (“does door108 still open?”), scale, repeatability, accuracy, parallelism  Manual testing: creative bug hunting, visuals, judgment calls, playability, reacting to change, evaluating autoTest results  Integrate the two for best impact

35 Handout notes: Semi-automated testing  Automation: simple tasks (repetitive or large-scale)  Load @ scale  Workflow & information management  Regression: all weapon damage / broad, shallow feature coverage / …  Integrated automated & manual testing  Tier 1 / Tier 2: automation flags potential errors, manual testers investigate  Within a single test: automation snapshots key game states, manual testers evaluate results  Augmented / accelerated: complex build steps, full level play-through, …

36 Plan your attack with stakeholders (retire risk early: QA, Production, Management)  Tough shipping requirements (e.g.)  Scale, reliability  Regression costs  Development risk  Cost / risk of engineering & debugging  Impact on content creation  Management risk  Schedule predictability & visibility

37 Handout notes: plan your attack  What are the big costs & risks on your project?  Technology development (e.g., scalable servers)  Breadth of content to be regressed, frequency of regressions  Your development team is significantly handicapped without automated tests & multi-client support: focus on production support to start  Often, sufficient machines & QA testers are not available  Run-time debugging of networked games often becomes post-mortem debugging: slower & harder

38 Factors to consider  Test applications: unit, subsystem, full system; game logic, graphics  Test characteristics: frequency of use, repeatable / random, creation & maintenance $$, execution $$, manual $$, overlap w/ other tests

39 Handout notes: design factors  Test overlap & code coverage  Cost of running the test (graphics high, logic/content low) vs frequency of test need  Cost of building the test vs manual cost (over time)  Maintenance cost of the test suites, the test system, & churn rate of the game code

40 Automation focus areas (Larry’s “top 5”)  Performance: scale is hard to get right  Critical path stability: keep the team going forward  Non-determinism: gets in the way of everything  Content regression: massive, recurring $$  Compatibility & install: improves life for you & the user

41 Handout notes: automation focus areas (recommendations)  Full system scale/stability testing  Multi-client & server code must always function, or the team slows down  Hardest part to get right (and to debug) when running live players  Scale will screw you, over and over again…  Non-determinism  Difficulty in debugging slows development and hurts system reliability  Content regression  Build stability  Complex systems & large development teams require extra care to keep running smoothly, or you’ll pay the price in slower development and more antacids  And for some systems, compatibility testing or installer testing  A data-driven system is very important: you can cover all the above with one test system

42 Yikes, that all sounds very expensive!  Yes, but remember: the alternative costs are higher and do not always work  Costs of QA for a 6-player game – you need at least 6 testers at the same time  Testers  Consoles, TVs, disks & network  Non-determinism  MMO regression costs: yikes²  10s to 100s of testers  10-year code life cycle  Constant release iterations

43 Unstable builds are expensive & slow down your entire team!  [Diagram: development pipeline, checkin → build → smoke → regression → dev servers; a bug introduced at checkin isn’t detected until hours (or days) later.]  Repeated cost of detection & validation  Firefighting, not going forward  Impact on others

44 Stability: keep the team working! (TSO use case: critical path analysis)  Failures on the critical path block access to much of the game  Test case: can an avatar sit in a chair?  Critical path: login() → create_avatar() → buy_house() → enter_house() → buy_object() → use_object()
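Expressed as a scripted test, the critical path might look like the following sketch; each step depends on the one before it, so the first failing step pinpoints where the path is blocked (the client API is assumed):

```python
# Sketch of the "can an avatar sit in a chair?" critical-path test;
# client.do() is an assumed scripted-client call.
CRITICAL_PATH = [
    ("login",         {}),
    ("create_avatar", {"name": "sam"}),
    ("buy_house",     {}),
    ("enter_house",   {}),
    ("buy_object",    {"object": "chair"}),
    ("use_object",    {"object": "chair", "action": "sit"}),
]

def run_critical_path(client):
    for step, args in CRITICAL_PATH:
        if not client.do(step, **args):
            # Everything after this step is unreachable for the team.
            return f"critical path blocked at {step}"
    return "pass"
```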

45 Prevent critical path code breaks that take down your team  [Diagram: candidate code → sniff test (pass / fail, diagnostics) → safe code → checkin → development.]

46 Stability & non-determinism (monkey tests)  [Diagram: code repository → compilers → reference servers, with continual repetition of critical path unit tests running against them.]

47 Handout notes: build stability  Poor build stability slows forward progress (especially the critical path)  People are blocked from getting work done  Uncertainty: did I bust it, or did it just ‘happen’?  A lot of developers just didn’t get non-determinism  Backsliding: things kept breaking  Monkey Tests: “always current” baseline for developers  Common measuring stick across builds & deployments extremely valuable  Monkey tests rock!  Instant trip wire of problems & focusing device  Server aging: fill the pipes, get some buffers dirty  Keeps wheels in motion while developers use those servers  Accurate measure of race condition bugs
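A sketch of the monkey-test loop described above, reusing the run_critical_path() sketch from earlier; the alert hook and the hourly interval are illustrative assumptions:

```python
# Illustrative monkey test: continually repeat the critical-path test
# against the reference servers and flag any change from "pass".
import time

def monkey_test(make_client, interval_seconds=3600, alert=print):
    last_result = None
    while True:
        result = run_critical_path(make_client())
        if result != "pass":
            alert(f"monkey test failed: {result}")   # instant trip wire
        elif last_result not in (None, "pass"):
            alert("monkey test recovered")
        last_result = result
        time.sleep(interval_seconds)                 # hourly baseline
```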

48 Build stability & full testing: comb filtering  New code → Sniff Test, Monkey Tests ($): fast to run, catch major errors, keep coders working  Full system build → Smoke Test, Server Sniff ($$): is the game playable? are the servers stable under a light load? do all key features work?  Promotable to full testing → Full Feature Regression, Full Load Test ($$$): do all test suites pass? are the servers stable under peak load conditions? → Promotable  Cheap tests catch gross errors early in the pipeline; more expensive tests only run on known functional builds
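The comb filter can be read as a short promotion function: cheap stages run on every change, expensive stages only on builds that already passed. The stage functions below are placeholders for your real suites:

```python
# Sketch: each stage function takes a build and returns True on pass.
def comb_filter(build, sniff_test, smoke_test, full_regression, full_load):
    if not sniff_test(build):          # $   fast, catches gross errors
        return "rejected at sniff"
    if not smoke_test(build):          # $$  is the game playable?
        return "rejected at smoke"
    if not (full_regression(build) and full_load(build)):   # $$$
        return "rejected at full testing"
    return "promotable"
```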

49 Handout notes: build stability  Much faster progress after stability checkers added  Sniff  Hourly reference tests (sniff monkey, unit monkey)  Comb filters kept the manpower overhead low (on both sides), and gave quick feedback  Fewer redos for engineers, fewer bugs for QA to find & process  Size of team gives high broken-build cost  Fewer side-effect bugs  …

50 Handout notes: dealing with stability  Hourly stability checkers (monkey tests)  Aging (dirty processes, growing datasets, leaking memory)  Moving parts (race conditions)  Stability measure: what works, right now?  Flares go off, etc  Unit tests (against Features)  Minimal noise / side effects  Reference point: what should work?  Clarity in reporting / triaging

51 Handout notes: non-determinism is a big risk factor in online development  Race conditions, dirty buffers, shared state, …  Developers test with a single client against a single server: no chance to expose race conditions  Fuzzy data views over networked connections further complicate implementation & debugging  Real-time debugging is replaced with post-mortem analysis

52 Handout notes: the effects of non-determinism  Multiple CPUs / players greatly complicate development & testing, while also increasing system complexity  You can’t reliably reproduce bugs  Near-infinite code path coverage + variable latency & transactions over time introduce massive code complexity, very hard to get right  Also hard to test edge cases or broad coverage  Each test can execute differently over any run

53 AutoTest addresses non-determinism  Detection & reproduction of race condition defects  Even low probability errors are exposed with sufficient testing (random, structured, load, aging)  Measurability of race condition defects  Occurs x% of the time, over 400x test runs
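Measuring rather than arguing about a flaky failure can be as simple as repeating the test and reporting the observed rate; a minimal sketch:

```python
# Sketch: run the same test many times and report the observed failure
# rate (the "occurs x% of the time over 400x test runs" number above).
def measure_flakiness(run_test, runs=400):
    failures = sum(0 if run_test() else 1 for _ in range(runs))
    rate = 100.0 * failures / runs
    print(f"{failures}/{runs} runs failed ({rate:.1f}%)")
    return rate
```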

54 Monkey test: enterLot ()

55 Monkey test: 3 * enterLot ()

56 Four different behaviors in thirty runs!

57 Handout notes: non-deterministic failures  30 test runs, 4 behaviors  Successful entry  Hang or crash  Owner evicted, all possessions stolen  Random results observed in all major features  “Critical path” random failures outside of unit tests are very difficult to track

58 Content testing (areas)  Regression  Error detection  Balancing / tuning  This topic is a tutorial in and of itself  Content regression is a huge cost problem  Many ways to automate it (algorithmic, scripted & combined, …)  Differs wildly across game genres

59 Content testing (more examples)  Light mapping, shadow detection  Asset correctness / sameness  Compatibility testing  Armor / damage  Class balances  Validating against old userData  … (unique to each game)

60 Load testing, before paying customers show up  Expose issues that only occur at scale  Establish hardware requirements  Establish that play is acceptable @ scale

61 Handout notes: some examples of things caught with load testing  Non-scalable algorithms  Server-side dirty buffers  Race conditions  Data bloat & clogged pipes  Poor end-user performance @ scale  … you never really know what, but something will always go “spang!” @ scale…

62 Load testing catches non-scalable designs  Global data (SP): all data is always available & up to date  Global data (MP): shared data must be packaged, transmitted, unpackaged, and constantly refreshed alongside local data  Scalability is hard: shared data grows with # players, AI, objects, terrain, …, & more bugs!

63 Handout notes: why you need load testing  SP: all information is always available  MP: shared information must be packaged, transmitted and unpackaged  Each step costs CPU & bandwidth, and can happen 10s to 100s of times per minute  May also cause additional overhead (e.g. DB calls)  Scalability is key: many shared data structures grow with the number of players, AI, objects, terrain, …  Caution: early prototypes may be cheap enough, but as the game progresses, costs may explode

64 Handout notes: why you need load testing  Case 1, initial design: Transmit entire lotList to all connected clients, every 30 seconds  Initial fielding: no problem  Development testing: < 1,000 Lots, < 10 clients  Complete disaster as clients & DB scaled  Shipping requirements: 100,000 Lots, 4,000 clients  DO THE MATH BEFORE CODING  LotElementSize * LotListSize * NumClients  20 Bytes * 100,000 * 4,000  8,000,000,000 Bytes, TWICE per minute!!

65 Load testing: find poor resource utilization  [Chart: 22,000,000 DS queries from one subsystem vs. 7,000 for the next highest.]

66 Load: test both client & server behaviors

67 Handout notes: automated data mining / triage  Test results: Patterns of failures  Bug rate to source file comparison  Easy historical mining & results comparison  Triage: debugging aids that extract RT data from the game  Timeout & crash handlers  errorManagers  Log parsers  Scriptable verification conditions
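As one illustration of the log-parsing side of triage, a small sketch that scans common-format, timestamped logs for error lines and summarizes the most frequent failure patterns; the log format assumed here is hypothetical:

```python
# Sketch: the "TIMESTAMP ERROR SOURCE MESSAGE" line format is an assumption.
import collections
import re

ERROR_LINE = re.compile(r"^(?P<ts>\S+)\s+ERROR\s+(?P<source>\S+)\s+(?P<msg>.*)$")

def summarize_failures(log_paths):
    counts = collections.Counter()
    for path in log_paths:
        with open(path) as f:
            for line in f:
                m = ERROR_LINE.match(line)
                if m:
                    counts[(m.group("source"), m.group("msg"))] += 1
    # Most frequent failure patterns first -- the starting point for triage.
    for (source, msg), n in counts.most_common(10):
        print(f"{n:6d}  {source}  {msg}")
```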

68 Automated Testing for Online Games (One Hour) Overview Hooking up your game  external tools  internal game changes Applications  engineering, QA, operations  production & management Summary & Questions

69 Summary: automated testing  Start early & make it easy to use  Strongly impacts your success  The bigger & more complex your game, the more automated testing you need  You need commitment across the team  Engineering, QA, management, content creation

70 Q&A & other resources  My email: larry.mellon_@_emergent.net  More material on automated testing for games  http://www.maggotranch.com/mmp.html  Last year’s online engineering slides  This year’s slides  Talks on automated testing & scaling the development process  www.amazon.com: “Massively Multiplayer Game Development II”  Chapters on automated testing and automated metrics systems  www.gamasutra.com: Dag Frommhold, Fabian Röken  Lengthy article on applying automated testing in games  Microsoft: various groups & writings  From outside the gaming world  Kent Beck: anything on test-driven development  http://www.martinfowler.com/articles/continuousIntegration.html#id108619: continuous integration testing  Amazon & Google: inside & outside our industry

