Scalability Tools: Automated Testing (30 minutes)
Overview:
- Hooking up your game: external tools, internal game changes
- Applications & gotchas: engineering, QA, operations, production & management
- Summary & questions
Review: controlled tests & actionable results are useful for many purposes
(1) Repeatable tests, using N synchronized game clients
(2) High-level, actionable reports for many audiences (programmer, development director, executive)
Handout notes: automated testing is a strong tool for large-scale games!
Pushbutton, large-scale, repeatable tests. Benefits:
- Accurate, repeatable, measurable tests during development and operations
- Stable software; faster, measurable progress
- Base key decisions on fact, not opinion
- Augment your team's ability to do their jobs; find problems faster
- Measure / change / measure: repeat
- Increased developer efficiency is key: get the game out the door faster, with higher stability & less pain
Handout notes: more benefits of automated testing
- Comfort and confidence: managers/producers can easily judge how development is progressing
- Just like bug-count reports, test reports indicate the overall quality of the current state of the game
- Frequent, repeatable tests show progress & backsliding
- Investing developers in the test process helps prevent QA vs. Development shouting matches
- Smart developers like numbers and metrics just as much as producers do
- Making your goals: you will ship cheaper, better, sooner
  - Cheaper: even though initial costs may be higher, issues get exposed when it's cheaper to fix them (and developer efficiency increases)
  - Better: robust code
  - Sooner: "it's OK to ship now" is based on real data, not supposition
Automated testing accelerates large-scale game development & helps predictability
[Chart: TSO case study of developer efficiency. % complete over time, from project start to target launch; strong test support (autoTest) reaches a better game earlier, while weak test support slips past the ship date ("oops").]
Measurable targets & projected trends give you actionable progress metrics, early enough to react
[Chart: any test metric (e.g. # clients) plotted over time against a target at any milestone (e.g. Alpha); the trend from the first passing test through "now" projects whether you will hit the target or miss it ("oops").]
Success stories
- Many game teams work with automated testing: EA, Microsoft, any MMO, ...
- Automated testing has many highly successful applications outside of game development
- Caveat: there are many ways to fail...
How to succeed
- Plan for testing early: it is a non-trivial system with architectural implications
- Fast, cheap test coverage is a major change in production; be willing to adapt your processes
- Make sure the entire team is on board: deeper integration leads to greater value
- Kearneyism: "make it easier to use than not to use"
Automated testing components
- Test Manager: test selection/setup; startup & control of N clients; RT probes into any game
- Scriptable test client(s): emulated user play session(s); multi-client synchronization for repeatable, synchronized test I/O
- Report Manager: raw data collection; aggregation/summarization; alarm triggers; collection & analysis
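The three components above can be sketched in miniature. This is a hypothetical skeleton, not the talk's actual system: all class and method names are invented, and the "game" is stubbed out so the control/report split is visible.

```python
# Minimal sketch (hypothetical names): a TestManager drives N scripted
# clients in lockstep, and a ReportManager aggregates their raw results.

class ScriptedClient:
    """Emulated play session: runs command steps, records raw results."""
    def __init__(self, client_id):
        self.client_id = client_id
        self.results = []

    def run_step(self, step):
        # A real client would send the command to the game and validate
        # the response; here we just record a synthetic success.
        self.results.append((step, "pass"))

class TestManager:
    """Selects a test, starts N clients, and runs them step by step."""
    def __init__(self, num_clients):
        self.clients = [ScriptedClient(i) for i in range(num_clients)]

    def run(self, script):
        for step in script:                # multi-client synchronization:
            for client in self.clients:    # all clients finish a step
                client.run_step(step)      # before the next one starts

class ReportManager:
    """Aggregates raw per-client data into a high-level summary."""
    @staticmethod
    def summarize(clients):
        total = sum(len(c.results) for c in clients)
        passed = sum(1 for c in clients for _, r in c.results if r == "pass")
        return {"steps": total, "passed": passed, "failed": total - passed}

manager = TestManager(num_clients=3)
manager.run(["createAvatar", "enterLevel", "buyObject"])
summary = ReportManager.summarize(manager.clients)
```

The point of the split is that the same Report Manager serves every audience: raw logs for the programmer, aggregated pass/fail for the director.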
Input systems for automated testing
- Algorithmic, recorders, scripted, game code
- Multiple test applications are required, but each input type differs in value per application
- Scripting gives the best coverage
Hierarchical automated testing
- Multiple levels of testing (unit, subsystem, full system) give you faster ways to work with each level of code
- Incremental testing avoids noise & speeds defect isolation
Handout notes: input systems for automated testing
- Multiple forms of input sources; multiple sets of test types & requirements
- Make sure the input technology you pick matches the test types you need: cost of systems, types of testing required, support, cross-team needs, ...
- A single, data-driven autoTest system is usually the best option
Handout notes: input sources (algorithmic)
- Powerful & low development cost; exploits game semantics for test coverage
- Highly useful for some test types, but limited verification
- E.g.: for each ...: CreateNewAvatar; for each ...: BuyAndPlaceAllObjects; for each ...: UseAllActions. A broad, shallow test of all object-based content
- Combine with automated errorManagers and/or currentObject.selfTest() to increase verification
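A minimal sketch of the algorithmic style above: iterate the game's own content tables to get broad, shallow coverage. The content lists and hook names here are assumptions for illustration; the error-collection hook stands in for the slide's errorManagers.

```python
# Sketch of algorithmic input (hypothetical content tables): exploit game
# semantics to enumerate and exercise all object-based content.

AVATAR_TYPES = ["warrior", "mage"]                 # assumed content tables
OBJECTS = ["chair", "lamp", "stove"]
ACTIONS = {"chair": ["sit"], "lamp": ["toggle"], "stove": ["cook", "clean"]}

def run_broad_shallow_test(create, buy_and_place, use_action):
    """For each avatar type, place every object and try every action."""
    failures = []
    for avatar in AVATAR_TYPES:
        create(avatar)
        for obj in OBJECTS:
            buy_and_place(obj)
            for action in ACTIONS[obj]:
                if not use_action(obj, action):    # errorManager-style hook
                    failures.append((avatar, obj, action))
    return failures

# Stub game hooks: everything succeeds except 'clean' on the stove.
failures = run_broad_shallow_test(
    create=lambda a: True,
    buy_and_place=lambda o: True,
    use_action=lambda o, a: a != "clean",
)
```

The verification is intentionally shallow (did the call succeed?), which is why the slide recommends pairing this with errorManagers or per-object self-tests.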
Handout notes: input (recorders)
- Internal event pump / external UI actions: both are brittle to maintain
- Neither can effectively support load or multi-client synchronization, and both are limited for regression testing
- Best use: capturing defects that are hard to reproduce; effective in overnight random testing of builds & some play testing
- Semantic recorders are much less brittle and more useful
Input (scripted test clients)
A pseudo-code script of how users play the game, and what the game should do in response.
Command steps: createAvatar [sam]; enterLevel 99; buyObject knife; attack [opponent]
Validation steps: checkAvatar [sam exists]; checkLevel 99 [loaded]; checkInventory [knife]; checkDamage [opponent]
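The command/validation script above can be driven by a tiny interpreter. This is a hypothetical sketch: the verbs follow the slide, but the state model and return format are assumptions.

```python
# Minimal script engine sketch: command verbs mutate emulated game state,
# check* verbs validate it and log pass/fail (hypothetical state model).

def run_script(script, state):
    """Each line is 'verb arg'; returns a log of (check, passed) pairs."""
    log = []
    for line in script:
        verb, _, arg = line.partition(" ")
        if verb == "createAvatar":                 # command steps
            state["avatars"].add(arg)
        elif verb == "enterLevel":
            state["level"] = int(arg)
        elif verb == "buyObject":
            state["inventory"].append(arg)
        elif verb == "checkAvatar":                # validation steps
            log.append(("checkAvatar", arg in state["avatars"]))
        elif verb == "checkLevel":
            log.append(("checkLevel", state["level"] == int(arg)))
        elif verb == "checkInventory":
            log.append(("checkInventory", arg in state["inventory"]))
    return log

state = {"avatars": set(), "level": 0, "inventory": []}
results = run_script(
    ["createAvatar sam", "enterLevel 99", "buyObject knife",
     "checkAvatar sam", "checkLevel 99", "checkInventory knife"],
    state,
)
```

Because scripts are plain data, non-engineers can read and write them, which is exactly the property the next slide stresses.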
Handout notes: scripted test clients
- Scripts are emulated play sessions: just like somebody playing the game
- Command steps: what the player does to the game
- Validation steps: what the game should do in response
- Scripted clients are flexible & powerful: usable for many different test types
- Tests are quick & easy to write, and easy for non-engineers to understand & create
Handout notes: scripted test clients
- Scriptable test clients are a lightweight subset of the shipping client
- Instrumented (emits lots of useful information) and repeatable
- Embedded "automated debugging" support helps you understand the test results
- Log both server and client output (common format), with timestamps!
- Automated metrics: collection & aggregation
- High-level "at a glance" reports with detail drill-down
- Build in support for hung clients & triaging failures
Handout notes: scripted test clients
- Support costs: one (data-driven) client is better than N test systems
- Tailorable validation output is a very powerful construct
- Each test script contains its required validation steps (flexible, tunable, ...)
- Minimizing the state to regress against means fewer false positives
- Presentation layer tip: build a spreadsheet of the keywords/actions used by your manual testers, and automate the most common/expensive ones
Scripted players: implementation
[Diagram: a script engine feeds commands and state through a presentation layer into the game logic. The test client (null view) omits the game GUI; the full game client keeps it. Or, load both.]
Test client
Test-specific input & output via a data-driven test client gives maximum flexibility.
- Input API: reusable scripts & data (regression, load)
- Output API: script-specific logs & metrics (key game states, pass/fail, responsiveness)
A presentation layer is often unique to a game
- Some automation scripts (on a NullView client) should read just like QA test scripts for your game
- TSO examples: routeAvatar, useObject; buyLot, enterLot; socialInteraction (makeFriends, chat, ...)
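One way to read the slide: the presentation layer expands QA-readable verbs into the lower-level commands the client actually executes. The mapping below is entirely hypothetical (the TSO verb names come from the slide, their expansions are invented), but it shows the shape of such a layer.

```python
# Sketch of a presentation layer (hypothetical expansions): high-level,
# QA-readable verbs expand into low-level test-client commands.

PRESENTATION_LAYER = {
    "buyLot":   ["routeAvatar lotDoor", "useObject buyTool", "confirm"],
    "enterLot": ["routeAvatar lotDoor", "useObject door"],
}

def expand(script):
    """Expand high-level steps; unknown steps pass through unchanged."""
    low_level = []
    for step in script:
        low_level.extend(PRESENTATION_LAYER.get(step, [step]))
    return low_level

commands = expand(["buyLot", "enterLot"])
```

This is why the layer is game-specific: the verbs mirror the vocabulary your manual testers already use for this game.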
Handout notes: scriptable clients are tailorable for many applications (engineering, QA and management)
- Unit testing: 1 feature = 1 script
- Load testing: a representative play session, times 1,000s; make sure your servers work before the players do
- Integration: test code changes for catastrophic failures
- Build stability: quickly find problems and verify the fix
- Content testing: exhaustive analysis of game play to help tuning, ensure all assets are correctly hooked up, and explore edge cases
- Multi-player testing: engineers and QA can test multi-player game code without requiring multiple manual testers
- Performance & compatibility testing: repeatable tests across a broad range of hardware give you a precise view of where you really are
- Project completeness: how many features pass their core functionality tests; what are our current FPS, network lag and bandwidth numbers, ...
Input (data sets)
- Mock data: repeatable tests in development, faster load, edge conditions
- Real data: the unpredictable user element finds different bugs
- Random: edge cases & real-world performance
- Repeatable: debugging & benchmarking
Input (client synchronization)
- RemoteCommand(x): ordered actions sent to clients; most realistic & flexible
- waitUntil(localStateChange), waitFor(time): brittle, less reproducible
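The two synchronization styles can be sketched side by side. All names here are hypothetical stand-ins: a manager-issued `remote_command` keeps client ordering explicit, while a local `wait_until` polls for a state change (the brittle variant the slide warns about).

```python
# Sketch of the two styles (hypothetical names): manager-ordered
# RemoteCommands vs. a client polling its own local state.
import time

class Client:
    def __init__(self, name):
        self.name = name
        self.actions = []

    def remote_command(self, action):      # ordered by the test manager:
        self.actions.append(action)        # most realistic & flexible

def wait_until(predicate, timeout=1.0, poll=0.01):
    """Local-state wait: brittle and less reproducible under load."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    return False

a, b = Client("a"), Client("b")
a.remote_command("enterLot")                           # a acts first,
ok = wait_until(lambda: "enterLot" in a.actions)       # then b waits
b.remote_command("chat hello")                         # and reacts
```

Under load, the timeout in `wait_until` is exactly where runs stop being reproducible, which is why ordered RemoteCommands are preferred.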
Common gotchas
- Not designing for testability: retrofitting is expensive
- Blowing the implementation: brittle code; addressing perceived needs, not real needs
- Using automated testing incorrectly: testing the wrong thing at the wrong time; not integrating with your processes; poor testing methodology
Testing the wrong thing at the wrong time
- Applying detailed testing while the game design is still shifting and the code is still incomplete introduces noise and the need to keep re-writing tests
- Build Acceptance Tests (BAT): stabilize the critical path for your team; keep people working by keeping critical things from breaking
- Final Acceptance Tests (FAT): detailed tests to measure progress against milestones; "is the game done yet?" tests need to be phased in
More gotchas: poor testing methodology & tools
- Case 1 (recorders): load & regression testing were needed; the maintenance cost was not understood
- Case 2 (completely invalid test procedures): a distorted view of what really worked (GIGO)
- Case 3 (poor implementation planning): limited usage, because the nature of the tests led to high test cost & required programming skill
- Case 4: not adapting development processes
- Common theme: no senior engineering analysis committed to the testing problem
Handout notes: more gotchas
- Automating too late, or in too much detail too early
- No ability to change the development process of the game
- No way to measure the effects compared to no automation
- People and processes are funny things: sometimes the process is changed, and sometimes your testing goals have to shift
- Games differ, a lot: autoTest approaches will vary across games
Handout notes: BAT vs FAT
- Feature drift == expensive test maintenance
- Code is built incrementally: reporting failures nobody is prepared to deal with yet wastes everybody's time
- Automated testing is a new tool and a new concept: focus on a few areas first, then measure, improve, iterate
Automated Testing for Online Games
Overview:
- Hooking up your game: external tools, internal game changes
- Applications: engineering, QA, operations, production & management
- Summary & questions
Handout notes: applying automated testing
- Know what automation is good / not good at & play to its strengths
- Change your processes around it
- Establish clear measures; iteratively improve
- Make sure everybody can use it & has bought into it: tests become a form of communication
Automated testing: strengths
"The difference between us and a computer is that the computer is blindingly stupid, but it is capable of being stupid many, many millions of times a second." (Douglas Adams, 1997 SCO Forum)
- Repeat massive numbers of simple, easily measurable tasks
- Mine the results
- Do all the above, in parallel, for rapid iteration
Handout notes: autoTest complexity
- Automation breaks down as individual test complexity increases
- Repeating simple tests hundreds of times and combining the results is far easier to maintain and analyze than using long, complex tests, and parallelism allows a dramatically accelerated test cycle
Semi-automated testing is best for game development
- Automation: rote work ("does door108 still open?"), scale, repeatability, accuracy, parallelism
- Manual testing: creative bug hunting, visuals, judgment calls, playability, reacting to change, evaluating autoTest results
- Integrate the two for best impact
Handout notes: semi-automated testing
- Automation handles simple tasks (repetitive or large-scale): load @ scale; workflow & information management; regression (all weapon damage, broad shallow feature coverage, ...)
- Integrated automated & manual testing:
  - Tier 1 / Tier 2: automation flags potential errors, manual testers investigate
  - Within a single test: automation snapshots key game states, manual testers evaluate the results
  - Augmented / accelerated: complex build steps, full-level play-through, ...
Plan your attack with stakeholders (retire risk early: QA, production, management)
- Tough shipping requirements (e.g.): scale, reliability; regression costs
- Development risk: cost / risk of engineering & debugging; impact on content creation
- Management risk: schedule predictability & visibility
Handout notes: plan your attack
- What are the big costs & risks on your project? Technology development (e.g., scalable servers); breadth of content to be regressed; frequency of regressions
- Your development team is significantly handicapped without automated tests & multi-client support: focus on production support to start
- Often, sufficient machines & QA testers are not available
- Run-time debugging of networked games often becomes post-mortem debugging: slower & harder
Factors to consider
- Test applications: unit, subsystem, full system; game logic, graphics
- Test characteristics: frequency of use; repeatable / random; creation & maintenance cost; execution cost; manual cost; overlap with other tests
Handout notes: design factors
- Test overlap & code coverage
- Cost of running the test (graphics high; logic/content low) vs. frequency of test need
- Cost of building the test vs. manual cost (over time)
- Maintenance cost of the test suites & the test system, and the churn rate of the game code
Automation focus areas (Larry's "top 5")
- Performance: scale is hard to get right
- Critical path stability: keep the team going forward
- Non-determinism: gets in the way of everything
- Content regression: a massive, recurring cost
- Compatibility & install: improves life for you & the user
Handout notes: automation focus areas (recommendations)
- Full-system scale/stability testing: multi-client & server code must always function, or the team slows down; it is the hardest part to get right (and to debug) when running live players; scale will screw you, over and over again...
- Non-determinism: difficulty in debugging slows development and hurts system reliability
- Content regression
- Build stability: complex systems & large development teams require extra care to keep running smoothly, or you'll pay the price in slower development and more antacids
- And for some systems: compatibility testing or installer testing
- A data-driven system is very important: you can cover all of the above with one test system
Yikes, that all sounds very expensive!
Yes, but remember: the alternative costs are higher, and the alternatives do not always work.
- QA costs for a 6-player game: you need at least 6 testers at the same time, plus consoles, TVs, disks & network, and non-determinism still bites
- MMO regression costs (yikes^2): 10s to 100s of testers, a 10-year code life cycle, constant release iterations
Unstable builds are expensive & slow down your entire team!
[Diagram: checkin -> build -> smoke -> regression -> dev servers; a bug introduced at checkin produces feedback only hours (or days) later.]
- Repeated cost of detection & validation
- Firefighting, not going forward
- Impact on others
Stability: keep the team working! (TSO use case: critical path analysis)
Failures on the critical path block access to much of the game.
Test case: can an avatar sit in a chair?
login() -> create_avatar() -> buy_house() -> buy_object() -> enter_house() -> use_object()
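The "sit in a chair" test is a chain of dependent steps, so one broken step blocks everything after it. A minimal sketch (the step implementations are stubbed; only the chain logic is shown):

```python
# Sketch: run critical-path steps in order and report the first failure
# plus everything it blocked (step implementations are hypothetical stubs).

CRITICAL_PATH = ["login", "create_avatar", "buy_house",
                 "buy_object", "enter_house", "use_object"]

def run_critical_path(steps, impl):
    """impl(step) -> bool; stop at the first failing step."""
    for i, step in enumerate(steps):
        if not impl(step):
            return {"failed": step, "blocked": steps[i + 1:]}
    return {"failed": None, "blocked": []}

# Stub implementation where buy_house is broken:
report = run_critical_path(CRITICAL_PATH, impl=lambda s: s != "buy_house")
```

The report makes the cost visible: a single buy_house bug blocks three downstream features, and everyone working on them.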
Prevent critical path code breaks that take down your team
[Diagram: candidate code goes through a checkin sniff test (pass/fail, diagnostics); only safe code reaches development.]
Stability & non-determinism (monkey tests)
[Diagram: continual repetition of critical path unit tests, built from the code repository via the compilers and run against reference servers.]
Handout notes: build stability
- Poor build stability slows forward progress (especially on the critical path): people are blocked from getting work done
- Uncertainty: did I bust it, or did it just 'happen'? A lot of developers just didn't get non-determinism
- Backsliding: things kept breaking
- Monkey tests: an "always current" baseline for developers; a common measuring stick across builds & deployments is extremely valuable
- Monkey tests rock! An instant trip wire for problems & a focusing device
- Server aging: fill the pipes, get some buffers dirty; keeps the wheels in motion while developers use those servers; an accurate measure of race-condition bugs
Build stability & full testing: comb filtering
- New code ($): sniff test, monkey tests. Fast to run; catch major errors; keep coders working
- Full system build ($$): smoke test, server sniff. Is the game playable? Are the servers stable under a light load? Do all key features work?
- Promotable to full testing ($$$): full feature regression, full load test. Do all test suites pass? Are the servers stable under peak load conditions?
- Cheap tests catch gross errors early in the pipeline; more expensive tests only run on known-functional builds
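The comb-filter economics can be shown in a few lines: tiers run in cost order and a failure stops promotion, so broken builds never consume the expensive tiers. The tier names follow the slide; the cost units are invented for illustration.

```python
# Sketch of comb filtering (hypothetical cost units): cheap tiers gate
# the expensive ones, so a broken build is rejected early and cheaply.

TIERS = [
    ("sniff", 1),              # fast: catch gross errors
    ("smoke", 10),             # is the game playable under light load?
    ("full_regression", 100),  # all suites, peak load
]

def comb_filter(build_passes, tiers):
    """Run tiers in cost order; stop (and report cost) at first failure."""
    cost = 0
    for name, tier_cost in tiers:
        cost += tier_cost
        if not build_passes(name):
            return {"promoted": False, "failed_at": name, "cost": cost}
    return {"promoted": True, "failed_at": None, "cost": cost}

# A build that breaks in smoke costs 11 units, not the full 111:
result = comb_filter(lambda tier: tier != "smoke", TIERS)
```

A good build still pays the full 111, but only builds already known to pass the cheap tiers ever do.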
Handout notes: build stability
- Much faster progress after the stability checkers were added
- Sniff: hourly reference tests (sniff monkey, unit monkey)
- Comb filters kept the manpower overhead low (on both sides) and gave quick feedback: fewer redos for engineers, fewer bugs for QA to find & process
- The size of the team makes a broken build very costly
- Fewer side-effect bugs ...
Handout notes: dealing with stability
- Hourly stability checkers (monkey tests): aging (dirty processes, growing datasets, leaking memory); moving parts (race conditions)
- A stability measure: what works, right now? Flares go off, etc.
- Unit tests (against features): minimal noise / side effects
- A reference point: what should work? Clarity in reporting / triaging
Handout notes: non-determinism is a big risk factor in online development
- Race conditions, dirty buffers, shared state, ...
- Developers test with a single client against a single server: no chance to expose race conditions
- Fuzzy data views over networked connections further complicate implementation & debugging
- Real-time debugging is replaced with post-mortem analysis
Handout notes: the effects of non-determinism
- Multiple CPUs / players greatly complicate development & testing, while also increasing system complexity
- You can't reliably reproduce bugs
- Near-infinite code-path coverage, plus variable latency & transactions over time, introduces massive code complexity that is very hard to get right
- It is also hard to test edge cases or broad coverage: each test can execute differently on any run
AutoTest addresses non-determinism
- Detection & reproduction of race-condition defects: even low-probability errors are exposed with sufficient testing (random, structured, load, aging)
- Measurability of race-condition defects: "occurs x% of the time, over 400 test runs"
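Turning a race condition into a number is just repetition plus counting. A sketch under stated assumptions: the "flaky test" here is simulated with a seeded RNG standing in for a real non-deterministic test run, so the report itself is repeatable.

```python
# Sketch: repeat a non-deterministic test many times and report its
# failure rate (the flaky test is a seeded-RNG stand-in, not a real game).
import random

def measure_failure_rate(test, runs, seed=42):
    """Run `test` `runs` times; return (failure count, failure rate)."""
    rng = random.Random(seed)          # seeded so the report is repeatable
    failures = sum(1 for _ in range(runs) if not test(rng))
    return failures, failures / runs

# A simulated race that bites roughly 5% of the time:
flaky_enter_lot = lambda rng: rng.random() >= 0.05
failures, rate = measure_failure_rate(flaky_enter_lot, runs=400)
```

The output is exactly the slide's claim in executable form: "occurs x% of the time, over 400 test runs", which makes a fix verifiable (rerun, compare rates).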
Monkey test: enterLot()
Monkey test: 3 * enterLot()
[Screenshots] Four different behaviors in thirty runs!
Handout notes: non-deterministic failures
- 30 test runs, 4 behaviours: successful entry; hang; crash; owner evicted, all possessions stolen
- Random results were observed in all major features
- "Critical path" random failures outside of unit tests are very difficult to track
Content testing (areas)
- Regression, error detection, balancing / tuning
- This topic is a tutorial in and of itself
- Content regression is a huge cost problem; there are many ways to automate it (algorithmic, scripted, combined, ...), and it differs wildly across game genres
Content testing (more examples)
- Light mapping, shadow detection
- Asset correctness / sameness
- Compatibility testing
- Armor / damage; class balances
- Validating against old userData
- ... (unique to each game)
Load testing, before paying customers show up
- Expose issues that only occur at scale
- Establish hardware requirements
- Establish that play is acceptable @ scale
Handout notes: some examples of things caught with load testing
- Non-scalable algorithms
- Server-side dirty buffers
- Race conditions
- Data bloat & clogged pipes
- Poor end-user performance @ scale
- ... you never really know what, but something will always go "spang!" @ scale
Load testing catches non-scalable designs
- Global data (single-player): all data is always available & up to date
- Global data (multiplayer): shared data must be packaged, transmitted, unpackaged, and constantly refreshed; only local data stays cheap
- Scalability is hard: shared data grows with #players, AI, objects, terrain, ..., and brings more bugs!
Handout notes: why you need load testing
- Single-player: all information is always available. Multiplayer: shared information must be packaged, transmitted and unpackaged
- Each step costs CPU & bandwidth, and can happen 10s to 100s of times per minute; it may also cause additional overhead (e.g. DB calls)
- Scalability is key: many shared data structures grow with the number of players, AI, objects, terrain, ...
- Caution: early prototypes may be cheap enough, but as the game progresses, costs may explode
Handout notes: why you need load testing
Case 1, initial design: transmit the entire lotList to all connected clients, every 30 seconds.
- Initial fielding: no problem. Development testing: < 1,000 lots, < 10 clients
- A complete disaster as clients & DB scaled. Shipping requirements: 100,000 lots, 4,000 clients
- DO THE MATH BEFORE CODING: LotElementSize * LotListSize * NumClients = 20 bytes * 100,000 * 4,000 = 8,000,000,000 bytes, TWICE per minute!!
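The slide's "do the math before coding" check takes three lines of code. The numbers are the slide's own; the function name and the dev-testing figures plugged in below are illustrative.

```python
# The slide's pre-coding sanity check: bytes pushed per minute by a naive
# full-lotList broadcast to every client ("twice per minute" = every 30 s).

def broadcast_bytes_per_minute(element_size, list_size, num_clients,
                               sends_per_minute=2):
    """Total bytes per minute for broadcasting the full list to all clients."""
    return element_size * list_size * num_clients * sends_per_minute

dev = broadcast_bytes_per_minute(20, 1_000, 10)        # development testing
ship = broadcast_bytes_per_minute(20, 100_000, 4_000)  # shipping requirements

# dev: 400 KB/min, invisible in testing. ship: 16 GB/min, a complete disaster.
```

This is the general lesson: costs that are linear in two or three scaling factors at once look free in development and explode at ship scale.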
Load testing: find poor resource utilization
[Chart: one subsystem issued 22,000,000 DS queries; the next highest was 7,000]
Load: test both client & server behaviors
Handout notes: automated data mining / triage
- Test results: patterns of failures; bug rate to source file comparison; easy historical mining & results comparison
- Triage: debugging aids that extract RT data from the game: timeout & crash handlers; errorManagers; log parsers; scriptable verification conditions
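A log parser is the simplest of the triage aids above. The common timestamped log format here is an assumption (the slides only say "common format, with timestamps"), but the idea carries: bucket errors by source and kind so failure patterns surface without reading raw logs.

```python
# Sketch of automated triage (hypothetical log format): bucket timestamped
# client/server error lines into (source, error-kind) failure patterns.
from collections import Counter

def triage(log_lines):
    """Count errors per (source, error kind) to expose failure patterns."""
    patterns = Counter()
    for line in log_lines:
        # Assumed common format: '<timestamp> <source> <level> <message>'
        _ts, source, level, message = line.split(" ", 3)
        if level == "ERROR":
            patterns[(source, message.split(":")[0])] += 1
    return patterns

logs = [
    "12:00:01 server ERROR timeout: enterLot",
    "12:00:02 client7 ERROR crash: null object",
    "12:00:05 server ERROR timeout: enterLot",
    "12:00:09 server INFO lot loaded",
]
patterns = triage(logs)
```

Run hourly over monkey-test output, counts like these are what turn "something is flaky" into "enterLot times out on the server, twice as often as anything else".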
Automated Testing for Online Games (One Hour)
Overview:
- Hooking up your game: external tools, internal game changes
- Applications: engineering, QA, operations, production & management
- Summary & questions
Summary: automated testing
- Start early & make it easy to use: this strongly impacts your success
- The bigger & more complex your game, the more automated testing you need
- You need commitment across the team: engineering, QA, management, content creation
Q&A & other resources
- My email: larry.mellon_@_emergent.net
- More material on automated testing for games: http://www.maggotranch.com/mmp.html (last year's online engineering slides, this year's slides, talks on automated testing & scaling the development process)
- www.amazon.com: "Massively Multiplayer Game Development II" (chapters on automated testing and automated metrics systems)
- www.gamasutra.com: Dag Frommhold, Fabian Röken (lengthy article on applying automated testing in games)
- Microsoft: various groups & writings
- From outside the gaming world:
  - Kent Beck: anything on test-driven development
  - Continual integration testing: http://www.martinfowler.com/articles/continuousIntegration.html#id108619
  - Amazon & Google: inside & outside our industry