1 Automated Testing In MMP Development Controlling Cost, Risk and Schedule Larry Mellon Austin Game Conference Sept 10-11, 2003

2 Abstract & “Reading These Slides” Customers pay good money for entertainment. Month after month, they expect error-free service and a never-ending supply of fresh content. Massively multi-player games are large, complex systems littered with the potential for customer non-satisfaction. Reconciling these two points is a difficult -- and costly -- proposition. Automated testing is one approach that has proven very useful. Constant, repetitive testing early in the development process greatly improves stability, keeping developers productive while embedded metrics collectors keep management informed on progress. Automated testing effectively provides thousands of "virtual testers", pushbutton-accessible to any developer. Scripts may be tailored to measure the quality of a game's feature set, or written to provide repeatable, scalable stress tests against server code. They may also be written in advance of the code, creating sample storyboards that describe how players will use a new feature. Developers may work against these executable use cases when first constructing the code, or to keep the code working while extending the feature set over years of live operations. Accelerated development and higher quality in an expensive, subscription-based service make automation an essential component of any MMP developer's toolkit. Reading These Slides: “Key Points” slides are hidden in slideshow mode. They provide details regarding the previous, graphical slide(s). Graphical slides sometimes depend heavily on animations. Older versions of PPT do not support all animations used in this talk.

3 Major Functions of Automation: Repeatable, Synchronized Input; Debugging; Performance Tuning; Regression; Data Management (Test Results, Raw Data)

4 Key Points Race conditions across hundreds of threads → repeat tests: over, and over. Large scale systems are difficult to deal with → synchronized client inputs. Test Results → volume of raw data, distributed nature → summarization / searching. Tune & Repeat for performance testing. Regression → (complex) code drift → the only way to keep something working is to always keep testing it.
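To make "synchronized client inputs" and "repeat tests: over, and over" concrete, here is a minimal Python sketch (not from the talk; fire_command and the command names are invented): a barrier releases every simulated client at the same instant, and a fixed per-client seed keeps the input stream identical from run to run, which is what lets a race condition be replayed repeatedly.

# Minimal sketch: synchronized, repeatable input across N simulated clients.
# All names (fire_command, N_CLIENTS) are hypothetical stand-ins.
import random
import threading

N_CLIENTS = 50
barrier = threading.Barrier(N_CLIENTS)        # release all clients at once

def fire_command(client_id, seed):
    rng = random.Random(seed)                 # fixed seed => repeatable input stream
    command = rng.choice(["chat", "buy_object", "route_avatar"])
    barrier.wait()                            # synchronize: every client sends together
    print(f"client {client_id}: {command}")   # stand-in for a real network send

threads = [threading.Thread(target=fire_command, args=(i, 42 + i))
           for i in range(N_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()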

5 Major Weaknesses of Automation: Judgment; Rigid Inputs, Rigid Analysis; Visual Effects

6 Semi-Automated Testing (1) (2) (3) Automation Manual Command Steps Validation Steps

7 Key Points 1. Automation: simple, repetitive tasks (load; synchronized, repeatable (distributed) inputs; workflow / information management). 2. Manual: judgment / innovative tasks (visuals; playability; creative bug hunting). 3. Combined: Tier 1 / Tier 2; within a single test.

8 Automation: Architecture (three views of the System Under Test). Startup & Control: Test Manager; Test Selection/Setup; Control N Clients; RT probes. Repeatable, Sync’ed Test Inputs: Scripted Use Cases; Emulated User Play Sessions; Multi-client synchronization. Collection & Analysis: Report Managers; Raw Data Collection; Aggregation / Summarization; Alarm Triggers.

9 Key Points Test Manager: embeds a series of probes into a client (health, current test steps, performance, …); desktop UI to manage tests, configure client(s); up to one hundred clients per workstation; every developer / tester / manager can run any test against any client/server combo. Scripted user play sessions: the game cannot tell if a Command was generated by a User or a Script; used live game code via the Presentation Layer. Report Managers: summarization / aggregation of data; trends / histories; Tier 1 alarm system. Python Rocks!
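A toy illustration of the Test Manager idea above, in Python since the slides credit Python for the real tools. Everything here (DUMMY_CLIENT, the liveness probe) is a stand-in: the point is simply one controller launching N client processes, polling a probe, and handing results on to a report manager.

# Sketch only: a toy "test manager" in the spirit of slide 9, not EA's tool.
# It launches N dummy client processes and polls a very simple health probe
# (process liveness); the real probes reported health, test step and performance.
import subprocess
import sys
import time

N_CLIENTS = 4                          # slide 9: up to ~100 clients per workstation
DUMMY_CLIENT = [sys.executable, "-c", "import time; time.sleep(2)"]   # stand-in client

clients = [subprocess.Popen(DUMMY_CLIENT) for _ in range(N_CLIENTS)]

while any(p.poll() is None for p in clients):          # crude liveness probe
    alive = sum(p.poll() is None for p in clients)
    print(f"probe: {alive}/{N_CLIENTS} clients alive")
    time.sleep(0.5)

print("all clients exited; results would now go to the report manager")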

10 Automated Testing: Topics Business Case TSO Applications & Implementation Wrapup Questions Automated Testing: Overview Customer satisfaction Development risk / operational costs

11 Development / Schedule Risk How do we know when we are ready to launch? How do we know how fast we are going? Complex, Large-Scale System Large Development Team

12 Key Points Implementation Risk: many things that can go wrong, and with many people, many opportunities for them to go wrong; scale: team / feature / process count. Schedule Risk: scale / complexity; when are we ready? how fast are we going?

13 Accelerated, Measurable Development Repeatable, Synchronized Input Data Management Development Instant Feedback, Rapid Triage, History Management Stability Reference Cases Summarized, Aggregated Results Current Status History Trends

14 Customer Satisfaction Stability New Content Automated testing keeps existing code working while new code is being added.

15 Key Points Stability: servers up; bugs down; fast response, always; no griefers; no lost data. New Content: features / events / whatever.

16 Subscription: Profit & Running Costs Profit… New Content Regression Customer Support Operations ~ $10 per customer

17 Automation: Lower Running Cost Profit… Regression Customer Support Operations ~ $10 per customer Lower New Content Cost

18 Automation: Lower Running Cost Profit… Customer Support Operations ~ $10 per customer Lower New Content Cost Lower Testing Cost

19 Automation: Lower Running Cost Profit… Operations ~ $10 per customer Lower New Content Cost Lower Testing Cost Happy Customers Don’t Call

20 Business Case: Summary Tool investment accelerates development, provides tangible measures of progress Tool investment pays out over entire lifecycle of subscription-based service

21 Key Points Happy customers don’t call in and they don’t leave. Less backsliding in development: fewer breaks, faster triage; easier insertion of new features. Grand total: lower operational costs (customer support, new content); faster, stable development of new content.

22 Automated Testing: Topics Business Case The Sims Online Applications & Implementation Wrapup Questions Automated Testing: Overview

23 TSO: Applications Load Testing QA Regression Stability Operations Development

24 Key Points QA Test Suites: turn on after Unit Tests pass; turn on tests in stages, as the code gets turned on in stages (core functionality: Alpha? full functionality: Ship?); reduced $$, a lot; rapid turnaround time. Load Testing: we would not have shipped without it. Build / Server Stability: unexpected bonus.

25 Load Testing: Goals Expose issues that only occur at scale. Establish hardware requirements. Establish response is playable @ scale. Emulate true user behaviour.
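The goals above boil down to measuring response times while the client count ramps up. A minimal, self-contained sketch follows; fake_request and the client counts are invented, and a real load test would run scripted clients in parallel across many machines rather than serially in one process.

# Illustrative sketch (assumed names throughout): ramp up simulated clients and
# record response times, the raw data behind "playable @ scale" judgements.
import random
import statistics
import time

def fake_request():                            # stand-in for a real game transaction
    time.sleep(random.uniform(0.001, 0.005))

def run_load_step(n_clients, requests_per_client=5):
    latencies = []
    for _ in range(n_clients * requests_per_client):   # serial here for brevity;
        start = time.perf_counter()                     # real load tests drive the
        fake_request()                                  # clients in parallel
        latencies.append(time.perf_counter() - start)
    return latencies

for n in (10, 50, 100):                        # ramp up: some issues only appear at scale
    lat = run_load_step(n)
    median_ms = statistics.median(lat) * 1000
    p95_ms = statistics.quantiles(lat, n=20)[18] * 1000   # 95th percentile
    print(f"{n:4d} clients  median={median_ms:.1f}ms  p95={p95_ms:.1f}ms")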

26

27 Stability Via Monkey Tests Code Repository Compilers Continual Repetition of Unit / Integration Tests Reference Servers

28 Stability Via Monkey Tests Reference Servers: 1. Last “known good” 2. What works, right now 3. Aggregate reporting. Aging; Moving parts.
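The monkey-test loop is easy to show in miniature. In this sketch (invented checks, not the TSO harness), the same small integration checks run over and over and the results are aggregated, so an intermittent failure shows up as a rate ("what works, right now") rather than a one-off mystery.

# Toy monkey-test loop: repeat the same small checks continually and aggregate
# the results. enter_lot and chat are hypothetical stand-ins for real checks.
import collections
import random

def enter_lot():
    return random.random() > 0.05      # fails ~5% of runs: a "non-deterministic" bug

def chat():
    return True

CHECKS = {"enter_lot": enter_lot, "chat": chat}
tally = collections.Counter()

for run in range(200):                 # continual repetition
    for name, check in CHECKS.items():
        tally[name, check()] += 1

for name in CHECKS:
    passed, failed = tally[name, True], tally[name, False]
    print(f"{name}: {passed} passed, {failed} failed")   # "what works, right now"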

29 Monkey Test: EnterLot

30 Non-Deterministic Failures

31 Key Points 20 test runs, 4 behaviours: successful entry; hang or crash; owner evicted, all possessions stolen. Random results observed in all major features. “Critical Path” random failures outside of Unit Tests very difficult to track.

32 Critical Path: Unit Testing Failures on the Critical Path block access to much of the game. Worse, unreliable failures… Test Case: Can an Avatar Sit in a Chair? enter_house(), use_object(), buy_object(), buy_house(), create_avatar(), login()
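A sketch of that test case as code. Only the step names appear on the slide; their bodies and the exact ordering used here are assumptions for illustration. The point is that every step depends on all earlier ones, so one flaky failure early in the chain blocks the rest of the test, and in development blocks everyone who needs the later features.

# Sketch of the "Can an Avatar Sit in a Chair?" critical-path test.
def login():          return True
def create_avatar():  return True
def buy_house():      return True
def enter_house():    return True
def buy_object():     return True     # e.g. the chair
def use_object():     return True     # sit in it

CRITICAL_PATH = [login, create_avatar, buy_house, enter_house, buy_object, use_object]

def run_critical_path():
    for step in CRITICAL_PATH:
        if not step():
            print(f"BLOCKED at {step.__name__}: later steps never run")
            return False
    print("avatar sat in the chair")
    return True

run_critical_path()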

33 Key Points Build stability issues slowed forward progress (especially on the critical path). People were blocked from getting work done. Uncertainty: did I break that, or did it just ‘happen’? A lot of developers just didn’t get non-determinism. Backsliding: things kept breaking. Monkey Tests: “always current” baseline for developers. Common measuring stick across builds & deployments extremely valuable.

34 Impact On Others

35 Pre-Checkin Regression: don’t let broken code into Mainline.

36 Key Points Much faster progress after stability checkers added. Sniff: hourly reference tests (sniff monkey, unit monkey). Comb filters kept the manpower overhead low on both sides and gave quick feedback: fewer redos for engineers, fewer bugs for QA to find & process. Extra post-checkin testing story (optional). Size of team gives high broken-build cost. Fewer “redos”. Fewer side-effect bugs. …
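A minimal pre-checkin gate in the spirit of "don't let broken code into Mainline": run a fast sniff suite and reject the checkin on any failure. The individual checks here are placeholders; the real sniff test drove the game against reference servers.

# Sketch of a pre-checkin sniff gate; check bodies are invented stand-ins.
import sys

def sniff_build():      return True    # does the build still produce a runnable client?
def sniff_login():      return True    # can a scripted client log in?
def sniff_enter_lot():  return True    # does the critical path still work?

SNIFF_SUITE = [sniff_build, sniff_login, sniff_enter_lot]

failures = [check.__name__ for check in SNIFF_SUITE if not check()]
if failures:
    print("checkin rejected, broken:", ", ".join(failures))
    sys.exit(1)                        # non-zero exit blocks the checkin
print("sniff test passed: ok to push to Mainline")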

37 Key Points Hourly stability checkers: aging (dirty processes, growing datasets, leaking memory); moving parts (race conditions); stability measure: what works, right now? flares go off, etc. Unit tests (against Features): minimal noise / side effects; reference point: what should work? clarity in reporting / triaging.

38 Automated Testing: Topics Business Case The Sims Online Applications & Implementation Wrapup Questions Automated Testing: Overview

39 Test Client Single, Data Driven Test Client Regression Load Single API Reusable Scripts & Data Testing feature correctness Testing system performance

40 Test Client Data Driven Reporting Regression Load Single API Reusable Scripts & Data Single API Configurable Logs & Metrics Key Game States Pass/Fail Responsiveness

41 Key Points Support costs: one (data driven) client better than N clients. Tailorable validation output turned out to be a very powerful construct. Each test script contains required validation steps (flexible, tunable, …). Minimize state to regress against == fewer false positives.
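One way to picture the single data-driven client with per-script validation steps: each script is data, each step names only the state it needs to check, and the same script can drive regression or load runs. The command set and game_state below are invented for illustration.

# Sketch of a data-driven test script with per-step validation.
game_state = {"logged_in": False, "in_lot": False}

def do_command(cmd):                       # stand-in for the real client API
    if cmd == "login":
        game_state["logged_in"] = True
    elif cmd == "enter_lot":
        game_state["in_lot"] = True

SCRIPT = [                                 # data, not code: reusable across test types
    {"command": "login",     "validate": {"logged_in": True}},
    {"command": "enter_lot", "validate": {"in_lot": True}},
]

for step in SCRIPT:
    do_command(step["command"])
    for key, expected in step["validate"].items():
        actual = game_state[key]
        result = "PASS" if actual == expected else "FAIL"
        print(f'{step["command"]}: {key}={actual} expected={expected} [{result}]')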

42 Test Client Derived From Real Client Game Client View Logic Presentation Layer

43 Key Points Load & Regression: inputs must be accurate, repeatable. Validation: must be tailorable, variable precision. Churn rate: logic/data in constant motion. How to keep the testing client accurate?

44 Test Client Game Client Scripted Player Actions Script Engine State Game GUI Client-Side Game Logic Commands State Presentation Layer

45 What Level To Test At? Game Client Mouse Clicks Presentation Layer View Logic Regression: Too Brittle (pixel shift) Load: Too Bulky

46 What Level To Test At? Game Client Internal Events Presentation Layer View Logic Regression: Too Brittle (Churn Rate vs Logic & Data)

47 Gameplay: Semantic Abstractions NullView Client View Logic Presentation Layer Chat, Enter Lot, Use Object, …, Route Avatar. Basic gameplay changes less frequently than UI or protocol implementations.

48 Scriptable User Play Sessions Set of Presentation Layer “primitives”. State probes: peek/poke arbitrary game state. Synchronization: wait_until, remote_command. Recordable… Test Scripts: specific / ordered inputs; specific game responses; single user play session, multi-user session.

49 Key Points Scriptable play sessions: big win. Load: tunable based on actual play. Regression: constantly repeat hundreds of play sessions, validating correctness. Development: repeatable ‘live’ input. P_Layer events logged as SimScript. Recorder (GUI) / Monitor (Remote).
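A sketch of what a scripted play session can look like. SimScript itself is not shown in the slides, so this is plain Python over an invented PresentationLayer stub; the primitive names (enter_lot, peek, wait_until) echo the slide, but their signatures are assumptions.

# Toy play session against a presentation-layer stub (all names hypothetical).
import time

class PresentationLayer:
    def __init__(self):
        self.state = {"in_lot": False, "chat_log": []}
    def enter_lot(self, lot_id):
        self.state["in_lot"] = True            # real call would go through game logic
    def chat(self, text):
        self.state["chat_log"].append(text)
    def peek(self, key):                       # state probe
        return self.state[key]
    def wait_until(self, predicate, timeout=5.0):   # synchronization primitive
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return True
            time.sleep(0.05)
        return False

p = PresentationLayer()
p.enter_lot(lot_id=17)
assert p.wait_until(lambda: p.peek("in_lot")), "never entered the lot"
p.chat("hello from a script")
print("chat log:", p.peek("chat_log"))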

50 Toolkit Usability Data Managers Pushbutton Use: Developers, Testers, Test Farms, Everywhere. Continual Process Improvement: Tool UI / feature set; False positives.

51 Key Points Data Managers reduce CONSTANT SHOUTING: produce summary reports; aggregate / correlate across tests; filter known defects; translate common failures to root causes. Developer / Tester “push button” use: wxPython rocks! Continual process improvement: automation of setup/analysis; must be easier to run than to avoid running; must solve problems “on the ground now”. False Positives: measure, reduce the cause(s).
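A toy data-manager pass, with invented data and defect IDs: aggregate raw results from many runs, filter known defects so they do not re-alarm, and translate common failure signatures into a probable root cause so the summary stays short.

# Sketch of result aggregation / known-defect filtering; all data is made up.
import collections

KNOWN_DEFECTS = {"BUG-1042: lot save timeout"}        # already triaged, don't re-alarm
ROOT_CAUSES = {"db timeout": "database overloaded",   # failure signature -> probable cause
               "null avatar": "login race"}

raw_results = [                                        # would come from all clients
    {"test": "enter_lot", "passed": False, "error": "db timeout"},
    {"test": "enter_lot", "passed": True,  "error": None},
    {"test": "chat",      "passed": False, "error": "BUG-1042: lot save timeout"},
]

summary = collections.Counter()
for r in raw_results:
    if r["error"] in KNOWN_DEFECTS:
        summary["known defects filtered"] += 1
        continue
    if r["passed"]:
        summary[f'{r["test"]}: pass'] += 1
    else:
        cause = ROOT_CAUSES.get(r["error"], "unclassified")
        summary[f'{r["test"]}: FAIL ({cause})'] += 1

for line, count in summary.items():
    print(f"{count:3d}  {line}")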

52 Automated Testing: Topics Business Case Wrapup Questions Automated Testing: Overview TSO Apps & Impl

53 Wrapup Very Useful, Very Reusable Development Acceleration Lower Operational Costs Cautionary Tales Tabula Rasa

54 PreCheckin SniffTest: Keep Mainline working. Hourly Stability Checkers: Baseline for Developers. Easy to Use == Used. Distribute Test Development & Ownership Across Full Team. Load Test: Early & Often; Break it before Live. Executive Support: Radical shifts in Process; Dedicated Tools Group.

55 Cautionary Tales Flexible Design Requires Flexible Tests Signal To Noise Ratio Defects In The Testing System

56 Key Points Risk: testing requires specifications, hard in rapid-iteration game design; specs must be easily malleable, as the design itself is; must be followed, everywhere (team buy-in). Automate information gathering/analysis: ease of access for data; ease of analysis for data; high-level views, common to all users; tests become a communication device. Protect the believability of tests: be very sure; start clean, every time; use real game code, every time; repeat a lot, build big tables. Hourly reference tests are essential (e.g., snifftest). Remember: the first response is always “the test is wrong” (“I hate snifftest” story).

57 Automated Testing: Topics Business Case Questions The Sims Online Applications & Implementation Architecture & Applications

