Automated Testing of Massively Multi-Player Games Lessons Learned from The Sims Online Larry Mellon Spring 2003

Context: What Is Automated Testing?

Classes Of Testing: System Stress, Load, Random Input, Feature Regression, Developer, QA

Automation Components: startup & control of the System Under Test; repeatable, synchronized test inputs; collection & analysis of results.

What Was Not Automated? Visual effects (startup & control, repeatable synchronized inputs, and results analysis were automated).

Lessons Learned: Automated Testing (60 Minutes)
- 1/3: Design & Initial Implementation (Architecture, Scripting Tests, Test Client, Initial Results)
- 1/3: Fielding (Analysis & Adaptations)
- 1/3: Wrap-up & Questions (What worked best, what didn't; Tabula Rasa: MMP / SPG)

Requirements: Load Testing, Regression Testing, and a high code "churn rate".

Design Constraints: Load and Regression require automation (repeatable, synchronized input; data management); a high churn rate requires strong abstraction.

Test Client: a single, data-driven test client serves both Regression ("testing feature correctness") and Load ("testing system performance") through a single API, with reusable scripts & data and configurable logs & metrics covering key game states, pass/fail, and responsiveness.

Problem: Testing Accuracy
Load & Regression: inputs must be accurate and repeatable.
Churn rate: logic/data in constant motion; how to keep the test client accurate?
Solution: the game client becomes the test client
- Exact mimicry
- Lower maintenance costs

Test Client == Game Client: the test control harness and the game GUI both drive the client-side game logic through the same Presentation Layer, exchanging commands and state.
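
A minimal sketch of that arrangement in Python (all class and method names here are hypothetical illustrations, not TSO's actual code):

    # One Presentation Layer API, two interchangeable drivers.
    class PresentationLayer:
        """Semantic gameplay commands shared by the game GUI and the test client."""
        def __init__(self, game_logic):
            self.logic = game_logic

        def enter_lot(self, lot_id):
            return self.logic.send_command("enter_lot", lot_id)

        def buy_object(self, object_name):
            return self.logic.send_command("buy_object", object_name)

    class GuiDriver:
        """The shipping client: translates mouse/UI events into semantic commands."""
        def __init__(self, presentation):
            self.p = presentation
        def on_buy_button(self, object_name):
            self.p.buy_object(object_name)

    class ScriptDriver:
        """The test client: replays the same commands from a script, no GUI needed."""
        def __init__(self, presentation):
            self.p = presentation
        def run(self, steps):
            for command, arg in steps:        # e.g. [("enter_lot", "alpha_chimp")]
                getattr(self.p, command)(arg)

Because both drivers share one API, a scripted session exercises exactly the code path a real player would.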

Game Client: How Much To Keep? The client divides into View, Logic, and the Presentation Layer.

What Level To Test At? Driving the full game client with raw mouse clicks: too brittle for Regression (pixel shift), too bulky for Load.

What Level To Test At? Driving internal events: still too brittle for Regression (churn rate vs. logic & data).

Gameplay: Semantic Abstractions. A NullView client drives the Presentation Layer with semantic actions (Buy Lot, Enter Lot, Use Object, Buy Object, …). Basic gameplay changes less frequently than UI or protocol implementations.

Scriptable User Play Sessions: SimScript
- Collection: Presentation Layer "primitives"
- Synchronization: wait_until, remote_command
- State probes: arbitrary game state (avatar's body skill, lamp on/off, …)
Test scripts: specific, ordered inputs
- Single-user play sessions
- Multiple-user play sessions

Scriptable User Play Sessions
Scriptable play sessions were a big win:
- Load: tunable based on actual play
- Regression: constantly repeat hundreds of play sessions, validating correctness
Gameplay semantics proved very stable:
- UI / protocols shifted constantly
- Game play remained (about) the same

SimScript: Abstract User Actions
include_script  setup_for_test.txt
enter_lot       $alpha_chimp
wait_until      game_state  inlot
chat            I'm an Alpha Chimp, in a Lot.
log_message     Testing object purchase.
log_objects
buy_object      chair
log_objects

SimScript: Control & Sync
# Have a remote client use the chair
remote_cmd  $monkey_bot  use_object  chair  sit
set_data    avatar  reading_skill  80
set_data    book  unlock
use_object  book  read
wait_until  avatar  reading_skill  100
set_recording  on
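
For illustration, primitives like these can be dispatched by a very small interpreter. This Python sketch is hypothetical (the talk does not show SimScript's real implementation), and it glosses over free-text arguments such as chat messages:

    def run_script(lines, client):
        for raw in lines:
            line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
            if not line:
                continue
            op, *args = line.split()              # first token picks the primitive
            if op == "remote_cmd":                # route the rest to another client
                target, *rest = args
                client.remote(target).run(rest)
            else:                                 # map primitives straight onto
                getattr(client, op)(*args)        # the Presentation Layer API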

Client Implementation

Composable Client: event generators (scripts, cheat console, GUI) feed the game logic through the Presentation Layer.

Composable Client: event generators (scripts, console, GUI) and viewing systems (console, lurker, GUI) both attach to the game logic through the Presentation Layer; any / all components may be loaded per instance.
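
A hypothetical sketch of per-instance composition, reusing PresentationLayer / ScriptDriver / GuiDriver from the earlier sketch; the viewer classes are stand-ins for real rendering and monitoring systems:

    class ConsoleView:
        def __init__(self, presentation): self.p = presentation
    class LurkerView(ConsoleView): pass
    class GuiView(ConsoleView): pass

    EVENT_GENERATORS = {"scripts": ScriptDriver, "gui": GuiDriver}
    VIEWERS = {"console": ConsoleView, "lurker": LurkerView, "gui": GuiView}

    def build_client(logic, generators=(), viewers=()):
        """Any / all components may be loaded per instance."""
        p = PresentationLayer(logic)
        gens = [EVENT_GENERATORS[name](p) for name in generators]
        views = [VIEWERS[name](p) for name in viewers]
        return p, gens, views

    # Load-test instance: script input, no view at all (a NullView client):
    #   build_client(logic, generators=["scripts"])
    # Debugging instance: scripts plus a lurker view to watch the session:
    #   build_client(logic, generators=["scripts"], viewers=["lurker"])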

Lesson: View & Logic Entangled. In the original game client, view and logic code were interleaved.

Few clean separation points existed between view and logic in the game client.

Solution: Refactored for Isolation, separating view from logic along the Presentation Layer.

Lesson: NullView Debugging. Without the (legacy) view system attached, tracing what the logic was doing was "difficult".

Solution: Embedded Diagnostics. Diagnostics and timeout handlers were embedded directly in the logic / Presentation Layer, …
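
One plausible shape for such embedded diagnostics, sketched in Python (hypothetical; the talk does not detail the mechanism): a watchdog that turns a hung request into a diagnostic dump instead of a silent stall.

    import sys
    import threading

    def with_timeout(seconds, dump):
        """Run a command; if it hasn't finished in `seconds`, emit diagnostics."""
        def decorator(fn):
            def wrapper(*args, **kwargs):
                done = threading.Event()
                def watchdog():
                    if not done.wait(seconds):   # fires only if fn is still running
                        dump(f"{fn.__name__} exceeded {seconds}s", args)
                threading.Thread(target=watchdog, daemon=True).start()
                try:
                    return fn(*args, **kwargs)
                finally:
                    done.set()
            return wrapper
        return decorator

    def dump_state(message, args):
        print(f"[diagnostic] {message} args={args}", file=sys.stderr)

    @with_timeout(30, dump_state)
    def enter_lot(lot_id):
        ...  # presentation-layer call to the server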

Talk Outline: Automated Testing (60 Minutes)
- 1/3: Design & Initial Implementation (Architecture & Design, Test Client, Initial Results)
- 1/3: Lessons Learned: Fielding
- 1/3: Wrap-up & Questions

Mean Time Between Failure
- Random events: log & execute
- Record client lifetime / RAM
Worked, just not relevant in the early stages of development: most failures / leaks found were not high-priority at that time, when weighed against server crashes.

Monkey Tests: constant repetition of simple, isolated actions against servers.
Very useful:
- Direct observation of servers while under constant, simple input
- Server processes "aged" all day
Examples:
- Login / Logout
- Enter House / Leave House

QA Test Suite Regression: high false-positive rate & high maintenance
- New bugs / old bugs
- Shifting game design
- "Unknown" failures
Not helping in day-to-day work.

Talk Outline: Automated Testing (60 Minutes)
- ¼: Design & Initial Implementation
- ½: Fielding: Analysis & Adaptations (Non-Determinism, Maintenance Overhead, Solutions & Results: Monkey / Sniff / Load / Harness)
- ¼: Wrap-up & Questions

Analysis: Testing Isolated Features

Analysis: Critical Path. Failures on the Critical Path block access to much of the game. Test case: can an avatar sit in a chair? That requires the whole chain login() → create_avatar() → buy_house() → buy_object() → enter_house() → use_object().

Solution: Monkey Tests
Primitives placed in Monkey Tests (sketch below):
- Isolate as much as possible, repeat 400x
- Report only aggregate results, e.g. Create Avatar: 93% pass (375 of 400)
A "Poor Man's" Unit Test:
- Feature-based, not class-based
- Limited isolation
- Easy failure analysis / reporting
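
A minimal sketch of such a runner in Python (hypothetical names; `client` is assumed to be a NullView test client from the harness):

    def monkey_test(name, action, repetitions=400):
        """Repeat one isolated primitive and report only the aggregate result."""
        passes = 0
        for _ in range(repetitions):
            try:
                action()
                passes += 1
            except Exception:
                pass                  # individual failures stay out of the report
        rate = 100.0 * passes / repetitions
        print(f"{name}: {rate:.0f}% pass ({passes} of {repetitions})")

    monkey_test("Create Avatar", lambda: client.create_avatar("monkey_bot"))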

Talk Outline: Automated Testing (60 Minutes)
- 1/3: Design & Initial Implementation
- 1/3: Lessons Learned: Fielding (Non-Determinism, Maintenance Costs, Solution Approaches: Monkey / Sniff / Load / Harness)
- 1/3: Wrap-up & Questions

Analysis: Maintenance Cost
High defect rate in game code:
- Code coupling: "side effects"
- Churn rate: frequent changes
- Critical Path: fatal dependencies
High debugging cost:
- Non-deterministic, distributed logic

Turnaround Time: tests ran too far removed from the introduction of defects.

Critical Path Defects Were Very Costly

Solution: Sniff Test. Pre-checkin regression: don't let broken code into Mainline.
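
A minimal sketch of such a gate in Python (all names and paths are assumptions for illustration, including the `nullview_client` binary):

    import subprocess
    import sys

    SNIFF_SCRIPTS = ["login.txt", "create_avatar.txt", "buy_house.txt",
                     "buy_object.txt", "enter_house.txt", "use_object.txt"]

    def sniff_test():
        for script in SNIFF_SCRIPTS:
            # Run a scripted NullView client against the candidate build.
            result = subprocess.run(["nullview_client", "--script", script])
            if result.returncode != 0:
                print(f"SNIFF FAIL: {script}; checkin rejected", file=sys.stderr)
                return 1
        print("Sniff test passed; checkin allowed")
        return 0

    if __name__ == "__main__":
        sys.exit(sniff_test())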

Solution: Hourly Diagnostics
SniffTest Stability Checker:
- Emulates a developer: every hour, sync / build / test
Critical Path monkeys ran non-stop, giving a constant "baseline".
Traffic Generation:
- Keep the pipes full & servers aging
- Keep the DB growing

Analysis: CONSTANT SHOUTING IS REALLY IRRITATING
Bugs spawned many, many e-mails.
Solution: Report Managers (sketch below)
- Aggregate / correlate across tests
- Filter known defects
- Translate common failure reports to their root causes
Solution: Data Managers
- Information overload: automated workflow tools are mandatory
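
A minimal sketch of a report manager in Python (the patterns and cause table are invented examples, not TSO's actual rules):

    import re
    from collections import Counter

    KNOWN_DEFECTS = [re.compile(p) for p in (r"timeout in enter_lot",
                                             r"DB connection refused")]
    ROOT_CAUSES = {r"socket|connection": "network/server down",
                   r"assert|nil object": "game logic defect"}

    def digest(failures):
        """Collapse raw failure reports into one summary, not hundreds of mails."""
        fresh = [f for f in failures
                 if not any(p.search(f) for p in KNOWN_DEFECTS)]  # drop known bugs
        causes = Counter()
        for failure in fresh:
            for pattern, cause in ROOT_CAUSES.items():
                if re.search(pattern, failure):
                    causes[cause] += 1
                    break
            else:
                causes["unknown"] += 1
        return causes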

Toolkit Usability
- Workflow automation
- Information management
- Developer / Tester "push button" ease of use
- XP flavour: make tests increasingly easy to run
- Tests must be easier to run than to avoid running
- Tools must solve problems "on the ground now"

Sample Testing Harness Views

Load Testing: Goals
- Expose issues that only occur at scale
- Establish hardware requirements
- Establish responsiveness at scale
- Emulate user behaviour: use server-side metrics to tune test scripts against observed Beta behaviour
- Run full-scale load tests daily

Load Testing: Data Flow. A control rig drives racks of test-driver CPUs, each running many test clients against the server cluster. Client metrics, game traffic, resource metrics, and debugging data flow back through system monitors and internal probes to the load testing team.
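
A minimal sketch of the driver side in Python (hypothetical; again assuming a `nullview_client` binary, with metrics gathered server-side as the slide describes):

    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    def run_client(client_id):
        """One scripted play session run by a NullView client."""
        return subprocess.run(
            ["nullview_client", "--script", "play_session.txt",
             "--user", f"loadbot_{client_id}"]).returncode

    def load_test(num_clients=4000, drivers=50):
        # Fan the same play-session script out across driver processes.
        with ProcessPoolExecutor(max_workers=drivers) as pool:
            codes = list(pool.map(run_client, range(num_clients)))
        print(f"{codes.count(0)} of {num_clients} clients completed cleanly")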

Load Testing: Lessons Learned
Very successful: "Scale & Break" up to 4,000 clients.
Some conflicting requirements with Regression:
- Continue-on-fail
- Transaction tracking
- NullView client a little "chunky"

Current Work
- QA test suite automation
- Workflow tools
- Integrating testing into the design/development process for new features
Planned work:
- Extend Esper Toolkit for general use
- Port to other Maxis projects

Talk Outline: Automated Testing (60 Minutes)
- 1/3: Design & Initial Implementation
- 1/3: Lessons Learned: Fielding
- 1/3: Wrap-up & Questions (Biggest Wins / Losses, Reuse, Tabula Rasa: MMP & SSP)

Biggest Wins
- Presentation Layer abstraction: NullView client; scripted play sessions (powerful for regression & load)
- Pre-checkin Snifftest
- Load testing
- Continual usability enhancements
- Team: upper management commitment; a focused group of senior developers

Biggest Issues
- Order of testing: MTBF / QA test suites should have come last; not relevant early, when the game was too unstable; find / fix lag was too distant from development
- Changing TSO's development process: tool adoption was slow unless mandated
- Noise: a constant flood of test results; the sheer number of game defects and testing defects; non-determinism / false positives

Tabula Rasa How Would I Start The Next Project?

Tabula Rasa: Pre-Checkin Sniff Test. There's just no reason to let code break.

Tabula Rasa: Hourly Monkey Tests. The pre-checkin SniffTest keeps Mainline working; hourly monkeys give a useful baseline & keep servers aging.

Tabula Rasa: a Dedicated Tools Group. Continual usability enhancements adapted the tools to meet "on the ground" conditions.

Tabula Rasa: Executive-Level Support. Easy to use == used; mandates were required to shift how entire teams operated.

Tabula Rasa: Load Test Early & Often, backed by executive support for radical shifts in process.

Tabula Rasa: distribute test development & ownership across the full team; break it before Live.

Next Project: Basic Infrastructure
- Control harness for clients & components
- Reference client
- Self test
- Reference feature
- Regression engine
- Living doc

Building Features: NullView First. The NullView client is the first piece built on the basic infrastructure (control harness, reference client, self test, reference feature, regression engine, living doc).

Build The Tests With The Code: each feature (e.g. Login) checks in with its self test and monkey test attached to the harness. Nothing gets checked in without a working Monkey Test.

Conclusion
Estimated impact on MMP: High
- Sniff Test: kept developers working
- Load Test: identified critical failures pre-launch
- Presentation Layer: scriptable play sessions
Cost to implement: Medium (much lower for SSP games)
Repeatable, coordinated scale testing and pre-checkin regression were very significant schedule accelerators.

Conclusion Go For It…

Talk Outline: Automated Testing (60 Minutes)
- 1/3: Design & Initial Implementation
- 1/3: Lessons Learned: Fielding
- 1/3: Wrap-up & Questions