Multi-core/Cell Game Engine Design

Presentation transcript:

Multi-core/Cell Game Engine Design
Kalloc Studios
Henry Yu, President & CEO

Credentials
- Worked in the video game industry for nearly 20 years
- Lead Programmer for Sierra On-Line
- Director of Technology for Activision
- Technical Director for Electronic Arts/Westwood
- Software Director for Angel Studios/Rockstar
- Software Engineering Director for THQ
- Founded Kalloc Studios in 2006

Topics of discussion
- Game industry trends
- Hardware capabilities comparison
- System architecture
- Kalloc Studios' mission

Game Industry Trends
- Consumers demand more realistic visuals, physics interactions, and A.I. behaviors
- More game content to give a full, immersive experience
- Computer hardware has been evolving toward multi-core designs
- Faster iteration time to promote rapid game development

Hardware comparison of the PlayStation 3 to the Xbox 360
- Multi-core vs. Cell-based architecture, with different synchronization models
- The SPU is hard to utilize because of its small amount of local memory
- DMA transfers are difficult to structure
- Slower RSX graphics performance
- Memory limitations due to the non-unified memory architecture
- Slower Blu-ray ROM data throughput

Fundamental System design Architecture differences between Xbox 360 and PlayStation 3

Current Kalloc Engine Capabilities
- Cross-platform for Xbox 360, PlayStation 3 and PC
- 720p and 1080p high-definition support
- 400 or more fully skinned characters with ~5000 polygons and 91 bones each (4 weight influences)
- 50 or more vehicles with ~3000 polygons each
- Normal-mapped characters
- All characters have facial animations
- Overall polygon throughput of ~24 million polygons per second
- Full collision detection with dynamic objects such as characters and vehicles
- NPCs drive and respond to collisions
- NPCs have reactive behaviors toward the player's actions

System Architecture
- Local Store and Data Streaming Model
- Multi-threaded Architecture
- Graphics Subsystem
- Animation System
- Physics Components
- Asset Pipeline via Live Update System

Local Store and Data Streaming Model
The architecture works like a set of arrays: game objects, physics objects, render objects, etc., are each allocated in a contiguous chunk of memory reserved for that type of object. A contiguous chunk can then be transferred by DMA to a PS3 SPU, or even kept cached in the SPU's local memory. Keeping objects in contiguous memory is an optimization for the PS3 that also improves performance on the Xbox 360, because cache misses are reduced.
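A minimal sketch of the idea, not Kalloc's implementation: one pool per object type keeps fixed-stride records back to back, and work is done on copied-out batches, the way a worker with a small local memory would receive them. The record layout (`x, y, z, radius`), the batch size and all names are assumptions made for illustration; Python's `array` stands in for a raw memory block.

```python
from array import array

FLOATS_PER_OBJECT = 4  # hypothetical record layout: x, y, z, radius


class ObjectPool:
    """Fixed-stride records of one object type stored back to back."""

    def __init__(self):
        self.data = array("f")  # one contiguous buffer of 32-bit floats

    def add(self, x, y, z, radius):
        self.data.extend((x, y, z, radius))
        return len(self) - 1  # index of the new object

    def __len__(self):
        return len(self.data) // FLOATS_PER_OBJECT

    def chunks(self, objects_per_chunk):
        """Yield copied-out contiguous slices, standing in for DMA batches."""
        step = objects_per_chunk * FLOATS_PER_OBJECT
        for start in range(0, len(self.data), step):
            yield self.data[start:start + step]


def translate_chunk(chunk, dx):
    """Process one batch in isolation, touching only its own copy."""
    for i in range(0, len(chunk), FLOATS_PER_OBJECT):
        chunk[i] += dx
    return chunk


pool = ObjectPool()
for n in range(10):
    pool.add(float(n), 0.0, 0.0, 1.0)

# Stream the pool through the worker batch by batch, then write results back.
processed = array("f")
for chunk in pool.chunks(objects_per_chunk=4):
    processed.extend(translate_chunk(chunk, dx=0.5))
pool.data = processed
```

Because every batch is a plain slice of one buffer, the same loop serves both a copy-to-local-store target and a cache-based CPU, where sequential access keeps cache misses low.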

Data Streaming Model to process tasks

Multi-threaded Architecture
- Thread-based model and SPU thread server implementation for the task-based architecture
- Multi-threaded scheduler manages both blocking and non-blocking processes
- Multi-stage implementation for data synchronization
- N + 1 frame: GPU runs concurrently with the core CPU and SPUs
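One reading of the "N + 1 frame" item is that the GPU draws frame N while the CPU and SPUs already simulate frame N + 1. The sketch below illustrates that overlap under this assumption; a Python thread stands in for the GPU, and the queue depth, frame contents and names are all hypothetical.

```python
import queue
import threading

FRAMES = 5
submitted = queue.Queue(maxsize=1)  # at most one finished frame waiting
rendered = []


def render_worker():
    # Stand-in for the GPU: consumes frame N while the CPU builds N + 1.
    while True:
        frame = submitted.get()
        if frame is None:  # sentinel: no more frames
            break
        rendered.append(f"drawn {frame['id']} at x={frame['x']}")


gpu = threading.Thread(target=render_worker)
gpu.start()

x = 0.0
for frame_id in range(FRAMES):
    x += 1.5  # simulate the next frame on the CPU side
    snapshot = {"id": frame_id, "x": x}  # private copy handed to the renderer
    submitted.put(snapshot)  # blocks only if the renderer falls behind
submitted.put(None)
gpu.join()
```

Handing over a snapshot instead of live game state is what lets the two sides run without locking each other's data.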

Functional-based Multi-threaded Architecture
A functional-based architecture assigns one thread to each subsystem. All subsystems are processed simultaneously.
Advantages:
- Very easy to implement, since tasks do not have to be divided and dependencies do not have to be resolved.
- Suitable for middleware solutions.
Disadvantages:
- Uneven distribution of processing power, since one slow task can hold up the rest of the processors and leave them idle.
- Mutexes or some other synchronization protection must be used to resolve data dependencies.
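As a rough illustration of the functional-based model, the sketch below runs one thread per subsystem and guards shared data with a lock. The subsystem names and workloads are invented, and Python threads show the structure only, not a real parallel speedup.

```python
import threading

state_lock = threading.Lock()
frame_log = []  # shared data touched by every subsystem


def run_subsystem(name, work_units):
    # Each subsystem does all of its own work on its own thread.
    total = sum(range(work_units))
    with state_lock:  # shared data needs explicit protection
        frame_log.append((name, total))


# Hypothetical subsystems with deliberately uneven workloads.
subsystems = {"physics": 50_000, "animation": 20_000, "ai": 5_000}

threads = [
    threading.Thread(target=run_subsystem, args=(name, units))
    for name, units in subsystems.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the frame ends only when the slowest subsystem finishes
```

The final `join` loop is where the imbalance shows: the threads that finished early sit idle until the heaviest subsystem is done.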

Task-based Multi-threaded Architecture
A task-based architecture uses all threads to process one subsystem at a time. Subsystems are processed in a given order. Large tasks must be divided into smaller tasks so that they can be distributed across all processors.
Advantages:
- Extremely even balance of processor power.
- Virtually eliminates the problem of waiting for the slowest task.
- Because subsystems are processed in a fixed order, many dependencies are removed, allowing data access without mutex locking.
Disadvantages:
- Difficult to implement, since all tasks must be divided and their dependencies resolved.
- Hard to use middleware solutions, since this architecture is relatively new.
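A minimal sketch of the task-based model, assuming two made-up subsystems (`integrate` for physics, `animate` for animation) and an arbitrary task size: each subsystem's work is split into small batches that a shared worker pool drains, and the next subsystem starts only after the previous one has finished.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, task_size):
    """Divide one subsystem's work into small, independent tasks."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def integrate(batch):
    # Hypothetical physics step: advance each position by one unit.
    return [p + 1.0 for p in batch]


def animate(batch):
    # Hypothetical animation step: reads the positions physics produced.
    return [p * 2.0 for p in batch]


def run_stage(pool, func, items, task_size=4):
    """Run every task of one subsystem and wait for all of them."""
    results = pool.map(func, split(items, task_size))
    return [value for batch in results for value in batch]


positions = [float(i) for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Fixed order: physics completes before animation begins, so animation
    # can read physics output without a mutex.
    positions = run_stage(pool, integrate, positions)
    poses = run_stage(pool, animate, positions)
```

The stage boundary replaces the locks of the functional model: no task ever reads data that another task in the same stage is writing.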

Task Distribution Model

Solutions to Data Synchronization
- Mutex locks using critical sections
- Data separation using multiple stages (e.g. read and write stages)
- Local Store Model using ring buffers
- Component object level organization to separate data dependencies
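The read/write stage idea can be sketched with a double buffer: during an update every task reads last frame's state and writes only its own slot of the next state, and a single swap separates the stages. The class, the neighbor-averaging update and the data are hypothetical, chosen only to keep the example small.

```python
class DoubleBuffered:
    """Readers see last frame's state while writers fill the next one."""

    def __init__(self, initial):
        self.read = list(initial)   # never modified during the update stage
        self.write = list(initial)  # filled in during the update stage

    def swap(self):
        # The only synchronization point between stages.
        self.read, self.write = self.write, self.read


def update(state, index):
    # A task may read any slot but writes only its own, so no lock is needed.
    left = state.read[index - 1]
    right = state.read[(index + 1) % len(state.read)]
    state.write[index] = (left + right) / 2.0


state = DoubleBuffered([0.0, 10.0, 20.0, 30.0])
for i in range(4):  # these calls could run on different workers in any order
    update(state, i)
state.swap()
```

Because no task reads `write` or writes `read`, the result is the same whatever order the updates run in.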

Current Graphics System
- 720p and 1080p native support
- Interleaved vertex format with 16-bit normals and UV data to maximize data throughput
- Multi-level shadow map to enhance resolution quality
- Use of instancing to increase rendering performance
- Depth of field effect
- High dynamic range lighting with tone mapping
- Particle effects
- Hardware instancing for rendering props
- Scene graph techniques such as octree and occlusion systems to further optimize large-scale rendering
- Supports an unlimited number of bones for animation
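To make the interleaved vertex item concrete, here is a sketch of one possible 24-byte layout: a 32-bit float position followed by 16-bit normalized normals and UVs. The exact layout, the padding slot and the helper names are assumptions; the slides do not give the engine's real format.

```python
import struct

# Assumed layout: 3 x float32 position, 3 x int16 normal + 1 pad, 2 x uint16 UV.
VERTEX = struct.Struct("<3f4h2H")


def snorm16(value):
    """Map [-1, 1] to a signed 16-bit integer."""
    return round(max(-1.0, min(1.0, value)) * 32767)


def unorm16(value):
    """Map [0, 1] to an unsigned 16-bit integer."""
    return round(max(0.0, min(1.0, value)) * 65535)


def pack_vertex(position, normal, uv):
    nx, ny, nz = (snorm16(c) for c in normal)
    tu, tv = (unorm16(c) for c in uv)
    return VERTEX.pack(*position, nx, ny, nz, 0, tu, tv)


mesh = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0)),
    ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.5)),
]
# Interleaved: all attributes of vertex 0, then all attributes of vertex 1.
vertex_buffer = b"".join(pack_vertex(*vertex) for vertex in mesh)

# Read the second vertex back using the stride as the byte offset.
x, y, z, nx, ny, nz, _pad, tu, tv = VERTEX.unpack_from(vertex_buffer, VERTEX.size)
```

Storing the normal and UV in 16 bits each brings the stride to 24 bytes, against 32 bytes if every attribute were a 32-bit float.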

Animation System
- Supports unlimited bones per character
- Key frame compression
- Quaternion-based interpolation
- Support for up to 9 channels of animation: rotation, translation and scale
- Support for overlaid animations
- Procedural animation to minimize the number of animations in the game
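Quaternion-based interpolation between two key frames is commonly done with spherical linear interpolation (slerp). The sketch below shows the standard formula for unit quaternions in `(w, x, y, z)` order; the slides do not say which variant the engine uses, so treat the function and its threshold as illustrative.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation; take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: linear interpolation avoids dividing by ~0
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)


identity = (1.0, 0.0, 0.0, 0.0)
# 90 degree rotation about the z axis.
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
# Halfway between the two keys: a 45 degree rotation about z.
halfway = slerp(identity, quarter_turn_z, 0.5)
```

Unlike interpolating Euler angles or matrices, this keeps the result a valid rotation and moves at constant angular speed between the keys.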

Physics Component
- Component system accommodates different physics middleware (Havok, Bullet and Ageia PhysX) as well as a custom physics engine
- Simple custom physics system
- Sphere-to-sphere, box-to-box, box-to-sphere, etc., collisions
- 2D grid partition optimizations
- Per-cell collision detection
- Simple vehicle simulation
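The grid partition and per-cell detection items can be sketched as follows: objects are binned into the cells of a 2D grid, and the sphere-to-sphere test runs only between objects that share a cell. The cell size, the `(x, y, radius)` tuples on the ground plane and the function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

CELL_SIZE = 10.0  # hypothetical cell width in world units


def spheres_overlap(a, b):
    """Sphere-to-sphere test on (x, y, radius) tuples using squared distance."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    reach = a[2] + b[2]
    return dx * dx + dy * dy <= reach * reach


def cells_touched(sphere):
    """All grid cells overlapped by the sphere's bounding box."""
    x, y, r = sphere
    x0, x1 = int((x - r) // CELL_SIZE), int((x + r) // CELL_SIZE)
    y0, y1 = int((y - r) // CELL_SIZE), int((y + r) // CELL_SIZE)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]


def find_collisions(spheres):
    grid = defaultdict(list)
    for index, sphere in enumerate(spheres):
        for cell in cells_touched(sphere):
            grid[cell].append(index)
    hits = set()  # a pair sharing several cells is still reported once
    for members in grid.values():  # per-cell detection
        for i, j in combinations(members, 2):
            if spheres_overlap(spheres[i], spheres[j]):
                hits.add((i, j))
    return sorted(hits)


spheres = [
    (1.0, 1.0, 1.0),
    (2.5, 1.0, 1.0),    # overlaps the first sphere
    (9.5, 1.0, 1.0),    # straddles the border between two cells
    (10.5, 1.0, 1.0),   # overlaps the previous sphere across that border
    (50.0, 50.0, 1.0),  # far away, never tested against the others
]
collisions = find_collisions(spheres)
```

Registering each sphere in every cell its bounding box touches is what catches the pair that straddles a cell border.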

Instant Asset Update System for the Asset Pipeline
- Instant refreshing of assets without restarting the engine/game
- No intermediate file formats, so the export process is quick
- Instant feedback for artists and designers to check data validity and quality
- No overnight build/baking process
- Asset sharing between designers, artists and programmers within the network
- Built-in support for art outsourcing
- Easy DVD/Blu-ray burns for archiving and build delivery

Mission of Kalloc Studios
- Create a truly next-gen multi-platform game engine that maximizes cutting-edge hardware, such as multi-core and Cell architectures, and the latest graphics rendering capabilities
- Create innovative and quality game titles
- Train highly motivated talent to become industry specialists

Questions?

Thank you!
henry@kalloc.com
jobs@kalloc.com
www.kallocstudios.com
(760) 602-7959