Making a game with Molehill: Zombie Tycoon Jean-Philippe Auclair Lead R&D Software Architect Luc Beaulieu CTO – Frima Studio
Session Overview State of Flash Molehill’s API presentation Digging deeper into Molehill
State of Flash Is Flash Dead? FB: Top 10 = 250M MAU Desktops: Flash 10 installed on 99%+ SmartPhones: Flash/Air 200+M, 100 devices Streaming: 120 petabytes per month Advances in Flash for 3D games AS3 10.1, 10.2 … Molehill
Molehill’s API Presentation Pros: – GPU Accelerated API – Relies on DirectX 9 and OpenGL ES 2.0 – Native Software fallback Cons: – No point sprite support, branching, MRT, depth buffer – No CPU threading support – Native Software fallback
This Page Intentionally Left Green
Digging deeper into Molehill Assuming a basic knowledge of 3D development terminology Display Layers Model/Animation File Format Character Animation: Matrix vs Quaternion Texturing Optimizing the Particle System Fast Lights & Shadows CPU Post-Processing effects Profiling & Debugging tools Bonus! – The math explaining all the numbers I’m going to talk about – Cheat sheets
Display Layers
Frima 3D File Format Many 3D engines for Flash try to support multiple input formats …Or support only a generic format such as Collada XML Using a format optimized for 3D games made in Flash – Small file size – Small memory footprint – No processing required
Frima 3D File Format Collada XML 3DS Max Scene Max Script Exporter Build Tool Export pipeline
Frima 3D File Format Export pipeline: Model / Animation → Build Tool → Game Object → Serialize (AMF) → Compress → Game File
Frima 3D File Format In-game usage: Game File → Uncompress → Deserialize → Game Object → Add To Scene
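The serialize, compress, uncompress, deserialize round trip can be sketched as follows. This is a minimal illustration, not the Frima build tool: JSON stands in for AMF (Python's standard library has no AMF encoder), zlib stands in for the compression step, and the `zombie` object and both function names are hypothetical.

```python
import json
import zlib


def pack_game_object(game_object: dict) -> bytes:
    """Build-time step: serialize the object, then compress the bytes."""
    # JSON is a stand-in for AMF here (assumption for illustration only).
    serialized = json.dumps(game_object, separators=(",", ":")).encode("utf-8")
    return zlib.compress(serialized, level=9)


def unpack_game_object(game_file: bytes) -> dict:
    """Run-time step: uncompress the bytes, then deserialize the object."""
    return json.loads(zlib.decompress(game_file).decode("utf-8"))


# Hypothetical game object with repetitive vertex data.
zombie = {"name": "zombie", "bones": 24, "vertices": [0.0, 1.0, 0.0] * 100}
game_file = pack_game_object(zombie)
restored = unpack_game_object(game_file)
```

Because the file already holds the game object in its final shape, loading needs no parsing or conversion beyond these two steps, which is the "no processing required" goal stated earlier.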
Zombie Re-Animation Techniques – Matrix linear blending – DualQuaternion linear blending Molehill Constraint – Vertex shader constant limit: 128 Float4 Zombie: 24 bones
Animation techniques Matrix linear blending can cause loss of volume when joints are twisted or extremely bent When using matrices, each bone takes 3 constants – Maximum number of bones is 40 When using DualQuats, each bone takes only 2 constants – Maximum number of bones is 60 Matrix (left) / Dual Quaternion (right)
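The bone limits follow from the constant budget worked out on the bonus slide: 128 Float4 vertex constants, minus the WorldMatrix and ViewProj matrices. A small sketch of that arithmetic, assuming each 4x4 matrix occupies 4 Float4 registers (which is what makes 128 become 120); the function and constant names are illustrative:

```python
TOTAL_VERTEX_CONSTANTS = 128  # Float4 registers available to the vertex shader
FLOAT4_PER_MATRIX = 4         # assumption: a 4x4 matrix uses 4 Float4 registers
RESERVED = 2 * FLOAT4_PER_MATRIX  # WorldMatrix + ViewProj matrix


def max_bones(float4_per_bone: int, pose_sets: int = 1) -> int:
    """Bones that fit in the remaining constants for a given encoding."""
    available = TOTAL_VERTEX_CONSTANTS - RESERVED  # 120 Float4
    return available // (float4_per_bone * pose_sets)


matrix_bones = max_bones(3)              # 3 Float4 per bone
dual_quat_bones = max_bones(2)           # 2 Float4 per bone
matrix_two_sets = max_bones(3, 2)        # two pose sets (transitions)
dual_quat_two_sets = max_bones(2, 2)
```

With two pose sets in flight, a 24-bone zombie fits under the dual quaternion budget but not under the matrix budget.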
Transitions & interpolation Too Much Animation transitions require two sets of bones Idle blending to walk The same applies to frame interpolation (e.g. bullet-time animation)
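Blending two sets of bones can be sketched per bone as a dual quaternion linear blend. This is a generic version of the technique, not the Zombie Tycoon shader: a dual quaternion is assumed to be stored as 8 floats (real part w, x, y, z followed by dual part w, x, y, z), and the result is divided by the length of the real part, as shader implementations commonly do.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)


def blend_dual_quats(dq_a, dq_b, t):
    """Linearly blend two unit dual quaternions, then renormalize."""
    # q and -q encode the same rotation; flip b so both lie in one hemisphere.
    dot = sum(a * b for a, b in zip(dq_a[:4], dq_b[:4]))
    sign = -1.0 if dot < 0.0 else 1.0
    blended = [(1.0 - t) * a + t * sign * b for a, b in zip(dq_a, dq_b)]
    norm = math.sqrt(sum(c * c for c in blended[:4]))  # length of the real part
    return tuple(c / norm for c in blended)


def qmul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def translation_of(dq):
    """Recover the translation: t = 2 * dual * conjugate(real)."""
    w, x, y, z = dq[:4]
    t = qmul(dq[4:], (w, -x, -y, -z))
    return (2.0 * t[1], 2.0 * t[2], 2.0 * t[3])


# Hypothetical poses: "idle" is the identity, "walk" translates by (2, 0, 0).
idle = IDENTITY
walk = (1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
halfway = blend_dual_quats(idle, walk, 0.5)
```

Both pose sets have to be resident in the vertex constants at the same time, which is why transitions halve the bone budget.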
File size? Performance?
Texturing in Molehill
The first version of the engine used only PNGs Adobe Texture Format (ATF) – Textures are kept compressed in video memory – Native support for multi-device publishing – One file containing 3 encodings: DXT1, ETC1 and PVRTC – 1.3x bigger than the original PNG – Contains the texture's mipmaps – Does not support transparency
Texturing in Molehill Transparency – Use PNGs with indexed color – Sample an “alpha mask texture” in the pixel shader ATF Avatar = opaque PNG Fence = transparent
Texturing in Molehill Many effects can use ATF with the right blend modes No need for transparency Splatter = Multiply Fire = Additive
Particle System Using a divided workload (CPU/GPU) for better performance – On the CPU Each particle property is updated every frame: alpha, color, direction, rotation, frame (if SpriteSheet), etc. – On the GPU Applying these properties Expanding billboard vertices to face the screen
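The CPU/GPU split can be sketched as two small functions. This is an illustration under stated assumptions rather than the engine's code: the particle fields, the linear fade, and the corner layout are hypothetical, and the billboard expansion that the deck runs in the vertex shader is written here on the CPU so it can be executed.

```python
# Unit quad corners, one per billboard vertex (hypothetical layout).
CORNERS = ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5))


def update_particle(particle, dt):
    """CPU side: advance one particle by dt seconds."""
    particle["age"] += dt
    life = min(particle["age"] / particle["lifetime"], 1.0)
    particle["alpha"] = 1.0 - life  # assumed linear fade over the lifetime
    particle["position"] = tuple(
        p + v * dt for p, v in zip(particle["position"], particle["velocity"])
    )


def expand_billboard(center, size, cam_right, cam_up):
    """GPU-side idea: offset each corner along the camera's right and up axes."""
    return [
        tuple(c + size * (cx * r + cy * u)
              for c, r, u in zip(center, cam_right, cam_up))
        for cx, cy in CORNERS
    ]


spark = {"age": 0.0, "lifetime": 2.0, "alpha": 1.0,
         "position": (0.0, 0.0, 0.0), "velocity": (1.0, 0.0, 0.0)}
update_particle(spark, 0.5)
quad = expand_billboard(spark["position"], 2.0, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Because every corner is offset along the camera's own axes, the quad always faces the screen regardless of where the particle sits in the world.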
Particle System: Optimization How many particles? – Due to the VertexBuffer and IndexBuffer limits, – In Zombie Tycoon we were limited to around particles per draw call Using Fast ByteArray (also known as Alchemy memory or DomainMemory) – Using Azoth, property updates were 10 times faster Batching draw calls that use the same texture Using a 100% GPU particle system – It’s expensive on the GPU – Supports only linear transformations – Zero CPU required
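The buffer limits bite because, with no vertex re-use between particles, each quad costs 4 vertices and 6 indices (two triangles). A minimal sketch of the index layout for a batch of quads; the function name and winding order are assumptions:

```python
VERTICES_PER_QUAD = 4
INDICES_PER_QUAD = 6  # two triangles, three indices each


def build_quad_indices(particle_count):
    """Index list for particle_count quads that share no vertices."""
    indices = []
    for i in range(particle_count):
        base = i * VERTICES_PER_QUAD
        # Triangles (0, 1, 2) and (0, 2, 3) of the quad, offset to its vertices.
        indices.extend((base, base + 1, base + 2, base, base + 2, base + 3))
    return indices


two_quads = build_quad_indices(2)
```

The index pattern is identical for every batch, so it can be built once and re-used while only the vertex data changes from frame to frame.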
Particle System
Lights & shadows Techniques – ShadowMap & LightMap – Dynamic lighting – Fake Volumetric lights – Fake projected shadows
Lights & shadows ShadowMap & LightMap – We used two textures, a “multiplied” ShadowMap and an “additive” LightMap Diffuse * ShadowMap + LightMap = Composite
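Per pixel, the composite formula is a multiply followed by an add. A minimal sketch on RGB triples in the 0 to 1 range; the clamp to 1.0 is an assumption about how the result is saturated, and the sample values are made up:

```python
def composite(diffuse, shadow, light):
    """Diffuse * ShadowMap + LightMap, per channel, clamped to 1.0 (assumed)."""
    return tuple(min(d * s + l, 1.0) for d, s, l in zip(diffuse, shadow, light))


# Hypothetical texel values sampled from the three textures.
pixel = composite((0.8, 0.6, 0.4), (0.5, 0.5, 0.5), (0.1, 0.0, 0.2))
```

The ShadowMap can only darken the diffuse color, while the LightMap can only brighten it.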
Lights & shadows Dynamic lighting – Lighting requires an expensive pixel shader, currently limited to 256 instructions – Zombie Tycoon supports up to 7-9 lights (spot or point) per object.
Lights & shadows Pixel Shader assembly code – Per light, without Normal/Specular mapping.
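As a rough illustration of the per-light work such a shader repeats, here is a generic diffuse (Lambert) point light. It is not the Zombie Tycoon assembly: the linear distance falloff, the `radius` parameter, and the function names are all assumptions.

```python
import math


def point_light_diffuse(position, normal, light_pos, light_color, radius):
    """Diffuse contribution of one point light at a surface point."""
    to_light = tuple(l - p for l, p in zip(light_pos, position))
    distance = math.sqrt(sum(c * c for c in to_light))
    if distance == 0.0 or distance >= radius:
        return (0.0, 0.0, 0.0)
    direction = tuple(c / distance for c in to_light)
    # Lambert term: cosine of the angle between normal and light direction.
    n_dot_l = max(sum(n * d for n, d in zip(normal, direction)), 0.0)
    attenuation = 1.0 - distance / radius  # assumed linear falloff
    return tuple(c * n_dot_l * attenuation for c in light_color)


def shade(position, normal, lights):
    """Sum the contribution of every light, clamped to 1.0 per channel."""
    total = [0.0, 0.0, 0.0]
    for light_pos, light_color, radius in lights:
        contribution = point_light_diffuse(position, normal, light_pos,
                                           light_color, radius)
        total = [t + c for t, c in zip(total, contribution)]
    return tuple(min(t, 1.0) for t in total)


above = ((0.0, 1.0, 0.0), (1.0, 1.0, 1.0), 2.0)   # light above the surface
below = ((0.0, -1.0, 0.0), (1.0, 1.0, 1.0), 2.0)  # light behind the surface
lit = shade((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), [above, below])
```

Each extra light repeats the same block of work, so the number of lights per object is bounded by the pixel shader instruction limit.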
Lights & shadows Fake Volumetric Lights – With a few billboard particles, it’s easy to fake nice, lightweight volumetric lighting – All objects sample the shadow and light maps, and since the light particles are “additive”, an object behind the lights will look brighter
Lights & shadows
Fake projected shadows – We created a particle of a gradient black spot aligned to the ground – Orientation and scale of the particle depend on light position and intensity
CPU Post-Processing Possibility of reading the BackBuffer – Strongly recommended not to use Readback – Fast pipeline for data from the System memory to Video memory – VERY slow pipeline from video to system memory Effects: Bloom, Blur, Depth of Field, etc. Motion Blur
CPU Post-Processing Normal / Bloom post-processing
Profiling and Debugging tools (CPU) FlashDevelop (O.S.S.) – Most of the production is using FlashDevelop – Now with a profiler and a debugger, it’s very easy to work with it
Profiling and Debugging tools (CPU) Adobe Flash Builder Profiler – Profile Function calls – Profile Memory allocation
Profiling and Debugging tools (CPU) FlashPreloadProfiler (O.S.S.) – Profile Function calls – Profile Memory allocation – Profile Loaders status – Can be used in Debug/Release & browser/Projector
Profiling and Debugging tools (GPU) PIX for Windows – List of API calls – Shader assembly code – Pixel debugger – Texture viewer
Profiling and Debugging tools (GPU) Intel® Graphics Performance Analyzers (GPA) – Render in wireframe – Profile vertex and pixel shader performance – Visualize overdraw and draw call sequence – Save a frame and run real-time experiments – Identify bottlenecks
Sources & References – Geometric Skinning with Approximate Dual Quaternion Blending – Intel® Graphics Performance Analyzers (GPA) – PIX for Windows – TD-Matt blog – FlashPreloadProfiler – Azoth – Flash in Facebook – AppData.com – Flash Stats Contact: Luc Beaulieu, Jean-Philippe Auclair (jpauclair.net)
Bonus Slide: The maths!
Character animation:
– Matrix linear blending: 128 Float4 vertex constants, minus the WorldMatrix and ViewProj matrix (4 Float4 each) = 120 Float4. 120 Float4 / 3 Float4 per bone = 40 bones in the constants. Bullet time and transitions require two sets of bones: 40 / 2 = 20 bones per character max
– DualQuaternion linear blending: 128 Float4 vertex constants, minus the WorldMatrix and ViewProj matrix (4 Float4 each) = 120 Float4. 120 Float4 / 2 Float4 per bone = 60 bones in the constants. Bullet time and transitions require two sets of bones: 60 / 2 = 30 bones per character max
Max Particle Count
– The VertexBuffer is limited to vertex, the IndexBuffer is limited to index of type SHORT
– In theory, you could have up to triangle in one draw call
– In practice, with no vertex re-use between particles and using quads (4 vertex): 65536/6 = particle max per draw call
Lighting
– With the PixelShader limit of 256 instructions, we were able to fit around 7 to 9 dynamic lights per object (point or spot light)
Achievement: Geek Cheat Sheet
Achievement: Super Geek!
Contact Luc Beaulieu Jean-Philippe Auclair jpauclair.net Thank You! Questions?