Daosheng Mu, Lead Programmer Eric Chang, CTO XPEC Entertainment Inc.

Presentation on theme: "Daosheng Mu, Lead Programmer Eric Chang, CTO XPEC Entertainment Inc."— Presentation transcript:

1 Using The New Flash Stage3D Web Technology To Build Your Own Next 3D Browser MMOG
Daosheng Mu, Lead Programmer
Eric Chang, CTO
XPEC Entertainment Inc.

2 Outline
Brief of Speakers
Introduction of Adobe Flash Stage3D API
XPEC Flash 3D Engine
Optimization for Flash Program
Future Works
Conclusion
Q & A

3 Brief of Speakers
Eric Chang
19 years of game industry experience
Cross-platform 3D game engine development: PC/Console/Web

4

5 Brief of Speakers
Daosheng Mu
4.5 years of cross-platform 3D game engine development experience: PC/Console/Web

6 Why Flash? Native C/C++ vs. Unity vs. Flash
                          Native C/C++   Unity   Flash
Development Difficulty    High           Low     Mid
Ease of Cross Platform
Performance
Market Popularity (>95%)

7 Project C4 Demo Video

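8 Introduction of Adobe Flash Stage3D API
release
Based on Flash Player
Benefit: high market share, a large user base, and a large commercial market.
3D API: cross-platform (browsers, mobile devices), called Stage3D. A single Flash Player can use up to four Stage3D instances, so an application can drive four views at the same time.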

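9 Stage3D
Supports all browsers
Notes: One codebase runs across all browsers. Every application built on Flash Player works in every browser, and the performance difference between browsers is small. When people discuss building games with HTML5 WebGL today, the most common question is how performance differs from browser to browser, because each browser handles WebGL with its own implementation. All Flash code runs inside a single Flash Player, so a 3D game built on Stage3D does not show large performance differences.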

10 Stage3D
Stage3D includes GPU-accelerated 3D APIs:
Z-buffering
Stencil/Color buffer
Vertex shaders
Fragment shaders
Cube textures
More…
Notes: Stage3D provides the features a general-purpose 3D API is expected to have.

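11 Stage3D
Pros:
  GPU accelerated API
  Relies on DirectX, OpenGL, OpenGL ES
  Programmable pipeline
Cons:
  No support of alpha test
  No support of high-precision texture format
Notes: Alpha test: the render state that a desktop 3D API would normally provide is not supported, so we have to add extra instructions in the pixel shader. RGBA: no high-precision texture format is provided; each channel can store only 8 bits. When we developed shadow maps, we needed a texture with a special encoding to store the depth information.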

12 Stage3D
Resource           Number allowed   Total memory
Vertex buffers     4096             256 MB
Index buffers                       128 MB
Programs                            16 MB
Textures                            128 MB*
Cube textures
Draw call limits   32,768
Notes: These limits are the lowest common denominator. The 128 MB texture limit applies to mobile devices; on PC it is about 340 MB. That is still not enough for an MMOG.
*350 MB is the absolute limit for textures; 340 MB is the result we measured.

13 AGAL
Adobe Graphics Assembly Language
No support of ‘if-else’ statements
No support of ‘constants’
Assembly -> ByteArray -> Program3D
Notes: Branching (if…) cannot be used. Constants must be passed in through shader constants; they cannot be declared inside the shader. Pixelbender3D is readable and high-level, but AGAL is a good way to get good performance.
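Because AGAL has no 'if-else', a conditional result is usually computed arithmetically from a 0/1 mask. The following is a minimal sketch in Python (not AGAL or ActionScript), assuming a comparison opcode that behaves like AGAL's `sge` (1.0 when a >= b, otherwise 0.0). The function names and the `lit`/`shadowed` values are illustrative assumptions, not part of the talk.

```python
def sge(a: float, b: float) -> float:
    """Mimic a set-if-greater-or-equal opcode: 1.0 if a >= b, else 0.0."""
    return 1.0 if a >= b else 0.0


def select(mask: float, when_true: float, when_false: float) -> float:
    """Branchless select: mask is expected to be exactly 0.0 or 1.0."""
    return mask * when_true + (1.0 - mask) * when_false


def shade(fragment_depth: float, stored_depth: float,
          lit: float = 1.0, shadowed: float = 0.3) -> float:
    """Pick a brightness without branching (values are hypothetical)."""
    mask = sge(stored_depth, fragment_depth)  # 1.0 when the fragment is not occluded
    return select(mask, lit, shadowed)
```

In a real shader, `lit` and `shadowed` would be uploaded as shader constants, since they cannot be declared inline.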

14 XPEC Flash 3D Engine
Notes: That concludes the introduction to Stage3D. Next, I would like to share our experience of using Stage3D to build an engine suited to web MMORPGs.

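15 Model Pipeline
Action Message Format (AMF):
  Native ByteArray compression
  Native object serialization
3DS Max -> Exporter -> Collada -> Binary Converter -> AMF -> Engine Loader -> Engine Render
Notes: Every game engine needs a binary format that suits it. AMF is a protocol developed by Adobe, mainly used for communication between Flash and a server. It can use the native compression, and data is read back through native serialization, which makes loading convenient.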

16 XPEC Flash 3D Engine
CPU -> Command buffer -> Driver -> GPU
Application: update/render on CPU
Command buffer: store graphics API instruction
Notes: To build a 3D engine that performs well, you need to understand the CPU-driver-GPU path. The CPU sends 3D API instructions to the command buffer; when the time comes, they are handed to the driver, which dispatches them to the GPU. The part we can work on is the CPU side.

17 XPEC Flash 3D Engine: Application
Object3D: Material, Geometry
Update:
  UpdateDeltaTime
  UpdateTransform
  Scene management: Scene partition, Frustum culling
  UpdateHierarchy
Draw: SetMaterial, SetGeometry
Stage3D: Set Stage3D APIs
Notes: UpdateTransform: world transform. UpdateHierarchy: updates character skeletons or particle properties. Scene management: the usual choices, e.g. BSP, cell…

18 Scene Management
Goal: Minimize draw calls as much as possible
Indoor Scene
  BSP tree
Outdoor Scene
  Octree/Quad tree
  Cell
  Grid

19 Scene Management: Project C4
Grid partition
Object3D: (MinX, MaxX), (MinY, MaxY)
Figure: x/y grid with cells from (0, 0) through (2, 2) to (4, 4); two objects with ranges (3,4),(0,2) and (0,0),(1,2)

20 Scene Management: Project C4
Frustum: (MinX, MaxX), (MinY, MaxY)
Figure: x/y grid with cells from (0, 0) through (2, 2) to (4, 4); two objects with ranges (3,4),(0,2) and (0,0),(1,2); frustum range (1,4),(0,4)
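The grid test on these two slides can be sketched as a range-overlap check: an object is kept when its (MinX, MaxX), (MinY, MaxY) cell range overlaps the frustum's cell range. This is a minimal Python sketch using the ranges shown in the figure; the type and function names are assumptions made for illustration, not the engine's API.

```python
from typing import Dict, List, NamedTuple


class CellRange(NamedTuple):
    """Inclusive grid-cell range: (MinX, MaxX), (MinY, MaxY)."""
    min_x: int
    max_x: int
    min_y: int
    max_y: int


def overlaps(a: CellRange, b: CellRange) -> bool:
    """True when the two ranges share at least one grid cell."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def visible_objects(objects: Dict[str, CellRange], frustum: CellRange) -> List[str]:
    """Keep only the objects whose cells overlap the frustum's cells."""
    return [name for name, cells in objects.items() if overlaps(cells, frustum)]


# Ranges taken from the slide; the object names are hypothetical.
objects = {
    "object_a": CellRange(3, 4, 0, 2),
    "object_b": CellRange(0, 0, 1, 2),
}
frustum = CellRange(1, 4, 0, 4)
```

With these values only `object_a` is submitted for drawing, because `object_b` lies entirely in column 0 while the frustum starts at column 1.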

21 XPEC Flash 3D Engine: Command Buffer
Initialize: createVertex/Index Buffer, createTexture, createProgram
Begin: clear, setRenderToTexture
Draw: setVertex/Index Buffer, setProgram, setProgramConstants, setRenderState, setTextureAt, drawTriangles
End: present
Avoid user/kernel mode transition
Decrease shader patching
  “Material sorting”
Reduce draw call
  “Shared buffers”
  “Dynamic batching”
Notes: Mode transition: user mode cannot access hardware resources; kernel mode can access hardware resources through the driver. How to use these Stage3D APIs: do not do initialization work during update, and execute as few commands as possible during update.

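22 Material Sorting
Opaque/Translucent
Notes: 1: Here there are five models using three different materials. 2: After sorting, the material only needs to be switched three times. Fewer instructions are stored in the command buffer, which reduces the penalty caused by mode transitions.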
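The effect of sorting can be illustrated by counting material switches before and after sorting the draw list. This is a minimal Python sketch with five models and three materials, as in the note above; the model and material names are hypothetical.

```python
from typing import List, Tuple

DrawItem = Tuple[str, str]  # (model name, material name)


def count_material_switches(draw_list: List[DrawItem]) -> int:
    """Count how often the bound material changes while walking the list."""
    switches = 0
    current = None
    for _model, material in draw_list:
        if material != current:
            switches += 1
            current = material
    return switches


def sort_by_material(draw_list: List[DrawItem]) -> List[DrawItem]:
    """Group draw items so that equal materials are adjacent."""
    return sorted(draw_list, key=lambda item: item[1])


# Five models, three materials (all names are made up for this example).
models = [
    ("model_1", "stone"),
    ("model_2", "wood"),
    ("model_3", "stone"),
    ("model_4", "cloth"),
    ("model_5", "wood"),
]
```

An engine that also separates opaque from translucent geometry would typically sort within each group, since translucent objects usually need a depth order as well; that part is not shown here.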

23 Material Sorting State management 1047/2598 draw calls

24 8800GT
CPU: Core2Duo E G
No noticeable difference in performance.

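25 6600GT
CPU: Pentium4 3.0G
1047 draw calls
  - CPU API calls take a lot of time
  - GPU processing also affects the final total time
2598 draw calls
  - The difference in CPU API call time is more pronounced
  - The total time shows no difference; overall, the GPU has become the bottleneck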

26 NVIDIA 8800 GT - 1047 draw calls
                             Before sorting (ms)   After sorting (ms)
Render loop elapsed time     16
Total elapsed time           41                    40
draw calls                   36 50
NVIDIA 6600 GT draw calls
                             Before sorting (ms)   After sorting (ms)
Render loop elapsed time     34                    31
Total elapsed time           53                    48
draw calls                   81 64 89
Notes: Although material sorting only shows a clear gain on the 6600GT, our engine is meant for web game projects. Web games have a low barrier to entry, so this optimization lets players on low-end hardware benefit from the improved performance.

27 Shared Buffers
Problem: Numbers of buffers are limited
Resource           Number allowed   Total memory
Vertex buffers     4096             256 MB
Index buffers                       128 MB
Programs                            16 MB
Notes: An MMOG has a large number of characters and accessories, so it is easy to hit these limits.

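28 Shared Buffers
Figure: several models, each with its own Vertex Buffer and Index Buffer, versus models sharing one Vertex Buffer and Index Buffer
Instances of the same model can share the same buffers.
Notes: This reduces the number of buffers that have to be created.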
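One way to share buffers between instances of the same model is a cache keyed by model name, so the buffer is created once and reused afterwards. This is a minimal Python sketch under that assumption; `BufferCache`, `acquire`, and the `create` callback are hypothetical names, and the 4096 limit is the vertex buffer count quoted on the resource-limit slide.

```python
from typing import Any, Callable, Dict


class BufferCache:
    """Create a buffer once per model and hand the same one to every instance."""

    def __init__(self, max_buffers: int = 4096) -> None:
        self.max_buffers = max_buffers  # limit quoted for vertex buffers
        self.created = 0
        self._buffers: Dict[str, Any] = {}

    def acquire(self, model_name: str, create: Callable[[str], Any]) -> Any:
        """Return the cached buffer for model_name, creating it on first use."""
        if model_name not in self._buffers:
            if self.created >= self.max_buffers:
                raise RuntimeError("buffer limit reached")
            self._buffers[model_name] = create(model_name)
            self.created += 1
        return self._buffers[model_name]


# Usage: one hundred instances of the same model cost a single buffer.
cache = BufferCache()
instances = [cache.acquire("orc", lambda name: {"model": name}) for _ in range(100)]
```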

29 Particle System
Each particle property is computed on the CPU at each frame:
  Alpha, Color, LinearForce, Size, Speed, UV
Facing:
  Facing to screen, facing up, facing to parent, facing view up
Notes: In the future we should implement GPU particles; at the moment, computing effects on the CPU is very expensive.

30 Particle System
Index buffer
  Indices will not be changed
Vertex buffer
  Problem: Particle amount depends on frame
  Upload data to vertex buffer frequently

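31 Particle System
Static Index Buffer
Dynamic Vertex Buffer
Vertex Data
Notes: We use a single vertex buffer, and particles are added to it when they are spawned. Benefit: it saves buffers, and particles with the same material can even be issued in one draw call. Concern: Stage3D has no flag for marking a buffer as dynamic; all buffers are currently static, so modifying buffer contents hurts performance. Fortunately we only do it once per frame.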
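The split between a static index buffer and a per-frame vertex upload can be sketched as follows: the index pattern for quads never changes, so it is built once for the maximum particle count, while the vertex data is rebuilt each frame for the live particles only. This is a Python sketch; the two-triangles-per-quad layout and the `(x, y, size)` particle tuple are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def build_quad_indices(max_particles: int) -> List[int]:
    """Indices for max_particles quads, two triangles each; built once."""
    indices: List[int] = []
    for i in range(max_particles):
        base = i * 4  # four vertices per particle quad
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return indices


def build_vertex_data(particles: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Flat x, y corner positions for the live particles; rebuilt every frame."""
    data: List[float] = []
    for x, y, size in particles:
        half = size / 2.0
        data += [x - half, y - half,
                 x + half, y - half,
                 x + half, y + half,
                 x - half, y + half]
    return data


def triangle_count(live_particles: int) -> int:
    """Only the live particles are drawn, so the draw range shrinks with them."""
    return live_particles * 2
```

Since the indices for particle `i` always point at vertices `4*i` to `4*i + 3`, only the vertex data and the triangle count passed to the draw call need to change from frame to frame.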

32 Skinned Model
Problem: Few vertex constants allowed
  128 constants per vertex program
  Global vertex constants: Lighting, Fog, Const
  Bone matrix 4x4
Notes: There are not many vertex constants, and we reserve a fixed number of them for global settings,

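33 Skinned Model
4x3 Matrix
Bone count per geometry is limited to 29
“Split mesh”
128 constants / 3 = bones
3 * 29 bones = 87 constants
Notes: so not much is left for bone matrices. Keeping a model in one draw call is not always possible; splitting the mesh increases the number of draw calls.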
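The constant budget on these two slides can be checked with a few lines of arithmetic. The Python sketch below assumes one constant register per matrix row, so a 4x3 bone matrix costs 3 constants and a 4x4 matrix costs 4; the helper names are hypothetical, and `min_submeshes` is only a lower bound, since a real split also depends on which vertices reference which bones.

```python
import math

TOTAL_VERTEX_CONSTANTS = 128  # per vertex program, from the slide
BONE_LIMIT = 29               # bones per geometry, from the slide


def bone_constants(bone_count: int, rows_per_bone: int) -> int:
    """Constants consumed by bone matrices (one constant per matrix row)."""
    return bone_count * rows_per_bone


def max_bones(available_constants: int, rows_per_bone: int) -> int:
    """How many whole bone matrices fit into the available constants."""
    return available_constants // rows_per_bone


def min_submeshes(total_bones: int, bone_limit: int = BONE_LIMIT) -> int:
    """Lower bound on the number of pieces a skinned mesh must be split into."""
    return math.ceil(total_bones / bone_limit)


used_4x3 = bone_constants(BONE_LIMIT, 3)             # 87, as on the slide
left_for_globals = TOTAL_VERTEX_CONSTANTS - used_4x3
used_4x4 = bone_constants(BONE_LIMIT, 4)
```

Under these assumptions, 29 bones stored as 4x3 matrices leave 41 constants for lighting, fog, and other globals, whereas the same bones stored as 4x4 matrices would leave only 12.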

34 Shadow Map

35 Shadow Map
clear(): Clear back buffer
setRenderToTexture(): Clear shadow map, Draw to shadow map
setRenderToBackBuffer(): Set shadow map
present(): End frame

36 Shadow Map
Problem:
  Texture format: RGBA8
  Artifact
  Aliasing
  Popping while moving

37 Shadow Map
Size: 1024x1024
RGBA8 -> R32
Notes: The computed depth value is distributed across the four channels.
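Spreading a depth value across four 8-bit channels can be sketched as fixed-point packing: the depth is scaled to a 32-bit integer and each byte goes into one channel. This is a Python illustration of the idea, not the shader code from the talk; a fragment shader would typically reach a similar result with multiply and fractional-part operations. The function names are hypothetical.

```python
from typing import Tuple

RGBA8 = Tuple[int, int, int, int]
MAX_U32 = 0xFFFFFFFF


def pack_depth(depth: float) -> RGBA8:
    """Encode a depth in [0, 1] as four bytes, most significant byte in R."""
    depth = min(max(depth, 0.0), 1.0)
    value = round(depth * MAX_U32)
    return ((value >> 24) & 0xFF,
            (value >> 16) & 0xFF,
            (value >> 8) & 0xFF,
            value & 0xFF)


def unpack_depth(rgba: RGBA8) -> float:
    """Rebuild the depth from the four channel bytes."""
    r, g, b, a = rgba
    value = (r << 24) | (g << 16) | (b << 8) | a
    return value / MAX_U32


def pack_depth_8bit(depth: float) -> int:
    """For comparison: the same depth stored in a single 8-bit channel."""
    return round(min(max(depth, 0.0), 1.0) * 255)
```

A single channel quantizes depth to 256 levels, which is far too coarse for depth comparisons; using all four channels keeps the round-trip error well below one part in a billion.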

38 Shadow Map
Hard shadow
  Aliasing
  Popping while moving
Solution: Percentage Closer Filtering (PCF)

39 Shadow Map PCF
pw = 1/mapWidth
ph = 1/mapHeight
Sample offsets: (0, 0), (-pw, +ph), (+pw, +ph), (-pw, -ph), (+pw, -ph)
Result = 0.5 * texel( 0, 0) * texel( -pw, +ph) * texel(-pw, -ph) * texel( +pw, +ph) * texel(+pw, -ph)
Vexel.depth result > shadowMapResult: color is black
Vexel.depth result < shadowMapResult: color is white
Show the difference between PCF and non-PCF.
Notes: Why we chose shadow maps: the terrain has height variation, depth is projected correctly, and self-shadowing can be turned off when appropriate.
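A five-tap PCF lookup over the centre texel and the four diagonal neighbours can be sketched as below. Only the 0.5 weight on the centre sample is visible in the transcript; splitting the remaining 0.5 equally over the four corner samples (0.125 each) and summing the weighted tests is an assumption of this sketch, as are the function names and the nearest-texel lookup. The code is Python, not AGAL.

```python
from typing import List, Sequence

ShadowMap = Sequence[Sequence[float]]


def shadow_test(shadow_map: ShadowMap, u: float, v: float, fragment_depth: float) -> float:
    """Return 1.0 (white, lit) or 0.0 (black, shadowed) for one lookup."""
    height = len(shadow_map)
    width = len(shadow_map[0])
    # Nearest-texel lookup, clamped at the map border.
    x = min(max(int(u * width), 0), width - 1)
    y = min(max(int(v * height), 0), height - 1)
    return 1.0 if fragment_depth <= shadow_map[y][x] else 0.0


def pcf(shadow_map: ShadowMap, u: float, v: float, fragment_depth: float) -> float:
    """Weighted average of five depth tests (weights are assumed, see lead-in)."""
    pw = 1.0 / len(shadow_map[0])  # pw = 1/mapWidth
    ph = 1.0 / len(shadow_map)     # ph = 1/mapHeight
    result = 0.5 * shadow_test(shadow_map, u, v, fragment_depth)
    for du, dv in ((-pw, +ph), (-pw, -ph), (+pw, +ph), (+pw, -ph)):
        result += 0.125 * shadow_test(shadow_map, u + du, v + dv, fragment_depth)
    return result


# A 4x4 map whose left half holds a near occluder (depth 0.0).
edge_map: List[List[float]] = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
```

Near the shadow edge the result falls between 0 and 1, which is what softens the hard, aliased boundary of a single depth test.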

40 Shadow Map
PCF based solution: no impact on overall performance.

41 Toon Shading
Single pass
  v: view vector
  N: vertex normal
  θ: if θ > threshold, draw toon color
  Problem: Dependent on no. of faces
Two passes
  Scale vertex position following the vertex normal
  Not dependent on no. of faces
Notes: The game has a comic style. Two passes perform worse, but we chose the two-pass approach because our face count is too low.

42 Toon Shading
Toon pass: Enable back face, Scale vertex position, Draw color
General pass: Enable front face, Draw material
Result
Notes: We chose the two-pass approach because our face count is too low.
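The two toon approaches can be sketched side by side. In this Python sketch, `is_silhouette` stands for the single-pass test and assumes θ is the angle between the view vector and the vertex normal; `extrude_along_normal` stands for the vertex step of the toon pass in the two-pass approach. Both function names and the outline width are illustrative assumptions.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def is_silhouette(view_vector: Vec3, normal: Vec3, threshold_deg: float) -> bool:
    """Single pass: draw the toon color when the angle exceeds the threshold."""
    v = normalize(view_vector)
    n = normalize(normal)
    cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(v, n))))
    return math.degrees(math.acos(cos_theta)) > threshold_deg


def extrude_along_normal(position: Vec3, normal: Vec3, outline_width: float) -> Vec3:
    """Two passes: push the vertex outward along its normal for the outline pass."""
    n = normalize(normal)
    return (position[0] + outline_width * n[0],
            position[1] + outline_width * n[1],
            position[2] + outline_width * n[2])
```

The single-pass test is evaluated per vertex, so a low-polygon mesh has few vertices near the grazing angle and the outline looks coarse. Extruding the whole back-facing shell gives an outline whose width does not depend on the face count.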

43 Alpha Test
Problem: Stage3D has no alpha test
  “kil opcode in AGAL”
  Performance penalty on mobile device
  On mobile it can be removed and replaced with alpha blend
Notes: When we put our demo on the iPad, the frame rate went from the 1x range to the 2x range simply by changing every material that used alpha test to alpha blend.
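Emulating alpha test with a discard instruction can be sketched as follows. AGAL's `kil` discards a fragment when its scalar operand is negative, so subtracting the threshold from alpha turns it into an alpha test. The Python sketch below models only that logic; the function names and the 0.5 threshold are assumptions.

```python
from typing import Optional, Tuple

RGBA = Tuple[float, float, float, float]


def kil(value: float) -> bool:
    """Model of a discard instruction: True when the fragment is dropped."""
    return value < 0.0


def alpha_test_fragment(color: RGBA, threshold: float = 0.5) -> Optional[RGBA]:
    """Return the color if it passes the alpha test, or None if discarded."""
    if kil(color[3] - threshold):
        return None
    return color
```

The threshold would be supplied as a shader constant, which costs one extra subtraction plus the discard in every fragment program that needs alpha test.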

44 Alpha Test
Solution: Replace alpha-test with alpha-blend
                      Render loop time (ms)   Total time (ms)
6600GT alpha test     17~19                   47
6600GT alpha blend    18~19                   65~67
8800GT alpha test     0.16                    37
8800GT alpha blend    0.3                     36
304 draw calls
Alpha-test performance is better on desktop
Notes: On desktop, alpha blend actually has the heavier performance impact.

45 Post Effect
Color Filter, Glow, Origin, DOF
Notes: These are some further effects our engine supports; there is no time to go through each of them today.

46 Static Lightmap
Pros:
  Pre-computation
  Global illumination
Cons:
  More textures
Notes: Avoids computing lighting and shadows in real time.

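47 Optimization for Flash Program
Notes: Memory leak: some objects are still considered referenced, so the GC does not reclaim them even though we expect them to be reclaimed, and memory keeps piling up. Memory leak: inner function, dereference, recycle. For loop: use a for loop to replace for each.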

48 Optimization for Flash Program
Problem:
  For Each is slow
    “Use for-loop to replace it”
  Memory management
    “Recycle manager”
    “Strengthen garbage collection”

49 Optimization for Flash Program
Solution: Recycle manager
  Reduce garbage collection loading
  Save objects initial time
public function recycleObject3D( obj:IObject3D ):void
public function requestObject3D( classType:int , searchKey:*, renderHandle:int = 0 ):*
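The recycle manager idea is an object pool: released objects are kept in per-type lists and handed out again instead of being left to the garbage collector. The sketch below is in Python and only mirrors the request/recycle pair of the two signatures above; the `factory` parameter, the string type keys, and the class layout are assumptions, and `searchKey` and `renderHandle` are not modeled.

```python
from typing import Any, Callable, Dict, List


class RecycleManager:
    """Keep released objects in per-type pools and reuse them on request."""

    def __init__(self) -> None:
        self._pools: Dict[str, List[Any]] = {}
        self.created = 0  # how many objects had to be constructed

    def request_object(self, class_type: str, factory: Callable[[], Any]) -> Any:
        """Return a pooled object of class_type, or build a new one."""
        pool = self._pools.get(class_type)
        if pool:
            return pool.pop()
        self.created += 1
        return factory()

    def recycle_object(self, class_type: str, obj: Any) -> None:
        """Put an object back into its pool instead of dropping the reference."""
        self._pools.setdefault(class_type, []).append(obj)


# Usage: the second request reuses the first object, so nothing new is built.
manager = RecycleManager()
first = manager.request_object("tree", dict)
manager.recycle_object("tree", first)
second = manager.request_object("tree", dict)
```

Reuse avoids both the constructor cost and the garbage that would otherwise accumulate when switching levels. A recycled object still has to be reset by its owner before reuse, which this sketch leaves out.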

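50 Optimization for Flash Program
Solution: Strengthen garbage collection
  Avoid inner function
  Force to dereference function pointer
  Dereference attribute in object destructor
Notes: The GC is not as ideal as developers imagine; objects have to be released correctly at the appropriate places. Inner function: memory accumulates quickly and the GC is triggered very frequently.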

51
Chart labels: Use inner function, Avoid inner function, Force to dereference function pointer, Without inner function
Notes: An experiment that continuously creates objects containing inner functions, with reclamation left entirely to the Flash VM's GC.

52 Optimization for Flash Program
Experiment: before vs. after
Switching among levels
Before improvement:
After improvement:
Notes: Explain releasing and loading.

53 Rapid loading
Notes: The most important thing for a web game is getting into the scene quickly: small data size and a small download.

54 Rapid loading
Streaming
Data compression
  PNG: swf compression: 20%~55%
  Package: zip compression: 25~30%
Batch loading
  Separate resources into several packages
  Download what you really need

55

56 Rapid loading
Stages: Enter to avatar stage, Enter to game stage, After loading picture finished
5Mb/s Elapsed time (sec): 15 6 12
Packages: game code, ui, game scene, scene textures
Notes: First entry into the game, with the cache cleared.

57 Future Works
Adobe Texture Format (ATF)
  Support for compressed/mipmap textures on the different GPU chipsets
FlasCC
  C++ -> AS3 compilation
AS3 Workers
  Multi-thread support
MovieClip
  Replace with Stage3D UI framework, ex: Starling
Notes: That is what we wanted to share today about building a web 3D MMOG engine with Stage3D; there are still many items left to strengthen and optimize. PNG: memory usage is large, no-size compression, mipmap. Alchemy: 70% native code performance. Starling can replace MovieClip and provides full 3D acceleration.

58 Conclusion
Cross-Device/Cross-OS/Cross-Browser
  Browser + Cloud Computing
  Write Once, Run Anywhere
Flash vs. HTML5
  Cross-Compiling Technology Trend
  C/C++ + Flash/ActionScript
  C/C++ + HTML5/JavaScript

59 Acknowledgements XPEC - Project C4 Team XPEC - RDO Team

60 Q & A
Ellison_Mu@xpec.com
Eric_Chang@xpec.com

