1 KIPA Game Engine Seminars Jonathan Blow Seoul, Korea December 12, 2002 Day 15.


1 KIPA Game Engine Seminars
Jonathan Blow
Seoul, Korea
December 12, 2002
Day 15

2 Bit Tricks
Generating Bit Masks
Is some number a power of two?
Avoiding ‘if’ statements (branch prediction)
Floating-point absolute value
Floating-point compare
Floating-point log2

3 Generating Bit Masks
Suppose we want to mask the low n bits of a machine word
We can generate that with a loop
Show summation equation for the loop
Identity that lets us do something faster
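A minimal sketch of the loop and the faster identity, assuming a 32-bit word. The identity in question is presumably sum_{i=0}^{n-1} 2^i = 2^n - 1; the function names are hypothetical and not from the talk.

```cpp
#include <cassert>
#include <cstdint>

// Slow version: OR in one bit at a time, i.e. the sum of 2^i for i in [0, n).
inline uint32_t low_bits_mask_loop(unsigned n) {
    assert(n <= 32);
    uint32_t mask = 0;
    for (unsigned i = 0; i < n; ++i) mask |= (uint32_t{1} << i);
    return mask;
}

// Fast version: the sum collapses to 2^n - 1.
// Shifting a 32-bit value by 32 is undefined in C++, so n == 32 is special-cased.
inline uint32_t low_bits_mask(unsigned n) {
    assert(n <= 32);
    return n == 32 ? 0xFFFFFFFFu : (uint32_t{1} << n) - 1u;
}
```

The special case for n == 32 is the usual price of the shortcut; if n is known to be smaller than the word size it can be dropped.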

4 Is some number a power of two?
The power-of-two will be a single bit somewhere in the middle of the word
The power-of-two minus one will be a bit mask like the ones we just looked at
ANDing them together will produce 0
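The test described above can be sketched as follows, assuming an unsigned 32-bit input. Zero also satisfies x & (x - 1) == 0, so it is excluded explicitly; the function name is hypothetical.

```cpp
#include <cstdint>

// A power of two has exactly one set bit; subtracting 1 turns that bit off
// and turns on every bit below it, so the AND of the two values is 0.
inline bool is_power_of_two(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```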

5 Counting the number of set bits in a machine word
Slow loop version
“Trick” O(num set bits) version
Discussion of tree version
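A sketch of the three approaches named on this slide, assuming 32-bit words. These are the commonly used forms of each; the exact code shown in the seminar is not in the transcript, and the function names are hypothetical.

```cpp
#include <cstdint>

// Slow loop: test every bit position.
inline unsigned count_bits_loop(uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 32; ++i) count += (x >> i) & 1u;
    return count;
}

// "Trick" version: x & (x - 1) clears the lowest set bit,
// so the loop runs once per set bit.
inline unsigned count_bits_sparse(uint32_t x) {
    unsigned count = 0;
    while (x) { x &= x - 1; ++count; }
    return count;
}

// Tree version: add neighbouring 1-bit fields, then 2-bit fields,
// then 4-, 8- and 16-bit fields, all in parallel within the word.
inline unsigned count_bits_tree(uint32_t x) {
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + (x >> 16);
    return x;
}
```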

6 Pentium 4 “fireball”
A 16-bit integer unit at the core of the chip that runs at very high clock speeds
32-bit integer operations are pipelined through the fireball as multi-stage 16-bit operations
Pipeline is organized for bits to flow from bottom to top of the word (as with addition and subtraction)
Right-shifts require a dependency that goes in the opposite direction (slower!)

7 “How many bits does it take to store this range of values?”
Application: network or file i/o
Want ceil(log2(n_max)) assuming the values go from 0 to n_max
Slow floating-point versions
Fast bit-extraction versions
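A sketch of a slow and a fast way to get the bit count, assuming 32-bit unsigned values. For the inclusive range 0..n_max the sketch computes floor(log2(n_max)) + 1, which equals ceil(log2(n_max + 1)); this differs from ceil(log2(n_max)) only when n_max is an exact power of two. A plain shift loop stands in for whichever bit-extraction variants were shown, and the names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Slow version: go through floating-point log2.
inline unsigned bits_needed_float(uint32_t n_max) {
    if (n_max == 0) return 1;  // assumption: still spend one bit on the single value 0
    return static_cast<unsigned>(std::floor(std::log2(static_cast<double>(n_max)))) + 1u;
}

// Fast version: find the position of the highest set bit by shifting.
inline unsigned bits_needed(uint32_t n_max) {
    unsigned bits = 1;
    while (n_max >>= 1) ++bits;
    return bits;
}
```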

8 Floating-Point log2
Show slow version
Fast version utilizing the IEEE-754 format
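One way the fast version might look: read the biased exponent field straight out of the float. This sketch assumes float is IEEE-754 binary32 and that the input is a positive, normal number (zero, denormals, infinities and NaN are not handled). It yields floor(log2(x)) only; the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Slow version: call the math library and round down.
inline int slow_floor_log2(float x) {
    return static_cast<int>(std::floor(std::log2(x)));
}

// Fast version: bits 23..30 hold the exponent, stored with a bias of 127.
inline int fast_floor_log2(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to view the bit pattern
    return static_cast<int>((bits >> 23) & 0xFFu) - 127;
}
```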

9 Fast absolute value
Utilizing IEEE-754 floating point format
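A sketch of the usual form of this trick, assuming float is IEEE-754 binary32: the sign lives in the top bit, so clearing it gives the absolute value. The function name is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Clear the sign bit (bit 31) and leave exponent and mantissa untouched.
inline float fast_abs(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```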

10 Fast floating-point compare
Description of how x86 machines compare floating point numbers
– Get at least one of them on the stack
– Perform ‘fcomp’ instruction
– Load the floating-point status word
– Bit-mask it to see if the desired field is set
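The transcript only lists the slow x87 path. One common integer-based alternative, which may or may not match what was presented, is to remap each float's bit pattern to an integer key that sorts in the same order. The sketch assumes IEEE-754 binary32 and no NaN inputs; the names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// IEEE-754 floats are sign-magnitude; two's-complement integers are not.
// Negative patterns are reflected so that larger magnitudes become smaller
// keys. Both +0.0 and -0.0 map to key 0.
inline int32_t float_order_key(float f) {
    int32_t i;
    std::memcpy(&i, &f, sizeof i);
    if (i < 0) i = INT32_MIN - i;  // cannot overflow: i is in [INT32_MIN, -1]
    return i;
}

// Compare two floats using only integer operations (NaN is not handled).
inline bool float_less(float a, float b) {
    return float_order_key(a) < float_order_key(b);
}
```

When both values are known to be non-negative, the remapping step is unnecessary and the raw bit patterns can be compared directly.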

11 Decision-making without branching
(And without writing in assembly language, to use instructions like CMOV)
Build a mask based on whether some intermediate result is negative or not
Use that to mask values and add them, or whatever you want
– Examples
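A sketch of the mask-building idea, assuming 32-bit two's-complement integers. The helper names are hypothetical, and branchless_min additionally assumes that a - b does not overflow.

```cpp
#include <cstdint>

// All ones if x is negative, all zeros otherwise.
// The sign bit is moved to bit 0 and then negated in unsigned arithmetic.
inline uint32_t negative_mask(int32_t x) {
    return 0u - (static_cast<uint32_t>(x) >> 31);
}

// Returns a when test is negative, b otherwise, without an 'if'.
inline int32_t select_if_negative(int32_t test, int32_t a, int32_t b) {
    const uint32_t m = negative_mask(test);
    return static_cast<int32_t>((static_cast<uint32_t>(a) & m) |
                                (static_cast<uint32_t>(b) & ~m));
}

// Example: minimum of two values. Assumes a - b does not overflow.
inline int32_t branchless_min(int32_t a, int32_t b) {
    return select_if_negative(a - b, a, b);
}
```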

12 Collision Detection
Speedbox and Schnitzel as alternatives to the “prevent tunneling” raycast

13 Collision Detection
Don’t forget to optimize mainly for the expected case!
– To miss a lot, or to hit a lot?
Example of Shock Force and the “early hit test”
– We expect to miss usually!
– So the early hit test was not so effective

14 Collision Detection
More Shock Force examples
– Hierarchy of tests: bounding sphere, OBB, simple plane divide, BSP “hard case”
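A sketch of the first rung of such a hierarchy: a cheap bounding-sphere rejection that guards a more expensive test. The types and names are hypothetical and not taken from Shock Force; the later stages (OBB, plane divide, BSP) are represented only by a caller-supplied callback.

```cpp
struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

// Cheap test: compare squared distance with squared radius sum (no sqrt).
inline bool spheres_overlap(const Sphere& a, const Sphere& b) {
    const float dx = a.center.x - b.center.x;
    const float dy = a.center.y - b.center.y;
    const float dz = a.center.z - b.center.z;
    const float r = a.radius + b.radius;
    return dx * dx + dy * dy + dz * dz <= r * r;
}

// Run the expensive test only when the cheap one cannot reject the pair.
// If misses are the expected case, most calls end after the sphere test.
template <class ExactTest>
bool collide(const Sphere& a, const Sphere& b, ExactTest exact_test) {
    return spheres_overlap(a, b) && exact_test();
}
```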

15 Profiling
Motivation
– You can’t optimize unless you profile. For some reason some people think they can… they’re wrong.
Demo of sample app
Goals:
– Know where the overall CPU time is being spent
    May depend on which kind of behavior is happening!
– Know which routines are stable and which ones are not

16 Profiling
Example of getting the current time on Windows
– At different accuracy levels
Description of how this is slow, and why
– Too slow to call very often in code!

17 Profiling (2)
Using the rdtsc instruction
Converting this to realtime units by calling QueryPerformanceCounter once per frame
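A sketch of the once-per-frame conversion, under the assumption that the caller reads the cycle counter (rdtsc) and the Windows performance counter (QueryPerformanceCounter, with its rate from QueryPerformanceFrequency) at the same moment each frame. The readings are passed in as plain integers so the sketch stays platform-independent; TscCalibrator and its members are hypothetical names.

```cpp
#include <cstdint>

struct TscCalibrator {
    uint64_t last_tsc = 0;
    uint64_t last_qpc = 0;
    double cycles_per_second = 0.0;  // 0 until two frames have been seen
    bool primed = false;

    // Call once per frame with paired readings of both clocks.
    void update(uint64_t tsc_now, uint64_t qpc_now, uint64_t qpc_frequency) {
        if (primed && qpc_now > last_qpc && qpc_frequency > 0) {
            const double seconds =
                static_cast<double>(qpc_now - last_qpc) / static_cast<double>(qpc_frequency);
            cycles_per_second = static_cast<double>(tsc_now - last_tsc) / seconds;
        }
        last_tsc = tsc_now;
        last_qpc = qpc_now;
        primed = true;
    }

    // Convert a cycle count measured inside the frame to seconds.
    double to_seconds(uint64_t cycles) const {
        return cycles_per_second > 0.0 ? static_cast<double>(cycles) / cycles_per_second : 0.0;
    }
};
```

The expensive timer is touched only once per frame, while the per-function measurements use the cheap cycle counter.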

18 Profiling (3)
Define macros that put rdtsc calls into preambles and postambles for functions
Measure and categorize CPU time this way
Measure “self time” and “hierarchical time”
Code review of macros / constructors
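A sketch of how a macro plus a constructor/destructor pair might record self time and hierarchical time. It is not the code reviewed in the seminar: Zone, ScopedZone, PROFILE_SCOPE, read_ticks and fake_clock are all hypothetical names, and a settable counter stands in for rdtsc so the sketch runs anywhere. It is single-threaded, and recursive calls would be double-counted in hierarchical time.

```cpp
#include <cstdint>

struct Zone {                  // one per instrumented function
    const char* name;
    uint64_t self_ticks = 0;   // time spent in this function only
    uint64_t hier_ticks = 0;   // time in this function plus everything it calls
    uint64_t calls = 0;
    explicit Zone(const char* n) : name(n) {}
};

// Stand-in tick source; a real profiler would read rdtsc here.
inline uint64_t& fake_clock() { static uint64_t t = 0; return t; }
inline uint64_t read_ticks() { return fake_clock(); }

class ScopedZone {
public:
    // Preamble: remember the enclosing scope and the start time.
    explicit ScopedZone(Zone& z) : zone_(z), parent_(current()), start_(read_ticks()) {
        current() = this;
    }
    // Postamble: charge elapsed time to this zone and report it to the parent.
    ~ScopedZone() {
        const uint64_t elapsed = read_ticks() - start_;
        zone_.hier_ticks += elapsed;
        zone_.self_ticks += elapsed - child_ticks_;
        zone_.calls += 1;
        if (parent_) parent_->child_ticks_ += elapsed;
        current() = parent_;
    }
    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

private:
    static ScopedZone*& current() { static ScopedZone* top = nullptr; return top; }
    Zone& zone_;
    ScopedZone* parent_;
    uint64_t start_;
    uint64_t child_ticks_ = 0;
};

// Place at the top of a function body to instrument it.
#define PROFILE_SCOPE(name)            \
    static Zone profile_zone_(name);   \
    ScopedZone profile_scope_(profile_zone_)
```

Self time is hierarchical time minus whatever the directly nested scopes reported, which is why each scope keeps a running total of its children's elapsed ticks.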

19 Problem with rdtsc
There’s this SpeedStep thing on Intel laptops
– Change the CPU’s clock speed based on performance / temperature demands
– Does not adjust rdtsc to compensate
May spread beyond laptops in the future
– Power consumption of CPUs is becoming an important concern for businesses

20 We can detect if rdtsc is screwing up profiling data
But we can’t fix the profiling data
Solution: just draw a big warning on the screen
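One plausible detection check, sketched under the assumption that the profiler already estimates cycles per second each frame: flag the frame when the estimate drifts too far from a baseline. The function name and the 5% default tolerance are assumptions, not values from the talk.

```cpp
#include <cmath>

// True when the measured cycle rate differs from the baseline by more than
// the given fraction; the caller would then draw the on-screen warning.
inline bool tsc_rate_changed(double baseline_cps, double measured_cps,
                             double tolerance = 0.05) {
    if (baseline_cps <= 0.0) return false;  // no baseline yet
    return std::fabs(measured_cps - baseline_cps) > tolerance * baseline_cps;
}
```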

21 Division of Profiler
Low-Level Profiler
High-Level Profiler

22 Walkthrough of first demo app
How it uses the macros
How it collects and draws the profiling data

23 Measuring variance of profiling data
To figure out how stable each function is
Draw which functions are “hot” in the realtime display
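A sketch of per-function variance tracking that does not need to store past frames. It uses Welford's running update, which is a swapped-in technique; the transcript does not say how the demo computes variance. RunningStats is a hypothetical name, and one instance would be kept per profiled function.

```cpp
#include <cmath>
#include <cstdint>

struct RunningStats {
    uint64_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean

    // Feed one per-frame timing sample.
    void add(double x) {
        ++count;
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);
    }

    // Sample variance; 0 until at least two samples have arrived.
    double variance() const {
        return count > 1 ? m2 / static_cast<double>(count - 1) : 0.0;
    }

    double stddev() const { return std::sqrt(variance()); }
};
```

A function whose standard deviation is large relative to its mean is an unstable one in the sense of the slide.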

24 Behaviors
We would like some better analysis of what the different behaviors are for our program
Just “eyeing” the results is not very scientific
Examples of different behaviors
– Fill rate limited, AI limited, etc

25 Batch Profiling vs Interactive Profiling
Batch profiling averages a bunch of data together over a session
– Maybe it provides a way to peek at individual samples but the processing is never very convenient
Interactive profiling is about seeing results as soon as they happen
– But interactive profilers are usually hacked together
What if we made a good one?

26 Want to detect and analyze specific behaviors
But without preconceived ideas of what they might be
Treat incoming frames of profiling data as vectors, and cluster them
Description of k-means clustering
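A minimal sketch of batch k-means (Lloyd's iteration) over frame vectors, where each vector might hold one frame's per-function timings. The caller supplies the initial centroids, all vectors are assumed to have the same length, and at least one centroid is assumed; the names are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

inline double dist2(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

inline std::size_t nearest(const std::vector<Vec>& centroids, const Vec& p) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double d = dist2(centroids[c], p);
        if (d < best_d) { best_d = d; best = c; }
    }
    return best;
}

// Alternates two steps: assign every point to its nearest centroid, then move
// each centroid to the mean of its points. Returns the last assignment.
inline std::vector<std::size_t> kmeans(const std::vector<Vec>& points,
                                       std::vector<Vec>& centroids, int iterations) {
    const std::size_t dim = centroids[0].size();
    std::vector<std::size_t> label(points.size(), 0);
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < points.size(); ++i) label[i] = nearest(centroids, points[i]);
        std::vector<Vec> sum(centroids.size(), Vec(dim, 0.0));
        std::vector<std::size_t> n(centroids.size(), 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            for (std::size_t d = 0; d < dim; ++d) sum[label[i]][d] += points[i][d];
            ++n[label[i]];
        }
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (n[c] == 0) continue;  // leave empty clusters where they are
            for (std::size_t d = 0; d < dim; ++d) {
                centroids[c][d] = sum[c][d] / static_cast<double>(n[c]);
            }
        }
    }
    return label;
}
```

Every iteration walks the whole data set, which is why this form needs all frames to be available up front.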

27 Clustering algorithms tend to be pretty slow
And they require batch data to process
– k-means needs random access to the input!
Online k-means
– Faster, non-batch. But quality?
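A sketch of the online variant: each incoming frame vector is seen once and only the nearest centroid is nudged toward it. The fixed learning rate is an assumption (a rate that decays with the number of samples seen is another common choice), and the names are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

inline double dist2(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

// Processes one sample and returns the index of the centroid that won it.
// No earlier samples are needed, so nothing has to be stored.
inline std::size_t online_kmeans_update(std::vector<Vec>& centroids,
                                        const Vec& sample, double rate) {
    std::size_t winner = 0;
    double best = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double d = dist2(centroids[c], sample);
        if (d < best) { best = d; winner = c; }
    }
    for (std::size_t d = 0; d < sample.size(); ++d) {
        centroids[winner][d] += rate * (sample[d] - centroids[winner][d]);
    }
    return winner;
}
```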

28 Self-Organizing Map
“Kohonen Self-Organizing Map”
Description of the algorithm
Much like online k-means
– But with coherence in a separate space
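A sketch of one SOM training step, assuming a rectangular grid of nodes and a Gaussian neighborhood function. The winner is chosen in data space, as in online k-means, but the update also pulls along the nodes that are close to the winner on the grid, which is the "separate space" the slide refers to. The Som layout, the parameter names and the Gaussian falloff are assumptions, not details from the talk.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

struct Som {
    std::size_t width, height;
    std::vector<Vec> weights;  // row-major, width * height entries
};

inline double dist2(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

// Presents one sample; returns the index of the best matching unit.
inline std::size_t som_train_step(Som& som, const Vec& sample, double rate, double radius) {
    // 1. Find the winner by distance in data space.
    std::size_t bmu = 0;
    double best = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < som.weights.size(); ++i) {
        const double d = dist2(som.weights[i], sample);
        if (d < best) { best = d; bmu = i; }
    }
    // 2. Move every node toward the sample, scaled by its grid distance to the winner.
    const double bx = static_cast<double>(bmu % som.width);
    const double by = static_cast<double>(bmu / som.width);
    for (std::size_t i = 0; i < som.weights.size(); ++i) {
        const double gx = static_cast<double>(i % som.width) - bx;
        const double gy = static_cast<double>(i / som.width) - by;
        const double influence = std::exp(-(gx * gx + gy * gy) / (2.0 * radius * radius));
        for (std::size_t d = 0; d < sample.size(); ++d) {
            som.weights[i][d] += rate * influence * (sample[d] - som.weights[i][d]);
        }
    }
    return bmu;
}
```

In typical use both rate and radius shrink over time, so the map first organizes coarsely and then settles.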

29 Demo of SOM-enabled Profiling Tool
Visualizations are still early
Hopefully they will mature into something truly useful (people in other visualization fields like SOMs, so hopes are high)

30 Discussions of changes made to SOM to support online clustering

