Presentation on theme: "David Carter SCEE Technology Group"— Presentation transcript:
1 Introducing PS2 to PC Programmers
David Carter, SCEE Technology Group
2 What We Will Be Covering
An overview of the hardware
A basic rendering pipeline
How to improve performance
Underused capacities
PS2 design techniques
Questions…

Overview of the PS2 hardware, including the GS, Emotion Engine, EE Core, Vector Units, DMA, IOP and sound, along with some performance issues.
Basic rendering pipeline for the PS2: how the different units interact with one another.
How to improve PS2 performance: why C++, the I$, the D$ and the texture cache cause recurring issues for developers.
PS2 design techniques: VIF compression and geometry and texture streaming, i.e. using the design of the PS2 for complex tasks.
Underused capacities: VU0, scratchpad and the IOP.
3 What We Will Not Be Covering
A MIPS programming course
Any sample code
The price of beer (I am so glad it is cheap!)
A PS2 in chocolate (ummm…tasty!)

Sorry to disappoint you, but there will be no in-depth analysis of low-level techniques; the presentation is more conceptual.
4 Basic PS2 Architecture
[Diagram: SPU2 and IOP attached to the Emotion Engine (EE Core, FPU, VU0, VU1, IPU, GIF, DMA, cache) on a 128-bit bus, with 32 MB of main memory and the GS with 4 MB]
EE: 128-bit Emotion Engine. GS: Graphic Synthesiser.
VU0/VU1: Vector Units. DMA: Direct Memory Access.
FPU: Floating Point Unit. IPU: Image Processing Unit.
IOP: Input Output Processor. SPU2: Sound Processor.

The Emotion Engine is key to the calculation performance of the PS2. It contains a number of custom processor cores, and its 128-bit bus provides maximum performance.

EE Core: 128-bit, 300 MHz, 2-way superscalar pipelines. The EE Core contains a number of caches, namely the I$, the D$ and the scratchpad. The 64-bit instruction set conforms to the MIPS III standard and also contains 128-bit multimedia instructions.
Memory: 32 MB RDRAM.
GS: 4 MB of embedded DRAM. Provides 2D primitives including points, lines, sprites, triangles, triangle strips and fans.
DMA: the wide data bus enables virtual UMA-like data handling (max 2.4 GB/sec). In sliced mode, data is transferred to a dedicated device in slices of 8 quadwords.
VU: Vector Units, numbered 0 and 1. VU0 has 4 KB for data and 4 KB for instructions and is connected directly to the EE Core through a dedicated channel. VU1 has 16 KB for data and 16 KB for instructions and connects directly to the GIF.
SPU2: contains two cores which combine for a total of 48 voice channels. Digital effects such as reverb, echo and delay can be applied to the 2 stereo units (per chip).
FPU: 32-bit single-precision floating point unit. A standard MIPS FPU bolted on.
IOP: used for processing I/O devices including HDD, DualShock 2, USB, i.LINK etc. 36 MHz, 32-bit, and also used for backwards compatibility with PSone titles.
IPU: MPEG2 video layer decoding and bit stream decoding.
Main bus: 128 bits.
5 Caches And Scratchpad
[Diagram: Emotion Engine block diagram with the EE Core memories highlighted: I$ 16 KB, D$ 8 KB, SPR 16 KB]
Similar to an old-style PC L1 cache.
The PS2 has small caches, as it was felt that a lot of dynamic data would not be in the cache for any length of time.

Cache sizes: I$ 16 KB, D$ 8 KB, plus a 16 KB scratchpad which the EE Core can access in 1 cycle.
The cache is a typical one, sitting between memory and the CPU.
The scratchpad is the most useful, as you have direct access to it.
Small caches were decided upon because it was felt that a lot of data would be shifting around the system and would not spend a long time in the cache.
The PS2 caches are similar to older PC L1 caches of around 16 KB (now 32 or 64 KB), but the PS2 lacks the L2 cache of around 512 KB that boosts PC performance.
6 EE Vector Units
[Diagram: Emotion Engine block diagram with VU0, VU1 and their VIFs highlighted]
Each vector unit can do 4 multiplies and 4 adds in a single instruction and can transform about 36 million vertices/sec.
Both can operate in micromode, an LIW architecture (32 bits x 2).
It can be argued that the PC paradigm started to shift towards the PS2 architecture with the emergence of vertex shaders.

Vector Unit 0 has two operational modes: macromode, in which it executes coprocessor instructions from the CPU core, and micromode, in which it operates independently according to the microprograms stored in its micro memory.

Vector Unit 1 operates only in micromode. It has a larger instruction memory and a larger VU data memory than VU0. It also has a direct connection to the GIF and a few instructions that VU0 lacks, such as those for transferring data to the GIF.

Microinstructions are LIW (Long Instruction Word) instructions of 32 bits x 2; the upper and lower instruction words execute concurrently. The upper instruction is processed by the FMACs. The lower 32 bits use the integer registers and handle floating-point divide and square root, integer instructions (ADD, AND, OR, SUB), elementary functions (SIN, TAN etc.), and loading, storing and branching.

VU1 is similar in operation to the vertex shaders that have appeared on PC 3D graphics cards to handle the geometry of a scene. In both cases you have to learn a specific assembly language. Vertex Shader v1.0 does not allow any loops, jumps or conditional branches, whereas the VUs have all of these in their instruction set. Vertex Shader v2.0, with the addition of loops and branches, is now comparable to the Vector Units.

The VIFs (Vector Unit Interfaces) function as preprocessors for the VUs. A VIF unpacks packed vertex data according to the specification it has been sent through VIFtags (see the DMA slides later) and transfers the data directly into VU memory. This means less main memory is needed to store vertices, and the formatting load is shifted from the VU to the VIF.

36 million vertices a second (roughly 600,000 per frame at 60 Hz) is not as good as the new GeForce 4 cards with their 110 million+ vertices a second.

New PC graphics cards are slowly shifting the PC paradigm with vertex shaders. You could argue that the PS2 engineers anticipated this paradigm shift with the Vector Units.
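To make the "4 multiplies and 4 adds in a single instruction" point concrete, here is a rough sketch in plain Python (not VU microcode) of a vertex transform written as one 4-wide multiply followed by three 4-wide multiply-adds. The function names are made up for illustration, and the last line simply re-checks the per-frame vertex figure quoted above.

```python
def mul4(col, s):
    """One 4-wide multiply: scale a 4-component column by a scalar."""
    return [c * s for c in col]


def madd4(acc, col, s):
    """One 4-wide multiply-add: acc + col * s, component by component."""
    return [a + c * s for a, c in zip(acc, col)]


def transform(columns, v):
    """Transform vertex v = (x, y, z, w) by a matrix stored as 4 columns."""
    acc = mul4(columns[0], v[0])
    for i in range(1, 4):
        acc = madd4(acc, columns[i], v[i])
    return acc


# A translation by (10, 20, 30), stored column by column.
translate = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [10.0, 20.0, 30.0, 1.0],
]
moved = transform(translate, [1.0, 2.0, 3.0, 1.0])

# Per-frame vertex budget from the quoted peak rate.
VERTS_PER_SEC = 36_000_000
verts_per_frame_60hz = VERTS_PER_SEC // 60
```

Each call to mul4 or madd4 stands in for a single 4-wide operation, so one vertex costs four such steps in this sketch.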
7 Graphic Synthesiser
[Diagram: Emotion Engine block diagram with the GS highlighted]
Primitives per second:
150 million points
50 million textured sprites
75 million untextured triangles
37.5 million textured triangles
Features: alpha blend, Z-test, bilinear/trilinear filtering, efficient scissoring and a fill rate of 2.4 gigapixels/sec.

To compare this unit with a typical PC card would be unfair. The PS2 provides a separate geometry engine, which VU0 and VU1 take care of, and a rasteriser, which the GS takes care of. PC cards have vertex shaders and pixel shaders tightly coupled together on the same card; they provide similar basic functionality.

GS features:
A variety of drawing primitives: points, lines, line strips, triangles, triangle strips, fans and sprites.
Flat and Gouraud shading, perspective-correct textures, bilinear and trilinear texture mapping.
Mipmapping: 6 levels, for which the GS can automatically calculate the base pointer for each LOD.
Tiled textures, specified by the wrap mode: CLAMP (stretches the texture) or WRAP (tiles the texture).
Texture formats supported: 4/8 bits (with CLUT: Colour Lookup Table) and 16/24/32 bits (without CLUT).
A high-speed Z buffer, which can be 16, 24 or 32 bits.
High-efficiency scissoring, which is relative to screen space within the GS coordinate system.
Registers with 2 contexts, so that the GS can be set up to change state quickly, for example alpha blending for some triangles and not for others.
Pixel fill rate: 2.4 gigapixels/sec without texture mapping, 1.2 gigapixels/sec with texture mapping.
The GS supports a number of resolutions, including NTSC, PAL and VESA.
8 GIF Connection For VU1
[Diagram: Emotion Engine block diagram with the path from VU1 to the GIF highlighted]
Vector Unit 1 has a dedicated output path to the GIF.
It also has a much larger internal memory than VU0 to support double buffering of input and output data.
This enables fast transformation and output to the GS of patterned data.

The Vector Units provide the major power of the PS2. VU1 has 16 KB of memory, which you can use to perform all your T'n'L. With only 16 KB, as discussed before with the cache, the idea is not to store an entire model locally but to stream it from a DMA channel into the VIF for VU1 to process. Double buffering the input and output lets the VU work at its most efficient. Imagine a typical double buffer system on a video card: whilst data is being drawn to one buffer, the other buffer is being displayed. It is exactly the same principle, but with vertex data rather than drawing data.
9 Fill Rate
[Diagram: Emotion Engine block diagram with the GS highlighted]
Bandwidth of the 4 MB embedded DRAM: 48 GB/sec
Frame buffer bandwidth: 38.4 GB/sec
Texture bandwidth: 9.6 GB/sec
Fill rate: 1.2 gigapixels/sec textured
Fill rate: 2.4 gigapixels/sec untextured

The GS has massive fill rate capabilities. It processes the 2D primitives that VU1 has transformed to screen space.

The 48 GB/sec figure comes from a 1024-bit read port and a 1024-bit write port for drawing and accessing the Z buffer, plus a 512-bit port for texture reading, all clocked at 150 MHz: (((1024 * 2) + 512) * 150 million) / 8 = 48 GB/sec. The 8 is the bits-to-bytes conversion, if you are wondering.

Example: 35 million polys * 6 * 6 pixels square = 1.26 gigapixels/sec. Each pixel requires 16 bytes of bandwidth, giving about 20.2 GB/sec. The 16 bytes comes from ((2 * 1024) / 8) / 16 pixels processed at the same time = 16 bytes.
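The arithmetic on this slide can be re-checked with a few lines of Python. The variable names are just labels chosen for this sketch; the numbers are the ones quoted above.

```python
CLOCK_HZ = 150_000_000   # GS clock quoted in the notes
RW_PORT_BITS = 1024      # width of the read port and of the write port
TEX_PORT_BITS = 512      # width of the texture read port

# Bits moved per clock, converted to bytes per second.
frame_buffer_bw = 2 * RW_PORT_BITS * CLOCK_HZ // 8
texture_bw = TEX_PORT_BITS * CLOCK_HZ // 8
total_bw = frame_buffer_bw + texture_bw

# Worked example: 35 million polys, each covering 6 x 6 pixels.
bytes_per_pixel = (2 * RW_PORT_BITS // 8) // 16   # 16 pixels at the same time
pixels_per_sec = 35_000_000 * 6 * 6
needed_bw = pixels_per_sec * bytes_per_pixel

print(total_bw / 1e9, frame_buffer_bw / 1e9, texture_bw / 1e9)
print(pixels_per_sec / 1e9, needed_bw / 1e9)
```

The example load needs well under half of the quoted 48 GB/sec, which is the point of the slide.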
10 IOP, SPU And Backwards Compatibility
[Diagram: the IOP (2 MB IOP memory) and SPU2 (2 MB SPU memory) sit on a 32-bit bus with the CD, HDD, pad, USB, i.LINK and TCP/IP, and connect through the SIF to the Emotion Engine's 128-bit bus]
The IOP processor comes from the PS1; this solves compatibility!

The IOP handles all of your input/output processing and is heavily underused. It is multithreaded and could run scripts and code that are not tied to the frame rate.
Libraries are available for all peripherals and are accessed via a number of SCEI modules which are loaded into IOP memory at runtime; a separate IOP flash image is also loaded at runtime.
The SPU2 is 2 SPU chips from the PSone. Each chip has 24 voices, bringing the total up to 48. Digital effects such as reverb, echo and delay can be applied to the 2 stereo units (per chip).
The IOP is a slightly faster chip than the PS1 version, which clocked in at around 33 MHz; this incarnation runs at about 36 MHz.
The new network capabilities of the PS2 will also use the IOP in the form of modules. Michael Werle is presenting a talk on the PS2 network strategy on Sunday.
11 DMA
[Diagram: Emotion Engine block diagram with the DMA controller highlighted]
The DMA bus has a bandwidth of 2.4 GB/sec, faster than AGP x8, which is (in theory!) 2.1 GB/sec.
The DMA bus controls all data transfers in the system.
The DMAC will not stall the CPU when transferring data.
DMA transfers must be aligned to 128 bits.

The DMA bus has a maximum bandwidth of 2.4 GB/sec. Compare this with AGP x4, which has (in theory!) a bus bandwidth of 1.1 GB/sec; even x8 is only 2.1 GB/sec. Past performance has also shown that most PC applications still use the 33 MHz bus, so real performance drops by over half.
12 DMA Data Transfer
[Diagram: CPU, cache, main memory and DMA controller, with a dedicated channel for each device (Device0 to Device4). Time sliced: 8 qwords to device 1, 8 qwords to device 2, 8 qwords to device 3, repeat]
To send data through a channel you just specify the start address and the data size, and give a start signal to the DMAC.
NOTE: DMA bypasses the cache.

Transfer modes:
Physical: two modes, burst and sliced. Burst mode is a continuous transfer of data by the DMAC and is only available to and from scratchpad. Sliced mode transfers data in slices of 8 qwords whilst arbitrating with the other DMA channels.
Logical: two commonly used modes. 1) Normal mode: give the DMA a start address and a size. 2) Chain mode: set up tags that reference memory and then transfer the data.

So why does DMA bypass the cache? The main features of a cache are: 1) it maintains CPU speed for data already in the cache, 2) it is high speed and 3) it is expensive to have. Dynamic media such as geometry does not stay in the cache for any length of time, so a large cache is unhelpful. Data throughput is more important, and cache checking would slow down DMA.

DMA must be aligned to a quadword boundary. Quadword = 128 bits (16 bytes).
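As a toy model of sliced mode and the alignment rule, the sketch below hands out 8-qword slices to each channel in turn. It is a deliberate simplification: the real DMAC arbitrates with channel priorities, and the function and channel names here are invented for illustration.

```python
QWORD_BYTES = 16    # one quadword = 128 bits
SLICE_QWORDS = 8    # slice size in sliced mode


def is_qword_aligned(address):
    """DMA source and destination addresses must sit on a 16-byte boundary."""
    return address % QWORD_BYTES == 0


def sliced_schedule(pending):
    """pending maps a channel name to the qwords it still has to send.

    Returns the order of (channel, qwords) slices under simple round-robin.
    """
    pending = dict(pending)
    order = []
    while any(pending.values()):
        for channel in pending:
            if pending[channel] > 0:
                n = min(SLICE_QWORDS, pending[channel])
                order.append((channel, n))
                pending[channel] -= n
    return order


schedule = sliced_schedule({"device1": 20, "device2": 8, "device3": 12})
for channel, qwords in schedule:
    print(channel, qwords)
```

No single channel holds the bus for more than 8 qwords at a time, which is what keeps all the devices fed.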
13 DMA Chains
Built from a list of tags; can contain many data types.
[Diagram: an example chain running from a head tag through Ref tags pointing at texture, matrix, vertex, normal and texture coordinate data, Next tags, a VIF code that starts the microcode, and an end tag]

A number of different tags for DMA chains exist, and they can transport any data over the 9 channels of the DMAC. Commonly used types are shown here. The tags are stored sequentially in memory; the Ref tags are references to addresses in memory. The DMAC is intelligent enough to use these addresses to transfer data from different areas of memory. VIF codes can be used to describe the format of the vertices you are sending to the VIF or, as in this example, to send a signal to start executing the microcode on a Vector Unit.
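The idea of a chain can be sketched as a small interpreter that walks a list of tags. This is only a model: the tag kinds "ref", "inline" and "end" and the dictionary layout are simplifications invented for this sketch, not the hardware tag format.

```python
def walk_chain(tags, memory):
    """Walk tags in order and return the items that would be transferred.

    "ref" tags pull data from elsewhere in memory, "inline" tags carry
    their data with them, and "end" stops the transfer.
    """
    sent = []
    for tag in tags:
        if tag["kind"] == "ref":
            start = tag["addr"]
            sent.extend(memory[start:start + tag["size"]])
        elif tag["kind"] == "inline":
            sent.extend(tag["data"])
        elif tag["kind"] == "end":
            break
    return sent


# Pretend main memory: a matrix, some vertices and some normals.
memory = ["m0", "m1", "m2", "m3", "v0", "v1", "v2", "n0", "n1"]

chain = [
    {"kind": "ref", "addr": 0, "size": 4},            # matrix
    {"kind": "ref", "addr": 4, "size": 3},            # vertices
    {"kind": "ref", "addr": 7, "size": 2},            # normals
    {"kind": "inline", "data": ["start microcode"]},  # stands in for a VIF code
    {"kind": "end"},
]
sent = walk_chain(chain, memory)
print(sent)
```

The data never has to be copied into one contiguous buffer; the chain describes where each piece lives and the DMAC gathers it.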
14 Basic Rendering Pipeline
[Diagram: traverse scene and calculate animation (CPU + coprocessor VU0) -> list processing (DMA) -> transform to 2D (VU1) -> rasterisation (GS)]

With so many different units involved in the PS2 rendering pipeline, collating the data and processing it takes a large amount of work just to get something quite simple up and running. On a PC you would set up all your matrices and your scene (world), process the world to screen positions, and then use either a software renderer or, more commonly, a 3D graphics card; with DirectX this can all be realised quite quickly and easily.

The PS2 requires a lot more thought about how all this data is organised. The pipeline above shows how the units use the data for processing, but it is better to imagine these stages happening in parallel once a frame of data is pushed into the pipe, very much like a MIPS pipeline.

The above is just an example. There are a number of ways to use the PS2 for these tasks, but we will concentrate on a basic approach first. First you set up all your matrices in main memory. This can be done by the CPU + FPU or, if animation or other vector maths is to be performed, you can use VU0 to give the pipeline greater flexibility. For example, you could use it to store matrices and DMA the matrices and vertices to VU1 for computation; it is really up to you.

Back to our slide. Once your scene data is organised and ready to be processed through to rasterisation, it must be moved from main memory to VU1 for transformation. This is achieved through DMA operations. As a programmer you control the DMA for the system, and you send the data to VU1 via its dedicated DMA channel. VU1 processes the vertices, converting them from 3D vertices to 2D points, and then kicks the transformed vertices to the GS to be rendered.

This is a full step through the rendering pipeline. In a game application you would normally find that once you move to the "List processing DMA" step, you start constructing the next frame with the CPU + FPU/VU0. This is similar to deferred mode in DirectX.
15 How To Improve PS2 Performance
By not treating the PS2 as a PC
By using appropriate texture sizes and formats
By preventing thrashing of the texture cache
By not abusing the instruction and data caches

This is the second half of the presentation, covering common issues in PS2 programming, including rendering under differing scenarios. We look at texture size and format, and at how thrashing the texture cache can halve your fill rate; from this you can work out ideal texture sizes. Instruction and data cache misses are particularly disruptive on the PS2, causing complete stalls of the CPU during refills.
16 1st Attempt At A PC Port (max 0.5 million polys)
[Diagram: IOP, SPU, IPU and memory on the DMA bus (2.4 GB/sec); geometry and texture flow from the CPU + FPU, which perform the transformation, to the GS; VU0 and VU1 are unused]

This is treating the PS2 as a PC: the CPU/FPU handles all the vector and matrix calculations. Geometry processing on the latest graphics cards shifts the emphasis of these calculations to the graphics card, which on the PS2 is the combination of the VUs and the GS. The GS cannot handle 3D model data; its power is in simply drawing the already transformed data it has been given.

FPU: 32-bit single-precision floating point unit (cop1). In code, single-precision constants are written x.xf. With the compiler option –no-double-floats the "f" is no longer needed to signify a single float. Otherwise the compiler uses software doubles, which are very slow.
17 2nd Attempt At A PC Port (max 1.5 million polys)
[Diagram: IOP, SPU, IPU and memory on the DMA bus (2.4 GB/sec); geometry and texture flow to the GS; the transformation now runs on VU0 in parallel with the CPU]

This shifts your linear transformations to VU0, which lets you calculate your vectors and matrices in parallel with the main processing still being handled by the CPU. VU0 and VU1 are similar in operation; the big difference between the units is that VU0 has only 4 KB of memory compared to VU1's 16 KB. This limits the amount of geometry calculation VU0 can process. It also has no direct path to the GS, so its results either have to be transferred to main memory or are held up depending on the bandwidth available on the DMA bus.

There is a development library from SCE that allows quick use of VU0. Its functions are all written in macromode and so stall the CPU during operation.
18 VU Renderer (lighting, no animation) (typical 10-20 million polys)
[Diagram: IOP, SPU, IPU and memory on the DMA bus (2.4 GB/sec); geometry is streamed to VU1, which performs the transformation and feeds the GS; textures are streamed to the GS]

The emphasis has now shifted from VU0 to VU1, so where is the power coming from? VU1 has a double-buffered system, runs microcode only, has a larger internal memory (16 KB) and has a direct path to the GS. Textures are also being streamed to the GS. DMA tags can be used for syncing geometry with texture.

Note that if the geometry being processed needs a texture that the GS has not finished uploading, VU1 will stall. Load balancing the texture and geometry can help in certain situations, but it can be difficult to implement.

The 1.2 GB/sec bandwidth to the GS is shared between geometry and texture data. Geometry can take roughly 25% and texture data 75%.

If no actual game processes were running and only the VU renderer was in operation, you could expect around 20-30 million polys, close to the theoretical limit of 35 million textured triangles.
19 Complete Game (lighting, animation) (typical 5-10 million polys)
[Diagram: IOP, SPU, IPU and memory on the DMA bus (2.4 GB/sec); geometry and texture streams; the CPU, FPU and VU0 prepare data while VU1 performs the transformation and feeds the GS]

This now brings in VU0 to add more weight to your VU renderer. Roughly 22 million vertices per second can be processed by the CPU and VU0, tightly coupled together, to prepare data for the next frame.
20 VRAM Layout
[Diagram: 4 MB embedded memory divided into Buffer1, Buffer2, Z and Texture areas]
The 4 MB of VRAM is split into 8K pages
Pages are split into 32 blocks of 256 bytes
Frame buffers are addressed by page
Textures are addressed by block, allowing multiple textures per page
Address range in 32-bit words: 0x0 -> 0xfffff

Different buffer types require different start address alignments: frame buffer 2k words, Z buffer 2k words, texture buffer 64 words, CLUT 64 words. The base pointer you specify is therefore the word address divided by the alignment size.
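The page, block and alignment figures on this slide fit together, which a short sketch can confirm. The dictionary keys and the base_pointer helper are names invented here; only the numbers come from the slide.

```python
VRAM_BYTES = 4 * 1024 * 1024
PAGE_BYTES = 8 * 1024
BLOCK_BYTES = 256
WORD_BYTES = 4   # addresses count 32-bit words

pages = VRAM_BYTES // PAGE_BYTES
blocks_per_page = PAGE_BYTES // BLOCK_BYTES
last_word_address = VRAM_BYTES // WORD_BYTES - 1

# Start address alignment per buffer type, in words.
ALIGNMENT_WORDS = {"frame": 2048, "zbuffer": 2048, "texture": 64, "clut": 64}


def base_pointer(word_address, kind):
    """Convert a word address into the base pointer for that buffer type."""
    align = ALIGNMENT_WORDS[kind]
    if word_address % align != 0:
        raise ValueError(f"{kind} buffer must start on a {align}-word boundary")
    return word_address // align


print(pages, blocks_per_page, hex(last_word_address))
print(base_pointer(0x4000, "frame"), base_pointer(0x4000, "texture"))
```

Note that 2k words is exactly one 8K page and 64 words is exactly one 256-byte block, which is why frame buffers are addressed by page and textures by block.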
21 By Using Texture Size And Format
The 4 MB of VRAM is split into 8K pages
Pages are split into 32 blocks of 256 bytes
Block position varies based on format
It is possible to store multiple textures in 1 page

Example: block layout of a 16-bit texture page
 0  2  8 10
 1  3  9 11
 4  6 12 14
 5  7 13 15
16 18 24 26
17 19 25 27
20 22 28 30
21 23 29 31

The GS has 4 MB of VRAM. This is split into 8K pages (also known as texture pages), each of which is split into 32 blocks of 256 bytes. When texture data is uploaded to the GS, the GS is told the base pointer address of the texture page and the block position within that page.
Block positions depend on the texture format. If differing formats are used, you can end up uploading one texture on top of another, as shown with the yellow and red textures.
22 GS Coordinate System
Frame buffers use a 16-bit coordinate system: 12-bit integer, 4-bit fraction
Full Range
Typically the centre is specified as (2048, 2048)
The scissoring area is specified relative to this centre
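A 12-bit integer with a 4-bit fraction is simply the coordinate multiplied by 16 and stored in 16 bits. The sketch below shows that conversion and how a screen position maps around the (2048, 2048) centre; the 640 x 448 display size and the function names are assumptions made for the example.

```python
FRACTION_BITS = 4
CENTRE = 2048


def to_gs_coord(value):
    """Encode a coordinate as 16-bit fixed point: 12-bit integer, 4-bit fraction."""
    fixed = round(value * (1 << FRACTION_BITS))
    if not 0 <= fixed <= 0xFFFF:
        raise ValueError("coordinate outside the 16-bit GS range")
    return fixed


def screen_to_gs(x, y, width, height):
    """Place a screen pixel so that the display is centred on (2048, 2048)."""
    return (
        to_gs_coord(CENTRE - width / 2 + x),
        to_gs_coord(CENTRE - height / 2 + y),
    )


centre_fixed = to_gs_coord(CENTRE)
top_left = screen_to_gs(0, 0, 640, 448)   # 640 x 448 is an assumed display size
largest = 0xFFFF / (1 << FRACTION_BITS)   # biggest value that fits in 16 bits
print(hex(centre_fixed), top_left, largest)
```

Anything that would encode to a value outside 16 bits cannot be expressed at all, which matters for the scissoring discussion that follows.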
23 GS Coordinate Scissoring
[Diagram: the GS coordinate space from (0,0) to (4096,4096) on X and Y axes, with the screen display in the middle]
X and Y values are 16-bit
Scissoring will not work outside that range
No hardware clipping
There is a VU clip instruction

The diagram at the top shows the full range of the GS coordinate system, (0,0) to (4096,4096). The screen display (yellow), drawn overly large here, is in the middle of the GS coordinate system. The GS will scissor the area relative to the screen centre.
The problem case below it shows a huge triangle being drawn outside the scissoring area.
24 Prevent The Thrashing Of Texture Cache
Current texels are read from the texture cache
It is only 8K in size, or 1 texture page
Reloading the texture cache has a cost
No need to use PC-style 32-bit textures
Too many colours, takes up too much VRAM
You are aiming for a TV, not a PC monitor
Texture sizes that fit into the texture cache:
4-bit 128x128, 8-bit 128x64 (with CLUT)
16-bit 64x64, 32-bit 64x32

Texture coordinates should be kept within these sizes for peak performance. Just one pixel outside the texture page buffer can halve or even quarter the fill rate.

Compare this with PC-style graphics. With only 4 MB of VRAM, which has to hold a double-buffered frame buffer and a Z buffer, the best you can have is around 2 MB for textures. That does not seem a lot when PC graphics cards have 16 MB or more, but the PS2 does not have to contend with large resolutions of 1024x768 or greater. It also does not need large 24/32-bit textures; the TV just cannot show these in great detail. The PS2 is not connected to a PC monitor, so they are a waste of resources… something Windows programmers can appreciate.
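The four recommended sizes all come out at exactly 8K, i.e. one texture page. A quick check, with helper names invented for this sketch:

```python
PAGE_BYTES = 8 * 1024   # texture cache size = one texture page


def texture_bytes(width, height, bits_per_texel):
    """Size of the texel data, ignoring any CLUT."""
    return width * height * bits_per_texel // 8


def pages_needed(width, height, bits_per_texel):
    """Number of 8K pages the texel data spans, rounded up."""
    return -(-texture_bytes(width, height, bits_per_texel) // PAGE_BYTES)


# Recommended sizes from the slide, keyed by bits per texel.
recommended = {4: (128, 128), 8: (128, 64), 16: (64, 64), 32: (64, 32)}

for bits, (w, h) in recommended.items():
    print(bits, w, h, texture_bytes(w, h, bits))

pc_style_pages = pages_needed(128, 128, 32)
print(pc_style_pages)
```

A PC-style 128x128 32-bit texture spans 8 pages, so drawing across it keeps forcing the texture cache to reload.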
25 Instruction And Data Cache Issues
Large loops and jumps
Large objects/structures
Consider the cost of useful C++ features (e.g. templates); they can have a negative effect
What can help?
Breaking large loops into several smaller loops
Checking the disassembly of the code for inlining
Un-cached memory access (0x )
Scratchpad is the fastest memory you have direct access to; use it as a main work area.

Features of a cache: it maintains CPU speed for data already in the cache, it is high speed, and it is expensive to have. Dynamic media such as geometry does not stay in the cache for any length of time, so a large cache is unhelpful. Data throughput is more important, and cache checking would slow down DMA.

C++ and I$ issues
Instruction cache misses are most likely due to heavy implementation of the game in C++. This is fine for PC programs, which have large amounts of cache, so large loops and jumps to other large sections will still be in the cache. The PS2 has only 16 KB of I$, so to improve performance remove large loops; it is better to have 5 smaller loops. I$ misses and refills stall the CPU. This can be quite drastic: sometimes 50 cycles can pass before a refill has completed. This is different from the PS1 I$, which would halve the CPU speed until the refill completed. Check the disassembly of the code to make sure that inlining has happened, as the compiler sometimes does a poor job of this. Failed inlining causes a large number of jumps, which adds call overhead and creates more I$ misses than you would have expected. Also consider the cost of useful C++ features such as templates, since these typically have a negative effect on the instruction and data caches.

D$ issues/solution
When you read or write data from or to main memory, it is placed in the data cache. Writing back to main memory involves flushing the cache so that main memory syncs with the cache. One problem with this is that flushing the cache is slow.
To help with D$ issues we recommend using uncached memory access, thereby bypassing the cache, and using the scratchpad as a general work area. The scratchpad has 1-cycle access and so is the fastest memory you can access directly.
26 Vector Unit 0 Usage
[Diagram: Emotion Engine block diagram with VU0 highlighted]
Suggested for taking some work off the CPU and helping reduce I$ misses.
It is not recommended to use VU0 in macromode.
Use micromode and allow the CPU to carry on in parallel.

VU0 is greatly underused. Moving some work onto it takes load off the CPU and helps reduce instruction cache misses. VU0 should be used for general vector/matrix functions rather than the FPU. We recommend that you do not use VU0 as a coprocessor in macromode, as the CPU can stall whilst VU0 is processing. A better option is to use micromode and create functions on VU0. These can then be called, allowing the CPU to carry on in parallel; all the CPU has to do is act on the result when it is available. VU0 is memory mapped into main memory, so retrieving the result is easy.
27 VIF Data Compression/Decompression
[Diagram: Emotion Engine block diagram with the VIFs highlighted; 8-bit XYZ data in memory is expanded to 32-bit values on the way into VU memory]
Compressed formats reduce the memory size of a model.
Decompression from packed formats by the VIF reduces the load on the VU.

Model data can be packed into optimal units, kept in main memory and then unpacked automatically by the VIF. As a result, the data size in main memory is reduced and so is the load on the Vector Unit. This is similar to the compression/decompression that vertex shaders provide.
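To give a feel for the saving, the sketch below packs vertices as three signed 8-bit values and then widens them back to one value per component, which is the kind of expansion the VIF performs on the way into VU memory. It only models the idea: the function names are invented, and the real unpack formats and options are set through VIF codes.

```python
import struct


def pack_xyz8(vertices):
    """Store each vertex as three signed 8-bit components (3 bytes)."""
    return b"".join(struct.pack("<3b", *v) for v in vertices)


def unpack_xyz8(packed):
    """Widen each 3-byte vertex back to three full-size integers."""
    return [struct.unpack_from("<3b", packed, i) for i in range(0, len(packed), 3)]


vertices = [(1, -2, 3), (127, 0, -128)]
packed = pack_xyz8(vertices)
unpacked = unpack_xyz8(packed)

packed_size = len(packed)               # bytes held in main memory
expanded_size = len(vertices) * 3 * 4   # bytes if stored as 32-bit components
print(packed_size, expanded_size, unpacked)
```

In this example the packed model is a quarter of the size of the 32-bit version, at the cost of limiting each component to the range -128 to 127.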
28 Texture And Geometry Streaming
[Diagram: Emotion Engine block diagram with the DMA paths through the VIFs and GIF highlighted, and a 64-bit connection from the GIF to the GS]
1.2 GB/sec max bandwidth (24 MB per frame).
The GIF arbitrates between paths and packs data into 64 bits for the GS.
Watch the priority ordering of the paths to the GIF.

DMA tags can be used for syncing geometry with texture. Even so, VU1 can stall whilst a texture upload is going on. To get around this, load balancing can be used, so that when a large amount of geometry is being uploaded a large texture is also being uploaded to the GS. VU1 and the GS then have less stalling to contend with. Although this sounds a reasonable way to stop VU1 stalling, it can be quite difficult to implement.

Theoretical transfer speeds are around 1,200 MB/sec for 32/24/16-bit textures, 900 MB/sec for 8-bit and 600 MB/sec for 4-bit. Realistically you should expect around 1,050 MB/sec for 32/24/16-bit, 750 MB/sec for 8-bit and 450 MB/sec for 4-bit. To achieve maximum throughput it is recommended that you swizzle 4-bit and 8-bit textures into a 32-bit format.
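The per-frame figure follows from the bandwidth and the display rate: 1.2 GB/sec works out to 24 MB per frame at 50 Hz and 20 MB at 60 Hz. The sketch below also applies the rough 25% geometry / 75% texture split mentioned on the VU Renderer slide; treating that split as fixed is an assumption made for the example.

```python
GIF_BANDWIDTH = 1_200_000_000   # bytes/sec, the quoted maximum


def per_frame_budget(frames_per_sec):
    """Bytes that can reach the GS in one frame at the quoted peak rate."""
    return GIF_BANDWIDTH // frames_per_sec


budget_50hz = per_frame_budget(50)
budget_60hz = per_frame_budget(60)

# Rough split between geometry and texture data.
geometry_budget = budget_50hz * 25 // 100
texture_budget = budget_50hz - geometry_budget

print(budget_50hz, budget_60hz, geometry_budget, texture_budget)
```

These are peak numbers; with the realistic transfer speeds quoted above the usable budget per frame is lower.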
29 Summary
The key to PS2 power is keeping the units busy.
Keeping data moving in parallel is the key to keeping the processors fed with data.
DMA is the system which does this. This is the most crucial thing to understand to get performance on PS2.
VRAM seems small but there are plenty of tricks.
Cache issues… remember scratchpad!
Vector Unit 0 is underused.
30 Contact
Contact information: SCEE Booth, Exhibition Stand #9