1
Status – Week 265 Victor Moya
2
Summary
- ShaderEmulator
- ShaderFetch
- ShaderDecodeExecute
- Communication storage classes
- GPU design
- PS2
- PS3
- Imagine
3
ShaderEmulator
- Decoded information for emulation moved from ShaderEmulator to ShaderInstruction.
- Used macros for shader instructions.
- Functions renamed.
- Not tested yet.
- Shader assembler?
4
Shader Box Diagram
5
ShaderFetch
- Parameters:
  - numThreads: Number of threads and input buffers.
  - numActiveThreads: Number of threads.
  - issueRate: Max. instructions fetched per cycle.
  - retireRate: Max. newPC inputs per signal???
6
ShaderFetch Signals
- Input:
  - ShaderCommand:
    - NEW_INPUT: Inputs to the shader.
    - LOAD_PROGRAM: Load a new Shader Program.
    - LOAD_PARAMETERS: Load new parameter values.
  - ShaderNewPC:
    - Thread flow changes.
    - End of thread (EXIT).
  - ShaderDecodeState:
    - Decoder/Execute ready/busy state.
  - ConsumerState:
    - Consumer state: ready/busy to receive shader outputs.
7
ShaderFetch Signals
- Output:
  - ShaderState:
    - Shader ready to receive new inputs.
    - Shader ready to receive new program or new parameters (all previous inputs must have finished).
  - ShaderInstruction:
    - Shader instructions fetched.
  - ShaderOutput:
    - Shader output (an array of QuadFloats) sent to the consumer.
8
ShaderFetch Clock
- Read from ShaderCommand:
  - Panic if Shader not ready for command.
  - NEW_INPUT:
    - Get free thread or free input buffer.
    - Reset thread state.
  - LOAD_PROGRAM:
    - Stores new program in Shader Emulator.
  - LOAD_PARAMETERS:
    - Stores new parameters in Shader Emulator.
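
The command handling on this slide could be sketched as follows. This is a minimal illustration, not the simulator's code: every type and member name is hypothetical, only the three command names come from the slide, and the "panic" is modelled as an exception. Storing the program or parameters in the ShaderEmulator is reduced to setting a flag.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical types; only the command names are taken from the slide.
enum class CommandType { NEW_INPUT, LOAD_PROGRAM, LOAD_PARAMETERS };
enum class ThreadState { Free, Ready, Finished };

struct Thread {
    ThreadState state = ThreadState::Free;
    std::uint32_t pc = 0;
    std::uint32_t instructionCount = 0;
};

struct FetchCommandSketch {
    std::vector<Thread> threads;
    bool programLoaded = false;     // stands in for "stored in ShaderEmulator"
    bool parametersLoaded = false;  // stands in for "stored in ShaderEmulator"

    explicit FetchCommandSketch(std::size_t numThreads) : threads(numThreads) {}

    // Assumption: "all previous inputs must have finished" means that no
    // thread is still in the Ready (running) state.
    bool idle() const {
        for (const Thread& t : threads)
            if (t.state == ThreadState::Ready) return false;
        return true;
    }

    // Returns the thread index used for NEW_INPUT, or -1 for load commands.
    // Throws ("panics") when the shader is not ready for the command.
    int handleCommand(CommandType command) {
        if (command == CommandType::NEW_INPUT) {
            for (std::size_t i = 0; i < threads.size(); ++i) {
                if (threads[i].state == ThreadState::Free) {
                    threads[i] = Thread{};  // reset thread state
                    threads[i].state = ThreadState::Ready;
                    return static_cast<int>(i);
                }
            }
            throw std::runtime_error("shader not ready for NEW_INPUT");
        }
        if (!idle()) throw std::runtime_error("shader busy: cannot load");
        if (command == CommandType::LOAD_PROGRAM) programLoaded = true;
        else parametersLoaded = true;
        return -1;
    }
};
```

With two threads, a LOAD_PROGRAM followed by a NEW_INPUT succeeds, while a LOAD_PARAMETERS issued while that input is still running raises the panic.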
9
ShaderFetch Clock
- Read ShaderNewPC:
  - N reads.
  - NEW_PC:
    - Update the thread PC.
  - END_THREAD:
    - Mark thread as finished.
- Read ShaderDecodeState:
  - If decoder busy:
    - Update Shader state.
    - Write ShaderState.
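
A possible shape for the ShaderNewPC step is shown below. It assumes that the "N reads" bound is the retireRate parameter, which the slides leave open; the struct layouts and the function name `applyNewPC` are invented for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Signal kinds named on the slide; the payload layout is an assumption.
enum class NewPCKind { NEW_PC, END_THREAD };

struct NewPCSignal {
    NewPCKind kind;
    std::size_t thread;
    std::uint32_t pc;  // only meaningful for NEW_PC
};

struct ThreadContext {
    std::uint32_t pc = 0;
    bool finished = false;
};

// Applies at most retireRate pending signals, in order, and returns how many
// were consumed. Signals beyond the limit stay pending for the next cycle.
std::size_t applyNewPC(std::vector<ThreadContext>& threads,
                       const std::vector<NewPCSignal>& pending,
                       std::size_t retireRate) {
    std::size_t consumed = 0;
    while (consumed < pending.size() && consumed < retireRate) {
        const NewPCSignal& s = pending[consumed];
        ThreadContext& t = threads.at(s.thread);
        if (s.kind == NewPCKind::NEW_PC) {
            t.pc = s.pc;        // thread flow change
        } else {
            t.finished = true;  // EXIT reached
        }
        ++consumed;
    }
    return consumed;
}
```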
10
ShaderFetch Clock
- Fetch new instructions:
  - Fetch N instructions.
  - Check if thread is ready (not finished, not free).
  - Fetch instruction from ShaderEmulator.
  - Write ShaderInstruction.
  - Update thread PC (+1).
  - Update thread instruction counter.
  - If crosses instruction execution limit:
    - Finish thread.
  - Update nextThread pointer (Round Robin).
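
The fetch loop above might look roughly like this. The sketch assumes that a thread which is not ready simply loses its slot, and that "crosses the limit" means the counter reaching the limit; the ShaderEmulator fetch and the ShaderInstruction signal are replaced by a returned list of (thread, PC) pairs. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class State { Free, Ready, Finished };

struct Thread {
    State state = State::Free;
    std::uint32_t pc = 0;
    std::uint32_t instructions = 0;
};

// Stand-in for an instruction written to the ShaderInstruction signal.
struct Fetched {
    std::size_t thread;
    std::uint32_t pc;
};

struct Fetcher {
    std::vector<Thread> threads;
    std::size_t nextThread = 0;
    std::size_t issueRate;
    std::uint32_t instructionLimit;

    Fetcher(std::size_t numThreads, std::size_t rate, std::uint32_t limit)
        : threads(numThreads), issueRate(rate), instructionLimit(limit) {}

    // One simulated cycle: up to issueRate fetch slots, round robin.
    std::vector<Fetched> clock() {
        std::vector<Fetched> out;
        for (std::size_t slot = 0; slot < issueRate; ++slot) {
            Thread& t = threads[nextThread];
            if (t.state == State::Ready) {
                out.push_back({nextThread, t.pc});
                ++t.pc;
                ++t.instructions;
                // Assumption: reaching the limit finishes the thread.
                if (t.instructions >= instructionLimit) t.state = State::Finished;
            }
            nextThread = (nextThread + 1) % threads.size();
        }
        return out;
    }
};
```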
11
ShaderFetch Clock
- Check shader output state:
  - Output transmission in progress:
    - Send more data. Write to ShaderOutput.
    - Update state.
  - Output to send:
    - Check consumer state.
    - Start sending data. Write to ShaderOutput.
    - Set state as send in progress.
  - NOT IMPLEMENTED YET.
- Update Shader state:
  - Write ShaderState.
12
ShaderFetch Comments
- Finished threads act as buffers for output.
  - Add output buffers?
- Consumer/Output protocol.
  - Output sizes.
  - Output latency.
  - Multicycle transmission.
- Order of operations in clock().
13
ShaderFetch Comments
- ‘Scheduling’ policy for threads?
  - Pure Round Robin:
    - Free/Finished threads fetch NOPs.
  - Round Robin with priority:
    - Free/Finished threads are skipped.
  - Other?
- Memory:
  - THREAD_BLOCK/THREAD_RESUME from Shader Memory box or from ShaderDecodeExecute box.
  - New thread state: blocked.
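
The two scheduling options can be contrasted in a few lines. In this sketch (function and enum names are made up) a return value of `kNop` stands for a fetch slot that issues a NOP, and `ready[i]` is true when thread i is neither free nor finished.

```cpp
#include <cstddef>
#include <vector>

enum class Policy { PureRoundRobin, RoundRobinSkip };

const int kNop = -1;  // slot issues a NOP

// Picks the thread for one fetch slot and advances the round-robin pointer.
int pickThread(const std::vector<bool>& ready, std::size_t& next, Policy policy) {
    const std::size_t n = ready.size();
    if (n == 0) return kNop;

    if (policy == Policy::PureRoundRobin) {
        // Every thread owns its slot, whether or not it can use it.
        const std::size_t candidate = next;
        next = (next + 1) % n;
        return ready[candidate] ? static_cast<int>(candidate) : kNop;
    }

    // Round robin with priority: skip free/finished threads.
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t candidate = (next + i) % n;
        if (ready[candidate]) {
            next = (candidate + 1) % n;
            return static_cast<int>(candidate);
        }
    }
    return kNop;  // nothing to fetch this slot
}
```

Pure round robin keeps the fetch hardware trivial at the cost of wasted slots; the skipping variant fills every slot as long as at least one thread is ready.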
14
ShaderDecodeExecute
- Parameters:
  - numThreads: ShaderEmulator threads.
  - issueRate: Max. number of instructions received per cycle.
  - retireRate: Max. number of NewPC signals per cycle.
15
ShaderDecodeExecute
- Signals:
  - Input:
    - ShaderInstruction:
      - New instructions from Fetch.
    - ShaderExec:
      - Instructions that end their execution.
16
ShaderDecodeExecute
- Output:
  - ShaderDecodeState:
    - Decoder ready/busy.
  - ShaderNewPC:
    - NEW_PC:
      - PC changes from branch, call and ret instructions.
      - Instruction refetch? -> Thread blocked?
    - END_THREAD:
      - EXIT instruction.
  - ShaderExec:
    - Instructions that start their execution.
17
ShaderDecodeExecute Clock
- Read ShaderExec:
  - N finished instructions.
  - Update dependence tables.
  - Free resource tables (limited UFs?).
18
ShaderDecodeExecute Clock
- If decode is blocked:
  - Check resources for the blocked instruction.
  - If resources available:
    - Execute instruction in ShaderEmulator.
    - Update dependence table.
    - Read exec. latency table.
    - Write ShaderExec.
    - Unblock decoder.
    - Write ShaderDecodeState.
19
ShaderDecodeExecute
- Read ShaderInstruction:
  - N instructions.
  - Check dependences.
  - If instruction has dependences:
    - Throw away instruction.
    - Send NewPC with the same PC.
  - Check resource tables.
  - If there are no resources for the instruction:
    - Block decoder.
    - Write ShaderDecodeState.
    - Exit.
  - Execute instruction in ShaderEmulator.
  - Read instruction exec. latency table.
  - Write ShaderInstruction.
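
The decode decisions on this slide (refetch on a dependence, block on a missing resource, otherwise issue) can be sketched with a per-thread table of pending register writes and a count of free functional units. This is an illustrative model under assumptions not stated in the slides: registers are plain integer indices, a unit stays occupied until its instruction retires, and both source and destination registers are checked against pending writes. All names are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

enum class DecodeResult { Issued, Refetch, Blocked };

struct Instruction {
    std::size_t thread;
    int dst;           // destination register, -1 if none
    int src0;          // source registers, -1 if unused
    int src1;
    std::size_t unit;  // kind of functional unit required
};

class DecodeSketch {
public:
    DecodeSketch(std::size_t numThreads, std::size_t numRegisters,
                 std::vector<unsigned> unitsPerKind)
        : pending_(numThreads, std::vector<bool>(numRegisters, false)),
          freeUnits_(std::move(unitsPerKind)) {}

    // Decode step: dependence check first, then resource check.
    DecodeResult decode(const Instruction& in) {
        if (isPending(in, in.src0) || isPending(in, in.src1) || isPending(in, in.dst))
            return DecodeResult::Refetch;  // throw away, NewPC with same PC
        if (freeUnits_[in.unit] == 0)
            return DecodeResult::Blocked;  // block decoder
        --freeUnits_[in.unit];
        if (in.dst >= 0) pending_[in.thread][static_cast<std::size_t>(in.dst)] = true;
        return DecodeResult::Issued;
    }

    // Called when the instruction comes back on ShaderExec as finished.
    void retire(const Instruction& in) {
        ++freeUnits_[in.unit];
        if (in.dst >= 0) pending_[in.thread][static_cast<std::size_t>(in.dst)] = false;
    }

private:
    bool isPending(const Instruction& in, int reg) const {
        return reg >= 0 && pending_[in.thread][static_cast<std::size_t>(reg)];
    }

    std::vector<std::vector<bool>> pending_;  // [thread][register]
    std::vector<unsigned> freeUnits_;         // free units per kind
};
```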
20
ShaderDecodeExecute Comments
- Additional features to implement:
  - Out-of-order support.
  - Instruction variable latency support.
  - Simulate UF latency:
    - Ex. DIV/SQRT unit.
  - Load/Store support.
  - Add Memory box:
    - Variable latency.
    - Thread block at first level memory miss.
      - Signal THREAD_BLOCK (to fetch and decode).
    - Thread wakeup when read ends.
      - Signal THREAD_RESUME (to fetch and decode).
21
ShaderDecodeExecute Comments
- Resource hazards:
  - None:
    - All UFs are fully pipelined with 1 cycle input latency.
  - Supported:
    - Block decoder.
      - Other threads could use free UFs.
    - Block thread.
      - Signal to fetch (ShaderNewPC): BLOCK_THREAD.
      - Needs a RESUME_THREAD signal?
      - Fetch hardware implementation? (not pure Round Robin?).
22
ShaderDecodeExecute Comments
- Dependence hazards:
  - None:
    - Same latency for all instructions.
    - Enough threads to fill instruction latency.
    - ‘Empty’ (free or already finished) threads fetch NOPs (pure Round Robin).
    - Hardware is less complex.
23
ShaderDecodeExecute Comments
- Dependence hazards:
  - Supported:
    - Block decoder:
      - With a multithreaded Shader it is a waste.
    - Block thread:
      - Send a BLOCK_THREAD signal to fetch.
      - Needs a RESUME_THREAD signal.
    - Ignore:
      - Just ignore current instruction.
      - Send REFETCH (ShaderNewPC) with old PC.
      - Instruction limit counter must not be updated!!!
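
One way to read the warning about the instruction limit counter: if fetch increments the counter when it issues an instruction, then a REFETCH has to undo that increment as well as restore the PC, otherwise discarded instructions would count towards the limit. The sketch below illustrates that interpretation only; the struct and function names are hypothetical and the slides do not say where the counter is actually maintained.

```cpp
#include <cstdint>

struct ThreadCounters {
    std::uint32_t pc = 0;
    std::uint32_t executed = 0;  // compared against the instruction limit
};

// Fetch side: one instruction issued to decode.
inline void fetchOne(ThreadCounters& t) {
    ++t.pc;
    ++t.executed;
}

// REFETCH received for a discarded instruction: go back to its PC and do
// not let the discarded attempt count towards the limit.
inline void refetch(ThreadCounters& t, std::uint32_t oldPC) {
    t.pc = oldPC;
    if (t.executed > 0) --t.executed;
}
```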
24
Communication storage
- Communication between boxes:
  - ShaderExecInstruction
  - ShaderCommand
  - ShaderDecodeCommand
- Dynamic: creation/destruction.
- Class model or struct model?
- Inherit from a ‘dynamic data’ class.
  - Modified new/delete implementation.
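
A 'dynamic data' base class with its own new/delete could be built around a free list, so that the many short-lived signal objects reuse memory instead of hitting the general allocator every cycle. The following is a sketch of that idea under stated assumptions, not the project's design: the class name `DynamicData`, the 64-byte block size, the `cachedBlocks` helper and the example subclass are all invented here, and the code is not thread-safe.

```cpp
#include <cstddef>
#include <new>

class DynamicData {
public:
    virtual ~DynamicData() = default;

    // Small objects come from a free list of fixed-size blocks.
    static void* operator new(std::size_t size) {
        if (size > kBlockSize) return ::operator new(size);
        if (FreeNode* node = head()) {
            head() = node->next;
            --count();
            return node;
        }
        return ::operator new(kBlockSize);
    }

    // Small objects go back to the free list instead of the heap.
    static void operator delete(void* p, std::size_t size) noexcept {
        if (p == nullptr) return;
        if (size > kBlockSize) {
            ::operator delete(p);
            return;
        }
        FreeNode* node = ::new (p) FreeNode{head()};
        head() = node;
        ++count();
    }

    // Number of blocks currently waiting for reuse (for inspection only).
    static std::size_t cachedBlocks() { return count(); }

private:
    struct FreeNode { FreeNode* next; };
    static constexpr std::size_t kBlockSize = 64;  // assumed upper bound
    static FreeNode*& head() { static FreeNode* h = nullptr; return h; }
    static std::size_t& count() { static std::size_t c = 0; return c; }
};

// Hypothetical signal payload inheriting the pooled allocation.
struct ShaderCommandSketch : DynamicData {
    int kind = 0;
};
```

Because the destructor is virtual, deleting through a `DynamicData*` still passes the size of the derived object to `operator delete`, which is what lets a single base class serve all the communication classes.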
25
GPU design
- Target architecture?
  - NV30
  - DX9
  - DX10
  - OpenGL2
  - PS3
  - Imagine
  - Vector
  - Multithreaded
- Are we really going for it?
- Do we really know what we are doing?
26
PS2
- I got the EE, VU and GS programming manuals :).
27
PS3
- Sony patent.
- I haven’t read it yet.
28
Imagine
- ‘Computer Graphics on a Stream Architecture’, John Douglas Owens, PhD dissertation.
- Not read yet either.