INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY HAsim On-Chip Network Model Configuration Michael Adler.


1 HAsim On-Chip Network Model Configuration
Michael Adler

2 The Front End
[Figure: multiplexed front-end pipeline. Stages: FET, Branch Pred, Line Pred, PC Resolve, IMEM, ITLB, I$, Inst Q. Port labels: first, deq, slot, enq or drop, fault, mispred, training, pred, rspImm, rspDel, redirect, vaddr (from Back End), paddr, inst or fault. Legend: "Ready to simulate?" shown per CPU for FET and IMEM.]

3 On-Chip Networks in a Time-Multiplexed World

4 Problem: On-Chip Network
[Figure: CPU 0, CPU 1, and CPU 2, each with an L1/L2 $, plus a memory controller. Each is attached to a router (r); routers exchange msg and credit.]
Problem: routing wires to/from each router
– Similar to the “global controller” scheme
– Also utilization is low

5 Multiplexing On-Chip Network Routers
[Figure: Routers 0..3 folded into a single multiplexed router. A table lists, for each current router (cur), its outputs (to 1, to 2, to 3) and inputs (fr 1, fr 2, fr 3). The streams are reordered by the permutations σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4.]
Simulate the network without a network
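The reorder step on this slide can be pictured as a rotation of one multiplexed pass. The sketch below is only an illustration, not HAsim code: it assumes the multiplexed router emits one message per router instance in order 0..n-1 and that instance x is sending to instance σ(x) = (x + k) mod n. The function name `reorderForReceivers` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[x] is the message router instance x sent toward instance (x + k) mod n
// during one multiplexed pass. The result is indexed by receiver: in[y] is
// the message that instance y should read on its next turn.
template <typename Msg>
std::vector<Msg> reorderForReceivers(const std::vector<Msg>& out, std::size_t k) {
    const std::size_t n = out.size();
    assert(n > 0);
    std::vector<Msg> in(n);
    for (std::size_t x = 0; x < n; ++x) {
        in[(x + k) % n] = out[x];  // apply sigma(x) = (x + k) mod n
    }
    return in;
}
```

With four routers and k = 1, the message written by router 3 is the one router 0 reads, which is why a plain FIFO between the output and input of the multiplexed router would deliver every message to the wrong instance.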

6 On-Chip Network Model Multiplexed Topology

7 HAsim’s Network Model is Abstract
In a software model the target network can be built at run-time
Dynamism is expensive in FPGAs and recompilation is slow
Solution: Constrained dynamism
– Fixed parameters: Max nodes, max edges per node, max VCs
– Dynamic:
    Number of active contexts (nodes)
    Endpoints of each edge (indirection table)
    Routing table
    Address mapping of distributed LLC
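As a rough software analogy for constrained dynamism, the sketch below fixes table capacities at compile time and lets only the contents and the active node count change at run time. The limits `kMaxNodes` and `kMaxEdgesPerNode`, the `NetworkConfig` struct, and both functions are hypothetical names chosen for illustration; they are not taken from HAsim.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compile-time limits (the "fixed parameters").
constexpr std::size_t kMaxNodes = 16;
constexpr std::size_t kMaxEdgesPerNode = 4;

struct NetworkConfig {
    std::size_t activeNodes = 0;  // dynamic: number of active contexts
    // Dynamic: edgeDst[n][e] is the node reached from node n over edge e
    // (the indirection table). Storage is sized for the maximum.
    std::array<std::array<std::uint8_t, kMaxEdgesPerNode>, kMaxNodes> edgeDst{};
};

// Returns false when the request exceeds the compiled-in capacity.
bool setActiveNodes(NetworkConfig& cfg, std::size_t n) {
    if (n == 0 || n > kMaxNodes) return false;
    cfg.activeNodes = n;
    return true;
}

// Points one edge of an active node at another active node.
void connect(NetworkConfig& cfg, std::size_t node, std::size_t edge, std::size_t dst) {
    assert(node < cfg.activeNodes && dst < cfg.activeNodes);
    assert(edge < kMaxEdgesPerNode);
    cfg.edgeDst[node][edge] = static_cast<std::uint8_t>(dst);
}
```

Changing the topology then means rewriting table entries, while changing a maximum means rebuilding, which mirrors the recompilation cost the slide mentions.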

8 Topology Manager
Software – runs once at startup so no need to optimize
HASIM_CHIP_TOPOLOGY_CLASS:
– Manages streaming of parameters to the FPGA
– Iterates over all software topology mapping classes until convergence
Namespace defined by dictionaries
– .dic files are preprocessed by LEAP tools
– Hierarchy of enumerated types

9 How do I…
– Map address ranges to LLC segments?
– Map target cores to nodes?
– Pick a number of memory controllers and map them to nodes?
– Define a target machine network topology?
– Manage interleaving for multiplexing the network and cores?

10 Map Address Ranges to LLC Segments (SW)
Build a table of n_llc_map_entries, where each entry is an index to a portion of the distributed LLC.
icn-mesh.cpp:
    for (int addr_idx = 0; addr_idx … SendParam(TOPOLOGY_NET_LLC_ADDR_MAP,
                                               &cores_net_pos[addr_idx % num_cores],
                                               sizeof(TOPOLOGY_VALUE), is_last);
    }
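The visible part of the loop indexes `cores_net_pos` with `addr_idx % num_cores`, so the table entries cycle through the cores' network positions round-robin. The sketch below reproduces only that indexing. The loop bound, the `StationId` type, and the function `buildLlcAddrMap` are assumptions made for illustration, and the streaming call to the FPGA is replaced by a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using StationId = unsigned;  // hypothetical stand-in for TOPOLOGY_VALUE

// Builds an address map of nEntries slots. Slot i names the network position
// of the core whose LLC segment serves it, assigned round-robin.
std::vector<StationId> buildLlcAddrMap(const std::vector<StationId>& coresNetPos,
                                       std::size_t nEntries) {
    assert(!coresNetPos.empty());
    std::vector<StationId> table;
    table.reserve(nEntries);
    for (std::size_t i = 0; i < nEntries; ++i) {
        table.push_back(coresNetPos[i % coresNetPos.size()]);
    }
    return table;
}
```

When the entry count is not a multiple of the core count, the first few cores receive one more slot than the rest, so the address space is only approximately balanced.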

11 Map Address Ranges to LLC Segments (FPGA)
Consume the table that was streamed in from SW
last-level-cache-no-coherence.bsv:
    // Define a node that will stream in the topology.  This builds a node
    // on a ring.  The node looks for messages tagged TOPOLOGY_NET_MEM_CTRL_MAP
    // and emits associated payloads.
    let ctrlAddrMapInit <- mkTopologyParamStream(`TOPOLOGY_NET_MEM_CTRL_MAP);

    // Allocate a local memory and initialize it with the streamed-in entries.
    LUTRAM#(Bit#(TLog#(TMul#(8, MAX_NUM_MEM_CTRLS))), STATION_ID) memCtrlDstForAddr <-
        mkLUTRAMWithGet(ctrlAddrMapInit);

    // Map an address to a node ID using the table
    function STATION_ID getMemCtrlDstForAddr(LINE_ADDRESS addr);
        // Use the low bits of the address as the index (resize does this).
        return memCtrlDstForAddr.sub(resize(addr));
    endfunction
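A software model of the same lookup may make the indexing easier to follow. This is a sketch under the assumption that the table length is a power of two, which matches a LUTRAM indexed by a `Bit#(n)` value; `lookupByLowBits` and `StationId` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using StationId = unsigned;  // hypothetical stand-in for STATION_ID

// Selects a destination station using the low bits of a line address,
// mirroring what truncating the address to the index width does.
StationId lookupByLowBits(const std::vector<StationId>& table, std::uint64_t lineAddr) {
    const std::size_t n = table.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two length
    return table[lineAddr & (n - 1)];
}
```

Because only the low bits select the entry, consecutive line addresses are spread across consecutive table slots.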

12 Map Address Ranges to LLC Segments (LLC Hub)
    rule ...
        // Incoming request from core
        if (m_reqFromCore matches tagged Valid .req)
        begin
            // Which instance of the distributed cache is responsible?
            let dst = getLLCDstForAddr(req.physicalAddress);

            if (dst == local_station_id)
            begin
                // Local cache handles the address.
                if (can_enq_reqToLocalLLC &&& ! isValid(m_new_reqToLocalLLC))
                begin
                    // Port to LLC is available.  Send the local request.
                    did_deq_reqFromCore = True;
                    m_new_reqToLocalLLC = tagged Valid LLC_MEMORY_REQ { src: tagged Invalid,
                                                                        mreq: req };
                    debugLog.record(cpu_iid, $format("1: Core REQ to local LLC, ") + fshow(req));
                end
            end
            else if (can_enq_reqToRemoteLLC && ! isValid(m_new_reqToRemoteLLC))
            begin
                // Remote cache instance handles the address and the OCN request port is available.
                //
                // These requests share the OCN request port since only one type of request goes to
                // a given remote station.  Memory stations get memory requests above.  LLC stations get
                // core requests here.
                did_deq_reqFromCore = True;
                m_new_reqToRemoteLLC = tagged Valid tuple2(dst, req);
                debugLog.record(cpu_iid, $format("1: Core REQ to LLC %0d, ", dst) + fshow(req));
            end
        end
        ...
    endrule

13 Map Cores and Memory Controllers to Nodes
All computed (currently) in icn-mesh.cpp
Given number of target cores and number of memory controllers:
– Builds a rectangle of cores as close to square as possible
– Adds a row of memory controllers at the top and bottom
– Topology streamed to FPGA using same mechanism as address mapping
E.g., 15 cores and 3 memory controllers:
    x M M x
    C C C C
    C C C C
    C C C C
    C C C x
    x M x x
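One way to arrive at a layout like the example is sketched below. The rules here are assumptions picked so that 15 cores and 3 memory controllers give the grid shown on the slide: the width is the smallest w with w*w >= cores, cores fill rows left to right, the top row takes the larger half of the controllers, and each controller row is centered with the remainder on the right. The actual rules in icn-mesh.cpp may differ, and `buildLayout` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns one string per grid row: 'C' core, 'M' memory controller,
// 'x' node with nothing attached.
std::vector<std::string> buildLayout(int numCores, int numMemCtrl) {
    assert(numCores > 0 && numMemCtrl >= 0);
    int width = 1;
    while (width * width < numCores) ++width;      // near-square rectangle
    const int height = (numCores + width - 1) / width;
    const int top = (numMemCtrl + 1) / 2;          // assumed split
    const int bottom = numMemCtrl / 2;
    assert(top <= width);

    auto ctrlRow = [width](int count) {
        std::string row(width, 'x');
        const int start = (width - count) / 2;     // assumed centering
        for (int i = 0; i < count; ++i) row[start + i] = 'M';
        return row;
    };

    std::vector<std::string> grid;
    grid.push_back(ctrlRow(top));
    for (int r = 0; r < height; ++r) {
        std::string row(width, 'x');
        for (int c = 0; c < width && r * width + c < numCores; ++c) row[c] = 'C';
        grid.push_back(row);
    }
    grid.push_back(ctrlRow(bottom));
    return grid;
}
```

Calling `buildLayout(15, 3)` yields six rows of four nodes, with the unused positions marked `x`, so every grid position is still a network node even when nothing is attached to its local port.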

14 Network Topology: Map Cores/Memory Controllers to Nodes
Multiplexed order of nodes is the same as order of cores
– No permutations required for local port
Nodes are connected to:
– Core
– Memory controller
– Nothing
The node doesn’t care what is connected! Hide indirection in ports

15 Network Topology: Map Cores/Memory Controllers to Nodes
In icn-mesh.bsv:
    //
    // Local ports are a dynamic combination of CPUs, memory controllers, and
    // NULL connections.
    //
    // localPortMap indicates, for each multiplexed port instance ID, the type
    // of local port attached (CPU, memory controller, NULL).
    //
    let localPortInit <- mkTopologyParamStream(`TOPOLOGY_NET_LOCAL_PORT_TYPE_MAP);
    LUTRAM#(Bit#(TLog#(TAdd#(TAdd#(MAX_NUM_CPUS, 1), NUM_STATIONS))), Bit#(2)) localPortMap <-
        mkLUTRAMWithGet(localPortInit);

    PORT_SEND_MULTIPLEXED#(MAX_NUM_CPUS, OCN_MSG) enqToCores <-
        mkPortSend_Multiplexed("Core_OCN_Connection_InQ_enq");
    PORT_SEND_MULTIPLEXED#(MAX_NUM_MEM_CTRLS, OCN_MSG) enqToMemCtrl <-
        mkPortSend_Multiplexed("ocn_to_memctrl_enq");
    PORT_SEND_MULTIPLEXED#(NUM_STATIONS, OCN_MSG) enqToNull <-
        mkPortSend_Multiplexed_NULL();

    let enqToLocal <- mkPortSend_Multiplexed_Split3(enqToCores, enqToMemCtrl, enqToNull, localPortMap);

16 Network Topology: Defining Inter-Node Edges
Each network node has ports: Local, N, E, S, W

17 Network Multiplexing
Logically, there are n nodes in the network.
Each has a local port connected either to a core, to memory or to nothing.
Network connection mapping and routing will determine the topology.
Topology manager defines the routing table.
Note: Dateline not yet implemented

18 Network Topology and Routing
Torus:

19 Network Topology and Routing
Mesh (connections identical, routing table ignores some edges):
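To illustrate how a routing table alone can turn torus wiring into a mesh, the sketch below fills a next-hop table that never selects a wraparound edge. It assumes dimension-order (X then Y) routing, row-major node numbering, and row 0 at the north edge; the slides do not say which routing algorithm the topology manager uses, and `Port` and `buildMeshRoutes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Port { Local, North, East, South, West };

// table[src][dst] is the output port node src uses to reach node dst.
// Only the sign of the coordinate difference is used, so wraparound
// links are never chosen even though the torus edges exist.
std::vector<std::vector<Port>> buildMeshRoutes(int width, int height) {
    assert(width > 0 && height > 0);
    const int n = width * height;
    std::vector<std::vector<Port>> table(n, std::vector<Port>(n, Port::Local));
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            const int sx = src % width, sy = src / width;
            const int dx = dst % width, dy = dst / width;
            if (dx > sx)      table[src][dst] = Port::East;
            else if (dx < sx) table[src][dst] = Port::West;
            else if (dy > sy) table[src][dst] = Port::South;
            else if (dy < sy) table[src][dst] = Port::North;
        }
    }
    return table;
}
```

On a 4-wide grid, node 3 reaches node 0 by heading West across three hops rather than taking the single wraparound hop East, which is the sense in which the mesh ignores some edges.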

20 Network Topology and Routing
Bi-directional ring:

21 Network Topology and Routing
Uni-directional ring:

22 Final Problem: Multiplexing On-Chip Network Routers
[Figure: Routers 0..3 folded into a single multiplexed router. A table lists, for each current router (cur), its outputs (to 1, to 2, to 3) and inputs (fr 1, fr 2, fr 3). The streams are reordered by the permutations σ(x) = (x + 1) mod 4, σ(x) = (x + 2) mod 4, and σ(x) = (x + 3) mod 4.]

23 Network Topology: Communication Across Multiplexed Nodes
Each node talks to a different multiplexed node instance
– Naïve port binding would have each node talk only to itself
A-Ports are already buffered
– Bury transformation in A-Ports
– Retain simple read next / write next port semantics within models

24 Network Topology: Communication Across Multiplexed Nodes
icn-mesh.bsv:
    // Initialization from topology manager
    ReadOnly#(STATION_IID) meshWidth <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_WIDTH);
    ReadOnly#(STATION_IID) meshHeight <- mkTopologyParamReg(`TOPOLOGY_NET_MESH_HEIGHT);

    // Outbound and inbound ports are loopbacks to the same multiplexed module.  Ports connect
    // to logically different nodes but physically to the same simulator object.
    Vector#(NUM_PORTS, PORT_SEND_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqTo = newVector();
    Vector#(NUM_PORTS, PORT_RECV_MULTIPLEXED#(NUM_STATIONS, MESH_MSG)) enqFrom = newVector();

    // Outbound port is a normal A-Port.  It has no buffering.
    enqTo[portEast] <- mkPortSend_Multiplexed("mesh_interconnect_enq_E");

    // Inbound port provides buffering for multiplexing.  Instead of forwarding messages FIFO
    // it must transform the messages so they cross to the correct multiplexed instance when
    // instances (nodes) are traversed sequentially.
    enqFrom[portWest] <- mkPortRecv_Multiplexed_ReorderLastToFirstEveryN("mesh_interconnect_enq_E",
                                                                         1, meshWidth, meshHeight);
    ...
    enqFrom[portEast] <- mkPortRecv_Multiplexed_ReorderFirstToLastEveryN("mesh_interconnect_enq_W",
                                                                         1, meshWidth, meshHeight);
    enqFrom[portSouth] <- mkPortRecv_Multiplexed_ReorderFirstNToLastN("mesh_interconnect_enq_N",
                                                                      1, meshWidth);
    enqFrom[portNorth] <- mkPortRecv_Multiplexed_ReorderLastNToFirstN("mesh_interconnect_enq_S",
                                                                      1, meshWidth);
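The four reordering receivers can be read as permutations over the row-major sequence of multiplexed instances. The sketch below is a guess at the mapping their names suggest, not a description of the LEAP or HAsim implementation: for each inbound port it computes which sender instance a given receiver should hear from, assuming torus wraparound, row-major numbering, and row 0 at the north edge. `InPort` and `sourceInstance` are hypothetical names.

```cpp
#include <cassert>

enum class InPort { North, East, South, West };

// For the node `receiver`, returns the instance whose outbound message
// should appear on the given inbound port.
int sourceInstance(InPort port, int receiver, int width, int height) {
    assert(width > 0 && height > 0);
    assert(receiver >= 0 && receiver < width * height);
    int col = receiver % width;
    int row = receiver / width;
    switch (port) {
        case InPort::West:   // eastbound traffic: last of each row wraps to first
            col = (col + width - 1) % width;
            break;
        case InPort::East:   // westbound traffic: first of each row wraps to last
            col = (col + 1) % width;
            break;
        case InPort::North:  // southbound traffic: last row wraps to first row
            row = (row + height - 1) % height;
            break;
        case InPort::South:  // northbound traffic: first row wraps to last row
            row = (row + 1) % height;
            break;
    }
    return row * width + col;
}
```

Row shifts move a single message per group of `width` instances, while column shifts move an entire group of `width` messages, which would explain why the East/West variants take both width and height while the North/South variants take only the width.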
