Fault-Tolerant Delay-Insensitive Inter-Chip Communication Yebin Shi Apt Group The University of Manchester
Outline SpiNNaker Inter-Chip interconnect Basic Transmitter and Receiver Potential Problems with the Designs Robust Transmitter and Receiver Future work and conclusion
Research Aims Investigate the impact of transient glitches at inter-chip wires on the interface circuits. Redesign the link interface circuits to increase glitch-resistance and avoid deadlock.
SpiNNaker Network infrastructure: – 6 bidirectional inter-chip links – delay-insensitive on-chip and inter-chip communication – Packets are variable-length, serialized in 4-bit flits, with end-of-packet marker – 1 Gb/s throughput per link
Inter-Chip Communication Inter-Chip Network: – 2of7 data encoding – 2-phase (NRZ) handshake – data and control in single stream On-Chip Network: – 3of6 data encoding – 4-phase (RTZ) handshake – separate data and control channels
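To make the two encodings concrete, the following minimal sketch enumerates the codewords of an m-of-n code. A 2-of-7 code has 21 codewords, enough for the 16 values of a 4-bit flit plus an end-of-packet marker in the same stream; a 3-of-6 code has 20 codewords, enough for 16 data values when control travels on a separate channel. The symbol-to-codeword mapping shown here is hypothetical and is not the mapping used on the SpiNNaker links.

```python
from itertools import combinations

def m_of_n_codewords(m, n):
    """Return every n-wire codeword with exactly m active wires, as bit tuples."""
    words = []
    for active in combinations(range(n), m):
        words.append(tuple(1 if i in active else 0 for i in range(n)))
    return words

def is_valid(word, m):
    """A codeword is valid when exactly m of its wires are active."""
    return sum(word) == m

two_of_seven = m_of_n_codewords(2, 7)   # inter-chip code
three_of_six = m_of_n_codewords(3, 6)   # on-chip code

# Hypothetical mapping (an assumption for illustration only):
# 16 data values plus an end-of-packet marker take the first 17 codewords.
SYMBOLS = list(range(16)) + ["EOP"]
encode_2of7 = dict(zip(SYMBOLS, two_of_seven))
decode_2of7 = {word: sym for sym, word in encode_2of7.items()}

if __name__ == "__main__":
    print(len(two_of_seven), "2-of-7 codewords")
    print(len(three_of_six), "3-of-6 codewords")
    print("EOP ->", encode_2of7["EOP"])
```

In a 2-phase (NRZ) link the "active" wires are the ones that toggle, while in a 4-phase (RTZ) link they are the wires raised high before all wires return to zero.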
Link Transmitter - data channel: pipeline for code and phase conversion - ctrl channel: merge EoP symbol into the data stream
Link Receiver - data channel: phase and code conversion pipeline - ctrl channel: Extract EoP symbols from stream
Glitch Impact on Simulation Automatic packet data generation CRC scheme included for result verification Random generation of transient glitches – injected onto the inter-chip link – Single Event Upset (SEU) scenarios Configurable frequency and duration of glitches – frequency: up to ½ glitch/packet – duration scale: ns Extensive simulation – a large number of densely packed glitches over 1M packets – speed-up fault simulation
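A random glitch schedule of the kind described above might be generated as in the sketch below. This is not the authors' test bench: the function name, the packet duration, the maximum glitch width, and the wire count (assumed here to be 7 data wires plus 1 acknowledge wire) are all illustrative assumptions. Only the rate of up to ½ glitch per packet and the nanosecond duration scale come from the slide.

```python
import random

def make_glitches(n_packets, packet_ns, rate_per_packet=0.5,
                  max_width_ns=5.0, n_wires=8, seed=0):
    """Return a list of (start_ns, width_ns, wire) glitch events.

    Each packet slot receives at most one glitch, with probability
    rate_per_packet. All parameter defaults are assumptions.
    """
    rng = random.Random(seed)           # seeded so runs are repeatable
    glitches = []
    for p in range(n_packets):
        if rng.random() < rate_per_packet:
            start = p * packet_ns + rng.uniform(0.0, packet_ns)
            width = rng.uniform(0.1, max_width_ns)
            wire = rng.randrange(n_wires)
            glitches.append((start, width, wire))
    return glitches

if __name__ == "__main__":
    events = make_glitches(n_packets=10, packet_ns=200.0)
    for start, width, wire in events:
        print(f"t={start:8.1f} ns  width={width:4.2f} ns  wire={wire}")
```

Seeding the generator lets the same glitch pattern be replayed against two interface designs, which is useful when comparing them.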
Fault Effects in the Transmitter Deadlock risks: – A transient glitch may corrupt a 2-of-7 symbol, leading to handshaking failure. – Phase-sensitive phase converter. – Independent resetting.
Fault Effects in the Receiver Deadlock risks: – A corrupted 2-of-7 symbol may prevent completion of conversion to 3-of-6. – Independent resetting.
Deadlock in Receiver - a glitch occurs when dout_cd is in transit - a wrong value stored in the bottom latch - a conversion failure for next data conversion
Robust 2-ph to 4-ph Conversion phase-insensitive converter: – Used in 2-phase ack input to the Transmitter. – Used in 2-phase data inputs to the Receiver. reset signal not shown
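The idea of phase insensitivity can be illustrated with a purely behavioural model: in 2-phase signalling every change of level is an event, whichever direction it goes, so the converter only needs to compare the wire against the level it last acted on. The class below is a sketch under that assumption and does not describe the gate-level circuit on the slide; the class and method names are hypothetical.

```python
class PhaseInsensitiveConverter:
    """Behavioural sketch of a 2-phase to 4-phase converter.

    It reacts to any level change on its input, rising or falling,
    so it makes no assumption about the current phase of the wire.
    """

    def __init__(self, initial_level):
        # Assumption: the wire level is captured at reset, not presumed to be 0.
        self.last_level = initial_level
        self.handshakes = 0

    def sample(self, level):
        """Return True if this sample triggers one 4-phase handshake."""
        if level != self.last_level:
            self.last_level = level
            # Stands for a full req+, ack+, req-, ack- cycle on the 4-phase side.
            self.handshakes += 1
            return True
        return False

if __name__ == "__main__":
    conv = PhaseInsensitiveConverter(initial_level=1)
    for level in [1, 0, 0, 1, 1, 0]:
        conv.sample(level)
    print("handshakes:", conv.handshakes)
```

Because the model tracks levels rather than expecting a particular edge direction, it behaves the same whether the wire starts high or low.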
Receiver Phase Converter acki also triggers the ack signal back to the transmitter
Code conversion with Priority Arbitration – support full set of 2-of-7 code – convert invalid symbols into a valid one – stop propagation of invalid symbols containing more than 2 transitions
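One possible reading of these bullets is sketched below: the converter waits until at least two wires have toggled, and if more than two have toggled it forwards only two of them, chosen by a fixed priority, so that whatever propagates downstream is always a valid 2-of-7 symbol. The priority order used here (lowest wire index wins) is an assumption made for illustration; the slide does not state the actual order.

```python
def arbitrate(transitions):
    """Reduce the set of toggled wires to a valid 2-of-7 symbol.

    transitions: iterable of wire indices (0..6) seen toggling.
    Returns a tuple of two wire indices, or None while the symbol
    is still incomplete (fewer than two distinct wires toggled).
    """
    seen = sorted(set(transitions))
    if len(seen) < 2:
        return None                 # incomplete: keep waiting
    # Assumed priority: lowest index first. Extra transitions are dropped.
    return (seen[0], seen[1])

if __name__ == "__main__":
    print(arbitrate([3, 5]))        # already a valid symbol
    print(arbitrate([6, 1, 4]))     # three transitions reduced to two
    print(arbitrate([2]))           # incomplete symbol
```

A symbol repaired this way may carry the wrong data, so an end-to-end check such as the CRC used in the simulation is still needed to detect the corruption.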
Independent Reset –An extra, possibly redundant, transition is created after reset in case the Tx is waiting for an acknowledge token. –The phase-insensitive converter for ack2 in TX absorbs the extra token if it is not needed.
Simulation results Simulation results for 1 million packets sent
Items \ Designs                 Original I/F    Proposed I/F
Glitches                        478,280         390,357
Successfully Received Packets   916,684         863,182
Deadlock                        7,632           7
Performance (ns/symbol)         17              15
Area (µm²)
– Significantly reduced deadlock occurrence. – Worse packet loss. – Trivial area overhead. – Increased throughput.
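The summary bullets can be checked against the figures in the results table with a few lines of arithmetic. The sketch below assumes the table reads 916,684 and 863,182 received packets, 7,632 and 7 deadlocks, and 17 and 15 ns per symbol for the original and proposed interfaces respectively; the dictionary layout and function name are illustrative.

```python
PACKETS_SENT = 1_000_000

# Figures taken from the results table (original vs proposed interface).
results = {
    "original": {"received": 916_684, "deadlocks": 7_632, "ns_per_symbol": 17},
    "proposed": {"received": 863_182, "deadlocks": 7, "ns_per_symbol": 15},
}

def summarize(r):
    """Derive packet loss, deadlock rate and symbol rate for one design."""
    return {
        "loss_pct": 100 * (PACKETS_SENT - r["received"]) / PACKETS_SENT,
        "deadlocks_per_million": r["deadlocks"] * 1_000_000 / PACKETS_SENT,
        "msymbols_per_s": 1000 / r["ns_per_symbol"],
    }

if __name__ == "__main__":
    for name, r in results.items():
        s = summarize(r)
        print(f"{name}: loss {s['loss_pct']:.2f}%, "
              f"{s['deadlocks_per_million']:.0f} deadlocks per million packets, "
              f"{s['msymbols_per_s']:.1f} Msymbol/s")
```

On these figures packet loss rises from about 8.3% to about 13.7%, while deadlocks fall by roughly three orders of magnitude and the symbol rate improves by about 13%.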
Conclusions and Future Work Conclusions: – Enhance the resistance to transient glitches in inter-chip links by replacing phase converters. – Avoid deadlocks by hardening completion detection modules in the receiver. – Remove corrupt symbols by applying an arbitration scheme for symbol conversions. – Allow independent chip resets without introducing deadlocks by sending safe, possibly redundant tokens (data or ack) on reset. Future work: – A generalized approach for circuit evaluation, including the computation of safety margins. – Investigation into the impact of back-pressure on glitch resistance.