3. Hardware Redundancy Reliable System Design 2010 by: Amir M. Rahmani.


1 3. Hardware Redundancy Reliable System Design 2010 by: Amir M. Rahmani

2 Forms of Redundancy
– Hardware redundancy: extra hardware for detecting or tolerating faults
– Software redundancy: extra software for detecting and possibly tolerating faults
– Information redundancy: extra information, i.e. codes
– Time redundancy: extra time for performing tasks for fault tolerance

3 Types of Hardware Redundancy
Fault tolerance requires redundancy.
1- Static Redundancy (passive)
– uses fault masking to hide the occurrence of a fault
– does not require reconfiguration
– Example: TMR, voting
2- Dynamic Redundancy (active)
– uses comparison for detection and/or diagnosis
– requires reconfiguration: faulty hardware is removed from the system
– Example: stand-by system
3- Hybrid Redundancy
– combination of static and dynamic redundancy

4 1- Static Redundancy
A class of redundancy techniques that can tolerate faults without reconfiguration (failover).
Static redundancy can be divided into two major subclasses:
– Masking redundancy
– Active redundancy

5 Masking Redundancy
– Uses majority voting to mask faults
– Requires 2f + 1 modules to tolerate f faulty modules
– N-Modular Redundant system (NMR): N independent modules replicate the same function in parallel, and their results are voted on
– Requirement: N >= 3
– N = 3 gives TMR (Triple Modular Redundancy)
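
The 2f + 1 rule can be illustrated with a small software model of an NMR voter. This is only a sketch under stated assumptions: the function names `nmr_vote` and `max_masked_faults` are hypothetical and not from the slides, module outputs are modeled as plain Python values, and the voter itself is assumed to be fault-free.

```python
from collections import Counter


def nmr_vote(outputs):
    """Return the value produced by a strict majority of modules, or None if there is none."""
    n = len(outputs)
    if n < 3:
        raise ValueError("NMR voting needs at least 3 modules")
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of the N modules agree.
    return value if count > n // 2 else None


def max_masked_faults(n):
    """Largest f that satisfies n >= 2f + 1."""
    return (n - 1) // 2


# TMR (N = 3): one faulty module is outvoted by the two correct ones.
tmr_result = nmr_vote([42, 42, 7])

# N = 5: two faulty modules are outvoted by the three correct ones.
five_result = nmr_vote([1, 9, 1, 8, 1])
```

With N = 3 the voter masks one faulty module and with N = 5 it masks two, which matches N = 2f + 1.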

6 Triple Modular Redundancy (TMR)
– e.g. majority voting
– 1-bit majority voter: three 2-input AND gates whose outputs are ORed
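
The 1-bit majority voter can be written directly as the Boolean expression behind the gate description: three pairwise ANDs feeding one OR. The sketch below models the gates with Python's bitwise operators; the function name `majority` is an assumption made for illustration. Because the operators are bitwise, the same expression also votes bit by bit on whole words.

```python
def majority(a, b, c):
    """Majority of three inputs: (a AND b) OR (b AND c) OR (a AND c)."""
    return (a & b) | (b & c) | (a & c)


# Exhaustive truth table for the 1-bit case, as (a, b, c, vote) tuples.
truth_table = [
    (a, b, c, majority(a, b, c))
    for a in (0, 1)
    for b in (0, 1)
    for c in (0, 1)
]

# Word-wide voting: the third module disagrees in two bit positions and is outvoted.
word_vote = majority(0b1010, 0b1010, 0b0110)
```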

7 Triple Modular Redundancy (TMR)

8 Masking Redundancy: TMR with triple voting

9 Masking Redundancy: Multi-stage TMR

10 N-Modular Redundant system (NMR)

11 Active Redundancy
– Two or more units are active and produce replicated results simultaneously
– Relies on fail-stop units
– Fail-stop property: a unit produces correct results or no results at all
– Requires f + 1 modules to tolerate f faulty modules

12 Fail-stop Nodes
– Nodes 1 and 2 send their results individually to nodes 3 and 4
– All nodes are fail-stop: they send correct results or no results at all
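
Because a fail-stop unit never delivers a wrong result, a receiver does not need to vote: any result that arrives can be trusted, which is why f + 1 units are enough to tolerate f failures. The sketch below is a minimal illustration under the assumption that a stopped unit is modeled as `None`; the name `first_available` is hypothetical.

```python
def first_available(results):
    """Return the first result delivered by a fail-stop unit.

    A stopped unit is modeled as None. Since fail-stop units never send
    wrong results, the first delivered result is taken as correct.
    """
    for result in results:
        if result is not None:
            return result
    raise RuntimeError("all replicated units have stopped")


# Two units (f + 1 with f = 1): the first has stopped, the second still answers.
value = first_available([None, 17])
```

The guarantee rests entirely on the fail-stop assumption. If a unit could send an incorrect value instead of staying silent, masking with 2f + 1 modules and a voter would be needed.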

13 2- Dynamic Redundancy
– Relies on error detection and reconfiguration
– Requires f + 1 modules to tolerate f faulty modules
– May require recovery of system or application state
– May require outage time

14 Example: Duplicate and Compare
– can only detect, NOT diagnose, i.e. fault detection without fault tolerance
– may order a shutdown
– the comparator is a single point of failure
– simple implementation: 2-input XOR for a single-bit compare
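
The XOR comparator can be sketched in a few lines. The function name `mismatch` is an assumption for illustration, and the two module outputs are modeled as integers so that the bitwise XOR covers every bit of a word at once.

```python
def mismatch(out_a, out_b):
    """Error signal of a duplicate-and-compare pair.

    XOR yields a 1 in every bit position where the two outputs differ,
    so any nonzero result means the duplicated modules disagree.
    """
    return (out_a ^ out_b) != 0


agree = mismatch(0b1011, 0b1011)     # both modules give the same output
disagree = mismatch(0b1011, 0b1001)  # outputs differ in one bit position
```

The error signal says only that the two outputs differ. It carries no information about which module is the faulty one, which is why this scheme detects faults but cannot diagnose or mask them.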

15 Example: Stand-by System
– error detection by, e.g., communication checksums and memory parity bits
– only one module is driving the outputs
– other modules are idle (hot spares) or shut down (cold spares)
– error detection => switch to a new module (hot or cold spare)
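
The detect-then-switch behavior of a stand-by system can be sketched as follows. Everything here is an illustrative assumption: modules are modeled as Python callables, a detected error is modeled as the hypothetical exception `DetectedError`, and the class name `StandbySystem` is not from the slides. Recovery of application state and outage time are not modeled.

```python
class DetectedError(Exception):
    """Raised by a module when its error detection mechanism fires (assumed model)."""


class StandbySystem:
    def __init__(self, modules):
        self.modules = list(modules)  # index 0 is the primary, the rest are spares
        self.active = 0               # only this module drives the output

    def run(self, x):
        while self.active < len(self.modules):
            try:
                return self.modules[self.active](x)
            except DetectedError:
                # Reconfiguration: retire the faulty module, switch to the next spare.
                self.active += 1
        raise RuntimeError("no spare modules left")


def faulty_primary(x):
    raise DetectedError("parity error")


def healthy_spare(x):
    return 2 * x


system = StandbySystem([faulty_primary, healthy_spare])
output = system.run(3)
```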

16 Types of Stand-by Systems
– Hot standby
– Warm standby
– Cold standby

17 Hot Stand-by
Characteristics
– Spare updated simultaneously with primary module
Advantages
+ Very short or no outage time
+ Does not require recovery of application
Drawbacks
- High failure rate (fault rate)
- High power consumption

18 Warm Stand-by
Characteristics
– Spare up and running
– Needs to recover application status
Advantages
+ Does not require simultaneous updating of spare and primary module
Drawbacks
- Requires recovery of application state
- High fault rate
- High power consumption

19 Cold Stand-by
Characteristics
– Spare powered down
Advantages
+ Low failure rate (fault rate)
+ Low power consumption
Example: satellite applications
Drawbacks
- Very long outage time
- Needs to boot kernel/operating system and recover application status

20 3- Hybrid Redundancy
N-Modular Redundancy with spares
– N active + S spare modules (off-line)
– Voting and comparison
– Replaces erroneous module from spare pool
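
One way to picture NMR with spares is a voter that also compares each module's output against the voted result and swaps any disagreeing module for a spare. The sketch below is an assumption-laden illustration, not the scheme from the slides' figure: modules are callables, and the names `vote_and_repair`, `good_module`, and `bad_module` are hypothetical.

```python
from collections import Counter


def vote_and_repair(active, spares, x):
    """Vote over the active modules, then replace disagreeing ones from the spare pool.

    `active` and `spares` are lists of callables and are modified in place.
    """
    outputs = [module(x) for module in active]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(active) // 2:
        raise RuntimeError("no majority among active modules")
    for i, out in enumerate(outputs):
        # Comparison against the voted value identifies the erroneous module.
        if out != value and spares:
            active[i] = spares.pop(0)
    return value


def good_module(x):
    return x + 1


def bad_module(x):
    return -1  # models a permanently faulty module


active = [good_module, bad_module, good_module]
spares = [good_module]
voted = vote_and_repair(active, spares, 1)
```

After the call, the faulty module has been masked by the vote and replaced, so the system is back to three healthy active modules and can mask a further fault.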

21 N-Modular Redundancy with spares

22 Coding checks / Exception checks
Coding checks
– Error detection codes are formed by adding check bits to a data word.
– A cyclic redundancy code check was used in the disk store of ESS.
– A parity bit was used in the RAM.
Exception checks
– Hardware constraints: usually result from the inability of the hardware to provide the service needed by the software.
– Examples: improper address alignment, unequipped memory locations, unused op-code, stack overflow
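
Both coding checks can be tried out in a few lines. The sketch below assumes even parity for the parity bit and uses CRC-32 from Python's standard `zlib` module as a stand-in cyclic redundancy code; the slides do not say which parity convention or CRC polynomial the original systems used, and the helper names are hypothetical.

```python
import zlib


def even_parity_bit(word):
    """Check bit that makes the total number of 1 bits (data + check) even."""
    return bin(word).count("1") % 2


def parity_ok(word, check_bit):
    return even_parity_bit(word) == check_bit


stored_word = 0b10110010
stored_parity = even_parity_bit(stored_word)

single_flip = stored_word ^ 0b00001000  # one bit corrupted: parity check fails
double_flip = stored_word ^ 0b00000011  # two bits corrupted: parity check passes

# CRC over a block of data, as might protect a disk sector.
block = b"sector payload"
stored_crc = zlib.crc32(block)
corrupted_block = b"sector payloae"
crc_detects = zlib.crc32(corrupted_block) != stored_crc
```

A single parity bit detects any odd number of flipped bits but misses an even number, as the `double_flip` case shows. A CRC adds more check bits and catches a much wider class of errors, including short bursts.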

23 Watchdog Timers
So far, we’ve figured out how to detect when something is wrong, but how do we detect when we’re not doing anything at all?
– A watchdog timer monitors a module and triggers a recovery if the module doesn’t do anything in a given amount of time
– E.g., put a watchdog timer on a microprocessor bus
Who watches the watchdog?
– If we assume a single-fault scenario, then this usually isn’t a problem
– But what if the watchdog has a hard fault that causes it to never time out and trigger a recovery?
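
The watchdog mechanism can be modeled in software with explicit time values so that the behavior is deterministic. This is a minimal sketch under stated assumptions: the class name `WatchdogTimer`, the `kick` and `tick` methods, and the integer time units are all hypothetical, and a real watchdog would normally be a hardware counter.

```python
class WatchdogTimer:
    def __init__(self, timeout, on_timeout):
        self.timeout = timeout        # allowed silence, in arbitrary time units
        self.on_timeout = on_timeout  # recovery action, e.g. reset the module
        self.last_kick = 0

    def kick(self, now):
        """Called by the monitored module to show that it is still active."""
        self.last_kick = now

    def tick(self, now):
        """Called periodically; triggers recovery if the module has been silent too long."""
        if now - self.last_kick >= self.timeout:
            self.on_timeout()
            self.last_kick = now  # re-arm after triggering recovery
            return True
        return False


recoveries = []
watchdog = WatchdogTimer(timeout=3, on_timeout=lambda: recoveries.append("recover"))

watchdog.kick(now=2)                # module shows activity at time 2
fired_early = watchdog.tick(now=4)  # only 2 units of silence so far
fired_late = watchdog.tick(now=5)   # 3 units of silence reach the timeout
```

The sketch also makes the "who watches the watchdog" concern concrete: if `tick` is never called, or the comparison inside it is stuck, no recovery is ever triggered.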

