Fault-Tolerant Design for Long-Life Deep Space Missions Yiğit Kültür 2006702835.


1 Fault-Tolerant Design for Long-Life Deep Space Missions Yiğit Kültür

2 Contents
• Introduction
• Fault-Tolerant System Considerations and Techniques
• Historical Perspective
• Future Approach
• Conclusion

3 Introduction
Mars has recently become a focal point of astronomical attention because it will play a key role in humanity’s expansion into deep space.
Future Mars transportation will require reliable operation over a lifespan of years, unlike:
• the Space Shuttle, which requires operation over months
• the Space Station, which is close enough to Earth for maintenance logistics

4 Introduction
The long operating period associated with deep space missions demands:
• Innovative fault-tolerant technology development
• Application of advanced redundancy techniques
To enable Mars exploration, safety, reliability and autonomy must be improved.
A new technology plan is needed to guide the development of next-generation fault-tolerant computing technology.

5 Fault Tolerant System Considerations
Traditionally, avionic systems have achieved fault tolerance through redundancy management.
A redundancy management technique:
• Detects and isolates a failure
• Performs hardware reconfiguration
A combination of self-monitoring and cross-comparison strategies leads to comprehensive fault coverage at reduced risk and cost.
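The detect-and-isolate step above can be sketched in a few lines. This is a minimal illustration, not flight logic: the function names (`cross_compare`, `manage_redundancy`), the channel ids, the use of the median as the reference value and the tolerance are all assumptions introduced for this example.

```python
from statistics import median


def cross_compare(outputs, tolerance=1e-6):
    """Cross-comparison: return ids of channels that disagree with the median."""
    reference = median(outputs.values())
    return {cid for cid, value in outputs.items()
            if abs(value - reference) > tolerance}


def manage_redundancy(outputs, self_test_ok, tolerance=1e-6):
    """Return (active, failed) channel ids, both sorted.

    outputs      -- hypothetical mapping: channel id -> computed value
    self_test_ok -- hypothetical mapping: channel id -> built-in test result
    """
    # Self-monitoring: drop channels whose own built-in test reports a failure.
    healthy = {cid: v for cid, v in outputs.items()
               if self_test_ok.get(cid, False)}
    # Cross-comparison needs at least three channels to isolate the odd one out.
    suspects = cross_compare(healthy, tolerance) if len(healthy) >= 3 else set()
    failed = (set(outputs) - set(healthy)) | suspects
    active = sorted(set(outputs) - failed)
    return active, sorted(failed)
```

For example, with four channels where B fails its self-test and C returns a divergent value, `manage_redundancy` keeps A and D active and reports B and C as failed. A real system would follow this with the hardware reconfiguration step.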

6 Fault Tolerant System Considerations
Primary Flight Control System (PFCS) Baseline Requirements
• Mission reliability: 0.95 success probability at 10 years with no repair
• Throughput: 100 million instructions per second (MIPS)
• Expandable I/O: 100 Mbit/s
• Expandable memory: 1 GByte
• Mass storage capacity: 1 Terabyte
• Cycle rate: 100 Hz
• Hardware N-fail operation
• Low life-cycle cost
• Low power and mass
• Radiation tolerance
• Building block approach (look for existing solutions to parts of the problem and combine them)
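To give a feel for what the mission reliability requirement implies, the sketch below assumes a constant failure rate, so that R(t) = exp(-λt). That model, the perfect-voter assumption, the 8766 hours-per-year figure and both function names are assumptions of this example, not part of the PFCS baseline.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 h, an averaging assumption


def max_failure_rate(reliability, years):
    """Largest constant failure rate (per hour) such that exp(-rate * t) >= reliability."""
    return -math.log(reliability) / (years * HOURS_PER_YEAR)


def tmr_reliability(r):
    """Reliability of a 2-out-of-3 voted triad of independent units with a perfect voter."""
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    print(f"{max_failure_rate(0.95, 10):.3e} failures per hour")
    print(f"{tmr_reliability(0.95):.5f}")
```

Under these assumptions, a single non-redundant string would need a failure rate below roughly 5.9e-7 per hour to reach 0.95 at 10 years, while a voted triad whose members each have reliability 0.95 reaches about 0.993. This is one way to see why the requirement pushes the design toward redundancy.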

7 Fault Tolerant Techniques for Mars Applications
Ultra-reliable systems for long-life applications such as human Mars exploration must withstand:
• Permanent faults
• Transient (temporary) faults
• Intermittent (not continuous) faults
• Timing faults
• Latent (hidden) faults
• Worst-case fault scenarios with a lower probability of occurrence

8 Fault Tolerant Techniques for Mars Applications
Distributed architectures are better suited to long-life space applications because they offer:
• Function integration
• Parallel computation
• Graceful performance growth
• Selective technology upgrade
• Appropriate levels of function reliability
• Graceful degradation of system capabilities in the presence of faults
• Efficient use of hardware resources

9 Historical Perspective
Long-Life Unmanned Redundant Systems: Viking, Voyager, Galileo

10 Historical Perspective
Safety Critical High Reliability Systems (Space Shuttle orbiters): Columbia, Challenger, Discovery, Atlantis, Endeavour

11 Long-Life Unmanned Redundant Systems: Viking
Viking is an instance of the pre-1970 Thermoelectric Outer Planets Spacecraft (TOPS) concept.
This spacecraft was the first to use a computer as a fault manager that attempts to reconfigure the spacecraft and restore it to an operational configuration.
The fundamental strategy was to switch power on and off to various alternative subsystems until either the built-in fault monitoring indicated that operation was restored or, in the case of faults in the communication chain, commands from Earth were detected.
There was no real-time masking of faults, so if a fault occurred during a maneuver, an incorrect maneuver would be performed.
Figure: Viking fault-tolerant architecture (CCS: Command Computer Subsystem; FDS: Flight Data Subsystem)
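The switch-until-restored strategy described above can be sketched as a search over alternative units. This is only an illustration of the idea, not the Viking flight software: `recover`, the subsystem names and the `is_operational` callback (standing in for the built-in fault monitoring) are hypothetical.

```python
from itertools import product


def recover(subsystems, is_operational):
    """Try combinations of alternative units until the monitor reports health.

    subsystems     -- hypothetical mapping: subsystem name -> list of unit ids
    is_operational -- callable taking {subsystem: selected unit}, returns bool
    Returns the first working configuration, or None if none works.
    """
    names = list(subsystems)
    for choice in product(*(subsystems[name] for name in names)):
        config = dict(zip(names, choice))  # power these units on, others off
        if is_operational(config):
            return config
    return None
```

For example, with two receivers and two transmitters where `rx1` is broken, the search skips every configuration containing `rx1` and settles on `rx2` with `tx1`. Note that nothing masks the fault while the search is running, which matches the limitation stated on the slide.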

12 Long-Life Unmanned Redundant Systems: Voyager
Like Viking, Voyager is an instance of the pre-1970 Thermoelectric Outer Planets Spacecraft (TOPS) concept.
It improves on Viking only in limited ways, such as the addition of a pair of separate computers for attitude and articulation control.
Both spacecraft used standby redundancy. The standby spares were cross-strapped so that either unit could be switched in to communicate with the other units.
Cross-strapping and switching allowed reconfiguration around failed components, either automatically or by ground command.
Figure: Voyager fault-tolerant architecture (CCS: Command Computer Subsystem; FDS: Flight Data Subsystem; AACS: Attitude and Articulation Control Subsystem)

13 Long-Life Unmanned Redundant Systems: Galileo
The Galileo mission is a follow-on to the Voyager Jupiter fly-by mission.
The Galileo design borrows heavily from the Voyager experience.
Block redundancy (each subsystem is duplicated as a whole block, so that a complete spare unit can take over from a failed one) is used throughout the subsystems.
All subsystems except the CDS operate as active/standby pairs.
The CDS uses active redundancy: each block can issue independent commands, or both can operate in parallel on the same critical activity.
Figure: Galileo fault-tolerant architecture (CDS: Command and Data Subsystem; AACS: Attitude and Articulation Control Subsystem)

14 Long-Life Unmanned Redundant Systems: Galileo
The major departure from the Voyager architecture is the extensive use of microprocessors and the consequent use of a bus-oriented architecture to facilitate communication among them.
Galileo's on-board fault detection software is designed to alleviate the effects and symptoms of faults rather than to pinpoint the exact faults. Fault identification and isolation are performed through ground intervention.
Figure: Galileo fault-tolerant architecture (CDS: Command and Data Subsystem; AACS: Attitude and Articulation Control Subsystem)

15 Safety Critical High Reliability Systems: Shuttles
Operational differences from planetary probes:
• It must be absolutely certain that no fault propagates to the effectors during a relatively short operation cycle
• Rather than relying on fault monitors to interrupt processing and trigger a reconfiguration, several redundant strings are powered on and operated in parallel

16 Safety Critical High Reliability Systems: Shuttles
Voting occurs both in the General Purpose Computers (GPCs) and at the final effectors.
Voting is much more brute-force than fault monitoring: it requires more hardware but also provides greater fault coverage.
It is much better suited to real-time safety-critical maneuver control than a reconfiguration-oriented strategy as in Viking, Voyager and Galileo.
Figure: Conceptual Shuttle Orbiter fault-tolerant architecture (GPC: General Purpose Computer)
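The masking effect of voting can be shown with a simple majority voter. This is a generic sketch under stated assumptions, not the Shuttle's actual voting logic: `majority_vote` and the channel names are hypothetical, and outputs are compared for exact equality.

```python
from collections import Counter


def majority_vote(values):
    """Return (winning value, sorted list of dissenting channels).

    values -- hypothetical mapping: channel id -> output of that channel
    Raises ValueError when no value is held by a strict majority.
    """
    counts = Counter(values.values())
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(values):
        raise ValueError("no strict majority among redundant channels")
    dissenters = sorted(ch for ch, v in values.items() if v != winner)
    return winner, dissenters
```

With four channels reporting 42, 42, 41 and 42, the voter returns 42 and names the third channel as the dissenter. The faulty value never reaches the effector, and no reconfiguration pause is needed, at the cost of running every channel all the time.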

17 Mars Advanced Fault Tolerant Computing Approach: Future Manned Mars Missions
Parallel-hybrid redundancy will be the basis for future long-life deep space missions:
• It combines the attractive features of parallel processing and redundant computation
• Computational elements can be arranged to provide high throughput, ultra-high reliability, or a combination of both, depending on the mission phase

18 Mars Advanced Fault Tolerant Computing Approach: Future Manned Mars Missions
Parallel-hybrid redundancy was first used in 1979, when the Fault Tolerant Multi-Processor (FTMP) was designed and built:
• FTMP used a conventional shared-memory multiprocessor architecture
• Each virtual processor consisted of three real processors working as a triad to provide real-time fault masking
• Upon detection of a fault in a processor, the faulty unit is replaced from a pool of spares
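The triad-plus-spares idea can be sketched as follows. This is a toy model of the concept only, not FTMP's implementation: the `Triad` class, its `step` method and the processor ids are hypothetical, and each processor is represented by an id passed to a `compute` callback.

```python
class Triad:
    """Three real processors acting as one virtual processor, with a spare pool."""

    def __init__(self, members, spares):
        if len(members) != 3:
            raise ValueError("a triad needs exactly three processors")
        self.members = list(members)
        self.spares = list(spares)

    def step(self, compute):
        """Run one computation on all members, vote, and swap out dissenters."""
        results = [compute(p) for p in self.members]
        voted = max(set(results), key=results.count)
        if results.count(voted) < 2:
            raise RuntimeError("no two processors agree")
        for i, result in enumerate(results):
            # Replace a disagreeing processor from the pool, if a spare is left.
            if result != voted and self.spares:
                self.members[i] = self.spares.pop(0)
        return voted
```

If processor `P2` in a triad of `P1`, `P2`, `P3` starts returning wrong values, `step` still returns the majority result in the same cycle (fault masking) and then brings in spare `P4` in place of `P2` (the hybrid part).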

19 Mars Advanced Fault Tolerant Computing Approach: Future Manned Mars Missions
Parallel-hybrid redundancy as implemented in FTMP had certain drawbacks:
• It was not explicitly designed to meet the rigorous requirements of Byzantine resilience (the correctly functioning components of a Byzantine fault-tolerant system reach the same group decisions regardless of what the Byzantine-faulty components do), which is necessary to provide:
  - Coverage of random hardware faults
  - Ultra-high reliability
  - Ease of validation
• It lacked ease of expandability due to the redundant bus connections between processors and main memory
• It did not support mixed redundancy, because processors are arranged to work in triads regardless of the criticality of the application
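The slide does not list the Byzantine resilience requirements themselves. As background, the classical results for tolerating f arbitrary (Byzantine) faults call for at least 3f + 1 fault containment regions, at least 2f + 1 disjoint communication paths between them, and at least f + 1 rounds of message exchange, plus bounded-skew synchronization. The helper below simply tabulates those bounds; the function name and dictionary keys are choices made for this example.

```python
def byzantine_requirements(f):
    """Minimum resources to tolerate f simultaneous Byzantine faults."""
    if f < 0:
        raise ValueError("number of faults must be non-negative")
    return {
        "fault_containment_regions": 3 * f + 1,
        "disjoint_paths": 2 * f + 1,
        "exchange_rounds": f + 1,
    }
```

For a single fault (f = 1) this gives 4 regions, 3 paths and 2 rounds, which is why a plain triad with a single voting round does not by itself guarantee Byzantine resilience.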

20 Mars Advanced Fault Tolerant Computing Approach: Future Manned Mars Missions
To remedy the deficiencies of FTMP, a new architecture called the Fault Tolerant Parallel Processor (FTPP) was conceived.
It meets all the requirements for coverage of random hardware faults.
FTPP will be the basis of fault tolerance for future manned Mars missions.
Figure: FTPP architecture

21 Mars Advanced Fault Tolerant Computing Approach: Features of FTPP – Parallel Processing
Parallel processing is provided by:
• 40 Processing Elements (PEs) in 5 Fault Containment Regions (FCRs)
• 2 Input/Output Controllers (IOCs) per FCR
Figure: FTPP architecture

22 Mars Advanced Fault Tolerant Computing Approach: Features of FTPP – Scalable Performance
Increasing the number of PEs in a single cluster creates a communication bottleneck in the Network Elements (NEs).
FTPP relies on a hierarchical approach to scaling performance: clusters are assembled via IOCs.
Figure: FTPP architecture

23 Mars Advanced Fault Tolerant Computing Approach: Features of FTPP – Mixed Redundancy
Most fault-tolerant computers are designed to operate in a redundant mode only, which wastes resources on non-critical tasks.
FTPP allows the processing elements to be configured as:
• Simplex: non-critical tasks
• Triplex: tasks that require real-time fault masking
• Quadruplex or higher: when two or more sequential faults must be tolerated in a small time window without the benefit of reconfiguration
In the figure (FTPP architecture): 4 quadruplexes, 3 triplexes, 15 simplexes
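A quick check shows that the configuration in the figure uses exactly the 40 PEs of the cluster. The `REDUNDANCY` table and `pes_required` function are names introduced for this example.

```python
# Number of physical PEs consumed by one virtual group of each kind.
REDUNDANCY = {"simplex": 1, "triplex": 3, "quadruplex": 4}


def pes_required(groups):
    """Total PEs needed for a mix of virtual groups, e.g. {"triplex": 3}."""
    return sum(REDUNDANCY[kind] * count for kind, count in groups.items())


figure_mix = {"quadruplex": 4, "triplex": 3, "simplex": 15}
total = pes_required(figure_mix)  # 4*4 + 3*3 + 15*1
```

Here `total` evaluates to 40, matching the 40 PEs in 5 FCRs stated for the FTPP cluster. The same function can be used to test whether another mix of simplexes, triplexes and quadruplexes would fit the cluster.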

24 Mars Advanced Fault Tolerant Computing Approach: Features of FTPP – Dynamic Reconfiguration
The mission consists of several phases, such as launch, ascent, cruise from Earth orbit to Mars, Mars orbit injection and Mars landing.
The throughput, latency, iteration rate and criticality requirements vary over a wide range from phase to phase, so the architecture must be flexible.
• Reconfiguration from high throughput to high reliability: 3 PEs that are operating as independent simplex elements can be synchronized to run the same task (for example S2, S3, S13)
• Replacing failed members: a simplex in the same FCR as the failed member is synchronized with the non-failed members of the virtual group (if Channel A of Q1 fails, S2, S7 or S12 can replace it)
Figure: FTPP architecture
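The replacement rule above can be sketched as a lookup for a simplex in the same FCR as the failed member. The function name, the PE ids for the Q1 channels and the FCR map are hypothetical; placing S2, S7 and S12 in FCR A is inferred from the slide's example, and S3 in FCR B is purely an assumption.

```python
def replace_failed_member(group, failed_pe, fcr_of, simplexes):
    """Swap a failed group member for a simplex from the same FCR.

    group     -- list of PE ids forming the virtual group
    failed_pe -- id of the member that failed
    fcr_of    -- hypothetical mapping: PE id -> fault containment region
    simplexes -- list of PE ids currently running as simplexes
    Returns (new group, remaining simplexes); unchanged if no candidate exists.
    """
    fcr = fcr_of[failed_pe]
    candidates = [s for s in simplexes if fcr_of[s] == fcr]
    if not candidates:
        return list(group), list(simplexes)  # group keeps running degraded
    spare = candidates[0]
    new_group = [spare if pe == failed_pe else pe for pe in group]
    remaining = [s for s in simplexes if s != spare]
    return new_group, remaining


fcr_of = {"Q1a": "A", "Q1b": "B", "Q1c": "C", "Q1d": "D",
          "S2": "A", "S7": "A", "S12": "A", "S3": "B"}
q1, free = replace_failed_member(["Q1a", "Q1b", "Q1c", "Q1d"], "Q1a",
                                 fcr_of, ["S3", "S2", "S7", "S12"])
```

In this run S3 is skipped because it sits in a different FCR, and S2 takes over channel A of Q1. Choosing the spare from the same FCR keeps the members of the repaired group in separate fault containment regions.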

25 Mars Advanced Fault Tolerant Computing Approach: Features of FTPP – Low Fault Tolerance Overhead
Frequent fault-tolerance-related functions such as fault/error detection, error masking (voting) and synchronization are implemented in the Network Element (NE).
Less frequent functions such as identification of faulty modules, reconfiguration and reintegration are implemented in software that executes on the PEs.
Each NE services 8 PEs.
Figure: FTPP architecture

26 Mars Advanced Fault Tolerant Computing Approach: Features of FTPP – Open Architecture
FTPP provides an open architecture for both hardware and software, including:
• Processors
• I/O modules
• Fiber optic links
• Operating systems
Figure: FTPP architecture

27 Mars Advanced Fault Tolerant Computing Approach: Features of FTPP – Small Physical Size
The key to meeting the weight, volume and power requirements is the packaging technology.
Multi-Chip Modules (MCMs) will be used: an NE fits on a single MCM of less than 4 cm².
Figure: FTPP architecture

28 Conclusion
Future manned deep space missions will require reliable operation over years and real-time masking of critical faults.
Current approaches are not sufficient, and a new fault-tolerant approach is needed.
FTPP is a strong candidate for the spacecraft that will carry humans to Mars.

29 References
• Benjamin, A.L.; Lala, J.H.: Advanced fault tolerant computing for future manned space missions. Digital Avionics Systems Conference, th DASC., AIAA/IEEE, Volume 2, Oct, Page(s): vol.2
• NASA Website: Computers in Spaceflight: The NASA Experience
• NASA Jet Propulsion Laboratory Website: Voyager: The Interstellar Mission

