Department of Computer Engineering
CS 690 SEMINAR
Safe Computer Architectures
Computer Engineering Department
Fault tolerance is the ability of a system to respond gracefully to an unexpected hardware or software failure. A fault is either a defect in hardware or a bug in software that may lead to an incorrect result. An error, in contrast, is the manifestation of a fault: the wrong result that the fault produces. Both faults and errors can spread through a system. For example, if a chip shorts power to ground, it may cause nearby chips to fail as well. Errors spread because the output of one unit is used as input by other units. If proper measures are not taken, errors can eventually bring the whole system to failure. Fault tolerance is founded on the concept of redundancy. In engineering, redundancy is the replication of a design's critical components so that the replicas can take over from the main parts when they fail.
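As a rough illustration of the redundancy idea in the abstract, the sketch below tries a main unit first and falls back to a replica when the main unit fails. It is a minimal sketch, not the seminar's method: the names `run_with_spares`, `faulty_main`, and `spare` are hypothetical, and a unit is assumed to signal failure by raising an exception.

```python
from typing import Callable, Sequence


def run_with_spares(units: Sequence[Callable[[int], int]], x: int) -> int:
    """Call the main unit first, then each spare, and return the first result."""
    for unit in units:
        try:
            return unit(x)
        except Exception:
            # Assumption: a failed unit raises. Move on to the next replica.
            continue
    raise RuntimeError("all redundant units failed")


def faulty_main(x: int) -> int:
    # Hypothetical main unit with a simulated fault.
    raise RuntimeError("main unit failed")


def spare(x: int) -> int:
    # Hypothetical replica that performs the same computation correctly.
    return x * 2


result = run_with_spares([faulty_main, spare], 21)
```

This only covers failures that are detected. A unit that silently returns a wrong value would pass through unnoticed, which is why redundant designs often also compare the outputs of several replicas.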
DATE: 05 October, 2015, Monday @ 15:40