Bilkent University
Department of Computer Engineering


Safe Computer Architectures


Hamzeh Ahangari
PhD Student
Computer Engineering Department
Bilkent University

The ability of a system to respond gracefully to an unexpected hardware or software failure is called fault tolerance. A fault is either a defect in hardware or a bug in software that may lead to an incorrect result. An error, in contrast, is the manifestation of a fault: it occurs when the fault actually produces a wrong result. Both faults and errors can spread through a system. For example, if a chip shorts power to ground, it may cause nearby chips to fail as well. Errors spread because the output of one unit is used as input by other units. If proper measures are not taken, errors can eventually drive the system into failure. Fault tolerance is founded on the concept of redundancy. In engineering, redundancy is the replication of critical components of a design so that the replicas can take over when the main components fail.
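As an illustration of redundancy, the following is a minimal sketch of triple modular redundancy (TMR), one common redundancy scheme that the abstract itself does not name. It assumes three replicas of the same computation whose outputs go to a majority voter. All function names (`majority_vote`, `bitwise_vote`, `tmr`, `good`, `faulty`) are hypothetical and chosen only for this example. The faulty replica models a stuck-at-1 fault in bit 0 of its output, and the voter masks the resulting error before it can spread to downstream units.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def majority_vote(results: Sequence[T]) -> T:
    """Return the value produced by a strict majority of replicas.

    Raises RuntimeError when no value has a majority: the fault is
    detected (the replicas disagree) but cannot be masked.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: fault detected but not masked")
    return value


def bitwise_vote(a: int, b: int, c: int) -> int:
    # Each output bit is the majority of the three corresponding input bits,
    # the same Boolean function a hardware TMR voter computes per bit.
    return (a & b) | (a & c) | (b & c)


def tmr(replicas: Sequence[Callable[[int], T]], x: int) -> T:
    """Run the same computation on three replicas and vote on the outputs."""
    if len(replicas) != 3:
        raise ValueError("TMR needs exactly three replicas")
    return majority_vote([replica(x) for replica in replicas])


def good(x: int) -> int:
    # Hypothetical fault-free unit.
    return x * x


def faulty(x: int) -> int:
    # Hypothetical defective unit: output bit 0 is stuck at 1.
    return (x * x) | 1


if __name__ == "__main__":
    print(tmr([good, faulty, good], 4))                # 16: faulty replica outvoted
    print(bin(bitwise_vote(0b1010, 0b1011, 0b1000)))   # 0b1010
```

The word-level voter needs the replicas to agree on the whole value, while the bitwise voter still masks errors when different replicas fail in different bits. In both cases a single voter is itself a single point of failure, which is why real designs often replicate the voter as well.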


DATE: 05 October, 2015, Monday @ 15:40