Reliability & Failure Analysis

EMT 480/3: RELIABILITY & FAILURE ANALYSIS
Original version by Noraini Othman
Edited by Hasnizah Aris
Lecture contents
1. Reliability Engineering
2. Design for Reliability (DFR)
3. Reliability Prediction Techniques
4. FMEA
5. FTA
6. Reliability Life Testing
Terms & Definitions
Reliability
the ability of a product to conform to its
electrical and visual/mechanical specifications
over a specified period of time under
specified conditions at a specified confidence
level
Reliability Engineering
refers to the development of technology, processes and standards to ensure the reliability of semiconductors during applications. Focuses on eliminating maintenance requirements.
Reliability Monitoring
consists of getting finished product samples from the line and
subjecting these to reliability testing. Valid reliability failures
should undergo root cause analysis for reliability improvement
Wafer-level Reliability Testing
once an integrated circuit has been designed and the first silicon comes out, reliability tests at wafer-level are done to assess the reliability of the die
Package-level Reliability Testing
refers to the assessment of the overall reliability of the device in a packaged form.
New Product Qualification
operationally the same as package-level reliability testing, except that it is systemized with the objective of generating official reliability data that would justify the mass production of a new product.
Designing for Reliability (DFR)
The concept is to exert as much effort as possible to design a product to be inherently reliable.
This consists of following all known design rules for making a product reliable, not only electrically but also visually and mechanically.
Building reliability into a product as early as the design phase is a must.
Reliability design begins with the specification of reliability goals consistent with cost and performance objectives.
These goals must be translated into individual component, subcomponent and part specifications.
Various design methods are then applied in order to meet the goals (such as stress-strength analysis, simplification, etc.).
A failure analysis is then performed to determine whether specifications are being met and to provide a systematic approach for identifying, ranking and eliminating failure modes.
If either the reliability or the safety goals are not met, the design process must continue.
Often, this may require a complete redesign.
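Stress-strength analysis, named above as one of the design methods, can be illustrated with a minimal sketch. It assumes that stress and strength are independent and normally distributed, which the slides do not state; the function name and the numbers in the usage note are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stress_strength_reliability(mean_strength, sd_strength, mean_stress, sd_stress):
    """Probability that strength exceeds stress.

    Assumes independent, normally distributed stress and strength,
    so the margin (strength - stress) is also normal.
    """
    margin_mean = mean_strength - mean_stress
    margin_sd = sqrt(sd_strength ** 2 + sd_stress ** 2)
    # Reliability is the probability that the margin is positive.
    return NormalDist().cdf(margin_mean / margin_sd)
```

For instance, stress_strength_reliability(100, 3, 80, 4) gives a safety margin of four standard deviations, so the predicted reliability is very close to 1. When the two means are equal the result is 0.5.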
Reliability Design
In summary, an excellent reliability engineering system would have all of the following components:
(a) design for reliability
(b) wafer-level reliability testing
(c) package-level reliability testing
(d) new product/process qualification
(e) reliability monitoring
What Are Reliability Prediction Techniques?
Reliability prediction is a design-assist process by which the reliability characteristics of a system are obtained, by calculating the anticipated system RAMS (Reliability, Availability, Maintainability and Safety-Integrity) from assumed component failure rates.
The Importance of Reliability Prediction:
(a) provides an early indication of a system's potential to meet the design reliability requirements
(b) enables assessment of life-cycle costs to be carried out
(c) enables one to establish which components, or areas in a design, contribute the major portion of unreliability
(d) enables trade-offs to be made, e.g. between reliability and maintainability in achieving a given availability
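A minimal sketch of a prediction from assumed component failure rates follows. It assumes a series system with constant failure rates, so the rates add; the component names, rate values and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical component failure rates in failures per hour (assumed values).
FAILURE_RATES = {
    "power_supply": 5e-6,
    "processor": 2e-6,
    "memory": 3e-6,
}


def series_failure_rate(rates):
    """Series system with constant failure rates: the rates simply add."""
    return sum(rates.values())


def mtbf(failure_rate):
    """Mean time between failures for a constant failure rate."""
    return 1.0 / failure_rate


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from MTBF and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def unreliability_share(rates):
    """Fraction of the system failure rate contributed by each component."""
    total = series_failure_rate(rates)
    return {name: rate / total for name, rate in rates.items()}


system_rate = series_failure_rate(FAILURE_RATES)
system_mtbf = mtbf(system_rate)
```

With these assumed rates the system MTBF comes out at about 100,000 hours and the power supply accounts for half of the predicted unreliability, which is the kind of ranking item (c) refers to. Calling availability(system_mtbf, 8) and then varying the repair time shows the reliability versus maintainability trade-off of item (d).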
Why Are Reliability Prediction Techniques Needed?
Traditionally, reliability has been achieved through extensive testing and the use of techniques such as probabilistic reliability modeling (techniques applied in the late stages of development).
The challenge is to design in quality and reliability early in the development cycle.
The reliability of a device can then be known up-front, during the design phase and before the device is manufactured.
This can avoid costly redesign cycles.
What Are The Factors That Affect The Reliability Performance of Electronic Components?
Material quality
Operating temperature
Vibration and miscellaneous mechanical factors
Electrical stress levels
Introduction to FMEA
(a) Introduction
FMEA stands for Failure Modes and Effects Analysis
It is a methodology designed
(i) to identify potential failure modes for a product or
process;
(ii) to assess the risk associated with those failure modes;
(iii) to rank the issues in terms of importance; and
(iv) to identify and carry out corrective actions to address the
most serious concerns
For easy understanding, just remember that FMEA is intended to document:
(i) a Failure
(ii) its Mode
(iii) its Effects
(iv) by Analysis
(b) Types of FMEA
There are 2 standard categories of FMEA:
Design FMEA (DFMEA)
addresses potential failure modes arising during design of components and subsystems
Process FMEA (PFMEA)
addresses potential failure modes arising during manufacturing and assembly processes
(c) Process for Conducting FMEA
The process for conducting FMEA is summarized as follows:
(a) Describe the Product or Process
(b) Define Functions
(c) Identify Potential Failure Modes
(d) Describe Effects of Failures
(e) Determine Causes
(f) Detection Methods or Current Controls
(g) Calculate Risks using the Risk Priority Number (RPN)
(h) Take Action
(i) Assess Results
A typical FMEA incorporates some method to evaluate the risk associated with the potential problems identified through the analysis. One such method is the Risk Priority Number (RPN).
To use the RPN method to assess risk, the analysis team must:
(a) Rate the severity of each effect of failure
(b) Rate the likelihood of occurrence for each cause of failure
(c) Rate the likelihood of prior detection for each cause of failure
(d) Calculate the RPN by obtaining the product of the three ratings:
RPN = Severity x Occurrence x Detection
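The RPN calculation and the ranking step can be sketched as follows. The 1 to 10 rating scale, the convention that a higher detection rating means the cause is harder to detect, and the example causes and ratings are assumptions for illustration, not values from the lecture.

```python
from dataclasses import dataclass


@dataclass
class FailureCause:
    description: str
    severity: int    # assumed scale: 1 (minor) to 10 (most severe)
    occurrence: int  # assumed scale: 1 (rare) to 10 (almost certain)
    detection: int   # assumed scale: 1 (easily detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(causes):
    """Return the causes ordered from highest to lowest RPN."""
    return sorted(causes, key=lambda cause: cause.rpn, reverse=True)


# Hypothetical entries of an FMEA worksheet.
causes = [
    FailureCause("package marking illegible", 3, 5, 2),
    FailureCause("solder joint crack", 7, 4, 5),
    FailureCause("bond wire lift", 9, 2, 6),
]

ranked = rank_by_rpn(causes)
```

Here ranked lists the solder joint crack first (RPN 140), then the bond wire lift (108), then the illegible marking (30), so corrective action would start with the first entry.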
An Example of FMEA Hazard Assessment
(d) Benefits of FMEA
Improve product/process reliability and quality
Increase customer satisfaction
Early identification and elimination of potential product/process failure modes
Prioritize product/process deficiencies
Capture engineering/organization knowledge
Introduction to Fault Tree Analysis (FTA)
(a) What is a FTA?
FTA stands for Fault Tree Analysis.
It is a graphical representation of the major faults or critical failures associated with a product, the causes for the faults, and potential countermeasures.
The tool helps identify areas of concern for new product design or for improvement of existing products. It also helps identify corrective actions to correct or mitigate problems.
In a Fault Tree, one works in a failure space, and looks at system failure combinations.
(b) When to use it?
FTA is useful both in designing new products/services and in dealing with identified problems in existing products/services.
In the quality planning process, the analysis can be used to optimize process features and goals and to design for critical factors and human error. As part of process improvement, it can be used to help identify root causes of trouble and to design remedies and countermeasures.
(c) Basic Constructs of FTA
The basic constructs in a Fault Tree Diagram are
(a) gates (represent conditions)
(b) events (represent the system failure mode)
The two most commonly used gates are:
(a) AND gates
(b) OR gates
If occurrence of either event causes the top event to occur, then these events (blocks) are connected using an OR gate.
Alternatively, if both events need to occur to cause the top event to occur, they are connected by an AND gate.
Example:
For the Top Event to occur, either A or B must happen. In other words, failure of A or B causes the system to fail.
[Figure: OR-gate fault tree and its equivalent Reliability Block Diagram]
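The two gates can be turned into numbers with a short sketch, assuming the input events are statistically independent (the slides do not state this). Under that assumption an OR gate over A and B corresponds to a series reliability block diagram, so the system reliability is the product of the block reliabilities. The function names and probabilities are hypothetical.

```python
def or_gate(*probabilities):
    """Probability that at least one independent input event occurs."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


def and_gate(*probabilities):
    """Probability that all independent input events occur."""
    all_occur = 1.0
    for p in probabilities:
        all_occur *= p
    return all_occur


# Hypothetical failure probabilities of blocks A and B.
p_a, p_b = 0.1, 0.2

p_top_or = or_gate(p_a, p_b)       # system fails if A or B fails
reliability_series = 1.0 - p_top_or  # same as (1 - p_a) * (1 - p_b)
p_top_and = and_gate(p_a, p_b)     # system fails only if both fail
```

With these values the OR-gate top event has probability 0.28 (reliability 0.72, the series result), while the AND-gate top event has probability 0.02, which is the behaviour of a redundant (parallel) pair.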
Symbols used in FTA
(d) How to Perform FTA in 6 steps
1. Select a top level event for analysis. Try to be specific, for example, "Email server down for more than 4 hours." Sources of top level events include: Problem/Known Error Records; potential failures from brainstorming; etc.
2. Identify faults that could lead to the top level event. Continuing the above example, one possible fault leading to an outage lasting more than four hours might be "loss of power"; another might be "hardware failure." List all the faults under the top level event in boxes and connect the fault boxes to the top level event box by drawing lines.
3. For each fault, list as many causes as possible in boxes below the related fault. Continuing the example above, in the case of "loss of power," some causes might be electrical outage, power supply failure, and so on. Connect the boxes to the appropriate fault box.
4. Draw a diagram of the fault tree. Two logic operators, AND and OR, also known as logic gates, are used to represent the sequencing of faults and causes. For example, "Email server down for more than 4 hours" could be caused by "loss of power or hardware fault." Another might be "loss of building power and battery backup exhausted." Update faults and causes by grouping logically related items using AND or OR between faults and events, and between faults and causes. Re-draw the lines from top level event to logic gates to faults to logic gates to causes.
5. Continue identifying causes for each fault until you reach a root cause (reactive FTA), or one that you can do something about (proactive FTA). For example, the root cause of "power supply failure" might be "filter clogged"; the root cause of "battery backup exhausted" might be "battery backup too small."
6. Consider countermeasures. A root cause is one you can do something about, so now you need to think of the countermeasures you might apply to each root cause. List countermeasures for each root cause in a box under the root cause. For example, for "filter clogged" a countermeasure might be "clean filter monthly." Link the countermeasure to the root cause by drawing a line.
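The email server example from the steps above can be written down as a small data structure and evaluated. This is only one possible arrangement of the faults and causes named in the steps; the nested-tuple representation and the function names are assumptions made for this sketch.

```python
# A node is either a basic event (a string) or a (gate, children) pair.
# Top event: "Email server down for more than 4 hours".
FAULT_TREE = (
    "OR",
    [
        "hardware failure",
        (
            "OR",  # loss of power
            [
                "power supply failure",
                ("AND", ["loss of building power", "battery backup exhausted"]),
            ],
        ),
    ],
)


def occurs(node, happened):
    """Return True if the event at this node occurs, given the set of
    basic events that have happened."""
    if isinstance(node, str):
        return node in happened
    gate, children = node
    results = [occurs(child, happened) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


def basic_events(node):
    """List the basic events under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    events = []
    for child in children:
        events.extend(basic_events(child))
    return events
```

Evaluating occurs(FAULT_TREE, {"loss of building power"}) returns False because the battery backup still holds, while adding "battery backup exhausted" to the set makes the top event occur. The output of basic_events(FAULT_TREE) is the list of boxes for which step 5 would look for root causes.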
Example:
[Figure: fault tree example and solution]

Reliability Life Testing
(a) Objectives
Measure the reliability performance of the product
Provide a level of confidence for a new product design's reliability performance
Verify aspects of the functionality of the design
Determine the breaking point of the product
Uncover any weaknesses in components or the packaging of the design and develop appropriate corrective action

(b) Burn-In and Screening
Burn-in:
A process of operating items at elevated stress
levels (particularly temperature, humidity and
voltage) in order to accelerate the processes
leading to failure. The populations of defective
items are thus reduced
Screening:
An enhancement to Quality Control whereby
additional detailed visual and electrical/mechanical
tests seek to reveal defective features which would
otherwise increase the population of weak items

(c) Stress testing
In stress testing, a device is stressed until it fails. Stresses can be classified as environmental or self-generated.
The environmental stresses may be any combination of temperature, vibration, humidity, shock or ingress of foreign bodies.
The self-generated stresses include power dissipation, applied voltage and current, self-generated vibration and wear.

(d) Environmental Stress Screening (ESS)
ESS is a screening process in which a product is subjected to environmentally generated stress to precipitate latent product defects.
ESS techniques can precipitate latent failures, which cannot be detected with electrical testing or visual inspection, so that infant-mortality cases can be eliminated and the product can enter the useful-life phase of the bathtub curve at the end of the ESS testing.
Several types of ESS testing are available:
(i) Temperature Cycling
(ii) Thermal Shock
(iii) Humidity Testing
(iv) Temperature, Humidity, Bias (THB) Testing
(i) Temperature Cycling
Refers to the process in which a product is subjected to multiple cycles of changing temperatures between pre-determined extremes at relatively high rates of change, fatiguing inferior product and causing it to fail.
Cycling will show at what temperatures, both high and low, a product will cease to function properly.
(ii) Thermal Shock
Refers to rapid temperature changes from an extremely cold to a hot environment, which thermally shock and stress a product.
This causes permanent changes in electrical performance and can cause sudden overloading of materials.
Thermal shock failures are due to thermal mismatches, or materials with different rates of thermal expansion and contraction.
(iii) Humidity Testing
Humidity testing normally involves high heat to aid in forcing water vapor through weakly sealed components.
Many electronic devices are susceptible to the damaging effects of moisture, both by direct condensation and by indirect effects.
Direct condensation is where water comes out of the air and forms droplets on a device. These droplets may find their way into the device and attack sensitive components. Common effects include shorting of electrical components and initiation of corrosive effects.
Indirect effects are numerous. An example is moisture breaching sealed devices, which results in failures over time.
(iv) Temperature, Humidity, Bias (THB) Testing
THB Testing is a reliability test designed to accelerate metal corrosion, particularly that of the metallization on the die surface of the device.
Aside from temperature and humidity, which are enough to promote corrosion of metals in the presence of contaminants, bias is applied to the device to provide the potential differences needed to trigger the corrosion process, as well as to drive mobile contaminants to areas of concentration on the die.
(e) Other Types of Test

- Marginal Testing: involves proving the various system functions at the extreme limits of the electrical and mechanical parameters
- High Reliability Testing: for example, verification of a product MTBF of 10^6 hours involving 2000 elapsed hours of testing
- Testing for Packaging and Transport: involves consideration of waterproofing, hermetic seals, ventilation holes, adequate padding, adequate storage facilities, etc.
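The high reliability testing example can be made concrete with a rough sizing calculation. The sketch assumes the MTBF target in that example is 10^6 hours, a constant failure rate (exponential lifetimes) and a test plan that allows zero failures; the 60% and 90% confidence levels and the function names are assumptions for illustration.

```python
from math import ceil, log


def device_hours_required(target_mtbf_hours, confidence):
    """Total device-hours with zero failures needed to demonstrate the
    target MTBF as a lower bound at the given confidence level.

    Assumes a constant failure rate (exponential lifetimes).
    """
    return target_mtbf_hours * -log(1.0 - confidence)


def units_required(target_mtbf_hours, confidence, hours_per_unit):
    """Number of units to run in parallel for hours_per_unit each."""
    total = device_hours_required(target_mtbf_hours, confidence)
    return ceil(total / hours_per_unit)


units_60 = units_required(1e6, 0.60, 2000)
units_90 = units_required(1e6, 0.90, 2000)
```

Under these assumptions about 459 units must run for 2000 hours without a failure to support a 10^6 hour MTBF at 60% confidence, and 1152 units at 90% confidence, which shows why high reliability testing is costly.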
Many accelerated life tests (ALT) of semiconductors involve temperature, as semiconductor properties usually have a strong temperature dependency.
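The temperature dependence noted above is commonly modelled with the Arrhenius relationship, which the slides do not name; it is used here as an assumption to show how a temperature-accelerated test relates to use conditions. The activation energy and temperatures in the usage note are hypothetical.

```python
from math import exp

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(activation_energy_ev, use_temp_c, stress_temp_c):
    """Acceleration factor of a test at stress_temp_c relative to
    operation at use_temp_c, under the Arrhenius model (assumed)."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return exp(exponent)
```

As an example, arrhenius_acceleration(0.7, 55, 125) evaluates to an acceleration factor in the high seventies for an assumed activation energy of 0.7 eV, meaning that one hour at 125 °C would stand in for roughly that many hours at 55 °C under this model.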
The most common accelerated test conditions are as follows:
(a) Mechanical Shock
(b) Drop Shock (Test)
(c) Voltage Extremes
(d) High Humidity
(e) Random Vibration Test; etc
