2009-01-3274
The Systems Engineering Relationship between Qualification, Environmental Stress Screening and Reliability
James A. Robles
The Boeing Company
Copyright 2009 SAE International
ABSTRACT
The Systems Engineering Relationship between
Qualification, Environmental Stress Screening (ESS),
and Reliability is often poorly understood: as a
consequence resources are expended on efforts that
degrade inherent hardware reliability and vitiate
reliability predictions. This article expatiates on the
Systems Engineering relationship between Qualification
and ESS, and how their proper application enhances
inherent reliability and supports credible reliability
predictions. Examples of how their uninformed
application degrades inherent hardware reliability and
vitiates reliability predictions, and how
program/equipment managers can avoid this, are
presented.
INTRODUCTION
There is a problem with the reliability of recently fielded
systems: Department of Defense (DoD) concerns have
been widely reported.
"Emerging data shows that a significant number
of U.S. Army systems are failing to demonstrate
established reliability requirements during
operational testing and many of these are falling
well short of their established requirement. . . .
Enclosure 1 outlines the process for establishing
and reporting the new reliability threshold, as
well as a mechanism for detecting and reporting
threshold breaches. The routine use of this
process and the implementation of reliability
best practices (Enclosure 2) will help the Army
achieve its reliability requirements." [1]

"Ensure programs are formulated to execute a
viable systems engineering strategy from the
beginning, including a RAM growth program, as
an integral part of design and development." [2]

"I share your concerns regarding the recent
downward trend of reliability, availability, and
maintainability (RAM) test results, and agree
with your assessment that RAM considerations
must be strengthened as our weapons systems
move through the development and production
phases and into operational service." [3]
This is not, as discussed below, a commercial-off-the-
shelf (COTS) vs. custom or military specification
design issue: the focus of the above references is on
Program Management and System Engineering
processes and best practices.
This focus on processes and practices is a positive
development; however, it is essential that we get the
content right. Two areas where there seems to be
widespread failure to do so are the definition of durability
environments and the application of ESS. Showing why
this is so requires some review of fatigue engineering
principles, the bathtub curve, and the limitations of our
reliability prediction methods.
FATIGUE ENGINEERING PRINCIPLES
Materials will fatigue [4, 5, 6, 7] under the repeated
application of stresses and strains that do not cause
failure on the first application: imagine bending a paper
clip back and forth. For engineering materials the
stress level (SL) vs. cycles to failure typically plots as a
straight line on a log-log scale: this is a power law
relationship (see ARP 5890 [8], Equation B-9). The stress
level may be expressed as pounds per square inch, as
strain (inch per inch), power spectral density (PSD) for
random vibration, or as the magnitude of a temperature
cycle, etc. The cycles to failure may be expressed as
cycles or as time (assuming a consistent cyclic rate).
SAE Int. J. of Aerosp. | Volume 2 | Issue 1 268
There is scatter in the data. The time or cycles to failure
for a number of identical samples tested at the same
stress level will form a Gaussian distribution. This
scatter is due to the variety of defects in each sample:
as a consequence larger samples, with a greater
probability of containing a more severe defect, will fail
sooner.
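As a minimal sketch of the power law relationship described above, the snippet below scales one known point on the stress vs. cycles-to-failure line to another stress level. The function name, the reference point, and the exponent are illustrative assumptions, not values taken from ARP 5890; real exponents depend on the material and failure site.

```python
def cycles_to_failure(stress, ref_stress, ref_cycles, exponent):
    """Power-law fatigue sketch: N = N_ref * (S_ref / S) ** exponent.

    Plotted as stress vs. cycles on log-log axes this is a straight
    line with slope -1/exponent. All inputs are hypothetical.
    """
    if stress <= 0 or ref_stress <= 0 or ref_cycles <= 0:
        raise ValueError("stress levels and cycles must be positive")
    return ref_cycles * (ref_stress / stress) ** exponent


# Hypothetical reference point: 1e6 cycles to failure at stress level 100.
# With an assumed exponent of 4, halving the stress multiplies life by 2**4.
life_at_half_stress = cycles_to_failure(50.0, 100.0, 1.0e6, 4.0)
```

The steepness of this relationship is why modest increases in stress level consume life so quickly.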
We use Miner's Rule [9] to determine fatigue damage
accumulation and a Composite Damage Index (CDI) for
items subjected to combinations of different stress
levels.
$$\mathrm{CDI} = \sum_{x} \frac{n_x}{N_x}$$
where
$n_x$ = number of applied stress cycles at stress level $x$
$N_x$ = number of cycles to failure at stress level $x$
Failure is expected to occur when the CDI is
approximately equal to one.
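The damage summation above can be sketched in a few lines. The function name and the load history below are hypothetical and serve only to show how the CDI accumulates across stress levels.

```python
def composite_damage_index(exposures):
    """Miner's Rule: CDI = sum of n_x / N_x over all stress levels x.

    `exposures` is an iterable of (applied_cycles, cycles_to_failure)
    pairs, one pair per stress level.
    """
    total = 0.0
    for applied_cycles, cycles_to_failure in exposures:
        if cycles_to_failure <= 0:
            raise ValueError("cycles to failure must be positive")
        total += applied_cycles / cycles_to_failure
    return total


# Hypothetical load history: (n_x, N_x) for three stress levels.
history = [(2.0e4, 1.0e5), (5.0e3, 2.0e4), (1.0e2, 1.0e3)]
cdi = composite_damage_index(history)  # 0.2 + 0.25 + 0.1
```

A CDI well below one indicates remaining fatigue life; a value near one indicates that failure is expected.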
THE BATHTUB CURVE
The bathtub curve [10], describing failure rate as a function
of time, is described in a number of sources and shown
in Figure 1. This bathtub curve can be used to describe
a range of phenomena including human death rates as a
function of age, and electronic failure rates as a function
of time.
The Infant Mortality portion of the curve is the initial
section for which the failure (death) rate decreases with
time (age). For military electronics this higher initial
failure rate is purported to be due to latent
manufacturing defects. Environmental Stress Screening
(ESS), comprising random vibration and temperature
cycling, is used to precipitate these defects as failures
so that they can be repaired to produce items without
infant mortality defects.
The Constant Failure Rate portion of the curve is the
section after Infant Mortality defects have been
eliminated, but before Wearout has begun to occur.
Failures are random. This is the period for which
Constant Failure Rate statistical prediction techniques
(MIL-HDBK-217 [11], VITA 51.1 [12], etc.) have some validity.
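For orientation, a constant failure rate implies an exponential survival function, R(t) = exp(-lambda t), with a mean time between failures of 1/lambda. The sketch below uses a hypothetical failure rate; it does not reproduce any handbook calculation, and it holds only while the item remains on the flat portion of the curve.

```python
import math


def reliability(failure_rate_per_hour, hours):
    """Probability of surviving `hours` under a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


def mtbf(failure_rate_per_hour):
    """Mean time between failures for a constant failure rate."""
    return 1.0 / failure_rate_per_hour


# Hypothetical item: 100 failures per million operating hours.
failure_rate = 100e-6
r_1000h = reliability(failure_rate, 1000.0)  # exp(-0.1)
```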
The Wearout portion of the curve is the last section
and has a Gaussian distribution that goes to zero when
the last item in a given starting population has failed.
Failures in this portion of the curve are due to fatigue,
and follow the Gaussian distribution that was previously
discussed for fatigue phenomena. Durability [8] verification
(analysis and/or test) during item Qualification, as shown
in Topic 2.2 Bathtub Curve [10], is commonly used to
demonstrate that wearout will not occur during the
planned life of the item.
"Typically, reliability analysis is aimed at
assessing the random failures that will occur in
the equipment during its useful life. These
failures are usually assumed to be repairable,
and may be due to a variety of causes, such as
defects in the equipment, improper use, damage
due to unusual conditions, inadequate
maintenance, etc. Durability analysis, on the
other hand, assesses failures due to wearout of
certain elements of the design." [8]
LIMITATIONS OF THE RELIABILITY PREDICTION PROCESS
MIL-HDBK-217 (as do most similar analysis techniques)
relies on a number of assumptions, two of which are
germane here: 1) infant mortality failures have been
eliminated by good process control, or screened out by
an effective ESS program that consumes a relatively
small tranche of demonstrated life, and 2) the period of
performance, after ESS, is within the demonstrated life
of the item, so that wearout failures will not occur:
these analysis techniques typically do not model
wearout mechanisms [13].
Selection of the appropriate MIL-HDBK-217 environment
factor (PiE-factor) will not remotely compensate for the
failure to adequately specify durability environments. The
PiE-factor ratios assume, as does everything in the
MIL-HDBK-217 methodology, that durability has been
demonstrated and that the item is in the constant failure
rate portion of the bathtub curve: they do not account
for limited life due to wearout.
DURABILITY ENVIRONMENTS
The salient contributors to equipment durability
environments are vibration (high cycle fatigue) and
temperature (low cycle fatigue). The deleterious effects
of these environments are widely understood, and have
been thoroughly investigated in a number of venues.
"AVIP also broadens the tools and focus of
electronic packaging design to address the life
cycle issues through fatigue analysis" [14]

"Complexities include 1) Surface Mount
Technology, . . . Thermal cycling fatigue life of
electronics was improved through 1) Coefficient
of thermal expansion (CTE) matching, 2) Omega
and other strain relieving lead wire designs for
large devices, 3) Plated Thru hole improvements
. . . The F-22 program's typical durability life test
requires from 500 to 1500 thermal cycles on one
unit. . . . The design analysis included
consideration of the damaging effects on
electronics from thermal fatigue." [15]
Figure 1 The Bathtub Curve (failure rate vs. time; regions labeled Infant Mortality, Constant Failure Rate, and Wearout; annotations: ESS, Qualification, Margin Against Wearout, Margin on Elimination of Infant Mortality Failures)
Figure 2 The Bathtub Curve with 95% to 100% of Demonstrated Life consumed by ESS (failure rate vs. time; regions labeled ESS and Wearout)
Table 1 -- Vibration Durability Life Consumed by ESS

ESS                                 Durability                           Percent of
Duration   PSD       Duration x     Duration   PSD       Duration x     Demonstrated Life
(Minutes)  (g^2/Hz)  PSD^4          (Minutes)  (g^2/Hz)  PSD^4          Consumed by ESS
10         0.04      2.56E-05       300        0.002     4.8E-09        99.98%
10         0.04      2.56E-05       300        0.004     7.68E-08       99.70%
10         0.04      2.56E-05       300        0.008     1.2288E-06     95%
10         0.04      2.56E-05       300        0.016     1.96608E-05    57%
10         0.04      2.56E-05       300        0.032     0.000314573    8%
10         0.04      2.56E-05       300        0.040     0.000768       3%
10         0.04      2.56E-05       300        0.064     0.005033165    1%
10         0.04      2.56E-05       300        0.128     0.080530637    0.03%


The Bolton Memorandum [1] Enclosure 2, Reliability Best
Practices, also confirms the need to address the fatigue
aspects of both thermal and vibration environments.
"The supplier routinely conducts thermal and
vibration analyses to address potential failure
mechanisms and failure sites (i.e., a physics-of-
failure approach to reliable design). These
analyses would likely include the use of fatigue
analysis tools, finite element modeling, dynamic
simulation, heat transfer analyses, etc." [1]
Appendix 1.4 Standard Evaluation Criteria, Reliability
Analysis
"Comprehensive Thermal and Vibration analyses
and/or Finite Element Analyses (FEA) are
conducted to address potential failure
mechanisms and failure sites."
From ANSI/GEIA-STD-0009, Figure 1
"Engineering analysis and test data identifying
the system/product failure modes and
distributions that will result from the life-cycle
loads." [16]
Ideally the durability environments should be derived
from the planned usage of the item. The Bolton
Memorandum [1] Enclosure 2, Reliability Best Practices,
also affirms that
"The supplier has characterized the critical loads
and stresses. A good design team will
characterize the life cycle environment and
operational duty cycle stresses that their
components will see."
From the Halpin Highlights [14]
"a. Realistic systems requirements derived from
the user's intended application.
b. Thorough understanding of operational usage
and environments."
Enclosure 1, Section C Statement of Work Reliability
Language and Tailoring Instructions, 4. System-Level
Operational & Environmental Life-Cycle Loads.
"The contractor shall estimate and periodically
update the operational & environmental loads
(e.g., mechanical shock, vibration, and
temperature cycling) that the system is expected
to encounter in actual usage throughout the life
cycle."
From ANSI/GEIA-STD-0009, Figure 2
"User and environmental profile that defines the
system/product's life cycle (operating and non-
operating environments, expected operating and
non-operating times, etc.)." [16]
From ANSI/GEIA-STD-0009, 4.5.1.4
"The developer shall estimate the user and
environmental loads (e.g., mechanical shock,
vibration, and temperature/humidity cycles . . .)"
The temperature cycling fatigue environment is usually
the result of the combination of diurnal nighttime low
temperatures and the maximum temperature achieved
at each potential failure site (solder joint, component
lead, etc.) as a result of diurnal daytime high
temperatures, cooling system performance, operational
cycles and equipment power on-off cycles. Experience
on programs where Durability fatigue analyses have
been conducted, and validated, shows that the
temperature cycling fatigue contribution is typically
eighty percent (80%) to ninety percent (90%) of the
Composite Damage Index (CDI): this is true even for
platforms with relatively severe vibration environments.
Vibration and Temperature Cycling Environments are
Orthogonal to Each Other - In the case of a circuit card
assembly (CCA), vibration fatigue (primarily component
leads and solder joints) is typically due to the flexure
(Figure 3) of the CCA, perpendicular to the plane of the
CCA: as the CCA flexes repeatedly the strains imposed
on the component leads and solder joints lead to the
accumulation of fatigue damage.
Again in the case of a CCA, temperature cycling fatigue
(again primarily component leads and solder joints) is
due to coefficient of thermal expansion (CTE) mismatch
(Figure 4) between the component and the CCA in the
plane of the CCA: as the CCA goes through repeated
thermal cycles the strains imposed on the component
leads and solder joints lead to the accumulation of
fatigue damage.
Figure 3 CCA Flexure in Vibration
Figure 4 CTE Mismatch Strains Leads and Solder Joints
Changes to improve performance in one durability
environment can degrade performance in the other
durability environment. For example, stiffening the card
to improve vibration performance could degrade
performance in temperature cycling. It follows that long
life in one durability environment does not imply any life
in the other.
ENVIRONMENTAL STRESS SCREENING
As noted above the intent of ESS is to precipitate infant
mortality (latent manufacturing) flaws so that they can be
repaired, and the fielded item will be at the beginning of
the flat (Constant Failure Rate) portion of the bathtub
curve.
A long-standing industry rule of thumb holds that power
spectral density (PSD) levels below 0.04 g^2/Hz [17] are
insufficient to precipitate flaws: vibration at lower levels
is otiose.
We have another industry rule of thumb that ESS should
not consume more than five percent (5%) of the
demonstrated durability life of the item: this is to
increase the probability that the item remains on the flat
portion of the Bathtub Curve (Figure 1) for its planned
useful life.
Table 1 uses the equation from MIL-STD-810F [18],
Paragraph 2.2 Fatigue Relationship to determine the
percentage of demonstrated durability life consumed by
ESS on a hypothetical program.
For this hypothetical program ESS is performed for 10
minutes at 0.04 g^2/Hz. Durability vibration testing is
conducted for five (5) hours (300 minutes) at different
levels depending on the item installation zone. In this
hypothetical case conducting ESS for items installed in
installation zones with PSDs of 0.04 g^2/Hz or above may,
assuming that the items do have infant mortality defects,
make sense. For items installed in the zones with lower
PSDs, the conduct of ESS is non-value added (the
field/durability vibration level is too low to precipitate any
infant mortality defects), and deleterious (an excessive
portion of demonstrated durability vibration life is
consumed) to the item's reliability.
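The Table 1 percentages can be reproduced with a short calculation. This is a sketch under stated assumptions: relative damage is taken as duration times PSD to the fourth power, as tabulated, and demonstrated life is taken as the sum of the durability test and one pass of ESS. The function names are hypothetical.

```python
def vibration_damage(duration_minutes, psd_g2_per_hz, exponent=4):
    """Relative fatigue damage as tabulated in Table 1: duration x PSD^exponent."""
    return duration_minutes * psd_g2_per_hz ** exponent


def percent_life_consumed_by_ess(ess_damage, durability_damage):
    """ESS share of demonstrated life (durability test plus one ESS pass)."""
    return 100.0 * ess_damage / (ess_damage + durability_damage)


ess = vibration_damage(10, 0.04)  # 10 minutes at 0.04 g^2/Hz
zone_psds = [0.002, 0.004, 0.008, 0.016, 0.032, 0.040, 0.064, 0.128]
consumed = {
    psd: percent_life_consumed_by_ess(ess, vibration_damage(300, psd))
    for psd in zone_psds
}
for psd, pct in consumed.items():
    print(f"{psd:.3f} g^2/Hz: {pct:.2f}% of demonstrated life consumed by ESS")
```

Because damage scales with the fourth power of PSD, ten minutes of ESS outweighs five hours of durability testing whenever the zone level is well below the ESS level.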
ESS is an attempt to inspect in quality for low
production rate equipment. Defects in high production
rate equipment can be reduced or eliminated by the
application of statistical process control and automation.
High production rate equipment is far more likely to be
COTS than custom military specification design. It
follows that COTS is far more likely to be defect free (at
least prior to ESS) than custom military specification
design.
One way to decompose reliability is into two questions.
First, is the item inherently robust (durability
environments address this) enough? Second, is the
item defect (ESS is intended to address this) free? We
have experience flying COTS items such as Ricoh
Printers, Sony Satellite Dish Receivers, and HP Servers
(without conducting ESS) on military derivative aircraft:
in this relatively benign environment (commercial aircraft
converted to a military application) these COTS items
have proven to be considerably more reliable than the
military specification Government Furnished Equipment
(GFE). These COTS items are clearly not robust
enough for severe environment platforms, such as
fighter aircraft, but their reliable performance on military
derivative aircraft confirms that ESS would be non-value
added since field experience has shown these items to
be relatively free of infant mortality defects. In addition,
given that they were not designed for a flight environment,
ESS would be more likely to degrade reliability by
consuming an excessive portion of the item's durability
life.
ESS vs. Burn-In - ESS is distinct from "Burn-In", which
is typically applied to components and subassemblies to
accelerate/screen time-temperature dependent thermally
activated failure mechanisms that can be modeled using
the Arrhenius relationship: this includes solid state reactions
such as diffusion, grain growth, etc. As with ESS, Burn-In
is intended to ensure that components or
subassemblies start off in the constant failure rate
region. ESS focuses on thermo-mechanical
mechanisms that are related more to component
assembly (leads, solder joints, etc.), but does not
necessarily drive solid state (thermally activated)
mechanisms to the constant failure rate region.
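To illustrate the Arrhenius relationship mentioned above, the sketch below computes the acceleration factor of a thermally activated mechanism at a burn-in temperature relative to a use temperature. The activation energy and both temperatures are hypothetical values chosen for illustration; real values depend on the specific failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(activation_energy_ev, use_temp_c, stress_temp_c):
    """Acceleration factor at the stress temperature relative to use.

    AF = exp((Ea / k) * (1 / T_use - 1 / T_stress)), temperatures in kelvin.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    )


# Hypothetical values: 0.7 eV activation energy, 55 C use, 125 C burn-in.
af = arrhenius_acceleration(0.7, 55.0, 125.0)
```

This model depends on time at temperature, which is why Burn-In addresses different mechanisms than the thermo-mechanical cycling of ESS.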
ESS DEGRADING INHERENT RELIABILITY AND VITIATING RELIABILITY PREDICTIONS
Consider a hypothetical example from the hypothetical
program described above. As noted above, the validity
of our reliability predictions, for fielded items, rests on two
assumptions: 1) Infant mortality defects have been
eliminated; and 2) the item does not enter the wearout
portion of the Bathtub Curve during its planned useful
life.
In this hypothetical case:
The durability vibration requirement (Table 1) is five (5)
hours at 0.008 g^2/Hz.
A durability temperature cycling requirement is not
specified.
The ESS requirement is five (5) minutes of random
vibration, followed by twelve thermal cycles, and then
another five (5) minutes of random vibration. The last
five (5) thermal cycles and the final five (5) minutes of
vibration must be failure free: there is no limit on the
number of repeats allowed to achieve the required five
thermal cycles and five minutes of vibration failure free.
An item that is no better than required by this set of
requirements would be inherently unreliable the moment
it was fielded.
In this hypothetical case, the item went through ESS,
prior to qualification, without having to repeat the last
five (5) temperature cycles or the final five (5) minutes of
vibration. The total demonstrated vibration durability life
is the sum of five (5) hours (300 minutes) at 0.008 g^2/Hz
and ten minutes at 0.040 g^2/Hz: as discussed above,
assuming that a production unit went through ESS
without having to repeat the last five (5) minutes of
vibration, ninety-five percent (95%) of the demonstrated
useful life would have been consumed before the item
was fielded. If the unit had to repeat (again, there is no
limit to how many times this could happen) the last five
minutes of vibration, after correction of a failure, then
well over 100% of demonstrated useful life would have
been consumed.
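The vibration figures above can be checked with the duration times PSD to the fourth power damage measure used in Table 1. The worked arithmetic below is a sketch: the symbol D is introduced here for illustration, and the single repeat of the final five minutes of vibration is an assumed example.

```latex
\begin{align*}
D_{\text{durability}} &= 300 \times 0.008^{4} = 1.2288 \times 10^{-6} \\
D_{\text{ESS}} &= 10 \times 0.04^{4} = 2.56 \times 10^{-5} \\
D_{\text{demonstrated}} &= D_{\text{durability}} + D_{\text{ESS}} = 2.68288 \times 10^{-5} \\
\frac{D_{\text{ESS}}}{D_{\text{demonstrated}}} &\approx 0.954 \quad (\text{about } 95\%) \\
\frac{15 \times 0.04^{4}}{D_{\text{demonstrated}}} &= \frac{3.84 \times 10^{-5}}{2.68288 \times 10^{-5}} \approx 1.43 \quad (\text{about } 143\%)
\end{align*}
```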
For temperature cycling, even assuming that there are
no repeated cycles following correction of a failure, at
least 100% of demonstrated useful life has been
consumed when ESS is completed, since in the absence
of a durability temperature cycling requirement one pass
through ESS is all that is included in the demonstrated
temperature cycling durability life. If there are repeat
ESS cycles then the situation would be considerably
worse.
In this case, the bathtub curve would be as shown in
Figure 2: the actual item might be better than the
requirements, but there would be no evidence or data to
show that this is the case. The flat (constant failure rate)
portion of the Bathtub Curve, where our reliability
predictions have some validity, does not exist, so our
reliability prediction is vitiated. The inherent reliability of
the unit has been degraded by the fatigue damage it has
accumulated. In the case of vibration, this was done in
the attempt to eliminate latent defects that the field level
is too low to precipitate, thus artificially activating failure
mechanisms not relevant to the field environment.
HOW PROGRAM/EQUIPMENT MANAGERS CAN AVOID THESE PITFALLS
1. Durability environments must include vibration and
temperature cycling requirements that are consistent
with the planned usage and the planned useful life.
Note: the temperature cycling verification does not have
to be an expensive test, but in many cases may be
accomplished by analysis or similarity.
2. ESS vibration and temperature cycling must be
limited, in each case, to some small portion (typically
five percent [5%]) of demonstrated useful life,
including a specified number of allowed repeat/repair
cycles.
3. Vibration ESS should not be conducted when the
durability vibration level is too low to precipitate Infant
Mortality (latent manufacturing) defects.
4. ESS should not be conducted on items (typically
COTS) that have been shown to be free of infant
mortality (latent manufacturing) defects.
These measures can enhance reliability while reducing
cost.
CONCLUSION
Finally, Program/Equipment Managers should have a
Useful Life Strategy that reflects the expected field
fatigue life of each class of items, and the customer's
desire for technology insertion/refresh. For example, if
the item can only be expected to survive three to five
years in the field and the customer desires technology
insertion (how often do you replace your laptop?) every
three years, then attempting to ruggedize/qualify/ESS
the item for a longer life will only add cost while
degrading reliability. The proper application of
qualification, ESS and reliability prediction methods to
determine a useful life strategy, while avoiding the
system engineering pitfalls described herein, will
minimize total ownership cost while enhancing
effectiveness for the war fighter.
"All activities, methods and tools used should be
evaluated and applied in a manner that adds
demonstrated value to the program, at optimized
life cycle cost and utilization of resources" [16]
". . . . (which may include COTS, NDI, and CFI,
as well . . .) shall identify and confirm through
analysis, test, or accelerated test, the failure
modes and distributions that will result when
these life-cycle loads are imposed on these
items."
REFERENCES
1. MEMORANDUM FOR SEE DISTRIBUTION,
SUBJECT: Reliability of U.S. Army Materiel
Systems; 06 DEC 2007; Claude M. Bolton Jr.;
DEPARTMENT OF THE ARMY, Assistant Secretary
of the Army, Acquisition Logistics and Technology
2. MEMORANDUM FOR DIRECTOR, OPERATIONAL
TEST AND EVALUATION, DEPUTY UNDER
SECRETARY OF DEFENSE FOR ACQUISITION
AND TECHNOLOGY; SUBJECT: Report of
Reliability Improvement Working Group, Office of the
Secretary of Defense
3. MEMORANDUM FOR UNDER SECRETARY OF
DEFENSE (ACQUISITION, TECHNOLOGY, AND
LOGISTICS); SUBJECT: Reliability, Availability, and
Maintainability Policy; Department of the Air Force
4. Robert C. Juvinall; Stress, Strain, and Strength;
McGraw-Hill.
5. Joseph Edward Shigley; Mechanical Engineering
Design; McGraw-Hill.
6. Joseph H. Faupel; Engineering Design; John Wiley
and Sons, Inc.
7. Rao R. Tummala and Eugene J. Rymaszewski (eds.);
Microelectronics Packaging Handbook; Van Nostrand
Reinhold.
8. Guidelines for Preparing Reliability Assessment
Plans for Electronic Engine Controls; ARP 5890.
9. Dave S. Steinberg, Vibration Analysis for Electronic
Equipment, John Wiley & Sons; Chapter 10
Structural Fatigue.
10. RAC (Reliability Analysis Center) Reliability Toolkit:
Commercial Practices Edition A Practical Guide for
Commercial Products and Military Systems Under
Acquisition Reform.
11. Military Handbook, Reliability Prediction of Electronic
Hardware; MIL-HDBK-217F, Notice 2; Rome Air
Development Center; 28 February 1995.
12. Reliability Prediction, MIL-HDBK-217, Subsidiary
Specification; VITA 51.1; June 2008.
13. Lori Bechtold, Physics of Failure in Handbook
Reliability Predictions, Components for Military &
Space Electronics (CMSE), 2009.
14. Halpin, Dr. J. C.; Avionics/Electronics Integrity
(AVIP) Highlights.
15. Glista, Stefan; Lessons Learned from the F-22
Avionics Integrity Program, 0-7803-5086-3 /98 IEEE.
16. ITAA Standard, Reliability Program Standard for
Systems Design, Development, and Manufacturing;
ANSI/GEIA-STD-0009-2008; November 13, 2008.
17. Navy Manufacturing Screening Program, Decrease
Corporate Costs, Increase Fleet Readiness;
Department of the Navy; NAVMAT P-9492; May
1979.
18. Department of Defense Test Method Standard for
Environmental Engineering Considerations and
Laboratory Test; MIL-STD-810F; 1 January 2000.
CONTACT
james.a.robles@boeign.com
