T. I. Băjenescu
Company for Consulting C.F.C., La Conversion, Switzerland
M. I. Bâzu
National Research Institute for Microtechnology I. M. T., Bucharest, Romania
ABSTRACT: The paper gives an overview of the present situation in the domain, the front line in the battle for the best products. After a short introduction, the main problems arising in the reliability evaluation of monolithic integrated circuits are presented, together with the evaluation itself and some new points of view concerning dynamic life testing, screening and burn-in, accelerated tests, the physics of failure of plastic encapsulated microcircuits (PEM) and process reliability. Characterization techniques for Hg1-xCdxTe epilayers are also mentioned.
1 INTRODUCTION

The huge progress in the integrated circuit field has led to electronic equipment of smaller dimensions and reduced cost, but also to improvements in power capability, reliability and maintainability. The predictions are that the strong world-wide growth of the computer and communication markets will lead to an even higher growth rate of the semiconductor industry in the next decade, 20% per year, reaching a level of $300 billion shortly after the year 2000 [1].

The complexity of ICs has increased every year. In fact, Gordon Moore, in the 1970s, talked about a doubling of IC complexity every 18 months, with a corresponding decrease in cost per function1. This became the so-called Moore's Law, which proved to hold for more than the subsequent twenty years. Many factors contributed to keeping this model on course: the improvement of design tools and manufacturing technologies, but also the permanent growth of the reliability level. The intrinsic reliability of a transistor in an IC improved by two orders of magnitude (the failure rate decreased from 10^-6 h^-1 in 1970 to 10^-8 h^-1 in 1997). But in the same period of time the number of transistors per device increased by 9 orders of magnitude! Therefore, IC reliability increased even faster than the prediction given by Moore's Law. The model for reliability growth was called "Less's Law", after the well-known philosophy from architectural design: "Less is More" [1]. In practice, Less's Law means a tremendous tightening of the requirements on the IC failure rate: from 1000 failures in 10^9 device-hours (or 1000 FITs), now only a few FITs, in a single digit, are required. It is worthwhile to note the change in the predictions made by the Semiconductor Industry Association (SIA) in the 1994, 1995 and 1997 editions of the National Technology Roadmap for Semiconductors [1][3]. From Table 1, one may see that the forecast was overtaken by reality: the performances foreseen in 1994 and 1995 for 1998 were attained earlier, in 1997.

Basically, two types of integrated circuits have been developed to date, named after the basic cell: bipolar and MOS, built around bipolar and MOS transistors, respectively. In the beginning, the MOS ICs were n-channel or p-channel MOS ICs, but soon complementary MOS ICs (CMOS ICs), including both types, were developed. The main characteristic of CMOS circuits is the small supply voltage. As the portable-electronics market grows, low-power and low-voltage technologies such as CMOS have become the most widely used. Also, the technological improvements leading to the removal of sodium contamination from the Si-SiO2 system encourage the use of CMOS ICs, because a high reliability level becomes attainable. Recently [4], a reduced standard digital CMOS power supply voltage of 3.3 V was obtained, reducing the power consumption by 70%. These ULP (Ultra Low Power) CMOS ICs were investigated in depth [4] and proved to have a large potential.

The latest challenge in the IC family is the microsystem. Having arisen in the early 1990s, the microsystem represents a step of integration beyond common ICs: the "intelligent" element (the signal-processing part) is integrated with microsensors and microactuators in a single component, basically still an IC [5]. In fact, the microsystem is a "smart" sensor, able also to actuate. This kind of device is driving the development of some new microtechnologies. Because many disciplines are involved, hybrid terms such as mechatronics, chemtronics and bioinformatics have been used [6], but the term microtechnology (the technology of microfabrication) seems to be the most adequate. In Europe, the term microtechnology covers both microelectronics (the "classical" devices) and microsystem technology (MST) [12]. Other related terms are MEMS (Micro-Electro-Mechanical Systems), BIO-MEMS (BIOlogical MEMS) and MEOMS (Micro-Electro-Opto-Mechanical Systems). Silicon is still the basic material, and CMOS technology can be used for the manufacturing.

1 The cost per function is made up of two terms: a·e^(bN) (increasing with complexity [2] and representing the chip cost) and c/N (representing assembly and testing costs), where N is the number of functions and a, b, c are constants.
Table 1 Predictions for Si CMOS technology development: 1994, 1995 and 1997 editions of the National Technology Roadmap for Semiconductors
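The cost-per-function model from footnote 1 can be sketched numerically. A minimal illustration follows; the constants a, b and c are invented for the sake of the example (the paper gives no values), so only the shape of the trade-off, not the numbers, is meaningful:

```python
import math

def cost_per_function(n, a=1e-6, b=0.01, c=100.0):
    """Footnote-1 cost model: a*e^(b*n) is the chip-cost term,
    growing with complexity n, and c/n is the assembly-and-testing
    term, shrinking as more functions share the fixed costs.
    The constants a, b, c are invented for illustration."""
    return a * math.exp(b * n) + c / n

# Scan integration levels and pick the cheapest number of functions:
# the two opposing terms give an interior cost minimum.
best_n = min(range(1, 20001), key=cost_per_function)
```

The interior minimum is the point of the model: integrating more functions per chip lowers cost per function only up to the level where the exponentially growing chip cost takes over.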
Recently, a new term, nanotechnology, was proposed, because the structures now have characteristic features of a few nanometers. Accordingly, the tools used for manufacturing in these new technologies (micro- and nano-) are called micromachines and nanomachines, respectively.

A practical example of how a technology improvement influences IC features is given in Fig. 2. The addition of copper to the aluminium-copper alloys used for metallisation makes it possible to avoid electromigration (a well-known failure mechanism) and to increase the current density. This tendency was observed from 1982 to 1995. Beyond a certain point, however, increasing the copper content raises the resistivity, limiting the allowed current density (see the Al-Cu line in Fig. 2). The solution seems to be the use of copper as a supplementary metallisation layer over the Al-Cu layer (the dotted Cu/Al-Cu line in Fig. 2).

1.1 Modeling IC reliability

At first, only simulators for one or two subsystems or failure mechanisms appeared, such as RELIANT [14], predicting only electromigration of the interconnects, and BERT EM [15]. Both use SPICE for the prediction of electromigration, by deriving the currents. Other electromigration simulators were CREST [15], which combines switch-level simulation with Monte-Carlo simulation and is suited to VLSI circuits, and SPIDER [16]. Other models were built for hot-carrier effects: CAS [17] and RELY [18], also based on SPICE. An important improvement was RELIC, built for three failure mechanisms: electromigration, hot-carrier effects and time-dependent dielectric breakdown [19].

A high-level reliability simulator for electromigration failures, named GRACE [20], provided higher-speed simulation for very large ICs. Compared with the previously developed simulators, GRACE has some advantages [20]:
• an orders-of-magnitude speedup allows the simulation of VLSI circuits with many input vectors;
• the generalised Eyring model [21] allows the ageing, and eventually the failure, of physical elements due to electrical stress to be simulated;
• the simulator learns how to simulate more accurately as the design progresses.

If the typical failure mechanisms are known, models for the operational life of the devices can be elaborated by taking the degradation and failure phenomena into account. Such models, in contrast with the regular CAD tools, which capture only wearout phenomena, also predict the failures linked to the early-failure zone.

A Monte-Carlo reliability simulation for ICs [26], incorporating the effect of process flaws, test structure data, mask layout and operating environmental conditions, was proposed by Moosa and Poole [13]. The device is divided into subsystems (metallisation, gate oxide, wire bonds and packaging), affected by various failure mechanisms. These subsystems are further divided into elementary objects (e.g., for metallisation: metal runs, vias, contacts), each of which may be affected by various failure modes and mechanisms. The reliability measures of the objects are obtained by accelerated life testing on specially designed test structures. The data are then extrapolated to the subsystem and device level.
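The Monte-Carlo idea behind such simulators can be sketched in a few lines. This is a toy illustration only, not the Moosa-Poole simulator: each elementary object draws a lognormal time-to-failure, and the device fails when its first object fails (a series reliability model). The object names, median lives (t50, in hours) and sigmas are invented for the example:

```python
import math
import random

# Invented elementary objects of one device: (median life t50 [h],
# lognormal sigma). Illustrative values, not data from the paper.
OBJECTS = {
    "metal_run":  (4.0e5, 0.8),
    "via":        (6.0e5, 0.9),
    "gate_oxide": (8.0e5, 0.6),
    "wire_bond":  (1.2e6, 0.7),
}

def device_failure_time(rng):
    """One Monte-Carlo trial: the device fails when the first of its
    elementary objects fails (series model)."""
    return min(rng.lognormvariate(math.log(t50), sigma)
               for t50, sigma in OBJECTS.values())

rng = random.Random(42)
times = sorted(device_failure_time(rng) for _ in range(10_000))
# `times` is the empirical distribution of system failure times,
# i.e. the cumulative-failures-vs-time curve such a simulator reports.
```

Repeating the trial many times turns per-object test data into a device-level failure-time distribution, which is exactly the extrapolation step described above.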
Table 2 Incidence of main failure mechanisms (in %) arising in infant mortality period
Fig. 1 The simulation flow (blocks: process defect, layout, defect generation, defect and failure distributions, defect probabilities, calculation of net failure probabilities, annotated netlist)
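The "calculation of net failure probabilities" step can be illustrated with a minimal sketch: assuming statistically independent elementary objects in a series model, the net probability is one minus the product of the survival probabilities. The per-object numbers below are invented for illustration:

```python
from math import prod

def net_failure_probability(p_objects):
    """Net failure probability of a subsystem whose independent
    elementary objects fail with the given probabilities (series
    model: the subsystem fails if any object fails)."""
    return 1.0 - prod(1.0 - p for p in p_objects)

# Invented probabilities for a metallisation subsystem's objects
# (metal runs, vias, contacts):
p_net = net_failure_probability([0.02, 0.01, 0.005])  # ~0.0347
```

The same combination rule can then be applied again at the next level up, from subsystems to the whole device.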
The simulation procedure is detailed in Fig. 1. This simulator was used for a two-layer metal interconnect subsystem, the typical failure mechanism being electromigration. The effect of various grain-size distributions and defect (void) size distributions was examined, and the results (given as cumulative failures vs. system failure times) agree well with previously reported results.

2 RELIABILITY EVALUATION

2.1 Some reliability problems

From a theoretical viewpoint, a higher reliability is expected for integrated circuits than for discrete components. In practice, these expectations were surpassed, because some basic conditions were fulfilled.

2.2 Evaluation of integrated circuit reliability

Generally, three main problems arise in the evaluation of integrated circuit reliability.

1. For modern devices, the failure rates decrease below a certain limit and the conventional methods become less usable. To overcome this difficulty, two solutions may be discussed:
a) To perform reliability tests on a very large number of integrated circuits under normal operational conditions, with a duration of a couple of years. Obviously, this solution is unacceptable. As an example, to demonstrate a failure rate of 10^-9 h^-1 (also called 1 FIT), 1000 devices must be tested for 114 years, with only one device found defective.
b) To perform reliability tests on some integrated circuits under higher-than-normal conditions, the so-called accelerated tests. This method may be applied only if the failure mechanisms in the accelerated tests are the same as under normal operational conditions. And this fact must be indubitably demonstrated.
Accelerated tests are used in order to obtain information about the reliability of the product quickly and with a minimum of expense. The stresses used are higher than under normal operational conditions2, and the results are extrapolated so that the failure rate for normal conditions is obtained3.

2 Even if the purpose is to minimise the testing time, too strong a stress level must not be used, because new failure mechanisms may be induced.
3 If time is the accelerated variable, this means that an hour of testing at the high stress level produces the same effect on component reliability as n hours at normal operation.

2. The definition of the failure criteria is, unavoidably, very difficult, because the complexity of ICs is ever higher. Even for a simple device, like a transistor, it is hard to define the failure limits. For an integrated circuit, the basic parameters are more complex and harder to measure, and the degradation of these parameters differs from one application to another.

3. For evaluating the various stresses that can be used in reliability accelerated tests, the following aspects must be taken into account [7][8]:
• The stress must be one encountered in the operational environment. In principle, one must note that the failure rate of integrated circuits is influenced by the thermal, electrical and mechanical conditions of the operational environment. But for common industrial use, mechanical shock and vibrations have little influence on integrated circuits encapsulated in epoxy packages, which assure the necessary mechanical stability and a good protection. For instance, the acceleration measured at a sudden stop of a running car
reaches 40 g, for airplane take-off and landing it is up to 5 g, and for missiles up to 50 g. Compare these values with the acceleration level used for periodic tests: 30,000 g. Consequently, mechanical factors are only rarely used in accelerated tests. On the contrary, temperature is the stress most often used for this kind of test. The experimentally observed correlation between failure rate and temperature is based on the fact that the rates of the chemical reactions arising in the device increase with temperature.
• The failure mechanisms must always be those arising in the operational environment.
• All the samples of integrated circuits used in accelerated tests must behave in the same way when the stress is modified: the same circuits should be the first to fail at any stress level.

3 DYNAMIC LIFE TESTING

The failure of an IC in operational life is an unpleasant event, not only because the owner of the equipment must replace it, but also because the failure may cause serious damage to the equipment, loss of important information, or even loss of human life. Therefore, it is desirable to replace an IC before failure. For economic reasons, this replacement must take place shortly before the anticipated failure. This implies that the lifetime of the IC be accurately estimated, which can be done only by laboratory tests simulating the real operational life as closely as possible. In this respect, not only static but also dynamic testing must be done in the laboratory. The purpose is to quantify the performance degradation during IC operation. An example of such testing is given by Son and Soma [9]. First, the IC parameters to be monitored during dynamic life testing are chosen by two criteria: (i) they must be measurable at existing pin-outs, and (ii) they must progressively predict IC degradation. Then, the typical failure and degradation mechanisms must be studied. In fact, there are two major types of degradation mechanisms: electrical ones (such as latchup, ESD, hot-carrier effects, dielectric breakdown, electromigration, etc.) and environmental ones (produced by thermal and mechanical stress, humidity, etc.). By means of appropriately chosen electrical parameters (such as static/transient current level change, noise level in the current, cut-off frequency, input offset voltage of a CMOS differential amplifier, etc.), these mechanisms are monitored during dynamic life testing. Eventually, aging models for the various failure mechanisms must be elaborated. In [9], a model for the hot-carrier effect is given. Starting from a widely accepted empirical relationship between parameter deviation and elapsed stress time for the hot-carrier degradation mechanism, given in [10], an aging curve due to the hot-carrier effect under static or periodically repeated AC stress was obtained [9], defined by the equations:

∆V/Vo = k · t^a (1)

k = C · [(Ids/w) · exp(−Φi/(q · λp · Ech))]^a (2)

where C is a constant, a and k are the coefficients of the aging curve (depending on the technology and on the IC structure), Ids is the drain current, w is the channel width of the MOS transistor, Φi the minimum energy required to cause impact ionization, q the electron charge (1.6 × 10^-19 C), λp the hot-electron mean free path, Ech the channel electric field, ∆V/Vo the circuit aging and t the elapsed stress time. In a log(∆V/Vo) vs. log t plot, a straight line with slope a and y-intercept log k is obtained (see Fig. 3).

Fig. 3 A log(∆V/Vo) vs. log t plot for the hot-carrier degradation mechanism: a straight line with slope a and y-intercept log k

From the case study given in [9], one may understand the procedure for replacing an IC before a hot-carrier failure occurs. For a 31-stage inverter chain, designed according to MOSIS 0.8 µm HP technology rules, the device operation was simulated, the ageing being modeled by randomly changing device parameters. Based on this model, the probability of survival until the next inspection may be quantified at each inspection of dynamic life testing. Then, the optimal moment for replacement may be calculated with respect to maintenance cost (the recovery cost of an unanticipated failure versus the wasted cost of replacing an IC too early).

4 SCREENING AND BURN-IN

To better understand the role of screening tests in reliability estimation, an example concerning the failure causes will be given. Assume that a printed circuit board (PCB) carries 60 integrated circuits (ICs), that the probability of failure for an IC is 2%, and that all the ICs are statistically independent. It follows that the probability of finding at least one defective IC is 1 − 0.98^60 ≈ 0.7. Various causes can lead to component failures; for example, the components may be very old, or they may be overloaded. In these cases, screening tests make no sense. Other defects result from the intrinsic weaknesses of the components. These weaknesses are unavoidable and, within well-defined limits, are accepted even by the manufacturer. With the aid of electrical tests and/or operating tests (during fabrication or before delivery), components with such defects can be identified and eliminated. Nevertheless, there remains a small percentage4 of components with hidden defects which, although still operational, have a low reliability and negatively influence the reliability of the component batch. The role of screening tests is to identify these partially unreliable components, with defects that do not lead immediately to non-operation.

For at least two reasons it is difficult to define a cost-effective screening sequence: (i) it may activate failure mechanisms that would not appear in field operation; (ii) it could introduce damage (transients, electrostatic discharges - ESD) which may be the cause of further early failures. Generally, the selection is a 100% test (or a combination of 100% tests), the stress factors being temperature, voltage, etc., followed by a parametric electrical or functional control (also performed 100%), with the aim of eliminating the defective items, the marginal items, and the items that will probably fail early (potentially unreliable items). To overcome these problems, a method was recently proposed [13]: MOVES, an acronym for Monitoring and Verifying a Screening sequence. MOVES contains five procedures, namely VERDECT, LODRIFT, DISCRIM, POSE and INDRIFT.

By definition, an accelerated test is a trial during which the stress levels applied to the components are higher than those foreseen for operational life; this stress is applied with the aim of shortening the time necessary for observing the behavior of the component under stress. The stress level must be chosen with great care, because it is essential that the failure mechanism acting at the higher stress level be the same as that acting at the normal stress level. Lately, two tendencies have been observed: the increase of the stress level, and accelerated tests performed early in the manufacturing process (even at the wafer level). The main problems to be solved in the next years are: (i) the identification of the acceleration laws for different stress factors; (ii) taking into account, when designing the accelerated tests, the synergies between the stress factors encountered in the operational environment [14]; and (iii) the dependence of the activation energy on the stress level.

5 PHYSICS OF FAILURE

The identification of the failure mechanisms is essential for reliability accelerated testing, because the degradation laws obtained must be extrapolated beyond the time period of the test, and the extrapolation must be made separately for each population affected by a given failure mechanism. The subject is still topical, considering that new failure mechanisms keep being discovered and even the old ones are not completely examined. Consequently, a series of tutorial papers on failure mechanisms has been published since 1973 by IEEE Transactions on Reliability, in almost every issue. Most of these papers were written by the research team led by Michael Pecht, from the University of Maryland (USA). Trends in microsystems (integrating electronic, micro-electro-mechanical, electro-optical and micro-fluidic devices) are taking miniaturization close to its physical limits and creating a need for an extensive reliability-physics effort to identify and counter failure mechanisms in new devices [14].

6 PLASTIC ENCAPSULATED MICROCIRCUITS (PEM)