
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS, VOL. 58, NO. 9, SEPTEMBER 2011

Equalizer Design and Performance Trade-Offs in ADC-Based Serial Links
Jaeha Kim, Senior Member, IEEE, E.-Hung Chen, Member, IEEE, Jihong Ren, Member, IEEE,
Brian S. Leibowitz, Member, IEEE, Patrick Satarzadeh, Member, IEEE, Jared L. Zerbe, Member, IEEE, and
Chih-Kong Ken Yang, Fellow, IEEE

Abstract: This paper investigates the performance benefit of using nonuniformly quantized ADCs for implementing high-speed
serial receivers with decision-feedback equalization (DFE). A way
of determining an optimal set of ADC thresholds to achieve the
minimum bit-error rate (BER) is described, which can yield a
very different set from the one that minimizes signal quantization
errors. By recognizing that both the loop-unrolling DFE receiver
and ADC-based DFE receiver decide each received bit based
upon the result of a single slicer, an efficient architecture named
reduced-slicer partial-response DFE (RS-PRDFE) receiver is proposed. The RS-PRDFE receiver eliminates redundant or unused
slicers from the previous DFE receiver implementations. Both the
simulation and measurement results from a 10 Gb/s ADC-based
receiver fabricated in 65 nm CMOS technology and multiple
backplane channels demonstrate that the RS-PRDFE can achieve
the BER of a 3-to-4-bit uniform ADC with only 4 data slicers. Also,
the combined use of linear equalizers (LEs) can further reduce the
required slicer count in RS-PRDFE receivers, but only when the
LEs are realized in the analog domain.
Index Terms: Analog-digital conversion, data communication,
equalizers, receivers.

I. INTRODUCTION
AS THE complexity of electrical and optical communication links increases, there is a growing interest towards implementing the transceivers based on analog-to-digital converters (ADCs) and digital signal processors (DSPs) [1]-[4]. As the data rates rose, various channel impairments including skin loss, dielectric loss, reflections, and crosstalk have become more pronounced and call for advanced coding and modulation schemes. While the aggressive scaling of CMOS has made it feasible to build fast digital logic that can perform such sophisticated signal processing algorithms in the digital domain, it is still very challenging to design an ADC with above 10 Gbps sampling rates and high enough resolution. For example, even at moderate resolution of 6 bits, an ADC may dissipate more than 1 W [5]-[8]. Such high power consumption of ADCs has been a discouraging factor for their full adoption in backplane transceivers and the design of low-power ADCs has been one of the primary research directions. For instance, a recently published work demonstrated a 10 Gbps ADC-based backplane transceiver consuming 0.5 W [7].

Manuscript received March 07, 2011; revised May 19, 2011; accepted June 23, 2011. Date of current version September 14, 2011. This paper was recommended by Editor G. Manganaro.
J. Kim is with School of Electrical Engineering and Computer Science, Seoul National University, Seoul, 151-742, Korea (e-mail: jaeha@snu.ac.kr).
E.-H. Chen and C.-K. K. Yang are with the Electrical Engineering Department, University of California, Los Angeles, CA, 90095 USA (e-mail: enochen@ee.ucla.edu; yang@ee.ucla.edu).
J. Ren, B. S. Leibowitz, and J. L. Zerbe are with Rambus, Inc., Sunnyvale, CA 94089, USA (e-mail: jren@rambus.com; brianl@rambus.com; jared@rambus.com).
P. Satarzadeh was with Rambus, Inc. Sunnyvale, CA 94089, USA. He is now with Texas Instruments, Inc., Dallas, TX 75243 USA (e-mail: psatarzadeh@ti.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSI.2011.2162465

In recognition of these trends, this paper aims to strike a
balance between the flexibility and power efficiency in ADC-based transceiver designs. In particular, it describes a way of
maximizing the performance of an ADC-based receiver with
a coarse-resolution ADC, performing linear equalization (LE)
and decision feedback equalization (DFE). We find that for a
given target bit-error rate (BER) performance, the required ADC
resolution can be greatly relaxed when the ADC is allowed to
have nonuniform quantization levels.
It is noteworthy that the optimal placement of such nonuniform ADC decision thresholds is not necessarily the one that
minimizes the quantization errors, especially for low-resolution
ADCs. We explain this by demonstrating that an ADC-based
DFE receiver is in fact equivalent to a loop-unrolling DFE receiver [9]-[11]. The optimal threshold placement for the minimum bit error rate (BER) is the one that maximizes the signal
margin of the selected data slicer. The equivalence between the
ADC-based DFE receiver and the loop-unrolling DFE receiver
is discussed in Section II.
Based on this observation, this paper proposes an optimally configured, nonuniform ADC-based receiver, called a
reduced-slicer partial-response DFE (RS-PRDFE) [15], which
differentiates itself from the conventional, fully expanded,
loop-unrolling DFE, also referred to as partial-response DFE
(PRDFE). Section III describes a dynamic programming algorithm that can determine the optimal placement of the ADC
thresholds for given channel characteristics. The performance
of the RS-PRDFE receivers is demonstrated both in simulation
and in measurement, as described in Section IV. Section V
then addresses the topic of jointly optimizing the RS-PRDFE
receiver with various types of linear equalizers.
II. REDUCED-SLICER PARTIAL-RESPONSE DECISION FEEDBACK
EQUALIZER (RS-PRDFE)
A. Equivalence Between ADC-Based DFE and Loop-Unrolling
DFE Receivers
Fig. 1 describes the signal flow in an ADC-based receiver performing decision feedback equalization (DFE). Once the ADC converts the received signal into a digital form, the DSP performs the DFE operation, which computes and subtracts the appropriate amount of offset from the digitized input based on the prior bit decisions. The DSP also contains the decision slicer, which compares the resulting value with a threshold and determines the current bit. To minimize the BER, the signal-to-noise ratio (SNR) at this decision slicer's input must be maximized. The quantization errors introduced by the ADC count towards the unwanted noise, and hence the ADC strives to have as high a resolution as possible.

Fig. 1. An ADC-based DFE receiver. (a) Its architecture. (b) Its signal flow diagram where the ADC is modeled as a source of quantization noise.

1549-8328/$26.00 © 2011 IEEE

On the other hand, an analog DFE receiver subtracts the offset
voltage in the analog domain as illustrated in Fig. 2(a). Another
difference is that its decision slicer compares the resulting signal
with an analog threshold (analog comparison) while that in the
ADC-based receiver in Fig. 1 compares the two inputs in digital
forms (digital comparison). Since the slicer output is always a
binary value, the signal around the DFE loop crosses the analog-digital boundary twice: once through the analog comparator and
once through the feedback path generating the analog offset
from the prior bits. The two conversion steps make it difficult
to close the timing around the loop within one bit period.
Loop-unrolling DFEs or partial-response DFEs (PRDFEs)
mitigate this difficulty by shifting this timing loop entirely into
the digital domain [9]-[11]. As illustrated in Fig. 2(b), the receiver precomputes all possible offset values and compares the
input signal with each and every offset. Once all the results enter
into the digital domain, one of them is selected based on the prior
bit history. Since the decision feedback loop is now entirely
within the digital domain, higher frequency operation is possible. However, a drawback is that the number of offset values
to be compared with, and hence the number of decision slicers,
grows exponentially (as $2^N$) with the number of DFE tap coefficients (N).
The seemingly different ADC-based DFE receiver in
Fig. 1(a) and loop-unrolling PRDFE receiver in Fig. 2(b) are in
fact equivalent and can be optimized using the same principles.
Recall that the core DFE operation is to subtract a proper offset
from the received signal before the bit decision. For both types
of receivers, the bit decision is made based on a single, critical
analog comparison. For the PRDFE, it is quite apparent since
one of the slicer outputs is selected as the current bit decision.

Fig. 2. Analog DFE receivers. (a) Conventional DFE. (b) Loop-unrolling PRDFE.

For the ADC-based DFE, determining whether the quantized
ADC output is greater than a certain offset is equivalent to
determining whether the analog input signal is above a quantization threshold that is closest to this offset. In case of a
flash-ADC, the bit is decided solely upon one particular slicer
output that has the corresponding threshold.
This observation suggests a DFE receiver architecture that we
call reduced-slicer partial-response DFE (RS-PRDFE), shown
in Fig. 3. This architecture is similar to the PRDFE in Fig. 2(b)
in that it selects one of the loop-unrolled slicer decisions as the
current bit value. But the key difference is that a slicer is selected
through a look-up table (LUT) rather than being direct-mapped,
and therefore a single slicer may be selected for multiple prior
bit histories. That is, if some of the slicers in the PRDFE have
similar threshold values, RS-PRDFE can merge those redundant slicers into one. On the other hand, an ADC-based DFE
receiver may contain slicers whose outputs are never used for
bit decision, especially when the ADC has uniformly fine resolution. RS-PRDFE eliminates those unused slicers and replaces
the thermometer-to-binary conversion, digital feedback equalizer (shown in Fig. 1), and binary subtraction unit with a simple
look-up table and multiplexer.
Therefore, the proposed RS-PRDFE can save power and area
by removing redundant or unused slicers in the loop-unrolling
PRDFE and ADC-based DFE receivers without degrading the
BER performance.
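
The slicer selection described above can be sketched in a few lines of Python. This is an illustrative behavioral model only, not the implemented circuit: the function names, the +1/-1 bit convention, the all-minus-one initial history, and the example tap values are assumptions made for the sketch.

```python
from itertools import product

def isi_offsets(taps):
    """ISI offset for every prior-bit history (bits are +1/-1).

    history[0] is the most recent prior bit and pairs with taps[0].
    """
    return {h: sum(c * b for c, b in zip(taps, h))
            for h in product((-1, +1), repeat=len(taps))}

def build_lut(taps, thresholds):
    """Look-up table: prior-bit history -> index of the slicer whose
    threshold lies closest to that history's ISI offset."""
    return {h: min(range(len(thresholds)),
                   key=lambda i: abs(thresholds[i] - y))
            for h, y in isi_offsets(taps).items()}

def receive(samples, taps, thresholds, lut):
    """All slicers compare every sample; the LUT and a multiplexer
    pick the single slicer output that becomes the bit decision."""
    history = tuple([-1] * len(taps))  # assumed initial history
    bits = []
    for x in samples:
        slicer_out = [+1 if x > t else -1 for t in thresholds]
        bit = slicer_out[lut[history]]
        bits.append(bit)
        history = (bit,) + history[:-1]
    return bits

# Hypothetical example: two equal taps give only three distinct offsets,
# so three slicers serve all four prior-bit histories.
taps = [0.3, 0.3]
thresholds = [-0.6, 0.0, 0.6]
lut = build_lut(taps, thresholds)
```

In this example the histories (+1, -1) and (-1, +1) map to the same slicer, which is the merging of redundant slicers that distinguishes the RS-PRDFE from a direct-mapped PRDFE.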

Fig. 3. The proposed reduced-slicer PRDFE receiver (RS-PRDFE).

Fig. 4. The factors determining the bit-error rate (BER) of an ADC-based DFE receiver in the presence of channel ISI. The signal margin is degraded by the ISI from the bits within the DFE tap range ($y_k$) and ISI from those outside the DFE range ($y'_j$). The lowest BER is achieved when the decision threshold $T_k$ is equal to $y_k$.

B. BER Model for ADC-Based DFE Receivers
Since the ADC-based DFE, PRDFE, and RS-PRDFE are all functionally equivalent in that each bit decision maps to a single critical analog comparison, their BER performance can be modeled in the same way. The BER is set by the probability of the critical comparison resolving incorrect results.
To derive the expression for the BER, consider an N-tap DFE receiver for a signaling system whose intersymbol interference (ISI) spans L bit periods. For each of the $2^N$ possible prior N-bit histories, there is a slicer whose comparison result will be used as the current bit decision. Let $T_k$ denote the threshold of that critical slicer ($k = 1, \ldots, 2^N$). If the received signal for the $n$th bit experiences the total ISI of $y$ from the neighboring L bits, the probability of detecting the $n$th bit incorrectly can be expressed as

$$P_e = \frac{1}{2}\, Q\!\left(\frac{1 + (y - T_k)}{\sigma}\right) + \frac{1}{2}\, Q\!\left(\frac{1 - (y - T_k)}{\sigma}\right) \qquad (1)$$

where the ISI $y$ is assumed normalized to the received signal level of $\pm 1$, $\sigma$ is the standard deviation of the noise, and $Q(\cdot)$ denotes the right tail probability of standard normal distribution. An additive Gaussian noise is assumed at the input of the slicer.
The ISI contribution $y$ can be decomposed into two parts: the part that can be canceled by the N-tap DFE ($y_k$) and the part that cannot (i.e., ISI contribution from the L-N bits outside the DFE range; $y'_j$ where $j = 1, \ldots, 2^{L-N}$). By enumerating all possible $2^L$ bit patterns each resulting in a different amount of ISI $y = y_k + y'_j$, one can derive the expression for the average BER for the 1- and 0-level received bits:

$$\mathrm{BER} = \frac{1}{2^{L+1}} \sum_{k=1}^{2^N} \sum_{j=1}^{2^{L-N}} \left[ Q\!\left(\frac{1 + (y_k - T_k) + y'_j}{\sigma}\right) + Q\!\left(\frac{1 - (y_k - T_k) - y'_j}{\sigma}\right) \right] \qquad (2)$$

The expression in (2) tells how to choose the decision thresholds $T_k$ for the best performance of an ADC-based DFE. To achieve the minimum BER, the worst case argument for the Q function, $(1 - |y_k - T_k| - |y'_j|)/\sigma$, must be maximized. In other words, one must maximize the worst case signal margin for the slicer. To do so, the decision thresholds should be placed as close as possible to the predictable ISI levels to minimize $|y_k - T_k|$.
This leads to a very important observation: the best BER performance for an ADC-based DFE is achieved when it is optimized like a PRDFE. It is noteworthy that the nonuniform ADC quantization levels resulting from this principle can be very different from the existing schemes that aim to minimize the signal quantization errors, such as adaptive differential pulse-coded modulation (ADPCM) used in voice applications.
When trying to minimize the quantization error, i.e., the difference between the received signal and its nearest threshold, one must place the decision thresholds where the signal is most expected. In binary signaling, such an optimization tends to place the thresholds near the likely levels for logic 0 and 1 as shown in Fig. 6(b). However, the next section will demonstrate that the optimal placement for the minimum BER minimizes $|y_k - T_k|$ and puts the thresholds near the center of the eye, as shown in Fig. 6(a).
The PRDFE receiver has $2^N$ slicers and therefore can achieve the ideal N-tap DFE performance (i.e., $y_k - T_k = 0$) by assigning each slicer threshold to one of the $2^N$ possible values of $y_k$. However, when the receiver has fewer than $2^N$ slicers, it is necessary to optimize the slicer thresholds to minimize the overall BER.
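
As a numerical illustration of this BER model, the sketch below enumerates every ISI pattern and averages the Gaussian tail probabilities for the 1- and 0-level bits. It assumes a signal level normalized to +/-1, equally likely +1/-1 bits, and additive Gaussian noise with standard deviation `sigma`; the function names and the `threshold_for` callback are hypothetical conveniences, not part of the paper.

```python
import math
from itertools import product

def q_func(x):
    """Right-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def average_ber(dfe_taps, residual_taps, threshold_for, sigma):
    """Average BER over all ISI patterns.

    dfe_taps      : ISI taps inside the DFE range (cancelable part)
    residual_taps : ISI taps outside the DFE range (uncancelable part)
    threshold_for : callable (history, offset) -> threshold of the slicer
                    selected for that prior-bit history
    """
    total, count = 0.0, 0
    for h in product((-1, 1), repeat=len(dfe_taps)):
        y_k = sum(c * b for c, b in zip(dfe_taps, h))
        t_k = threshold_for(h, y_k)
        for r in product((-1, 1), repeat=len(residual_taps)):
            y_res = sum(c * b for c, b in zip(residual_taps, r))
            d = (y_k - t_k) + y_res  # net displacement of the eye center
            # Average over a transmitted 1 (level +1) and 0 (level -1).
            total += 0.5 * (q_func((1 + d) / sigma) + q_func((1 - d) / sigma))
            count += 1
    return total / count

# Hypothetical channel: one cancelable tap and one small residual tap.
ideal = average_ber([0.3], [0.05], lambda h, y: y, sigma=0.1)    # T = offset
fixed = average_ber([0.3], [0.05], lambda h, y: 0.0, sigma=0.1)  # T = 0
```

With these made-up numbers, placing each threshold exactly on its ISI offset gives a far lower BER than a single fixed threshold at zero, in line with the argument that the threshold error should be minimized.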

Fig. 5. Illustration of the optimal placement problem of M slicer thresholds in an N-tap RS-PRDFE receiver for the minimum peak error $|y_k - T_k|$. The problem is equivalent to contiguously grouping the sorted $2^N$ ISI levels $\{y_k\}$ into M disjoint groups while minimizing the largest group span.

Fig. 6. The optimal placements of four slicer thresholds for: (a) the minimum threshold error (i.e., the lowest BER; RS-PRDFE) and (b) the minimum signal quantization error. 5 Gb/s operation on the FR4 channel.

III. OPTIMIZATION OF RS-PRDFE SLICER THRESHOLDS
As mentioned previously, the drawback of a PRDFE receiver is that the number of decision slicers grows exponentially with the number of DFE taps. However, for voice-line modems, it is known that the number of quantization steps in the ADC need not grow with the number of DFE taps once the resolution is fine enough for the quantization errors not to limit the overall BER. The proposed RS-PRDFE receiver was inspired by this observation and relaxes the required number of slicers for the PRDFE by merging the slicers with similar threshold values while keeping the threshold quantization error $|y_k - T_k|$ small.
When an RS-PRDFE receiver is constrained with M slicers where $M < 2^N$, one must choose the M slicer threshold levels so that the threshold quantization error $e(k) = y_k - T_k$ is minimized. Note that the set of $T_k$s can take only M unique values, meaning that some of the slicer outputs may be selected for more than one prior bit pattern. Instead of minimizing the BER expressed in (2) directly, it is easier to minimize the worst case error $\max_k |e(k)|$ based on the following approximation:

$$\mathrm{BER} \approx \frac{1}{2^{L+1}}\, Q\!\left(\frac{1 - \max_k |e(k)| - \max_j |y'_j|}{\sigma}\right) \qquad (3)$$

with $y'_j$ and $\sigma$ as in (2). The approximation is justified by the fact that $Q(\cdot)$ is a very steep function and, for an extremely low target BER, the BER is easily dominated by the worst case error $e(k)$ among its $2^N$ possible values.
The problem of placing M slicer thresholds for the minimum worst case error $\max_k |e(k)|$ can be thought of as first clustering the set of $2^N$ $y_k$ values into M disjoint groups and then assigning the center point of each group as the slicer threshold (Fig. 5). The worst case threshold error within each group is equal to one half of the span (i.e., the difference between the minimum and maximum $y_k$ values) of that group. Therefore, minimizing the maximum threshold error is equivalent to finding the M contiguous grouping of $2^N$ ISI levels ($y_k$s) that minimizes the largest span of the groups.
The optimal grouping of ISI levels to any number of M groups can be done via a recursive, dynamic programming procedure. Assuming that the ISI levels $y_k$s are sorted in an ascending order, let $S(n, m)$ be the largest group span for the first $n$ ISI levels optimally split into $m$ groups. To recursively express $S(n, m)$ in terms of $S(n', m')$ with $n'$ and $m'$ smaller than $n$ and $m$, respectively, we categorize the possible $m$-grouping of $n$ ISI points based on the number of elements in the last $m$th group. This last group can have as few as one element (i.e., $y_n$) and as many as $n-m+1$ elements (since the other $m-1$ groups must each have at least one element). If the last group has $i$ elements, from $y_{n-i+1}$ through $y_n$, then its span is simply $y_n - y_{n-i+1}$ where $i$ can vary from 1 to $n-m+1$. And the minimum largest group span possible with the rest of the $n-i$ ISI levels is $S(n-i, m-1)$. One should choose the number of elements for the last group so that it minimizes the overall largest group span of $S(n, m)$, which can be expressed in the following recursive relationship:

$$S(n, m) = \min_{1 \le i \le n-m+1} \max\left\{ S(n-i,\, m-1),\; y_n - y_{n-i+1} \right\} \qquad (4)$$

$$S(n, 1) = y_n - y_1 \qquad (5)$$

With this definition, $S(2^N, M)$ is the minimum largest span achievable for grouping $2^N$ ISI levels into M groups. The optimal thresholds are given by the center of each group's span. The minimum worst case threshold error is hence $S(2^N, M)/2$.
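
The grouping procedure described above can be sketched as a small dynamic program. This is a minimal illustration written from the verbal description (last group of i elements, the remaining levels split optimally into one fewer group); the function name, the table layout, and the example ISI levels in millivolts are assumptions for the sketch.

```python
def optimal_thresholds(isi_levels, num_slicers):
    """Split the sorted ISI levels into `num_slicers` contiguous groups so
    that the largest group span is minimized; return (thresholds, worst
    threshold error), with each threshold at the center of its group."""
    y = sorted(isi_levels)
    n_levels = len(y)
    if not 1 <= num_slicers <= n_levels:
        raise ValueError("need 1 <= num_slicers <= number of ISI levels")
    inf = float("inf")
    # span[g][n]: minimum largest span for the first n levels in g groups.
    span = [[inf] * (n_levels + 1) for _ in range(num_slicers + 1)]
    # last[g][n]: size of the last group in that optimal split.
    last = [[0] * (n_levels + 1) for _ in range(num_slicers + 1)]
    for n in range(1, n_levels + 1):
        span[1][n] = y[n - 1] - y[0]
        last[1][n] = n
    for g in range(2, num_slicers + 1):
        for n in range(g, n_levels + 1):
            for i in range(1, n - g + 2):  # last group holds i elements
                cand = max(span[g - 1][n - i], y[n - 1] - y[n - i])
                if cand < span[g][n]:
                    span[g][n] = cand
                    last[g][n] = i
    # Backtrack to recover the groups and their center points.
    thresholds, n = [], n_levels
    for g in range(num_slicers, 0, -1):
        i = last[g][n]
        thresholds.append(0.5 * (y[n - i] + y[n - 1]))
        n -= i
    thresholds.reverse()
    return thresholds, 0.5 * span[num_slicers][n_levels]

# Hypothetical ISI levels in mV: two clusters around -53 and +53.
thresholds, worst_error = optimal_thresholds([-58, -48, 48, 58], 2)
```

For these four levels and two slicers, the sketch places the thresholds at the centers of the two clusters, leaving a worst-case threshold error of half the cluster span. The triple loop costs on the order of M times the square of the number of ISI levels, which is acceptable for offline computation in firmware or software.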
While the described dynamic programming procedure is
guaranteed to find the optimal threshold placement for any
given channel response, it may not be suitable for an online
calibration scheme that can incrementally update the threshold
levels of the individual slicers and their assignments to the prior
bit patterns. One difficulty stems from the fact that the ISI levels
need to be sorted first, and the resulting order can vary
strongly with the channel characteristics. Until an effective, yet
low-cost scheme of incremental adaptation is found, a possible
solution is to periodically characterize the channel response
(e.g., the single-bit response) and compute the optimal slicer
thresholds and assignments by firmware or software.
A. ADC Threshold Placements for Minimum BER Versus
Minimum Quantization Error
Fig. 6 compares the optimal threshold placements for the
minimum threshold error (i.e., the optimal RS-PRDFE) and for
the minimum signal quantization error with 4 slicers. Notice
the vast difference between the two placements. The optimal
RS-PRDFE places the thresholds near the center of the eye
while the minimum quantization ADC places those near the
signal levels.


Fig. 7. Simulated bit-error rates of various types of ADC-based DFE receivers operating at 5 Gb/s on the FR4 channel.

Prior to this work, it was reported that reducing the full-scale range (FSR) of a uniform ADC can improve the DFE performance, even though the ADC cannot faithfully represent the
signal due to overflows and underflows [12]. Fig. 6 provides an
explanation to these surprising results; a uniform ADC with reduced FSR has the threshold levels resembling those of the optimal RS-PRDFE receiver.
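
The effect of a reduced full-scale range can be illustrated with a short sketch that measures how far each ISI offset lies from its nearest threshold. The threshold layout (evenly spaced, symmetric about zero, spacing of FSR divided by the slicer count plus one) and all numeric values are assumptions chosen for illustration, not data from the paper.

```python
def uniform_thresholds(num_slicers, full_scale):
    """Evenly spaced thresholds across [-full_scale/2, +full_scale/2].

    Assumed layout: num_slicers thresholds with spacing
    full_scale / (num_slicers + 1), symmetric about zero.
    """
    step = full_scale / (num_slicers + 1)
    return [-full_scale / 2 + step * (i + 1) for i in range(num_slicers)]

def worst_threshold_error(isi_offsets, thresholds):
    """Largest distance from any ISI offset to its nearest threshold."""
    return max(min(abs(y - t) for t in thresholds) for y in isi_offsets)

# Hypothetical ISI offsets in mV, clustered near the center of the eye.
offsets = [-58, -48, 48, 58]
wide = worst_threshold_error(offsets, uniform_thresholds(4, 800))
narrow = worst_threshold_error(offsets, uniform_thresholds(4, 265))
```

With these numbers the narrower range pulls the thresholds toward the ISI offsets and lowers the worst-case threshold error, although it still falls short of thresholds placed directly on the offsets.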
Fig. 7 compares the simulated BER for the four different
types of DFE receivers: the optimal RS-PRDFE, the DFE with
reduced-FSR ADC, the DFE with nonuniform ADC optimized
for minimum quantization error, and the conventional DFE with
uniform ADC. The simulations are carried out with an in-house
statistical link simulation tool, LinkLab [13], which can simulate system BERs given the single-bit response of the channel.
Each analog slicer is assumed to have 10-mV deadband due to
hysteresis and metastability and 1-mV input-referred noise.
The results show that the RS-PRDFE indeed achieves the
lowest BER. The uniform ADC with reduced FSR achieves
BERs close to the minimum possible values. The results also
demonstrate that minimum quantization error is clearly a poor
criterion for a DFE receiver as the resulting performance is
sometimes even worse than that of the conventional, uniform
ADC. In this case, note that the increase in the slicer count may
even degrade the BER since the minimum quantization error
ADC would place the slicer thresholds even closer to the expected signal levels, which may be farther from
the optimal levels from the BER perspective.
The next section reconfirms this finding from a 10 Gb/s
RS-PRDFE receiver prototype implemented in 65 nm CMOS
technology [14].
IV. EXPERIMENTAL RESULTS
A simplified diagram of the prototype RS-PRDFE receiver is
shown in Fig. 8 along with the photograph of the chip implemented in 65 nm CMOS [14]. The receiver frontend is 4-way
interleaved to achieve 10 Gb/s data rate and can use up to 16 slicers for RS-PRDFE operation (i.e., $M \le 16$) whose reference voltages can be individually adjusted within a 100-mV range in 2-mV steps. Each slicer is running at 2.5 GS/s and consumes 0.75 mW including the clock buffers. The RS-PRDFE slicer selection is performed by a tap-assignment block, which routes each slicer's decision output to the proper input position of the subsequent 32:1 multiplexer. The 32:1 multiplexer then forwards the preselected input to the receiver's final output based on the 5-bit prior history. By performing the
slicer selection in the feedforward path rather than in the feedback path as shown in the conceptual architecture in Fig. 3, one
can shorten the critical feedback path delay around the multiplexer. Furthermore, when implemented as a tree type, the 32:1
multiplexer can be pipelined, exploiting the fact that the selection input bits arrive serially, to achieve a high throughput of
10-Gb/s using synthesized circuits. Also, the tap assignment
block can be utilized to reorder the slicers and minimize the
offset errors due to mismatch [16].
The receiver is tested with a 10 Gb/s, 700 mV PRBS data pattern transmitted via a 25-inch-long Nelco backplane channel that has 17 dB loss at 5 GHz. The frequency and
single-bit response (SBR) of the channel are shown in Fig. 9.
To reduce the precursor ISIs and to effectively explore the receiver performance with different single-bit responses, a tunable
prefiltering circuit consisting of a high-pass filter (HPF), a 1-tap
discrete-time FIR filter, and a variable-gain amplifier (VGA) is
implemented in front of the RS-PRDFE receiver. In one setting,
the effective SBR was as shown in Fig. 10(a).
The effective voltage margin of the receiver is measured by
inserting an extra slicer that samples the incoming signal with
an adjustable, deliberate voltage offset. Its output is fed into a
replica datapath which is identical to the main datapath but replaces one of the main slicer outputs with the extra slicer output,
as shown in Fig. 8(a). Since the output data stream of the replica
path should be identical to that of the main path except the data
originating from the extra slicer, the two data streams can be
XORed to measure the voltage margin of the data slicer being
replaced.
Fig. 10(b) compares the measured voltage margin of various receiver architectures that can be configured by the described prototype chip: the RS-PRDFE receiver, loop-unrolling PRDFE receiver, ADC-based receiver with uniform resolution, and ADC-based receiver with reduced full-scale range (FSR). Since the effective SBR has one large postcursor ISI of 53 mV followed by many small ones [Fig. 10(a)], the possible ISI offsets and hence the optimal slicer thresholds are clustered around the +53 mV and -53 mV levels. Such uneven distribution of the required ADC thresholds is difficult to realize with a low-resolution uniform ADC and, as a result, the uniform ADC receiver performs the worst in Fig. 10(b). Reducing the FSR of the ADC improves the voltage margin somewhat. The RS-PRDFE receiver always outperforms both the reduced-FSR ADC and PRDFE receivers for the same number of slicers used. In particular, the 4-slicer RS-PRDFE achieves the equivalent voltage margin of the 16-slicer (4-bit) uniformly spaced ADC.
The extra slicer performing the pseudo-BER detection is triggered by an independently adjustable clock phase and the effective eye-diagram seen by the DFE receiver can be obtained by measuring the BERs as a function of voltage and timing offsets. Fig. 11 plots such effective eye diagrams measured for the receiver configurations mentioned earlier with 4 slicers. As expected, the RS-PRDFE receiver achieves the widest eye opening.

Fig. 8. (a) Block diagram of the prototype RS-PRDFE receiver with a voltage margin detection circuit. (b) Chip photograph. The receiver contains an analog frontend, a 16-slicer flash ADC with adjustable reference, and a 5-tap digital DFE. Total active area is 0.26 mm².

Fig. 9. The measured (a) frequency response and (b) 10 Gb/s single-bit response of a 25-inch Nelco backplane channel.

Fig. 10. (a) The measured single-bit response of a 24-inch Nelco backplane channel with prefiltering. (b) Measured voltage margin versus the number of slicers (M) for different ADC-based receiver configurations.

Fig. 11. Measured eye diagram of the 4-slicer ADC receivers: (a) the partial
response eye diagrams of the individual 4 slicers; the effective eye diagrams of
(b) RS-PRDFE, (c) reduced-FSR ADC, and (d) uniform quantization ADC.

Fig. 12. (a) The measured single-bit response with a different prefiltering setting. (b) Measured voltage margin versus the number of slicers (M) for different ADC-based receiver configurations.

It should be noted that the performance benefits of an RS-PRDFE receiver strongly depend on the SBR characteristics. For example, Fig. 12 compares the receiver voltage margins for a different prefiltering setting. For the effective SBR shown in Fig. 12(a), the RS-PRDFE continues to outperform the uniform ADC and reduced-FSR ADC receivers, but it has no performance gain over the PRDFE receiver. This is because the SBR has 5 postcursor ISIs with very distinct values while the previous SBR in Fig. 10(a) has 3 ISIs at 5 mV. The latter distribution of ISIs results in some overlaps among the possible ISI offsets, allowing the RS-PRDFE to save slicers for the same BER performance. For instance, if all the postcursor ISIs had equal magnitudes, the required number of slicers would grow only linearly with the number of DFE taps, rather than exponentially as in a PRDFE receiver.
The reduction in the required slicer count leads to savings in power. The measured power dissipation of this prototype receiver including the analog frontend was 130 mW (13 pJ/bit) with 16-slicer RS-PRDFE and only 106 mW (10.6 pJ/bit) with 8-slicer RS-PRDFE configuration.
This observation suggests that it might be possible to further improve the performance of the RS-PRDFE receivers by shaping the effective SBRs in a certain way with linear equalizers. The next section investigates this possibility of optimizing the RS-PRDFE receiver jointly with linear equalizers, such as transmit and receive equalizers.

Fig. 13. Different approaches to reduce the RS-PRDFE slicer count with linear equalizers: (a) suppress the far-end ISIs to zeros, (b) suppress any ISIs within the DFE tap range, (c) make the ISIs specific values to force overlaps in ISI offsets.

V. JOINT OPTIMIZATION WITH LINEAR EQUALIZERS
Combining the described RS-PRDFE receiver with linear equalizers presents interesting new opportunities to further
reduce the slicer count. With linear equalizers either on the
transmitter side or on the receiver side, one can change the
effective channel characteristics to some degree. Then the
question is: how should one shape the channel to achieve the
lowest BER with minimal hardware? A typical answer for
conventional DFE receivers is to minimize the ISIs outside the
DFE tap range, since the number of decision slicers does not
depend on the ISI offset values being subtracted.
In an RS-PRDFE receiver, on the other hand, the number
of required slicers does depend on the distribution of the ISI
offsets created by the N prior bits within the DFE tap range.
As described in Section III, RS-PRDFE receivers leverage the
fact that some ISI offsets can be close to one another and share
a common slicer to reduce the hardware cost. Therefore, it is
also possible to reduce the slicer count not only by making the
channel cleaner (i.e., suppressing more ISIs to zero) but also by
creating deliberate overlaps among the ISI offsets. Fig. 13 illustrates this point. Suppose a channel with three postcursor ISIs.
An RS-PRDFE receiver would need 8 slicers if their ISI offsets
are not sufficiently close to each other. For traditional PRDFE
receivers, the slicer count can be reduced only if the last ISI is
suppressed to zero [Fig. 13(a)]. However, with RS-PRDFE receivers, the slicer count can also be reduced if any ISIs within
the DFE tap range become zero [Fig. 13(b)]. In this case, the
RS-PRDFE receiver acts as a roving-tap DFE whose nonzero
tap positions can freely move within the range. In addition, further reduction in the slicer count can be achieved if the first
and second ISIs become equal values, for example [Fig. 13(c)].
Since this channel generates only 3 unique ISI offsets, 3 slicers
can cancel all the possible ISI offsets.
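
The slicer counts in these scenarios can be checked by counting the distinct ISI offsets that a set of postcursor taps can produce. The sketch below is illustrative only: the tap values (in mV) are made up, and the case with two equal taps assumes the third postcursor is zero so that exactly three offsets remain.

```python
from itertools import product

def unique_isi_offsets(taps, tol=1e-9):
    """Distinct ISI offsets over all +1/-1 prior-bit histories.

    Offsets closer than `tol` are treated as one and can share a slicer.
    """
    offsets = sorted(sum(c * b for c, b in zip(taps, h))
                     for h in product((-1, 1), repeat=len(taps)))
    unique = []
    for y in offsets:
        if not unique or y - unique[-1] > tol:
            unique.append(y)
    return unique

# Hypothetical tap sets in mV.
distinct = len(unique_isi_offsets([40, 20, 10]))     # three distinct taps
far_zero = len(unique_isi_offsets([40, 20, 0]))      # last tap suppressed
mid_zero = len(unique_isi_offsets([40, 0, 10]))      # a middle tap suppressed
equal_two = len(unique_isi_offsets([30, 30, 0]))     # first two taps equal
equal_five = len(unique_isi_offsets([10] * 5))       # five equal taps
```

Three distinct taps produce eight offsets, zeroing any one tap halves that to four, and forcing two taps to be equal leaves three. Five equal taps give six offsets, i.e., a count that grows linearly with the tap count.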
Which option is the best? The answer depends on the cost of the linear equalizer shaping the channel for each option listed in Fig. 13. Different types of equalizers such as transmit FIR equalizer, receive FIR equalizer, and receive continuous-time equalizer have different costs and will be examined. Since the cost also depends on the amount of change the linear equalizer should effect, the answer strongly depends on the channel loss and dispersion characteristics.

Fig. 14. An example circuit implementation of transmit FIR equalizer.

A. RS-PRDFE With Transmit FIR Equalizers
Transmit FIR filters are widely used in today's chip-to-chip
and backplane transceivers. One reason is that their high-speed
operation can be easily achieved with low-resolution, digital-to-analog-converter-like circuits with nonuniform quantization steps. Fig. 14 shows an example of a transmit linear
equalizer.
The main drawback of transmit equalizers is that their peak
output swing is constrained. That is, the peak output voltage or
current must not exceed given limits, typically set by the circuit
topology or the operation conditions (e.g., supply voltage). If
the transmit equalizer has an impulse response of $w_i$, its peak
output is proportional to the sum of all absolute values of the $w_i$s
and should be limited to a certain value:

$$\sum_i |w_i| \le w_{\max} \qquad (6)$$

The above (6) implies that every nonzero tap coefficient in the
transmit FIR equalizer leads to the reduction in the main signal
swing $w_0$. In other words, the transmit equalizer may significantly degrade signal margins if it tries to alter the channel response too much.
Due to this peak swing constraint, for many channels encountered in chip-to-chip and backplane applications, the
optimal configurations for transmit linear equalizer combined
with RS-PRDFE are generally the ones that use the transmit
LE to suppress the far-end ISIs first. The far-end ISIs refer
to the postcursor ISIs that are positioned far from the main
cursor. Since those ISIs are usually smaller in magnitudes than
the near-end ones (the postcursor ISIs that are nearer to the
main cursor), using the transmit FIR equalizer to suppress them
results in less degradation in the signal margin. On the other
hand, changing the larger ISIs in order to create overlaps among
the ISI offsets may require large tap coefficients and reduce the
main signal swing significantly.
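
The cost of the peak swing constraint can be made concrete with a small sketch. It assumes the transmitter always uses its full swing, so the relative tap weights are rescaled until the sum of their magnitudes equals the limit; the function name, the `peak_limit` parameter, and the tap values are hypothetical.

```python
def normalize_taps(relative_taps, peak_limit=1.0):
    """Scale FIR tap weights so that sum(|w_i|) equals peak_limit.

    relative_taps[0] is taken as the main cursor (an assumption of this
    sketch); the other entries are equalizing taps relative to it.
    """
    total = sum(abs(w) for w in relative_taps)
    if total == 0:
        raise ValueError("at least one tap must be nonzero")
    scale = peak_limit / total
    return [w * scale for w in relative_taps]

# Hypothetical settings: a small far-end correction versus a large
# near-end correction, both relative to a unit main cursor.
mild = normalize_taps([1.0, 0.0, -0.25])
strong = normalize_taps([1.0, -0.5, 0.0])
```

Here the mild setting keeps a main swing of 0.8 of the limit, while the strong setting keeps only about 0.67, showing how larger equalizing taps eat into the main signal swing.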
To explore such trade-offs in combining RS-PRDFE with various types of linear equalizers, a few representative backplane channels were chosen. The measured S-parameter characteristics and single-bit responses (SBR) of those channels are shown in Figs. 15 and 16, respectively. The first channel, a 3-inch-long trace on an FR4 backplane, has a low loss of 15 dB at 5 GHz but strong reflections, while the second channel, a 10.6-inch-long trace on an Elma ATCA Dual Star backplane, has higher dispersion loss with the total loss being 20 dB at 5 GHz.

Fig. 15. Measured S-parameters (S21) of two example backplane channels. (a) A 3-inch-long trace on an FR4 backplane. (b) A 10.6-inch-long trace on Elma ATCA Dual Star backplane. The measurement includes the characteristics of the connectors, line card traces, and the 5-foot-long, low-loss SMA cables.

Fig. 16. The measured 10-Gb/s single-bit responses for the two channels.

The effects of a linear equalizer were emulated by convolving
the measured 10-Gb/s PRBS time-domain waveforms
seen at the channel output with the impulse response of the
linear equalizer in MATLAB. The time-domain waveforms
were collected with a sampling oscilloscope with pattern-lock
capability (Agilent DCA-J 86100C). The time and voltage
resolutions were 6.25 ps (32 points per unit interval) and 1 mV,
respectively. Since the measured waveforms include noise,
this procedure can also predict the noise enhancement effects of
certain linear equalizers.
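As a rough illustration of this emulation step, the sketch below applies an FIR equalizer to an oversampled waveform by direct convolution. It is written in Python rather than MATLAB, and it assumes the equalizer taps are spaced exactly one unit interval apart. The function name `apply_symbol_spaced_fir`, the `samples_per_ui` parameter, and the tap values are hypothetical.

```python
def apply_symbol_spaced_fir(waveform, taps, samples_per_ui):
    """Convolve an oversampled waveform with a symbol-spaced FIR response.

    waveform       : list of voltage samples (signal plus measured noise)
    taps           : FIR tap weights, one per unit interval (assumed spacing)
    samples_per_ui : number of waveform samples per unit interval
    """
    out = [0.0] * len(waveform)
    for k, weight in enumerate(taps):
        delay = k * samples_per_ui  # tap k acts on the input k UIs earlier
        for n in range(delay, len(waveform)):
            out[n] += weight * waveform[n - delay]
    return out


if __name__ == "__main__":
    # A single impulse shows the equalizer response directly.
    x = [0.0] * 12
    x[1] = 1.0
    print(apply_symbol_spaced_fir(x, [1.0, -0.25], samples_per_ui=4))
```

Because the same filter acts on whatever noise is present in the input samples, the output of such a sketch carries the filtered noise along with the filtered signal.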
The signal quality seen by the RS-PRDFE receiver can be
visualized by constructing an effective composite eye diagram,
as illustrated in Fig. 17. The eye diagrams for the individual
slicers were first composed by accumulating the input traces
only when the corresponding slicer output was selected as the
received bit. Then, these individual eye diagrams were folded
into one after shifting each so that its decision threshold sits at zero.


Fig. 17. The construction of the effective eye diagram for an RS-PRDFE receiver. The eye diagrams seen by the individual decision slicers (left) are folded
onto a single eye diagram, after being adjusted for their different slicer threshold
levels (right).

Fig. 18. The benefits of combining transmit FIR equalizers with RS-PRDFE.
(a) Equalized eye diagrams by the optimal transmit FIR filters. (b) Effective eye
diagrams seen by the optimal 4-slicer RS-PRDFE receivers.

Fig. 19. The simulated signal margins versus the number of slicers (M) of the
RS-PRDFE receivers with and without the transmit FIR equalizers. (a) 3" FR4
channel. (b) 10.6" ATCA channel.

The opening in the resulting effective eye diagram indicates the
voltage and timing margins of the RS-PRDFE receiver.
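The folding step can be sketched as follows. This is a simplified illustration that works only on the voltage sampled at the decision instant, whereas the paper folds complete eye traces and so also obtains a timing margin. The function names, the threshold values, and the sample data are hypothetical.

```python
def fold_effective_eye(samples, selected_slicer, thresholds):
    """Shift each sample by the threshold of the slicer that decided it."""
    return [v - thresholds[s] for v, s in zip(samples, selected_slicer)]


def voltage_margin(folded, bits):
    """Smallest distance from zero, measured toward the correct side."""
    return min((v if b else -v) for v, b in zip(folded, bits))


if __name__ == "__main__":
    thresholds = [0.1, -0.1]             # hypothetical slicer thresholds (V)
    samples = [0.30, -0.05, 0.12, -0.28]  # sampled input voltages (V)
    selected = [0, 0, 1, 1]               # slicer chosen for each bit
    bits = [1, 0, 1, 0]                   # received bits
    folded = fold_effective_eye(samples, selected, thresholds)
    print(folded, voltage_margin(folded, bits))
```

A negative margin in this sketch would mean that at least one sample fell on the wrong side of its selected threshold.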
Fig. 18 shows the eye diagrams after the jointly optimized
transmit equalizer and the effective eye diagrams seen by the
RS-PRDFE receiver with 4 effective taps and 4 slicers (M = 4)
for the two example channels described. For both
channels, the optimal transmit equalizers suppress mainly the
far-end ISIs and widen the eye openings. The benefits of combining the transmit equalizer with RS-PRDFE are also shown
in Fig. 19, which plots the simulated signal margins of the optimal 4-tap RS-PRDFE receiver with different numbers of slicers.
For both channel examples, the RS-PRDFE receiver with only
4 slicers can achieve signal margins comparable to those of a
4-tap PRDFE receiver that would require 16 slicers.
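The slicer counts quoted here follow from enumerating the ISI offsets of a loop-unrolled DFE: with 4 binary postcursor taps there are 2^4 = 16 possible offsets, and suppressing the far-end ISIs makes many of them coincide. The sketch below is a simplified illustration that merges offsets only when they coincide within a tolerance; it is not the threshold optimization procedure of the paper, and the function names and postcursor values are hypothetical.

```python
from itertools import product


def isi_offsets(postcursors):
    """Return every ISI offset for all +/-1 combinations of previous bits."""
    return sorted(
        sum(s * h for s, h in zip(signs, postcursors))
        for signs in product((-1, 1), repeat=len(postcursors))
    )


def merge_offsets(offsets, tol=1e-9):
    """Collapse offsets that coincide within tol into a single threshold."""
    merged = []
    for off in sorted(offsets):
        if not merged or off - merged[-1] > tol:
            merged.append(off)
    return merged


if __name__ == "__main__":
    full = isi_offsets([0.2, 0.1, 0.05, 0.02])    # all four ISIs present
    reduced = isi_offsets([0.2, 0.1, 0.0, 0.0])   # far-end ISIs suppressed
    print(len(merge_offsets(full)), len(merge_offsets(reduced)))
```

With the third and fourth postcursors set to zero, the 16 offsets in this sketch collapse to 4 distinct values, one per remaining slicer.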

B. RS-PRDFE With Receive Linear Equalizers
LEs on the receiver side, on the other hand, are not subject
to the peak swing constraint that limits the transmit equalizers.
The difference is that the equalization is applied after the signal
has been attenuated by the channel rather than before. However,
boosting the high-frequency content of the received signal can
enhance noise, degrading the SNR and the BER.
At the moment, one of the most widely used receive equalizer
types is the continuous-time linear equalizer (CTLE), an
example circuit of which is shown in Fig. 20(a). This circuit realizes a
high-pass filter by enhancing the transconductance of the input
stage at high frequencies. However, most CTLEs used in practice are of low order, lacking enough degrees of freedom to
shape the channel responses into the ones in Fig. 13(b) or (c). Instead, their high-pass characteristics are best utilized for improving the channel bandwidth and thus suppressing the far-end
ISIs first. The remaining near-end ISIs are handled by the DFE.
It is also possible to implement an FIR equalizer on the receiver side, an example of which is shown in Fig. 20(b) [1]. This
circuit looks similar to the transmit equalizer circuit in Fig. 14
except that the input to each stage is a discrete-time sampled


Fig. 20. Example implementations of receive linear equalizers. (a) Continuous-time linear equalizer (CTLE). (b) Receive FIR equalizer.

Fig. 21. The benefits of combining receive FIR equalizers with RS-PRDFE.
(a) Equalized eye diagrams by the optimal receive FIR filters. (b) Effective eye
diagrams seen by the optimal 4-slicer RS-PRDFE receivers.

analog input instead of a full-swing digital one. Unlike CTLEs,
this discrete-time FIR filter has the ability to individually adjust
the tap coefficients. For example, the first postcursor tap of the
circuit in Fig. 20(b) is adjusted by the corresponding current amplitude.
Since a receive FIR equalizer can adjust the individual tap
weights without being subject to a peak swing constraint, it
can explore the opportunities illustrated in Fig. 13(b) or (c).
Fig. 21 shows the eye diagrams after the jointly optimized receive FIR equalizers and their effective eye diagrams seen by
the RS-PRDFE receiver. For the FR4 channel, the optimal receive equalizer suppresses the far-end ISIs first, making the third
and fourth ISIs zero values. However, for the ATCA channel,
it is interesting to note that the FIR equalizer suppresses the
second and third postcursor ISIs, leaving the fourth ISI positive.
Nonetheless, for both cases, Fig. 22 shows that the RS-PRDFE
can effectively cancel the remaining ISI with only
4 slicers, and the achieved signal margins are superior to those
of the 4-slicer PRDFE receivers.
Fig. 22. The simulated signal margins versus the number of slicers (M) of the
RS-PRDFE receivers with and without the receive FIR equalizers. (a) 3" FR4
channel. (b) 10.6" ATCA channel.

C. Digital Versus Analog Receive Linear Equalizers
In the discussions so far, the receive FIR equalizer showed
the highest flexibility in reducing the required slicer count in
an RS-PRDFE receiver. However, implementing an analog FIR
filter with wide signal bandwidth may incur large power consumption [1]. Therefore, one may be interested in the potential
benefits of implementing the receive FIR equalizer in the digital domain, after the received signal is quantized by an ADC.
Analysis shows that implementing the receive FIR equalizer
in the digital domain does more harm than good unless the ADC
resolution is sufficiently high. This is evidenced by the simulation results shown in Fig. 23, which plots the signal margins
of the optimal RS-PRDFE receivers combined with analog
and digital receive FIR equalizers. In both cases, for slicer
counts less than 16 (i.e., less than 4 bits of ADC resolution),
an RS-PRDFE with an analog FIR equalizer outperforms one
with a digital equalizer. This is because the RS-PRDFE and
digital FIR equalizers have conflicting requirements on the
ADC threshold placements. The former needs placement for
minimum threshold error [Fig. 6(a)] while the latter needs
placement for minimum signal quantization error [Fig. 6(b)].
This gap is more pronounced at lower resolution ranges.
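The conflict between the two placements can be made concrete with a small numerical sketch. It assumes that uniformly spaced thresholds stand in for the placement that minimizes signal quantization error, and that the DFE needs its thresholds at a given set of decision levels. The function names, the full-scale range, and the decision levels are hypothetical.

```python
def uniform_thresholds(num_slicers, full_scale=1.0):
    """Place num_slicers thresholds evenly across [-full_scale, +full_scale]."""
    step = 2.0 * full_scale / (num_slicers + 1)
    return [-full_scale + (i + 1) * step for i in range(num_slicers)]


def worst_threshold_error(required_levels, thresholds):
    """Largest distance from a required decision level to its nearest threshold."""
    return max(min(abs(r - t) for t in thresholds) for r in required_levels)


if __name__ == "__main__":
    required = [-0.3, -0.1, 0.1, 0.3]  # hypothetical DFE decision levels (V)
    uniform = uniform_thresholds(4)    # roughly -0.6, -0.2, 0.2, 0.6
    print(worst_threshold_error(required, uniform))
    print(worst_threshold_error(required, required))
```

In this sketch the same four slicers give zero threshold error when placed at the required decision levels, but a nonzero error when spaced uniformly, and that error eats directly into the voltage margin.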

Therefore, a practical way of implementing a high-performance receiver with a low-resolution ADC is to combine the
proposed RS-PRDFE with an analog-type linear equalizer.

Fig. 23. Performance comparison of the analog and digital receive FIR equalizers when combined with RS-PRDFE. (a) 3" FR4 channel. (b) 10.6" ATCA
channel.

VI. CONCLUSION
This paper introduced a way of designing high-performance
equalizing receivers with low-resolution ADCs. The quantization thresholds of the ADC may have to be individually adjusted and optimized for the best signal margins rather than for
the least quantization errors. The described RS-PRDFE receiver
with only 4 slicers demonstrated performance equivalent to that of a
receiver with a 3–4-bit uniformly quantizing ADC. It also showed
some synergistic effects of combining RS-PRDFE with LEs, especially with the receive FIR equalizers. It was shown to be
preferable to leave the LEs in the analog domain, since the DFE
and LE have conflicting requirements on the ADC quantization
thresholds.

ACKNOWLEDGMENT
The authors would like to thank Dr. Ravi Kollipara and My
Nguyen for measuring the response characteristics of various
backplane channels used in this work.

REFERENCES

[1] C.-K. K. Yang and E.-H. Chen, "ADC-based serial I/O receivers," in
Proc. Custom Integr. Circuits Conf., Sep. 2009, pp. 323–330.
[2] H. Chung and G.-Y. Wei, "Design-space exploration of backplane
receivers with high-speed ADCs and digital equalization," in Proc.
Custom Integr. Circuits Conf., Sep. 2009, pp. 555–558.
[3] M. Harwood et al., "A 12.5 Gb/s SerDes in 65 nm CMOS using a
baud-rate ADC with digital receiver equalization and clock recovery,"
in ISSCC Dig. Tech. Papers, Feb. 2007, pp. 436–437.
[4] O. E. Agazzi et al., "A 90 nm CMOS DSP MLSD transceiver with
integrated AFE for electronic dispersion compensation of multimode
optical fibers at 10 Gb/s," J. Solid-State Circuits, pp. 2939–2957, Dec.
2008.
[5] P. Schvan et al., "A 24 GS/s 6 b ADC in 90 nm CMOS," in ISSCC Dig.
Tech. Papers, Feb. 2008, pp. 544–545.
[6] Y. M. Greshishchev et al., "A 40 GS/s 6 b ADC in 65 nm CMOS," in
ISSCC Dig. Tech. Papers, Feb. 2010, pp. 390–391.
[7] J. Cao et al., "A 500-mW ADC-based CMOS AFE with digital calibration for 10 Gb/s serial links over KR-backplane and multimode fiber,"
J. Solid-State Circuits, pp. 1172–1185, Jun. 2010.
[8] B. Murmann, "A/D converter trends: Power dissipation, scaling and
digitally assisted architectures," in Proc. Custom Integr. Circuits Conf.,
Sep. 2008, pp. 105–112.
[9] S. Kasturia and H. J. Winters, "Techniques for high-speed implementation of non-linear cancellation," IEEE J. Sel. Areas Commun., vol. 9,
no. 5, pp. 711–717, Jun. 1991.
[10] Y.-S. Sohn et al., "A 2.2-Gbps CMOS look-ahead DFE receiver for
multidrop channel with pin-to-pin time skew compensation," in Proc.
Custom Integr. Circuits Conf., Sep. 2003, pp. 473–476.
[11] V. Stojanovic et al., "Adaptive equalization and data recovery in a dual-mode (PAM2/4) serial link transceiver," in Proc. VLSI Circuits Symp.,
Jun. 2004, pp. 348–351.
[12] E.-H. Chen et al., "Adaptation of CDR and full scale range of ADC-based Serdes receiver," in Proc. VLSI Circuits Symp., Jun. 2009, pp.
12–13.
[13] D. Oh et al., "Accurate system voltage and timing margin simulation
in high-speed I/O system designs," IEEE Trans. Adv. Packag., vol. 31,
no. 4, pp. 722–730, Nov. 2008.
[14] E.-H. Chen et al., "10 Gb/s serial I/O receiver based on variable reference ADC," in Proc. VLSI Circuits Symp., 2011, in review.
[15] J. Kim et al., "Equalizer design and performance trade-offs in ADC-based serial links," in Proc. Custom Integr. Circuits Conf., Sep. 2010,
pp. 1–8.
[16] C. Donovan and M. P. Flynn, "A digital 6-bit ADC in 0.25-μm
CMOS," IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 432–437,
Mar. 2002.

Jaeha Kim (S'94–M'03–SM'10) received the B.S.
degree in electrical engineering from Seoul National
University (SNU), Seoul, Korea, in 1997, and
received the M.S. and Ph.D. degrees in electrical
engineering from Stanford University, Stanford, CA,
in 1999 and 2003, respectively.
From 2001 to 2003, he was with True Circuits,
Inc., Los Altos, CA as Circuit Designer; with
Inter-university Semiconductor Research Center
(ISRC), SNU, as Postdoctoral Researcher from
2003 to 2006; with Rambus, Inc., Los Altos, CA
as Principal Engineer from 2006 to 2009; and with Stanford University, CA
as Acting Assistant Professor from 2009 to 2010. He is currently Assistant
Professor at SNU and his research interests include low-power mixed-signal
systems and their design methodologies.
Prof. Kim is a recipient of the Takuo Sugano award for outstanding far-east
paper at 2005 International Solid-State Circuits Conference (ISSCC) and the
Low Power Design Contest Award at 2001 International Symposium on Low
Power Electronics and Design (ISLPED). He served on the technical program
committees of Design Automation Conference (DAC), International Conference
on Computer Aided Design (ICCAD), and Asian Solid-State Circuit Conference
(A-SSCC).

E-Hung Chen (S'05–M'06) was born in Taipei,
Taiwan. He received the B.S. degree in electrical
engineering from National Taiwan University and
the M.S. degree in electrical engineering from the
University of California, Los Angeles (UCLA),
respectively. He is currently working toward the
Ph.D. degree at UCLA.
Since 2005, he has been with Broadcom, Rambus,
and Texas Instruments as a summer intern working
on channel equalization techniques and receiver modeling. His research interests are high-speed serial link
design and its adaptation.

Patrick Satarzadeh (S'04–M'09) received the B.S.,
M.S., and Ph.D. degrees in electrical and computer
engineering from the University of California, Davis,
in 2004, 2006, and 2009, respectively.
He has held internships with MIT Lincoln Lab,
Lexington, MA, and Rambus, Inc., Sunnyvale, CA.
In 2009, he joined Texas Instruments Inc., Dallas,
TX, as a Member of the Technical Staff, where he
has since worked on continuous-time delta-sigma
data converters. His research interests include signal
processing and mixed-signal circuit design.

Jihong Ren (S'03–M'06) received the Ph.D. degree
in computer science from the University of British
Columbia, Vancouver, Canada, in 2006, where she
worked on optimal equalization for chip-to-chip
high-speed buses.
She has been with Rambus, Sunnyvale, CA, since
January 2006, where she has worked on equalization
algorithms and link performance analysis.

Brian Leibowitz (S'97–M'05) was born in New
Jersey in 1976. He received the B.Sc. degree in
electrical engineering from Columbia University,
New York, in 1998 and the Ph.D. degree in electrical
engineering and computer science from the University of California, Berkeley, in 2004, where his
doctoral research included the development of a fully
integrated CMOS imaging receiver for free-space
optical communication.
Since 2004 he has been with Rambus, Inc., Sunnyvale, CA, where he has worked on equalization and
mixed-signal circuit design for a variety of high-speed and low power serial
links and memory interfaces.
Dr. Leibowitz received the Edwin H. Armstrong Award from Columbia University. His graduate studies at Berkeley were supported by a fellowship from
the Fannie and John Hertz Foundation.

Jared Zerbe (M'90–SM'10) was born in New York
City in 1965. He received the B.S. degree in electrical
engineering from Stanford University, Stanford, CA,
in 1987.
From 1987 to 1992, he worked in circuit design at
VLSI Technology and MIPS Computer Systems. In
1992 he joined Rambus Inc., Sunnyvale, CA, where
he has since specialized in the design of high-speed
I/O, PLL/DLL clock recovery, and equalization
and data-synchronization circuits. He has authored
or coauthored over 30 papers and over 50 patents
in the area of high-speed signaling and clocking and has taught courses at
Berkeley and Stanford in high-speed serial link design. He is currently a
Technical Director at Rambus where he is focused on development of future
high-performance and low-power signaling technologies.

Chih-Kong Ken Yang (S'94–M'98–SM'07–F'10)
was born in Taipei, Taiwan. He received the B.S. and
M.S. degrees in 1992 and the Ph.D. degree in 1998,
all in electrical engineering, from Stanford University,
Stanford, CA.
He joined the University of California, Los Angeles, as an Assistant Professor in 1999 and has been
a Professor since 2009. His current research area
is high-performance mixed-mode circuit design for
VLSI systems such as clock generation, high-performance signaling, low-power digital functional blocks, and analog-to-digital
conversion.
