Dissertation Report
on
High Precision Duty Cycle Correction
Circuit Design for 5.2GHz Speed I/O
Applications
Submitted to the Department of Electronics Engineering
in Partial Fulfillment of the Requirements for the Degree of
Master of Technology
(VLSI & Embedded Systems)
by
Upadhyay Bhargav S
(P13VL003)
Guided by
Dr. A. D. Darji
Assistant Professor, ECED
&
CERTIFICATE
This is to certify that Upadhyay Bhargav S (Adm. No. P13VL003), a full-time
M.Tech student, has presented his Dissertation Preliminary Report on "High Precision
Duty Cycle Correction Circuit Design for 5.2GHz Speed I/O Applications" in partial
fulfillment of the requirement for the award of the degree Master of Technology in
Electronics Engineering with specialization in VLSI & Embedded Systems during
the year 2014 - 15.
Bhargav Upadhyay
SVNIT, Surat
July, 2015
Abstract
As technology advances, high-speed data communication becomes increasingly important.
At the physical layer, a data communication system consists of a receiver (Rx) and a
transmitter (Tx). To achieve a high data rate (10 Gbps) between Rx and Tx, the data
must be sampled at that rate. For this purpose a half-rate clock data recovery (Rx) /
sampling (Tx) circuit is used, which samples data on both edges of the clock to double
the data rate. For circuits that operate on both clock edges, a 50% duty cycle is very
important: to sample the data correctly, the setup and hold margin specifications of the
sampling flop must be met. In present-day technology the setup and hold margin lies in
the range of 20-50 ps, including all manufacturing-process uncertainty. The application
for which this duty cycle correction (DCC) circuit is designed operates at 5.2 GHz
(period = 192.3 ps). At such a high frequency, any duty cycle degradation makes false
sampling very likely, which eventually degrades the performance of the overall data
communication system. To avoid this, the duty cycle must be corrected before the clock
is fed to the sampling circuit, so the DCC circuit plays a crucial role in the robust and
effective performance of the data communication system.
The main cause of degradation is the buffers used in the clock paths. A mismatch between
the drive strengths of the nMOS and pMOS transistors of a buffer changes the rising and
falling slopes of the clock signal, which eventually changes the duty cycle. This mismatch
is mainly due to manufacturing and process variations; temperature variation also
contributes to the degradation, but much less than process variation.
In this report, a DCC architecture is presented to correct the duty cycle of a 5.2 GHz
clock. The duty cycle adjuster and duty cycle corrector circuits are chosen for minimum
chip area and power consumption. Because the DCC is calibrated during start-up of the
application, the time required to correct the duty cycle is not a concern here. At such a
high operating frequency it is challenging to reach an accuracy of 0.1% (0.2 ps). The
circuit is expected to meet 0.1% accuracy and to support a 5% (10 ps) correction range
across 9 Process, Voltage and Temperature (PVT) variations.
Declaration
I declare that this written submission represents my ideas in my own words and where
others' ideas or words have been included, I have adequately cited and referenced the
original sources. I also declare that I have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea or data or
fact or source in my submission. I understand that any violation of the above will be
cause for disciplinary action by the Institute and can also evoke penal action from the
sources which have thus not been properly cited or from whom proper permission has
not been taken when needed.
Table of Contents
Acknowledgements
Abstract
Declaration
Table of Contents
List of Figures
List of Tables
Chapters
1 Introduction
  1.1 Importance of DCC
  1.2 Cause of Degradation
  1.3 Design Parameters
  1.4 Basic Block Diagram
  1.5 Classification
  1.6 Thesis Contribution
  1.7 Organization of thesis
2 Literature Review
  2.1 Digital Duty Cycle Corrector
    2.1.1 Basic Digital DCC Architecture
    2.1.2 Recent Architecture of Digital DCC
  2.2 Analog Duty Cycle Corrector
    2.2.1 Basic Architecture of Analog DCC
    2.2.2 Recent Architecture of Analog DCC
  2.3 Mixed Mode Duty Cycle Corrector
    2.3.1 Basic Architecture of Mixed Mode DCC
    2.3.2 Recent Architecture of Mixed Mode DCC
  2.4 Performance Comparison
  2.5 Architecture Summary
3 Proposed Architecture
  3.1 Duty Cycle Adjuster
    3.1.1 Coarse Correction Circuit
    3.1.2 Fine Correction Circuit
  3.2 Single to Differential
  3.3 Low pass filter
  3.4 Comparator
    3.4.1 Design of Dynamic Comparator Circuit
  3.5 Digital Feedback Mechanism
References
List of Figures
4.35 Corrected Duty Cycle After Pre-layout Close loop Correction Across PVT
4.36 Corrected Duty Cycle After Post-layout Close loop Correction Across PVT
List of Tables
Chapter 1
Introduction
As technology advances, high-speed data communication becomes increasingly important.
The receiver (Rx) and transmitter (Tx) form part of the physical layer of a data
communication system. To transfer data at a high data rate (10 Gbps), either a clock of
equal frequency is needed or the data must be sampled on both edges of the clock. It is
very difficult to generate a stable clock at frequencies of tens of GHz, so the practical
solution is to design hardware that samples data on both clock edges, giving a transfer
rate of twice the clock frequency. For this purpose a half-rate clock data recovery (in
Rx) / sampling (in Tx) circuit is used. Using both clock edges doubles the data rate, but
the duty cycle of the clock then becomes crucial, especially when the clock period is only
a few hundred picoseconds. The application for which this Duty Cycle Correction (DCC)
circuit is designed has a clock frequency of 5.2 GHz (clock period = 192.3 ps, data rate =
10.4 Gbps). In present-day technologies the uncertainty margin of flops (the time window
in which a flop may sample false data) varies from approximately 40 to 80 ps. If the duty
cycle is degraded, there is a very high chance that the data sampled by the half-rate data
sampling circuit is incorrect; this is explained in more detail in the next section. False
data sampling directly affects the performance of the data communication system, so the
design of the DCC circuit is crucial for robust and effective performance of high-speed
I/O applications. The design constraints of a DCC vary widely with the application. For
example, a half-rate Clock Data Recovery (CDR) application in high-speed serial I/O
(10 Gbps) requires a DCC with very high precision (10-20 ps), while the number of
correction cycles can be relaxed (the DCC settles during booting of the system). In
contrast, a DCC designed for a double-sampling ADC is expected to support a range of
frequencies with optimum hardware and power consumption (especially for ADCs used in
SoCs). Researchers are therefore working on DCC designs with (1) high correction
accuracy, (2) short correction time, (3) small chip area, (4) low power consumption,
(5) high operating frequency or wide frequency range, and (6) wide correction range.
50% duty cycle. It is clear from the figure that the setup and hold margins are the same
for both the on time and the off time of the clock. In Fig.1.3 the clock's duty cycle is
not 50%; in this case the setup and hold margins differ between the on time and the off
time. This becomes critical when the clock frequency is very high. Here the clock period
is 192.3 ps, so the half-cycle period is 96.15 ps; if the duty cycle is degraded by 10%
(to 40%), the on time is only 76.92 ps. The approximate uncertainty margin of flops lies
in the range of 40 to 80 ps. Thus, with a degraded duty cycle it is quite possible to read
false data due to a setup or hold violation.
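The numbers in this paragraph can be reproduced with a short Python sketch. The function
name is hypothetical, and reading a "10% degradation" as a 40% duty cycle is an assumption
chosen because it matches the 76.92 ps figure; the 5.2 GHz clock and the 40 to 80 ps
uncertainty margin come from the text.

```python
def half_cycle_widths(freq_ghz: float, duty_pct: float) -> tuple[float, float]:
    """Return (on_time_ps, off_time_ps) for a clock of the given frequency and duty cycle."""
    period_ps = 1000.0 / freq_ghz            # 5.2 GHz -> about 192.3 ps
    on_ps = period_ps * duty_pct / 100.0
    return on_ps, period_ps - on_ps

# Ideal 50% clock versus a clock degraded to 40% (assumed meaning of "degraded by 10%").
ideal_on, ideal_off = half_cycle_widths(5.2, 50.0)
bad_on, bad_off = half_cycle_widths(5.2, 40.0)

# Upper end of the flop uncertainty margin quoted in the text.
MARGIN_MAX_PS = 80.0
on_time_is_inside_margin = bad_on < MARGIN_MAX_PS

print(f"ideal on/off: {ideal_on:.2f} / {ideal_off:.2f} ps")
print(f"degraded on/off: {bad_on:.2f} / {bad_off:.2f} ps")
print("degraded on time shorter than 80 ps margin:", on_time_is_inside_margin)
```

With the degraded clock the shorter half cycle falls below the upper end of the quoted
uncertainty margin, which is the situation the paragraph warns about.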
1.3. Design Parameters
1. Operating Frequency
3. Correction Accuracy
4. Correction Range
Fig.1.4 shows how each parameter is interrelated with all the others and creates
trade-offs in the design.
1.5 Classification
DCCs can be classified into digital DCC, analog DCC and mixed-mode DCC. This
classification is based on the approach used to implement the DCC. As shown in Fig.1.6,
a digital DCC can be implemented with or without feedback. The without-feedback
mechanism corrects the duty cycle in a very short correction time, but it provides no
correction against PVT variation and its correction accuracy is lower than that of the
feedback mechanism. Each approach is discussed in detail in Chapter 2, together with a
performance comparison.
1.6. Thesis Contribution
1. Technology : 28nm
2. Simulator : Spectre
4. Correction Range: 50 ± 5 %
6. Correction Time: 5 µs
The correction range parameter for the design is calculated from (i) the possible
degradation due to process variation and (ii) the possible degradation due to
manufacturing variation; the sum of the two gives the required correction range for the DCC.
Chapter 2
Literature Review
As shown in section 1.5, DCCs can be classified into digital DCC, analog DCC and
mixed-mode DCC. In this chapter various architectures available for these approaches are
presented, followed by a performance comparison. In each section a very basic
architecture is discussed first to explain the basic working mechanism, followed by the
latest architectures. The chapter is divided into sections based on this classification.
Most digital DCCs [1] [7] [2] use the positive edge and the clock period of the input
clock to generate an output clock with 50% duty cycle. For this approach a controlled
delay line is used as the duty cycle adjuster (DCA), and a phase or frequency detector is
mainly used as the duty cycle detector. Some circuits [2] use a Time-to-Digital Converter
(TDC) for fast locking. The latest proposed architectures [2] [7] also have a delay line
with coarse and fine tuning, which helps to achieve higher accuracy with less chip area
and hardware cost. In a digital DCC the output clock is phase aligned with the input
clock [1] [7] [2] [8], so digital DCCs are more suitable for applications that require
multiphase clocks [8]. A digital DCC stores the correction code in registers, so it can
hold the correction information when a DRAM is operated in power-down mode, whereas in an
analog DCC the feedback voltage to the DCA is stored on a capacitor and cannot be
maintained during power-down mode.
A digital DCC uses a Finite State Machine (FSM) in the feedback path to update and store
the correction code. For applications that support a power-down mode (no input clock
while the circuit is not in use), a digital DCC is therefore more suitable than an analog
DCC, in which the feedback signal is usually a voltage stored on a capacitor. When the
circuit restarts from power-down mode, a digital DCC does not need to repeat the
correction, whereas in an analog DCC the feedback voltage changes as the capacitor
discharges, so the correction must be done again.
Fig. 2.1 shows the basic architecture of digital DCC. This is the most basic architecture
without any additional features like Fast locking, Fine tuning.
2.1. Digital Duty Cycle Corrector
Working Mechanism
From Fig.2.1 and 2.2 it can be seen that the output of the clock generator block
(clock C) goes high at the rising edge of the input clock (clock A) and returns to logic 0
at the rising edge of clock B, which is a delayed version of clock A. Thus both the rising
and the falling edge of the output clock are controlled by the rising edge of the input
clock. By setting the delay of the delay line to half of the clock period, the output
clock duty cycle is set to 50%. The duty cycle detector detects the duty cycle of the
output clock. The delay line consists of five inverter stages with 4-bit binary-weighted
NMOS capacitors placed at the output of each inverter. The delay of each stage is
controlled by a 4-bit control signal coming from the duty cycle detector. The correction
range supported by the circuit is limited by the minimum pulse width that can pass
through the delay line.
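As a rough illustration of this mechanism, the sketch below models the output duty cycle
as the ratio of line delay to clock period and searches the 4-bit code that brings it
closest to 50%. It is a behavioural sketch only: the function names, the base delay of
68 ps per stage and the 4 ps added per code step are hypothetical values chosen for the
example, not figures from [1], and the model assumes the line delay is shorter than one
clock period.

```python
def line_delay_ps(code, stages=5, base_ps=68.0, step_ps=4.0):
    """Delay of a hypothetical 5-stage line; every stage adds step_ps per code LSB."""
    return stages * (base_ps + code * step_ps)

def output_duty_pct(delay_ps, period_ps):
    """Output rises on the input rising edge and falls one line delay later."""
    return 100.0 * delay_ps / period_ps

def best_code(period_ps, bits=4):
    """Pick the control code whose output duty cycle is closest to 50%."""
    return min(
        range(2 ** bits),
        key=lambda c: abs(output_duty_pct(line_delay_ps(c), period_ps) - 50.0),
    )

period_ps = 1000.0                      # 1 GHz input clock (illustrative)
code = best_code(period_ps)
duty = output_duty_pct(line_delay_ps(code), period_ps)
print(code, duty)
```

Because only the rising edge of the input clock is used, the input duty cycle does not
appear anywhere in the model; the output duty cycle depends only on the line delay.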
1. Use of a delay-recycled architecture, which reduces the required length of the delay
line to (0.5 * clock period); at the same operating frequency, power consumption and
chip area are therefore lower than in the conventional circuit.
2. A delay line with coarse and fine tuning for higher accuracy; the tuning range is set
so that the delay increases monotonically with the control code, which reduces jitter on
the output clock.
3. A delay line architecture with balanced rise and fall times, which gives it a high
tolerance to unbalanced process variations (SF and FS corners).
Fig.2.3 shows the block diagram of the latest digital DCC. It is composed of a pulse
generator (PG), a half-cycle delay line (HCDL), a phase and frequency detector (PFD), an
ADDCC controller, a Time-to-Digital Converter (TDC) encoder and a D flip-flop. The HCDL
is composed of a 5-bit fine-tuning delay line (FDL) and a 4-bit TDC-embedded
coarse-tuning delay line (CDL). The total correction range covered by the FDL is always
equal to one coarse-tuning delay step under all PVT variations. The delay line therefore
always gives a monotonic response between the delay line control code (ctrl_code[8:0])
and the output delay.
Working Mechanism
Fig.2.4 shows the overall timing diagram of the proposed architecture. The working of the
architecture is described in the steps below.
1. After the circuit is reset, the PG generates short pulses from the input clock
(CLK_IN), and the coarse-tuning control code (ctrl_code[8:5]) of the HCDL is set to its
maximum value (i.e. 4'b1111) in this cycle. Subsequently, the short pulses propagate
through the HCDL.
2. At the next rising edge of the input clock (CLK_IN), the TDC captures the propagated
pulse signals and stores them as tdc_data[15:0].
3. The signal is_half_cycle is used to determine whether the period of the input clock
(CLK_IN) is larger than the maximum delay time of the HCDL.
4. The TDC encoder searches for the bit location of the first 1 in tdc_data[15:0], from
the most significant bit to the least significant bit.
5. The TDC encoder outputs the initial delay control code (tdc_code[3:0]) to the ADDCC
controller to achieve fast lock-in.
6. After setting the initial control code, the ADDCC increases or decreases the delay
line control code (ctrl_code[8:0]) according to the PFD's output until the output clock
(CLK_OUT) is phase aligned with the input clock (CLK_IN).
7. A binary search scheme is adopted in the ADDCC controller to speed up the fine-tuning
process.
8. Whenever the PFD's output changes from UP to DOWN or vice versa, the search step
(step[4:0]) is divided by 2 until the step is reduced to 1.
9. Once the step is equal to one fine-tuning control code, the ADDCC is locked.
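The step-halving search in the last steps can be sketched behaviourally in Python. This is
an illustration under stated assumptions, not the controller from [2]: the PFD is modelled
as a simple comparison against a hypothetical ideal code (target_code), the starting step
of 16 is assumed from the 5-bit width of step[4:0], and lock is declared at the first
direction change that occurs once the step has already reached 1.

```python
def addcc_fine_lock(target_code, start_code, start_step=16, max_code=511, max_iters=200):
    """Return (locked_code, iterations) for a step-halving search on a 9-bit control code."""
    code, step, previous = start_code, start_step, None
    for iteration in range(1, max_iters + 1):
        # Hypothetical PFD model: UP while the delay code is below the ideal code.
        direction = "UP" if code < target_code else "DOWN"
        if previous is not None and direction != previous:
            if step == 1:
                return code, iteration          # assumed lock condition
            step //= 2                          # halve the step on every direction change
        delta = step if direction == "UP" else -step
        code = min(max_code, max(0, code + delta))
        previous = direction
    return code, max_iters

# Start from a coarse code of 6 (ctrl_code[8:5] = 6, fine bits cleared).
locked_code, iterations = addcc_fine_lock(target_code=200, start_code=6 << 5)
print(locked_code, iterations)
```

Starting from the TDC-derived coarse code keeps the number of search iterations small,
which is the purpose of the fast lock-in step described above.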
Figure 2.4: Overall Timing Diagram of Latest Digital DCC Architecture [2]
As shown in Fig.2.5, when the input clock period is greater than the maximum delay
provided by the delay line, the input is considered a low-frequency signal for this
architecture. In this case, before the next rising edge arrives, the short pulse
propagates through the delay line, triggers the DFF and generates the feedback pulse
(fb_pulse). The signal is_half_cycle is pulled high in this case. In Figure 2.5, the
location of the first 1 in tdc_data[15:0], counted from the most significant bit to the
least significant bit, is 3. Therefore, the period of the input clock (CLK_IN) can be
quantized as 20 (=16+4) coarse-tuning delay units. Since the lock condition requires the
delay to be half of the input clock period, the tdc_code[3:0] output by the TDC encoder
is 10 (=20/2) in this case.
As shown in Fig.2.6, when the input clock period is less than the maximum delay provided
by the delay line, the input is considered a high-frequency signal for this architecture.
In this case the short pulse cannot propagate fully through the delay line before the
next rising edge of the input clock arrives, so the signal is_half_cycle stays at zero.
In Figure 2.6, the location of the first 1 in tdc_data[15:0], counted from the most
significant bit to the least significant bit, is 11. Therefore, the period of the input
clock (CLK_IN) can be quantized as 12 coarse-tuning delay units. To make the delay of the
delay line equal to half the clock period, the tdc_code[3:0] is 6 (=12/2) in this example.
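The two worked examples can be captured in a few lines of Python. The mapping used here
(period in coarse units = first-1 location + 1, plus 16 when is_half_cycle is high) is
inferred only from the two examples in the text and is an assumption; the encoder in [2]
may be specified differently. The function and parameter names are hypothetical.

```python
def tdc_initial_code(first_one_location: int, is_half_cycle: bool) -> int:
    """Initial coarse code: half of the clock period expressed in coarse delay units."""
    # Assumed mapping: location 3 -> 4 units, location 11 -> 12 units.
    period_units = first_one_location + 1
    if is_half_cycle:
        # Low-frequency case: the pulse already travelled the full 16-unit line once.
        period_units += 16
    return period_units // 2

low_freq_code = tdc_initial_code(first_one_location=3, is_half_cycle=True)
high_freq_code = tdc_initial_code(first_one_location=11, is_half_cycle=False)
print(low_freq_code, high_freq_code)
```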
Figure 2.5: Timing Diagram at Low frequency of Latest Digital DCC Architecture [2]
Figure 2.6: Timing Diagram at High frequency of Latest Digital DCC Architecture [2]
2.2. Analog Duty Cycle Corrector
input clock signal; thus the corrected output signal does not retain the phase
information of the input clock.
Fig.2.8 shows the delay cell with tunable internal rise and fall times used in the pulse
shrinking/stretching delay line. The transistors M1 and M2 adjust the sourcing current Ip
and the sinking current In respectively. The pulse is shrunk or stretched depending on
the value of Vbias. The logic of the correction mechanism is described in Table 2.1.
Dummy transistors M3 and M4 are used to shape the output waveforms. Waveforms
illustrating the shrinking and stretching mechanism are given in Fig.2.9.
Fig.2.10 shows the latest architecture of an analog DCC, proposed in 2014. It uses an
offset cancellation approach for duty cycle adjustment, so the frequency of the input
clock is limited only by the loop gain of the architecture. A CML-CMOS buffer is used
after the DCA to obtain full swing at the output. Duty cycle detection is done by a
differential duty amplifier, which generates an analog feedback voltage. This voltage
sets the DC level of the input clock signal such that, after passing through the
CML-CMOS buffer, the output signal has a 50% duty cycle.
Working Mechanism
Fig.2.11 shows the circuit diagram of the DCA, which is composed of an AC-coupled buffer
to reset the common-mode voltage, two NMOS differential-pair amplifiers to correct the
input offset, and the CML-CMOS buffer. The CML-CMOS buffer is simply a level converter.
Duty cycle detection is performed within the differential duty amplifier, which allows
the feedback loop architecture to provide a high operating frequency, high loop stability
and a wide duty cycle correction range. As shown in Fig.2.12, in order to track the duty
error rapidly, the duty detector is designed with a cross-coupled circuit, which can
increase their isolation, thus improving the duty-correction accuracy and achieving a
large gain bandwidth. The differential duty amplifier and loop filter convert the duty
cycle error of the complementary waveforms into offset voltages. The loop filter outputs
the differential voltage, which is fed back to the second differential-pair amplifier in
the duty cycle amplifier, and the DCA adjusts the duty cycle error every cycle until the
output duty cycle is balanced at 50%.
2.3. Mixed Mode Duty Cycle Corrector
higher accuracy and operating frequency of Analog DCC, Fast locking and digitally
saved correction codes of digital DCC).
Working Mechanism
Fig.2.14 shows the flow chart of the working of this architecture. The working can be
divided into two parts. First, the circuit detects whether the duty cycle of the input
clock is greater or less than 50% and, based on this, sets the sign bit of the SAR
controller to 0 or 1. Then, following a binary search algorithm, the SAR controller sets
the delay of the delay line to obtain a 50% duty cycle at the output. Working waveforms
for a 6-bit SAR controller are shown in Fig.2.15.
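The two-part search can be sketched behaviourally as follows. This is a simplified model
under stated assumptions, not the circuit from [5]: the delay line is abstracted as a duty
cycle shift of code × LSB, the LSB of 0.25% is a hypothetical value, and the function name
is illustrative. The 39% input duty cycle is the case shown in Fig.2.15.

```python
def sar_dcc(duty_in_pct, bits=6, lsb_pct=0.25):
    """Return (sign, code, duty_out_pct) after a sign decision and a binary search."""
    # Part 1: the sign decision selects the correction direction.
    sign = 1 if duty_in_pct < 50.0 else -1
    code = 0
    # Part 2: successive approximation from the MSB down to the LSB.
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        duty_trial = duty_in_pct + sign * trial * lsb_pct
        # Keep the bit unless the trial correction overshoots 50%.
        if sign * (50.0 - duty_trial) >= 0:
            code = trial
    return sign, code, duty_in_pct + sign * code * lsb_pct

sign, code, duty_out = sar_dcc(39.0)
print(sign, code, duty_out)
```

A 6-bit search needs only six comparisons after the sign decision, which is why this
style of DCC locks quickly; the price is that the final error is bounded by the LSB size.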
Figure 2.15: Timing diagram of the 6-bit SAR-DCC with input clock duty cycle is
39% [5]
2. Duty cycle correction done by Digital feedback loop by SAR controller using
3. To provide correction for real time temperature changes, digital feedback loop is
used and SAR controller uses sequential search algorithm.
Working Mechanism
As shown in Fig.2.16, the feedback path includes a dual loop: a digital feedback loop and
an analog feedback loop. In the analog feedback loop, the charge pump (CP) generates the
analog control voltage Vctrl/Vctrlb based on the duty cycle of the output clock
(OUTCLK/OUTbCLK). The control voltage Vctrl/Vctrlb is also used in the digital feedback
loop to generate the digital control voltage VDctrl/VDctrlb. The digital feedback loop
consists of a comparator, an 8-bit digital-to-analog converter (DAC) and an 8-bit SAR.
The comparator generates the up or down signals depending on the CP outputs. The
operating frequency of the SAR clock (SAR_CLK) is 1/64 of the input clock frequency. This
slow SAR, using a binary search scheme, gives enough timing margin for the analog charge
pump and DAC operation, resulting in a wider duty-cycle correction range and minimized
integrated errors in the duty cycle without increasing the lock time. Following the
binary search algorithm, the digital output Q[7:0] of the SAR is used as the DAC input.
The 8-bit DAC provides the quantized bias current IDAC/IDACb to generate VDctrl/VDctrlb.
The two control voltages, Vctrl/Vctrlb and VDctrl/VDctrlb, are then used by the duty
amplifier (DA) to correct the duty cycle of the input clock, INCLK/INbCLK.
Figure 2.17: Flow Chart of Latest Mixed Mode DCC Architecture [6]
The duty amplifier corrects external differential input clock signals with duty-cycle
distortions and generates a small-swing 50% duty-cycle clock as output. The level
converter, which converts a small swing to full swing, produces the full-swing output
clock signal (OUTCLK/OUTbCLK).
When the DCC is enabled, the digital and analog feedback blocks start together. Since
the analog feedback block has a fast duty-correction capability, obtained by increasing
the gain of the CP, the output clock duty cycle is corrected to 50% in only about 40
clock cycles in this design. Then the digital feedback block, with an initial value of
the 8-bit SAR Q[7:0]=[10000000], slowly replaces the analog feedback block.
Fig.2.17 shows the flowchart of the mixed search (binary + sequential) algorithm. At the
end of the binary search mode, the DCC enters the sequential search mode automatically.
By converting the binary search SAR into a sequential search counter after the first DCC
lock-in, this architecture keeps the closed-loop characteristic and
Parameters          | Basic Digital [1] | Recent Digital [2]                 | Basic Analog [3] | Recent Analog [4] | Basic Mixed Mode [5] | Recent Mixed Mode [6]
CMOS Technology     | 180 nm, 1.8 V     | 90 nm, 1 V                         | 350 nm           | 55 nm, 1.2 V      | 130 nm, 1.2 V        | 180 nm, 1.8 V
Correction Range    | 20%-80%           | 20%-80%                            | 30%-70%          | 20%-80%           | 40%-60%              | 15%-85%
Duty Cycle Error    | 50 ± 0.25%        | 1.4 @ 450 MHz, 1.9 @ 1 GHz         | 50 ± 1%          | 50 ± 0.1%         | 50 ± 1%              | 50 ± 0.86%
Operating Frequency | 0.8-1.7 GHz       | 450 MHz - 1 GHz                    | 3 MHz - 660 MHz  | 1-5 GHz           | 312.5 MHz - 1 GHz    | 0.5-2 GHz
Power Consumption   | 3.2 mW            | 1.7 mW @ 450 MHz, 3.45 mW @ 1 GHz  | 1.1 mW           | 3.6 mW @ 3 GHz    | 3.2 mW @ 1 GHz       | 3.8 mW @ 1 GHz
Chip Area           | 100 µm x 75 µm    | 0.0049 mm2                         | 0.06 mm2         | 0.00174 mm2       | 0.048 mm2            | 0.075 mm2
Note: one Duty Cycle Error entry, either [4] or [6], is qualified "@ 2 GHz".
2.5. Architecture Summary
circuits used in high speed I/O are designed to get high accuracy with minimum chip area
and power consumption. Thus the DCA architecture design plays an important role.
Table 2.3 gives a brief summary of the various DCA architectures used in the DCC circuits
discussed in this chapter.
After reviewing several architectures, it can be concluded that for a high-precision duty
cycle correction circuit the DCA block should be designed to meet the specifications below.
Different mechanisms for coarse and fine correction with only one type of feedback (to
have optimum chip area).
The design should have minimum variation across PVT variations (less sensitive to PVT
variations).
Architecture            | Duty Cycle Adjuster Mechanism         | Fine & Coarse Correction | Conclusion
Basic Digital [1]       | Controlled delay line                 | No  | The DCC mainly delays the rising/falling edge of the input clock, so a delay line is preferable for a digital DCC.
Advance Digital [2]     | Controlled delay line                 | Yes | Uses the important concept of separate coarse and fine delay lines, which can also be applied to other DCA architectures.
Basic Analog [3]        | Pulse stretching/shrinking delay line | No  | The feedback mechanism is analog in nature and thus highly sensitive to supply voltage noise when designed for very high precision. Correction accuracy is low.
Advance Analog [4]      | Duty amplifier                        | No  | An extra CML-to-CMOS buffer is required. To support high frequency, the device width must be kept large to increase the bandwidth of the amplifier. Analog feedback is highly sensitive to supply noise. No support for power-down mode.
Basic Mixed Mode [5]    | Controlled delay line                 | No  | No mechanism for coarse and fine tuning. Correction accuracy is low. More suitable for applications with lower accuracy and correction time requirements.
Advanced Mixed Mode [6] | Duty amplifier                        | Yes | Two different feedback mechanisms are used to add fine and coarse correction, which increases chip area. No support for power-down mode. To support high frequency, the bandwidth of the amplifier must be high, which increases chip area.
Chapter 3
Proposed Architecture
As presented in sections 2.4 and 2.5, the highest accuracy obtained by existing DCC
architectures is 1 ps, and that is without considering PVT variations. As per the
specifications, however, the required accuracy is 0.192 ps across 9 PVT variations. To
meet this accuracy, the DCA mechanism should have very tight control and the feedback
path must be able to sense a duty cycle error down to 50 ± 0.1%. In Chapter 2, six
different DCC architectures were discussed with their advantages and limitations. From
the literature review it is clear that mixed-mode architectures combine the advantages of
both analog and digital DCCs, which enables high accuracy with less hardware, better
control over the feedback, etc. The proposed architecture is therefore also a mixed-mode
DCC, in which the correction mechanism is analog and the feedback is digital, as shown in
Fig.3.1. The proposed architecture incorporates some important observations from existing
architectures, in addition to design-level modifications to meet the specifications. The
architecture has the following properties.
3.1. Duty Cycle Adjuster
Design Procedure
In this section the design steps of the DCA are described, along with how the correction
range and accuracy depend on the circuit elements. In Fig.3.3, transistors M1 and M2 act
as switches and the others act as current sources, although this current is not constant
throughout the charging and discharging of the load capacitor. The (W/L) of these
transistors can be approximated by calculating the average current, after which a more
accurate value can be derived from simulations. The first thing to be decided is the
correction step size, i.e. how much correction is required per code change. From the
specifications of step size and correction range, the number of MOSFETs required as
current sources can be determined. When designing for 9 PVT variations, the design should
meet the specifications at the worst-case corner. For the same (W/L) ratio, the rise and
fall times, and thus the delay, are highest at the slow corner and lowest at the fast
corner. Thus for the step size specification the slow corner is the worst case, and for
the correction range the fast corner is the worst case. In the present design 6 pMOS
(nMOS) transistors are used, giving a total of 2^6 = 64 steps to cover the correction
range of 50 ± 5%. Further design steps are mentioned below.
2. The sizing of M3 (M4) can be calculated from the fact that when the control code is
maximum (111111b) only M3 is on. Thus the low-to-high delay Tplh for M3 =
(192.3 ps × 5)/100 = 9.615 ps. This value should be placed in (3.5) to find the
equivalent value of the average current. For a given size, voltage and temperature, the
fast corner has the shortest transition time and thus the smallest delay, so this
calculation should be done for the fast-corner MOSFET. For M4 the same procedure applies,
except that the high-to-low delay of (3.4) must be considered.
3. To calculate the (W/L) of M16 to M11 (M10 to M5), first calculate it for M16 (M10),
using the procedure below. In the DCA, the duty cycle is adjusted by changing the
high-to-low delay (Tphl) and the low-to-high delay (Tplh). The original clock signal has
20 ps rise and fall times. Tphl and Tplh change according to (3.4) and (3.5), where Tr
and Tf are the rise and fall times of the input clock signal respectively.

\[ T_{phl} = \sqrt{T_{phl}(\text{step input})^{2} + (T_{r}/2)^{2}} \tag{3.4} \]

\[ T_{plh} = \sqrt{T_{plh}(\text{step input})^{2} + (T_{f}/2)^{2}} \tag{3.5} \]

\[ T_{phl}(\text{step input}) = \frac{2\,C_{load}}{I(v_{out}=v_{cc}) + I(v_{out}=50\%\,v_{cc})} \tag{3.6} \]

\[ T_{plh}(\text{step input}) = \frac{2\,C_{load}}{I(v_{out}=0) + I(v_{out}=50\%\,v_{cc})} \tag{3.7} \]

For the coarse correction circuit there are 2^6 = 64 correction steps in total to cover
50 ± 5%. M1 and M2 are large compared with M11 to M16 (M5 to M10), so when both are on,
the approximate equivalent (W/L) equals the (W/L) of M11 to M16 (M5 to M10). To calculate
the (W/L) of M16, note that it should provide Tphl = (192.3 ps × 5%)/32 = 0.3 ps. For
this value of Tphl, find the average current from (3.4) and (3.5), and then find (W/L)
from the average value of the current.
4. To set the (W/L) of M1 and M2, note that they must be able to handle the current
coming from M11 to M16 (M5 to M10). Thus their (W/L) can be set to the equivalent (W/L)
of M11 to M16 (M5 to M10).
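The delay targets used in these design steps follow from simple arithmetic, reproduced in
the Python sketch below. The variable names are illustrative; the 192.3 ps period, the
±5% range, the 6 control bits and the 0.192 ps accuracy target are taken from the text.
The final comparison is an observation from these numbers, not a statement from the thesis.

```python
PERIOD_PS = 192.3          # clock period at 5.2 GHz
RANGE_PCT = 5.0            # coarse correction range is 50 +/- 5 %
CONTROL_BITS = 6           # 6 pMOS (nMOS) current-source transistors
ACCURACY_TARGET_PS = 0.192 # 0.1 % of the clock period

# Number of coarse codes available.
steps = 2 ** CONTROL_BITS

# Delay corresponding to the full one-sided range (value used for M3/M4 sizing).
full_range_ps = PERIOD_PS * RANGE_PCT / 100.0

# The +/- range spans 2 * full_range_ps over 64 codes, i.e. period * 5 % / 32 per code.
coarse_step_ps = 2.0 * full_range_ps / steps

# A coarse step larger than the accuracy target means a finer mechanism is still needed.
needs_fine_correction = coarse_step_ps > ACCURACY_TARGET_PS

print(steps, round(full_range_ps, 3), round(coarse_step_ps, 3), needs_fine_correction)
```

With these values the coarse step (about 0.30 ps) is larger than the 0.192 ps accuracy
target, which is consistent with the architecture having a separate fine correction circuit.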
3.3. Low pass filter
Shaping the output signal of the DCA block so that it can be used as the clock of the
half-rate clock data recovery/sampling circuit.
To meet these specifications, the circuit shown in Fig.3.7 is used. Back-to-back
inverters are used as the output stage to obtain very sharp edges: the positive feedback
in the back-to-back inverters causes fast charging and discharging, so the output signal
edges are sharp. The basic function of this block can be understood from the waveforms
in Fig.3.8.
Accuracy
Response time
The ideal value of RC for accuracy is infinite, but a large RC also takes longer to respond
to a change in the duty cycle. Thus there is always a trade-off between these two
parameters. The cutoff frequency of a first-order RC filter is given by (3.8).

$$f_c = \frac{1}{2\pi RC} \tag{3.8}$$

Here the operating frequency is very high and the correction frequency is 12.5 MHz; thus
the RC value is kept large enough to achieve an accuracy of 50 ± 0.1%.
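The accuracy versus response-time trade-off of the RC filter can be illustrated numerically. The sketch below is a rough illustration only: the R and C values are hypothetical and are not the component values of this design, and the filter is treated as an ideal first-order RC section.

```python
import math


def cutoff_hz(r_ohm, c_farad):
    """Cutoff frequency of a first-order RC filter, as in (3.8)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def attenuation(f_hz, fc_hz):
    """Magnitude response of an ideal first-order low-pass filter at f_hz."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** 2)


def settle_time_s(r_ohm, c_farad, rel_error):
    """Time for the step response to come within rel_error of its final value."""
    return -r_ohm * c_farad * math.log(rel_error)


# Hypothetical component values, chosen only for illustration.
R_OHM = 10e3
C_FARAD = 1e-12

fc = cutoff_hz(R_OHM, C_FARAD)
ripple_gain = attenuation(5.2e9, fc)              # how much 5.2 GHz clock ripple remains
t_settle = settle_time_s(R_OHM, C_FARAD, 1e-3)    # settle to 0.1 %

print(f"fc = {fc / 1e6:.2f} MHz")
print(f"gain at 5.2 GHz = {ripple_gain:.2e}")
print(f"settling time = {t_settle * 1e9:.1f} ns")
```

A larger RC product lowers the gain at the clock frequency (better averaging, hence accuracy) but lengthens the settling time, which must stay below the 80 ns period of the 12.5 MHz correction clock.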
3.4 Comparator
The design of the comparator plays an important role in achieving high accuracy. The
comparator is a very important circuit used in many applications, such as ADCs. Its
specifications are listed below:
Power consumption
Chip area
The output of the comparator is given to the digital feedback system, which is a
clock-driven circuit; thus a dynamic comparator, which also works on a clock and saves
static power consumption, is more suitable for this application. Dynamic comparators
based on switched capacitors are very simple in design and consume less chip area. The
major duty cycle correction done by the DCC circuit is performed during initialization of
the hardware, which takes time on the order of microseconds; the time available for the
whole coarse correction is approximately 5 µs. Thus the operating speed of the comparator
can be several MHz, but its input offset should be small enough to sense a difference of
vcc(mV)1%.
3.6. System Level Working
tor output is logic 1, the INB signal of the FSM is set to logic 1, which increments
the coarse correction counter by 1.
During fine correction, if the comparator output is logic 0, the INT signal of the FSM
is set to logic 1, which increments the fine correction counter by 1. If the comparator
output is logic 1, the DET signal of the FSM is set to logic 1, which decrements
the coarse correction counter by 1.
The Binary to Thermal block converts fine correction codes from binary to thermal code.
Fine correction gives high accuracy and fine control over the correction, and this
control requires thermal codes. Generating thermal codes requires one additional block
in the DFM. Thus in the main design only the fine correction block is controlled by
thermal code, whereas binary codes are used for coarse correction.
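The binary to thermal conversion described above can be sketched in a few lines of Python. This is an illustrative model, not the hardware implementation: the function name is hypothetical and the 3-bit width in the example is an assumption, since the width of the fine correction code is not stated here.

```python
def binary_to_thermometer(code, n_bits):
    """Return the thermal (thermometer) code for a binary value.

    The result has 2**n_bits - 1 bits, with the lowest `code` bits set to 1.
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range for the given width")
    width = 2 ** n_bits - 1
    return [1 if i < code else 0 for i in range(width)]


# Assumed 3-bit code, used only for illustration.
for value in range(4):
    print(value, binary_to_thermometer(value, 3))
```

Adjacent thermal codes differ in exactly one bit, so each increment or decrement switches a single control element. This is why a thermal code suits the small, monotonic steps needed for fine correction.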
cation block. Thus this correction takes place during startup of the system. After this
correction, the architecture continues to correct the duty cycle as closely as it can.
During this procedure the DFM uses thermal code to control the duty cycle adjustment, and
the change in duty cycle per iteration is very small compared to part 1. During this
correction the output clock is fed to the main application block; this adjustment helps
to correct duty cycle degradation caused by temperature changes.
Fig.3.14 shows the working of the architecture when used to correct the duty cycle against
process and manufacturing variations. Initially all codes are set to the middle values of
their ranges. One counter in the DFM helps in deciding the end of the correction
procedure of part 1. When the clock signal arrives at the input of the DCC, in the first
iteration it is adjusted based on the middle codes. This adjusted clock signal passes
through the single to differential block, which generates two clocks: one is a buffered
version of the clock at its input and the other is an inverted version of it. This block
also helps in shaping the slope at the output side. These clock signals are given to a
differential low-pass filter (LPF), made up of a series combination of resistor and
capacitor. The differential LPF gives the average values of the input clock signals,
which are fed to the comparator. If the input clock's duty cycle is less than 50%, the
comparator's output is logic zero, otherwise one; this output goes as an input to the
DFM. The working of the DFM is described in Fig.3.14. When the duty cycle error falls
below one step of resolution, the comparator's output starts toggling in a 0101 pattern.
When this behavior repeats 4 times in succession, coarse correction stops.
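The coarse correction loop described above can be sketched as a small behavioral model in Python. This is a simplified sketch under stated assumptions, not the DFM implemented in this work. The duty cycle is assumed to vary linearly with the code (50 ± 5% spread over 64 codes), the code is decremented when the comparator output is 1 and incremented when it is 0, and "repeats 4 times" is interpreted as four consecutive changes of the comparator output. The function and parameter names are hypothetical.

```python
def coarse_correct(duty_in, n_bits=6, range_pct=5.0, toggles_to_stop=4, max_iter=200):
    """Behavioral model of the coarse correction loop.

    Returns (final_code, final_duty, history), where history holds
    (code, duty, comparator_output) for every iteration.
    """
    n_codes = 2 ** n_bits
    mid = n_codes // 2
    step = 2.0 * range_pct / n_codes   # assumed linear duty change per code, in %
    code = mid                         # codes start at the middle of the range
    prev = None
    alternations = 0
    history = []
    duty_out = duty_in
    for _ in range(max_iter):
        duty_out = duty_in + (code - mid) * step
        comp = 1 if duty_out > 50.0 else 0   # comparator: 0 below 50 %, else 1
        history.append((code, duty_out, comp))
        if prev is not None and comp != prev:
            alternations += 1                # output toggled (0101 / 1010 pattern)
        else:
            alternations = 0
        if alternations >= toggles_to_stop:
            break                            # freeze the coarse code
        prev = comp
        next_code = code - 1 if comp else code + 1
        code = min(n_codes - 1, max(0, next_code))
    return code, duty_out, history


final_code, final_duty, trace = coarse_correct(53.0)
print(final_code, final_duty, len(trace))
```

With a 53% input, this model walks the code down from 32 until the comparator output starts alternating, then freezes the code with a residual error smaller than one step.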
Chapter 4
Results and Summary
In this chapter simulation results are discussed to support the performance of the
proposed architecture. Simulation results of the individual design blocks are discussed
first, followed by the closed-loop simulation results.
In this section, graphs and waveforms of pre- and post-layout simulations are presented.
Fig.4.1 shows the pre-layout result summary across 9 PVT corners. The objective of the
circuit is to meet the 50 ± 5% correction range across PVT variations. From Fig.4.1 it is
clear that in pre-layout simulations the circuit meets the specifications and the
worst-case corner is FF FFF PR, 0 °C, 1.05 V. Fig.4.2 shows the result summary for
post-layout simulation.
Fig.4.3 shows the duty cycle change when the minimum control code (0) is used. At this
code all the pMOS are on and all nMOS are off (except the nenable nMOS); thus the rising
edge has a very small slope value and the falling edge has a large slope value, which
results in a duty cycle increment. Thus, in closed loop, when the duty cycle of the
signal needs to be increased, the control code should be decreased towards the minimum
code.
Fig.4.4 shows the duty cycle change at mid code. It can be observed that when the code is
set to mid code the duty cycle is close to 50%. From the design it is expected that for
this code, if the input duty cycle is 50%, the output duty cycle should be about 50 ± 1%.
In Fig.4.4 the on time is 95.47 ps, which corresponds to a duty cycle of 49.64%, within
the allowable range.
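The duty cycle quoted for the mid code follows directly from the measured on time and the clock period. A minimal Python check is given below, using the 95.47 ps on time and the 192.3 ps period from the text; the function names are hypothetical.

```python
def duty_cycle_pct(on_time_ps, period_ps):
    """Duty cycle in percent from the on time and the clock period."""
    return 100.0 * on_time_ps / period_ps


def within_spec(duty_pct, target_pct=50.0, tol_pct=1.0):
    """True if the duty cycle lies within target_pct +/- tol_pct."""
    return abs(duty_pct - target_pct) <= tol_pct


duty = duty_cycle_pct(95.47, 192.3)
print(f"{duty:.2f} %", within_spec(duty))
```

The result is about 49.65%, which agrees with the 49.64% quoted above to within rounding and lies inside the 50 ± 1% window expected at mid code.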
Fig.4.5 shows the duty cycle change when the maximum code (63) is used. At this code all
the nMOS are on and all pMOS are off (except the penable pMOS); thus the rising edge has
a large slope value and the falling edge is very sharp, which results in a decrement of
the duty cycle. Thus, in closed loop, when the duty cycle of the signal needs to be
decreased, the control code should be increased towards the maximum value.
4.1. Duty Cycle Adjuster
Figure 4.3: Binary Weighted Coarse Correction min code Post-layout waveforms
Figure 4.4: Binary Weighted Coarse Correction mid code Post-layout waveforms
Figure 4.5: Binary Weighted Coarse Correction max code Post-layout waveforms
Figure 4.6: Fine Correction Across PVT pre-layout simulation result summary
The single to differential block serves two purposes: first, to generate a differential
clock from the single-ended clock, and second, to shape the output clock signal.
Fig.4.11 and Fig.4.12 present the pre-layout and post-layout simulation result summaries,
respectively. It can be observed that the slope of the output clock signal is very sharp
and almost equal for both clocks. Fig.4.13 shows waveforms of the post-layout simulation
across PVT. Fig.4.14 and Fig.4.15 show measurements on the waveforms for pre- and
post-layout simulation.
4.2. Single to Differential
Figure 4.7: Fine Correction Across PVT post-layout simulation result summary
4.3. Low Pass Filter
Figure 4.17: Low pass filter simulation result after correction steps
shows simulation results of offset measurement across PVT. The maximum offset is less
than 1 mV, which is the resolution required to achieve 0.1% accuracy.
In the inverter comparator design, the inverter acts as a linear amplifier during
evaluation; thus it should have enough DC gain to produce a rail-to-rail output. Fig.4.21
shows waveforms of the inverter gain across PVT and Fig.4.22 shows the result summary as
a graph.
4.4. Dynamic Comparator
The digital feedback mechanism (DFM) is used to control the DCA circuit based on the
output of the comparator. The working of the DFM is presented in the flowcharts of
section 3.6. While designing the DFM for coarse correction, one important point should be
kept in mind: the coarse correction stage has a fine correction stage directly connected
at its load end. Both stages generate an inverted version of their input signal with a
change in the duty cycle. Thus the signal at the input of coarse correction passes
through the fine correction block before the single to differential block. This fact
should be taken into consideration while designing the DFM.
Thus, for the coarse correction block, if the duty cycle of the input signal needs to be
increased then the correction code also needs to be increased, and vice versa. For fine
correction it is exactly the opposite: to increase the duty cycle of the input signal the
correction code should be decreased, and vice versa.
Fig.4.23 shows coarse code initialization and its change with respect to the comparator
output. Fig.4.24 shows the coarse correction code change when the comparator output is
inverted with respect to Fig.4.23. From these figures it can be observed that the coarse
code is initialized to the mid code (32) and changes until a 0101 or 1010 pattern occurs
at the comparator output.
4.5. Digital Feedback Mechanism
Fig.4.27 shows change in the internal design variables with respect to comparator
output.
4.6. Close loop Simulation Results
This section presents closed-loop simulation results of the DCC circuit. The working and
simulation results of the individual blocks have been discussed in chapters 3 and 4. In
this section graphs and waveforms of the simulation results are presented to support the
design of the DCC circuit. Fig.4.28 shows the input signal with 55% duty cycle. Fig.4.29
shows how coarse correction takes place in closed loop. Initially the comparator output
is zero only because it is reset at the beginning. Once the comparator output is one, the
coarse correction codes decrease in order to decrease the duty cycle, as discussed in
section 4.4. After detecting the 1010 pattern at the comparator's output, the coarse
correction codes are frozen. Fig.4.30 and Fig.4.31 show the duty cycle of the output
signal after coarse correction for pre-layout and post-layout, respectively.
Further correction is done by the fine correction circuit. Fig.4.32 shows how fine
correction takes place in the closed-loop simulation. Because the duty cycle of the clock
signal has already reached close to 50% during coarse correction, the fine correction
code starts toggling very early. When the result of coarse correction is not close to
50%, it takes a few more steps before toggling around a specific code. In real
situations, duty cycle degradation also happens due to ambient temperature variation; in
that case fine correction corrects the degradation and gives a clock signal with
allowable degradation. Fig.4.33 and Fig.?? show the output signal duty cycle after fine
correction.
Finally, Fig.4.35 and Fig.4.36 show the corrected duty cycle across PVT variations for
pre-layout and post-layout, respectively.
Figure 4.35: Corrected Duty Cycle After Pre-layout Close loop Correction Across
PVT
4.7. Performance Comparison
Figure 4.36: Corrected Duty Cycle After Post-layout Close loop Correction Across
PVT
| Parameters | Basic Digital [1] | Recent Digital [2] | Basic Analog [3] | Recent Analog [4] | Basic Mixed Mode [5] | Recent Mixed Mode [6] | Proposed Architecture |
|---|---|---|---|---|---|---|---|
| CMOS Technology | 180 nm, 1.8 V | 90 nm, 1 V | 350 nm | 55 nm, 1.2 V | 130 nm, 1.2 V | 180 nm, 1.8 V | 28 nm, 1 V |
| Correction Range | 20%-80% | 20%-80% | 30%-70% | 20%-80% | 40%-60% | 15%-85% | 45%-55% |
| Duty Cycle Error | 50 ± 0.25% | 1.4 @ 450 MHz, 1.9 @ 1 GHz | 50 ± 1% | 50 ± 0.1% @ 2 GHz | 50 ± 1% | 50 ± 0.86% | 50 ± 0.1% @ 5.2 GHz |
| Operating Frequency | 0.8-1.7 GHz | 450 MHz - 1 GHz | 3 MHz - 660 MHz | 1 - 5 GHz | 312.5 MHz - 1 GHz | 0.5 - 2 GHz | 5.2 GHz |
| Power Consumption | 3.2 mW | 1.7 mW @ 450 MHz, 3.45 mW @ 1 GHz | 1.1 mW | 3.6 mW @ 3 GHz | 3.2 mW @ 1 GHz | 3.8 mW @ 1 GHz | 4.474 mW @ 5.2 GHz |
| Power Delay Product | 1.88 × 10^-12 Ws | 3.45 × 10^-12 Ws | 1.67 × 10^-12 Ws | 1.2 × 10^-12 Ws | 3.2 × 10^-12 Ws | 3.8 × 10^-12 Ws | 8.60 × 10^-13 Ws |
| Chip Area | 0.075 mm² | 0.0049 mm² | 0.06 mm² | 0.00174 mm² | 0.048 mm² | 0.075 mm² | 0.000579 mm² |
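The power delay product entries in the comparison table are numerically consistent with the quoted power multiplied by one clock period at the quoted frequency. This definition is inferred from the table values and is not stated explicitly in the text. The Python sketch below checks three entries under that assumption; the function name is hypothetical.

```python
def power_delay_product_ws(power_w, f_hz):
    """Power delay product, assuming delay = one clock period (inferred)."""
    return power_w / f_hz


# (label, power in W, frequency in Hz, tabulated PDP in Ws), all from the table
entries = [
    ("Proposed @ 5.2 GHz", 4.474e-3, 5.2e9, 8.60e-13),
    ("Recent Analog [4] @ 3 GHz", 3.6e-3, 3.0e9, 1.2e-12),
    ("Basic Mixed Mode [5] @ 1 GHz", 3.2e-3, 1.0e9, 3.2e-12),
]

for label, power_w, f_hz, tabulated in entries:
    pdp = power_delay_product_ws(power_w, f_hz)
    print(f"{label}: {pdp:.3e} Ws (table: {tabulated:.2e} Ws)")
```

Under this assumption the proposed design gives 4.474 mW / 5.2 GHz, or about 8.60 × 10^-13 Ws, matching the tabulated value.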
Conclusion and Future Scope
From the results presented in chapter 4 it can be concluded that the proposed design
meets the design specifications very closely in post-layout simulations and exactly in
pre-layout simulations across 9 PVT variations. The purpose of the proposed architecture
is to design a DCC circuit with high precision where the available correction time is
high (5 µs). Different circuits used for fine and coarse correction in the DCA result in
high accuracy with optimum chip area. From the performance comparison table it can be
concluded that the proposed architecture gives the highest correction accuracy with the
least power delay product and chip area.
In future, while designing a DCC circuit for frequencies higher than 5.2 GHz, it will be
critical to attain such high accuracy across PVT variations. Thus some new mechanism
should be designed which does not depend on clock frequency. The feedback mechanism used
here is quite suitable for high accuracy provided the correction time is in µs. In this
design the fine correction code always toggles even when there is no change in the duty
cycle, which introduces jitter in the output clock signal; thus the DFM should be
modified in such a way that the fine correction code varies only when there is more than
one step size of degradation in the clock signal.
References
[1] Y. Jang, S. Bae, and H. Park, "CMOS digital duty cycle correction circuit for
multiphase clock," Electronics Letters, vol. 39, no. 19, pp. 1383-1384, Sept 2003.
[2] C.-C. Chung and C.-J. Li, "A low-power delay-recycled all-digital duty-cycle
corrector with unbalanced process variations tolerance," in VLSI Design, Automation,
and Test (VLSI-DAT), 2013 International Symposium on, April 2013, pp. 1-4.
[3] P. Chen, S.-W. Chen, and J.-S. Lai, "A low power wide range duty cycle corrector
based on pulse shrinking/stretching mechanism," in Solid-State Circuits Conference,
2007. ASSCC '07. IEEE Asian, Nov 2007, pp. 460-463.
[4] Y. Qiu, Y. Zeng, and F. Zhang, "1-5 GHz duty-cycle corrector circuit with wide
correction range and high precision," Electronics Letters, vol. 50, no. 11, pp. 792-794,
May 2014.
[5] Y.-J. Min, C.-H. Jeong, K.-Y. Kim, W. H. Choi, J.-P. Son, C. Kim, and S.-W. Kim,
"A 0.31-1 GHz fast-corrected duty-cycle corrector with successive approximation
register for DDR DRAM applications," Very Large Scale Integration (VLSI) Systems, IEEE
Transactions on, vol. 20, no. 8, pp. 1524-1528, Aug 2012.
[6] S. Han and J. Kim, "A 0.5-2.0 GHz dual-loop SAR-controlled duty-cycle corrector
using a mixed search algorithm," vol. 13, no. 2, pp. 152-156.
[7] R. Swathi and M. Srinivas, "All digital duty cycle correction circuit in 90nm based
on mutex," in VLSI, 2009. ISVLSI '09. IEEE Computer Society Annual Symposium on, May
2009, pp. 258-262.
[8] J. Ha, J. Lim, Y. Kim, W. Jung, and J. Wee, "Unified all-digital duty-cycle and
phase correction circuit for QDR I/O interface," Electronics Letters, vol. 44, no. 22,
pp. 1300-1301, October 2008.
[9] L. Raghavan and T. Wu, "Architectural comparison of analog and digital duty cycle
corrector for high speed I/O link," in VLSI Design, 2010. VLSID '10. 23rd International
Conference on, Jan 2010, pp. 270-275.
[10] T.-H. Lin, C.-C. Chi, W.-H. Chiu, and Y.-H. Huang, "A synchronous 50% duty-cycle
clock generator in 0.35 µm CMOS," Very Large Scale Integration (VLSI) Systems, IEEE
Transactions on, vol. 19, no. 4, pp. 585-591, April 2011.
[11] C.-C. Chung, D. Sheng, and S.-E. Shen, "High-resolution all-digital duty-cycle
corrector in 65-nm CMOS technology," Very Large Scale Integration (VLSI) Systems, IEEE
Transactions on, vol. 22, no. 5, pp. 1096-1105, 2014.
[13] F. Lin, "All digital duty-cycle correction circuit design and its applications in
high-performance DRAM," in Microelectronics and Electron Devices (WMED), 2011 IEEE
Workshop on, April 2011, pp. 1-4.
[14] R. J. Baker, CMOS Circuit Design, Layout, and Simulation, 3rd ed. Wiley-IEEE
Press, 2010.
[15] S. Patil and S. Rudraswamy, "Duty cycle correction using negative feedback loop,"
in Mixed Design of Integrated Circuits Systems, 2009. MIXDES '09. MIXDES-16th
International Conference, June 2009, pp. 424-426.
[16] S.-K. Kao and S.-I. Liu, "All-digital fast-locked synchronous duty-cycle
corrector," Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 53,
no. 12, pp. 1363-1367, Dec 2006.
[18] K.-H. Cheng, C.-W. Su, and K.-F. Chang, "A high linearity, fast-locking pulsewidth
control loop with digitally programmable duty cycle correction for wide range
operation," Solid-State Circuits, IEEE Journal of, vol. 43, no. 2, pp. 399-413, 2008.
[19] S.-K. Kao and S.-I. Liu, "A wide-range all-digital duty cycle corrector with a
period monitor," in Electron Devices and Solid-State Circuits, 2007. EDSSC 2007. IEEE
Conference on. IEEE, 2007, pp. 349-352.
[20] C. Yoo, C. Jeong, and J. Kih, "Open-loop full-digital duty cycle correction
circuit," Electronics Letters, vol. 41, no. 11, pp. 635-636, 2005.