Brain-Machine Interface: Circuits and Systems

Amir Zjajo
Delft University of Technology
Delft, The Netherlands
The author acknowledges the contributions of Dr. Rene van Leuken of Delft
University of Technology, and Dr. Carlo Galuzzi of Maastricht University.
Contents

1 Introduction
  1.1 Brain-Machine Interface: Circuits and Systems
  1.2 Remarks on Current Design Practice
  1.3 Motivation
  1.4 Organization of the Book
  References

2 Neural Signal Conditioning Circuits
  2.1 Introduction
  2.2 Power-Efficient Neural Signal Conditioning Circuit
  2.3 Operational Amplifiers
  2.4 Experimental Results
  2.5 Conclusions
  References

3 Neural Signal Quantization Circuits
  3.1 Introduction
  3.2 Low-Power A/D Converter Architectures
  3.3 A/D Converter Building Blocks
    3.3.1 Sample and Hold Circuit
    3.3.2 Bootstrap Switch Circuit
    3.3.3 Operational Amplifier Circuit
    3.3.4 Latched Comparator Circuit
  3.4 Voltage-Domain SAR A/D Conversion
  3.5 Current-Domain SAR A/D Conversion
  3.6 Time-Domain Two-Step A/D Conversion
  3.7 Experimental Results
  3.8 Conclusions
  References
Symbols
eq  Quantization error
e²  Noise power
E{·}  Expected value
Econv  Energy per conversion step
fclk  Clock frequency
fin  Input frequency
fp,n(di)  Eigenfunctions of the covariance matrix
fS  Sampling frequency
fsig  Signal frequency
fspur  Frequency of spurious tone
fT  Transit frequency
f(x,t)  Vector of noise intensities
FQ  Function of the deterministic initial solution
g  Conductance
gm  Transconductance
Gi  Interstage gain
Gm  Transconductance
h  Numerical integration stepsize; surface heat transfer coefficient
i  Index; circuit node; transistor on the die
imax  Number of iteration steps
I  Current
Iamp  Total amplifier current consumption
Idiff  Diffusion current
ID  Drain current
IDD  Power supply current
Iref  Reference current
j  Index; circuit branch
J0  Jacobian of the initial data z0 evaluated at pi
k  Boltzmann's constant; error correction coefficient; index
K  Amplifier current gain; gain error correction coefficient
K(t)  Variance-covariance matrix
L  Channel length
Li  Low-rank Cholesky factors
L(θ|TX)  Log-likelihood of parameter θ with respect to input set TX
m  Index
M  Number of terms; number of channels in BMI
n  Index; number of circuit nodes; number of bits
N  Number of bits
Naperture  Aperture-jitter-limited resolution
P  Power
p  Process parameter
p(di,θ)  Stochastic process corresponding to process parameter p
pX|Θ(x|θ)  Gaussian mixture model
p*  Process parameter deviations from their corresponding nominal values
Chapter 1
Introduction

The best way to predict the future is to invent it. Medicine in the twentieth century relied primarily on pharmaceuticals that could chemically alter the action of neurons or other cells in the body, but twenty-first century health care may be defined more by electroceuticals: novel treatments that use pulses of electricity to regulate the activity of neurons, or devices that interface directly with our nerves. Systems such as the brain-machine interface (BMI) detect the voltage changes in the brain that occur when neurons fire to trigger a thought or an action, and translate those signals into digital information that is conveyed to a machine, e.g., a prosthetic limb, a speech prosthesis, or a wheelchair.
Many promising technological advances are about to change our concept of healthcare, as well as the provision of medical care. For example, telemedicine, e-hospitals, and ubiquitous healthcare are enabled by emerging wireless broadband communication technology. While it initially became mainstream in portable devices such as notebook computers and smartphones, wireless communication (e.g., wireless sensor networks, body sensor networks) is evolving toward wearable and/or implantable solutions. The combination of two technologies, ultra-low power sensor technology and ultra-low power wireless communication technology, enables long-term continuous monitoring and feedback to medical professionals wherever needed.
Neural prosthesis systems enable interaction with neural cells either by recording, to facilitate early diagnosis and predict intended behavior before undertaking any preventive or corrective actions, or by stimulation, to prevent the onset of detrimental neural activity. Monitoring the activity of a large population of neurons in neurobiological tissue with high-density microelectrode arrays in a multichannel implantable BMI is a prerequisite for understanding the cortical structures, and can lead to a better conception of severe brain disorders, such as Alzheimer's and Parkinson's diseases, epilepsy, and autism [1], or to the reestablishment of sensory (e.g., hearing and vision) or motor (e.g., movement and speech) functions [2].
Metal-wire and micro-machined silicon neural probes, such as the Michigan probe [3] or the Utah array [4], have aided the development of highly integrated multichannel recording devices with large channel counts, enabling the study of brain activity and the complex processing performed by neural systems in vivo [5-7]. Several studies have demonstrated that the understanding of certain brain functions can only be achieved by monitoring the electrical activity of large numbers of individual neurons in multiple brain areas at the same time [8]. Consequently, real-time acquisition from many parallel readout channels is needed both for the successful implementation of neural prosthetic devices and for a better understanding of fundamental neural circuits and connectivity patterns in the brain [9].
One of the main goals of current neural probe technologies [10-21] is to minimize the size of the implants while including as many recording sites as possible, with high spatial resolution. This enables the fabrication of devices that match the feature size and density of neural circuits [22], and facilitates the spike classification process [23, 24]. Because electrical recording from single neurons is invasive, monitoring large numbers of neurons using large implanted devices inevitably increases tissue damage; thus, there exists a trade-off between probe size and the number of recording sites. Although existing neural probes can record from many neurons, limitations in the interconnect technology constrain the number of recording sites that can be routed out of the probe [8].
The study of highly localized neural activity requires, besides implantable microelectrodes, electronic circuitry for accurately amplifying and conditioning the signals detected at the recording sites. While neural probes have become more compact and denser in order to monitor large populations of neurons, the interfacing electronic circuits have also become smaller and more capable of handling large numbers of parallel recording channels. Some of the challenges in the design of analog front-end circuits for neural recording are associated with the nature of the neural signals: these signals have amplitudes on the order of a few µV to several mV, and their frequency content spans from dc to a few kHz. Local field potentials (LFPs), representing averaged activity from small sets of neurons surrounding the recording sites, occupy the low-frequency range (~1-300 Hz). Action potentials (APs), or spikes, representing single-cell activity, are located in the higher frequency range (~300 Hz-10 kHz). Recording both LFPs and APs using implanted electrodes yields the most informative signals for studying neuronal communication and computation. Thus, according to the nature of a specific signal, the recording circuits have to be designed with sufficiently low input-referred noise [i.e., to achieve a high signal-to-noise ratio (SNR)] and with sufficient gain and dynamic range.
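As a rough illustration of this noise requirement, the sketch below computes the SNR obtained for a given signal amplitude and input-referred noise level. The specific voltages used are illustrative assumptions, not design values from this chapter.

```python
import math

def snr_db(v_signal_rms, v_noise_rms):
    """Signal-to-noise ratio in dB for given RMS signal and noise voltages."""
    return 20.0 * math.log10(v_signal_rms / v_noise_rms)

# Illustrative numbers: a 100 uV(rms) extracellular spike against
# 5 uV(rms) input-referred amplifier noise (assumed values).
print(round(snr_db(100e-6, 5e-6), 1))  # 26.0 dB
```

The same relation shows why µV-level signals demand µV-level (or lower) input-referred noise: halving the noise buys 6 dB of SNR.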
The raw data rates generated by simultaneous monitoring of hundreds and even thousands of neurons are large [25]. When sampled at 32 kS/s with 10-bit precision, 100 electrodes generate a raw data rate of 32 Mb/s. Communicating such volumes of neuronal data over battery-powered wireless links, while maintaining reasonable battery life, is hardly possible with common methods of low-power wireless communication. Evidently, some form of data reduction or lossy data compression to reduce the raw waveform data capacity, e.g., the wavelet transform [26], must be applied. Alternatively, only significant features of the neuronal signal could be extracted, and the transmitted data could be limited to those features only [8], which may lead to an order of magnitude reduction in the required data rate [27]. Additionally, if the neuronal spikes are sorted on the chip [28], and mere notifications of spike events are transmitted to the host, another order of magnitude reduction can be achieved. Adapting power-efficient spike sorting algorithms for implementation in very-large-scale integration (VLSI) can thus lead to significant power savings, with only a limited accuracy loss [29, 30].
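The arithmetic behind these figures can be sketched directly; the two tenfold reduction factors below are the order-of-magnitude estimates quoted in the text, not measured values.

```python
# Raw data rate for a multichannel neural recording system, and the
# effect of the data-reduction steps described above.
channels = 100
sample_rate = 32_000       # samples/s per channel
bits_per_sample = 10

raw_bps = channels * sample_rate * bits_per_sample
print(raw_bps)             # 32000000 bits/s = 32 Mb/s

# Assumed order-of-magnitude reductions: feature extraction (~10x), then
# on-chip spike sorting with event-only notifications (another ~10x).
feature_bps = raw_bps // 10
event_bps = feature_bps // 10
print(feature_bps, event_bps)  # 3200000 320000
```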
Fig. 1.1 Block diagram of an M-channel neural recording system: recording electrode, low-noise amplifier (LNA), band-pass filter, programmable gain amplifier, SAR A/D converter, digital signal processing (DSP) system, D/A converter, reconstruction filter, and stimulator electrode

The block diagram of an M-channel neural recording system is illustrated in Fig. 1.1. With an increase in the range of applications and their functionalities, neuroprosthetic devices are evolving into closed-loop control systems [31], composed of a front-end neural recording interface and back-end neural signal processing, containing features such as local field potential measurement circuits [32] or spike detection circuits [33]. To avoid the risk of infection, these systems
are implanted under the skin, while the recorded neural signals and the power required for the implant operation are transmitted wirelessly. If a battery with an energy capacity of 625 mAh at 1.5 V is used, a CMOS IC with 100 mW power consumption can only last for nine and a half hours. Most implantable biomedical devices, in contrast, should last more than 10 years, which limits the average system power consumption (when using the same battery) to about 10 µW. Proximity between electrodes and circuitry, and the increasing density of multichannel electrode arrays, create significant design challenges with respect to circuit miniaturization and power dissipation reduction of the recording system.
Power density is limited to 0.8 mW/mm² [34] to prevent possible heat damage to the tissue surrounding the device (subsequently, the limited power consumption also prolongs the battery's longevity and avoids recurrent battery-replacement surgeries). Furthermore, the space available to host the system is restricted, to ensure minimal tissue damage and tissue displacement during implantation.
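The power budget above follows from the battery figures quoted earlier; a quick check of both lifetimes (the 10-year target is the assumption stated in the text):

```python
# Average power budget for an implant running from a 625 mAh, 1.5 V battery.
capacity_mwh = 625 * 1.5                 # 937.5 mWh of stored energy

hours_100mw = capacity_mwh / 100.0       # lifetime at 100 mW consumption
print(round(hours_100mw, 1))             # 9.4 hours

hours_10y = 10 * 365.25 * 24             # ~87,660 hours in 10 years
avg_power_uw = capacity_mwh / hours_10y * 1000.0
print(round(avg_power_uw, 1))            # ~10.7 uW average budget
```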
The signal quality in the neural interface front-end, besides the specifics of the electrode material and the electrode/tissue interface, is limited by the nature of the bio-potential signal and its biological background noise, dictating system resource constraints, such as power, area, and bandwidth. The BMI architecture additionally includes a micro-stimulation module to apply stimulation signals to the brain neural tissue. Currently, multi-electrode arrays contain tens to hundreds of electrodes, a number projected to double every seven years [35]. When a neuron fires an action potential, the cell membrane becomes depolarized by the opening of voltage-controlled ion channels, which leads to a flow of current both inside and outside the neuron. Since the extracellular medium is resistive [36], the extracellular potential is approximately proportional to the current across the neuron membrane [37]. The membrane roughly behaves like an RC circuit, and most current flows through the membrane capacitance [38].
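The parallel-RC picture of the membrane can be made concrete with a current-divider sketch; the Rm and Cm values below are assumed illustrative numbers, not parameters from this chapter.

```python
import math

# Parallel RC model of the neuron membrane. At spike frequencies (~1 kHz),
# nearly all membrane current flows through the capacitance; at LFP
# frequencies, the resistive path dominates. Rm and Cm are assumed values.
Rm = 100e6      # membrane resistance, ohms (assumed)
Cm = 100e-12    # membrane capacitance, farads (assumed)

def capacitive_fraction(f_hz):
    """|I_C / I_total| for a parallel RC current divider at frequency f."""
    w_rc = 2 * math.pi * f_hz * Rm * Cm
    return w_rc / math.sqrt(1 + w_rc * w_rc)

print(round(capacitive_fraction(1000.0), 3))   # ~1.0 in the AP band
print(round(capacitive_fraction(1.0), 3))      # small in the LFP band
```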
The neural data acquired by the recording electrodes are conditioned using analog circuits. The electrode is characterized by its charge density and impedance characteristics (e.g., a 36 µm diameter probe (~1000 µm²) may have a capacitance of 200 pF, equivalent to an 80 kΩ impedance at 10 kHz), which determine the amount of noise added to the signal (e.g., 7 µVrms for a 10 kHz recording bandwidth). As a result of the small amplitude of neural signals (typically ranging from 10 to 500 µV and containing data up to ~10 kHz) and the high impedance of the electrode-tissue interface, low-noise amplification (LNA) and band-pass filtering are required.
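The quoted electrode impedance follows directly from the interface capacitance, as a quick check with the numbers above shows:

```python
import math

# Impedance of a 200 pF electrode interface capacitance at 10 kHz,
# matching the ~80 kOhm figure quoted in the text.
C = 200e-12   # farads
f = 10e3      # hertz

Z = 1.0 / (2 * math.pi * f * C)
print(round(Z / 1e3, 1))   # ~79.6 kOhm
```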
Fig. 1.2 a Correction approach for mixed-signal and analog circuits, b mixed-signal solution (digital error estimation, analog error correction), c alternative mixed-signal scheme (error estimation and correction are done digitally)
1.2 Remarks on Current Design Practice

demand for reduced circuit offset. Initial work on digital signal-correction processing started in the early nineties and focused on offset attenuation or dispersion. The next priority became area scaling for analog functions, to keep up with the pace at which digital cost-per-function was being reduced [42]. Lately, the main focus is on correcting analog device characteristics, which have become impaired as a result of aggressive feature size reduction and area scaling. However, efficient digital signal-correction processing of analog circuits is only possible if their analog behavior is sufficiently well characterized. As a consequence, an appropriate model, as well as its corresponding parameters, has to be identified. The model is based on a priori knowledge about the system; the key parameters that influence the system, and their time behavior, are typical examples. Nevertheless, in principle, the model itself can be derived and modified adaptively, which is the central topic of adaptive control theory. The parameters of the model can be tuned during the fabrication of the chip or during its operation. Since fabrication-based correction methods are limited, algorithms that adapt to a nonstationary environment during operation have to be employed.
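A minimal sketch of such operation-time adaptation is an LMS loop that estimates the gain and offset error of an analog stage from its digitized output against a known reference input. This is entirely illustrative, not a calibration scheme from this book; the true gain and offset errors below are assumed.

```python
# LMS identification of the gain/offset error of an analog stage.
def lms_calibrate(samples, mu=0.01, iters=2000):
    """samples: list of (reference_input, measured_output) pairs."""
    gain, offset = 1.0, 0.0
    for _ in range(iters):
        for x, y in samples:
            y_hat = gain * x + offset
            e = y - y_hat              # prediction error
            gain += mu * e * x         # LMS parameter updates
            offset += mu * e
    return gain, offset

# "True" stage errors to be identified: gain 1.05, offset 0.02 (assumed).
data = [(x / 10.0, 1.05 * (x / 10.0) + 0.02) for x in range(-10, 11)]
g, o = lms_calibrate(data)
print(round(g, 3), round(o, 3))   # converges to ~1.05 0.02
```

Once the parameters are identified, the inverse correction can be applied digitally to every output sample, which is exactly the division of labor sketched in Fig. 1.2.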
In this section, the most challenging design issues for analog circuits in deep submicron technologies, such as counteracting the degradation of analog performance caused by the requirement for biasing at lower operating voltages, obtaining a high dynamic range with low supply voltages, and ensuring good matching for low offset, are reviewed. Additionally, the subsequent remedies that improve the performance of analog circuits and data converters by correcting or calibrating the static, and possibly the dynamic, limitations through calibration techniques are briefly discussed as well.
Fig. 1.3 a Trend of analog features in CMOS technologies: line width, supply voltage, and gain-bandwidth product (GBW) versus year. b Gain-bandwidth product versus drain current IDS in two technological nodes (0.25 µm and 90 nm), with load capacitances CL = 100 fF and 200 fF
Obtaining high dynamic range with low supply voltages and low power dissipation in ultra-deep submicron CMOS technology is a major challenge. The key limitation of analog circuits is that they operate with electrical variables, and not simply with discrete numbers which, in circuit implementations, give rise to a beneficial noise margin. On the contrary, the accuracy of analog circuits fundamentally relies on matching between components, low noise, low offset, and low distortion.
With the reduction of the supply voltage to ensure a suitable overdrive voltage for keeping transistors in saturation, even if the number of stacked transistors is kept at a minimum, the signal swing is low if high resolution is required. A low supply voltage is also problematic for driving CMOS switches, especially those connected to signal nodes, as the on-resistance can become very high or, in the limit, the switch may not close at all in some interval of the input amplitude.
In general, to achieve high-gain operation, high output impedance is necessary, e.g., the drain current should vary only slightly with the applied VDS. With transistor scaling, the drain asserts its influence more strongly due to the growing proximity of the gate and drain connections, increasing the sensitivity of the drain current to the drain voltage. The rapid degradation of the output resistance at gate lengths below 0.1 µm, together with the saturation of gm, reduces the device intrinsic gain gm·ro.
As transistor size is reduced, the fields in the channel increase and the dopant impurity levels increase. Both changes reduce the carrier mobility, and hence the transconductance gm. Typically, the desired high transconductance value is obtained at the cost of an increased bias current. However, for very short channels the carrier velocity quickly reaches the saturation limit, at which point the transconductance also saturates, becoming independent of gate length or bias: gm = Weff·Cox·vsat/2. As channel lengths are reduced without a proportional reduction in drain voltage, the electric field in the channel rises, and the result is velocity saturation of the carriers, limiting the current and the transconductance. A limited transconductance is problematic for analog design: to obtain high gain, it is necessary to use wide transistors, at the cost of increased parasitic capacitances and, consequently, limitations in bandwidth and slew rate. Even using longer channel lengths, obtaining gain in deep submicron technologies is difficult; it is typically necessary to use cascode structures with stacks of transistors, or circuits with positive feedback. As transistor dimension reduction continues, the intrinsic gain keeps decreasing due to a lower output resistance, a result of drain-induced barrier lowering and hot-carrier impact ionization. To make devices smaller, junction design has become more complex, leading to higher doping levels, shallower junctions, halo doping, etc., all to decrease drain-induced barrier lowering. To keep these complex junctions in place, the annealing steps formerly used to remove damage and electrically active defects must be curtailed, increasing junction leakage. Heavier doping is also associated with thinner depletion layers and more recombination centers, which result in increased leakage current even without lattice damage. In addition, gate leakage currents in very thin-oxide devices set an upper bound on the effective output resistance attainable via circuit techniques.
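The velocity-saturated transconductance formula given above can be evaluated directly; the device parameters below are textbook-style assumptions for a short-channel NMOS device, not data from this chapter. Note that channel length does not appear in the expression, which is the point: gm stops responding to L once the carriers saturate.

```python
# Velocity-saturated transconductance: gm = Weff * Cox * vsat / 2.
W_eff = 10e-6     # effective gate width, m (assumed)
C_ox = 8.5e-3     # oxide capacitance per unit area, F/m^2 (assumed)
v_sat = 1e5       # electron saturation velocity in Si, m/s

gm = W_eff * C_ox * v_sat / 2.0
print(round(gm * 1e3, 2))   # transconductance in mS (~4.25 mS)
```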
Fig. 1.4 a Scaling of gate width W, gate capacitance Cgs, and transit frequency fT versus channel length L. b Conversion frequency fC versus drain current IDS for four technological nodes (0.25 µm, 0.18 µm, 0.13 µm, and 90 nm); labels a, b, and c mark the low-current, peak, and high-current regions
In the region where the current is less than this value (region a), the conversion frequency increases with an increase of the sink current. Similarly, in the region where the current is higher than this value (region c), the conversion frequency decreases with an increase of the sink current. There are two reasons why this characteristic is exhibited: in the low-current region, gm is proportional to the sink current, and the parasitic capacitances are smaller than the signal capacitance. Around the peak, at least one of the parasitic capacitances becomes equal to the signal capacitance. In the region where the current is larger than that value, both parasitic capacitances become larger than the signal capacitance, and the conversion frequency decreases with an increase of the sink current.
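A simple way to reproduce this peaking behavior is to model the conversion frequency as fc ≈ gm/(2π(Cs + Cp)), with a transconductance that saturates at high current and a parasitic capacitance that grows with the wider device needed to carry that current. All element values and functional forms below are illustrative assumptions, not extracted from Fig. 1.4.

```python
import math

# Toy model of conversion frequency versus sink current.
Cs = 100e-15      # fixed signal capacitance, F (assumed)
I0 = 1e-6         # current where gm starts to saturate, A (assumed)

def fc(i_sink):
    gm = 20.0 * I0 * math.tanh(i_sink / I0)   # saturating gm (assumed form)
    Cp = 2e-9 * i_sink                        # current-dependent parasitics (assumed)
    return gm / (2 * math.pi * (Cs + Cp))

# Region a: fc rises with current; region c: fc falls again.
print(fc(1e-6) > fc(0.1e-6))     # True
print(fc(100e-6) < fc(10e-6))    # True
```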
The offset of any analog circuit and the static accuracy of data converters critically depend on the matching between nominally identical devices. With transistors becoming smaller, the number of atoms in the silicon that determine many of the transistor's properties is becoming fewer, with the result that control of dopant numbers and placement is more erratic. During chip manufacturing, random process variations affect all transistor dimensions (length, width, junction depths, oxide thickness, etc.), and become a greater percentage of overall transistor size as the transistor scales. The stochastic nature of physical and chemical fabrication steps causes a random error in electrical parameters, giving rise to a time-independent difference between identically designed elements. The error typically decreases with increasing device area. Transistor matching properties are improved with a thinner oxide [43]. Nevertheless, when the oxide thickness is reduced to a few atomic layers, quantum effects will dominate and matching will degrade. Since many circuit techniques exploit the equality of two components, it is important to obtain the best matching in a given process, especially for critical devices.
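The area dependence mentioned above is quantified by the matching law of [43]: the standard deviation of the threshold-voltage difference between two identically drawn transistors scales as AVT/√(W·L). The mismatch coefficient below is an assumed illustrative value.

```python
import math

# Pelgrom-style area law for threshold-voltage mismatch [43].
A_VT = 3.5e-3 * 1e-6     # mismatch coefficient, V*m (3.5 mV*um, assumed)

def sigma_dvt(w_m, l_m):
    """Std. dev. of delta-VT between two matched devices of size W x L."""
    return A_VT / math.sqrt(w_m * l_m)

# Quadrupling the gate area halves the mismatch.
s1 = sigma_dvt(1e-6, 1e-6)     # 1 um x 1 um device
s2 = sigma_dvt(2e-6, 2e-6)     # 2 um x 2 um device
print(round(s1 / s2, 1))       # 2.0
```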
Some of the rules that have to be followed to ensure good matching are the following. Firstly, devices to be matched should have the same structure and use the same materials. Secondly, the temperature of matched components should be the same, e.g., the devices to be matched should be located on the same isotherm, which is obtained by symmetrical placement with respect to the dissipative devices. Thirdly, the distance between matched devices should be minimal, to obtain the maximum spatial correlation of fluctuating physical parameters; common-centroid geometries should be used to cancel the gradient of parameters at the first order. Similarly, the orientation of devices on chip should be the same, to eliminate dissymmetries due to anisotropic fabrication steps or to the anisotropy of the silicon itself. Lastly, the surroundings in the layout, possibly improved by dummy structures, should be the same, to avoid border mismatches.
The use of digital enhancing techniques in A/D converters (i.e., foreground or background calibration) reduces the need for expensive technologies with special fabrication steps; a side advantage is that the cost of parts is reduced while maintaining good yield, reliability, and long-term stability. Foreground calibration interrupts the normal operation of the converter to perform the trimming of elements or the mismatch measurement in a dedicated calibration cycle, normally performed at power-on or during periods of inactivity of the circuit. Any miscalibration, or sudden environmental changes such as power supply or temperature variations, may make the measured errors invalid. Therefore, for devices that operate for long periods, it is necessary to have periodic extra calibration cycles. The input switch restores the data converter to normal operation after the mismatch measurement, and in every conversion period the logic uses the output of the A/D converter to properly address the memory that contains the correction quantity. In order to optimize the memory size, the stored data should have the minimum word-length, which depends on the technology accuracy and the expected A/D linearity. The digital measurement of errors, which allows for calibration by digital signal processing, can be at the element, block, or entire converter level. The calibration parameters are stored in memories but, in contrast with the trimming case, the content of the memories is frequently used, as it is an input of the digital processor.
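The look-up-table correction step described above can be sketched in a few lines: after a foreground calibration cycle has measured a per-code error, each raw converter output code addresses a memory holding its correction quantity. The error pattern below is arbitrary illustrative data.

```python
# Sketch of LUT-based code correction for a calibrated A/D converter.
N_CODES = 16                      # 4-bit converter, for illustration

# "Calibration cycle": measured correction per output code (assumed data).
correction_mem = {code: (1 if code % 5 == 0 else 0) for code in range(N_CODES)}

def corrected(raw_code):
    """Normal operation: every raw code addresses the correction memory."""
    return raw_code + correction_mem[raw_code]

print([corrected(c) for c in (0, 3, 5, 10)])   # [1, 3, 6, 11]
```

The memory word-length trade-off mentioned in the text shows up here as the range of values stored in `correction_mem`.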
Methods using background calibration work during the normal operation of the converter, using extra circuitry that functions continuously and synchronously with the converter. Often these circuits use hardware redundancy to perform a background calibration on the fraction of the architecture that is temporarily not in use. However, since the use of redundant hardware is effective but costs silicon area and power consumption, other methods aim at obtaining the same functionality by borrowing a small fraction of the sampled-data circuit operation to perform the self-calibration.
1.3 Motivation
large-scale neural spike data classification can be obtained with a low-power (less than 41 µW, corresponding to a 15.5 µW/mm² power density), compact, low-resource-usage structure (31k logic gates, resulting in a 2.64 mm² area).
In Chap. 5, we develop a yield-constrained sequential power-per-area (PPA) minimization framework based on a dual quadratic program, which is applied to multivariable optimization in neural interface design under bounded process variation influences. In the proposed algorithm, we create a sequence of minimizations of the feasible PPA regions with iteratively generated low-dimensional subspaces, while accounting for the impact of area scaling. With a two-step estimation flow, the constrained multi-criteria optimization is converted into an optimization with a single objective function, and repeated estimation of non-critical solutions is avoided. Consequently, the yield constraint is only active as the optimization concludes, eliminating the problem of overdesign in the worst-case approach. The PPA assignment is interleaved, at any design point, with the configuration selection, which optimally redistributes the overall index of circuit quality to minimize the total PPA ratio. The proposed method can be used with any variability model and, subsequently, any correlation model, and is not restricted by any particular performance constraint. The experimental results, obtained on multichannel neural recording interface circuits implemented in 90 nm CMOS technology, demonstrate power savings of up to 26 % and area savings of up to 22 %, without yield penalty.
In Chap. 6 the main conclusions are summarized and recommendations for fur-
ther research are presented.
References

12. R.H. Olsson et al., Band-tunable and multiplexed integrated circuits for simultaneous recording and stimulation with microelectrode arrays. IEEE Trans. Biomed. Eng. 52(7), 1303-1311 (2005)
13. T.J. Blanche, M.A. Spacek, J.F. Hetke, N.V. Swindale, Polytrodes: high-density silicon electrode arrays for large-scale multiunit recording. J. Neurophysiol. 93(5), 2987-3000 (2005)
14. R.J. Vetter et al., in Development of a Microscale Implantable Neural Interface (MINI) Probe System. Proceedings of International Conference of Engineering in Medicine and Biology Society, pp. 7341-7344, 2005
15. G.E. Perlin, K.D. Wise, An ultra compact integrated front end for wireless neural recording microsystems. J. Microelectromech. Syst. 19(6), 1409-1421 (2010)
16. P. Ruther et al., in Compact Wireless Neural Recording System for Small Animals Using Silicon-Based Probe Arrays. Proceedings of International Conference of Engineering in Medicine and Biology Society, pp. 2284-2287, 2011
17. T. Torfs et al., Two-dimensional multi-channel neural probes with electronic depth control. IEEE Trans. Biomed. Circ. Syst. 5(5), 403-412 (2011)
18. U.G. Hofmann et al., A novel high channel-count system for acute multisite neuronal recordings. IEEE Trans. Biomed. Eng. 53(8), 1672-1677 (2006)
19. P. Norlin et al., A 32-site neural recording probe fabricated by DRIE of SOI substrates. J. Micromech. Microeng. 12(4), 414 (2002)
20. J. Du et al., Multiplexed, high density electrophysiology with nanofabricated neural probes. PLoS ONE 6(10), e26204 (2011)
21. K. Faligkas, L.B. Leene, T.G. Constandinou, in A Novel Neural Recording System Utilising Continuous Time Energy Based Compression. Proceedings of International Symposium on Circuits and Systems, pp. 3000-3003, 2015
22. J.T. Robinson, M. Jorgolli, H. Park, Nanowire electrodes for high-density stimulation and measurement of neural circuits. Frontiers Neural Circ. 7(38) (2013)
23. C.M. Gray, P.E. Maldonado, M. Wilson, B. McNaughton, Tetrodes markedly improve the reliability and yield of multiple single-unit isolation from multi-unit recordings in cat striate cortex. J. Neurosci. Methods 63(1-2), 43-54 (1995)
24. K.D. Harris, D.A. Henze, J. Csicsvari, H. Hirase, G. Buzsáki, Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. J. Neurophysiol. 84(1), 401-414 (2000)
25. R.R. Harrison, in A Low-Power Integrated Circuit for Adaptive Detection of Action Potentials in Noisy Signals. Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3325-3328, 2003
26. K. Oweiss, K. Thomson, D. Anderson, in A Systems Approach for Real-Time Data Compression in Advanced Brain-Machine Interfaces. Proceedings of IEEE International Conference on Neural Engineering, pp. 62-65, 2005
27. Y. Perelman, R. Ginosar, Analog frontend for multichannel neuronal recording system with spike and LFP separation. J. Neurosci. Methods 153, 21-26 (2006)
28. Z.S. Zumsteg et al., in Power Feasibility of Implantable Digital Spike-Sorting Circuits for Neural Prosthetic Systems. Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4237-4240, 2004
29. A. Zviagintsev, Y. Perelman, R. Ginosar, in Low Power Architectures for Spike Sorting. Proceedings of IEEE International Conference on Neural Engineering, pp. 162-165, 2005
30. A. Zviagintsev, Y. Perelman, R. Ginosar, in Low Power Spike Detection and Alignment Algorithm. Proceedings of IEEE International Conference on Neural Engineering, pp. 317-320, 2005
31. B. Gosselin, Recent advances in neural recording microsystems. Sensors 11(5), 4572-4597 (2011)
32. R.R. Harrison, G. Santhanam, K.V. Shenoy, in Local Field Potential Measurement with Low-Power Analog Integrated Circuit. International Conference of IEEE Engineering in Medicine and Biology Society, vol. 2, pp. 4067-4070, 2004
33. R.R. Harrison et al., A low-power integrated circuit for a wireless 100-electrode neural recording system. IEEE J. Solid-State Circ. 42(1), 123-133 (2007)
34. S. Kim, R. Normann, R. Harrison, F. Solzbacher, in Preliminary Study of the Thermal Impact of a Microelectrode Array Implanted in the Brain. Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2986-2989, 2006
35. I.H. Stevenson, K.P. Kording, How advances in neural recording affect data analysis. Nat. Neurosci. 14(2), 139-142 (2011)
36. C.I. de Zeeuw et al., Spatiotemporal firing patterns in the cerebellum. Nat. Rev. Neurosci. 12(6), 327-344 (2011)
37. F. Kölbl et al., in In Vivo Electrical Characterization of Deep Brain Electrode and Impact on Bio-amplifier Design. IEEE Biomedical Circuits and Systems Conference, pp. 210-213, 2010
38. A.C. West, J. Newman, Current distributions on recessed electrodes. J. Electrochem. Soc. 138(6), 1620-1625 (1991)
39. S.K. Arfin, Low power circuits and systems for wireless neural stimulation. PhD Thesis, MIT, 2011
40. K.H. Kim, S.J. Kim, A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio. IEEE Trans. Biomed. Eng. 50, 999-1011 (2003)
41. K. Okada, S. Kousai (eds.), Digitally-Assisted Analog and RF CMOS Circuit Design for Software-Defined Radio (Springer, Berlin, 2011)
42. M. Verhelst, B. Murmann, Area scaling analysis of CMOS ADCs. IEEE Electron. Lett. 48(6), 314-315 (2012)
43. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circ. 24(5), 1433-1439 (1989)
Chapter 2
Neural Signal Conditioning Circuits
2.1 Introduction

Neural recording in vivo demands compliance with severe safety requirements. For example, the maximum temperature increase due to the operation of the cortical implant in any surrounding brain tissue should be kept at less than 1 °C [1].
The limited total power budget imposes strict specifications on the circuit
design of the low-noise analog front-end and high-speed circuits in the wideband
wireless link, which transmits the recorded data to a base station located outside
the skull. The design constraints are more pronounced when the number of record-
ing sites increases to several hundred for typical multi-electrode arrays.
Front-end neural amplifiers are crucial building blocks in implantable cortical
microsystems. Low-power and low-noise operation, stable dc interface with the
sensors (microprobes), and small silicon area are the main design specifications of
these amplifiers. The power dissipation is dictated by the tolerable input-referred
thermal noise of the amplifier, where the trade-off is expressed in terms of noise
efficiency factor [2]. For an ideal thermal-noise-limited amplifier with a constant bandwidth and supply voltage, the power of the amplifier scales as 1/vₙ², where vₙ is the input-referred noise of the amplifier. This relationship shows the steep power cost of achieving low-noise performance in an amplifier.
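The steep 1/vₙ² power cost can be illustrated with a short numeric sketch; the reference design point (1 µW at 10 µVrms) is a hypothetical example, not a circuit from this chapter:

```python
def thermal_noise_limited_power(vn_rms, p_ref, vn_ref):
    """Power of an ideal thermal-noise-limited amplifier at constant
    bandwidth and supply voltage, scaled from a reference design
    point: P scales as 1/vn^2."""
    return p_ref * (vn_ref / vn_rms) ** 2

# Hypothetical reference point: 1 uW at 10 uVrms input-referred noise.
p_half = thermal_noise_limited_power(vn_rms=5e-6, p_ref=1e-6, vn_ref=10e-6)
p_quarter = thermal_noise_limited_power(vn_rms=2.5e-6, p_ref=1e-6, vn_ref=10e-6)

# Halving the input-referred noise quadruples the power budget,
# and quartering it costs a factor of sixteen.
print(p_half, p_quarter)  # 4e-06 1.6e-05
```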
In this chapter, we introduce a novel, low-power neural recording interface system with a capacitive-feedback low-noise amplifier and a capacitive-attenuation band-pass filter. The capacitive-feedback amplifier offers a low-offset, low-distortion solution with an optimal power-noise trade-off. Similarly, the capacitive-attenuation band-pass filter provides a wide tuning range and a low-power realization, while allowing simple extension of the transconductors' linear range and, consequently, ensuring low harmonic distortion. The low-noise amplifier and band-pass filter circuits are realized in a 65 nm CMOS technology, and consume 1.15 µW and 390 nW, respectively. The fully differential low-noise amplifier achieves 40 dB closed-loop gain and occupies an area of 0.04 mm². Input-referred noise is 3.1 µVrms over the 0.1–20 kHz operating bandwidth. Distortion is below 2 % total harmonic distortion (THD) for typical extracellular neural signals (smaller than 10 mV peak-to-peak). The capacitive-attenuation band-pass filter with first-order slopes achieves 65 dB dynamic range, 210 mVrms at 2 % THD, and 140 µVrms total integrated output noise.
The chapter is organized as follows: Sect. 2.2 focuses on the signal conditioning circuit details, while Sect. 2.3 offers a brief overview of operational amplifier circuit concepts. The experimental results obtained are presented in Sect. 2.4. Finally, Sect. 2.5 provides a summary and the main conclusions.
2.2 Power-Efficient Neural Signal Conditioning Circuit

The neural spikes, typically ranging from 10 to 500 µV and containing data up to ~20 kHz, are amplified with the low-noise amplifier (LNA) illustrated in Fig. 2.1, where the Vref voltage designates the node connected to the reference electrode. The amplifier A1 is designed based on an operational transconductance amplifier (OTA).
Fig. 2.1 Schematic of the signal conditioning circuit including low-noise amplifier, band-pass filter and programmable-gain amplifier
Sized with a larger W/L ratio than the active loads, the gm of the cascode transistors is maximized, boosting the dc gain, while their saturation voltage is reduced, allowing for a larger saturation voltage for the active loads without exceeding the voltage headroom. The bias current of the LNA can be varied to adapt its noise per unit bandwidth.
To keep the overall bandwidth constant when the bias current of the gain stage is varied, a band-pass filter [8] (Fig. 2.3) is added at the output of the LNA. The high gain provided by the LNA stage alleviates the noise floor requirements of this bandwidth-limiting stage. The total integrated output voltage noise of the filter depends on the linear range of the transconductors Gm1 and Gm2 (Fig. 2.4), the ratio of the attenuator capacitances A, and the unit capacitance C. The linear range of the Gm is effectively improved by attenuating the input. In the high-pass stage, the signal is attenuated by a factor of A + 1 and the full capacitance of (A + 1)C is then utilized for filtering with Gm1. In the low-pass stage, a gain of A + 1 is applied to signals in the pass-band. A capacitance C/(A + 1) is added in parallel with the attenuating capacitances to increase the filtering capacitance.
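As a sketch of how the attenuation helps, the first-order Gm-C corner frequency f = Gm/(2π·Ceff) can be evaluated for the high-pass stage, which filters with the full (A + 1)C capacitance; the component values below are illustrative assumptions, not the chapter's design values:

```python
import math

def gmc_corner_hz(gm, c_eff):
    """First-order Gm-C corner frequency, f = Gm / (2*pi*C_eff)."""
    return gm / (2 * math.pi * c_eff)

# Illustrative (assumed) values:
A = 7          # attenuator capacitance ratio
C = 1e-12      # unit capacitance, 1 pF
Gm1 = 1e-9     # high-pass transconductor, 1 nS

# The high-pass stage filters with the full (A + 1)*C capacitance,
# so a sub-100-Hz corner is reached with a modest transconductance.
f_hp = gmc_corner_hz(Gm1, (A + 1) * C)
print(round(f_hp, 1))  # 19.9
```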
2.3 Operational Amplifiers
Operating at the edge of the performance envelope, op amps exhibit intense trade-offs among dynamic range, linearity, settling speed, stability, and power consumption. As a result, accuracy and speed are often dictated by the performance of these amplifiers.
Amplifiers with a single gain stage have high output impedance providing an
adequate dc gain, which can be further increased with gain boosting techniques.
Single-stage architecture offers large bandwidth and a good phase margin with
the saturation voltage of a transistor. With this maximum possible output swing, the input common-mode range is zero. In practice, some input common-mode range, which reduces the output swing, always has to be reserved to accommodate inaccuracy and settling transients in the signal common-mode levels. The high-speed capability of the amplifier is the result of the presence of only n-channel transistors in the signal path and of the relatively small capacitance at the source of
the cascode transistors. The gain-bandwidth product of the amplifier is given by GBW = gm1/CL, where gm1 is the transconductance of the input transistors T1 and CL is the load capacitance. Thus, the GBW is limited by the load capacitance.
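As a quick numeric check (GBW = gm1/CL is an angular frequency, so it is divided by 2π to express it in Hz; the gm1 and CL values are assumed for illustration):

```python
import math

def gbw_hz(gm1, cl):
    """Gain-bandwidth product GBW = gm1/CL of the single-stage
    amplifier, converted from rad/s to Hz."""
    return gm1 / (2 * math.pi * cl)

# Assumed example: gm1 = 100 uS driving a 1 pF load.
print(round(gbw_hz(100e-6, 1e-12) / 1e6, 1))  # 15.9 (MHz)
```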
Due to its simple topology and dimensioning, the telescopic cascode amplifier is preferred if its output swing is large enough for the specific application. The output signal swing of this architecture has been widened by driving the transistors T7–T8 into the linear region [10]. In order to preserve the good common-mode
rejection ratio and power supply rejection ratio properties of the topology, addi-
tional feedback circuits for compensation have been added to these variations. The
telescopic cascode amplifier has low current consumption, relatively high gain,
low noise and very fast operation. However, as it has five stacked transistors, the
topology is not suitable for low supply voltages.
The folded cascode amplifier topology [11] is shown in Fig. 2.6. The swing of this design is constrained by its cascoded output stage. It provides a larger output swing and input common-mode range than the telescopic amplifier with the same dc gain and without major loss of speed. The output swing is VDD − 4VDS,SAT and is not linked to the input common-mode range, which is VDD − VT − 2VDS,SAT.
The second pole of this amplifier is located at gm7/Cpar, where gm7 is the transcon-
ductance of T7 and Cpar is the sum of the parasitic capacitances from transistors
T1, T7 and T9 at the source node of transistor T7. The frequency response of this amplifier is degraded relative to that of the telescopic cascode amplifier because of the smaller transconductance of the p-channel devices and the larger parasitic capacitance. To assure symmetrical slewing, the output stage current is usually made
equal to that of the input stage. The GBW of the folded cascode amplifier is also
given by gm1/CL.
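The two swing expressions can be evaluated numerically; the supply, threshold, and saturation voltages below are assumed example values, not figures from the text:

```python
def folded_cascode_ranges(vdd, vt, vds_sat):
    """Output swing and input common-mode range of the folded cascode
    amplifier, per the expressions in the text:
    swing = VDD - 4*VDS_SAT and ICMR = VDD - VT - 2*VDS_SAT."""
    swing = vdd - 4 * vds_sat
    icmr = vdd - vt - 2 * vds_sat
    return swing, icmr

# Assumed 1.2 V design point with VT = 0.4 V and VDS_SAT = 0.15 V:
swing, icmr = folded_cascode_ranges(vdd=1.2, vt=0.4, vds_sat=0.15)
print(round(swing, 3), round(icmr, 3))  # 0.6 0.5
```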
The open loop dc gain of amplifiers having cascode transistors can be boosted
by regulating the gate voltages of the cascode transistors [12]. The regulation is
realized by adding an extra gain stage, which reduces the feedback from the output
to the drain of the input transistors. In this way, the dc gain of the amplifier can be
increased by several orders of magnitude. The increase in power and chip area can
be kept very small with appropriate feedback amplifier architecture [12]. The cur-
rent consumption of the folded cascode is doubled compared to the telescopic cas-
code amplifier although the output voltage swing is increased since there are only
four stacked transistors. The noise of the folded cascode is slightly higher than in
the telescopic cascode as a result of the added noise from the current source tran-
sistors T9 and T10. In addition, the folded cascade has a slightly smaller dc gain
due to the parallel combination of the output resistance of transistors T1 and T9.
A push-pull current-mirror amplifier, shown in Fig. 2.7, has much better slew-rate properties and potentially larger bandwidth and dc gain than the folded cascode amplifier. The slew rate and dc gain depend on the current-mirror ratio K, which is typically between one and three. However, too large a current-mirror ratio increases the parasitic capacitance at the gates of the transistors T12 and T13, pushing the non-dominant pole to lower frequencies and limiting the achievable GBW. The non-dominant pole of the current-mirror amplifier is much lower than that of the folded cascode and telescopic amplifiers due to the larger parasitic capacitance at the drains of the input transistors.
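This trade-off can be captured in a simplified model; the (1 + K) scaling of the mirror-gate capacitance and every component value below are modeling assumptions introduced here, not expressions from the text:

```python
import math

def mirror_ota_tradeoff(gm1, cl, gm_mirror, c_gate_unit, k):
    """Simplified current-mirror OTA model: the mirror ratio K
    multiplies the output current, so GBW ~ K*gm1/(2*pi*CL), while
    the mirror-gate capacitance grows roughly as (1 + K), pulling
    the non-dominant pole p2 ~ gm_mirror/(2*pi*(1 + K)*C_gate) down."""
    gbw = k * gm1 / (2 * math.pi * cl)
    p2 = gm_mirror / (2 * math.pi * (1 + k) * c_gate_unit)
    return gbw, p2

# Raising K from 1 to 3 triples the GBW but halves the
# non-dominant pole, limiting the usable mirror ratio.
g1, p1 = mirror_ota_tradeoff(50e-6, 1e-12, 200e-6, 20e-15, 1)
g3, p3 = mirror_ota_tradeoff(50e-6, 1e-12, 200e-6, 20e-15, 3)
print(round(g3 / g1, 6), round(p1 / p3, 6))  # 3.0 2.0
```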
The noise and current consumption of the current-mirror amplifier are larger than in the telescopic cascode or folded cascode amplifiers. A current-mirror amplifier with dynamic biasing [13] can be used to base the amplifier biasing purely on its small-signal behavior, as the slew rate is no longer limiting. In dynamic biasing, the biasing current of the operational amplifier is controlled
on the basis of the differential input signal. With large differential input signals,
the biasing current is increased to speed up the output settling. Hence, no slew
rate limiting occurs, and the GBW requirement is relaxed. As the settling proceeds,
the input voltage decreases and the biasing current is reduced. The biasing current
needs to be kept only to a level that provides enough GBW for an adequate small-
signal performance. In addition to relaxed GBW requirements, the reduced static
current consumption makes the design of a high-dc gain amplifier easier. With
very low supply voltages, the use of the cascode output stages limits the avail-
able output signal swing considerably. Hence, two-stage operational amplifiers
are often used, in which the operational amplifier gain is divided into two stages,
where the latter stage is typically a common-source output stage. Unfortunately,
with the same power dissipation, the speed of the two-stage operational amplifiers
is typically lower than that of single-stage operational amplifiers.
Of the several alternative two-stage amplifiers, Fig. 2.8 shows a simple Miller-compensated amplifier [14]. With all the transistors in the output stage of this amplifier placed in the saturation region, it has an output swing of VDD − VDS,SAT. Since the non-dominant pole, which arises from the output node, is determined dominantly by the explicit load capacitance, the amplifier has a compromised frequency response.
The gain-bandwidth product of a Miller-compensated amplifier is given approximately by GBW = gm1/CC, where gm1 is the transconductance of T1. In general,
the open loop dc gain of the basic configuration is not large enough for high-res-
olution applications. Gain can be enhanced by using cascoding, which has, how-
ever, a negative effect on the signal swing and bandwidth. Another drawback of
this architecture is a poor power supply rejection at high frequencies because of
the connection of VDD through the gate-source capacitance CGS5,6 of T5 and T6
and CC. The noise properties of the two-stage Miller-compensated operational
Fig. 2.9 Two-stage amplifiers: folded cascode amplifier with a common-source output stage and Miller frequency compensation
amplifier are comparable to those of the telescopic cascode and better than those
of the folded cascode amplifier. The speed of a Miller-compensated amplifier is
determined by its pole-splitting capacitor CC. Usually, the position of this non-
dominant pole, which is located at the output of the two-stage amplifier, is lower
than that of either a folded-cascode or a telescopic amplifier.
Thus, in order to push this pole to higher frequencies, the second stage of the
amplifier requires higher currents resulting in increased power dissipation. Since
the first stage does not need to have a large output voltage swing, it can be a cas-
code stage, either a telescopic or a folded cascode. However, the current consump-
tion and transistor count are also increased. The advantages of the folded cascode
structure are a larger input common-mode range and the avoidance of level shift-
ing between the stages, while the telescopic stage can offer larger bandwidth and
lower thermal noise.
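The pole-splitting trade-off can be sketched with the standard textbook approximations; the p2 > 2.2·GBW phase-margin rule of thumb and all component values are assumptions added here, not data from the text:

```python
import math

def miller_poles(gm1, gm2, cc, cl):
    """Standard pole-splitting approximations for a two-stage
    Miller-compensated amplifier: GBW = gm1/(2*pi*CC) and
    non-dominant pole p2 ~ gm2/(2*pi*CL)."""
    gbw = gm1 / (2 * math.pi * cc)
    p2 = gm2 / (2 * math.pi * cl)
    return gbw, p2

# For roughly 60 degrees of phase margin one typically needs
# p2 > 2.2*GBW, which shows why a large load capacitance forces
# a high-gm (high-current) output stage.
gbw, p2 = miller_poles(gm1=100e-6, gm2=2e-3, cc=2e-12, cl=10e-12)
print(p2 > 2.2 * gbw)  # True
```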
Figure 2.9 illustrates a folded cascode amplifier with a common-source output stage and Miller compensation. The noise properties are comparable with those of the folded cascode amplifier. If a cascode input stage is used, the lead-compensation resistor can be merged with the cascode transistors. An example of this is the folded cascode amplifier with a common-source output stage and Ahuja-style compensation [15] shown in Fig. 2.10. The Ahuja-style compensated operational amplifier is suitable for larger capacitive loads than the Miller-compensated one and has a better power supply rejection, since the substrate noise coupling through the gate-source capacitance of the output stage gain transistors is not coupled directly through the pole-splitting capacitors to the operational amplifier output [15].
2.4 Experimental Results
Fig. 2.11 Test data set (the y-axis is arbitrary): a raw signal after amplification, not corrected for gain; b band-pass filtered signal; c detected spikes
Fig. 2.12 Statistical voltage trace of neuron cell activity; grey area: voltage traces from 1000 randomly selected neural channel compartments; black area: expected voltage trace
Fig. 2.13 a Noise amplitude in the time domain at the output of the low-pass filter; b noise PSD at the output of the low-pass filter
An example of the time-domain noise estimation and noise power spectral density at the output of the low-pass filter is illustrated in Fig. 2.13. For frequencies higher than ~10 kHz, capacitances at the interface form the high-frequency pole and shape both the signal and the noise spectrum; the noise is low-pass filtered at the recording amplifier inputs. The interface's input-equivalent noise voltage decreases as the gain across the amplifying stages increases, i.e. the ratio of the signal power to the total noise variance can be expressed as

SNR = F² / (σ²_neural + σ²_electrode + Σ_i (Π_j G_j²)⁻¹ σ²_amp,i),

where F² is the total signal power, σ²_amp,i represents the variance of the noise added by the i-th amplification stage, referred to the input through the gains G_j of the preceding stages, σ²_electrode is the noise variance of the electrode, and σ²_neural is the variance of the biological neural noise. The observed SNR of the system also increases as the system is isomorphically scaled up, which suggests a fundamental trade-off between SNR and speed of the system.
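The cascaded-gain SNR expression can be evaluated in code; referring each stage's noise to the input through the squared gains of the preceding stages is our reading of the formula, and all the numbers below are hypothetical:

```python
def input_referred_snr(sig_power, var_neural, var_electrode,
                       stage_gains, stage_noise_vars):
    """Input-referred SNR of a cascaded recording chain: the noise
    variance added by each amplification stage is divided by the
    squared gain of all preceding stages before being summed with
    the neural and electrode noise."""
    total_var = var_neural + var_electrode
    gain_sq = 1.0
    for gain, var in zip(stage_gains, stage_noise_vars):
        total_var += var / gain_sq   # refer this stage's noise to the input
        gain_sq *= gain * gain       # accumulate gain for later stages
    return sig_power / total_var

# Hypothetical chain: LNA gain 100 followed by PGA gain 10; the PGA
# noise is suppressed by the LNA gain squared, so the LNA dominates.
snr = input_referred_snr(sig_power=1e-8,           # (100 uV)^2
                         var_neural=4e-12, var_electrode=1e-12,
                         stage_gains=[100, 10],
                         stage_noise_vars=[9e-12, 4e-10])
print(round(snr))  # 712
```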
The fully differential low-noise amplifier achieves 40 dB closed-loop gain and occupies an area of 0.04 mm². Input-referred noise is 3.1 µVrms over the 0.1–20 kHz operating bandwidth. Distortion is below 2 % total harmonic distortion (THD) for typical extracellular neural signals (smaller than 10 mV peak-to-peak). The common-mode rejection ratio (CMRR) and the power-supply rejection ratio (PSRR) exceed 75 dB.
The capacitive-attenuation band-pass filter with first-order slopes achieves 65 dB dynamic range, 210 mVrms at 2 % THD, and 140 µVrms total integrated output noise. Total harmonic distortion of the V/I converter is 0.04 % at 20 kHz. Table 2.1 compares the state-of-the-art neural recording systems to this work.
2.5 Conclusions
Bio-electronic neural interfaces enable interaction with neural cells by recording, to facilitate early diagnosis and predict intended behavior before undertaking any preventive or corrective actions, or by stimulation, to prevent the onset of detrimental neural activity such as that resulting in tremor. Multi-channel neural interfaces allow for spatial neural recording and stimulation at multiple sites. To avoid the risk of infection, these systems are implanted under the skin, while the recorded neural signals and the power required for the implant operation are transmitted wirelessly. The maximum number of channels is constrained by noise, area, bandwidth, the power that has to be supplied to the implant externally, thermal dissipation (to avoid necrosis of the tissue), and the scalability and expandability of the recording system. Very frequently, an electrode records the action potentials from multiple surrounding neurons. Consequently, the ability to differentiate spikes from noise is governed by both the discrepancies between the noise-free spikes from each neuron and the signal-to-noise level of the recording interface. After waveform alignment, a feature extraction step characterizes the detected spikes and represents each detected spike in a reduced-dimensional space. Feature extraction and spike classification significantly reduce the data requirements prior to data transmission (in multi-channel systems, the raw data rate is substantially higher than the limited bandwidth of the wireless telemetry).
In this chapter, we introduce a low-power neural signal conditioning circuit with a capacitive-feedback low-noise amplifier and a capacitive-attenuation band-pass filter. The capacitive-feedback amplifier offers a low-offset, low-distortion solution with an optimal power-noise trade-off. Similarly, the capacitive-attenuation band-pass filter provides a wide tuning range and a low-power realization, while allowing simple extension of the transconductors' linear range and, consequently, ensuring low harmonic distortion.
References
1. IEEE Standards Coordinating Committee, IEEE standard for safety levels with respect to human exposure to radio frequency electromagnetic fields, 3 kHz to 300 GHz, C95.1-2005, 2006
2. M. Steyaert, W. Sansen, C. Zhongyuan, A micropower low-noise monolithic instrumentation amplifier for medical purposes. IEEE J. Solid-State Circuits 22(6), 1163–1168 (1987)
3. R. Harrison, C. Charles, A low-power low-noise CMOS amplifier for neural recording applications. IEEE J. Solid-State Circuits 38(6), 958–965 (2003)
4. M.C. Chae, W. Liu, M. Sivaprakasam, Design optimization for integrated neural recording systems. IEEE J. Solid-State Circuits 43(9), 1931–1939 (2008)
5. W. Wattanapanitch, M. Fee, R. Sarpeshkar, An energy-efficient micropower neural recording amplifier. IEEE Trans. Biomed. Circuits Syst. 1(2), 136–147 (2007)
6. C. Qian, J. Parramon, E. Sánchez-Sinencio, A micropower low-noise neural recording front-end circuit for epileptic seizure detection. IEEE J. Solid-State Circuits 46(6), 1329–1405 (2011)
7. F. Bahmani, E. Sánchez-Sinencio, A highly linear pseudo-differential transconductance, in Proceedings of IEEE European Solid-State Circuits Conference, 2004, pp. 111–114
8. S.K. Arfin, Low power circuits and systems for wireless neural stimulation. PhD thesis, Massachusetts Institute of Technology, 2011
9. G. Nicollini, P. Confalonieri, D. Senderowicz, A fully differential sample-and-hold circuit for high-speed applications. IEEE J. Solid-State Circuits 24(5), 1461–1465 (1989)
10. K. Gulati, H.-S. Lee, A high-swing CMOS telescopic operational amplifier. IEEE J. Solid-State Circuits 33(12), 2010–2019 (1998)
11. T.C. Choi, R.T. Kaneshiro, W. Brodersen, P.R. Gray, W.B. Jett, M. Wilcox, High-frequency CMOS switched-capacitor filters for communications application. IEEE J. Solid-State Circuits 18, 652–664 (1983)
12. K. Bult, G. Geelen, A fast-settling CMOS op amp for SC circuits with 90-dB DC gain. IEEE J. Solid-State Circuits 25(6), 1379–1384 (1990)
13. R. Harjani, R. Heineke, F. Wang, An integrated low-voltage class AB CMOS OTA. IEEE J. Solid-State Circuits 34(2), 134–142 (1999)
14. R. Hogervorst, J.H. Huijsing, Design of Low-Voltage Low-Power Operational Amplifier Cells (Kluwer Academic Publishers, Dordrecht, 1999)
15. B.K. Ahuja, An improved frequency compensation technique for CMOS operational amplifiers. IEEE J. Solid-State Circuits 18(6), 629–633 (1983)
16. C.I. de Zeeuw et al., Spatiotemporal firing patterns in the cerebellum. Nat. Rev. Neurosci. 12(6), 327–344 (2011)
17. D. Han et al., A 0.45 V 100-channel neural-recording IC with sub-µW/channel consumption in 0.18 µm CMOS. IEEE Trans. Biomed. Circuits Syst. 7(6), 735–746 (2013)
18. K. Abdelhalim et al., 64-channel UWB wireless neural vector analyzer SoC with a closed-loop phase synchrony-triggered neurostimulator. IEEE J. Solid-State Circuits 48(10), 2494–2510 (2013)
19. C.M. Lopez et al., An implantable 455-active-electrode 52-channel CMOS neural probe, in IEEE International Solid-State Circuits Conference, pp. 288–289, 2013
20. K.A. Ng, Y.P. Xu, A multi-channel neural-recording amplifier system with 90 dB CMRR employing CMOS-inverter-based OTAs with CMFB through supply rails in 65 nm CMOS, in IEEE International Solid-State Circuits Conference, pp. 206–207, 2015
Chapter 3
Neural Signal Quantization Circuits
Abstract An integrated neural implant interfacing with the brain through biocompatible electrodes provides high-yield cell recordings, large channel counts, and access to spike data and/or field potentials with high signal-to-noise ratio. By increasing the number of recording electrodes, spatially broad analysis can be performed that can provide insights into how and why neuronal ensembles synchronize their activity. In this chapter, we present several A/D converter realizations in the voltage-, current- and time-domain, respectively, suitable for multichannel neural signal processing. The voltage-domain SAR A/D converter combines the functionalities of a programmable-gain stage and analog-to-digital conversion, occupies an area of 0.028 mm², and consumes 1.1 µW of power at a 100 kS/s sampling rate. The current-mode successive approximation A/D converter is realized in a 65 nm CMOS technology, and consumes less than 367 nW at 40 kS/s, corresponding to a figure of merit of 14 fJ/conversion-step, while operating from a 1 V supply. A time-based, programmable-gain A/D converter allows for an easily scalable and power-efficient implantable biomedical recording system. The time-domain converter circuit is realized in a 90 nm CMOS technology, operates at 640 kS/s, occupies an area of 0.022 mm², and consumes less than 2.7 µW, corresponding to a figure of merit of 6.2 fJ/conversion-step.
3.1 Introduction
Bioelectronic interfaces allow interaction with neural cells by recording, to facilitate early diagnosis and predict intended behavior before undertaking any preventive or corrective actions [1], or by stimulation, to prevent the onset of detrimental neural activity such as that resulting in tremor. Monitoring large-scale neuronal activity and diagnosing neural disorders have been accelerated by the fabrication of miniaturized microelectrode arrays, capable of simultaneously recording neural signals from hundreds of channels [2]. By increasing the number of recording electrodes, spatially broad analysis of local field potentials can be performed that can provide insights into how and why neuronal ensembles synchronize their activity. Studies on body motor systems have uncovered how kinematic parameters of movement control are encoded in neuronal spike time-stamps [3] and inter-spike intervals [4]. Neurons produce spikes of nearly identical amplitude near the soma, but the measured signal depends on the position of the electrode relative to the cell. Additionally, the signal quality in the neural interface front-end, besides the specifics of the electrode material and the electrode/tissue interface, is limited by the nature of the bio-potential signal and its biological background noise, dictating system resources. For any portable or implantable device, microelectrode arrays require miniature electronics locally to amplify the weak neural signals, filter out noise and out-of-band interference, and digitize the result for transmission. Single-channel [5] or multichannel integrated neural amplifiers and A/D converters provide the frontline interface between the recording electrode and signal conditioning circuits, and thus face critical performance requirements.
In this chapter, we present several A/D converter realizations in the voltage-, current- and time-domain, respectively, suitable for multichannel neural signal processing, and we evaluate the trade-off between noise, speed and power dissipation at the circuit-architecture level. This approach provides the key insight required to address the SNR, response time, and linearity of the physical electronic interface. The voltage-domain SAR A/D converter combines the functionalities of a programmable-gain stage and analog-to-digital conversion, occupies an area of 0.028 mm², and consumes 1.1 µW of power at a 100 kS/s sampling rate. The current-mode successive approximation A/D converter is realized in a 65 nm CMOS technology, and consumes less than 367 nW at 40 kS/s, corresponding to a figure of merit of 14 fJ/conversion-step, while operating from a 1 V supply. A time-based, programmable-gain A/D converter allows for an easily scalable and power-efficient implantable biomedical recording system. The time-domain converter circuit is realized in a 90 nm CMOS technology, operates at 640 kS/s, occupies an area of 0.022 mm², and consumes less than 2.7 µW, corresponding to a figure of merit of 6.2 fJ/conversion-step.
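The quoted conversion efficiencies can be cross-checked with the standard Walden figure of merit FoM = P/(2^ENOB·fs); the ENOB used below is an assumed value, since the text quotes only power, sampling rate and FoM:

```python
def fom_j_per_step(power_w, fs_hz, enob_bits):
    """Walden figure of merit: energy per effective conversion step,
    FoM = P / (2**ENOB * fs), in joules."""
    return power_w / (2 ** enob_bits * fs_hz)

# Sanity check against the current-mode SAR numbers (367 nW at
# 40 kS/s); an ENOB of about 9.4 bits (assumed) reproduces the
# quoted ~14 fJ/conversion-step.
fom = fom_j_per_step(power_w=367e-9, fs_hz=40e3, enob_bits=9.36)
print(round(fom * 1e15, 1))  # ~14 fJ
```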
The chapter is organized as follows: Sect. 3.2 presents an overview of the low-power A/D converter architectures, while in Sect. 3.3 analyses of the main building blocks of the A/D converter are given, namely, the sample and hold circuit, the operational amplifier, and the comparator. Section 3.4 focuses on voltage-domain A/D conversion and the noise fluctuations at the circuit-architecture level. In Sect. 3.5, the main building blocks of the current-domain ADC are evaluated. In Sect. 3.6, the time-domain A/D conversion, which utilizes a linear voltage-to-time converter (VTC) and a two-step time-to-digital converter, is discussed. The experimental results obtained are presented in Sect. 3.7. Finally, Sect. 3.8 provides a summary and the main conclusions.
Since the advent of digital signal processing, A/D converters have played a key role in interfacing the analog and digital worlds. They perform the digitization of analog signals at a fixed time period, which is generally specified
by the application. The A/D conversion process involves sampling the applied
analog input signal and quantizing it to its digital representation by comparing it to
reference voltages before further signal processing in subsequent digital systems.
Depending on how these functions are combined, different A/D converter architec-
tures can be implemented with different requirements on each function. To imple-
ment power-optimized A/D converter functions, it is important to understand the
performance limitations of each function before discussing system issues. In this
section, the concept of the basic A/D conversion process and the fundamental limi-
tation to the power dissipation of each key building block are presented.
Parallel (flash) A/D conversion is by far the fastest and conceptually simplest conversion process [6–15], where the analog input is applied to one side of a comparator circuit and the other side is connected to the proper reference level between zero and full scale. The threshold levels are usually generated by resistively dividing one or more references into a series of equally spaced voltages, which are applied to one input of each comparator. For n-bit resolution, 2ⁿ − 1 comparators simultaneously evaluate the analog input and generate the digital output as a thermometer code. Since a flash converter needs only one clock cycle per conversion, it is often the fastest converter. On the other hand, the resolution of flash ADCs is limited by circuit complexity, high power dissipation, and comparator and reference mismatch. The complexity grows exponentially as the resolution increases. Consequently, the power dissipation and the chip area increase exponentially with the resolution.
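A behavioral sketch of the flash conversion process, with idealized comparators and no mismatch:

```python
def flash_adc(vin, vref, n_bits):
    """Behavioral n-bit flash ADC: 2**n - 1 comparators against
    equally spaced thresholds produce a thermometer code whose
    ones-count is the binary output code."""
    levels = 2 ** n_bits
    thresholds = [vref * k / levels for k in range(1, levels)]
    thermometer = [vin > t for t in thresholds]   # one comparator per level
    return sum(thermometer)                       # thermometer -> binary

# 3-bit example over a 1 V reference: 7 comparators, codes 0..7.
print([flash_adc(v, 1.0, 3) for v in (0.05, 0.30, 0.55, 0.95)])
# [0, 2, 4, 7]
```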
To reduce hardware complexity, power dissipation, and die area, and to increase the resolution while maintaining high conversion rates, flash converters can be extended to a two-step/multistep [16–24] or sub-ranging architecture [25–33] (also called a series-parallel converter). Conceptually, these types of converters need m·2ⁿ comparators instead of the 2ᵐⁿ of a full flash implementation, assuming n1, n2, …, nm are all equal to n. However, the conversion in a sub-ranging or two-step/multistep ADC does not occur instantaneously as in a flash ADC, and the input has to be held constant until the sub-quantizer finishes its conversion. Therefore, a sample and hold circuit is required to improve performance. The conversion process is split into two steps, as shown in Fig. 3.1.

Fig. 3.1 Simplified two-step A/D converter and two-step T/D converter architectures
During the second clock cycle, the S/H circuit between the two stages holds the value of the amplified residue. Therefore, the second stage is able to operate on that residue independently of
the first stage, which in turn can convert a new, more recent sample. The maximum
sampling frequency of the pipelined two-step converter is determined by the set-
tling time of the first stage only due to the independent operation of the two stages.
To generate the digital output for one sample, the output of the first stage has
to be delayed by one clock cycle by means of a shift register (SR) (Fig.3.3).
Although the sampling speed is increased by the pipelined operation, the delay
between the sampling of the analog input and the output of the corresponding digital
value is still two clock cycles. For most applications, however, this latency does not
play any role; only the conversion speed is important. In most signal processing and
telecommunications applications, the main delay is caused by the digital signal
processing itself, so a latency of even more than two clock cycles is not critical.
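The pipelined operation described above can be illustrated with a clock-by-clock behavioral model. This is an idealized sketch with assumed 4-bit stages, not the book's implementation: stage 1 resolves the coarse bits and hands an amplified residue to stage 2, which resolves the fine bits one cycle later, so digital words emerge at one per clock after the initial latency.

```python
def two_step_pipeline(samples, n1=4, n2=4):
    """Clock-by-clock sketch of an ideal two-step pipelined ADC.
    Stage 1 quantizes the upper n1 bits and amplifies the residue by
    2**n1; stage 2 quantizes the held residue on the next clock, so
    each output word appears one full sample later while a new sample
    is already being converted by stage 1."""
    outputs = []
    stage2_in = None  # inter-stage register: (coarse_code, amplified_residue)
    for vin in samples + [None, None]:  # extra clocks to flush the pipeline
        # Stage 2 operates on the residue held from the previous cycle.
        if stage2_in is not None:
            coarse, residue = stage2_in
            fine = min(int(residue * 2**n2), 2**n2 - 1)
            outputs.append((coarse << n2) | fine)
        # Stage 1 converts the new sample in the same cycle.
        if vin is not None:
            coarse = min(int(vin * 2**n1), 2**n1 - 1)
            stage2_in = (coarse, vin * 2**n1 - coarse)
        else:
            stage2_in = None
    return outputs

print(two_step_pipeline([0.5, 0.25]))  # -> [128, 64] (8-bit codes)
```

Throughput is one sample per clock regardless of how many stages follow, which is the property that the general pipelined architecture below exploits.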
The architecture as described above is not limited to two stages. Because the
inter-stage sample and hold circuit decouples the individual stages, there is no dif-
ference in conversion speed whether one single stage or an arbitrary number of
stages follow the first one. This leads to the general pipelined A/D converter architecture,
as depicted in Fig. 3.4 [34–55]. Each stage consists of an S/H, an N-bit
flash A/D converter, a reconstruction D/A converter, a subtracter, and a residue amplifier.
Fig. 3.3 Two-step converter with an additional sample and hold circuit and a shift register (SR)
to line up the stage outputs in time
[Fig. 3.4: general pipelined A/D converter architecture]

Fig. 3.5 Successive approximation A/D converter architecture
3.2 Low-Power A/D Converter Architectures 39
The SAR A/D converter illustrated in Fig. 3.5 typically consists of an S/H circuit followed
by a feedback loop composed of a comparator, a successive approximation register
(SAR) logic block, and an n-bit D/A converter.
The SAR logic captures the data from the comparator at each clock cycle and
assembles the word driving the D/A converter bit by bit, from the most- to the
least-significant bit, according to the successive approximation algorithm: the D/A
converter first generates a value representing half of the reference voltage. Subsequently,
the comparator determines whether the held signal value is above or below the output
value of the digital-to-analog converter, and the MSB is kept or reset accordingly. The algorithm
proceeds in the same way, testing each successive bit until all n bits have been
determined. At the start of the next conversion, while the S/H circuit is sampling
the next input, the SAR provides the n-bit output and resets the registers. Offsets in
the S/H circuit or the comparator generate a shift of the conversion range; however,
this shift is identical for every code. The S/H circuit requires a low distortion figure
even for relatively short sample periods. Additionally, the D/A converter has stringent
requirements, as it determines the overall circuit linearity and the conversion speed.
Because only a minimum number of analog blocks and very simple digital
logic are needed to perform the complete conversion, SAR A/D converters are
usually chosen as the most power-efficient way to digitize
biomedical signals.
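The successive approximation algorithm described above reduces to a binary search. The following sketch assumes an ideal D/A converter and comparator (an idealized illustration, not this chapter's circuit), showing the keep-or-reset decision for each bit from MSB to LSB.

```python
def sar_convert(vsample, vref, n):
    """Successive approximation as described above: the trial word
    first sets the MSB (DAC output = vref/2); the comparator then
    keeps or resets each bit, MSB to LSB, over n clock cycles."""
    code = 0
    for bit in range(n - 1, -1, -1):
        trial = code | (1 << bit)        # tentatively set the current bit
        vdac = trial * vref / 2**n       # ideal DAC output for the trial word
        if vsample >= vdac:              # keep the bit if the input is above
            code = trial
    return code

# 10-bit example: an input of 0.6*Vref converges to code 614
# (= floor(0.6 * 1024)) after 10 comparisons.
print(sar_convert(0.6, 1.0, 10))  # -> 614
```

One comparison per bit, so an n-bit conversion needs n clock cycles plus the sampling phase, which is the speed-for-simplicity trade-off that makes the SAR attractive for low-power biomedical acquisition.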
Fig. 3.6 Switched capacitor S/H circuit configurations in sample phase: a circuit with separate CH and CF

Fig. 3.7 Switched capacitor S/H circuit configurations in sample phase: a circuit with one capacitor
Since capacitors and switches with the very large off-resistance needed for a voltage memory are far easier to implement in a practical
integrated circuit technology than the inductors and switches with a very small
on-resistance required for a current memory, all sample and hold circuits are based
on voltage sampling with the switched capacitor (SC) technique. S/H circuit architectures
can roughly be divided into open-loop and closed-loop architectures. The
main difference between them is that in closed-loop architectures the capacitor on
which the voltage is sampled is enclosed in a feedback loop, at least in hold mode.
Although the open-loop S/H architecture provides a high-speed solution, its accuracy
is limited by the harmonic distortion arising from the nonlinear gain of
the buffer amplifiers and the signal-dependent charge injection from the switch.
These problems are especially pronounced in CMOS technology. Enclosing
the sampling capacitor in the feedback loop reduces the effects of nonlinear parasitic
capacitances and signal-dependent charge injection from the MOS switches.
Unfortunately, an inevitable consequence of the use of feedback is reduced speed.
Figures 3.6, 3.7 and 3.8 illustrate three common configurations for closed-loop
switched-capacitor S/H circuits [56, 62–76]. For simplicity, single-ended configurations
are shown; in an actual circuit implementation all would be fully differential.
In mixed-signal circuits such as A/D converters, fully differential analog
signals are preferred as a means of obtaining better power-supply rejection and
immunity to common-mode noise. The operation needs two nonoverlapping clock
phases: sampling, and holding (or transferring). The switch configurations shown
in Figs. 3.6, 3.7 and 3.8 are for the sampling phase, while the configurations shown
in Figs. 3.9, 3.10, and 3.11 are for the hold phase. In all cases, the basic operations
include sampling the signal on the sampling capacitor(s) CH and transferring the
signal charge onto the feedback capacitor CF by using an opamp in the feedback
configuration. In the configuration in Fig. 3.6, which is often used as an integrator,
3.3 A/D Converter Building Blocks 41
Fig. 3.8 Switched capacitor S/H circuit configurations in sample phase: a circuit with CF shared as a sampling capacitor

Fig. 3.9 Switched capacitor S/H circuit configurations in hold phase: a circuit with separate CH and CF

Fig. 3.10 Switched capacitor S/H circuit configurations in hold phase: a circuit with one capacitor

Fig. 3.11 Switched capacitor S/H circuit configurations in hold phase: a circuit with CF shared as a sampling capacitor
42 3 Neural Signal Quantization Circuits
assuming an ideal opamp and switches, the opamp forces the sampled signal
charge on CH to transfer to CF.
If CH and CF are not equal capacitors, the signal charge transferred to CF develops
a voltage at the output of the opamp according to Vout = (CH/CF)·Vin. In this
way, both S/H and gain functions can be implemented within one SC circuit [75, 76].
In the configuration shown in Fig.3.7, only one capacitor is used as both sam-
pling capacitor and feedback capacitor. This configuration does not implement the
gain function, but it can achieve high speed because the feedback factor (the ratio
of the feedback capacitor to the total capacitance at the summing node) can be
much larger than that of the previous configuration, operating much closer to the
unity gain frequency of the amplifier. Furthermore, it does not suffer from the capacitor
mismatch limitation of the other two configurations. Here, the sampling is
performed passively, i.e., it is done without the opamp, which makes signal acqui-
sition fast. In hold mode, the sampling capacitor is disconnected from the input
and put in a feedback loop around the opamp [56, 62].
Figure 3.8 shows another configuration which is a combined version of the
configurations in Figs.3.6 and 3.7. In this configuration, in the sampling phase,
the signal is sampled on both CH and CF, with the resulting transfer function
Vout = (1 + CH/CF)·Vin. In the next phase, the sampled charge in the sampling
capacitor is transferred to the feedback capacitor. As a result, the feedback capac-
itor has the transferred charge from the sampling capacitor as well as the input
signal charge. This configuration has a wider bandwidth in comparison to the
configuration shown in Fig. 3.6, although the feedback factor is comparable. Important
parameters in determining the bandwidth of the SC circuit are Gm (the transconductance
of the opamp), the feedback factor β, and the output load capacitance. In all
three configurations, the bandwidth is given by 1/τ = βGm/CL, where CL is the
total capacitance seen at the opamp output. Since the S/H circuit uses the amplifier as
a buffer, the acquisition time will be a function of the amplifier's own specifications.
Similarly, the error tolerance at the output of the S/H depends on the amplifier's
offset, gain, and linearity. Once the hold command is issued, the S/H faces
other errors. Pedestal error occurs as a result of charge injection and clock feedthrough:
part of the charge built up in the channel of the switch is distributed onto
the capacitor, slightly changing its voltage, and the clock also couples onto the
capacitor via the overlap capacitance between the gate and the source or drain.
Another error that occurs during the hold mode is called droop, which is related
to the leakage of current from the capacitor due to parasitic impedances and to
the leakage through the reverse-biased diode formed by the drain of the switch.
This diode leakage can be minimized by making the drain area as small as can be
tolerated. Although the input impedance to the amplifier is very large, the switch
has a finite off impedance through which leakage can occur. Current can also leak
through the substrate.
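The relation 1/τ = βGm/CL above turns directly into a settling-time estimate. The sketch below applies it to the Fig. 3.6 configuration; all component values are illustrative assumptions, not values from this design.

```python
import math

# Sketch of the closed-loop bandwidth relation 1/tau = beta*Gm/CL for a
# separate-CH/CF SC amplifier (Fig. 3.6 style). All values are assumed
# example numbers for illustration only.
Gm = 1e-3                            # OTA transconductance [S]
CH, CF, Cp = 1e-12, 1e-12, 0.2e-12   # sampling, feedback, parasitic caps [F]
CL_ext = 1e-12                       # external load capacitance [F]

beta = CF / (CF + CH + Cp)           # feedback factor (cap divider at input)
# Total capacitance at the output: external load plus CF in series with
# the capacitance it sees at the summing node.
CL = CL_ext + CF * (CH + Cp) / (CF + CH + Cp)
tau = CL / (beta * Gm)               # closed-loop time constant

# Linear settling to n-bit accuracy needs about (n+1)*ln(2) time constants.
n = 10
t_settle = (n + 1) * math.log(2) * tau
print(f"beta = {beta:.2f}, tau = {tau*1e9:.2f} ns, "
      f"{n}-bit settling in {t_settle*1e9:.1f} ns")
```

The feedback-factor term is why the single-capacitor circuit of Fig. 3.7 is faster: with CH and CF merged, β approaches unity and τ shrinks accordingly.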
A prominent drawback of a simple S/H is the on-resistance variation of the
input switch, which introduces distortion. Technology scaling reduces the supply voltage
faster than the threshold voltage, which results in a larger on-resistance variation
in a switch. As a result, the bandwidth of the switch becomes increasingly signal-dependent
when MOS transistors are used as switches at low supply voltages. When the signal amplitudes
are large, accuracy and signal bandwidth are limited by distortion, which
originates from the fact that the switch on-resistance is not constant but varies
as a function of the drain and source voltages. The on-resistance is expressed as
Ron = L/(μCoxW(VGS − VT)) if VDS is small. In this equation two different signal-dependent
terms can be identified. The first and dominant one is the gate-source
voltage VGS. The second is the dependence of the threshold voltage VT on the source-bulk
voltage. Although large transistor switches can be used for a worst-case VT design,
the switch parasitic capacitance can significantly load the output of the circuit.
Therefore, increasing VGS − VT is desirable to implement a low on-resistance switch
without adding too much parasitic capacitance.
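The signal dependence of Ron can be illustrated numerically. In the sketch below, all device parameters are assumed example values, and the body effect on VT uses a first-order model; it is not a characterization of this design.

```python
import math

# Illustrative sketch of switch on-resistance variation across the input
# range, Ron = L/(mu*Cox*W*(VGS - VT)), with a first-order body-effect
# model for VT. All parameters below are assumed example values.
mu_cox = 200e-6          # mu_n * Cox [A/V^2]
W_over_L = 20            # switch aspect ratio
VDD, VT0 = 1.2, 0.4      # supply and zero-bias threshold [V]
gamma, phi_f2 = 0.4, 0.7 # body-effect coefficient [V^0.5], 2*phi_F [V]

def ron(vin):
    """On-resistance of an NMOS switch with gate tied to VDD, source at vin."""
    vt = VT0 + gamma * (math.sqrt(phi_f2 + vin) - math.sqrt(phi_f2))
    vov = VDD - vin - vt  # VGS - VT shrinks as the input rises
    if vov <= 0:
        return float("inf")  # switch no longer conducts
    return 1.0 / (mu_cox * W_over_L * vov)

# Ron grows strongly toward mid-supply inputs; this signal-dependent
# resistance is the distortion mechanism that bootstrapping removes
# by holding VGS constant.
for vin in (0.0, 0.2, 0.4, 0.6):
    print(f"vin = {vin:.1f} V -> Ron = {ron(vin):.0f} ohm")
```

With these example numbers Ron rises by roughly an order of magnitude between a grounded input and a mid-supply input, which is why a constant-VGS (bootstrapped) switch is attractive at low supply voltages.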
Several methods allow an increase of this gate drive voltage. One method is to
reduce VT by including an extra low-threshold transistor in the process, although this
adds to the process complexity. Another method is to increase VGS using one large
supply, created from the chip supply, to drive all switches on the chip; however, potential
problems, including possible cross-talk to sensitive nodes through the shared
supply and the difficulty of estimating the total charge drain needed to drive all switches,
render this method unattractive.
Another viable solution, which avoids this major source of nonlinearity, is to make the
switch gate-source voltage constant by making the gate voltage track the source
voltage with an offset Voff_in, which is, at its maximum, equal to the supply voltage.
This technique, which is implemented in this design, is called bootstrapping [81].
In this case, the bootstrap circuit shown in Fig. 3.12 drives each switch
that uses the same clock, to avoid the problem of crosstalk through the clock line.
Voff_in can be generated with a switched capacitor, which is pre-charged in
every clock cycle: during the clock phase in which the transistor is nonconductive, the
switched capacitor is pre-charged to Voff_in. To turn the switch on, the capacitor
[Fig. 3.12: bootstrap circuit schematic]
is switched between the input voltage and the transistor gate. The capacitor values
are chosen as small as possible for area considerations but large enough to suf-
ficiently charge the load to the desired voltage levels. The device sizes are chosen
to create sufficiently fast rise and fall times at the load. The load consists of the
gate capacitance of the switching device T10 and any parasitic capacitance due to
inter-connect between the bootstrap circuit and the switching device. Therefore,
it is desirable in the layout to minimize the distance between the bootstrap cir-
cuit and the switch or to insert shielding protection. When the switch T10 is on,
its gate voltage VG is greater than the analog input signal Vin by a fixed difference
of Voff_in = VDD. Although the absolute voltage applied to the gate may
exceed VDD for a positive input signal, none of the terminal-to-terminal device voltages
exceeds VDD. A single-phase clock clk turns the switch T10 on and off. During the
off phase, clk is low, discharging the gate of the switch to ground through devices
T11 and T12.
At the same time, VDD is applied by T3 and T7 across the capacitor-connected
transistor T16, which acts as a battery across the gate and source during the on
phase. T8 and T9 isolate the switch from the capacitance while it is charging. When
clkn goes high, T6 pulls down the gate of T8, allowing charge from the battery
capacitor to flow onto the gate of T10. This turns on both T9 and T10. T9 enables the gate
of T10 to track the input voltage applied at the source of T10, shifted by VDD, keeping
the gate-source voltage constant regardless of the input signal.
The maximum speed and, to a large extent, the power consumption of the S/H are
determined by the operational amplifier. In general, the amplifier's open-loop dc
gain limits the settling accuracy of the amplifier output, while the bandwidth and
slew rate of the amplifier determine the maximum clock frequency. The operational
amplifiers in S/H circuits have some unique requirements, the most important of
which is the input impedance, which must be purely capacitive so as to guarantee
the conservation of charge. Consequently, the operational amplifier input has to be
either in the common-source or the source-follower configuration. Another characteristic
feature of the S/H circuit is the load at the amplifier output, which is typically
purely capacitive; as a result, the amplifier output impedance can be high.
The benefit of driving solely capacitive loads is that no output voltage buffers are
required. In addition, if all the amplifier internal nodes have low impedance, and
only the output node has high impedance, the speed of the amplifier can be max-
imized. Unfortunately, an output stage with very high output impedance cannot
usually provide high signal swing.
The ultimate settling accuracy is limited by the finite amplifier dc gain. The
exact settling error depends not only on the gain but also on the feedback
factor of the circuit utilizing the amplifier. A very widely used method to improve
the dc gain is based on local negative feedback [82–84]. In addition to this cascode
regulation, other techniques for increasing the dc gain have been proposed as well.
Gain boosting with positive feedback has been investigated [85, 86]. In [87],
dynamic biasing, where the opamp current is decreased toward the end of the settling
phase, is used to increase the dc gain. It exploits the fact that current reduction
lowers the transistor gDS, which increases the dc gain. By regulating the gate
voltages of the cascode transistors with an extra gain stage [88], the dc gain
of the amplifier can be increased by several orders of magnitude.
Besides the amplifier bandwidth, the settling time is limited by the fact that
the amplifier can supply only a finite current to the load capacitor. Consequently,
the output cannot change faster than the slew rate. When designing an amplifier,
the load capacitor is known, and the required slew rate SR = kVmax/TS can
be calculated from the largest voltage step Vmax and the clock period TS. A commonly
used rule of thumb suggests that one third of the settling time should
be reserved for slewing, resulting in a k of six. The required slewing current is
ISR = (kVmaxCL)/TS. It is linearly dependent on the clock frequency, while the
current needed to obtain the amplifier bandwidth has a quadratic dependence. The
opamp unity-gain frequency ω1 can be made larger by increasing gm,in, by means of
making the transistors bigger; however, this does not necessarily imply a faster
opamp. The parasitic capacitance is also increased; therefore, the feedback factor β
becomes smaller and the closed-loop pole ωp = βω1 is pushed towards lower frequencies.
Therefore, a trade-off between the increase of gm,in and CG exists. This
suggests that an optimum size for the input pair exists, which maximizes the
effective transconductance of the opamp while avoiding making the input capacitance
dominant in the feedback factor.
An overview of several single- and two-stage amplifiers is given in Sect. 2.3.
Because of their fast response, regenerative latches are used, almost without exception,
as comparators for high-speed applications. An ideal latched comparator is
composed of a preamplifier with infinite gain and a digital latch circuit. Since the
amplifiers used in comparators need be neither linear nor closed-loop, they can
incorporate positive feedback to attain virtually infinite gain [89]. Because of this
architecture, the operation of a latched comparator can be divided into
two phases: a tracking phase and a latching phase. In the tracking phase, the following dynamic
latch circuit is disabled, and the analog differential input voltages are amplified by
the preamplifier. In the latching phase, the preamplifier is disabled while the latch
circuit regenerates the amplified differential signals into a pair of full-scale digital
signals through a positive feedback mechanism and latches them at the outputs.
Depending on the type of latch employed, latched comparators can be divided
into two groups: static [56, 90, 91], which have a constant current consumption
during operation, and dynamic [92–94], which do not consume any static power.
While dynamic latch circuits regenerate the difference signals, the large voltage
chosen such that gm8,9R < 2 and should be small enough to reset the output at the
clock rate. Since all transistors are in the active region, the latch can start regenerating
right after the latch signal goes low. The one disadvantage of this scheme is
its large kickback noise. The folding nodes (the drains of T4 and T5) have to jump up
to VDD in every clock cycle, since the latch output makes a full swing. Because of
this, a substantial amount of kickback noise is injected into the inputs through the
gate-drain capacitances of the input transistors T1 and T2 (CGD1, CGD2). To reduce
kickback noise, clamping diodes have been inserted at the output nodes [96].
Fig. 3.15 illustrates the design shown in [91]. Here, when the latch signal is
low (the resetting period), the amplified input signal is stored at the gates of T8 and T9, and
T12 shorts Voutp and Voutn together. When the latch signal goes high, the cross-coupled
transistors T10 and T11 form a positive feedback latch. In addition, the positive
feedback capacitors C1 and C2 boost the regeneration speed by switching T8
and T9 from an input-dependent current source during the resetting period to a cross-coupled
latch during the regeneration period. Because of C1 and C2, the transistors T8–T11
work like cross-coupled inverters, so that the latch does not dissipate static
power once it completes the regeneration period. However, there is a large amount
of kickback noise through the positive feedback capacitors C1 and C2. The
switches (T6, T7 and T13) have been added to isolate the preamplifier from the
latch. Consequently, a relatively large chip area is required due to the positive feedback
capacitors (C1, C2), the isolation switches (T6, T7 and T13), and the complementary
latch signals.
The concept of a dynamic comparator exhibits potential for low-power and
small-area implementation and, in this context, is restricted to single-stage topologies
without static power dissipation. A widely used dynamic comparator, based
on a differential sensing amplifier and shown in Fig. 3.16, was introduced in [92].
Transistors T1–T4, biased in the linear region, adjust the threshold resistively, and above
them transistors T5–T12 form a latch. When the latch control signal is low, the
[Fig. 3.15: latched comparator schematic [91]]

[Fig. 3.16: dynamic comparator schematic [92]]
transistors T9 and T12 are conducting and T7 and T8 are cut off, which forces both
differential outputs to VDD, and no current path exists between the supply voltages.
Simultaneously, T10 and T11 are cut off and the transistors T5 and T6 conduct.
This implies that T7 and T8 have a voltage of VDD across them. When the comparator
is latched, T7 and T8 are turned on. Immediately after the regeneration moment,
the gates of the transistors T5 and T6 are still at VDD and they enter saturation,
amplifying the voltage difference between their sources. If all transistors T5–T12 are
assumed to be perfectly matched, the imbalance of the conductances of the left
and right input branches, formed by T1–2 and T3–4, determines which of the outputs
goes to VDD and which to 0 V. After a static situation is reached (Vclk is high),
both branches are cut off and the outputs preserve their values until the comparator
is reset again by switching Vclk to 0 V. The transistors T1–T4, connected to the input
and reference, are in the triode region and act like voltage-controlled resistors. The
transconductance of the transistors T1–T4, operating in the linear region, is directly
proportional to the drain-source voltage VDS1–4 of the corresponding transistor,
while for the transistors T5–6 the transconductance is proportional to VGS5,6 − VT. At
the beginning of the latching process, VDS1–4 ≈ 0 while VGS5,6 − VT ≈ VDD. Thus,
gm5,6 ≫ gm1–4, which makes the matching of T5 and T6 dominant in determining the
latching balance. As small transistors are preferred, offset voltages of a few hundred
millivolts easily result. Mismatches in transistors T7–12 are attenuated by
the gain of T5 and T6, which makes them less critical. To cope with the mismatch
problem, the layout of the critical transistors must be drawn as symmetrically as possible.
In addition to the mismatch sensitivity, the latch is also very sensitive to
asymmetry in the load capacitance. This can be avoided by adding an extra latch
or inverters as a buffering stage after the comparator core outputs.
The resistive divider dynamic comparator topology has one clear benefit, which
is its low kickback noise. This results from the fact that the voltage variation at the
drains of the input transistors T1–T4 is very small. On the other hand, the speed and
resolution of the topology are relatively poor because of the small gain of the transistors
biased in the linear region.
A fully differential dynamic comparator based on two cross-coupled differen-
tial pairs with switched current sources loaded with a CMOS latch is shown in
Fig.3.17 [93]. The trip point of the comparator can be set by introducing imbal-
ance between the source-coupled pairs. Because of the dynamic current sources
together with the latch, connected directly between the differential pairs and the
supply voltage, the comparator does not dissipate dc power. When the comparator
is inactive, the latch signal is low, which means that the current source transistors
T5 and T6 are switched off and no current path between the supply voltages exists.
Simultaneously, the p-channel switch transistors T9 and T12 reset the outputs by
shorting them to VDD. The n-channel transistors T7 and T8 of the latch conduct
[Fig. 3.17: differential pair dynamic comparator schematic [93]]
and also force the drains of all the input transistors T1–T4 to VDD, while the drain
voltages of T5 and T6 depend on the comparator input voltages. When the clock
signal is raised to VDD, the outputs are disconnected from the positive supply, the
switching current sources T5 and T6 turn on, and T1–T4 compare Vinp − Vinn with
Vrefp − Vrefn. Since the latch devices T7–8 are conducting, the circuit regeneratively
amplifies the voltage difference at the drains of the input pairs. The threshold voltage of
the comparator is determined by the current division in the differential pairs and
between the cross-coupled branches.
The threshold level of the comparator can be derived using the large-signal current
equations for the differential pairs. The effect of the mismatches of the other
transistors T7–12 is not completely critical in this topology, because the input is
amplified by T1–T4 before T7–12 latch. The drains of the cross-coupled differential
pairs are high-impedance nodes, and the transconductances of the threshold-voltage-determining
transistors T1–T4 are large. A drawback of the differential pair dynamic
comparator is its high kickback noise: large transients in the drain nodes of the
input transistors are coupled to the input nodes through the parasitic gate-drain
capacitances. However, there are techniques to reduce the kickback noise, e.g., by
cross-coupling dummy transistors from the differential inputs to the drain nodes
[97]. The differential pair topology achieves a high speed and resolution, which
results from the built-in dynamic amplification.
Figure 3.18 illustrates the schematic of the dynamic latch given in [94]. The
dynamic latch consists of pre-charge transistors T12 and T13, cross-coupled
inverters T6–9, differential pair T10 and T11, and switch T14, which prevents static
current flow during the resetting period. When the latch signal is low (resetting period),
the drain voltages of T10–11 are VDD − VT, and their source voltage is VT below the
latch input common-mode voltage. Therefore, once the latch signal goes high, the
n-channel transistors T7,9–11 immediately enter the active region. Because each
transistor in one of the cross-coupled inverters turns off, there is no static power
dissipation from the latch once the latch outputs are fully developed.
[Fig. 3.18: dynamic latch schematic [94]]
Fig. 3.19 Multichannel neural interfaces: per-channel BP-LNA and PGA multiplexed into a shared ADC

Fig. 3.20 Multichannel neural interfaces: an ADC per channel

[Fig. 3.21: switched-capacitor ADC front-end schematic (capacitors C1–C4, references Vref+/Vref−, common-mode voltage vcm)]
Fig. 3.22 Maximum achievable SNR for different sampling capacitor values and resolutions (8–14 bit; C4 from 1 fF to 10 pF)

[Fig. 3.23: power dissipation for different resolutions (8–12 bit); axis P [W]]
capacitance value and the OTA size. This means that the PG ADC circuit power
quadruples for every additional bit resolved for a given speed requirement and
supply voltage, as illustrated in Figs. 3.22 and 3.23. Notice that for small sampling
capacitor values, thermal noise limits the SNR, while for large sampling
capacitors the SNR is limited by the quantization noise and the curve flattens out.
Improving the power efficiency beyond topological changes of the OTA and supply
voltage reduction requires smart allocation of the biasing currents. Hence,
techniques such as current reuse [105, 106], time multiplexing [4, 106], and adaptive
duty-cycling of the entire analog front end [107, 108] can be used to improve
power efficiency by exploiting the fact that neural spikes are irregular and of low
frequency.
Choosing the OTA bandwidth too high increases the noise and additionally
demands unnecessarily low on-resistance of the switches and thus large transistor
dimensions. The optimum time constant remains constant regardless of the circuit
size (or ID) because CL scales together with C4 and the parasitic capacitance Cp.
The choice of the hold capacitor value is a trade-off between noise requirements
on the one hand and speed and power consumption on the other hand.
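The trade-off above, and the SNR limits behind Fig. 3.22, follow from adding the sampled kT/C noise to the quantization noise. The sketch below reproduces the flattening behavior; the full-scale range and temperature are assumed example values.

```python
import math

# Sketch of the SNR limit versus sampling capacitor: thermal kT/C noise
# dominates for small capacitors, quantization noise for large ones.
# Full-scale range and temperature are assumed example values.
kB, T = 1.380649e-23, 300.0   # Boltzmann constant [J/K], temperature [K]
VFS = 1.0                     # full-scale range [V]

def max_snr_db(c_sample, n_bits):
    p_signal = (VFS / (2 * math.sqrt(2)))**2   # full-scale sine power
    p_thermal = kB * T / c_sample              # sampled kT/C noise power
    p_quant = (VFS / 2**n_bits)**2 / 12        # quantization noise power
    return 10 * math.log10(p_signal / (p_thermal + p_quant))

for c in (1e-15, 1e-13, 1e-12, 1e-11):
    print(f"C = {c:.0e} F -> 12-bit SNR = {max_snr_db(c, 12):.1f} dB")
# For large C the curve flattens near the ideal 12-bit SNR of ~74 dB.
```

Increasing the capacitor beyond the point where quantization noise dominates buys no SNR but still costs amplifier power, which is the optimum the τ-versus-C4 design curves below are meant to locate.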
3.4 Voltage-Domain SAR A/D Conversion 55
Fig. 3.24 Closed-loop normalized time constant τ/τt versus hold capacitance C4 for different biasing conditions (10 µA to 1 mA); case for C4 = 3CL, CL = Cp. The time constant is normalized to the τt (= 1/ωt,intrinsic) of the device, which is approximately CG/gm
The sampling action adds kT/C noise to the system which can only be reduced
by increasing the hold capacitance C4. A large capacitance, on the other hand,
increases the load of the operational amplifier and thus decreases the speed for
a given power. The OTA size and its bias current for a given speed requirement
and minimum power dissipation are determined using -versus-C4 curves as in
Fig.3.24. Note that for low frequency operation (where /t is large), the COTA that
achieves the minimum power dissipation for given settling time and noise require-
ments, usually does not correspond to the minimum time constant point. This is a
consequence of setting the C4/COTA ratio of the circuit to the minimum time con-
stant point, which requires larger COTA and results in power increase and excessive
bandwidth. Near the speed limit of the given technology (where the ratio /t is
small), however, the difference in power between the minimum power point and
the minimum time constant point becomes smaller as the stringent settling time
requirement forces the C4/COTA ratio (Fig.3.25) to be at its optimum value to
achieve the maximum bandwidth.
The OTA in PG ADC circuit has some unique requirements; the most important
is the input impedance, which must be purely capacitive so as to guarantee the
conservation of charge. Consequently, the OTA input has to be either in the com-
mon source or the source follower configuration.
Another characteristic feature is the load at the OTA output, which is typi-
cally purely capacitive, and as a result, the OTA output impedance must be high.
The benefit of driving solely capacitive loads is that no output voltage buffers
are required. The implemented folded-cascode OTA is illustrated in Fig.3.26.
The input stage of the OTA is provided with two extra transistors T10 and T11 in
a common-source connection, having their gates connected to a desired reference
common-mode voltage at the input, and their drains connected to the ground [88].
The advantage of this solution is that the common-mode range at the output is
not restricted by a regulation circuit, and can approach a rail-to-rail behavior very
closely. The transistors of the output stage have two constraints: the gm of the cascode
transistors T5,6 must be high enough, in order to boost the output resistance
Fig. 3.25 Optimum gate capacitance COTA,opt versus hold capacitance C4 for different loading and parasitic conditions (CL, Cp from 0.5 pF to 1.5 pF)

Fig. 3.26 OTA schematic
of the cascode, allowing a high enough dc gain, and the saturation voltage of the
active loads T3,4 and T7,8 must be maximized, in order to reduce the extra noise
contribution of the output stage. These considerations underline a tradeoff between
fitting the saturation voltage into the voltage headroom and minimizing the noise
contribution. A good compromise is to make the cascode transistors larger
than the active loads: in this way the gm of the cascode transistors is maximized,
boosting the dc gain, while their saturation voltage is reduced, allowing
a larger saturation voltage for the active loads without exceeding the voltage
headroom.
In order to maximize the output SNR, CL must be maximized, which means that the
bandwidth must be minimized. The input-referred noise of the OTA input pair is
reduced by increasing its gm, either by increasing the current or by increasing
the aspect ratio of the devices. The effect of the latter method, however, is
partially canceled by the increase in the noise excess factor. When referred to
the OTA input, the noise voltages of the current sources (or mirrors) in the first
stages are multiplied by the gm of the device itself and divided by the gm of the
input transistor, which again suggests that maximizing the input-pair gm minimizes
the noise. It can be further reduced by decreasing the gm of the current sources.
Since the current is usually set by other requirements, the only possibility is to
decrease the aspect ratio of the device. This leads to an increase in the gate
overdrive voltage, which, as a positive side effect, also decreases the noise
excess factor. Increasing L to avoid short-channel effects is also possible,
although at a constant aspect ratio it increases the parasitic capacitances.
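The gm-ratio referral rule above can be sketched numerically. The function and the values below are illustrative assumptions, not the implemented OTA's parameters:

```python
def input_referred_noise(v_n_source, gm_source, gm_input):
    """Refer a current-source (or mirror) gate noise voltage to the OTA input:
    the source's noise voltage converts to a drain noise current via its own
    gm_source and refers back through the input pair's gm_input."""
    return v_n_source * (gm_source / gm_input)

# Doubling the input-pair gm halves the referred contribution of the source.
v_a = input_referred_noise(v_n_source=1e-6, gm_source=0.2e-3, gm_input=1.0e-3)
v_b = input_referred_noise(v_n_source=1e-6, gm_source=0.2e-3, gm_input=2.0e-3)
```

The same expression shows why a low current-source gm (small aspect ratio, large overdrive) directly reduces its input-referred noise.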
The dynamic latch illustrated in Fig. 3.27 consists of pre-charge transistors T14
and T17, cross-coupled inverters T12-13 and T15-16, differential pair T10 and T11,
and switch T9, which prevents static current flow during the reset period [94].
A large portion of the total comparator current is allocated to the input branches
to boost the input gm. Similarly, the noise from the non-gain elements, i.e. the
load transistors, is minimized by applying a small biasing current; additionally,
a small width and a large length are chosen for their gate dimensions.
The converter utilizes synchronous SAR logic consisting of a cascaded
multiple-input, n-bit shift register (Fig. 3.28) to generate the digital output
code and the switch-control signals for the D/A converter. The successive
approximation algorithm starts with the activation of the MSB, while the other
bits remain zero. As the conversion continues, the remaining bits are successively
activated. Each bit evaluates the state of the others and, depending on the
result, decides whether it has to be activated, keep its value, or take the value
of the comparator [109]. The selection depends on the state of the register itself
and on the states of the following registers. As a result, the switching activity
is not high, and the leakage power dominates the total power. To reduce the
leakage currents, several techniques are
Fig. 3.27 Comparator schematic
Fig. 3.28 Synchronous SAR logic shift register
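The n-cycle successive-approximation search performed by the SAR logic can be sketched as an idealized behavioral model; the voltage levels and the ideal DAC here are illustrative assumptions, not the register-level implementation of Fig. 3.28:

```python
def sar_convert(sample, vref, n_bits):
    """Binary search: activate the MSB first, then test each lower bit in turn,
    keeping a bit only if the trial DAC level does not exceed the sample."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)                  # activate the bit under test
        dac_level = trial * vref / (1 << n_bits)   # ideal DAC output for the trial
        if sample >= dac_level:                    # comparator decision
            code = trial                           # keep the bit
    return code

# An 8-bit conversion of 0.4 V against a 1 V reference resolves in 8 cycles.
code = sar_convert(0.4, 1.0, 8)   # 102, i.e. floor(0.4 * 256)
```

Each iteration corresponds to one comparator decision and one register update, which is why the conversion always takes exactly n cycles.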
3.5 Current-Domain SAR A/D Conversion

The current-mode converters offer high resource efficiency in terms of power and
area [111-114]. In contrast to the voltage-mode charge-redistribution SAR A/D
converter, the corresponding current-mode circuit has several intrinsic
advantages, including tunable input impedance, wide bandwidth, and a low
supply-voltage requirement. Additionally, only MOSFET devices are required for
logical and numerical operations, limiting the area requirements. The current-mode
SAR A/D converter is implemented following the conventional architecture. The
output digital code is generated by comparing the input current, offered through a
current sample-and-hold circuit (S/H), with a reference current provided by a
binary current D/A converter (DAC). The comparison is performed in sequence for
each bit of the selected resolution, adding up to n cycles per conversion (i.e., a
binary search). The current comparison requires only injecting two currents into a
single node and using the current that flows out of the node as the algebraic
difference of the two input currents. Since most current-source implementations
have high output impedance, the nodal voltage generated by the output current
indicates the result of the comparison. The current comparator feeds back to the
SAR logic in each cycle, adjusting the reference current generated by the
current-mode D/A converter closer to the input value. The input dynamic range of
the D/A converter is controlled by the biasing current. As a consequence, the
power consumption of the DAC is directly proportional to the signal level, which
is advantageous for the low-energy neural signals.
An S/H circuit captures the input signal at the sampling instants and subsequently
holds the signal value, which is then further processed in a current-based
binary-search SAR loop. The schematic of the implemented circuit is illustrated in
Fig. 3.29. The circuit is (pseudo-)differential, and only a single-ended version
is shown. The sample-and-hold operation is performed using an analog switch,
formed by transmission gate T45, and hold capacitor CH. In sample mode, switch T45
is turned on, and the gates of the current-mirror circuit transistors T1
Fig. 3.29 Schematic of current mode sample and hold circuit
Fig. 3.30 Schematic of inverter cascade current mode comparator circuit
The current-mode D/A converter circuit illustrated in Fig. 3.31 consists of a
current-replication network, which generates weighted currents using cascoded
current mirrors (T23-41), and a current-switching network of differential pairs
(T1-20) controlled by the binary bits. The cascode current sources are sized up
according to the bit weight and are biased by the same bias voltages. Each
weighted current source (or cascode) is made of a number of LSB devices connected
in parallel (the LSB device becomes the unit device). By partitioning the weighted
devices into units, the unit devices can be positioned in a common-centroid
arrangement to reduce the impact of matching-error gradients. This simple and
compact implementation is able to reach very high conversion rates, being limited
only by the steepness of the data waveforms carrying the bits, by the maximum
switching speed of the current switches, and by the technology itself. At
nano-ampere bias levels, mismatch will limit the linearity of the current-mode D/A
converter, thus restricting the maximum resolution of the A/D converter [116]. To
achieve a 10-bit resolution, calibration as in [111] is employed.
3.6 Time-Domain Two-Step A/D Conversion

The time-mode converters, based on asynchronous ADCs [117], slope and integrating
ADCs [118], or pulse-position modulation [119], provide high power and area
efficiency. In the time-based methodology, conventional voltage and current
variables are replaced by the corresponding time differences between two rising
edges, and logic circuits substitute for the large, power-hungry analog blocks. In
deep-submicron CMOS devices, even with the supply-voltage reduction, the time
resolution is increased due to the decrease of the gate delay [120].

In the proposed design, a voltage signal is converted to a time-domain
representation using a comparator-based switched-capacitor circuit [121] and a
continuous-time comparator. To improve the power efficiency, the resulting
time-domain information is converted to the corresponding digital code with a
two-step time-to-digital converter (TDC), where fine quantization of the resulting
residue is obtained with a folding Vernier converter. The implementation results
in a 90 nm CMOS technology show that a significant gain in throughput, resource
usage, and
Fig. 3.32 Block diagram of an ADC with two-step time-to-digital conversion; single input version shown for clarity
power reduction (less than 2.7 µW, corresponding to a figure of merit of
6.2 fJ/conversion-step) can be obtained for large-scale neural spike data, with a
simple and compact ADC structure that has minimal analog complexity.
The basic concept of the architecture, which utilizes a linear voltage-to-time
converter (VTC) and a two-step time-to-digital converter, is illustrated in
Fig. 3.32. The scheme is reconfigurable in terms of input gain (through the
programmable capacitance C2), resolution (by controlling the number of performed
iterations), and sampling frequency (through the frequency of the input clock).
Once a configuration has been selected, the bias current is also dynamically
controlled during the conversion operation to adapt to the reference voltage. A
comparator-based, switched-capacitor gain stage [121] eliminates the high-gain,
high-speed operational amplifier from the design and does not require a stabilized
high-gain, high-speed feedback loop, reducing complexity and the associated
stability versus bandwidth/power tradeoff. The VTC converts a sampled input
voltage to a pulse whose time period is linearly proportional to the input
voltage.
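The linear voltage-to-time mapping can be sketched with an idealized constant-current ramp; the capacitance and current values below are illustrative assumptions, not the implemented circuit's:

```python
def vtc_pulse_width(v_in, c_total, i_ramp):
    """Time for a constant-current ramp on c_total to cross v_in: t = C*V/I,
    so the pulse width is linearly proportional to the sampled input voltage."""
    return c_total * v_in / i_ramp

# A 1 pF network charged by 1 uA: 0.25 V maps to 250 ns and 0.5 V to 500 ns.
t_low = vtc_pulse_width(0.25, 1e-12, 1e-6)
t_high = vtc_pulse_width(0.50, 1e-12, 1e-6)
```

Doubling the input voltage doubles the pulse width, which is the linearity the TDC then quantizes.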
During the charge-transfer phase, the current source IX1 turns on, charges the
capacitor network consisting of C1 and C2, and generates a constant voltage ramp
at the output voltage Vo, which subsequently causes the virtual-ground voltage VX
to ramp simultaneously (Fig. 3.33a) via the capacitor divider. The voltages
continue to ramp until the comparator detects the virtual-ground condition
(VX = VCM) and turns off the current source. When the voltage at the sampling
capacitor reaches the comparator threshold, the comparator output goes high. The
time-to-digital converter measures the time interval tm from the start of the ramp
until the crossover point of the ramp and the input signal, as illustrated in
Fig. 3.33b, i.e., between the rising edge of the start signal and the
comparator-generated stop signal. The time interval is measured by the TDC, which
generates a corresponding digital output. The simplest TDC realization, a digital
counter, requires a (very) high counter frequency to realize a high-resolution
converter. Similarly, delay-line circuits, although more power efficient,
necessitate a large number of stages to measure the required periods of time,
significantly degrading the INL and the effective resolution [122]. A TDC
combining a low-frequency, low-power counter as a coarse
Fig. 3.33 a The output voltage ramps to the final value in the comparator-based switched-capacitor charge-transfer phase, b ADC timing signals, c input versus output voltage of the proposed ADC
quantizer and a folding Vernier delay-line TDC as a fine quantizer offers both a
large dynamic range and power efficiency. The post-processed data of the fine and
coarse TDC outputs as a function of the input voltage are shown in Fig. 3.33c. The
maximum and minimum values of the fine folding Vernier TDC correspond to the half
and the one-and-a-half period of the coarse time-to-digital converter,
respectively, measured in fine TDC unit steps.
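The coarse/fine combination can be sketched in integer picoseconds; the 12,500 ps coarse step matches the 80 MHz reference clock quoted in the experimental results, while the 10 ps fine step approximates the folding Vernier resolution (both used here as illustrative assumptions):

```python
def two_step_tdc(t_ps, coarse_step_ps, fine_step_ps):
    """Quantize an interval in two steps: a counter resolves whole
    reference-clock periods, and a fine (Vernier) stage resolves the
    sub-period residue in much smaller steps."""
    coarse = t_ps // coarse_step_ps              # reference-clock cycles counted
    residue = t_ps - coarse * coarse_step_ps     # remainder left to the fine stage
    fine = residue // fine_step_ps               # fine steps in the residue
    return coarse, fine

# A 37,840 ps interval: 3 coarse cycles (37,500 ps) plus 34 fine steps (340 ps).
coarse, fine = two_step_tdc(37840, 12500, 10)
```

The counter thus only needs to run at the reference-clock rate, while the fine stage never has to span more than one coarse period.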
The circuit realization of a fully differential comparator with digitally
programmable offset adjustment [123] is illustrated in Fig. 3.34. Transistors T5-8
employ iterated-instance notation to designate five transistors placed in
parallel. The widths of these devices are binary weighted to offer a programmable
current gain, which creates an offset-programmable pre-amplifier that is employed
for offset compensation. The continuous-time comparator at the output of the
voltage-to-time converter consists of a differential amplifier followed by a
common-source stage (Fig. 3.35). The input transistors operate in the subthreshold
region for
Fig. 3.34 Differential comparator with digitally programmable offset adjustment
Fig. 3.35 Continuous-time comparator
reduced power consumption and to offer a larger input common-mode range and,
consequently, an increased ramp dynamic range.
The coarse current source (Fig. 3.32) is a PMOS cascode that is controlled by a
switch at the gate of the cascode transistor, and the fine current source is a
single NMOS device with a series switch.
A coarse time quantizer, designed using a counter, measures the number of
reference clock cycles. The fine-resolution quantization of the two-step
time-to-digital converter corresponds to a folding Vernier delay TDC. The proposed
architecture executes time-to-digital conversion by counting transitions between
the stop signal and the next reference-clock rising edge after the stop signal.
These transitions are enabled only during the measurement interval. The
synchronizer block, which consists of three flip-flops in series, ensures that the
coarse and fine time measurements are correctly aligned.
A folding Vernier delay TDC is easily scalable to different time resolutions and
higher numbers of bits without increasing the area. The architecture achieves the
minimum time resolution of a Vernier delay element (i.e., the basic inverter
delay) and, due to the folding, offers an area-efficient solution. Instead of the
32-element delay line required for the regular Vernier architecture, the folding
feature allows the same Vernier delay stages to be used repeatedly to measure the
delay. Additionally, with the implemented dynamic control, the power required for
each conversion is sequentially reduced.
A block-level view of the folding TDC is illustrated in Fig. 3.36, and a
simplified overview of the freeze Vernier delay-line architecture is shown in
Fig. 3.37. In this design, only four thermal codes are generated in each cycle;
hence, in the worst case, the measurement cycle is repeated eight times, which is
equivalent to a 32-bit thermal code with only four Vernier delay elements. The
4-bit thermal codes are converted into four pulses with a thermal-to-clock
generator, which clock a 5-bit counter at the output of the TDC. For each thermal
bit generated in the freeze Vernier delay line, a corresponding pulse is generated
using a pulse generator. The distance between two pulses is controlled with
current-starved inverters. For a rising-edge input, the circuit generates a pulse
whose width is determined by the NAND gate, the inverter, and the buffer. The
enable signal, which decides whether the signals start/stop or v1_start/v1_stop
continue into the next cycle, is generated using signals vt4 and vp4.
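The area saving of the folding scheme, and the early termination enabled by the dynamic control, can be sketched as a count of measurement cycles; this is an idealized model with hypothetical step counts, not the gate-level circuit:

```python
def folding_vernier_cycles(interval_steps, elements=4, max_cycles=8):
    """With only `elements` physical Vernier stages reused for up to
    `max_cycles` cycles, the worst case equals a 32-element linear Vernier
    line; the conversion stops as soon as the interval is resolved."""
    steps = min(interval_steps, elements * max_cycles)  # saturate at 32 steps
    cycles = -(-steps // elements)                      # ceiling division
    return cycles, steps

# A 13-step interval terminates after ceil(13/4) = 4 cycles instead of 8,
# which is the conversion-power saving of the dynamic control.
short = folding_vernier_cycles(13)
worst = folding_vernier_cycles(32)
```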
Fig. 3.36 Block diagram of the folding TDC

Fig. 3.37 Simplified overview of the freeze Vernier delay line architecture
In the first conversion cycle, enable = 0 and vstart/vstop is selected for
measurement; otherwise, v1_start/v1_stop is selected. The enable signal is
switched from 1 to 0 when the rising edge of vstart/v1_start crosses the rising
edge of vstop/v1_stop. This particular feature dynamically decides when the
conversion is stopped; hence, the power per conversion is optimized based on the
input. The TDC also offers feedback to the system with a ready signal (the
inverted enable signal), indicating that it is ready for the next conversion. The
4-bit thermal code is generated with the freeze Vernier architecture [124]. In the
conventional Vernier architecture, the time-capture elements or early-late
detectors (e.g., a D-register or an arbiter) impose a large load on the circuit.
In the freeze Vernier TDC, the time capturing is instead performed by freezing the
node voltages of the start line in a linear Vernier delay line, allowing a power-
and area-efficient conversion. The freeze Vernier converter consists of inverters
and current-enabled inverters only. Additionally, the circuit does not require any
reset signal; it resets on the falling edges of the stop and start signals. The
delays of the inverters in the freeze Vernier delay elements are controlled using
the bias current, thus controlling the resolution of the TDC.
3.7 Experimental Results
Fig. 3.38 Spectral signature (SNDR = 45.6 dB)

Fig. 3.39 SNDR versus gain
Fig. 3.40 SFDR, SNDR, and THD versus sampling frequency with fin = 10 kHz
Fig. 3.41 Spectral signature of the current-domain SAR A/D converter (fin = 18.9 kHz, fS = 40 kS/s, SFDR = 64.7 dB, SNDR = 58.3 dB)
Fig. 3.42 SFDR, SNDR, and THD versus input frequency
Fig. 3.43 SFDR, SNDR, and THD versus sampling frequency with fin = 1 kHz
voltage-to-time converter. SNDR, SFDR, and THD versus the sampling and input
frequency are illustrated in Figs. 3.45 and 3.46, respectively. The THD in the
range of 40-640 kS/s is above 63 dB within the bandwidth of neural activity of up
to 20 kHz; the SNDR is above 58 dB, and the SFDR more than 64 dB. The maximum
simulated DNL is 0.6 LSB and the maximum simulated INL is 0.8 LSB. Variation
Fig. 3.44 Spectral signature of the time-domain A/D converter

Fig. 3.45 SFDR, SNDR, and THD versus sampling frequency with fin = 20 kHz and gain set to 18 dB

Fig. 3.46 SFDR, SNDR, and THD versus input frequency with gain set to 18 dB
across the slow-slow and fast-fast corners is 0.35 ENOB. The VTC is >9-bit linear
across the 0.5 V input range.
Consequently, the ramp-rate variation across the input range is limited to 10%,
leading to a 400 µV nonlinear voltage variation across the output range. The
reference clock frequency is 80 MHz, and, subsequently, the counter realizes a
5-bit resolution over the 400 ns TDC input time-signal range. The ramp repetition
frequency, i.e., the sampling frequency of the proposed ADC, is 640 kHz. The
simulated ENOB is 9.4 bits over the entire neural-spike input bandwidth. The total
A/D converter consumes 2.7 µW when sampled at 640 kS/s, and 1.6 µW at 40 kS/s,
respectively. The area of the folding Vernier TDC design sums up to 10.5 µm2; the
average resolution is 10.05 ps; it operates at a power supply of 0.4 V and
consumes 0.6 µW of power at a 640 kS/s sampling rate. Table 3.1 summarizes the
performance, while Table 3.2 shows a comparison with previous art.
3.8 Conclusions
converter consumes less than 2.7 µW of power when operating at a 640 kS/s sampling
frequency. With 6.2 fJ/conversion-step, the circuit, realized in 90 nm CMOS
technology, exhibits one of the best FoMs reported and occupies an estimated area
of only 0.022 mm2.
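The quoted figure of merit follows from the standard definition FoM = P / (2^ENOB × fs), using the numbers reported in the experimental results (2.7 µW, 9.4-bit ENOB, 640 kS/s):

```python
# Standard ADC figure of merit: energy per effective conversion step.
power = 2.7e-6      # W, total converter power at 640 kS/s
enob = 9.4          # bits, simulated effective number of bits
f_s = 640e3         # S/s, sampling rate
fom = power / (2 ** enob * f_s)   # J/conversion-step, ~6.2 fJ
```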
References
19. H. van der Ploeg, G. Hoogzaad, H.A.H. Termeer, M. Vertregt, R.L.J. Roovers, A 2.5-V 12-b 54-Msample/s 0.25-µm CMOS ADC in 1-mm2 with mixed-signal chopping and calibration. IEEE J. Solid-State Circuits 36(12), 1859–1867 (2001)
20. M. Clara, A. Wiesbauer, F. Kuttner, A 1.8V fully embedded 10 b 160 MS/s two-step ADC in 0.18 µm CMOS, in Proceedings of IEEE Custom Integrated Circuits Conference, pp. 437–440, 2002
21. T.-C. Lin, J.-C. Wu, A two-step A/D converter in digital CMOS processes, in Proceedings of IEEE Asia-Pacific Conference on ASIC, pp. 177–180, 2002
22. A. Zjajo, H. van der Ploeg, M. Vertregt, A 1.8V 100mW 12-bits 80Msample/s two-step ADC in 0.18-µm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 241–244, 2003
23. N. Ning, F. Long, S.-Y. Wu, Y. Liu, G.-Q. Liu, Q. Yu, M.-H. Yang, An 8-Bit 250MSPS modified two-step ADC, in Proceedings of IEEE International Conference on Communications, Circuits and Systems, pp. 2197–2200, 2006
24. S. Hashemi, B. Razavi, A 7.1 mW 1 GS/s ADC with 48dB SNDR at Nyquist rate. IEEE J. Solid-State Circuits 49(8), 1739–1750 (2014)
25. A. Wiesbauer, M. Clara, M. Harteneck, T. Potscher, C. Fleischhacker, G. Koder, C. Sandner, A fully integrated analog front-end macro for cable modem applications in 0.18-µm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 245–248, 2001
26. R.C. Taft, M.R. Tursi, A 100-MS/s 8-b CMOS subranging ADC with sustained parametric performance from 3.8V down to 2.2 V. IEEE J. Solid-State Circuits 36(3), 331–338 (2001)
27. J. Mulder, C.M. Ward, C.-H. Lin, D. Kruse, J.R. Westra, M. Lughtart, E. Arslan, R.J. van de Plassche, K. Bult, F.M.L. van der Goes, A 21-mW 8-b 125-MSample/s ADC in 0.09-mm2 0.13-µm CMOS. IEEE J. Solid-State Circuits 39(5), 2116–2125 (2004)
28. P.M. Figueiredo, P. Cardoso, A. Lopes, C. Fachada, N. Hamanishi, K. Tanabe, J. Vital, A 90nm CMOS 1.2V 6b 1GS/s two-step subranging ADC, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 568–569, 2006
29. Y. Shimizu, S. Murayama, K. Kudoh, H. Yatsuda, A 30mW 12b 40MS/s subranging ADC with a high-gain offset-canceling positive-feedback amplifier in 90nm digital CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 216–217, 2006
30. J. Huber, R.J. Chandler, A.A. Abidi, A 10b 160MS/s 84mW 1V subranging ADC in 90nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 454–455, 2007
31. C. Cheng, Y. Jiren, A 10-bit 500-MS/s 124-mW subranging folding ADC in 0.13 µm CMOS, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 1709–1712, 2007
32. Y. Shimizu, S. Murayama, K. Kudoh, H. Yatsuda, A split-load interpolation-amplifier-array 300MS/s 8b subranging ADC in 90nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 552–553, 2008
33. K. Yoshioka et al., Dynamic architecture and frequency scaling in 0.8–1.2 GS/s 7b subranging ADC. IEEE J. Solid-State Circuits 50(4), 932–945 (2015)
34. D.A. Mercer, A 14-b, 2.5 MSPS pipelined ADC with on-chip EPROM. IEEE J. Solid-State Circuits 31(1), 70–76 (1996)
35. I. Opris, L. Lewicki, B. Wong, A single-ended 12-bit 20 MSample/s self-calibrating pipeline A/D converter. IEEE J. Solid-State Circuits 33(11), 1898–1903 (1998)
36. A.M. Abo, P.R. Gray, A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter. IEEE J. Solid-State Circuits 34(5), 599–606 (1999)
37. H.-S. Chen, K. Bacrania, B.-S. Song, A 14b 20MSample/s CMOS pipelined ADC, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 46–47, 2000
38. I. Mehr, L. Singer, A 55-mW, 10-bit, 40-Msample/s Nyquist-rate CMOS ADC. IEEE J. Solid-State Circuits 35(3), 70–76 (2000)
39. Y. Chiu, Inherently linear capacitor error-averaging techniques for pipelined A/D conversion, in IEEE Transactions on Circuits and Systems II, vol. 47, pp. 229–232, 2000
40. X. Wang, P.J. Hurst, S.H. Lewis, A 12-bit 20-Msample/s pipelined analog-to-digital converter with nested digital background calibration. IEEE J. Solid-State Circuits 39(11), 1799–1808 (2004)
41. D. Kurose, T. Ito, T. Ueno, T. Yamaji, T. Itakura, 55-mW 200-MSPS 10-bit pipeline ADCs for wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 527–530, 2005
42. C.T. Peach, A. Ravi, R. Bishop, K. Soumyanath, D.J. Allstot, A 9-b 400 Msample/s pipelined analog-to-digital converter in 90nm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 535–538, 2005
43. A.M.A. Ali, C. Dillon, R. Sneed, A.S. Morgan, S. Bardsley, J. Kornblum, L. Wu, A 14-bit 125 MS/s IF/RF sampling pipelined ADC with 100dB SFDR and 50fs jitter. IEEE J. Solid-State Circuits 41(8), 1846–1855 (2006)
44. M. Daito, H. Matsui, M. Ueda, K. Iizuka, A 14-bit 20-MS/s pipelined ADC with digital distortion calibration. IEEE J. Solid-State Circuits 41(11), 2417–2423 (2006)
45. T. Ito, D. Kurose, T. Ueno, T. Yamaji, T. Itakura, 55-mW 1.2-V 12-bit 100-MSPS pipeline ADCs for wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 540–543, 2006
46. J. Treichler, Q. Huang, T. Burger, A 10-bit ENOB 50-MS/s pipeline ADC in 130-nm CMOS at 1.2V supply, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 552–555, 2006
47. I. Ahmed, D.A. Johns, An 11-bit 45MS/s pipelined ADC with rapid calibration of DAC errors in a multi-bit pipeline stage, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 147–150, 2007
48. S.-C. Lee, Y.-D. Jeon, J.-K. Kwon, J. Kim, A 10-bit 205-MS/s 1.0-mm2 90-nm CMOS pipeline ADC for flat panel display applications. IEEE J. Solid-State Circuits 42(12), 2688–2695 (2007)
49. J. Li, R. Leboeuf, M. Courcy, G. Manganaro, A 1.8V 10b 210MS/s CMOS pipelined ADC featuring 86dB SFDR without calibration, in Proceedings of IEEE Custom Integrated Circuits Conference, pp. 317–320, 2007
50. M. Boulemnakher, E. Andre, J. Roux, F. Paillardet, A 1.2V 4.5mW 10b 100MS/s pipeline ADC in a 65nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 250–251, 2008
51. Y.-S. Shu, B.-S. Song, A 15-bit linear 20-MS/s pipelined ADC digitally calibrated with signal-dependent dithering. IEEE J. Solid-State Circuits 43(2), 342–350 (2008)
52. J. Shen, P.R. Kinget, A 0.5-V 8-bit 10-Ms/s pipelined ADC in 90-nm CMOS. IEEE J. Solid-State Circuits 43(4), 1799–1808 (2008)
53. C.-J. Tseng, Y.-C. Hsieh, C.-H. Yang, H.-S. Chen, A 10-bit 200 MS/s capacitor-sharing pipeline ADC. IEEE Trans. Circuits Syst. I: Regul. Pap. 60(11), 2902–2910 (2013)
54. R. Sehgal, F. van der Goes, K. Bult, A 12 b 53 mW 195 MS/s pipeline ADC with 82dB SFDR using split-ADC calibration. IEEE J. Solid-State Circuits 50(7), 1592–1603 (2015)
55. L. Yong, M.P. Flynn, A 100 MS/s 10.5 bit 2.46 mW comparator-less pipeline ADC using self-biased ring amplifiers. IEEE J. Solid-State Circuits 50(10), 2331–2341 (2015)
56. S.H. Lewis, H.S. Fetterman, G.F. Gross, R. Ramachandran, T.R. Viswanathan, A 10-b 20-Msample/s analog-to-digital converter. IEEE J. Solid-State Circuits 27(3), 351–358 (1992)
57. B. Xia, A. Valdes-Garcia, E. Sanchez-Sinencio, A configurable time-interleaved pipeline ADC for multi-standard wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 259–262, 2004
58. S.-C. Lee, G.-H. Kim, J.-K. Kwon, J. Kim, S.-H. Lee, Offset and dynamic gain-mismatch reduction techniques for 10b 200Ms/s parallel pipeline ADCs, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 531–534, 2005
59. S. Limotyrakis, S.D. Kulchycki, D.K. Su, B.A. Wooley, A 150-MS/s 8-b 71-mW CMOS time-interleaved ADC. IEEE J. Solid-State Circuits 40(5), 1057–1067 (2005)
60. C.-C. Hsu, F.-C. Huang, C.-Y. Shih, C.-C. Huang, Y.-H. Lin, C.-C. Lee, B. Razavi, An 11b 800MS/s time-interleaved ADC with digital background calibration, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 464–465, 2007
61. Z.-M. Lee, C.-Y. Wang, J.-T. Wu, A CMOS 15-bit 125-MS/s time-interleaved ADC with digital background calibration. IEEE J. Solid-State Circuits 42(10), 2149–2160 (2007)
62. C.-Y. Chen et al., A 12-bit 3 GS/s pipeline ADC with 0.4mm2 and 500 mW in 40nm digital CMOS. IEEE J. Solid-State Circuits 47(4), 1013–1021 (2012)
63. J. Park, H.-J. Park, J.-W. Kim, S. Seo, P. Chung, A 1 mW 10-bit 500 kSps SAR A/D converter, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 581–584, 2000
64. P. Confalonieri et al., A 2.7 mW 1 MSps 10 b analog-to-digital converter with built-in reference buffer and 1 LSB accuracy programmable input ranges, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 255–258, 2004
65. N. Verma, A.P. Chandrakasan, An ultra low energy 12-bit rate-resolution scalable SAR ADC for wireless sensor nodes. IEEE J. Solid-State Circuits 42(6), 1196–1205 (2007)
66. C.-C. Liu et al., A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure. IEEE J. Solid-State Circuits 45(4), 731–740 (2010)
67. S. Shikata, R. Sekimoto, T. Kuroda, H. Ishikuro, A 0.5V 1.1 MS/sec 6.3 fJ/conversion-step SAR-ADC with tri-level comparator in 40nm CMOS. IEEE J. Solid-State Circuits 47(4), 1022–1030 (2012)
68. Z. Dai, A. Bhide, A. Alvandpour, A 53-nW 9.1-ENOB 1-kS/s SAR ADC in 0.13-µm CMOS for medical implant devices. IEEE J. Solid-State Circuits 47(7), 1585–1593 (2012)
69. G.-Y. Huang et al., A 1-µW 10-bit 200-kS/s SAR ADC with a bypass window for biomedical applications. IEEE J. Solid-State Circuits 47(11), 2783–2795 (2012)
70. M. Yip, A.P. Chandrakasan, A resolution-reconfigurable 5-to-10-bit 0.4-to-1V power scalable SAR ADC for sensor applications. IEEE J. Solid-State Circuits 48(6), 1453–1464 (2013)
71. P. Harpe, E. Cantatore, A. van Roermund, A 10b/12b 40 kS/s SAR ADC with data-driven noise reduction achieving up to 10.1b ENOB at 2.2 fJ/conversion-step. IEEE J. Solid-State Circuits 48(12), 3011–3018 (2013)
72. F.M. Yaul, A.P. Chandrakasan, A 10b SAR ADC with data-dependent energy reduction using LSB-first successive approximation. IEEE J. Solid-State Circuits 49(12), 2825–2834 (2014)
73. J.-H. Tsai et al., A 0.003mm2 10 b 240 MS/s 0.7 mW SAR ADC in 28nm CMOS with digital error correction and correlated-reversed switching. IEEE J. Solid-State Circuits 50(6), 1382–1398 (2015)
74. B.-S. Song, M.F. Tompsett, K.R. Lakshmikumar, A 12 bit 1MHz capacitor error averaging pipelined A/D converter. IEEE J. Solid-State Circuits 23(10), 1324–1333 (1988)
75. Y.-M. Lin, B. Kim, P.R. Gray, A 13-b 2.5-MHz self-calibrated pipelined A/D converter in 3-µm CMOS. IEEE J. Solid-State Circuits 26(5), 628–635 (1991)
76. C.S.G. Conroy, D.W. Cline, P.R. Gray, A high-speed parallel pipelined ADC technique in CMOS, in Proceedings of IEEE Symposium on VLSI Circuits, pp. 96–97, 1992
77. B.-S. Song, M.F. Tompsett, K.R. Lakshmikumar, A 12 bit 1MHz capacitor error averaging pipelined A/D. IEEE J. Solid-State Circuits 23(10), 1324–1333 (1988)
78. J.M. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd edn. (Prentice Hall, New Jersey, 2003)
79. A.A. Abidi, High-frequency noise measurements on FETs with small dimensions. IEEE Trans. Electron Devices 33(11), 1801–1805 (1986)
80. C. Enz, Y. Cheng, MOS transistor modeling for RF IC design. IEEE J. Solid-State Circuits 35(2), 186–201 (2000)
81. A.M. Abo, P.R. Gray, A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter. IEEE J. Solid-State Circuits 34(5), 599–606 (1999)
82. B.J. Hosticka, Improvement of the gain of MOS amplifiers. IEEE J. Solid-State Circuits 14(6), 1111–1114 (1979)
83. E. Säckinger, W. Guggenbühl, A high-swing, high-impedance MOS cascode circuit. IEEE J. Solid-State Circuits 25(1), 289–297 (1990)
84. U. Gatti, F. Maloberti, G. Torelli, A novel CMOS linear transconductance cell for continuous-time filters, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 1173–1176, 1990
85. C.A. Laber, P.R. Gray, A positive-feedback transconductance amplifier with applications to high frequency high Q CMOS switched capacitor filters. IEEE J. Solid-State Circuits 13(6), 1370–1378 (1988)
86. A.A. Abidi, An analysis of bootstrapped gain enhancement techniques. IEEE J. Solid-State Circuits 22(6), 1200–1204 (1987)
87. B.J. Hosticka, Dynamic CMOS amplifiers. IEEE J. Solid-State Circuits 15(5), 881–886 (1980)
88. K. Bult, G. Geelen, A fast-settling CMOS op amp for SC circuits with 90-dB DC gain. IEEE J. Solid-State Circuits 25(6), 1379–1384 (1990)
89. R. Ockey, M. Syrzycki, Optimization of a latched comparator for high-speed analog-to-digital converters, in IEEE Canadian Conference on Electrical and Computer Engineering, vol. 1, pp. 403–408, 1999
90. F. Murden, R. Gosser, 12b 50MSample/s two-stage A/D converter, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 278–279, 1995
91. J. Robert, G.C. Temes, V. Valencic, R. Dessoulavy, D. Philippe, A 16-bit low-voltage CMOS A/D converter. IEEE J. Solid-State Circuits 22(2), 157–163 (1987)
92. T.B. Cho, P.R. Gray, A 10 b, 20 Msample/s, 35 mW pipeline A/D converter. IEEE J. Solid-State Circuits 30(3), 166–172 (1995)
93. L. Sumanen, M. Waltari, K. Halonen, A mismatch insensitive CMOS dynamic comparator for pipeline A/D converters, in Proceedings of the IEEE International Conference on Circuits and Systems, pp. 32–35, 2000
94. T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto, A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architecture. IEEE J. Solid-State Circuits 28(4), 523–527 (1993)
95. P.M. Figueiredo, J.C. Vital, Low kickback noise techniques for CMOS latched comparators, in IEEE International Symposium on Circuits and Systems, vol. 1, pp. 537–540, 2004
96. B. Nauta, A.G.W. Venes, A 70-MS/s 110-mW 8-b CMOS folding and interpolating A/D converter. IEEE J. Solid-State Circuits 30(12), 1302–1308 (1995)
97. J. Lin, B. Haroun, An embedded 0.8V/480µW 6b/22MHz flash ADC in 0.13µm digital CMOS process using nonlinear double-interpolation technique, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 244–246, 2002
98. F. Shahrokhi et al., The 128-channel fully differential digital integrated neural recording and stimulation interface. IEEE Trans. Biomed. Circuits Syst. 4(3), 149–161 (2010)
99. H. Gao et al., HermesE: a 96-channel full data rate direct neural interface in 0.13µm CMOS. IEEE J. Solid-State Circuits 47(4), 1043–1055 (2012)
100. D. Han et al., A 0.45V 100-channel neural-recording IC with sub-µW/channel consumption in 0.18µm CMOS. IEEE Trans. Biomed. Circuits Syst. 7(6), 735–746 (2013)
101. M.S. Chae, W. Liu, M. Sivaprakasham, Design optimization for integrated neural recording
systems. IEEE J. Solid-State Circuits 43(9), 19311939 (2008)
102. T.M. Seese, H. Harasaki, G.M. Saidel, C.R. Davies, Characterization of tissue morphology,
angiogenesis, and temperature in the adaptive response of muscle tissue to chronic heating.
Lab. Invest. 78(12), 15531562 (1998)
103. A. Rodrguez-Prez etal., A 64-channel inductively-powered neural recording sensor array,
in Proceedings of IEEE Biomedical Circuits and Systems Conference, pp. 228231, 2012
104. C. Enz, Y. Cheng, MOS transistor modeling for RF IC design. IEEE J. Solid-State Circuits
35(2), 186201 (2000)
105. S. Song etal., A 430nW 64nV/VHz current-reuse telescopic amplifier for neural recording
application, in Proceedings of IEEE Biomedical Circuits and Systems Conference, pp. 322325,
2013
106. X. Zou etal., A 100-channel 1-mW implantable neural recording IC. IEEE Trans. Circuits
Syst. I Regul. Pap. 60(10), 25842596 (2013)
References 75
107. J. Lee, H.-G. Rhew, D.R. Kipke, M.P. Flynn, A 64 channel programmable closed-loop neu-
rostimulator with 8 channel neural amplifier and logarithmic ADC. IEEE J. Solid-State
Circuits 45(9), 19351945 (2010)
108. K. Abdelhalim, R. Genov, CMOS DAC-sharing stimulator for neural recording and stimula-
tion arrays, in Proceedings of IEEE International Symposium on Circuits and Systems, pp.
17121715, 2011
109. A. Rossi, G. Fucilli, Nonredundant successive approximation register for A/D converters.
Electronic Lett. 32(12), 10551056 (1996)
110. S. Narendra, V. De, S. Borkar, D.A. Antoniadis, A.P. Chandrakasan, Full-chip subthreshold
leakage power prediction and reduction techniques for sub-0.18-m CMOS. IEEE J. Solid-
State Circuits 39(2), 501510 (2004)
111. B. Haaheim, T.G. Constandinou, A sub-1W, 16kHz Current-mode SAR-ADC for single-
neuron spike recording, in Proceedings of IEEE Biomedical Circuits and Systems Conference,
pp. 29572960, 2012
112. A. Agarwal, Y.B. Kim, S. Sonkusale, Low power current mode ADC for CMOS sensor IC,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 584587,
2005
113. R. Dlugosz, K. Iniewski, Ultra low power current-mode algorithmic analog-to-digital
converter implemented in 0.18m CMOS technology for wireless sensor network, in
Proceedings of IEEE International Conference on Mixed Design of Integrated Circuits and
Systems, pp. 401406, 2006
114. S. Al-Ahdab, R. Lotfi, W. Serdijn, A 1-V 225-nW 1kS/s current successive approximation
ADC for pacemakers, in Proceedings of IEEE International Conference on Ph.D. Research
in Microelectronics and Electronics, pp. 14, 2010
115. Y. Sugimoto, A 1.5-V current-mode CMOS sample-and-hold IC with 57-dB S/N at 20 MS/s
and 54-dB S/N at 30 MS/s. IEEE J. Solid-State Circuits 36(4), 696700 (2001)
116. B. Linares-Barranco, T. Serrano-Gotarredona, On the design and characterization of femto-
ampere current-mode circuits. IEEE J. Solid-State Circuits 38(8), 13531363 (2003)
117. E. Allier etal., 120nm low power asynchronous ADC, in Proceedings of IEEE
International Symposium on Low Power Electronic Design, pp. 6065, 2005
118. M. Park, M.H. Perrot, A single-slope 80MS/s ADC using two-step time-to-digital conversion,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 11251128,
2009
119. S. Naraghi, M. Courcy, M.P. Flynn, A 9-bit, 14W and 0.006mm2 pulse position modulation
ADC in 90nm digital CMOS. IEEE J. Solid-State Circuits 45(9), 18701880 (2010)
120. A.P. Chandrakasan etal., Technologies for ultradynamic voltage scaling. Proc. IEEE 98(2),
191214 (2010)
121. J.K. Fiorenza etal., Comparator-based switched-capacitor circuits for scaled CMOS tech-
nologies. IEEE J. Solid-State Circuits 41(12), 26582668 (2006)
122. J.P. Jansson, A. Mantyniemi, J. Kostamovaara, A CMOS time-to-digital converter with better
than 10ps single-shot precision. IEEE J. Solid-State Circuits 41(6), 12861296 (2006)
123. L. Brooks, H.-S. Lee, A 12b, 50 MS/s, fully differential zero-crosssing based pipelined
ADC. IEEE J. Solid-State Circuits 44(12), 33293343 (2009)
124. K. Blutman, J. Angevare, A. Zjajo, N. van der Meijs, A 0.1pJ freeze Vernier time-to-digital
converter in 65nm CMOS, in Proceedings of IEEE International Symposium on Circuits
and Systems, pp. 8588, 2014
125. R.H. Walden, Analog-to-digital converter survey and analysis. IEEE J. Sel. Areas Commun.
17, 539550 (1999)
126. C.M. Lopez etal., An implantable 455-active-electrode 52-channel CMOS neural probe.
IEEE J. Solid-State Circuits 49(1), 248261 (2014)
127. T. Rabuske etal., A self-calibrated 10-bit 1MSps SAR ADC with reduced-voltage charge-
sharing DAC, in Proceedings of IEEE International Symposium on Circuits and Systems,
pp. 24522455, 2013
76 3 Neural Signal Quantization Circuits
128. C. Gao etal., An ultra-low-power extended counting ADC for large scale sensor arrays, in
Proceedings of IEEE International Symposium on Circuits and Systems, pp. 8184, 2014
129. L. Zheng etal., An adaptive 16/64kHz, 9-bit SAR ADC with peak-aligned sampling
for neural spike recording, in IEEE International Symposium on Circuits and Systems,
pp. 23852388, 2014
130. Y.-W. Cheng, K.T. Tang, A 0.5-V 1.28-MS/s 10-bit SAR ADC with switching detect logic,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 293296,
2015
Chapter 4
Neural Signal Classification Circuits
4.1 Introduction
An electrode records the action potentials from multiple surrounding neurons (e.g., due to the background activity of other neurons, slight perturbations in electrode position, or external electrical or mechanical interference), and the recorded waveform/spikes consist of the superimposed potentials fired from these neurons. The ability to distinguish spikes from noise [4], and to distinguish spikes from different sources within the superimposed waveform, therefore depends on both the discrepancies between the noise-free spikes from each source and the signal-to-noise ratio (SNR) of the recording system. The time occurrences of the action potentials emitted by the neurons close to the electrode are detected, depending on the SNR, either by voltage thresholding with respect to an estimate of the noise amplitude in the signal or with a more advanced technique, such as the continuous wavelet transform [5]. After waveform alignment, to simplify the classification process, a feature extraction step, such as principal component analysis (PCA) [6] or wavelet decomposition [7], characterizes the detected spikes and represents each detected spike in a reduced-dimensional space, i.e., for a spike consisting of n sample points, the feature extraction method produces m variables (m < n), where m is the number of features. Based on these features, the spikes are classified into m-dimensional clusters by k-means [8], expectation maximization (EM) [9], template matching [10], Bayesian clustering [11], or artificial neural networks (ANN), with each cluster corresponding to the spiking activity of a single neuron.
The support vector machine (SVM) has been introduced to bioinformatics and spike classification/sorting [12–14] because of its excellent generalization, sparse solution, and use of quadratic programming, which provides global optimization. This absence of local minima is a substantial difference from artificial neural network classifiers. Like ANN classifiers, applying an SVM to any classification problem requires the determination of several user-defined parameters, e.g., the choice of an appropriate kernel and related parameters, the determination of the regularization parameter (i.e., C), and an appropriate optimization technique. Correspondingly, the SVM applies structural risk minimization instead of empirical risk minimization and efficiently addresses nonlinearity and the curse of dimensionality. However, the methods in [12–14] could neither identify multiclass neural spikes nor decompose overlapping neural spikes resulting from variable triggering of data collection (e.g., due to noise or other spike events leading to premature or delayed waveforms). Recording multiple spikes on a specific electrode can also create complex sums of neuron waveforms [15].
In this chapter, we present a 128-channel, programmable neural spike classifier based on nonlinear energy operator spike detection and multiclass kernel support vector machine classification that is able to accurately identify overlapping neural spikes even at low SNR. For efficient algorithm execution, we transform the multiclass problem with Kesler's construction and extend the iterative greedy optimization reduced set vectors approach with a cascaded method. The power-efficient, multichannel clustering is achieved by a combination of several algorithm and circuit techniques, namely, Kesler's transformation, a boosted cascade reduced set vectors approach, two-stage pipeline processing units, power-scalable kernels, register-bank memory, high-VT devices, and a near-threshold supply. The results obtained in a 65 nm CMOS technology show that efficient, large-scale neural spike data classification can be obtained with a low-power (less than 41 μW, corresponding to a 15.5 μW/mm² power density), compact, and low resource usage structure (31k logic gates resulting in a 2.64 mm² area).
This chapter is organized as follows: Sect. 4.2 focuses on the neural spike classifier and associated design decisions. In Sect. 4.3, SVM training and classification are described, and the iterative greedy optimization reduced set vectors approach is extended with a cascaded method (boosted cascade). Section 4.4 elaborates experimental results. Finally, Sect. 4.5 provides a summary and the main conclusions.
4.2 Spike Detector
(Fig. 4.1: Spike sorting chain — neural signals from the ADC pass through spike detection (e.g., thresholding), feature extraction (e.g., PCA), and classification (e.g., k-means) to produce sorting results. Previous art requires off-chip training, whereas the proposed system performs on-chip training with energy-filter-based spike detection, max-min feature extraction, and multiclass SVM classification. The recording front end comprises the electrode, low-noise amplifier, band-pass filter, programmable gain amplifier, N:1 multiplexer, and A/D converter.)

(Fig. 4.2: System architecture — 16-channel, time-multiplexed neural samples X[n−2]…X[n+2] are processed by a spike detection and decision unit (energy filter, threshold unit, and noise-shaping filter), a max-min feature extractor, and a multiclass SVM classifier, supported by instruction and data SRAM, an arbiter, an ALU, and an FSM-based system control unit.)

The architecture offers spike detection algorithm programmability and parameter set flexibility. The system control unit is loaded with 32 10-bit filter coefficients and a 16-bit threshold
value. The spike detector algorithm calculates the energy function for waveforms
inside a slicing window; when a spike event reaches the threshold, a spike data is
stored and transferred for the alignment process and further feature extraction. The
noise-shaping filter provides the spike waveforms derivatives to identify neurons
kernel signatures (including the positive and negative peaks of the spike derivative
and spike height). The filter coefficients are programmable through the coefficient
register array. Consequently, a variety of noise profiles and spike widths can be
precisely tuned. To attain the marginal phase distortion, we utilized Bessel filter
structure. For real-time, high-signal throughput, all spike-processing operations
including detection, filtering, and feature extraction are performed in parallel.
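The energy-based detection step can be sketched with the nonlinear energy operator (NEO) named earlier in the chapter, ψ[n] = x[n]² − x[n−1]·x[n+1]. The trace, the threshold constant c, and the scaling below are illustrative assumptions, not the chip's fixed-point implementation.

```python
import math

def neo(x):
    """Nonlinear energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].
    Large only where the signal is simultaneously big and fast-changing."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def detect_spikes(x, c=8.0):
    """Flag samples whose NEO output exceeds c times its mean value
    (a common automatic threshold choice; c is an assumed constant)."""
    psi = neo(x)
    thr = c * sum(psi) / len(psi)
    return [n + 1 for n, p in enumerate(psi) if p > thr]

# Hypothetical trace: low-level background oscillation with one biphasic
# "spike" injected at samples 50-51.
x = [0.05 * math.sin(0.7 * n) for n in range(100)]
x[50] += 1.0
x[51] -= 0.6
print(detect_spikes(x))  # → [50, 51]
```

A convenient property visible here: the NEO of a pure sinusoid is constant, so background oscillation contributes a flat floor that the mean-based threshold tracks automatically.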
The SRAM is implemented as a register-bank memory, since it can be scaled to subthreshold voltages (i.e., to reduce the leakage power). In contrast, compiled SRAM has a limited read noise margin and, consequently, cannot be scaled below 0.7 V.
The register-bank memories are organized as spike registers [16], as shown in Fig. 4.3. Each spike register module consists of 10-bit registers to save the spike waveforms, and a delay line for clock gating. The decoder enables sequential, clock-controlled selection of each spike sample S from a spike register.

(Fig. 4.3: Spike register bank — a write decoder (w_en, addr_w, addr_r) addresses spike registers 1…N, each with a clock-gating enable (clk_en) and a spk_out readout.)

In each 10-bit spike register, only the 1-bit D flip-flops have an active clock. Accordingly,
such a delay-line-based clock-gating arrangement reduces redundant clock transitions and, consequently, allows a 10-fold reduction in the clock-switching power (corresponding to a 32% reduction in the total power consumed by the memory).
4.3 Spike Classifier
The support vector machine is a linear classifier in the parameter space; nevertheless, it becomes a nonlinear classifier as a result of the nonlinear mapping of the space of the input patterns into the high-dimensional feature space. The classifier operations can be combined to realize a variety of multiclass [21] and ensemble classifiers (e.g., classifier trees and adaptive boosting [22]). Instead of creating many binary classifiers to determine the class labels, we solve a multiclass problem directly [23] by modifying the binary class objective function and adding a constraint to it for every class. The modified objective function allows simultaneous computation of multiclass classification [24]. Let us consider labeled training spike trains of N data points {yk(i), xk}, k = 1, …, N, i = 1, …, m, where xk is the kth input pattern from the n-dimensional space Rn and yk(i) denotes the output of the ith output unit for pattern k, an approach very similar to the ANN methodology. The m outputs can encode q = 2^m different classes. The training procedure of the SVM corresponds to a convex optimization and amounts to solving a constrained quadratic optimization problem (QP); the solution found is, thus, guaranteed to be the unique global minimum of the objective function. To maximize the margin of y(x), ωi and bi are chosen such that they minimize the norm ||ωi|| subject to the optimization problem formulated as [25]

min(ωi, bi, ξk,i) JLS(m)(ωi, bi, ξk,i) = min [ (1/2) Σi=1..m (||ωi||2² + bi²) + C Σk=1..N Σi=1..m ξk,i ]    (4.1)
The term b²/2 is added to the objective. To solve the optimization problem, we use the Karush–Kuhn–Tucker theorem [27]. We add a dual set of variables, one for each constraint, and obtain the Lagrangian of the optimization problem (4.1)

L(m)(ωi, bi, ξk,i; αk,i) = JLS(m) − Σk=1..N αk,i {yk(i) [ωiT φi(xk) + bi] − 1 + ξk,i}    (4.3)

for k = 1, …, N and i = 1, …, m. The offset of the hyperplane from the origin is determined by the parameter b/||ω||. The function φ(·) is a nonlinear function, which maps the input space into a higher dimensional space. To avoid working with the high-dimensional map φ, we instead choose a kernel function ψ by defining the dot product in Hilbert space

φ(x)T φ(xk) = ψ(x, xk)    (4.5)

enabling us to treat nonlinear problems with principally linear techniques. Formally, ψ is a symmetric, positive semidefinite Mercer kernel; the only condition required is that the kernel satisfies a general positivity constraint [27].
To allow for mislabeled examples, a modified maximum margin technique is employed [28]. If there exists no hyperplane ω·x + b = 0 that can divide the different classes, the objective function is penalized with nonzero slack variables ξi. The modified maximum margin technique then finds a hyperplane that separates the training set with a minimal number of errors, and the optimization becomes a trade-off between a large margin and a small error penalty ξ. The maximum margin hyperplane, and consequently the classification task, is then only a function of the support vectors

maxα Q1(αk; ψ(xk, xl)) = Σk=1..N αk − (1/2) Σk,l=1..N yk yl ψ(xk, xl) αk αl

s.t. α ∈ Rm | 0 ≤ αk ≤ C, k = 1, …, N, Σk=1..N αk yk = 0    (4.6)
where the αk are the weights. The QP optimization task in (4.6) is solved efficiently using sequential minimal optimization, i.e., by constructing the optimal separating hyperplane for the full dataset [29]. Typically, many αk go to zero during optimization, and the remaining xk corresponding to those αk > 0 are called support vectors. To simplify notation, we assume that all nonsupport vectors have been removed, so that Nx is now the number of support vectors, and αk > 0 for all k. The resulting classification function f(x) from (4.6) has the following expansion:

f(x) = sgn( Σk=1..N αk yk ψ(x, xk) + b )    (4.7)

where the support vector machine classifier uses the sign of f(x) to assign a class label y to the object x [30]. The complexity of the computation of (4.7) scales with the number of support vectors. To simplify the kernel classifier trained by the SVM, we approximate the expansion Ψ = Σk αk φ(xk), with xk ∈ Rn (from (4.7)), by reduced set vectors zi ∈ Rn, i.e., Ψ′ = Σk βk φ(zk), where the weights βk ∈ R and the vectors zi determine the reduced kernel expansion. The problem of finding the reduced kernel expansion can be stated as the optimization task

minβ,z ||Ψ − Ψ′||² = minβ,z [ Σk,l=1..Nx αk αl ψ(xk, xl) + Σk,l=1..Nz βk βl ψ(zk, zl) − 2 Σk=1..Nx Σl=1..Nz αk βl ψ(xk, zl) ]    (4.8)

Although φ is not given explicitly, (4.8) can be computed (and minimized) in terms of the kernel, and the minimization is carried out over both the zk and the βk. The reduced set vectors zk and the coefficients βl,k for a classifier fl(x) are solved by iterative greedy optimization [31]

fl(x) = sgn( Σk=1..m βl,k ψ(x, zk) + b ),  l = 1, …, Nz    (4.9)

For a given complexity (i.e., number of reduced set vectors), the classifier provides the optimal greedy approximation of the full SVM decision boundary; the first one is the one which, using the objective function (4.8), is closest to the full SVM (4.7) constrained to using only one reduced set vector.
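A numerical sketch of the full expansion (4.7) versus a reduced set expansion (4.9) is shown below. The support vectors, weights, and the two reduced set vectors are hypothetical — in the chapter they result from SVM training and from minimizing (4.8) — but they illustrate how a much shorter expansion can reproduce the decision:

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel psi(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision(x, vecs, weights, b, gamma=1.0):
    """Kernel expansion sum_k w_k * psi(x, x_k) + b; the class label is
    its sign, as in (4.7) and (4.9)."""
    return sum(w * rbf(x, v, gamma) for w, v in zip(weights, vecs)) + b

# Hypothetical trained model: four support vectors with alpha_k * y_k
# folded into the weights.
sv = [(0.0, 0.0), (0.1, 0.1), (2.0, 2.0), (2.1, 1.9)]
alpha_y = [1.0, 1.0, -1.0, -1.0]
b = 0.0

# Hypothetical reduced set: one vector per group of nearby support
# vectors, with merged weights (in practice the z_k and beta_k come
# from minimizing (4.8) by iterative greedy optimization).
z = [(0.05, 0.05), (2.05, 1.95)]
beta = [2.0, -2.0]

for query in [(0.2, 0.0), (1.9, 2.1)]:
    full = decision(query, sv, alpha_y, b)
    reduced = decision(query, z, beta, b)
    print(query, full > 0, reduced > 0)  # labels agree at half the cost
```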
The transformation from the multiclass SVM problem in (4.1) to a single-class problem is based on Kesler's construction [28, 30]. The resulting SVM classifier is composed of a set of discriminant functions, which are computed as

fl(x) = Σk Σm ψ(xk, x) αk,m (δ(l, yk) − δ(l, m)) + bl    (4.10)
(Fig. 4.4: a Training flow — N neural signals undergo preprocessing, feature selection, cascade-classifier training, detection, and classification. b Boosted cascade of reduced set vectors — the training data td are split into eight subsets (td/8 each), whose first-layer support vectors sv(x1)–sv(x8) are merged pairwise through a second layer (sv(x9)–sv(x12)), a third layer (sv(x13), sv(x14)), and a fourth layer (sv(x15)).)
Since the data xk appear only in the form of dot products in the dual form, we can construct the dot product ψ(xk, zl) using the Kronecker delta, i.e., δ(k, l) = 1 for k = l and δ(k, l) = 0 for k ≠ l, and map it to a reproducing kernel Hilbert space such that the dot product obtains the same value as the function ψ. This property allows us to configure the SVM classifier via various energy-scalable kernels [32] for finding nonlinear classifiers. For ψ(·,·) one typically has the following choices: ψ(x, xk) = xkT x (linear SVM); ψ(x, xk) = (xkT x + 1)^d (polynomial SVM of degree d); ψ(x, xk) = tanh[κ xkT x − θ] (sigmoid SVM); ψ(x, xk) = exp{−γ||x − xk||²} (radial basis function (RBF) SVM); ψ(x, xk) = exp{−||x − xk||/(2σ²)} (exponential radial basis function (ERBF) SVM); and ψ(x, xk) = exp{−||x − xk||²/(2σ²)} (Gaussian RBF SVM), where κ, θ, γ, and σ are positive real constants. The kernels yield increasing levels of strength (e.g., a false-alarm rate of 18 per day for the linear kernel decreases to 1.2 per day for the RBF kernel [33]). However, the required power for each kernel (from simulation of the CPU) varies by orders of magnitude.
The complexity of the computation of (4.10) scales with the number of support vectors. To simplify the kernel classifier trained by the SVM, we extend the iterative greedy optimization reduced set vectors approach [31] with a boosted cascade classifier (Fig. 4.4). Accordingly, the reduced expansion is not evaluated at once, but rather in a cascaded way, such that in most cases a very small number of support vectors is applied. The computation of the classification function fl(x) involves matrix–vector operations, which are highly parallelizable. Therefore, the problem is segmented into smaller ones, and parallel units are instantiated for the processing of each subproblem. Consider a set of reduced set vector classification functions, where the lth function is an approximation with l vectors, chained into a sequence. After partitioning the data into disjoint subsets, we iteratively train the SVM on subsets of the original dataset and combine the support vectors of the resulting models to create new training sets [34, 35]. A query vector is then evaluated by every function in the cascade, and if it is classified negative, the evaluation stops

fc,l(x) = sgn(f1(x)) ∧ sgn(f2(x)) ∧ ⋯    (4.12)
where fc,l(x) is the cascade evaluation function of (4.10). In other words, we bias each cascade level in such a way that one of the binary decisions is very confident, while the other is uncertain and propagates the data point to the next, more complex cascade level. Biasing of the functions f is done by setting the parameter b to achieve a desired accuracy of the function on an evaluation set. When a run through the cascade is completed, we combine the remaining support vectors of the final model with each subset from the first step of the first run. Frequently, a single pass through the cascade produces satisfactory accuracy; however, if the global optimum is to be reached, the result of the last level is fed back into the first level to test whether any of the input vectors have to be incorporated into the optimization. If this is not the case for any of the input-layer support vectors, the cascade has converged to the global optimum; otherwise, it proceeds with an additional pass through the network.
The training data (td) in Fig. 4.4 are split into subsets, and each one is evaluated individually for support vectors in the first layer [36]. Eliminating nonsupport vectors early from the classification hence significantly accelerates the SVM procedure. The scheme requires only modest communication from one layer to the next, and a satisfactory accuracy is often obtained with a single pass through the cascade. When passing through the cascade, merged support vectors are used to test the data d for violations of the Karush–Kuhn–Tucker (KKT) conditions [37] (Fig. 4.5a). Violators are then combined with the support vectors for the next iteration. The required arithmetic over feature vectors (the elementwise operands as well as the SVM model parameters) is executed with a two-stage pipeline processing unit (i.e., to reduce glitch propagation) (Fig. 4.5b). Flip-flops are inserted in the pipeline to lessen the impact of active glitching [38] and to reduce the leakage energy.
Fig. 4.5 a A cascade with two input sets: subsets d1 and d2 are tested for KKT violations and merged with support vectors sv(x1)–sv(x3). b Two-stage pipeline processing unit: a SUB/MULT stage operating on sv(xi)[j] and xj[j], followed by flip-flops (F/F) and an ADD/SUB stage accumulating (with offset b) into the kernel evaluation κ(·) output f[j]
4.4 Experimental Results
Fig. 4.6 Spike detection from continuously acquired data, the y-axis is arbitrary; a top: raw signal after amplification, not corrected for gain, b middle: threshold (line) crossings of a local energy measurement with a running window of 1 ms, and c bottom: detected spikes
Fig. 4.7 a Spike detection from continuously acquired data, b detected spikes, c the SVM separation hypersurface for the RBF kernel with three spike classes (kernel parameters 5.12 and 1.72) (© IEEE 2015)
The classifier is evaluated in terms of the training classification error, the margin of the found hyperplane, and the number of kernel evaluations. To improve the data structure from the numerical point of view, the system in (4.12) is first preprocessed by reordering the nonzero patterns for bandwidth reduction (Fig. 4.8). Figure 4.7c gives a graphical illustration of three-class classification, where the bold lines represent decision boundaries. For a correctly classified example x1, we have ξ1(1) = 0 and ξ1(2) = 0, i.e., no loss is counted, since both ξ1,2 and ξ1,3 are negative. On the other hand, for an example x2 that violates two margin bounds (ξ2,2, ξ2,3 > 0), both methods generate a loss. The algorithm converges very fast in the first steps and slows down as the optimal solution is approached. However,
(Fig. 4.8: Nonzero pattern of the system matrix before (left) and after (right) reordering for bandwidth reduction; both axes span 0–800.)
almost the same classification error rates were obtained for all the tolerance parameters ε = [10⁻², 5·10⁻³, 10⁻³], indicating that to find a good classifier we do not need the extremely precise solution with ε → 0. The SVM performance is sensitive to hyperparameter settings, e.g., the settings of the complexity parameter C and the kernel parameter σ for the Gaussian kernel. As a consequence, hyperparameter tuning with a grid search approach is performed before the final model fit. More sophisticated methods for hyperparameter tuning are available as well [39].
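A grid search over (C, σ) can be sketched generically. The scoring function below is a hypothetical stand-in with a known optimum; a real flow would return a cross-validated SVM accuracy at each grid point.

```python
import itertools

def grid_search(score, grid):
    """Exhaustive hyperparameter search: evaluate every parameter
    combination and keep the one with the best validation score."""
    names = list(grid.keys())
    best = max(itertools.product(*grid.values()),
               key=lambda combo: score(**dict(zip(names, combo))))
    return dict(zip(names, best))

# Hypothetical validation score with an interior optimum; a real system
# would run cross-validated SVM training here for each (C, sigma) pair.
def score(C, sigma):
    return -((C - 1.0) ** 2 + (sigma - 0.5) ** 2)

grid = {"C": [0.1, 1.0, 10.0], "sigma": [0.25, 0.5, 1.0]}
print(grid_search(score, grid))  # → {'C': 1.0, 'sigma': 0.5}
```

Grid search costs one training run per grid point, which is why it is done once, offline, before the final model fit.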
The SVM spike sorting performance has been summarized and benchmarked (Fig. 4.9) against four different, relatively computationally efficient methods for spike sorting: template matching, principal component analysis, Mahalanobis distance, and Euclidean distance. The performance is quantified using the effective accuracy, i.e., spikes correctly classified versus total spikes classified (excluding spike detection). The source of spike detection error is either the false inclusion of a noise segment as a spike waveform or the false omission of spike waveforms. These errors can be easily modeled by the addition or removal of spikes at random positions in time, so that the desired error percentage is obtained. In contrast, care should be taken in modeling spike classification errors, since an error in one unit may or may not cause an error in another unit. For each method, the parameters yielding the best classification performance are selected.
The SVM classifier consistently outperforms the benchmarked methods over the entire range of SNRs tested, although it exceeds the Euclidean distance metric only by a slight margin, reaching an asymptotic success rate of ~97%. The different SNRs in the BMI have been obtained by superimposing attenuated spike waveforms so as to mimic the background activity observed at the electrode. If we increase the SNR of the entire front-end brain–machine interface, the spike sorting accuracy increases by up to 4–5% (depending on the spike sorting method used). Similarly, the accuracy of the spike sorting algorithm increases with A/D converter resolution, although it saturates beyond 5–6 bit resolution, ultimately
Fig. 4.9 a Effect of SNR on single-spike sorting accuracy of the BMI system, b effect of SNR on sorting accuracy for overlapping spikes of three classes (© IEEE 2015)
limited by the SNR. However, since the amplitude of the observed spike signals can typically vary by one order of magnitude, additional resolution (i.e., 2–3 bit) is needed if the amplification gain is fixed. Additionally, increasing the sampling rate of the A/D converter improves spike sorting accuracy, since this captures finer features that further differentiate the signals. The sorting accuracy for spike waveforms that overlap at different sample points is illustrated in Fig. 4.9b.
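The resolution trade-off can be illustrated by quantizing a noisy synthetic spike at different bit depths. The waveform, noise level, and full-scale range below are assumed toy values, but they show the crossover from quantization-limited to noise-limited error that underlies the accuracy saturation:

```python
import math, random

def quantize(x, bits, full_scale=1.0):
    """Uniform quantizer over [-full_scale, +full_scale)."""
    step = 2 * full_scale / (2 ** bits)
    return [step * round(v / step) for v in x]

def rms(e):
    return math.sqrt(sum(v * v for v in e) / len(e))

rng = random.Random(1)
noise_rms = 0.05
# Hypothetical biphasic spike waveform plus recording noise.
clean = [0.8 * (math.exp(-((n - 32) / 4.0) ** 2)
                - 0.4 * math.exp(-((n - 40) / 6.0) ** 2)) for n in range(64)]
signal = [v + rng.gauss(0, noise_rms) for v in clean]

errs = {}
for bits in (3, 4, 6, 8, 10):
    q = quantize(signal, bits)
    errs[bits] = rms([a - b for a, b in zip(q, signal)])
    regime = "quantization-limited" if errs[bits] > noise_rms else "noise-limited"
    print(f"{bits:2d} bit: rms quantization error {errs[bits]:.4f} ({regime})")
```

With these toy numbers the quantization error drops below the assumed 0.05 noise floor after a few bits; the exact crossover depends on the full-scale headroom (hence the extra bits needed when the gain is fixed) and on the noise level.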
The correct classification rate of the proposed method is on average 4–8% higher than that of the other four methods. If the training data contain spike waveforms appearing within complex spike bursts, we first classify the other distorted spikes generated by the bursting neurons before partially resolving the problem of complex spike bursts. The performance of the four other methods is limited if the distribution of the background noise is non-Gaussian or if multiple spike clusters overlap. The estimation error varies with the number of spikes detected (Fig. 4.10a), and it reaches −60 dB with normalized distribution at around 700 spikes over the entire dataset. The convergence period is ~0.1 s assuming a firing rate of 20 spikes/s from three neurons. The number of support vectors required is partly governed by the complexity of the classification task. The kernels yield increasing
Fig. 4.10 a The error versus the number of spikes (clusters 1–5), b energy per cycle versus various SVM kernels (linear, MLP, polynomial, RBF), c log-normalized error in reduced set model order reduction versus the number of support vectors (RBF and polynomial kernels)
levels of strength; however, the required energy for each kernel varies by orders of magnitude, as illustrated in Fig. 4.10b. As the SNR decreases, more support vectors are needed in order to define a more complex decision boundary. For our dataset, the number of support vectors required is reduced to within the range of 300–310 (Fig. 4.10c). The required cycle count (0.14 kcycles) and memory (0.2 kB) for the linear kernel versus 4.86 kcycles and 6.7 kB for the RBF kernel highlight the memory usage dependence on the kernels.
The spike detection implementation comprises 31k logic gates resulting in a 2.64 mm² area, and consumes only 41 μW of power from a 0.4 V supply voltage.
4.5 Conclusions

The support vector machine has been introduced to bioinformatics and spike
classification/sorting because of its excellent generalization, its sparse solution,
and its convex quadratic programming formulation. In this chapter, we propose
a programmable neural spike classifier based on a multiclass kernel SVM for a
128-channel spike-sorting system; the classifier tracks the evolution of clusters in
real time, offers high accuracy, and has low memory requirements and low
computational complexity. The implementation results show that the spike classifier
operates online, without compromising the required power and chip area, even in
neural interfaces with a low SNR.
Chapter 5
Brain-Machine Interface: System Optimization
5.1 Introduction
Neural prosthesis systems enable the interaction with neural cells either by record-
ing, to facilitate early diagnosis and predict intended behavior before undertaking
any preventive or corrective actions, or by stimulation, to prevent the onset of det-
rimental neural activity. Monitoring the activity of a large population of neurons
in neurobiological tissue with high-density microelectrode arrays in multichannel
implantable brain-machine interfaces (BMI) is a prerequisite for understanding the
cortical structures and can lead to a better insight into severe brain disorders, such
as Alzheimer's and Parkinson's diseases, epilepsy and autism [1], or to reestablishing
sensory (e.g., hearing and vision) or motor (e.g., movement and speech) functions
[2]. Practical multichannel BMI systems are combined with CMOS electronics for
the length of the random process is infinite or periodic. The use of the Karhunen-Loève
expansion [21] has generated interest because of its biorthogonal property,
that is, both the deterministic basis functions and the corresponding random
coefficients are orthogonal [22], e.g., the orthogonal deterministic basis function and
its magnitude are, respectively, the eigenfunction and eigenvalue of the covariance
function. Assuming that pi is a zero-mean Gaussian process and using the
Karhunen-Loève expansion, pi can be written in truncated form (for practical
implementation) by a finite number of terms M as

pi = μp,i + σp(di) Σ_{n=1}^{M} √λp,n ξp,n(θ) fp,n(di)   (5.2)
Fig. 5.1 a Behavior of the modeled covariance functions of p using M = 5, and b the model fitting on the available measurement data (© IEEE 2011)
Without loss of generality, consider for instance two transistors with given
threshold voltages. In our approach, their threshold voltages are modeled as sto-
chastic processes over the spatial domain of a die, thus making parameters of any
two transistors on the die two different correlated random variables. The value of M
is governed by the accuracy of the eigenpairs in representing the covariance func-
tion rather than the number of random variables. Unlike previous approaches, which
model the covariance of process parameters due to the random effect as a
piecewise-linear model [23] or through modified Bessel functions of the second kind
[24], here the covariance is represented as a linearly decreasing exponential function
Cp(d1, d2) = 1 + δdx,dy e^((−cx|dx1 − dx2| − cy|dy1 − dy2|)/η)   (5.3)

where η defines a fitting parameter estimated from the extracted data, W* and L*
represent the geometrical deformation due to manufacturing variations, and p*
models electrical parameter deviations from their corresponding nominal values,
e.g., altered transconductance, threshold voltage, etc. (Appendix B).
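A discretized version of this construction is straightforward to sketch: sample the exponential covariance on a grid of die locations, extract its eigenpairs, and truncate the Karhunen-Loève sum to M terms. The decay rate and fitting parameter below are assumed illustrative values, not the extracted ones from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d = np.linspace(0.0, 1.0, 64)          # 1-D grid of device locations (illustrative units)
c, eta = 3.0, 1.0                      # decay rate and fitting parameter (assumed values)
C = np.exp(-c * np.abs(d[:, None] - d[None, :]) / eta)   # exponential covariance matrix

# Discrete Karhunen-Loeve: eigenpairs of the covariance matrix, sorted descending
lam, f = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, f = lam[order], f[:, order]

M = 5                                  # truncation order, as in Fig. 5.1
xi = rng.standard_normal(M)            # independent N(0,1) random coefficients
p = (f[:, :M] * np.sqrt(lam[:M])) @ xi # truncated KL realization of the parameter field

# The leading M terms capture most of the variance when the covariance is smooth,
# which is what makes the truncation practical.
captured = lam[:M].sum() / lam.sum()
print(f"variance captured by M={M} terms: {captured:.2%}")
```

The fraction of variance captured by the truncation is what governs the choice of M, consistent with the remark that M is set by the accuracy of the eigenpairs rather than the number of random variables.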
In addition to process parameter variability, which sets the upper bound on the cir-
cuit design in terms of accuracy, linearity and timing, existence of noise associated
with fundamental processes represents an elementary limit on the performance of
electronic circuits.
Neural cell noise model: In the Hodgkin and Huxley framework, a neural channel's
configuration is determined by the states of its constituent subunits, where
each subunit can be either in an open or a closed state [25]. Adding a noise term
ξx(V, t) (x = m, h, or n) to the deterministic ordinary differential equation (ODE)
of Hodgkin and Huxley is consistent with the behavior of the Markov process for
channel gating [26]. Such a process can be contracted to a Langevin description
(via a Fokker-Planck equation) and expressed as delta-correlated noise processes
Φneuron(t + τ, t) = 1/Nx[αx(1 − x) + βx x]δ(τ), where Nx is the total number of neural
channels, and the transition rates αx(t) and βx(t) are instantaneous functions of the
membrane potential V(t). Dirac's delta function designates that the noise at different
times is uncorrelated, and the variables m, h, and n represent the aggregated
fraction of open subunits of different types, aggregated across the entire cell
membrane. Subsequently, the neural channel noise is modeled as Brownian motion, i.e.,
as a Gauss-distributed nonstationary stochastic process with independent increments
and heuristically fixed constant variance [27].
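The Langevin description above can be sketched as an Euler-Maruyama simulation of a single gating variable under voltage clamp, so that the transition rates are constant. The rate values and channel count below are illustrative assumptions, not the book's parameters; the noise variance per unit time is the [α(1−x) + βx]/N term from the delta-correlated process.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 0.1, 0.125   # gating transition rates at a clamped voltage (illustrative)
N = 1000                   # total number of channels/subunits
dt, steps = 0.01, 50000

x = alpha / (alpha + beta)  # start at the deterministic steady state
xs = np.empty(steps)
for t in range(steps):
    drift = alpha * (1.0 - x) - beta * x
    # Langevin channel noise: delta-correlated, variance [alpha(1-x) + beta*x]/N
    diff = np.sqrt(max(alpha * (1.0 - x) + beta * x, 0.0) / N)
    x += drift * dt + diff * np.sqrt(dt) * rng.standard_normal()
    x = min(max(x, 0.0), 1.0)   # keep the open fraction physical
    xs[t] = x

# Linearized about the steady state, this is an OU-like process whose stationary
# variance is approximately x_inf * (1 - x_inf) / N (binomial channel statistics).
x_inf = alpha / (alpha + beta)
print(f"mean {xs.mean():.4f} (expected {x_inf:.4f}), "
      f"var {xs.var():.2e} (expected ~{x_inf * (1 - x_inf) / N:.2e})")
```

The simulated stationary variance recovering the binomial value x∞(1 − x∞)/N is the consistency check that ties the Langevin approximation back to the underlying Markov channel model.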
Electrode-tissue interface and signal conditioning circuit noise model: In intracortical
microelectrode recordings, biological (neural cell) noise mainly originates
from the firing of several neurons in the tissue surrounding the recording
microelectrode, while thermal noise levels are influenced by the electrode-tissue
interface impedance at each individual recording site (as a result of the foreign
body reaction) and by the recording bandwidth, i.e., a wider recording bandwidth
increases thermal noise levels. The electrode-tissue interface noise includes the
tissue/bulk thermal noise and the electrode-electrolyte interface noise. Tissue
noise is modeled as the thermal noise generated by the solution/spreading or
tissue/encapsulation resistance [28], and the electrode noise is the thermal noise
generated by the charge transfer resistor [29]. The noise of the signal conditioning
electronic circuits is mainly determined by thermal and flicker noise.
The most important types of electrical noise sources (thermal, shot, and flicker
noise) in passive elements and integrated-circuit devices have been investigated
extensively, and appropriate models derived [30] as stationary and in [31]
as nonstationary noise sources. We adapt the model descriptions as defined in [31],
where thermal and shot noise are expressed as Φthermal(t + τ, t) = 2kTG(t)δ(τ) and
Limitations due to device-variability effects are fundamental issues for robust circuit
design, and their evaluation has been the subject of numerous studies. Several models
have been suggested for device variability [34–36] and, correspondingly, a number
of computer-aided design (CAD) tools for statistical circuit simulation [37–42]. In
general, a circuit design is optimized for parametric yield so that the majority of
manufactured circuits meet the performance specifications. The computational cost
and complexity of yield estimation, coupled with the iterative nature of the design
process, make yield maximization computationally prohibitive. As a result, circuit
designs are verified using models corresponding to a set of worst-case conditions
of the process parameters. Worst-case analysis refers to the process of determining
the values of the process parameters in these worst-case conditions and the cor-
responding worst-case circuit performance values. Worst-case analysis is very effi-
cient in terms of designer effort, and thus has become the most widely practiced
technique for statistical analysis and verification. Algorithms previously proposed
for worst-case tolerance analysis fall into four major categories: corner technique,
interval analysis, sensitivity-based vertex analysis, and Monte Carlo simulation.
5.3 Stochastic MNA for Process Variability Analysis 103
The most common approach is the corner technique. In this approach, each
process parameter value that leads to the worst performance is chosen
independently. This method ignores the correlations among the process parameters, and
the simultaneous setting of each process parameter to its extreme value results in
simulation at the tails of the joint probability density of the process parameters.
Thus, the worst-case performance values obtained are extremely pessimistic.
Interval analysis is computationally efficient but leads to overestimated results,
i.e., the calculated response space encloses the actual response space, due to the
intractable interval expansion caused by dependency among interval operands.
Interval splitting techniques have been adopted to reduce the interval expansion,
but at the expense of computational complexity. Traditional vertex analysis
assumes that the worst-case parameter sets are located at the vertices of param-
eter space, thus the response space can be calculated by taking the union of circuit
simulation results at all possible vertices of the parameter space. Given a circuit with
M uncertain parameters, this results in a 2^M-simulation problem. To further
reduce the simulation complexity, sensitivity information computed at the nominal
parameter condition is used to find the vertices that correspond to the worst
cases of circuit response. The Monte Carlo algorithm takes random combinations
of values chosen from within the range of each process parameter and repeatedly
performs circuit simulations. The result is an ensemble of responses from which
the statistical characteristics are estimated. Unfortunately, if the number of itera-
tions for the simulation is not very large, Monte Carlo simulation always underes-
timates the tolerance window. Accurately determining the bounds on the response
requires a large number of simulations; consequently, the Monte Carlo method
becomes very CPU-time-consuming for large chips. Other approaches
for statistical analysis of variation-affected circuits, such as the one based on the
Hermite polynomial chaos [43] or the response surface methodology, are able to
perform much faster than a Monte Carlo method at the expense of a
design-of-experiments preprocessing stage [44]. In this section, the circuits are described as
a set of stochastic differential equations (SDE) and Gaussian closure approxima-
tions are introduced to obtain a closed form of moment equations. Even if a ran-
dom variable is not strictly Gaussian, a second-order probabilistic characterization
yields sufficient information for most practical problems.
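The contrast between the corner technique and Monte Carlo sampling can be illustrated on a toy performance model. The gain function, spec limit, and parameter statistics below are invented for illustration; the point is that corners ignore the correlation between process parameters (and so land in the tails of the joint density), while Monte Carlo samples the joint distribution directly.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

# Toy performance model: gain depends on two correlated process parameters
def gain(p1, p2):
    return 40.0 - 2.0 * p1 - 1.5 * p2 + 0.8 * p1 * p2

mean = np.zeros(2)
cov = np.array([[1.0, 0.8],        # strongly correlated process parameters
                [0.8, 1.0]])
spec = 35.0                        # minimum acceptable gain (illustrative)

# Corner technique: each parameter independently at +/-3 sigma -> 2^M corners,
# correlation between the parameters is ignored.
corners = [gain(3 * s1, 3 * s2) for s1, s2 in itertools.product((-1, 1), repeat=2)]
print(f"worst corner gain: {min(corners):.2f}")

# Monte Carlo: sample the joint (correlated) distribution and estimate yield.
p = rng.multivariate_normal(mean, cov, size=20000)
g = gain(p[:, 0], p[:, 1])
yield_est = np.mean(g >= spec)
print(f"Monte Carlo yield estimate: {yield_est:.3f}")
```

The worst corner falls well below the spec even though the estimated yield is high, which is exactly the pessimism of corner analysis described above; the price of the Monte Carlo estimate is the large sample count.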
Modern integrated circuits are often distinguished by a very high complex-
ity and a very high packing density. The numerical simulation of such circuits
requires modeling techniques that allow an automatic generation of network equa-
tions. Furthermore, the number of independent network variables describing the
network should be as small as possible. Circuit models have to meet two contradicting
demands: they have to describe the physical behavior of a circuit as correctly
as possible while being simple enough to keep computing time reasonably small.
The level of the models ranges from simple algebraic equations, over ordinary and
partial differential equations, to Boltzmann and Schrödinger equations, depending
on the effects to be described. Due to the high number of network elements (up to
millions of elements) belonging to one circuit, one is restricted to relatively simple
models. In order to describe the physics as well as possible, so-called compact
models represent the first choice in network simulation. Complex elements such
as transistors are modeled by small circuits containing basic network elements
described by algebraic equations and ODEs only. The development of such replacement
circuits forms its own research field and leads nowadays to transistor models with
more than 500 parameters. A well-established approach to meet both demands to
a certain extent is the description of the network by a graph with branches and
nodes. Branch currents, branch voltages and node potentials are introduced as
variables. The node potentials are defined as voltages with respect to one refer-
ence node, usually the ground node. The physical behavior of each network ele-
ment is modeled by a relation between its branch currents and its branch voltages.
In order to complete the network model, the topology of the elements has to be
taken into account. Assuming the electrical connections between the circuit ele-
ments to be ideally conducting and the nodes to be ideal and concentrated, the
topology can be described by Kirchhoff's laws (the sum of all branch currents
entering a node equals zero and the sum of all branch voltages in a loop equals
zero). In general, for time-domain analysis, modified nodal analysis (MNA) leads
to a nonlinear ODE or differential algebraic equation system which, in most cases,
is transformed into a nonlinear algebraic system by means of linear multi-step
integration methods [45, 46] and, at each integration step, a Newton-like method
is used to solve this nonlinear algebraic system (Appendix B). Therefore, from a
numerical point of view, the equations modeling a dynamic circuit are transformed
to equivalent linear equations at each iteration of the Newton method and at each
time instant of the time-domain analysis. Thus, we can say that the time-domain
analysis of a nonlinear dynamic circuit consists of the successive solutions of
many linear circuits approximating the original (nonlinear and dynamic) circuit at
specific operating points.
Consider a linear circuit with N + 1 nodes and B voltage-controlled branches
(two-terminal resistors, independent current sources, and voltage-controlled
n-ports), the latter grouped in set B. We then introduce the source current vector
i ∈ R^B and the branch conductance matrix G ∈ R^(B×B). By assuming that the
branches (one for each port) are ordered element by element, the matrix is block
diagonal: each 1×1 block corresponds to the conductance of a one-port and in
any case is nonzero, while n×n blocks correspond to the conductance matrices of
voltage-controlled n-ports. In more detail, the diagonal entries of the n×n blocks
can be zero and, in this case, the nonzero off-diagonal entries, on the same row or
column, correspond to voltage-controlled current sources (VCCSs). Now, consider
MNA and circuits embedding, besides voltage-controlled elements, independent
voltage sources, the remaining types of controlled sources and sources of process
variations. We split the set of branches B in two complementary subsets: BV of
voltage-controlled branches (v-branches) and BC of current-controlled branches
(c-branches).
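A minimal sketch of this MNA extension on a two-node circuit may help: the resistors are v-branches handled by ordinary nodal analysis, while an ideal voltage source is a c-branch whose current is appended as an extra unknown together with its branch equation. The component values are illustrative.

```python
import numpy as np

# Two-node circuit: ideal source Vs at node 1 (a c-branch), R1 from node 1
# to node 2, R2 from node 2 to ground (v-branches). Values are illustrative.
Vs, R1, R2 = 1.0, 1e3, 1e3
G1, G2 = 1.0 / R1, 1.0 / R2

# NA part: node conductance matrix stamped from the v-branches only
G = np.array([[ G1, -G1],
              [-G1,  G1 + G2]])

# MNA extension: append the source current i as an unknown plus its branch
# equation v1 = Vs; the c-branch incidence column is [1, 0]^T.
a_c = np.array([[1.0], [0.0]])
M = np.block([[G,          a_c],
              [a_c.T, np.zeros((1, 1))]])
rhs = np.array([0.0, 0.0, Vs])

v1, v2, i_src = np.linalg.solve(M, rhs)
print(f"v1 = {v1:.3f} V, v2 = {v2:.3f} V, source current = {i_src * 1e3:.3f} mA")
```

The bordered-matrix structure, with the c-branch incidence column appended to the nodal conductance matrix, is the general pattern that MNA follows for every current-controlled branch.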
Conventional nodal analysis (NA) is extended to MNA [46] as follows: currents
of c-branches are added as further unknowns, and the corresponding branch
equations are appended to the NA system. The N×B incidence matrix A can be
partitioned as A = [Av Ac], with Av ∈ R^(N×Bv) and Ac ∈ R^(N×Bc). As in conventional NA,
Let x0 = x(0, t) be the generic point around which to linearize and, with the
change of variable ζ = x − x0 = [(q − q0)^T, (γ − γ0)^T]^T, the first-order Taylor
piecewise-linearization of (5.7) in x0 yields

P(x0)ζ′ + (K(x0) + P′(x0))ζ = 0   (5.9)

where K(x) = ∂B(x) and P(x) = ∂F(x). Transient analysis requires only the solution of
the deterministic version of (5.7), e.g., by means of a conventional circuit simulator,
and of (5.9) with a method capable of dealing with linear SDE with stochasticity
that enters only through the initial conditions. Since (5.9) is a linear
homogeneous equation in ζ, its solution will always be proportional to ζ0. We
can rewrite (5.9) as

ζ′(x0) = E(x0)ζ0 + F(x0)γ0   (5.10)
Equation (5.10) is a system of SDE which is linear in the narrow sense (the right-hand
side is linear in ζ and the coefficient matrix for the vector of variation sources is
independent of ζ) [47]. Since these stochastic processes have regular properties,
they can be considered as a family of classical problems for the individual sample
paths and be treated with the classical methods of the theory of linear SDE. By
expanding every element of ζ(t) with

ζi(t) = Φ(t)(ζ0)i = Σ_{j=1}^{m} Φij(t)γj   (5.11)

for m elements of a vector γ. As long as Φij(t) is obtained, the expression for ζ(t) is
known, so that the covariance matrix of the solution can be written as

Σζ = Φ Σγ Φ^T   (5.12)

Defining aj(t) = (a1j, a2j, …, anj)^T and Fj(t) = (F1j, F2j, …, Fnj)^T, the requirement for
Φ(t) is
here are caused by the small current and voltage fluctuations, such as thermal,
shot, and flicker noise, that are generated within the integrated-circuit devices
themselves.
The noise performance of a circuit can be analyzed in terms of the small-signal
equivalent circuits by considering each of the uncorrelated noise sources in turn
and separately computing their contribution at the output. A nonlinear circuit is
assumed to have time-invariant (dc) large-signal excitations and time-invariant
steady-state large-signal waveforms and that both the noise sources and the noise
at the output are wide-sense stationary stochastic processes. Subsequently, the
nonlinear circuit is linearized around the fixed operating point to obtain a linear
time-invariant network for noise analysis. Implementation of this method based on
the interreciprocal adjoint network concept [48] results in a very efficient com-
putational technique for noise analysis, which is available in almost every circuit
simulator. Unfortunately, this method is only applicable to circuits with fixed oper-
ating points and is not appropriate for noise simulation of circuits with changing
bias conditions.
In a noise simulation method that uses linear periodically time-varying trans-
formations [49, 50], a nonlinear circuit is assumed to have periodic large-signal
excitations and periodic steady-state large-signal waveforms and that both the
noise sources and the noise at the output are cyclostationary stochastic processes.
Afterward, the nonlinear circuit is linearized around the periodic steady-state oper-
ating point to obtain a linear periodically time-varying network for noise analysis.
Nevertheless, this noise analysis technique is applicable to only a limited class of
nonlinear circuits with periodic excitations.
Noise simulation in time-domain has traditionally been based on the Monte
Carlo technique [51], where the circuit with the noise sources is simulated using
numerous transient analyzes with different sample paths of the noise sources.
Consequently, the probabilistic characteristics of noise are then calculated using
the data obtained in these simulations. However, accurately determining the
noise content requires a large number of simulations; consequently, the Monte
Carlo method becomes very CPU-time-consuming for large chips.
Additionally, to accurately model shot and thermal noise sources, time-step in
transient analysis is limited to a very small value, making the simulation highly
inefficient.
In this section, we treat the noise as a nonstationary stochastic process and
introduce an Itô system of SDE as a convenient way to represent such a process.
Recognizing that, when backward Euler is applied, the variance-covariance matrix
equation can be written in the continuous-time Lyapunov matrix form, we then
provide a numerical solution to such a set of linear time-varying equations. We
adapt the model description as defined in [31], where thermal and shot
noise are expressed as delta-correlated noise processes having independent values
at every time point, modeled as modulated white noise processes. These noise pro-
cesses correspond to current noise sources which are included in the models of the
integrated-circuit devices. As numerical experiments suggest that both the conver-
gence and stability analyses of adaptive schemes for SDE extend to a number of
sophisticated methods which control different error measures, we follow the adap-
tation strategy, which can be viewed heuristically as a fixed time-step algorithm
applied to a time rescaled differential equation. Additionally, adaptation also con-
fers stability on algorithms constructed from explicit time-integrators, resulting in
better qualitative behavior than for fixed time-step counter-parts [52].
The inherent nature of a white noise process differs fundamentally from that of a
wide-sense stationary stochastic process, such as static manufacturing variability, and
cannot be treated as an ODE using differential calculus similar to that in Sect. 5.3.
The MNA formulation of the stochastic process that describes random influences,
which fluctuate rapidly and irregularly (i.e., white noise ξ), can be written as
F(r′, r, t) + B(r, t)ξ = 0   (5.14)
where r is the vector of stochastic processes which represents the state variables
(e.g., node voltages) of the circuit, ξ is a vector of white Gaussian processes, and
B(r, t) is a state- and time-dependent modulation of the vector of noise sources.
Since the magnitude of the noise content in a signal is much smaller in comparison
to the magnitude of the signal itself in any functional circuit, a system of nonlinear
SDE described in (5.14) can be piecewise-linearized under similar assumptions as
noted in Sect. 5.3. Including the noise content description, (2.10) can be expressed
in general form as
ζ′(t) = E(t)ζ + F(t)ξ   (5.15)

where ζ = [(r − r0)^T, (ξ − ξ0)^T]^T. We will interpret (5.15) as an Itô system of
SDE. Now rewriting (5.15) in the more natural differential form
dζ(t) = E(t)ζ(t)dt + F(t)dw   (5.16)

where we substituted dw(t) = ξ(t)dt, with w a vector of Wiener processes. If
the functions E(t) and F(t) are measurable and bounded on the time interval of
interest, there exists a unique solution for every initial value ζ(t0) [47]. If ζ is a
Gaussian stochastic process, then it is completely characterized by its mean and
correlation function. From Itô's theorem on stochastic differentials

d(ζ(t)ζ^T(t)) = ζ(t)dζ^T(t) + dζ(t)ζ^T(t) + F(t)F^T(t)dt   (5.17)

and expanding (5.17) with (5.16), noting that ζ and dw are uncorrelated, the
variance-covariance matrix K(t) of ζ(t) with the initial value K(0) = E[ζ0 ζ0^T] can be
expressed in differential Lyapunov matrix equation form as [47]
dK(t)/dt = E(t)K(t) + K(t)E^T(t) + F(t)F^T(t)   (5.18)
Note that the mean of the noise variables is zero for most integrated circuits.
In view of the symmetry of K(t), (5.18) represents a system of linear ODE
with time-varying coefficients. To obtain a numerical solution, (5.18) has to be
discretized in time using a suitable scheme, such as any linear multi-step method,
or a Runge-Kutta method. For circuit simulation, implicit linear multi-step meth-
ods, and especially the trapezoidal method and the backward differentiation for-
mula were found to be most suitable [53]. If backward Euler is applied to (5.18),
5.4 Stochastic MNA for Noise Analysis 109
the differential Lyapunov matrix equation can be written in a special form referred
to as the continuous-time algebraic Lyapunov matrix equation
Pr K(tr) + K(tr)Pr^T + Qr = 0   (5.19)
K(t) at time point tr is calculated by solving the system of linear equations in
(5.19). Such continuous-time Lyapunov equations have a unique solution K(t),
which is symmetric and positive semidefinite.
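For small systems, (5.19) can be solved directly by vectorization: stacking the entries of K turns the matrix equation into one linear system in the Kronecker-product matrix. The sketch below uses only numpy, with an illustrative stable Pr and Qr = FF^T, and checks the symmetry and positive semidefiniteness of the solution noted above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
# Illustrative stable system matrix P (eigenvalues in the left half-plane)
# and a positive-semidefinite Q = F F^T.
F = rng.standard_normal((n, n))
P = 0.5 * rng.standard_normal((n, n)) - 2.0 * np.eye(n)
Q = F @ F.T

# Vectorize P K + K P^T + Q = 0:  (I kron P + P kron I) vec(K) = -vec(Q)
I = np.eye(n)
A = np.kron(I, P) + np.kron(P, I)
K = np.linalg.solve(A, -Q.reshape(-1)).reshape(n, n)

residual = np.abs(P @ K + K @ P.T + Q).max()
print(f"residual {residual:.2e}, symmetric: {np.allclose(K, K.T)}, "
      f"min eigenvalue {np.linalg.eigvalsh((K + K.T) / 2).min():.3e}")
```

The Kronecker system has size n² × n², which is why this direct route is reserved for small n and the Bartels-Stewart and iterative methods discussed next are used otherwise.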
Several techniques have been proposed for the solution of the algebraic
Lyapunov matrix Eq. (5.19) arising in specific problems where the
matrix Pr is large and sparse [54–57], such as the Bartels-Stewart method [58]
and Hammarling's method [47], which remains the reference method for
directly computing the Cholesky factor of the solution K(tr) of (5.19) for small
to medium systems. For the backward stability analysis of the Bartels-Stewart
algorithm, see [59]. Extensions of these methods to generalized Lyapunov
equations are described in [60]. In the Bartels-Stewart algorithm, first Pr is reduced
to upper Hessenberg form by means of Householder transformations, and then
the QR-algorithm is applied to the Hessenberg form to calculate the real Schur
decomposition [61] to transform (5.19) to a triangular system which can be solved
efficiently by forward or backward substitutions of the matrix Pr
S = U T Pr U (5.20)
where the real Schur form S is upper quasi-triangular and U is orthonormal. Our
formulation for the real case utilizes a similar scheme. The transformation matrices
are accumulated at each step to form U [58]. If we now set

K̃ = U^T K(tr) U,   Q̃ = U^T Qr U   (5.21)
where S1, K1, Q1 ∈ R^((n−1)×(n−1)) and s, k, q ∈ R^(n−1). The system in (5.20) then
gives the three equations (5.22)–(5.24), the last of which reads

(λn + λn)knn + qnn = 0   (5.24)

knn can be obtained from (5.23) and set in (5.24) to solve for k. Once k is known,
(5.25) becomes a Lyapunov equation which has the same structure as (5.22) but of
order (n − 1), as

S1 K1 + K1 S1^T = −Q1 − s k^T − k s^T   (5.27)
We can apply the same process to (5.26) until S1 is of order 1. Note that, under
the condition λi + λk ≠ 0, i = 1, …, n, at the k-th step (k = 1, 2, …, n) of this process,
we can obtain a unique solution vector of length (n + 1 − k) and a reduced triangular
matrix equation of order (n − k). Since U is orthonormal, once (5.22) is solved for
K̃, then K(tr) can be computed using

K(tr) = U K̃ U^T   (5.28)
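The transform-solve-back-transform pattern of (5.20), (5.21), and (5.28) can be shown with numpy alone in the special case of a symmetric Pr, where the real Schur form degenerates to a diagonal matrix and the transformed Lyapunov equation is solved entry by entry. The matrices below are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
# Symmetric negative-definite P: its real Schur form is simply diagonal, so the
# transformed Lyapunov system can be solved entrywise (illustrative special case).
B = rng.standard_normal((n, n))
P = -(B @ B.T) - np.eye(n)
Q = np.eye(n)

lam, U = np.linalg.eigh(P)                  # (5.20): S = U^T P U = diag(lam)
Q_t = U.T @ Q @ U                           # (5.21): transformed right-hand side
K_t = -Q_t / (lam[:, None] + lam[None, :])  # S K~ + K~ S^T + Q~ = 0, entry by entry
K = U @ K_t @ U.T                           # (5.28): back-transform the solution

print("max residual:", np.abs(P @ K + K @ P.T + Q).max())
```

In the general nonsymmetric case the transformed system is quasi-triangular rather than diagonal, and the entrywise division is replaced by the forward/backward substitutions of the Bartels-Stewart recursion; the orthogonal transform and back-transform steps are unchanged.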
Large dense Lyapunov equations can be solved by sign-function-based techniques [61]. Krylov subspace methods, which are related to matrix polynomials, have been proposed [62] as well.
Relatively large sparse Lyapunov equations can be solved by iterative approaches, e.g., [63]. Here, we apply a low-rank version of the iterative alternating direction implicit (ADI) method [64], which is related to rational matrix functions. The postulated iteration for the Lyapunov Eq. (5.19) is given by K(0) = 0 and

(Pr + μi I) K(i−1/2) = −Qr − K(i−1) (Pr^T − μi I)
(5.29)
(Pr + μi I) K(i)^T = −Qr − K(i−1/2)^T (Pr^T − μi I)

for i = 1, 2,…. This method generates a sequence of matrices Ki which often converges very fast toward the solution, provided that the iteration shift parameters μi are chosen (sub)optimally. For a more efficient implementation of the method, we replace the iterates by their Cholesky factors, i.e., Ki = Li Li^H, and reformulate the iteration in terms of the factors Li. The low-rank Cholesky factors Li are not uniquely determined; different ways to generate them exist [64].
Note that the number of iteration steps imax need not be fixed a priori. However, if the Lyapunov equation is to be solved as accurately as possible, correct results are usually achieved with stopping criteria slightly larger than the machine precision.
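A dense-matrix sketch of the plain ADI iteration underlying this method (not the low-rank Cholesky-factor variant of [64]); the single real shift μ below is hand-picked for the test matrix rather than (sub)optimally chosen, and the test matrix itself is invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
Pr = -(A @ A.T) - 0.5 * np.eye(n)   # symmetric negative definite => stable
Q = rng.standard_normal((n, n))
Q = Q @ Q.T                          # symmetric right-hand side

I = np.eye(n)
mu = -3.5             # single real ADI shift (convergent here, far from optimal)
K = np.zeros((n, n))  # K(0) = 0
for _ in range(100):
    # first half-step:  (Pr + mu I) K_half = -Q - K (Pr^T - mu I)
    K_half = np.linalg.solve(Pr + mu * I, -Q - K @ (Pr.T - mu * I))
    # second half-step: K (Pr^T + mu I) = -Q - (Pr - mu I) K_half,
    # solved in transposed form
    K = np.linalg.solve(Pr + mu * I, -Q - K_half.T @ (Pr.T - mu * I)).T

assert np.allclose(Pr @ K + K @ Pr.T + Q, 0, atol=1e-6)
```

With well-chosen shift cycles the same recursion converges in a handful of steps, which is what makes the low-rank formulation attractive for large sparse Pr.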
5.5.1 Power Optimization
Random process variations have a major influence on the design parameters and yield of the manufactured circuits. We define yield as the percentage of manufactured circuits that meets all the specifications, considering process variations,

Y(d) = E{δ(d, pz)},  δ(d, pz) = 1 if all specifications are met, 0 otherwise (5.30)
5.5 PPA Optimization of Multichannel Neural Recording Interface 111
where E{·} is the expected value, and each vector d has an upper and lower bound determined by the technological process variation pz with probability density function pdf(pz). The deterministic designable parameters, e.g., bias voltages and currents, transistor widths and lengths, resistances, and capacitances, are denoted by the vector d ∈ D, where D is the designable parameter space. Let the total area of the circuit be Atotal = Σk (xk Ak), where Ak is the area of a transistor or a discrete component (resistor or capacitor), k is an index that runs over all transistors and discrete components in the circuit, and xk is the sizing factor (xk ≥ 1). The optimization problem can then be formulated as the search for a design point that minimizes the total power Ptotal over the deterministic designable parameters d, with lower bounds aj and upper bounds bj for 1 ≤ j ≤ m in the design space D, subject to a minimum yield requirement y,

min(d∈D) Ptotal(d)  subject to  Y(d) ≥ y,  aj ≤ dj ≤ bj,  1 ≤ j ≤ m (5.31)
Let D(Ptotal) be the compact set of all valid design variable vectors d, such that
Ptotal(d)=Ptotal. The designable parameter space D is assumed to be compact,
which for all practical purposes is no real restriction when the problem has a
finite minimum. The main advantage of this approach is its generality: it imposes
no restrictions on the distribution of p and on how the data enters the constraints.
We can approximately subdivide the algorithm into two steps: the yield fulfillment, and the objective function optimization. If, as an approximation, we restrict D(Ptotal) to just the one-best derivation of Ptotal, then we obtain the structured perceptron algorithm [65]. As a consequence, given active constraints, including the optimum power budget and the minimum frequency of operation, (5.31) can be effectively solved by a sequence of minimizations of the feasible region with iteratively generated low-dimensional subspaces using a cutting plane method [66].
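The yield term Y(d) in this formulation can be estimated by plain Monte Carlo over the process parameters; a toy sketch in which the "circuit" is reduced to a single hypothetical gain specification under Gaussian process variation (all names and numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def meets_specs(d, pz):
    """Hypothetical spec check: effective gain d[0] * (1 + pz) must exceed 40."""
    return d[0] * (1.0 + pz) >= 40.0

d = np.array([45.0])                      # nominal design point (illustrative)
pz = rng.normal(0.0, 0.05, size=100_000)  # Gaussian process variation, 5% sigma

# Monte Carlo estimate of Y(d) = E{ indicator(all specs met) }
yield_estimate = float(np.mean(meets_specs(d, pz)))
```

Because the estimate makes no assumption about pdf(pz) beyond the ability to sample from it, it matches the generality claimed for the formulation above.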
The statistical yield constrained problems require mechanisms for quantifying the reliability associated with the resulting solution, and for bounding the true optimal value of the yield constrained problem (5.31). We define a reliable bound on the probability Prob{aj ≤ dj ≤ bj; 1 ≤ j ≤ m} as the random quantity

γv := arg max{ γ ∈ [0, 1] : Σ(r=0…v) (v choose r) γ^r (1 − γ)^(v−r) ≤ α } (5.32)

for a prescribed confidence level α; the feasibility of d can then be evaluated with a high reliability, provided that the bound is within realistic assumptions.
The power optimization problem implicates varying the design point to optimize power, subject to constraints on other, secondary performance measures and on the designable parameter boundaries. With the PPA metric, we quantify the minimum-power design that meets a targeted performance, while including the impact of area scaling. The PPA metric depends on the process and operating conditions, the circuit specification, and the technology's VT option. We can express this multi-criteria circuit performance optimization problem as
(5.33)
(5.34)
The PPA value at any design point is converted into a performance score s; subsequently, the score s is utilized to compute an overall index of circuit quality, denoted by PPA(d; s), which is the objective function for the design optimization. Accordingly, the constrained multi-criteria optimization is converted into an optimization with a single objective function [67]. As a result, the general form of the optimization problem becomes
(5.35)
(5.36)
where Ψ is a combined feature representation of a performance function in a given application. We replace each nonlinear inequality in (5.36) by |D|−1 linear inequalities

(5.37)

If the system of inequalities in (5.37) is feasible, typically more than one solution d is possible. For a unique solution, we select the d with ||d|| ≤ 1 for which s is uniformly different from the next closest score update. The score update is then expressed as a dual quadratic program (QP)
(5.38)
where η is the step size, α the Lagrange multiplier imposing the constraint for label d ≠ di, and h(d) are the feature vectors of a design variable vector d. To find the local maxima and minima, we repeatedly select a pair of derivatives of d and optimize their dual (Lagrange) variables α. The dual program formulation has two main advantages over the primal QP: since the dual program is determined only by inner products defined by Ψ, it allows the usage of kernel functions; additionally, the constraint matrix of the dual program supports problem decomposition. At the end of the sequence, we average all the score vectors s obtained at each iteration, similar to the structured perceptron algorithm [65].
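The final averaging step borrowed from the structured perceptron [65] can be sketched as follows; the feature map h, the candidate design points, and the hidden quality weights are all invented for illustration, and the argmax over candidates stands in for the one-best derivation:

```python
import numpy as np

# Toy setup: among candidate design points, find the one with the best
# (hidden) quality by learning a linear score s(d) = w . h(d).
def h(d):
    """Hypothetical feature map of a design-variable vector d."""
    return np.array([d[0], d[1], d[0] * d[1]])

candidates = [np.array([x, y]) for x in (0.5, 1.0, 1.5) for y in (0.5, 1.0, 1.5)]
true_w = np.array([1.0, -0.5, 0.2])                  # hidden "quality" weights
best = max(candidates, key=lambda d: true_w @ h(d))  # oracle label

w = np.zeros(3)
w_sum = np.zeros(3)
for _ in range(50):                                   # perceptron epochs
    guess = max(candidates, key=lambda d: w @ h(d))   # current one-best
    if not np.array_equal(guess, best):
        w += h(best) - h(guess)                       # structured update
    w_sum += w
w_avg = w_sum / 50.0                                  # averaged score vector

assert np.array_equal(max(candidates, key=lambda d: w_avg @ h(d)), best)
```

The averaged vector w_avg plays the role of the averaged score vectors s mentioned above; the dual QP machinery of the text refines this basic update.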
5.6 Experimental Results
All the experiments are carried out on a single-processor Ubuntu Linux 9.10 system with an Intel Core 2 Duo 2.66 GHz CPU and 6 GB of memory. The circuit netlist is simulated in Cadence Spectre using 90 nm CMOS model files. The simulation data points are processed with a Perl script and fed back into the MATLAB code. The evaluated front-end neural recording interface is illustrated in Fig. 5.2. The test dataset (Fig. 5.3a) is based on recordings from the human neocortex and basal ganglia; however, the proposed optimization
114 5 BrainMachine Interface: System Optimization
Fig. 5.2 Schematic of the front-end neural recording interface including LNA, band-pass filter, PGA, and SAR A/D converter
Here σF² is the total signal power and σamp,i² represents the variance of the noise added by the i-th amplifier stage. The speed of the SAR ADC is primarily a function of the technology's gate delay and the kT/C noise, multiplied by the number of SAR cycles necessary for one conversion.
The maximum resolution in SNR-bits of a SAR converter (for a given value of an effective thermal resistance Reff, which sums together the effects of all noise sources, e.g., thermal, shot, 1/f, and input-referred noise) over the full Nyquist band (0 ≤ fNeuron ≤ fs/2) is then expressed as

Nnoise = log2( VFS √(2 / (6 k T fs Reff)) ) − 1

where VFS is the full-scale input signal and fs is the sampling frequency. The accuracy of the neural spike classification in a back-end signal processing unit directly increases with A/D converter resolution, although it saturates beyond 5-6 bit resolution, ultimately limited by the SNR.
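Plugging representative numbers into this resolution bound (the values of VFS, fs, and Reff below are illustrative, not taken from the text):

```python
import math

k = 1.380649e-23   # Boltzmann constant [J/K]
T = 300.0          # temperature [K]
V_FS = 1.0         # full-scale input signal [V]
f_s = 100e3        # sampling frequency [Hz]
R_eff = 1e6        # effective thermal resistance [ohm]

# SNR-bit bound: N = log2(V_FS * sqrt(2 / (6 k T f_s R_eff))) - 1
N_noise = math.log2(V_FS * math.sqrt(2.0 / (6.0 * k * T * f_s * R_eff))) - 1.0
```

For these values the bound comes out near 14 bits, well above the 5-6 bits at which spike-classification accuracy saturates, so thermal noise is not the binding constraint in this example.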
However, since the amplitude of the observed spike signals can vary, typically, by one order of magnitude, additional resolution is needed (i.e., 2-3 bits) if the amplification gain is fixed. Additionally, increasing the sampling rate of the A/D converter improves spike-sorting accuracy, since this captures finer features that further differentiate the signals. The PPA ratio differs for each design depending on circuit characteristics, such as power consumption, bandwidth, gain, linearity, etc. Closed-form symbolic expressions of the constraints and the objective are passed
on to the optimization algorithm. Design heuristics are used to provide a good initial starting point. The total run-time of the optimization method is only dozens of seconds, and the number of iterations required to reach the stopping criterion never exceeds 6 throughout the entire simulated range (from 10−3 to 10−1).

Fig. 5.3 a The test dataset (the y-axis is arbitrary): top, raw signal after amplification, not corrected for gain; b bottom, zoom-in of the raw signal; and c spectral signature of the SAR A/D converter two-tone test; black area, spectral content with nominal gain; gray area, spectra with 20 % gain reduction, equivalent to 4 LSB loss in the dynamic range (© IEEE 2015)
The design trade-off exploration space for circuit area, sample frequency and
PPA is illustrated in Fig.5.4a. The area and sample frequency curves are plotted
for the worst-case design (WCD) and the proposed quadratic-program-optimized approach (QPO).

Fig. 5.4 a Area, sampling frequency and PPA trade-off for the neural recording channel optimized with quadratic programming (QPO) and worst-case design (WCD); the iso-PPA is shown as an overlay (© IEEE 2015), and b optimized PPA versus relative sampling frequency

The normalized PPA ratio of the design is represented at the
intersection with the area-sample frequency curves. For a given circuit area, the
optimized design obtains higher performance than the corresponding WCD. The
points lying on the lowest intersections are most power efficient for the given input
and output constraints, and represent the PPA curve of interest. With the same
yield constraints, the optimization produces uniformly better optimum signal band-
width curves for a given power. The improvement is determined by the underly-
ing structure of physical process variation. If the amount of uncorrelated variability
increases, i.e., the intra-chip variation increases in comparison with the chip-to-
chip variation, the feasible yield facilitated by optimization increases. Similarly, to
maintain a constant power efficiency as area is reduced, the circuit noise and the
current and voltage efficiencies need to be held constant. The power consumption
of the neural interface front-end increases linearly with sampling frequency.
Fig. 5.5 a Two-stage gm/ID design space with constant gain (plain), constant area (hyperbolic), and constant current (dashed elliptic) contours, b normalized contours showing optimal power per area (PPA) versus relative gain (© IEEE 2015), and c normalized contours showing optimal power per area (PPA) versus relative area
The constant power, area, and gain contours for two gain stages are illustrated in Fig. 5.5a. The total area is shown as the hyperbolic-shaped contour, while elliptic contours define the total current, IDtotal. A small transistor bias point (gm/ID) corresponds to more current and smaller transistors. In contrast, if we decrease the current, the gain (due to larger gm/ID) and the total area increase. The plot in Fig. 5.5b illustrates the position of the optimal PPA versus relative (given) gain. The power consumed in the neural interface gain stages increases proportionally with gain.
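The current cost of a gain target can be read directly off the gm/ID ratio; a small sketch with an assumed transconductance requirement (all values illustrative):

```python
gm_required = 1e-3  # transconductance needed for the gain spec [S] (assumed)
V_DD = 1.0          # supply voltage [V]

for gm_over_id in (5.0, 10.0, 20.0):  # strong inversion -> weak inversion [1/V]
    I_D = gm_required / gm_over_id    # bias current that delivers gm_required
    P = I_D * V_DD                    # power of the stage
    # Larger gm/ID: less current and power for the same gm, but the lower
    # current density implies wider devices, i.e. a larger total area.
```

This is the trade-off traced by the contours of Fig. 5.5a: moving toward weak inversion slides the design along the current ellipses and across the area hyperbola.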
Typically, the desired high gm is obtained at the cost of an increased bias current (increased power) or area (wide transistors). However, for very short channels the carrier velocity quickly reaches the saturation limit, at which point gm also saturates, becoming independent of gate length or bias. The intrinsic gain degradation can be alleviated with open-loop residue amplifiers [68], comparator-based switched-capacitor circuits [69], and correlated level shifting [70]. The plot in Fig. 5.5c
illustrates the position of the optimal PPA, under the maximum-yield reference design point, versus relative area. The offset and the static accuracy critically depend on the matching between nominally identical devices. This error, however, typically decreases as the area of the devices increases. Several rules exist [71] to ensure sufficient matching: the matched devices should have the same structure and the same surroundings in the layout, use the same materials, have the same orientation and temperature, and the distance between matched devices should be minimal.
In Table 5.1, the worst-case design (WCD) is compared with the optimization approach across the neural interface circuits. The QP-optimized circuits allow a large area reduction when designed for maximum WCD frequency, ranging from 9 to 19 %, with 16 % on average. When operating at the same frequency, the optimized total power is reduced by up to 21 %. The optimization space in symmetrical circuits is restricted and, consequently, the additional power saving obtained by optimization is limited, particularly at the higher yield.
For a decreased yield, 95 % instead of 99 %, a higher power saving of up to 32 % on average can be achieved as a consequence of a larger optimization space (not shown in Table 5.1). Note that over-dimensioning in the case of higher yield leads to a larger area and higher power consumption. As yield increases when tolerance decreases, an agreeable trade-off needs to exist between the increase in yield and the cost of design and manufacturing. Consequently, continuous observation of process variation and thermal monitoring becomes a necessity [72]. The observed circuit's power consumption scales with its bandwidth and SNR. The limit on the power dissipated can be expressed as (8kT)·f(SNR), where f is an increasing function of SNR [73]. Additionally, the interface input to the neural system is subject to external noise, which can be represented by an effective temperature. Reducing noise to improve signal processing requires larger numbers of receptors, channels, or neurons, requiring additional power resources [74].
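For a feel of the scale of this bound, taking f(SNR) = SNR as a rough lower estimate (bandwidth and SNR values are illustrative):

```python
k = 1.380649e-23       # Boltzmann constant [J/K]
T = 300.0              # temperature [K]
f = 10e3               # signal bandwidth [Hz]
SNR = 10 ** (60 / 10)  # 60 dB expressed as a power ratio

P_min = 8 * k * T * f * SNR  # lower bound on dissipated power [W]
```

The result is a fraction of a nanowatt, several orders of magnitude below the power actually consumed by practical recording channels, which is dominated by circuit non-idealities rather than this thermodynamic floor.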
5.7 Conclusions
Integrated neural implants interface with the brain using biocompatible elec-
trodes to provide high yield cell recordings, large channel counts, and access to
spike data and/or field potentials with high signal-to-noise ratio. Rapid advances in
computational capabilities, design tools, and biocompatible electrodes fabrication
techniques allow for the development of neural prostheses capable of interfacing
with single neurons and neuronal networks. The miniaturization of the functional
blocks in neural recording interface, however, presents significant circuit design
challenges in terms of noise, area, power, and the reliability of the recording sys-
tem. In this chapter, we develop a yield constrained sequential PPA minimization
framework that is applied to a multivariable optimization in a neural record-
ing interface. By limiting over-dimensioning of the circuit, the proposed method
achieves consistently a better PPA ratio over the entire range of neural recording
interface circuits, with no loss of circuit performance. Our approach can be used with any variability model and is not restricted to any particular performance constraint. As the experimental results in 90 nm CMOS technology indicate, the suggested numerical methods provide accurate and efficient solutions of the PPA optimization problem, offering up to 26 % power savings and up to 22 % area reduction, without yield penalties.
References
18. S. Seth, B. Murmann, Design and optimization of continuous-time filters using geometric programming, in Proceedings of IEEE International Symposium on Circuits and Systems (2014), pp. 2089-2092
19. A. Zjajo, C. Galuzzi, R. van Leuken, Sequential power per area optimization of multichannel neural recording interface based on dual quadratic programming, in Proceedings of IEEE International Conference on Neural Engineering (2015), pp. 9-12
20. M. Grigoriu, On the spectral representation method in simulation. Probab. Eng. Mech. 8, 75-90 (1993)
21. M. Loève, Probability Theory (D. Van Nostrand Company Inc., Princeton, 1960)
22. R. Ghanem, P.D. Spanos, Stochastic Finite Element: A Spectral Approach (Springer, Berlin, 1991)
23. P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, C. Spanos, Modeling within-die spatial correlation effects for process-design co-optimization, in Proceedings of IEEE International Symposium on Quality of Electronic Design (2005), pp. 516-521
24. J. Xiong, V. Zolotov, L. He, Robust extraction of spatial correlation, in Proceedings of IEEE International Symposium on Physical Design (2006), pp. 2-9
25. A. Hodgkin, A. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500-544 (1952)
26. R.F. Fox, Y.-N. Lu, Emergent collective behavior in large numbers of globally coupled independently stochastic ion channels. Phys. Rev. E 49, 3421-3431 (1994)
27. A. Saarinen, M.-L. Linne, O. Yli-Harja, Stochastic differential equation model for cerebellar granule cell excitability. PLoS Comput. Biol. 4(2), 1-11 (2008)
28. A.C. West, J. Newman, Current distributions on recessed electrodes. J. Electrochem. Soc. 138(6), 1620-1625 (1991)
29. Z. Yang, Q. Zhao, E. Keefer, W. Liu, Noise characterization, modeling, and reduction for in vivo neural recording, in Advances in Neural Information Processing Systems (2010), pp. 2160-2168
30. P.R. Gray, R.G. Meyer, Analysis and Design of Analog Integrated Circuits (Wiley, New York, 1984)
31. A. Demir, E. Liu, A. Sangiovanni-Vincentelli, Time-domain non-Monte Carlo noise simulation for nonlinear dynamic circuits with arbitrary excitations, in Proceedings of IEEE International Conference on Computer-Aided Design (1994), pp. 598-603
32. J.H. Fischer, Noise sources and calculation techniques for switched capacitor filters. IEEE J. Solid-State Circuits 17(4), 742-752 (1982)
33. T. Sepke, P. Holloway, C.G. Sodini, H.-S. Lee, Noise analysis for comparator-based circuits. IEEE Trans. Circuits Syst. I 56(3), 541-553 (2009)
34. C. Michael, M. Ismail, Statistical Modeling for Computer-Aided Design of MOS VLSI Circuits (Kluwer, Boston, 1993)
35. H. Zhang, Y. Zhao, A. Doboli, ALAMO: an improved α-space based methodology for modeling process parameter variations in analog circuits, in Proceedings of IEEE Design, Automation and Test in Europe Conference (2006), pp. 156-161
36. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circuits 24(5), 1433-1439 (1989)
37. R. López-Ahumada, R. Rodríguez-Macías, FASTEST: a tool for a complete and efficient statistical evaluation of analog circuits, dc analysis, in Analog Integrated Circuits and Signal Processing, vol. 29, no. 3 (Kluwer Academic Publishers, The Netherlands, 2001), pp. 201-212
38. G. Biagetti, S. Orcioni, C. Turchetti, P. Crippa, M. Alessandrini, SiSMA: a statistical simulator for mismatch analysis of MOS ICs, in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (2002), pp. 490-496
39. B. De Smedt, G. Gielen, WATSON: design space boundary exploration and model generation for analogue and RF IC design. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 22(2), 213-224 (2003)
63. E. Wachspress, Iterative solution of the Lyapunov matrix equation. Appl. Math. Lett. 1, 87-90 (1988)
64. J. Li, F. Wang, J. White, An efficient Lyapunov equation-based approach for generating reduced-order models of interconnect, in Proceedings of IEEE Design Automation Conference (1999), pp. 1-6
65. Y. Freund, R.E. Schapire, Large margin classification using the perceptron algorithm. Mach. Learn. 37, 277-296 (1999)
66. I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun, Support vector machine learning for interdependent and structured output spaces, in International Conference on Machine Learning (2004), pp. 1-8
67. A. Dharchoudhury, S.M. Kang, Worst-case analysis and optimization of VLSI circuit performances. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 14(4), 481-492 (1995)
68. B. Murmann, B.E. Boser, A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification. IEEE J. Solid-State Circuits 38(12), 2040-2050 (2003)
69. T. Sepke et al., Comparator-based switched-capacitor circuits for scaled CMOS technologies, in IEEE International Solid-State Circuits Conference Digest of Technical Papers (2006), pp. 220-221
70. B.R. Gregoire, U.-K. Moon, An over-60 dB true rail-to-rail performance using correlated level shifting and an opamp with only 30 dB loop gain, in IEEE International Solid-State Circuits Conference Digest of Technical Papers (2008), pp. 540-541
71. A. Zjajo, J. Pineda de Gyvez, Low-Power High-Resolution Analog to Digital Converters (Springer, New York, 2011)
72. A. Zjajo, M.J. Barragan, J. Pineda de Gyvez, Low-power die-level process variation and temperature monitors for yield analysis and optimization in deep-submicron CMOS. IEEE Trans. Instrum. Meas. 61(8), 2212-2221 (2012)
73. E.A. Vittoz, Future of analog in the VLSI environment, in Proceedings of IEEE International Symposium on Circuits and Systems (1990), pp. 1372-1375
74. J.E. Niven, S.B. Laughlin, Energy limitation as a selective pressure on the evolution of sensory systems. J. Exp. Biol. 211(11), 1792-1804 (2008)
Chapter 6
Conclusions
"The best way to predict the future is to invent it." Medicine in the twentieth century relied primarily on pharmaceuticals that could chemically alter the action of neurons or other cells in the body, but twenty-first-century health care may be defined more by electroceuticals: novel treatments that will use pulses of electricity to regulate the activity of neurons, or devices that interface directly with our nerves. Systems such as the brain-machine interface detect the voltage changes in the brain that occur when neurons fire to trigger a thought or an action, and they translate those signals into digital information that is conveyed to a machine, e.g., a prosthetic limb, a speech prosthesis, or a wheelchair.
To help accomplish specific tasks, a hybrid BMI could be built that combines brain signals with input from other sensors. Sensors exist, or are in the works, that can observe eye movement, breath, sweat, gaze, facial expressions, heart rate, muscle movements, and sleep patterns, as well as the ambient temperature and air quality. For example, an eye-tracking sensor follows the subject's gaze to locate the target object, and an ECoG sensor records brain activity while the subject reaches toward that target. A computer analyzes the brain activity associated with the subject's arm movement and sends a command to a robotic arm; with the help of a depth sensor, the arm reaches out and grabs the object. If a prosthetic limb has
6.2 Recommendations and Future Research 129
sensors that register when it touches an object, it could in principle send that sensory feedback to the patient by stimulating the brain through the ECoG electrodes. Consequently, two-way communication between brain and prosthesis can be used to help a user deftly control the limb.
What would it take to build a hybrid BMI? First, we need to improve our recording hardware. Today's systems use only a few dozen electrodes on the cortex; clearly, a much higher density of electrodes would produce a better signal. We need a suite of sensors, possibly with a wearable gadget or clothing that monitors, stimulates, and collects the data. To decipher neural activity not just in one area but across large regions of the brain, signal analysis needs to improve. We will need better spatial and temporal resolution to determine the exact sequence in which groups of neurons across the cortex fire to produce a command or a thought. Finally, and most importantly, we need novel circuit- to system-level techniques to enhance the power efficiency of autonomous BMI systems and wireless sensors to ensure continued performance enhancements under a tight power budget. Dramatic improvements in power efficiency can be obtained through several principles:
• electronics is moving toward increasingly complex systems: meaningful circuit solutions need to fit a system concept first;
• power efficiency comes from synergy: working cooperatively across levels of abstraction leads to benefits that are largely greater than the sum of the single benefits;
• exploring alternative signal processing circuits, e.g., time-based or current-based processing, for power-efficient solutions; using digitally assisted analog circuit and analog-assisted digital circuit techniques;
• power is a valuable currency, and needs to be continuously traded off against other available commodities (performance, sample rate, resolution, signal quality, …);
• power needs to be truly scalable across voltage and time-varying specifications: every time we can give up something, power needs to benefit from it;
• using power-efficient machine-learning techniques to recognize certain general states of mind from EEG or ECoG recordings; using power-scalable kernels for the classification of neural spikes;
• emerging technologies are a significant source of inspiration to look at the future, and to learn new ways to use what exists; circuit and system integration with emerging and post-CMOS technologies (TFET, SymFET, BiSFET);
• understanding, or at least measuring, are powerful tools to increase power efficiency by avoiding pessimism and reducing design margin.
Additional design challenges posed by increased system integration of a multi-physical-domain hybrid bioelectronic interface, where not only analog and digital electronics are integrated, but also mechanical, chemical, optical, and thermal sensors are becoming an integral part of the embedded system, need to be addressed as well. The creation of a unified design environment, where the system definition and its design partitioning across the different physical domains can be analyzed and verified, remains a priority. In addition, non-functional constraints that
in,T² = 4kTγgm (A.1)

in,R² = 4kT/R (A.2)

The input-referred thermal noise of the (single-transistor, common-source) amplifier with a resistive load can be calculated as the output noise divided by the gain of the amplifier

vn,i² = (1/gm²)(4kTγgm + 4kT/R) = 4kTγ/gm + 4kT/(gm²R) (A.3)

assuming that gmR, which is the gain of the amplifier, is much greater than 1/γ; thus 1/(gmR) is negligible compared to γ if the amplifier has a high-enough gain. The total input-referred thermal noise of the amplifier can be calculated by integrating the noise over the entire frequency range to be

Vrms,ni = √( (4kTγ/gm) · (1/(4RC)) ) = √( kTγ/(gm RC) ) (A.4)

Since the total power consumption is P = Itot·VDD, we can express the total power consumption of the amplifier as a function of the input-referred thermal noise as [3]

P = (1/Vrms,ni²) · (γ kT UT VDD)/(RC) (A.6)
The previous equation illustrates the trade-off between the power consumption and the total input-referred thermal noise of a subthreshold amplifier for a given supply voltage and bandwidth (denoted by the RC product in this case). To reduce the input-referred thermal noise by a factor of 2, the total power consumption must be increased by a factor of 4. This relationship shows the steep power cost of achieving low-noise performance in a thermal-noise-limited amplifier, even without taking flicker noise into account.
The power-noise trade-off in the amplifier is aggravated if the transistor is operating in strong inversion. In strong inversion, the transconductance gm is proportional to √Itot. As a result, the total power consumption scales as 1/Vni⁴ instead of 1/Vni² as in the subthreshold case.
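The two scaling laws can be checked numerically: halving the input-referred noise costs 4x power in subthreshold but 16x in strong inversion (normalized units):

```python
# P ~ 1/Vni^2 in subthreshold, P ~ 1/Vni^4 in strong inversion
P0, Vni0 = 1.0, 1.0   # reference (normalized) power and input-referred noise
Vni = Vni0 / 2.0      # target: halve the input-referred noise

P_sub    = P0 * (Vni0 / Vni) ** 2  # subthreshold cost: factor of 4
P_strong = P0 * (Vni0 / Vni) ** 4  # strong-inversion cost: factor of 16
```

This is why low-noise front-end stages for neural recording are biased in or near weak inversion whenever bandwidth allows.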
where VDD is the supply voltage, k is the Boltzmann constant, T is the temperature in Kelvin, BWLNA = fLP − fHP is the 3-dB bandwidth of the LNA, fLP and fHP are the low-pass and high-pass corner frequencies, respectively, UT is the thermal voltage (kT/q), and the noise efficiency factor NEF is defined as [3]

NEF = Vrms,in √( 2·ILNA / (π · 4kT · UT · BWLNA) ) (A.8)
Appendix 133
The total LNA output noise voltage should be less than the ADC quantization noise

GLNA² GPGA² Vrms,in² ≤ LSB²/12 = (1/12)(VDD/2ⁿ)² (A.9)

where GLNA is the gain of the LNA, GPGA is the gain of the programmable-gain amplifier, LSB is the ADC least-significant-bit voltage value, and n is the resolution of the A/D converter. Combining (A.7) and (A.9), the minimum LNA power consumption is expressed as

PLNA ≥ 24π · 2²ⁿ · kT · UT · BWLNA · GLNA² GPGA² (NEF)² / VDD (A.10)
The PGA drives the following ADC and must meet a slew-rate constraint. By setting the time constant τ = tslew, where tslew = 1/(2fs) is the maximum allowable time for slewing, the minimum required biasing current of the PGA (IPGA,slew = gmVeff) is

IPGA,slew = CL,PGA · GPGA · Veff / tslew (A.11)

where CL,PGA is the load capacitance of the PGA, Veff is the voltage swing of the A/D converter, and fs is the sampling rate for one recording channel. Consequently, the power consumption of the PGA is [4]

PPGA = 2 fs CL,PGA GPGA Veff VDD (A.12)
CS = 12kT · 2²ⁿ / VFS² (A.13)

To charge this capacitor to VFS within one half period of the sampling frequency fS, we need a current of I = 2fS·CS·VFS. Assuming that we have an ideal amplifier, driving the capacitor leads to a minimum supply current for that amplifier. Further assuming that the supply voltage of the amplifier is equal to VFS, we arrive at a power dissipation of I·VFS for the amplifier and, therefore, for the sampling process. Combining these relationships gives a lower bound for the sampling power

PSH = 24kT · fS · 2²ⁿ (A.14)
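At neural-recording rates the sampling-power floor of (A.14) is tiny in absolute terms; for example (resolution and rate are illustrative):

```python
k, T = 1.380649e-23, 300.0  # Boltzmann constant [J/K], temperature [K]
n = 8                       # resolution [bits]
f_S = 100e3                 # sampling frequency [Hz]

P_SH = 24 * k * T * f_S * 2 ** (2 * n)  # lower bound on sampling power [W]
```

The bound is well under a nanowatt, so in practice the sample-and-hold power is set by the driving amplifier's non-idealities rather than by this thermodynamic limit.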
In the binary search algorithm, n steps are needed to complete one conversion, as the DAC output gradually approaches the input voltage. The DAC output voltage for the i-th step can be expressed as

VDAC,out = VI(i) = −Vin + Dn−1(Vref/2) + … + Dn−i(Vref/2^i),  1 ≤ i ≤ n (A.15)

where VI is the input voltage difference, Vref is the reference voltage, and Dn is the digital representation of the n-bit code. The comparator must resolve the output digital code of the sub-ADC, converted into a voltage by the DAC during the transfer phase, within the decision time td. Subsequently, the output voltage difference required to make the comparison in the latch-based comparator can be expressed as

ΔVout = AV · ΔVI · exp(td/τ) (A.16)

where AV acts as a gain factor from the input to the initial imbalance of the latch decision stage, τ = CL,comp/gm, and CL,comp and gm are the output load and transconductance of the comparator, respectively. Assuming td = 1/(2nfs), the required gm is

gm,comp = (CL,comp/td) Σ(K=1…n) ln( VDD / (AV·(Vref/2^K)) ) = 2nfs·CL,comp·( n·ln(VDD/(AV·Vref)) + (n(n+1)/2)·ln 2 ) (A.17)
To identify the minimum power limit of the comparator, it is noted that its total
input-referred noise voltage has a fundamental kT/C limitation given by
kT
Vn2 = 4 (A.18)
CL,comp
Equating previous equation with the quantization noise VFS /12 22n , gives the
2
22n
CL,comp = 48kT 2 (A.19)
VFS
where VFS is the full scale voltage range. Substituting (A.19) in (A.17), the min-
imum gm,comp and Icomp = gm,compVeff can be found. The power consumption of the
comparator is [4]
$$P_{comp} = 96nf_s\,kT\,\frac{2^{2n}}{V_{FS}^2}\,V_{eff}V_{DD}\left[\ln\frac{V_{DD}}{A_VV_{ref}} + \frac{n}{2}\ln 2\right] \qquad (A.20)$$
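The $kT/C$ sizing rule of (A.19) can be checked numerically; the following minimal sketch (example resolution and full-scale voltage are assumptions chosen for illustration) verifies that the resulting $4kT/C$ comparator noise reproduces the quantization noise power exactly:

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def comparator_load_cap(n_bits, v_fs, temp_k=300.0):
    """Load capacitance making the 4kT/C comparator noise equal to the
    quantization noise V_FS^2/(12*2^(2n)), per (A.19)."""
    return 48.0 * BOLTZMANN * temp_k * 2.0 ** (2 * n_bits) / v_fs ** 2

c = comparator_load_cap(n_bits=10, v_fs=1.0)
# Consistency check: 4kT/C reproduces the quantization noise power.
v_noise_sq = 4.0 * BOLTZMANN * 300.0 / c
v_quant_sq = 1.0 ** 2 / (12.0 * 2.0 ** 20)
```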
To drive the SAR logic capacitance within the sampling phase requires a current of $I_{logic} = C_{logic}V_{DD}/t_s$, which leads to the following minimum limit for the logic power

$$P_{logic} = nf_sC_{logic}V_{DD}^2 \qquad (A.21)$$
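The $n$-step binary search described earlier can be sketched as an idealized conversion loop; no comparator noise or settling error is modeled, and the input, reference, and resolution in the usage line are illustrative assumptions:

```python
def sar_convert(v_in, v_ref, n_bits):
    """Idealized successive-approximation search: n binary decisions,
    the DAC voltage converging toward v_in from the MSB down."""
    code = 0
    v_dac = 0.0
    for i in range(n_bits - 1, -1, -1):
        trial = code | (1 << i)
        v_trial = v_ref * trial / (1 << n_bits)
        if v_in >= v_trial:          # comparator decision for bit D_i
            code = trial
            v_dac = v_trial
    return code, v_dac

code, v_dac = sar_convert(v_in=0.7, v_ref=1.0, n_bits=8)
```

After the loop the residue `v_in - v_dac` is bounded by one LSB, which is exactly the convergence property that (A.15) expresses.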
The unit capacitor CU is usually determined by thermal noise and capacitor mis-
match. The thermal noise resulting from the sampling action of the input voltage
is given by $kT/(2^nC_U)$. In a Nyquist ADC, $C_U$ should be large enough so that the thermal noise is less than the converter's quantization noise:

$$C_{U,n} = 12kT\,\frac{2^n}{V_{FS}^2} \qquad (A.23)$$
The input-referred noise $v_n$ (as well as the total integrated output noise) still takes the form of $kT/C$ with a correction factor $\alpha_1$, where $R_{on}$ is the resistance of the switch, $V_{ns}$ is the noise source, $C_p$ is the parasitic capacitance
and COTA is the input capacitance of the OTA. Then in the conversion mode, the
sampling capacitor C4, which now contains the signal value and the offset of the
OTA, is connected across the OTA. The total noise charge will cause an output
voltage of
$$\overline{v_{ns(out)}^2} = \frac{\overline{Q_{ns}^2}}{C_4^2} = kT\,\frac{C_4 + C_p + C_{OTA}}{C_4^2} = \frac{1}{\beta}\,\frac{kT}{C_4} \qquad (A.27)$$

where $\beta = C_4/(C_4 + C_p + C_{OTA})$ is the feedback factor. For differential implementation of the circuit, the
noise power of the previous equation increases by a factor of 2 assuming no cor-
relation between positive side and negative side, since the uncorrelated noise adds
in power. Thus, input-referred noise power, which is found by dividing the output
noise power by the square of the gain (GA = C3/C4) is given by
$$\overline{v_{ns(in)}^2} = \frac{\overline{v_{ns(out)}^2}}{G_A^2} = \frac{1}{\beta}\,\frac{kT}{G_A^2\,C_4} \qquad (A.28)$$
The resistive channel of the MOS devices in OTA also has thermal noise and con-
tributes to the input-referred noise of the PG ADC circuit. The noise power at the
output is found from
$$\overline{v_{ns(out)}^2} = \int_0^\infty\left|H(j\omega)\right|^2\,\overline{v_{ns}^2}\,d\omega = \frac{kT}{C_{LT}}\,\frac{G_mR_o}{1 + G_mR_o} \approx \frac{kT}{C_{LT}} \qquad (A.29)$$
where Ro is the output resistance and CLT is the capacitance loading at the output
$$C_{LT} = C_L + C_p + C_{OTA} \qquad (A.30)$$
The optimum gate capacitance of the OTA is proportional to the sampling capacitor, $C_{OTA,opt} = \alpha_3C_4$, where $\alpha_3$ is a circuit-dependent proportionality factor. The drain current $I_D$ yields

$$I_D = \frac{12\,\alpha_1^2\,L^2f_s^2\,C_4}{\mu\,\alpha_3} \qquad (A.31)$$

where $\mu$ is the carrier mobility, $C_{ox}$ is the gate oxide capacitance and $\alpha_1$ is the gain-related correction factor. The input-referred noise variance in the conversion mode is

$$\overline{v_{ns(in)}^2} = \frac{2kT}{G_C^2\,C_{LT}} \qquad (A.32)$$
The noise from acquisition and conversion mode can be added together to find the
total input-referred noise assuming that two noise sources are uncorrelated. Using
the results from (A.28) and (A.32), the total input-referred noise power for differential input is given by

$$\overline{v_{ns(in)}^2} = \frac{2kT}{G_C^2C_{LT}} + \frac{2}{\beta}\,\frac{kT}{G_A^2C_4} = 2kT\left[\frac{1}{G_C^2C_{LT}} + \frac{1}{\beta\,G_A^2C_4}\right] \qquad (A.33)$$
The number of transistor process parameters that can vary is large. In previous
research aimed at optimizing the yield of integrated circuits [7, 8], the number of
parameters simulated was reduced by choosing parameters which are relatively
independent of each other, and which affect performance the most. The parameters
most frequently chosen are, for n- and p-channel transistors: the threshold voltage at zero back-bias for the reference transistor at the reference temperature $V_{TOR}$, the gain factor for an infinite square transistor at the reference temperature $\beta_{SQ}$, the total length and width variations $\Delta L_{var}$ and $\Delta W_{var}$, the oxide thickness $t_{ox}$, and the bottom, sidewall, and gate edge junction capacitances $C_{JBR}$, $C_{JSR}$, and $C_{JGR}$, respectively. The variation in absolute value of all these parameters must be considered, as well as the differences between related elements, i.e., matching. The threshold voltage differences $\Delta V_T$ and current factor differences $\Delta\beta$ are the dominant sources underlying the drain-source current or gate-source voltage mismatch for a matched pair of MOS transistors.
Transistor Threshold Voltage: Various factors affect the gate-source voltage at which the channel becomes conductive, such as the voltage difference between the channel and the substrate required for the channel to exist, the work function
difference between the gate material and the substrate material, the voltage drop
across the thin oxide required for the depletion region, the voltage drop across the
thin oxide due to implanted charge at the surface of the silicon, the voltage drop
across the thin oxide due to unavoidable charge trapped in the thin oxide, etc.
In order for the channel to exist, the concentration of electron carriers in the channel should be equal to the concentration of holes in the substrate, $\phi_S = -\phi_F$. The surface potential changes by a total of $2\phi_F$ between the strong inversion and
depletion cases. Threshold voltage is affected by the built-in Fermi potential due
to the different materials and doping concentrations used for the gate material and
the substrate material. The work function difference is given by
$$\phi_{ms} = \phi_{F,Sub} - \phi_{F,Gate} = \frac{kT}{q}\ln\frac{N_DN_A}{n_i^2} \qquad (A.35)$$
The immobile negative charge left in the depletion region after the mobile carriers are repelled gives rise to a potential across the gate-oxide capacitance of $Q_B/C_{ox}$, where

$$Q_B = qN_Ax_d = qN_A\sqrt{\frac{2\varepsilon_{Si}|2\phi_F|}{qN_A}} = \sqrt{2qN_A\varepsilon_{Si}|2\phi_F|} \qquad (A.36)$$
and xd is the width of the depletion region. The amount of implanted charge at the
surface of the silicon is adjusted in order to realize the desired threshold voltage.
For the case in which the source-to-substrate voltage is increased, the effective
threshold voltage is increased, which is known as the body effect. The body effect
occurs because, as the source-bulk voltage, VSB, becomes larger, the depletion
region between the channel and the substrate becomes wider, and therefore more
immobile negative charge becomes uncovered. This increase in charge changes the
charge attracted under the gate. Specifically, QB becomes
$$Q_B = \sqrt{2qN_A\varepsilon_{Si}\left(V_{SB} + |2\phi_F|\right)} \qquad (A.37)$$
The unavoidable charge trapped in the thin oxide gives rise to a voltage drop across the oxide, $V_{ox}$, given by

$$V_{ox} = \frac{Q_{ox}}{C_{ox}} = \frac{qN_{ox}}{C_{ox}} \qquad (A.38)$$
Incorporating all factors, the threshold voltage, $V_T$, is then given by

$$V_T = \phi_{ms} - 2\phi_F + \frac{Q_B}{C_{ox}} + \frac{Q_{ox}}{C_{ox}} = \phi_{ms} - 2\phi_F + \frac{Q_{B0}}{C_{ox}} + \frac{Q_{ox}}{C_{ox}} + \frac{\sqrt{2q\varepsilon_{Si}N_A}}{C_{ox}}\left(\sqrt{|2\phi_F| + V_{SB}} - \sqrt{|2\phi_F|}\right) \qquad (A.39)$$

where $Q_{B0}$ denotes $Q_B$ at $V_{SB} = 0$. When the source is shorted to the substrate, $V_{SB} = 0$, and the zero-substrate-bias threshold is defined as

$$V_{T0} = \phi_{ms} - 2\phi_F + \frac{Q_{B0}}{C_{ox}} + \frac{Q_{ox}}{C_{ox}} \qquad (A.40)$$

The threshold voltage, $V_T$, can then be rewritten as

$$V_T = V_{T0} + \gamma\left(\sqrt{|2\phi_F| + V_{SB}} - \sqrt{|2\phi_F|}\right), \qquad \gamma = \frac{\sqrt{2q\varepsilon_{Si}N_A}}{C_{ox}} \qquad (A.41)$$
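Equation (A.41) is easy to evaluate directly; the sketch below (with illustrative, not measured, parameter values) shows the threshold voltage rising with reverse source-to-bulk bias:

```python
import math

def threshold_voltage(v_sb, v_t0, gamma, phi_f):
    """Body effect per (A.41): VT = VT0 + gamma*(sqrt(|2*phi_F| + VSB)
    - sqrt(|2*phi_F|)). All quantities in volts (gamma in V^1/2)."""
    two_phi = abs(2.0 * phi_f)
    return v_t0 + gamma * (math.sqrt(two_phi + v_sb) - math.sqrt(two_phi))

# VT increases monotonically with source-to-bulk reverse bias.
vt_0 = threshold_voltage(v_sb=0.0, v_t0=0.45, gamma=0.4, phi_f=0.35)
vt_1 = threshold_voltage(v_sb=1.0, v_t0=0.45, gamma=0.4, phi_f=0.35)
```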
Advanced transistor models, such as MOST model 9 [9], define the threshold volt-
age as
$$V_T = V_{T0} + \Delta V_{T0} + \Delta V_{T1} = \left(V_{T0T} + V_{T0G} + V_{T0(M)}\right) + \Delta V_{T0} + \Delta V_{T1} \qquad (A.42)$$
where the threshold voltage at zero back-bias $V_{T0}$ [V] for the actual transistor at the actual temperature is composed of $V_{T0T}$ [V], the threshold temperature dependence, $V_{T0G}$ [V], the threshold geometrical dependence, and $V_{T0(M)}$ [V], the matching deviation of the threshold voltage. Due to the variation in the doping in
the depletion region under the gate, a two-factor body-effect model is needed to
account for the increase in threshold voltage with VSB for ion-implanted transistors.
The change in threshold voltage for nonzero back bias is represented in the model
as
$$\Delta V_{T0} = \begin{cases} K_0\left(u_s - u_{s0}\right) & u_s < u_{sX} \\[1ex] \sqrt{K_0^2u_{sX}^2\left(1 - \dfrac{K^2}{K_0^2}\right) + K^2u_s^2} - K_0u_{s0} & u_s \ge u_{sX} \end{cases} \qquad (A.43)$$

$$u_s = \sqrt{V_{SB} + \phi_B} \qquad u_{s0} = \sqrt{\phi_B} \qquad u_{sT} = \sqrt{V_{SBT} + \phi_B} \qquad u_{sX} = \sqrt{V_{SBX} + \phi_B} \qquad (A.44)$$
where the parameter $V_{SBX}$ [V] is the back-bias value at which the implanted layer becomes fully depleted, $K_0$ [V$^{1/2}$] is the low-back-bias body factor for the actual transistor and $K$ [V$^{1/2}$] is the high-back-bias body factor for the actual transistor. For
nonzero values of the drain bias, the drain depletion layer expands towards the
source and may affect the potential barrier between the source and channel regions
especially for short-channel devices. This modulation of the potential barrier
between source and channel causes a reduction in the threshold voltage. In sub-
threshold this dramatically increases the current and is referred to as drain-induced
barrier lowering (DIBL). Once an inversion layer has been formed at higher values
of gate bias, any increase of drain bias induces an additional increase in inversion
charge at the drain end of the channel. The drain bias still has a small effect on the threshold voltage; this effect is most pronounced in the output conductance in strong inversion and is referred to as static feedback. The DIBL effect is modeled by the parameter $\gamma_{00}$ in the subthreshold region. This drain bias voltage dependence is expressed by the first part of
$$\Delta V_{T1} = -\left[\gamma_{00}\,\frac{V_{GTX}^2}{V_{GTX}^2 + V_{GT1}^2}\,V_{DS} + \gamma_1\,\frac{V_{GT1}^2}{V_{GTX}^2 + V_{GT1}^2}\,V_{DS}^{\eta_{DS}}\right] \qquad (A.45)$$

$$V_{GT1} = \begin{cases} V_{GS} - V_{T1} & V_{GS} \ge V_{T1} \\ 0 & V_{GS} < V_{T1} \end{cases} \qquad V_{GTX} = \frac{\sqrt{2}}{2} \qquad (A.46)$$
where $\gamma_1$ is the coefficient of the drain-induced threshold shift for large gate drive for the actual transistor and $\eta_{DS}$ is the exponent of the $V_{DS}$ dependence of $\gamma_1$ for the actual transistor. In the accompanying matching model, $V_{T0(A,Intra)}$ and $V_{T0(B,Intra)}$ are the within-chip spread parameters of $V_{T0}$ [V$\cdot$m], $F_S$ switches between inter- and intra-die spread ($F_S = 1$ for intra-die spread, zero otherwise), and $F_C$ is a correction for multiple transistors in parallel and units.
Transistor Current Gain: A single expression models the drain current in all regions of operation in the MOST model 9:

$$I_{DS} = \beta\,G_3\,\frac{V_{GT1}V_{DS1} - \dfrac{1+\delta_1}{2}V_{DS1}^2}{\left\{1 + \theta_1V_{GT1} + \theta_2(u_s - u_{s0})\right\}\left(1 + \theta_3V_{DS1}\right)} \qquad (A.50)$$
where

$$\delta_1 = \frac{1}{2u_s}\left[K + (K_0 - K)\,\frac{V_{SBX}^2}{V_{SBX}^2 + (\lambda_2V_{GT1} + V_{SB})^2}\right] \qquad (A.51)$$

$$m = 1 + m_0\left(\frac{u_{s0}}{u_{s1}}\right)^{\eta_m} \qquad (A.54)$$
$\theta_1$, $\theta_2$, $\theta_3$ are the coefficients of the mobility reduction due to the gate-induced field, the back-bias and the lateral field, respectively, $\phi_T$ is the thermal voltage at the actual temperature, $\zeta_1$ is the weak-inversion correction factor, $\lambda_1$ and $\lambda_2$ are model constants and $V_P$ is the characteristic voltage of the channel-length modulation. The parameter $m_0$ characterizes the subthreshold slope for $V_{BS} = 0$. The gain factor is defined as
$$\beta = \beta_{SQT}\,\frac{W_e}{L_e}\,F_{old}\,(1 + S_{STI})\left[1 + \left(\frac{A_\beta/\sqrt{2}}{\sqrt{W_eL_e}} + \frac{B_\beta}{\sqrt{2}}\right)\frac{F_S}{F_C}\right] \qquad (A.55)$$
where $\beta_{SQT}$ is the gain factor temperature dependence, $S_{STI}$ is the STI stress, $F_S$ the switching-mechanism factor, $F_C$ the correction factor for multiple transistors in parallel and units, $A_\beta$ an area scaling factor and $B_\beta$ a constant. The gain factor temperature dependence is defined as
$$\beta_{SQT} = \beta_{SQ}\left(\frac{T_0 + T_R}{T_0 + T_A + \Delta T_A}\right)^{\eta_{BSQ}} \qquad (A.56)$$

$$\beta_{SQ} = \beta_{SQ,TR}\left(\frac{T_0 + T_R}{T_0 + T_A + \Delta T_A}\right)^{\eta_{BSQ}} \qquad \beta_{SQS} = \beta_{SQS,TR}\left(\frac{T_0 + T_R}{T_0 + T_A + \Delta T_A}\right)^{\eta_{BSQS}} \qquad (A.58)$$
In strong inversion, the drain current in the triode and saturation regions, including mobility reduction, is

$$I_D = \beta\,\frac{\left(V_{GS} - V_T - \tfrac{1}{2}V_{DS}\right)V_{DS}}{1 + \theta(V_{GS} - V_T)} \qquad (A.59)$$

$$I_D = \frac{\beta}{2}\,\frac{(V_{GS} - V_T)^2}{1 + \theta(V_{GS} - V_T)} \qquad (A.60)$$

and the corresponding output conductance is

$$g_o = \frac{\partial I_D}{\partial V_{DS}} = \beta\,\frac{V_{GS} - V_T - V_{DS}}{1 + \theta(V_{GS} - V_T)} \qquad (A.63)$$
$$\sigma_{\Delta\beta/\beta} = \frac{A_\beta/\sqrt{2}}{\sqrt{W_{eff}L_{eff}}} + \frac{B_\beta}{\sqrt{2}} + S_\beta D \qquad (A.67)$$
where $W_{eff}$ is the effective gate-width and $L_{eff}$ the effective gate-length, the proportionality constants $A_{VT}$, $S_{VT}$, $A_\beta$, and $S_\beta$ are technology-dependent factors, $D$ is the device separation distance and $B_{VT}$ and $B_\beta$ are constants. For widely spaced devices the terms $S_{VT}D$ and $S_\beta D$ are included in the models for the random variations in the two previous equations, but for typical device separations (<1 mm) and typical device sizes this correction
is small. Most mismatch characterization has been performed on devices in strong
inversion, in the saturation or linear region but some studies for devices operat-
ing in weak inversion have also been conducted. Qualitatively, the behavior in all
regions is very similar; $\Delta V_T$ and $\Delta\beta$ variations are the dominant source of mismatch and their matching scales with device area. The effective mobility degradation mismatch term can be combined with the current factor mismatch term, as both terms become significant in the same bias range (high gate voltage). The correlation factor $\rho(\Delta V_T, \Delta\beta/\beta)$ can be ignored as well, since the correlation between $\Delta V_T$ and the other mismatch parameters remains low for both small and large devices. The drain-source current error $\Delta I_D/I_D$ is important for the voltage-biased pair. For the current-biased pair, the gate-source or input-referred mismatch should be considered; its expression can be derived similarly to the drain-source current error. The change in gate-source voltage can be calculated by
$$\Delta V_{GS} = \frac{\partial V_{GS}}{\partial V_T}\Delta V_T + \frac{\partial V_{GS}}{\partial\beta}\Delta\beta \qquad (A.68)$$
where $A_B$ [m$^2$] is the diffusion area, $V_R$ [V] the voltage at which the parameters have been determined, $V_{DB}$ [V] the diffusion voltage of the bottom area $A_B$, $V_{DBR}$ [V] the diffusion voltage of the bottom junction at $T = T_R$ and $P_B$ [--] the bottom-junction grading coefficient.
Similar formulations hold for the LOCOS-edge and the gate-edge components; one has to replace the index B by S and G, and the area $A_B$ by $L_S$ and $L_G$. The capacitance of the bottom component is derived as
$$C_{JBV} = \begin{cases} \dfrac{C_{JBR}}{\left(1 - V/V_{DB}\right)^{P_B}} & V < V_{LB} \\[1.5ex] C_{LB} + \dfrac{C_{LB}\,P_B\,(V - V_{LB})}{V_{DB}(1 - F_{CB})} & V \ge V_{LB} \end{cases} \qquad (A.73)$$

where

$$C_{LB} = C_{JBR}\left(1 - F_{CB}\right)^{-P_B} \qquad F_{CB} = 1 - \left(\frac{1}{3}\right)^{\frac{1}{1+P_B}} \qquad V_{LB} = F_{CB}V_{DB} \qquad (A.74)$$
and $V$ is the diode bias voltage. Similar expressions can be derived for the sidewall $C_{JSV}$ and
gate edge component CJGV. The total diode depletion capacitance can be described by:
$$C = C_{JBV} + C_{JSV} + C_{JGV} \qquad (A.75)$$
Typical CMOS and BiCMOS technologies offer several different resistors, such
as diffusion n+/p+ resistors, n+/p+ poly resistors, and n-well resistors. Many factors in the fabrication of a resistor, such as fluctuations of the film thickness, doping concentration and doping profile, and the dimension variation caused by photolithographic inaccuracies and nonuniform etch rates, can produce significant variation in the sheet resistance. However, this is bearable as long as the device
matching properties are within the range the designs require. The fluctuations of
the resistance of a resistor can be categorized into two groups: one in which the fluctuations occur over the whole device and scale with the device area, called area fluctuations, and another in which the fluctuations take place only along the edges of the device and therefore scale with the periphery, called peripheral fluctuations. For a matched resistor pair with width W and resistance R, the standard
deviation of the random mismatch between the resistors is
$$\frac{\sigma_{\Delta R}}{R} = \left(f_a + \frac{f_p}{W}\right)\frac{1}{\sqrt{WR}} \qquad (A.76)$$
where fa and fp are constants describing the contributions of area and periphery
fluctuations, respectively. In circuit applications, to achieve the required matching, resistors at least 2-3 times wider than the minimum width should be used. Also, resistors with higher resistance (longer length) at fixed width exhibit larger mismatch. To achieve the desired matching, it has been common practice to break a resistor with long length (for high resistance) into shorter resistors in series. To model a (poly-silicon) resistor the following equation is used
$$R = R_{sh}\,\frac{L}{W + \Delta W} + \frac{R_e}{W + \Delta W} \qquad (A.77)$$
where $R_{sh}$ is the sheet resistance of the poly-resistor, $R_e$ is the end resistance coefficient, $W$ and $L$ are the resistor width and length, and $\Delta W$ is the resistor width offset. The correlations between the standard deviations ($\sigma$) of the model parameters and the standard deviation of the resistance are given in the following:
$$\sigma_R^2 = \left(\frac{\partial R}{\partial R_{sh}}\right)^2\sigma_{R_{sh}}^2 + \left(\frac{\partial R}{\partial R_e}\right)^2\sigma_{R_e}^2 + \left(\frac{\partial R}{\partial\Delta W}\right)^2\sigma_{\Delta W}^2 \qquad (A.78)$$

$$\sigma_R^2 = \frac{L^2}{(W + \Delta W)^2}\,\sigma_{R_{sh}}^2 + \frac{1}{(W + \Delta W)^2}\,\sigma_{R_e}^2 + \frac{(LR_{sh} + R_e)^2}{(W + \Delta W)^4}\,\sigma_{\Delta W}^2 \qquad (A.79)$$

To define the resistor matching,

$$\left(\frac{\sigma_R}{R}\right)^2 = \left(\frac{L}{LR_{sh} + R_e}\right)^2\sigma_{R_{sh}}^2 + \left(\frac{1}{LR_{sh} + R_e}\right)^2\sigma_{R_e}^2 + \frac{1}{(W + \Delta W)^2}\,\sigma_{\Delta W}^2 \qquad (A.80)$$
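The variance propagation of (A.80) can be sketched as follows; the numeric values are illustrative assumptions, chosen only to show that widening the resistor suppresses the width-offset contribution:

```python
import math

def resistor_mismatch_sigma(L, W, r_sh, r_e, dW, s_rsh, s_re, s_dw):
    """Relative mismatch sigma_R/R per (A.80): first-order propagation of
    the sheet-resistance, end-resistance and width-offset variances."""
    denom = L * r_sh + r_e
    var = ((L / denom) ** 2 * s_rsh ** 2
           + (1.0 / denom) ** 2 * s_re ** 2
           + (1.0 / (W + dW)) ** 2 * s_dw ** 2)
    return math.sqrt(var)

# Widening the resistor shrinks the width-offset term of (A.80).
s_narrow = resistor_mismatch_sigma(20.0, 1.0, 50.0, 10.0, 0.0, 1.0, 0.5, 0.05)
s_wide = resistor_mismatch_sigma(20.0, 4.0, 50.0, 10.0, 0.0, 1.0, 0.5, 0.05)
```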
$$\sigma_{R_{sh}} = \frac{A_{R_{sh}}}{\sqrt{WL}} \qquad \sigma_{R_e} = \frac{A_{R_e}}{W} \qquad \sigma_{\Delta W} = \frac{A_{\Delta W}}{W^{1/2}} \qquad (A.81)$$
A similar split applies to capacitors: $f_a$ and $f_p$ describe the influence of the area and periphery fluctuations, respectively. The contribution of the periphery components decreases as the area (capacitance) increases. For very large capacitors, the area components dominate and the random mismatch becomes inversely proportional to $\sqrt{C}$. A simple capacitor mismatch model is given by

$$\left(\frac{\sigma_C}{C}\right)^2 = \sigma_p^2 + \sigma_a^2 + \sigma_d^2 \qquad \sigma_p = \frac{f_p}{C^{3/4}} \qquad \sigma_a = \frac{f_a}{C^{1/2}} \qquad \sigma_d = f_d\,d \qquad (A.83)$$
where fp, fa, and fd are constants describing the influence of periphery, area, and
distance fluctuations. The periphery component models the effect of edge rough-
ness, and it is most significant for small capacitors, which have relatively large
amount of edge capacitance. The area component models the effect of short-range
dielectric thickness variations, and it is most significant for moderate size capaci-
tors. The distance component models the effect of global dielectric thickness vari-
ations across the wafer, and it becomes significant for large capacitors or widely
spaced capacitors.
The modern analog circuit simulators use a modified form of nodal analysis [11,
12] and NewtonRaphson iteration to solve the system of n nonlinear equations
fi in n variables pi. In general, the time-dependent behavior of a circuit containing
linear or nonlinear elements may be described as [13]
$$\dot q - E\chi = 0 \qquad q_0 = q(0) \qquad f(q, \chi, w, p, t) = 0 \qquad (A.84)$$
This notation assumes that the terminal equations for capacitors and inductors are
defined in terms of charges and fluxes, collected in q. The elements of matrix E
are either 1 or 0, and $\chi$ represents the circuit variables (nodal voltages or branch currents). All nonlinearities are incorporated in the algebraic system $f(q, \chi, w, p, t) = 0$, so the differential equations $\dot q - E\chi = 0$ are linear. The initial conditions
are represented by q0. Furthermore, w is a vector of excitations, and p contains the
circuit parameters like parameters of linear or nonlinear components. An element
of p may also be a (nonlinear) function of the circuit parameters. It is assumed that
for each p there is only one solution $\chi$ of (A.84). The dc solution is computed by solving the system

$$E\chi_0 = 0 \qquad f(q_0, \chi_0, w_0, p_i, 0) = 0 \qquad (A.85)$$

which is derived by setting $\dot q = 0$. The solution $(q_0, \chi_0)$ is found by Newton-
Raphson iteration. In general, this technique finds the solution of a nonlinear sys-
tem $F(\chi) = 0$ by iteratively solving the Newton-Raphson equation

$$J^k\,\Delta\chi^k = -f(\chi^k) \qquad (A.86)$$

where $J^k$ is the Jacobian of $f$, with $J^k_{ij} = \partial f_i/\partial\chi_j$. Iteration starts with an initial estimate $\chi^0$ and proceeds with $\chi^{k+1} = \chi^k + \Delta\chi^k$ until convergence.
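As a minimal sketch of the Newton-Raphson iteration, a single-node resistor-diode circuit stands in for the full modified-nodal system; the element values and diode parameters below are assumptions chosen for illustration:

```python
import math

def solve_newton(f, dfdx, x0, tol=1e-12, max_iter=100):
    """Newton-Raphson on a scalar equation: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        dx = -f(x) / dfdx(x)
        x += dx
        if abs(dx) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# KCL at the single node of a 1 V source, 1 kOhm resistor and a diode:
# (v - 1.0)/1e3 + Is*(exp(v/Vt) - 1) = 0
IS, VT = 1e-14, 0.02585
f = lambda v: (v - 1.0) / 1e3 + IS * (math.exp(v / VT) - 1.0)
df = lambda v: 1.0 / 1e3 + (IS / VT) * math.exp(v / VT)
v_node = solve_newton(f, df, x0=0.6)
```

The scalar update is exactly (A.86) with a 1x1 Jacobian; a circuit simulator solves the same equation with J as the sparse MNA matrix.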
At each time point the circuit derivatives are obtained by solving previous system
of equation after the original system is solved. Suppose, for example, that a kth
order backward differentiation formula (BDF) is used [15, 16], with the corrector
$$(\dot q)_{n+k} = \frac{1}{\Delta t}\sum_{i=0}^{k-1}a_i\,q_{n+k-i} \qquad (A.90)$$
where the coefficients ai depend upon the order k of the BDF formula. After sub-
stituting (A.90) into (A.84), the Newton-Raphson equation is derived as
$$\begin{bmatrix} \dfrac{a_0}{\Delta t} & -E \\[1ex] \dfrac{\partial f}{\partial q} & \dfrac{\partial f}{\partial\chi} \end{bmatrix}_{n+k} \begin{bmatrix} \Delta q_{n+k} \\ \Delta\chi_{n+k} \end{bmatrix} = -\begin{bmatrix} \dfrac{1}{\Delta t}\displaystyle\sum_{i=0}^{k-1}a_i\,q_{n+k-i} - E\chi_{n+k} \\[1ex] f(q_{n+k}, \chi_{n+k}, w_{n+k}, p_j, t_{n+k}) \end{bmatrix} \qquad (A.91)$$
Iteration on this system provides the solution $(q_{n+k}, \chi_{n+k})$. Substituting a kth order BDF formula in (A.89) gives the linear system

$$\begin{bmatrix} \dfrac{a_0}{\Delta t} & -E \\[1ex] \dfrac{\partial f}{\partial q} & \dfrac{\partial f}{\partial\chi} \end{bmatrix}_{n+k} \begin{bmatrix} \dfrac{\partial q}{\partial p_j}\Big|_{n+k} \\[1.5ex] \dfrac{\partial\chi}{\partial p_j}\Big|_{n+k} \end{bmatrix} = -\begin{bmatrix} \dfrac{1}{\Delta t}\displaystyle\sum_{i=1}^{k-1}a_i\,\dfrac{\partial q}{\partial p_j}\Big|_{n+k-i} \\[1.5ex] \dfrac{\partial f}{\partial p_j}\Big|_{n+k} \end{bmatrix} \qquad (A.92)$$
Thus (A.91) and (A.92) have the same system matrix. The LU factorization of this
matrix is available after (A.91) is iteratively solved. Then a forward and backward
substitution solves (A.92). For each parameter the right-hand side of (A.92) is
different and the forward and backward substitution must be repeated. A random term $\xi(p, t)$, which models the tolerance effects, can be added to (A.84) [17-21]:

$$f(q, \chi, w, p, t) + \xi(p, t) = 0 \qquad (A.93)$$
Solving this system means determining the probability density function of the random vector p(t) at each time instant t. For two instants in time, $t_1$ and $t_2$, with $\Delta t_1 = t_1 - t_0$ and $\Delta t_2 = t_2 - t_0$, where $t_0$ is a time that coincides with the dc solution of the circuit performance function $\phi$, $\Delta t$ is assumed to satisfy the criterion that the circuit performance function can be designated as quasi-static. To make the problem manageable, the function can be linearized by a first-order Taylor approximation, assuming that the magnitude of the random term p is sufficiently small to consider the equation as linear in the range of variability of p, or that the nonlinearities are so smooth that they may be considered linear even for a wide range of p.
Once the nominal parameter vector p0 is found for the nominal device, the param-
eter extraction of all device parameters pk of the transistors connected to particular
node n can be performed using a linear approximation to the model. Let $p = [p_1, p_2, \ldots, p_n]^T \in R^n$ denote the parameter vector, $f = [f_1, f_2, \ldots, f_m]^T \in R^m$ the performance vector, $z^k = [z_1^k, z_2^k, \ldots, z_m^k]^T \in R^m$ the measured performance vector of the kth device and $w$ a vector of excitations, $w = [w_1, w_2, \ldots, w_l]^T \in R^l$. Considering Eq. (A.84)
$$\dot q - E\chi = 0 \qquad q_0 = q(0) \qquad f(q, \chi, w, p, t) = 0 \qquad (A.94)$$
a general model can be written. The measurements can only be made under certain selected values of w, and if the initial conditions $q_0$ are met, the model can be simply denoted as

$$z = f(p) \qquad (A.95)$$
To extract a parameter vector pk corresponding to the kth device
$$p^k = \arg\min_{p^k\in R^n}\left\|f(p^k) - z^k\right\| \qquad (A.96)$$
is found. The weighted sum of error squares for the kth device is formed as [13]
$$\varepsilon(p^k) = \frac{1}{2}\sum_{i=1}^{m}w_i\left[f_i(p^k) - z_i^k\right]^2 = \frac{1}{2}\left[f(p^k) - z^k\right]^TW\left[f(p^k) - z^k\right] \qquad (A.97)$$
So, for the measured performance vector $z^k$ of the kth device, an approximate estimate of the model parameter vector is obtained from a Gauss-Newton step, where H is the Hessian matrix [22], whose elements are the second-order derivatives of $\varepsilon$ with respect to the parameters. Writing the residual as

$$r = \frac{\partial r}{\partial p_r}\,\Delta p_r + \varepsilon \qquad (A.105)$$

the parameter deviations $\Delta p_r$ follow from minimizing

$$\left\|r - \frac{\partial r}{\partial p_r}\,\Delta p_r\right\|^2 \qquad (A.106)$$

A parameter is testable if the variance of its estimated deviation is below a certain limit. The off-diagonal elements of $C_{p_r}$ contain the parameter covariances.
If an accuracy check shows that the performance function extraction is not
accurate enough, the performance function correction is performed to refine the
extraction. The basic idea underlying performance function correction is to correct
the errors of performance function extraction based on the given model and the
knowledge obtained from the previous stages by iteration process. Denoting
$$\phi_{(i)}^k(p) = \phi_0 + \Delta\phi_{(i)}^k \qquad (A.109)$$
the extracted performance function vector for the kth device at the ith iteration,
performance function correction can be found by finding the transformation $\phi_{(i+1)}^k = F_i(\phi_{(i)}^k)$ such that more accurate performance function vectors can be extracted, subject to

$$\left\|\phi_{(i+1)}^k - \phi^k(*)\right\| < \left\|\phi_{(i)}^k - \phi^k(*)\right\| \qquad (A.110)$$
where

$$\phi^k(*) = \arg\min_{\phi^k\in R^n}\varepsilon(\phi^k) \qquad (A.111)$$
is the ideal solution of the performance function. The error correction mapping Fi
is selected in the form of
$$\phi_{(i+1)}^k(p) = \phi_{(i)}^k(p) + d_i\left(\phi_{(i)}^k\right) \qquad (A.112)$$

where the set

$$\left\{d_i^k,\ \phi_{(i)}^k,\ k = 1, 2, \ldots, K\right\} \qquad (A.113)$$
gives the information relating the errors due to inaccurate parameter extraction to
the extracted parameter values. A quadratic function is postulated to approximate
the error correction function
$$d_t = \sum_{j=1}^{n}\alpha_{tj}\,\Delta p_j + \sum_{j=1}^{n}\sum_{l=1}^{n}\beta_{tjl}\,\Delta p_j\,\Delta p_l, \qquad t = 1, 2, \ldots, n \qquad (A.114)$$
$$\phi_{(i+1)}^k = \phi_{(i)}^k(p) + d_i\left(\phi_{(i)}^k\right) \qquad (A.116)$$
$$n = \left(\frac{2z_{1-\alpha/2}}{\varepsilon}\right)^2 \qquad (A.121)$$

If, for example, a mean value has to be estimated with a relative error $\varepsilon = \Delta\mu/\mu = 0.1$ and a confidence level of $1 - \alpha = 0.99$ ($z_{1-\alpha/2} \approx 2.5$), the sample size is n = 2500. Similarly, for the estimate of the variance
$$\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\chi_i - \bar\chi\right)^2 \qquad (A.122)$$
in order to provide that the estimate $\hat\sigma^2$ falls with probability $\gamma$ into the interval

$$\sigma^2 - \frac{\varepsilon\sigma^2}{2} \le \hat\sigma^2 \le \sigma^2 + \frac{\varepsilon\sigma^2}{2} \qquad (A.124)$$
For example, the required number of samples for an accuracy of $\varepsilon = 0.1$ and a confidence level of 0.99 is n = 1250.
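The sample-size rule can be evaluated directly; the sketch below uses the exact normal quantile rather than the text's rounded value of 2.5, so it returns slightly more than n = 2500:

```python
from statistics import NormalDist

def mc_sample_size_mean(rel_error, confidence):
    """Samples needed for the sample mean to meet a relative error at the
    given confidence, per (A.121): n = (2*z_{1-alpha/2}/eps)^2.
    The text rounds z_{1-alpha/2} to 2.5 for 99 % confidence, giving 2500."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (2.0 * z / rel_error) ** 2

n = mc_sample_size_mean(rel_error=0.1, confidence=0.99)
```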
$$\frac{\partial\phi(V_{Ti}, j\omega)}{\partial V_{Ti}} = d^T\,T^{-1}(V_{Ti}, j\omega)\left[\frac{\partial W(V_{Ti}, j\omega)}{\partial V_{Ti}} - \frac{\partial T(V_{Ti}, j\omega)}{\partial V_{Ti}}\,X(V_{Ti}, j\omega)\right] \qquad (A.128)$$

$$\frac{\partial\phi(\beta_i, j\omega)}{\partial\beta_i} = d^T\,T^{-1}(\beta_i, j\omega)\left[\frac{\partial W(\beta_i, j\omega)}{\partial\beta_i} - \frac{\partial T(\beta_i, j\omega)}{\partial\beta_i}\,X(\beta_i, j\omega)\right] \qquad (A.129)$$
The first-order derivatives of the magnitude of the circuit performance function are
computed from
$$\frac{\partial|\phi(j\omega)|}{\partial V_{Ti}} = |\phi(V_{Ti}, j\omega)|\,\mathrm{Re}\left[\frac{1}{\phi(V_{Ti}, j\omega)}\,\frac{\partial\phi(V_{Ti}, j\omega)}{\partial V_{Ti}}\right] \qquad (A.130)$$

$$\frac{\partial|\phi(j\omega)|}{\partial\beta_i} = |\phi(\beta_i, j\omega)|\,\mathrm{Re}\left[\frac{1}{\phi(\beta_i, j\omega)}\,\frac{\partial\phi(\beta_i, j\omega)}{\partial\beta_i}\right] \qquad (A.131)$$
where Re denotes the real part of the complex variable function. The second-order derivatives are calculated from (writing $\phi$ for $\phi(V_{Ti}, j\omega)$ and $\phi(\beta_i, j\omega)$, respectively)

$$\frac{\partial^2|\phi|}{\partial V_{Ti}^2} = |\phi|\left\{\mathrm{Re}\left[\frac{1}{\phi}\frac{\partial\phi}{\partial V_{Ti}}\right]\right\}^2 + |\phi|\,\mathrm{Re}\left[\frac{1}{\phi}\frac{\partial^2\phi}{\partial V_{Ti}^2} - \frac{1}{\phi^2}\left(\frac{\partial\phi}{\partial V_{Ti}}\right)^2\right] \qquad (A.132)$$

$$\frac{\partial^2|\phi|}{\partial\beta_i^2} = |\phi|\left\{\mathrm{Re}\left[\frac{1}{\phi}\frac{\partial\phi}{\partial\beta_i}\right]\right\}^2 + |\phi|\,\mathrm{Re}\left[\frac{1}{\phi}\frac{\partial^2\phi}{\partial\beta_i^2} - \frac{1}{\phi^2}\left(\frac{\partial\phi}{\partial\beta_i}\right)^2\right] \qquad (A.133)$$
The circuit performance function $\phi(j\omega)$ can be approximated with the truncated Taylor expansion

$$\phi(j\omega) \cong \bar\phi(j\omega) + J_\phi(j\omega)\,\Delta p \qquad (A.134)$$

where the covariance matrix of the circuit performance function, $C_\phi(j\omega)$, is defined as

$$C_\phi(j\omega) = J_\phi(j\omega)\,C_p\,J_\phi(j\omega)^T \qquad (A.136)$$

with
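First-order covariance propagation of this kind can be sketched without any linear-algebra library; the Jacobian and parameter covariance below are illustrative assumptions:

```python
def propagate_covariance(J, C_p):
    """First-order propagation of a parameter covariance to a performance
    covariance: C_phi = J * C_p * J^T (small dense matrices as lists)."""
    rows, n = len(J), len(C_p)
    JC = [[sum(J[i][k] * C_p[k][j] for k in range(n)) for j in range(n)]
          for i in range(rows)]
    return [[sum(JC[i][k] * J[j][k] for k in range(n)) for j in range(rows)]
            for i in range(rows)]

# Two uncorrelated parameters; one performance function phi = 2*p1 + 3*p2,
# so Var(phi) = 2^2 * 0.01 + 3^2 * 0.04 = 0.4.
C_phi = propagate_covariance([[2.0, 3.0]], [[0.01, 0.0], [0.0, 0.04]])
```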
i +Li x
x j +Lj yi+Wi yj+Wj
1
Cp1 p1 = Rp1 p1 (xA , yA , xB , yB )
ij (Wi Li )(Wj Lj )
xi xj yi yj
p1 (xA , yA )p1 (xB , yB ) dxA dxB dyA dyB
(A.139)
i +Li x
x j +Lj yi+Wi yj+Wj
1
Cp1 p2 = Rp1 p2 (xA , yA , xB , yB )
ij (Wi Li )(Wj Lj )
xi xj yi yj
p1 (xA , yA )p2 (xB , yB ) dxA dxB dyA dyB
(A.140)
and Rp1p1(xA, yA, xB, yB), the autocorrelation function of the stochastic process p1,
is defined as the joint moment of the random variable p1(xA, yA) and p1(xB, yB), i.e.,
Rp1p1(xA, yA, xB, yB) = E{p1(xA, yA) p1(xB, yB)}, which is a function of xA, yA and xB, yB
and Rp1p2(xA, yA, xB, yB) = E{p1(xA, yA)p2(xB, yB)} the cross-correlation function of
the stochastic processes $p_1$ and $p_2$. The experimental data show that the threshold voltage differences $\Delta V_T$ and current factor differences $\Delta\beta$ are the dominant sources underlying the drain-source current or gate-source voltage mismatch for a matched pair of MOS transistors.
The covariance $\sigma_{p_ip_j} = 0$ for $i \ne j$ if $p_i$ and $p_j$ are uncorrelated. Thus the covariance matrix $C_P$ of $p_1, \ldots, p_k$ with means $\mu_{p_i}$ and variances $\sigma_{p_i}^2$ is

$$C_{p_1,\ldots,p_k} = \mathrm{diag}\left(\sigma_{p_1}^2, \ldots, \sigma_{p_k}^2\right) \qquad (A.141)$$
In [10] these random differences for a single transistor, having a normal distribution with zero mean and a variance dependent on the device area WL, are derived as

$$C_{p_1p_1,ij} = \sigma_{V_T}^2, \quad \sigma_{V_T} = \frac{A_{V_T}/\sqrt{2}}{\sqrt{W_{eff}L_{eff}}} + \frac{B_{V_T}}{\sqrt{2}} + S_{V_T}D \;\;\text{for } i = j; \qquad C_{p_1p_1,ij} = 0 \;\;\text{for } i \ne j \qquad (A.142)$$

$$C_{p_2p_2,ij} = \sigma_{\Delta\beta/\beta}^2, \quad \sigma_{\Delta\beta/\beta} = \frac{A_\beta/\sqrt{2}}{\sqrt{W_{eff}L_{eff}}} + \frac{B_\beta}{\sqrt{2}} + S_\beta D \;\;\text{for } i = j; \qquad C_{p_2p_2,ij} = 0 \;\;\text{for } i \ne j \qquad (A.143)$$
where $W_{eff}$ is the effective gate-width and $L_{eff}$ the effective gate-length, the proportionality constants $A_{VT}$, $S_{VT}$, $A_\beta$, and $S_\beta$ are technology-dependent factors, $D$ is the device separation distance and $B_{VT}$ and $B_\beta$ are constants.
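The area scaling in these expressions can be illustrated with a small helper; the sketch below uses the pair-mismatch form sigma(dVT) = A_VT/sqrt(W*L) + B_VT + S_VT*D (dropping the single-device 1/sqrt(2) factor) with an assumed A_VT value:

```python
import math

def sigma_delta_vt(W_eff, L_eff, A_vt, B_vt=0.0, S_vt=0.0, D=0.0):
    """Threshold-mismatch standard deviation of a matched pair, Pelgrom
    style: sigma(dVT) = A_VT/sqrt(W*L) + B_VT + S_VT*D."""
    return A_vt / math.sqrt(W_eff * L_eff) + B_vt + S_vt * D

# Quadrupling the gate area halves the mismatch sigma (area scaling law).
s1 = sigma_delta_vt(W_eff=1.0, L_eff=1.0, A_vt=4e-3)  # A_VT assumed 4 mV*um
s2 = sigma_delta_vt(W_eff=2.0, L_eff=2.0, A_vt=4e-3)
```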
$$\sigma_\phi^2 = \sum_{i=1}^{n}\left[\left(\frac{\partial|\phi(V_{Ti}, j\omega)|}{\partial V_{Ti}}\right)^2\sigma_{V_{Ti}}^2 + \left(\frac{\partial|\phi(\beta_i, j\omega)|}{\partial\beta_i}\right)^2\sigma_{\beta_i}^2\right] \qquad (A.145)$$
where n is the total number of transistors in the circuit and $\bar\phi$ is the mean of $\phi = f(V_T(j\omega), \beta(j\omega))$ over the local or global parametric variations.
and

$$P(G) = P(\chi_n \in G \mid G) = \int_Gf_{\chi_n}(\chi_n\mid G)\,d\chi_n = 1 - \int_Ff_{\chi_n}(\chi_n\mid G)\,d\chi_n = 1 - \alpha \qquad (A.149)$$

$$P(F) = P(\chi_n \in F \mid F) = \int_Ff_{\chi_n}(\chi_n\mid F)\,d\chi_n = 1 - \int_Gf_{\chi_n}(\chi_n\mid F)\,d\chi_n = 1 - \beta \qquad (A.150)$$
Recall that if $\chi \sim N(\mu, \sigma^2)$, then $Z = (\chi - \mu)/\sigma \sim N(0, 1)$. In the present case, the sample mean of $\chi$, $\bar\chi \sim N(\mu, \sigma^2/n)$, since the variable is assumed to have a normal distribution. Since $\alpha$ and $\beta$ represent probabilities of events from the
same decision problem, they are not independent of each other or of the sample
size. Evidently, it would be desirable to have a decision process such that both
and are small. However, in general, a decrease in one type of error leads to an
increase in the other type for a fixed sample size. The only way to simultaneously
reduce both types of errors is to increase the sample size. However, this proves
to be a time-consuming process. The Neyman-Pearson test is a special case of the Bayes test, which provides a workable solution when the a priori probabilities may be unknown or the Bayes average costs of making a decision may be difficult to evaluate or set objectively. The Neyman-Pearson test is based on the critical region $C^* \subset \Xi$, where $\Xi$ is the sample space of the test statistics:

$$C^* = \left\{(\chi_1, \ldots, \chi_n) : l(\chi_1, \ldots, \chi_n \mid G, F) \ge \kappa\right\} \qquad (A.151)$$
which has the largest power (smallest $\beta$, the probability that a faulty circuit is accepted) of all tests with significance level $\alpha$. Introducing the Lagrange multiplier $\lambda$ to account for the constraint gives the following cost function, J, which must be maximized with respect to the test and $\lambda$:

$$J = (1 - \beta) + \lambda(\alpha_0 - \alpha) = \lambda\alpha_0 + \int_{C^*}\left[f_{\chi_n}(\chi_n \mid F) - \lambda f_{\chi_n}(\chi_n \mid G)\right]d\chi_n \qquad (A.152)$$
Now,

$$\sum_{i=1}^{n}(\chi_i - \mu_F)^2 - \sum_{i=1}^{n}(\chi_i - \mu_G)^2 = n\left(\mu_F^2 - \mu_G^2\right) - 2n\bar\chi\left(\mu_F - \mu_G\right) \qquad (A.156)$$
Using the Neyman-Pearson Lemma, the critical region of the most powerful test of significance level $\alpha$ is

$$C^* = \left\{\chi_1, \ldots, \chi_n : \exp\left(-\frac{1}{2\sigma^2}\left[n(\mu_F^2 - \mu_G^2) - 2n\bar\chi(\mu_F - \mu_G)\right]\right) \ge \kappa\right\}$$

$$\phantom{C^*} = \left\{\chi_1, \ldots, \chi_n : \bar\chi \ge \frac{\sigma^2\ln\kappa}{n(\mu_F - \mu_G)} + \frac{\mu_F + \mu_G}{2}\right\} = \left\{\chi_1, \ldots, \chi_n : \bar\chi \ge \kappa''\right\} \qquad (A.157)$$
For the test to be of significance level $\alpha$,

$$P\left(\bar\chi \ge \kappa'' \;\Big|\; \bar\chi \sim N(\mu_G, \sigma^2/n)\right) = P\left(Z \ge \frac{\kappa'' - \mu_G}{\sigma/\sqrt{n}}\right) = \alpha \;\;\Rightarrow\;\; \kappa'' = \mu_G + z(1-\alpha)\,\frac{\sigma}{\sqrt{n}} \qquad (A.158)$$

where $P(Z < z(1-\alpha)) = 1 - \alpha$, which can also be written as $\Phi(z(1-\alpha)) = 1 - \alpha$; $z(1-\alpha)$ is the $(1-\alpha)$-quantile of Z, the standard normal distribution. This boundary for the critical region guarantees, by the Neyman-Pearson lemma, the smallest value of $\beta$ obtainable for the given values of $\alpha$ and n. From the two previous equations, we can see that the test T rejects for
$$T = \frac{\bar\chi - \mu_G}{\sigma/\sqrt{n}} \ge z(1-\alpha) \qquad (A.159)$$
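The decision rule of (A.159) can be sketched as a one-sided z-test; the sample values and parameters below are illustrative assumptions:

```python
from statistics import NormalDist

def np_test_accepts(samples, mu_good, sigma, alpha):
    """One-sided z-test per (A.159): reject the 'good' hypothesis when
    T = (mean - mu_G)/(sigma/sqrt(n)) exceeds the quantile z(1-alpha)."""
    n = len(samples)
    mean = sum(samples) / n
    t = (mean - mu_good) / (sigma / n ** 0.5)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return t < z  # True: accept the 'good' hypothesis

# A sample mean near mu_G is accepted; one far above it is rejected.
ok = np_test_accepts([1.0, 1.1, 0.9, 1.0], mu_good=1.0, sigma=0.1, alpha=0.05)
bad = np_test_accepts([1.5, 1.6, 1.4, 1.5], mu_good=1.0, sigma=0.1, alpha=0.05)
```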
Similarly, to construct a test for the two-sided alternative, one approach is to com-
bine the critical regions for testing the two one-sided alternatives. The two one-
sided tests form a critical region of
$$C^* = \left\{(\chi_1, \ldots, \chi_n) : \bar\chi \le \kappa_2,\ \bar\chi \ge \kappa_1\right\} \qquad (A.160)$$
$$\kappa_1 = \mu_G + z\left(1-\frac{\alpha}{2}\right)\frac{\sigma}{\sqrt{n}} \qquad \kappa_2 = \mu_G - z\left(1-\frac{\alpha}{2}\right)\frac{\sigma}{\sqrt{n}} \qquad (A.161)$$
Thus, the test T rejects for
$$T = \frac{\bar\chi - \mu_G}{\sigma/\sqrt{n}} \ge z\left(1-\frac{\alpha}{2}\right) \quad \text{or} \quad T = \frac{\bar\chi - \mu_G}{\sigma/\sqrt{n}} \le -z\left(1-\frac{\alpha}{2}\right) \qquad (A.162)$$
$$T = \frac{\bar\chi - \mu_G}{S/\sqrt{n}} \ge t_{n-1,\alpha} \qquad (A.165)$$
A critical region for the two-sided alternative, if the variance $\sigma^2$ is unknown, has the form

$$C^* = \left\{(\chi_1, \ldots, \chi_n) : t = \frac{\bar\chi - \mu_G}{S/\sqrt{n}} \le \kappa_2,\ t \ge \kappa_1\right\} \qquad (A.166)$$

and the test rejects for

$$T = \frac{\bar\chi - \mu_G}{S/\sqrt{n}} \ge t_{n-1,\alpha/2} \quad \text{or} \quad T = \frac{\bar\chi - \mu_G}{S/\sqrt{n}} \le -t_{n-1,\alpha/2} \qquad (A.168)$$
References
1. R.P. Jindal, Compact noise models for MOSFETs. IEEE Trans. Electron Devices 53(9), 2051–2061 (2006)
2. J. Ou, gm/ID based noise analysis for CMOS analog circuits, in Proceedings of IEEE International Midwest Symposium on Circuits and Systems, pp. 1–4, 2011
3. W. Wattanapanitch, M. Fee, R. Sarpeshkar, An energy-efficient micropower neural recording amplifier. IEEE Trans. Biomed. Circuits Syst. 1(2), 136–147 (2007)
4. M. Zamani, A. Demosthenous, Power optimization of neural frontend interfaces, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 3008–3011, 2015
5. C.C. Liu et al., A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure. IEEE J. Solid-State Circuits 45(4), 731–740 (2010)
6. D. Zhang, C. Svensson, A. Alvandpour, Power consumption bounds for SAR ADCs, in Proceedings of IEEE European Conference on Circuit Theory and Design, pp. 556–559, 2011
7. T. Yu, S. Kang, I. Hajj, T. Trick, Statistical modeling of VLSI circuit performances, in Proceedings of IEEE International Conference on Computer-Aided Design, pp. 224–227, 1986
8. K. Krishna, S. Director, The linearized performance penalty (LPP) method for optimization of parametric yield and its reliability. IEEE Trans. CAD Integr. Circuits Syst. 1557–1568 (1995)
9. MOS Model 9, available at http://www.nxp.com/models/mos-models/model-9.html
10. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circuits 24(5), 1433–1439 (1989)
11. V. Litovski, M. Zwolinski, VLSI Circuit Simulation and Optimization (Kluwer Academic Publishers, Dordrecht, 1997)
12. K. Kundert, Designer's Guide to Spice and Spectre (Kluwer Academic Publishers, Dordrecht, 1995)
13. J. Vlach, K. Singhal, Computer Methods for Circuit Analysis and Design (Van Nostrand Reinhold, New York, 1983)
14. N. Higham, Accuracy and Stability of Numerical Algorithms (SIAM, Philadelphia, 1996)
15. W.J. McCalla, Fundamentals of Computer-Aided Circuit Simulation (Kluwer Academic Publishers, Dordrecht, 1988)
16. F. Scheid, Schaum's Outline of Numerical Analysis (McGraw-Hill, New York, 1989)
17. E. Cheney, Introduction to Approximation Theory (American Mathematical Society, Providence, 2000)
18. S. Director, R. Rohrer, The generalized adjoint network and network sensitivities. IEEE Trans. Comput. Aided Des. 16(2), 318–323 (1969)
19. D. Hocevar, P. Yang, T. Trick, B. Epler, Transient sensitivity computation for MOSFET circuits. IEEE Trans. Comput. Aided Des. CAD-4, 609–620 (1985)
20. Y. Elcherif, P. Lin, Transient analysis and sensitivity computation in piecewise-linear circuits. IEEE Trans. Circuit Syst. I 38, 1525–1533 (1991)
21. T. Nguyen, P. O'Brien, D. Winston, Transient sensitivity computation for transistor level analysis and tuning, in Proceedings of IEEE International Conference on Computer-Aided Design, pp. 120–123, 1999
22. K. Abadir, J. Magnus, Matrix Algebra (Cambridge University Press, Cambridge, 2005)
23. A. Papoulis, Probability, Random Variables, and Stochastic Processes (McGraw-Hill, New York, 1991)
24. C. Gerald, Applied Numerical Analysis (Addison Wesley, Reading, 2003)
Index

K
Karhunen-Loeve expansion, 99, 100
Karush–Kuhn–Tucker conditions, 82, 85
Kernel, 13, 77, 78, 80, 82–84, 86, 87, 89, 91, 113, 125, 127
Kesler's construction, 13, 77, 78, 83, 125
K-means, 78
Kronecker delta, 84

L
Least significant bit, 39
Local field potentials, 19, 33
Low noise amplifier, 13, 18, 19, 124
Lyapunov equations, 109, 110

P
Parameter space, 81, 103, 111
Parameter vector, 147
Parametric yield, 102
Parametric yield optimization, 102
Pedestal voltage, 42
Phase margin, 21
Pipeline converters, 37, 38
Power per area, 14, 96, 97, 112, 117–119, 126
Principal component analysis, 78
Probability density function, 105, 111
Process variation, 11, 14, 96, 104, 110, 116, 119, 126
Programmable gain amplifier, 19, 52
Push-pull current mirror amplifier, 24

R
Random error, 11
Random gate length variability, 8, 116
Random intra-chip variability, 116
Random process, 11, 97–100
Random variability, 97
Random variables, 98, 100
Random vector, 105
Reliability, 11, 43, 111, 118
Residuals, 148
Runtime, 114

S
Sample and hold, 39, 58, 59, 66
Schur decomposition, 109
Sensors, 18, 124, 126, 127
Short-channel effects, 43, 96
Signal to noise and distortion ratio, 65
Signal-to-noise ratio, 3, 17, 33, 43, 119, 133
Significance level, 9
Slew rate, 8, 24, 45, 46
Spatial correlation, 100
Spike classifier, 13, 78, 79, 81, 91, 125
Spurious free dynamic range, 65
Standard deviation, 86, 98
Static latch, 47–49
Stationary random process, 98
Stochastic differential equations, 103, 105–108
Stochastic process, 98, 101, 102, 108
Subrange, 35
Substrate coupling, 26
Successive approximation register, 38, 39
Support vector machine, 13, 78, 79, 81, 83, 91, 125
Surface potential-based models, 98
Switched capacitor, 40, 41, 44, 118
System on chip, 1, 3, 9, 11, 12, 44, 96, 124

T
Telescopic cascode amplifier, 22–24, 26
Template matching, 78, 88
Threshold voltage, 42–44, 51, 80, 98, 100, 101
Threshold voltage-based models, 98
Time-interleaved systems, 38
Tolerance, 42, 102, 103, 117, 119
Total harmonic distortion, 13, 18, 29, 65, 124
Transconductor, 13, 18, 21, 30, 124
Transient analysis, 106, 107
Two-stage amplifier, 25–27, 46
Two-step converter, 36–38

U
Unbiased estimator, 157
Utah array, 2

V
Variable gain amplifier, 25
Vernier, 60, 62–64, 69
Very large-scale integrated circuit, 3
Voltage-to-time converter, 34, 61, 62, 67
Voltage variability, 2, 116, 126

W
Wafer, 98
Wide-sense stationary, 98
Wiener process, 108
Within-die, 103, 107
Worst-case design, 115, 119

Y
Yield, 3, 11, 12, 14, 84, 89, 96, 97, 103, 106, 110, 111, 116–120, 124, 126