Research Presentation

Nirav A. Desai desai.nirav.12.09@gmail.com
MM-Wave Active Sensor: BPSK Spectrum can be seen in the Spectrum Analyzer
I assisted in these mm-wave MIMO experiments at UCSB
EE 5323: VLSI DESIGN 1 PROJECT
Course Instructor: Prof. Chris Kim
16-bit BRENT KUNG ADDER DESIGN in 45 nm CMOS
Nirav Desai ID: 4280229
Department of Electrical and Computer Engineering
University of Minnesota
Brent Kung Adder Gate Level Diagram
1. Input Block with Pre Computation
[Figure: gate-level schematic of the input block. Labels: Input Adder Chain 1, Input Adder Chain 2, Input Adder Chain 3, Input Adder Chain 4, Gi + Pi*Gi-1, Pi*Pi-1, Output Buffers to drive Capacitive Loads. Gate sizes marked on the schematic: 1.097X, 1X, 3.883X, 1.224X, 1.562X, 1X, 10.1683X, 36X, 1X, 1.553X, 3.043X, 1.108X, 1.23X, 1X, 1.274X, 1.034X, 2.943X, 10.8506X, 40X.]
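Brent Kung Adder Gate Level Diagram
2. Intermediate Dot Product Blocks
[Figure: gate-level schematic of the intermediate dot product block. Labels: Intermediate Adder Chain 1, Intermediate Adder Chain 2, Gi + Pi*Gi-1, Pi*Pi-1. Gate sizes marked on the schematic: 1X, 6X, 1.72X, 1X, 1X, 4X, 16X, 16X, 1X.]
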
Brent Kung Adder Gate Level Diagram
3. Output Block for Post Computation
[Figure: gate-level schematic of the output block. Labels: Pi, Ci-1, Si. Gate sizes marked on the schematic: 1.182X, 1.117X.]
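The three blocks above (pre computation, dot product, post computation) can be illustrated with a small behavioral model. This is a minimal sketch, not the gate-level design from the slides: the function names are hypothetical, the carry-in is assumed to be 0, and the width is assumed to be a power of two.

```python
def dot(hi, lo):
    """Prefix 'dot' operator on (G, P) pairs: (Gi + Pi*Gi-1, Pi*Pi-1)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def brent_kung_add(a, b, width=16):
    """Behavioral model of a Brent-Kung adder (carry-in assumed 0)."""
    # Pre computation: per-bit generate (AND) and propagate (XOR).
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(width)]
    p_bits = [p for _, p in gp]

    # Up-sweep of the prefix tree: combine spans of 2, 4, 8, ... bits.
    step = 1
    while step < width:
        for i in range(2 * step - 1, width, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])
        step *= 2

    # Down-sweep: fill in the prefixes the up-sweep did not produce.
    step //= 2
    while step >= 1:
        for i in range(3 * step - 1, width, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])
        step //= 2

    # Post computation: Si = Pi XOR Ci-1, where Ci is the group generate.
    carries = [g for g, _ in gp]
    total = 0
    for i in range(width):
        c_in = carries[i - 1] if i > 0 else 0
        total |= (p_bits[i] ^ c_in) << i
    return total, carries[width - 1]
```

Calling `brent_kung_add(0xFFFF, 0x0001)` exercises the long carry ripple through every bit position, which is the same input pattern the deck uses for its worst case delay slide.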
Brent Kung Adder Transistor Level Design

XOR GATE
Brent Kung Adder Transistor Level Design: Inverter Design Optimization
[Plot: TD*Iavg (y-axis, 40 to 110) versus PMOS Width in nm (x-axis, 120 to 300)]
NMOS Width = 90 nm
PMOS / NMOS Length = 50 nm
Vdd = 1.1 V
Current averaged over one period of 2 ns
Optimal PMOS Width = 165 nm
PMOS/NMOS width ratio for the inverter = 165/90 = 1.833
Sizing for NAND, NOR and XOR changed appropriately

1. Input Block with Pre Computation
Logical Effort Design for Signal Chains labeled in previous slide #2

Input Adder Block Chain 1
Gate Number  Gate Name  g value  f value   b value  S Value
1            BUFFER     1.000    3.540     2.893    1.000
2            INVERTER   1.000    3.540     2.400    1.224
3            NOR        1.646    2.151     1.000    1.097
4            INVERTER   1.000    3.540     1.000    3.883
5            NAND       1.352    2.618648  1.000    10.16831
             LOAD                                   36.000
Stage G = 2.225, Stage F = 36.000, Stage B = 6.943, Stage H = 556.248, Gate h = 3.540

Input Adder Block Chain 2
Gate Number  Gate Name  g value  f value  b value  S Value
1            BUFFER     1.000    4.518    2.893    1.000
2            INVERTER   1.000    4.518    2.400    1.562
3            XOR        1.893    2.386    1.780    1.553
4            NAND       1.295    3.488    1.000    3.043
             LOAD                                  13.748
Stage G = 2.451, Stage F = 13.748, Stage B = 12.359, Stage H = 416.510, Gate h = 4.518

Input Adder Block Chain 3
Gate Number  Gate Name  g value  f value  b value  S Value
1            BUFFER     1.000    3.558    2.893    1.000
2            INVERTER   1.000    3.558    2.400    1.230
3            NOR        1.646    2.162    1.000    1.108
             LOAD                                  3.941
Stage G = 1.646, Stage F = 3.941, Stage B = 6.943, Stage H = 45.038, Gate h = 3.558

Input Adder Block Chain 4
Gate Number  Gate Name  g value  f value   b value  S Value
1            BUFFER     1.000    3.686     2.893    1.000
2            INVERTER   1.000    3.686     2.400    1.274
3            XOR        1.893    1.947     1.000    1.034
4            NAND       1.295    2.847     1.000    2.943
5            INVERTER   1.000    3.686447  1.000    10.85056
             LOAD                                   40.000
Stage G = 2.451, Stage F = 40.000, Stage B = 6.943, Stage H = 680.832, Gate h = 3.686
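The Chain 1 numbers can be reproduced with a short logical effort calculation. This is a sketch under one assumption about the slide's convention: S is read as the size of each gate relative to a unit gate of the same type, so each size follows from the previous one as S_next = S * h / (b * g_next). The function name `size_chain` is hypothetical.

```python
from math import prod


def size_chain(g, b, load):
    """Size a gate chain for equal stage effort.

    g[i], b[i]: logical effort and branching effort of stage i.
    load: load as a multiple of the first stage's size (Stage F).
    Returns (stage effort h, list of relative gate sizes S).
    """
    n = len(g)
    path_effort = prod(g) * prod(b) * load   # Stage H = G * B * F
    h = path_effort ** (1.0 / n)             # equal effort per stage
    sizes = [1.0]
    for i in range(n - 1):
        # Assumed convention: divide by the branching of the driving
        # stage and by the logical effort of the gate being sized.
        sizes.append(sizes[-1] * h / (b[i] * g[i + 1]))
    return h, sizes


# Input Adder Block Chain 1: BUFFER, INVERTER, NOR, INVERTER, NAND
h, sizes = size_chain(
    g=[1.0, 1.0, 1.646, 1.0, 1.352],
    b=[2.893, 2.4, 1.0, 1.0, 1.0],
    load=36.0,
)
print(round(h, 3), [round(s, 3) for s in sizes])
```

Under this reading, the last gate times the stage effort lands back on the 36X load, which is a useful consistency check on the table.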

2. Intermediate Dot Product Blocks
Logical Effort Design for Signal Chains labeled in previous slide #3

Intermediate Adder Block Chain 1
Gate Number  Gate Name  g value  f value  b value  S Value
1            INVERTER   1.000    2.848    1.000    1.000
2            NAND       1.352    2.107    1.000    2.107
             LOAD                                  6.000
Stage G = 1.352, Stage F = 6.000, Stage B = 1.000, Stage H = 8.112, Gate h = 2.848

Intermediate Adder Block Chain 2
Gate Number  Gate Name  g value  f value  b value  S Value
1            BUFFER     1.000    2.775    2.000    1.000
2            NAND       1.352    2.053    1.000    1.026
             LOAD                                  2.848
Stage G = 1.352, Stage F = 2.848, Stage B = 2.000, Stage H = 7.701, Gate h = 2.775
Brent Kung Adder Simulated Performance
Simulations with maximally sized 1 stage buffers as determined by Logical Effort Design of individual chains

Voltage (V)  Delay Max-C14 (ns)  Power Max (mW)  Power-Delay Product (xE-12 J)
1.1          0.359               6.73            2.41
0.9          0.503               2.95            1.483
0.7          0.937               0.924           0.865

Simulations with minimally sized 1 stage buffers

Voltage (V)  Delay Max-C14 (ns)  Power Max (mW)  Power-Delay Product (xE-12 J)
1.1          0.403               5.186           2.089
0.9          0.569               2.277           1.295
0.7          1.069               0.692           0.739

Without parasitic extraction and interconnect parasitics, buffering does not improve performance significantly.
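The trade-off in the two tables can be checked numerically. The sketch below only re-derives the power-delay product from the tabulated delay and power and compares the two buffer sizings; the names `pdp_pj` and `compare` are hypothetical helpers, not part of the original project.

```python
# (Vdd in V, delay in ns, max power in mW), copied from the tables above.
MAX_BUFFERS = [(1.1, 0.359, 6.73), (0.9, 0.503, 2.95), (0.7, 0.937, 0.924)]
MIN_BUFFERS = [(1.1, 0.403, 5.186), (0.9, 0.569, 2.277), (0.7, 1.069, 0.692)]


def pdp_pj(delay_ns, power_mw):
    """Power-delay product in pJ (ns * mW = 1e-12 J)."""
    return delay_ns * power_mw


def compare(max_rows, min_rows):
    """Per-voltage delay gain and power cost of the larger buffers."""
    rows = []
    for (vdd, d_max, p_max), (_, d_min, p_min) in zip(max_rows, min_rows):
        rows.append({
            "vdd": vdd,
            "delay_gain_pct": 100.0 * (d_min - d_max) / d_min,
            "power_cost_pct": 100.0 * (p_max - p_min) / p_min,
            "pdp_max_pj": pdp_pj(d_max, p_max),
            "pdp_min_pj": pdp_pj(d_min, p_min),
        })
    return rows


for row in compare(MAX_BUFFERS, MIN_BUFFERS):
    print(row)
```

At every supply voltage the larger buffers buy roughly a tenth of the delay at a larger relative power cost, so the power-delay product is worse with them, consistent with the remark that buffering does not help much before parasitics are included.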
Brent Kung Adder Worst Case Delay
Input Pattern: A: FFFF, B: 0000 -> 0001
Dotted lines show Carry Bits 15 and 14
[Waveform plot; traces labeled Carry Bit 15 and Carry Bit 14]
Brent Kung Adder Layout: Input Block with Pre Computation
Layout labels: Input Inverters for Bit 0 and Bit 1, XOR, NAND 10X
Output Buffers: PEX waveforms show a larger size may be needed
Brent Kung Adder Layout: XOR 1.553X
Brent Kung Adder Layout: NAND 10.57X
Layout with interdigitated fingers to reduce parasitics
Brent Kung Adder Layout: Intermediate Dot Product Generator
Output Buffers: PEX waveforms show a larger size may be necessary here
Brent Kung Adder Layout: Output Stage with Buffers
Brent Kung Adder Layout: Full Layout: 49.5 um x 48.6 um
Future Design Modifications

The design uses large buffers at the output of every stage to drive large capacitances. The buffers are not needed at nodes with low fanout and can be eliminated. The buffers at the input nodes currently increase power consumption and add to the delay. Thus the overall performance can be improved with fewer buffers.
References:
Course slides from Prof. Kia Bazargan's course on VLSI
A Taxonomy of Parallel Prefix Networks (David Harris), reference paper on course website
Digital Integrated Circuits by Jan Rabaey
SRAM DESIGN PROJECT PHASE 2

Nirav Desai 4280229
VLSI DESIGN 2: Prof. Kia Bazargan
Dept. of ECE, College of Science and Engineering
University of Minnesota, Twin Cities
SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE

NMOS inverter = 110 nm
PMOS inverter = 220 nm
NMOS Access = 90 nm
NMOSinv/NMOSaccess = 1.2
PMOSinv/NMOSaccess = 2.4
Cbitline = 0.747 fF for 512 cell array (Interconnect Parasitics from ASU PTM Website)

SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE

NMOS inverter = 150 nm
PMOS inverter = 555 nm
NMOS Access = 180 nm
NMOSinv/NMOSaccess = 1.2
PMOSinv/NMOSaccess = 3
Cbitline = 0.747 fF
Curve shows the SRAM cell is close to write failure.
Bitline precharge to less than 1.1 V could be explored to increase SNM.
Simulation Setup
[Schematic; node labels: V(ic), V(word), V(write), V(bit), V(qbar), V(q), V(bitbar)]
M0, M1, M3, M4 form the cross coupled inverter pair
M5, M6 are access transistors
C1, C2 are the bitline capacitances
M7 is the precharge switch for the bitline (bit)
V3 precharges the bitline to 0.8 V
V6 precharges bitbar and writes a 0 to the cell
Timing Waveforms for Characterization

V(write): applied to the source of M7 (precharge switch); precharges Cbit to 0.8 V via M7.
V(word): wordline voltage; disables access transistors M5 and M6 during precharge.
V(qbar) and V(q): used to generate the butterfly curves.
V(ic): enables the precharge switch M7 during precharge. It could be implemented as NOT(V(word)).
V(bitbar): precharges to 0.8 V, shows charge pumping when M7 turns off and follows V(qbar) when the wordline is enabled.
V(bit): follows V(q) after the wordline is enabled. V(bit) precharged to Vdd by V6.
PASS TRANSISTOR BASED TREE DESIGN
1:8 Row Decoder Tree
Similar Tree Decoder for 16 LSB Bits

TREE DECODER DESIGN
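The selection behavior of a 1:8 tree decoder can be modeled in a few lines. This is a purely logical sketch, assuming one pass transistor per level on the path to each leaf and ignoring all timing and charge effects; `tree_decode` is a hypothetical name.

```python
def tree_decode(address, bits=3):
    """One-hot outputs of a pass-transistor tree with 2**bits leaves.

    A leaf is driven only when every pass transistor on its path
    conducts, i.e. when each address bit matches the leaf's bit
    at that level of the tree.
    """
    outputs = []
    for leaf in range(2 ** bits):
        conducting = True
        for level in range(bits):
            addr_bit = (address >> level) & 1
            leaf_bit = (leaf >> level) & 1
            conducting = conducting and (addr_bit == leaf_bit)
        outputs.append(1 if conducting else 0)
    return outputs


print(tree_decode(5))
```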
PASS TRANSISTOR BASED TREE DESIGN

[Schematic: pass gate with W = 880, L = 50; terminal labels CK, IN, CK, OUT]
Identical sizing for NMOS and PMOS to minimize charge injection effects
Delay drops by ~40ps/2 for every doubling of transistor widths
Delay drop saturates around 1000 nm, at 89 ps
Used W/L of 880/50 for final tree
TREE DECODER TIMING DIAGRAMS
The following waveforms were applied to the row and column selection inputs of the tree decoder
It takes one cycle to initialize the tree decoder, after which we get clean pulses for each row output.
The LSB pulse is wider than the MSB pulse in the bottom figure to allow the tree to clear present state before next
The top waveforms show the matrix point output where the row and column select inputs are high. The output node discharges when the input goes low.
READ WRITE CIRCUIT ( Design by Bong Jin )
Sense Amplifier
Write Driver
Precharge Circuit
READ WRITE CIRCUIT TEST SETUP
Single SRAM Cell for simulations
Cbit estimate for 512 rows
NMOS switches to allow read without disabling write circuit
Bitline capacitance estimate from ASU PTM Website
READ / WRITE TIMING WAVEFORMS
Waveforms shown:
Precharge Pulse (Active Low)
Data meant to be written to cell
Write Enable Pulse
Read Enable Pulse
Output of Write Buffer
Disable output buffer (tristate logic)
Bitline
Bitline Bar
Data Output
Data Out Bar
SRAM Cell Layout
2X2 SRAM Array Layout

This unit can be replicated in all directions without any changes.
LVS check remaining
Array Size = 3.7975 um x 2.4725 um
Layout labels: B0, B0BAR, B1, B1BAR, GND, WORD 1, VDD, WORD 0, GND

References
Digital Integrated Circuits: Jan Rabaey, Anantha Chandrakasan, Borivoje Nikolic (SRAM Cell Design, Decoders, Read Write Circuits)
CMOS VLSI Design by Weste and Harris (Butterfly Curves)
CMOS Circuit Design, Layout and Simulation: Baker, Li, Boyce (Decoder Design)
Course slides of Prof. Kia Bazargan (Precharge Techniques, Decoders, SRAM Cell Design)
Adaptive DSP Course by Prof. Keshab Parhi
System diagram for developing an LMS algorithm for channel estimation ( H(z) )
Errors e1 and e2 (e2 being the quantized error) could have the same convergence if the channel model H(z) is adapted using an LMS model.
The next few slides show regular LMS and modified LMS error convergence.
Error convergence for regular LMS takes more time than for the modified LMS.
Modified LMS adapts all tap weights using different errors, computed using as many filter output estimates as the filter order. The assumption is that the optimum gradient direction for each tap weight is different and is given by the corresponding error.
Lattice predictors would be a more efficient way to do this than LMS, since each stage of a predictor is optimum for that order, unlike modified LMS where each tap weight is adapted in a sub-optimal manner.
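For reference, the regular LMS baseline for channel estimation can be sketched as follows. This covers only the standard single-error LMS update, not the modified LMS described above; the channel taps, step size `mu`, the noiseless setup and the name `lms_identify` are all illustrative assumptions.

```python
import random


def lms_identify(channel, n_samples=2000, mu=0.05, seed=0):
    """Estimate FIR channel taps with the standard LMS update.

    channel: true tap weights of H(z) (assumed known only to the
    simulation, to generate the desired signal).
    Returns (estimated weights, list of errors per sample).
    """
    rng = random.Random(seed)
    order = len(channel)
    weights = [0.0] * order
    taps = [0.0] * order            # delay line, newest sample first
    errors = []
    for _ in range(n_samples):
        taps = [rng.uniform(-1.0, 1.0)] + taps[:-1]
        desired = sum(h * x for h, x in zip(channel, taps))
        estimate = sum(w * x for w, x in zip(weights, taps))
        err = desired - estimate
        # One shared error drives every tap: w <- w + mu * e * x
        weights = [w + mu * err * x for w, x in zip(weights, taps)]
        errors.append(err)
    return weights, errors


weights, errors = lms_identify([0.5, -0.3, 0.1])
print([round(w, 4) for w in weights], abs(errors[-1]))
```

The single shared error is the point of contrast with the modified scheme, which uses a separate error per tap weight.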
EEG Spectral Estimates for Pre-Ictal, Ictal and Post-Ictal Signal Sequences
Spectral Estimation for a low pass filtered impulse sequence using different techniques
Correlograms provide best Spectral Estimates for Low Pass Filtered Impulse Trains
EE 5364 / CS 5204: Advanced Computer Architecture
Final Course Project on Design of a Branch Predictor
Prepared by: Nirav Desai 4280229, Amanda Skinner 3749048
Course Instructor: Prof. Pen-Chung Yew
Department of ECE, University of Minnesota, Twin Cities
Why Branch Predictor?

Branch predictors improve the flow of the instruction pipeline.
As branch predictor accuracy increases, cache misses decrease for both data and instruction caches.
Why Branch Predictor?

GCC Benchmark on SimpleScalar sim-outorder
Pipeline Efficiency with increasing Branch Predictor Accuracy
[Chart: IPC and CPI with increasing BTB Size; left axis Address Hits (125000000 to 150000000), right axis IPC / CPI (0.8 to 1.15)]

BTB Size  IPC     CPI     Address Hits
16k       0.8669  1.1536  133748386
32k       0.8797  1.1367  136391615
64k       0.8952  1.117   139391610
128k      0.9132  1.095   143310093
256k      0.9291  1.0763  146842938
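The IPC and CPI columns are reciprocals of each other, which gives a quick sanity check on the table. The sketch below only uses the tabulated values; the helper names are hypothetical.

```python
# BTB size -> (IPC, CPI, address hits), copied from the table above.
BTB_RESULTS = {
    "16k": (0.8669, 1.1536, 133748386),
    "32k": (0.8797, 1.1367, 136391615),
    "64k": (0.8952, 1.117, 139391610),
    "128k": (0.9132, 1.095, 143310093),
    "256k": (0.9291, 1.0763, 146842938),
}


def cpi_from_ipc(ipc):
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    return 1.0 / ipc


def relative_gain(results, small, large, column=0):
    """Fractional improvement of one column between two BTB sizes."""
    return results[large][column] / results[small][column] - 1.0


for size, (ipc, cpi, hits) in BTB_RESULTS.items():
    print(size, round(cpi_from_ipc(ipc), 4), cpi, hits)
print("IPC gain 16k -> 256k:", round(relative_gain(BTB_RESULTS, "16k", "256k"), 4))
```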
Why Prefetching?
As branch predictor accuracy increases, cache misses go down.
Prefetching and increasing cache size decrease cache misses.
[Figure: miss rate for the Mesa benchmark. Both the L1-Data and L2 cache associativities were changed.] [4]
Reference Prediction Table [1]

LA-PC runs ahead of PC and keeps track of load and store instructions.
RPT keeps track of previous reference addresses and strides for load and store instructions.
L2 cache prefetching can be done by storing spill-over data and instructions from L1 cache blocks.
Intel Core 2 Duo uses RPT for L1 cache prefetching and a Loop Counter local branch predictor.
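The stride-tracking idea behind the RPT can be sketched with a small table keyed by instruction address. This is a simplified illustration: the scheme in [1] uses a multi-state entry, whereas this sketch assumes a single "stride confirmed" flag, and the class and method names are hypothetical.

```python
class ReferencePredictionTable:
    """Simplified stride predictor for load/store addresses."""

    def __init__(self):
        # instruction PC -> [previous address, stride, stride confirmed?]
        self.entries = {}

    def access(self, pc, addr):
        """Record a reference; return a prefetch address or None."""
        entry = self.entries.get(pc)
        if entry is None:
            self.entries[pc] = [addr, 0, False]
            return None
        prev_addr, stride, _ = entry
        new_stride = addr - prev_addr
        # Prefetch only once the same non-zero stride is seen twice.
        confirmed = new_stride == stride and new_stride != 0
        self.entries[pc] = [addr, new_stride, confirmed]
        return addr + new_stride if confirmed else None


rpt = ReferencePredictionTable()
for address in (1000, 1008, 1016, 1024):
    print(address, rpt.access(0x400, address))
```

A load walking an array with a fixed stride starts producing prefetch addresses on its third access in this sketch.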
Design of Branch Predictor

Loop Counter would give high accuracy on matrix multiplication.
Track all registers for the loop counter, since different interleaved threads may use different registers.
Loop Counter error would imply dynamic update of registers based on non-local values.
Tag registers giving repeated conditional branch errors on the Branch Decision Table.
Use the O-GEHL predictor for all tagged branches.
Using the loop counter and duplicate ALU will allow indexing long histories with limited geometric length.
Branch Decision Table

Table columns:
Branch Address (entered by LA-PC)
Predicted Direction (entered by Loop Counter or O-GEHL)
Predicted Branch Target (entered by Duplicate ALU)
Actual Direction (entered by PC)
Actual Branch Target (entered by PC)
Counters Used C(i)(j) (entered by O-GEHL)
Tag

If prediction != actual decision:
  Prediction computed by Loop Counter? Yes: incorrect duplicate register values
    Re-initialize Duplicate Register Stack
    Set LA-PC to PC
    After 2 successive errors make an entry in O-GEHL
    Also tag the branch address in the Branch Decision Table to be used with O-GEHL
  Prediction computed by O-GEHL? Yes:
    Run the update equation on counters listed in table
    Set LA-PC to PC
Loop Counter Branch Predictor

The loop counter looks at only the conditional branches. Can be extended to bgtz, blez.
Instruction fields: Op-Code: bits 31:26; rs: bits 25:21; rt: bits 20:16
Op-Code = 4 (beq) OR Op-Code = 5 (bne):
  Duplicate Register Flag == 0?
    Yes: first conditional branch. Copy Register Stack to Duplicate Register Stack (equivalent to initializing the duplicate register stack).
    No: Duplicate Register Stack initialized.
  Set Register Flag for rs and rt = 1. These registers will be tracked by the Duplicate ALU.
  Proceed to Branch Prediction Computation.
Other instructions: Inc LA-PC by 4. Do addition and subtraction for all instructions having rs and rt with register flags set to 1.
Branch Prediction Computation:
  Op code == 4? rs == rt? yes: Execute; no: Inc LA-PC by 4
  Op code == 5? rs != rt? yes: Execute; no: Inc LA-PC by 4
Execute:
  Copy offset from bits 15 to bit 0
  Sign extend offset to bit 31 (total 32 bits)
  Left shift by 2 (to get word address)
  Add to PC+4 to get Branch Target Address
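The field extraction and target computation for beq and bne can be written out directly. This is a minimal sketch assuming the standard MIPS I-type layout listed above; `regs` stands in for the duplicate register stack, and all function names are hypothetical.

```python
def decode_branch(instruction):
    """Split a 32-bit MIPS I-type word into (opcode, rs, rt, offset16)."""
    opcode = (instruction >> 26) & 0x3F   # bits 31:26
    rs = (instruction >> 21) & 0x1F       # bits 25:21
    rt = (instruction >> 16) & 0x1F       # bits 20:16
    offset = instruction & 0xFFFF         # bits 15:0
    return opcode, rs, rt, offset


def branch_target(pc, offset16):
    """Sign extend the offset, shift left by 2, add to PC + 4."""
    if offset16 & 0x8000:
        offset16 -= 0x10000
    return (pc + 4 + (offset16 << 2)) & 0xFFFFFFFF


def predict_branch(instruction, pc, regs):
    """Return (taken, next address) for beq (4) or bne (5)."""
    opcode, rs, rt, offset = decode_branch(instruction)
    if opcode == 4:
        taken = regs[rs] == regs[rt]
    elif opcode == 5:
        taken = regs[rs] != regs[rt]
    else:
        raise ValueError("not a beq/bne instruction")
    return taken, (branch_target(pc, offset) if taken else pc + 4)


regs = [0] * 32
regs[8] = 1                       # loop counter still non-zero
word = (5 << 26) | (8 << 21) | (9 << 16) | 0xFFFD   # bne r8, r9, -3
print(predict_branch(word, 0x00400010, regs))
```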
O-GEHL Branch Predictor [2]

History lengths go in geometric progression given by L(i) = α^(i-1) L(1) + constant
Best series found from experiments: 2, 4, 9, 12, 18, 31, 54, 114, 145, 266
Dynamic history length fitting with variable α also possible.
j, k, l, ... are incremented on every unconditional branch. j increments are modulo 2, k increments are modulo 4, l increments are modulo 266.
Each C(i)(j) is a 4 bit saturating counter that counts -8 to 7.
Counter update given by:
if (p != out)
  if (branch == taken) c(i)(j)++
  if (branch != taken) c(i)(j)--
Dynamic threshold (θ) fitting possible. Threshold (θ) by default is 0.
Sum > θ then p = taken
Sum < θ then p = not taken
[Figure: counter tables C(1)(1..2), C(2)(1..4), C(3)(1..9), ..., C(10)(1..266)]
Sum = C(i)(j) + C(i+1)(k) + ... + C(i+9)(l)
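The sum-and-threshold prediction with 4 bit saturating counters can be sketched as below. This is an illustration of the update rule as written on the slide, not the full predictor from [2]: it assumes only three history lengths, a made-up index hash, tables of 256 entries, a tie (Sum == θ) treated as not taken, and updates only on a misprediction.

```python
class GehlSketch:
    """Toy GEHL-style predictor: sum of counters indexed by PC and history."""

    def __init__(self, history_lengths=(2, 4, 9), table_bits=8, theta=0):
        self.lengths = history_lengths
        self.size = 1 << table_bits
        self.tables = [[0] * self.size for _ in history_lengths]
        self.history = 0              # newest branch outcome in bit 0
        self.theta = theta

    def _indices(self, pc):
        indices = []
        for length in self.lengths:
            hist = self.history & ((1 << length) - 1)
            # Illustrative hash only; the real indexing functions differ.
            indices.append((pc ^ hist ^ (hist >> 3)) % self.size)
        return indices

    def predict(self, pc):
        total = sum(t[i] for t, i in zip(self.tables, self._indices(pc)))
        return total > self.theta

    def update(self, pc, taken):
        indices = self._indices(pc)
        total = sum(t[i] for t, i in zip(self.tables, indices))
        predicted = total > self.theta
        if predicted != taken:
            for table, i in zip(self.tables, indices):
                if taken:
                    table[i] = min(7, table[i] + 1)    # saturate at 7
                else:
                    table[i] = max(-8, table[i] - 1)   # saturate at -8
        mask = (1 << max(self.lengths)) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
        return predicted


predictor = GehlSketch()
for _ in range(20):
    predictor.update(0x40, True)
print(predictor.predict(0x40))
```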
Duplicate ALU (for MIPS) [3]

[Block diagram; blocks: LA-PC, Address - Instruction, Duplicate Instruction Queue, Decode Unit, Compare Op-Code, Compare Register Flags for reg1, reg2, reg3]
Decode fields: Op Code: bits 31-26; Reg 1: bits 25-21; Reg 2: bits 20-16; Reg 3: bits 15-11

If register flags set, do the computation for:
Op-Code: 0, bits(5:0) 32: add r1, r2, r3
Op-Code: 0, bits(5:0) 34: sub r1, r2, r3
Op-Code: 0, bits(5:0) 33: addu r1, r2, r3
Op-Code: 0, bits(5:0) 35: subu r1, r2, r3
Op-Code: 8: addi r1, constant
Op-Code: 9: addiu r1, constant
Op-Code == 4 OR 5 (beq, bne): Use Loop Counter
Op-Code == 2 OR 3 (jump, jal): Always take
Op-Code == 0 & FUNCT == 8 OR 9 (jr, jalr): Always take

Set LA-PC busy bit on instruction read.
When LA-PC is updated by the branch predictors, the busy bit is reset.
For arithmetic, reset the busy bit after 2 cycles.
Instruction is read when the busy bit is reset.
LA-PC different from that used in RPT.

Branch Target for Jump (32 bits):
bits 31:28: 4 MSB bits of current PC+4
bits 27:2: Jump Target from instruction
bits 1:0: 00 (Word Addresses)
Branch Target for Branch (32 bits): Current PC + 4 + bits 15:0 left shifted by 2 to give word addresses
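The jump target assembly and the "always take" classification can be expressed directly from the bit fields. This is a small sketch assuming the standard MIPS J-type and R-type encodings; the function names are hypothetical.

```python
def jump_target(pc, instruction):
    """Build the 32-bit target of a j/jal instruction.

    bits 31:28 come from PC + 4, bits 27:2 from the 26-bit target
    field of the instruction, and bits 1:0 are 00.
    """
    target26 = instruction & 0x03FFFFFF
    return ((pc + 4) & 0xF0000000) | (target26 << 2)


def is_always_taken(instruction):
    """True for j, jal (op-code 2, 3) and jr, jalr (op-code 0, funct 8, 9)."""
    opcode = (instruction >> 26) & 0x3F
    funct = instruction & 0x3F
    return opcode in (2, 3) or (opcode == 0 and funct in (8, 9))


j_word = (2 << 26) | (0x00400020 >> 2)    # j 0x00400020
print(hex(jump_target(0x00400000, j_word)), is_always_taken(j_word))
```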
This branch predictor can be used on Multi Threaded CPUs

Test results on O-GEHL Branch Predictor[5]
References
1. An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty. Jean-Loup Baer, Tien-Fu Chen, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195. Supercomputing '91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing
2. The O-GEHL Branch Predictor. Andre Seznec. The 1st JILP Championship Branch Prediction Competition CBP1 (2004). Available from www.jilp.org
3. Computer Organization and Design: The Hardware/Software Interface. David Patterson and John Hennessy
4. http://en.wikipedia.org/wiki/CPU_cache
5. Analysis of the Optimized GEHL Predictor. Andre Seznec. Available from: http://www.irisa.fr/caps/people/seznec/ISCA05.pdf
Research Ideas I am working on right now
Strained Silicon on SiGe Solar Cell

Requires Chemical Vapor Deposition or MBE techniques for fabrication.
Tandem solar cell design gives a wide band of absorbable frequencies with different band gaps.
Optimal thickness at quarter wavelength will give maximum absorption at the designed frequency.
Back plate metal contacts and top plate fingered contacts.
Economically viable for charging battery packs in electric vehicles and for replacing LPG cooking gas cylinders.
Long term viability for power generation is feasible due to low operating costs and low distribution costs in a distributed model.
Reference: Usami, N.; Takahashi, T.; Fujiwara, K.; Ujihara, T.; Sazaki, G.; Murakami, Y.; Nakajima, K. (Inst. for Mater. Res., Tohoku Univ., Sendai, Japan), "Si/multicrystalline-SiGe heterostructure as a candidate for solar cells with high conversion efficiency", Conference Record of the Twenty-Ninth IEEE Photovoltaic Specialists Conference, 19-24 May 2002, pp. 247-249.
Rake Receiver with MDS Codes

Rake receivers could be used to identify the strongest multipath component of a received signal. This could be achieved by correlating the received signal with itself over different delays and finding the strongest delay component. This does not involve maximal ratio combining.
It could be combined with MDS codes for wireless communications where, given any d bits corrupted by channel noise or multipath effects, the signal could still be recovered uniquely.
Reference: Lectures of Prof. Cutter on iTunesU under the course on Digital Communications 2 taught at MIT.
Reference: W-CDMA Rake Receiver implementation in DSP: EE Times: Link: http://www.eetimes.com/electronics-news/4139933/W-CDMA-RAKE-Receiver-Comes-to-Life-inDSP
Reference: A Rake Receiver for Maximal Ratio Combining without Channel Estimation for UWB Communications: http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1044&context=ojii_volumes
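The delay search described above can be sketched as an autocorrelation over candidate lags. This is a toy illustration under stated assumptions: a noiseless two-path channel, a random +/-1 chip sequence, and hypothetical function names; it finds the strongest echo delay only and does no combining or decoding.

```python
import random


def strongest_delay(received, max_lag):
    """Return the lag (1..max_lag) with the largest autocorrelation magnitude."""
    best_lag, best_value = None, float("-inf")
    for lag in range(1, max_lag + 1):
        value = sum(received[n] * received[n + lag]
                    for n in range(len(received) - lag))
        if abs(value) > best_value:
            best_lag, best_value = lag, abs(value)
    return best_lag


def two_path_signal(n_samples, delay, gain, seed=1):
    """Direct path plus one delayed, scaled copy of a random +/-1 sequence."""
    rng = random.Random(seed)
    chips = [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]
    return [chips[n] + (gain * chips[n - delay] if n >= delay else 0.0)
            for n in range(n_samples)]


received = two_path_signal(2000, delay=5, gain=0.6)
print(strongest_delay(received, max_lag=20))
```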
Class S RF Power Amplifiers on GaN HEMTs

Class S RF Power Amplifiers with fully differential H-Bridge topology could give a theoretical 100% efficiency.
GaN HEMTs give the best high frequency switching characteristics.

The two features could be combined to give a high efficiency RF power amplifier topology.
Reference: Ph.D. Dissertation of Stephan Maroldt, University of Freiburg
Microprocessor Design
The attached slides describe the design of a 16 bit Brent Kung Adder and a 1024x16 asynchronous SRAM in 45 nm CMOS, along with the design of a branch predictor and cache prefetch unit for a MIPS microprocessor. These design ideas could be combined with other ideas for pipeline design, ALU design and interconnect circuit design to give a full physical layer design of a MIPS microprocessor in 45 nm CMOS. Various power reduction and clock gating techniques could be applied at a higher level of the hierarchy.
mm-wave MIMO OFDM

mm-wave MIMO OFDM could be used for wireless backhaul networks due to its high capacity.
mm-wave MIMO systems could be extended to 2x2, 4x4, 8x8, etc. topologies to exploit spatial diversity and get a higher data rate.
Reference: Sheldon, C.; Munkyo Seo; Torkildson, E.; Rodwell, M.; Madhow, U. (Dept. of Electr. & Comput. Eng., Univ. of California, Santa Barbara, CA, USA), "4 channel spatial multiplexing over a mm-wave line of sight link", IEEE MTT-S International Microwave Symposium Digest, 2009 (MTT '09), 7-12 June 2009, pp. 389-392.
Routing algorithm to reduce congestion

The routing algorithm to reduce congestion could be based on the idea of sparsity. High congestion nodes could be dropped from the network map until congestion on the node drops. The underlying packet streams would use a flow control based routing protocol. Each node would store a map of the network, which would be updated periodically using ping back messages. This could be applied to packet switched networks, traffic control and wireless sensor networks.
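The idea of dropping congested nodes from the network map before routing can be sketched with a breadth-first search. Everything here is an illustrative assumption: the graph, the congestion scores, the threshold and the name `route` are made up, and shortest hop count stands in for whatever metric the underlying protocol would use.

```python
from collections import deque


def route(graph, source, target, congestion, threshold):
    """Shortest-hop path that avoids nodes above the congestion threshold.

    graph: node -> list of neighbor nodes (the locally stored map).
    congestion: node -> load estimate from the periodic updates.
    Returns the path as a list of nodes, or None if no path remains.
    """
    blocked = {node for node, load in congestion.items()
               if load > threshold and node not in (source, target)}
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous and neighbor not in blocked:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


GRAPH = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "E": ["C", "D"], "D": ["B", "E"]}
print(route(GRAPH, "A", "D", {"B": 0.9}, threshold=0.8))
```

Once the load on a dropped node falls back under the threshold, the same call returns the shorter path through it again.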
Photonic Computers
These could use multiplexer based logic gates. Photonic multiplexers have been widely researched and developed for optical communications. Phase detectors could be used to identify the phase and thus the value of the stored signal. These would use electronic charge storage and high speed electro-optic conversion.
Reference: Prior research on this has been carried out at UCSB.
