MM-Wave Active Sensor: BPSK Spectrum can be seen in the Spectrum Analyzer
Nirav Desai
EE 5323: VLSI Design 1 Project
Course Instructor: Prof. Chris Kim
16-bit Brent Kung Adder Design in 45 nm CMOS
Nirav Desai, ID: 4280229
Department of Electrical and Computer Engineering, University of Minnesota
Brent Kung Adder Gate Level Diagram 1. Input Block with Pre Computation
[Figure: gate-level diagram annotated with gate sizes from 1X to 40X. Signal labels: Gi + Pi*Gi-1, Pi*Pi-1, Pi, Si, Ci-1]
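The prefix terms labelled in the diagram (Gi + Pi*Gi-1 and Pi*Pi-1) can be illustrated with a small bit-level model. This is a minimal sketch of a generic Brent-Kung prefix tree, not the author's netlist: the function name `brent_kung_add`, the way the carry-in is folded into bit 0, and the assumption that the sum is Si = Pi XOR Ci-1 are illustrative choices.

```python
def brent_kung_add(a, b, cin=0, width=16):
    """Bit-level model of a Brent-Kung prefix adder (width must be a power of 2)."""
    # Pre-computation: per-bit generate (g) and propagate (p).
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    G[0] |= p[0] & cin  # assumption: carry-in is folded into bit 0

    def dot(i, j):
        # Prefix operator: (G, P)_i <- (Gi + Pi*Gj, Pi*Pj)
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    d = 1
    while d < width:  # up-sweep: build group terms over 2, 4, 8, ... bits
        for i in range(2 * d - 1, width, 2 * d):
            dot(i, i - d)
        d *= 2
    d = width // 4
    while d >= 1:  # down-sweep: fill in the remaining carries
        for i in range(3 * d - 1, width, 2 * d):
            dot(i, i - d)
        d //= 2

    carries = G  # carries[i] is the carry out of bit i
    total = 0
    for i in range(width):
        c_prev = cin if i == 0 else carries[i - 1]
        total |= (p[i] ^ c_prev) << i  # Si = Pi XOR Ci-1
    return total, carries
```

Calling `brent_kung_add(0xFFFF, 0x0001)` makes every carry bit switch to 1, which is why a pattern with A = FFFF and B stepping from 0000 to 0001 exercises the longest carry path.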
[Figure: TD*Iavg versus PMOS width (nm), with the PMOS width swept from 120 nm to 300 nm]
NMOS width = 90 nm; PMOS / NMOS length = 50 nm; Vdd = 1.1 V
Current averaged over one period of 2 ns
Optimal PMOS width = 165 nm, giving an inverter PMOS/NMOS width ratio of 165/90 = 1.833
Sizing for NAND, NOR and XOR changed appropriately
Brent Kung Adder Simulated Performance
Simulations with maximally sized 1-stage buffers, as determined by logical effort design of individual chains.

(One column per supply voltage; the Voltage (V) values are missing from this text.)
Delay Max-C14 (ns)               0.359    0.503    0.937
Power Max (mW)                   6.73     2.95     0.924
Power-Delay Product (x 1E-12)    2.41     1.483    0.865
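The power-delay product row can be cross-checked from the delay and power rows, since ns times mW gives pJ (1E-12 J). The sketch below only uses the numbers listed above; the variable and function names are illustrative.

```python
# Values copied from the simulated performance table above.
delays_ns = [0.359, 0.503, 0.937]
powers_mw = [6.73, 2.95, 0.924]
reported_pdp_pj = [2.41, 1.483, 0.865]


def power_delay_product_pj(delay_ns, power_mw):
    """Power-delay product in picojoules: ns * mW = 1e-9 s * 1e-3 W = 1e-12 J."""
    return delay_ns * power_mw


computed_pdp_pj = [
    power_delay_product_pj(d, p) for d, p in zip(delays_ns, powers_mw)
]

for computed, reported in zip(computed_pdp_pj, reported_pdp_pj):
    print(f"computed {computed:.3f} pJ, reported {reported:.3f} pJ")
```

The computed products agree with the reported ones to within rounding, and the lowest energy per operation is at the slowest operating point.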
Without parasitic extraction and interconnect parasitics, buffering doesn't improve performance significantly.
Nirav A. Desai desai.nirav.12.09@gmail.com
Brent Kung Adder Worst Case Delay
Input pattern: A: FFFF, B: 0000 -> 0001
[Figure: waveforms; dotted lines show Carry Bit 15 and Carry Bit 14]
Brent Kung Adder Layout: Input Block with Pre Computation
[Figure: layout with labels: input inverters for bit 0 and bit 1, XOR, NAND 10X]
Brent Kung Adder Layout: NAND 10.57X
Layout with interdigitated fingers to reduce parasitics
Output Buffers: PEX waveforms show that a larger size may be necessary here
References:
Course slides from Prof. Kia Bazargan's course on VLSI (course website)
Digital Integrated Circuits by Jan Rabaey
University of Minnesota
Simulation Setup
[Figure: simulation waveforms V(ic), V(word), V(write), V(bit), V(qbar), V(q), V(bitbar)]
M0, M1, M3, M4 form the cross-coupled inverter pair
M5, M6 are access transistors
C1, C2 are the bitline capacitances
M7 is the precharge switch for the bitline (bit)
V3 precharges the bitline to 0.8 V
V6 precharges bitbar and writes a 0 to the cell
[Figure: waveforms V(bitbar) and V(bit)]
[Figure: switch schematic with terminals IN and OUT, clocked by CK]
Identical sizing for NMOS and PMOS to minimize charge injection effects
Delay drops by ~40ps/2 for every doubling of transistor widths.
The delay drop saturates around a width of 1000 nm, at 89 ps.
Used W/L of 880/50 for the final tree.
The following waveforms were applied to the row and column selection inputs of the tree decoder
It takes one cycle to initialize the tree decoder, after which we get clean pulses for each row output.
The LSB pulse is wider than the MSB pulse in the bottom figure to allow the tree to clear its present state before next
The top waveform shows the matrix point output where the row and column select inputs are high.
The output node discharges when the input goes low.
Sense Amplifier
Write Driver
Precharge Circuit
NMOS switches allow a read without disabling the write circuit.
Bitline capacitance estimated from the ASU PTM website.
[Figure: waveforms with the following labels]
Precharge pulse (active low)
Data meant to be written to cell
Write enable pulse
Read enable pulse
Output of write buffer
Disable output buffer (tristate logic)
Bitline
Bitline bar
Data output
Data out bar
[Figure: layout with labels GND, WORD 1, VDD]
References
Digital Integrated Circuits, by Jan Rabaey, Anantha Chandrakasan, and Borivoje Nikolic
System diagram for developing the LMS algorithm for channel estimation (H(z)).
Errors e1 and e2 (e2 being the quantized error) could have the same convergence if the channel model H(z) is adapted using an LMS model.
The next few slides show regular LMS and modified LMS error convergence.
Error Convergence for regular LMS takes more time than the modified LMS
Modified LMS adapts all tap weights using different errors, computed from as many filter output estimates as the filter order. The assumption is that the optimum gradient direction for each tap weight is different and is given by the corresponding error.
Lattice predictors would be a more efficient way to do this than LMS, since each stage of a predictor is optimum for that order, unlike modified LMS, where each tap weight is adapted in a suboptimal manner.
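For comparison, the regular LMS update that the modified scheme departs from can be sketched as below. This is the textbook single-error LMS applied to identifying a channel H(z), not the modified LMS of the slides; the channel taps, step size `mu`, input distribution, and the name `lms_identify` are all illustrative assumptions.

```python
import random


def lms_identify(channel, n_samples=2000, mu=0.1, seed=0):
    """Estimate FIR channel taps with the regular LMS algorithm (noise-free sketch)."""
    rng = random.Random(seed)
    n_taps = len(channel)
    w = [0.0] * n_taps        # adaptive tap weights (the channel model)
    x_hist = [0.0] * n_taps   # most recent input samples, newest first
    errors = []
    for _ in range(n_samples):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
        d = sum(h * x for h, x in zip(channel, x_hist))  # channel output
        y = sum(wi * x for wi, x in zip(w, x_hist))      # model output
        e = d - y                                        # one shared error
        # Regular LMS: every tap is adapted with the same error e.
        w = [wi + mu * e * x for wi, x in zip(w, x_hist)]
        errors.append(e)
    return w, errors


weights, errors = lms_identify([0.8, -0.4, 0.2])
print([round(v, 4) for v in weights])
```

In the modified LMS described above, the single error `e` would be replaced by one error per tap weight; that variant is not reproduced here because the slides do not give its update equations.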
EEG Spectral Estimates for Pre-Ictal, Ictal and Post-Ictal Signal Sequences
Spectral Estimation for a low pass filtered impulse sequence using different techniques
Correlograms provide best Spectral Estimates for Low Pass Filtered Impulse Trains
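A correlogram estimates the power spectrum by taking the Fourier transform of a windowed autocorrelation estimate. The sketch below is one common form (biased autocorrelation with a triangular lag window); the slides do not say which variant was used, so the window, the lag count, and the sinusoidal test signal are assumptions for illustration.

```python
import math


def correlogram_psd(x, max_lag, n_freqs):
    """Correlogram spectral estimate on n_freqs points from 0 to 0.5 cycles/sample."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    # Biased autocorrelation estimate for lags 0..max_lag.
    r = [
        sum(xc[i] * xc[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]
    psd = []
    for m in range(n_freqs):
        f = 0.5 * m / (n_freqs - 1)
        s = r[0]
        for k in range(1, max_lag + 1):
            w_k = 1.0 - k / (max_lag + 1)  # triangular (Bartlett) lag window
            s += 2.0 * w_k * r[k] * math.cos(2.0 * math.pi * f * k)
        psd.append((f, s))
    return psd


# Hypothetical test signal: a sinusoid at 0.125 cycles/sample.
signal = [math.cos(2.0 * math.pi * 0.125 * i) for i in range(256)]
estimate = correlogram_psd(signal, max_lag=32, n_freqs=65)
peak_freq = max(estimate, key=lambda item: item[1])[0]
print(peak_freq)
```

A smaller `max_lag` gives a smoother but lower-resolution estimate, which is the main tuning knob of this method.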
EE 5364 / CS 5204: Advanced Computer Architecture
Final Course Project on Design of a Branch Predictor
Prepared by: Nirav Desai (4280229), Amanda Skinner (3749048)
Course Instructor: Prof. Pen-Chung Yew
Department of ECE, University of Minnesota, Twin Cities
[Figure: plot of IPC, CPI, and address hits]
Why Prefetching?
As branch predictor accuracy increases, cache misses go down
Prefetching and increasing cache size decreases cache misses
Miss rate for the Mesa benchmark. Both the L1 data and L2 cache associativities were changed. [4]
Reference Prediction Table [1]
LA-PC runs ahead of PC and keeps track of load and store instructions.
RPT keeps track of previous reference addresses and strides for load and store instructions.
L2 cache prefetching can be done by storing spill-over data and instructions from L1 cache blocks.
Intel Core 2 Duo uses RPT for L1 cache prefetching and a loop counter local branch predictor.
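The role of the RPT can be illustrated with a stride detector keyed by the address of the load or store instruction. This is a deliberately simplified sketch: the scheme in [1] uses a multi-state machine per entry, whereas the version below only checks that the same stride was seen twice in a row. The class and method names are hypothetical.

```python
class ReferencePredictionTable:
    """Simplified stride table: one entry per load/store instruction address."""

    def __init__(self):
        # instruction address -> (previous data address, last stride)
        self.entries = {}

    def access(self, pc, addr):
        """Record an access by the instruction at `pc`; return a prefetch address or None."""
        entry = self.entries.get(pc)
        if entry is None:
            self.entries[pc] = (addr, 0)
            return None
        prev_addr, stride = entry
        new_stride = addr - prev_addr
        self.entries[pc] = (addr, new_stride)
        # Prefetch only when the stride repeats and is non-zero.
        if new_stride == stride and stride != 0:
            return addr + new_stride
        return None


rpt = ReferencePredictionTable()
predictions = [rpt.access(0x400, a) for a in (100, 108, 116, 124, 500)]
print(predictions)  # prints [None, None, 124, 132, None]
```

In the slides' arrangement, the lookups would be driven by LA-PC rather than PC, so that the prefetch is issued before the load itself executes.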
Tag registers giving repeated conditional branch errors on the Branch Decision Table
Use the O-GEHL predictor for all tagged branches
Using the loop counter and duplicate ALU will allow indexing long histories with limited geometric length
Entered by LA-PC
If prediction != actual decision:
  Prediction computed by loop counter? Yes:
    Incorrect duplicate register values
    Re-initialize duplicate register stack
    Set LA-PC to PC
    After 2 successive errors, make an entry in O-GEHL
    Also tag the branch address in the Branch Decision Table to be used with O-GEHL
  Prediction computed by O-GEHL? Yes:
    Run the update equation on counters listed in table
    Set LA-PC to PC
Nirav Desai 4280229 ECE Amanda Skinner 3749048 CS
The loop counter looks only at the conditional branches. It can be extended to bgtz and blez.
[Flowchart: loop counter branch prediction]
No duplicate register stack initialized: set register flag for rs and rt = 1. These registers will be tracked by the duplicate ALU.
Proceed to branch prediction computation.
Decision nodes (each with yes/no exits): Op code == 4? Op code == 5? rs == rt? rs != rt?
Actions: Execute; Inc LA-PC by 4.
Sum = C(i)(j)+C(i+1)(k)+C(i+9)(l)
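The sum above adds one signed counter from each selected table. A GEHL-style predictor takes the sign of that sum as the prediction and adjusts the counters on a misprediction or when the sum is weak. The sketch below assumes three tables, 4-bit signed counters, a fixed threshold, and omits the constant offset used in [2]; the function names and the way the indices j, k, l are supplied are hypothetical.

```python
def gehl_sum(tables, indices):
    """Sum one signed counter per table, as in Sum = C(i)(j) + C(i+1)(k) + C(i+9)(l)."""
    return sum(table[index] for table, index in zip(tables, indices))


def gehl_update(tables, indices, taken, threshold=3, cmin=-8, cmax=7):
    """Predict from the sign of the sum, then update the counters if needed."""
    total = gehl_sum(tables, indices)
    predicted_taken = total >= 0
    # Update on a misprediction, or when the sum is below the confidence threshold.
    if predicted_taken != taken or abs(total) <= threshold:
        for table, index in zip(tables, indices):
            if taken:
                table[index] = min(cmax, table[index] + 1)
            else:
                table[index] = max(cmin, table[index] - 1)
    return predicted_taken


tables = [[0] * 8 for _ in range(3)]
indices = [1, 2, 3]  # hypothetical values for j, k, l
for _ in range(5):
    gehl_update(tables, indices, taken=True)
sum_after_taken = gehl_sum(tables, indices)
for _ in range(4):
    gehl_update(tables, indices, taken=False)
sum_after_not_taken = gehl_sum(tables, indices)
print(sum_after_taken, sum_after_not_taken)
```

The threshold keeps confident counters from saturating further, so a change in branch behaviour can flip the prediction within a few updates.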
[Figure: Decode Unit fed by LA-PC (address, instruction); instruction bit fields 31-26, 25-21, 20-16, 15-11]
Set the LA-PC busy bit on instruction read.
When LA-PC is updated by the branch predictors, the busy bit is reset.
For arithmetic, reset the busy bit after 2 cycles.
An instruction is read when the busy bit is reset.
LA-PC is different from that used in RPT.
Branch target for jump (32 bits):
  bits 31:28: the 4 MSBs of current PC + 4
  bits 27:2: jump target from the instruction
  bits 1:0: 00 (word addresses)
Branch target for branch (32 bits):
  current PC + 4 + (instruction bits 15:0, sign-extended and left shifted by 2 to give word addresses)
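The two target computations can be written out directly from the bit fields. The sketch below follows the standard MIPS encodings, with the 16-bit branch offset sign-extended before the shift; the function names are illustrative, and the example instruction words were hand-assembled for this sketch.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction into the fields used by the decode unit."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11
    }


def jump_target(pc, instr):
    """Bits 31:28 from PC + 4, bits 27:2 from the instruction, bits 1:0 = 00."""
    return ((pc + 4) & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)


def branch_target(pc, instr):
    """PC + 4 plus the sign-extended 16-bit offset shifted left by 2."""
    offset = instr & 0xFFFF
    if offset & 0x8000:  # sign-extend a negative offset
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


beq_back = 0x1022FFFF  # beq $1, $2 with offset -1 (opcode 4)
jump = 0x08100004      # j with 26-bit target field 0x100004 (opcode 2)
print(decode_fields(beq_back))
print(hex(branch_target(0x00400010, beq_back)))  # prints 0x400010
print(hex(jump_target(0x00400000, jump)))        # prints 0x400010
```

With an offset of -1 the branch targets its own address, which is a quick way to check that the PC + 4 base and the sign extension are both handled.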
References
1. Jean-Loup Baer and Tien-Fu Chen, "An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty," Supercomputing '91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing. Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195.
2. Andre Seznec, "The O-GEHL Branch Predictor," The 1st JILP Championship Branch Prediction Competition (CBP1), 2004. Available from www.jilp.org
3. David Patterson and John Hennessy, Computer Organisation and Design: The Hardware-Software Interface.
4. http://en.wikipedia.org/wiki/CPU_cache
5. Andre Seznec, "Analysis of the Optimized GEHL Predictor." Available from: http://www.irisa.fr/caps/people/seznec/ISCA05.pdf
Long term viability for power generation feasible due to low operating costs and low distribution costs in a distributed model.
Reference: Usami, N. (Inst. for Mater. Res., Tohoku Univ., Sendai, Japan); Takahashi, T.; Fujiwara, K.; Ujihara, T.; Sazaki, G.; Murakami, Y.; Nakajima, K., "Si/multicrystalline-SiGe heterostructure as a candidate for solar cells with high conversion efficiency," Conference Record of the Twenty-Ninth IEEE Photovoltaic Specialists Conference, 19-24 May 2002, pp. 247-249.
Reference: A Rake Receiver for Maximal Ratio Combining without Channel Estimation for UWB Communications: http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1044&context=ojii_volumes
Microprocessor Design
The attached slides describe the design of a 16-bit Brent Kung adder and a 1024x16 asynchronous SRAM in 45 nm CMOS, along with the design of a branch predictor and cache prefetch unit for a MIPS microprocessor. These design ideas could be combined with other ideas for pipeline design, ALU design and interconnect circuit design to give a full physical layer design of a MIPS microprocessor in 45 nm CMOS. Various power reduction and clock gating techniques could be applied at a higher level of the hierarchy.
Photonic Computers
These could use multiplexer-based logic gates. Photonic multiplexers have been widely researched and developed for optical communications. Phase detectors could be used to identify the phase, and thus the value, of the stored signal. These would use electronic charge storage and high-speed electro-optic conversion. Reference: Prior research on this has been carried out at UCSB.