
Nirav A. Desai desai.nirav.12.09@gmail.com

MM-Wave Active Sensor: BPSK Spectrum can be seen in the Spectrum Analyzer


I assisted in these mm-wave MIMO experiments at UCSB


EE 5323: VLSI DESIGN 1 PROJECT
Course Instructor: Prof. Chris Kim
16-bit BRENT KUNG ADDER DESIGN in 45 nm CMOS
Nirav Desai, ID: 4280229
Department of Electrical and Computer Engineering
University of Minnesota

Brent Kung Adder Gate Level Diagram
1. Input Block with Pre Computation

[Figure: gate-level schematic of the input block. It shows Input Adder Chains 1 to 4, the generate path Gi + Pi*Gi-1, the propagate path Pi*Pi-1, and the output buffers that drive the capacitive loads. Size labels in the figure: 1X, 1.034X, 1.097X, 1.108X, 1.224X, 1.23X, 1.274X, 1.553X, 1.562X, 2.943X, 3.043X, 3.883X, 10.1683X, 10.8506X, 36X, 40X.]

Brent Kung Adder Gate Level Diagram
2. Intermediate Dot Product Blocks

[Figure: gate-level schematic of the intermediate dot product block. It shows Intermediate Adder Chains 1 and 2, the generate path Gi + Pi*Gi-1, the propagate path Pi*Pi-1, and the output buffers that drive the capacitive loads. Size labels in the figure: 1X, 1.72X, 4X, 6X, 16X.]

Brent Kung Adder Gate Level Diagram
3. Output Block for Post Computation

[Figure: gate-level schematic of the output block. Inputs Pi and Ci-1 produce the sum bit Si; output buffers drive the capacitive loads. Size labels in the figure: 1.182X, 1.117X.]
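The three blocks above (pre computation, dot product tree, post computation) can be sketched behaviourally in Python. This is only an illustration of the Brent Kung prefix structure, not the author's netlist: the function name `brent_kung_add` and the helper `dot` are hypothetical, the width is assumed to be a power of two, and the sum bit is assumed to be Si = Pi XOR Ci-1.

```python
def brent_kung_add(a, b, width=16):
    """Add two unsigned integers with a Brent Kung prefix tree.

    Assumes width is a power of two. Returns (sum, carry_out).
    """
    # Pre computation: per-bit generate and propagate.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = list(g), list(p)

    def dot(i, j):
        # Dot operator: (G, P)[i] <- (Gi + Pi*Gj, Pi*Pj).
        # G is updated first because it needs the old Pi.
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    # Up-sweep: combine spans of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            dot(i, i - d)
        d *= 2

    # Down-sweep: fill in the remaining prefixes.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            dot(i, i - d)
        d //= 2

    # Post computation: G[i] is now the carry out of bit i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]


# A = FFFF with B = 0001 makes the carry travel through every bit.
print(brent_kung_add(0xFFFF, 0x0001))
```

With A = FFFF and B = 0001 the carry generated at bit 0 has to propagate through all sixteen positions, which makes it a convenient pattern for exercising the whole tree.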

Brent Kung Adder Transistor Level Design


XOR GATE


Brent Kung Adder Transistor Level Design
Inverter Design Optimization

[Figure: TD*Iavg (vertical axis, 40 to 110) versus PMOS width in nm (horizontal axis, 120 to 300).]

NMOS width = 90 nm
PMOS / NMOS length = 50 nm
Vdd = 1.1 V
Current averaged over one period of 2 ns
Optimal PMOS width = 165 nm
Inverter PMOS/NMOS width ratio = 165/90 = 1.833
Sizing for NAND, NOR and XOR changed appropriately

Brent Kung Adder Transistor Level Design
1. Input Block with Pre Computation

Logical Effort Design for Signal Chains labeled in previous slide #2

Input Adder Block Chain 1

Gate Number  Gate Name  g value  f value   b value  S value
1            BUFFER     1.000    3.540     2.893    1.000
2            INVERTER   1.000    3.540     2.400    1.224
3            NOR        1.646    2.151     1.000    1.097
4            INVERTER   1.000    3.540     1.000    3.883
5            NAND       1.352    2.618648  1.000    10.16831
             LOAD                                   36.000

Stage G = 2.225, Stage F = 36.000, Stage B = 6.943, Stage H = 556.248, Gate h = 3.540

Input Adder Block Chain 2

Gate Number  Gate Name  g value  f value  b value  S value
1            BUFFER     1.000    4.518    2.893    1.000
2            INVERTER   1.000    4.518    2.400    1.562
3            XOR        1.893    2.386    1.780    1.553
4            NAND       1.295    3.488    1.000    3.043
             LOAD                                  13.748

Stage G = 2.451, Stage F = 13.748, Stage B = 12.359, Stage H = 416.510, Gate h = 4.518

Input Adder Block Chain 3

Gate Number  Gate Name  g value  f value  b value  S value
1            BUFFER     1.000    3.558    2.893    1.000
2            INVERTER   1.000    3.558    2.400    1.230
3            NOR        1.646    2.162    1.000    1.108
             LOAD                                  3.941

Stage G = 1.646, Stage F = 3.941, Stage B = 6.943, Stage H = 45.038, Gate h = 3.558

Input Adder Block Chain 4

Gate Number  Gate Name  g value  f value   b value  S value
1            BUFFER     1.000    3.686     2.893    1.000
2            INVERTER   1.000    3.686     2.400    1.274
3            XOR        1.893    1.947     1.000    1.034
4            NAND       1.295    2.847     1.000    2.943
5            INVERTER   1.000    3.686447  1.000    10.85056
             LOAD                                   40.000

Stage G = 2.451, Stage F = 40.000, Stage B = 6.943, Stage H = 680.832, Gate h = 3.686

Brent Kung Adder Transistor Level Design
2. Intermediate Dot Product Blocks

Logical Effort Design for Signal Chains labeled in previous slide #3

Intermediate Adder Block Chain 1

Gate Number  Gate Name  g value  f value  b value  S value
1            INVERTER   1.000    2.848    1.000    1.000
2            NAND       1.352    2.107    1.000    2.107
             LOAD       1.000    2.848    1.000    6.000

Stage G = 1.352, Stage F = 6.000, Stage B = 1.000, Stage H = 8.112, Gate h = 2.848

Intermediate Adder Block Chain 2

Gate Number  Gate Name  g value  f value  b value  S value
1            BUFFER     1.000    2.775    2.000    1.000
2            NAND       1.352    2.053    1.000    1.026
             LOAD                                  2.848

Stage G = 1.352, Stage F = 2.848, Stage B = 2.000, Stage H = 7.701, Gate h = 2.775
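The logical effort tables can be cross-checked with a short script. The sketch below assumes the usual logical effort relations (path effort H = G * B * F, equal stage effort h = H^(1/N), gate sizes found by working back from the load) and assumes the first gate has unit input capacitance; `size_chain` is a hypothetical helper name, and the inputs are the g and b columns of Input Adder Block Chain 2.

```python
from math import prod


def size_chain(g, b, load):
    """Size a chain of gates for equal stage effort.

    g:    logical effort of each gate, first gate first
    b:    branching effort at the output of each gate
    load: load capacitance in units of the first gate's input capacitance
    Returns (path effort H, stage effort h, sizes S).
    """
    n = len(g)
    # H = G * B * F, where F equals the load because the first
    # gate's input capacitance is taken as 1 (an assumption).
    path_effort = prod(g) * prod(b) * load
    stage_effort = path_effort ** (1.0 / n)

    sizes = [0.0] * n
    c_next = load
    for i in range(n - 1, -1, -1):
        # Input capacitance needed so that this stage bears effort h.
        c_in = g[i] * b[i] * c_next / stage_effort
        # Size relative to a minimum gate of the same type.
        sizes[i] = c_in / g[i]
        c_next = c_in
    return path_effort, stage_effort, sizes


# Input Adder Block Chain 2: BUFFER, INVERTER, XOR, NAND driving a load of 13.748.
H, h, sizes = size_chain(g=[1.0, 1.0, 1.893, 1.295],
                         b=[2.893, 2.4, 1.78, 1.0],
                         load=13.748)
print(round(H, 2), round(h, 3), [round(s, 3) for s in sizes])
```

Under these assumptions the script lands within rounding of the tabulated Stage H, Gate h and S values for that chain; the other chains can be checked the same way by swapping in their g, b and load values.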

Brent Kung Adder Simulated Performance

Simulations with maximally sized 1 stage buffers as determined by Logical Effort Design of individual chains

Voltage (V)  Delay Max-C14 (ns)  Power Max (mW)  Power-Delay Product (xE-12 J)
1.1          0.359               6.73            2.41
0.9          0.503               2.95            1.483
0.7          0.937               0.924           0.865

Simulations with minimally sized 1 stage buffers

Voltage (V)  Delay Max-C14 (ns)  Power Max (mW)  Power-Delay Product (xE-12 J)
1.1          0.403               5.186           2.089
0.9          0.569               2.277           1.295
0.7          1.069               0.692           0.739

Without parasitic extraction and interconnect parasitics, buffering doesn't improve performance significantly.
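As a quick arithmetic check on the two tables, the power-delay product is simply delay times power; with delay in ns and power in mW the result is in units of 1E-12 J (pJ). The snippet below is a minimal sketch that recomputes the column from the tabulated delay and power values; the variable and function names are illustrative only.

```python
# (supply voltage in V, delay in ns, power in mW), copied from the two tables.
max_buffers = [(1.1, 0.359, 6.73), (0.9, 0.503, 2.95), (0.7, 0.937, 0.924)]
min_buffers = [(1.1, 0.403, 5.186), (0.9, 0.569, 2.277), (0.7, 1.069, 0.692)]


def power_delay_product_pj(delay_ns, power_mw):
    """ns * mW = 1e-9 s * 1e-3 W = 1e-12 J, i.e. picojoules."""
    return delay_ns * power_mw


for label, rows in (("maximally sized", max_buffers), ("minimally sized", min_buffers)):
    for vdd, delay_ns, power_mw in rows:
        pdp = power_delay_product_pj(delay_ns, power_mw)
        print(f"{label} buffers, Vdd = {vdd} V: PDP = {pdp:.3f} pJ")
```

The recomputed products agree with the tabulated column to within rounding, which confirms the pairing of delay and power values in each row.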

Brent Kung Adder Worst Case Delay
Input Pattern: A: FFFF, B: 0000 -> 0001
[Figure: simulated waveforms; dotted lines show Carry Bit 15 and Carry Bit 14.]

Brent Kung Adder Layout
Input Block with Pre Computation
[Figure: layout showing the input inverters for Bit 0 and Bit 1, XOR, NAND 10X and the output buffers.]
Output Buffers: PEX waveforms show larger size may be needed

Brent Kung Adder Layout
XOR 1.553X

Brent Kung Adder Layout
NAND 10.57X
Layout with interdigitated fingers to reduce parasitics

Brent Kung Adder Layout
Intermediate Dot Product Generator
Output Buffers: PEX waveforms show larger size may be necessary here

Brent Kung Adder Layout
Output Stage with Buffers

Brent Kung Adder Layout
Full Layout: 49.5 um x 48.6 um

Future Design Modifications

The design uses large buffers at the output of every stage to drive large capacitances.
The buffers are not needed at nodes with low fanout and can be eliminated.
The buffers at the input nodes currently increase power consumption and add to the delay.
Thus the overall performance can be improved with fewer buffers.

References:
Course slides from Prof. Kia Bazargan's course on VLSI
David Harris, "A Taxonomy of Parallel Prefix Networks" (reference paper on course website)
Jan Rabaey, Digital Integrated Circuits

SRAM DESIGN PROJECT PHASE 2
Nirav Desai, 4280229
VLSI DESIGN 2: Prof. Kia Bazargan
Dept. of ECE, College of Science and Engineering
University of Minnesota, Twin Cities

SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE

NMOS inverter = 110 nm
PMOS inverter = 220 nm
NMOS access = 90 nm
NMOSinv/NMOSaccess = 1.2
PMOSinv/NMOSaccess = 2.4
Cbitline = 0.747 fF for 512 cell array (interconnect parasitics from ASU PTM website)

SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE

NMOS inverter = 150 nm
PMOS inverter = 555 nm
NMOS access = 180 nm
NMOSinv/NMOSaccess = 1.2
PMOSinv/NMOSaccess = 3
Cbitline = 0.747 fF
The curve shows that the SRAM cell is close to write failure.
Bitline precharge to less than 1.1 V could be explored to increase SNM.

Simulation Setup

[Figure: SRAM cell test schematic with nodes V(ic), V(word), V(write), V(bit), V(bitbar), V(q) and V(qbar).]

M0, M1, M3, M4 form the cross coupled inverter pair
M5, M6 are the access transistors
C1, C2 are the bitline capacitances
M7 is the precharge switch for the bitline (bit)
V3 precharges the bitline to 0.8 V
V6 precharges bitbar and writes a 0 to the cell

Timing Waveforms for Characterization

V(write) precharges Cbit to 0.8 V via M7.
V(word) disables access transistors M5 and M6 during precharge.
V(qbar) and V(q) are used to generate the butterfly curves.
V(ic) enables M7 during precharge. It could be implemented as NOT(V(word)).
V(bitbar) precharges to 0.8 V, shows charge pumping when M7 turns off and follows V(qbar) when the wordline is enabled.
V(bit) follows V(q) after the wordline is enabled.
V(bit) is precharged to Vdd by V6.

[Figure: timing waveforms for V(write), applied to the source of M7 (precharge switch); V(word), the wordline voltage; V(qbar) and V(q); V(ic), which enables the precharge switch M7; V(bitbar) and V(bit).]

PASS TRANSISTOR BASED TREE DESIGN
1:8 Row Decoder Tree
Similar Tree Decoder for 16 LSB Bits

TREE DECODER DESIGN
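The behaviour of a 1:8 pass transistor tree can be modelled in a few lines. This is a purely logical sketch, not the author's circuit: it assumes one pair of pass gates per tree node (one gate on the true address bit, one on its complement), represents an undriven output as `None`, and uses the hypothetical names `route` and `pass_gate_count`.

```python
def route(node_value, address_bits):
    """Route node_value through a binary tree of pass gates.

    address_bits are given MSB first. At every level the gate on the
    selected branch conducts and the other branch is left undriven (None).
    Returns the list of 2**len(address_bits) leaf values.
    """
    if not address_bits:
        return [node_value]
    select, rest = address_bits[0], address_bits[1:]
    low_branch = route(node_value if select == 0 else None, rest)
    high_branch = route(node_value if select == 1 else None, rest)
    return low_branch + high_branch


def pass_gate_count(levels):
    """Two gates per node: 2 + 4 + ... + 2**levels."""
    return 2 ** (levels + 1) - 2


# Select row 5 (binary 101) of a 1:8 tree.
print(route(1, [1, 0, 1]))
print(pass_gate_count(3))
```

Only the addressed leaf is driven; every other output stays undriven, which is consistent with the observation on the timing slides that the tree needs time to clear its present state.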

PASS TRANSISTOR BASED TREE DESIGN

[Figure: pass gate schematic with terminals CK, IN, CK and OUT; W = 880, L = 50.]

Identical sizing for NMOS and PMOS to minimize charge injection effects
Delay drops by ~40ps/2 for every doubling of transistor widths
Delay drop saturates to 89 ps around a width of 1000 nm
Used W/L of 880/50 for final tree

TREE DECODER TIMING DIAGRAMS

The following waveforms were applied to the row and column selection inputs of the tree decoder

TREE DECODER TIMING DIAGRAMS

It takes one cycle to initialize the tree decoder, after which we get clean pulses for each row output.
The LSB pulse is wider than the MSB pulse in the bottom figure to allow the tree to clear present state before next

TREE DECODER TIMING DIAGRAMS

The top waveform shows the matrix point output where the row and column select inputs are high.
The output node discharges when the input goes low.

READ WRITE CIRCUIT (Design by Bong Jin)

[Figure: schematic with Sense Amplifier, Write Driver and Precharge Circuit.]

READ WRITE CIRCUIT TEST SETUP

Single SRAM cell for simulations
Cbit estimate for 512 rows
NMOS switches to allow read without disabling write circuit
Bitline capacitance estimate from ASU PTM website

READ / WRITE TIMING WAVEFORMS

[Figure: timing waveforms with the following traces.]
Precharge Pulse (Active Low)
Data meant to be written to cell
Write Enable Pulse
Read Enable Pulse
Output of Write Buffer
Disable output buffer (tristate logic)
Bitline
Bitline Bar
Data Output
Data Out Bar

SRAM Cell Layout

2X2 SRAM Array Layout

This unit can be replicated in all directions without any changes.
LVS check remaining
Array Size = 3.7975 um x 2.4725 um
[Figure: layout with bitlines B0, B0BAR, B1, B1BAR and rails GND, WORD 1, VDD, WORD 0, GND.]

References
Jan Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits (SRAM Cell Design, Decoders, Read Write Circuits)
Weste and Harris, CMOS VLSI Design (Butterfly Curves)
Baker, Li, Boyce, CMOS Circuit Design, Layout and Simulation (Decoder Design)
Course slides of Prof. Kia Bazargan (Precharge Techniques, Decoders, SRAM Cell Design)

System diagram for developing the LMS algorithm for channel estimation (H(z))
Errors e1 and e2 (e2 being the quantized error) could have the same convergence if the channel model H(z) is adapted using an LMS model.
The next few slides show regular LMS and modified LMS error convergence.

Adaptive DSP Course by Prof. Keshab Parhi
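For reference, the regular LMS update used for channel estimation can be sketched as follows. This is the textbook algorithm only, not the modified LMS described on these slides; the channel taps, step size `mu`, the +/-1 training sequence and the name `lms_identify` are illustrative assumptions, and the channel is noise-free.

```python
import random


def lms_identify(channel, num_taps, mu=0.05, num_samples=2000, seed=0):
    """Estimate an FIR channel with the standard LMS update.

    Returns (estimated tap weights, list of a-priori errors).
    """
    rng = random.Random(seed)
    weights = [0.0] * num_taps
    history = [0.0] * max(num_taps, len(channel))  # most recent sample first
    errors = []
    for _ in range(num_samples):
        x = rng.choice((-1.0, 1.0))                # training symbol
        history = [x] + history[:-1]
        desired = sum(h * xi for h, xi in zip(channel, history))  # channel output
        estimate = sum(w * xi for w, xi in zip(weights, history))
        error = desired - estimate
        # LMS: w <- w + mu * e * x, the same error for every tap.
        weights = [w + mu * error * xi for w, xi in zip(weights, history)]
        errors.append(error)
    return weights, errors


channel = [0.5, -0.3, 0.1]                         # hypothetical H(z) taps
weights, errors = lms_identify(channel, num_taps=3)
print([round(w, 4) for w in weights])
```

Plotting `errors` against the sample index gives a convergence curve of the kind compared on the following slides.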

Error Convergence for regular LMS takes more time than the modified LMS


Modified LMS adapts all tap weights using different errors, computed from as many filter output estimates as the filter order.
The assumption is that the optimum gradient direction for each tap weight is different and is given by the corresponding error.
Lattice predictors would be a more efficient way to do this than LMS, since each stage of a predictor is optimum for that order, unlike modified LMS, where each tap weight is adapted in a suboptimal manner.

EEG Spectral Estimates for Pre-Ictal, Ictal and Post-Ictal Signal Sequences


Spectral Estimation for a low pass filtered impulse sequence using different techniques


Correlograms provide best Spectral Estimates for Low Pass Filtered Impulse Trains


EE 5364 / CS 5204: Advanced Computer Architecture
Final Course Project on Design of a Branch Predictor
Prepared by: Nirav Desai 4280229 (ECE), Amanda Skinner 3749048 (CS)
Course Instructor: Prof. Pen-Chung Yew
Department of ECE, University of Minnesota, Twin Cities

Why Branch Predictor?

Branch predictors improve the flow of the instruction pipeline.
As branch predictor accuracy increases, cache misses decrease for both data and instruction caches.

Why Branch Predictor?

GCC benchmark on SimpleScalar sim-outorder
Pipeline efficiency with increasing branch predictor accuracy

[Figure: IPC and CPI with increasing BTB size, with address hits on a second axis.]

BTB Size  IPC     CPI     Address Hits
16k       0.8669  1.1536  133748386
32k       0.8797  1.1367  136391615
64k       0.8952  1.117   139391610
128k      0.9132  1.095   143310093
256k      0.9291  1.0763  146842938

Why Prefetching?

As branch predictor accuracy increases, cache misses go down.
Prefetching and increasing cache size decrease cache misses.
[Figure: miss rate for the Mesa benchmark; both the L1 data and L2 cache associativities were changed. [4]]

Reference Prediction Table [1]

LA-PC runs ahead of the PC and keeps track of load and store instructions.
RPT keeps track of previous reference addresses and strides for load and store instructions.
L2 cache prefetching can be done by storing spill-over data and instructions from L1 cache blocks.
Intel Core 2 Duo uses RPT for L1 cache prefetching and a loop counter local branch predictor.
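The idea of tracking the previous address and stride per load or store instruction can be illustrated with a small table keyed by instruction address. This is a simplified sketch: the scheme in [1] also keeps a per-entry state machine, which is omitted here, and the class and method names are hypothetical.

```python
class ReferencePredictionTable:
    """Simplified stride table: instruction address -> (previous address, stride)."""

    def __init__(self):
        self.entries = {}

    def access(self, pc, addr):
        """Record a load/store at `pc` touching `addr`.

        Returns the address to prefetch, or None when the stride
        has not yet repeated.
        """
        previous = self.entries.get(pc)
        if previous is None:
            self.entries[pc] = (addr, 0)      # first sighting, no stride yet
            return None
        prev_addr, prev_stride = previous
        stride = addr - prev_addr
        self.entries[pc] = (addr, stride)
        if stride != 0 and stride == prev_stride:
            return addr + stride              # stride confirmed, predict next reference
        return None


rpt = ReferencePredictionTable()
predictions = [rpt.access(0x400, a) for a in (0x1000, 0x1008, 0x1010, 0x1018)]
print(predictions)
```

In this sketch a prefetch is issued only after the same stride has been seen twice in a row, so a load walking an array with a constant stride starts prefetching from its third access.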

Design of Branch Predictor

Loop counter would give high accuracy on matrix multiplication.
Track all registers for the loop counter, since different interleaved threads may use different registers.
A loop counter error would imply a dynamic update of registers based on non-local values.
Tag registers giving repeated conditional branch errors in the Branch Decision Table.
Use the O-GEHL predictor for all tagged branches.
Using the loop counter and duplicate ALU will allow indexing long histories with limited geometric length.

Branch Decision Table

Table columns:
Branch Address (entered by LA-PC)
Predicted Direction (entered by Loop Counter or O-GEHL)
Predicted Branch Target (entered by Duplicate ALU)
Actual Direction (entered by PC)
Actual Branch Target (entered by PC)
Counters Used C(i)(j) (entered by O-GEHL)
Tag

If prediction != actual decision:
Prediction computed by Loop Counter? Yes: incorrect duplicate register values.
  Re-initialize the Duplicate Register Stack.
  Set LA-PC to PC.
  After 2 successive errors, make an entry in O-GEHL.
  Also tag the branch address in the Branch Decision Table to be used with O-GEHL.
Prediction computed by O-GEHL? Yes:
  Run the update equation on the counters listed in the table.
  Set LA-PC to PC.

Loop Counter Branch Predictor

The loop counter looks at only the conditional branches. It can be extended to bgtz, blez.

Instruction fields: Op-Code = bits 31:26, rs = bits 25:21, rt = bits 20:16.

Flow:
Op-Code = 4 (beq) OR Op-Code = 5 (bne)?
Duplicate Register Flag == 0?
  Yes: first conditional branch. Copy the Register Stack to the Duplicate Register Stack (equivalent to initializing the duplicate register stack). Inc LA-PC by 4.
  No: Duplicate Register Stack already initialized.
Set Register Flag for rs and rt = 1. These registers will be tracked by the Duplicate ALU.
Do addition and subtraction for all instructions having rs and rt with register flags set to 1.
Proceed to Branch Prediction Computation:
  Op code == 4? Test rs == rt.
  Op code == 5? Test rs != rt.
  no: Inc LA-PC by 4.
  yes: compute the Branch Target Address:
    Copy the offset from bits 15 to 0.
    Sign extend the offset to bit 31 (total 32 bits).
    Left shift by 2 (to get word address).
    Add to PC+4 to get the Branch Target Address.
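The field extraction and branch target computation in the flow above can be sketched as follows. The helper names `decode_branch` and `predict_next_pc` and the `regs` list standing in for the duplicate register stack are hypothetical; only beq (op code 4) and bne (op code 5) are handled, as in the slide.

```python
def decode_branch(instr, pc):
    """Split a 32-bit MIPS I-type word and compute its branch target."""
    opcode = (instr >> 26) & 0x3F      # bits 31:26
    rs = (instr >> 21) & 0x1F          # bits 25:21
    rt = (instr >> 16) & 0x1F          # bits 20:16
    offset = instr & 0xFFFF            # bits 15:0
    if offset & 0x8000:                # sign extend the 16-bit offset
        offset -= 0x10000
    # Shift left by 2 for word addressing, then add to PC + 4.
    target = (pc + 4 + (offset << 2)) & 0xFFFFFFFF
    return opcode, rs, rt, target


def predict_next_pc(instr, pc, regs):
    """Return the predicted next PC using the duplicate register values."""
    opcode, rs, rt, target = decode_branch(instr, pc)
    if opcode == 4:                    # beq: taken when rs == rt
        taken = regs[rs] == regs[rt]
    elif opcode == 5:                  # bne: taken when rs != rt
        taken = regs[rs] != regs[rt]
    else:
        return pc + 4                  # not a conditional branch handled here
    return target if taken else pc + 4


# bne $8, $9, -3 located at 0x00400010 (a typical loop-closing branch).
bne = (5 << 26) | (8 << 21) | (9 << 16) | 0xFFFD
regs = [0] * 32
regs[8], regs[9] = 1, 10               # loop index and loop bound
print(hex(predict_next_pc(bne, 0x00400010, regs)))
```

While the tracked loop index differs from the bound, the sketch predicts the backward target; once the two registers are equal it falls through to PC + 4.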

O-GEHL Branch Predictor [2]

History lengths go in geometric progression given by L(i) = α^(i-1) L(1) + constant
Best series found from experiments: 2, 4, 9, 12, 18, 31, 54, 114, 145, 266
Dynamic history length fitting with variable α is also possible.
j, k, l, ... are incremented on every unconditional branch. j increments are modulo 2, k increments are modulo 4, l increments are modulo 266.
Each C(i)(j) is a 4 bit saturating counter that counts -8 to 7.
Counter update given by:
if (p != out)
  if (branch == taken) c(i)(j)++
  if (branch != taken) c(i)(j)--
Dynamic threshold (θ) fitting possible. Threshold (θ) by default is 0.
Sum = C(i)(j) + C(i+1)(k) + ... + C(i+9)(l)
Sum > θ then p = taken
Sum < θ then p = not taken

[Figure: ten counter tables, with entries C11() and C12() in the first table, C21() to C24() in the second, C31() to C39() in the third, and so on up to C101() to C10266() in the tenth.]
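The sum-and-threshold prediction and the saturating counter update can be sketched as below. This follows the slide's description rather than the full O-GEHL design in [2]: table sizes are taken equal to the history length series, the table indices are supplied by the caller instead of being hashed from the PC and global history, and a sum equal to the threshold is treated as not taken. The class name `GehlSketch` is hypothetical.

```python
HISTORY_LENGTHS = (2, 4, 9, 12, 18, 31, 54, 114, 145, 266)


def saturate(value, low=-8, high=7):
    """Clamp to the range of a 4 bit signed saturating counter."""
    return max(low, min(high, value))


class GehlSketch:
    def __init__(self, table_sizes=HISTORY_LENGTHS, threshold=0):
        self.tables = [[0] * size for size in table_sizes]
        self.threshold = threshold

    def _cells(self, indices):
        # One counter per table; each index wraps modulo its table size.
        return [(table, idx % len(table)) for table, idx in zip(self.tables, indices)]

    def predict(self, indices):
        """Taken when the sum of the selected counters exceeds the threshold."""
        total = sum(table[k] for table, k in self._cells(indices))
        return total > self.threshold

    def update(self, indices, taken):
        """Counters move only on a misprediction, as on the slide."""
        if self.predict(indices) != taken:
            step = 1 if taken else -1
            for table, k in self._cells(indices):
                table[k] = saturate(table[k] + step)


predictor = GehlSketch()
indices = [0] * 10
first_guess = predictor.predict(indices)
predictor.update(indices, taken=True)
second_guess = predictor.predict(indices)
print(first_guess, second_guess)
```

Updating only on a misprediction matches the update rule quoted above; the dynamic threshold fitting mentioned on the slide is not modelled.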

Duplicate ALU (for MIPS) [3]

[Figure: block diagram. LA-PC supplies the instruction address to the Duplicate Instruction Queue. The Decode Unit splits each instruction into Op Code (bits 31-26), Reg 1 (bits 25-21), Reg 2 (bits 20-16) and Reg 3 (bits 15-11), which feed the Compare Op-Code block and the Compare Register Flags block for reg1, reg2, reg3.]

If register flags are set, do the computation for:
Op-Code 0, bits (5:0) = 32: add r1, r2, r3
Op-Code 0, bits (5:0) = 34: sub r1, r2, r3
Op-Code 0, bits (5:0) = 33: addu r1, r2, r3
Op-Code 0, bits (5:0) = 35: subu r1, r2, r3
Op-Code 8: addi r1, constant
Op-Code 9: addiu r1, constant

Op-Code == 4 OR 5 (beq, bne): use Loop Counter
Op-Code == 2 OR 3 (jump, jal): always take
Op-Code == 0 and FUNCT == 8 OR 9 (jr, jalr): always take

Set the LA-PC busy bit on instruction read.
When LA-PC is updated by the branch predictors, the busy bit is reset.
For arithmetic, reset the busy bit after 2 cycles.
An instruction is read when the busy bit is reset.
LA-PC is different from that used in RPT.

Branch Target for Jump (32 bits):
bits 31:28: 4 MSB bits of current PC+4
bits 27:2: Jump Target from instruction
bits 1:0: 00 (word addresses)
Branch Target for Branch (32 bits): current PC + 4 + bits 15:0 left shifted by 2 to give word addresses

This branch predictor can be used on multi-threaded CPUs.

Test results on O-GEHL Branch Predictor[5]


References
1. Jean-Loup Baer, Tien-Fu Chen, "An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty", Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195. Supercomputing '91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing.
2. Andre Seznec, "The O-GEHL Branch Predictor", The 1st JILP Championship Branch Prediction Competition (CBP1), 2004. Available from www.jilp.org
3. David Patterson and John Hennessy, Computer Organisation and Design: The Hardware-Software Interface.
4. http://en.wikipedia.org/wiki/CPU_cache
5. Andre Seznec, "Analysis of the Optimized GEHL Predictor". Available from: http://www.irisa.fr/caps/people/seznec/ISCA05.pdf

Research Ideas I am working on right now


Strained Silicon on SiGe Solar Cell

Requires Chemical Vapor Deposition or MBE techniques for fabrication.
Tandem solar cell design gives a wide band of absorbable frequencies with different band gaps.
Optimal thickness at quarter wavelength will give maximum absorption at the designed frequency.
Back plate metal contacts and top plate fingered contacts.
Economically viable for charging battery packs in electric vehicles and for replacing LPG cooking gas cylinders.
Long term viability for power generation is feasible due to low operating costs and low distribution costs in a distributed model.

Reference: Usami, N.; Takahashi, T.; Fujiwara, K.; Ujihara, T.; Sazaki, G.; Murakami, Y.; Nakajima, K. (Inst. for Mater. Res., Tohoku Univ., Sendai, Japan), "Si/multicrystalline-SiGe heterostructure as a candidate for solar cells with high conversion efficiency", Conference Record of the Twenty-Ninth IEEE Photovoltaic Specialists Conference, 19-24 May 2002, pp. 247-249.

Rake Receiver with MDS Codes

Rake receivers could be used to identify the strongest multipath component of a received signal. This could be achieved by correlating the received signal with itself over different delays and finding the strongest delay component. This does not involve maximal ratio combining.
It could be combined with MDS codes for wireless communications where, given any d bits corrupted by channel noise or multipath effects, the signal could still be recovered uniquely.

Reference: Lectures of Prof. Cutter on iTunesU under the course on Digital Communications 2 taught at MIT.
Reference: W-CDMA Rake Receiver implementation in DSP, EE Times: http://www.eetimes.com/electronics-news/4139933/W-CDMA-RAKE-Receiver-Comes-to-Life-inDSP
Reference: A Rake Receiver for Maximal Ratio Combining without Channel Estimation for UWB Communications: http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1044&context=ojii_volumes
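The delay search described above (correlating the received signal with itself over different delays) can be sketched as follows. This is a toy illustration: the two-path channel, the path gain, the +/-1 symbol stream and the function names are all assumptions, and no noise or MDS coding is modelled.

```python
import random


def autocorrelation(x, lag):
    """Correlate the signal with a copy of itself delayed by `lag` samples."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def strongest_delay(received, max_lag):
    """Return the non-zero lag with the largest autocorrelation magnitude."""
    scores = {lag: abs(autocorrelation(received, lag)) for lag in range(1, max_lag + 1)}
    return max(scores, key=scores.get)


# Hypothetical two-path channel: a direct path plus an echo 7 samples later.
rng = random.Random(1)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(2000)]
delay, gain = 7, 0.6
received = [symbols[n] + (gain * symbols[n - delay] if n >= delay else 0.0)
            for n in range(len(symbols))]

print(strongest_delay(received, max_lag=20))
```

Because the transmitted symbols are uncorrelated with each other, the autocorrelation of the received signal peaks at the lag of the echo, which is what the search picks out.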

Class S RF Power Amplifiers on GaN HEMTs

Class S RF power amplifiers with a fully differential H-bridge topology could give a theoretical 100% efficiency.
GaN HEMTs give the best high frequency switching characteristics.
The two features could be combined to give a high efficiency RF power amplifier topology.

Reference: Ph.D. Dissertation of Stephan Maroldt, University of Freiburg

Microprocessor Design
The attached slides describe the design of a 16-bit Brent Kung adder and a 1024x16 asynchronous SRAM in 45 nm CMOS, along with the design of a branch predictor and cache prefetch unit for a MIPS microprocessor. These design ideas could be combined with other ideas for pipeline design, ALU design and interconnect circuit design to give a full physical layer design of a MIPS microprocessor in 45 nm CMOS. Various power reduction and clock gating techniques could be applied at a higher level of the hierarchy.

mm-wave MIMO OFDM

mm-wave MIMO OFDM could be used for wireless backhaul networks due to its high capacity.
mm-wave MIMO systems could be extended to 2x2, 4x4, 8x8, etc. topologies to exploit spatial diversity and get a higher data rate.

Reference: Sheldon, C.; Munkyo Seo; Torkildson, E.; Rodwell, M.; Madhow, U. (Dept. of Electr. & Comput. Eng., Univ. of California, Santa Barbara, CA, USA), "4 channel spatial multiplexing over a mm-wave line of sight link", IEEE MTT-S International Microwave Symposium Digest (MTT '09), 7-12 June 2009, pp. 389-392.

Routing algorithm to reduce congestion

The routing algorithm to reduce congestion could be based on the idea of sparsity.
High congestion nodes could be dropped from the network map until congestion on the node drops.
The underlying packet streams would use a flow control based routing protocol.
Each node would store a map of the network, which would be updated periodically using ping back messages.
Could be applied to packet switched networks, traffic control and wireless sensor networks.
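One way to picture dropping congested nodes from the network map is a shortest path search that simply ignores them. The sketch below is an assumption-laden illustration, not a protocol: the network is an adjacency dictionary, congestion is a number per node, `limit` is a hypothetical threshold, and breadth first search stands in for whatever path metric the flow control protocol would use.

```python
from collections import deque


def find_route(network, congestion, src, dst, limit):
    """Shortest hop-count path that avoids nodes whose congestion exceeds `limit`.

    The source and destination are always kept in the map.
    Returns the path as a list of nodes, or None if no route remains.
    """
    usable = {node for node in network
              if congestion.get(node, 0.0) <= limit or node in (src, dst)}
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:            # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in network[node]:
            if neighbour in usable and neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
           "D": ["B", "E"], "E": ["C", "D"]}
print(find_route(network, {}, "A", "D", limit=0.5))
print(find_route(network, {"B": 0.9}, "A", "D", limit=0.5))
```

When node B reports high congestion it disappears from the map and traffic takes the longer route through C and E; once its congestion value falls below the limit it becomes usable again on the next map update.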

Photonic Computers
These could use multiplexer based logic gates. Photonic multiplexers have been widely researched and developed for optical communications. Phase detectors could be used to identify the phase and thus the value of the stored signal. These would use electronic charge storage and high speed electro-optic conversion.
Reference: Prior research on this has been carried out at UCSB.
