
Coordinated Multi-Point in Mobile Communications


From Theory to Practice
A self-contained guide to coordinated multi-point (CoMP), this comprehensive book
covers everything from theoretical basics to practical implementation. Addressing a
wide range of topics, it highlights the potential gains of CoMP, the fundamental degrees
of freedom involved, and the key challenges of using CoMP in practice. The editors and
contributors bring unique real-world experience from running the world's first and largest
test beds for LTE-Advanced, and recent field trial results are presented. With detailed
insight into the realistic potential of CoMP as a key technology for LTE-Advanced and
beyond, this is a must-read resource for professionals and students who want the big
picture on CoMP or require in-depth knowledge of how to build cellular communication
systems for the future.
Patrick Marsch was the technical project coordinator of the research
project EASY-C, where the world's largest research test beds for
LTE-Advanced were established and the first live demonstrations of
CoMP were performed. He received his Dr.-Ing. degree from Technische
Universität Dresden, where he later headed the system level group at the
Vodafone Chair, focusing on optimizing spectral efficiency and energy
efficiency in heterogeneous cellular deployments. He currently heads a radio research
team within Nokia Siemens Networks in Wrocław, Poland.

Gerhard P. Fettweis is the Vodafone Chair Professor at Technische
Universität Dresden, with 20 companies from around the world currently
sponsoring his research on wireless transmission and chip design.
An IEEE Fellow, he runs the world's largest cellular research test beds,
coordinated the EASY-C project, and has received numerous awards. He
began his career at IBM Research and has since developed nine start-up
companies (so far).

Interference is the limiting factor in cellular communications and smart coordination
of transmission can lead to significant improvements in quality of service. This book
provides a strong outline and lays out some of the fundamental assumptions and theoretical models to treat the subject and supports the theory with results from system-level
test benches and field measurements. I recommend this book to everyone interested in
the topic.
Siavash M. Alamouti, Vodafone Group R&D Director

Coordinated Multi-Point in
Mobile Communications
From Theory to Practice
Edited by

PATRICK MARSCH
Nokia Siemens Networks, Wrocław, Poland

GERHARD P. FETTWEIS
Technische Universität Dresden, Germany

CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107004115
© Cambridge University Press 2011
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2011
Printed in the United Kingdom at the University Press, Cambridge
A catalogue record for this publication is available from the British Library
ISBN 978-1-107-00411-5 Hardback
Additional resources for this publication at www.cambridge.org/9781107004115
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to in
this publication, and does not guarantee that any content on such websites is,
or will remain, accurate or appropriate.

Contents

List of Contributors
Acknowledgements
List of Abbreviations
Nomenclature and Notation


Part I Motivation and Basics

1 Introduction
1.1 Motivation
1.2 Aim of this Book
1.3 Classes of CoMP Considered
1.4 Outline of this Book


2 An Operator's Point of View
2.1 The Mobile Internet - A Success Story so far
2.2 Requirements on Future Networks and Upcoming Challenges
2.3 The Role of CoMP
2.4 The Role of Field Trials

3 Information-Theoretic Basics
3.1 Observed Cellular Scenarios
3.2 Usage of OFDMA for Broadband Wireless Communications
3.3 Multi-Point Frequency-Flat Baseband Model Considered
3.4 Uplink Transmission
3.4.1 Basic Uplink Capacity Bounds
3.4.2 Full Cooperation in the Uplink
3.4.3 No Cooperation in the Uplink
3.4.4 Numerical Example
3.5 Downlink Transmission
3.5.1 Basic Downlink Capacity Bounds
3.5.2 Full Cooperation in the Downlink
3.5.3 No Cooperation in the Downlink
3.5.4 Numerical Example
3.6 Summary

4 Gains and Trade-Offs of Multi-Cell Joint Signal Processing


4.1 Modeling Imperfect Channel State Information (CSI)
4.1.1 Imperfect CSI in the Uplink
4.1.2 Imperfect CSI in the Downlink
4.2 Gain of Joint Signal Processing under Imperfect CSI
4.3 Trade-Offs in Uplink Multi-Cell Joint Signal Processing
4.3.1 Different Information Exchange and Cooperation Schemes
4.3.2 Numerical Results
4.3.3 Parallels between Theory and Practical Cooperation Schemes
4.4 Degrees of Freedom in Downlink Joint Signal Processing
4.5 Summary


Part II Practical CoMP Schemes


5 CoMP Schemes Based on Interf.-Aware Transceivers or Interf. Coord.


5.1 DL Multi-User Beamforming with IRC
5.1.1 Introduction
5.1.2 Downlink System Model
5.1.3 Linear Receivers
5.1.4 Imperfect Channel Estimation
5.1.5 Resource Allocation and Fair User Selection
5.1.6 Single-Cell Performance
5.1.7 Multi-Cell Performance under Perfect CSI
5.1.8 Multi-Cell Performance under Imperfect CSI
5.1.9 Summary
5.2 Uplink Joint Scheduling and Cooperative Interference Prediction
5.2.1 Interference-Aware Joint Scheduling
5.2.2 Cooperative Interference Prediction
5.2.3 Practical Considerations
5.2.4 Applicability of Both Schemes to the Downlink
5.2.5 Summary
5.3 Downlink Coordinated Beamforming
5.3.1 Introduction
5.3.2 Single Receive Antenna at the Terminal
5.3.3 Multiple Receive Antennas at the Terminal
5.3.4 Summary


6 CoMP Schemes Based on Multi-Cell Joint Signal Processing


6.1 Uplink Centralized Joint Detection
6.1.1 Introduction
6.1.2 Joint Detection Algorithms


6.1.3 Local BS Processing with Limited Backhaul Constraint


6.1.4 Local or Partial Decoding with Limited Backhaul Constraint
6.1.5 Provisions for Uplink Joint Processing in WiMax and LTE
6.1.6 Summary
6.2 Uplink Decentralized Joint Detection
6.2.1 Practical Decentralized Interference Cancelation Scheme
6.2.2 Performance Assessment
6.2.3 Summary
6.3 DL Distributed CoMP Approaching Centralized Joint Transmission
6.3.1 System Model
6.3.2 Theoretical Limits for Static Clustering and DPC
6.3.3 Practical (Linear) Precoding
6.3.4 Scheme for Distributed, Centralized Joint Transmission
6.3.5 Summary
6.4 Downlink Decentralized Multi-User Transmission
6.4.1 Decentralized Beamforming with Limited CSIT
6.4.2 Multi-cell Beamforming with Limited Data Sharing
6.4.3 Summary


Part III Challenges Connected to CoMP


7 Clustering
7.1 Static Clustering Concepts
7.1.1 Non-Overlapping Clusters
7.1.2 Overlapping Clusters
7.1.3 Resulting Geometries
7.2 Self-Organizing Clustering Concepts
7.2.1 Self-Organizing Network Concepts in 3GPP LTE
7.2.2 Adaptive Clustering Algorithms
7.2.3 Simulation Results
7.2.4 Signaling and Control Procedures
7.3 Summary


8 Synchronization
8.1 Synchronization Concepts
8.1.1 Synchronization Terminology
8.1.2 Network Synchronization
8.1.3 Satellite-Based Synchronization
8.1.4 Endogenous Distributed Wireless Carrier Synchronization
8.1.5 Summary
8.2 Imperfect Sync in Time: Perf. Degradation and Compensation
8.2.1 MIMO OFDM Transmission with Asynchronous Interference
8.2.2 Interf.-Aware Multi-User Joint Detection and Transmission


8.2.3 System Level SINR Analysis


8.2.4 Summary
8.3 Imperfect Sync in Frequency: Perf. Degradation and Compensation
8.3.1 Downlink Analysis
8.3.2 Uplink Analysis
8.3.3 Summary


9 Channel Knowledge
9.1 Channel Estimation for CoMP
9.1.1 Channel Estimation - Single Link
9.1.2 Channel Estimation for CoMP
9.1.3 Multi-Cell Channel Estimation
9.1.4 Uplink Channel Estimation
9.1.5 Summary
9.2 Channel State Information Feedback to the Transmitter
9.2.1 Transmission Model
9.2.2 Sum-Rate Performance Measure
9.2.3 Channel Vector Quantization (CVQ)
9.2.4 Minimum Euclidean Distance Based CVQ
9.2.5 Maximum SINR Based CVQ
9.2.6 Pseudo-Maximum SINR based CVQ
9.2.7 Application to Zero-Forcing (ZF) Precoding
9.2.8 Resource Allocation
9.2.9 Simulation Results
9.2.10 Summary


10 Efficient and Robust Algorithm Implementation


10.1 Robust and Flexible Base Station Precoding Implementation
10.1.1 System Model
10.1.2 Transmit Filter Eigendecomposition
10.1.3 Transmit Filter Computations
10.1.4 The Order-Recursive Filter in Details
10.1.5 Example: SINR as Function of the Condition Number
10.1.6 Summary
10.2 Low-Complexity Terminal-Side Receiver Implementation
10.2.1 Introduction to Interference Rejection Combining (IRC)
10.2.2 IRC with Known Channel and Interference Covariance
10.2.3 Implementation Losses from Imperfect Channel Estimation
10.2.4 Losses from Spatial Interf.-and-Noise Covariance Estimation
10.2.5 Losses from Channel and Interference Estimation Errors
10.2.6 Summary


11 Scheduling, Signaling and Adaptive Usage of CoMP


11.1 Centralized Scheduling for CoMP
11.1.1 Introduction
11.1.2 System Model
11.1.3 Centralized Scheduling Problems
11.1.4 Analyses and Results
11.1.5 Summary
11.2 Decentralized Radio Link Control and Inter-BS Signaling
11.2.1 Resource Allocation
11.2.2 Link Adaptation
11.2.3 Radio Link Measurements
11.2.4 Uplink Power Control
11.2.5 Uplink Timing Advance
11.2.6 HARQ-related Timing Constraints for UL CoMP
11.2.7 Handover
11.2.8 Inter-BS Signaling
11.2.9 Summary
11.3 Ad-hoc CoMP
11.3.1 Introduction
11.3.2 Ad-Hoc CoMP With More Accurate CSI
11.3.3 Ad-Hoc CoMP with CSI Impairments
11.3.4 Ad-Hoc CoMP and HARQ
11.3.5 Summary


12 Backhaul
12.1 Fund. Limits of Interf. Mitigation with Limited Backhaul Coop.
12.1.1 Introduction
12.1.2 Uplink Scenario: Receiver Cooperation
12.1.3 Downlink Scenario: Transmitter Cooperation
12.1.4 UL-DL Reciprocity and Generalized Degrees of Freedom
12.1.5 Summary
12.2 Backhaul Requirements of Practical CoMP Schemes
12.2.1 Types of Backhaul Data and Scaling Laws
12.2.2 Specific Backhaul Requirements of Exemplary CoMP Schemes
12.2.3 Backhaul Latency Requirements
12.2.4 Backhaul Topology Considerations
12.2.5 Summary
12.3 CoMP Backhaul Infrastructure Concepts
12.3.1 Ethernet
12.3.2 Passive Optical Network
12.3.3 Digital Subscriber Line
12.3.4 Microwave
12.3.5 The X2 Interface
12.3.6 Backhaul Topology Concepts


12.3.7 Summary


Part IV Performance Assessment


13 Field Trial Results


13.1 Real-time Impl. and Trials of Adv. Receivers and UL CoMP
13.1.1 Real-time Implementation and Lab Tests
13.1.2 Uplink Successive Interference Cancelation (SIC) Receiver
13.1.3 Uplink Macro Diversity Trials with Distributed RRHs
13.1.4 Summary
13.2 Assessing the Gain of Uplink CoMP in a Large-Scale Field Trial
13.2.1 Measurement Setup
13.2.2 Signal Processing Architecture and Evaluation Concept
13.2.3 Noise Estimation
13.2.4 Channel Equalization
13.2.5 Field Trial Results
13.2.6 Summary
13.3 Real-time Implementation and Field Trials for Downlink CoMP
13.3.1 Introduction
13.3.2 Enabling Features
13.3.3 Real-time Implementation
13.3.4 Field Trials
13.3.5 Summary
13.4 Predicting Pract. Achievable DL CoMP Gains over Larger Areas
13.4.1 Setup and Closed-Loop System Design
13.4.2 Measurement and Evaluation Methodology
13.4.3 Measurement Campaign
13.4.4 Summary
13.5 Lessons Learnt Through Field Trials


14 Performance Prediction of CoMP in Large Cellular Systems


14.1 Simulation and Link-2-System Mapping Methodology
14.1.1 General Simulation Assumptions and Modeling
14.1.2 Channel Models and Antenna Models
14.1.3 Transceiver Techniques
14.1.4 Link-to-System Interface
14.1.5 Key Performance Indicators
14.1.6 Summary
14.2 Obtaining Chn. Model Params. via Chn. Sounding or Ray-Tracing
14.2.1 Large-Scale-Parameters
14.2.2 Measurement-based Parameter Estimation
14.2.3 Ray-Tracing based Parameter Simulation
14.2.4 Comparison between Measurements and Ray-Tracing


14.2.5 Summary
14.3 Uplink Simulation Results
14.3.1 Compared Schemes
14.3.2 Simulation Assumptions and Parameters
14.3.3 Backhaul Traffic
14.3.4 Simulation Results
14.3.5 Summary
14.4 Downlink Simulation Results
14.4.1 Compared Schemes
14.4.2 Simulation Assumptions and Parameters
14.4.3 Detailed Analysis of Coordinated Scheduling/Beamforming
14.4.4 Backhaul Traffic
14.4.5 Simulation Results
14.4.6 Summary


Part V Outlook and Conclusions


15 Outlook
15.1 Using CoMP for Terminal Localization
15.1.1 Localization based on the Signal Propagation Delay
15.1.2 Further Localization Methods
15.1.3 Localization in B3G Standards
15.1.4 Summary
15.2 Relay-Assisted Mobile Communication using CoMP
15.2.1 Introduction
15.2.2 Reference Scenario
15.2.3 System and Protocol Description
15.2.4 Trade-Offs in Relay Networks
15.2.5 Numerical Evaluation of CoMP and Relaying
15.2.6 Cost/Benefit Trade-Off
15.2.7 Energy/Benefit Trade-Off
15.2.8 Computation/Transmission Power Trade-Off
15.2.9 Summary
15.3 Next Generation Cellular Network Planning and Optimization
15.3.1 Introduction
15.3.2 Classical Cellular Network Planning and Optimization
15.3.3 Physical Characterization of Capacity Gains through CoMP
15.3.4 Summary
15.4 Energy-Efficiency Aspects of CoMP
15.4.1 System Model
15.4.2 Effective Transmission Rates
15.4.3 Backhauling
15.4.4 Energy Consumption of Cellular Base Stations
15.4.5 System Evaluation
15.4.6 Summary

16 Summary and Conclusions


16.1 Summary of this Book
16.1.1 Most Promising CoMP Schemes and Potential Gains
16.1.2 Key Challenges Identified
16.2 Conclusions
16.2.1 About this Book
16.2.2 CoMP's Place in the LTE-Advanced Roadmap and Beyond


References
Index


List of Contributors

Amin, M. Awais: Qualcomm CDMA Technologies GmbH, Nuremberg, Germany
Bachl, Rainer: ST-Ericsson AT GmbH, Nuremberg, Germany
Bhagavatula, Ramya: University of Texas at Austin, TX, USA
Boccardi, Federico: Alcatel-Lucent Bell Labs, Stuttgart, Germany
Brown III, D. Richard: Worcester Polytechnic Institute, MA, USA
Brück, Stefan: Qualcomm CDMA Technologies GmbH, Nuremberg, Germany
Calin, Doru: Alcatel-Lucent Bell Labs, Murray Hill, NJ, USA
Chae, Chan-Byoung: Yonsei University, Korea
Dammann, Armin: Institute of Communications and Navigation, German Aerospace Center (DLR), Germany
Dekorsy, Armin: Institute for Telecomms. and High-Frequency Techniques, University of Bremen, Germany
Dietl, Guido: DOCOMO Euro-Labs, Munich, Germany
Doll, Mark: Alcatel-Lucent Bell Labs, Stuttgart, Germany
dos Santos, Ricardo B.: Federal University of Ceará, Brazil
Dötsch, Uwe: Alcatel-Lucent Bell Labs, Stuttgart, Germany
Droste, Heinz: Deutsche Telekom Laboratories, Darmstadt, Germany
Fahldieck, Torsten: Alcatel-Lucent Bell Labs, Stuttgart, Germany
Falconetti, Laetitia: Ericsson Research, Aachen, Germany
Fehske, Albrecht: Vodafone Chair, Technische Universität Dresden, Germany
Fettweis, Gerhard: Vodafone Chair, Technische Universität Dresden, Germany
Fischer, Erik: Vodafone Chair, Technische Universität Dresden, Germany
Forck, Andreas: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Frank, Philipp: Deutsche Telekom Laboratories, Berlin, Germany
Fritzsche, Richard: Vodafone Chair, Technische Universität Dresden, Germany
Garavaglia, Andrea: Qualcomm CDMA Technologies GmbH, Nuremberg, Germany
Gesbert, David: EURECOM - Mobile Communications Department, Sophia-Antipolis, France
Giese, Jochen: Qualcomm CDMA Technologies GmbH, Nuremberg, Germany
Grieger, Michael: Vodafone Chair, Technische Universität Dresden, Germany
Haustein, Thomas: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Heath Jr., Robert W.: University of Texas at Austin, TX, USA
Holfeld, Jörg: Vodafone Chair, Technische Universität Dresden, Germany
Hoymann, Christian: Ericsson Research, Aachen, Germany
Irmer, Ralf: Vodafone Group R&D, Newbury, UK
Jäckel, Stephan: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Jandura, Carsten: Actix GmbH, Dresden, Germany
Jungnickel, Volker: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Kadel, Gerhard: Deutsche Telekom Laboratories, Darmstadt, Germany
Klein, Andrew G.: Worcester Polytechnic Institute, MA, USA
Klein, Anja: Technische Universität Darmstadt, Germany
Koppenborg, Johannes: Alcatel-Lucent Bell Labs, Stuttgart, Germany
Kotzsch, Vincent: Vodafone Chair, Technische Universität Dresden, Germany
Maciel, Tarcisio F.: Federal University of Ceará, Brazil
Marsch, Patrick: Nokia Siemens Networks, Wrocław, Poland
Mayer, Hans-Peter: Alcatel-Lucent Bell Labs, Stuttgart, Germany
Mensing, Christian: Institute of Communications and Navigation, German Aerospace Center (DLR), Germany
Molisch, Andreas F.: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
Müller-Weinfurtner, Stefan: ST-Ericsson AT GmbH, Nuremberg, Germany
Müller, Andreas: Institute of Telecommunications, University of Stuttgart, Germany
Olbrich, Michael: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Palleit, Nico: Institute of Communications Engineering, University of Rostock, Germany
Rost, Peter: NEC Laboratories Europe, Heidelberg, Germany
Sand, Stephan: Institute of Communications and Navigation, German Aerospace Center (DLR), Germany
Schellmann, Malte: Huawei Technologies Düsseldorf GmbH, European Research Center, Munich, Germany
Schneider, Christian: Ilmenau University of Technology, Germany
Schulist, Matthias: Qualcomm CDMA Technologies GmbH, Nuremberg, Germany
Thiele, Lars: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Tian, Yafei: School of Electronics and Information Engineering, Beihang University, China
Tse, David: Wireless Foundations, University of California at Berkeley, CA, USA
Utschick, Wolfgang: Associate Institute for Signal Processing, Technische Universität München, Germany
Voigt, Jens: Actix GmbH, Dresden, Germany
Wachsmann, Udo: ST-Ericsson AT GmbH, Nuremberg, Germany
Wahls, Sander: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Wang, I-Hsiang: Wireless Foundations, University of California at Berkeley, CA, USA
Weber, Andreas: Alcatel-Lucent Bell Labs, Stuttgart, Germany
Weber, Ralf: Qualcomm CDMA Technologies GmbH, Nuremberg, Germany
Weber, Tobias: Institute of Communications Engineering, University of Rostock, Germany
Wei, Xinning: Institute of Communications Engineering, University of Rostock, Germany
Wild, Thorsten: Alcatel-Lucent Bell Labs, Stuttgart, Germany
Wirth, Thomas: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Yang, Chenyang: School of Electronics and Information Engineering, Beihang University, China
Zakhour, Randa: Electrical and Electronic Engineering Department, University of Melbourne, Australia
Zirwas, Wolfgang: Nokia Siemens Networks GmbH & Co. KG, Munich, Germany

Acknowledgements

This book is based on the knowledge and effort of a large number of authors,
some of whom have been working in the field of CoMP for over a decade. The
editors would like to thank all contributors for their great cooperation in the last
months, their constructive discussions on contents, notation and nomenclature,
and their patience in fine-tuning contents up to the last minute of editing.

The request for searching for new limits of cellular beyond 3G came from
Vodafone Group R&D, initiating our research in the area of CoMP. As the
sponsor of the Vodafone Chair at Technische Universität Dresden, Vodafone
Group R&D has been instrumental in sharpening our view for CoMP schemes
with practical impact. In particular Mike Walker, Trevor Gill and Luke Ibbetson
among many others have been of great help in serving as a sounding board for
our ideas. As a result, we have focused our research on theoretical limits as
well as practical implementation challenges. The result of this view on CoMP
technology has provided the basis for what has finally led to this book.

Nothing would be possible without interaction with friends, colleagues, fellow
researchers and cooperation partners. The mindset and openness of our scientific
community is a platform for inspiration and motor for sharpening our minds. In
particular, the team at the Vodafone Chair has been of invaluable help in creating
scientific results and providing the framework for inspirations, discussions, and
many new insights. Thanks to the whole team for this major help!

While most parts of this book were mutually reviewed by the authors themselves,
the editors would like to thank the following external reviewers for their
valuable feedback: Fabian Diehm, Alexandre Gouraud, Ines Kluge, Marco Krondorf,
Eckhard Ohlmer, Simone Redana, Fred Richter, Hendrik Schöneich, Mikael
Sternad, Vinay Suryaprakash, Tommy Svensson, Stefan Valentin, Raphael Visoz,
Guillaume Vivier and Steffen Watzek. Also, the appearance of the book would
not be as it is without the significant work of Katharina Philipp, who adapted
the majority of figures in this book to the same look and feel.

Last but surely not least, the editors would like to thank Phil Meyler and
Sarah Finlay from Cambridge University Press for making this book possible,
and for the great and patient support during its creation.

Patrick Marsch and Gerhard Fettweis (Editors), January 2011

List of Abbreviations

ACK: acknowledgement
ADC: analog to digital conversion
AGC: automatic gain control
aGW: advanced gateway
ANR: automatic neighbor relation
AoA: angle of arrival
AWGN: additive white Gaussian noise
bpcu: bits per channel use
BC: broadcast channel
BER: bit error rate
BF: beamforming
BLER: block error rate
BPSK: binary phase shift keying
BS: base station
CAZAC: constant amplitude zero autocorrelation codes
CB: coordinated beamforming
CCU: CoMP central unit
CD: Cholesky decomposition
CDF: cumulative distribution function
CDI: channel direction indicator
CDM: code division multiplex
CDMA: code division multiple access
CFO: carrier frequency offset
CGI: cell global identifier
CIF: compressed interference forwarding
CIR: channel impulse response
CoMP: coordinated multi-point
CP: cyclic prefix
CPRI: common public radio interface
CQI: channel quality indicator
CRC: cyclic redundancy check
CRLB: Cramér-Rao lower bound
CRS: common reference signal
CS: coordinated scheduling
CS/CB: coordinated scheduling / coordinated beamforming
CSG: closed subscriber group
CSI: channel state information
CSIR: channel state information at the receiver
CSI RS: CSI reference signal
CSIT: channel state information at the transmitter
CSU: central scheduling unit
CT: conventional transmission
CTF: channel transfer function
CU: central unit
CVQ: channel vector quantization
DAS: distributed antenna system
DBA: dynamic bandwidth assignment
DF: decode-and-forward
DFT: discrete Fourier transform
DIS: distributed interference subtraction
DL: downlink
DM: device manager
DPC: dirty paper coding
DRS: demodulation reference signal
DSL: digital subscriber line
DSLAM: DSL access multiplexer
DSP: digital signal processor
EASY-C: Enablers of Ambient Services and Systems - Part C
eNB: enhanced Node B
EOC: eigenmode-aware optimum combiner
ERC: eigenmode-aware receive combining
EPON: Ethernet PON
E-UTRAN: enhanced UMTS terrestrial radio access network
EVD: eigenvalue decomposition
EvDO: evolution data optimized
FDD: frequency division duplex
FDM: frequency division multiplex
FEC: forward error correction
FFT: fast Fourier transform
FIR: finite impulse response
FPGA: field programmable gate array
FTP: file transfer protocol
g.d.o.f.: generalized degrees of freedom
GF: geometry factor
GSM: global system for mobile communications
GPON: Gigabit-capable PON
GPRS: general packet radio service
GTC: GPON transmission convergence
GPS: global positioning system
GTP-U: GTP user plane
HARQ: hybrid automatic repeat request
H-BLAST: Horizontal Bell Laboratories Layered Space-Time Architecture
HK: Han-Kobayashi
HPBW: half-power beamwidth
HSPA: high-speed packet access
IAP: interference-aware precoding
IC: interference channel
ICI: inter-carrier interference
ICIN: inter-cell interference nulling
IDFT: inverse discrete Fourier transform
IF: intermediate frequency
i.i.d.: independently and identically distributed
IEEE: Institute of Electrical and Electronics Engineers
IFFT: inverse fast Fourier transform
INR: interference-to-noise ratio
IP: Internet protocol
IRC: interference rejection combining
ISD: inter-site distance
ISI: inter-symbol interference
JD: joint detection
JT: joint transmission
LAN: local area network
LDC: linear deterministic channel
LLR: log-likelihood ratio
LMMSE: linear minimum mean square error
LO: local oscillator
LOS: line-of-sight
LSP: large-scale parameters
LSU: LTE signal processing unit
LTE: Long Term Evolution
LTE-A: Long Term Evolution Advanced
MAC: multiple access channel
MAN: metropolitan area network
MCS: modulation and coding scheme
MET: maximum Eigenvalue transmission
MIESM: mutual information equivalent SINR mapping
MIMO: multiple-input multiple-output
MISO: multiple-input single-output
MF: matched filter
ML: maximum likelihood
MLE: maximum likelihood estimator
MME: mobility management entity
MMSE: minimum mean square error
MPC: multi-path component
MRC: maximum ratio combining
MRM: measurement report message
MRT: maximum ratio transmission
MS: multiple stream
MSE: mean square error
MUI: multi-user interference
MU-MIMO: multi-user MIMO
NGMN: next generation mobile networks
NLOS: non line-of-sight
NMEA: National Marine Electronics Association
NR: neighbor relation
NRT: neighbor relation table
NTP: network time protocol
OAM: operation and maintenance
OC: optimum combining
OCXO: oven-controlled crystal oscillator
ODN: optical distribution network
OFDM: orthogonal frequency division multiplex
OFDMA: orthogonal frequency division multiple access
OLT: optical line termination
ONU: optical network unit
PA: power amplifier
PAPR: peak-to-average power ratio
PCI: physical cell identifier
PDF: probability distribution function
PDH: plesiochronous digital hierarchy
PDCCH: physical downlink control channel
PDSCH: physical downlink shared channel
PDP: power delay profile
PIC: parallel interference cancelation
PLL: phase-locked loop
PMI: precoding matrix indicator
ppb: parts per billion
ppm: parts per million
PPS: pulses per second
PON: passive optical network
POTS: plain old telephone service
PRB: physical resource block
PRS: positioning reference signal
PTP: precision time protocol
PUCCH: physical uplink control channel
PUSCH: physical uplink shared channel
QAM: quadrature amplitude modulation
QoE: quality of experience
QoS: quality of service
QPSK: quadrature phase shift keying
RAN: radio access network
RAP: radio access point
RB: resource block
RE: resource element
RF: radio frequency
RI: rank indicator
RHS: right-hand side
RMS: root mean square
RN: relay node
RNTI: radio network temporary identifier
RoF: radio over fibre
RRH: remote radio head
RRM: radio resource management
RS: reference signal
RSS: received signal strength
RSRP: reference signal received power
RTOA: round-trip time of arrival
RTT: round-trip time
SC: sub-carrier
SC-FDMA: single carrier frequency domain multiple access
SCM: spatial channel model
SCME: spatial channel model extended
SCTP: stream control transmission protocol
SDH: synchronous digital hierarchy
SDMA: spatial division multiple access
SDIV: spatial diversity
S-GW: serving gateway
SIC: successive interference cancelation
SINR: signal-to-interference-and-noise ratio
SIR: signal-to-interference ratio
SISO: single-input single-output
SMUX: spatial multiplexing
SON: self-organizing network
SONET: synchronous optical network
SS: single stream
SSB: single side band
SSP: small-scale parameters
SNR: signal-to-noise ratio
SU-MIMO: single-user MIMO
SVD: singular value decomposition
SynchE: synchronous Ethernet
TB: transport block
TCI: target cell identifier
TDD: time division duplex
TDM: time division multiplex
TDMA: time division multiple access
TDOA: time delay of arrival
THP: Tomlinson-Harashima precoding
TOA: time of arrival
TTI: transmit time interval
UDP: user datagram protocol
UCA: uniform circular array
UE: user equipment
UL: uplink
ULA: uniform linear array
UMTS: universal mobile telecommunications standard
UTRAN: universal terrestrial RAN
VDSL: very-high-speed digital subscriber line
VID: VLAN identifier
VLAN: virtual local area network
VoIP: voice over IP
WAN: wide area network
WCDMA: wideband code division multiple access
WCI: worst companion indicator
WiMAX: Worldwide Interoperability for Microwave Access
WF: Wiener filter
WINNER: Wireless World Initiative New Radio
WSSUS: wide sense stationary uncorrelated scattering
XGPON: 10-Gigabit-capable PON
ZF: zero-forcing

Nomenclature and Notation

Nomenclature
In this book, we generally consider the setup and involved nomenclature depicted
in Fig. 3.1 on page 13. Please note that we assume a site to consist of three sectors,
which are equivalent to cells. Each sector or cell is assumed to be served by one
dedicated base station (BS), even though in practice multiple such BSs may be
integrated into one physical device.

CoMP Scheme Classification

Throughout the book, CoMP schemes are classified on one hand according to
the extent of cooperation between BSs. We here distinguish between

• interference-aware transmission and detection (possibly with estimation of
  interference, but without explicit BS cooperation)
• interference coordination (e.g. joint multi-cell scheduling, coordinated beamforming etc.)
• multi-cell joint signal processing (e.g. joint detection or joint transmission)

We further distinguish between decentralized and centralized CoMP schemes,
depending on where the subject of cooperation takes place. This classification is
applied to various schemes observed in this book in Table 1.

Notation
Unless stated otherwise, the following holds throughout most parts of the book:

• Calligraphic letters (e.g. $\mathcal{M}$) represent sets
• Capital, italic letters (e.g. $P_{\max}$) denote constants
• Bold-face, capital letters (e.g. $\mathbf{H}$) represent matrices
• Bold-face, lowercase letters (e.g. $\mathbf{h}$) represent vectors
• Lowercase, italic letters represent scalars
• Variables with a hat on top (e.g. $\hat{\mathbf{H}}$) denote estimates
• Variables with a bar on top (e.g. $\bar{\mathbf{H}}$) denote an effective expression
• Variables with a tilde on top (e.g. $\tilde{\mathbf{H}}$) denote expressions in time domain
  (whereas other expressions are usually in frequency domain, see Section 8.2).


Table 1. Classification of CoMP schemes.

Interference-aware transmission/detection
  Decentralized:
    DL multi-user beamforming with IRC (Sections 5.1, 13.3)
    IRC (Section 10.2)
  Centralized:
    no schemes listed

Interference coordination
  Decentralized:
    UL cooperative interf. prediction (Sections 5.2.2, 14.3)
    DL coordinated sched. / beamforming (CS/CB) (Sections 5.3, 14.4.3)
  Centralized:
    UL joint scheduling (Sections 5.2.1, 14.3)
    DL centralized joint scheduling (Section 11.1)

Multi-cell joint signal processing
  Decentralized:
    UL decentralized joint detection (Sections 6.2, 13.1, 14.3)
    UL distr. interference subtraction (Sections 4.3.1, 13.2)
    DL distributed joint transmission (Sections 6.3, 13.3, 13.4)
  Centralized:
    UL centralized joint detection (Sections 6.1, 13.2)
    DL centralized joint transmission (Sections 6.3, 13.3)

The following variables are frequently used throughout the book:

• $\mathbf{H}$, $\mathbf{h}$, or $h$ denote (matrices or vectors of) channel coefficients
• $\mathbf{x}$ or $x$ are signals to be transmitted, before precoding
• $\mathbf{s}$ or $s$ are signals to be transmitted, after precoding
• $\mathbf{y}$ or $y$ are received signals
• $\mathbf{W}$ or $\mathbf{w}$ are transmit/receive filters used at the BS side
• $\mathbf{G}$ or $\mathbf{g}$ are transmit/receive filters used at the UE side
• $c$ typically denotes a cluster index
• $k$ and $j$ typically denote user indices
• $m$ typically denotes a base station index
• $t$, $\tau$ and $i$ denote time indices, where $t$ and $\tau$ are time-continuous, and $i$ is a
  discrete sample index
• $f$ and $q$ denote frequency indices, where $f$ is frequency-continuous, and $q$ is a
  discrete sub-carrier index
• $o$ denotes an OFDM symbol index
As in most publications, $(\cdot)^H$ denotes Hermitian matrix transpose, $\mathrm{tr}\{\cdot\}$
denotes the trace of a matrix, and $|\cdot|$ denotes set size when applied to a set, or
determinant when applied to a matrix. $E\{\cdot\}$ denotes expectation value. $\mathbf{I}$ is an
identity matrix.

Part I
Motivation and Basics

Introduction
Patrick Marsch and Gerhard Fettweis

1.1 Motivation

Mobile communication has gained significant importance in today's society. As
of 2010, the number of mobile phone subscribers has surpassed 5 billion [ABI10],
and the global annual mobile revenue is soon expected to top $1 trillion [Inf10].
While these numbers appear promising for mobile operators at first sight, the
major game-changer that has come up recently is the fact that the market is
more and more driven by the demand for mobile data traffic [Cis10]. This is simply
because Moore's law in semiconductors leads to continuously more powerful
mobile devices with larger storage capacity, which in the era of Web 2.0 require
regular synchronization with the Internet. Consequently, Moore's law can also
be found in the increase of data rates in wireless communications, as illustrated
in Fig. 1.1. The main challenge, however, is that mobile users tend to expect
the fast and cheap Internet access that they are used to from their fixed lines (e.g.
ADSL), but anytime and anywhere while being on the move. This puts mobile
operators under pressure to respond to the increasing traffic demand and
provide a more homogeneous quality of experience (QoE) over the area (often
referred to as improved fairness), while continuously decreasing cost per bit and
addressing the more and more crucial issue of energy efficiency [FMBF10].
But how can mobile data rates and fairness be increased in general? We have
to be aware that current cellular systems are mainly limited by inter-cell
interference [GK00], especially in urban areas where the rate demand is largest and
hence base station deployment is dense. Here, each point-to-point communication
link is characterized by a certain ratio of desired receive signal power over
interference and noise power, for which Shannon [Sha48] states a clear upper bound
on the capacity of the link. This then translates to a maximum spectral
efficiency, i.e. the maximum data rate achievable for a given bandwidth. In fact,
the standard Long Term Evolution (LTE) Release 8 [McC07] uses modulation
and coding schemes and link adaptation in conjunction with hybrid automatic
repeat request (HARQ) that allow Shannon capacity to be approached to within less
than a dB at reasonable complexity [LS06]. Hence, the increasing rate demand
can surely not be met by improving point-to-point links, but requires other
innovations. But which further options do we have?


Figure 1.1 Exponential growth of data rates in mobile communications.

• Use more spectrum. An option which is currently already being pursued, as
  visible through recent auctions on spectrum becoming available via the digital
  dividend. Especially spectrum aggregation is of interest, i.e. the capability of
  radio access networks to use non-contiguous blocks of spectrum. While capacity
  grows linearly with bandwidth, efficiently usable spectrum is generally a
  limited resource, and hence cannot be the sole source of rate growth.
• Use more antennas - an option that has been used since high-speed packet
  access (HSPA) and is a main feature of LTE. So-called multiple-input
  multiple-output (MIMO) techniques provide additional degrees of freedom,
  which can be used to spatially separate desired from interfering signals and/or
  for spatial multiplexing, besides yielding an additional source of diversity. In
  theory, capacity grows linearly in the minimum number of transmit and receive
  antennas [Tel99]. However, the number of antennas at the base station side is
  usually limited due to regulatory or site rental issues, and that at the terminal
  side due to form factor and cost reasons. Further, practical multiplexing gains
  saturate at some point due to unavoidable antenna correlation.
• Increase the degree of sectorization. An alternative is to use directed
  base station antennas in order to obtain a larger quantity of smaller cells with
  less mutual interference. This has been done since global system for mobile
  communications (GSM), where 3-fold sectorization is typically applied, but
  one could in principle imagine increasing sectorization [RRMF10], up to the
  case where each user is served with a dedicated beam.
• Using more base stations or introducing relays and micro/femto
  cells is clearly the strongest driver towards increased data rates, and also
  improves energy and cost efficiency [MFF10]. Both relays and femto
  cells further strongly improve indoor coverage, which is a major weakness
  of conventional cellular systems.
• Introduce coordination or cooperation between cells. While most
  previously stated options require the deployment of new equipment, it is known
  from theory that interference can be overcome and even exploited if
  coordination or cooperation between cells is introduced. Such schemes are particularly
  interesting, as they require a fairly small change of infrastructure, and may
  lead to a more homogeneous quality of service (QoS) distribution over the
  area [MKF06]. For this reason, multi-cell coordination or cooperation has
  been identified as a key technology of LTE-Advanced [PDF+08].
This book focuses on the latter aspect, using the term coordinated multi-point
(CoMP). First CoMP approaches were proposed in [BMWT00, SZ01],
where the idea was to let multiple base stations jointly transmit to multiple
terminals, effectively exploiting interference to obtain large gains in spectral efficiency
and fairness [And05, KFVY06]. In the uplink, multiple base stations can
cooperatively detect multiple terminals [WBOW00], promising similar gains [MKF06].
While the previous examples are cases of multi-cell joint signal processing, CoMP
may also refer to schemes with a lesser extent of cooperation between base
stations, for example joint scheduling or interference-aware transmission and
detection. In principle, it is also beneficial to let terminals cooperate [SSS+07b], but it
(so far) appears difficult to explain to a mobile user why his or her handset
battery is being depleted in order to enhance other users' data rates. Cooperation
between terminals is hence not covered by this book.

1.2 Aim of this Book

This book provides a comprehensive overview of various CoMP techniques. It
introduces information-theoretic concepts needed to understand and assess the
principal degrees of freedom and gains expectable from CoMP, but also covers
practical CoMP algorithms and addresses a multitude of challenges connected
to their usage. A strong emphasis on implementation aspects and field trial
results from the world's largest cellular research test beds gives the reader a
detailed insight into the realistic potential of CoMP within the roadmap of
LTE-Advanced and beyond, and the associated price and effort that have to be taken
into consideration. The book provides the thorough detail required by scholars
and professionals from industry or academia who aim at implementing or using
CoMP themselves, but also serves as a reference book for the occasional reader.

1.3 Classes of CoMP Considered

As stated before, the term CoMP may refer to a multitude of schemes. All have
in common that intra- or inter-cell interference is somehow taken into account
or even exploited to enhance data rates and/or fairness. In this work, we classify
CoMP schemes on one hand according to the extent of cooperation (or
information exchange) taking place between cells:
• Non-cooperative, but interference-aware transceiver schemes, where
  base stations or terminals adjust their transmit or receive strategy according
  to some knowledge on interference. This does not require explicit information
  exchange between cells, but the estimation of interference must be enabled
  through appropriate reference signal design. This class of schemes includes
  single-cell multi-user signal processing, as used in LTE Release 8 [McC07].
• Interference coordination schemes, where limited data is exchanged
  between cells for the purpose of multi-cell cooperative scheduling, multi-cell
  interference-aware link adaptation, or multi-cell interference-aware precoding.
• Joint signal processing schemes, where user data or (partially) processed
  transmit or receive signals are exchanged among base stations. One here
  considers non-coherent and coherent schemes, where the latter aim at aligning
  the phases of signals transmitted from or received at different antennas. As
  we will see, this requires precise synchronization between all involved entities.
We will later also distinguish between decentralized and centralized CoMP
schemes, referring to where the subject of cooperation takes place.

1.4 Outline of this Book

The book is structured into the following 5 parts:
• Part I - Motivation and Basics motivates the topic from a technical
  and economic point of view in Chapters 1 and 2, respectively, and provides
  information-theoretic basics and a first insight into potential gains and
  trade-offs of multi-cell joint signal processing in Chapters 3 and 4.
• Part II - Practical CoMP Schemes introduces various specific CoMP
  algorithms, where Chapter 5 focuses on interference-aware transceiver schemes
  and interference coordination, and Chapter 6 on multi-cell joint signal
  processing schemes, as classified before in Section 1.3.
• Part III - Challenges Connected to CoMP addresses various issues
  regarding the usage of CoMP in practice. Chapter 7 deals with finding clusters
  of cells in which CoMP is performed, whereas Chapters 8 and 9 cover the
  crucial aspects of synchronization and channel estimation and feedback,
  respectively. Chapter 10 highlights practical implementation aspects such as
  numerical stability and scalability. Chapter 11 investigates an adaptive,
  situation-dependent usage of CoMP, while Chapter 12 discusses the additional backhaul
  infrastructure required for CoMP itself and any required signaling.
• Part IV - Performance Assessment discusses CoMP field trial results in
  Chapter 13. As field trials are usually limited to the observation of exemplary
  multi-point links under exemplary interference conditions, the prediction of
  CoMP performance in large-scale systems requires system-level simulations,
  for which both methodology and results are covered in Chapter 14.
• Part V - Outlook and Conclusions finally discusses the usage of CoMP for
  purposes other than rate and fairness improvements, and elaborates on the usage
  of CoMP in conjunction with relaying or heterogeneous cellular deployment
  in Chapter 15. The book is then concluded in Chapter 16.

An Operator's Point of View


Ralf Irmer

2.1 The Mobile Internet - A Success Story so far

When 3G was launched initially with WCDMA technology (Release 99), it was
rather a disappointment, with not many services being successful. Some years
later, the mobile Internet took off when a number of factors came together:
• HSPA as a technological evolution of 3G with low latency and higher data
  rate
• Attractive flat-rate price plans by mobile operators
• Availability of mobile broadband hardware in terms of dongles and built-in
  3G modules in notebooks
• Smartphones with attractive user interfaces, e.g. iPhone, Android
• Complete country-coverage with HSPA and HSPA+ by mobile operators.
This take-up of the mobile Internet generated substantial additional revenues
for mobile operators, at a time when voice and text message revenues started to
decline in saturated markets such as Europe. For example, Vodafone had a data
revenue growth of 19% in financial year 2009/2010, with more than €4 billion
generated by non-SMS data. Today, only 11% of phones are smartphones, but
by 2013 it is expected that more than a third of all active phones within the
Vodafone network will be smartphones.
This data revenue growth comes along with a cost for mobile operators,
namely data traffic growth. Fig. 2.1 shows the actual and projected traffic growth
for Vodafone's European networks in petabytes/year [Vod10]. It can be seen that
data traffic has substantially surpassed voice traffic.
Mobile operators have some levers to cope with the growth in traffic in the
short term, including:
• Technology upgrade, i.e. more efficient versions of HSPA or launch of LTE
• Cost reduction, i.e. network sharing, more efficient network operation, and
  exploitation of economy of scale
• Spectrum re-farming and acquisition of new spectrum
• Traffic management, i.e. enforcement of fair usage policies and launch of
  differentiated data bundles
• Network management, i.e. building of new sites, provisioning of additional
  carriers or offload of traffic to femto cells or WiFi.

Figure 2.1 Data traffic in Vodafone's European networks in petabytes.

In the long term, however, the research community and the industry are
required to come up with more fundamental approaches on how to serve mobile
data at the right location in the most cost- and energy-efficient way.

2.2 Requirements on Future Networks and Upcoming Challenges

In 2006, a group of operators published the white paper on Next Generation
Networks beyond HSPA & EvDO [ABG+06], which lists the high-level
requirements on future networks, and an accompanying document listing the detailed
technical assessment criteria [IAL+07]. Some of the important requirements are:
• Improved average and cell-edge spectral efficiency
• Low latency
• Simplicity, reliability and total cost of ownership
• Flat architecture

Most of the requirements are already addressed with LTE, which is being
commercialized in 2010 in its first release. However, there is a need to develop
LTE beyond the first release, in order to address customer and operator
requirements. The challenges faced by mobile communications in the second decade of
the 21st century are the following:
Exploding data volume - This is driven by attractive services, flat-rate pricing
and user-friendly devices. The most prominent example is the iPhone, which
resulted in a 10x traffic increase. IPTV, 3D Internet, real-time web, and cloud
services will result in step changes in data consumption. IBM is predicting the
generation of 16 TB/person/year by 2020. The challenge is that networks need
to be structured to cope with data volume explosion without a cost or energy
explosion or constant need for equipment upgrades, as illustrated in Fig. 2.2.

Figure 2.2 The growing gap between traffic and revenue. [Labels in the figure: Cost, Traffic, Data, Revenue, Voice, Time.]

Increased data rates - Driven by new services and the evolution from DSL
(about 2 Mbps) to variants of fibre technology (100 Mbps to 1 Gbps), the user
expectation of acceptable Internet speed will rise substantially in line with the
expectation set by fibre networks, thus posing a challenge to wireless technologies.
Ubiquitous indoor coverage - Many data services are important for indoor
users, and people are usually within buildings. Indoor coverage is therefore
important and can be provided either by copper/fibre with local radio distribution
(femto cell or WiFi) or from cellular networks.
Ubiquitous outdoor coverage - For voice calls, the user expectation has
moved from making calls along major roadways in the 1990s to being reachable
all the time in any building. Mobile Internet based on 3G or WiFi today can
only be characterized as best-effort, without continual connectivity whilst
on-the-move and with patchy coverage in many places. In 2020, business customers
and consumers will rely more on data connectivity - they will need connectivity
anywhere, anytime. Coverage with a minimum guaranteed data rate and hence
reliability will be a key differentiator between operators as the world moves from
a nice-to-be-connected model to one that is essential-to-be-connected.
There are technical innovations on the horizon to address these challenges:
• Gradual improvements of existing technologies, e.g. better MIMO modes etc.
• Active antennas, which may enable multi-element antennas
• New deployment concepts like femto cells or MetroZone networks. They
  require innovative backhaul solutions such as in-band and out-band backhaul
  or mm-wave microwave, and self-organizing principles in order to be manageable
• Miniaturized, flexible, energy-efficient base stations
• Base station cooperation concepts.

2.3 The Role of CoMP

Base station cooperation concepts (CoMP) are especially attractive since they
improve the cell-edge data rate and average data rate, and are suitable to increase
spectral efficiency (and hence capacity) for much more dense network
deployments in urban areas and capacity hotspots. As we will see later, this increase
in access capacity with CoMP concepts comes at the cost of more backhaul
capacity, i.e. more communication bandwidth between base stations. However,
for HSPA+ and LTE, base station sites need high-capacity backhaul (fibre or
microwave) anyway, and as the cost of backhaul increases less than linearly with
the backhaul capacity, this issue might not be as severe as often stated.
What are the alternatives to CoMP? Different frequency reuse, more
spectrum, more sites, more antennas - all are very expensive options for an operator.
Thus investing in more intelligent baseband (i.e. CoMP algorithms) and
backhaul with higher data rate and lower latency seems to be more and
more attractive when compared to the other options.
The complicated issue about CoMP concepts is that they are only partially
understood from the academic perspective today, and that implementation in a
standard at reasonable complexity is difficult. However, let's draw an analogy to
MIMO technologies. They are commercially used today in WiFi and cellular
communications, but ten years ago there was only limited understanding of MIMO,
and the technology was seen by many as too complex to be commercialized.

2.4 The Role of Field Trials

Traditionally, academic innovation is evaluated using analytical models or
statistical simulations. However, concepts such as CoMP are so complex that it is
impossible to come up with models which capture all effects realistically, and to
have well-calibrated simulation scenarios. Therefore, it is essential to have field
trials of new technologies in an early development stage, in order to
• identify technical challenges early on
• refine simulations and analytical models
• be forced to have an end-to-end view and not pick interesting but
  non-relevant topics
• provide a proof of concept.
For CoMP technology development and evaluation, various authors of this
book have set up a cluster of research test beds in Dresden and Berlin, within the
EASY-C project led by Vodafone, Deutsche Telekom, Heinrich Hertz-Institute
Berlin and Technische Universität Dresden. The significance of these is that
enough sites are used to represent typical interference scenarios. More
information on these test beds is given in [I+09], and field trial results will be presented
in detail in Chapter 13.

Information-Theoretic Basics
Patrick Marsch and David Tse

In this chapter, the reader is made familiar with a set of theoretical concepts to
analytically capture the variety of CoMP schemes considered in this book. The
reader will obtain a first understanding of the general capacity gains expectable
from multi-cell joint signal processing, and the many degrees of freedom involved.
The chapter introduces notation that will be reused in most parts of the book.

3.1 Observed Cellular Scenarios

Throughout the book, we generally consider (subsets of) a large cellular
system as depicted in Fig. 3.1. Here, a large number of mobile terminals, or user
equipments (UEs), is distributed over a set of cells, where we assume that each
cell is served by exactly one BS. As this is the case for most currently deployed
cellular systems, we further assume that multiple BSs are grouped into so-called
sites. Note that, differing from some other publications, we consider a sector
to be equivalent to a cell. The term cluster is used to indicate a set of cells
between which some form of CoMP may take place. Note that we assume that
each UE in the system aims at transmitting or receiving dedicated information,
i.e. multi-cast concepts are not covered in the book. As the number of UEs is
typically significantly larger than the number of cells, UEs have to be
scheduled to resources, i.e. to certain transmission windows. In this book, we assume
that orthogonal frequency division multiple access (OFDMA) is employed as a
media access technique, which allows each UE to be assigned to resources that
are (under certain ideal assumptions to be discussed later) orthogonal in time
and frequency. As this orthogonality allows us to simplify most of the analytical
models and derivations used throughout the book, the basics of OFDMA are
stated in the sequel.

3.2 Usage of OFDMA for Broadband Wireless Communications

Three fundamental challenges in mobile communications are the fact that
transmission takes place over a) a shared medium, which is often subject to b) rich
scattering, and to which we desire c) simple and flexible access of many
communicating entities. The first aspect implies that any transmission must be
band-limited in order not to disturb other transmissions on adjacent bands, which
requires the design of particular transmit and receive filters. The second aspect
implies that any receiver may observe a superposition of multiple differently
delayed and attenuated copies of originally transmitted signals, which in the
context of broadband transmission may lead to inter-symbol interference (ISI)
that has to be dealt with. The third aspect means that we need a low-cost and
efficient signal processing solution that can divide a mobile communications
system into a large number of flexible bit pipes according to the needs of many
users or applications.

Figure 3.1 Cellular system and key terminology considered. [Labels in the figure: cells = sectors; sites with 3 base stations each; CoMP cluster.]
The mobile communications standard LTE Release 8 [McC07] from 3GPP uses
an OFDMA approach to address all aspects stated above, where the baseband
signal processing chain for a downlink example is depicted in Fig. 3.2. Here, the
key concept is that the symbols to be transmitted from one BS towards multiple
UEs 1..U are modulated in frequency domain, mapped to different sub-carriers,
and then an inverse discrete Fourier transform (IDFT) is used to generate a
time domain signal. A cyclic prefix is inserted before each orthogonal frequency
division multiplex (OFDM) symbol (i.e. before each block of samples processed
in one IDFT), in order to ensure that even a channel with a large delay spread
does not cause ISI, and that the transmission leads to a circular
convolution of the transmitted samples with the channel. Each receiving UE can
then discard the cyclic prefix, perform a discrete Fourier transform (DFT), and
obtain (scaled and noisy) transmitted symbols in frequency domain again.

Figure 3.2 OFDMA signal processing chain (downlink example). [Block diagram: data streams d1..dK pass through Mod./Cod., P/S and IFFT, +CP and D/A at the transmitter; each receiver applies A/D, Sync, -CP, FFT and Det./Dec. to recover its data stream.]

While being computationally inexpensive and sacrificing only a reasonable extent of
capacity for the cyclic prefix, OFDMA has the advantage that it a) can easily be
designed to fulfill different spectral masks through an appropriate choice of guard
bands (i.e. leaving peripheral sub-carriers empty), b) avoids ISI and inter-carrier
interference (ICI) and hence enables simple OFDM symbol-wise and
sub-carrier-wise equalization at the receiver side, and c) is highly suitable for multiple access.
A detailed explanation of OFDMA can be found in [LP01].
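
The role of the cyclic prefix can be made concrete with a small numerical sketch. The following Python/NumPy snippet is only an illustration under simplifying assumptions (a single noiseless OFDM symbol, a hypothetical 3-tap channel, perfect synchronization and channel knowledge); the function name `ofdm_roundtrip` and all parameter values are chosen here for illustration and are not taken from the LTE specification. It shows that, once the prefix is at least as long as the channel memory, every sub-carrier can be equalized by a single complex division.

```python
import numpy as np

def ofdm_roundtrip(symbols, channel_taps, cp_len):
    """Send one OFDM symbol through a multipath channel; return equalized symbols."""
    n = len(symbols)
    time_signal = np.fft.ifft(symbols)                          # IDFT: frequency -> time
    tx = np.concatenate([time_signal[-cp_len:], time_signal])   # prepend cyclic prefix
    rx = np.convolve(tx, channel_taps)                          # linear convolution with channel
    rx_no_cp = rx[cp_len:cp_len + n]                            # receiver discards the prefix
    received = np.fft.fft(rx_no_cp)                             # DFT: time -> frequency
    channel_freq = np.fft.fft(channel_taps, n)                  # gain seen by each sub-carrier
    return received / channel_freq                              # one-tap equalizer per sub-carrier

rng = np.random.default_rng(0)
qpsk = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
taps = np.array([1.0, 0.5, 0.25j])      # hypothetical channel impulse response (assumption)

# Prefix longer than the channel memory (2 samples): symbols are recovered.
error_long_cp = np.max(np.abs(ofdm_roundtrip(qpsk, taps, cp_len=8) - qpsk))
# Prefix shorter than the channel memory: the convolution is no longer circular.
error_short_cp = np.max(np.abs(ofdm_roundtrip(qpsk, taps, cp_len=1) - qpsk))
print(error_long_cp, error_short_cp)
```

With a sufficiently long prefix the residual error is at the level of floating-point rounding, whereas a too short prefix leaves a clearly visible distortion that a per-sub-carrier equalizer cannot remove.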
A major disadvantage of OFDM is that performing modulation in frequency
domain and applying an IDFT leads to signals with a peak-to-average power
ratio (PAPR) increasing linearly in the DFT size [WG99]. Especially in the
uplink, this aspect is critical, as it implies that a larger power amplifier (PA)
back-off is needed, leading to a faster depletion of handset battery. For this
reason, 3GPP has decided to employ single-carrier frequency division multiple
access (SC-FDMA) in the uplink, where modulation is performed in time domain,
after which a small DFT is applied (according to the number of sub-carriers to
be occupied by the UE) and the signals are mapped to the sub-carriers to be
used by the terminal before the actual large IDFT. The price for a reduced
PAPR is the need for ISI cancelation, which may lead to more complex signal
processing if used in conjunction with (possibly multi-cell) MIMO equalization.
In the remainder of this book, we will assume for simplicity that OFDM is
used in both uplink and downlink, knowing that the performance of OFDM and
SC-FDMA (with ISI cancelation) is fairly comparable.
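
The PAPR behaviour described above can be illustrated numerically. The following Python/NumPy sketch rests on assumptions made here for illustration only: QPSK data, 64 occupied sub-carriers mapped contiguously onto a 256-point IDFT, no pulse shaping, and hypothetical helper names (`papr`, `ofdm_symbol`, `sc_fdma_symbol`). It first evaluates the worst case of plain OFDM, where identical symbols on all sub-carriers add up coherently, and then compares the average PAPR of OFDM and DFT-spread (SC-FDMA-like) symbols.

```python
import numpy as np

def papr(signal):
    """Peak-to-average power ratio (linear scale) of a complex baseband signal."""
    power = np.abs(signal) ** 2
    return power.max() / power.mean()

def ofdm_symbol(data, n_fft):
    """Map data directly onto the first len(data) sub-carriers, then apply the large IDFT."""
    grid = np.zeros(n_fft, dtype=complex)
    grid[:len(data)] = data
    return np.fft.ifft(grid)

def sc_fdma_symbol(data, n_fft):
    """Spread the data with a small DFT before sub-carrier mapping and the large IDFT."""
    return ofdm_symbol(np.fft.fft(data), n_fft)

# Worst case for plain OFDM: the PAPR equals the number of sub-carriers.
worst_case = {n: papr(np.fft.ifft(np.ones(n))) for n in (16, 64, 256)}

rng = np.random.default_rng(1)
n_sub, n_fft, n_trials = 64, 256, 500        # illustrative values (assumption)
mean_ofdm = mean_sc = 0.0
for _ in range(n_trials):
    qpsk = (rng.choice([-1, 1], n_sub) + 1j * rng.choice([-1, 1], n_sub)) / np.sqrt(2)
    mean_ofdm += papr(ofdm_symbol(qpsk, n_fft)) / n_trials
    mean_sc += papr(sc_fdma_symbol(qpsk, n_fft)) / n_trials

print(worst_case)
print(10 * np.log10(mean_ofdm), 10 * np.log10(mean_sc))   # average PAPR in dB
```

The averages printed at the end depend on the modulation, the sub-carrier mapping and the oversampling factor, so they should be read as a qualitative indication only.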

3.3 Multi-Point Frequency-Flat Baseband Model Considered

As mentioned before, OFDMA also has the advantage that it enables a simple
mathematical notation and analysis, as it is often sufficient to observe the
baseband transmission on a single frequency-flat sub-carrier, which can be seen as
a transmission over an additive white Gaussian noise (AWGN) channel. Most
chapters in this book will make use of this simplification. Only in cases where
the correlation of channel realizations in time and frequency is of importance, for
example for channel estimation and feedback schemes in Chapter 9, a wideband
model will be used. The assumption of an AWGN channel also implicitly requires
the OFDM systems of all communicating entities to be perfectly synchronized in
time and frequency - an assumption that will later be challenged in Chapter 8.
When observing the transmission on a single frequency-flat OFDMA
sub-carrier, we typically consider a subset of a cellular system consisting of $M$ BSs
and $K$ UEs, assuming that a scheduling entity has assigned the UEs to the same
observed resource in time and frequency. We introduce sets $\mathcal{M} = \{1..M\}$ and
$\mathcal{K} = \{1..K\}$. Throughout this book, we consider various antenna setups, where
$N_{\mathrm{bs}}$ and $N_{\mathrm{ue}}$ denote the number of antennas per BS and per UE, respectively, and
where $N_{\mathrm{BS}} = M N_{\mathrm{bs}}$ and $N_{\mathrm{UE}} = K N_{\mathrm{ue}}$ denote the overall number of antennas at
BS and UE side, respectively. In the sequel, we will now go into details of the
uplink and downlink transmission models used throughout the book.

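3.4 Uplink Transmission

In the uplink, the precoding, transmission and equalization of each symbol on
an OFDMA sub-carrier is illustrated in Fig. 3.3 and stated as

$$\hat{\mathbf{x}} = \mathbf{W}^H \mathbf{y} = \mathbf{W}^H (\mathbf{H}\mathbf{s} + \mathbf{n}) = \mathbf{W}^H \Bigg( \mathbf{H} \underbrace{\begin{bmatrix} \mathbf{G}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{G}_K \end{bmatrix}}_{\mathbf{G}} \mathbf{x} + \mathbf{n} \Bigg), \tag{3.1}$$

where $\mathbf{x} = [\mathbf{x}_1^T..\mathbf{x}_K^T]^T \in \mathbb{C}^{[N_{\mathrm{UE}} \times 1]}$ are the symbols to be transmitted by the UEs,
which we generally assume uncorrelated with $E\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}$. These may then be
subject to linear UE-side precoding via matrices $\forall k: \mathbf{G}_k \in \mathbb{C}^{[N_{\mathrm{ue}} \times N_{\mathrm{ue}}]}$, yielding
the finally transmitted signals $\mathbf{s} = [\mathbf{s}_1^T..\mathbf{s}_K^T]^T \in \mathbb{C}^{[N_{\mathrm{UE}} \times 1]}$. As we do not consider
UE cooperation, the overall transmit covariance $E\{\mathbf{s}\mathbf{s}^H\} = \boldsymbol{\Phi}_{ss}$ has a
block-diagonal structure, i.e. the signals originating from different UEs are
uncorrelated. $\boldsymbol{\Phi}_{ss}$ is usually subject to a per-antenna or per-UE power constraint to be
stated later. $\mathbf{H} \in \mathbb{C}^{[N_{\mathrm{BS}} \times N_{\mathrm{UE}}]}$ is the instantaneous fast fading realization of the
channel on this sub-carrier. We also denote as $\mathbf{H}^m$, $\mathbf{H}_k$, $\mathbf{H}_k^m$ the parts of the
channel matrix $\mathbf{H}$ connected to BS $m$, UE $k$, or the link from BS $m$ to UE $k$,
respectively, where we use a lower-case $\mathbf{h}$ if the expression becomes a vector
(e.g. for $N_{\mathrm{ue}} = 1$). $\mathbf{y} = [\mathbf{y}_1^T..\mathbf{y}_M^T]^T \in \mathbb{C}^{[N_{\mathrm{BS}} \times 1]}$ are the signals received by the BSs,
containing zero-mean Gaussian noise $\mathbf{n} \in \mathbb{C}^{[N_{\mathrm{BS}} \times 1]}$ with $E\{\mathbf{n}\mathbf{n}^H\} = \sigma^2 \mathbf{I}$, where
then equalization via a matrix $\mathbf{W} \in \mathbb{C}^{[N_{\mathrm{BS}} \times N_{\mathrm{UE}}]}$ is performed to yield estimates
$\hat{\mathbf{x}} \in \mathbb{C}^{[N_{\mathrm{UE}} \times 1]}$ on the originally generated symbols $\mathbf{x}$. The structure of $\mathbf{W}$ depends
on the particular CoMP strategy employed, as we will see later. We also write
$\mathbf{W}^m$, $\mathbf{W}_k$ or $\mathbf{W}_k^m$ for the part of $\mathbf{W}$ connected to a particular BS $m$, UE $k$, or
a specific link, respectively, and use a lower-case $\mathbf{w}$ if this yields a vector.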
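
To make the dimensions in the uplink model (3.1) tangible, the following Python/NumPy sketch builds one random realization of the model. All numerical values (3 BSs with 2 antennas each, 2 single-antenna UEs, noise variance 0.01) are assumptions made here for illustration, and the helper `crandn` is hypothetical. As receive filter, a zero-forcing solution over all BS antennas is used purely as one possible example of W; the structure of W for specific CoMP schemes is the subject of later chapters.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, n_bs, n_ue = 3, 2, 2, 1             # illustrative setup (assumption)
N_BS, N_UE = M * n_bs, K * n_ue           # overall antennas at BS and UE side
sigma2 = 0.01                             # noise variance (assumption)

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(N_BS, N_UE)                    # channel realization on one sub-carrier
G = np.zeros((N_UE, N_UE), dtype=complex)
for k in range(K):                        # block-diagonal UE-side precoding, one block per UE
    blk = slice(k * n_ue, (k + 1) * n_ue)
    G[blk, blk] = np.eye(n_ue)            # identity blocks, i.e. no precoding in this example

x = crandn(N_UE, 1)                       # symbols generated by the UEs
s = G @ x                                 # transmitted signals after precoding
n = np.sqrt(sigma2) * crandn(N_BS, 1)     # receiver noise at all BS antennas
y = H @ s + n                             # signals received by the BSs

W = np.linalg.pinv(H @ G).conj().T        # example choice: zero-forcing, W^H (H G) = I
x_hat = W.conj().T @ y                    # estimates according to (3.1)
print(np.abs(x_hat - x).ravel())          # residual error caused by noise only
```

With this particular choice of W, inter-user interference is removed completely and the estimation error equals the filtered noise, at the price of noise enhancement when the channel is poorly conditioned.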

Figure 3.3 Uplink transmission setup. [Diagram: UEs 1..K precode their symbols x_k via G_k into transmit signals s_k, which pass through the channel H = [h_1..h_K]; base stations 1..M receive signals y_m with noise n_m and are connected to the network via the backhaul infrastructure.]

3.4.1 Basic Uplink Capacity Bounds

For the derivation of information-theoretic bounds in Chapters 3 and 4, we
typically use the assumption that the signals $\mathbf{x}$ are zero-mean Gaussian and belong
to long codewords where each symbol sees the same channel realization $\mathbf{H}$. Note
that the Gaussianity of $\mathbf{x}$ is not necessarily optimal in terms of capacity [Med00],
but strongly simplifies achievable rate derivation. If we now consider as the
simplest setup the case where only one UE with $N_{\mathrm{ue}} = 1$ antenna transmits at
unit power and is decoded by one BS, then the probability of decoding error
decreases exponentially in the codeword length if and only if the transmission
rate $R$ in bits per channel use (bpcu) fulfills [CT06]

$$R \le I(X; Y) = \log_2 \left| \mathbf{I} + \frac{1}{\sigma^2} \mathbf{h}\mathbf{h}^H \right|, \tag{3.2}$$

where the notation $I(X; Y)$ denotes the mutual information between transmitter
and receiver side. The rate bound in (3.2) can be proven from both sides. On one
hand, one can construct an example coding technique (often based on the idea of
typical sequences) with a rate equivalent to the right-hand side (RHS) of (3.2),
such that any arbitrarily low probability of error can be achieved by simply
choosing a sufficiently long codeword. On the other hand, one can prove that
regardless of codeword length a non-zero probability of error remains if $R$ exceeds
the RHS of (3.2) [CT06], typically making use of Fano's inequality. Hence, in
this point-to-point case with Gaussian $\mathbf{x}$, the capacity of the transmission has
been precisely established. In the case of $N_{\mathrm{ue}} > 1$, i.e. multiple antennas per UE,
(3.2) changes to

$$R \le I(X; Y) = \max_{\boldsymbol{\Phi}_{ss}} \log_2 \left| \mathbf{I} + \frac{1}{\sigma^2} \mathbf{H} \boldsymbol{\Phi}_{ss} \mathbf{H}^H \right|, \tag{3.3}$$

where the max operation over the transmit covariance Φ_ss implies that the UE
chooses the optimal precoding matrix G for the current channel realization H,
hence requiring transmitter-side channel knowledge. Given perfect such knowledge, and assuming all transmit antennas of the UE to be subject to a sum power
constraint P_sum, a capacity-achieving UE strategy is to perform a singular value
decomposition (SVD) of the channel, yielding H = UΣV^H, and to choose as precoding matrix G the right singular vectors V, where the columns are scaled in power
such that tr{VV^H} = tr{Φ_ss} ≤ P_sum. Assuming that W = U is used as BS-side
receive filter, the transmission from (3.1) can be re-stated as a transmission over
min(Nbs, Nue) independent single-input single-output (SISO) links, often referred
to as the eigenmodes of the channel. The capacity on each eigenmode can then
be proved as in the case of Nue = 1 before. Finding the power scaling for V that
maximizes the sum capacity over all eigenmodes is a convex optimization problem [BV04] that can be solved easily via a water-filling algorithm [CT06], but not
in closed form. Note that the gap between capacity and rates achievable without
channel knowledge at the UE side (i.e. without precoding and power control,
for example V = I) may be marginal, but then significantly more complex signal
processing is required at the receiver side [HTB03].
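The water-filling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [CT06] verbatim: the function names `water_filling` and `eigenmode_capacity` and the example channel values are chosen here for illustration, and the gain of each eigenmode is taken as its squared singular value divided by σ².

```python
import numpy as np

def water_filling(gains, p_sum):
    """Split p_sum over parallel links with SNR gains g_i (hypothetical helper).

    Returns p_i = max(0, mu - 1/g_i), where the water level mu is chosen
    such that the powers add up to p_sum.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]           # strongest eigenmode first
    inv = 1.0 / gains[order]
    powers = np.zeros_like(gains)
    for n in range(len(gains), 0, -1):        # try n active modes, then drop the weakest
        mu = (p_sum + inv[:n].sum()) / n      # water level if n modes are active
        if mu > inv[n - 1]:                   # weakest active mode still gets power
            powers[order[:n]] = mu - inv[:n]
            break
    return powers

def eigenmode_capacity(H, p_sum, sigma2):
    """Sum capacity over the eigenmodes of H under a sum power constraint."""
    s = np.linalg.svd(H, compute_uv=False)    # singular values, descending
    gains = s[s > 1e-12] ** 2 / sigma2
    p = water_filling(gains, p_sum)
    return np.sum(np.log2(1.0 + gains * p)), p

# Illustrative 2x2 channel and parameters (assumed values)
H = np.array([[1.0, 0.25], [0.5j, 1.0]])
capacity, p = eigenmode_capacity(H, p_sum=2.0, sigma2=0.1)
print(capacity, p)
```

The loop tests at most min(Nbs, Nue) candidate water levels, which reflects why the problem is easy to solve numerically even though no closed-form expression for the water level exists.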
If now multiple UEs 1..K are decoded by the same single BS, the setup resembles a multiple access channel (MAC) [Ahl71], where it becomes interesting to
observe the capacity region, hence all tuples of rates R1..RK at which the UEs can
transmit, such that all can be decoded with a probability of error decreasing exponentially in the codeword length. The capacity region of the MAC [Ahl71, Lia72]
is simply based on the fact that the sum-rate of any subset of UEs is bounded
by the joint mutual information between these UEs and the BS, given that all
other UEs are turned off. Formally, we can state this as





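$$\forall\, \mathcal{S} \subseteq \{1..K\}: \quad \sum_{k \in \mathcal{S}} R_k \le \max_{\mathbf{\Phi}_{ss}} \log_2\left|\mathbf{I} + \frac{1}{\sigma^2}\mathbf{H}\mathbf{\Phi}_{ss}(\mathcal{S})\mathbf{H}^H\right|, \tag{3.4}$$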

where Φ_ss(S) is the transmit covariance connected to the subset of UEs in set
S. Note that Φ_ss is now block-diagonal, as it is connected to multiple UEs. For
K = 2 UEs, this leads to the well-known pentagon-shaped capacity region illustrated in Fig. 3.4 [TV05]. Interpreting (3.4) from a more practical perspective,
it becomes clear that any corner point of the capacity region can be achieved by applying successive interference cancelation (SIC), hence by successively decoding the
transmissions of certain UEs, subtracting the corresponding receive signals, and
then decoding other UEs, while the points in between are reached by time-sharing between SIC orders. Each of those corner points of the capacity region
where all UEs' rates are non-zero corresponds to one particular SIC order. The
optimal Φ_ss can be determined via a UE-wise successive SVD and water-filling
algorithm [YRBC01].
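The relation between SIC orders and corner points can be checked numerically. The following sketch assumes single-antenna UEs with fixed transmit powers, so that Φ_ss is diagonal; the helper names `log2det` and `sic_rates` and the channel values are assumptions made for illustration.

```python
import numpy as np

def log2det(M):
    """log2 of the determinant of a positive definite matrix."""
    return np.linalg.slogdet(M)[1] / np.log(2.0)

def sic_rates(H, powers, sigma2, order):
    """Per-UE rates when UEs are decoded in `order` (first entry decoded first).

    H has one column per single-antenna UE. A UE sees interference only
    from UEs that have not been decoded and subtracted yet.
    """
    n_rx, K = H.shape
    rates = np.zeros(K)
    remaining = list(order)
    while remaining:
        k = remaining.pop(0)
        B = sigma2 * np.eye(n_rx, dtype=complex)      # noise plus residual interference
        for j in remaining:
            B += powers[j] * np.outer(H[:, j], H[:, j].conj())
        h = H[:, k]
        sinr = powers[k] * np.real(h.conj() @ np.linalg.solve(B, h))
        rates[k] = np.log2(1.0 + sinr)
    return rates

# Illustrative two-antenna receiver and two UEs (assumed values)
H = np.array([[1.0, 0.25], [0.5j, 1.0]])
powers, sigma2 = np.array([1.0, 1.0]), 0.1
corner_a = sic_rates(H, powers, sigma2, order=[0, 1])
corner_b = sic_rates(H, powers, sigma2, order=[1, 0])
sum_bound = log2det(np.eye(2) + (H * powers) @ H.conj().T / sigma2)
print(corner_a, corner_b, sum_bound)
```

Both decoding orders attain the same sum-rate, which equals the bound of (3.4) for the full set of UEs; they differ only in how this sum-rate is split between the UEs.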


In the remainder of this chapter, we shift the focus to scenarios with multiple
BSs serving adjacent cells, and observe potential capacity gains through multicell joint signal processing.

3.4.2 Full Cooperation in the Uplink


Full cooperation, i.e. joint processing of all received signals by all BSs, can under
certain idealistic assumptions be modeled as one virtual BS with N_BS antennas.
The capacity region is also given by (3.4), but now based on a channel H of
higher dimensionality. As opposed to the single-BS case, we now obtain an
array gain of at most 10 log10(M) dB, as all BS antennas receive correlated signals
which can be coherently combined, while noise terms are uncorrelated, and a spatial multiplexing gain [ZDZ04], as the larger channel dimensionality improves the
eigenvalue distribution and hence orthogonality between UEs. Considering multiple fading realizations, cooperation may also yield spatial diversity gain [TZ03].
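As a quick worked instance of the array gain bound, consider M = 2 and M = 3 cooperating BSs, which are the setup sizes used in the numerical examples of this and the next chapter (the rounding is ours):

$$10\log_{10}(2) \approx 3.0\ \text{dB}, \qquad 10\log_{10}(3) \approx 4.8\ \text{dB}.$$

These values are upper limits, reached only if a UE is received equally strongly at all cooperating BSs.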

3.4.3 No Cooperation in the Uplink


If no cooperation is possible between BSs, the scenario is similar to
an interference channel (IC) [Ahl74], which is defined as two or more
non-cooperative transmitter-receiver pairs that communicate on the same
time/frequency resource, subject to mutual interference. Unfortunately, the
capacity region of the IC is only known in certain cases, for example for Gaussian signaling, single-antenna transceivers and very strong interference, where
the interference links are significantly stronger than the transmitter-receiver
links, and the capacity region is the same as if there were no interference at
all [Car75]. Capacity has also been established for very weak interference, where
it is optimal if all receivers treat interference as noise [SKC09, MK09]. For all
regimes in between, the tightest known inner capacity bound is based on the
Han-Kobayashi (HK) transmission scheme, where the transmitters use superimposed transmissions, and the receivers decode both a subset of the interfering
transmissions and their desired transmission [HK81], achieving an optimal level
of interference cancelation. In all interference cases, capacity is known to within
one bit [ETW08]. Recent work has been on the IC with multiple antennas at
the receiver or transmitter side, introducing the concept of interference alignment [WT08, MAMK08, CJ08]. Here, desired signal and interference as seen by
each receiver fall into orthogonal signal dimensions, such that each transmitter-receiver pair can use half the system resources for interference-free communication. For the observation of asymptotically large signal-to-noise ratios (SNRs),
one often uses the concept of generalized degrees of freedom [ETW08], which will
later be discussed in Section 12.1.
[Figure 3.4: (Inner bounds on) capacity regions for no or full cooperation in the uplink. Axes: R1 and R2 in bit/channel use. Curves: no coop., no coop. (HK), full coop., FDM. Setup: H = [1, 0.25; 0.5i, 1], unit power per UE, σ² = 0.1.]

Note that the communication scenarios taking place on certain resources
within a cellular system deviate from the classical IC in that it does not
matter where a UE is decoded, hence transmitter-receiver pairs may be chosen
such that interference is minimized. Simply assigning each UE to its best serving
BS clearly renders scenarios of strong or very strong interference [ETW08], or
those where HK schemes would be highly beneficial, rather unlikely. Further,
the usage of such schemes becomes questionable in the context of practical coding schemes with a gap to capacity [FU98] and under imperfect channel state
information (CSI), as we will see in Chapter 4. Instead, the following options
appear much more powerful to increase capacity in the non-cooperative case, in
particular under imperfect CSI [Mar10]:
• a flexible assignment of BSs to UEs, possibly on a short-term basis, or
• the option of non-cooperative multi-UE detection by one BS.
If we neglect HK schemes (i.e. superimposed transmission and partial decoding of interference), an inner bound on the capacity region for a given BS-UE
assignment can be calculated for each BS and set of assigned UEs separately,
seeing the remaining UE transmissions not decoded by the BS as spatially colored noise. More precisely, if we denote as S[m] the set of UEs decoded by BS
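m, the capacity region of these UEs is inner-bounded as

$$\forall\, \mathcal{S} \subseteq \mathcal{S}^{[m]}: \quad \sum_{k\in\mathcal{S}} R_k \le \max_{\mathbf{\Phi}_{ss}} \log_2\left| \mathbf{I} + \left( \sigma^2\mathbf{I} + \mathbf{H}^m \mathbf{\Phi}_{ss}\!\left(\mathcal{K}\setminus\mathcal{S}^{[m]}\right) (\mathbf{H}^m)^H \right)^{-1} \mathbf{H}^m \mathbf{\Phi}_{ss}(\mathcal{S}) (\mathbf{H}^m)^H \right|. \tag{3.5}$$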

Note that this inner bound implies that each BS m also has perfect channel
knowledge towards interfering UEs, and takes this into account when calculating
receive filter W_m, referred to as interference rejection combining (IRC).
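The bound in (3.5) can be evaluated with a few lines of code. The sketch below assumes single-antenna UEs with fixed transmit powers (hence a diagonal Φ_ss without further optimization) and evaluates the sum-rate bound only for the full decoded set S = S[m]; the function name `irc_sum_rate` is hypothetical.

```python
import numpy as np

def irc_sum_rate(H_m, powers, decoded, sigma2):
    """Sum-rate bound of (3.5) at one BS for S equal to the full decoded set.

    H_m     : channel from all UEs to BS m, shape (n_rx, K)
    powers  : transmit power per UE (fixed, assumed given)
    decoded : indices of the UEs decoded by this BS
    UEs outside `decoded` are treated as spatially colored noise.
    """
    n_rx, K = H_m.shape
    p = np.asarray(powers, dtype=float)
    decoded = np.asarray(decoded, dtype=int)
    others = np.array([j for j in range(K) if j not in set(decoded.tolist())],
                      dtype=int)
    Hd, Ho = H_m[:, decoded], H_m[:, others]
    R_in = sigma2 * np.eye(n_rx) + (Ho * p[others]) @ Ho.conj().T   # interference + noise
    R_sig = (Hd * p[decoded]) @ Hd.conj().T                          # desired signals
    M = np.eye(n_rx) + np.linalg.solve(R_in, R_sig)
    return np.linalg.slogdet(M)[1] / np.log(2.0)

# Row of BS 1 in H = [1, 0.25; 0.5i, 1], unit power per UE, noise variance 0.1
H_1 = np.array([[1.0, 0.25]])
print(irc_sum_rate(H_1, [1.0, 1.0], decoded=[0], sigma2=0.1))     # BS 1 decodes UE 1 only
print(irc_sum_rate(H_1, [1.0, 1.0], decoded=[0, 1], sigma2=0.1))  # BS 1 decodes both UEs
```

With a single receive antenna the colored-noise covariance reduces to a scalar, so IRC cannot suppress interference spatially here; the matrix form only becomes relevant for Nbs > 1.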

3.4.4 Numerical Example
Fig. 3.4 shows inner bounds on capacity regions for no BS cooperation, and the
capacity region for full BS cooperation, for an example with M = K = 2, Nbs =
Nue = 1, H = [1, 0.25; 0.5i, 1], σ² = 0.1 and unit transmit power limit per
UE. In the non-cooperative case, the bound is based on all possible assignments
of UEs to BSs, including the option of one BS decoding both UEs. This bound is
only marginally extended through HK schemes. For this channel, non-cooperative
performance can be improved through frequency division multiplex (FDM), as
also shown in the figure, where both UEs are placed on orthogonal resources and
hence mutual interference is avoided. Each UE then invests its transmit power
into a smaller portion of bandwidth, yielding an improved SNR. As FDM is
of little value in connection with BS cooperation [Mar10], however, we will not
further observe it in this work.
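The FDM rates for this example can be sketched as follows. The model is an assumption made here for illustration: UE 1 uses a fraction α of the resources and UE 2 the remainder, each UE is decoded only at its own BS, and concentrating the unit transmit power on a fraction of the bandwidth scales the SNR by the inverse of that fraction. The function name `fdm_rates` is hypothetical.

```python
import numpy as np

def fdm_rates(alpha, g1, g2, p=1.0, sigma2=0.1):
    """Rates of two UEs on orthogonal resource fractions alpha and 1 - alpha.

    g1, g2 : power gains of the links UE 1 -> BS 1 and UE 2 -> BS 2
    alpha  : resource share of UE 1, strictly between 0 and 1
    """
    alpha = np.asarray(alpha, dtype=float)
    r1 = alpha * np.log2(1.0 + g1 * p / (alpha * sigma2))
    r2 = (1.0 - alpha) * np.log2(1.0 + g2 * p / ((1.0 - alpha) * sigma2))
    return r1, r2

# Direct links of H = [1, 0.25; 0.5i, 1] both have unit gain
alphas = np.linspace(0.05, 0.95, 19)
r1, r2 = fdm_rates(alphas, g1=1.0, g2=1.0)
for a, x, y in zip(alphas, r1, r2):
    print(f"alpha={a:.2f}  R1={x:.2f}  R2={y:.2f}")
```

Sweeping α traces out the FDM boundary; whether a given point lies outside the non-cooperative bound depends on the strength of the interference links.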

3.5 Downlink Transmission
In the downlink, the precoding, transmission and equalization of each OFDM
symbol on a single sub-carrier can be stated as
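$$\hat{\mathbf{x}} = \mathbf{G}^H \mathbf{y} = \begin{bmatrix} \mathbf{G}_1^H & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{G}_K^H \end{bmatrix} \left( \mathbf{H}^H \mathbf{W}\, d(\mathbf{x}) + \mathbf{n} \right), \tag{3.6}$$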
where x ∈ C^[N_UE × 1] are the symbols to be transmitted to the UEs, and d(·) can
be any arbitrary manipulation of these symbols performed by the BSs. We will
see later that a non-linear operation d(·) is in fact required to achieve capacity in
the case of multiple UEs. W ∈ C^[N_BS × N_UE] is a precoding matrix applied at the
BS side. The transmit covariance is now given as Φ_ss = E{W d(x) (d(x))^H W^H},
which is typically subject to either a sum, per-BS or per-antenna power constraint. The latter is often motivated through the fact that each BS transmit
antenna has a separate power amplifier with a limited linear range. In an OFDM
context, however, applying a per-antenna power constraint individually on each
sub-carrier is rather questionable, as the time-domain signal and its PAPR
appear more important. H ∈ C^[N_BS × N_UE] is the channel matrix, as defined for the
uplink. G ∈ C^[N_UE × N_UE] is a matrix containing the UE-side receive filters, which
is block-diagonal, as we again assume that no cooperation takes place between
UEs. n ∈ C^[N_UE × 1] is the thermal noise and background interference present at
the receive antennas of the UEs, which we assume zero-mean Gaussian with
covariance E{nn^H} = σ²I. Each UE finally obtains estimates x̂ ∈ C^[N_UE × 1] of
the originally transmitted symbols x. The same variable names have been used
in both (3.6) and for the uplink in (3.1) to emphasize duality: The receive filter
W in the uplink plays a dual role to the transmit filter in the downlink, and the
uplink transmit filters G a dual role to the downlink receive filters.

3.5.1 Basic Downlink Capacity Bounds


A point-to-point link in the downlink is analogous to the uplink, hence (3.3) can be
used to state capacity (if the uplink channel H in (3.3) is replaced by the downlink
channel H^H). As stated before, point-to-point capacity under perfect channel
knowledge at transmitter and receiver can be achieved with linear precoding
and equalization, hence the non-linear operation d(·) is not relevant in this case.
In the case of one BS and multiple UEs, the downlink setup resembles a
broadcast channel (BC). Note that one must here distinguish between common
information (i.e. broadcast information to be decoded by all receivers) and individual information to be decoded only by single UEs. In this book, we only
consider the latter type of information, as it plays the dominant role in cellular
systems. As opposed to that of the MAC, the capacity region of the BC is only
known for special cases. One of these is the so-called degraded BC, where all UEs'
receive signals can be stated as another UE's signals, but subject to additional
noise, and where a superposition coding strategy is capacity achieving [Ber74].
For the general, non-degraded BC, which is interesting for us, capacity is only
known for the case of Gaussian noise [WSS06]. A major transmission concept in
this context is dirty paper coding (DPC) [Cos83], which allows a UE to receive
signals free of interference, if this interference is known non-causally to the transmitter. In an example with K = 2 UEs, this means that the BSs can transmit
to one UE, while simultaneously transmitting to the other UE, but encoded in
such a way that the latter transmission is not interfered by the former. This can
be applied to an arbitrary number of transmitted streams, and one can see a
duality to SIC in the uplink: If the SIC decoding order is 1..K in the uplink,
UE k only sees interference from UEs k + 1..K, as the others have already been
decoded and their signals removed. Equivalently, if the downlink DPC encoding
order is 1..K, UE k only sees interference from UEs k + 1..K, as the previously
encoded streams can be considered as known interference and be pre-cancelled
at the transmitter side. Different from the uplink, however, where SIC can be
practically implemented at reasonable effort and requires only receiver-side channel knowledge, DPC is mainly a theoretical construct. Sub-optimal, practical
schemes of limited complexity for the downlink are Tomlinson-Harashima precoding (THP) [Tom71, HM72] and sphere-precoding [HRF06], but these generally require highly precise channel knowledge at the transmitter-side. In Chapters 3 and 4, we consider both BC capacity (i.e. DPC performance) and the rates
achievable with linear precoding, where residual interference between streams is
simply accepted as noise, knowing that a practical transmission scheme may then
perform somewhere in between.
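The capacity region of the BC can in principle be stated as

$$\mathcal{R}_{BC} = \mathrm{conv}\left\{ \bigcup_{\mathbf{W},\pi} \left( R_1(\mathbf{W},\pi), .., R_K(\mathbf{W},\pi) \right) \right\}, \tag{3.7}$$

where ∪ states the union of multiple rate regions, and conv(·) is a convex hull
operation [BV04], in this case over all choices of precoding matrix W and encoding order π, where for each fixed parameter choice the UE rates are bounded as

$$R_k(\mathbf{W},\pi) \le \log_2\left| \mathbf{I} + \left( \sigma^2\mathbf{I} + \sum_{\pi(j)>\pi(k)} \mathbf{H}_k^H \mathbf{W}_j\mathbf{W}_j^H \mathbf{H}_k \right)^{-1} \mathbf{H}_k^H\mathbf{W}_k\mathbf{W}_k^H\mathbf{H}_k \right|. \tag{3.8}$$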
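For single-antenna UEs, the rate bound (3.8) reduces to a scalar SINR expression per UE, which makes the effect of the encoding order easy to inspect. The sketch below is an illustration under that assumption; the function names `dpc_rates` and `linear_rates` and the choice of precoder are hypothetical.

```python
import numpy as np

def dpc_rates(H, W, order, sigma2):
    """Rates per (3.8) for single-antenna UEs and a fixed encoding order.

    H     : channel in uplink convention, shape (N_BS, K); UE k receives via h_k^H
    W     : precoding vectors, one column per UE, shape (N_BS, K)
    order : encoding order (first entry encoded first); a UE only sees
            interference from UEs encoded after it
    """
    K = H.shape[1]
    rates = np.zeros(K)
    for pos, k in enumerate(order):
        gains = np.abs(H[:, k].conj() @ W) ** 2        # |h_k^H w_j|^2 for all j
        interference = sum(gains[j] for j in order[pos + 1:])
        rates[k] = np.log2(1.0 + gains[k] / (sigma2 + interference))
    return rates

def linear_rates(H, W, sigma2):
    """Rates if all inter-stream interference is simply accepted as noise."""
    G = np.abs(H.conj().T @ W) ** 2                    # G[k, j] = |h_k^H w_j|^2
    signal = np.diag(G)
    return np.log2(1.0 + signal / (sigma2 + G.sum(axis=1) - signal))

H = np.array([[1.0, 0.25], [0.5j, 1.0]])
W = np.eye(2, dtype=complex)                            # assumed precoder: one BS per UE
print(dpc_rates(H, W, order=[0, 1], sigma2=0.1))
print(dpc_rates(H, W, order=[1, 0], sigma2=0.1))
print(linear_rates(H, W, sigma2=0.1))
```

The UE encoded last is free of interference, mirroring the UE decoded last under SIC in the uplink.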
Unfortunately, finding the optimal W for a certain point on the capacity
region (or, equivalently, the precoder maximizing a particularly weighted sum of
UE rates), is not trivial, as any sum of UE rates as given in (3.8) is typically non-convex in W [BS02]. It has been observed in [JVG04], however, that an interesting duality can be exploited between uplink and downlink that we want to briefly
illustrate in the sequel. Let us state

$$\mathbf{A}_k = \sigma^2\mathbf{I} + \sum_{\pi(j)>\pi(k)} \mathbf{H}_k^H\mathbf{W}_j\mathbf{W}_j^H\mathbf{H}_k \quad\text{and}\quad \mathbf{B}_k = \sigma^2\mathbf{I} + \sum_{\pi(j)<\pi(k)} \mathbf{H}_j\mathbf{G}_j\mathbf{G}_j^H\mathbf{H}_j^H$$

as the interference terms in downlink and
uplink, respectively. We then re-state the rate bound in the BC from (3.8) as


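$$\begin{aligned}
R_k(\mathbf{W},\pi) &\le \log_2\left| \mathbf{I} + \mathbf{A}_k^{-1}\mathbf{H}_k^H\mathbf{W}_k\mathbf{W}_k^H\mathbf{H}_k \right| && (3.9)\\
&= \log_2\left| \mathbf{I} + \mathbf{A}_k^{-\frac{1}{2}}\mathbf{H}_k^H\mathbf{B}_k^{-\frac{1}{2}}\mathbf{B}_k^{\frac{1}{2}}\mathbf{W}_k\mathbf{W}_k^H\mathbf{B}_k^{\frac{1}{2}}\mathbf{B}_k^{-\frac{1}{2}}\mathbf{H}_k\mathbf{A}_k^{-\frac{1}{2}} \right| && (3.10)\\
&= \log_2\left| \mathbf{I} + \mathbf{B}_k^{-\frac{1}{2}}\mathbf{H}_k \underbrace{\mathbf{A}_k^{-\frac{1}{2}}\mathbf{B}_k^{\frac{1}{2}}\mathbf{W}_k}_{=\mathbf{G}_k} \mathbf{W}_k^H\mathbf{B}_k^{\frac{1}{2}}\mathbf{A}_k^{-\frac{1}{2}}\mathbf{H}_k^H\mathbf{B}_k^{-\frac{1}{2}} \right| && (3.11)\\
&= \log_2\left| \mathbf{I} + \mathbf{B}_k^{-1}\mathbf{H}_k\mathbf{G}_k\mathbf{G}_k^H\mathbf{H}_k^H \right|, && (3.12)
\end{aligned}$$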
which is equivalent to the uplink rate bound for a MAC, given fixed transmit filters G_k and an opposite decoding order, as this can be derived from (3.4).
The equality in (3.10) is based on the fact that |I + AB| = |I + BA|, and
that in (3.11) based on the idea of channel flipping [JVG04]. The authors
in [JVG04] have furthermore shown that the above equalities hold for all UEs
if and only if the sum power is the same in both cases, i.e. if tr{Σ_k G_k G_k^H} =
tr{Σ_k W_k W_k^H}. Hence, we can conclude that the capacity region of the MIMO
BC under a sum power constraint is equivalent to that of the MIMO MAC
(obtained through the reciprocal channel H) under the same sum power constraint. As the standard uplink is typically subject to a per-UE power constraint,
we can obtain the BC capacity region by taking the convex hull around many
MAC regions with different per-UE powers summing up to the same overall
power. This is illustrated in Fig. 3.5 for the same example channel as before. It
was shown in [WSS06] that the obtained BC rate region corresponds to the Sato
upper bound [Sat78], proving that there can indeed be no scheme that performs
better. Hence, capacity has been established for the BC case of Gaussian noise.
[Figure 3.5: Illustration of uplink/downlink duality. Axes: R1 and R2 in bit/channel use. Curves: uplink capacity regions, downlink capacity region. Setup: H = [1, 0.25; 0.5i, 1], tr{Φ_ss} ≤ 2, σ² = 0.1.]

Equations (3.9)-(3.12) also suggest that we can calculate the optimal precoding matrix W if the dual uplink transmit filters G are known. This is possible by calculating ∀ k: B_k directly from G_1..G_K, and then determining A_k and
W_k = B_k^{-1/2} A_k^{1/2} G_k iteratively, starting with UE K [JVG04]. As the dual uplink
is subject to a sum power constraint, however, the calculation of G_k requires not
only the optimization of each UE's individual transmit covariance as in Section 3.4.1, but also the power distribution among UEs. Under the assumption of
non-linear precoding (DPC), this is a convex optimization problem and can be
solved via a gradient-based algorithm as stated in, e.g., [VVH03].
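The iterative calculation of W from the dual uplink can be sketched for the special case of single-antenna UEs, where A_k becomes a scalar and each G_k reduces to the square root of an uplink power p_k. The sketch follows the MAC-to-BC transformation idea of [JVG04] under this simplification; the function name `mac_to_bc`, the chosen powers and the fixed encoding order are assumptions for illustration.

```python
import numpy as np

def mac_to_bc(H, p, sigma2):
    """Map dual-uplink powers p to downlink precoding vectors W.

    Assumes single-antenna UEs. The downlink encoding order is 0..K-1, so
    UE k sees downlink interference from UEs j > k, while in the dual
    uplink it sees interference from UEs j < k (opposite decoding order).
    """
    n_bs, K = H.shape
    W = np.zeros((n_bs, K), dtype=complex)
    for k in reversed(range(K)):                      # start with the last-encoded UE
        h = H[:, k]
        # B_k: uplink noise plus interference from UEs j < k
        B = sigma2 * np.eye(n_bs, dtype=complex)
        for j in range(k):
            B += p[j] * np.outer(H[:, j], H[:, j].conj())
        # a_k: downlink noise plus interference from already computed UEs j > k
        a = sigma2 + sum(abs(h.conj() @ W[:, j]) ** 2 for j in range(k + 1, K))
        Binv_h = np.linalg.solve(B, h)
        gain = np.real(h.conj() @ Binv_h)             # h_k^H B_k^{-1} h_k
        W[:, k] = np.sqrt(a * p[k] / gain) * Binv_h
    return W

H = np.array([[1.0, 0.25], [0.5j, 1.0]])
p = np.array([1.2, 0.8])                              # assumed dual-uplink powers, sum 2
W = mac_to_bc(H, p, sigma2=0.1)
print("downlink sum power:", np.sum(np.abs(W) ** 2))
```

The resulting precoders use the same sum power as the dual uplink and yield the same per-UE SINRs, which is the property that allows the BC region to be traced out via many MAC regions.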
Uplink/downlink duality can also be used to calculate capacity regions and
the precoding matrix W for a BC under per-antenna-(group) power constraints [YL07]. Under these constraints, the dual uplink is subject to a least
favorable noise covariance constrained within a polyhedron. Capacity region calculation then becomes more complex, as the update of dual uplink transmit
covariances Gk and uplink noise covariance have to be performed iteratively,
and convergence may become an issue.

3.5.2 Full Cooperation in the Downlink


As in the uplink, full cooperation in the downlink means that the rate expression
from (3.8) can be used, but is now applied to a larger compound channel H. As
in the uplink, this yields spatial multiplexing, array and diversity gain.

3.5.3 No Cooperation in the Downlink


Without BS cooperation in the downlink, we are again facing an IC, where the
best known transmission scheme is based on superposition coding and partial
decoding of interference (HK). Analogous to the uplink, however, the usage of HK
techniques is of little relevance if we allow for a fast-fading dependent assignment
of UEs to BSs, and consider the option of local, non-cooperative transmission
from one BS to multiple UEs. In general, the non-cooperative case implies that
each UE k receives desired signals from only one BS, which means that its associated precoding vector w_k may have non-zero entries only on the elements
connected to the antennas of this BS. It has been shown in [MF09a, Mar10]
that duality can now still be applied, and inner bounds on capacity regions can
be determined by observing a dual uplink where for each UE only the receive
antennas connected to one BS may be used for detection.

[Figure 3.6: (Inner bounds on) capacity regions for no or full cooperation in the downlink. Axes: R1 and R2 in bit/channel use. Curves: no coop., no coop. (HK), full coop. (MMSE), full coop. (DPC), each under sum power and per-antenna (p.a.) power constraints. Setup: H = [1, 0.25; 0.5i, 1], tr{Φ_ss} ≤ 2, σ² = 0.1.]

3.5.4 Numerical Example
Fig. 3.6 shows inner bounds on capacity regions for no BS cooperation and
capacity regions for full BS cooperation, for the same example channel as before,
i.e. M = K = 2, Nbs = Nue = 1, H = [1, 0.25; 0.5i, 1], σ² = 0.1 and either a
unit per-antenna power constraint, or a sum power constraint of 2 (i.e. in both
cases tr{Φ_ss} ≤ 2). In the non-cooperative case, an inner bound on the capacity
region is in principle based on all possible assignments of UEs to BSs, including
the option of one BS transmitting to both UEs, though this is not beneficial
for this particular channel. We again also observe HK schemes, but can see
that these are only interesting under a per-antenna power constraint. In general,
the difference between sum and per-antenna power constraint is only visible at
the sides of the capacity regions, while the sum-rate remains largely unaffected,
especially under non-linear precoding (DPC).

3.6 Summary
In this chapter, we have formalized the uplink and downlink transmissions
considered throughout the remainder of the book, and introduced the basic
information-theoretic concepts inherent in the many degrees of freedom of CoMP.
We have seen how capacity regions can be computed for uplink and downlink
under full base station cooperation, and inner bounds on these (or achievable rate
regions) can be computed for cases of no BS cooperation, as capacity remains
unknown here. While all computations are rather straight-forward for the uplink,
we have seen that uplink/downlink duality can be used to also make the downlink more mathematically amenable. The results in this chapter already suggest
substantial rate gains through multi-cell joint signal processing, but this will be
analyzed in more detail in the next chapter.

Gains and Trade-Offs of Multi-Cell Joint Signal Processing
Patrick Marsch

In this chapter, we focus on CoMP schemes based on multi-cell joint signal
processing. We extend the transmission models from Chapter 3 to incorporate
imperfect channel knowledge at the transmitter and receiver side, which strongly
limits the interference regimes in which joint signal processing is reasonably
beneficial. Regarding the uplink, we then explore different types of information
exchange between cooperating base stations, revealing general trade-offs that
have to be considered here if backhaul is a limited resource. For the downlink,
different principal joint transmission concepts are introduced.

4.1 Modeling Imperfect Channel State Information (CSI)


In this chapter, we observe an exemplary frequency-flat OFDM sub-carrier,
assuming that scheduling of users to resources has already taken place.

4.1.1 Imperfect CSI in the Uplink


Let us first observe the uplink, where the main issue is usually the channel
state information at the receiver (CSIR), in particular if only Nue = 1 transmit
antenna per user equipment (UE) is assumed, as in LTE Release 8 [McC07]. Let
us assume that the channel introduced in (3.1) is re-written as

$$\mathbf{H} = \hat{\mathbf{H}} + \mathbf{E}, \tag{4.1}$$

where Ĥ is an unbiased minimum mean square error (MMSE) estimate of the
channel, and E is the channel estimation error. We can then also re-write the
complete transmission from (3.1) (without the receive filter W) as

$$\mathbf{y} = \left(\hat{\mathbf{H}} + \mathbf{E}\right)\mathbf{s} + \mathbf{n} = \hat{\mathbf{H}}\mathbf{s} + \underbrace{\mathbf{E}\mathbf{s}}_{\text{channel estimation related noise term}} + \mathbf{n}, \tag{4.2}$$
where we can see that the channel estimation error leads to an additional noise
term Es. Equation (4.1) implies that if the channel and its estimate are assumed
block-static (as defined for the former in Chapter 3), then the estimation error
is also block-static. If we now observe the average capacity of the transmission
over many transmission blocks, the impact of channel estimation noise can be
overestimated due to Jensen's inequality, if the term Es is treated as a random
variable with a different realization in each channel access. We further overestimate its impact by modeling it as a spatially white Gaussian random variable,
as this has the largest entropy for a given variance [Med00], i.e. by observing

$$\mathbf{y} = \hat{\mathbf{H}}\mathbf{s} + \mathbf{v} + \mathbf{n} \quad\text{with}\quad \mathbf{v} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{\Phi}_{hh}) \tag{4.3}$$

with Φ_hh = E{E Φ_ss E^H}, which is diagonal under the realistic assumption that

the estimation errors of each channel coefficient are uncorrelated. It is shown
in [YG06a] that the assumption of Gaussianity has little impact on the mutual
information unless N_BS >> N_UE. In other words, in the CoMP scenarios considered in this book, there is not much rate improvement possible if receiver-side
algorithms treat term Es as anything else than just spatially white Gaussian
noise. With (4.3), we can now formulate a fairly tight inner bound on the multiple
access channel (MAC) capacity region by changing (3.4) to

$$\forall\, \mathcal{S} \subseteq \{1..K\}: \quad \sum_{k\in\mathcal{S}} R_k \le \max_{\mathbf{\Phi}_{ss}} \log_2\left| \mathbf{I} + \left( \sigma^2\mathbf{I} + \mathbf{\Phi}_{hh} \right)^{-1} \hat{\mathbf{H}}\mathbf{\Phi}_{ss}(\mathcal{S})\hat{\mathbf{H}}^H \right|. \tag{4.4}$$

Note that, different from the MAC capacity region under perfect channel
state information (CSI), it is now not optimal anymore to let all UEs transmit at maximum power. This is because Φ_hh itself is a function of the transmit
covariance Φ_ss, hence increasing one UE's power also increases the
residual channel estimation related noise impairing successive interference cancelation (SIC) performance. This is the reason why the MAC
capacity region under imperfect CSI is not a pentagon anymore, see [MF09b].
The question is now how Φ_hh and Ĥ can be modeled for a channel realization
H and a particular channel estimation scheme. One option is to observe the
absolute (i.e. link-independent) channel estimation error variance,
which can be obtained via the Cramér-Rao lower bound [Kay93] as

$$\sigma_E^2 = \frac{\sigma_p^2}{N_p\, p_{\text{pilots}}}. \tag{4.5}$$

Here, σ_p² denotes the variance of the noise the channel estimation is subject to,
N_p is the number of pilots used to obtain CSI, and p_pilots is the pilot power. Note
that σ_p² may deviate from σ² if, e.g., pilot sequences of multiple cells are designed
to be orthogonal, while data transmission in these cells is subject to mutual
interference not addressed by CoMP (hence leading to an increased background
noise σ² > σ_p²). With the definition of E in (4.1), we can now state

$$E\left\{ e_{i,j} (e_{i,j})^H \right\} = \frac{E\{|h_{i,j}|^2\}\,\sigma_E^2}{E\{|h_{i,j}|^2\} + \sigma_E^2}, \tag{4.6}$$

from which the calculation of Φ_hh is straightforward. Note that some authors
derive the impact of channel estimation noise using a different model where
Ĥ = H + E [PSS04, MF09b], i.e. where the estimated channel and estimation
error are assumed correlated, but obtain the same final result as in (4.4). Now we

still have the problem that (4.4) states an inner bound on the capacity region for
a given channel estimate Ĥ, assuming that the actual channel H is fluctuating
around this. In most cases, however, we want to observe the opposite case, i.e.
the capacity of the transmission over an actual channel H under imperfect CSI.
It is discussed in [Mar10] that this can be approximated by replacing Ĥ in (4.4)
by the expectation value of the channel estimate, which under the assumption of
an unbiased MMSE detector is simply an element-wise scaled version H̄ of the
actual channel H with [MF09b]

$$\forall\, i,j: \quad \bar{h}_{i,j} = \frac{h_{i,j}}{\sqrt{1 + \sigma_E^2 / E\{|h_{i,j}|^2\}}}. \tag{4.7}$$

The approximation is very accurate as long as the average channel gain
E{|h_{i,j}|²} is at least a few dB larger than the absolute channel estimation error
variance σ_E², below which channel estimation is questionable, anyway.
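The chain from (4.5) to (4.7) and its effect on a rate bound can be illustrated for a single link. The sketch below assumes one single-antenna UE and one single-antenna BS, so that (4.4) reduces to a scalar expression, and it uses the instantaneous gain |h|² in place of the average gain E{|h|²}; the function names and the parameter values are hypothetical.

```python
import numpy as np

def estimation_error_variance(sigma_p2, n_pilots, pilot_power):
    """Absolute channel estimation error variance according to (4.5)."""
    return sigma_p2 / (n_pilots * pilot_power)

def error_variance(mean_gain, sigma_e2):
    """Per-coefficient error variance E{|e_ij|^2} according to (4.6)."""
    return mean_gain * sigma_e2 / (mean_gain + sigma_e2)

def scaled_channel(h, mean_gain, sigma_e2):
    """Element-wise scaled channel according to (4.7)."""
    return h / np.sqrt(1.0 + sigma_e2 / mean_gain)

def siso_rate_imperfect_csi(h, p, sigma2, sigma_e2):
    """Scalar version of (4.4): estimation noise p * E{|e|^2} adds to sigma2."""
    mean_gain = abs(h) ** 2                 # assumption: stands in for E{|h|^2}
    h_bar = scaled_channel(h, mean_gain, sigma_e2)
    phi_hh = p * error_variance(mean_gain, sigma_e2)
    return np.log2(1.0 + p * abs(h_bar) ** 2 / (sigma2 + phi_hh))

sigma_e2 = estimation_error_variance(sigma_p2=0.1, n_pilots=2, pilot_power=1.0)
rate_imperfect = siso_rate_imperfect_csi(h=1.0, p=1.0, sigma2=0.1, sigma_e2=sigma_e2)
rate_perfect = np.log2(1.0 + 1.0 / 0.1)
print(sigma_e2, rate_imperfect, rate_perfect)
```

Since the estimation noise term scales with the transmit power p, increasing p raises both the useful signal and the noise floor, which is the mechanism behind the observation that maximum power is no longer optimal for all UEs.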
While the Cramér-Rao lower bound from (4.5) refers to channel estimation
of a frequency-flat block-static channel, the transmission of each symbol in an
orthogonal frequency division multiple access (OFDMA) system is of course subject to a slightly different channel, as this varies over time and frequency. Here,
pilots are typically scattered across these two dimensions, and channel estimation
(including inter- and extrapolation of the channel for all data symbols) is ideally
performed such that it takes the correlation of the channel realization in time
and frequency into account, e.g. via 2D MMSE channel estimation [HKR97a].
Hence, each detected symbol is subject to a different accuracy of CSI, depending
on its location w.r.t. the pilot positions and the concrete interpolation scheme
applied. One way to keep an analytical study simple is to calculate capacity
bounds based on a frequency-flat block-static channel as before, but use a value
of N_p representative for the average performance for a concrete pilot and channel
estimation scheme, coherence bandwidth and time (i.e. a particular delay spread
and UE speed) [MRF10, Mar10].

4.1.2 Imperfect CSI in the Downlink


In the downlink, especially under base station (BS) cooperation, channel
state information at the transmitter (CSIT) also becomes a crucial issue [CS03]. As
CSIT in frequency division duplex (FDD) systems is typically realized via CSI
feedback, which requires quantization and is subject to feedback delay, CSIT is
usually (significantly) less accurate than CSIR. It is here possible to re-write (4.1)
as

$$\mathbf{H} = \underbrace{\hat{\mathbf{H}}^{BS} + \mathbf{E}^{BS}}_{=\hat{\mathbf{H}}} + \mathbf{E}, \tag{4.8}$$

where the channel knowledge Ĥ at the receiver side (in the downlink hence the
UE side) is expressed as the channel knowledge Ĥ^BS at the transmitter side plus
an uncorrelated noise component E^BS. The latter hence denotes the additional

uncertainty the transmitter side has on the channel knowledge in comparison to
the receiver side. A simple way to model this is to assume that each channel
coefficient is quantized with a certain number of quantization bits N_b and then
fed back to the transmitter side, so that rate-distortion theory yields [CT06]

$$\forall\, i,j: \quad \hat{h}^{BS}_{i,j} = \sqrt{1 - 2^{-N_b}}\; \hat{h}_{i,j} \quad\text{and}\quad E\left\{ \left|e^{BS}_{i,j}\right|^2 \right\} = 2^{-N_b}\, E\left\{ \left|\hat{h}_{i,j}\right|^2 \right\}. \tag{4.9}$$

The downlink transmission equation from (3.6) can then be modified to

$$\mathbf{y} = \hat{\mathbf{H}}^{BS}\mathbf{s} + \mathbf{v} + \mathbf{u} + \mathbf{n}, \tag{4.10}$$

where v ~ N_C(0, Φ_hh) is a noise term connected to imperfect CSI at both UE and
BS side, while u ~ N_C(0, Φ_hh^BS) with Φ_hh^BS = E{E^BS Φ_ss (E^BS)^H} is a noise term
connected to the additional CSI uncertainty at the transmitter side. Assuming
both terms to be spatially white Gaussian again yields an overestimation of their
impact. While the impact of v on capacity is straightforward (as explained for
the uplink), this is not the case for u. Intuitively, one would expect that for a
given CSIT, there should be a benefit of an improved CSIR. And indeed, the case
of linear precoding can be modeled such that each UE k is not negatively affected
by the noise term E{E^BS Φ_ss({k}) (E^BS)^H} connected to inaccurate CSIT and
the desired signal covariance Φ_ss({k}) of this UE [Mar10]. This is different if
dirty paper coding (DPC) is employed, which requires the transmitter side to
know the exact overlap of desired signal and interference to be canceled at the
UE side. Under imperfect CSIT, DPC can hence only be performed w.r.t. Ĥ^BS,
and there is no benefit at all if the receiver side has better CSI. This leads to the
fact that beyond some point of decreasing CSIT, DPC is not superior to linear
precoding any more, as we will see later.
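The feedback model of (4.9) can be sketched in a few lines to get a feeling for the order of magnitude of the additional CSIT error. The function name `csit_from_feedback` and the example values are assumptions; the number of feedback bits is varied purely for illustration.

```python
import numpy as np

def csit_from_feedback(h_hat, mean_gain_hat, n_bits):
    """Transmitter-side channel knowledge and extra error variance per (4.9).

    h_hat         : receiver-side channel estimate (scalar or array)
    mean_gain_hat : average gain E{|h_hat|^2} of the estimated coefficient(s)
    n_bits        : number of quantization bits N_b per channel coefficient
    """
    distortion = 2.0 ** (-n_bits)
    h_bs = np.sqrt(1.0 - distortion) * np.asarray(h_hat)
    err_var = distortion * mean_gain_hat
    return h_bs, err_var

for n_bits in (2, 4, 6, 8):
    h_bs, err_var = csit_from_feedback(h_hat=1.0, mean_gain_hat=1.0, n_bits=n_bits)
    print(n_bits, float(h_bs), err_var, 10.0 * np.log10(err_var))
```

Each additional feedback bit lowers the error variance by about 3 dB relative to the average gain of the estimated channel coefficient.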
Note that (4.10) becomes highly inaccurate under strongly degraded or no
CSIT at all. While capacity tends to zero in our model, it is known that even
without CSIT non-zero rates can be achieved in a broadcast channel (BC) if the
BSs transmit sequentially to single UEs from one antenna (under a sum power
constraint) or using unit precoders (for per-antenna power constraints). This,
however, is a regime of operation which is surely not relevant for CoMP.
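With (4.10) and the discussion before, an inner bound for the BC capacity
region under imperfect CSIT and CSIR can now be computed by considering the
convex hull over all precoding matrices W and encoding orders π, where [MF09a]

$$R_k(\mathbf{W},\pi) \le \log_2\left| \mathbf{I} + \left( \sigma^2\mathbf{I} + \mathbf{\Phi}^k_{ii} + \mathbf{\Phi}^k_{hh} + \mathbf{\Phi}^{BS,k}_{hh} \right)^{-1} \left(\hat{\mathbf{H}}^{BS}_k\right)^H \mathbf{W}_k\mathbf{W}_k^H \hat{\mathbf{H}}^{BS}_k \right| \quad\text{with}\quad \mathbf{\Phi}^k_{ii} = \sum_{\pi(j)>\pi(k)} \left(\hat{\mathbf{H}}^{BS}_k\right)^H \mathbf{W}_j\mathbf{W}_j^H \hat{\mathbf{H}}^{BS}_k. \tag{4.11}$$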

Here, Φ_ii^k is the residual inter-user interference, Φ_hh^k is the part of matrix Φ_hh
that is connected to UE k, denoting noise due to imperfect CSI at transmitter and receiver side, and Φ_hh^{BS,k} is noise due to additional CSI imperfectness
at the transmitter side. As in the uplink, the capacity region can again be
approximated by replacing Ĥ^BS in (4.11) by the average BS-side channel estimate H̄^BS = E{Ĥ^BS} with

$$\bar{h}^{BS}_{i,j} = h_{i,j}\, \frac{\sqrt{1 - 2^{-N_b}}}{\sqrt{1 + \sigma_E^2 / E\{|h_{i,j}|^2\}}}.$$

It has been shown in [MF09a, Mar10] that uplink/downlink duality is still applicable
to the capacity region in (4.11), where the dual uplink is then subject to a particular extent of CSIR. Unfortunately, calculation of dual uplink precoders G_k
and the power distribution among UEs is now not a convex problem any more,
but still numerically more tractable (e.g. through a brute-force search) than trying to solve (4.11) directly. Duality can also be used to observe non-cooperative
performance under imperfect CSI and various power constraints [Mar10].

[Figure 4.1: Interference scenarios considered in this chapter. (a) Setup for M = K = 2 BSs and UEs. (b) Setup for M = K = 3 BSs and UEs. Inter-site distance d_ISD = 0.5 km; normalized UE distances d1, d2, d3 to the assigned BS, with the cell edge at 0.5.]

In the rest of this chapter, we use values of N_p = 2 and N_b = 6, which
have been shown in [MRF10, Mar10] to be representative for the performance in an
OFDMA system with the pilot structure of LTE Release 8, a UE speed of 3 km/h,
a maximum delay spread of 1 µs, and a CSI feedback delay of 3 ms.

4.2 Gain of Joint Signal Processing under Imperfect CSI


Based on the modified transmission equations from the last section, we now
observe the gains to be expected from joint signal processing in uplink and
downlink in different interference scenarios, and under perfect and imperfect
CSI. We consider setups with M = K = 3, Nue = 1 antenna per UE, and Nbs = 2
antennas per BS. All UEs are moved simultaneously from their cell-center to the
common cell-edge, as depicted in Fig. 4.1(b), with an inter-site distance (ISD) of
500 m. Here, the values di state the normalized distance of the terminals to their
assigned BS, where a value of 0.5 reflects the cell-edge case. Exemplary channel
matrices H are generated according to a log-linear pathloss model with a pathloss
coefficient of 3.5, and are constructed to be of average orthogonality.
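To give a feeling for the log-linear pathloss model, the following minimal sketch computes the SISO SNR of a serving link and the signal-to-interference ratio as a function of the normalized distance d. It rests on assumptions that are not taken from the text: fixed transmit power, an SNR calibrated to 10 dB at the cell-edge (d = 0.5), and the simple two-cell line geometry of Fig. 4.1(a), in which the interfering BS is at normalized distance 1 - d. The function names are hypothetical.

```python
import math

PATHLOSS_EXPONENT = 3.5   # pathloss coefficient stated in the text
CELL_EDGE = 0.5           # normalized distance of the cell-edge
EDGE_SNR_DB = 10.0        # assumed calibration: 10 dB SISO SNR at the cell-edge


def snr_db(d):
    """SISO SNR in dB at normalized distance d from the serving BS."""
    return EDGE_SNR_DB + 10.0 * PATHLOSS_EXPONENT * math.log10(CELL_EDGE / d)


def sir_db(d):
    """Signal-to-interference ratio in dB, interferer assumed at distance 1 - d."""
    return 10.0 * PATHLOSS_EXPONENT * math.log10((1.0 - d) / d)


# Move the UE from the cell-center towards the cell-edge.
for d in (0.2, 0.3, 0.4, 0.5):
    print(f"d = {d:.1f}: SNR = {snr_db(d):5.1f} dB, SIR = {sir_db(d):5.1f} dB")
```

Under these assumptions the interference link becomes as strong as the serving link exactly at d = 0.5, which is why the cell-edge is the regime where cooperation matters most.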

Figure 4.2 Uplink joint signal processing gains for scenarios with M = K = 3.
[Plot: sum-rate in bit/channel use versus d1 = d2 = d3 for multi-cell power control and
a SISO SNR of 10 dB; curves for full coop., no coop. (IRC+SIC), no coop. (IRC) and
no coop. (MRC), each for perfect CSI (Np = ∞) and imperfect CSI (Np = 2); annotated
cell-edge gains of +28%, +20% and +90%.]

In the uplink, shown in Fig. 4.2, we assume multi-cell power control, where
the transmit power of each UE is adjusted w.r.t. a certain target average receive
power at all BSs. Target receive power and noise variance are chosen such that a
single-input single-output (SISO) signal-to-noise ratio (SNR) of 10 dB is obtained
at the cell-edge. We compare the following schemes:
• non-cooperative detection, based on maximum ratio combining (MRC) (i.e.
  considering interference as spatially white noise),
• non-cooperative detection, based on interference rejection combining (IRC)
  (i.e. taking the spatial properties of interference into account),
• non-cooperative detection, allowing a flexible assignment of UEs to BSs and
  the joint detection of multiple UEs at the same BS, and
• fully-cooperative joint detection of all UEs by all BSs.
The strongest rate gains can be obtained at the cell-edge [KRF07]. Here, using
IRC to exploit the spatial color of interference (but without BS cooperation)
already yields a 28% rate increase. A further 20% is possible if local SIC is used,
i.e. interference subtraction that also does not require BS cooperation. The strongest
gains, however, with an additional 90%, are visible if the BSs jointly process
all UEs, profiting from array and spatial multiplexing gain and yielding MAC
performance. The gain of local, non-cooperative interference subtraction disappears
quickly as we move away from the cell-edge, as enabling the decoding and
subtraction of interference poses constraints on the rates of interferers [HK81].
Han-Kobayashi (HK) techniques (superposition coding and partial interference
decoding) would yield marginal rate improvements here, and are strongly sensitive
to imperfect CSI. Towards the cell-centers, all stated gains strongly diminish,
especially under imperfect CSI, as then the interference links cannot be estimated
well enough to be exploited. At the cell-edge, however, the relative rate improvements
due to full cooperation increase as the CSI accuracy decreases, as additional diversity
alleviates the impact of channel estimation errors [Mar10].

Figure 4.3 Downlink joint signal processing gains for scenarios with M = K = 3.
[Plot: sum-rate in bit/channel use versus d1 = d2 = d3, labelled "p.a. pwr."; curves for
full coop. (lin. or DPC), no coop. (IAP+DPC), no coop. (IAP) and no coop. (MRT), each
for perfect CSI (Np = Nb = ∞) and imperfect CSI (Np = 2, Nb = 6); annotated cell-edge
gains of +55%, +11% and +86%.]
In the downlink, shown in Fig. 4.3, we assume that the transmit power is
fixed, such that we obtain a SISO SNR at the cell-edge of 10 dB. The compared
schemes are analogous to those in the uplink, namely
• non-cooperative transmission, based on maximum ratio transmission (MRT),
• non-cooperative transmission, based on interference-aware precoding (IAP),
• non-cooperative transmission, allowing a flexible assignment of UEs to BSs
  and joint transmission from one BS to multiple UEs, possibly using local
  DPC,
• fully-cooperative joint transmission from all BSs to all UEs, possibly with DPC.
The difference between linear precoding and DPC is shown through empty
and filled markers, respectively. We can see similar effects as in the uplink. The
cooperation gain is again largest at the cell-edge, with 55% improvement due
to interference-aware precoding, 11% additionally due to the option of local,
non-cooperative multi-UE transmission with DPC, and another 86% if full joint
transmission is employed. Under imperfect CSI, DPC is only (marginally) superior
to linear precoding at the cell-edge, as interference links can otherwise not
be estimated accurately enough. This is also the reason why local multi-UE
transmission with DPC is less beneficial than its counterpart (local SIC) in the
uplink. The small gap between linear and non-linear techniques under full cooperation,
also under perfect CSI, is due to the fact that the compound channel
already enables a fairly good spatial separation of the UEs without DPC. In
general, the relative gain of cooperation remains more or less the same, regardless
of the extent of CSIT [Mar10]. The reason is that cooperative and
non-cooperative transmission degrade equally for diminishing CSIT.


4.3 Trade-Offs in Uplink Multi-Cell Joint Signal Processing

4.3.1 Different Information Exchange and Cooperation Schemes


We now explore the gray area in Fig. 4.2, i.e. the regime between no and full
BS cooperation in the uplink, by introducing different kinds of BS information
exchange and cooperation for a small setup with M = K = 2. While these
are information-theoretic concepts, we will later see many parallels to practical
CoMP algorithms. We initially focus on schemes with one phase of information
exchange between BSs, and briefly address iterative cooperation in Section 4.3.3.
Distributed Interference Subtraction (DIS)
Here, one BS decodes one UE's transmission and forwards the decoded data to
the other BS for interference cancelation [MF08d], as shown in Fig. 4.4(a). The
best-known scheme from information theory is to apply source coding, i.e. to compress
the information handed over the backhaul, so that the benefiting BS can
reconstruct it only by using its own received signals as side-information [SW73a].
This BS then re-encodes the interfering UE transmission, and subtracts the corresponding
interference from its received signals before decoding its assigned UE.
If a backhaul capacity of $\beta$ is available and information exchange takes place from
BS 1 to BS 2, then the UE rates can be lower-bounded as

$$R_1 \le \log_2 \left| \mathbf{I} + \left( \sigma^2 \mathbf{I} + \boldsymbol{\Phi}_{hh}^{1} + p_2 \bar{\mathbf{h}}_2^{1} \left( \bar{\mathbf{h}}_2^{1} \right)^H \right)^{-1} p_1 \bar{\mathbf{h}}_1^{1} \left( \bar{\mathbf{h}}_1^{1} \right)^H \right| \qquad (4.12)$$

$$R_1 \le \beta + \underbrace{\log_2 \left| \mathbf{I} + \left( \sigma^2 \mathbf{I} + \boldsymbol{\Phi}_{hh}^{2} + p_2 \bar{\mathbf{h}}_2^{2} \left( \bar{\mathbf{h}}_2^{2} \right)^H \right)^{-1} p_1 \bar{\mathbf{h}}_1^{2} \left( \bar{\mathbf{h}}_1^{2} \right)^H \right|}_{\text{zero if source coding is not considered}} \qquad (4.13)$$

$$R_2 \le \log_2 \left| \mathbf{I} + \left( \sigma^2 \mathbf{I} + \boldsymbol{\Phi}_{hh}^{2} \right)^{-1} p_2 \bar{\mathbf{h}}_2^{2} \left( \bar{\mathbf{h}}_2^{2} \right)^H \right|. \qquad (4.14)$$

The inequality in (4.12) is based on the fact that UE 1 is decoded by BS 1
under the full interference from UE 2, and that in (4.13) is due to the backhaul
constraint. The underbraced term corresponds to the rate at which BS 2 could
decode UE 1 without cooperation. Eq. (4.14) is finally based on the fact that
BS 2 can decode UE 2 free of interference from UE 1, as this has been subtracted.
In regimes of weak interference and low backhaul, the rate/backhaul trade-off
of DIS can be improved if only a portion of the decoded data is forwarded to the
other BS [MF08a]. In information theory, this can be modeled via superposition
coding. Instead of decoded data, one might also consider forwarding quantized
interference directly, having the advantage that BSs benefiting from cooperation
need not re-encode interference [SSPS09c, GMFC09]. This so-called compressed
interference forwarding (CIF) has practical advantages, but is always inferior to
DIS with superposition coding [Mar10]. We will later also provide simulation
results for CIF, but refer the interested reader to [GMFC09] for details.

Figure 4.4 Different uplink multi-cell joint signal processing concepts: (a) DIS / CIF
(exchange of decoded data or quantized interference of UE 1); (b) decentralized JD
(mutual exchange of quantized receive signals or soft bits); (c) centralized JD
(quantized receive signals or soft bits forwarded to one BS, which decodes both UEs).

Decentralized Multi-Cell Joint Detection
In multi-cell joint detection (JD), cooperation is based on an exchange of quantized
receive signals between BSs. Let us first consider a decentralized approach,
where both BSs quantize their received signals and forward these simultaneously
to the partnering BS, as shown in Fig. 4.4(b). Both BSs can then independently
decode their assigned UE, making use of the receive signals from all NBS antennas
in the cluster. The UE rates can be bounded as

$$R_1 \le \log_2 \left| \mathbf{I} + \left( \sigma^2 \mathbf{I} + \boldsymbol{\Phi}_{hh} + p_2 \bar{\mathbf{h}}_2 \bar{\mathbf{h}}_2^H + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Phi}_{qq}^{2} \end{bmatrix} \right)^{-1} p_1 \bar{\mathbf{h}}_1 \bar{\mathbf{h}}_1^H \right| \qquad (4.15)$$

$$R_2 \le \log_2 \left| \mathbf{I} + \left( \sigma^2 \mathbf{I} + \boldsymbol{\Phi}_{hh} + p_1 \bar{\mathbf{h}}_1 \bar{\mathbf{h}}_1^H + \begin{bmatrix} \boldsymbol{\Phi}_{qq}^{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \right)^{-1} p_2 \bar{\mathbf{h}}_2 \bar{\mathbf{h}}_2^H \right|. \qquad (4.16)$$

The terms $\boldsymbol{\Phi}_{qq}^{1}$ and $\boldsymbol{\Phi}_{qq}^{2}$ in (4.15) and (4.16) denote covariances of the
quantization noise introduced by the BSs before signal exchange. The best backhaul
efficiency would again be obtained via source coding. As quantization and source coding
can be modeled separately under Gaussian signaling [WZ76], we state that each BS m
creates a discrete representation $\tilde{\mathbf{y}}_m$ of its receive signals, and then passes a
source encoded version over the backhaul, so that the other BS $l \neq m$ can reconstruct
$\tilde{\mathbf{y}}_m$ by exploiting its own received signals $\mathbf{y}_l$. Assuming a total backhaul capacity
of $\beta$, the quantization noise terms then need to fulfill [dS08]

$$\sum_{m=1}^{2} \log_2 \left| \mathbf{I} + \left( \boldsymbol{\Phi}_{qq}^{m} \right)^{-1} \boldsymbol{\Phi}_{yy}^{m|l \neq m} \right| \le \beta, \qquad (4.17)$$

where $\boldsymbol{\Phi}_{yy}^{m|l}$ is the receive signal covariance at BS m, conditioned on the signals
received by BS l. Optimal quantization noise covariances may be calculated via a
Karhunen-Loève transform [GDV02] succeeded by water-filling [dS08], where the
backhaul is optimally invested into the spatial dimensions of the received signals.
For decentralized JD, this is equivalent to letting each BS locally equalize the
interfering UE to obtain a scalar value which is then quantized. As source coding
might be regarded infeasible in practice, we also consider the case where this is
omitted, or a practical quantizer, where one quantization bit is lost per real
signal dimension [LBG80]. In these cases, (4.17) changes to [Mar10]

$$\sum_{m=1}^{2} \log_2 \left| \mathbf{I} + \left( \boldsymbol{\Phi}_{qq}^{m} \right)^{-1} \boldsymbol{\Phi}_{yy}^{m} \right| \le \beta \quad \text{or} \quad \forall m: \ \boldsymbol{\Phi}_{qq}^{m} = \left( 2^{\max\left( \beta_m / N_{\mathrm{bs}} - 2,\, 0 \right)} - 1 \right)^{-1} \boldsymbol{\Phi}_{yy}^{m} \qquad (4.18)$$

with $\beta_1 + \beta_2 \le \beta$. The rate/backhaul trade-off of decentralized JD can be
improved if the backhaul is used successively, and not simultaneously. One BS
could forward quantized receive signals, after which the other BS would decode
its assigned UE and subtract the corresponding signals from its receive signals
before quantizing and forwarding the remaining signals to the former BS. However,
this only yields (marginal) gains in interference regimes where the following
scheme is superior anyway, while increasing latency [Mar10].
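The link between backhaul and quantization noise in (4.17) and (4.18) can be illustrated for a single complex signal dimension. The sketch below is a scalar specialization under our own assumptions (one receive antenna, so Nbs = 1, and the backhaul constraint met with equality); the function name is hypothetical.

```python
import math


def quant_noise_var(signal_var, backhaul, practical=False):
    """Quantization noise variance for one complex signal dimension.

    signal_var: variance of the signal to be quantized (the conditional
                variance if source coding is assumed).
    backhaul:   available backhaul in bit/channel use.
    practical:  if True, one bit is lost per real dimension, i.e. two bits
                per complex sample, as in the practical quantizer of (4.18).
    """
    bits = max(backhaul - 2.0, 0.0) if practical else backhaul
    if bits <= 0.0:
        return math.inf  # no useful information crosses the backhaul
    # Invert backhaul = log2(1 + signal_var / noise_var).
    return signal_var / (2.0 ** bits - 1.0)


for beta in (1.0, 2.0, 4.0, 8.0):
    print(beta, quant_noise_var(1.0, beta), quant_noise_var(1.0, beta, True))
```

Source coding enters this picture only through a smaller `signal_var`: the more the two BSs' observations are correlated, the smaller the conditional variance and hence the quantization noise for a given backhaul.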
Centralized Multi-Cell Joint Detection
Let us finally consider the case where one BS quantizes its received signals,
and forwards these to the other BS, where both UEs are then jointly decoded.
Assuming that received signals are forwarded from BS 1 to BS 2, the UE rates
can be stated for a given quantization noise covariance as

$$\forall\, \mathcal{S} \subseteq \{1, 2\}: \quad \sum_{k \in \mathcal{S}} R_k \le \log_2 \left| \mathbf{I} + \left( \sigma^2 \mathbf{I} + \boldsymbol{\Phi}_{hh} + \begin{bmatrix} \boldsymbol{\Phi}_{qq}^{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \right)^{-1} \sum_{k \in \mathcal{S}} p_k \bar{\mathbf{h}}_k \bar{\mathbf{h}}_k^H \right|. \qquad (4.19)$$

One benefit of centralized JD, becoming evident from (4.19), is the option of
SIC at the decoding BS. The quantization noise covariance $\boldsymbol{\Phi}_{qq}^{1}$ has to fulfill

$$\log_2 \left| \mathbf{I} + \left( \boldsymbol{\Phi}_{qq}^{1} \right)^{-1} \boldsymbol{\Phi}_{yy}^{1|2} \right| \le \beta, \quad \log_2 \left| \mathbf{I} + \left( \boldsymbol{\Phi}_{qq}^{1} \right)^{-1} \boldsymbol{\Phi}_{yy}^{1} \right| \le \beta \quad \text{or} \quad \boldsymbol{\Phi}_{qq}^{1} = \left( 2^{\max\left( \beta / N_{\mathrm{bs}} - 2,\, 0 \right)} - 1 \right)^{-1} \boldsymbol{\Phi}_{yy}^{1}, \qquad (4.20)$$

with or without source coding, or based on practical quantization, respectively.
Note that even under perfect CSIR, the rate region is not a polygon anymore,
as we have the degree of freedom of assigning different portions of backhaul to
the two UEs. This is treated analytically and illustrated in [dS08].
In [SSS07a], centralized JD has been investigated in conjunction with partial
local decoding. For the above cooperation direction, this would mean that BS 1
decodes part of its assigned UE's transmission itself, and forwards the remaining
received signals to the other BS for joint decoding of the remaining signals from
both UEs. While a benefit regarding the rate/backhaul trade-off was reported
in [SSS07a], this is only marginally superior to a simple time-share between a
decentralized and centralized cooperation strategy [MF08c].

Figure 4.5 Sum-rate vs. backhaul for different uplink cooperation strategies:
(a) symmetrical, strong interference (d1 = 0.5, d2 = 0.5); (b) asymmetrical, weaker
interference (d1 = 0.4, d2 = 0.2). [Plots: sum-rate in bit/channel use versus sum
backhaul in bit/channel use for Np = 2; curves for DIS, CIF, Dec. JD, Cen. JD and
the cut-set bound.]

4.3.2 Numerical Results
Let us now compare the rate/backhaul trade-offs achievable with the cooperation
concepts stated before, again focusing on M = K = 2. In Fig. 4.5, the achievable
sum-rate of both UEs is plotted as a function of backhaul, under imperfect
CSI with Np = 2. The left case shows a symmetrical interference scenario, where
both UEs are at the cell-edge (i.e. d1 = d2 = 0.5), while the right case represents
an asymmetrical scenario of weaker interference (d1 = 0.4, d2 = 0.2). For each
scheme, multiple lines show the range between the best rate/backhaul trade-off
achievable in theory (upper left) and under practical considerations (lower right).
The dashed line indicates the cut-set bound [CT06], i.e. the sum-rate achieved if
every bit of backhaul leads to an equivalent sum-rate increase until MAC performance
is reached. Only centralized JD asymptotically achieves MAC performance for a
large backhaul, due to the full extent of spatial multiplexing, array
and interference cancelation gain. At the cell-edge (Fig. 4.5(a)), the scheme
outperforms all others, and source coding is highly beneficial due to strong signal
correlation. Decentralized JD also shows good asymptotic performance, but
lacks the option of SIC. One could argue that each BS could decode the
interference as well, but then it would suffice to perform cooperation only in one
direction, i.e. do centralized JD requiring less backhaul. However, such a strategy
may still be interesting from a signaling perspective, see Section 11.2. For the
cell-edge case, there is no benefit of using DIS or CIF, as both BSs can independently
decode the interference and subtract this before decoding their UEs,
without requiring backhaul at all. In the asymmetrical case of weaker interference
(Fig. 4.5(b)), the story changes. Although they lack array and spatial multiplexing
gain, decentralized schemes can now offer an improved rate/backhaul trade-off
in regimes of low backhaul. DIS appears especially attractive here, as BS 1 can
decode its assigned UE at moderate interference, while the extent of interference
cancelation enabled by the exchange of decoded bits over the backhaul is large.
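The cut-set bound used as a reference in Fig. 4.5 has a very simple shape, which the following sketch makes explicit. The function name is hypothetical, and the two sum-rates are made-up numbers for illustration only; they are not read from the figure.

```python
def cut_set_bound(rate_no_coop, rate_mac, backhaul):
    """Sum-rate bound: one bit of rate per backhaul bit, capped at MAC performance."""
    return min(rate_no_coop + backhaul, rate_mac)


# Hypothetical sum-rates in bit/channel use, chosen for illustration only.
RATE_NO_COOP = 4.0
RATE_MAC = 7.5

for beta in (0.0, 2.0, 4.0, 8.0):
    print(f"backhaul {beta:4.1f} -> bound {cut_set_bound(RATE_NO_COOP, RATE_MAC, beta):.1f}")
```

The bound therefore consists of a part with unit slope followed by a flat part, and a practical scheme can fall short of either one.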

Figure 4.6 Best uplink cooperation concept for a backhaul capacity of 4 bpcu.
[Map over d1 and d2 (each from 0.2 to 0.6) for a SISO SNR of 10 dB and Np = 2,
showing the regions in which DIS, CIF, Dec. JD or Cen. JD performs best; darker
areas mark gains of more than 10%.]

This is confirmed in Fig. 4.6, where the best cooperation concept is shown
as a function of UE location, for an exemplary backhaul of 4 bpcu. While centralized
JD is best for strong, possibly asymmetric interference, DIS is superior
for weaker interference and a constrained backhaul. CIF and decentralized JD
are only interesting in regimes of very weak interference, where we know from
Fig. 4.2 that expected CoMP gains are small anyway. The results suggest that
a practical system should switch between centralized JD and DIS depending on
the interference situation. This is emphasized by the darker areas in Fig. 4.6,
where one strategy yields more than 10% larger rates than the other.
One may wonder why all schemes presented before actually yield a
rate/backhaul trade-off far away from the cut-set bound. This question enables
an interesting insight into fundamental properties of the compared schemes:
• While centralized JD asymptotically achieves MAC performance, it fails to
  meet the slope of the cut-set bound, as a certain extent of backhaul is wasted
  on the quantization of noise [dS08]. In fact, the cut-set bound is approached
  if source coding is applied and the SNR approaches infinity [dS08, Mar10].
• DIS, however, usually does not meet the flat part of the cut-set bound, as
  it lacks spatial multiplexing and array gain, and the first UE decoded does
  not profit from cooperation at all. It also usually fails to meet the slope of
  the cut-set bound, as the entropy of the data handed over the backhaul is
  mostly larger than the rate gain due to interference cancelation. An exception
  is the Z-interference channel [ZY08] with d1 = 0.5, d2 = 0, where (with source
  coding) every backhaul bit indeed yields exactly one bit of sum-rate increase,
  until asymptotic DIS performance is reached [GMF09].

Any uplink joint signal processing scheme is hence subject to a trade-off
between using backhaul efficiently due to local preprocessing (with limited gain),
or wasting backhaul on the quantization of noise (with maximum CoMP gain).

4.3.3 Parallels between Theory and Practical Cooperation Schemes

Practical algorithms applied to non-Gaussian signaling typically combine decentralized
and centralized strategies. In [KF08, KRF08], for example, each BS
(partially) decodes both the strongest interferer and its own UE, and forwards
soft-bits to the other BS. This corresponds in principle to centralized JD, but
code-awareness and local preprocessing are used to exploit the structure in signals
and interference for efficient backhaul usage (but reduced gain). The fact that
terminal rates are constrained by the first (partial) decoding process can be alleviated
by using iterative BS cooperation [BC07, AEH08, WT09a], i.e. starting
with coarse decoding and refining this in each iteration. For the case of iterative
DIS, however, even under very theoretical considerations, the rate/backhaul
trade-off is only marginally improved over one-shot cooperation (though the
asymptotic sum-rate is improved) [GMF09, Mar10]. In practice, every backhaul
usage will always incur additional redundancy (and increase latency), hence
rendering iterative schemes even more questionable [MJH06].

4.4 Degrees of Freedom in Downlink Joint Signal Processing

As suggested by the title of this section, there are no such fundamental trade-offs
to be made in the downlink as in the uplink. This is because transmitter-side
signal processing is generally performed on noiseless, deterministic signals
(as opposed to noisy observations in the uplink), and the main design choice
is whether this processing is to be performed in a centralized, distributed or
decentralized way. Fig. 4.7 illustrates these three principal kinds of multi-cell
joint transmission (JT), which are briefly explained in the sequel.
In a centralized case, shown in Fig. 4.7(a), a central unit (CU) (possibly a
BS itself) performs all preprocessing for a cluster of cooperating cells. It then
quantizes and forwards the transmit signals to the other BSs. This can be done in
either time or frequency domain (where the latter is of course more backhaul
efficient), and can be based on a common public radio interface (CPRI). It is also
possible to forward analog radio frequency (RF) signals over fibre-optic cables,
known as radio over fibre (RoF) [DMF10]. In all cases, the transmitting BSs are
basically operated as remote radio heads (RRHs). Clearly, source coding is not
applicable in the downlink, as there is no side-information to be exploited.
Backhaul requirements can be strongly reduced (while possibly maintaining
the same performance) if precoding is performed in a distributed way, as shown
in Fig. 4.7(b). In this case, the network provides all involved BSs with the same
data to be transmitted to all jointly served UEs (or the BSs distribute this
data among each other), such that all BSs calculate their part of the precoding
matrix W independently. A crucial aspect, however, is the fact that all BSs now
require global CSI. This can be assured by either exchanging channel information
between BSs, or by designing the CSI feedback from the terminal side such that
all involved BSs can individually decode this. A distributed downlink JT scheme
is observed in Sections 6.3, 13.3 and 13.4.
The aforementioned CSI requirement can be alleviated if decentralized downlink
JT is performed, as shown in Fig. 4.7(c). In this case, the involved BSs may
have strongly different extents and accuracies of CSI, and different extents of
knowledge on UE data bits, but still contribute to the transmission through
local precoding. Such a scheme is considered in Section 6.4.

Figure 4.7 General kinds of downlink multi-cell joint signal processing:
(a) centralized (a CU with global CSI and global data forwards analog or digital
transmit signals); (b) distributed (both BSs hold global CSI and data);
(c) decentralized (BSs hold partial CSI and the data of some UEs).

4.5 Summary
In this chapter, we have extended the models from Chapter 3 to observe multi-cell
joint signal processing under imperfect channel knowledge. In both uplink
and downlink, we have seen that in representative scenarios of up to 3 cooperating
base stations, spectral efficiency gains of more than 100% are conceivable at
the cell-edge, while these gains strongly decrease towards the cell-center. Further
considering a limited backhaul capacity between base stations has revealed
a major trade-off in the uplink: Either backhaul is used efficiently, but only a
limited extent of capacity gain is achieved, or backhaul is wasted on the quantization
of noise, but maximum gain is obtained. In the downlink, we have discussed
three different joint transmission concepts that differ in how user data,
channel knowledge and precoding are distributed among the base stations.

Part II
Practical CoMP Schemes

CoMP Schemes Based on Interference-Aware Transceivers or Interference Coordination
In this chapter, we introduce CoMP schemes where no or little information is
exchanged between cooperating base stations. In Section 5.1, we observe an
interference-aware downlink transmission scheme where each base station performs individual intra-cell beamforming, while the terminals are able to mitigate
inter-cell interference to a certain extent through a particular interference estimation and rejection concept. The level of base station cooperation is then increased
in Sections 5.2 and 5.3, where joint multi-cell scheduling and link adaptation,
and multi-cell coordinated beamforming are investigated, respectively.

5.1 Downlink Multi-User Beamforming with Interference Rejection Combining
Lars Thiele, Thomas Wirth, Malte Schellmann, Thomas Haustein
and Volker Jungnickel
In this section, we evaluate a non-cooperative downlink transmission scheme,
i.e. where no explicit cooperation takes place between base stations (BSs),
but where interference-aware transmission and reception is performed within
cells. The BSs perform intra-cell precoding based on limited feedback from
the user equipments (UEs), in conjunction with interference-aware scheduling
and interference rejection combining (IRC) at the terminal side. This section
is based on "Interference-aware scheduling in the synchronous cellular multi-antenna
downlink", by L. Thiele, M. Schellmann, T. Wirth and V. Jungnickel,
which appeared in [TSWJ09]. © 2009 IEEE.

5.1.1 Introduction
Transmission with multiple antennas both at the transmitting and receiving ends
of a wireless link has become increasingly mature in recent years. From theory,
the fundamental capacity gain of the multiple-input multiple-output (MIMO)
radio link, being proportional to the minimum of the number of transmit and
receive antennas, is well understood for an isolated point-to-point link. Under
perfect channel knowledge at transmitter and receiver, a capacity-achieving
strategy is to flexibly invest transmit power into the eigenmodes of the channel
via water-filling (see Section 3.4.1). In practical systems, however, under
imperfect channel knowledge and a limited granularity of power allocations and
modulation and coding schemes (MCSs), one typically switches between the two
fundamental transmission modes spatial multiplexing (SMUX) and spatial diversity
(SDIV) [ZT03] depending on the current channel state, in order to improve
the error rate performance for fixed data rate transmission [HP05b] or to increase
the spectral efficiency [SAH+04].
To enable ubiquitous broadband wireless access, MIMO transmission must be
made robust against multi-cell interference. However, it is not fully evident yet
how the potential capacity gains of MIMO can be realized under these conditions.
In fact, early results obtained for a small set of linear transceiver settings, i.e.
number of antennas, equalization and precoding strategy, indicate only small
gains for SMUX over SDIV systems [CDG00]. The achievable spectral efficiency
may be enhanced by incorporating multi-user MIMO (MU-MIMO) into system
design and thus turning the focus to multi-user links [GKH+07]. However, BSs
would require coherent channel state information (CSI) to optimally serve their
users in MU-MIMO, which is difficult to obtain in frequency division duplex
(FDD) systems, as a high rate feedback link would be required from the terminals
to the base stations.
Further, fair resource assignment is mandatory in cellular networks in order
to guarantee radio access for all users. The multi-path structure of signal and
interference channels may be used beneficially in this interference-aware scheduling
process. Supplemental to the time-domain scheduling already used in today's
radio systems, groups of frequency resources may be assigned to the users according
to their frequency-selective signal-to-interference-and-noise ratio (SINR) conditions.
In this case, users may beneficially be assigned to their best resources.
This section targets a practical solution for decentralized interference management.
The key to success is a predictable interference scenario at the receiver
side, which also helps to improve the link adaptation process. Thus, we consider
using fixed beams (i.e., fixed sets of possible precoding vectors) for transmission
as depicted in Fig. 5.1. In particular, terminals are assumed to report their preferred
precoding matrix indicators (PMIs) in combination with corresponding
post-equalization SINRs via a low-rate feedback channel. For the equalization
at the UE, comprehensive channel knowledge on the radio system is required,
which may be obtained by multi-cell channel estimation based on pilot symbols,
as discussed in Section 9.1. Therefore, downlink transmission has to be synchronized
[JWS+08]. With this approach, we demonstrate substantial throughput
gains for MIMO systems in multi-cell environments, similar to those known for
point-to-point links. We further indicate potential performance gains under the
influence of imperfect channel estimation in systems with non-synchronized and
synchronized BSs.

Figure 5.1 Concept of intra-cell beamforming considered in this section (fixed beams
within a cell, interference from other cells).

Figure 5.2 Non-cooperative transmission and PMI/CQI feedback concept considered
(each BS precodes the data bits of its own UE, each UE applies IRC and feeds back
PMI / CQI).

5.1.2

Downlink System Model


We extend the multi-antenna downlink model from Section 3.5, observing an
orthogonal frequency division multiple access (OFDMA) transmission on a single
sub-carrier from M BSs to K UEs that are scheduled to the same resource in
time and frequency. The BSs and UEs are equipped with Nbs and Nue antennas,
respectively, leading to an overall number of NBS = M Nbs transmit antennas
and NUE = KNue receive antennas. This implies that each BS may transmit up
to Nbs streams simultaneously on the same resource, while each UE may receive
up to Nue such streams simultaneously. Clearly, there is the degree of freedom
that a BS may serve many UEs with fewer streams each, or fewer UEs with more
streams each, which we will explore later. As we are observing non-cooperative
downlink transmission, this means that each stream may only be transmitted
from one BS, as illustrated for a setup with M = K = 2 in Fig. 5.2. Consequently,
the overall precoding matrix W CNBS NUE as introduced in Section 3.5 is

44

CoMP Schemes Based on Interf.-Aware Transceivers or Interf. Coord.

sparse, as each column connected to one UE and one stream may only have
non-zero entries connected to the antennas of one BS.
In the sequel, let us observe one UE k which is served by BS m = k. While
set K captures all K UEs, we denote as Km the set of all UEs served by BS m
simultaneously on the same resource, which is obviously limited to the number
of BS transmit antennas, e.g. |Km | Nbs . All received signals of our observed
UE k can be expressed as


H
H
m
m
yk = (Hm
(Hm
(Hk )H Wj xj + nk ,
k ) Wk xk +
k ) Wj xj +



j{Km \k}
j{K\Km }
Hk



Intra-cell intfr. k

Inter-cell intfr. and noise zk

(5.1)
where $\mathbf{H}_k$ is the channel between UE $k$ and all BSs, $\mathbf{W}_k$ is the compound
precoding vector used to serve UE $k$, and $\mathbf{H}_k^m$ and $\mathbf{W}_k^m$ are the sub-portions of
these matrices or vectors connected to BS $m$, as introduced in Chapter 3. We
write as $\bar{\mathbf{H}}_k$ the effective channel between UE $k$ and its serving BS after precoding,
which consists of one column for each of the $N_{ue}$ streams the UE may
potentially receive, i.e. $\bar{\mathbf{H}}_k = [\bar{\mathbf{h}}_{k,1} \ldots \bar{\mathbf{h}}_{k,N_{bs}}]$. The corresponding potential data
streams stacked in $\mathbf{x}_k$ with $\mathbf{x} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathbf{I})$ are distorted by the intra-cell and
inter-cell interference and noise aggregated in $\boldsymbol{\zeta}_k$ and $\mathbf{z}_k$, respectively. Each BS $m$
may select a limited number $Q_m \le N_{bs}$ of active beams to serve one user with
multiple beams or multiple users simultaneously. This is done by choosing the
corresponding columns of the BS $m$-related precoding matrix $\mathbf{W}^m$ from the columns
of a pre-defined beam set $\mathcal{C}_i^m$. In the case of $N_{bs} = 2$, a beam set size of 2 and
discrete Fourier transform (DFT)-based precoding, this can be either
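\[
\mathcal{C}_1^m = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}
\quad \text{or} \quad
\mathcal{C}_2^m = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.
\tag{5.2}
\]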
Columns in $\mathbf{W}^m$ representing streams that are not used are simply filled with
zeros. Note that $\mathbf{W}^m$ has to be scaled depending on the choice of $Q_m$ in order
to fulfill a per-base-station power constraint, i.e. $\mathrm{tr}\{\mathbf{W}^m (\mathbf{W}^m)^H\} \le P_m$. If only
one beam is active, i.e. $Q_m = 1$, we name it single stream (SS) mode, while for
$Q_m > 1$, we refer to it as multiple stream (MS) mode.
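
As a rough illustration of the beam selection and power scaling just described, the following sketch builds a precoding matrix from the first beam set in (5.2). The function names and the equal power split among active beams are assumptions made for this example only; the text merely requires that the per-base-station power constraint is met.

```python
import math

S = 1 / math.sqrt(2)
# The two DFT-based beam sets from (5.2) for N_bs = 2 (columns are beams).
BEAM_SETS = [
    [[S, S], [1j * S, -1j * S]],
    [[S, S], [S, -S]],
]


def build_precoder(beam_set, active_beams, power):
    """Return a precoder with unused columns zeroed and tr{W W^H} = power.

    Assumption: the available power is split equally among active beams.
    """
    scale = math.sqrt(power / len(active_beams))
    return [[scale * entry if col in active_beams else 0.0
             for col, entry in enumerate(row)] for row in beam_set]


def trace_w_wh(w):
    """tr{W W^H} equals the sum of squared magnitudes of all entries."""
    return sum(abs(x) ** 2 for row in w for x in row)


w_ss = build_precoder(BEAM_SETS[0], active_beams={0}, power=1.0)     # SS mode
w_ms = build_precoder(BEAM_SETS[0], active_beams={0, 1}, power=1.0)  # MS mode
print(trace_w_wh(w_ss), trace_w_wh(w_ms))
```

In this sketch the single active beam in SS mode carries the full power, whereas each of the two beams in MS mode carries half of it, which is the power difference the scheduler later compensates for.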

5.1.3

Linear Receivers
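Assuming that a linear equalizer $\mathbf{g}_{k,u}$ is employed to extract the useful signal
from $\mathbf{y}_k$ connected to stream $u$, this yields a post-equalization SINR given by

\[
\mathrm{SINR}_{k,u} = \frac{\mathbf{g}_{k,u}^H \bar{\mathbf{h}}_{k,u} \bar{\mathbf{h}}_{k,u}^H \mathbf{g}_{k,u}}{\mathbf{g}_{k,u}^H \mathbf{Z}_{k,u} \mathbf{g}_{k,u}}
\tag{5.3}
\]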

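where $\mathbf{Z}_{k,u}$ is the covariance matrix of the streams received by UE $k$ (except
stream $u$) and the interfering signals and noise aggregated in $\boldsymbol{\zeta}_k$ and $\mathbf{z}_k$,
i.e. $\mathbf{Z}_{k,u} = \sum_{v \ne u} \bar{\mathbf{h}}_{k,v} (\bar{\mathbf{h}}_{k,v})^H + E\{(\boldsymbol{\zeta}_k + \mathbf{z}_k)(\boldsymbol{\zeta}_k + \mathbf{z}_k)^H\}$. For IRC [Win84], the
interference-aware minimum mean square error (MMSE) receiver is used, i.e.

\[
\mathbf{g}_{k,u}^{\mathrm{MMSE}} = \mathbf{R}_{yy,k}^{-1} \bar{\mathbf{h}}_{k,u},
\tag{5.4}
\]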

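where $\mathbf{R}_{yy,k}$ denotes the covariance matrix of the received signal $\mathbf{y}_k$, i.e.

\[
\mathbf{R}_{yy,k} = E\{\mathbf{y}_k (\mathbf{y}_k)^H\} = \bar{\mathbf{H}}_k \bar{\mathbf{H}}_k^H + E\{(\boldsymbol{\zeta}_k + \mathbf{z}_k)(\boldsymbol{\zeta}_k + \mathbf{z}_k)^H\}.
\tag{5.5}
\]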

The derivation of the MMSE receiver is discussed in detail in Section 10.2.


The MMSE receiver yields a post-equalization SINR

\[
\mathrm{SINR}_{k,u}^{\mathrm{MMSE}} = \bar{\mathbf{h}}_{k,u}^H \mathbf{Z}_{k,u}^{-1} \bar{\mathbf{h}}_{k,u}.
\tag{5.6}
\]
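
The closed form (5.6) can be checked numerically against the generic expression (5.3). The sketch below does so for a receiver with two antennas, one desired stream and one dominant interferer plus white noise; all numerical values are hypothetical. For contrast it also evaluates a plain matched filter (equalizer equal to the effective channel), which ignores the structure of the interference.

```python
def inv2(a):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matvec(a, v):
    """Matrix-vector product for a 2x2 matrix and a length-2 vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1],
            a[1][0] * v[0] + a[1][1] * v[1]]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def sinr(g, h, z):
    """Post-equalization SINR as in (5.3)."""
    return abs(inner(g, h)) ** 2 / inner(g, matvec(z, g)).real


# Hypothetical effective channel, interferer channel and noise variance.
h = [1.0 + 0.5j, 0.3 - 0.2j]
interferer = [0.4 - 0.1j, 0.8 + 0.6j]
noise_var = 0.1

# Z = i i^H + noise_var * I, and R_yy = h h^H + Z as in (5.5).
z = [[interferer[r] * interferer[c].conjugate() + (noise_var if r == c else 0.0)
      for c in range(2)] for r in range(2)]
r_yy = [[z[r][c] + h[r] * h[c].conjugate() for c in range(2)] for r in range(2)]

g_mmse = matvec(inv2(r_yy), h)                    # equalizer (5.4)
sinr_mmse = sinr(g_mmse, h, z)                    # via (5.3)
sinr_closed = inner(h, matvec(inv2(z), h)).real   # via (5.6)
sinr_matched = sinr(h, h, z)                      # matched filter, g = h
print(sinr_mmse, sinr_closed, sinr_matched)
```

Both MMSE values coincide, while the matched filter cannot exceed them, which is the gain that interference rejection combining relies on.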

Based on this SINR, the achievable spectral efficiency is evaluated in a downlink
OFDMA multi-cellular simulation environment. For reference purposes, we
compare these results with the performance achievable by using a maximum
ratio combining (MRC) receiver

\[
\mathbf{g}_{k,u}^{\mathrm{MRC}} = \bar{\mathbf{h}}_{k,u}
\tag{5.7}
\]

yielding a post-equalization SINR

\[
\mathrm{SINR}_{k,u}^{\mathrm{MRC}} = \frac{\left| \bar{\mathbf{h}}_{k,u}^H \bar{\mathbf{h}}_{k,u} \right|^2}{\bar{\mathbf{h}}_{k,u}^H \mathbf{Z}_{k,u} \bar{\mathbf{h}}_{k,u}}.
\tag{5.8}
\]

5.1.4

Imperfect Channel Estimation


For theoretical investigations, full channel state information at the receiver
(CSIR) may be assumed. In order to obtain the achievable data rate in a practical
system, we introduce different channel estimation models. In [TSWJ08], IRC
was shown to be highly sensitive to estimation errors, since the spatial structure
of the interference covariance matrix is utilized for equalization. In the following,
we assume quasi-static channel conditions over the observation interval. For
evaluation, we assume perfect synchronization between UEs and their serving
BSs [MKP07] and a sufficiently large cyclic prefix, which alleviates the effect of
inter-symbol interference (ISI). We distinguish between the following cases:
Non-synchronized BSs, i.e. BSs are not synchronized to each other with
respect to carrier frequencies and frame start. Therefore, we introduce channel
estimation errors according to $\hat{\mathbf{h}}_{k,u} = \bar{\mathbf{h}}_{k,u} + \boldsymbol{\delta}_{k,u}$. Term $\hat{\mathbf{h}}_{k,u}$ denotes the
biased estimate of variable $\bar{\mathbf{h}}_{k,u}$, and $\boldsymbol{\delta}_{k,u}$ denotes the zero-mean Gaussian
distributed error with variance $\mu$. For SINR estimation, we consider knowledge on
frequency-flat and frequency-selective independently and identically distributed
(i.i.d.) interference powers $\sigma_{IF}^2$ according to (5.9) and (5.10), respectively. Further,
we consider the case of full frequency-selective covariance knowledge based
on received data signals $\mathbf{y}_k$. In the following, $q$ and $n$ denote the discrete
sub-carrier and transmit time interval (TTI) index, respectively, and the estimated
interference covariance $\hat{\mathbf{Z}}_{k,u}$ is given as

Frequency-flat i.i.d. interference power $\sigma_{IF}^2$:

\[
\hat{\mathbf{Z}}_{k,u} = \left( E_q\!\left[ \sum_{j,v} |\bar{\mathbf{h}}_{j,v}(q)|^2 - |\hat{\mathbf{h}}_{k,u}(q)|^2 \right] + \sigma^2 \right) \mathbf{I}
\tag{5.9}
\]

Frequency-selective i.i.d. interference power $\sigma_{IF}^2$:

\[
\hat{\mathbf{Z}}_{k,u}(q) = \left( \sum_{j,v} |\bar{\mathbf{h}}_{j,v}(q)|^2 - |\hat{\mathbf{h}}_{k,u}(q)|^2 + \sigma^2 \right) \mathbf{I}
\tag{5.10}
\]

Frequency-selective covariance $\mathbf{Z}_{k,u}$:

\[
\hat{\mathbf{Z}}_{k,u}(q) = E_n\!\left[ \mathbf{y}_k(q,n)\, \mathbf{y}_k(q,n)^H \right] - \hat{\mathbf{h}}_{k,u}(q)\, \hat{\mathbf{h}}_{k,u}^H(q)
\tag{5.11}
\]
Synchronized BSs, using multi-cell channel estimation based on virtual
pilot sequences $c_{k,u}(n)$ [TSSJ08, HKF10]. These sequences are block-orthogonal
and defined over the time domain. For channel estimation, the receiver uses a
simple correlator. For simplicity, we drop the sub-carrier index $q$ in the sequel.
According to [TSSJ08], we use pilot sequences $c_{k,u}(n)$ which are derived from
Hadamard matrices. Since these sequences extend over several pilot symbols in
time, the multi-cell channel knowledge degrades with increasing mobility of the
UE, and we state

\[
\hat{\mathbf{h}}_{k,u} = \frac{1}{N} \sum_{n=0}^{N-1} c_{k,u}(n)\, \mathbf{y}_k(n)
\tag{5.12}
\]

\[
\hat{\mathbf{Z}}_{k,u} = \sum_{j,v} \hat{\mathbf{h}}_{j,v} \hat{\mathbf{h}}_{j,v}^H - \hat{\mathbf{h}}_{k,u} \hat{\mathbf{h}}_{k,u}^H.
\tag{5.13}
\]
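
The correlator in (5.12) can be illustrated with a small, noise-free sketch. It assumes scalar channels, pilot sequences taken as rows of a 4 x 4 Hadamard matrix (Sylvester construction), and a simple per-symbol phase rotation as a stand-in for UE mobility; all of these are simplifications chosen for this example and are not taken from [TSSJ08].

```python
import cmath


def hadamard(order_log2):
    """Sylvester construction of a Hadamard matrix of size 2**order_log2."""
    h = [[1]]
    for _ in range(order_log2):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


PILOTS = hadamard(2)          # row i is the pilot sequence of cell i, N = 4
N = len(PILOTS)
CHANNELS = [0.9 + 0.2j, -0.3 + 0.5j, 0.1 - 0.4j]   # hypothetical values


def received(drift):
    """Noise-free superposition of all pilots; `drift` is a phase rotation
    per pilot symbol that mimics a time-varying channel."""
    return [sum(ch * cmath.exp(1j * drift * n) * PILOTS[i][n]
                for i, ch in enumerate(CHANNELS)) for n in range(N)]


def correlate(seq, rx):
    """Correlator of (5.12): (1/N) * sum_n c(n) y(n)."""
    return sum(c * r for c, r in zip(seq, rx)) / len(seq)


def worst_error(drift):
    rx = received(drift)
    return max(abs(correlate(PILOTS[i], rx) - ch)
               for i, ch in enumerate(CHANNELS))


err_static = worst_error(0.0)   # channel constant over the pilot window
err_moving = worst_error(0.3)   # channel rotates during the pilot window
print(err_static, err_moving)
```

With a static channel the orthogonality of the sequences separates the cells exactly, whereas the rotating channel leaks energy between the sequences, in line with the degradation under mobility noted above.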

5.1.5

Resource Allocation and Fair User Selection


Resource allocation and selection of the proper spatial transmission mode (i.e.
SS or MS, see Section 5.1.2) is carried out by a score-based scheduling process
developed in [STJH07], which is briefly described as follows: In a first step, the
UEs evaluate the current channel conditions per physical resource block (PRB)
in terms of their achievable SINR conditions. By using (5.6) and (5.8) and a
suitable SINR-to-rate mapping function, they can determine for each transmission
mode the expected achievable rate per supported beam. This information
is conveyed to the BS, where a score-based resource scheduling algorithm is
performed: To enable direct comparison of the single per-beam rates from different
spatial modes, the stream rates are weighted by a so-called penalty factor, which
accounts for the higher power allocated to the SS beam compared to MS mode.
In particular, if $Q$ is the number of simultaneously active streams in MS mode,
the penalty factor is chosen as $Q^{-1}$. For each user, the (weighted) per-beam
rates from all modes over all PRBs are ranked by their quality, and corresponding
scores are assigned. Mode selection and resource assignment are then done
for each PRB individually: First, each beam available per transmission mode is
assigned to the user providing the minimum, i.e. best, score for that beam. Then,
the mode is selected which corresponds to the minimum overall user score.

Figure 5.3 Rate allocation across two data streams, if the scheduler may choose from
(a) U = 2 or (b) U = 20 users [TSWJ09]. © 2009 IEEE.
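
The scoring procedure can be sketched in a few lines. The rate values, the data layout and the function names below are hypothetical, and the way the modes are compared at the end (by the best score among their assigned beams) is one possible reading of the "minimum overall user score", not necessarily the rule used in [STJH07].

```python
# Hypothetical reported rates: RATES[mode][user][prb] holds one achievable
# rate per beam of that mode (SS: 1 beam, MS: 2 beams).
RATES = {
    "SS": [[[4.0], [1.0], [0.9]],
           [[2.0], [3.0], [0.7]]],
    "MS": [[[3.0, 0.5], [0.4, 0.8], [5.0, 0.2]],
           [[0.6, 2.5], [2.0, 0.2], [0.3, 4.4]]],
}
NUM_USERS, NUM_PRBS = 2, 3


def user_scores(user):
    """Rank all weighted per-beam rates of one user; score 1 is the best."""
    entries = []
    for mode, per_user in RATES.items():
        for prb in range(NUM_PRBS):
            beams = per_user[user][prb]
            penalty = 1.0 / len(beams)          # Q^-1 for Q active streams
            for beam, rate in enumerate(beams):
                entries.append((penalty * rate, mode, prb, beam))
    entries.sort(key=lambda e: e[0], reverse=True)
    return {(mode, prb, beam): rank + 1
            for rank, (_, mode, prb, beam) in enumerate(entries)}


def schedule():
    scores = [user_scores(u) for u in range(NUM_USERS)]
    allocation = {}
    for prb in range(NUM_PRBS):
        candidates = {}
        for mode, per_user in RATES.items():
            n_beams = len(per_user[0][prb])
            # Each beam goes to the user with the minimum (best) score.
            picks = [min(range(NUM_USERS),
                         key=lambda u: scores[u][(mode, prb, b)])
                     for b in range(n_beams)]
            best = min(scores[u][(mode, prb, b)] for b, u in enumerate(picks))
            candidates[mode] = (best, picks)
        # Assumption: modes are compared by their best assigned beam score.
        chosen = min(candidates, key=lambda m: candidates[m][0])
        allocation[prb] = (chosen, candidates[chosen][1])
    return allocation


allocation = schedule()
print(allocation)
```

For these example rates, the first two PRBs are served in SS mode (one user each), while the third PRB is shared by both users in MS mode because each user ranks one of its beams very highly.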
The objective of this score-based resource allocation process is to assign each
user to his best resources, and the decision on the spatial mode is taken under
the premise of achieving a high throughput for each user. Clearly, the process is
of heuristic nature, and hence the global scheduling target of assigning each user
an equal amount of resources is achieved only on average, or if the number of
available resources tends to infinity. However, its convenient property for practical
applications is its flexible utilization, as the set of resources can be defined
over arbitrary dimensions (time/frequency/space). Thus, fairness w.r.t. an equal
amount of resources for all active users can be established on a small time-scale,
e.g. even for the scheduling of resources contained within a single orthogonal
frequency division multiplex (OFDM) symbol.
An illustration of the performance achievable by the score-based scheduler is
given in Fig. 5.3. It depicts the histogram of normalized achievable user rates
in the rate region plane for two UEs which may be scheduled in each PRB. In
particular, we assume two spatial layers to be available in each PRB (i.e. $N_{bs} = 2$),
allowing two users to be served simultaneously in MU-MIMO mode. The rate
allocated to each of these two users is normalized to the rate it would achieve
if the PRB was assigned exclusively to it. Fig. 5.3(a) shows the distribution of
normalized rates if the total number of users to select from is limited to U = 2,
while Fig. 5.3(b) refers to the case U = 20. From both figures, it is clearly seen
that the achievable rates lie beyond the time division multiple access (TDMA)
rate region (dashed line in the rate region plane). For an increasing number of
UEs, the histogram is more and more concentrated in the upper right corner of
the rate region. This shows that the heuristic score-based scheduling approach
significantly outperforms TDMA scheduling and conveniently achieves high user
rates by properly utilizing MU-MIMO.

5.1.6

Single-Cell Performance
Initial performance evaluation is carried out for a fixed system setting in
an isolated cell (i.e., $\mathbf{z}_k = \mathbf{n}_k$ in (5.1)), where $K$ UEs, each equipped with
$N_{ue} = 2$ receive antennas, communicate with a dual-antenna BS ($N_{bs} = 2$).
The evaluation environment is based on the spatial channel model extended
(SCME) [3GP07a], and full CSIR is assumed. We investigate the probabilities of
mode selection depending on the mean signal-to-noise ratio (SNR) conditions,
which are depicted in Fig. 5.4 for 2 or 10 users, respectively. Note that resources
where a rate cannot be supported by any user are not assigned by the scheduler.
For that reason, the selection probability of SS mode drops down to 75% at
$P_s/N_0 = -5$ dB in the first case. Three different configurations of the adaptive
mode switching system are considered here:

1. SU-MIMO: The MU-MIMO option is switched off, i.e. MS mode reduces to
   single-user MIMO (SU-MIMO). Now only one user is served per PRB, either
   in diversity or SU-MIMO mode.
2. MU-MIMO: System as described in Section 5.1.5 with the first beam set
   $\mathcal{C}_1^m$ from (5.2) being available. Simultaneously active beams can be assigned
   independently to different users. The mode per user is selected per PRB, i.e.
   a user may be served in different modes simultaneously.
3. MU-MIMO, 2 beam sets: Adaptive MU-MIMO system with both beam
   sets from (5.2) being available.

Figure 5.4 Probability of the selection of multiple stream (MS) mode vs.
SNR [TSWJ09] for (a) U = 2 users and (b) U = 10 users, with one curve each for
SU-MIMO, MU-MIMO, and MU-MIMO with 2 beam sets. © 2009 IEEE.
The points where the curves in Fig. 5.4 cross the median highlight the SNR
regions where the MS mode becomes the dominantly selected one. From both
figures, we observe that going from SU-MIMO to MU-MIMO promotes selection
of the MS mode substantially, as the crossing point is shifted by 5 dB in case of
2 users and by more than 10 dB in case of 10 users down towards the low SNR
regime. For 10 users, the crossing point falls below an SNR of 0 dB. The support
for MU-MIMO mode also results in significant gains in the spectral efficiencies
(refer to Fig. 5.7 later). These results strongly emphasize that MU-MIMO is the
key for the efficient use of spatial multiplexing transmission even at low SNR, as
also discussed in Section 11.1.
Providing an additional beam set shifts the crossing point even further down,
which can be attributed to the finer granularity in the quantization of the transmit
vector space. For 10 users, the crossing point in Fig. 5.4(b) can be shifted
down to about −1.5 dB now. Further, it can be observed that the shape of the
probability curves approaches that of a step function, highlighting that the system
behavior tends towards a hard mode switching at a fixed SNR value.

Table 5.1. Simulation assumptions.

Parameter                        Value
-------------------------------  ------------------------------------
channel model                    3GPP SCME
scenario                         urban-macro with scenario-mix (a)
traffic model                    full buffer
carrier frequency f_c            2 GHz, frequency reuse 1
system bandwidth                 18 MHz, 100 PRBs
inter-site distance (ISD)        500 m
number of sites                  19 having 3 cells each
N_bs; antenna spacing            1, 2, 4; 4λ
transmit power                   46 dBm
sectorization                    triple, with FWHM of 68°
BS height                        32 m
N_ue; antenna spacing            1, 2, 4; λ/2
UE height                        2 m
CQI granularity                  1 PRB
feedback delay                   0 ms
channel estimation               as specified in text

(a) Note, a mobile terminal might experience different propagation scenarios, i.e.
line-of-sight (LOS) and non line-of-sight (NLOS), to distinct BSs.

5.1.7

Multi-Cell Performance under Perfect CSI


Turning the focus to a multi-cell system, the performance is investigated in a
triple-sectorized hexagonal cellular network with M = 57 BSs in total, i.e. a center
site with three sectors or cells surrounded by two tiers of interfering sites.
Simulation parameters are given in Table 5.1. Initial results are based on the
assumption of full and perfect CSIR. The SCME with urban macro scenario
parameters is used [3GP07a], yielding a user geometry as in [HVKS03]. The
UEs are always served by the BS whose signal is received with highest average
power over the entire frequency band. For capacity evaluation, only UEs
placed inside the three central cells are evaluated. In this way, BS signals
transmitted from the 1st and 2nd tier of sites model the inter-cell interference
[TSZJ07]. Performance is evaluated for both the sum-throughput in a
sector and the throughput for individual users. Both values are normalized to
the signal bandwidth, yielding a sector's overall spectral efficiency and normalized
user throughput, respectively. The achievable rates are determined from
the SINRs calculated according to expression (5.6) by using a quantized rate
mapping function [IST07b], representing achievable rates in a practical system.
From these results, cumulative distribution function (CDF) plots are obtained.
Case 1: All BSs provide one fixed unitary beam set. With respect to the
single-input single-output (SISO) reference case, Fig. 5.5 (solid lines) indicates a
capacity increase of the median sector's spectral efficiency by a factor of 1.95,
2.88 and 3.43 for the MIMO 2 × 2 ($N_{bs} \times N_{ue}$), 2 × 4 and 4 × 4 system, respectively.
We can observe only small additional capacity gains for systems with $N_{bs} > N_{ue}$
compared to a system with $N_{bs} = N_{ue}$. This is mainly caused by the constraint
of DFT-based precoding, where the total transmit power is distributed evenly
over all antennas. In contrast, the system with $N_{bs} < N_{ue}$ benefits from advanced
capabilities for interference suppression and higher receive diversity. This enables
the system to achieve larger scaling factors, e.g. 2.88 for MIMO 2 × 4. The
5th percentile of normalized user throughput, which may serve as a measure to
represent the throughput of cell-edge users, shows similar scaling.

Figure 5.5 Idealistic system performance for the SISO, MIMO 2 × 2 ($N_{bs} \times N_{ue}$),
4 × 2, 2 × 4 and 4 × 4 system for 20 users per cell or sector: (a) CDF of the spectral
efficiency [bit/s/Hz/cell], (b) CDF of the user throughput [Mbit/s/user]. Dashed
lines indicate the performance achievable with 2 or 4 beam sets $\mathcal{C}_i^m$ [TSWJ09].
© 2009 IEEE.
Case 2: All BSs provide multiple fixed unitary beam sets. Fig. 5.5 (dashed
lines) further indicates the potential capacity gains for allowing the users to
choose from multiple beam sets. Here, the system may profit from an improved
channel quantization, yielding a capacity increase by a factor of 2.11 for MIMO 2 × 2
with two beam sets. However, it has to be considered that then also the PMI
feedback overhead doubles from 1 bit to 2 bit.
Interference prediction: Note that considering independent adaptation of
beam sets for all BSs does not influence the received interference covariance
matrix $\mathbf{Z}_{k,u}$, since the Wishart product $\mathbf{W}^m (\mathbf{W}^m)^H$ equals the scaled identity
matrix if we assume $\mathbf{W}^m$ to be unitary. However, changing the power allocation
for different MIMO transmission modes results in a multi-cell system where $\mathbf{Z}_{k,u}$
cannot be predicted at the receiver side. In order to support cell-edge terminals,
we suggest arranging, e.g., SS transmission with full base station power in an
agreed access scheme known to the users.
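
The invariance argument can be verified directly for the two beam sets of (5.2). The sketch below (helper names are illustrative) computes the product of a precoder with its Hermitian transpose for both full beam sets and, for contrast, for a single active beam at full power.

```python
import math

S = 1 / math.sqrt(2)
BEAM_SETS = {
    "set 1": [[S, S], [1j * S, -1j * S]],
    "set 2": [[S, S], [S, -S]],
}


def gram(w):
    """Return W W^H for a matrix given as nested lists."""
    rows, cols = len(w), len(w[0])
    return [[sum(w[r][k] * complex(w[c][k]).conjugate() for k in range(cols))
             for c in range(rows)] for r in range(rows)]


# Both beams active: the product is the identity for either beam set,
# so neighbouring cells see the same transmit covariance.
full = {name: gram(w) for name, w in BEAM_SETS.items()}

# Only the first beam active at full power: the product now depends on
# which beam set was chosen, so the interference covariance changes.
single = {name: gram([[row[0]] for row in w]) for name, w in BEAM_SETS.items()}

print(full["set 1"], full["set 2"])
print(single["set 1"][0][1], single["set 2"][0][1])
```

The off-diagonal entries of the single-beam products differ between the two sets, which illustrates why mode and power changes in neighbouring cells are harder to anticipate than a mere change of the unitary beam set.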

Figure 5.6 SINR estimation errors [TSWJ09]: CDF of SINRest / SINRavail [dB] for
MRC (frequency-selective i.i.d., frequency-flat i.i.d. and frequency-selective
covariance estimation) and MMSE (frequency-selective covariance estimation, and
correlation-based estimation with N = 3 and N = 12). © 2009 IEEE.

5.1.8

Multi-Cell Performance under Imperfect CSI


In the following, we take channel estimation errors into account, i.e. using (5.9)
to (5.13). Fig. 5.6 indicates the estimation error of the SS SINR at the terminal. We
compare the ratio of the estimated SINRest to the achievable SINRavail under
perfect CSIR and estimated equalization weights. Employing either MRC in
an asynchronous network or IRC in a synchronized one leads to significantly
different estimation errors. For MRC based on (5.10), the estimation suffers in
two ways: There is a median shift of −1.9 dB, i.e. SINRest is systematically
too low. In addition, the estimation error has a considerable variance. With
overestimated SINR conditions, the channel may be overloaded, i.e. the reported
channel quality indicator (CQI) and the supported MCS do not match, which
results in substantial performance degradation and increased block error rate
(BLER). Assuming that strong channel codes as well as hybrid automatic repeat
request (HARQ) mechanisms are able to correct errors if 10% of the resources
are overloaded, we have to ensure that the 90th percentile of SINRest /SINRavail
is below 0 dB. This can be achieved by introducing a safety factor S < 1, shifting
all SINRest correspondingly.
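
A minimal sketch of how such a safety factor could be read off from measured ratios is given below. The sample values are hypothetical, and the nearest-rank percentile is only one of several common percentile definitions.

```python
def safety_factor_db(ratios_db, overload_percent=10):
    """Back-off S in dB (never positive) such that at most `overload_percent`
    of the shifted SINR estimates exceed the available SINR."""
    ordered = sorted(ratios_db)
    # Nearest-rank percentile, computed with integer arithmetic (ceiling).
    rank = -(-(100 - overload_percent) * len(ordered) // 100)
    percentile = ordered[rank - 1]
    return min(0.0, -percentile)


# Hypothetical samples of SINRest / SINRavail in dB.
samples = [-6.0, -4.5, -3.0, -2.0, -1.5, -1.0, 0.0, 0.5, 1.2, 2.3]
s_db = safety_factor_db(samples)
shifted = [r + s_db for r in samples]
overloaded = sum(1 for r in shifted if r > 0.0) / len(shifted)
print(s_db, overloaded)
```

Here the 90th percentile of the samples is 1.2 dB, so all estimates are shifted by −1.2 dB and only one resource in ten remains overestimated.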
For MRC based on (5.10), we can estimate S to be −2.3 dB from Fig. 5.6. Focusing
on the median value, the median shift and the safety factor add up to an overall
penalty (offset) of approx. SINRpen = −4.2 dB at the multiple access channel (MAC)
compared to SINRavail. Averaging the interference power $\sigma_{IF}^2$ over the entire
frequency band, i.e. using (5.9), reduces the penalty to SINRpen = −3.7 dB.
Covariance estimation, i.e. (5.11), leads to unbiased SINRest, but the S-factor is
higher due to the larger variance, resulting in SINRpen = −6.3 dB. Concentrating
on asynchronous downlink transmission, we conclude that an interference
estimation scheme assuming a frequency-flat i.i.d. $\sigma_{IF}^2$ results in the highest
performance.

Figure 5.7 MIMO 2 × 2 system performance under channel estimation
errors [TSWJ09]: CDF of the spectral efficiency [bit/s/Hz/cell] for SISO, SU-MIMO
and SS transmission with i.i.d. interference estimation, SU-MIMO, SS and adaptive
transmission with the MMSE receiver and correlation-based estimation (N = 12),
and adaptive transmission with the MMSE receiver and perfect CSIR. © 2009 IEEE.

The penalties can be reduced further if the interference is estimated more
precisely, e.g. in a synchronous system using an MMSE receiver and the correlation
approach as given in (5.12) and (5.13). For a correlation window spanning
N = 3 pilot symbols, we assume to be able to distinguish between the channels
belonging to 3 out of 57 sectors or cells. Hence, interference cannot be
separated sufficiently, and thus SINR is systematically overestimated. However,
already with a correlation window spanning N = 12 pilot symbols, 12 sectors
and thus more interferers can be identified, and the SINR is determined more
precisely [TSWJ08]. The safety factor is then S = −0.9 dB, and the median shift
becomes negligible.
Fig. 5.7 shows the achievable sum-rates in the multi-cell system including
SINRpen. As a lower bound, we use the performance in the SISO case including
the effects of estimation errors for the desired channel $\hat{\mathbf{h}}_{k,u}$. The upper bound
is given by the adaptive transmission system assuming perfect CSIR. If
the UE is able to estimate its dedicated channel with $\mu = 0.1$ and $\hat{\mathbf{Z}}_{k,u}$ according
to (5.10), and the system is forced to SU-MIMO mode only, the performance is
inferior to that of SS transmission using MRC. The reason is that
the estimation error leads to inter-stream interference in the SU-MIMO case,
which is not present with SS transmission.
The next three CDF curves are all based on the estimates (5.12) and (5.13).
Although the MMSE receiver can exploit the knowledge of interference, the SS
mode using the MMSE receiver outperforms SU-MIMO transmission. Fully adaptive
transmission yields a significant system throughput gain, which is mostly
related to MU-MIMO scheduling. Note that the gap to the adaptive system with
perfect CSIR amounts to 8% only, indicating the robustness of the proposed
scheme. Finally, we come to the following conclusion: Synchronized downlink
transmission from all BSs in combination with MMSE receivers based on estimates
(5.12) and (5.13) outperforms the asynchronous case. However, if the
system design were constrained to non-synchronized BSs, SS transmission
in combination with the MRC receiver would be a suitable choice. The difference
in the average throughput between both cases is significant and amounts to 76%
in our results. Thus, the overall throughput gain achievable with synchronized
BSs is still significant even under practical considerations.

5.1.9

Summary
In this section, we have evaluated the gains from using interference-aware,
frequency-selective MU-MIMO scheduling in a cellular network with synchronized
base stations. Terminals were assumed to be able to estimate their dedicated
and a certain number of interfering channel coefficients. Two important
observations were made: Efficient MU-MIMO transmission can be achieved by
using fixed unitary precoding, i.e. without the requirement of full channel knowledge.
Further, proper application of the MU-MIMO mode makes it possible to
conveniently serve, with multiple streams, even users who experience relatively
poor SNR conditions. Thus, the MU-MIMO mode establishes a win/win situation
for both low and high rate users. In addition, it was shown that knowledge of the
interference channels yields a more precise estimation of the achievable SINR
compared to the traditional approach, where interference is assumed white. Thus,
CQI feedback and supported modulation and coding scheme can be matched more
accurately.

Acknowledgements
The authors are grateful for financial support from the German Ministry of
Education and Research (BMBF) in the national collaborative project EASY-C
under contract No. 01BU0631.

5.2

Uplink Joint Scheduling and Cooperative Interference Prediction
Philipp Frank, Andreas Müller and Heinz Droste
While the last section introduced non-cooperative, but interference-aware
transmission and detection schemes, we now look into CoMP schemes where adjacent
base stations (BSs) actually exchange information in order to coordinate resource
usage and applied transmission strategies. More precisely, a joint scheduling
approach is proposed, where the resource allocation in different cells is performed
jointly, as well as a novel approach for cooperation-based interference
prediction through which the link adaptation can be significantly improved. Both
schemes are described for the uplink in the following, but in principle they may
be employed in the downlink as well.
Joint scheduling generally belongs to the class of so-called interference
coordination techniques, which recently have attracted a lot of research attention
due to their potential to efficiently mitigate inter-cell interference and
hence to realize significant performance gains compared to non-cooperative
systems [BPG+09, ADF+09]. The basic idea of interference coordination in general
is to let different BSs cooperate with each other in order to control and
account for the inter-cell interference originating from the corresponding
cooperating cells. This may be done in either a static or dynamic manner. With
a static approach, there are usually some pre-configured restrictions regarding
the resource allocation, for example, that on some frequency resources no cell-edge
users may be scheduled, as is the case for static fractional frequency
reuse [XSX07]. With a dynamic scheme, in contrast, such restrictions are determined
on a much shorter time scale and usually by taking the instantaneous
channel conditions into account. In case of dynamic fractional frequency reuse,
for instance, there would be only restrictions on certain frequency resources
when high interference is expected, see for example [FKR+09, MMT08]. Clearly,
dynamic interference coordination generally should lead to a better performance
than static approaches, but this comes at the cost of a higher complexity and
possibly a higher backhaul load [BPG+09].
A general problem of interference coordination with independent, i.e.,
cell-specific scheduling is that the scheduling of one user in a certain cell may
directly impose certain restrictions on other cooperating cells and vice versa.
Thus, finding the globally optimal solution becomes hardly feasible in practice.
This is also because the imposed restrictions cannot be changed arbitrarily fast
due to the inherent BS-BS signaling delay over the backhaul (see Section 12.2).
However, this drawback can be overcome with a global scheduling algorithm that is
applied across all cooperating BSs, taking into account the channel state
information (CSI) of all associated user equipments (UEs) in order to find the
optimal or at least close-to-optimal allocation of radio resources. Below, we propose
such a centralized cooperative scheduling scheme in Section 5.2.1, with which the
resource allocation as well as the link adaptation is performed jointly by a central
scheduling unit (CSU) for a set of cooperating BSs [FMDS10]. However, since
this requires the signaling of multi-cell CSI from all cooperating BSs to the CSU,
it results in a possibly massive backhaul load. In order to reduce this backhaul
load as well as the signal processing complexity, we therefore propose in a second
step in Section 5.2.2 a novel multi-cell interference prediction scheme. In contrast
to the joint scheduling approach, the scheduling process of the interference
prediction scheme is still done independently by each BS as in conventional systems,
and only the inter-cell interference that is expected to occur during a future data
transmission is predicted and then used for improving the link adaptation
process [MF10]. This is accomplished by exchanging scheduling information between
a set of cooperating BSs combined with multi-cell channel estimation. As will
be seen in Section 14.3, considerable performance gains can still be
achieved this way, but the generated backhaul load is much smaller than that for
joint scheduling.

Figure 5.8 Illustration of the joint scheduling concept considered. Sites with
3 base stations each (cells = sectors) send multi-cell CSI over backhaul links to a
central scheduling unit, which returns the scheduling decisions.

5.2.1

Interference-Aware Joint Scheduling


The uplink of a cellular network as shown in Fig. 5.8 is considered, where
different BS sites are interconnected with a CSU via high-capacity backhaul links,
assumed to facilitate a fast information exchange. It should be noted that the
depicted CSU in Fig. 5.8 is not necessarily a separate device, but it may also be
incorporated in one of the involved BSs. As illustrated in Fig. 5.8, all cooperating
BSs periodically send multi-cell CSI of the associated UEs to the corresponding
CSU, which thus becomes aware of the interference a certain UE scheduled in one
cell would cause to another cell within the same cooperation cluster. This way,
strong interference situations (which may occur, for example, if cell-edge UEs
of neighboring cells are allocated to the same radio resources) can be avoided
by taking the predicted inter-cell interference caused by the various UEs located
within the cooperation cluster into account. The avoidance of high interference
levels may not only significantly increase the overall system performance in terms
of the average cell throughput, but it also contributes to a better fairness since
UEs located close to the cell-edge generally benefit most from it.
Figure 5.9 Flow chart of the interference-aware joint scheduling algorithm [FMDS10]:
HARQ management; ordering of the cooperating BSs; for each BS in turn, determination
of the joint scheduling priorities and resource allocation; once the resource
allocation has been performed for all cooperating BSs, link adaptation based on the
exchanged multi-cell CSI. © 2010 IEEE.

A flow chart of the considered joint scheduling algorithm is depicted in Fig. 5.9.
In a first step, each CSU reserves certain radio resources for the requested
retransmissions of all associated BSs, and then the actual joint scheduling process
is carried out. Since the simultaneous allocation of radio resources to all UEs located
within the respective cooperation cluster would cause a tremendous increase in
computational complexity, we assume in the following that the joint scheduling
procedure is carried out stepwise for each set of UEs assigned to one of the
cooperating BSs. This way, the computational effort can be significantly reduced.
However, this also entails that the BSs associated to a certain CSU have to be
ordered by means of a certain fairness criterion in order to sustain fairness among
the various UEs. For that purpose, the long-term cell throughput averaged over
the number of assigned UEs is considered as fairness criterion, which can be
expressed for the m-th BS by

\[
T_{\mathrm{avg},m}(t+1) = \beta\, T_{\mathrm{avg},m}(t) + (1 - \beta)\, \frac{T_{\mathrm{inst},m}(t)}{|\mathcal{K}_m|},
\tag{5.14}
\]

where $T_{\mathrm{avg},m}(t)$ denotes the long-term throughput for the m-th BS at the time
interval $t$, $T_{\mathrm{inst},m}(t)$ the instantaneous throughput, $\beta$ the forgetting factor and
$\mathcal{K}_m$ the set of UEs assigned to BS $m$. The actual BS ordering is then done
in such a way that the corresponding average long-term throughputs according
to (5.14) are non-decreasing, i.e., the resource allocation always starts with the
BS associated with the lowest long-term throughput, then it is done for the one
with the second smallest one, etc.
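
The ordering step can be sketched as follows. The symbol `beta` for the forgetting factor, the BS labels and all numerical values are assumptions made for this illustration.

```python
def update_avg_throughput(t_avg, t_inst, num_ues, beta):
    """One step of (5.14): exponentially averaged per-UE cell throughput."""
    return beta * t_avg + (1.0 - beta) * t_inst / num_ues


def order_base_stations(t_avg_by_bs):
    """BS labels sorted by non-decreasing long-term throughput."""
    return sorted(t_avg_by_bs, key=t_avg_by_bs.get)


# Hypothetical state of three cooperating BSs.
t_avg = {"BS1": 2.0, "BS2": 1.0, "BS3": 1.5}    # long-term throughput
t_inst = {"BS1": 6.0, "BS2": 8.0, "BS3": 3.0}   # instantaneous cell throughput
num_ues = {"BS1": 3, "BS2": 2, "BS3": 3}        # assigned UEs per BS
beta = 0.9

t_next = {m: update_avg_throughput(t_avg[m], t_inst[m], num_ues[m], beta)
          for m in t_avg}
order = order_base_stations(t_next)
print(t_next, order)
```

With these numbers, BS2 still has the lowest averaged throughput after the update and is therefore scheduled first, followed by BS3 and BS1.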
Having determined the ordering of the cooperating BSs, the radio resources
are allocated to the various UEs based on the exchanged multi-cell CSI. To this
end, not only the current channel conditions between the UEs and their serving
BSs are taken into account, but also the expected inter-cell interference caused
by assigning these UEs to certain radio resources. Thus, the joint scheduling
priority for the b-th radio resource and k-th UE associated to its serving BS $m$
can be expressed by

\[
S_{k,b}(t) = G_{k,b,\mathcal{K}_b}(t) + \sum_{j \in \mathcal{K}_b} G_{j,b,\tilde{\mathcal{K}}_b}(t), \quad k \in \mathcal{K}_m,
\tag{5.15}
\]

where $G_{k,b,\mathcal{K}_b}(t)$ denotes the scheduling priority for the k-th UE allocated to
the b-th radio resource on which the UEs in set $\mathcal{K}_b$ are already scheduled.
Furthermore, $G_{j,b,\tilde{\mathcal{K}}_b}(t)$ indicates the updated scheduling priority for the already
scheduled UE $j$, taking into account that the k-th UE will be allocated to the
b-th radio resource. In this regard, the updated set of interfering UEs allocated
to the b-th radio resource for the j-th UE is given by

\[
\tilde{\mathcal{K}}_b = (\mathcal{K}_b \setminus j) \cup k.
\tag{5.16}
\]
In the following, only the calculation of the scheduling priority $G_{j,b,\tilde{\mathcal{K}}_b}(t)$ is
explicitly outlined, but the scheduling priority $G_{k,b,\mathcal{K}_b}(t)$ can be determined in
a similar way and is therefore not considered in more detail here. It
is assumed that the radio resources are shared between the various UEs by
means of the well-known proportional fair approach, but it should be noted
that any other scheduling metric may be used in conjunction with our joint
scheduling scheme as well. The basic idea of proportional fair scheduling is to
realize a reasonable trade-off between the maximal total throughput and cell-edge
throughput. Clearly, on the one hand, fair resource allocation among the UEs
will lower the overall throughput compared to the maximum possible one, but
in return it provides a higher throughput for UEs with relatively poor channel
conditions, thus improving the system fairness. In general, the proportional fair
metric is given by the ratio between instantaneously supportable and long-term
throughput of a certain UE [VTL02], i.e., $G_{j,b,\tilde{\mathcal{K}}_b}(t)$ can be determined by

\[
G_{j,b,\tilde{\mathcal{K}}_b}(t) = \frac{R_{j,b,\tilde{\mathcal{K}}_b}(t)}{T_j(t)^{\alpha}}
\tag{5.17}
\]

with $R_{j,b,\tilde{\mathcal{K}}_b}(t)$ as the instantaneous supportable throughput and $\alpha$ as the fairness factor, which determines the trade-off between efficiency in terms of total
throughput and fairness. Furthermore, $T_j(t)$ denotes the long-term average
throughput given by

$$T_j(t+1) = \begin{cases} \beta\, T_j(t), & j \notin \mathcal{K}_{\mathrm{total}}(t) \\ \beta\, T_j(t) + (1-\beta)\, \bar{R}_j(t), & j \in \mathcal{K}_{\mathrm{total}}(t), \end{cases} \tag{5.18}$$

where $\beta$ denotes the forgetting factor and $\mathcal{K}_{\mathrm{total}}(t)$ as well as $\bar{R}_j(t)$ denote the
set of all scheduled UEs at time interval t and the aggregated throughput of
the scheduled UE j, respectively. The instantaneous supportable throughput
$R_{j,b,\tilde{\mathcal{K}}_b}(t)$ may be estimated by means of the Shannon capacity formula

$$R_{j,b,\tilde{\mathcal{K}}_b}(t) = \log_2\left(1 + \gamma_{j,b,\tilde{\mathcal{K}}_b}\right), \tag{5.19}$$

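with $\gamma_{j,b,\tilde{\mathcal{K}}_b}$ as the uplink signal-to-interference-and-noise ratio (SINR) of the
UE j on the b-th radio resource. Let us assume in the sequel that all BSs are
equipped with $N_{\mathrm{bs}}$ antenna elements whereas all UEs have only a single antenna
element (i.e., $N_{\mathrm{ue}} = 1$). Then, the uplink SINR $\gamma_{j,b,\tilde{\mathcal{K}}_b}$ can be expressed by

$$\gamma_{j,b,\tilde{\mathcal{K}}_b} = \frac{P_{j,b}\, \mathbf{w}_{j,b}^H \mathbf{h}_{j,b} \mathbf{h}_{j,b}^H \mathbf{w}_{j,b}}{\mathbf{w}_{j,b}^H \left( E\left\{ \mathbf{i}_{j,b,\tilde{\mathcal{K}}_b} \mathbf{i}_{j,b,\tilde{\mathcal{K}}_b}^H \right\} + \sigma^2 \mathbf{I} \right) \mathbf{w}_{j,b}}, \tag{5.20}$$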

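with $P_{j,b}$ as the transmit power of UE j for the b-th radio resource, $\mathbf{h}_{j,b} \in \mathbb{C}^{[N_{\mathrm{bs}} \times 1]}$
as the channel vector from the j-th UE to its serving BS, $\mathbf{w}_{j,b} \in \mathbb{C}^{[N_{\mathrm{bs}} \times 1]}$ as the
corresponding weight vector for coherent detection, $\mathbf{i}_{j,b,\tilde{\mathcal{K}}_b} \in \mathbb{C}^{[N_{\mathrm{bs}} \times 1]}$ as inter-cell
interference caused by the set of UEs $\tilde{\mathcal{K}}_b$ and $\sigma^2$ as thermal noise variance.
Based on the exchanged multi-cell CSI, the CSU is able to predict the interference
covariance matrix $\boldsymbol{\Phi}_{ii} = E\{\mathbf{i}_{j,b,\tilde{\mathcal{K}}_b} \mathbf{i}_{j,b,\tilde{\mathcal{K}}_b}^H\} \in \mathbb{C}^{[N_{\mathrm{bs}} \times N_{\mathrm{bs}}]}$ in (5.20), which is given by

$$\boldsymbol{\Phi}_{ii} = \sum_{q \in \tilde{\mathcal{K}}_b} P_{q,b}\, \mathbf{h}_{q,j,b} \mathbf{h}_{q,j,b}^H, \tag{5.21}$$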

where $P_{q,b}$ and $\mathbf{h}_{q,j,b} \in \mathbb{C}^{[N_{\mathrm{bs}} \times 1]}$ denote the transmit power of UE q for the b-th
radio resource and the channel vector from the q-th UE to the serving BS of
UE j on the considered radio resource b, respectively. Clearly, $\boldsymbol{\Phi}_{ii}$ in (5.21) contains both the inter-cell interference level caused by the already scheduled UEs
associated with the cooperating BSs as well as the one that will be generated by
assigning the k-th UE to the considered radio resource. As a result, the joint
scheduling priorities in (5.15) reflect the weighted sum-throughput taking the
current inter-cell interference situation into account. This consequently leads to
an interference-aware joint scheduling, aiming at reducing the inter-cell interference within the given cooperation cluster while still taking channel-dependent
scheduling as well as user fairness into account.
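The chain from predicted interference to scheduling priority, i.e. (5.17) to (5.21), can be sketched as follows. This is a minimal illustration in plain Python rather than a reference implementation: the function names are invented, the choice of maximum ratio combining weights in the example is an assumption (the text does not prescribe a particular receive filter), and all numerical values are arbitrary.

```python
import math


def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors (lists)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def uplink_sinr(p_j, h_j, w_j, interferers, noise_var):
    """SINR of (5.20) with the predicted covariance of (5.21).

    interferers is a list of (P_q, h_q) pairs for the UEs in the updated
    interferer set, so that w^H Phi_ii w reduces to sum_q P_q |w^H h_q|^2.
    """
    signal = p_j * abs(inner(w_j, h_j)) ** 2
    interference = sum(p_q * abs(inner(w_j, h_q)) ** 2 for p_q, h_q in interferers)
    noise = noise_var * inner(w_j, w_j).real
    return signal / (interference + noise)


def pf_priority(sinr, avg_throughput, alpha=1.0):
    """Proportional fair priority of (5.17) using the Shannon rate of (5.19)."""
    return math.log2(1.0 + sinr) / avg_throughput ** alpha


def update_avg_throughput(avg_throughput, beta, served_rate=None):
    """Long-term throughput update of (5.18); served_rate is None if not scheduled."""
    if served_rate is None:
        return beta * avg_throughput
    return beta * avg_throughput + (1.0 - beta) * served_rate


h_j = [1 + 0j, 0j]
w_j = h_j  # maximum ratio combining, an assumption made for this example
interferers = [(0.5, [1 + 0j, 0j]), (2.0, [0j, 1 + 0j])]
sinr = uplink_sinr(1.5, h_j, w_j, interferers, noise_var=0.5)
priority = pf_priority(sinr, avg_throughput=4.0)
```

Note that the second interferer, although stronger, is orthogonal to the receive filter and therefore does not degrade the SINR, which is exactly the kind of spatial information a purely power-based interference estimate would miss.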
Having determined the joint scheduling priorities in (5.15) for all UEs associated with a certain BS, the central scheduler generally aims at maximizing the
priority for each radio resource. The complexity of the resource allocation process depends on the access scheme used. In the case of single carrier frequency
division multiple access (SC-FDMA), for example, which is used in the 3GPP
LTE uplink, the allocated radio resources of each UE have to be either adjacent
or evenly spaced in frequency in order to achieve a low peak-to-average power
ratio (PAPR) [MLG06]. However, this leads to a significantly reduced allocation
flexibility and a higher complexity. To overcome this problem, a resource allocation algorithm presented in [CRA+ 08] may be applied after determining the joint
scheduling priorities in (5.15). The basic idea of this algorithm is that adjacent
radio resources are assigned to a certain UE until either a different UE has a
higher scheduling priority or the maximum transmit power is reached. This way,
the allocation constraints due to SC-FDMA can be met, while still exploiting
the multi-user diversity and the frequency selectivity of the uplink channel.
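The basic idea of such an adjacency-preserving allocation can be illustrated with a simplified greedy sketch. It is not the algorithm of [CRA+ 08] itself: the transmit power limit is modeled as a hypothetical maximum number of resources per UE, priorities are assumed to be fixed in advance, and a UE whose block has been closed is never scheduled again so that its resources stay adjacent.

```python
def allocate_contiguous(priorities, max_blocks):
    """Greedy contiguous allocation over resources 0..B-1.

    priorities -- dict ue -> list of per-resource scheduling priorities
    max_blocks -- dict ue -> maximum number of resources the UE may occupy
                  (a stand-in for the maximum transmit power limit)
    Returns a list holding the UE assigned to each resource (or None).
    """
    num_resources = len(next(iter(priorities.values())))
    allocation = []
    closed = set()          # UEs whose block is finished
    current, used = None, 0
    for b in range(num_resources):
        if current is not None and used >= max_blocks[current]:
            closed.add(current)            # power limit reached
            current = None
        candidates = [ue for ue in priorities if ue not in closed]
        if not candidates:
            allocation.append(None)
            continue
        # Highest priority wins; on a tie the current UE keeps the resource.
        best = max(candidates, key=lambda ue: (priorities[ue][b], ue == current))
        if best != current:
            if current is not None:
                closed.add(current)        # a different UE has higher priority
            current, used = best, 0
        allocation.append(current)
        used += 1
    return allocation


priorities = {"A": [5, 4, 1, 1, 6], "B": [2, 3, 4, 4, 1], "C": [1, 1, 1, 1, 2]}
max_blocks = {"A": 4, "B": 1, "C": 4}
allocation = allocate_contiguous(priorities, max_blocks)
```

In this toy example UE A has the highest priority on the last resource but does not receive it, because its block was already closed; this loss of flexibility is the price of the SC-FDMA adjacency constraint.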

[Figure 5.10: two bar charts comparing interference coordination and joint scheduling at bandwidth occupancies of 30 %, 60 % and 100 %. (a) Gain in average spectral efficiency [%], with bar labels ranging from +3% to +102%. (b) Gain in cell-edge throughput [%], with bar labels ranging from +21% to +105%.]

Figure 5.10 Relative uplink performance gains of the presented joint scheduling
scheme as well as of a dynamic interference coordination scheme compared to a 3GPP
LTE Release 8 system with 500 m inter-site-distance and six cooperating cells per BS.

Finally, after completing the resource allocation of all cooperating BSs, the
link adaptation selects for each UE the spectrally most efficient modulation and
coding scheme (MCS) that can be supported by its current uplink channel without exceeding a given target block error rate (BLER). To this end, the corresponding SINR is estimated by evaluating the available multi-cell CSI, resulting
in a more accurate link adaptation. This is because the knowledge of which UEs
are scheduled in the cooperating cells together with the available multi-cell CSI
facilitates an accurate prediction of the interference situation that will occur during the actual (future) data transmission. Especially in the uplink, this may lead
to significant additional performance gains since the interference situation there
is usually rather volatile. This is because from one transmit time interval (TTI)
to the next completely different sets of UEs may be scheduled in nearby cells.
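The MCS selection step can be sketched as a threshold lookup. The table below is purely hypothetical: its SINR thresholds and spectral efficiencies are illustrative placeholders, not values from any 3GPP specification, and in practice the thresholds would be derived from link-level BLER curves for the chosen target BLER.

```python
# Hypothetical MCS table: (minimum SINR in dB at the target BLER, bits/s/Hz).
# All numbers are illustrative placeholders.
MCS_TABLE = [
    (-5.0, 0.2),
    (0.0, 0.6),
    (5.0, 1.5),
    (10.0, 2.7),
    (15.0, 3.9),
]


def select_mcs(predicted_sinr_db):
    """Return the index of the most efficient MCS whose threshold is met.

    Falls back to index 0 (the most robust MCS) if no threshold is met.
    """
    best = 0
    for index, (threshold_db, _efficiency) in enumerate(MCS_TABLE):
        if predicted_sinr_db >= threshold_db:
            best = index
    return best


# An SINR estimate that is 5 dB too optimistic picks a higher MCS than supported.
chosen = select_mcs(12.0)
ideal = select_mcs(7.0)
```

With this table, an estimate of 12 dB selects MCS index 3 although the channel at 7 dB only supports index 2, which illustrates why an accurate SINR prediction matters for meeting the target BLER.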
An example of the achievable uplink performance of the presented joint
scheduling scheme for different bandwidth utilizations is depicted in Fig. 5.10,
where the relative gains compared to an LTE Release 8 system in terms of average spectral efficiency as well as cell-edge throughput are shown. The detailed
simulation assumptions, parameter settings as well as further results will be introduced later in Section 14.3. In order to achieve a certain bandwidth occupancy,
the scheduling is performed until the intended degree of bandwidth utilization is
reached. In addition to the joint scheduling results, Fig. 5.10 shows for comparison also the performance of a state-of-the-art dynamic interference coordination
scheme based on high interference indicator signaling [3GP07c, FMDS10]. First
of all, it can be seen that the achievable performance is heavily dependent on the
bandwidth occupancy. The gains increase with decreasing bandwidth occupancy,
which indicates that the flexibility in assigning radio resources to the various UEs
is considerably increased at a low bandwidth occupancy. As a result, severe inter-cell
interference situations can be avoided by exploiting the whole bandwidth, i.e.
preventing the UEs associated with the cooperating BSs from being allocated to the
same radio resources. Furthermore, it is shown in Fig. 5.10 that the joint scheduling scheme outperforms the dynamic interference coordination scheme due to the
higher flexibility in jointly allocating radio resources to the various UEs, which
consequently leads to an improved avoidance of severe inter-cell interference. The
better system performance, however, comes at the cost of an increased backhaul
load due to the required exchange of multi-cell CSI [FMDS10].

5.2.2 Cooperative Interference Prediction

Interference-aware joint scheduling as considered in Section 5.2.1 generally causes
a relatively high backhaul load since multi-cell CSI of all cooperating BSs has to
be signaled to the CSU. Especially where fast optical fiber links are lacking, this
may represent a major barrier for implementing such an approach in practical
systems in the short term. In the following, we therefore present a somewhat
more lightweight yet efficient scheme, where the scheduling is performed independently by all BSs as in conventional networks, but then the interference that
will occur during the associated data transmission is predicted in a cooperative
manner for improving the link adaptation process. To this end, first of all the
basic principle of conventional link adaptation in general and a related problem
are reviewed, after which the actual interference prediction scheme is explained
in detail.
Review and Problem of Conventional Link Adaptation
An indispensable prerequisite for the efficient application of fast link adaptation
schemes in cellular networks is the availability of accurate estimates of the current
channel quality of the link between the UE for which the adaptation should be
done and its associated serving BS. In the uplink, such estimates generally can
be readily determined by a BS by evaluating the reference signals that have to be
transmitted by the corresponding UEs anyway for facilitating coherent detection
and channel-aware scheduling.
Having estimates of the current channel quality, a BS can determine for each
scheduled UE the spectrally most efficient MCS for which a given target BLER is
not exceeded. Afterwards, the selected MCS is signaled as part of the scheduling
grant to the corresponding UE and the actual data transmission then typically
starts slightly later so that the UEs have enough time to prepare the respective
transport blocks for transmission. Due to the inherent delay between the time
when link adaptation is performed and the actual data transmission, link adaptation often becomes inaccurate. This is because the channel conditions during the
actual data transmission might differ substantially from the channel conditions
during the time when the link adaptation was performed. On the one hand, this is
because the involved channels naturally change during that time, but at least for
low to moderate user speeds, the impact of this effect should be only marginal.
On the other hand and more importantly, the interference situation may have
completely changed, since from one TTI to the next completely different sets
of users may be scheduled in nearby cells. As a consequence, the selected MCSs
are often over- or underestimated, thus leading to a very high BLER or a rather
low spectral efficiency, respectively. Without appropriate countermeasures
as proposed in the following, the performance would therefore often be degraded
compared to the idealized case with perfect link adaptation based on the channel
conditions during the actual data transmission.

[Figure 5.11: schematic of sites with three base stations each (cells = sectors) exchanging scheduling decisions over backhaul links.]

Figure 5.11 Illustration of cooperation between different BSs in case of cooperative
interference prediction.
Proposed Interference Prediction Scheme
The fundamental idea of the considered interference prediction scheme is to perform the link adaptation not based upon the currently estimated SINR values,
but rather based upon predicted SINR values likely to occur during the associated
future data transmissions. For that purpose, it is necessary that a BS can accurately predict the interference level that it will experience during such future data
transmissions already a couple of TTIs in advance. This may be accomplished by
means of cooperation between different BSs as illustrated in Fig. 5.11. First of all,
every BS performs conventional scheduling and power control, i.e. it determines
which UEs should transmit on which radio resources and at which power levels.
If the employed scheduling algorithm is channel-aware (which is the case for a
proportional-fair scheduler, for example), the corresponding scheduling metrics
are calculated as in conventional systems, taking into account only the currently
observed channel and interference conditions, respectively.

Afterwards, every BS exchanges the resource allocation tables that have been
fixed during the scheduling process with a certain set of cooperating BSs via a
fast backhaul network. For the case of a 3GPP LTE system, for example, this
could be realized via the X2-interface [HT09]. Note that low-latency backhaul
links are a crucial prerequisite for the proposed approach since an additional
delay is introduced by exchanging and processing the scheduling information
as well as by performing the actual prediction of the interference. Without a
fast data exchange the overall latency may increase, resulting in a performance
degradation compared to the idealized case without any additional delay [MF10].
Provided that the various BSs have reasonably accurate CSI not only of the
channels from the UEs located in their own cell, but also from those associated
with any of their cooperating BSs, they can eventually accurately predict the
interference level that will be generated by these UEs when the actual data
transmission takes place. If, for example, the channel from the k-th interfering UE
to the various antenna elements of a particular BS sector m is denoted by $\mathbf{h}_k \in \mathbb{C}^{[N_{\mathrm{bs}} \times 1]}$,
the expected contribution of this interferer to the overall interference
covariance matrix simply would be given by

$$\boldsymbol{\Phi}_{ii,k} = P_k\, \mathbf{h}_k \mathbf{h}_k^H, \tag{5.22}$$

where $P_k$ is the transmit power associated with UE k. The predicted interference is then used as an input to the link adaptation stage, and afterwards the
corresponding scheduling grants (including the assigned MCSs) are signaled to
all scheduled UEs, which finally transmit their data a couple of TTIs after the
reception of these grants. However, note that the scheduling decisions themselves
are not updated based on the predicted interference levels, since otherwise the
actual future interference situation would change again. Hence, in that case some
iterative procedure would be necessary, thus leading to an increased complexity
and backhaul load as well as a higher latency. An example of how the accuracy
of the link adaptation can be improved with the proposed approach is depicted
in Fig. 5.12, where the simulated distributions of the deviations between the
ideal and the used MCSs are shown for the cases with and without interference
prediction. In this regard, the BSs may choose between several different MCSs
according to [3GP09f]. Furthermore, it is assumed that in case of interference prediction each BS always receives scheduling information with a delay of two TTIs
from its six cooperating sectors. It can be seen that with interference prediction
the probability that the ideal MCS is selected is almost twice as high as for the
case without interference prediction and also the variance of the deviations from
the ideal MCS can be considerably reduced. Note that further simulation results
can be found either in Section 14.3 or in [MF10]. Furthermore, the simulation
assumptions and parameter settings used for generating the results in Fig. 5.12
are the same as the ones that will be used later in Section 14.3.
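The prediction step itself amounts to summing the contributions (5.22) of all UEs that the cooperating BSs have scheduled on a given radio resource. The sketch below shows this under simplifying assumptions: the layout of the exchanged allocation tables (a dictionary from resource index to UE identifier and transmit power) and all names and values are hypothetical, and at most one UE per cooperating sector and resource is assumed.

```python
def outer(h):
    """Outer product h h^H of a complex vector, as a nested list."""
    return [[a * b.conjugate() for b in h] for a in h]


def predict_interference_covariance(allocation_tables, channels, resource, n_bs):
    """Sum the contributions P_k h_k h_k^H of (5.22) for one radio resource.

    allocation_tables -- tables received from cooperating BSs, each a dict
                         resource -> (ue_id, transmit_power)  (assumed layout)
    channels          -- dict ue_id -> channel vector h_k towards this BS
    """
    cov = [[0j] * n_bs for _ in range(n_bs)]
    for table in allocation_tables:
        if resource not in table:
            continue  # this neighbor leaves the resource unused
        ue, power = table[resource]
        contribution = outer(channels[ue])
        for r in range(n_bs):
            for c in range(n_bs):
                cov[r][c] += power * contribution[r][c]
    return cov


tables = [{0: ("ue7", 2.0)}, {0: ("ue9", 1.0), 1: ("ue3", 1.0)}]
channels = {"ue7": [1 + 0j, 1j], "ue9": [0j, 2 + 0j], "ue3": [1 + 0j, 1 + 0j]}
cov = predict_interference_covariance(tables, channels, resource=0, n_bs=2)
```

The resulting matrix, together with the noise term, can then be inserted into the SINR expression used for link adaptation; UE "ue3" does not contribute here because it is scheduled on a different resource.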
Clearly, the performance of the approach strongly depends on the number of
cooperating BSs. While a BS generally should be able to predict the interference
rather accurately with a large number of cooperation partners, it would
frequently underestimate the actual interference level if it cooperates with very
few other BSs only. This is because with the basic scheme as described above no
interference from non-cooperating cells is taken into account. In real-world scenarios, however, the set of cooperating BSs is in most cases very likely restricted
to nearby neighbors only, on the one hand in order to keep the backhaul load
limited and on the other hand because it is unrealistic that a BS may accurately estimate the channels from all UEs within a large number of cooperating
cells, as pointed out in Section 9.1. Therefore, it is essential that the impact of
the interference caused by UEs in non-cooperating cells is somehow taken into
account as well. An efficient way to do that is to employ an additional outer
loop link adaptation scheme, similar to the one presented in [NAV02]. With this
scheme, a UE-specific offset $\Delta_{\mathrm{offset}}$ is always added to the predicted SINR values in dB before performing the actual link adaptation; this offset is continuously
adjusted based on the outcome of previous transmission attempts. In particular, if an (initial) transmission attempt is successful, $\Delta_{\mathrm{offset}}$ is increased by $\Delta_{\mathrm{up}}$,
whereas otherwise it is decreased by $\Delta_{\mathrm{down}}$. If these two step sizes $\Delta_{\mathrm{up}}$ and $\Delta_{\mathrm{down}}$
are chosen as in [NAV02] such that

$$\Delta_{\mathrm{down}}\big|_{\mathrm{dB}} = \left(\frac{1}{\mathrm{BLER}_{\mathrm{target}}} - 1\right) \Delta_{\mathrm{up}}\big|_{\mathrm{dB}}, \tag{5.23}$$

it is eventually possible to adjust the link adaptation in such a way that the
obtained average BLER always corresponds to the configured target $\mathrm{BLER}_{\mathrm{target}}$.
In the stationary case, i.e., after a sufficiently long warm-up phase, the average
interference originating from non-cooperating cells is thus implicitly included in
the outer-loop link adaptation offset $\Delta_{\mathrm{offset}}$.
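The offset update and the step-size relation (5.23) can be written down in a few lines. The sketch below uses invented function names and an arbitrary step size; it merely illustrates why the offset is stationary exactly at the target BLER, since on average the fraction of failed transmissions times the down-step then balances the fraction of successful transmissions times the up-step.

```python
def olla_step_down(step_up_db, bler_target):
    """Step size of (5.23): down = (1 / BLER_target - 1) * up, both in dB."""
    return (1.0 / bler_target - 1.0) * step_up_db


def update_offset(offset_db, success, step_up_db, bler_target):
    """Raise the offset after a successful initial transmission, lower it otherwise."""
    if success:
        return offset_db + step_up_db
    return offset_db - olla_step_down(step_up_db, bler_target)


# With a 10 % target BLER, nine successes and one failure cancel out.
offset = 0.0
for success in [True] * 9 + [False]:
    offset = update_offset(offset, success, step_up_db=0.1, bler_target=0.1)
```

For a 10 % target BLER and an up-step of 0.1 dB, the down-step is 0.9 dB, so a single failed transmission undoes the effect of nine successful ones.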

5.2.3 Practical Considerations

A crucial prerequisite for both BS cooperation schemes presented in this section
is that accurate multi-cell CSI is obtained. For that reason, it is necessary that
the reference signals transmitted by different UEs within a certain cooperation
cluster can be separated again at the BS side, for example through orthogonal
reference signals as in Section 9.1. In any case, all BSs have to be aware of the
reference signals assigned to the various UEs. This consequently requires further signaling between cooperating BSs in addition to the necessary information
exchange via the backhaul network already outlined before. However, note that
this usually does not have to be done during every TTI since the utilized reference signals and hopping patterns are normally assigned in a semi-persistent
manner. Therefore, this additional backhaul load is expected to be comparatively
small.
In the case of cooperative interference prediction as introduced in Section 5.2.2,
the requirements on the accuracy of the multi-cell channel
estimation between a BS and UEs located in other cells are generally much lower
than those for the estimation of the desired link between a certain UE and its
serving BS. On the one hand, this is because estimation errors made for different
interfering channels may compensate each other (particularly if the number of
cooperating BSs is relatively high) and on the other hand because it may be
already sufficient for achieving a good performance to know whether on a certain
radio resource very high or very low interference has to be expected, whereas the
exact figures are only of secondary importance. In addition, if the channel from a
certain UE in one of the cooperating cells cannot be estimated reliably since it is
in a deep fade, this should also not represent a major problem since in such a case
this UE would cause only low interference anyway. By contrast, the requirements
on the accuracy of the multi-cell channel estimation are more stringent in the
case of interference-aware scheduling. This is due to the fact that the resource
allocation decisions heavily depend on the predicted inter-cell interference level
caused by single UEs, for which reason a high deviation between the predicted
and the actual interference levels during a data transmission would lead to rather
inaccurate resource allocation decisions.

[Figure 5.12: two histograms of the probability [%] versus the deviation between selected and ideal MCS [MCS indices, axis from -10 to 10], one without and one with interference prediction; the bars distinguish whether the selected MCS corresponds to or deviates from the ideal MCS.]

Figure 5.12 Exemplary illustration of the improved link adaptation accuracy with
interference prediction for the uplink of a 3GPP LTE Release 8 system with 500 m
inter-site-distance and six cooperating cells per BS [MF10]. © 2010 IEEE.
Another prerequisite for the proposed schemes is that cooperating BSs can
quickly exchange the required information, such as the multi-cell CSI or scheduling tables, via a fast backhaul network. However, it is quite clear that even if
cooperating BSs are interconnected by means of direct optical fiber links, in
general an additional delay is introduced because some time is always required
for the processing of the exchanged information. As a consequence, the overall
latency increases and the performance may degrade to some extent compared
to the idealized case without any additional delay. This is due to an increased
mismatch between the channels used as the basis for the scheduling and link
adaptation stages and those during the actual data transmission. Besides, the
increased delay between scheduling and actual data transmission clearly also
affects potential hybrid automatic repeat request (HARQ) retransmissions. In
the 3GPP LTE uplink, for example, a synchronous HARQ protocol is used. If
one of the previously discussed schemes is to be introduced here, it might therefore be necessary to switch to an asynchronous HARQ or adjust HARQ timing.
Finally, it goes without saying that in case of joint scheduling the generated
backhaul load generally should be much higher than for cooperative interference
prediction, particularly due to the required exchange of multi-cell CSI in that
case. As a concrete example, it will be shown later in Section 14.3 for a particular scenario that the average backhaul load per site can be reduced from
251 Mbit/s in the case of joint scheduling to 26 Mbit/s for cooperative interference prediction. However, it should also be noted that even for joint scheduling,
the backhaul capacity requirements are still considerably smaller than those for
joint signal processing CoMP schemes, which will be introduced in Chapter 6.

5.2.4 Applicability of Both Schemes to the Downlink

Although the basic principles of the two different BS cooperation schemes presented in the previous sections have been explained focusing on the cellular
uplink, they may be readily applied to the downlink as well. In the downlink,
one generally has to distinguish between cases where channel reciprocity may
be exploited to obtain accurate CSI at the BS side (as might be the case
for time division duplex (TDD) systems, for example) and cases where channel
reciprocity cannot be exploited, so that appropriate CSI has to be fed back to
the various BSs from the corresponding UEs, as addressed in Section 9.2.
In the first case, the presented joint scheduling scheme basically might be
applied in exactly the same manner to the downlink as described before for
the uplink. By contrast, the downlink realization of the interference prediction
scheme in this case would imply that the cooperating BSs would not only have
to exchange their resource allocation tables (including information about the
applied precoders, if applicable), but also the multi-cell CSI of all UEs located
within the cooperation cluster. This might be realized in a two-step approach in
order to reduce the backhaul load, so that a cooperating BS first of all obtains
the resource allocation table of the considered BS and then signals the multi-cell
CSI, the transmit power levels and the used precoders back, but only for the
UEs which are actually scheduled by the considered BS. With this information,
each BS is then able to accurately predict the interference and hence improve
the link adaptation process.
If no explicit CSI is available at the BS side due to lack of channel
reciprocity, the UEs would have to periodically send feedback information back
to their serving BS. Depending on the applied feedback concept, this feedback
information consists of either a recommendation of the transmission parameters
to be used or information about the estimated downlink channel. The latter
feedback alternative is a crucial prerequisite for performing joint scheduling,
since the CSU has to be aware of the downlink channels of the UEs in order
to determine the joint scheduling priorities. Clearly, the CSU not only requires
the downlink channels between the UEs and their serving BSs, but also those
between the UEs and any of the cooperating BSs; thus the UEs have to acquire those additional interfering channels by performing a multi-cell channel estimation. This
information may then be reported either to the serving BSs or directly to the
CSU in order to avoid an additional delay and further backhaul signaling. By
contrast, the interference prediction scheme may be used in conjunction with
both feedback alternatives. If CSI feedback is sent to the BSs, the
two-step approach described above may be employed for the downlink realization of this scheme. However, if the UEs instead report a recommendation of the
transmission parameters only, then the BSs would not only notify the scheduled
UEs on which resources data will be transmitted to them, but also signal relevant
information about what the main interfering BS will do on these resources. This
could include but is not limited to information about the precoders that will
be used in the cooperating (i.e. interfering) sectors and the assigned power levels.
Having obtained CSI of the interfering BS through multi-cell channel estimation,
a UE can then take this information into account and perform a more reliable
prediction of the interference situation during the actual data transmission, as
done in Section 5.1. Based on the predicted interference, the UEs can again determine
appropriate transmission parameters, such as the MCS, and
signal these parameters to their serving BS, which then uses them
for the actual data transmission.

5.2.5 Summary

Two novel uplink CoMP schemes have been presented where different BSs cooperate with each other via a backhaul network in order to mitigate the effects
of inter-cell interference. While the interference-aware joint scheduling scheme
coordinates the allocation of radio resources to the various UEs by means of
periodically exchanged multi-cell CSI between the cooperating BSs and a central scheduling unit, the cooperative interference prediction scheme is a more
lightweight yet efficient approach with reduced backhaul load requirements.
The latter scheme only requires the exchange of scheduling information between
a set of cooperating BSs to predict the inter-cell interference level that will
occur during a future data transmission for improving the link adaptation process. Since both proposed schemes are transparent to the UEs and cause only
a minor to moderate backhaul load, they represent very attractive options for
future LTE-A systems.

5.3 Downlink Coordinated Beamforming

Chan-Byoung Chae, Ramya Bhagavatula, Doru Calin
and Robert W. Heath Jr
In this section, we consider downlink interference coordination schemes where
base stations (BSs) exchange channel state information (CSI) in order to adjust
their transmission strategies, so that the generated extent of inter-cell interference is reduced. Such schemes, known as coordinated beamforming in the 3GPP
LTE-A literature, offer a fair balance between ensuring a reasonable load on
the backhaul links and attaining the performance gains of cooperation. The
shared CSI is used by BSs to design individual precoding matrices (or beamforming vectors for single-stream transmission) to transmit exclusively to users
within their own cell [EC05, ZHKA09, LJSL09]. Consequently, this is also known
as distributed beamforming. CSI exchange over the backhaul has been shown to
use a small backhaul bandwidth as compared to data exchange, for moderate
Doppler spreads [SH09], which is also shown in Subsection 12.2.2.

5.3.1 Introduction

There are several distributed approaches for coordinated beamforming in the
literature. For example, [EC05] proposes an iterative algorithm to minimize
transmit power, which does not necessarily maximize sum-rates. The authors
of [LJSL09] propose a non-iterative distributed solution to design precoding
matrices for multi-cell systems, which will maximize the sum-rates for only a
two-cell system at high signal-to-noise ratio (SNR), using a per base station
power constraint. Another important partial cooperation-based transmit strategy is inter-cell interference nulling (ICIN) [ZA10, JLD08, LKL09, BH10], in
which each BS transmits in the null-space of the interference it is causing to
neighboring cells.
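The nulling idea can be illustrated with a zero-forcing projection for a single victim user. This is a minimal sketch under stated assumptions and not necessarily the precoder design used in the cited works: the function names are invented, the received signal at a user is modeled as the Hermitian inner product of its channel with the beamformer, and the own-cell and victim channels are assumed not to be collinear (otherwise the projection vanishes and the normalization fails).

```python
def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors (lists)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def nulling_beamformer(h_own, h_victim):
    """Project h_own onto the orthogonal complement of h_victim and normalize.

    Sketch for a single victim UE; with several victims one would project
    onto the null space spanned by all of their channels.
    """
    scale = inner(h_victim, h_own) / inner(h_victim, h_victim)
    w = [a - scale * b for a, b in zip(h_own, h_victim)]
    norm = abs(inner(w, w)) ** 0.5
    return [x / norm for x in w]


h_own = [1 + 0j, 1 + 0j]      # channel to the served user (arbitrary values)
h_victim = [1 + 0j, 0j]       # channel to a user in a neighboring cell
w = nulling_beamformer(h_own, h_victim)
leakage = abs(inner(h_victim, w))
```

The beamformer gives up the part of the own-cell channel that points towards the victim user, so the desired signal power drops compared to matched transmission, while the interference leaked into the neighboring cell becomes zero.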
The performance of a cooperative transmission strategy is highly dependent on the quality of the CSI fed back by the users. Most of the literature
on multi-cell cooperation assumes that full CSI is available at the transmitters [SSPS09c, SZ01, JTS+ 08a, EC05, LJSL09]. The impact of imperfect CSI
was considered in [MF09a, GMF10a, PTW10]. In [MF09a], the authors consider
imperfect CSI at the BS due to limited feedback or estimation errors and show
that the performance gains from BS cooperation can be obtained even when
CSI is imperfect. Noisy CSI estimates were considered in [PTW10], where the
objective was to maximize the performance gains that can be obtained using the
worst-case CSI perturbations.
Since quantization and feedback is a major source of imperfect CSI, it is
important to consider CSI quantization in multi-cell cooperative systems. Limited feedback for multi-cell systems is a topic of ongoing research [BH10]. Unfortunately, results from the well-investigated single-cell limited feedback are not
directly applicable to the multi-cell scenario. While the CSI of only one channel
is fed back in the single-cell case, cooperative strategies require feedback of CSI
from multiple BSs using the same feedback link. Further, in single-cell transmission, quantized CSI reaches the BS after experiencing a delay in the feedback
channel [ZHKA09]. In the multi-cell cooperative framework, however, quantized
CSI is subject to an additional source of delay in the backhaul link. The impact
of delayed CSI on the performance of non-cooperative systems [ZHKA09] has
been investigated extensively. The effect of delayed limited feedback on the performance of cooperative systems has received comparatively less attention.

[Figure 5.13: schematic of BSs k, n and l and a UE k with the channels from each BS to the UE, CSI feedback with delay from the UE to its serving BS, and backhaul with delay between the BSs.]

Figure 5.13 CSI feedback and backhauling concept considered for inter-cell
interference nulling (ICIN).
In this section, two different cases are considered: i) a single receive antenna
and ii) multiple receive antennas at the user equipment (UE). For the single receive antenna case, the BSs need to optimize their precoding matrices/beamforming vectors to maximize the sum-rate under given constraints, whereas
for the multiple receive antenna case, the precoding/postcoding matrices/vectors
should be jointly optimized. In Subsection 5.3.2, we describe ICIN, a low-complexity and non-iterative partial cooperative strategy that uses explicit per-base power constraints and yields reasonable gains in the sum-rate, while resulting in a small burden on the backhaul link. Note that ICIN requires that the
total number of antennas per BS be larger than the number of single-antenna terminals considered in one transmission, an aspect we will discuss in detail later.
We also describe some limited CSI feedback algorithms for ICIN. In Subsection 5.3.3, we further extend the cooperative strategies to the multiple antenna
cases and show performance results.


CoMP Schemes Based on Interf.-Aware Transceivers or Interf. Coord.

5.3.2 Single Receive Antenna at the Terminal


We now briefly describe ICIN for the setup in Fig. 5.13 for an $M$-cell system. We assume that on a particular observed resource, the BS in each cell serves a single active user. The received signal power of the desired and interfering signals is a function of the user's location in the cell. A similar approach was adopted in [JTS+08a, SSBN+06]. Each user is assumed to face interference from $M - 1$ neighboring BSs. We index the users in each cell by the BS they obtain their desired signal from, i.e. the $k$-th BS serves the $k$-th user, for $k = 1, \ldots, K$. Note that by assuming that there is a single user in each cell on each resource, we fix $K = M$. We assume that all BSs are equipped with $N_{\mathrm{bs}}$ antennas, while each user has a single receive antenna.
As defined in Chapter 3, the channel between BS $m$ and UE $k$ is denoted by $\mathbf{h}_k^m \in \mathbb{C}^{N_{\mathrm{bs}} \times 1}$. The symbol to be transmitted to the $k$-th user is denoted by $x_k$, where the transmit power is $E\{x_k x_k^H\} = E_s$, implying a per-base-station power constraint. The channels are subject to large-scale fading, which includes distance-dependent path-loss and shadowing effects, and small-scale fading. After averaging over the small-scale fading effects, the received SNR of the desired signal is denoted by $\gamma_k$. The interfering signal SNR from a BS $m$ to UE $k$ is given by $\alpha_k^m \gamma_k$, where $\alpha_k^m \in [0, 1]$ (i.e. the interfering signal strength can at most be equal to that of the desired signal). A similar parameter is used in [JTS+08a, SSBN+06] to model the SNR of the interfering signal relative to that of the desired signal. Note that $\gamma_k$ and $\alpha_k^m$ are independent of the beamforming vectors. Observing the transmission of a single frequency-flat sub-carrier of an orthogonal frequency division multiple access (OFDMA) system, and assuming that the channels remain constant over the codeword transmission, the signal-to-interference-and-noise ratio (SINR) of the $k$-th user is given by

$$\mathrm{SINR}_k = \frac{\gamma_k \left| (\mathbf{h}_k^k)^H \mathbf{w}_k \right|^2}{\gamma_k \sum_{j \neq k} \alpha_k^j \left| (\mathbf{h}_k^j)^H \mathbf{w}_j \right|^2 + 1}, \tag{5.24}$$

where $\mathbf{w}_k$ denotes the beamforming vector employed at BS $k$ to transmit to UE $k$, which is normalized to have unit norm. We assume that the BSs have perfect knowledge of the involved SNR terms $\gamma_k$ and $\alpha_k^m$.
When full and instantaneous CSI is available at all BSs, the $k$-th BS has instantaneous knowledge of not only its own desired channel to UE $k$, $\mathbf{h}_k^k$, but also of the interference caused to neighboring cells, i.e. $\mathbf{h}_j^k$, $j = 1, \ldots, K$, $j \neq k$. The $k$-th BS then computes the beamforming vector $\mathbf{w}_k$ as [ZA10, JLD08, LKL09]

$$\mathbf{w}_k = \mathbf{a}_k, \quad \text{where } \mathbf{A}_k = [\mathbf{a}_1 \ldots \mathbf{a}_K] = \left( \left[ \tilde{\mathbf{h}}_1^k, \ldots, \tilde{\mathbf{h}}_K^k \right]^H \right)^{\dagger}, \tag{5.25}$$

where $(\cdot)^{\dagger}$ refers to the pseudo-inverse and $\tilde{\mathbf{h}}_k^m = \mathbf{h}_k^m / \|\mathbf{h}_k^m\|$ denotes the normalized channel between UE $k$ and BS $m$. Since $\mathbf{A}_k$ is full-rank with high probability if $N_{\mathrm{bs}} \geq K$, (5.25) ensures perfect interference nulling in most cases, i.e. $(\mathbf{h}_k^j)^H \mathbf{w}_j = 0$ for $k, j = 1, \ldots, K$, $j \neq k$. Note that while ICIN is a simple, non-iterative and distributed coordinated beamforming strategy, it suffers from the dimensionality constraint imposed by computing the pseudo-inverse, i.e. $N_{\mathrm{bs}} \geq K$. For the more practical case where the number of users in the system is greater than $N_{\mathrm{bs}}$, scheduling can be employed to enforce $N_{\mathrm{bs}} \geq K$. This implies that there exists a trade-off between increasing the number of users for simultaneous transmission in the cells and perfect interference cancelation. Clustering can also be employed to group cells into clusters of size $N_{\mathrm{bs}}$ each and use intra-cell time division multiple access (TDMA) or a comparable orthogonal transmission strategy, to make sure that $N_{\mathrm{bs}} \geq K$.
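The SINR expression (5.24) and the nulling idea behind (5.25) can be illustrated with a minimal sketch for the smallest case, two cells with two antennas per BS. In that case the nulling beamformer is simply the unit vector orthogonal to the channel towards the other cell's UE, so no pseudo-inverse is needed. The channel values, the SNR values, and all function names below are hypothetical and chosen only for illustration.

```python
import math

def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors (lists)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def unit(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / n for x in v]

def nulling_beamformer(h_victim):
    """Unit-norm w with h_victim^H w = 0 (valid for two antennas only)."""
    h1, h2 = h_victim
    return unit([h2.conjugate(), -h1.conjugate()])

def sinr(k, h, w, gamma, alpha):
    """SINR of UE k as in (5.24); h[m][k] is the channel from BS m to UE k."""
    desired = gamma[k] * abs(inner(h[k][k], w[k])) ** 2
    interference = sum(
        gamma[k] * alpha[k][j] * abs(inner(h[j][k], w[j])) ** 2
        for j in range(len(w)) if j != k
    )
    return desired / (interference + 1.0)

# Hypothetical example values (not taken from the text).
h = [
    [[1 + 1j, 0.5 - 0.2j], [0.3 - 0.7j, 1 + 0.1j]],       # BS 0 -> UE 0, UE 1
    [[0.6 + 0.2j, -0.4 + 0.9j], [0.8 - 0.5j, 0.2 + 1j]],  # BS 1 -> UE 0, UE 1
]
gamma = [10.0, 10.0]               # desired-signal SNRs (linear scale)
alpha = [[1.0, 0.5], [0.5, 1.0]]   # alpha[k][j]: relative strength of BS j at UE k

# Non-cooperative baseline: each BS matches its beam to its own desired channel.
w_matched = [unit(h[0][0]), unit(h[1][1])]
# Nulling: BS k places a null towards the UE of the other cell.
w_icin = [nulling_beamformer(h[0][1]), nulling_beamformer(h[1][0])]

sinr_matched = [sinr(k, h, w_matched, gamma, alpha) for k in range(2)]
sinr_icin = [sinr(k, h, w_icin, gamma, alpha) for k in range(2)]
print(sinr_matched, sinr_icin)
```

With as many antennas as users, the nulling beamformer is fully determined by the interfering channel, so the desired channel plays no role in its design. With more antennas than users, the remaining degrees of freedom could also be used to increase the desired signal power.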
Limited Feedback
Practical feedback channels are bandwidth-limited and have delays associated with them. Hence, it is important to investigate the performance of ICIN with delayed limited feedback [BH10]. The channel directions $\tilde{\mathbf{h}}_k^m[t]$ are quantized to the unit-norm vectors $\hat{\mathbf{h}}_k^m[t]$ at the $k$-th user, where we now introduce the variable $t$ to capture the time instant, for example a transmit time interval (TTI). We assume that each user can utilize $B_{\mathrm{tot}}$ bits for feedback, and that $B_k$ and $B_k^j$ bits are used to quantize $\tilde{\mathbf{h}}_k^k[t]$ and $\tilde{\mathbf{h}}_k^j[t]$, $j \neq k$, respectively, where $B_k + \sum_{j \neq k} B_k^j = B_{\mathrm{tot}}$. The delay associated with quantizing $\tilde{\mathbf{h}}_k^k[t]$ to $\hat{\mathbf{h}}_k^k[t]$ and feeding back the latter to the $k$-th BS is denoted by $D_k$. The $k$-th user also quantizes the interfering channels $\tilde{\mathbf{h}}_k^j[t]$, $j \neq k$, to $\hat{\mathbf{h}}_k^j[t]$ and feeds back the latter to the $k$-th BS, which then forwards this information to the $j$-th BS over the backhaul link, incurring an overall delay of $D_k^j$. The limited feedback model is also shown in Fig. 5.13.
At the time instant $t$, the $k$-th BS has knowledge of $\hat{\mathbf{h}}_k^k[t - D_k]$ and $\hat{\mathbf{h}}_j^k[t - D_j^k]$ for all $j \neq k$. The beamforming vector at the $t$-th time instant, $\mathbf{w}_k[t]$, is designed using the delayed and quantized CSI of the desired channels and the interference caused to other cells [BH10]

$$\mathbf{w}_k[t] = \mathbf{a}_k, \quad \text{where } \mathbf{A} = \left( \left[ \hat{\mathbf{h}}_1^k[t - D_1^k], \ldots, \hat{\mathbf{h}}_{k-1}^k[t - D_{k-1}^k],\ \hat{\mathbf{h}}_k^k[t - D_k],\ \hat{\mathbf{h}}_{k+1}^k[t - D_{k+1}^k], \ldots, \hat{\mathbf{h}}_K^k[t - D_K^k] \right]^H \right)^{\dagger}. \tag{5.26}$$

When $N_{\mathrm{bs}} \geq K$, the beamforming vector lies in the $(N_{\mathrm{bs}} - (K - 1))$-dimensional null-space of the $K - 1$ interfering channels. Hence, when $N_{\mathrm{bs}} = K$, $\mathbf{w}_k[t]$ will lie in a one-dimensional sub-space, independent of $\hat{\mathbf{h}}_k^k$. This implies that if we have $N_{\mathrm{bs}} = K$, it is not necessary to feed back the quantized desired channel to the BS, i.e. $B_k = 0$. In contrast, when $N_{\mathrm{bs}} > K$, knowledge of $\hat{\mathbf{h}}_k^k$ is desirable to determine the best $\mathbf{w}_k[t]$ in the $(N_{\mathrm{bs}} - (K - 1))$-dimensional subspace. By assuming that $\mathbf{h}_k^k[t]$ and $\mathbf{h}_k^j[t]$, $j \neq k$, are constant throughout the codeword transmission, the current and delayed CSI are related by the Gauss-Markov block fading autoregressive model [TJMW01]

$$\mathbf{h}_k^k[t] = \rho_k\, \mathbf{h}_k^k[t - D_k] + \sqrt{1 - \rho_k^2}\; \mathbf{e}_{h_k^k}[t], \quad \text{and} \tag{5.27}$$

$$\mathbf{h}_k^j[t] = \rho_k^j\, \mathbf{h}_k^j[t - D_k^j] + \sqrt{1 - (\rho_k^j)^2}\; \mathbf{e}_{h_k^j}[t], \tag{5.28}$$


where $\mathbf{e}_{h_k^k}[t]$ and $\mathbf{e}_{h_k^j}[t]$ denote the channel knowledge uncertainties, which are uncorrelated with $\mathbf{h}_k^k[t - D_k]$ and $\mathbf{h}_k^j[t - D_k^j]$, respectively. The entries of $\mathbf{e}_{h_k^k}[t]$ and $\mathbf{e}_{h_k^j}[t]$ are distributed as $\mathcal{N}_{\mathbb{C}}(0, 1)$. The correlation coefficients for the desired and interfering channels are denoted by $\rho_k$ and $\rho_k^j$, respectively. Clarke's autocorrelation model is used to determine $\rho_k$ and $\rho_k^j$ as [ZHKA09]

$$\rho_k = J_0(2\pi D_k f_d T_s), \quad \text{and} \quad \rho_k^j = J_0(2\pi D_k^j f_d T_s), \tag{5.29}$$

where $J_0$ is the zeroth-order Bessel function of the first kind, $f_d$ is the Doppler spread and $T_s$ is the symbol duration. The Doppler spread is given as $f_d = v f_c / c$, where $v$ is the relative velocity of the transmitter-receiver pair, $f_c$ the carrier frequency, and $c$ the speed of light. The mean loss in sum-rate due to delayed limited feedback is bounded in [BH10] as a function of the delays and signal strengths, and can be minimized by choosing $B_k$ and $B_k^j$ as per Theorems 5.1-5.3.
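As a rough numerical illustration of Clarke's model in (5.29), the sketch below evaluates the correlation between current and delayed CSI for a slowly moving user. The power-series evaluation of the Bessel function, the user speed of 3 km/h, the 1 ms symbol duration, and the delays of one and three symbol durations are all assumptions made for this example, not values prescribed by the text.

```python
import math

def bessel_j0(x, terms=60):
    """Zeroth-order Bessel function of the first kind via its power series.

    Adequate for the small to moderate arguments used here (roughly |x| < 10).
    """
    total, term = 0.0, 1.0
    q = (x / 2.0) ** 2
    for m in range(terms):
        total += term
        term *= -q / ((m + 1) ** 2)   # ratio of consecutive series terms
    return total

def doppler_spread(speed_mps, carrier_hz, c=299_792_458.0):
    """Doppler spread f_d = v * f_c / c in Hz."""
    return speed_mps * carrier_hz / c

def delay_correlation(delay_symbols, f_d, t_s):
    """Correlation coefficient J0(2*pi*D*f_d*T_s) of Clarke's model."""
    return bessel_j0(2.0 * math.pi * delay_symbols * f_d * t_s)

f_d = doppler_spread(3.0 / 3.6, 1.9e9)   # assumed 3 km/h at a 1.9 GHz carrier
t_s = 1e-3                               # assumed symbol duration of 1 ms
rho_desired = delay_correlation(1, f_d, t_s)      # feedback delay only
rho_interferer = delay_correlation(3, f_d, t_s)   # feedback plus backhaul delay
print(f_d, rho_desired, rho_interferer)
```

Because the CSI of interfering channels also has to cross the backhaul, it is older when it is used and therefore less correlated with the true channel than the CSI of the desired channel.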
Theorem 5.1. Given the total number of bits allocated to quantize all the channels seen by one UE, Btot , the optimum number of bits assigned to the desired
channel at the k-th user, Bk , at low SNR is given by

1
1
Nbs
(Nbs 1)|K|
Btot

log2 k
Bk =
(jk (kj )2 ) |K| ,
|K| + 1
|K| + 1
Nbs 1
jK

for $N_{\mathrm{bs}} > K$, and $B_k = 0$ for $N_{\mathrm{bs}} = K$. The optimum number of bits assigned to all the interfering channels at the $k$-th user is computed as $B_{k,\mathrm{int}} = B_{\mathrm{tot}} - B_k$ [BH10].
Theorem 5.2. Given the total number of bits allocated to quantize all the channels seen by one UE, Btot , the optimum number of bits assigned to the desired
channel at the k-th user, Bk , at high SNR is given by
**
'
'
Nbs
.
Bk = (Nbs 1) log2 (|K| 1)
Nbs 1
The optimum number of bits assigned to all the interfering channels at the $k$-th user is $B_{k,\mathrm{int}} = B_{\mathrm{tot}} - B_k$ [BH10].
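Theorem 5.3. The optimum number of bits invested by UE $k$ into the quantization of the channel to the $j$-th interfering BS, $B_k^j$, is given by

$$B_k^j = \frac{B_{k,\mathrm{int}}}{|\mathcal{K}|} + (N_{\mathrm{bs}} - 1) \log_2 \left[ \frac{\alpha_k^j (\rho_k^j)^2}{\prod_{i \in \mathcal{K}} \left( \alpha_k^i (\rho_k^i)^2 \right)^{1/|\mathcal{K}|}} \right],$$

with $\sum_{j \in \mathcal{K},\, j \neq k} B_k^j = B_{k,\mathrm{int}}$, and $B_k^j = 0$ for $j \notin \mathcal{K}$, where $\mathcal{K}$ is the largest set of interferers that satisfies [BH10]

$$\log_2 \left[ \frac{\prod_{i \in \mathcal{K}} \left( \alpha_k^i (\rho_k^i)^2 \right)^{1/|\mathcal{K}|}}{\alpha_k^j (\rho_k^j)^2} \right] < \frac{B_{\mathrm{tot}}}{|\mathcal{K}| (N_{\mathrm{bs}} - 1)}.$$

Figure 5.14 Comparison of the mean data-rate at the cell-edge for different values of $B_{\mathrm{tot}}$. Bit assignments are shown corresponding to each $B_{\mathrm{tot}}$. (Axes: number of total feedback bits per UE, $B_{\mathrm{tot}}$, versus mean data rate [bit/s/Hz/cell]. Curves: equal bit allocation and proposed adaptive partitioning. Annotated bit assignments: $B_{\mathrm{tot}} = 7$: 7,0,0,0,0,0,0; $B_{\mathrm{tot}} = 14$: 6,0,4,4,0,0,0; $B_{\mathrm{tot}} = 21$: 7,0,7,7,0,0,0; $B_{\mathrm{tot}} = 28$: 6,0,11,11,0,0,0; $B_{\mathrm{tot}} = 35$: 7,0,14,14,0,0,0.)
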
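The structure of the interferer bit allocation in Theorem 5.3 can be sketched as a reverse water-filling over the interferers. The sketch below assumes that the allocation has the form of an equal share plus $(N_{\mathrm{bs}} - 1)$ times the base-2 logarithm of each interferer's strength relative to the geometric mean over the active set, where "strength" means the relative interferer SNR multiplied by the squared delay correlation. It further assumes that the active set is found by dropping the weakest interferers until no allocation is negative, using the interferer budget itself in that check. The function name, the example strengths, and the budget of 28 bits are hypothetical, and real-valued rather than integer bit counts are returned.

```python
import math

def allocate_interferer_bits(strengths, b_int, n_bs):
    """Split b_int feedback bits over the interfering channels.

    strengths[j] is the (assumed) product of relative interferer SNR and
    squared delay correlation for interferer j; all entries must be > 0.
    Returns real-valued bit counts that sum to b_int.
    """
    # Consider interferers from strongest to weakest.
    order = sorted(range(len(strengths)), key=lambda j: strengths[j], reverse=True)
    bits = [0.0] * len(strengths)
    for size in range(len(order), 0, -1):
        active = order[:size]
        mean_log = sum(math.log2(strengths[j]) for j in active) / size
        candidate = {
            j: b_int / size + (n_bs - 1) * (math.log2(strengths[j]) - mean_log)
            for j in active
        }
        # Accept the largest active set for which no allocation is negative.
        if all(b >= 0.0 for b in candidate.values()):
            for j, b in candidate.items():
                bits[j] = b
            break
    return bits

# Hypothetical example: two strong and four weak interferers, eight BS antennas.
strengths = [0.5, 0.4, 0.01, 0.005, 0.002, 0.001]
bits = allocate_interferer_bits(strengths, b_int=28.0, n_bs=8)
print(bits)
```

In this example the weak interferers receive no bits at all, which matches the qualitative behavior reported for the adaptive partitioning in Fig. 5.14, where only the two strong interferers are quantized.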
For numerical evaluation, we consider a seven-cell system, i.e. $M = K = 7$. Each BS has eight antennas ($N_{\mathrm{bs}} = 8$) and each user has a single antenna. The system setup is based on the urban micro-cell propagation scenario in the 3GPP spatial channel model (SCM). The inter-site distance (ISD) is assumed to be 800 m. The path-loss between the BSs and the UE is modeled using the COST 231 Walfisch-Ikegami non line-of-sight (NLOS) model, adopted for urban micro-cells. Using a carrier frequency of 1.9 GHz, BS and UE heights of 12.5 m and 1.5 m, respectively, a building height of 12 m, a building-to-building distance of 50 m and a street width of 25 m, the path-loss in dB from BS $m$ to a UE $k$ at a height of 1.5 m is given as

$$PL_k^m\,[\mathrm{dB}] = 34.53 + 38 \log_{10}(d_k^m), \tag{5.30}$$

where $d_k^m$ denotes the distance from UE $k$ to BS $m$. The transmit power is $E_s = 33$ dBm for all BSs, and the noise power is $-114$ dBm. We also model the delays associated with the feedback and backhaul links to be one and two frames, respectively. Note that the $\alpha_k^m$ parameters are obtained as the difference of the path-losses from the desired and interfering BSs, i.e. $\alpha_k^m\,[\mathrm{dB}] = PL_k^k\,[\mathrm{dB}] - PL_k^m\,[\mathrm{dB}]$.
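The link-budget quantities of this setup can be reproduced with a few lines. The sketch below evaluates the path-loss (5.30), the resulting average SNR of the desired signal, and the relative interferer strength as a path-loss difference. It assumes that the distance in (5.30) is measured in metres, and the two example distances of 300 m and 500 m are hypothetical.

```python
import math

TX_POWER_DBM = 33.0      # transmit power per BS, from the text
NOISE_POWER_DBM = -114.0 # noise power, from the text

def path_loss_db(distance_m):
    """Path-loss of (5.30) in dB; the distance is assumed to be in metres."""
    return 34.53 + 38.0 * math.log10(distance_m)

def desired_snr_db(distance_m):
    """Average received SNR in dB of a link of the given length."""
    return TX_POWER_DBM - path_loss_db(distance_m) - NOISE_POWER_DBM

def relative_interference_db(d_serving_m, d_interferer_m):
    """Interferer strength relative to the desired signal, in dB."""
    return path_loss_db(d_serving_m) - path_loss_db(d_interferer_m)

# Hypothetical geometry: serving BS at 300 m, one interfering BS at 500 m.
snr_db = desired_snr_db(300.0)
alpha_db = relative_interference_db(300.0, 500.0)
alpha_linear = 10.0 ** (alpha_db / 10.0)
print(snr_db, alpha_db, alpha_linear)
```

A UE that is closer to its serving BS than to the interferer obtains a negative value in dB, i.e. a linear ratio below one, in line with the range assumed for this parameter in the system model.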
It is seen from Fig. 5.14 that the limited feedback technique in this section outperforms equal-bit allocation for all $B_{\mathrm{tot}}$, with an improvement in data rate of about 40 % at $B_{\mathrm{tot}} = 35$. At $B_{\mathrm{tot}} = 7$, the desired channel is given all 7 bits, while at $B_{\mathrm{tot}} = 35$, the two strong interfering channels are assigned 14 bits each. Equal-bit allocation, in contrast, sees an increase in the feedback bits for the strong interfering channels from 1 to 5 bits per channel. Quantizing the strong interferers more finely at the cost of allocating zero bits to the weak channels leads to the significant improvement in data rates achieved by the proposed algorithm.

5.3.3 Multiple Receive Antennas at the Terminal


So far, we have assumed a single receive antenna at each UE, so that only transmit beamforming vectors need to be designed. In this section, we consider multiple receive antennas at the UE, which give more degrees of freedom for the joint design of beamforming vectors. Before explaining multi-cell coordinated beamforming with multiple receive antennas, let us first study a jointly optimized beamforming algorithm for the multiple-input multiple-output (MIMO) broadcast channel (BC) [GKH+07]. It will help us understand multi-cell coordinated beamforming with multiple antennas at the UE side, which will be introduced later.
Consider a multi-user MIMO (MU-MIMO) system with one BS with $N_{\mathrm{bs}}$ antennas, and $N_{\mathrm{ue}}$ receive antennas for each of the $K$ users. The channel between the BS and UE $k$ is represented by an $N_{\mathrm{bs}} \times N_{\mathrm{ue}}$ matrix $\mathbf{H}_k$. Let $x_k$ be the symbol to be transmitted to the $k$-th UE, assuming only one stream per UE, and $\mathbf{n}_k$ be the additive white Gaussian noise vector of size $N_{\mathrm{ue}} \times 1$ seen by UE $k$. Let $\mathbf{w}_k$ and $\mathbf{g}_k$ be the transmit beamforming vector and the receive combining vector for the $k$-th UE, respectively. Then, the signal at the $k$-th UE after equalization is given by

$$\hat{x}_k = \sqrt{P}\, \mathbf{g}_k^H \mathbf{H}_k^H \mathbf{w}_k x_k + \sqrt{P}\, \mathbf{g}_k^H \mathbf{H}_k^H \sum_{j \neq k} \mathbf{w}_j x_j + \mathbf{g}_k^H \mathbf{n}_k, \tag{5.31}$$


where $P$ is the transmit power. The transmit beamforming vector is chosen in the null-space of $\mathbf{g}_j^H \mathbf{H}_j^H$ ($j \neq k$), that is $\mathbf{g}_j^H \mathbf{H}_j^H \mathbf{w}_k = 0$ (the zero inter-user interference constraint). Assume that the receive beamforming vectors are maximum ratio combining (MRC) filters, given by $\mathbf{g}_k = \mathbf{H}_k^H \mathbf{w}_k$. This is a reasonable design (though not necessarily the only one and not the optimal one under residual, spatially colored interference) since it achieves a sum-rate very close to capacity under the zero inter-user interference constraint [CMJH08]. In fact, the achievable sum-rate can be slightly enhanced, especially in the low SNR regime, by using minimum mean square error (MMSE)-type beamforming/combining, as investigated in Section 10.2. This method requires, however, knowledge of all users' noise variances at the BS, and hence additional feedback. For simplicity, we focus in this section on the zero inter-user interference constraint.
To find the transmit beamforming and the receive combining vectors, a simple iterative algorithm was proposed [CMIH08] based on the assumption of MRC at the UE, i.e. where transmit and receive filters are tied to each other. For an exemplary case of 2 users ($K = 2$), the algorithm may be summarized as follows: the two transmit beamforming vectors ($\mathbf{w}_k$, where $k = 1$ or $2$) are initialized to some random vectors or to the principal singular vectors of the channel matrices $\mathbf{H}_1$ and $\mathbf{H}_2$. Then, the following operations are repeated with increasing $i$ (iteration index) until a stopping criterion is satisfied:

$$\bar{\mathbf{H}}^{(i)} = \begin{bmatrix} \left( \mathbf{H}_1 \mathbf{H}_1^H \mathbf{w}_1^{(i)} \right)^H \\ \left( \mathbf{H}_2 \mathbf{H}_2^H \mathbf{w}_2^{(i)} \right)^H \end{bmatrix}, \qquad \mathbf{W}^{(i+1)} = \left( \bar{\mathbf{H}}^{(i)} \right)^{-1}, \tag{5.32}$$

where $\mathbf{W}^{(i)} = [\mathbf{w}_1^{(i)}\ \mathbf{w}_2^{(i)}]$, and $\mathbf{w}_k^{(i)}$ is the transmit beamforming column-vector for the $k$-th UE at the $i$-th iteration, without normalization. The BS repeats this procedure until the change in $\mathbf{w}_k^{(i)}$ is sufficiently small, i.e. $\|\mathbf{w}_k^{(i)} - \mathbf{w}_k^{(i-1)}\| < \epsilon$, where $\epsilon$ is an arbitrarily small number.
The convergence of the iterative update algorithm is not guaranteed. Nevertheless, it typically converges with a small $\epsilon$ in almost all trial cases with more than 20 iterations for two transmit antenna systems [CMIH08]. This algorithm, however, may also affect the system's stability because, at times, the iterative algorithm converges very slowly. To resolve this issue, non-iterative coordinated beamforming algorithms have been proposed [CMJH08]. The authors in [CMJH08] used two methods (power iteration and generalized eigen analysis) to find the optimal transmit/receive beamforming vectors for two transmit antenna systems. Since we now consider a two-user system, we need to solve the following optimization problem:

$$\begin{aligned} (\mathbf{w}_{1,\mathrm{opt}}, \mathbf{w}_{2,\mathrm{opt}}) = \arg \max_{\mathbf{w}_1: \|\mathbf{w}_1\| = 1,\ \mathbf{w}_2: \|\mathbf{w}_2\| = 1} \ & \left\{ \log_2 \left( 1 + |\mathbf{w}_1^H \mathbf{R}_1 \mathbf{w}_1| \right) + \log_2 \left( 1 + |\mathbf{w}_2^H \mathbf{R}_2 \mathbf{w}_2| \right) \right\} \\ \text{s.t.} \ & |\mathbf{w}_1^H \mathbf{R}_1 \mathbf{w}_2| = |\mathbf{w}_2^H \mathbf{R}_2 \mathbf{w}_1| = 0, \end{aligned} \tag{5.33}$$

where $\mathbf{R}_1$ and $\mathbf{R}_2$ are the $N_{\mathrm{bs}} \times N_{\mathrm{bs}}$ normalized matched channel matrices defined by $\mathbf{R}_k = \mathbf{H}_k \mathbf{H}_k^H / \|\mathbf{H}_k\|_F^2$, and $\mathbf{w}_1$, $\mathbf{w}_2$ are the transmit beamforming vectors of size $N_{\mathrm{bs}} \times 1$.
Theorem 5.4. If $N_{\mathrm{bs}} = 2$, $N_{\mathrm{ue}} \geq 2$ and $\mathbf{R}_1$ and $\mathbf{R}_2$ are both invertible, then the following claim holds. If (non-zero) transmit beamforming vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ satisfy the zero inter-user interference conditions, i.e.

$$\mathbf{w}_1^H \mathbf{R}_1 \mathbf{w}_2 = 0, \qquad \mathbf{w}_2^H \mathbf{R}_2 \mathbf{w}_1 = 0,$$

then $\mathbf{w}_1$, $\mathbf{w}_2$ are the generalized eigenvectors of $(\mathbf{R}_1, \mathbf{R}_2)$, which means

$$\mathbf{R}_1 \mathbf{w}_1 = \lambda_1 \mathbf{R}_2 \mathbf{w}_1, \qquad \mathbf{R}_2 \mathbf{w}_2 = \lambda_2 \mathbf{R}_1 \mathbf{w}_2$$

for some scalars $\lambda_1$ and $\lambda_2$ [CMJH08].
Theorem 5.4 means that for $N_{\mathrm{bs}} = 2$, any zero inter-user interference solution is a generalized eigenvector of $\mathbf{R}_1$ and $\mathbf{R}_2$.
Theorem 5.5. If $\mathbf{t}_m$, $\mathbf{t}_n$ are generalized eigenvectors of $(\mathbf{R}_1, \mathbf{R}_2)$ and they correspond to distinct eigenvalues, then any $\mathbf{t}_m$, $\mathbf{t}_n$ satisfy the zero inter-user interference constraint, where $m, n = 1, 2, \ldots$, up to the number of generalized eigenvectors, and $m \neq n$. In other words, for a two-user system, any set of generalized eigenvectors of $(\mathbf{R}_1, \mathbf{R}_2)$ satisfies the zero inter-user interference condition [CMJH08].
From Theorem 5.5, it is clear that the generalized eigenvectors of $\mathbf{R}_1$ and $\mathbf{R}_2$ satisfy the zero inter-user interference constraint in (5.33). Note that this solution is not sum-capacity optimal for arbitrary antenna configurations. The idea here is to use the transmit beamforming vectors shown in Theorems 5.4 and 5.5 to obtain zero inter-user interference even for systems with more than two antennas.
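The statement of Theorem 5.5 can be checked numerically for the two-antenna case. The sketch below builds the normalized matched channel matrices of two hypothetical 2x2 channels, computes the generalized eigenvalues in closed form from the quadratic equation det(R1 - lambda R2) = 0, and verifies that the two generalized eigenvectors cause no inter-user interference. The channel entries and helper names are assumptions for illustration, and the closed-form approach only applies to 2x2 matrices.

```python
import cmath

def matched(h):
    """Normalized matched channel matrix R = H H^H / ||H||_F^2 (2x2 lists)."""
    fro2 = sum(abs(x) ** 2 for row in h for x in row)
    return [[sum(h[i][k] * h[j][k].conjugate() for k in range(2)) / fro2
             for j in range(2)] for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def generalized_eig(r1, r2):
    """Solve R1 t = lam R2 t for 2x2 matrices; returns two (lam, t) pairs."""
    a = det(r2)
    b = -(r1[0][0] * r2[1][1] + r2[0][0] * r1[1][1]
          - r1[0][1] * r2[1][0] - r2[0][1] * r1[1][0])
    c = det(r1)
    root = cmath.sqrt(b * b - 4 * a * c)
    pairs = []
    for lam in ((-b + root) / (2 * a), (-b - root) / (2 * a)):
        m = [[r1[i][j] - lam * r2[i][j] for j in range(2)] for i in range(2)]
        # m is singular: a vector orthogonal to its larger row spans the null space.
        row = max(m, key=lambda r: abs(r[0]) + abs(r[1]))
        pairs.append((lam, [row[1], -row[0]]))
    return pairs

def form(x, r, y):
    """Evaluate x^H R y for 2-element vectors and a 2x2 matrix."""
    ry = [r[i][0] * y[0] + r[i][1] * y[1] for i in range(2)]
    return x[0].conjugate() * ry[0] + x[1].conjugate() * ry[1]

# Hypothetical channel matrices (N_bs = N_ue = 2).
h1 = [[1 + 0.5j, 0.2 - 0.3j], [-0.4 + 0.1j, 0.8 + 0.6j]]
h2 = [[0.3 - 0.9j, 1.1 + 0.2j], [0.7 + 0.4j, -0.5 + 0.5j]]
r1, r2 = matched(h1), matched(h2)

(lam1, t1), (lam2, t2) = generalized_eig(r1, r2)
leak_1 = abs(form(t1, r1, t2))   # interference term w1^H R1 w2
leak_2 = abs(form(t2, r2, t1))   # interference term w2^H R2 w1
print(lam1, lam2, leak_1, leak_2)
```

Assigning one generalized eigenvector to each user therefore satisfies both zero-interference conditions simultaneously, up to numerical precision.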
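This algorithm can be generalized for a case of three BSs with one transmit antenna each and multiple UEs with multiple antennas [CKH10]. The condition that there is no inter-user interference between UE 1 and UEs 2 and 3 is equivalent to

$$\mathbf{w}_1^H \mathbf{R}_1 \mathbf{w}_2 = 0 = \mathbf{w}_2^H \mathbf{R}_2 \mathbf{w}_1, \tag{5.34}$$

$$\mathbf{w}_1^H \mathbf{R}_1 \mathbf{w}_3 = 0 = \mathbf{w}_3^H \mathbf{R}_3 \mathbf{w}_1. \tag{5.35}$$

In the case when $\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_2 \mathbf{w}_1 \neq \mathbf{0}$, where $\times$ denotes the cross-product defined in [CKH10], Lemma 2, 3) in [CKH10] asserts that (5.34) is equivalent to

$$\mathbf{w}_2 = \mu\, (\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_2 \mathbf{w}_1), \quad \mu \in \mathbb{C},$$

similarly as in the proof of Theorem 5 in [CKH10]. For the same reason, if $\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_3 \mathbf{w}_1 \neq \mathbf{0}$, (5.35) is equivalent to

$$\mathbf{w}_3 = \nu\, (\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_3 \mathbf{w}_1), \quad \nu \in \mathbb{C}.$$

Now the BSs need to nullify the inter-user interference between UE 2 and UE 3. That is,

$$\begin{aligned} \mathbf{w}_2^H \mathbf{R}_2 \mathbf{w}_3 &= 0 = \mathbf{w}_3^H \mathbf{R}_3 \mathbf{w}_2 \\ \mathbf{R}_2 \mathbf{w}_2 \cdot \mathbf{w}_3 &= 0 = \mathbf{w}_2 \cdot \mathbf{R}_3 \mathbf{w}_3 \\ \mathbf{w}_2 \cdot \mathbf{R}_2 \mathbf{w}_3 &= 0 = \mathbf{w}_2 \cdot \mathbf{R}_3 \mathbf{w}_3. \end{aligned} \tag{5.36}$$

Again by Lemma 2 in [CKH10], (5.36) is equivalent to

$$\mathbf{w}_2 = \mu'\, (\mathbf{R}_2 \mathbf{w}_3 \times \mathbf{R}_3 \mathbf{w}_3), \quad \mu' \in \mathbb{C}, \tag{5.37}$$

as long as $\mathbf{R}_2 \mathbf{w}_3 \times \mathbf{R}_3 \mathbf{w}_3 \neq \mathbf{0}$.
By expressing $\mathbf{w}_2$ and $\mathbf{w}_3$ in terms of $\mathbf{w}_1$, (5.37) is equivalent to the following:

$$\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_2 \mathbf{w}_1 \ \parallel\ \mathbf{R}_2 (\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_3 \mathbf{w}_1) \times \mathbf{R}_3 (\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_3 \mathbf{w}_1). \tag{5.38}$$

Here, $\mathbf{x} \parallel \mathbf{y}$ denotes that the complex vectors $\mathbf{x}$ and $\mathbf{y}$ are parallel.
Define two functions $\phi, \psi : \mathbb{C}^3 \to \mathbb{C}^3$ as

$$\phi(\mathbf{w}) = \mathbf{R}_1 \mathbf{w} \times \mathbf{R}_2 \mathbf{w},$$

$$\psi(\mathbf{w}) = \mathbf{R}_2 (\mathbf{R}_1 \mathbf{w} \times \mathbf{R}_3 \mathbf{w}) \times \mathbf{R}_3 (\mathbf{R}_1 \mathbf{w} \times \mathbf{R}_3 \mathbf{w}),$$

where the components of $\phi$ and $\psi$ are polynomials of degrees 2 and 4, respectively, in the components of $\mathbf{w}$. By applying Lemma 2, 3 in [CKH10], we see that solving (5.38), with the restriction that $\phi(\mathbf{w}_1) \neq \mathbf{0}$ and $\psi(\mathbf{w}_1) \neq \mathbf{0}$, is equivalent to finding a solution $\mathbf{w}_1$ of the following equation:

$$\phi(\mathbf{w}_1) \times \psi(\mathbf{w}_1) = \mathbf{0}.$$

Note that $\psi(\mathbf{w}_1) \neq \mathbf{0}$ implies $\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_3 \mathbf{w}_1 \neq \mathbf{0}$. Therefore, we can find all possible $(\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3)$ satisfying the no inter-user interference condition in generic (non-singular) cases, as described below.
Theorem 5.6. Under the non-singular hypothesis $\phi(\mathbf{w}_1) \neq \mathbf{0}$ and $\psi(\mathbf{w}_1) \neq \mathbf{0}$, no inter-user interference is achieved by $(\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3)$ if and only if

$$\phi(\mathbf{w}_1) \times \psi(\mathbf{w}_1) = \mathbf{0}, \qquad \mathbf{w}_2 = \mu\, \phi(\mathbf{w}_1), \qquad \mathbf{w}_3 = \nu\, (\mathbf{R}_1 \mathbf{w}_1 \times \mathbf{R}_3 \mathbf{w}_1)$$

for some $\mu, \nu \in \mathbb{C}$.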
Extension to the Two-Cell Case
While considering multiple antennas per UE, we have so far constrained ourselves to the case of one BS (hence observing a BC). We now extend this to the two-cell case¹. Note that the essential difference is that only the antennas connected to one BS may be used for the transmission towards one particular UE. Otherwise, the BSs would also have to exchange the data to be transmitted to the terminals, resembling a joint signal processing CoMP scheme, which will be investigated in Sections 6.3 and 6.4. Let us initially focus on a two-cell MIMO system as shown in Fig. 5.15, where two BSs serve two UEs equipped with more than one receive antenna. As usual, the channel between BS $m$ and UE $k$ is denoted as $\mathbf{H}_k^m$. The received signal after equalization at each UE $k$ can be written as

$$\begin{aligned} \hat{x}_k &= \sqrt{\tfrac{P}{2}}\, \mathbf{g}_k^H (\mathbf{H}_k^k)^H \mathbf{w}_k x_k + \sqrt{\tfrac{P}{2}}\, \mathbf{g}_k^H (\mathbf{H}_k^j)^H \mathbf{w}_j x_j + \mathbf{g}_k^H \mathbf{n}_k \\ &= \underbrace{\sqrt{\tfrac{P}{2}}\, \frac{\mathbf{w}_k^H \mathbf{H}_k^k (\mathbf{H}_k^k)^H \mathbf{w}_k}{\|(\mathbf{H}_k^k)^H \mathbf{w}_k\|}\, x_k}_{\text{Desired signal}} + \underbrace{\sqrt{\tfrac{P}{2}}\, \frac{\mathbf{w}_k^H \mathbf{H}_k^k (\mathbf{H}_k^j)^H \mathbf{w}_j}{\|(\mathbf{H}_k^k)^H \mathbf{w}_k\|}\, x_j}_{\text{Other-cell interference}} + \frac{\mathbf{w}_k^H \mathbf{H}_k^k \mathbf{n}_k}{\|(\mathbf{H}_k^k)^H \mathbf{w}_k\|}, \end{aligned} \tag{5.39}$$

¹ An optimal $M$-cell coordinated beamforming algorithm with a zero inter-cell interference constraint is still unknown, thus in this section we mostly focus on a two-cell system.

Figure 5.15 Two-user MIMO interference channel.


where $j \neq k$, $P$ is the total transmit power and MRC is again assumed at the UE, i.e. $\mathbf{g}_k = (\mathbf{H}_k^k)^H \mathbf{w}_k / \|(\mathbf{H}_k^k)^H \mathbf{w}_k\|$. The design goal is then to maximize the desired signal term and to remove the other-cell interference term in (5.39). Thus we introduce an interference-aware coordinated beamforming with MRC algorithm that satisfies the following condition:

$$\begin{aligned} \mathbf{g}_1^H (\mathbf{H}_1^2)^H \mathbf{w}_2 &= 0 = \mathbf{g}_2^H (\mathbf{H}_2^1)^H \mathbf{w}_1 \\ \mathbf{w}_1^H \mathbf{H}_1^1 (\mathbf{H}_1^2)^H \mathbf{w}_2 &= 0 = \mathbf{w}_2^H \mathbf{H}_2^2 (\mathbf{H}_2^1)^H \mathbf{w}_1, \end{aligned} \tag{5.40}$$

which implies that the other-cell interference term in (5.39) is perfectly removed; at the same time, the proposed system maximizes the desired effective channel gain $|\mathbf{g}_k^H (\mathbf{H}_k^k)^H \mathbf{w}_k|^2$ by using MRC. Note that it can be guaranteed that there is no inter-user interference thanks to the transmit beamforming vectors. Since we are considering a two-cell environment, this can be interpreted as the two-user MIMO interference channel (IC) illustrated in Fig. 5.15 [CHHT10].
Theorem 5.7. Under the zero other-cell interference constraint in (5.40), the sufficient and necessary beamforming vectors with MRC for UE $k$ (where $k, j$ are 1 or 2, $j \neq k$) are generalized eigenvectors of $\mathbf{H}_k^k (\mathbf{H}_k^j)^H$ and $\mathbf{H}_j^k (\mathbf{H}_j^j)^H$.
Theorem 5.8. Given $\mathbf{w}_j$, where each BS has two transmit antennas, the sufficient and necessary beamforming vector (unique up to complex multiplications) for UE $k$, $\mathbf{w}_k$, can be expressed as

$$\mathbf{w}_k = \begin{pmatrix} \bar{z}_2 \\ -\bar{z}_1 \end{pmatrix} \quad \text{or} \quad \mathbf{w}_k = \begin{pmatrix} -\bar{z}_2 \\ \bar{z}_1 \end{pmatrix}, \tag{5.41}$$

where

$$\mathbf{z} = \mathbf{H}_k^k (\mathbf{H}_k^j)^H \mathbf{w}_j = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},$$

and $\bar{z}_i$ is the complex conjugate of $z_i$.

Figure 5.16 Sum-rate comparisons as a function of SINR, where $N_{\mathrm{bs}} = N_{\mathrm{ue}} = 2$. Each BS has the same transmit power $P/2$, where $P$ is the total transmit power. (Axes: SINR [dB] versus sum rate [bit/channel use]. Curves: non-cooperative eigen-beamforming; ICIN, eigen-beamforming, perfect CSI; CBF, perfect CSI; CBF with LFB, Q = 6 bits per user.)


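To enable practical implementation, we also introduce two simple channel quantization methods. The normalized matched channel matrices can be written, without the user index, as

$$\mathbf{R}_H = \frac{\mathbf{H}_k^k (\mathbf{H}_k^k)^H}{\|\mathbf{H}_k^k\|_F^2} = \begin{pmatrix} R_{H,11} & R_{H,12} \\ R_{H,21} & R_{H,22} \end{pmatrix} \quad \text{and} \quad \mathbf{R}_G = \frac{\mathbf{H}_k^j (\mathbf{H}_k^j)^H}{\|\mathbf{H}_k^j\|_F^2} = \begin{pmatrix} R_{G,11} & R_{G,12} \\ R_{G,21} & R_{G,22} \end{pmatrix}.$$

Since $\mathbf{R}_H$ and $\mathbf{R}_G$ are Hermitian matrices with unit trace, we use the following properties to design the scalar quantization:

$$R_{H,11} + R_{H,22} = 1 \ \text{ and } \ R_{G,11} + R_{G,22} = 1, \qquad R_{H,12}^H = R_{H,21} \ \text{ and } \ R_{G,12}^H = R_{G,21}.$$

Therefore, only six scalars need to be quantized for computing the transmit beamforming and the receive combining vectors, namely $R_{H,11}$, $R_{G,11}$, $\mathrm{Re}\{R_{H,12}\}$, $\mathrm{Re}\{R_{G,12}\}$, $\mathrm{Im}\{R_{H,12}\}$, and $\mathrm{Im}\{R_{G,12}\}$, where $\mathrm{Re}\{\cdot\}$ and $\mathrm{Im}\{\cdot\}$ denote the real and imaginary part, respectively. Alternatively, $\mathbf{R}_H$ and $\mathbf{R}_G$ can be quantized using vector quantization of the vectors

$$\mathbf{v}_H = \begin{pmatrix} R_{H,11} \\ \mathrm{Re}\{R_{H,12}\} \\ \mathrm{Im}\{R_{H,12}\} \end{pmatrix} \quad \text{and} \quad \mathbf{v}_G = \begin{pmatrix} R_{G,11} \\ \mathrm{Re}\{R_{G,12}\} \\ \mathrm{Im}\{R_{G,12}\} \end{pmatrix}.$$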

Upon receiving the quantized values from the UEs over a control channel, the BSs can estimate $\mathbf{R}_H$ and $\mathbf{R}_G$ and compute the transmit beamforming vectors before transmitting the data.
Fig. 5.16 shows the achievable sum-rate results for i) coordinated beamforming, ii) non-cooperative eigen-beamforming, and iii) the interference nulling algorithms introduced in [CHHT10]. For this figure, we model the elements of each UE's channel matrix as independent complex Gaussian random variables with zero mean and unit variance, $\mathcal{N}_{\mathbb{C}}(0, 1)$. Note that the algorithm introduced does not depend on the channel model. Once the BSs know all channel matrices, the transmit beamforming and receive combining vectors can be computed through Theorems 5.7 and 5.8. As can be seen from Fig. 5.16, the coordinated beamforming algorithm shows reasonably good sum-rate performance compared with the other solutions regardless of the SNR. Note that the limited feedback variant of the coordinated beamforming algorithm in the figure uses 6 bits of feedback per user, i.e. 3 bits each for $\mathbf{R}_H$ and $\mathbf{R}_G$.

5.3.4 Summary

In this section, we presented some recent results in downlink coordinated beamforming, which is important for interference management in upcoming cellular standards like 3GPP LTE-A. We described the details of several strategies, distinguishing whether each terminal is equipped with one or multiple receive antennas. We also presented simulation results for these strategies to illustrate the potential gains that can be obtained from this kind of CoMP. From an LTE-Advanced point of view, the most likely scenario is to have BSs with 2 or 4 transmit antennas and UEs with 2 receive antennas, so that the solutions introduced in this section would be good candidates for LTE-A systems. A particular implementation of coordinated beamforming will be described and simulated in Section 14.4.3.

CoMP Schemes Based on Multi-Cell Joint Signal Processing

In this chapter, we focus on CoMP schemes where user data or received signals connected to multiple users are exchanged between base stations for joint signal processing. Such schemes promise larger spectral efficiency gains than pure interference coordination techniques, but typically come at the price of larger backhaul requirements and (particularly in the downlink) more severe synchronization requirements. After Sections 6.1 and 6.2 introduce centralized and decentralized uplink CoMP schemes, respectively, Sections 6.3 and 6.4 focus on the downlink.

6.1 Uplink Centralized Joint Detection


Chenyang Yang, Yafei Tian and Andreas F. Molisch
In this section, uplink centralized joint detection with infinite or limited backhaul capacity is studied, as already introduced in Section 4.3.1. A cluster of base stations (BSs) sends either raw or preprocessed receive signals from the user equipments (UEs) to a CoMP central unit (CCU), where joint processing is performed to deal with inter-cell interference. The CCU can be a separate entity, or any of the BSs involved in the cooperation. This is practical in systems such as LTE-A that employ a flat network architecture. The detection algorithms are applicable to various static or dynamic clustering schemes as described in Chapter 7.

6.1.1 Introduction

When multiple BSs are connected via perfect backhaul links with infinite capacity, uplink centralized joint detection resembles a multiple access channel (MAC) problem, where the CCU is a super-receiver, and the BSs form a distributed antenna system (DAS) [Mol01]. Consequently, various optimal or suboptimal multi-user detectors, such as the maximum likelihood (ML) detector, the linear minimum mean square error (LMMSE) detector, the MMSE detector with successive interference cancelation (SIC) or parallel interference cancelation (PIC), and iterative detectors based on the Turbo principle, can be used for joint detection. By removing the inter-cell interference to a certain extent and obtaining significant array gain and diversity gain as shown in Chapter 4, the system spectral efficiency can be significantly increased [Mol01, DP03].
When joint detection is implemented in practice, the received signals at the cooperative BSs need to be quantized and then forwarded via the backhaul links to the CCU for centralized processing. This entails requirements for large backhaul capacity, typically on the order of Mbps or even Gbps [MF07b, HFG09]. In existing or upcoming systems such as LTE, the backhaul capacity is typically limited. Considering realistic backhaul constraints, we can transfer the locally demapped signals or soft-decoded information to the CCU [FHG09], or forward the locally compressed receive signals, exploiting the correlation inherent in the messages of the UEs [MF09b].
One aspect of uplink CoMP lies in the fact that the signals received by multiple BSs are correlated, which leads to redundancy. This implies that each BS can exploit this signal correlation and compress its received signals, then transmit the compressed signals to the CCU, which reduces the information that needs to be exchanged via the backhaul links. The CCU then decompresses the signals with its own received signal as side information (if it is a BS itself), and finally estimates the messages of the UEs [dCS09].
Another inherent feature of CoMP systems is their asymmetric channels, i.e. the average channel powers from one UE to its local BS and to other cooperative BSs are different, and the powers from multiple UEs in different cells to one BS also differ. The channel asymmetry cannot be compensated by power control [HYK+10], which is quite different from the near-far effect in single-cell multi-user systems. This does not change the structure of the joint detector at the CCU when infinite backhaul capacity is assumed, but it has a large impact on the system performance. Moreover, when the backhaul link constraint is taken into account, this feature introduces new degrees of freedom for the design of uplink CoMP schemes. Depending on the relative locations of multiple UEs (i.e. various user pairings), the CCU may jointly detect only some UEs, and the supporting BSs may quantize, compress, decode or even partially decode their received signals to reduce the information to be transmitted to the CCU [MF08a, SSPS09a].

6.1.2 Joint Detection Algorithms

In the following, we introduce and discuss several joint signal processing algorithms at the CCU, including ML detection, LMMSE detection, MMSE detection with SIC and PIC, and Turbo detection. The computational complexity of the algorithms will be analyzed. Simulation results are provided to compare the detection performance of these algorithms. To show the benefit of centralized joint detection versus non-cooperative detection in various typical settings, we will also provide simulation results on the system sum-rate. Throughout the remainder of this section, we will focus on a frequency-flat channel, which could be a single sub-carrier of an orthogonal frequency division multiple access (OFDMA) system, as introduced in Chapter 3.
We consider a cluster of $M$ cells, where each cell involves one BS with $N_{\mathrm{bs}}$ receive antennas and one UE with $N_{\mathrm{ue}}$ transmit antennas. Multiple UEs from different cells operate on the same time/frequency resource, and the received signals at the BSs from $K = M$ UEs in different cells are superimposed. As in Chapter 3, we assume that all entities are perfectly synchronized in time and frequency (the impact of imperfect synchronization is discussed in detail in Sections 8.2 and 8.3), and the received signals are transferred to the CCU for joint detection after BS local processing. The signals from $M$ BSs gathered at the CCU can be expressed as

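$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} = \mathbf{H} \underbrace{\begin{bmatrix} \mathbf{G}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{G}_K \end{bmatrix}}_{\mathbf{G}} \mathbf{x} + \mathbf{n}_o + \mathbf{n}_p, \tag{6.1}$$

where $\mathbf{y} = [\mathbf{y}_1^T, \cdots, \mathbf{y}_M^T]^T$, and $\mathbf{y}_m$ are the forwarded symbols from BS $m$ with local processing errors, $\mathbf{s}$ and $\mathbf{x}$ are the transmitted symbols after and before precoding, $\mathbf{H}$ is the composite channel matrix, $\mathbf{G}$ is the (block-diagonal) precoding matrix, $\mathbf{n}_o$ denotes the noise vector at all $M$ BSs, including the thermal noise at the receiver front-end and the inter-cluster interference from the cells outside of the cooperative cluster, and $\mathbf{n}_p$ denotes the local processing errors at the BSs due to quantization or compression.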

Assuming that the CCU knows the channel matrix $\mathbf{H}$ and the precoding matrix $\mathbf{G}$ perfectly, the system capacity region with unlimited backhaul is described in (3.4). We can use (3.4) to calculate the block error rate in fading channels, which can serve as an achievable lower bound for practical systems. For given modulation and coding schemes, the transmission data rate is known. If for one channel realization the rate is larger than the sum capacity computed from (3.4), then the data block cannot be decoded correctly, i.e. an outage occurs.
The ML detection algorithm finds the most probable transmitted symbols, which, under the assumption of Gaussian noise, minimize the Euclidean distance between the received signals and all possible received symbols,

$$\hat{\mathbf{x}} = \arg \min_{\mathbf{x}} \|\mathbf{y} - \mathbf{H}\mathbf{G}\mathbf{x}\|^2. \tag{6.2}$$

ML detection is optimal if the transmitted symbols are equally probable, but is not feasible for most applications due to its prohibitive complexity. Suppose that each UE transmits $N_{\mathrm{ue}}$ streams and $M_s$-QAM is used, then the detection complexity at the CCU is on the order of $O(M_s^{N_{\mathrm{ue}} K})$. For example, when $M_s = 4$, $N_{\mathrm{ue}} = 2$, and $K = 5$, the complexity would be on the order of $2^{20}$ computational steps per channel access, which is beyond the capability of current systems.
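The exhaustive search behind (6.2) and its complexity can be made concrete with a small sketch. It enumerates all candidate symbol vectors for a toy setting with two single-stream UEs and QPSK, and counts the hypotheses for the example parameters quoted above. The effective channel matrix (standing in for the product of channel and precoding matrices), the noise samples, and the function names are hypothetical.

```python
import itertools

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def ml_detect(y, h_eff, alphabet, n_streams):
    """Exhaustive ML search: minimize ||y - H_eff x||^2 over all candidates."""
    best, best_metric = None, float("inf")
    for cand in itertools.product(alphabet, repeat=n_streams):
        metric = 0.0
        for i, row in enumerate(h_eff):
            residual = y[i] - sum(hij * xj for hij, xj in zip(row, cand))
            metric += abs(residual) ** 2
        if metric < best_metric:
            best, best_metric = cand, metric
    return best

def num_hypotheses(m_s, n_ue, k):
    """Number of candidate vectors for M_s-QAM, N_ue streams per UE, K UEs."""
    return m_s ** (n_ue * k)

# Toy example: two BS antennas in total, two UEs with one stream each.
h_eff = [[1.0 + 0.2j, 0.4 - 0.3j], [0.3 + 0.5j, 0.9 - 0.1j]]
x_true = (1 - 1j, -1 + 1j)
noise = [0.05 - 0.02j, -0.03 + 0.04j]
y = [sum(hij * xj for hij, xj in zip(row, x_true)) + n
     for row, n in zip(h_eff, noise)]

x_hat = ml_detect(y, h_eff, QPSK, n_streams=2)
print(x_hat, num_hypotheses(4, 2, 5))
```

The toy search visits only 16 candidates, whereas the example in the text requires more than a million metric evaluations per channel use, which motivates the reduced-complexity detectors discussed next.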

84

CoMP Schemes Based on Multi-Cell Joint Signal Processing

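To reduce the complexity of ML detection, sphere detection can be applied. It
only searches the candidates for which the distance $\|\mathbf{y} - \mathbf{H}\mathbf{G}\mathbf{x}\|^2$ is less than a given
radius, leading to a complexity roughly cubic in $N_{ue}K$ [HTB03]. The LMMSE
detection algorithm is a widely used alternative. It minimizes the mean square
estimation error between $\mathbf{x}$ and its estimate, which is
\[
\hat{\mathbf{x}} = \mathbf{W}^H \mathbf{y}, \tag{6.3}
\]
\[
\text{where } \mathbf{W} = \left(\mathbf{H}\mathbf{G}\mathbf{G}^H\mathbf{H}^H + \boldsymbol{\Phi}_{nn}\right)^{-1}\mathbf{H}\mathbf{G}, \tag{6.4}
\]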

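and $\boldsymbol{\Phi}_{nn}$ is the covariance matrix of $\mathbf{n}_o$ plus $\mathbf{n}_p$. The LMMSE algorithm has a
complexity on the order of $O((N_{ue}K)^2)$. For the example system setting listed
earlier, the computational cost is on the order of $10^2$ operations. Considering
that the interference of a UE to other UEs can be eliminated after its signal
has been detected, the MMSE-SIC algorithm is expected to perform better. The
MMSE-SIC algorithm is a serially concatenated MMSE detection scheme, which
detects and decodes the signals (streams) of the UEs one by one and subtracts
their interference from the remaining signals, which is
\begin{align}
\mathbf{y}_1 &= \mathbf{y}, & \hat{\mathbf{x}}_1 &= \mathbf{W}_1^H \mathbf{y}_1, \notag\\
\mathbf{y}_2 &= \mathbf{y}_1 - \mathbf{H}_1\mathbf{G}_1\, e\left(d\left(\hat{\mathbf{x}}_1\right)\right), & \hat{\mathbf{x}}_2 &= \mathbf{W}_2^H \mathbf{y}_2, \notag\\
&\;\;\vdots \notag\\
\mathbf{y}_K &= \mathbf{y}_{K-1} - \mathbf{H}_{K-1}\mathbf{G}_{K-1}\, e\left(d\left(\hat{\mathbf{x}}_{K-1}\right)\right), & \hat{\mathbf{x}}_K &= \mathbf{W}_K^H \mathbf{y}_K, \tag{6.5}
\end{align}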

where $\mathbf{H}_k$ is the channel matrix from UE $k$ to all $M$ BSs, $\mathbf{W}_k$ is the receive
filter designed for UE $k$, $d(\cdot)$ denotes decoding and $e(\cdot)$ denotes (re-)encoding.
The decoding order determines which vertex of the pentagon-shaped capacity
region as illustrated in Fig. 3.4 can be achieved. For example, we can decode
and cancel the signals in descending order of signal strength, which will improve
fairness among UEs. This algorithm can be seen as an adaptation of Horizontal
Bell Laboratories Layered Space-Time Architecture (H-BLAST) [WFGV98].
The MMSE-SIC algorithm has a complexity of the same order as the LMMSE
detection algorithm. PIC is another kind of interference cancelation scheme with
less detection latency, which cancels the interference of other UEs in parallel.
Before $\mathbf{x}_k$ is detected, the estimated signals from all other UEs are subtracted from $\mathbf{y}$. However, if the initial estimates of $\mathbf{x}_1, \ldots, \mathbf{x}_K$ are not correct,
the error propagation will be more severe than with SIC.
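The LMMSE filter of (6.4) and the successive cancelation of (6.5) can be sketched as follows. This is a simplified illustration under explicit assumptions: every UE sends a single BPSK stream with unit power, `A` stands for the effective channel HG, and the decode and re-encode step e(d(.)) is replaced by a hard symbol decision, so channel coding is left out entirely. All function names are hypothetical.

```python
import numpy as np

def lmmse_filter(A, Phi_nn):
    """W = (A A^H + Phi_nn)^{-1} A, cf. (6.4) with A = H G and unit-power symbols."""
    return np.linalg.solve(A @ A.conj().T + Phi_nn, A)

def hard_bpsk(z):
    """Stand-in for e(d(.)): a hard BPSK decision instead of decoding and re-encoding."""
    return np.where(np.real(z) >= 0, 1.0, -1.0)

def mmse_sic(y, A, Phi_nn, order=None):
    """Detect one stream at a time and subtract its reconstructed signal, cf. (6.5)."""
    K = A.shape[1]
    order = list(range(K)) if order is None else list(order)
    residual = y.astype(complex).copy()
    remaining = list(order)
    x_hat = np.zeros(K)
    for k in order:
        # The filter only has to suppress the streams that are not yet canceled.
        W = lmmse_filter(A[:, remaining], Phi_nn)
        z = W[:, remaining.index(k)].conj() @ residual
        x_hat[k] = hard_bpsk(z)
        residual = residual - A[:, k] * x_hat[k]
        remaining.remove(k)
    return x_hat

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = (rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))) / np.sqrt(2.0)
    x = np.array([1.0, -1.0, 1.0])
    Phi_nn = 1e-3 * np.eye(8)
    print(hard_bpsk(lmmse_filter(A, Phi_nn).conj().T @ (A @ x)))  # joint LMMSE
    print(mmse_sic(A @ x, A, Phi_nn, order=[2, 0, 1]))            # SIC with a chosen order
```

The `order` argument corresponds to the decoding order discussed above; sorting the UEs by received signal strength before calling `mmse_sic` would give the descending-order variant.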
PIC will have better performance if a priori information on the coded bits
is fed back from the decoder. With this probability information, soft estimates of
the transmitted symbols rather than hard decisions can be formed and canceled.
Through iteration between the detector and decoder, detection performance can
be improved significantly. This scheme belongs to Turbo detection [WP99], in
which various kinds of detectors and decoders can be applied. We will describe
an ML detector and an MMSE-PIC detector appropriate for Turbo detection in the
following [DMP04], while the decoding process itself is described in [WH02].

Extrinsic information exchange is required between the detector and the
decoder. For the ML detector, the obtained extrinsic information is
\[
L_e(c_{k,i}) = \log \frac{\sum_{\mathbf{x} \in \mathcal{X}^{+}_{k,i}} p(\mathbf{y}|\mathbf{x})\,p(\mathbf{x})}{\sum_{\mathbf{x} \in \mathcal{X}^{-}_{k,i}} p(\mathbf{y}|\mathbf{x})\,p(\mathbf{x})} - L_a(c_{k,i}), \tag{6.6}
\]
where $c_{k,i}$ is the $i$th coded bit of UE $k$, $\mathcal{X}^{+}_{k,i} = \{[\mathbf{x}_1^T, \ldots, \mathbf{x}_K^T]^T : c_{k,i} = 1\}$ and
$\mathcal{X}^{-}_{k,i} = \{[\mathbf{x}_1^T, \ldots, \mathbf{x}_K^T]^T : c_{k,i} = -1\}$, $p(\mathbf{y}|\mathbf{x})$ is a multivariate Gaussian distribution with the signal model of (6.1), $p(\mathbf{x}) = \prod_{k=1}^{K}\prod_{i=1}^{N_k} p(c_{k,i})$, $N_k$ is the number of bits modulated on the $N_{ue}$ sub-streams of UE $k$, and $L_a(c_{k,i}) = \log\left(P(c_{k,i} = 1)/P(c_{k,i} = -1)\right)$ denotes a priori information from the decoding stage.

For the MMSE-PIC detector, we first need to construct the soft estimate of
the transmitted symbols using the a priori information transferred from the
decoding stage. Since $\mathbf{x}_k$ is a mapping result of $c_{k,i}$, $i = 1, \ldots, N_k$, the soft
estimate of $\mathbf{x}_k$ is the weighted sum of all possible mapping results
$\mathbf{x}_k(c_{k,1}, \ldots, c_{k,N_k})$ with their corresponding probabilities,
\[
\bar{\mathbf{x}}_k = \sum_{c_{k,i} \in (1,0),\; i = 1, \ldots, N_k} \mathbf{x}_k(c_{k,1}, \ldots, c_{k,N_k}) \prod_{i=1}^{N_k} p(c_{k,i}). \tag{6.7}
\]
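Then, for each UE, we can subtract the estimated interference from other
UEs,
\[
\tilde{\mathbf{y}}_k = \mathbf{y} - \mathbf{H}\mathbf{G}\tilde{\mathbf{x}}_k, \tag{6.8}
\]
where $\tilde{\mathbf{x}}_k = (\bar{\mathbf{x}}_1, \ldots, \bar{\mathbf{x}}_{k-1}, \mathbf{0}, \bar{\mathbf{x}}_{k+1}, \ldots, \bar{\mathbf{x}}_K)^T$ is the estimated interference symbol vector. An MMSE filter is applied to $\tilde{\mathbf{y}}_k$ to further suppress the residual
interference plus noise, which is given by
\begin{align}
\mathbf{W}_k &= E\{\tilde{\mathbf{y}}_k \tilde{\mathbf{y}}_k^H\}^{-1} E\{\tilde{\mathbf{y}}_k \mathbf{x}_k^H\} \tag{6.9}\\
&= \left(\mathbf{H}_k\mathbf{G}_k\mathbf{G}_k^H\mathbf{H}_k^H + \bar{\mathbf{H}}_k\bar{\mathbf{G}}_k\mathbf{Q}\bar{\mathbf{G}}_k^H\bar{\mathbf{H}}_k^H + \boldsymbol{\Phi}_{nn}\right)^{-1}\mathbf{H}_k\mathbf{G}_k, \tag{6.10}
\end{align}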
where $\bar{\mathbf{H}}_k$ and $\bar{\mathbf{G}}_k$ involve all the vectors of $\mathbf{H}$ and $\mathbf{G}$ except those of $\mathbf{H}_k$ and
$\mathbf{G}_k$, and $\mathbf{Q}$ is the covariance matrix of the residual interference, i.e.,
\[
\mathbf{Q} = \mathrm{diag}\left([\mathbf{q}_1, \ldots, \mathbf{q}_{k-1}, \mathbf{q}_{k+1}, \ldots, \mathbf{q}_K]\right),
\]
where $\mathbf{q}_k = \left[1 - |\bar{x}_{k,1}|^2, \ldots, 1 - |\bar{x}_{k,N_{ue}}|^2\right]$ and $\bar{x}_{k,n}$ is the $n$th element of $\bar{\mathbf{x}}_k$. This
covariance matrix will approach zero when the estimates from the decoding stage
are accurate enough. As shown in [WP99], the output of the MMSE filter, i.e.,
the estimate of the $j$-th stream of UE $k$, can be modeled as
\[
z_{k,j} = \mathbf{w}_{k,j}^H \tilde{\mathbf{y}}_k = \mu_{k,j} x_{k,j} + \eta_{k,j}, \tag{6.11}
\]
where $\mathbf{w}_{k,j}$ is the $j$-th column of matrix $\mathbf{W}_k$, $\mu_{k,j} = E\{z_{k,j} x_{k,j}^H\} = \mathbf{w}_{k,j}^H \mathbf{H}_k \mathbf{g}_{k,j}$,
and $\eta_{k,j}$ is well-approximated by a Gaussian variable with zero mean and variance
\[
\nu_{k,j}^2 = E\left\{|z_{k,j} - \mu_{k,j} x_{k,j}|^2\right\} = \mu_{k,j} - |\mu_{k,j}|^2. \tag{6.12}
\]
The extrinsic information is given in the same form as in (6.6), except that $\mathbf{y}$
is replaced by $z_{k,j}$, $\mathbf{x}$ is replaced by $x_{k,j}$, and (6.1) is replaced by (6.11). Since
the vector operation is replaced by multiple scalar operations for calculating the
extrinsic information, the complexity is significantly reduced.
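One detector pass of the MMSE-PIC scheme, i.e., soft symbol estimation, parallel cancelation, MMSE filtering and the scalar Gaussian output model, can be sketched as below. The sketch makes simplifying assumptions that are not part of the text: each UE transmits one BPSK stream with unit power, bits are mapped to symbols +1 and -1, `A` stands for HG, and the decoder that would supply the a priori values `L_a` is not modeled. Names are hypothetical.

```python
import numpy as np

def soft_bpsk(L_a):
    """Soft symbol E{x} = tanh(L_a / 2) for BPSK, L_a = log(P(x=+1) / P(x=-1))."""
    return np.tanh(np.asarray(L_a, dtype=float) / 2.0)

def mmse_pic_llr(y, A, Phi_nn, L_a):
    """One MMSE-PIC pass for K single-stream BPSK UEs; returns extrinsic LLRs."""
    K = A.shape[1]
    x_bar = soft_bpsk(L_a)
    q = 1.0 - x_bar ** 2                       # residual interference variances (entries of Q)
    L_e = np.zeros(K)
    for k in range(K):
        others = [j for j in range(K) if j != k]
        y_k = y - A[:, others] @ x_bar[others]              # soft cancelation, cf. (6.8)
        R = (np.outer(A[:, k], A[:, k].conj())
             + (A[:, others] * q[others]) @ A[:, others].conj().T
             + Phi_nn)
        w = np.linalg.solve(R, A[:, k])                     # MMSE filter, cf. (6.10)
        z = w.conj() @ y_k                                  # filter output, cf. (6.11)
        mu = (w.conj() @ A[:, k]).real
        nu2 = mu - mu ** 2                                  # output noise variance, cf. (6.12)
        # LLR of z = mu * x + eta with complex Gaussian eta of variance nu2.
        L_e[k] = 4.0 * mu * z.real / nu2
    return L_e

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = (rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))) / np.sqrt(2.0)
    x = np.array([1.0, -1.0, -1.0])
    noise = 0.05 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
    y = A @ x + noise
    Phi_nn = 2 * 0.05 ** 2 * np.eye(8)
    print(mmse_pic_llr(y, A, Phi_nn, np.zeros(3)))   # first pass, no a priori information
    print(mmse_pic_llr(y, A, Phi_nn, 4.0 * x))       # later pass with reliable a priori values
```

With all a priori values at zero the soft estimates vanish and the pass reduces to plain LMMSE detection; as the a priori values grow, Q shrinks and the filter approaches a matched filter on an interference-free signal.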
Simulation Settings and Results
In this subsection, we first show the block error rate (BLER) of these detectors,
and then show the impact of user positions and pairing on the system sum-rate. We compare the sum-rate achieved by centralized joint detection
with that of non-cooperative MMSE-SIC, with which each BS only locally
decodes the message of its own user and treats the inter-cell interference as noise.
We consider two cells, each with a 4-antenna BS and a single-antenna user.
The distance between the two BSs is 500 m, the pathloss in dB follows
$35.3 + 37.6\log_{10} d$, and shadowing is not considered. The cell-center user is located on
the line connecting the two BSs and is 50 m from its serving BS, and the cell-edge
user is 245 m from its serving BS. The small-scale fading is assumed to be i.i.d.
Rayleigh fading. The uplink transmit power is 30 dBm for both users, and the
noise power at each BS is $-99$ dBm. We choose BS 1 as the CCU. BS 2 forwards
its received signal via an infinite-capacity backhaul link to BS 1, i.e., $\mathbf{n}_p = \mathbf{0}$.
The sum-rates achieved by the joint LMMSE and MMSE-SIC are computed
using Shannon's capacity formula with their respective signal-to-interference-and-noise ratio (SINR) and are averaged over realizations of small-scale fading.
Note that the sum-rate of the MMSE-SIC is the same as the sum capacity shown
in (3.4) [SAH+04].
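The link budget implied by these settings can be checked with a few lines. The sketch assumes that d in the pathloss formula is the distance in meters and that the noise power is -99 dBm; the function names are hypothetical, and small-scale fading and array gain are ignored.

```python
import math

def pathloss_db(d_m):
    """Pathloss model of the simulation set-up; d_m is assumed to be in meters."""
    return 35.3 + 37.6 * math.log10(d_m)

def rx_snr_db(d_m, tx_dbm=30.0, noise_dbm=-99.0):
    """Average per-antenna receive SNR in dB, without fading."""
    return tx_dbm - pathloss_db(d_m) - noise_dbm

if __name__ == "__main__":
    for label, d in (("cell-center user to serving BS", 50.0),
                     ("cell-edge user to serving BS", 245.0)):
        print(f"{label}: {rx_snr_db(d):.1f} dB")
```

Under these assumptions the cell-center user arrives at its serving BS roughly 26 dB stronger than the cell-edge user, which is why the cell-center user dominates the sum-rate in the results that follow.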
The BLER versus signal-to-noise ratio (SNR) of various joint detection algorithms is shown in Fig. 6.1, where the two users are assumed to have identical SNR.
Each user employs binary phase shift keying (BPSK) modulation and a rate-1/3
convolutional code with generators $(155, 117, 123)_8$. The data packet length is
256 bits. For ML, LMMSE, and MMSE-SIC detection, a soft-decision Viterbi
decoder is used. For Turbo detection with MMSE-PIC, the Max-Log-MAP algorithm is used in the soft-input-soft-output decoder [WH02] with 6 iterations. To
compare with the BLER lower bound, which is calculated from the achievable
sum-rate formula, the data block comprises the data packets of both users, and
a block error is therefore counted if either of these two packets is in error.
From this figure, we can see that the LMMSE detector has a similar performance to the MMSE-SIC detector when convolutional coding and a soft-decision Viterbi decoder are employed. At a BLER of $10^{-2}$, the ML detector
performs 1 dB better. Even without iteration, (6.6) is used to send the soft decision values to the Viterbi decoder; otherwise, the ML detector with hard decisions
cannot outperform the LMMSE detector with soft decisions. The Turbo detection algorithm with MMSE-PIC detector and soft-input-soft-output decoder has
a 2 dB SNR gain over the non-iterative MMSE-SIC detection algorithm, and the
gap to the theoretical lower bound is about 1 dB.
Figure 6.1 Block error rate of various joint detection algorithms. (BLER versus SNR [dB]; curves: MMSE det. + Viterbi dec., MMSE-SIC det. + Viterbi dec., ML det. + Viterbi dec., MMSE-PIC det. + Turbo dec., and ideal.)

To show the gain of joint detection and observe the impact of user pairing
on the performance, Fig. 6.2 gives the sum-rate cumulative distribution function (CDF) of joint LMMSE, joint MMSE-SIC and non-cooperative detection
in different settings. The legend "center-edge" represents the results where one
user is located at the cell-center and the other user is at the cell-edge. The legends "edge-edge" and "center-center" represent the cases where both
users are at the cell-edge or at the cell-center, respectively. Although in our simulation setting the
non-cooperative system has enough degrees of freedom to deal with the interference, the results show that even when the two users are located at the cell-center, where
each user has high SINR, joint detection still outperforms non-cooperative
detection (though this gain can be expected to become marginal under imperfect channel knowledge, as shown in Section 4.2). The gain of performing joint
detection increases when at least one user is at the cell-edge. In these cases,
the cell-edge user benefits significantly from joint detection, so that joint
detection outperforms non-cooperative detection more clearly. On the other
hand, since the performance of the cell-center user dominates the sum-rate, the
sum-rate decreases when more users are located at the cell-edge.

6.1.3 Local BS Processing with Limited Backhaul Constraint


In practical systems, the cooperative BSs need some kind of local processing
before they forward the received information to the CCU. The BSs can directly
quantize the IQ samples of their received signals, but transmitting this information to the CCU requires high-capacity backhaul links. To reduce the required
backhaul capacity, the BSs can compress the received signals. Compression can be performed through source coding schemes with compression distortion. As mentioned in Section 4.3.1, we can further exploit the correlation among
the received signals from different BSs, where the amount of information transferred is the conditional mutual information between the signals received by one
BS and those already known by the CCU. Both the quantization noise and the
compression distortion will deteriorate the joint detection performance.

Figure 6.2 Sum-rate CDF under infinite backhaul capacity, which shows the
performance gap between joint detection and non-cooperative detection. 9 dB
inter-cluster interference is considered. (CDF versus sum-rate [bit/channel use]; curve groups: center-center, center-edge, edge-edge; curves: non-cooperative detection, cooperative LMMSE, cooperative MMSE-SIC.)
Local BS Processing Methods
Consider that one of the $M$ BSs serves as the CCU. Without loss of generality, we
again consider BS 1 as the CCU. The backhaul link constraint from BS $m$ to the
CCU is denoted as $C_m$. If a per-link capacity constraint is applied, all the links
have the same capacity, i.e., $C_m = C$. If the sum-capacity constraint is applied, i.e.,
$\sum_m C_m = C$, the capacity of each link would be allocated by some optimization
criterion such as the sum-rate maximization in the cluster.
We first consider the direct quantization scheme. The data symbols received
by BS $m$ are quantized with $B_m$ bits each for the real and imaginary dimensions separately, where $B_m = C_m/(2N_{bs})$. Assume that the signal level is within the range
$[-A, +A]$; then with uniform quantization the quantization noise variance in
each dimension is
\[
\sigma_{q,m}^2 = \frac{A^2\, 2^{-2B_m}}{3}. \tag{6.13}
\]
The covariance matrix of the quantization noise vector is therefore given as
$\boldsymbol{\Delta}_m = \mathrm{diag}([2\sigma_{q,m}^2, \ldots, 2\sigma_{q,m}^2])$, representing the processing errors of BS $m$. We
next address the source coding scheme without exploiting the signal correlation
between BS $m$ and BS 1. The covariance of the received signal of BS $m$ is
\[
\boldsymbol{\Phi}_{yy,m} = \mathbf{H}^m \boldsymbol{\Phi}_{ss} (\mathbf{H}^m)^H + \sigma_{o,m}^2 \mathbf{I}, \tag{6.14}
\]

where $\mathbf{H}^m$ denotes the channel matrix from all $K$ users to BS $m$, $\boldsymbol{\Phi}_{ss} = \mathbf{G}\mathbf{G}^H$
is a block diagonal matrix where each block is the covariance matrix of the
transmitted signals of one user, and $\sigma_{o,m}^2$ is the variance of the observation noise
including the receiver thermal noise and the inter-cluster interference at BS $m$.
According to the backhaul constraint, the distortion matrix $\boldsymbol{\Delta}_m$ should satisfy
\[
\log_2 \det\left(\mathbf{I} + (\boldsymbol{\Delta}_m)^{-1}\left(\mathbf{H}^m \boldsymbol{\Phi}_{ss} (\mathbf{H}^m)^H + \sigma_{o,m}^2 \mathbf{I}\right)\right) \leq C_m. \tag{6.15}
\]

We finally consider the multi-source compression coding exploiting the signal
correlation between BS $m$ and BS 1. We only consider the correlation between
the signals received in BS $m$ and BS 1, even though BS 1 may have retrieved more
information from other BSs. The amount of mutual information that should be
transferred depends on the conditional covariance matrix $\boldsymbol{\Phi}_{yy,m|1}$ between the
signals received in BS $m$ and in BS 1, which is expressed as [dCS09]
\[
\boldsymbol{\Phi}_{yy,m|1} = \mathbf{H}^m \left(\mathbf{I} + \frac{\boldsymbol{\Phi}_{ss}}{\sigma_{o,1}^2} (\mathbf{H}^1)^H \mathbf{H}^1\right)^{-1} \boldsymbol{\Phi}_{ss} (\mathbf{H}^m)^H + \sigma_{o,m}^2 \mathbf{I}. \tag{6.16}
\]
The distortion matrix in this case should satisfy [dCS09]
\[
\log_2 \det\left(\mathbf{I} + (\boldsymbol{\Delta}_m)^{-1} \boldsymbol{\Phi}_{yy,m|1}\right) \leq C_m. \tag{6.17}
\]

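This is a lossy decentralized multi-source compression with receiver-side information. In practice, we can first use the conditional Karhunen-Loève transform
to decompose the vector signal into independent streams, and then compress the
resulting scalar streams separately. Given the quantization noise or compression
distortion matrix $\boldsymbol{\Delta}_m$ in BS $m$, the sum-rate achievable with joint detection is
\[
R_{sum} = \log_2 \det\left(\mathbf{I} + (\boldsymbol{\Phi}_{nn} + \boldsymbol{\Delta})^{-1} \mathbf{H} \boldsymbol{\Phi}_{ss} \mathbf{H}^H\right), \tag{6.18}
\]
where $\boldsymbol{\Delta} = \mathrm{diag}(\boldsymbol{\Delta}_1, \ldots, \boldsymbol{\Delta}_M)$, $\boldsymbol{\Phi}_{nn} = \mathrm{diag}\{\sigma_{o1}^2 \mathbf{I}_{N_{bs}}, \ldots, \sigma_{oM}^2 \mathbf{I}_{N_{bs}}\}$, and $\mathbf{I}_{N_{bs}}$ is
the identity matrix of size $N_{bs}$.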
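The quantities of this subsection can be evaluated numerically along the following lines. The sketch is illustrative only: it covers direct quantization via (6.13) and single-source compression via (6.15), and for the latter it assumes a white distortion matrix d·I found by bisection, which is a simplification rather than the optimized distortion of [dCS09]. Function and variable names (including `Delta`) are hypothetical.

```python
import numpy as np

def quant_noise_var(A, C_m, n_bs):
    """Per-dimension quantization noise variance (6.13), with B_m = C_m / (2 N_bs) bits."""
    B_m = C_m / (2.0 * n_bs)
    return A ** 2 * 2.0 ** (-2.0 * B_m) / 3.0

def log2det(M):
    """log2 of the determinant, computed in a numerically stable way."""
    _, logdet = np.linalg.slogdet(M)
    return logdet / np.log(2.0)

def white_distortion(Phi_yy, C_m, iters=200):
    """Smallest d with log2 det(I + Phi_yy / d) <= C_m, assuming distortion d * I."""
    n = Phi_yy.shape[0]
    lo, hi = 1e-12, 1.0
    while log2det(np.eye(n) + Phi_yy / hi) > C_m:    # enlarge until the constraint holds
        hi *= 2.0
    for _ in range(iters):                           # bisection; the rate decreases with d
        mid = 0.5 * (lo + hi)
        if log2det(np.eye(n) + Phi_yy / mid) > C_m:
            lo = mid
        else:
            hi = mid
    return hi

def sum_rate(H, Phi_ss, Phi_nn, Delta):
    """Sum-rate (6.18) with the processing-error covariance Delta added to the noise."""
    n = H.shape[0]
    return log2det(np.eye(n) + np.linalg.solve(Phi_nn + Delta, H @ Phi_ss @ H.conj().T))

if __name__ == "__main__":
    n_bs = 4
    for C in (3.0, 10.0):
        var = quant_noise_var(1.0, C, n_bs)
        print(C, var, 2.0 * var)   # per-dimension variance and per-complex-sample entry
```

For direct quantization the block of `Delta` belonging to BS m would be `2 * quant_noise_var(A, C_m, n_bs) * np.eye(n_bs)`, and the block of the CCU itself would be zero, since its own received signal is not forwarded over the backhaul.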

Simulation Settings and Results


In Figs. 6.3(a) and 6.3(b), we show the sum-rate of two cell-edge users when
MMSE-SIC is used at the CCU (i.e., BS 1) and different local processors are
used at the supporting BS (i.e., BS 2), where the backhaul is constrained to 3
and 10 bit/channel use, respectively. Inter-cluster interference is not considered.
All simulation settings are the same as in Fig. 6.1, unless otherwise stated.
Comparing with the results shown in Fig. 6.2, we can see that the performance
of joint detection degrades severely due to the processing errors of the supporting BS, but still outperforms non-cooperative detection. When the
backhaul capacity constraint is increased, the two schemes with compression improve
significantly, but the scheme with direct quantization improves much more slowly.
When C = 10 bit/channel use, the performance of the two compression schemes
is almost the same, which means that it is unnecessary to use multi-source compression coding, which is more complex and needs more accurate channel state
information (CSI). With such a moderate backhaul capacity, using single-source
compression rather than direct quantization is a good choice.

Figure 6.3 The impact of quantization noise and compression distortion on the
performance of joint MMSE-SIC: (a) backhaul capacity C = 3 bpcu; (b) backhaul capacity C = 10 bpcu. (CDF versus sum-rate [bit/channel use]; curves: quantization, source coding, compr. with corr., and infinite backhaul.)

To observe the impact of the user pairing on the performance, we simulate
the sum-rate of the two users in two cases in Fig. 6.4. The legend "center-edge"
represents the results where the user in the cell of BS 1, i.e., the CCU, is located
at the cell-center and the other user is at the cell-edge. The legend "edge-center"
represents the opposite case. Again, we do not consider inter-cluster interference.
The results show that it is critical for the performance which user is located at
the cell-center; this is in contrast to the scenario of perfect backhaul considered
in Section 6.1.2, Fig. 6.2. Remember that BS 1 is the CCU. Therefore, signals
received at this BS do not suffer from compression errors. When the user of
BS 1 is located at the cell-center ("center-edge" in the legend), its performance
will dominate the sum-rate. Thereby, the impact of the local processing error is
negligible. On the other hand, when the user of the supporting BS 2 is located
at the cell-center ("edge-center" in the legend), the local processing errors at BS 2
will degrade the performance of the joint detection although the received signal
of this user at BS 2 has high SINR.

6.1.4 Local or Partial Decoding with Limited Backhaul Constraint


As an alternative to direct quantization and to single-source and multi-source coding,
the BSs can also demodulate or decode the signal from every user, and then
transfer the original bits or (soft) coded bits of each user to the CCU for joint
detection. The bits from different BSs are combined at the CCU to yield final
decisions [FHG09].
Local Decoding
For the cell-center users, we can decode their messages locally at their serving
BSs. These messages can be decoded correctly with high probability.
As shown in Fig. 6.2 for the case where both users are at the cell-center, local
decoding performs close to or even the same as joint decoding. These decoded
bits can be forwarded to the CCU to facilitate interference cancelation for other
cell-edge users. Otherwise, as shown in Fig. 6.4, under limited backhaul
capacity the cell-center user of the supporting BS will degrade the sum-rate no
matter whether we quantize or compress its received signals.

Figure 6.4 The impact of user pairing on the performance of joint MMSE-SIC,
backhaul capacity constraint C = 3 bit/channel use. (CDF versus sum-rate [bit/channel use]; curve groups: center-edge, edge-center; curves: quantization, source coding, compression with correlation.)
Due to the mutual interference, it is not wise to decode the information of all
users at one BS only. On the other hand, if each BS demodulates and transfers
all the soft information of multiple users, the transfer rate may even exceed that
of the direct quantization scheme. For example, assume that 3 BSs cooperate to
serve 3 users, where each user transmits a data stream with 16-QAM modulation.
This amounts to 12 information bits for each BS. If 4 bits are used to represent
the log-likelihood ratio (LLR) of each information bit, then a total of 48 bits needs
to be transferred to the CCU. However, if we use 8 bits to quantize each received
sample, only 16 bits need to be transferred for both the real and imaginary parts
of the superposed signals.
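The bit counts of this example follow from simple arithmetic, sketched below. The figures are taken per channel use, and the quantized-sample count is per receive antenna, which is an interpretation of the example rather than something the text states explicitly; the function names are hypothetical.

```python
def llr_bits(n_users, bits_per_symbol, bits_per_llr):
    """Backhaul bits per channel use when a BS forwards one LLR per coded bit."""
    return n_users * bits_per_symbol * bits_per_llr

def iq_bits(bits_per_dimension):
    """Backhaul bits per received sample when real and imaginary parts are quantized."""
    return 2 * bits_per_dimension

if __name__ == "__main__":
    print(llr_bits(3, 4, 4))   # 3 users, 16-QAM (4 bits per symbol), 4-bit LLRs
    print(iq_bits(8))          # 8-bit quantization of each dimension
```

The comparison scales differently in the two cases: the LLR load grows with the number of users and the modulation order, whereas the quantization load grows with the number of receive antennas.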
Therefore, it is better for the BSs to locally decode the information of some UEs
and forward it to the CCU, while locally processing and then transferring the received signals
of the other UEs to the CCU for joint detection, as also stated in Section 4.3.2.
This is in fact a receiver mode switching, which is analogous to the downlink
CoMP mode switching, where the BSs serve some UEs without cooperation and
jointly transmit to other UEs (see Section 6.4).
Partial Decoding and Rate Splitting
For those users who are between cell-center and cell-edge, we can divide the data
streams of each user into two parts. One is decoded by the local BS, and the other
is decoded at the CCU. In particular, for each UE that is in the cells of supporting BSs, we
use rate splitting to divide its data streams, which is implemented by superposition coding [MF08a, SSPS09a]. At each supporting BS, the received messages
are partially decoded and the residual received signals are compressed and then
forwarded to the CCU. At the CCU, MMSE-SIC is used for joint detection. By
optimizing the power allocated to the two parts of the data streams, the maximum
sum-rate can be achieved under a given backhaul capacity constraint. Partial decoding at the local BS can reduce the information to be transferred to
the CCU, but may also reduce the user's rate since decoding is performed under
interference. It is expected that such a multi-level decoding strategy may provide
a smooth transition between the two extreme schemes: (i) each BS only compresses
its received raw signals, or (ii) each BS only locally decodes the messages of its
own users [dCS09], as also pointed out in Section 4.3.2.
However, numerical results show that the performance gain of rate splitting
is marginal (e.g. compared to a time-share between different cooperation strategies). The same conclusion is drawn in [MF08a, SSPS09a] with slightly different
forms of superposition coding.
This is because CoMP channels are asymmetric, which cannot be compensated
by power allocation. Using rate splitting for a user, say UE k, is equivalent
to dividing the user into two users with different powers under a sum power
constraint, UE k1 and UE k2. In CoMP systems, from the viewpoint of its local
BS, say BS 1, UE k1 allocated with more power looks like a cell-center user.
Other BSs, however, also receive higher power from UE k1. Consider that a true
cell-center user in BS 1 will lead to high receive power at its local BS but low
receive power at other BSs, which makes local decoding more desirable. Now
it is clear that rate splitting is not equivalent to dividing a user into two users
with different positions, and hence does not help to improve the performance as
expected.

6.1.5 Provisions for Uplink Joint Processing in WiMax and LTE


Advanced WiMax, or IEEE 802.16m, enables CoMP as an option. In
Section 16.5.2 of the standard [IEE10a], two types of uplink multi-BS multiple-input multiple-output (MIMO) are defined: (i) single-BS decoding with multi-BS coordination, and (ii) uplink multi-BS joint processing. The former type is
similar to the coordinated beamforming scheme introduced in Section 5.3. The
latter corresponds to joint detection CoMP as observed in this section, and offers
two variants of processing: macro diversity (where multiple BSs receive the signal
only from a single UE), or joint detection for multiple UEs.
In order to enable simultaneous reception at multiple BSs, a number of provisions are taken: first of all, 802.16m defines a zone (i.e., a time period within
each basic frame duration), during which multi-BS reception may take place.
During this zone, transmission from one or more UEs to the different cooperating
BSs occurs on the same time-frequency resources (i.e., the same OFDM symbols
and the same sub-carriers). This is a notable contrast to standard transmission
in WiMax, where UEs in adjacent cells are transmitting on (mostly) different
time-frequency resources, in order to reduce interference. Uplink sounding, i.e.,
determination of the channel information for the different UEs, is done in such
a way that the sounding signals allocated to the different UEs are orthogonal to
each other, as in Section 9.1. When macro-diversity is enabled (i.e., only one signal exists on a time-frequency resource), the BSs exchange log-likelihood ratios of
the bit decisions over the backhaul network. For cooperative reception (multiple
UEs on a time-frequency resource), quantized versions of the received signals are
exchanged over the backhaul network.
LTE-A also considers CoMP concepts, where similar provisions are made. For
centralized uplink CoMP, there should be backhaul links to connect the eNodeBs
to the CCU. Consider the per-link capacity, which is simply $C \cdot BW$. In an LTE-A
system, each eNodeB includes three BSs, the number of the transmit antennas
at each BS may be 2, 4 or 8, and the bandwidth $BW$ is 20 MHz. Here we
ignore the various overheads, and simply multiply the backhaul constraints in
bit/channel use with the system bandwidth to obtain Mbit/s. Therefore, when
C = 3 bit/channel use or C = 10 bit/channel use, the required per-link backhaul
is 180 Mbps or 600 Mbps, respectively. When considering direct quantization,
this implies $B_m = 0.375$ bits and $B_m = 1.25$ bits per dimension and antenna when $N_{bs} = 4$.
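These numbers can be reproduced with a short calculation. Note one assumption in the sketch: the quoted 180 Mbps and 600 Mbps are obtained here by additionally multiplying C · BW by the three BSs of one eNodeB, which is how the figures appear to arise but is not spelled out in the text. The function names are hypothetical.

```python
def backhaul_mbps(c_bpcu, bandwidth_mhz=20.0, bs_per_enodeb=3):
    """Backhaul rate in Mbit/s, ignoring overhead (factor of 3 BSs per eNodeB assumed)."""
    return c_bpcu * bandwidth_mhz * bs_per_enodeb

def bits_per_dimension(c_bpcu, n_bs=4):
    """Quantizer resolution B_m = C_m / (2 N_bs) for direct quantization."""
    return c_bpcu / (2.0 * n_bs)

if __name__ == "__main__":
    for c in (3.0, 10.0):
        print(c, backhaul_mbps(c), bits_per_dimension(c))
```

A resolution of well under one bit per dimension at C = 3 bit/channel use makes clear why direct quantization performs poorly at low backhaul capacity compared with the compression schemes.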

6.1.6 Summary
In this section, uplink centralized joint detection was studied, where multiple
BSs forward partially processed receive signals to a CCU or another BS for a
joint and centralized decoding of multiple terminals.
Different joint detection concepts were first introduced under the assumption
of infinite backhaul capacity between cooperative base stations, and compared
in terms of performance and complexity. Then, constrained backhaul links were
considered and schemes introduced where local preprocessing and compression
are performed before signals are forwarded over the backhaul. It was shown that
the impact of compression distortion on the performance depends strongly on
user locations and pairing. Furthermore, results confirmed that it is beneficial to
let some terminals be decoded locally (and the data bits potentially forwarded
to the CCU for interference cancelation), while others are decoded by the CCU
itself, which can be seen as a hybrid of decentralized and centralized cooperation, as discussed in Section 4.3.1.
Finally, the provisions for uplink centralized joint processing in WiMax and
LTE-A standardization were discussed.


6.2 Uplink Decentralized Joint Detection


Xinning Wei and Tobias Weber
This section provides a practical decentralized uplink CoMP scheme for inter-cell
interference cancelation. As opposed to the schemes introduced in Section 6.1,
this scheme does not require a CoMP central unit (CCU), but performs multi-cell
cooperative signal processing simultaneously and iteratively at the coordinated
base stations (BSs) which exchange information via the backhaul.
The proposed decentralized CoMP scheme is expected to be a practical candidate in the uplink physical layer design for future LTE-A systems applying
orthogonal frequency division multiplex (OFDM) and multiple-input multiple-output (MIMO) techniques. Concerning the signal processing implementation
architecture, recently more and more attention has been paid to decentralized
CoMP schemes [MF07b, KRF08, PHG09, MF09b, WWWS09, dCS09] as opposed
to centralized schemes, due to the following advantages:
• From the network operator's point of view, the change to the current architecture of cellular networks is expected to be as small as possible. Based on the
existing networks, the decentralized scheme requires only some extra backhaul
links between the neighboring BSs.
• The joint signal processing for all the user equipments (UEs) can be distributed into parallel cooperative signal processing among individual UEs at
the coordinated BSs. The computational load is shared by the coordinated
BSs to avoid a central unit of possibly large computational complexity.
• The decentralized CoMP scheme in this section can make full use of the iterative zero-forcing (ZF) joint detection (JD) algorithm. With a reduced computational cost, the system performance of the proposed decentralized iterative
ZF scheme can asymptotically approach that of its centralized counterpart.
• In contrast to the centralized scheme with a fixed central unit responsible
for a static cooperative cluster, the decentralized CoMP scheme in this section is more flexible and can efficiently make full use of the dynamically selected
UE-oriented significant channel state information (CSI). Although the significant channels for a UE are not necessarily limited to a structure-oriented
geographical area, in most realistic scenarios they exist in this UE's own cell
and adjacent cells. Since only local significant CSI is considered at each BS
in the decentralized scheme, only backhaul links connecting adjacent BSs and
only the synchronization in the local area are required.
The proposed decentralized CoMP scheme is investigated in a 3-cell reference
scenario. This small scenario can be considered as one cooperative cluster in
the whole system, for example obtained through static or dynamic clustering
techniques explained in Chapter 7. Perfect channel knowledge is assumed in the
present section, while channel estimation for obtaining CSI in practical systems
will be discussed in Chapter 9.

6.2.1 Practical Decentralized Interference Cancelation Scheme


The decentralized CoMP scheme should make a good compromise between system performance and implementation complexity. On one side, in order to reduce
the computational load and backhaul requirements, we consider only part of the
full CSI in the cooperative signal processing in each cooperative cluster. On the
other side, in order to maintain a good system performance, the considered CSI
is that of the UE-oriented significant channels which play a significant role in
the system performance of each UE. Based on the UE-oriented significant CSI, a
practical JD algorithm suitable for interference cancelation is implemented in a
decentralized way. In the following, the decentralized CoMP scheme is discussed
from the aspects of significant channel selection, the signal processing algorithm
and the implementation architecture.
Significant Channel Selection
Without loss of generality, the significant channel selection is performed in a
cooperative cluster of $M$ cells. For example, $M = 3$ is considered in this section.
Assuming OFDM is used as motivated in Chapter 3, we investigate a single subcarrier with one active UE in each cell at each time slot. We consider $M = K$ BSs
with $N_{bs}$ antennas each and $K$ single-antenna UEs. Altogether, $N_{BS} = M N_{bs}$
antennas with indices $a = 1, \ldots, N_{BS}$ at the BS side and $K$ antennas with indices
$k = 1, \ldots, K$ at the UE side are considered. The uplink channel matrix $\mathbf{H}$ in a
cooperative cluster is of dimensions $N_{BS} \times K$.
The signal processing algorithm, which has an influence on the system performance, shall be taken into account in the significant channel selection. In the
iterative ZF JD algorithm which will be described in this section in detail, firstly
the matched filtering estimate for each UE is computed considering the significant
useful channels, and then the interfering signals corresponding to the significant
interfering channels are iteratively removed from this estimate. According to the
role of the physical channels in the data transmission, two types of significant
channels for every UE are distinguished from each other as follows:
• Significant useful channels for a certain UE in the uplink are the channels
over which we get significant useful contributions when we estimate the data
symbols transmitted from this UE.
• Significant interfering channels for a certain UE in the uplink are the
channels over which we get significant interfering signals from other UEs when
we receive the data symbols from this UE.
Correspondingly, we will use a significant useful channel indicator matrix $\tilde{\mathbf{H}}_U$
and individual significant interfering channel indicator matrices $\tilde{\mathbf{H}}_I^{(k)}$ to indicate
significant useful channels and significant interfering channels, respectively. In
the above indicator matrices, "1"s are assigned to the positions corresponding
to the selected significant channels, while "0"s are assigned to the positions
corresponding to the insignificant channels.
The useful channels for each UE $k$ are the channels between this UE and all
the BSs. The selection of significant useful channels for each UE $k$ is performed
based on the $k$-th column vector of the channel matrix $\mathbf{H}$. Obviously, a single
matrix $\tilde{\mathbf{H}}_U$ of dimensions $N_{BS} \times K$ is sufficient to represent all the significant
useful channels for all the UEs.
The interfering channels for each UE $k$ are the channels between other UEs
and all the BSs. A significant interfering channel for one UE could be considered
as an insignificant interfering channel for another UE. Therefore, it is reasonable
to separately represent the significant interfering channels for individual UEs $k$
by individual UE-specific significant interfering channel indicator matrices $\tilde{\mathbf{H}}_I^{(k)}$.
Furthermore, for each UE $k$ there are two kinds of channels irrelevant to the
interference considered in the proposed decentralized CoMP scheme, and they
are indicated as "don't care" elements in each $\tilde{\mathbf{H}}_I^{(k)}$. Firstly, corresponding to the
useful channels for UE $k$, the elements in the $k$-th column of $\tilde{\mathbf{H}}_I^{(k)}$ are certainly
"don't care" elements. Secondly, the elements in the rows of $\tilde{\mathbf{H}}_I^{(k)}$ corresponding
to the insignificant useful channels for this UE $k$ are also "don't care" elements.
The reason is that in the proposed JD algorithm the received signals at the BS
antennas corresponding to the insignificant useful channels for each UE $k$ will
not be used in the data estimation of this UE.
Taking the channel group including all channels between all antennas of one BS and one UE as a selection unit, a practical significant channel selection scheme according to the following mathematical criteria is proposed. For each UE k, we first select its significant useful channels. Let $\mathcal{A}_m$ denote the set of indices a of the antennas belonging to BS m, m = 1, . . . , M. The channel from UE k to BS antenna a, characterized by the coefficient $h_k^a$, is selected as a significant useful channel if the channel group gain
$$\sum_{a \in \mathcal{A}_m} \left| h_k^a \right|^2$$
covers a significant portion of the sum of all useful channel gains for this UE,
$$\sum_{m} \sum_{a \in \mathcal{A}_m} \left| h_k^a \right|^2 .$$
Then, we select the significant interfering channels for each UE k based on the channel coefficients excluding the don't care elements. Let $\mathcal{B}_k$ denote the set of indices of the BSs corresponding to the selected significant useful channel groups for UE k. For each UE k, a channel with the channel coefficient $h_{k'}^a$, $k' \neq k$, is selected as a significant interfering channel if it fulfills the following condition. Namely, the channel group weighting factor magnitude
$$\left| \sum_{a \in \mathcal{A}_m} \left( h_k^a \right)^{H} h_{k'}^a \right| ,$$
corresponding to the scaling of the interference in the matched filtering estimate, covers a significant portion of the sum of the channel group weighting factor magnitudes
$$\sum_{k' \neq k} \; \sum_{m \in \mathcal{B}_k} \left| \sum_{a \in \mathcal{A}_m} \left( h_k^a \right)^{H} h_{k'}^a \right|$$
for all the interferences to UE k. In fact, we implicitly select the relevant significant interfering channels based on the selected significant useful channels. As can be seen from the above mathematical criteria, the selection of $h_{k'}^a$ depends on the considered useful channel coefficient $h_k^a$.
Two UEs have compatible significant interfering channels if no significant interfering channel selected for one UE is considered an insignificant interfering channel for the other UE. If all the UEs have compatible significant interfering channels, all the individual UE-specific significant interfering channel indicator matrices $\tilde{\mathbf{H}}_I^{(k)}$ can be represented by one combined significant interfering channel indicator matrix $\tilde{\mathbf{H}}_I$. Details of the rules for this combination can be found in Fig. 6.5, in which the proposed significant channel selection scheme and the corresponding significant channel indicator matrix formalism are visualized. For the sake of simplicity, a 3-cell system with one antenna at each BS is used as an exemplary scenario here. Assuming that the relation between the channel magnitudes strongly depends on the corresponding distances between the UEs and the BSs, $\tilde{\mathbf{H}}_U$ and $\tilde{\mathbf{H}}_I^{(k)}$, k = 1, 2, 3, can be easily obtained. Two kinds of don't care elements, denoted by $\ast$, in $\tilde{\mathbf{H}}_I^{(1)}$ are pointed out. Furthermore, the element-wise combination of the individual matrices $\tilde{\mathbf{H}}_I^{(k)}$ to obtain $\tilde{\mathbf{H}}_I$ is performed. For example, the (1, 1)-th entries of $\tilde{\mathbf{H}}_I^{(1)}$, $\tilde{\mathbf{H}}_I^{(2)}$ and $\tilde{\mathbf{H}}_I^{(3)}$ are $\ast$, $\ast$, and 1, respectively. Following the combination rules in Fig. 6.5 considering the don't care elements, the (1, 1)-th entry of $\tilde{\mathbf{H}}_I$ is 1.

[Figure 6.5: 3-cell example with one antenna per BS, showing the significant useful and significant interfering channels for UE 1. Legend: 1 = significant channel, 0 = insignificant channel, $\ast$ = don't care element. Indicator matrices of the example:
$$\tilde{\mathbf{H}}_U = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \quad
\tilde{\mathbf{H}}_I^{(1)} = \begin{pmatrix} \ast & 0 & 1 \\ \ast & 1 & 0 \\ \ast & \ast & \ast \end{pmatrix}, \quad
\tilde{\mathbf{H}}_I^{(2)} = \begin{pmatrix} \ast & \ast & \ast \\ 1 & \ast & 0 \\ 0 & \ast & 1 \end{pmatrix}, \quad
\tilde{\mathbf{H}}_I^{(3)} = \begin{pmatrix} 1 & 0 & \ast \\ \ast & \ast & \ast \\ 0 & 1 & \ast \end{pmatrix}, \quad
\tilde{\mathbf{H}}_I = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$
In $\tilde{\mathbf{H}}_I^{(1)}$, the first column corresponds to the useful channels for UE 1 and the third row corresponds to the insignificant useful channels for UE 1. The figure also lists the rules for combining the entries of the individual matrices $\tilde{\mathbf{H}}_I^{(k)}$ into $\tilde{\mathbf{H}}_I$, distinguishing compatible from incompatible results.]

Figure 6.5 Example for significant channel selection and indicator matrix formalism.
In practice, significant channel selection has to be possible at moderate complexity. Therefore, roughly estimated channel magnitudes could be used instead of accurate channel coefficients. Further, it is not necessary to perform the significant channel selection in every signal processing time slot. Within a certain time interval, which depends on the characteristics of the time-variant mobile radio channels, the cooperative signal processing could be performed based on the same significant channel selection results.
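The selection criteria above can be illustrated with a minimal Python sketch. The text only requires that a channel group covers a "significant portion" of the respective sum; picking the `n_useful` and `n_interf` strongest groups per UE is an assumption made here for illustration, as are the function name `select_significant_channels` and the numerical channel matrix, which merely mimics the distance-dependent 3-cell example of Fig. 6.5.

```python
def select_significant_channels(H, bs_antennas, n_useful, n_interf):
    """Pick significant channel groups per UE (illustrative sketch).

    H[a][k]        : channel coefficient from UE k to BS antenna a
    bs_antennas[m] : antenna indices A_m belonging to BS m
    Returns (useful, interf): useful[k] is the BS set B_k, interf[k] is a
    set of (m, k') pairs, i.e. BS m and interfering UE k'.
    Assumption: the strongest n_useful / n_interf groups are kept.
    """
    K = len(H[0])
    useful, interf = [], []
    for k in range(K):
        # Channel group gain of BS m for UE k: sum over a in A_m of |h_k^a|^2.
        gain = {m: sum(abs(H[a][k]) ** 2 for a in ants)
                for m, ants in enumerate(bs_antennas)}
        B_k = set(sorted(gain, key=gain.get, reverse=True)[:n_useful])
        # Weighting factor magnitude |sum_{a in A_m} (h_k^a)^H h_{k'}^a|,
        # evaluated only at the BSs in B_k and only for k' != k.
        weight = {(m, kp): abs(sum(complex(H[a][k]).conjugate() * H[a][kp]
                                   for a in bs_antennas[m]))
                  for m in B_k for kp in range(K) if kp != k}
        I_k = set(sorted(weight, key=weight.get, reverse=True)[:n_interf])
        useful.append(B_k)
        interf.append(I_k)
    return useful, interf


# Hypothetical 3-cell channel, one antenna per BS (rows: BSs, columns: UEs).
H = [[1.0, 0.1, 0.6],
     [0.6, 1.0, 0.1],
     [0.1, 0.6, 1.0]]
useful, interf = select_significant_channels(H, [[0], [1], [2]], 2, 2)
print(useful)
print(interf)
```

With zero-based indices, the selected sets reproduce the pattern of 1s in the indicator matrices of the 3-cell example.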


Iterative ZF JD Algorithm with Significant CSI

We now discuss the signal processing algorithm in the uplink decentralized CoMP scheme. In cellular systems with strong noise, the minimum mean square error (MMSE) algorithm or the matched filter (MF) algorithm could be applied. In the interference-limited cellular systems considered in this section, the linear multiuser ZF algorithm, which can eliminate all the inter-cell interference, is a good choice to achieve a good system performance. However, the pseudo-inversion of the channel matrix typically required in the linear ZF algorithm can cause a high computational complexity, especially in cellular systems with a large number of cells. Moreover, the linear ZF algorithm is not suitable for a decentralized implementation. Following the Jacobi method in linear algebra [GVL96], which solves a linear matrix equation requiring matrix inversion in an iterative way, the parallel interference cancelation (PIC) algorithm has been proposed [DSR98, Ver98]. Generally, in a realistic large cellular system, the PIC algorithm has a lower complexity than the conventional linear ZF algorithm at a similar performance. Following the idea of the PIC algorithm, an iterative ZF JD algorithm focusing on interference cancelation with significant CSI instead of full CSI is proposed in this section. The reasons for choosing this algorithm in the practical uplink CoMP scheme are briefly summarized:
• As one practical multiuser detection strategy, the parallel iterative ZF algorithm has a moderate computational complexity.
• This algorithm follows the principle of interference cancelation, and therefore has a fairly good performance in realistic interference-limited cellular systems.
• The parallel signal processing with UE-oriented significant CSI is suitable for a decentralized implementation based on the coordinated BSs.
The significant CSI considered in the signal processing algorithm is taken from the channel coefficients included in the full channel matrix $\mathbf{H}$ according to the significant channel selection results. Corresponding to the significant channel indicator matrices $\tilde{\mathbf{H}}_U$, $\tilde{\mathbf{H}}_I^{(k)}$ and $\tilde{\mathbf{H}}_I$, the significant CSI is described by the significant useful channel matrix
$$\mathbf{H}_U = \mathbf{H} \odot \tilde{\mathbf{H}}_U, \tag{6.19}$$
the UE-specific significant interfering channel matrices
$$\mathbf{H}_I^{(k)} = \mathbf{H} \odot \tilde{\mathbf{H}}_I^{(k)}, \tag{6.20}$$
and the combined significant interfering channel matrix
$$\mathbf{H}_I = \mathbf{H} \odot \tilde{\mathbf{H}}_I, \tag{6.21}$$
where the operator $\odot$ denotes the element-wise multiplication of two matrices. The above significant channel matrices contain the channel coefficients of the significant channels, 0s corresponding to the insignificant channels, and the don't care elements.


For single-antenna transmitters ($N_{\mathrm{ue}} = 1$), the transmitted vector $\mathbf{s}$ as defined in Section 3.4 is equivalent to the data vector $\mathbf{x}$, and hence we obtain from (3.1)
$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} = \mathbf{H}\mathbf{x} + \mathbf{n}. \tag{6.22}$$
Applying the iterative ZF JD algorithm with full CSI, i.e., the PIC algorithm, the estimated data vector $\hat{\mathbf{x}}(i)$ in the i-th iteration can be described by [Ver98]
$$\hat{\mathbf{x}}(i) = \mathrm{diag}\!\left(\mathbf{H}^{H}\mathbf{H}\right)^{-1} \left( \mathbf{H}^{H}\mathbf{y} - \overline{\mathrm{diag}}\!\left(\mathbf{H}^{H}\mathbf{H}\right) \hat{\hat{\mathbf{x}}}(i-1) \right), \tag{6.23}$$
where $\overline{\mathrm{diag}}(\cdot)$ sets all the elements on the diagonal of its argument to zero, while $\mathrm{diag}(\cdot)$ retains only the diagonal elements. At the end of every iteration i, the estimated data vector $\hat{\mathbf{x}}(i)$ could be forwarded to a data estimate refiner applying hard quantization or soft quantization techniques to obtain the refined estimated data vector $\hat{\hat{\mathbf{x}}}(i)$. Now we apply the significant CSI described by $\mathbf{H}_U$ given in (6.19) and by $\mathbf{H}_I^{(k)}$ given in (6.20) instead of full CSI in the PIC algorithm. According to the functionalities of the channels in data transmission, the significant useful channel coefficients in $\mathbf{H}_U$ are first considered in the matched filtering part. Then, these significant useful channel coefficients and their corresponding significant interfering channel coefficients in $\mathbf{H}_I^{(k)}$ are considered in the iterative interference cancelation part. In this way, the proposed iterative ZF JD algorithm with significant CSI can be derived as
$$\hat{\mathbf{x}}(i) = \boldsymbol{\Lambda}^{-1} \left( \mathbf{H}_U^{H}\mathbf{y} - \overline{\mathrm{diag}}\!\left(\boldsymbol{\Phi}_H\right) \hat{\hat{\mathbf{x}}}(i-1) \right). \tag{6.24}$$
In the above equation, the channel gain scaling matrix $\boldsymbol{\Lambda}$ is defined as
$$\boldsymbol{\Lambda} = \mathrm{diag}\!\left(\mathbf{H}_U^{H}\mathbf{H}_U\right), \tag{6.25}$$
and the channel correlation matrix $\boldsymbol{\Phi}_H$ is defined as
$$\boldsymbol{\Phi}_H = \begin{pmatrix} [\mathbf{H}_U]_1^{H}\, \mathbf{H}_I^{(1)} \\ \vdots \\ [\mathbf{H}_U]_K^{H}\, \mathbf{H}_I^{(K)} \end{pmatrix}, \tag{6.26}$$
where the matrix operator $[\cdot]_k$ returns the k-th column vector of its argument. Although a suitable data estimate refinement can further improve the system performance, in this section the iterative algorithm applying the transparent data estimate refinement, i.e., $\hat{\hat{\mathbf{x}}}(i) = \hat{\mathbf{x}}(i)$, is considered. Without loss of generality, it can be treated as a benchmark for this kind of iterative algorithm with different data estimate refinement techniques.
In the case that the linear iterative JD algorithm with significant CSI described by (6.24) converges with $\hat{\mathbf{x}}(i) = \hat{\mathbf{x}}(i-1)$ and the matrix $\boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H)$ has full rank, the limiting value of $\hat{\mathbf{x}}(i)$ can be easily calculated from (6.24) as
$$\hat{\mathbf{x}}(\infty) = \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}\!\left(\boldsymbol{\Phi}_H\right) \right)^{-1} \mathbf{H}_U^{H}\mathbf{y}. \tag{6.27}$$
Under special conditions, this equation can be simplified step by step in different cases as shown in the following:


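• In the case that all the individual UE-specific significant interfering channel matrices $\mathbf{H}_I^{(k)}$ can be combined to one matrix $\mathbf{H}_I$, one obtains
$$\boldsymbol{\Phi}_H = \mathbf{H}_U^{H}\mathbf{H}_I, \tag{6.28}$$
and the limiting value of this iterative JD algorithm is described by
$$\hat{\mathbf{x}}(\infty) = \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}\!\left(\mathbf{H}_U^{H}\mathbf{H}_I\right) \right)^{-1} \mathbf{H}_U^{H}\mathbf{y}. \tag{6.29}$$
This happens when the UEs have compatible significant interfering channels.
• In the above case, if the significant interfering channel matrix $\mathbf{H}_I$ covers all the non-zero elements of the significant useful channel matrix $\mathbf{H}_U$, one obtains
$$\boldsymbol{\Lambda} = \mathrm{diag}\!\left(\mathbf{H}_U^{H}\mathbf{H}_U\right) = \mathrm{diag}\!\left(\mathbf{H}_U^{H}\mathbf{H}_I\right), \tag{6.30}$$
and the limiting value of this iterative JD algorithm can be represented by
$$\hat{\mathbf{x}}(\infty) = \left( \mathbf{H}_U^{H}\mathbf{H}_I \right)^{-1} \mathbf{H}_U^{H}\mathbf{y}. \tag{6.31}$$
This case happens when the significant interfering channels for every UE are the significant useful channels of other UEs.
• One special case is the iterative JD algorithm with full CSI, for which
$$\mathbf{H}_U = \mathbf{H}_I = \mathbf{H} \tag{6.32}$$
holds, and the limiting value of this iterative algorithm is
$$\hat{\mathbf{x}}(\infty) = \left( \mathbf{H}^{H}\mathbf{H} \right)^{-1} \mathbf{H}^{H}\mathbf{y}. \tag{6.33}$$
This case simply corresponds to the full ZF solution.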
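As a numerical illustration of the full-CSI special case, the following Python sketch runs the PIC iteration (6.23) with transparent data estimate refinement and compares the result with the transmitted data in a noise-free setting, where the ZF solution (6.33) recovers the data exactly. The function name `pic_full_csi` and the channel values are hypothetical; the channel is chosen with weak cross links so that the Jacobi-type iteration converges, which is not guaranteed for arbitrary channels.

```python
def pic_full_csi(H, y, iterations):
    """PIC iteration of (6.23) with transparent refinement (sketch).

    H[a][k]: channel coefficient from UE k to BS antenna a, y[a]: received
    sample at antenna a. Returns the estimate after the given iterations.
    """
    n_rx, K = len(H), len(H[0])
    Hc = [[complex(h).conjugate() for h in row] for row in H]
    # Gram matrix G = H^H H and matched filter output r = H^H y.
    G = [[sum(Hc[a][k] * H[a][j] for a in range(n_rx)) for j in range(K)]
         for k in range(K)]
    r = [sum(Hc[a][k] * y[a] for a in range(n_rx)) for k in range(K)]
    x = [0j] * K
    for _ in range(iterations):
        # Subtract the off-diagonal (interference) part computed from the
        # previous estimates, then scale by the diagonal element.
        x = [(r[k] - sum(G[k][j] * x[j] for j in range(K) if j != k)) / G[k][k]
             for k in range(K)]
    return x


# Hypothetical 3x3 channel with weak cross links, noise-free reception.
H = [[1.0, 0.2, 0.1],
     [0.1, 1.0, 0.2],
     [0.2, 0.1, 1.0]]
x_true = [1 + 1j, -1 + 0j, 0.5j]
y = [sum(H[a][k] * x_true[k] for k in range(3)) for a in range(3)]
x_hat = pic_full_csi(H, y, 60)
print(max(abs(x_hat[k] - x_true[k]) for k in range(3)))
```

No matrix inversion is needed inside the loop, which is the practical appeal of the iterative form.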


Decentralized Signal Processing Scheme
Let us now focus on how the computations stated above can be performed in a decentralized way. It is worth noting that for each UE k, the significant useful channel coefficients $[h_U]_k^a$ from $\mathbf{H}_U$ are considered in the matched filtering part. In the iterative interference cancelation part, at the BSs corresponding to the significant useful channels, the significant interfering channel coefficients $[h_I^{(k)}]_{k'}^a$ from $\mathbf{H}_I^{(k)}$ are considered for every UE k.
Step 1: Initialization for the iterative signal processing of UEs k = 1, . . . , K, at their corresponding BSs m = k in parallel:
(a) Assign the channel gain scaling factor
$$\lambda_k = \sum_a \left( [h_U]_k^a \right)^{H} [h_U]_k^a = \sum_a \left| [h_U]_k^a \right|^2 \tag{6.34}$$
for every UE k at BS m = k. $\lambda_k$ can be previously estimated in practice.
(b) Assign the initial value of the estimated data symbol as $\hat{\hat{x}}_k(0) = 0$ for every UE k at BS m = k.
Step 2: Matched filtering for every UE k, k = 1, . . . , K, in parallel:
(a) Compute the matched filtering estimate components
$$r_k^a = \left( [h_U]_k^a \right)^{H} y^a \tag{6.35}$$
for every UE k at all the BSs corresponding to its significant useful channels.
(b) Collect $r_k^a$ from the coordinated BSs through the backhaul links, and sum them up at BS m = k to obtain the matched filtering estimate for each UE k as
$$r_k = \sum_a r_k^a = \sum_a \left( [h_U]_k^a \right)^{H} y^a. \tag{6.36}$$
Step 3: Iterative interference cancelation for every UE k, k = 1, . . . , K, in parallel:
The following steps are performed iteratively in iterations i, i = 1, . . . , L. The required number L of iterations can be previously assigned.
(a) Compute reconstructed interfering signals
$$\left[ f^{(k)} \right]_{k'}^a = \left( [h_U]_k^a \right)^{H} \left[ h_I^{(k)} \right]_{k'}^a \hat{\hat{x}}_{k'}(i-1) \tag{6.37}$$
from different UEs $k' \neq k$ to UE k at the BSs with antenna indices a corresponding to the significant interfering channels for UE k.
(b) Collect the reconstructed interfering signals $[f^{(k)}]_{k'}^a$ from coordinated BSs over the backhaul, and subtract them from $r_k$ at BS m = k to obtain the estimated data symbol for every UE k as
$$\hat{x}_k(i) = \frac{1}{\lambda_k} \left( r_k - \sum_{k' \neq k} \sum_a \left[ f^{(k)} \right]_{k'}^a \right) = \frac{1}{\lambda_k} \left( r_k - \sum_{k' \neq k} \sum_a \left( [h_U]_k^a \right)^{H} \left[ h_I^{(k)} \right]_{k'}^a \hat{\hat{x}}_{k'}(i-1) \right). \tag{6.38}$$
(c) Forward the estimated data symbols $\hat{\hat{x}}_k(i)$ which are required at other BSs in the next iteration to compute the reconstructed interfering signals in Step 3(a).
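Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by the authors: one antenna per BS is assumed (so the antenna index a equals the BS index), the transparent data estimate refinement is used, and all processing is written as one loop although in the scheme each UE k is handled at its own BS m = k. The function name `decentralized_jd` and the input sets `useful` and `interf` are hypothetical.

```python
def conj(h):
    """Complex conjugate that also accepts real-valued inputs."""
    return complex(h).conjugate()


def decentralized_jd(H, y, useful, interf, iterations):
    """Iterative ZF JD with significant CSI, following Steps 1-3 (sketch).

    H[a][k]   : channel from UE k to BS antenna a (one antenna per BS)
    useful[k] : antennas a of the significant useful channels of UE k
    interf[k] : pairs (a, k') of significant interfering channels for UE k,
                with a in useful[k] and k' != k
    """
    K = len(H[0])
    # Step 1: scaling factors (6.34) and zero initial estimates.
    lam = [sum(abs(H[a][k]) ** 2 for a in useful[k]) for k in range(K)]
    x = [0j] * K
    # Step 2: matched filtering over the significant useful channels (6.36).
    r = [sum(conj(H[a][k]) * y[a] for a in useful[k]) for k in range(K)]
    # Step 3: reconstruct and cancel the significant interference (6.38),
    # always using the estimates of the previous iteration.
    for _ in range(iterations):
        x = [(r[k] - sum(conj(H[a][k]) * H[a][kp] * x[kp]
                         for a, kp in interf[k])) / lam[k]
             for k in range(K)]
    return x


# Hypothetical noise-free example with weak cross links.
H = [[1.0, 0.2, 0.1],
     [0.1, 1.0, 0.2],
     [0.2, 0.1, 1.0]]
x_true = [1 + 1j, -1 + 0j, 0.5j]
y = [sum(H[a][k] * x_true[k] for k in range(3)) for a in range(3)]

# Full CSI: every channel is significant, so the iteration equals PIC.
full_useful = [{0, 1, 2}] * 3
full_interf = [{(a, kp) for a in range(3) for kp in range(3) if kp != k}
               for k in range(3)]
x_full = decentralized_jd(H, y, full_useful, full_interf, 60)

# Intra-cell matched filtering: own channel only, no interference removed.
x_mf = decentralized_jd(H, y, [{0}, {1}, {2}], [set(), set(), set()], 1)
print(x_full)
print(x_mf)
```

Passing smaller `useful` and `interf` sets reduces both the number of terms in the sums and the number of values that would have to cross the backhaul.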
For the sake of simplicity, the 3-cell cellular system with a single antenna at each BS in Fig. 6.5 is taken as the exemplary scenario to visualize the implementation of the decentralized JD with significant CSI in Fig. 6.6. Significant CSI according to the significant channel selection results in Fig. 6.5 is applied in the decentralized signal processing in Fig. 6.6. For example, two significant useful channels for UE 1 corresponding to its neighboring BS 1 and BS 2 are considered to obtain the matched filtering data estimate $r_1$. Only one significant interfering channel at each involved BS is considered for interference reconstruction and cancelation for each UE.

[Figure 6.6: block diagram for BS 1, BS 2 and BS 3 with received signals $y^1$, $y^2$, $y^3$. A matched filtering stage forms $r_1$, $r_2$, $r_3$ from the significant useful channel coefficients; it is followed by an iterative interference reconstruction and cancellation stage that uses $\hat{\hat{x}}_k(i-1)$, the products of significant useful and significant interfering channel coefficients, and the scaling by $1/\lambda_k$ to produce $\hat{x}_k(i)$.]

Figure 6.6 Decentralized signal processing with significant CSI in a 3-cell system.

Additionally, the backhaul communication steps between coordinated BSs in the above exemplary 3-cell scenario applying the proposed decentralized JD are demonstrated in Fig. 6.7. For the computation of the data estimates $\hat{x}_k(i)$ of UEs k, k = 1, . . . , K, at their corresponding BSs m = k, three kinds of information have to be exchanged between the coordinated BSs as follows:
• $u_{m,l}$ denotes the matched filtering data estimate which has to be forwarded from BS l to BS m.
• $z_{m,l}$ denotes the preliminary estimated data symbol in the previous iteration which has to be forwarded from BS l to BS m.
• $v_{m,l}$ denotes the weighted interfering signal which has to be forwarded from BS l to BS m.
In the CoMP scheme applying JD with full CSI, for each UE k all $u_{m,l}$, $v_{m,l}$ and $z_{m,l}$ from BSs l = 1, . . . , k − 1, k + 1, . . . , K have to be forwarded to BS m = k. In the proposed CoMP scheme applying JD with significant CSI, the backhaul communication load can be significantly reduced. This advantage is shown in Fig. 6.7, where only a few intermediate results are exchanged between BSs.
[Figure 6.7: messages exchanged over the backhaul in the 3-cell example, each forwarded from BS l to BS m:
Between BS 1 and BS 2: $z_{2,1} = \hat{\hat{x}}_1(i-1)$, $u_{1,2} = ([h_U]_1^2)^{H} y^2$, $v_{1,2} = ([h_U]_1^2)^{H} [h_I]_2^2\, \hat{\hat{x}}_2(i-1)$.
Between BS 2 and BS 3: $z_{3,2} = \hat{\hat{x}}_2(i-1)$, $u_{2,3} = ([h_U]_2^3)^{H} y^3$, $v_{2,3} = ([h_U]_2^3)^{H} [h_I]_3^3\, \hat{\hat{x}}_3(i-1)$.
Between BS 3 and BS 1: $z_{1,3} = \hat{\hat{x}}_3(i-1)$, $u_{3,1} = ([h_U]_3^1)^{H} y^1$, $v_{3,1} = ([h_U]_3^1)^{H} [h_I]_1^1\, \hat{\hat{x}}_1(i-1)$.]

Figure 6.7 Backhaul communication steps for JD with partial CSI in a 3-cell system.

In this section, the backhaul load is evaluated in terms of the number $N_{\mathrm{BL}}$ of intermediate signal processing results which have to be exchanged between the BSs to estimate one data symbol of each UE. It is found that even with the same number of significant channels, the backhaul load varies with different choices of significant channels. Applying the JD algorithm proposed above with $N_U$ significant useful channel groups and $N_I$ significant interfering channel groups for each UE in a K-cell cellular system, the lower bound and the upper bound of $N_{\mathrm{BL}}$ are derived as
$$N_{\mathrm{BL}}^{\mathrm{lower}} = \underbrace{(N_U - 1)}_{\text{matched filtering}} + \underbrace{N_I L}_{\text{interference cancellation}}$$
and
$$N_{\mathrm{BL}}^{\mathrm{upper}} = \underbrace{(N_U - 1)}_{\text{matched filtering}} + \underbrace{\begin{cases} 2 N_I L & \text{for } 1 \le N_I \le (K-1) \\ (N_I + K - 1)\, L & \text{for } (K-1) < N_I \le (K-1)^2 \\ K (K-1)\, L & \text{for } (K-1)^2 < N_I \le (K-1) K \end{cases}}_{\text{interference cancellation}}. \tag{6.39}$$
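As a quick arithmetic check, the lower bound in (6.39), read as $(N_U - 1) + N_I L$, can be evaluated for the 3-cell configurations JD-(3, 6) and JD-(2, 3) with L = 4 iterations that are discussed with Fig. 6.9. The helper name `backhaul_lower_bound` is introduced here only for illustration.

```python
def backhaul_lower_bound(n_useful, n_interf, iterations):
    """Lower bound of the backhaul load N_BL per data symbol, from (6.39)."""
    matched_filtering = n_useful - 1                    # once per symbol
    interference_cancellation = n_interf * iterations   # in every iteration
    return matched_filtering + interference_cancellation


print(backhaul_lower_bound(3, 6, 4))  # 26, JD with full CSI in 3 cells
print(backhaul_lower_bound(2, 3, 4))  # 13, JD-(2, 3)
```

The interference cancelation term dominates as soon as more than one iteration is used, so reducing $N_I$ is the main lever on the backhaul load.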

Additional backhaul communication between cooperative BSs is considered one major drawback of the proposed CoMP scheme. From a signal processing point of view, the backhaul traffic can be greatly reduced by applying significant CSI instead of full CSI in the JD scheme shown above. Furthermore, various approaches could be applied to efficiently exchange the information between BSs in practical systems. For example, the likelihood information on the transmitted bits could be calculated in the above JD scheme and exchanged between BSs to reduce the backhaul traffic [KF08, AEH08]. In [MF08a, MF09b, dCS09], the quantized received signals are first compressed via Wyner-Ziv source coding, exploiting the correlation between received signals [WZ76], and then exchanged between BSs. Similarly, the matched filtering data estimate $u_{m,l}$ and the weighted interfering signal $v_{m,l}$, which have to be exchanged between the BSs, could also be quantized or compressed. In [MF08d], a CoMP scheme based on the exchange of distributively decoded messages is proposed, and Slepian-Wolf source coding [SW73a] is applied for the compression of the decoded bits. In [SSPS09c, GMFC09], a CoMP scheme based on the exchange of quantized transmit sequences derived from the decoded messages is proposed, and Wyner-Ziv source coding is used for the compression of the quantized sequences. Similarly, the preliminary estimated data symbol $z_{m,l}$, the decoded bits or the transmit sequences in the above JD scheme could be appropriately compressed via source coding and then exchanged between BSs. Backhaul issues will be discussed in detail in Chapter 12.


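6.2.2 Performance Assessment

Analytical Calculations
The system performance of the uplink decentralized CoMP scheme can be investigated based on the limiting value of the iterative ZF JD algorithm with significant CSI. Considering the transparent data estimate refinement, the limiting value of the estimated data vector described by (6.27) can be rewritten as
$$\hat{\mathbf{x}}(\infty) = \underbrace{\mathrm{diag}\!\left[ \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H) \right)^{-1} \mathbf{H}_U^{H}\mathbf{H} \right] \mathbf{x}}_{\text{useful contribution}} + \underbrace{\overline{\mathrm{diag}}\!\left[ \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H) \right)^{-1} \mathbf{H}_U^{H}\mathbf{H} \right] \mathbf{x}}_{\text{interference}} + \underbrace{\left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H) \right)^{-1} \mathbf{H}_U^{H}\mathbf{n}}_{\text{noise}}. \tag{6.40}$$
Based on the above limiting value, the signal-to-interference-and-noise ratio (SINR) can be calculated as an important performance metric. Furthermore, the system performance in every channel snapshot can be analytically assessed in terms of the bit error rate (BER) and the capacity based on the SINR. It is assumed that the applied modulation scheme ensures the same average transmitted power P per data symbol, and that the zero-mean Gaussian noise $\mathbf{n} \in \mathbb{C}^{[N_{\mathrm{BS}} \times 1]}$ has the covariance $E\{\mathbf{n}\mathbf{n}^{H}\} = \sigma^2 \mathbf{I}$. The data vector $\mathbf{x}$ and the noise vector $\mathbf{n}$ are statistically independent. For one channel snapshot, the SINR of the data estimate $\hat{x}_k$ for UE k can be written as
$$\gamma^{(k)} = \frac{S^{(k)}}{N^{(k)} + I^{(k)}}, \tag{6.41}$$
where
$$S^{(k)} = P \left[ \mathrm{diag}\!\left( \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H) \right)^{-1} \mathbf{H}_U^{H}\mathbf{H} \right) \mathrm{diag}\!\left( \mathbf{H}^{H}\mathbf{H}_U \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H^{H}) \right)^{-1} \right) \right]_{k,k},$$
$$I^{(k)} = P \left[ \overline{\mathrm{diag}}\!\left( \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H) \right)^{-1} \mathbf{H}_U^{H}\mathbf{H} \right) \overline{\mathrm{diag}}\!\left( \mathbf{H}^{H}\mathbf{H}_U \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H^{H}) \right)^{-1} \right) \right]_{k,k},$$
$$\text{and} \quad N^{(k)} = \sigma^2 \left[ \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H) \right)^{-1} \mathbf{H}_U^{H}\mathbf{H}_U \left( \boldsymbol{\Lambda} + \overline{\mathrm{diag}}(\boldsymbol{\Phi}_H^{H}) \right)^{-1} \right]_{k,k} \tag{6.42}$$
can be calculated from (6.40). Since only partial CSI instead of full CSI is applied in the cooperative signal processing, the data estimates may contain slightly rotated and scaled useful contributions. However, such a rotation or scaling can be easily estimated and compensated at the receiver.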
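The SINR expressions (6.41) and (6.42) can be evaluated numerically for one channel snapshot, as in the following Python sketch. It assumes one antenna per BS and represents the significant CSI by index sets rather than by indicator matrices; the names `solve` and `limiting_sinr`, the input sets and the channel values are hypothetical choices for illustration.

```python
def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [[complex(v) for v in M[i]] + [complex(v) for v in B[i]]
         for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for i in range(n):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [row[n:] for row in A]


def limiting_sinr(H, useful, interf, P, sigma2):
    """Per-UE SINR of the limiting value, following (6.41) and (6.42).

    useful[k]: antennas of the significant useful channels of UE k;
    interf[k]: pairs (a, k') with k' != k of significant interfering channels.
    """
    n_rx, K = len(H), len(H[0])
    # Rows of H_U^H: conjugated significant useful coefficients, 0 elsewhere.
    HUh = [[complex(H[a][k]).conjugate() if a in useful[k] else 0j
            for a in range(n_rx)] for k in range(K)]
    # M = Lambda + off-diagonal part of the channel correlation matrix.
    M = [[0j] * K for _ in range(K)]
    for k in range(K):
        M[k][k] = sum(abs(H[a][k]) ** 2 for a in useful[k])
        for a, kp in interf[k]:
            M[k][kp] += complex(H[a][k]).conjugate() * H[a][kp]
    A = solve(M, HUh)  # M^{-1} H_U^H
    T = [[sum(A[k][a] * H[a][j] for a in range(n_rx)) for j in range(K)]
         for k in range(K)]  # M^{-1} H_U^H H
    sinr = []
    for k in range(K):
        S = P * abs(T[k][k]) ** 2
        I = P * sum(abs(T[k][j]) ** 2 for j in range(K) if j != k)
        N = sigma2 * sum(abs(A[k][a]) ** 2 for a in range(n_rx))
        sinr.append(S / (N + I))
    return sinr


H = [[1.0, 0.2, 0.1],
     [0.1, 1.0, 0.2],
     [0.2, 0.1, 1.0]]
full_interf = [{(a, kp) for a in range(3) for kp in range(3) if kp != k}
               for k in range(3)]
sinr_full = limiting_sinr(H, [{0, 1, 2}] * 3, full_interf, 1.0, 0.01)
sinr_mf = limiting_sinr(H, [{0}, {1}, {2}], [set()] * 3, 1.0, 0.01)
print(sinr_full)
print(sinr_mf)
```

In this interference-limited toy setting the full-CSI SINR is limited by noise only, while the intra-cell matched filter is limited by the residual interference.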
Numerical Simulation Results
In the following, the system performance of the proposed CoMP scheme is assessed by means of the numerical simulation results in Figs. 6.8 and 6.9. A small cellular system including 3 cells with a frequency reuse factor of 1, as shown in Fig. 6.5, is taken as the reference scenario. The key assumptions for the simulations are as follows:

[Figure 6.8: CDF plotted over the output SINR in dB. (a) Limiting values of iterative JD, with curves for MF-cell, JD-(2, 1), JD-(2, 3) and JD-(3, 6). (b) JD-(2, 3) with L iterations, with curves for L = 1, L = 2, L = 5 and L = ∞.]

Figure 6.8 CDF of the output SINR in the uplink with $\gamma_N$ = 20 dB.

• Rayleigh fading and a pathloss with attenuation exponent $\alpha = 3$ with respect to the distance are considered.
• Applying OFDM, in every cell one BS with 3 antennas and one active UE with a single antenna in every time slot and sub-carrier are considered. The UE is randomly and uniformly distributed in its cell.
• The data vector $\mathbf{x} \in \mathbb{C}^{[K \times 1]}$ including independently and identically distributed (i.i.d.) zero-mean Gaussian elements has the covariance $E\{\mathbf{x}\mathbf{x}^{H}\} = P\,\mathbf{I}$ with P = 1. The noise vector $\mathbf{n} \in \mathbb{C}^{[N_{\mathrm{BS}} \times 1]}$ including i.i.d. zero-mean Gaussian elements has the covariance $E\{\mathbf{n}\mathbf{n}^{H}\} = \sigma^2 \mathbf{I}$.
• The proposed JD scheme considering $N_U$ significant useful and $N_I$ significant interfering channel groups for every UE is denoted by JD-($N_U$, $N_I$).
• The parameter $\gamma_N$ is used to indicate the ratio of the average receive power to the noise power in the reference scenario. For a UE at any fixed position, its average matched filtering receive power considering all BS antennas in the 3-cell system can be calculated as a constant parameter. With the unit transmit power per data symbol P = 1, the average receive power for this UE is $P_B = E\{\sum_a h_k^a (h_k^a)^{H}\}$. The noise condition in the reference scenario can be characterized by the parameter $\gamma_N\,(\mathrm{dB}) = 10 \log_{10}\!\left(P_B / \sigma^2\right)$ calculated for the UE located at the intersection point of the 3 cells. In both Figs. 6.8 and 6.9, $\gamma_N$ = 20 dB for the interference-limited scenario is considered.
In Fig. 6.8(a), the system performance of different cooperation and detection schemes considering different significant channels is compared in terms of the cumulative distribution function (CDF) of the output SINRs. Here the system performance of the proposed decentralized CoMP scheme is investigated based on the limiting values of the iterative JD algorithm. It is shown that the conventional intra-cell matched filtering scheme denoted by MF-cell gives the worst performance among all the investigated schemes. The reason is that this scheme considers only intra-cell useful channels but no inter-cell interfering channels in the signal processing for each UE. The system performance is strongly limited by the inter-cell interference. Applying the proposed decentralized CoMP scheme considering significant CSI, the CDF curve of the SINRs for JD-(2, 1) and that for JD-(2, 3) are plotted. Obviously, the proposed decentralized CoMP scheme considering a few appropriately selected UE-oriented significant channels can strongly improve the system performance as compared to the conventional intra-cell matched filtering scheme. The communication scheme denoted by JD-(3, 6) is the decentralized CoMP scheme applying the iterative ZF JD algorithm with full CSI, i.e., considering all the useful and interfering channels for every UE. All the inter-cell interference is eliminated, and the system performance is only limited by the Gaussian noise.

[Figure 6.9: bar charts over the number $N_U$ of significant useful channel groups, with one bar per number $N_I$ of significant interfering channel groups. (a) Outage capacity $C_{\mathrm{out}}$ in bit/s/Hz. (b) Backhaul load in terms of the lower bound $N_{\mathrm{BL}}^{\mathrm{lower}}$.]

Figure 6.9 Outage capacity and backhaul load vs. numbers of significant channel groups considered in JD with L = 4 iterations, $p_{\mathrm{out}}$ = 0.1, $\gamma_N$ = 20 dB, M = K = 2.
The SINR performance of the proposed decentralized CoMP scheme considering different numbers of iterations in the iterative JD algorithm is investigated in Fig. 6.8(b). It is shown that after only a few iterations, the SINR performance of the iterative JD converges to that of the corresponding JD with an unlimited number of iterations. The convergence behavior is well retained even if significant CSI instead of full CSI is considered in JD.
In Fig. 6.9, the influence of different amounts of considered significant CSI on the system performance and on the backhaul load of the proposed decentralized CoMP scheme is investigated. According to the proposed significant channel selection scheme, for each number of significant useful channel groups $N_U$, the number of significant interfering channel groups $N_I$ could range from 1 to $(K-1)\,N_U$. In Fig. 6.9, $(K-1)\,N_U$ bars are plotted for each number $N_U$, corresponding to all possible numbers of significant useful and interfering channels considered in JD-($N_U$, $N_I$). In Fig. 6.9(a), the outage capacity of UEs in the 3 cells is plotted. In Fig. 6.9(b), the lower bound of the backhaul load as described by (6.39) is plotted. Since i.i.d. Gaussian data symbols and i.i.d. noise signals are considered, it is reasonable to assume that the remaining interfering signals and the noise signals after the linear JD are uncorrelated and Gaussian distributed. With $\gamma^{(k)}$ indicating the SINR for UE k, the corresponding instantaneous capacity is calculated as
$$C_{\mathrm{int}}^{(k)} = \log_2\!\left( 1 + \gamma^{(k)} \right). \tag{6.43}$$
The outage capacity $C_{\mathrm{out}}$ is defined w.r.t. its outage probability $p_{\mathrm{out}}$ as
$$p_{\mathrm{out}} = \mathrm{Prob}\!\left( C_{\mathrm{int}}^{(k)} < C_{\mathrm{out}} \right), \quad k = 1, 2, 3, \tag{6.44}$$
where $p_{\mathrm{out}}$ describes the probability that the instantaneous capacity $C_{\mathrm{int}}^{(k)}$ of one UE is smaller than the outage capacity $C_{\mathrm{out}}$. For every pair of ($N_U$, $N_I$), an outage capacity $C_{\mathrm{out}}$ can be calculated based on a given $p_{\mathrm{out}}$. The most important results derived from Fig. 6.9 are the following:
• Generally, the more significant channels are considered in the decentralized CoMP scheme, the better the system performance that can be achieved. The more significant useful channels are considered, the more noise can be suppressed. The more interfering channels are considered in JD, the more interference can be eliminated, but the larger the noise enhancement will be.
• Interestingly, it is shown that for a given number of significant interfering channel groups $N_I$, a larger outage capacity can be achieved when considering a smaller number of significant useful channel groups $N_U$. The reason is that the more significant useful channel groups are considered, the more BSs are involved in JD, and the more interference is included in the matched filtering data estimate for each UE. The system performance of the proposed scheme is mainly limited by the remaining interfering channels which are not considered as significant interfering channels in JD. In fact, the outage capacity increases with the number of significant useful channel groups $N_U$ and the ratio
$$\rho = \frac{N_I}{(K-1)\,N_U}, \tag{6.45}$$
rather than directly with the number of significant interfering channel groups $N_I$. The variable $\rho$ indicates the ratio of the number of considered significant interfering channels to the total number of interfering channels for each UE.
• Considering only a few appropriately selected significant channels, a good system performance with a moderate backhaul load can be achieved by the proposed JD scheme. For example, JD with full CSI, i.e., JD-(3, 6), requires a backhaul load of at least $N_{\mathrm{BL}}$ = 26 exchanged messages and achieves an outage capacity of $C_{\mathrm{out}}$ = 5.37 bit/s/Hz, while JD-(2, 3), which requires only a backhaul load of $N_{\mathrm{BL}}$ = 13, can already achieve an outage capacity of $C_{\mathrm{out}}$ = 4.96 bit/s/Hz. Hence, a good compromise between backhaul load and data rate can be made by considering significant CSI in JD.
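The definitions (6.43) and (6.44) translate into a simple empirical procedure: map each SINR sample to an instantaneous capacity and read off the capacity that is undercut with probability $p_{\mathrm{out}}$. The Python sketch below uses an empirical quantile over a list of SINR samples; the function name `outage_capacity`, the quantile rule and the synthetic samples are assumptions for illustration and do not reproduce the simulation behind Fig. 6.9.

```python
import math


def outage_capacity(sinr_samples, p_out):
    """Empirical outage capacity for linear-scale SINR samples.

    Each sample is mapped to log2(1 + SINR) as in (6.43). The returned value
    C_out is the sorted capacity at index floor(p_out * n), so that at most
    a fraction p_out of the samples lies strictly below it, cf. (6.44).
    """
    caps = sorted(math.log2(1.0 + g) for g in sinr_samples)
    idx = min(int(p_out * len(caps)), len(caps) - 1)
    return caps[idx]


# Synthetic SINR samples whose capacities are exactly 1, 2, ..., 10 bit/s/Hz.
samples = [2 ** c - 1 for c in range(1, 11)]
c_out = outage_capacity(samples, 0.1)
print(c_out)  # 2.0: one of the ten samples falls below this capacity
```

With many channel snapshots per ($N_U$, $N_I$) pair, the same routine would yield one bar of an outage capacity plot.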


6.2.3 Summary
A practical uplink decentralized CoMP scheme has been proposed in this section. The decentralized CoMP scheme can be directly implemented at the coordinated BSs in a flexible way without requiring a central unit. Distinguishing the significant useful channels from the significant interfering channels, only the channel state information which plays a significant role in the system performance of each UE is required in joint detection. A good compromise between data rate and backhaul load can be made in the proposed decentralized CoMP scheme considering significant CSI.

6.3 Downlink Distributed CoMP Approaching Centralized Joint Transmission
Lars Thiele, Thomas Haustein, Volker Jungnickel, Wolfgang Zirwas and Federico Boccardi
In this section, we focus on the downlink and consider centralized joint transmission. Here, multiple base stations (BSs) cooperate in jointly transmitting precoded data symbols to multiple user equipments (UEs) such that desired signals overlap coherently and the interference is partially canceled out. First, CoMP transmission requires knowledge of the compound channel matrix, i.e. multi-cell channel state information at the transmitter (CSIT), between all UEs and BSs involved in the coherent downlink transmission, in order to obtain the spatial precoding weights. Second, we initially assume that both the multi-cell CSIT and the scheduled data bits to be transmitted to the UEs are distributed to all involved BSs, an assumption which will be relaxed later in Section 6.4.
Concerning coherent joint downlink transmission, information theoretical publications typically consider a centralized network architecture, where perfect synchronization, unlimited backhaul and negligible delay are assumed. All BSs in the network are grouped in a huge cluster. Their transmit antennas act as inputs of a generalized multiple-input multiple-output (MIMO) broadcast channel (BC), while antennas from multiple UEs are considered as the outputs. The BS antennas are connected via a fast backhaul link to a CoMP central unit (CCU) [BMWT00, SZ01, WMSL02, GJJV03]. Non-linear signal preprocessing, known as dirty paper coding (DPC) [Cos83], was shown to achieve the BC capacity [CS03, VT03]. By allowing full coordination across the whole network, multi-cell interference can be removed to a certain degree, which depends on the selected precoding strategy as well as on the channel knowledge. Due to additional beamforming gain (referred to in Section 3.5 as array gain), the system throughput can exceed the one known from an isolated cell [HV04]. Recently, multi-user eigenmode transmission (MET) was introduced as a linear transceiver optimization method, which was shown to approach the BC capacity in an isolated cluster [BH07a]. The eigenmode concept was shown to reduce the peak-to-average power ratio (PAPR) for linear precoding [JHJvH02] as well as for non-linear Tomlinson-Harashima precoding (THP) [NMK+07]. In general, by limiting each user to report its strongest eigenmodes only, feedback may be reduced.

Figure 6.10 Illustration of a cluster of collaborative base stations.
Recent results applying generalized MIMO techniques in wireless networks
show huge gains [FHK+ 05, KFV06]. In addition, [ZD04] proposes a common
framework to study multi-user CoMP downlink transmission, considers practical
signal processing issues and emphasizes the advantage of array gain, enhanced
channel rank and macro diversity. In [JJT+ 09], the authors confirm these findings
based on channel measurements from a real cellular urban-macro deployment.
The work in [VHLV09] promises significant gains obtained from a simulator
including realistic operational conditions valid for a WiMAX system operated in
an indoor scenario.
However, higher complexity, growing data rates on the backhaul and the
additional overhead remain serious challenges for the introduction of CoMP in
next generation mobile networks. Note that these costs can be scaled down
with the size of the cooperation cluster. In the downlink, backhaul requirements increase at least linearly with the number of BSs belonging to the cluster [HS09] (in a centralized approach). See a detailed discussion on backhaul
requirements in both centralized and decentralized downlink CoMP in Section 12.2. Hence, a distributed implementation of CoMP is realistic where the
serving BS cooperates with a small subset of BSs (see Fig. 6.10) in its direct
vicinity [ZSK+ 06, MF07a, JTS+ 08b, NEHA08, PHG09, ZMS+ 09a, TWH+ 09].


Sections 13.3 and 13.4 present details on a real-time implementation [JTW+ 09]
of downlink CoMP demonstrating its feasibility.
The subsequent section is organized as follows: In Subsection 6.3.1, we introduce an extended system model which covers the algorithms described in this
work. Then we continue with a general description of CoMP joint transmission (JT) obtained in a fully centralized setup and determine the system capacity by use of DPC under a sum power constraint in Subsection 6.3.2. In the
next steps, we introduce concepts to alleviate the major drawbacks related to
joint transmission, such as its higher complexity and increased backhaul and signaling overhead. Those concepts, covered in Subsection 6.3.3, comprise linear precoding techniques, a greedy user
selection process and clustering solutions. The clustering can
be carried out statically or dynamically and restricts joint processing techniques
to a limited number of base stations. Moreover, the cluster formation may be
performed and optimized by a central entity (network-centric), or in a per-user
way (user-centric). In Subsection 6.3.4, we introduce a concept for a unified
channel state information (CSI) feedback framework to cope with different vendor-specific types of channel feedback provided by mobile devices. This concept
is well aligned with the estimation of an effective channel described in Section 9.1. Finally, we summarize a system concept where each terminal provides
channel feedback to its serving base station only; the base stations in the same
cluster exchange the channel feedback and payload data in order to determine
the precoding weights and perform the spatial precoding, both in a distributed
manner.

6.3.1 System Model
We consider a cellular orthogonal frequency division multiplex (OFDM) downlink where a central site is surrounded by multiple tiers of sites. As in Chapter 3,
we assume each site to be partitioned into three 120° sectors or cells, yielding a set $\mathcal{M}$ consisting of $M = |\mathcal{M}|$ sectors in total. In our notation, each sector
constitutes a cell which is controlled by one BS, and frequency resources are fully
reused in all $M$ cells. In joint transmission, the data to each user is simultaneously transmitted from multiple BSs. In order to mitigate the overhead related
to joint transmission techniques, BSs are grouped into $C$ subsets or clusters, of
which one example is shown in Fig. 6.10. $\mathcal{M}_c$ represents the set of cells included
in a cluster $c$ and $M_c = |\mathcal{M}_c|$ denotes its maximum dimension. Joint processing is only allowed between BSs belonging to the same cluster, whereas BSs
belonging to different clusters are not coordinated and thus produce residual
inter-cluster interference. As an extension, coordinated beamforming techniques
may be used to deal with the interference between clusters, i.e. to coordinate
the inter-cluster interference, as introduced in Section 5.3. Further, we assume
disjoint clusters, i.e. a given BS cannot belong to more than one cluster operated
at the same time/frequency resource, as for example created through the clustering
techniques described in Chapter 7. Let us assume that each BS is equipped with
$N_{bs}$ transmit antennas, and the scheduler of a cluster $c$ has assigned a set of UEs
$\mathcal{K}_c$ to the same resource in time and frequency, while $\mathcal{K}$ denotes all UEs in all
clusters assigned to this resource. Assuming that UE $k$ is served and scheduled
by cluster $c$, i.e. $k \in \mathcal{K}_c$, the received downlink signal $\mathbf{y}_k$ at UE $k$ is given by

$$
\mathbf{y}_k = \underbrace{(\mathbf{H}_k^c)^H \mathbf{s}_k^c}_{\text{Desired signal}}
+ \underbrace{\sum_{j \in \{\mathcal{K}_c \setminus k\}} (\mathbf{H}_k^c)^H \mathbf{s}_j^c}_{\text{Intra-cluster interference } \boldsymbol{\vartheta}_k}
+ \underbrace{\sum_{j \in \{\mathcal{K} \setminus \mathcal{K}_c\}} (\mathbf{H}_k)^H \mathbf{s}_j + \mathbf{n}}_{\text{Inter-cluster interference } \mathbf{z}_k}
\tag{6.46}
$$

assuming rank-1 transmission to the scheduled users. Analogous to the notation
introduced in Section 3.5, $\mathbf{s}_k^c \in \mathbb{C}^{[M_c N_{bs} \times 1]}$ denotes the precoded signals targeted
towards UE $k$ and to be transmitted from all BS antennas connected to cluster $c$.
We emphasize that the rank-1 transmission assumption is only made to simplify
the notation: this assumption can be easily extended to the more general case of
sending multiple independent streams to a subset of scheduled users. The desired
data stream $x_k$ is distorted by intra-cluster and inter-cluster interference plus
noise aggregated in $\boldsymbol{\vartheta}_k$ and $\mathbf{z}_k$, respectively. Note that $\mathbf{H}_k^c$ is the $M_c N_{bs} \times N_{ue}$
channel matrix between all BSs in cluster $c$ and UE $k$.

6.3.2 Theoretical Limits for Static Clustering and DPC


To study the limits of the system under the assumption of a static clustering,
we assume sum-rate maximized transmission within each cluster. The maximum
sum-rate in the c-th cluster can be obtained under the assumption of a distributed antenna system, where a centralized unit calculates the optimal transmission strategy for all BSs and users inside the cluster. The optimal transmission
strategy can be easily derived from works on the MIMO BC (see for example the
seminal works [CS03], [JVG04]), and is given by the so-called DPC scheme. We
define as $R_c$ the maximum throughput in the c-th cluster with a common power
constraint

$$
R_c = \max_{\substack{\boldsymbol{\Phi}^c_{ss,k} \succeq 0, \\ \sum_{m=1}^{M_c} \sum_{k=1}^{K} E\{\mathrm{tr}\{\boldsymbol{\Phi}^m_{ss,k}\}\} \le M_c P_{\max}}}
\log_2 \det \left[ \mathbf{I} + \sum_{k=1}^{K} (\mathbf{H}_k^c)^H \boldsymbol{\Phi}^c_{ss,k} \mathbf{H}_k^c \right],
\tag{6.47}
$$

where $\boldsymbol{\Phi}^c_{ss,k} = E\{\mathbf{s}_k^c (\mathbf{s}_k^c)^H\}$ and $\boldsymbol{\Phi}^m_{ss,k} = E\{\mathbf{s}_k^m (\mathbf{s}_k^m)^H\}$, and $P_{\max}$ is the per-base
station power budget. We note that the original MIMO BC capacity [CS03],
[JVG04] was calculated under a sum-power constraint. The problem of finding
the sum-capacity region of a downlink system with a per-antenna power constraint was considered in [YL07], where a generalized uplink-downlink duality
was established.


Simulation Results for DPC with Sum-Power Constraint


In the sequel, we consider the case where each BS is equipped with $N_{bs} = 2$
transmit antennas, and each UE with $N_{ue} = 2$ receive antennas. All $M = K = 21$
BSs are considered as a huge distributed antenna system (DAS) (i.e. spanning
one huge cluster of cells), which jointly serve a set of users $\mathcal{U}$, of which $\mathcal{K} \subseteq \mathcal{U}$ are
served on the same resource in time and frequency. Note that, in contrast to the typical
Rayleigh fading assumption, the MIMO channels in this evaluation do not have
the same average signal-to-noise ratio (SNR). This is caused by the different
pathloss coefficients to the different antenna arrays of the BSs in the cellular
deployment. The cellular channels are generated by use of the spatial channel
model extended (SCME) with a 3D antenna pattern. We use an iterative
water-filling algorithm with a sum-power constraint [JRV+ 05] to determine the
maximum sum-rate of the system as a function of the size of the active set of
users $\mathcal{U}$. While in practice we would rather consider a per-BS or per-antenna
power constraint than a sum-power constraint, these results should provide an
overview of a well-known water-filling algorithm and its achievable sum-rates
in a cellular deployment. Note that Section 6.3.4 evaluates a linear precoding
scheme with per-antenna power constraint.
For the results shown in Fig. 6.11, we assume that the transmit power per
physical resource block (PRB) emitted by each BS is set to $P_i$ = 400 mW (equivalent to the full transmit power in LTE systems of 40 W for 20 MHz of bandwidth), $P_i$ = 40 mW or $P_i$ = 4 mW. As noise, we assume thermal noise
at 20 °C and an additional receiver noise figure of 9 dB. In particular, the high
transmission power of $P_i$ = 400 mW yields a very high average SNR of 38 dB
for each user in $\mathcal{U}$ and its specific serving cell in $\mathcal{M}$. We are aware that such high
SNRs cannot be achieved in practice due to various impairments in the system
hardware, such as the resolution of analog-to-digital conversion (ADC), phase noise,
etc.
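
As a rough plausibility check of these power and noise assumptions, the following sketch computes the noise power per PRB and the total BS transmit power. The PRB bandwidth of 180 kHz and the 100 PRBs per 20 MHz carrier are assumptions taken from common LTE numerology and are not stated in the text; the function and variable names are purely illustrative. The "implied loss" is only the link loss that would be consistent with a 38 dB SNR if antenna gains and fading were ignored.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def noise_power_dbm(bandwidth_hz, temperature_c=20.0, noise_figure_db=9.0):
    """Thermal noise power (k*T*B) plus receiver noise figure, in dBm."""
    temperature_k = temperature_c + 273.15
    thermal_w = BOLTZMANN * temperature_k * bandwidth_hz
    return 10.0 * math.log10(thermal_w * 1e3) + noise_figure_db


def to_dbm(power_mw):
    """Convert a power in mW to dBm."""
    return 10.0 * math.log10(power_mw)


PRB_BANDWIDTH_HZ = 180e3  # assumption: 12 sub-carriers x 15 kHz
NUM_PRBS = 100            # assumption: 20 MHz LTE carrier

noise_dbm = noise_power_dbm(PRB_BANDWIDTH_HZ)

# 400 mW per PRB over all PRBs gives the full BS transmit power in W.
total_tx_w = NUM_PRBS * 0.4

# Link loss consistent with 38 dB SNR at 400 mW per PRB (no antenna gains).
implied_loss_db = to_dbm(400.0) - noise_dbm - 38.0

print(f"noise per PRB: {noise_dbm:.1f} dBm, total power: {total_tx_w:.0f} W")
```

Under these assumptions the noise floor per PRB is close to -112 dBm, so a 38 dB SNR at 400 mW per PRB corresponds to roughly 100 dB of total link loss.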
Fig. 6.11 shows the achievable BC capacity by use of an iterative water-filling
algorithm with sum-power constraint [JRV+ 05]. The capacity is given for 1 to 5
active users per cell and for the low to high SNR regime. The capacity increases
with the size $|\mathcal{U}|$ of the set of available users, hence multi-user diversity helps
to improve the capacity of the BC. For the low SNR regime, i.e. $P_{\max}$ = 4 mW
or $P_{\max}$ = 40 mW, the capacity increases by 78% when assuming 3 active users
instead of 1 user per cell. In contrast, we observe a slightly reduced slope in
the high SNR regime ($P_{\max}$ = 400 mW), i.e. for an equivalent gain we need
to have 4 active users per cell. Peak capacities, i.e. the 90th percentile per cell, approach
6.7 bit/s/Hz, 14.6 bit/s/Hz and 23.5 bit/s/Hz for 3 users per cell for the low to
high SNR regime, respectively. As a reference, we include the BC capacity for
Rayleigh fading channels and an average SNR of 38 dB per transmit antenna. It
turns out that the typical Rayleigh fading assumption with equivalent average
SNR overestimates the capacities by approx. 33%.

Figure 6.11 Cellular deployment with 21 cells, fully coordinated by use of iterative WF [JRV+ 05] and sum power constraints. Rayleigh fading CDF is given for an average per-antenna SNR of 38 dB. Error bars indicate the standard deviation of the MIMO BC distributions. [Plot: median spectral efficiency [bit/s/Hz/cell] versus number of active users per cell; curves for 4 mW, 40 mW and 400 mW per PRB, and for Rayleigh fading with mean per-antenna SNR = 38 dB.]

6.3.3 Practical (Linear) Precoding


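The study presented in the last section for non-linear precoding can be easily
extended to the case of linear precoding, in which the transmitted signal is a
spatially multiplexed, linear combination of the users' data signals. Thus, we
write $\mathbf{s}_k^c = \mathbf{w}_k^c x_k$, where $\mathbf{w}_k^c \in \mathbb{C}^{[M_c N_{bs} \times 1]}$ is the involved precoding vector. The
achievable signal-to-interference-and-noise ratio (SINR) is estimated at each UE,
according to

$$
\mathrm{SINR}_k = \frac{\left| \mathbf{g}_k^H (\mathbf{H}_k^c)^H \mathbf{w}_k^c \right|^2}
{\sum_{j \in \{\mathcal{K}_c \setminus k\}} \left| \mathbf{g}_k^H (\mathbf{H}_k^c)^H \mathbf{w}_j^c \right|^2 + \mathbf{g}_k^H \left[ \mathbf{z}_k \mathbf{z}_k^H \right] \mathbf{g}_k},
\tag{6.48}
$$

with $\mathbf{g}_k$ being the combining weights at the receiver.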


One class of linear precoding techniques for the case of a single-antenna
receiver is based on zero-forcing [BTC06, YG06b, DS05, PHS05], where each
user receives only its desired signal with no interference. Because the number
of spatial channels formed using linear beamforming is limited by the number
of transmit antennas, the transmitter selects a set of active users for receiving
data. This user selection could be done optimally using a brute-force search over
all possible combinations of users, but due to the high complexity when the
number of users is large, suboptimum techniques based on a greedy algorithm
have been shown to provide near-optimum performance [BTC06]. This topic is studied in detail in Section 11.1. Extensions of the zero-forcing technique to the case
of multiple receive antennas appear in [VVH03, CM04, SSH04], where multiple
spatial streams (or eigenmodes) are transmitted to each user with no inter-user
interference, resulting in a block diagonal (BD) covariance matrix.
An extension of the BD concept, called MET, was proposed in [BH07a] and
uses a linear transmission strategy based on zero-forcing beamforming for maximizing the weighted sum-rate. On a frame-by-frame basis, MET distributes up
to $M_c N_{bs}$ spatially multiplexed streams to one or multiple users.
MET was initially proposed for multi-user MIMO (MU-MIMO) transmissions
and its extension to the CoMP case can be summarized as follows. Let us assume
that each user multiplies its channel matrix by the Hermitian of the left dominant
eigenvector. The effective channel after linear antenna combining is then

$$
\mathbf{h}_k^{c,\mathrm{MET}} = (\mathbf{u}_k^c)^H \mathbf{H}_k^c = (\mathbf{u}_k^c)^H \mathbf{U}_k^c \boldsymbol{\Sigma}_k^c (\mathbf{V}_k^c)^H = \sigma_{k,c} (\mathbf{v}_k^c)^H.
\tag{6.49}
$$

As an extension to the MET and according to [TWH+ 09], we introduce the
following concept of an eigenmode-aware optimum combiner (EOC):

$$
\mathbf{h}_k^{c,\mathrm{EOC}} = \left[ (\mathbf{Z}_k)^{-1} \mathbf{u}_k^c \right]^H \mathbf{H}_k^c.
\tag{6.50}
$$

Let us assume the inter-cluster interference aggregated in the covariance matrix
$\mathbf{Z}_k$ is known to the receiver. Thus, we can determine an effective channel after linear combining, which considers the projection of residual interference according
to the optimum combining (OC) strategy from [Win84]. This filter determines
a specific channel direction, based on the dominant eigenmode $\mathbf{u}_k^c$, which provides the highest post-equalization SINR for the desired signal and thus leads to an
improved system throughput with respect to (6.49).
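
The two effective channels can be illustrated numerically with a singular value decomposition. The sketch below is only an illustration under stated assumptions: the channel matrix `H` is stored with one row per receive antenna and one column per cluster transmit antenna (so that the left dominant singular vector has the dimension of the UE array), the random channel and the function names are hypothetical, and the interference covariance `Z` is simply passed in as a known matrix.

```python
import numpy as np


def met_effective_channel(H):
    """Project H onto its dominant left singular vector: u^H H = sigma * v^H."""
    U, s, _ = np.linalg.svd(H)
    u = U[:, 0]                  # dominant left singular vector
    h_eff = u.conj() @ H         # row vector of length n_tx
    return h_eff, u, s[0]


def eoc_effective_channel(H, Z):
    """Combine with Z^{-1} u instead of u, i.e. (Z^{-1} u)^H H."""
    u = np.linalg.svd(H)[0][:, 0]
    combiner = np.linalg.solve(Z, u)   # Z^{-1} u without explicit inversion
    return combiner.conj() @ H


# Toy example: 2 receive antennas, 3 BSs x 2 antennas = 6 transmit antennas.
rng = np.random.default_rng(0)
H = (rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))) / np.sqrt(2)

h_met, u, sigma = met_effective_channel(H)
h_eoc_white = eoc_effective_channel(H, np.eye(2))
```

With spatially white interference (`Z` equal to the identity) the two combiners coincide, which is a quick sanity check; they differ only when the inter-cluster interference is spatially colored.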
Based on (6.49) and (6.50), the central unit can schedule a set of $|\mathcal{K}| \le M_c N_{bs}$
users, either with a brute force algorithm or with a greedy selection algorithm (see [DS05] and its extension to the case of a per-antenna power constraint [BH06]). To this end, the effective channels from multiple users are collected in a compound channel matrix in an iterative manner as

$$
\mathbf{H}^c_{\mathrm{virtual}} = \left[ \mathbf{h}^c_{\mathrm{MT}_1} \; \dots \; \mathbf{h}^c_{\mathrm{MT}_K} \right].
\tag{6.51}
$$

The linear precoder may be obtained by the Moore-Penrose pseudo-inverse of
the compound channel matrix (6.51):

$$
\mathbf{W}_i = \mathbf{H}^c_{\mathrm{virtual}} \left[ (\mathbf{H}^c_{\mathrm{virtual}})^H \mathbf{H}^c_{\mathrm{virtual}} \right]^{-1}.
\tag{6.52}
$$

In practice, a per-BS or an even more restrictive per-antenna power constraint
is mandatory if such algorithms are used in a cellular deployment,
where multiple BSs belonging to the same cluster $\mathcal{M}_c$ have to meet their own
power constraints while jointly serving the users $\mathcal{K}$. Thus, the maximum available
transmit power at each BS is restricted to a value $P_{\max}$ and, in case of a very
strict per-antenna power constraint, $P_{\max}$ can be equally divided among all antenna
elements, i.e. $P_{\max}/N_{bs}$. In order to meet this constraint, we use the expression
for matrix $\mathbf{P}_c$ as given in [ZD04]:

$$
\mathbf{P}_c = \sqrt{ \min_{m=1,\dots,M} \left\{ \frac{P_{\max}}{\|\mathbf{W}_m\|^2} \right\} } \; \mathbf{I}_{[K \times K]},
\tag{6.53}
$$

where $\mathbf{W}_m$ are the rows of matrix $\mathbf{W}$ related to the antennas of BS $m$. Note that
this power allocation is suboptimal and typically results in only one BS antenna
transmitting with maximum power, and hence, the remaining $M_c N_{bs} - 1$ antennas transmit with less than $P_{\max}/N_{bs}$.
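
A minimal numerical sketch of the zero-forcing precoder and of a common power scaling of this kind is given below. It rests on several assumptions that are not taken from the text: the compound matrix holds one column per user and one row per cluster antenna, data streams have unit power, the constraint is enforced per BS (blocks of `n_bs_antennas` rows) rather than per antenna, and the scaling is applied to the amplitudes as the square root of the power factor. All names are illustrative.

```python
import numpy as np


def zf_precoder(H_virtual):
    """Pseudo-inverse precoder W = H (H^H H)^-1 for a tall matrix H."""
    gram = H_virtual.conj().T @ H_virtual
    return H_virtual @ np.linalg.inv(gram)


def per_bs_power_scaling(W, n_bs_antennas, p_max):
    """Common power factor min_m P_max / ||W_m||^2 over the BS row blocks."""
    num_bs = W.shape[0] // n_bs_antennas
    blocks = W.reshape(num_bs, n_bs_antennas, -1)
    bs_power = np.sum(np.abs(blocks) ** 2, axis=(1, 2))  # ||W_m||^2 per BS
    return np.min(p_max / bs_power), bs_power


# Toy cluster: 3 BSs with 2 antennas each, 4 scheduled users.
rng = np.random.default_rng(1)
H_virtual = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(2)
P_MAX = 0.4  # W per PRB, the highest power setting used in the text

W = zf_precoder(H_virtual)
scale, bs_power = per_bs_power_scaling(W, n_bs_antennas=2, p_max=P_MAX)
W_tx = np.sqrt(scale) * W          # amplitude scaling of all beams

# Effective channel seen by the users: diagonal, i.e. no intra-cluster interference.
effective = H_virtual.conj().T @ W_tx
```

After scaling, exactly one BS meets its power limit with equality while the others stay below it, which illustrates why this simple allocation is suboptimal.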
Clustering for Reducing Feedback and Backhaul Data Traffic
From a practical point of view, one of the major drawbacks related to joint
processing is its higher complexity, i.e. increasing backhaul and signaling overhead. To reduce these complexity requirements, clustering solutions that restrict
joint processing techniques to a limited number of BSs have been proposed. In
these approaches, the network is statically or dynamically divided into clusters
of cells [BH07b, PGH08, TWH+ 09]. Moreover, the cluster formation may be
performed and optimized by a central entity (network-centric), or in a per-user
way (user-centric). In [TBB+ 10], dynamic BS clustering was found
to be key for spectrally efficient CoMP transmission while keeping the
backhaul traffic at a moderate level.
The work described in [BHA08] considers that BS clusters are created in a
dynamic way (see Section 7.2), in other words at each time slot t the sets of coordinated BSs are generated in order to maximize a given objective function. This
work demonstrates a significant reduction of signaling overhead in the backhaul
due to data sharing between cooperating base stations, while achieving a high
fraction of the full coordination performance.

6.3.4 Scheme for Distributed, Centralized Joint Transmission


First of all, let us clarify the usage of the three different terms: decentralized,
centralized and distributed. In case of a decentralized system concept, each BS
is assumed to have no or only a subset of feedback information (e.g. CSI) from
other BSs' users, as considered in Section 6.4. In contrast, the centralized system
concept assumes full CSIT as well as scheduled user data shared among all BSs
in the cluster. And finally, the term distributed indicates that such a centralized
concept can also be implemented in a distributed manner, i.e. without a central
unit (CU).
In modern mobile networks, there is a general tendency towards distributed signal processing. The adaptation to the time variation of the wireless channel can be much faster if it is performed directly in the serving
BS. For downlink CoMP, we reduce the overall delay in the closed transmitter adaptation loop if the waveforms are generated at the serving BS. Concepts for a distributed implementation of CoMP, where the serving BS cooperates with a small subset of BSs in its direct vicinity (see Fig. 6.10), are reported
in [ZSK+ 06, JTS+ 08b, NEHA08, PHG09, ZMS+ 09a, TWH+ 09].¹ Section 13.3
reports on a first real-time implementation of downlink CoMP demonstrating its feasibility and [JTW+ 09, JFJ+ 10] summarize this work. Terminals are
assumed to estimate the multi-cell CSI in the downlink using CSI reference signals (CSI RSs). Subsequently, UEs deliver CSI feedback in combination with
channel quality indicator (CQI) values to their serving BS, as illustrated in
Fig. 6.12. Next, BSs in the cluster exchange the CSI as well as scheduled user data
over a low-latency signaling network denoted as X2 interface [3GP10i]. Precoding
weights for the joint beamforming are determined at each BS. The relevant set of
weights is applied to the data signals and in this way, the transmitted waveforms
are obtained locally. Similar to the centralized approach, the desired signals sum
up constructively while the mutual interference inside the cluster is canceled. We
emphasize that under the assumption of low Doppler shift, i.e. for low mobility
or even static users, the backhaul bandwidth required for sharing the user data
between cooperating BSs is much higher than the one required for updating the
channel estimates within the cluster, as discussed in Section 12.2.

Figure 6.12 Cooperative transmission and CSI/CQI feedback and exchange concept, as illustrated for a toy scenario with M = K = 2. [Diagram: the network delivers the data bits of UE 1 and UE 2 to BS 1 and BS 2, which exchange CSI and data bits and apply joint precoding to transmit s1 and s2; UE 1 and UE 2 receive y1 and y2 with IRC and return CSI/CQI feedback.]

¹ Note that the length of the cyclic prefix (CP) limits the tolerable backhaul latency in the centralized approach. For distributed downlink CoMP, latency is more related to the ongoing
aging process of the CSI while it is exchanged over the backhaul. A few ms may be tolerated for slowly moving UEs. Hence, capacity and latency requirements for the backhaul are
significantly relaxed compared to the centralized approach.

Let us assume
an average throughput per cell denoted as rate; hence, each BS has to receive
the scheduled user data for its own UEs according to that data rate. Further,
we consider that hybrid automatic repeat request (HARQ) processes for each
user in the active set of users $\mathcal{M}_k$ are running decentralized at each BS $k$. Thus,
each BS has to perform the channel coding with a given code rate, according
to the CQI feedback provided by the users in $\mathcal{M}_k$. For simplicity, we fix this
rate to 1/2 in the sequel. According to (6.54), all remaining $K - 1$ BSs in the
cluster $\mathcal{K}$ convey their coded user data over the backhaul to the k-th BS. Thus,
the backhaul overhead scales linearly with the number of BSs exchanging their
scheduled data in the cluster:

$$
\text{traffic} = \text{rate} \left( 1 + \frac{K - 1}{\text{code rate}} \right)
\tag{6.54}
$$
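
The backhaul estimate (6.54) is easy to evaluate for the cluster sizes studied in this section. The short sketch below is only an illustration: the function name is hypothetical, and a normalized rate of 1 is used so that the result reads as backhaul bits per bit on the air interface.

```python
def backhaul_traffic(rate, num_bs, code_rate=0.5):
    """Backhaul load per BS: own payload plus coded data from the other BSs."""
    return rate * (1.0 + (num_bs - 1) / code_rate)


# Normalized rate of 1 gives the traffic in bits per bit on the air interface.
factors = {num_bs: backhaul_traffic(1.0, num_bs) for num_bs in (3, 5, 10)}
print(factors)  # {3: 5.0, 5: 9.0, 10: 19.0}
```

For a code rate of 1/2, each additional cooperating BS adds two backhaul bits per transmitted bit, which gives the linear growth with the cluster size.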
The process is split into three phases:
Phase I: Channel feedback.
Each user performs a cluster-wide channel estimation using reference signals
(see Section 9.1). Each UE generates multiple-input single-output (MISO)-CSI
according to [BH07a, TBH08, TWH+ 09]

$$
\mathbf{h}_k^c = (\boldsymbol{\xi}_k^c)^H \mathbf{H}_k^c,
\tag{6.55}
$$

where the Euclidean norm equals $\|\boldsymbol{\xi}_k^c\|_2 = 1$. Besides, $\boldsymbol{\xi}_k^c$ is always used to
denote the linear combining scheme to generate CSI MISO feedback. In Section 2, we assume the combining metrics defined in (6.49) and (6.50) and
denote them as eigenmode-aware receive combining (ERC) and eigenmode-aware optimum combiner (EOC). This channel information is fed back in
conjunction with the expected post-equalization SINR

$$
\mathrm{SINR}_k^{(\mathrm{I})} = \frac{\left| (\mathbf{h}_k^c)^H \tilde{\mathbf{w}}_k^c \right|^2}
{(\boldsymbol{\xi}_k^c)^H \left[ \mathbf{z}_k \mathbf{z}_k^H \right] \boldsymbol{\xi}_k^c},
\tag{6.56}
$$

where each user assumes a precoder according to $\tilde{\mathbf{w}}_k^c = \sqrt{p_k}\, \mathbf{h}_k^c / \|\mathbf{h}_k^c\|_2$ and
no intra-cluster interference, since this interference will be removed by the
joint precoder. In particular, the achievable SINR (6.56) together with the
CSI (6.55) is then conveyed to the serving BS.
Phase II: Distributed precoder calculation.
A scheduling instance in the cluster $c$ combines a total number of
$M_c N_{bs} = K N_{bs}$ MISO channels to a compound MIMO channel matrix.² In
the following, each BS is responsible for a specific sub-band of the overall
bandwidth where CoMP JT is employed. Therefore, BSs partially exchange
their collected CSI and combine the channel feedback $\mathbf{h}_k^c$ to a compound
virtual MIMO channel matrix of size $M_c N_{bs} \times K$ according to (6.51).
Subsequently, each BS determines the linear precoder for its specified
sub-bands but for all $M_c N_{bs}$ antennas of the cluster according to (6.52).
Afterwards, BSs exchange their precoding weights $\mathbf{W}_m$ and power allocation
$\mathbf{P}_m$, both obtained per sub-band, as well as the complete user (payload)
data. To this end, BSs use logical interconnections, e.g. the X2 interface in
LTE-A. Finally, all BSs in the c-th cluster perform the coherently precoded
downlink transmission, where each BS uses the weights corresponding to
its own transmit antennas.

² With proper user selection, the full rank condition of the compound channel can be frequently
met in the multi-point-to-multi-point case with independent links [ZD04, JJT+ 09].

Phase III: Intra-cluster-interference-free data reception at the
terminal side.
In this step, each UE performs its own preferred spatial equalization strategy
$\mathbf{g}_k$. To this end, each user may select the same weights as used in Phase I or
may perform the equalization using the optimal linear receive combining,
introduced as OC [Win84] in (5.4) in Section 5.1. The post-equalization SINR
is determined by (6.48) and is used as input for the link adaptation.
Simulation Results
The combined transmitter-receiver concepts described in the previous chapters
are evaluated in a triple-sectorized hexagonal cellular network with M = 57 BSs
in total. All cells operate with full frequency reuse. We employ the wrap-around
technique, which ensures that performance evaluation can be based on all users
in all cells. The different channel matrices are generated by employing the widely
used SCME [BHD+ 05] with urban macro scenario parameters [3GP10d], 3D BS
antenna characteristics and an electrical downtilt angle set to achieve a main
lobe range of one third of the inter-site distance (ISD).
Figure 6.13 Performance results as a function of the cluster size $M_c$. Channel feedback is assumed according to the ERC (6.49) and EOC (6.50). [Plot: median spectral efficiency [bit/s/Hz/cell] (left axis) and backhaul traffic (payload only) [bit/s/Hz] (right axis) versus cluster size, i.e. number of cells involved in JT CoMP; curves for LTE 1x1 round robin, LTE 2x2 round robin, LTE 2x2 score-based, CoMP MET with LTE mapping, CoMP MET with Shannon mapping and CoMP EOC with Shannon mapping.]

In particular, we determine the system performance by assuming a dynamic
and user-driven clustering method, as in Section 7.2. An active set of users is
selected according to the following metric: A set $\mathcal{U}_c$ of active multi-antenna terminals is uniformly distributed in the c-th cluster of the cellular environment.
This set contains multiple disjoint user sets $\mathcal{K}_m \subseteq \mathcal{U}_c$, where the users in $\mathcal{K}_m$
experience the highest channel gain to the m-th BS. Thus, each UE is connected to
a master BS. Further, we emulate a cluster selection which is user-centric and
dynamic over frequency: the $M/C$ strongest channel gains of the users in $\mathcal{K}_m$
are the ones of the $M_c$ BSs within the cluster, i.e. each UE is placed in a perfect
cluster in the way that its strongest BS links are all covered by this cluster. A
round-robin scheduling policy ensures that only $N_{bs}$ UEs per BS $m \in \mathcal{M}$ are
selected for CoMP JT. This enables us to obtain gains from CoMP transmission
which are separated from gains due to multi-user diversity. Results are provided
for different cluster sizes of $M/C \in \{1, 2, 3, 4, 5, 10\}$. All results in Fig. 6.13 are
based on an equal per-beam power constraint with a per-antenna power constraint [ZD04], aligned with LTE assumptions. To determine the data rates which
can be achieved in a practical system, we map post-equalization SINRs using
5 bit modulation and coding schemes (MCSs) known from LTE standardization,
and assume 630 bit codeword length and 1% target block error rate (BLER). In
addition, some results are provided based on Shannon information rates.
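
The idea of a "perfect" user-centric cluster, in which the strongest BS links of a UE are all inside its cluster, can be sketched as a simple top-$M_c$ selection per user. This is only an illustration of the selection rule as described here, not the simulator used for the results; the gain values and function name are hypothetical.

```python
def strongest_bs_cluster(gains_db, cluster_size):
    """Indices of the cluster_size BSs with the highest channel gain for one UE."""
    ranked = sorted(range(len(gains_db)), key=lambda m: gains_db[m], reverse=True)
    return sorted(ranked[:cluster_size])


# Hypothetical wideband channel gains (dB) from six BSs to one UE.
gains = [-95.0, -120.0, -101.5, -99.0, -130.0, -110.0]

# The master BS is the one with the highest channel gain.
master_bs = max(range(len(gains)), key=lambda m: gains[m])

# User-centric cluster of size 3 around this UE.
cluster = strongest_bs_cluster(gains, 3)
print(master_bs, cluster)  # 0 [0, 2, 3]
```

By construction the master BS is always part of the selected cluster, and a cluster size of 1 reduces to conventional single-cell service.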
Performance of Reference Cases
For the scheduling in one cell, UEs provide feedback on their SINRs in the form
of so-called CQI values for subgroups of sub-carriers denoted as PRBs. These
CQIs correspond to a specific spatial transmission mode, which is indicated by
the precoding matrix indicator (PMI). As a first extension towards multi-cell
processing, adjacent base stations are synchronized and multi-cell demodulation
reference signals (DRS) are introduced. They enable interference-aware equalization at the UE and improve the SINR estimation accuracy, leading to a more
precise link adaptation at the BS side [TSWJ09].
For reference purposes, we include the performance results for interference-limited single-input single-output (SISO) as well as MIMO 2 × 2 transmission
from Section 5.1. For $N_{bs} = 2$, two active fixed discrete Fourier transform (DFT)-based beams are sent to $K = 2$ different users in a round-robin manner or taking
CQI feedback into account. The CQI-aware score-based solution, described in
Section 5.1, outperforms both other reference cases with a relative throughput
gain (at $M_c = 1$) of 1.27 and 2.2 compared to round-robin and SISO,
respectively. Note that, with $K = N_{bs}$, the MIMO setup benefits from an additional
user in conjunction with an increase of antennas $N_{bs} = N_{ue} = 2$. All results in
Fig. 6.13 are based on an equal per-beam power constraint with a per-antenna
power constraint according to LTE assumptions.


Gain from CSI Feedback and CoMP Transmission


CSI-aware precoding within a given cluster reduces the interference experienced
at each UE. To this end, we use Equations (6.49), i.e. MET, and (6.50), i.e.
EOC. The case of MET-based CSI feedback and zero-forcing (ZF) beamforming at a single BS, i.e. $M_c = 1$, with subsequent OC, provides a system gain of
2.2 and 1.0 compared to SISO and score-based beam assignment, respectively. In case of EOC-based CSI feedback and ZF beamforming, the
data rate increases by approx. 10%. Note that there is no or only a small additional gain from CSI-based ZF beamforming in a cluster of size $M_c = 1$. This is
mainly caused by two facts: First, we assume a simplified per-antenna power
constraint, which leads to a suboptimal power allocation where only one antenna
transmits with full power and all others are scaled accordingly [ZD04]. In contrast, in the case of fixed precoding from Section 5.1 all BS antennas transmit
with full power. Second, within the score-based spatial layer assignment (reference case), multi-user diversity is used to assign both users to these fixed beams.
In the case of ZF beamforming, both users are directly served on orthogonal
beams. In both cases the inter-cell interference is not affected.
In the next step, we increase the cluster size for the CoMP system, and
the gain from MET-based feedback amounts to 2.7, 3.2, 3.6, 3.8 and 4.6 for $M_c$ = 2, 3, 4, 5 and 10, respectively, w.r.t. the SISO case. The EOC-based concept provides roughly 10% additional data rate compared to the MET
assumption. Note that the gap between Shannon information rates and a practical
LTE link adaptation amounts to 4.3 bit/s/Hz/cell for $M_c = 10$. Focusing on
the cooperation gain, we observe that the median cell spectral efficiencies are
increased compared to the 2 × 2 round-robin system by 81%, 112% and 157%
for coordinating 3, 5 and 10 cells, respectively. However, backhaul requirements
per feeder link increase as well, i.e. by 5, 9 and 19 bits per bit-on-air-interface
assuming a fixed code rate of 1/2, for the coordination of 3, 5 and 10 cells, respectively. Altogether, significant gains from coordination have already been realized
by using small clusters, despite residual interference from non-coordinated cells.
For $K = 10$, the estimated median traffic for user (payload) data exchange per
backhaul link will exceed a value of 120 bit/s/Hz. This is only a rough estimate
according to (6.54), where rate is the median cell data rate in bit/s/Hz/cell for
CoMP transmission using the MCSs as defined in LTE. The backhaul traffic
consists of the transmitted data over the air interface and the required user data
exchange from $K - 1$ other BSs. Since BSs have to coherently transmit the same
data, i.e. using the same MCS, we consider the BSs to exchange their coded user
payload data and independently map it to identical QAM symbols. For the sake
of simplicity, we assume here a code rate of 1/2. Further, we do not consider
additional overhead due to CSI exchange, as this would be only a small part of
the overall traffic requirement [HS09], as emphasized in Section 12.2.

6.3.5 Summary
In this section, we investigated centralized joint transmission in the context of
CoMP transmission in the downlink of next generation mobile networks. Starting
from a general system model, we first determined the MIMO broadcast capacity in a cellular system for different SNR regimes. Second, we introduced the
concept of linear joint precoding for a subset of BSs, i.e. a cluster, in the system, whereas BSs belonging to different clusters are not coordinated. Hence,
each cluster is surrounded by multiple non-coordinated cells. For removing the
interference inside the cluster, the common multi-user eigenmode transmission
has been further developed towards optimum combining. The gains from receive
antenna combining have been included in the overall optimization. The performance has been studied in detail in a triple-sectored multi-cell scenario covering
57 cells. First, we observed that median data rates per cell can be increased
by 81%, 112% and 157% assuming a cluster size of 3, 5 and 10 cells, respectively,
compared to a non-cooperative system with the same 2 × 2 antenna configuration. However, backhaul requirements per feeder link increase as well, i.e. by 5,
9 and 19 bits per bit-on-air-interface assuming a fixed code rate of 1/2, for the
coordination of 3, 5 and 10 cells, respectively. Second, as a function of the cluster size ranging from 1 to 5 and up to 10, the linear eigenmode-aware optimum
combiner scheme achieves 28%, 34%, 41%, 46%, 49% up to 62% of the capacities
provided by system-wide dirty paper coding. Altogether, significant gains from
coordination have already been realized by using small clusters.

Acknowledgements
The authors are grateful for financial support from the German Ministry of
Education and Research (BMBF) in the national collaborative project EASY-C
under contract No. 01BU0631.

6.4 Downlink Decentralized Multi-User Transmission


David Gesbert and Randa Zakhour
In this section, we study decentralized downlink CoMP schemes that differ from
the schemes introduced in Section 6.3 in that compound channel state
information (CSI) and data bits corresponding to the different user equipments
(UEs) are not shared among all cooperating base stations (BSs). Instead,
different cooperating BSs may have different extents of knowledge on subsets
of the compound channel matrix, and on subsets of user data. This means that
globally optimal precoding vectors cannot be found, but decentralized algorithms
can aim at finding sub-optimal solutions.
We consider two forms of (partial) decentralized processing. In the first, we
assume that user data is fully shared at the cooperating BSs, whereas the amount
of shared CSI is limited [ZG10b]. In the second, the CSI is shared ideally across
the cooperating cells, while the user data is only partially shared to lift the
burden off the backhaul [ZG10a]. In the latter case, we use the notion of
superposition coding, as already introduced in an uplink context in Section 4.3,
assuming now that each terminal receives a superposition of conventionally and
cooperatively transmitted signals. We will see that the optimal ratio of these
signals varies with both the interference strength statistics and the backhaul
capacity constraint.

Figure 6.14 Setup for decentralized beamforming with limited CSIT, for a toy example
with M = K = 2. The network provides the data bits of all UEs to both BSs; BS $m$
holds the quantized channels $\hat{\mathbf{h}}_1^{(m)}, \ldots, \hat{\mathbf{h}}_K^{(m)}$,
with (possibly) partial CSI exchange between the BSs; BS $m$ transmits $s_m$, and
UE $k$ receives $y_k$ over its channel $\mathbf{h}_k$.

6.4.1 Decentralized Beamforming with Limited CSIT


Let us first consider a multi-cell scenario where the cooperating BSs share the
user data and aim at joint multiple-input multiple-output (MIMO) precoding
(beamforming (BF)) to serve the mobile users in the downlink. Thus, each user
receives a transmission from all cooperating BSs, as illustrated in Fig. 6.14.
For perfect downlink beamforming, channel state information at the transmitter
(CSIT) must be acquired and shared by the transmitters through a combination of
user feedback and backhaul signaling [KFVY06, SSBN+06]. In practice, it
is convenient to derive robust beamforming schemes able to cope with imperfect
and, importantly, different CSI available at each BS, as this will allow us to deal
with various scenarios for decentralized knowledge such as
- each BS obtaining an independent estimate of the same global CSI, or
- each BS having precise knowledge about a different subset of the user channels
  (e.g. a BS may not wish or be able to decode the CSI feedback from a very
  distant user).
Note that this is the essential difference to Section 6.3, where the same extent of
CSI is assumed to be fully distributed among all cooperating BSs. In this section,
we introduce a feedback model using the concept of hierarchical codebooks, which
allows us to incorporate additional structure into this problem and as a result
facilitate robust beamforming design.
Despite possible differences in their acquired CSIT, the different transmitters
wish to reconcile their views so as to design a consistent set of precoding
vectors that maximizes the user rate. This problem can be categorized as a
so-called team-decision problem or a decentralized statistical decision making
problem [Ho80, Rad62].
System Model
Consider a set of $M$ BSs communicating with $K$ UEs. Each BS has $N_{bs} \geq 1$
antennas, whereas each UE has a single antenna. $\mathbf{h}_k^m \in \mathbb{C}^{N_{bs} \times 1}$
is the channel from BS $m$ to UE $k$, and
$\mathbf{h}_k = [(\mathbf{h}_k^1)^T, \ldots, (\mathbf{h}_k^M)^T]^T \in \mathbb{C}^{N_{BS} \times 1}$
is UE $k$'s whole channel: $\mathbf{h}_k^m \sim \mathcal{N}_\mathbb{C}(0, \sigma_{k,m}^2 \mathbf{I}_{[N_{bs}]})$,
and different $\mathbf{h}_k^m$ are independent of each other.
The overall unquantized channel matrix $\mathbf{H}$ groups the channels to all users,
i.e. $\mathbf{H} = [\mathbf{h}_1 \ldots \mathbf{h}_K]$. Similarly, we let
$\hat{\mathbf{h}}_k^{(m)}$ denote UE $k$'s quantized channel as perceived by BS $m$
and group the whole of BS $m$'s channel knowledge into
$\hat{\mathbf{H}}^{(m)} = [\hat{\mathbf{h}}_1^{(m)} \ldots \hat{\mathbf{h}}_K^{(m)}]$.
The signal received by UE $k$ is given by:

$$y_k = \mathbf{h}_k^H \mathbf{s} + n_k, \tag{6.57}$$

where $\mathbf{s} \in \mathbb{C}^{N_{BS} \times 1}$ is the concatenated transmit signal
sent by all transmitters and $n_k \sim \mathcal{N}_\mathbb{C}(0, \sigma^2)$ is the noise
at receiver $k$. Cooperative transmit processing in the form of joint linear precoding
with per-transmitter power constraint $P$ is adopted. Thus, $\mathbf{s}$ can be
expressed as:

$$\mathbf{s} = \mathbf{W}\mathbf{x} = \sum_{k=1}^{K} \mathbf{w}_k x_k, \tag{6.58}$$

where $\mathbf{x} \in \mathbb{C}^{K \times 1}$ is the vector of transmit symbols, its
entries being independent and with $\mathbf{x} \sim \mathcal{N}_\mathbb{C}(0, \mathbf{I})$.
The overall beamforming matrix $\mathbf{W}$ groups the precoding vectors $\mathbf{w}_k$
carrying the different users' symbols, so that
$\mathbf{W} = [\mathbf{w}_1 \ldots \mathbf{w}_K] \in \mathbb{C}^{N_{BS} \times K}$,
where precoding vector $\mathbf{w}_k = [(\mathbf{w}_k^1)^T \ldots (\mathbf{w}_k^M)^T]^T$
carries user $k$'s symbols, and $\mathbf{w}_k^m \in \mathbb{C}^{N_{bs} \times 1}$ is
BS $m$'s precoding contribution towards UE $k$. The rate achievable for user $k$ is
equal to

$$R_k = \log_2(1 + \gamma_k), \tag{6.59}$$

where the signal-to-interference-and-noise ratio (SINR) $\gamma_k$ is equal to

$$\gamma_k = \frac{\left|\sum_{m=1}^{M} (\mathbf{h}_k^m)^H \mathbf{w}_k^m\right|^2}{\sigma^2 + \sum_{j \neq k} \left|\sum_{m=1}^{M} (\mathbf{h}_k^m)^H \mathbf{w}_j^m\right|^2}. \tag{6.60}$$

Figure 6.15 Distributed hierarchical CSI model: the quantization codebooks are
designed to be hierarchical to offer additional structure. $Q_k^l(\cdot)$ denotes the
$l$-level (specifying the accuracy) quantization function of user $k$'s channel. The
channel $\mathbf{h}_k$ is quantized at the levels $l_{k,\max}, \ldots, l, \ldots, 0$,
yielding $\hat{\mathbf{h}}_k^{(L_k^{-1}(l_{k,\max}))}, \ldots, \hat{\mathbf{h}}_k^{(L_k^{-1}(l))}, \ldots, \hat{\mathbf{h}}_k^{(L_k^{-1}(0))}$.

In the following, it will also be useful to focus on a given BS's beamforming
decisions jointly. We denote this $\mathbf{W}_m$, so that
$\mathbf{W}_m = [\mathbf{w}_1^m \ldots \mathbf{w}_K^m]$ is the overall
precoding matrix at BS $m$. We now consider how these will be designed.
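As a numerical illustration of (6.57) to (6.60), the following sketch evaluates the SINRs and rates for given stacked channels and precoders. It uses the fact that the sum over BSs of $(\mathbf{h}_k^m)^H \mathbf{w}_j^m$ equals $\mathbf{h}_k^H \mathbf{w}_j$ for the stacked vectors. The function name, the NumPy array layout and the example values are assumptions made for illustration only.

```python
import numpy as np


def sinr_and_rates(H, W, noise_var):
    """Evaluate (6.60) and (6.59) for stacked channels and precoders.

    H: (N_BS, K) array whose k-th column is the stacked channel h_k.
    W: (N_BS, K) array whose k-th column is the stacked precoder w_k.
    noise_var: receiver noise variance sigma^2.
    """
    H = np.asarray(H, dtype=complex)
    W = np.asarray(W, dtype=complex)
    G = H.conj().T @ W                 # G[k, j] = h_k^H w_j
    power = np.abs(G) ** 2
    signal = np.diag(power)            # |h_k^H w_k|^2
    interference = power.sum(axis=1) - signal
    sinr = signal / (noise_var + interference)
    return sinr, np.log2(1.0 + sinr)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
    W = np.eye(2)                      # each BS serves only its own user
    print(sinr_and_rates(H, W, noise_var=0.1))
```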
Decentralized Beamforming using Team Decision
In a decentralized processing scenario, each BS computes its beamforming matrix
based on its local CSI. In addition to its own local CSI, statistical (slowly
varying) CSI concerning other transmitters' local CSI may be available. The $M$
transmitters may be viewed as members of a team who need to take decisions in
order to attain a common payoff, but who do not have access to the same information.
Thus, transmitter $m$ chooses $\mathbf{W}_m$ based on its local CSI, the quantized
channel matrix $\hat{\mathbf{H}}^{(m)}$, and the extra statistical information it has.
Hierarchical CSI Structure
We now show how structure can be built into the CSI model so as to facilitate
team decision making at the BSs. In the setup below, the channel $\mathbf{h}_k$
corresponding to user $k$ is quantized using a hierarchical codebook, such that
different BSs acquire a given channel vector up to one particular level of
quantization error. The most precise codebook, composed of a series of embedded
less precise sub-codebooks, is assumed to be known at all BSs. Each BS is also
assumed to be aware of the level of feedback accuracy decodable at other BSs
(typically a function of the distance to the user).
We define $L_k(m)$ as the number of quantization bits successfully decodable by
BS $m$ for the channel vector $\mathbf{h}_k$ corresponding to user $k$. Thus, BS $m$
acquires a quantized version of this channel denoted by $\hat{\mathbf{h}}_k^{(m)}$.
An interesting feature of hierarchical codebooks is the strong information
structure: Assuming without loss of generality that $L_k(m_1) > L_k(m_2)$, then
BS $m_1$ knows exactly what is known by BS $m_2$ in addition to its own estimate.
Thus BS $m_1$ also knows $\hat{\mathbf{h}}_k^{(m_2)}$, while BS $m_2$ only knows that
$\hat{\mathbf{h}}_k^{(m_1)}$ belongs to a subset of codewords located in the Voronoi
region centered at $\hat{\mathbf{h}}_k^{(m_2)}$. This is shown in Fig. 6.15, where
$L_k^{-1}(l)$ provides the set of BSs (if any) which decode the feedback
information of user $k$ up to accuracy level $l$.
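The information structure described above can be made concrete with a small index manipulation. The sketch below assumes, purely for illustration, that codewords are labelled such that the coarse codeword index is the bit prefix of the fine codeword index; the text does not prescribe any particular labelling, and the function names are hypothetical.

```python
def split_index(fine_index: int, fine_bits: int, coarse_bits: int):
    """Split a fine codeword index into (coarse index, refinement index).

    Assumes a prefix labelling: the top `coarse_bits` bits of the fine
    index identify the coarse codeword (an illustrative assumption).
    """
    if not 0 <= coarse_bits <= fine_bits:
        raise ValueError("need 0 <= coarse_bits <= fine_bits")
    if not 0 <= fine_index < (1 << fine_bits):
        raise ValueError("fine_index out of range")
    refinement_bits = fine_bits - coarse_bits
    coarse = fine_index >> refinement_bits
    refinement = fine_index & ((1 << refinement_bits) - 1)
    return coarse, refinement


def candidates_given_coarse(coarse: int, fine_bits: int, coarse_bits: int):
    """Fine indices that a BS knowing only the coarse index must consider."""
    refinement_bits = fine_bits - coarse_bits
    return range(coarse << refinement_bits, (coarse + 1) << refinement_bits)


if __name__ == "__main__":
    # BS m1 decodes 6 bits, BS m2 only the first 2 bits of the same feedback.
    coarse, refinement = split_index(0b101101, fine_bits=6, coarse_bits=2)
    print(coarse, refinement, len(candidates_given_coarse(coarse, 6, 2)))
```

The better-informed BS can always reconstruct what the other BS sees, whereas the less-informed BS is left with a set of candidates inside one coarse Voronoi region.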
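Bayesian Optimization
As perfect CSI is neither available nor shared, we consider the maximization of
a global average utility $\bar{U}$ which may be decoupled as a sum of utilities over
the users:

$$\bar{U} = \mathrm{E}\left\{ U\left(\mathbf{H}, \mathbf{W}_1(\hat{\mathbf{H}}^{(1)}), \ldots, \mathbf{W}_M(\hat{\mathbf{H}}^{(M)})\right) \right\} = \sum_{k=1}^{K} \mathrm{E}\left\{ U_k\left(\mathbf{h}_k, \mathbf{W}_1(\hat{\mathbf{H}}^{(1)}), \ldots, \mathbf{W}_M(\hat{\mathbf{H}}^{(M)})\right) \right\}. \tag{6.61}$$

For instance, an average weighted sum-rate, where $U_k$ denotes the (possibly
weighted) rate of user $k$, fits into this model. Restricting ourselves to
deterministic decisions, in the sense that there will be a single $\mathbf{W}_m$
corresponding to each state of channel knowledge $\hat{\mathbf{H}}^{(m)}$ at
transmitter $m$, $\bar{U}$ can be expanded into:

$$\bar{U} = \int \cdots \int d\mathbf{H}\, f_{\mathbf{H}}(\mathbf{H})\, U\left(\mathbf{H}, \tilde{\mathbf{W}}_1(\mathbf{H}), \ldots, \tilde{\mathbf{W}}_M(\mathbf{H})\right), \tag{6.62}$$

where

$$\tilde{\mathbf{W}}_m(\mathbf{H}) = \mathbf{W}_m\left(\hat{\mathbf{H}}^{(m)}(\mathbf{H})\right)$$

is the beamforming strategy at transmitter $m$ given the local knowledge at that
transmitter corresponding to a true channel $\mathbf{H}$. $f_{\mathbf{H}}$ denotes the
probability density function of the overall channel matrix $\mathbf{H}$.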
Global Optimization
A globally optimal set of beamforming decisions maximizing the above expected
utility $\bar{U}$ consists of sets of beamforming matrices $\mathbf{W}_m$,
$m \in \{1, \ldots, M\}$ (one set per transmitter, consisting of as many matrices as
there are possible states of knowledge at that transmitter), which jointly
maximize $\bar{U}$. As stated in [Ho80, TA84], for example, it is often intractable
to find the globally optimal strategies at the different team members. In such
cases, a suboptimal solution may be obtained by finding strategies that are
person-by-person optimal, as specified next. Note the difference from
conventional game-theoretic approaches to beamforming design: here all
transmitters agree on maximizing a common utility (despite their lack of shared
CSI knowledge), as opposed to each optimizing a selfish utility.
Person-by-Person Optimization
Person-by-person optimal strategies³ are such that each team member's strategy
is optimal given the other team members' strategies. Clearly, the globally
optimal strategies are person-by-person optimal, but the converse is in general
not true. In our particular setup of distributed CSIT, an optimal strategy
for transmitter $m$, given that the other transmitters' strategies are fixed, may
be characterized, for a local channel knowledge equal to $\hat{\mathbf{H}}^{(m)}$,
as follows:

$$\mathbf{W}_m(\hat{\mathbf{H}}^{(m)}) = \arg\max_{\|\mathbf{W}_m\|_F^2 \leq P} \int \cdots \int d\mathbf{H}\, f_{\mathbf{H}|\hat{\mathbf{H}}^{(m)}}\left(\mathbf{H}\,|\,\hat{\mathbf{H}}^{(m)}\right) \tilde{U}(\mathbf{H}, \mathbf{W}_m), \tag{6.63}$$

where

$$\tilde{U}(\mathbf{H}, \mathbf{W}_m) = U\left(\mathbf{H}, \tilde{\mathbf{W}}_1(\mathbf{H}), \ldots, \mathbf{W}_m, \ldots, \tilde{\mathbf{W}}_M(\mathbf{H})\right), \tag{6.64}$$

and $f_{\mathbf{H}|\hat{\mathbf{H}}^{(m)}}$ denotes the probability density function of
the overall channel matrix $\mathbf{H}$ conditioned on $\hat{\mathbf{H}}^{(m)}$, the
quantized overall channel matrix at transmitter $m$. This is equivalent to

$$\mathbf{W}_m(\hat{\mathbf{H}}^{(m)}) = \arg\max_{\|\mathbf{W}_m\|_F^2 \leq P} \int \cdots \int_{\mathbf{H} \in R(\hat{\mathbf{H}}^{(m)})} d\mathbf{H}\, f_{\mathbf{H}}(\mathbf{H})\, \tilde{U}(\mathbf{H}, \mathbf{W}_m), \tag{6.65}$$

where we define $R(\hat{\mathbf{H}}^{(m)})$ as the Voronoi region corresponding to this
state of knowledge at transmitter $m$. The Voronoi region indicates the set of all
possible values for the actual channel $\mathbf{H}$ given that channel estimate
$\hat{\mathbf{H}}^{(m)}$ has been observed at transmitter $m$.
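The idea of person-by-person optimization can be sketched on a deliberately simplified team problem. In the toy below, the continuous beamforming matrices and the power constraint are replaced by small finite action sets, the channel by a finite set of states, and each agent's local knowledge by an observation label; all names and this discretization are assumptions made for illustration, not the scheme of [ZG10b]. Each agent in turn re-optimizes its decision for every state of its local knowledge while the other agent's strategy is held fixed, which can only increase the common expected utility.

```python
import numpy as np


def expected_utility(p, U, obs, strat):
    """Common payoff: sum_s p[s] * U[s, a1(obs1(s)), a2(obs2(s))]."""
    return sum(p[s] * U[s, strat[0][obs[0][s]], strat[1][obs[1][s]]]
               for s in range(len(p)))


def person_by_person(p, U, obs, n_obs, max_iter=100):
    """Alternate best responses of two team members until nothing changes.

    p:     state probabilities, shape (S,)
    U:     common utility table, shape (S, A1, A2)
    obs:   obs[m][s] is the observation of agent m in state s
    n_obs: number of distinct observations per agent
    """
    n_act = U.shape[1:]
    strat = [np.zeros(n_obs[m], dtype=int) for m in range(2)]
    for _ in range(max_iter):
        changed = False
        for m in range(2):
            other = 1 - m
            for o in range(n_obs[m]):
                # States compatible with observation o (the "Voronoi region").
                states = [s for s in range(len(p)) if obs[m][s] == o]
                scores = np.zeros(n_act[m])
                for a in range(n_act[m]):
                    for s in states:
                        a_other = strat[other][obs[other][s]]
                        acts = (a, a_other) if m == 0 else (a_other, a)
                        scores[a] += p[s] * U[s, acts[0], acts[1]]
                best = int(np.argmax(scores))
                if scores[best] > scores[strat[m][o]] + 1e-12:
                    strat[m][o] = best
                    changed = True
        if not changed:
            break
    return strat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = 8
    p = np.full(S, 1.0 / S)
    U = rng.random((S, 3, 3))
    obs = [np.arange(S) // 2, np.arange(S) % 4]   # two different partial views
    strat = person_by_person(p, U, obs, n_obs=[4, 4])
    print(strat, expected_utility(p, U, obs, strat))
```

The fixed point reached this way is person-by-person optimal but, as noted above, not necessarily globally optimal.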
A Decentralized Beamforming Example for M = K = 2
To simplify the presentation of the solution to the problem, we focus on the
M = K = 2 case. The hierarchy in the knowledge at the two transmitters, and
as a result the beamforming strategies to follow, fall into one of three cases,
which may be characterized as follows:

Common Knowledge
In this case, $L_1(1) = L_1(2)$ and $L_2(1) = L_2(2)$. It corresponds to the
traditional assumption under limited CSIT, where both transmitters have the same
knowledge. This is the case if the BSs mutually exchange their CSIT, as considered
in Sections 6.3 and 13.3, or if the CSI feedback is designed such that it can be
decoded by both BSs individually, which is the approach pursued in Section 13.4.
Considering a hierarchical CSI structure, the assumption of common knowledge
can be regarded as reasonable if both users are located at the cell-edge. In terms
of performance, having the same global CSI available at each BS is equivalent to
having centralized beamforming decisions being made.

³ Note that in game theory, Nash equilibria, which correspond to strategies from
which no user has any incentive to deviate, are also person-by-person optimal, but
there users in general do not share a common objective and are often competing for
resources.

Degraded Knowledge
In this case, $L_1(1) \geq L_1(2)$ and $L_2(1) \geq L_2(2)$, or $L_1(1) \leq L_1(2)$
and $L_2(1) \leq L_2(2)$. In other words, one of the transmitters has a better
representation of both channels, and will adapt its beamforming on a finer scale
than the other transmitter. This is typical, for example, when both users lie in
the same cell.

Symmetric Knowledge
Here, $L_1(1) > L_1(2)$ and $L_2(1) < L_2(2)$, or $L_1(1) < L_1(2)$ and
$L_2(1) > L_2(2)$. Hence, one of the transmitters has a better representation of
the channel of a given user, and a worse one for the other, with opposite
knowledge at the other transmitter. This corresponds, for instance, to the BSs
serving users each within their own cell.
We now focus on the symmetric case where $L_1(1) > L_1(2)$ and $L_2(1) < L_2(2)$:
this represents the more common setup among the ones described and is also the
more challenging to formulate; the remaining cases can be dealt with in a similar
manner. We characterize each user's quantized CSI by a pair of indices,
$i_1 = (i_{1,2}, i_{1,1})$ for user 1 and $i_2 = (i_{2,1}, i_{2,2})$ for user 2. The
first index in each pair corresponds to the coarse knowledge (hence is shared by
both BSs), i.e. the index of the codeword in the coarsest codebook to which the
channel is quantized, $Q_{L_k^{-1}(\min_m L_k(m))}(\mathbf{h}_k)$ (see Fig. 6.15), and
the second index provides the missing bits to locate the finer codeword around the
coarsest one, $Q_{L_k^{-1}(\max_m L_k(m))}(\mathbf{h}_k)$.
Given the structure of the distributed CSI, the beamforming matrix decisions
may be parameterized in terms of these indices, so that $\mathbf{W}_1$ varies with
$(i_1, i_{2,1})$, whereas $\mathbf{W}_2$ is a function of $(i_{1,2}, i_2)$. Taking this
into consideration, we expand (6.62) to

$$\bar{U} = \sum_{i_{1,2}=1}^{2^{L_1(2)}} \sum_{i_{2,1}=1}^{2^{L_2(1)}} S(i_{1,2}, i_{2,1}), \tag{6.66}$$

where $S(i_{1,2}, i_{2,1})$ is given by

$$S(i_{1,2}, i_{2,1}) = \sum_{i_{1,1}=1}^{I_1} \sum_{i_{2,2}=1}^{I_2} \int_{R_1(i_1)} \int_{R_2(i_2)} d\mathbf{h}_1\, d\mathbf{h}_2\, f_{\mathbf{H}}(\mathbf{H})\, U\left(\mathbf{H}, \mathbf{W}_1(i_1, i_{2,1}), \mathbf{W}_2(i_{1,2}, i_2)\right), \tag{6.67}$$

where $I_1 = 2^{L_1(1) - L_1(2)}$, $I_2 = 2^{L_2(2) - L_2(1)}$, and $R_1(i_1)$ and
$R_2(i_2)$ correspond to the Voronoi regions associated with the indexed codewords.

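It is easy to verify that the beamforming decisions for each $S(i_{1,2}, i_{2,1})$
term may be optimized separately. For given $i_{1,2}$ and $i_{2,1}$, we optimize the
corresponding $S(i_{1,2}, i_{2,1})$. To simplify notation, we remove the dependence
on $i_{1,2}$ and $i_{2,1}$ from the expressions. The problem is thus:

$$\max.\ \sum_{i_{1,1}=1}^{I_1} \sum_{i_{2,2}=1}^{I_2} \int_{R_1(i_{1,1})} \int_{R_2(i_{2,2})} d\mathbf{h}_1\, d\mathbf{h}_2\, f_{\mathbf{H}}(\mathbf{H})\, U\left(\mathbf{H}, \mathbf{W}_1(i_{1,1}), \mathbf{W}_2(i_{2,2})\right) \tag{6.68}$$

$$\text{s.t.}\ \|\mathbf{W}_1(i_{1,1})\|_F^2 \leq P,\quad i_{1,1} = 1, \ldots, I_1 \tag{6.69}$$

$$\|\mathbf{W}_2(i_{2,2})\|_F^2 \leq P,\quad i_{2,2} = 1, \ldots, I_2. \tag{6.70}$$

Recalling the separable nature of our utility function (refer to (6.61)), this
can be reformulated as:

$$\max.\ \sum_{i_{1,1}=1}^{I_1} \sum_{i_{2,2}=1}^{I_2} \sum_{k=1}^{2} \Pr\left\{R_{\bar{k}}(i_{\bar{k},\bar{k}})\right\} \int_{R_k(i_{k,k})} d\mathbf{h}_k\, f_{\mathbf{h}_k}(\mathbf{h}_k)\, U_k\left(\mathbf{h}_k, \mathbf{W}_1(i_{1,1}), \mathbf{W}_2(i_{2,2})\right)$$

$$\text{s.t.}\ \|\mathbf{W}_1(i_{1,1})\|_F^2 \leq P,\quad i_{1,1} = 1, \ldots, I_1$$

$$\|\mathbf{W}_2(i_{2,2})\|_F^2 \leq P,\quad i_{2,2} = 1, \ldots, I_2, \tag{6.71}$$

where $\bar{k} = \mathrm{mod}(k, 2) + 1$ and

$$\Pr\left\{R_{\bar{k}}(i_{\bar{k},\bar{k}})\right\} = \int_{R_{\bar{k}}(i_{\bar{k},\bar{k}})} d\mathbf{h}_{\bar{k}}\, f_{\mathbf{h}_{\bar{k}}}(\mathbf{h}_{\bar{k}}), \tag{6.72}$$

is the probability of user $\bar{k}$'s channel being quantized to the codeword
indexed by the pair $(i_{\bar{k},k}, i_{\bar{k},\bar{k}})$.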
Application to Sum-Rate Maximization
The above problem may be approximately solved via a projected gradient ascent
method. Moreover, to avoid integration, we resort to approximations. As we deal
with sum-rate maximization in our illustrative examples, the following
approximation is plugged into problem formulation (6.71) above:

$$\int_{R_k(i_{k,k})} d\mathbf{h}_k\, U_k\left(\mathbf{h}_k, \mathbf{W}_1(i_{1,1}), \mathbf{W}_2(i_{2,2})\right) = \int_{R_k(i_{k,k})} d\mathbf{h}_k \log_2\left(1 + \frac{|\mathbf{h}_k^H \mathbf{w}_k(i_{1,1}, i_{2,2})|^2}{\sigma^2 + |\mathbf{h}_k^H \mathbf{w}_{\bar{k}}(i_{1,1}, i_{2,2})|^2}\right)$$

$$\approx \log_2\left(1 + \frac{\mathbf{w}_k(i_{1,1}, i_{2,2})^H\, \mathbf{C}_k^{(i_{k,k})}\, \mathbf{w}_k(i_{1,1}, i_{2,2})}{\sigma^2 + \mathbf{w}_{\bar{k}}(i_{1,1}, i_{2,2})^H\, \mathbf{C}_k^{(i_{k,k})}\, \mathbf{w}_{\bar{k}}(i_{1,1}, i_{2,2})}\right), \tag{6.73}$$

where $\mathbf{C}_k^{(i_{k,k})} = \mathrm{E}\left\{\mathbf{h}_k \mathbf{h}_k^H \,|\, \mathbf{h}_k \in R_k(i_{k,k})\right\}$,
and $\mathbf{w}_k(i_{1,1}, i_{2,2})$, $k = 1, 2$, is obtained from
$\mathbf{W}_1(i_{1,1})$ and $\mathbf{W}_2(i_{2,2})$ by extracting the appropriate
entries as defined in our system model. A similar approximation was used in
[KC07, HBO08], for example. The quality of this approximation increases with the
size of the codebook and becomes asymptotically optimal.

Figure 6.16 Sum-rates for $L_1(2) = L_2(1) = 2$, $L_1(1) = L_2(2) = 6$ bits and
$\epsilon = 0.1$, from [ZG10b]. © 2010 IEEE. The plot shows the sum-rate [bit/s/Hz]
over the SNR [dB] for: fully-shared CSIT (upper bound), joint BF; symmetric
knowledge, proposed decentralized BF; symmetric knowledge, myopic BF; symmetric
knowledge, coarse CSIT used, joint BF; knowledge at BS 1 or 2 shared, joint BF.
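The right-hand side of (6.73) replaces the integral over a Voronoi region by a closed-form expression in the conditional covariance matrix of the channel. A minimal sketch of this evaluation is given below; estimating the covariance from channel samples that fall into the region, as well as the function names, are assumptions made for illustration.

```python
import numpy as np


def region_covariance(samples):
    """Sample estimate of E{h h^H | h in region}.

    samples: (n, N) complex array, each row a channel draw inside the region.
    """
    samples = np.asarray(samples, dtype=complex)
    return samples.T @ samples.conj() / len(samples)


def approx_rate(C, w_k, w_other, noise_var):
    """Approximate rate as on the right-hand side of (6.73).

    C:        conditional covariance of user k's channel, shape (N, N)
    w_k:      stacked precoder carrying user k's symbols
    w_other:  stacked precoder carrying the other user's symbols
    """
    signal = np.vdot(w_k, C @ w_k).real            # w_k^H C w_k
    interference = np.vdot(w_other, C @ w_other).real
    return np.log2(1.0 + signal / (noise_var + interference))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    h = (rng.standard_normal((500, 2)) + 1j * rng.standard_normal((500, 2))) / np.sqrt(2)
    C = region_covariance(h)
    print(approx_rate(C, np.array([1.0, 0.0]), np.array([0.0, 1.0]), noise_var=0.1))
```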
Reference Schemes
Simple upper and lower bounds to the proposed schemes correspond to joint
beamforming based on the more accurate CSIT (unachievable in a distributed
system) and the least accurate (achievable) CSIT, respectively. In another decentralized scheme which uses the local channel knowledge, each BS designs its
transmission assuming all the other BSs share the same knowledge as itself. This
is simpler than the proposed decentralized scheme, and has similar complexity
to joint beamforming design based on the coarse CSIT.
Numerical Results
To show the gains from this decentralized scheme, we plot average rates achieved
for a symmetric M = K = 2, $N_{bs} = 1$ channel, where $h_{1,1}, h_{2,2}$ are
$\mathcal{N}_\mathbb{C}(0, 1)$, and $h_{1,2}, h_{2,1}$ are
$\mathcal{N}_\mathbb{C}(0, \epsilon)$, with $\epsilon$ modeling the strength of the
interference links. The hierarchical codebooks are designed using Lloyd's
algorithm: first the coarse codebook, then for each codeword in it, the
corresponding finer codebook.
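The two-stage codebook design mentioned above can be sketched as follows. This is a minimal illustration rather than the design used in [ZG10b]: it assumes a squared Euclidean distortion measure, represents a scalar complex channel as a two-dimensional real vector, and trains on random samples; the function names are hypothetical.

```python
import numpy as np


def lloyd(samples, n_codewords, n_iter=50, seed=0):
    """Plain Lloyd iteration with squared Euclidean distortion (assumed)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(samples), n_codewords, replace=False)
    codebook = samples[start].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(samples[:, None, :] - codebook[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_codewords):
            members = samples[labels == c]
            if len(members) > 0:            # keep the old codeword if the cell is empty
                codebook[c] = members.mean(axis=0)
    dist = np.linalg.norm(samples[:, None, :] - codebook[None, :, :], axis=2)
    return codebook, dist.argmin(axis=1)


def hierarchical_codebook(samples, coarse_bits, fine_bits):
    """First the coarse codebook, then one finer codebook per coarse cell."""
    coarse, labels = lloyd(samples, 2 ** coarse_bits)
    n_fine = 2 ** (fine_bits - coarse_bits)
    fine = [lloyd(samples[labels == c], n_fine)[0] for c in range(2 ** coarse_bits)]
    return coarse, fine


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Real and imaginary parts of a unit-variance complex Gaussian channel.
    training = rng.standard_normal((4000, 2)) / np.sqrt(2)
    coarse, fine = hierarchical_codebook(training, coarse_bits=2, fine_bits=6)
    print(coarse.shape, len(fine), fine[0].shape)
```

With 2 coarse bits and 6 bits in total, each coarse cell is refined by 16 finer codewords, matching the setting of Fig. 6.16.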
Fig. 6.16 compares the proposed decentralized scheme to the upper and lower
bounds stated before for $L_1(2) = L_2(1) = 2$ and $L_1(1) = L_2(2) = 6$. We label
the scheme which attempts to use local channel knowledge as if it were shared
"myopic BF", in the sense that each BS ignores some of the information it could
be using. Thus, the upper bound scheme would require $2(L_1(1) + L_2(2)) = 24$
bits of CSIT being shared, whereas the schemes based on distributed, symmetric
CSIT would require $L_1(1) + L_2(2) + L_1(2) + L_2(1) = 16$ bits. The benefit of the
second layer of CSI over the coarser shared representation of the channel
depends on the signal-to-noise ratio (SNR) and on the value of $\epsilon$. At low
SNR and for low $\epsilon$, there is little use for the extra information. The
performance of myopic BF, even though it relies on more information than the joint
beamforming relying on coarse CSI, is significantly worse, highlighting the
importance of coordinated action. For reference, we also plot the performance that
would be obtained if the knowledge at transmitter $i$, $i = 1, 2$, were indeed
common to both transmitters and joint beamforming would result; clearly this
yields more gain than joint beamforming based on coarse CSI.

6.4.2 Multi-cell Beamforming with Limited Data Sharing


We now assume the CSI is shared perfectly across the cooperating cells, and
consider a limited backhaul dedicated to user data sharing. Imposing finite
capacity constraints on the backhaul links brings with it a set of interesting
research questions, in particular:
- Given the backhaul constraints, what kind of rates can we expect to achieve?
  What is the capacity region of the resulting multi-cell channel? In fact, the
  multi-cell MIMO channel under finite backhaul no longer corresponds to a
  MIMO broadcast channel (BC), nor to an interference channel (IC).
- How useful is data sharing when backhaul constraints are present? In other
  words, how do the rates achieved with a scheme that shares data, and therefore
  enables joint transmission, compare to those achieved without data sharing
  when the backhaul is limited?
These questions have led to a number of recent interesting research efforts.
To cite a few, in [SSSP08] and [SSPS09b], joint encoding for the cellular downlink
is studied under the assumption that the BSs are connected to a CoMP central
unit (CCU) via finite-capacity links. One of their main conclusions is that
central encoding with oblivious cells, whereby quantized versions of the signals
to be transmitted from each BS are computed at a CCU and sent over the backhaul
links, is a very attractive option in terms of both ease of implementation
and performance, unless high data rates are required. If this is the case, the BSs
need to be involved in the encoding, i.e. at least part of the backhaul link should
be used for sending the messages themselves, not the corresponding codewords.
In [MF08b, MF09a], an optimization framework is proposed for an adaptive backhaul
usage scheme within clusters of cooperating cells. Here, the BSs can either
be provided by the CCU with already precoded signals, hence being oblivious to
the used codewords as stated before, or with (possibly quantized) representations
of the messages to be transmitted to the terminals, which are then encoded and
precoded locally.
Here, we consider a setup in which the backhaul is between the network and
each of the BSs, and focus on how to use this given backhaul to serve the users
in the system. We focus on the two-cell problem. We specify a transmission
scheme whereby superposition coding is used to transmit signals to each user, as
also suggested in [MF08b]: this allows us to formulate a continuum between full
message sharing between BSs (resembling a BC) and the conventional network
with a single serving BS per user (an interference channel (IC)). To do so, the
data rate is in fact split between two distinct forms of data to be received by
the users, a private form to be sent by the serving BS alone, and a common
form to be transmitted via multiple BSs. The intuition is that the stronger
the interference, the more attractive it becomes to share data. The interesting
problem is therefore to determine how much of the traffic should be shared vs.
not shared in general cases.

Figure 6.17 Setup for multi-cell beamforming with limited data sharing, for a toy
setup with M = K = 2. The network is connected to BS 1 and BS 2, which both have
global CSI, via backhaul links of capacity $C_1$ and $C_2$, respectively; BS $m$
transmits $s_m$, and UE $k$ receives $y_k$.
System Model
The system considered is shown in Fig. 6.17. In this preliminary study, we focus
on a two-transmitter, two-receiver setup. We assume a noiseless backhaul link of
capacity $C_m$ between the network and transmitter $m$, for $m \in \{1, 2\}$: it will
be used to transmit the messages for each user. We distinguish between different
types of messages:
- private messages, which are sent from the network to only one of the BSs, and
- shared or common messages, which are sent from the network to both
  transmitters, and are consequently jointly transmitted. Note that this notion of
  a common message is different from that commonly used in the context of
  interference channels (the Han-Kobayashi scheme and schemes derived from
  it), for example, as they do not correspond to messages to be decoded by both
  receivers, but rather to messages to be sent by both transmitters.
Thus user $k$'s message rate $r_k$ is split into private and common rates,
$r_{k,p}$ and $r_{k,c}$, respectively:

$$r_k = r_{k,p} + r_{k,c}. \tag{6.74}$$

In the sequel, we assume full CSIT at both BSs, since we focus on the cost
of sharing data, and denote as $\bar{k}$ the other BS or UE, depending on the
context.
Proposed Backhaul Usage
Backhaul link $k$ with finite capacity $C_k$ serves to carry both private and
common messages for user $k$ as well as the common messages for user $\bar{k}$,
so that:

$$C_k \geq r_{k,p} + r_{k,c} + r_{\bar{k},c} = r_k + r_{\bar{k}} - r_{\bar{k},p}, \quad k = 1, 2. \tag{6.75}$$

Since $r_{k,p} \leq r_k$, we have that $r_1 + r_2 \leq C_1 + C_2$.


Over the Air Transmission
The transmitted signal is the superposition of the precoded symbols corresponding
to both categories of messages described above: private symbols only arrive
from the transmitter that knows them, whereas common symbols are jointly
precoded by both transmitters [MF08b]. Recall again that, unlike in the context
of the IC where common messages arrive from a certain transmitter but are to
be decoded by both receivers, common messages here mean messages dedicated
to a given user but shared across both transmitters, enabling them to jointly
encode them. The transmitted signal may be written as follows:

$$\mathbf{s} = \begin{bmatrix} \mathbf{w}_{1,c} & \mathbf{w}_{2,c} \end{bmatrix} \begin{bmatrix} x_{1,c} \\ x_{2,c} \end{bmatrix} + \begin{bmatrix} \mathbf{w}_{1,p}^1 \\ \mathbf{0} \end{bmatrix} x_{1,p} + \begin{bmatrix} \mathbf{0} \\ \mathbf{w}_{2,p}^2 \end{bmatrix} x_{2,p}, \tag{6.76}$$

where $\mathbf{s} \in \mathbb{C}^{N_{BS}}$ is the transmitted signal, such that the
first $N_{bs}$ elements are transmitter 1's transmit signal and the remaining
$N_{bs}$ elements are transmitter 2's signal. $\mathbf{w}_{k,c} \in \mathbb{C}^{N_{BS}}$
are the beamforming vectors carrying user $k$'s common message symbols $x_{k,c}$,
and $\mathbf{w}_{k,p}^k \in \mathbb{C}^{N_{bs}}$ are those carrying its private message
symbols $x_{k,p}$. Gaussian signaling is assumed: $x_{1,p}, x_{1,c}, x_{2,p}, x_{2,c}$
are all $\mathcal{N}_\mathbb{C}(0, 1)$.
Similarly to the previous subsection, we split $\mathbf{w}_{k,c}$ into the
contributions of the different transmitters, so that
$\mathbf{w}_{k,c} = [(\mathbf{w}_{k,c}^1)^T\ (\mathbf{w}_{k,c}^2)^T]^T$. Using this
notation, per base station power constraints $P$ imply that:

$$\|\mathbf{w}_{1,c}^m\|^2 + \|\mathbf{w}_{2,c}^m\|^2 + \|\mathbf{w}_{m,p}^m\|^2 \leq P, \quad m \in \{1, 2\}. \tag{6.77}$$

Background: MAC with Common Message
Given the assumption that each UE decodes only desired signals and not
interference, the channel between the two transmitters and user $k$ can be viewed
as a multiple access channel (MAC) with a common message [SW73b], where the
receiver noise is replaced by receiver noise plus interference due to
transmission to user $\bar{k}$. Denoting by $\sigma_k^2$ this power:

$$\sigma_k^2 = \sigma^2 + \left|\mathbf{h}_k^H \mathbf{w}_{\bar{k},c}\right|^2 + \left|(\mathbf{h}_k^{\bar{k}})^H \mathbf{w}_{\bar{k},p}^{\bar{k}}\right|^2, \tag{6.78}$$

the following rate region is achievable:

$$r_{k,p} \leq \log_2\left(1 + \frac{\left|(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k\right|^2}{\sigma_k^2}\right),$$

$$r_k = r_{k,p} + r_{k,c} \leq \log_2\left(1 + \frac{\left|(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k\right|^2 + \left|\mathbf{h}_k^H \mathbf{w}_{k,c}\right|^2}{\sigma_k^2}\right). \tag{6.79}$$

Proof. This follows from results obtained for the two-user MAC with a common
message by [SW73b].
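The bounds (6.78) and (6.79) are straightforward to evaluate for given beamformers. The sketch below does so for the two-cell setup; the zero-based user and BS indices, the argument layout and the function name are assumptions made for illustration.

```python
import numpy as np


def mac_common_message_rates(h_k, k, w_c, w_p, noise_var, n_bs):
    """Upper bounds on (r_{k,p}, r_k) from (6.78) and (6.79).

    h_k: stacked channel of user k, shape (2 * n_bs,)
    k:   user index, 0 or 1 (user k is served privately by BS k)
    w_c: [w_{1,c}, w_{2,c}], stacked common beamformers, each (2 * n_bs,)
    w_p: [w_{1,p}, w_{2,p}], private beamformers, each (n_bs,), sent by own BS
    """
    kb = 1 - k
    own = slice(k * n_bs, (k + 1) * n_bs)        # part of h_k seen from BS k
    other = slice(kb * n_bs, (kb + 1) * n_bs)    # part of h_k seen from BS k-bar
    private = abs(np.vdot(h_k[own], w_p[k])) ** 2
    common = abs(np.vdot(h_k, w_c[k])) ** 2
    # Noise plus interference from the other user's common and private signals.
    sigma_k2 = (noise_var
                + abs(np.vdot(h_k, w_c[kb])) ** 2
                + abs(np.vdot(h_k[other], w_p[kb])) ** 2)
    r_p_max = np.log2(1.0 + private / sigma_k2)
    r_max = np.log2(1.0 + (private + common) / sigma_k2)
    return r_p_max, r_max


if __name__ == "__main__":
    h1 = np.array([1.0 + 0.5j, 0.3 - 0.2j])
    w_c = [np.array([0.5, 0.5]), np.array([0.2, -0.2])]
    w_p = [np.array([0.7]), np.array([0.7])]
    print(mac_common_message_rates(h1, 0, w_c, w_p, noise_var=0.1, n_bs=1))
```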
Particular Cases
The transmission scheme introduced here covers the two particular cases of:
- no message sharing (IC), obtained by forcing $r_{k,p} \equiv r_k$, $k = 1, 2$, and
- full message sharing (BC), obtained by forcing $r_{k,p} \equiv 0$, $k = 1, 2$.
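Achievable Rate Region
An achievable rate region $\mathcal{R}$ is the set of $(r_1, r_{1,p}, r_2, r_{2,p})$,
as specified above, that satisfies the specified backhaul and power constraints.
One way to get its boundary is to use the rate profile notion from [MZC06]. Points
along the boundary are thus obtained by solving the following optimization problem
for $\alpha \in [0, 1]$; $\alpha$ thus specifies how the sum-rate achieved, $r$, is
split between the two users.

$$\max.\ r$$

$$\text{s.t.}\ r_1 \geq \alpha r, \quad r_2 \geq (1 - \alpha) r,$$

$$r_1 + r_2 - r_{2,p} \leq C_1, \quad r_1 + r_2 - r_{1,p} \leq C_2,$$

$$r_k \leq \log_2\left(1 + \frac{\left|(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k\right|^2 + \left|\mathbf{h}_k^H \mathbf{w}_{k,c}\right|^2}{\sigma^2 + \left|(\mathbf{h}_k^{\bar{k}})^H \mathbf{w}_{\bar{k},p}^{\bar{k}}\right|^2 + \left|\mathbf{h}_k^H \mathbf{w}_{\bar{k},c}\right|^2}\right), \quad k = 1, 2, \tag{6.80}$$

$$r_{k,p} \leq \log_2\left(1 + \frac{\left|(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k\right|^2}{\sigma^2 + \left|(\mathbf{h}_k^{\bar{k}})^H \mathbf{w}_{\bar{k},p}^{\bar{k}}\right|^2 + \left|\mathbf{h}_k^H \mathbf{w}_{\bar{k},c}\right|^2}\right), \quad k = 1, 2, \tag{6.81}$$

$$\|\mathbf{w}_{k,p}^k\|^2 + \|\mathbf{w}_{k,c}^k\|^2 + \|\mathbf{w}_{\bar{k},c}^k\|^2 \leq P, \quad k = 1, 2. \tag{6.82}$$

The above optimization may be solved using a bisection over the sum-rate $r$:
an essential part of the solution consists of establishing feasibility of a given rate.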
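The bisection over the sum-rate can be sketched generically, with the feasibility test passed in as a function. The sketch assumes that feasibility is monotone, i.e. that any sum-rate below a feasible one is feasible as well, and uses $C_1 + C_2$ as an upper bracket since $r_1 + r_2 \leq C_1 + C_2$. The oracle in the usage example only checks the necessary backhaul conditions $r_k \leq C_k$ and is a stand-in for the full feasibility test; all names are hypothetical.

```python
def max_sum_rate(is_feasible, r_upper, tol=1e-6):
    """Largest feasible sum-rate in [0, r_upper] found by bisection.

    is_feasible: callable r -> bool, assumed monotone (feasible below a threshold).
    r_upper:     upper bracket, e.g. C1 + C2.
    """
    lo, hi = 0.0, float(r_upper)
    if is_feasible(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid          # mid is achievable, search above it
        else:
            hi = mid          # mid is not achievable, search below it
    return lo


if __name__ == "__main__":
    alpha, c1, c2 = 0.25, 1.0, 2.0

    def backhaul_only_oracle(r):
        # Placeholder oracle: only the necessary conditions r_k <= C_k.
        return alpha * r <= c1 and (1.0 - alpha) * r <= c2

    print(max_sum_rate(backhaul_only_oracle, c1 + c2))
```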

Establishing Feasibility of a Given Rate
Assume sum-rate $r$ and $\alpha$ to be fixed. Thus, $r_1 = \alpha r$,
$r_2 = (1 - \alpha) r$. Establishing feasibility of a given rate pair hinges on the
following two remarks:
- For $r_k$ to be supported, it cannot possibly exceed $C_k$, and
- Sharing information whenever possible outperforms not doing so. Thus a rate
  pair is not achievable unless it is so for the minimum possible private message
  rates $r_{k,p}$, $k = 1, 2$. Given the backhaul constraints, these are:

$$(r_{k,p})_{\min} = \max(0, r_1 + r_2 - C_{\bar{k}}), \quad k = 1, 2. \tag{6.83}$$

Thus the rate pair is only feasible if $r_1 \leq C_1$, $r_2 \leq C_2$, and rate tuple
$\left(r_1, (r_{1,p})_{\min}, r_2, (r_{2,p})_{\min}\right) \in \mathbb{R}^4$ is achievable.
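The two remarks translate directly into a small pre-check that can be run before any beamformer optimization. The sketch below computes the minimum private rates of (6.83) and tests the necessary conditions $r_k \leq C_k$; the function names are hypothetical.

```python
def minimum_private_rates(r1, r2, c1, c2):
    """Minimum private rates (r_{1,p}, r_{2,p}) according to (6.83).

    User k's private rate is limited from below by the capacity of the
    other BS's backhaul link, which must carry everything except r_{k,p}.
    """
    r1_p_min = max(0.0, r1 + r2 - c2)
    r2_p_min = max(0.0, r1 + r2 - c1)
    return r1_p_min, r2_p_min


def passes_backhaul_check(r1, r2, c1, c2):
    """Necessary condition: each user's rate fits on its own BS's backhaul."""
    return r1 <= c1 and r2 <= c2


if __name__ == "__main__":
    r1, r2, c1, c2 = 0.75, 0.75, 1.0, 1.0
    if passes_backhaul_check(r1, r2, c1, c2):
        r1_p, r2_p = minimum_private_rates(r1, r2, c1, c2)
        # Remaining rate is carried as common (shared) messages.
        print(r1_p, r1 - r1_p, r2_p, r2 - r2_p)
```

Note that whenever $r_{\bar{k}} \leq C_{\bar{k}}$ holds, the minimum private rate of user $k$ never exceeds $r_k$, so the resulting common rates are non-negative.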
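Feasibility of $(r_1, r_{1,p}, r_2, r_{2,p})$
Assume $r_1$, $r_2$, $r_{1,p}$ and $r_{2,p}$ are fixed. Establishing their
feasibility and obtaining beamforming vectors to achieve them may be done by
solving the total transmit power minimization problem subject to constraints
(6.80), (6.81) and (6.82): feasibility of this problem implies feasibility of the
set of rates, and its optimal solution yields the most power efficient beamforming
strategies to attain it. This problem can be formulated as:

$$\min.\ \sum_{k=1}^{2} \left[\|\mathbf{w}_{k,c}\|^2 + \|\mathbf{w}_{k,p}^k\|^2\right]$$

$$\text{s.t.}\ 2^{r_k} - 1 \leq \frac{\left|(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k\right|^2 + \left|\mathbf{h}_k^H \mathbf{w}_{k,c}\right|^2}{\sigma^2 + \left|(\mathbf{h}_k^{\bar{k}})^H \mathbf{w}_{\bar{k},p}^{\bar{k}}\right|^2 + \left|\mathbf{h}_k^H \mathbf{w}_{\bar{k},c}\right|^2}, \quad k = 1, 2,$$

$$2^{r_{k,p}} - 1 \leq \frac{\left|(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k\right|^2}{\sigma^2 + \left|(\mathbf{h}_k^{\bar{k}})^H \mathbf{w}_{\bar{k},p}^{\bar{k}}\right|^2 + \left|\mathbf{h}_k^H \mathbf{w}_{\bar{k},c}\right|^2}, \quad k = 1, 2,$$

$$\|\mathbf{w}_{k,c}^k\|^2 + \|\mathbf{w}_{\bar{k},c}^k\|^2 + \|\mathbf{w}_{k,p}^k\|^2 \leq P, \quad k = 1, 2.$$

We can transform the above problem into an equivalent convex optimization.
If $r_{k,p} \equiv 0$ or $r_k \equiv r_{k,p}$, we can reduce the problem as follows:
- If $r_{k,p} \equiv 0$, the corresponding constraint becomes redundant, and
  $\mathbf{w}_{k,p}^k = \mathbf{0}$.
- If $r_k \equiv r_{k,p}$, the optimum $\mathbf{w}_{k,c} = \mathbf{0}$ and the
  constraint on $r_k$ is removed.
In both cases, the remaining constraint can be transformed into a second-order
cone constraint as in [VGL03, WES06, DY08].
Otherwise, both constraints on UE $k$'s rates will be tight at the optimum, and

$$\frac{2^{r_{k,p}} - 1}{2^{r_k} - 2^{r_{k,p}}} \left|\mathbf{h}_k^H \mathbf{w}_{k,c}\right|^2 = \left|(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k\right|^2. \tag{6.84}$$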
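The step leading to (6.84) can be spelled out as follows. The derivation below is a sketch using the shorthand $a_k$, $b_k$ and $D_k$, which are introduced here only for brevity and do not appear in the original text.

```latex
% Shorthand (assumed for this sketch):
%   a_k = (h_k^k)^H w_{k,p}^k,   b_k = h_k^H w_{k,c},
%   D_k = \sigma^2 + |(h_k^{\bar k})^H w_{\bar k,p}^{\bar k}|^2 + |h_k^H w_{\bar k,c}|^2 .
\begin{align*}
  \text{Both rate constraints tight:}\quad
    |a_k|^2 &= \left(2^{r_{k,p}} - 1\right) D_k, \\
    |a_k|^2 + |b_k|^2 &= \left(2^{r_k} - 1\right) D_k. \\
  \text{Subtracting:}\quad
    |b_k|^2 &= \left(2^{r_k} - 2^{r_{k,p}}\right) D_k. \\
  \text{Eliminating } D_k:\quad
    \frac{2^{r_{k,p}} - 1}{2^{r_k} - 2^{r_{k,p}}}\, |b_k|^2 &= |a_k|^2 .
\end{align*}
% If a_k and b_k are taken real and non-negative (a common phase rotation of
% the beamformers changes neither the rates nor the transmit power), the last
% line becomes linear in the beamformers, and the private-rate constraint
%   \sqrt{D_k} \le a_k / \sqrt{2^{r_{k,p}} - 1}
% is a second-order cone constraint.
```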

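Further noting that $\mathbf{h}_k^H \mathbf{w}_{k,c}$ and
$(\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k$ being real does not restrict the
solution, we obtain the following equivalent convex problem:

$$\min.\ \sum_{k=1}^{2} \left[\|\mathbf{w}_{k,c}\|^2 + \|\mathbf{w}_{k,p}^k\|^2\right]$$

$$\text{s.t.}\ \sqrt{\frac{2^{r_k} - 2^{r_{k,p}}}{2^{r_{k,p}} - 1}}\, (\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k = \mathbf{h}_k^H \mathbf{w}_{k,c}, \quad k = 1, 2,$$

$$\left\| \begin{bmatrix} \mathbf{h}_k^H \mathbf{w}_{\bar{k},c} \\ (\mathbf{h}_k^{\bar{k}})^H \mathbf{w}_{\bar{k},p}^{\bar{k}} \\ \sigma \end{bmatrix} \right\| \leq \frac{1}{\sqrt{2^{r_{k,p}} - 1}}\, (\mathbf{h}_k^k)^H \mathbf{w}_{k,p}^k, \quad k = 1, 2,$$

$$\|\mathbf{w}_{k,c}^k\|^2 + \|\mathbf{w}_{\bar{k},c}^k\|^2 + \|\mathbf{w}_{k,p}^k\|^2 \leq P, \quad k = 1, 2.$$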

Comparison with a Quantized Backhaul
As noted in the introduction, the transmission may alternatively be designed
centrally and the backhaul used to provide each BS with the signals it is
to transmit [SSPS09b]. Since the finite capacity of the backhaul restricts the
rate of these signals, one can interpret them as quantized versions of those that
would be transmitted in the absence of this limitation. We compare such an
approach to the proposed rate splitting by modifying the scheme in [SSPS09b].
That scheme was published in the form of a high complexity non-linear dirty paper
coding (DPC)-based scheme using the so-called Wyner simplified channel model; we
adapt it to our situation, which deals with lower complexity linear precoding as
well as more realistic fading channels. For $N_{bs} = 1$, the power minimization
problem subject to SINR constraints, central to finding the boundary of the
achievable rate region, can be transformed into a convex optimization, and thus
solved efficiently.

Numerical Results
Figs. 6.18(a)-6.18(c) show the rate regions corresponding to the proposed scheme,
which we label "hybrid IC/BC", the IC ($r_{k,c} = 0$, $k = 1, 2$), the BC scheme
($r_{k,p} = 0$, $k = 1, 2$), and the quantized use of backhaul for different values
of backhaul capacity (we let $C_1 = C_2 = C$), for a particular channel instance
with $N_{bs} = 1$.
For low backhaul capacity, the rate regions corresponding to the hybrid scheme
and the IC almost overlap and are both larger than the BC region. As the
backhaul capacity increases, all three regions become larger (up to the point where
the system is no longer backhaul-constrained), the BC region becomes larger than
the IC region and closer to the hybrid scheme's region, until eventually these two
regions overlap. Moreover, depending on the strength of the interfering links and
on the backhaul constraints, one or the other scheme will be better.
Fig. 6.18(d) illustrates the average common to total rate ratio as a function of
$C$, when $\alpha = 0.5$, for a Rayleigh block-fading channel such that
$\mathbf{h}_k^k \sim \mathcal{N}_\mathbb{C}(0, \mathbf{I}_{N_{bs}})$ and
$\mathbf{h}_k^{\bar{k}} \sim \mathcal{N}_\mathbb{C}(0, \epsilon \mathbf{I}_{N_{bs}})$,
when $N_{bs} = 1$: As for earlier simulation results, parameter $\epsilon$ controls
the strength of the interference links. In general, the maximum
cannot be achieved without sharing some data. How much depends on $\epsilon$.

Figure 6.18 (a)-(c) Sample rate regions (R2 versus R1, both in bit/channel use) for 10 dB SNR, Nbs = 1,
H = [0.8152 + i0.8872 0.3489 i0.2163; 0.2150 i0.6359 0.2714 + i0.1499],
and C = 1, 2 and 6 bit/channel use, respectively, comparing IC (no message sharing), hybrid IC/BC, BC (full message sharing) and quantized backhaul;
(d) average common to total rate ratios (rc/r) vs. backhaul capacity C [bit/channel use] for Rayleigh fading channels with
hkk NC (0, INbs ) and hkk NC (0, INbs ).

6.4.3 Summary
In this section, decentralized downlink CoMP strategies suitable for reducing
the backhaul required for information exchange between cooperating BSs were
introduced. The backhaul load related to the exchange of channel state
information can be reduced by using transmit beamforming schemes which
explicitly account for limited CSI accuracy and for the difference in CSI
estimates at the involved base stations. The backhaul load related to user data
exchange can be reduced by limiting the exchange to only a fraction of the total
traffic, and by adjusting the ratio of shared to non-shared traffic according to
key system parameters such as the number of antennas, the interference strength
and the backhaul capacity limits. Numerical evaluation showed the benefit of
such approaches.

Part III
Challenges Connected to CoMP

Clustering
Patrick Marsch, Stefan Brück, Andrea Garavaglia,
Matthias Schulist, Ralf Weber and Armin Dekorsy

As mentioned in previous chapters, CoMP has the capability to significantly
enhance spectral efficiency and cell-edge throughput. However, CoMP may
require additional signaling overhead on the air interface and the backhaul, in
particular joint signal processing CoMP as introduced in Chapter 6. Therefore,
in practice only a limited number of base stations can cooperate in order to keep
the overhead manageable. This raises the question of which base stations should
form cooperation clusters in order to exploit the advantages of CoMP efficiently
at limited complexity.
In general, one can distinguish between static and dynamic clustering algorithms.
Static clusters are kept constant over time and are designed based on geographical
criteria such as the positions of base stations and the morphology of the
surroundings. In the case of dynamic clustering, the system continuously adapts
the clustering strategy to changing parameters such as user equipment (UE)
locations and radio frequency (RF) conditions. Here, the central questions are
on which information the adaptation of clusters should be based, and where in the
system clustering decisions are made.
To illustrate concrete clustering results and their corresponding performance,
we use two different setups in this chapter. On the one hand, we consider an
idealistic setup, i.e. a hexagonal layout of up to M = 111 cells, grouped into
sites of 3 cells each, with an inter-site distance (ISD) of 500 m. Here, we
calculate pathlosses according to a flat-plane pathloss model with

$$\mathrm{PL} = 130.5 + 37.6 \log_{10}(d/\mathrm{km}) \ \text{[dB]}, \tag{7.1}$$

where d is the distance between transmitter and receiver, and take into account
the impact of directive base station (BS) antennas with an azimuth-dependent
attenuation of

$$\mathrm{AL} = \min\left( 12 \left( \frac{\theta}{70^\circ} \right)^2,\ 20 \right) \ \text{[dB]}, \tag{7.2}$$

with $\theta$ denoting the azimuth angle relative to the antenna boresight, and an
antenna gain of 14 dBi. On the other hand, we observe a real-world
setup with M = 54 BSs, as it exists in downtown Dresden, Germany, and again
calculate pathlosses based on (7.1) and (7.2), but now also considering signal
reflection, diffraction and obstruction based on ray-tracing using a 3D model of
the city. The two setups are illustrated in Fig. 7.1, where the pathloss to the best
serving cell is shown as a function of potential UE location. Clearly, the
real-world setup has a larger average ISD than the hexagonal grid, but we use the
former as it corresponds to the setup of the test bed discussed in Sections 13.2,
13.4 and 13.5, and the latter as it corresponds to standard next generation mobile
networks (NGMN) simulation assumptions.

Figure 7.1 Setups considered in this chapter (pathloss to best serving cell in dB): (a) hexagonal grid; (b) real-world setup in Dresden.
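The link budget implied by (7.1) and (7.2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the azimuth angle is assumed to be given in degrees relative to the antenna boresight, and ray-tracing effects of the real-world setup are not modelled.

```python
import math

def pathloss_db(distance_km):
    """Flat-plane pathloss of (7.1) in dB, for a distance given in km."""
    return 130.5 + 37.6 * math.log10(distance_km)

def antenna_attenuation_db(azimuth_deg, beamwidth_deg=70.0, max_att_db=20.0):
    """Azimuth-dependent attenuation of (7.2) in dB (angle relative to boresight)."""
    return min(12.0 * (azimuth_deg / beamwidth_deg) ** 2, max_att_db)

def path_gain_linear(distance_km, azimuth_deg, antenna_gain_dbi=14.0):
    """Linear power gain combining (7.1), (7.2) and the 14 dBi antenna gain."""
    total_db = (antenna_gain_dbi
                - pathloss_db(distance_km)
                - antenna_attenuation_db(azimuth_deg))
    return 10.0 ** (total_db / 10.0)

# Example: a UE 250 m from the BS, 35 degrees off boresight.
print(pathloss_db(0.25), antenna_attenuation_db(35.0))
```

At 70 degrees off boresight the attenuation is 12 dB, and it saturates at 20 dB for large angles, as the min operator in (7.2) prescribes.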
Before going into the details of static and dynamic clustering, let us introduce
the concept of ideal clustering as the case where each potential UE location is
served by exactly the set of cells to which it has the strongest links. This is clearly
infeasible in practice, as it is unlikely that other UEs can be found that can be
jointly served through exactly the same set of optimal cells, and as this would
involve a substantial signalling overhead between BSs. However, this concept serves
as a good upper performance bound for any concrete clustering scheme usable in
practice. Assuming for simplification that we have only Nbs = 1 BS antenna per
cell, and that downlink joint signal processing CoMP is performed such that
interference between jointly transmitted streams is completely removed and the
maximum array gain is obtained (i.e. an idealistic assumption), we can state the
downlink signal-to-interference-and-noise ratio (SINR) obtained by a UE j on
an exemplary orthogonal frequency division multiplex (OFDM) sub-carrier, if it
is served by a cluster of cells $\mathcal{M}'$, as

$$\mathrm{SINR}_j^{\mathcal{M}'} = \frac{P \sum_{m \in \mathcal{M}'} \lambda_j^m}{P \sum_{m \in \{\mathcal{M} \setminus \mathcal{M}'\}} \lambda_j^m + \sigma^2}, \tag{7.3}$$

where $\mathcal{M}$ is the set of all cells in the system, and $\lambda_j^m$ is the path gain (linear and
in terms of power) between UE j and the BS serving cell m. P is the transmit power,
assumed to be equal at each BS, and $\sigma^2$ the noise variance. If each UE is now
served by the best $|\mathcal{M}'|$ cells in terms of signal strength, cumulative distribution
functions (CDFs) of the resulting downlink SINRs can be calculated as shown
in Fig. 7.2. This is typically referred to as the interference geometry, which in
this case varies as different extents of (idealized) CoMP are applied. We can
see that serving each UE by an ideal cluster of 3 cells offers a median SINR
improvement of about 6 dB or 8 dB as compared to a cluster size of 1 (i.e. without
BS cooperation), for the idealistic and real-world setup, respectively.

Figure 7.2 SINRs achievable under perfect interference cancelation in ideal clusters of
different sizes (CDF over geometry in dB, 8 dB shadowing, cluster sizes 1 to 20, with the curves for no clustering and for ideal clustering of 3 cells marked): (a) hexagonal grid; (b) real-world setup in Dresden.
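The SINR expression in (7.3) for an ideal cluster can be evaluated with the following minimal sketch. The function name and the toy path gains are hypothetical; the UE is assumed to be served by its strongest cells, and the transmit power P is the same at every BS, as in the text.

```python
import math

def ideal_cluster_sinr_db(path_gains, cluster_size, tx_power, noise_var):
    """SINR of (7.3) in dB for a UE served by its `cluster_size` strongest cells.

    path_gains: linear power gains between the UE and every cell in the system.
    """
    ranked = sorted(path_gains, reverse=True)
    # Cells inside the ideal cluster contribute useful signal power ...
    signal = tx_power * sum(ranked[:cluster_size])
    # ... while all remaining cells contribute interference.
    interference = tx_power * sum(ranked[cluster_size:])
    return 10.0 * math.log10(signal / (interference + noise_var))

# Toy example with five cells and a small, arbitrary noise variance.
gains = [1.0, 0.5, 0.25, 0.125, 0.125]
for size in (1, 3):
    print(size, round(ideal_cluster_sinr_db(gains, size, 1.0, 1e-3), 2))
```

Evaluating this over many UE locations and sorting the results yields an empirical CDF of the kind plotted in Fig. 7.2.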

7.1 Static Clustering Concepts
Patrick Marsch
Let us first focus on the case of static clustering, i.e. where clusters are created
based on time-invariant information such as base station (BS) locations and
signal propagation properties, considering any potential user equipment (UE)
locations. Such schemes have been proposed in, e.g., [MF07a, BH07b, Ven07].
As mentioned before, it is infeasible to serve each UE by the best set of cells in
practice. Due to various practical constraints that will be revealed in the next
chapters, we furthermore assume that cooperation takes place in clusters of no
more than |M'| = 3 cells. In the next subsections, we will hence explore how
fixed clusters of 3 cells each can be defined so that most potential UE locations
can experience a moderate gain through CoMP, even though they may be
served by a sub-optimal set of cells. We will use the lines marked with crosses
and circles in Fig. 7.2 as lower and upper performance bounds, i.e. representing
the non-cooperative case and idealized clusters of 3 cells, respectively, and
denote all potential UE locations as a set J. We generally differentiate between
non-overlapping and overlapping clusters, which are treated in Subsections 7.1.1
and 7.1.2, respectively, before the impact of the introduced clustering schemes
on the interference geometry is discussed in Subsection 7.1.3.

7.1.1 Non-Overlapping Clusters
Let us first consider the case where clusters may not overlap, i.e. where clusters
are disjoint w.r.t. the cells involved. The question is now how such a set of
clusters can be found according to some performance metric at reasonable
complexity.
For this, we consider two different optimization criteria. On the one hand, it can
be desirable to maximize the mean signal-to-interference-and-noise ratio (SINR)
that the points in J can achieve under a particular fixed clustering. On the other
hand, we can consider optimizing a certain outage measure, i.e. maximizing the
number of locations in J for which a certain minimum SINR can be achieved. For both
cases, let us assume that a set of C potential clusters has already been chosen
heuristically (for example based on a ranking of the most frequently desired
clusters for the locations in J), and a matrix

$$\mathbf{A} \in \{0, 1\}^{[M \times C]} \tag{7.4}$$

is given, where each non-zero element $a_{m,c}$ means that cell m is involved in
cluster c. We now state the SINR achievable at a location j if served by cluster c,
similarly to (7.3), as

$$\mathrm{SINR}_j^c = \frac{P \sum_{m \in \mathcal{M},\, a_{m,c} = 1} \lambda_j^m}{P \sum_{m \in \mathcal{M},\, a_{m,c} = 0} \lambda_j^m + \sigma^2}, \tag{7.5}$$

with path gains $\lambda_j^m$, transmit power P and noise variance $\sigma^2$ as in (7.3).
The mean SINR that could be achieved over all locations in J for a particular
potential cluster c can then be stated as

$$f_c = \frac{1}{|\mathcal{J}|} \sum_{j \in \mathcal{J}} \mathrm{SINR}_j^c. \tag{7.6}$$

If this mean SINR is to be optimized, we have the optimization problem

$$\max_{\mathbf{x}} \ \mathbf{f}^T \mathbf{x} \tag{7.7}$$

$$\text{s.t.} \ \mathbf{A} \mathbf{x} \leq \mathbf{1}_{[M \times 1]} \tag{7.8}$$

$$\text{and} \ \mathbf{x} \in \{0, 1\}^{[C \times 1]} \ \text{binary}, \tag{7.9}$$

where x is a vector of binary values stating whether a particular cluster has
finally been selected or not, and the (element-wise) linear inequality constraint
in (7.8) assures that each cell is involved in at most one cluster. This is a standard
binary optimization problem with linear constraints, for which standard solvers
can be used [BV04]. Note that averaging SINRs in the linear domain does not reflect
the fact that capacity is a concave function of SINR, but this has nevertheless
been shown to lead to a numerically stable optimization yielding good results.
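For a handful of candidate clusters, the binary program in (7.7) to (7.9) can be solved by simply enumerating all selections. The sketch below does exactly that; it is not the solver used for the results in this chapter, the function name and toy data are hypothetical, and the toy clusters contain two cells each only to keep the example small.

```python
from itertools import product

def select_clusters(f, A):
    """Maximize f^T x subject to A x <= 1 and x binary, by exhaustive search.

    f: list of C mean SINRs, one per candidate cluster, as in (7.6).
    A: M x C list of lists with A[m][c] = 1 if cell m belongs to cluster c.
    Only practical for small C, since 2^C selections are enumerated.
    """
    num_cells, num_clusters = len(A), len(f)
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=num_clusters):
        # Constraint (7.8): every cell is used by at most one selected cluster.
        feasible = all(
            sum(A[m][c] * x[c] for c in range(num_clusters)) <= 1
            for m in range(num_cells)
        )
        if not feasible:
            continue
        value = sum(f[c] * x[c] for c in range(num_clusters))
        if value > best_value:
            best_x, best_value = list(x), value
    return best_x, best_value

# Four cells, three candidate clusters: {0,1}, {1,2} and {2,3}.
A = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
f = [3.0, 5.0, 3.0]
print(select_clusters(f, A))
```

In this toy case the two outer clusters are chosen together, since the middle cluster, although individually best, would block both of its neighbours.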
If an outage criterion is to be optimized, we introduce a certain target SINR $\gamma$,
and a matrix $\mathbf{B} \in \{0, 1\}^{[J \times C]}$ where each element $b_{j,c}$ denotes whether a terminal
in location j and served by cluster c can achieve the target, i.e.

$$\forall\, j, c: \ b_{j,c} = \begin{cases} 1 & \mathrm{SINR}_j^c \geq \gamma \\ 0 & \text{otherwise.} \end{cases} \tag{7.10}$$

We then maximize the number of locations that can achieve the SINR target
through a particular clustering concept by re-stating the optimization problem
from (7.7) as

$$\max \ \mathbf{1}^T \mathbf{z} \tag{7.11}$$

$$\text{s.t.} \ \mathbf{B} \mathbf{x} \geq \mathbf{z}, \tag{7.12}$$

$$\text{and} \ \mathbf{A} \mathbf{x} \leq \mathbf{1} \tag{7.13}$$

$$\text{and} \ \mathbf{x} \in \{0, 1\}^{[C \times 1]}, \ \mathbf{z} \in \{0, 1\}^{[J \times 1]} \ \text{binary}, \tag{7.14}$$

where z is a binary auxiliary variable indicating whether the chosen clustering
approach fulfills the SINR target at each location or not. This is similar
to an approach used for determining optimal BS locations in [MN00]. The problem
can be re-written into the standard notation for binary linear optimization
problems by jointly optimizing over both x and the auxiliary variable z, i.e.

$$\max \ \begin{bmatrix} \mathbf{0}^T & \mathbf{1}^T \end{bmatrix} \begin{pmatrix} \mathbf{x} \\ \mathbf{z} \end{pmatrix} \tag{7.15}$$

$$\text{s.t.} \ \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ -\mathbf{B} & \mathbf{I}_{[J]} \end{pmatrix} \begin{pmatrix} \mathbf{x} \\ \mathbf{z} \end{pmatrix} \leq \begin{pmatrix} \mathbf{1}_{[M \times 1]} \\ \mathbf{0}_{[J \times 1]} \end{pmatrix} \tag{7.16}$$

$$\text{and} \ \mathbf{x} \in \{0, 1\}^{[C \times 1]}, \ \mathbf{z} \in \{0, 1\}^{[J \times 1]} \ \text{binary}. \tag{7.17}$$

Typically, a standard problem solver would first treat x and z as arbitrary,
non-negative real-valued variables (i.e. solve a linear program), and then introduce
additional linear constraints and perform branching to reflect the fact that x
and z are constrained to be binary (often referred to as branch-and-cut). As the
number of inequality constraints grows linearly in M and J, and the complexity
of solving the problem can grow exponentially in this number of constraints, it is
essential to choose reasonable values for C, M and J. Optimization complexity
can for example be reduced significantly by optimizing smaller parts of cellular
systems separately (hence equally scaling down C, M and J), performing a better
heuristic pre-selection of clusters (reducing C), or reducing the granularity of
potential UE locations (hence reducing the number of SINR constraints J).
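The outage formulation in (7.10) to (7.14) can be illustrated in the same brute-force manner. The sketch below builds the matrix B from a table of per-location SINRs and then counts, for every feasible selection, the locations that reach the target. Function names, the target value and the toy SINR table are hypothetical, and a real system would use a branch-and-cut solver rather than enumeration.

```python
from itertools import product

def coverage_matrix(sinr, target):
    """B of (7.10): B[j][c] = 1 if location j reaches the target SINR in cluster c."""
    return [[1 if s >= target else 0 for s in row] for row in sinr]

def maximize_coverage(B, A):
    """Maximize the number of covered locations subject to A x <= 1, x binary."""
    num_clusters = len(A[0])
    best_x, best_covered = None, -1
    for x in product((0, 1), repeat=num_clusters):
        if any(sum(a * xc for a, xc in zip(row, x)) > 1 for row in A):
            continue  # violates (7.13): a cell would sit in two clusters
        # z_j may be set to 1 only if (B x)_j >= 1, as required by (7.12).
        covered = sum(1 for row in B if sum(b * xc for b, xc in zip(row, x)) >= 1)
        if covered > best_covered:
            best_x, best_covered = list(x), covered
    return best_x, best_covered

# Linear SINRs of four locations (rows) under three candidate clusters (columns).
sinr = [[4.0, 1.0, 0.5],
        [0.5, 4.0, 0.5],
        [0.5, 1.0, 4.0],
        [3.0, 0.2, 0.1]]
A = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
print(maximize_coverage(coverage_matrix(sinr, 2.0), A))
```

The auxiliary vector z never appears explicitly here: for a fixed x, the best z simply marks every location with at least one selected cluster that meets the target.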
In this work, for the setups stated before, we perform a pre-selection of potential
clusters by observing which cluster of 3 cells would be preferred by which
potential UE location, and then rank the clusters by their frequency of occurrence.
We then further reduce the granularity of potential UE locations to obtain
J = 300 SINR constraints. After solving the problems in (7.7) or (7.15), respectively,
we assign all potential UE locations at the initial granularity to
the selected clusters according to the largest geometry. For both the pathloss
data based on a flat-plane model and a hexagonal grid of BSs, as well as the
data based on ray-tracing, the clustering results are shown in Fig. 7.3. Here, we
can see the sets of locations that are assigned to particular clusters, where the
indices of the involved cells are given (except in cases where this is obvious). For
the hexagonal grid, the result is rather intuitive. In the case of a mean SINR
optimization, complete sites are declared as clusters, as users at sector borders
can benefit most from CoMP due to a large signal-to-noise ratio (SNR), while
for the outage optimization, cells of different sites are grouped into clusters, as
this mainly increases the performance of the weak users. For real-world BS
locations, however, clusters may of course span co-located as well as distributed
cells. Clearly, the actual assignment of potential UE locations to the best serving
cluster here leads to a more scattered result due to shadowing, but we have
averaged over these effects for illustration purposes. The performance obtainable
with these clustering schemes is shown in Fig. 7.5 and will be discussed later.

Figure 7.3 Optimization results for non-overlapping clusters (maps over X-distance and Y-distance in km, annotated with cell indices): (a) hexagonal grid, mean SINR max.; (b) real-world setup, mean SINR max.; (c) hexagonal grid, outage max.; (d) real-world setup, outage max.

7.1.2 Overlapping Clusters
Clearly, the clustering schemes introduced in the last subsection suffer from the
problem that UEs at the border between clusters will always experience a low SINR.
Hence, one may consider using spatially overlapping clusters that use different
subsets of system resources [MF07a, Mar10]. This can be seen as a kind of fractional
frequency reuse, but employed in such a way that the overall reuse factor is 1.
Let us assume, for example, that the system resources are split into R = 3
equally-sized resource blocks (RBs). Each cell can then be involved in up to 3
clusters with different partnering cells. We can solve both the problem of finding
the optimal choice of overlapping clusters and that of finding the optimal assignment
of resources to clusters by a simple extension of the problem in (7.7) to

$$\max \ \begin{bmatrix} \mathbf{f}^T & \mathbf{f}^T & \mathbf{f}^T \end{bmatrix} \mathbf{x} \tag{7.18}$$

$$\text{s.t.} \ \begin{pmatrix} \mathbf{A} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{A} \\ \mathbf{I}_{[C]} & \mathbf{I}_{[C]} & \mathbf{I}_{[C]} \end{pmatrix} \mathbf{x} \leq \mathbf{1}_{[RM+C \times 1]} \tag{7.19}$$

$$\text{and} \ \mathbf{x} \in \{0, 1\}^{[RC \times 1]} \ \text{binary}, \tag{7.20}$$

in the case that the mean SINR is to be optimized. Here, we simply consider R · C
virtual clusters, each connected to one of the three resource blocks. The constraint
in (7.19) assures that each cell is involved in at most one cluster for each RB, and
that each potential cluster is chosen on at most one RB. Equivalently, if an outage
measure is to be optimized, we can change (7.15) to

$$\max \ \begin{bmatrix} \mathbf{0}^T & \mathbf{1}^T \end{bmatrix} \begin{pmatrix} \mathbf{x} \\ \mathbf{z} \end{pmatrix} \tag{7.21}$$

$$\text{s.t.} \ \begin{pmatrix} \mathbf{A} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{A} & \mathbf{0} \\ -\mathbf{B} & -\mathbf{B} & -\mathbf{B} & \mathbf{I}_{[J]} \end{pmatrix} \begin{pmatrix} \mathbf{x} \\ \mathbf{z} \end{pmatrix} \leq \begin{pmatrix} \mathbf{1}_{[RM \times 1]} \\ \mathbf{0}_{[J \times 1]} \end{pmatrix} \tag{7.22}$$

$$\text{and} \ \mathbf{x} \in \{0, 1\}^{[RC \times 1]}, \ \mathbf{z} \in \{0, 1\}^{[J \times 1]} \ \text{binary}. \tag{7.23}$$

The clustering results for the aforementioned setups and based on overlapping
clusters are shown in Fig. 7.4. For a hexagonal setup, the result is interestingly
almost the same, independent of whether mean SINR or outage is optimized.
We can see that both co-located cells and those belonging to 3 different sites are
grouped into clusters, and that each cell is now involved in exactly one intra-site
and two inter-site clusters. The resulting clustering approach is similar to those
proposed intuitively in [MF07a, Mar10]. For the real-world setup, the chosen
clustering differs strongly depending on the optimization criterion, and a cell
need not necessarily be involved in 3 clusters.
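The block structure of the constraint in (7.19) is easy to get wrong by hand, so a small sketch of how it could be assembled may help. The function name is hypothetical; the code only builds the stacked constraint matrix and right-hand side for the mean SINR variant, leaving the actual optimization to a binary linear solver.

```python
def overlapping_constraints(A, num_rbs=3):
    """Build the constraint matrix G and right-hand side h of (7.19).

    A: M x C membership matrix. The returned G and h satisfy G x <= h, where x
    stacks one binary selection vector of length C per resource block (RB).
    """
    num_cells, num_clusters = len(A), len(A[0])
    G = []
    # Block-diagonal part: on each RB, a cell sits in at most one cluster.
    for r in range(num_rbs):
        for m in range(num_cells):
            row = [0] * (num_rbs * num_clusters)
            row[r * num_clusters:(r + 1) * num_clusters] = A[m]
            G.append(row)
    # Stacked identity part: each candidate cluster is used on at most one RB.
    for c in range(num_clusters):
        row = [0] * (num_rbs * num_clusters)
        for r in range(num_rbs):
            row[r * num_clusters + c] = 1
        G.append(row)
    h = [1] * (num_rbs * num_cells + num_clusters)
    return G, h

# Three cells, two candidate clusters {0,1} and {1,2}, three RBs.
G, h = overlapping_constraints([[1, 0], [1, 1], [0, 1]])
print(len(G), len(G[0]), len(h))
```

The number of rows is R·M + C, matching the dimension of the all-ones vector on the right-hand side of (7.19).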

Figure 7.4 Optimization results for overlapping clusters (maps over X-distance and Y-distance in km, annotated with cell indices): (a) hexagonal grid, mean SINR opt.; (b) real-world setup, mean SINR opt.; (c) hexagonal grid, outage opt.; (d) real-world setup, outage opt. The differently hatched areas
represent clusters connected to the three different system resource partitions.

7.1.3 Resulting Geometries
Fig. 7.5 finally shows the geometries that can be achieved with the different
proposed static clustering strategies. As before, the upper two plots refer to the
case where the mean SINR is optimized, and the lower two to the case of outage
optimization. We generally consider fixed clusters, but calculate geometries based
on many shadow fading realizations with a standard deviation of 2 dB
or 8 dB, respectively, where the shadowing from one UE to multiple co-located
cells is assumed to be fully correlated, and that to arbitrary cells has a correlation
coefficient of 0.5. While a standard deviation of 2 dB is mostly used to model
channels for indoor UE positions, a value of 8 dB reflects outdoor locations. As
mentioned before, the case of no cooperation at all and ideal clusters of size 3
are considered as lower and upper bounds, respectively. In Plots 7.5(a) and 7.5(c),
reflecting a hexagonal setup, we can see that static, non-overlapping clusters can
help to improve either the strong users (mean SINR optimization) or the outage by
about 3-5 dB. Significantly improved geometries can be obtained for overlapping
clusters, where for the hexagonal setup there is no difference between mean SINR
and outage optimization. Performance then approaches that of ideal clustering
to within less than 2 dB. These results are also paralleled in Plots 7.5(b) and 7.5(d)
for a real-world setup. Here, however, we can see a stronger benefit of
non-overlapping clusters, as interference is more localized due to morphology than
in a flat-plane model. Again, employing overlapping clusters allows the performance
of ideal clustering to be approached to within less than 2 dB.
While the previous results strongly suggest the usage of overlapping clusters, one
must mention that these come at a certain price. As the system resources are
split into R = 3 blocks, one loses multi-user diversity in each cluster, and also
a certain extent of diversity in frequency for each UE [Mar10].
While the clustering techniques introduced in this section were based on optimizing
the performance for all potential UE locations (as clustering is calculated
once and then fixed), it is intuitively clear that better clustering results can be
obtained if this is performed dynamically, i.e. for a set of actual UE locations,
and changing over time. This will be addressed in the following section, along
with the question of how clustering can be based on UE-centric measurement
data available in legacy networks rather than on ray-tracing data.

Figure 7.5 Geometries achievable with different clustering concepts (CDF over geometry in dB for no clustering, non-overlapping clusters, overlapping clusters and ideal clustering, each for 2 dB and 8 dB shadowing): (a) hexagonal grid, mean SINR opt.; (b) real-world setup, mean SINR opt.; (c) hexagonal grid, outage opt.; (d) real-world setup, outage opt.

7.2 Self-Organizing Clustering Concepts
Stefan Brück, Andrea Garavaglia, Matthias Schulist, Ralf Weber
and Armin Dekorsy
Initial work on dynamic clustering based on radio frequency (RF) channel
measurements of the mobile stations can be found in [PGH08] and [PA10]. A key
requirement for any dynamic clustering algorithm is that it fits into the
architecture of the radio access and/or the core network of LTE as described in [3GP10i].
The 3GPP standard already offers a framework for self-organizing concepts to
support automatic configuration and optimization of the network [3GP10i]. The
aim of this section is to introduce an adaptive terminal-aware clustering concept
and to show how it can be integrated into the existing network architecture and
the self-organizing network (SON) concept of LTE.
The remaining part of this section is organized as follows: First, we briefly
provide an overview of the SON framework in LTE in Subsection 7.2.1. Then,
the proposed adaptive clustering algorithm is introduced in Subsection 7.2.2 and
simulation results are provided in Subsection 7.2.3. In Subsection 7.2.4, it is then
shown how this algorithm can be integrated into the existing LTE architecture
and the operation and maintenance (OAM) system.

7.2.1 Self-Organizing Network Concepts in 3GPP LTE
With the specification of the 3GPP Release 8 standard of LTE, the concept of
SON has been included to pursue a reduction of network operational costs. The
basic principle is to include appropriate instruments in the standards such that
typically time-consuming tasks related to network operations can be automated
as much as possible. With this principle in mind, different areas where such
savings could be achieved have been identified, e.g.:
• Self-Configuration: The ability of (a group of) network elements to configure
themselves automatically, e.g. at power-up or after a major failure or
change propagated by the OAM.
• Self-Optimization: The ability of the system to automatically adjust system
parameters (per single site or cell, as well as per cluster) in
order to maximize a certain predefined performance objective, according to
the operator's strategy.
• Self-Healing: The ability of the system to recover from a major failure (for
example a site going out of service) by temporarily reconfiguring the surrounding
elements or nodes in order to guarantee a minimum service quality.
The LTE system has been designed from the beginning to include these features,
by identifying which tasks could be automated from a practical and
feasibility point of view in a real system, and by specifying the means to achieve
such automation (see [3GP10e], [3GP10i]).

Some functions were specified in Release 8, such as the automatic neighbor
relation (ANR) function, the physical cell identifier (PCI) selection function
and self-configuration functionalities, while others were added in Release 9
and later refined. Among the latter are mobility robustness optimization and
mobility load balancing (see [3GP10e], [3GP10i] for the latest updates). The
specification work relates both to detailed interfaces within the network, which is
mainly done in the radio access network (RAN) working groups, and to details related
to the management of the system (OAM), which is mainly done in the SA5 group (see
www.3gpp.org). Both centralized and distributed functions are considered from
an architecture point of view, where the particular choice is made case by case,
based on trade-offs between the complexity of implementing the function and the
overall benefits for operations.
The approach that is followed in this work for CoMP clustering also considers
the possible impact on standardization, and how the functionality could be
integrated into future 3GPP specifications. Looking at existing functions,
it turns out that the ANR function, which collects radio quality information
to automatically create and adjust neighbor relation tables (NRTs) in the system,
provides a good framework for the integration of a clustering algorithm
with limited complexity. In fact, CoMP clustering could be regarded from this
perspective as an extension of the information that is included in the neighbor
relations, by considering which cells are suited to cooperate.

7.2.2 Adaptive Clustering Algorithms
The idea of adaptive clustering for CoMP is to provide the system with the ability
to capture variations of the perceived radio environment and user locations, in
order to achieve better CoMP performance. In fact, due to the time-variant
characteristics of the wireless channel, the variations of system loading and the
mobility of the users, it is expected that a clustering algorithm able to adapt to
such conditions will enable CoMP to perform better from a system point of view than
a static clustering approach as considered in Section 7.1, where all cooperative
sets are pre-defined based on proximity information and on network planning predictions.
Let us consider a group of CoMP-enabled user equipments (UEs) served by
a cell $m \in \mathcal{M}$ in the macro area where CoMP functionalities are available in
principle (which we denote as the CoMP top cluster). These UEs report
radio quality measurements to the serving cell m, which can thus collect
statistics of RF measurements and exploit them for CoMP clustering purposes.
One simple option to collect UE inputs is to make use of existing measurements,
for example by extracting data from measurement report messages (MRMs) in terms
of averaged reference signal received power (RSRP) of the measured cells. Let
us assume that a UE reports a set of cells $\mathcal{M}_c \subseteq \mathcal{M}$ in arbitrary order, where for
example only the cells stronger than a certain configurable threshold are
considered in forming the set. The serving cell m collects several of
such sets from different terminals over the observation period T and computes
relevant statistical properties. In each serving cell, the reported information can
be summarized as a list of pairs $[\mathcal{M}_c, N_c]$, where $N_c \geq 1$ is the number of
times the set $\mathcal{M}_c$ has been reported by all UEs to the serving cell during
the period T. The idea behind $[\mathcal{M}_c, N_c]$ is that cell combinations that have been
observed very often offer a higher potential to improve the system performance
for several users when a CoMP scheme is adopted.
This information is eventually collected in a CoMP central unit (CCU) associated
with the considered top cluster, which computes the cell clusters in an
adaptive manner by optimizing selected objectives. Fig. 7.6 illustrates the steps
of the entire process and the involved logical entities: In each period T,
information is collected at the CCU and passed to an optimization algorithm that
adapts the cell clustering and redistributes the new sets to all base stations
in the top cluster.

Figure 7.6 Adaptive CoMP clustering algorithm flow example. In the depicted example, the MRM from UE-i contains Cell-m at -96 dBm, Cell-2 at -110 dBm and Cell-k at -99 dBm; Cell-2 is discarded as it is below the selected threshold, so that the set {Cell-m, Cell-k} is reported. Cell m passes its list of pairs [Mc, Nc] to the CCU, which returns the optimized clusters.
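The per-cell bookkeeping described above, i.e. turning measurement reports into pairs of cell set and occurrence count, can be sketched as follows. The data layout (a dictionary of averaged RSRP values per report), the function names and the threshold of -105 dBm are assumptions made for illustration only; the RSRP values of the first report are those of the example in Fig. 7.6.

```python
from collections import Counter

def reported_set(rsrp_dbm, threshold_dbm):
    """Cells whose averaged RSRP exceeds the threshold, as an order-free set."""
    return frozenset(cell for cell, level in rsrp_dbm.items() if level > threshold_dbm)

def collect_statistics(reports, threshold_dbm):
    """Count how often each cell set was reported during one observation period."""
    counts = Counter()
    for rsrp_dbm in reports:
        cells = reported_set(rsrp_dbm, threshold_dbm)
        if cells:
            counts[cells] += 1
    return counts

# Three hypothetical MRMs received by cell m within one period T.
reports = [
    {"Cell-m": -96, "Cell-2": -110, "Cell-k": -99},
    {"Cell-k": -98, "Cell-m": -95},
    {"Cell-m": -90, "Cell-1": -100},
]
for cells, count in collect_statistics(reports, -105).items():
    print(sorted(cells), count)
```

Using a frozenset as the key makes the count independent of the order in which a UE lists the cells, which matches the statement that sets are reported in arbitrary order.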
As in Section 7.1, the optimization problem to be performed by the CCU can
be written in a classical linear programming notation. Let C denote a set of C
potential clusters chosen based on the before mentioned UE reports and according to some heuristic. As before, binary matrix A {0, 1}[MC] states which cell
is involved in which potential cluster. Dierent from Section 7.1, however, clusters may consist of varying numbers of cells within a heuristically chosen range

7.2 Self-Organizing Clustering Concepts

151

c : Kmin |Mc | Kmax . To each potential cluster c there is an associated


cost c and a cardinality |Mc |, i.e. the number of cells belonging to the set. The
choice of clusters from all potential clusters in C can now be stated as a cost
minimization problem, i.e.

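$$\min_{\mathbf{x}} \sum_{c \in \mathcal{C}} \gamma_c x_c \qquad (7.24)$$

$$\text{s.t.} \quad \mathbf{A}\mathbf{x} \ge \mathbf{1} \qquad (7.25)$$

$$\text{and} \quad \mathbf{x} \in \{0, 1\}^{[C \times 1]} \ \text{binary}, \qquad (7.26)$$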

where $\mathbf{x} \in \{0, 1\}^{[C \times 1]}$ is a binary vector stating which clusters have finally been
selected, and (7.25) ensures that each cell is involved in at least one cluster. If
this latter constraint is changed to an equality, disjoint clusters are enforced, as
in Section 7.1.1. Furthermore, equally-sized clusters can be obtained by choosing
$K_{\min} = K_{\max}$. One of the key factors in defining an appropriate optimization problem is the selection of the cost function $\gamma_c$. Looking at the CoMP functionality,
a trade-off between system complexity and performance could for example be
obtained by making the cost proportional to the cluster cardinality $|\mathcal{M}_c|$ or to
the number of required X2 interfaces (as large clusters increase system complexity) and inversely proportional to the combined radio conditions of the cluster
cells (better radio conditions mean higher performance). In order to account
for the number of UEs that would benefit from a certain cluster $c$, the cost can
include a term inversely proportional to $N_c$, i.e. to the number of UEs that
have reported the cluster, leading to
$$\gamma_c = \frac{|\mathcal{M}_c|}{\sum_{m \in \mathcal{M}_c} \overline{\mathrm{RSRP}}_m \cdot 10 N_c}, \qquad (7.27)$$

where $\overline{\mathrm{RSRP}}_m$ represents the average of the UE measurements for cell $m$
expressed in linear scale and collected over all received measurement reports.
Equation (7.27) captures the combined radio conditions of the set of cells $\mathcal{M}_c$ as
an estimate of the CoMP performance potential of that cluster. This cost function has been selected heuristically after comparing several similar options and
will be used later on for a simulation-based analysis of the algorithm. As mentioned in
Section 7.1, obtaining the optimum of the problem described by equations (7.24)
to (7.27) may be of prohibitive computational complexity, depending on the
number of cells and potential clusters. Hence, a simplified heuristic
algorithm leading to a good sub-optimal solution is described in the following:
1. Generate a set $\mathcal{C}$ of potential clusters according to the cardinality constraints in
an exhaustive way.
   This is possible as the computational complexity, which grows exponentially with the number of cells and neighbor relations, remains small in
   practical cases, considering that a top cluster includes around 30-40 cells in realistic cases and $K_{\max}$ is practically constrained to a few cells, comparable
   to the active set size in wideband code division multiple access (WCDMA)
   systems.
2. Associate with each potential cluster $c \in \mathcal{C}$ its cost $\gamma_c$ according to (7.27).
3. Create an initial solution by adding the sets in increasing cost
order, until all cells are included in the solution or there are no more
potential clusters available.
   A modified version is needed in the case of disjoint sets (set partitioning), as
   at each step the sets overlapping with the ones already in the solution
   must be removed from the candidate list. The process stops when all cells
   are covered or the candidate list is empty.
4. Improve the solution by step-wise replacing two (or more) sets with one set
not yet included, whose cost is lower than the sum of the costs of the replaced
sets.
   This is in fact the only way to decrease the cost, as the initial solution was
   built by selecting sets in cost-increasing order.
Cluster computations and simulation results following this scheme are detailed
in the next section.
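The four steps of the heuristic can be sketched as follows for the disjoint (set partitioning) case. This is a minimal illustration under stated assumptions rather than the implementation used for the simulations: the cost follows one reading of (7.27) with a factor $10 N_c$ in the denominator, the candidates are simply the reported sets that satisfy the cardinality constraints, step 4 only tries to replace pairs of sets, and all function names and example values are hypothetical.

```python
from itertools import combinations

def cluster_cost(cells, mean_rsrp_mw, n_reports):
    """Assumed reading of (7.27): |M_c| / (sum of mean linear RSRP * 10 * N_c)."""
    return len(cells) / (sum(mean_rsrp_mw[m] for m in cells) * 10 * n_reports)

def filter_candidates(report_counts, k_min, k_max):
    """Step 1 (simplified): keep reported sets with k_min <= |M_c| <= k_max."""
    return {c: n for c, n in report_counts.items() if k_min <= len(c) <= k_max}

def greedy_disjoint_clusters(candidates, mean_rsrp_mw, all_cells):
    """Steps 2 and 3: cost every candidate, then add them in increasing cost order."""
    costs = {c: cluster_cost(c, mean_rsrp_mw, n) for c, n in candidates.items()}
    solution, covered = [], set()
    for cluster in sorted(costs, key=costs.get):
        if covered >= all_cells:
            break                    # every cell is already in some cluster
        if cluster & covered:
            continue                 # would overlap a selected cluster
        solution.append(cluster)
        covered |= cluster
    return solution, costs

def improve(solution, costs):
    """Step 4: replace two selected clusters by one cheaper, unused cluster."""
    changed = True
    while changed:
        changed = False
        covered = set().union(*solution)
        for a, b in combinations(solution, 2):
            rest = [s for s in solution if s not in (a, b)]
            rest_cells = set().union(*rest)
            for u in costs:
                cheaper = costs[u] < costs[a] + costs[b]
                disjoint = not (u & rest_cells)
                keeps_coverage = (rest_cells | u) >= covered
                if u not in solution and cheaper and disjoint and keeps_coverage:
                    solution, changed = rest + [u], True
                    break
            if changed:
                break
    return solution

# Hypothetical example: four cells with equal mean RSRP (linear scale, mW).
mean_rsrp_mw = {1: 1e-10, 2: 1e-10, 3: 1e-10, 4: 1e-10}
report_counts = {
    frozenset({1, 2}): 5,
    frozenset({3, 4}): 4,
    frozenset({2, 3}): 2,
    frozenset({1, 2, 3, 4}): 1,   # too large for K_max = 3
}
candidates = filter_candidates(report_counts, k_min=2, k_max=3)
initial, costs = greedy_disjoint_clusters(candidates, mean_rsrp_mw, {1, 2, 3, 4})
clusters = improve(initial, costs)
print([sorted(c) for c in clusters])
```

Each replacement in `improve` strictly lowers the total cost, so the loop terminates after a finite number of steps.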

7.2.3

Simulation Results
In order to evaluate the performance of the adaptive clustering principle, system
level simulations were run employing the hexagonal setup shown in Fig. 7.7 and
the simulation assumptions according to [3GP10d]. A 3GPP reference network
layout was configured with 19 3-sector sites of 500 m inter-site distance (ISD).
Each of the 57 cells was equipped with $N_{bs} = 2$ antennas of 15 degrees downtilt. The typical 3GPP urban macro spatial channel model extended
(SCME) in the 2 GHz band was used. In each of the 4 hotspot areas indicated in Fig. 7.7,
100 UEs were placed at random locations. UEs
were simulated with $N_{ue} = 2$ antennas moving at a speed of 3 km/h. For each
UE, the 8 strongest interfering sectors were simulated as spatially correlated.
A signal bandwidth of 5 MHz was used, and the maximum transmit power per
sector was set to 20 W (43 dBm) per 5 MHz.
Fig. 7.7 shows the result of the applied clustering algorithm, which was configured to obtain the optimal solution for a disjoint set of clusters with up to
3 cells using a shadow fading standard deviation of 2 dB. The clustering algorithm took all RSRP measurements from UEs into account that were greater
than -120 dBm. It is apparent that for the two circular-type UE hotspots on
the upper right and lower left, the closest 3 sectors (4, 6, 26) and (14, 16, 42)
from three different sites were selected, respectively. For the other two line-type
hotspots on the upper left and lower right, the three geographically closest sectors (10, 33, 35) and (18, 20, 52) in the middle of the area as well as the adjacent
cells, each belonging to a different site, were selected.

Figure 7.7 Cell layout with UE positions and selected clusters, and clustering gains. Panels: (a) No clustering; (b) Adaptive clustering, $K_{\max} = 3$; (c) Resulting interference geometries (CDF over geometry [dB]; no CoMP vs. 2 dB and 8 dB shadowing); (d) Geometry gain (CDF over geometry for $K_{\max} = 3$ vs. no CoMP [dB]; 2 dB and 8 dB shadowing). The layout panels use X-Distance [m] and Y-Distance [m] axes.

In order to assess the performance of the adaptive clustering algorithm, network simulations were run with the calculated clusters obtained in Fig. 7.7(b). As
in the previous section, performance is measured via the interference geometry
introduced in (7.3). The cumulative distribution functions (CDFs) of UE geometries obtained for the calculated CoMP clusters using 2 and 8 dB shadow fading
standard deviation are depicted in Fig. 7.7(c) and compared to the corresponding geometries if the UE is served by one cell only.¹ While a shadow fading with
8 dB standard deviation represents the default value for outdoor scenarios, a
standard deviation of 2 dB is selected for indoor scenarios. It is seen in the figure
¹ Simulations without CoMP resulted in similar geometry curves for the selected UE distributions in Fig. 7.7(a) for different shadow fading standard deviations due to the underlying
3GPP cross-correlation coefficient, which is set to 0.54 for interfering cells from other sites
and to 1.0 for interfering cells belonging to the same site.

Figure 7.8 Cell layout with UE positions and selected clusters, cluster size $K_{\max} = 3$. Panels: (a) Static clustering example, $K_{\max} = 3$; (b) UEs suffering from reduced geometry gain with respect to static clustering; (c) Geometry gain, adaptive vs. static clustering (CDF over geometry gain vs. static clustering [dB]; 2 dB and 8 dB shadowing); (d) Loss w.r.t. UE-specific, ideal clustering (CDF over geometry loss w.r.t. ideal clustering [dB]; 2 dB and 8 dB shadowing). The layout panels use X-Distance [m] and Y-Distance [m] axes.

that for 50% of the observed geometries, the CoMP clustering algorithm results
in a 6 dB better geometry environment. Though the CDFs in Fig. 7.7(c)
show the geometry statistics of all UEs, the curves do not reflect the effective
improvement experienced by individual UEs at the very same position in the
network. These are instead represented in Fig. 7.7(d).
For the outdoor case with 8 dB shadow fading standard deviation, the CoMP
cluster in Fig. 7.7(b) achieves a median geometry improvement of 3.5 dB for
individual UEs, whereas for the indoor case, the reduced standard deviation of
2 dB leads to an even higher median gain of 5.7 dB.
One drawback of the adaptive clustering algorithm compared to pre-defined
(static) clusters is the need for an additional control entity in the network and
the increased signaling overhead to estimate a good set of clusters. Such additional complexity is only justified if gains compared to static clustering can be
achieved. In order to evaluate the performance improvement of the adaptive
algorithm, simulations were run and compared with a static cluster set as defined
in Fig. 7.8(a). The static clusters were selected based on empirical
proximity and layout considerations, where each cell of a cluster belongs to a different site. A wrap-around mechanism was used to assign sectors at the borders of
the macro cluster. As can be seen from Fig. 7.8(c), using radio channel aware
adaptive clustering, 32% (60%) of the UEs experience a performance improvement and 66% (30%) experience the same performance in terms of geometry
for 8 dB (2 dB) shadow fading standard deviation.

Figure 7.9 Clustering results for different maximum cluster sizes $K_{\max}$. Panels: (a) Selection up to cluster size $K_{\max} = 4$; (b) Selection up to cluster size $K_{\max} = 6$. Both panels use X-Distance [m] and Y-Distance [m] axes.
The optimization algorithm for adaptive CoMP clustering considers all UEs
and selects a set of clusters under the given constraints (i.e. number of cells per
cluster, disjoint sets etc.) in order to optimize the performance for the majority
of UEs. However, some UEs may not experience an improvement compared
with a different cluster set selection, depending on their location and the standard
deviation of the shadow fading. In particular, UEs located in
areas between sectors that do not belong to the same CoMP cluster can be affected. This can be
seen from Fig. 7.8(c), where around 14% of the UEs perceived a lower geometry
than with the static cluster set of Fig. 7.8(a) in the 2 dB shadow fading case. While
in the static cluster case these UEs were located between sectors belonging to
the same CoMP cluster, they were located at the border of two CoMP clusters
selected by the adaptive clustering algorithm. In case of the higher shadow fading
standard deviation of 8 dB, this effect is reduced and affects only 2% of the UEs.
Fig. 7.8(b) depicts those UEs that experience a lower geometry with adaptive
than with static clustering.
As stated at the beginning of this chapter, the case of UE-specific clustering,
where each UE is assigned to its set of best-serving cells, can be seen as a

Figure 7.10 Interference geometry obtained with different cluster sizes (CDF over geometry gain vs. no CoMP [dB]; curves for shadow fading (SF) standard deviations of 2 dB and 8 dB and cluster sizes (CS) of 3, 4, 5, and 6).

performance benchmark. The performance gap between the proposed UE-aware
adaptive clustering algorithm leading to Fig. 7.7(b) and UE-specific clustering
with cluster size 3 is shown in Fig. 7.8(d). In the 2 dB shadow fading case, 30%
of the UEs would experience an improvement, where 10% perceive more than 3
dB geometry gain. For 8 dB shadow fading standard deviation, 64% of the UEs
enjoy better performance and 10% even benefit from a 5 dB geometry gain.
In order to improve the performance gain obtained through adaptive clustering, larger cluster sizes and/or overlapping clustering can be applied, as in
Section 7.1.2. Fig. 7.9 shows the cluster selection and performance for increased
cluster sizes of up to 4 and 6 cells per cluster. It is apparent from Fig. 7.10
that further geometry gains can be achieved when the network can support a higher number of sectors per
cluster. For example, in case clusters of up
to 6 cells can be selected, the median geometry gain compared to no clustering reaches 6 and 9 dB for shadow fading standard deviations of 2 and 8 dB,
respectively. Compared with a maximum cluster size of $K_{\max} = 3$, this gives an
additional performance improvement of 2.5 to 3 dB. Nevertheless, the additional
price to pay with the use of larger cluster sizes is again an increased complexity
and signaling amount needed to coordinate the cluster scheduling.
At the end of this section, some aspects that deserve further investigation
shall be briefly mentioned. The presented results have been achieved based on the
cost function from (7.27). During the investigations it turned out that the performance is very sensitive to changes in the cost function. Identifying
reliable cost functions is therefore an interesting topic for future investigation. It should further be noted that the antenna pattern model that has been
chosen according to [3GP10d] has a strong backlobe component, which has a
significant impact on the footprint of the sector and therefore impacts clustering. Considering more realistic antenna patterns is therefore of interest for future
evaluations. Another aspect to be mentioned is the modeling of shadow fading
in the 3GPP spatial channel model (SCM). Shadow fading is modeled as being
spatially uncorrelated in the SCM. A more realistic model, however, would take
a correlation of the shadow fading over distance into account. This missing correlation impacts the presented results as well, since UEs located next to
each other can measure very different RSRP values from the same cells. Taking
a spatial correlation of shadow fading into account, it becomes more likely that
closely located UEs report similar sets of cells $S_j$, which should improve the
reliability of the adaptive clustering algorithm.

7.2.4

Signaling and Control Procedures


In this subsection, it is illustrated how self-organizing clustering concepts can be
included in 3GPP LTE-A as described in [3GP10i]. In particular, it is shown how
the SON functionality named ANR can be enhanced to enable clustering of cells.
The purpose of the ANR function is to relieve the operator from the burden of
manually managing neighbor relations (NRs), which are needed, e.g., to execute
handovers from one cell to another.
The ANR function resides in the evolved Node B (eNB) and manages the
conceptual NRT. One main sub-function is the neighbor detection function, which
finds new neighbors and adds them to the NRT. Moreover, to remove outdated
neighbor relations, the ANR function also contains the neighbor removal function. Both sub-functions are implementation specific. Further, an NR in the context of the ANR functionality is defined between a source and a target cell. The
NR relates the source and the target cell such that the source cell
- knows the evolved UMTS terrestrial radio access network (E-UTRAN) cell
  global identifier (CGI) and the PCI of the target cell,
- has an entry in its NRT in order to identify the target cell, and
- has set the attributes in the NR for the target cell. The attributes are either
  configured by default or set via OAM.
The NRT contains an entry for each NR with a target cell, which is addressed
by the target cell identifier (TCI).
The base station instructs its mobile stations to perform measurements in
order to identify the TCI of neighbor cells. Both the type of the measurement and
the periodicity of the reports are determined by the base station. A typical
example of those measurements is the average received power level obtained from
the LTE reference signals (RSRP). Since the NRT contains mid- to long-term
measurements obtained from specific mobile stations, this knowledge of the base
station about neighbor cells can be exploited for adaptive cell clustering.
In the following, a framework for control signaling and the network architecture is presented that allows using the ANR and the NRT for cell clustering.


Figure 7.11 Architecture for adaptive cell clustering.

To manage clustering of an entire network with a huge number of cells with
acceptable complexity, we first group the cells into top-clusters that are precomputed and static. We assume that a CCU controls such a top-cluster, such
that the clustering method introduced here is a central approach as well. Distributed approaches are not covered by the presented architectural framework.
The main task of the CCU is to determine sets of cells within a top-cluster
that form a CoMP cluster. For simplicity of the signaling, it is assumed that
all cells of a base station belong to the same top-cluster. Fig. 7.11 depicts the
architecture for the adaptive cell clustering. It is assumed that the CCU is part
of the SON server of the OAM functionality. The OAM system is connected by
the Itf-N interface to the domain manager (DM), which in turn
is connected to several eNBs. Fig. 7.11 further illustrates the inputs (Ix), the
parameters (Py) as well as the logical functions (LFz) required for adaptive
cell clustering. In this architecture, the mobile stations report MRMs to their serving base stations over the LTE-A
air interface, including e.g. RSRP measurements and PCIs of those cells
that are strongly received (I1). It should be noted
that for the algorithm described above only existing LTE Release 8 messages
are required. The base station extracts the RF measurements and the set of cells
contained in the MRM (LF1). These measurements are used to update the NRT
and to set extension attributes in the NRT required for CoMP clustering (I2).
This includes the RF measurement values themselves as well as the frequency
of the measurements. Once the NRT is updated, the base stations transmit the


extended NRTs to the CCU via the DM and the Itf-N interface (LF2). The CCU
in the SON server receives the updated and extended NRT (LF3) and computes
the updated clusters (LF4) within a top-cluster as a result of the optimization
algorithm described in the previous subsections. Finally, the CCU transmits the
updated cluster information (LF3), which is then received by the base stations
(LF2). If a master/slave concept is applied for CoMP, the message may also contain information on which cells act as master and which as slaves. Since in this architectural
framework the CCU is part of the OAM, the Itf-N interface is impacted and needs
to be enhanced to support sending the extended NRTs to the CCU and
the updated cluster information back to the base stations. In principle, the CCU
could also be part of the serving gateway (S-GW) or the mobility management
entity (MME). In this case the S1-U or the S1-C interface could be extended,
respectively. In case the CCU is a completely separate entity, a new
interface needs to be introduced.
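The NRT extension used in steps LF1 and I2 can be illustrated with a small data-structure sketch. The attribute names, the class, and the update rule below are hypothetical and are not taken from the 3GPP specifications or from the described implementation; the sketch only shows how a base station could keep, per target cell, the measurement values and how often they were reported.

```python
from dataclasses import dataclass

@dataclass
class ExtendedNrEntry:
    """One neighbor relation with hypothetical CoMP extension attributes."""
    tci: str                    # target cell identifier addressing the entry
    report_count: int = 0       # how often the target cell has been reported
    mean_rsrp_mw: float = 0.0   # running mean of the RSRP in linear scale (mW)

def update_nrt(nrt, mrm_dbm):
    """Fold one MRM (mapping TCI -> RSRP in dBm) into the extended NRT."""
    for tci, rsrp_dbm in mrm_dbm.items():
        entry = nrt.setdefault(tci, ExtendedNrEntry(tci))
        rsrp_mw = 10 ** (rsrp_dbm / 10)   # dBm to mW
        entry.report_count += 1
        # Incremental mean, so individual reports need not be stored.
        entry.mean_rsrp_mw += (rsrp_mw - entry.mean_rsrp_mw) / entry.report_count
    return nrt

# Example: two reports for cell "A", one for cell "B".
nrt = {}
update_nrt(nrt, {"A": -90.0, "B": -100.0})
update_nrt(nrt, {"A": -100.0})
for entry in nrt.values():
    print(entry)
```

Averaging in linear scale matches the way the mean RSRP enters the cost function (7.27); the extended table would then be forwarded to the CCU in whatever message format the enhanced Itf-N interface defines.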

7.3

Summary
In this chapter, cell clustering techniques were examined, which are an essential prerequisite for using CoMP in practical cellular systems. One can here
mainly differentiate between static clustering concepts, where certain sets of
cooperation-enabled base stations are defined once and then fixed, and adaptive,
or self-organizing, clustering concepts, where clusters are adapted over time to
user locations. Interference geometries have been introduced as a performance
metric for clustering, where all results can be compared to the theoretical benchmark of UE-specific, ideal clustering, where each terminal is served by
the best possible set of cells. We have seen that static clustering can already
obtain a geometry within a few dB of that of UE-specific clustering if overlapping clusters are defined, i.e. if each cell is involved in multiple clusters
connected to different portions of the system resources. Performance can be further improved at the expense of signaling overhead through adaptive clustering,
for which a concrete algorithm based on terminal-side radio channel measurements
already supported in LTE Release 8 was described. With adaptive clustering, up
to 70% of all terminals experienced geometry gains compared to the case
of static, non-overlapping clusters. It has to be noted, however, that the performance of adaptive clustering is very sensitive to the choice of the cost function,
which is hence an important topic for future investigations.
Finally, it was shown that the proposed adaptive clustering scheme fits well
into the SON framework of LTE with only slight extensions of the system architecture and of the already existing SON ANR concept. In this case, the clustering
algorithm is run on the SON server of the OAM system.

Synchronization

This chapter deals with another major challenge connected to CoMP, namely
the synchronization of cooperating and cooperatively served devices in time and
frequency. On one hand, there are different local oscillators in each base station
and mobile terminal that lead to deviations of the carrier frequency from
its nominal value. On the other hand, there are variations in the symbol
timing between each transmitter and receiver station. Both effects need to be
compensated by synchronization techniques.
In cellular networks, we can distinguish between a network synchronization
among all involved base stations and the alignment of the user equipments to that
time and frequency reference. The basic definitions of the synchronization terms
as well as procedures for the reference network synchronization are described in
Section 8.1. The impact of symbol timing mismatches on CoMP is then treated
in Section 8.2, before Section 8.3 concludes this chapter with the analysis of the
impact of residual carrier frequency offsets on CoMP performance.

8.1

Synchronization Concepts
D. Richard Brown III and Andrew G. Klein
Synchronization is the process of establishing a common notion of time among
two or more entities. In the context of wired and wireless communication networks, synchronization enables coordination among the nodes in the network and
can facilitate applications such as distributed sensing. Precise synchronization
can also facilitate scheduling of communication resources as well as interference
avoidance in multi-access networks. This section provides an overview of some of
the synchronization concepts and techniques used in coordinated communication
networks.

8.1.1

Synchronization Terminology
In the context of wireless communication networks, each node in the network keeps
a local notion of time, i.e. a clock, by counting cycles of a local oscillator (LO).
Among other parameters, all oscillators are characterized in terms of their nominal


frequency and accuracy. The accuracy of an oscillator is typically specified in parts
per million (ppm) with respect to the nominal frequency. For example, a 10 MHz
LO with 10 ppm accuracy oscillates with a frequency within ±100 Hz of the
LO's nominal 10 MHz frequency. Low-cost oscillators typically provide 100 ppm
accuracy [FMd09], and higher-cost temperature-compensated or oven-controlled
oscillators can provide accuracies better than 1 ppm [Rak09]. A clock derived
from an unsynchronized low-cost 100 ppm LO can gain or lose, with respect to a
perfect reference clock, up to 8.64 seconds in a day.
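The two numerical examples above follow directly from the definition of ppm accuracy, as the following short calculation illustrates. The function names are chosen for illustration only.

```python
def max_frequency_error_hz(nominal_hz, accuracy_ppm):
    """Largest deviation from the nominal frequency for a given ppm accuracy."""
    return nominal_hz * accuracy_ppm * 1e-6

def max_clock_error_s(accuracy_ppm, elapsed_s):
    """Largest time gained or lost against a perfect clock over elapsed_s."""
    return accuracy_ppm * 1e-6 * elapsed_s

SECONDS_PER_DAY = 24 * 3600

# 10 MHz LO with 10 ppm accuracy: about 100 Hz maximum frequency error.
print(max_frequency_error_hz(10e6, 10))
# 100 ppm LO left unsynchronized for one day: about 8.64 s clock error.
print(max_clock_error_s(100, SECONDS_PER_DAY))
```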
Two clocks are said to be perfectly syntonized if they agree exactly on the
duration of an interval between two events. In other words, syntonized clocks
share the same rate or frequency, but there is no requirement for the clocks to
agree on the time of a single event. Syntonized clocks are sometimes said to
be frequency synchronized. A clock can be said to be syntonized to a specified
level of uncertainty if the frequency difference with respect to a reference clock
(often normalized and specified in ppm) is no more than the uncertainty. This
frequency difference is commonly called frequency offset or skew and is typically
specified in statistical terms, e.g. standard deviation or maximum frequency offset. For example, the LTE specification requires user equipments (UEs) to have
a maximum frequency offset of 0.1 ppm [STB09].
Two clocks are said to be perfectly synchronized if they agree exactly on the
time of occurrence of an event at an arbitrary time. Note that synchronized
clocks must also be syntonized since unsyntonized clocks can only agree on the
occurrence of an event at a particular time. Clocks can also be said to be synchronized to a specified level of uncertainty if they do not agree precisely on the
time of occurrence of an event, but the difference in the measured event times
between the clocks is no more than the uncertainty. The time difference between
two clocks is commonly called clock offset or phase offset and is typically specified
in statistical terms, e.g. standard deviation or maximum clock offset.
Since each node in a wireless network keeps time with its own LO, syntonization and synchronization are necessary to establish a common time base among
the nodes in the network to a desired level of precision. In fact, if the nodes
in the network have a maximum clock offset requirement, periodic frequency
and phase re-synchronization is necessary to correct for unavoidable phase drift
between any pair of nodes caused by frequency offset and oscillator instabilities.
Several factors can affect the accuracy of phase and frequency synchronization
among nodes in communication networks. These factors include local oscillator
stability, network stability, and the re-synchronization interval, i.e. how often
synchronization messages are exchanged among the nodes in the network.
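The relation between a maximum clock offset requirement and the re-synchronization interval can be made concrete with a simple drift model. The sketch below assumes that the clock offset is fully corrected at each re-synchronization and that the residual frequency offset stays constant in between, so that oscillator instabilities and estimation errors are ignored; the 1 µs offset budget in the example is a hypothetical value.

```python
def clock_drift_s(frequency_offset_ppm, interval_s):
    """Clock offset accumulated over interval_s at a constant frequency offset."""
    return frequency_offset_ppm * 1e-6 * interval_s

def max_resync_interval_s(max_clock_offset_s, frequency_offset_ppm):
    """Longest interval that keeps the accumulated drift within the budget."""
    return max_clock_offset_s / (frequency_offset_ppm * 1e-6)

# Hypothetical budget of 1 microsecond at a residual frequency offset of 0.1 ppm.
interval = max_resync_interval_s(1e-6, 0.1)
print(interval)                      # longest admissible interval in seconds
print(clock_drift_s(0.1, interval))  # drift accumulated over that interval
```

Under these assumptions, a 0.1 ppm frequency offset consumes a 1 µs budget in about 10 s, which indicates why tight clock offset requirements call for either frequent re-synchronization or very stable oscillators.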
Figure 8.1 shows an example of clock and frequency offset between two clocks
labeled "clock A" and "clock B". The re-synchronization interval in this example
is denoted as $T$. Prior to the first synchronization attempt at time $t_1$, the phase
of clock B is ahead of clock A and the phases of the clocks are drifting apart
due to the frequency offset. At time $t_1$, clock B adjusts its frequency and phase
Figure 8.1 Clock and frequency offset example (clock A and clock B over time, with re-synchronization interval T and synchronization attempts at t1, t2, and t3).

in response to a synchronization attempt with clock A. In this example, the
clocks become syntonized at $t_1$, but they remain unsynchronized. At time $t_2$,
the clocks again attempt to synchronize. In this example, clock B and clock A
are temporarily synchronized at time $t_2$, but are now unsyntonized. The nodes
become synchronized and syntonized after the synchronization attempt at $t_3$.
In the following subsections, we provide an overview of several synchronization
techniques suitable for distributing a common notion of time to nodes in wired
and wireless communication networks. We begin with a discussion of two network
synchronization techniques: network time protocol (NTP) and precision time
protocol (PTP), i.e. IEEE 1588. We then describe satellite-based synchronization techniques with a focus on the global positioning system (GPS). We conclude
with a discussion of endogenous wireless distributed synchronization techniques
suitable for precisely synchronizing and syntonizing the carriers of wireless
transmitters for distributed phase coherent communication.

8.1.2

Network Synchronization
The Network Time Protocol
The NTP is a protocol for synchronizing the clocks of nodes that are connected
through variable-latency networks [Mil91]. NTP is an application-layer protocol
that operates over the Internet protocol (IP), and can therefore be implemented
completely in software. The protocol has been in use since the 1980s, and today
it is responsible for synchronizing the clocks of the majority of computers connected to the Internet. Nodes in the network are assigned to a class or stratum,

Figure 8.2 NTP message exchange (master and slave clock time axes with the timestamps $T_{i-3}$, $T_{i-2}$, $T_{i-1}$, and $T_i$).

and those with the lowest stratum number are assumed to be perfectly synchronized with Coordinated Universal Time (UTC). Nodes with higher stratum
numbers synchronize their clocks with nodes having lower stratum numbers. This
hierarchical structure of NTP results in it being highly scalable.
To estimate clock offsets, a master and slave exchange timestamps, which are
64-bit descriptions of their current local clock time. Figure 8.2 demonstrates the
exchange of timestamps between a master and slave. If $T_i$, $T_{i-1}$, $T_{i-2}$, and $T_{i-3}$ are
the four most recent timestamps, then the clock offset of the slave relative to the
master at time $T_i$ can be calculated via

$$\theta_i = \frac{T_{i-2} + T_{i-1} - T_{i-3} - T_i}{2}. \qquad (8.1)$$

Since each NTP message contains the last three timestamps $T_{i-1}$, $T_{i-2}$, $T_{i-3}$,
and the final timestamp $T_i$ is determined upon arrival of the message, the clock
offset can be estimated from a single message exchange between slave and master.
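The offset estimate from the four most recent timestamps can be illustrated with a small numerical sketch. The function and argument names are chosen for illustration; the round-trip delay expression is not part of the text above but is the standard companion quantity in NTP. With the ordering used here, the result is positive when the clock that recorded $T_{i-2}$ and $T_{i-1}$ is ahead of the clock that recorded $T_{i-3}$ and $T_i$.

```python
def ntp_clock_offset(t_im3, t_im2, t_im1, t_i):
    """Offset estimate (T_{i-2} + T_{i-1} - T_{i-3} - T_i) / 2."""
    return ((t_im2 - t_im3) + (t_im1 - t_i)) / 2

def ntp_round_trip_delay(t_im3, t_im2, t_im1, t_i):
    """Total time spent on the two network paths (standard NTP quantity)."""
    return (t_i - t_im3) - (t_im1 - t_im2)

# Example: the responding clock is 5 ms ahead, each path takes 20 ms, and the
# responder needs 1 ms between receiving the request and sending the reply.
t_im3 = 100.000   # request sent      (requesting node's clock)
t_im2 = 100.025   # request received  (responding node's clock)
t_im1 = 100.026   # reply sent        (responding node's clock)
t_i = 100.041     # reply received    (requesting node's clock)

print(ntp_clock_offset(t_im3, t_im2, t_im1, t_i))      # close to 0.005 s
print(ntp_round_trip_delay(t_im3, t_im2, t_im1, t_i))  # close to 0.040 s
```

The estimate is exact in this example only because both paths have the same delay; with asymmetric paths, half of the delay difference appears as an error in the offset.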

Equation (8.1) implicitly assumes that the two transmission paths are symmetric and have equal delay. In practice, however, network delays are stochastic
quantities. Consequently, NTP performs multiple offset estimates in combination with a filtering and selection scheme to obtain a more accurate estimate
of the clock offset. The estimated clock offsets are fed to a Type-II adaptive
parameter phase-locked loop (PLL), which corrects the LO phase and frequency.
An adaptive Type-II PLL has one integrator in the loop filter (or two poles
in the open-loop transfer function) and continuously adjusts the phase and frequency [Smi86].
The accuracy of the protocol depends on a variety of factors, including the
update interval and network topology. Several studies (e.g. [Mil03, MTH97,
KZM07, Min99]) have investigated the performance of NTP under typical use,
showing that clock offsets have a standard deviation on the order of several
milliseconds, and residual frequency offsets are on the order of 0.1 ppm.
The Precision Time Protocol
Also known as IEEE 1588 [IEE08a], the PTP attains sub-microsecond accuracy,
which is necessary in applications such as networked control systems and precision machinery in factories. The phase and frequency correction in PTP are quite
similar in principle to NTP: after a sequence of messages is exchanged between
slave and master, the clock offsets are estimated through filtering and selection,
and are used to adjust a PLL which corrects the LO phase and frequency. There
are, however, several fundamental differences between PTP and NTP. The primary difference is that PTP is implemented in hardware rather than software.
By moving the clock synchronization as close to the physical layer as possible,
sources of jitter and processing delay introduced in network layers higher up the
stack can be mitigated. In addition, PTP is primarily intended to be used in
a local area network (LAN) setting, as opposed to NTP, which may synchronize to an Internet clock reference located far away. While PTP
can achieve a higher accuracy than NTP, it does require the use of dedicated
hardware. The performance of PTP will again depend on a variety of factors,
including the quality of the LO as well as the network topology. Products already
available on the market today [Sem10] claim clock offsets within 1 µs and frequency offsets better than 0.01 ppm. Similar results were achieved [Ton05] in
a test of PTP over a metropolitan area network.
The most recent version of the standard, referred to as IEEE 1588-2008, offers
a transparent clock mode which requires dedicated network switches that support
the standard. Such switches employ a transparent clock that further minimizes
delay by providing an alternate local clock for network nodes so that they need
not rely on the master clock. This mode permits maximum clock offset errors on
the order of tens of nanoseconds [HJ10].

8.1.3

Satellite-Based Synchronization
A Global Navigation Satellite System (GNSS) permits nodes to determine their
location to within a few meters using time signals received line-of-sight from satellites. While the primary intent of a GNSS is for determining position information,
such systems are also very useful as an accurate, common clock reference. In contrast to NTP and PTP, clock synchronization using a GNSS is done wirelessly
using one-way communication links (i.e. by receiving signals broadcast from the
satellites). In order for a terrestrial node to be able to receive the relatively weak
signals from distant satellites, however, a line-of-sight link is typically necessary.
In the absence of precise location information, a node must be able to receive
signals from four satellites since there are four unknowns: latitude, longitude,
altitude, and time. If precise location information is available, only one satellite
is needed for clock synchronization since propagation delay is known.

Figure 8.3 Two-transmitter one-destination distributed beamforming scenario.

Examples of GNSSs include the United States GPS, the Russian GLObal
NAvigation Satellite System (GLONASS), and the European Galileo system. As
stated in [LAK99], GPS provides clock synchronization to better than 100 ns
in time and $10^{-13}$ in frequency. Other satellite systems are expected to give
synchronization accuracy of a similar order, as they share many of the same
parameters as GPS [HP05a].

8.1.4

Endogenous Distributed Wireless Carrier Synchronization


Coherent downlink CoMP techniques have recently been proposed in which the
base stations (BSs) transmit with phase-aligned carriers such that the bandpass
signals are aligned at the intended destination. These coherent transmission techniques require the BSs to accurately pre-compensate for the downlink channel
phases and maintain close synchronization. One approach to coherent downlink
CoMP, as discussed in Sections 13.3 and 13.4, is to closely syntonize the BSs' carriers using, for example, highly stable GPS-referenced local oscillators. Coherent
downlink transmission can then be achieved by having the mobiles estimate the
downlink channel state information (CSI) and feed this back to the BSs for
carrier phase pre-compensation. However, periodic downlink CSI re-estimation
and low-latency feedback on the order of a few milliseconds are necessary to maintain phase coherence at the mobile, as pointed out in Sections 13.3 and 13.4, and
the oscillators used may be considered a cost issue in a large-scale deployment.
A different approach to coherent downlink CoMP is to have the BSs endogenously synchronize their carriers without the aid of GPS and without CSI feedback from the mobiles. The two-way downlink beamforming technique proposed in [PD10] is one example of this approach. Two-way downlink beamforming is a retrodirective transmission technique in which the BSs closely synchronize
their carriers (in both phase and frequency) to emulate a conventional retrodirective antenna array [Pon64] and achieve coherent downlink transmission through
uplink channel conjugation. Note that this approach requires uplink/downlink
channel reciprocity and also precise synchronization of the BSs.


To understand just how accurately the BSs must be synchronized to facilitate retrodirective transmission, consider the two-transmitter distributed beamforming scenario shown in Fig. 8.3, where both BSs simultaneously transmit
unmodulated carriers at radian frequency $\omega$ with the goal of having the carriers
arrive with identical phase, i.e. combine coherently, at the destination, i.e. the
mobile. Note that the BSs are implicitly syntonized in this scenario, but they
are unsynchronized such that transmitter 2 has a clock offset of $\delta$ with respect
to transmitter 1. After propagation through the unit-gain single-path channels,
the received signal at the destination can be written as

$$ y(t) = \exp\{j\omega(t-\tau)\} + \exp\{j\omega(t-\tau-\delta)\}, $$

where $\tau$ is the propagation delay common to both channels and the baseband signals modulated by each carrier are omitted for clarity.
The received power can be computed as $|y(t)|^2 = 2 + 2\cos(\omega\delta) \le 4$. When the
transmitters are perfectly synchronized, i.e. $\delta = 0$, the carriers combine coherently at the destination and the received power is $|y(t)|^2 = 4$. This corresponds to
the ideal coherent case in distributed beamforming. When the transmitters
are not synchronized, the received power will be less than in the ideal coherent case. To illustrate the effect of unsynchronized transmitters, the clock offset
$\delta$ can be modeled as a zero-mean Gaussian distributed random variable with
standard deviation $\sigma_\delta$. Fig. 8.4 shows the received power at the destination as a
function of $\sigma_\delta$ and carrier frequency $f_0 = \frac{\omega}{2\pi}$. This example shows that, even at
the lowest carrier frequency of 800 MHz, the standard deviation of the transmitter clock offset must be smaller than approximately 130 picoseconds in order to
achieve, on average, 90% or better of the ideal coherent received power. Please
note that this level of synchronization accuracy is not required by the downlink
CoMP techniques described in the remainder of this book, which use GPS and
CSI feedback to achieve baseband signal alignment at the mobile. The approach
described here is based on distributed beamforming in which the radio frequency
signals of the base stations are aligned at the mobile.
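
The 130 ps figure can be checked with a short calculation. The sketch below assumes the received power expression 2 + 2 cos(ωδ) from the text and uses the standard result that E{cos(ωδ)} = exp(-(ωσ)²/2) for a zero-mean Gaussian δ with standard deviation σ; the function names are illustrative only.

```python
import math


def mean_received_power(f0_hz, sigma_s):
    """Mean of 2 + 2*cos(omega*delta) for delta ~ N(0, sigma_s^2)."""
    omega = 2.0 * math.pi * f0_hz
    return 2.0 + 2.0 * math.exp(-0.5 * (omega * sigma_s) ** 2)


def max_offset_std(f0_hz, fraction=0.9):
    """Largest clock offset std. dev. that keeps the mean power at or above
    `fraction` of the ideal coherent power (which equals 4)."""
    omega = 2.0 * math.pi * f0_hz
    target = 2.0 * fraction - 1.0  # required value of exp(-(omega*sigma)^2 / 2)
    return math.sqrt(-2.0 * math.log(target)) / omega


for f0 in (800e6, 900e6, 1.4e9, 1.7e9, 2.1e9, 2.6e9):
    print(f"{f0 / 1e6:6.0f} MHz: sigma <= {max_offset_std(f0) * 1e12:6.1f} ps")
```

The admissible offset scales inversely with the carrier frequency, so the requirement tightens further at the higher bands shown in Fig. 8.4.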
Since conventional synchronization techniques like GPS and PTP are unable
to provide the accuracy required for retrodirective downlink transmission at
typical radio frequencies, several recent studies have focused on the development of precise endogenous distributed wireless carrier synchronization techniques including full-feedback closed-loop [TP02], one-bit closed-loop [MHMB05,
MWMR06, MHMB10], master-slave open-loop [MBM07], round-trip open-loop
carrier synchronization [DPM05, DH08], and two-way open-loop carrier synchronization [PD10]. Each of these techniques has advantages and disadvantages in
particular applications, as discussed in the survey article [MDMH09].
Many of the distributed wireless carrier synchronization techniques described
in [MDMH09] operate on the principle of exchanging beacons between the BSs
and synchronizing carriers based on estimates of the phase and frequency of these
beacons. For example, in the two-way carrier synchronization protocol [PD10],
a series of forward beacons are exchanged from node 1 to node 2 and so on to
node M, where node m transmits a periodic extension of the beacon it received
from node m − 1. A series of backward beacons are also exchanged in the same
way from node M to node M − 1 and so on to node 1. Each node then sums its
phase and frequency estimates obtained from the forward and backward beacons
to derive a synchronized local oscillator with frequency and phase identical (in the
absence of estimation error) to the other nodes in the system. These synchronized
local oscillators can then be used to enable retrodirective downlink transmission
from two or more BSs to a mobile in the network.

Figure 8.4 The effect of transmitter clock offset on distributed beamforming power for
several common cellular carrier frequencies.
[Plot: mean received power at destination [linear] versus transmitter clock offset standard deviation [ns], for f0 = 800 MHz, 900 MHz, 1.4 GHz, 1.7 GHz, 2.1 GHz, and 2.6 GHz; reference levels: ideal coherent, 90% of ideal coherent, incoherent.]
While the various carrier synchronization techniques described in [MDMH09]
differ in terms of how beacons are exchanged, the overall performance of each of
these techniques tends to be limited by the accuracy of the beacon phase and
frequency estimators. A common technique for the estimation of the phase and
frequency of a single tone in additive white Gaussian noise is the maximum likelihood estimator (MLE) [RB74]. Under mild regularity conditions [H.V94], the
MLE is known to asymptotically achieve the Cramér-Rao lower bound (CRLB).
Given a beacon of amplitude $a$ in complex Gaussian white noise with power
spectral density $\frac{N_0}{2}$, the CRLB is given as [RB74]

$$ \mathrm{cov}\left\{ \begin{bmatrix} \hat{\omega} \\ \hat{\phi} \end{bmatrix} \right\} \succeq \frac{N_0}{a^2} \begin{bmatrix} \frac{12}{T^3} & -\frac{6}{T^2} \\ -\frac{6}{T^2} & \frac{4}{T} \end{bmatrix}, $$

where $T$ is the duration of the observation, and the notation $\mathbf{A} \succeq \mathbf{B}$ means that
$\mathbf{A} - \mathbf{B}$ is positive semi-definite.
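
To get a feeling for the orders of magnitude, the bound can be evaluated numerically. The sketch below assumes the CRLB matrix (N0/a²)·[[12/T³, −6/T²], [−6/T², 4/T]] for the parameter order (frequency, phase), with the phase referenced to the start of the observation window; the numerical values of a, N0 and the carrier frequency are arbitrary illustrations, not values taken from the text.

```python
import math


def crlb_single_tone(a, n0, t_obs):
    """2x2 CRLB matrix for (radian frequency, phase) of a single tone.

    a: beacon amplitude, n0: noise level N0, t_obs: observation time T [s].
    Uses the matrix form stated in the lead-in (an assumption).
    """
    scale = n0 / a ** 2
    var_omega = 12.0 * scale / t_obs ** 3
    cov_omega_phase = -6.0 * scale / t_obs ** 2
    var_phase = 4.0 * scale / t_obs
    return [[var_omega, cov_omega_phase], [cov_omega_phase, var_phase]]


# Illustrative numbers only: unit amplitude, N0 = 1e-6, 1 ms beacon, 1 GHz carrier.
bound = crlb_single_tone(a=1.0, n0=1e-6, t_obs=1e-3)
omega_c = 2.0 * math.pi * 1e9
phase_std = math.sqrt(bound[1][1])          # lower bound on phase std. dev. [rad]
timing_std = phase_std / omega_c            # equivalent clock offset std. dev. [s]
print(f"phase std >= {phase_std:.4f} rad, timing std >= {timing_std * 1e12:.2f} ps")
```

The frequency variance falls with T³ while the phase variance falls only with T, which is why longer beacons mainly benefit the frequency estimate.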
As an example, Fig. 8.5 shows the clock and frequency offset standard deviations for the two-way carrier synchronization protocol developed in [PD10]. This
example assumes seven transmitters serially exchange 1 GHz wireless beacons
according to the two-way synchronization protocol. Each beacon has a duration
T = 1 ms and a total of 12 beacons are exchanged to synchronize the transmitters. The transmitters use MLE to form local phase and frequency estimates
from the noisy observations. After synchronization is complete in this example,
clock offset standard deviations better than 100 ps and frequency offset standard
deviations better than 0.4 parts per billion (ppb) can be obtained when the beacon signal-to-noise ratio (SNR) is greater than 5 dB and the re-synchronization
interval is no longer than 500 ms.

Figure 8.5 Clock and frequency offset standard deviations as a function of beacon
SNR and re-synchronization interval for seven transmitters synchronized via the
two-way carrier synchronization protocol.
[Plots: (a) clock offset std. deviation [ns] and (b) frequency offset std. deviation [ppm], each versus beacon SNR [dB], for re-synchronization intervals of 1 s, 500 ms, 250 ms, and 125 ms.]

Distributed wireless carrier synchronization techniques achieve high accuracy
by exploiting the timing information contained in the phase and frequency of the
bandpass beacons exchanged during synchronization. While distributed wireless
carrier synchronization can, in principle, achieve much smaller clock and
frequency offsets than PTP and GPS, the use of unmodulated beacons can lead
to periodic ambiguities in the phase estimates. Hence, distributed wireless carrier synchronization techniques may be used in conjunction with conventional
lower-precision synchronization techniques to provide appropriate synchronization at different timescales. For example, GPS can be used with two-way
carrier synchronization to provide symbol synchronization and also to stabilize
the frequencies of the carriers at the BSs.

8.1.5

Summary
In this section, we have introduced the concept of synchronization and described
several approaches to the problem of synchronizing nodes in a coordinated communication network. These techniques can be used separately or in conjunction
to facilitate the establishment of a common notion of both frequency and time
to a desired level of accuracy. Existing synchronization techniques are in fact
sufficient to enable synchronization-sensitive CoMP schemes such as downlink
multi-cell joint transmission in the context of orthogonal frequency division multiple access (OFDMA), if baseband signals are to overlap coherently. However,
these techniques require low-latency channel feedback, and current solutions may
be considered a cost issue in large-scale deployments, rendering further research
on alternative techniques interesting. The following sections will now analyze the
effect of residual time and frequency errors on the performance of coordinated
communication networks.

8.2

Imperfect Synchronization in Time: Performance Degradation and Compensation
Vincent Kotzsch and Gerhard Fettweis
In this section, we investigate the effect of unavoidable time delays of arrival
(TDOAs) between the transmitter and receiver stations in orthogonal frequency
division multiplex (OFDM) based CoMP systems. As we can see on the left
side of Fig. 8.6, they originate from arbitrary distances $d$ between each user
equipment and base station that lead to different path delays $\tau(d)$ with

$$ \tau(d) = \frac{d}{c} \tag{8.2} $$

on each link, where $c$ is the speed of light. In common cellular wireless
communication systems, where one base station serves a certain number of
mobiles, synchronization procedures are used to compensate those path delays
(e.g. [MKP07]). In the uplink, this can be done by adjusting the timing of the
mobiles to the base station network reference via timing advance commands that
are sent to the mobiles via control channels. In the context of CoMP, however, in
particular for joint signal processing concepts as introduced in Chapter 6, we are
interested in exploiting the signal propagation between base stations (BSs) and
user equipments (UEs) across multiple cells. In these scenarios, we have a delay
coupling matrix $\mathbf{T}_d$ containing the resulting non-compensable TDOAs of the different links between each transmitter and receiver. In OFDM systems, a cyclic
extension, also known as cyclic prefix (CP), of length $T_{CP}$ is used to overcome synchronization mismatches as well as to avoid the inter-symbol interference (ISI)
between two consecutive OFDM symbols. Given this constraint, the remaining
TDOAs $\Delta\tau(d) = \tau(d_{max}) - \tau(d_{min})$ in $\mathbf{T}_d$ need to be less than the cyclic prefix
minus the maximum channel excess delay $\tau(L)$, i.e.

$$ \Delta\tau(d) \le T_{CP} - \tau(L). \tag{8.3} $$
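
As a quick illustration of (8.2) and (8.3), the following sketch checks whether a set of link distances respects the cyclic prefix constraint. The function names and the example distances are hypothetical; the two CP lengths are the short and long LTE values quoted later in this section.

```python
C_LIGHT = 299_792_458.0  # speed of light [m/s]


def path_delay(distance_m):
    """Path delay tau(d) = d / c, as in (8.2)."""
    return distance_m / C_LIGHT


def tdoa_within_cp(distances_m, t_cp_s, tau_l_s=0.0):
    """Return (TDOA, ok) where ok is True if the TDOA satisfies (8.3)."""
    delays = [path_delay(d) for d in distances_m]
    tdoa = max(delays) - min(delays)
    return tdoa, tdoa <= t_cp_s - tau_l_s


# Hypothetical distances from three transmitters to one receiver [m].
distances = [500.0, 1200.0, 2500.0]
tdoa, ok_short = tdoa_within_cp(distances, t_cp_s=4.7e-6)
_, ok_long = tdoa_within_cp(distances, t_cp_s=16.7e-6)
print(f"TDOA = {tdoa * 1e6:.2f} us, short CP ok: {ok_short}, long CP ok: {ok_long}")
```

A distance spread of 2 km already corresponds to roughly 6.7 µs, which exceeds the short CP even for a flat channel.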

A possible timing scenario is shown on the right side of Fig. 8.6, where the
signals of three transmitters are received by one receiver. The desired transmitter
(Tx#1) is synchronized to the receiver, while the others are delayed such that
the TDOA of transmitter 2 (Tx#2) lies within the CP, but the ISI that is caused
by the channel decay already leaks into the discrete Fourier transform (DFT)
window. The third transmitter (Tx#3) even violates the CP limit, such that a
portion of the previous OFDM symbol leaks into the DFT span.

Figure 8.6 Hexagonal cell structure and possible timing scenarios in CoMP systems.
[Left: three base stations (BS#1 to BS#3) and three user equipments (UE#1 to UE#3) with link distances and delays. Right: receiver DFT window relative to the CP and OFDM symbols o − 1 and o (desired) of Tx#1, Tx#2, and Tx#3, showing inter-symbol interference (ISI) from the previous symbol.]
As we will see in Subsection 8.2.1, ISI is introduced in the system on top
of multi-user interference (MUI) if the maximum TDOAs are not limited to be
within the CP (see e.g. [WG00], [WXBD09], [ZMM+08], [Ham10]). The amount
of additional interference depends on the degree of timing mismatch. Fig. 8.7(a)
depicts the joint distribution of occurring TDOAs after synchronization for different inter-site distances (ISDs), for the case that three users are uniformly
distributed within a hexagonal cooperative cell that is served by three base stations. As an example, we also indicate the bounds for the short and the long CP
length used in 3GPP LTE systems ($T_{CP}$ = 4.7 µs / 16.7 µs) by vertical lines.
Besides the path delays, another important effect that needs to be considered is
the pathloss $\lambda(d)$ with

$$ \lambda(d) = \beta \left( \frac{d}{d_0} \right)^{\alpha}, \tag{8.4} $$

which also depends on the distance between the transmitters and receivers. Note
that $\alpha$ is used as pathloss exponent, $d_0$ as the reference distance and $\beta$ as an
attenuation factor that depends on the environment here. If we consider the
pathloss attenuations from all links, we can also form a coupling matrix $\mathbf{\Lambda}_d$
similar to $\mathbf{T}_d$. As is known from the literature, the attenuation of each link due to
pathloss leads to special structures of the channel matrix, e.g. diagonal or row
dominated matrices (depending on the relative position of the users). Likewise,
the possible interference power is also attenuated. To compensate the pathloss,
transmit power can be controlled, which is however limited to the maximum
transmit power $P_{max}$. Therefore, especially in large cells we may not be able
to achieve a required target signal power level throughout the whole serving
area. A convenient metric to assess the decoupling of two links is the separation
factor (SF) that defines the ratio between maximum and minimum receive power
among all transmitters at one receiver station, given as

$$ \Delta\lambda(d) = \frac{P_{R,max}}{P_{R,min}} = \frac{P_{max}\,\lambda(d_{max})}{\lambda(d_{min})\,P_{max}} = \left( \frac{d_{max}}{d_{min}} \right)^{\alpha}. \tag{8.5} $$

Figure 8.7 TDOA and SF distributions in a hexagonal cell.
[Plots: (a) TDOA distribution, CDF versus $\Delta\tau(d)$ [µs], for ISD = {1000:1000:10000} m; (b) SF distribution, CDF versus separation factor SF [dB], for $\alpha$ = {2:0.5:4}.]

A distribution of occurring link separations is shown in Fig. 8.7(b) for different
values of the pathloss exponent $\alpha$ in a system with 3 BSs and 3 UEs, as
already used in Fig. 8.7(a). As expected, the lower the pathloss exponent is, the
higher the probability is that all three users are within a certain cooperation
range (CR) $\kappa$. To simplify the analysis of the occurring timing delays, we define
a symmetric user configuration scenario, where three users directly move from
the cell corner towards their primary serving base stations on the border of a
circle with radius $r_C$. In those scenarios, we only have to consider the direct
distance from each mobile to the primary serving base station, $d_1 = D - r_C$, and
the distance between the mobile and the secondary serving base stations,
$d_2 = D\left(1 + \frac{r_C}{D} + \frac{r_C^2}{D^2}\right)^{1/2}$. We use $D = d_{ISD}/\sqrt{3}$ as cell diameter here. In order to fulfill
the cyclic prefix restriction, we can re-write (8.3) as

$$ \frac{d_2 - d_1}{c} \le T_{CP} - \tau(L). \tag{8.6} $$

If we reorder (8.6), we obtain an expression for the cooperation radius of the
circle in which ISI-free CoMP is possible (see [KJRF10] for details), i.e.

$$ r_{C,TDOA} \le \frac{2c\,(T_{CP} - \tau(L))\,D + \left(c\,(T_{CP} - \tau(L))\right)^2}{3D + 2c\,(T_{CP} - \tau(L))}. \tag{8.7} $$
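
The step from (8.6) to (8.7) can be verified numerically. The sketch below assumes the geometry stated in the text ($d_1 = D - r_C$, $d_2^2 = D^2 + r_C D + r_C^2$, $D = d_{ISD}/\sqrt{3}$); the function names, the 5 km inter-site distance, and the cap at $D$ are illustrative assumptions.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light [m/s]


def link_distances(r_c, d_isd):
    """Distances to the primary (d1) and a secondary (d2) base station."""
    big_d = d_isd / math.sqrt(3.0)
    d1 = big_d - r_c
    d2 = big_d * math.sqrt(1.0 + r_c / big_d + (r_c / big_d) ** 2)
    return d1, d2


def cooperation_radius_tdoa(d_isd, t_cp, tau_l=0.0):
    """Largest r_C for which (d2 - d1)/c <= T_CP - tau(L), following (8.7)."""
    big_d = d_isd / math.sqrt(3.0)
    a = C_LIGHT * (t_cp - tau_l)          # admissible path length difference [m]
    r_c = (2.0 * a * big_d + a * a) / (3.0 * big_d + 2.0 * a)
    return min(r_c, big_d)                # the user cannot move beyond the BS


r_c = cooperation_radius_tdoa(d_isd=5000.0, t_cp=4.7e-6, tau_l=2.3e-6)
d1, d2 = link_distances(r_c, 5000.0)
print(f"r_C = {r_c:.1f} m, TDOA at r_C = {(d2 - d1) / C_LIGHT * 1e6:.3f} us")
```

At the returned radius the TDOA equals the remaining CP budget exactly, which confirms that (8.7) is the boundary of the constraint (8.6).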

If we only allow joint signal processing for users who are within a certain
cooperation range $\kappa$, we can define the second constraint as

$$ \left( \frac{d_2}{d_1} \right)^{\alpha} \le \kappa. \tag{8.8} $$

Then, a cooperation radius fulfilling the maximum link separation is given as

$$ r_{C,CR} \le \left( -\frac{a}{2} - \sqrt{\frac{a^2}{4} - 1} \right) D \quad \text{with} \quad a = \frac{1 + 2\kappa^{2/\alpha}}{1 - \kappa^{2/\alpha}}. \tag{8.9} $$
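
A numerical check of this cooperation radius is sketched below. It assumes the constraint $(d_2/d_1)^{\alpha} \le \kappa$ with the same symmetric geometry as before ($d_1 = D - r_C$, $d_2^2 = D^2 + r_C D + r_C^2$); specifying the cooperation range in dB and all function names are illustrative assumptions.

```python
import math


def cooperation_radius_cr(d_isd, kappa_db, alpha):
    """Largest r_C for which the separation factor stays below kappa."""
    big_d = d_isd / math.sqrt(3.0)
    s = (10.0 ** (kappa_db / 10.0)) ** (2.0 / alpha)   # kappa^(2/alpha)
    if s <= 1.0:
        return 0.0                 # only the cell corner itself qualifies
    a = (1.0 + 2.0 * s) / (1.0 - s)
    x = -a / 2.0 - math.sqrt(a * a / 4.0 - 1.0)        # normalized radius r_C / D
    return x * big_d


def separation_factor_db(r_c, d_isd, alpha):
    """Separation factor (d2/d1)^alpha in dB for the symmetric scenario."""
    big_d = d_isd / math.sqrt(3.0)
    d1 = big_d - r_c
    d2 = math.sqrt(big_d ** 2 + r_c * big_d + r_c ** 2)
    return 10.0 * alpha * math.log10(d2 / d1)


r_cr = cooperation_radius_cr(d_isd=5000.0, kappa_db=10.0, alpha=3.86)
print(f"r_C,CR = {r_cr:.1f} m, SF there = {separation_factor_db(r_cr, 5000.0, 3.86):.2f} dB")
```

A larger pathloss exponent shrinks this radius, because the same distance ratio then translates into a larger power separation.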

However, in systems with limited transmit power we also have to ensure that
a minimum link signal-to-noise ratio (SNR) can be achieved even when the link
separation is low. The synchronization effects that need to be characterized
for CoMP systems can be summarized as follows:
• The TDOAs must not exceed the CP limitation, otherwise ISI is induced.
• The channel length increases above the CP constraint.
• The level of ISI power depends on the TDOA and the link separation.

8.2.1

MIMO OFDM Transmission with Asynchronous Interference


While the notation used throughout the book so far focussed on the transmission
over one sub-carrier of a perfectly synchronized orthogonal frequency division
multiple access (OFDMA) system, we now have to extend our model to also
capture the time and frequency dimensions of our signals. All variables from
Chapter 3 are used in the same way as before, but are followed by parentheses
containing these new indices:
• OFDM symbol indices $o$ and $o'$.
• Sample indices $i$ and $i'$ (for time domain representation).
• Sub-carrier indices $q$ and $q'$ (for frequency domain representation).
While all transmitted and received signals, channel matrices etc. were so
far given in frequency domain, we now additionally introduce variables in time
domain, which are the same but with a tilde on top. As an example, the frequency domain channel from transmitter $k$ to receiver $m$ in OFDM symbol $o$
and on sub-carrier $q$ is given as $H_k^m(o, q)$, and the corresponding time domain
representation (now using a sample index $i$ instead of a sub-carrier index $q$) is
$\tilde{H}_k^m(o, i)$.
Let us assume that the transmit symbols in frequency domain of a transmitter $k$ in OFDM symbol $o$ are given as $x_k(o, q)$, where only a subset of sub-carriers
$q \in \mathcal{Q}$ is used. After an inverse DFT (IDFT) operation of size $N$ and the insertion of a cyclic prefix
of length $N_{CP}$ samples, the corresponding time domain signals of the user are
given as

$$ \tilde{x}_k(o, i) = \frac{1}{\sqrt{N}} \sum_{q \in \mathcal{Q}} x_k(o, q)\, e^{j\frac{2\pi q i}{N}}, \quad -N_{CP} \le i \le N - 1. \tag{8.10} $$

After transmission over a channel specified by the taps $\tilde{h}_k^m(o, i')$ for the link
between transmitter $k$ and receiver $m$, the time domain signal is given as

$$ \tilde{y}^m(o, i) = \sum_{k=1}^{K} \sum_{i'=1}^{L} \tilde{h}_k^m(o, i')\, \tilde{x}_k(o, i - i' - \mu_k^m) + \tilde{n}^m(o, i). \tag{8.11} $$

Here, $\tilde{\mathbf{n}} \sim \mathcal{N}_{\mathbb{C}}(0, \sigma_n^2 \mathbf{I})$ is the receive noise in time domain. The channel
taps of the link specific channel impulse responses are modeled as $\tilde{h}_k^m(o, i') \sim \mathcal{N}_{\mathbb{C}}\!\left(0, \sigma^2_{\tilde{h}_k^m(o,i')}\, \lambda_k^m(d)^{-1}\right)$, where $L = \lceil \tau(L)/T_S \rceil$ represents the discrete channel
length and $\sigma^2_{\tilde{h}_k^m(o,i')}$ the tap variance given by the corresponding power delay
profile. The parameter $\mu_k^m = \lceil \tau_k^m(d)/T_S \rceil$ expresses a timing offset (given in
samples). The received signal in frequency domain is obtained by the DFT operation applied to the received samples $\tilde{y}^m(o, i)$ as

$$ y^m(o, q) = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} \tilde{y}^m(o, i)\, e^{-j\frac{2\pi q i}{N}}, \quad q \in \mathcal{Q}. \tag{8.12} $$

In [KF10], a closed-form solution of (8.12) is derived that gives us an expression for the frequency domain transmission with arbitrary symbol timing offsets
$\mu$. The transmission can then be summarized for the received signal at the $m$-th
receiver branch with

$$
\begin{aligned}
y^m(o, q) ={}& \sum_{k=1}^{K} \sum_{q' \in \mathcal{D}} E_k^m(o, q, q')\, H_k^m(o, q', q')\, x_k(o, q')\, e^{-j\frac{2\pi}{N}\mu_k^m q'} \\
&+ \sum_{k=1}^{K} \sum_{q' \in \mathcal{D}} \sum_{q''=0}^{N-1} E_k^m(o, q, q'')\, C_k^m(o, q'', q')\, x_k(o, q') \\
&+ \sum_{k=1}^{K} \sum_{q' \in \mathcal{D}} E_k^m(o-1, q, q')\, H_k^m(o, q', q')\, x_k(o-1, q')\, e^{-j\frac{2\pi}{N}(\mu_k^m - N_{CP}) q'} \\
&+ \sum_{k=1}^{K} \sum_{q' \in \mathcal{D}} \sum_{q''=0}^{N-1} E_k^m(o, q, q'')\, C_k^m(o-1, q'', q')\, x_k(o-1, q') + n^m(o, q),
\end{aligned}
\tag{8.13}
$$

where $E_k^m$ are elements of the matrices $\mathbf{E}_k^m(o), \mathbf{E}_k^m(o-1) \in \mathbb{C}^{[N \times N]}$, which
include the inter-carrier interference (ICI) due to the windowing of the current
and previous OFDM symbols in time domain in the case that the CP limit is
exceeded. These matrix elements can be denoted as


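$$
E_k^m(o, q, q') =
\begin{cases}
\dfrac{N_B - \mu_k^m}{N} & \xi = 0 \\[2mm]
\dfrac{1}{N}\, e^{j\frac{\pi}{N}\varphi_o}\, \dfrac{\sin\left(\frac{\pi}{N}\, \xi\, (N_B - \mu_k^m)\right)}{\sin\left(\frac{\pi}{N}\, \xi\right)} & \text{otherwise}
\end{cases}
$$

$$
E_k^m(o-1, q, q') =
\begin{cases}
\dfrac{\mu_k^m - N_{CP}}{N} & \xi = 0 \\[2mm]
\dfrac{1}{N}\, e^{j\frac{\pi}{N}\varphi_{o-1}}\, \dfrac{\sin\left(\frac{\pi}{N}\, \xi\, (\mu_k^m - N_{CP})\right)}{\sin\left(\frac{\pi}{N}\, \xi\right)} & \text{otherwise}
\end{cases}
$$

with $\xi = q' - q \;\; \forall\, q, q' = 1 \ldots N$ and

$$ \varphi_o = \xi\,(\mu_k^m + N_B - 2N_{CP} - 1), $$
$$ \varphi_{o-1} = \xi\,(\mu_k^m - N_{CP} - 1). $$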

In (8.13), $H_k^m$ are elements of the diagonal matrix $\mathbf{H}_k^m \in \mathbb{C}^{[N \times N]}$ with the channel transfer function (CTF) in frequency domain on a certain link. By
using the Fourier transform matrix $\mathbf{F} \in \mathbb{C}^{[N \times N]}$ with the elements $F(q, q') = e^{-j\frac{2\pi}{N} q q'} / \sqrt{N}$, this channel matrix can also be written as $\mathbf{H} = \mathbf{F}\tilde{\mathbf{H}}\mathbf{F}^H$, where
$\tilde{\mathbf{H}} \in \mathbb{C}^{[N_B \times N_B]}$ includes the time domain channel impulse response (CIR) matrix
in Toeplitz structure with the first column $[h_1 \cdots h_L \; 0 \cdots 0]^T$. $N_B = N + N_{CP}$
is used as OFDM block length here. The elements $C_k^m(o)$, $C_k^m(o-1)$ of the
matrices $\mathbf{C}(o), \mathbf{C}(o-1) \in \mathbb{C}^{[N \times N]}$ contain the inter-symbol interference that is
induced by the $(L-1)$-tap channel decay from the previous OFDM symbol.
They are given as

$$ C_k^m(o, q, q') = \frac{1}{N} \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} \tilde{C}_k^m(o, a, b)\, e^{j\frac{2\pi}{N}(b q' - a q)} \tag{8.14} $$

$$ C_k^m(o-1, q, q') = \frac{1}{N} \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} \tilde{C}_k^m(o-1, a, b)\, e^{j\frac{2\pi}{N}(b q' - a q)}, \tag{8.15} $$

where $\tilde{C}_k^m(o)$ and $\tilde{C}_k^m(o-1)$ are elements of special Toeplitz matrices in time
domain that are explained in more detail in [KF10]. It should be noted that
for $\mu_k^m \le N_{CP}$, $\mathbf{E}_k^m(o)$ becomes an identity matrix and $\mathbf{E}_k^m(o-1)$ changes to a
zero matrix. Within the range of $N_{CP} - L + 1 \le \mu_k^m \le N_{CP}$, we define an effective cyclic extension $N_{CP,e} = N_{CP} - L + 1$ for the following, which gives us the
interference free range within the CP. For the case that $\mu_k^m \le N_{CP,e}$, $\mathbf{C}(o)$ and
$\mathbf{C}(o-1)$ also become zero matrices and we get the well known asynchronous
interference free transmission equation in frequency domain with

$$ y^m(o, q) = \sum_{k=1}^{K} H_k^m(o, q, q)\, x_k(o, q)\, e^{-j\frac{2\pi}{N}\mu_k^m q}, \tag{8.16} $$

where we only have a phase slope caused by $\mu$ over all sub-carriers within one
OFDM symbol.
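
The phase slope can be reproduced with a toy example. The sketch below is not the model of [KF10]: it assumes a single transmitter, a flat unit-gain channel, no noise, and a timing offset that stays within the CP, and it uses small illustrative sizes (N = 16, N_CP = 4, offset of 3 samples) with a hand-written unitary DFT.

```python
import cmath
import math


def dft(x):
    """Unitary DFT with kernel exp(-j*2*pi*q*i/N)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * q * i / n) for i in range(n))
            / math.sqrt(n) for q in range(n)]


def idft(x_freq):
    """Unitary inverse DFT with kernel exp(+j*2*pi*q*i/N)."""
    n = len(x_freq)
    return [sum(x_freq[q] * cmath.exp(2j * math.pi * q * i / n) for q in range(n))
            / math.sqrt(n) for i in range(n)]


N, N_CP, MU = 16, 4, 3                                   # toy sizes, MU <= N_CP
x_freq = [complex((q % 3) - 1, q % 2) for q in range(N)]  # arbitrary symbols
x_time = idft(x_freq)
tx_block = x_time[-N_CP:] + x_time                       # prepend the cyclic prefix

# A delay of MU samples means the receiver DFT window starts MU samples early.
window = tx_block[N_CP - MU:N_CP - MU + N]
y_freq = dft(window)

# Expected: the same symbols with a linear phase slope over the sub-carriers.
expected = [x_freq[q] * cmath.exp(-2j * math.pi * MU * q / N) for q in range(N)]
max_err = max(abs(y - e) for y, e in zip(y_freq, expected))
print(f"max deviation from pure phase slope: {max_err:.2e}")
```

Because the window still contains only cyclically shifted samples of the current symbol, no ICI or ISI appears and each sub-carrier is merely rotated.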

If we reorder (8.13), we can define $\mathbf{Z}(o, q) \in \mathbb{C}^{[M \times K]}$ as the representation
of the interference channel on the $q$-th sub-carrier in frequency domain. In a
next step, we can summarize the transmission equation of the user symbols
$\mathbf{x}(o, q) \in \mathbb{C}^{[K \times 1]}$ to the receivers $\mathbf{y}(o, q) \in \mathbb{C}^{[M \times 1]}$ as follows:

$$
\mathbf{y}(o, q) = \underbrace{\mathbf{Z}(o, q)\,\mathbf{x}(o, q)}_{\text{MUI}} + \underbrace{\underbrace{\sum_{q' \in \mathcal{D} \setminus q} \mathbf{Z}(o, q')\,\mathbf{x}(o, q')}_{\text{ICI}} + \underbrace{\sum_{q' \in \mathcal{D}} \mathbf{Z}(o-1, q')\,\mathbf{x}(o-1, q')}_{\text{ISI}}}_{\mathbf{u}(o, q)} + \mathbf{n}.
\tag{8.17}
$$

As we can observe in this expression, we now have a coupling between adjacent sub-carriers (ICI) and consecutive OFDM symbols (ISI) in addition to the
coupling between users (MUI). A characterization of the ISI
and ICI is done for example in [SDAD02], [SM03], [MC06] and [NK02].

8.2.2

Interf.-Aware Multi-User Joint Detection and Transmission


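If we assume a signal transmission with a transmit filter $\mathbf{W} \in \mathbb{C}^{[K \times K]}$ and receive
filter $\mathbf{G} \in \mathbb{C}^{[K \times M]}$, we can re-write (8.17) in order to get a generalized expression
for the signal transmission in frequency domain:

$$
\begin{aligned}
\hat{\mathbf{x}}(o, q) ={}& \mathbf{G}(o, q)\mathbf{Z}(o, q)\mathbf{W}(o, q)\sqrt{\mathbf{P}(o, q)}\,\mathbf{x}(o, q) \\
&+ \sum_{q' \in \mathcal{D} \setminus q} \mathbf{G}(o, q)\mathbf{Z}(o, q')\mathbf{W}(o, q')\sqrt{\mathbf{P}(o, q')}\,\mathbf{x}(o, q') \\
&+ \sum_{q' \in \mathcal{D}} \mathbf{G}(o, q)\mathbf{Z}(o-1, q')\mathbf{W}(o-1, q')\sqrt{\mathbf{P}(o-1, q')}\,\mathbf{x}(o-1, q') \\
&+ \mathbf{G}(o, q)\,\mathbf{n}(o, q),
\end{aligned}
\tag{8.18}
$$

where we also introduced a transmit power scaling through the diagonal matrices
$\mathbf{P} \in \mathbb{C}^{[K \times K]}$. In that case we assume $\mathrm{E}\{\mathbf{x}\mathbf{x}^H\} = \mathbf{\Phi}_{xx} = \mathbf{I}_K$ for all sub-carriers $q$
and OFDM symbols $o$. Unless the sub-carrier and OFDM symbol indices are
explicitly given, we assume the $q$-th sub-carrier and the $o$-th OFDM symbol for
our equations in the following. The error covariance matrix of the mean square
error (MSE) between the received and transmitted symbols, $\mathbf{\Phi}_{ee} = \mathrm{E}\{|\mathbf{x} - \hat{\mathbf{x}}|^2\}$,
is given by

$$ \mathbf{\Phi}_{ee} = \left(\mathbf{G}\mathbf{Z}\mathbf{W}\sqrt{\mathbf{P}} - \mathbf{I}\right)\left(\sqrt{\mathbf{P}}\,\mathbf{W}^H\mathbf{Z}^H\mathbf{G}^H - \mathbf{I}\right) + \mathbf{G}\left(\mathbf{\Phi}_{uu} + \mathbf{\Phi}_{nn}\right)\mathbf{G}^H \tag{8.19} $$

with

$$
\begin{aligned}
\mathbf{\Phi}_{uu} ={}& \sum_{q' \in \mathcal{D} \setminus q} \mathbf{Z}(o, q')\mathbf{W}(o, q')\mathbf{P}(o, q')\mathbf{W}(o, q')^H\mathbf{Z}(o, q')^H \\
&+ \sum_{q' \in \mathcal{D}} \mathbf{Z}(o-1, q')\mathbf{W}(o-1, q')\mathbf{P}(o-1, q')\mathbf{W}(o-1, q')^H\mathbf{Z}(o-1, q')^H.
\end{aligned}
\tag{8.20}
$$

It should be noted that for frequency-selective channels we have to include a
separate precoding filter and power allocation vector for all adjacent sub-carriers
in order to get an exact expression for the MSE.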
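The interference-aware receive filter can then be obtained by minimizing the
sum mean squared error (SMSE), i.e.

$$ \mathbf{G} = \underset{\mathbf{G}}{\arg\min} \left\{ \mathrm{tr}\{\mathbf{\Phi}_{ee}\} \right\} = \underset{\mathbf{G}}{\arg\min}\; \mathrm{E}\left\{ \|\mathbf{x} - \hat{\mathbf{x}}\|_2^2 \right\}. \tag{8.21} $$

The minimum argument is obtained by setting the derivative of the SMSE to
zero (e.g. [Say08]). In this way we obtain

$$ \mathbf{G} = \sqrt{\mathbf{P}}\,\mathbf{W}^H\mathbf{Z}^H\mathbf{\Phi}_{yy}^{-1} \tag{8.22} $$

with $\mathbf{\Phi}_{yy} = \mathrm{E}\{\mathbf{y}\mathbf{y}^H\}$. The interference-aware transmit filter can be derived by
solving the optimization criterion of (8.21) with respect to $\mathbf{W}$ such that

$$ \mathbf{W} = \underset{\{\mathbf{W}, \zeta\}}{\arg\min}\; \mathrm{E}\left\{ \left\|\mathbf{x} - \zeta^{-1}\hat{\mathbf{x}}\right\|_2^2 \right\} \;\Big|\; \mathrm{E}\left\{ \left\|\zeta\mathbf{W}\sqrt{\mathbf{P}}\,\mathbf{x}\right\|_2^2 \right\} = P_{max}, \tag{8.23} $$

where we included an additional sum-power constraint such that the maximum
transmit power does not exceed $P_{max}$. This restriction is fulfilled by introducing
the transmit power scaling parameter $\zeta$ that has to be reversed at the receiver
input. By following the steps in [JJT+09] and setting the derivative of the resulting Lagrangian function w.r.t. $\mathbf{W}$ and $\zeta$ to zero, we get

$$ \mathbf{W} = \mathbf{Z}^H \Big( \mathbf{Z}\mathbf{Z}^H + \underbrace{\frac{\mathrm{tr}(\mathbf{\Phi}_{nn} + \mathbf{\Phi}_{uu})}{P_{max}}\,\mathbf{I}_K}_{\mathbf{\Phi}'_{nn}} \Big)^{-1} \tag{8.24} $$

$$ \mathrm{E}\left\{ \left\|\zeta\mathbf{W}\sqrt{\mathbf{P}}\,\mathbf{x}\right\|_2^2 \right\} = \zeta^2\, \mathrm{tr}\left\{ \mathbf{W}\mathbf{P}\mathbf{W}^H \right\} = P_{max} \tag{8.25} $$

$$ \zeta = \sqrt{ \frac{P_{max}}{\mathrm{tr}\left\{ \mathbf{Z}\mathbf{Z}^H \left( \mathbf{Z}\mathbf{Z}^H + \mathbf{\Phi}'_{nn} \right)^{-2} \mathbf{P} \right\}} }. \tag{8.26} $$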
If we assume an uplink transmission where the users have only one transmit antenna and are not able to communicate with each other, the transmit filter
becomes an identity matrix $\mathbf{W} = \mathbf{I}_K$. Under this constraint, we can derive an
expression for the post equalization signal-to-interference-and-noise ratio (SINR)
of the $k$-th user after the joint BS signal processing as the ratio of the desired
signal power and the portion of MUI, ISI and ICI plus noise as

$$ \mathrm{SINR}_k = \frac{p_k\, \mathbf{g}_k^H \mathbf{z}_k \mathbf{z}_k^H \mathbf{g}_k}{\mathbf{g}_k^H \left( \sum_{r=1, r \ne k}^{K} p_r \mathbf{z}_r \mathbf{z}_r^H + \mathbf{\Phi}_{uu} + \mathbf{\Phi}_{vv} \right) \mathbf{g}_k}. \tag{8.27} $$

It is worth mentioning that the used filter matrix only aims at canceling
the multi-user interference for the desired sub-carrier with the knowledge of
the colored noise $\mathbf{u}$ (see (8.17)). If we use the filter defined in (8.22), the post
equalization SINR for the $k$-th user yields

$$ \mathrm{SINR}_k = \frac{(\mathbf{z}_k)^H \mathbf{\Phi}_{yy}^{-1} \mathbf{z}_k}{1 - (\mathbf{z}_k)^H \mathbf{\Phi}_{yy}^{-1} \mathbf{z}_k}. \tag{8.28} $$

For a downlink transmission to $M$ non-cooperative receiver stations with one
receiver antenna each, the receive filter becomes a scaled identity matrix $\mathbf{G} = \zeta^{-1}\mathbf{I}_M$.
In that case, the SINR for the $k$-th user stream can be expressed by

$$ \mathrm{SINR}_k = \frac{p_k\, \mathbf{z}_k^H \mathbf{w}_k \mathbf{w}_k^H \mathbf{z}_k}{\sum_{r=1, r \ne k}^{K} p_r\, \mathbf{z}_k^H \mathbf{w}_r \mathbf{w}_r^H \mathbf{z}_k + \zeta^{-2}\left(\sigma_u^2 + \sigma_v^2\right)} \tag{8.29} $$

with

$$
\begin{aligned}
\sigma_u^2 ={}& \sum_{q' \in \mathcal{D} \setminus q} \mathbf{z}_k(o, q')^H \mathbf{W}(o, q')\mathbf{P}(o, q')\mathbf{W}(o, q')^H \mathbf{z}_k(o, q') \\
&+ \sum_{q' \in \mathcal{D}} \mathbf{z}_k(o-1, q')^H \mathbf{W}(o-1, q')\mathbf{P}(o-1, q')\mathbf{W}(o-1, q')^H \mathbf{z}_k(o-1, q').
\end{aligned}
\tag{8.30}
$$

In [AA09], a joint optimization of the receive and transmit filter as well as
power control in systems with asynchronous interference has been presented
based on the results in [SB04] and [ZMM+08]. In order to simplify the analysis,
we introduce a simple power control scheme here. The transmit power of each
transmitter is controlled in order to achieve a target SNR $\rho_k$ on the strongest
link within one column of the channel matrix. The uplink transmit power values
in that case can be obtained by

$$ p_k = \frac{\rho_k\, \sigma_v^2}{\max_m \left\{ |z_k^m|^2 \right\}}. \tag{8.31} $$
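
This single-link power control rule translates directly into a few lines of code. The sketch below assumes that the channel column of user k is available as a list of complex link gains; the optional cap at a maximum transmit power reflects the limitation mentioned earlier in this section, and all names and numbers are illustrative.

```python
def uplink_tx_power(z_column, target_snr_db, noise_var, p_max=None):
    """Transmit power that reaches the target SNR on the strongest link.

    z_column: complex link gains from user k to all receivers (hypothetical input).
    target_snr_db: target SNR in dB, noise_var: noise variance (linear).
    p_max: optional maximum transmit power (linear); None disables the cap.
    """
    strongest_gain = max(abs(z) ** 2 for z in z_column)
    power = 10.0 ** (target_snr_db / 10.0) * noise_var / strongest_gain
    if p_max is not None:
        power = min(power, p_max)
    return power


# Example: strongest link has |z|^2 = 0.01, target SNR 20 dB, noise variance 1e-3.
p_k = uplink_tx_power([0.1 + 0.0j, 0.01j], target_snr_db=20.0, noise_var=1e-3)
p_k_capped = uplink_tx_power([0.1 + 0.0j, 0.01j], 20.0, 1e-3, p_max=2.0)
print(f"p_k = {p_k:.2f}, with cap: {p_k_capped:.2f}")
```

Since only the strongest link enters the rule, the interference that the same transmission causes on the weaker links is not taken into account.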

8.2.3

System Level SINR Analysis


Having derived the OFDM CoMP transmission model, we now use it for the
numerical analysis of different channel effects in a system model with hexagonal
cells as shown in Fig. 8.6. Unless otherwise stated, the parameter set of Table 8.1
is used for all simulations, which is based on a 3GPP LTE setup.
We used the non line-of-sight (NLOS) rural macro model for the pathloss
calculations. For this model, we assumed an average building height of 5 m and a BS
antenna height of 35 m. The height of a mobile terminal is assumed to be 1.5 m
and the street width is chosen as 20 m. For the numerical simulations we use the
already mentioned symmetric user positioning model with three users and three
base stations in the uplink direction in order to analyze the average SINR in the
case that the user circle radius $r_C$ is increased. In that case, we have only the
link channels as random numbers for averaging. The advantage of this model is
that we have the same propagation conditions for all users at one fixed point $r_C$.
This is helpful to understand the basic limitations within CoMP.

Table 8.1. Simulation Parameters

Parameter | Value
DFT size $N$ | 256
Used sub-carriers $N_{SC}$ | 120
CP length $N_{CP} = T_{CP}/T_S$ | 18
System bandwidth $B_S = 1/T_S$ | 3.84 MHz
Sub-carrier bandwidth $B_{SC}$ | 15 kHz
Carrier frequency $f_C$ | 800 MHz
Target SNR $\rho$ | 20 dB
UE maximum Tx power $p_{k,max}$ | 23 dBm
UE noise figure $\beta_{UE,NF}$ | 7 dB
UE antenna gain $\beta_{UE,AG}$ | 1 dBi
BS maximum Tx power $p_{k,max}$ | 43 dBm
BS noise figure $\beta_{BS,NF}$ | 4 dB
BS antenna gain $\beta_{BS,AG}$ | 15 dBi
Noise power per sub-carrier $\sigma_n^2$ | $10 \log_{10}(1.38 \cdot 10^{-23}\,\mathrm{J/K} \cdot 290\,\mathrm{K} \cdot B_{SC})$ = −132 dBm
Fast fading margin $\beta_{FFM}$ | 2 dB
Inter-cell interf. margin $\beta_{IIM}$ | 3 dB
RM prop. env. factor $10 \lg(\beta)$ | 1.69 dB @800 MHz
RM pathloss coefficient $\alpha$ | 3.86
Power delay profile of length $L$ | exponentially decaying over the taps $\ell = 1, \ldots, L$, normalized to unit sum, with decay constant set by $\ln(0.1)$ over $L - 1$ taps
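
Some of the entries in Table 8.1 are linked and can be cross-checked quickly. The sketch below recomputes the thermal noise power per sub-carrier from the Boltzmann constant, a 290 K reference temperature and the 15 kHz sub-carrier bandwidth, and derives the CP duration from the 18-sample CP length; the conversion to dBm (division by 1 mW) is made explicit here, and the variable names are illustrative.

```python
import math

K_BOLTZMANN = 1.38e-23      # Boltzmann constant [J/K], rounded as in Table 8.1
TEMPERATURE_K = 290.0       # reference noise temperature [K]

N_DFT = 256                 # DFT size
N_CP = 18                   # CP length in samples
B_SC = 15e3                 # sub-carrier bandwidth [Hz]
B_S = N_DFT * B_SC          # system bandwidth = sampling rate [Hz]


def thermal_noise_dbm(bandwidth_hz):
    """Thermal noise power k*T*B expressed in dBm."""
    noise_w = K_BOLTZMANN * TEMPERATURE_K * bandwidth_hz
    return 10.0 * math.log10(noise_w / 1e-3)


noise_dbm = thermal_noise_dbm(B_SC)
t_cp = N_CP / B_S            # CP duration [s]
print(f"B_S = {B_S / 1e6:.2f} MHz, noise per sub-carrier = {noise_dbm:.1f} dBm, "
      f"T_CP = {t_cp * 1e6:.2f} us")
```

The resulting CP duration of about 4.7 µs matches the short LTE cyclic prefix used as a reference in this section.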
The experiment that we introduce here is intended to get an idea about the
impact of the ISI. The result in terms of the average SINR loss according to
target SNR is shown in Fig. 8.8(a), where we assume a single link power control
as denoted in (8.31). We normalized the user radius rC to the cell diameter D.
Thus, the point 0 marks the cell center, and at point 1 the users are very close
to their primary serving base stations.
As one reference, we include performance values for a scenario where no
pathloss is considered ( = 0). This is usually done in investigations on link level
where we mainly look at results in which we only increase the timing oset. Here,
we use two channel lengths (L). In one case, we use a at channel ( (L) = 0),
and in the other we set the channel length to be in the order of half of the CP
length ( (L) = 2.3s). If we look at the performance curves, one can clearly see
from the points where the interference-free range of the CP is exceeded we have
strong SINR degradations. In Fig. 8.8(b), we depicted the occurring TDOAs
of that experiment where we marked the two CP limits. As another reference,

Figure 8.8 Average SINR loss and occurring TDOAs in a symmetric user scenario. (a) Average SINR loss [dB] versus $r_C / D$. (b) Occurring TDOAs [µs] versus $r_C / D$, with the CP limits for $T_{CP}$ = 4.7 µs and for a channel length of 2.3 µs marked.

we show results where we assume a CP length that can cover all possible TDOAs (i.e., the CP length minus the channel length exceeds the occurring TDOAs). For that simulation, we now use the pathloss model as described above. As can be observed, the signal attenuations due to the pathloss lead to a decoupling of the channel matrix between the diagonal and off-diagonal entries with increasing $r_C$. In the cell center, the single-user power control scheme is not able to control the transmit powers to achieve the target SNR, since the multi-user interference is not considered. That is the reason why we can observe an initial SINR loss of 2 dB. In a next scenario, we limit the CP as denoted in Table 8.1 to $T_{CP}$ = 4.7 µs. One can again see an SINR degradation from the point where the CP limit is exceeded, but we can also observe that the ISI power is attenuated due to pathloss with increasing $r_C$. The single-user detection (SUD) performance is included as a general bound, where we assume that each base station only wants to detect the closest user without knowledge of the others. A key observation of the presented results is that, as expected, the asynchronous interference is the dominant degrading effect only in a small region. On the one hand, the TDOAs must exceed the ISI-free range within the CP. It should be noted that in real systems, due to timing estimation errors, the ISI-free range can be reduced in addition to the reduction caused by the channel decay. On the other hand, the ISI power from the interfering users, which is also attenuated by the pathloss, needs to exceed a certain threshold such that it leads to additional interference. This threshold mainly depends on inter-site distance, carrier frequency, pathloss exponent and transmit power, among others. Furthermore, it should be mentioned that similar results can be obtained for the downlink, since the problem of asynchronous interference is equivalent in both directions.
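
To put the ISI-free range into perspective, the following minimal Python sketch converts it into a difference in propagation distance. It assumes propagation at the speed of light and uses the CP length of 4.7 µs and the channel lengths of 0 and 2.3 µs from the experiment above; the function name and its parameters are illustrative and not taken from the original text.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def max_path_difference_m(cp_length_s: float, channel_length_s: float) -> float:
    """Largest difference in propagation distance whose TDOA still fits
    into the ISI-free part of the cyclic prefix (hypothetical helper)."""
    isi_free_range_s = cp_length_s - channel_length_s
    if isi_free_range_s < 0.0:
        raise ValueError("channel is longer than the cyclic prefix")
    return isi_free_range_s * SPEED_OF_LIGHT_M_PER_S


# Values from the experiment: T_CP = 4.7 us, channel length 0 or 2.3 us.
flat_channel_m = max_path_difference_m(4.7e-6, 0.0)
dispersive_channel_m = max_path_difference_m(4.7e-6, 2.3e-6)
print(f"flat channel:   {flat_channel_m:.0f} m")
print(f"2.3 us channel: {dispersive_channel_m:.0f} m")
```

The sketch only illustrates why a longer channel impulse response shrinks the region in which the TDOAs stay within the CP; it does not model timing estimation errors.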

8.3 Imperfect Sync in Frequency: Perf. Degradation and Compensation

8.2.4 Summary
In this section, we investigated CoMP systems in the case that the time differences of arrival after single-link synchronization exceed the limitation given by the cyclic prefix in OFDM systems. We analytically described the impact on the OFDM transmission model as well as on multi-user joint detection and transmission. In numerical simulations, we analyzed how the additional asynchronous interference affects the SINR performance in hexagonal cells within a simple symmetric user configuration setup. We showed that the SINR loss increases if the residual symbol timing offsets violate the cyclic prefix limitation, until the pathloss leads to a decoupling of the users and consequently to an attenuation of the asynchronous interference.

8.3 Imperfect Synchronization in Frequency: Performance Degradation and Compensation
Malte Schellmann
In a CoMP system, frequency errors may be introduced if the simultaneously transmitting entities are not perfectly synchronized in their carrier frequency. These frequency errors impose a time-continuous phase rotation on the signals transmitted from the single antennas, giving the effective channels seen at the receivers a time-variant nature. The time variance of the channel may degrade the system performance severely, as the channel-dependent matrices used for the spatial pre- or post-processing may no longer match the true channel, and correspondingly additional interference between the simultaneously transmitted spatial streams will arise. In orthogonal frequency division multiplex (OFDM) systems, channel time variances may also destroy the orthogonality between the sub-carrier signals, giving rise to inter-carrier interference (ICI). In this section, we will focus on these two performance-degrading effects, namely the inter-stream interference and the ICI, separately for the downlink and the uplink. We will derive analytical expressions for the resulting signal-to-interference ratios (SIRs), which reveal their direct dependence on the system parameters and the frequency errors. From these results, we draw some conclusions on the required synchronization accuracy of the transmitting nodes. Furthermore, we give a brief overview of existing techniques from the literature to compensate for frequency errors.
In the literature, only few studies exist on the effect of synchronization errors in CoMP systems, especially for the precoded downlink. The most comprehensive study available in this field can be found in [Zar08]. This study is based on purely numerical evaluations of high-dimensional CoMP systems with 32 transmit antennas in total. Analytical investigations on this topic are barely available; the reason is that the analysis of CoMP systems with more than two transmit antennas is hardly tractable. We have therefore decided to thoroughly analyze the simplest case of a CoMP system, with two cooperating single-antenna base stations (BSs) serving two single-antenna user equipments (UEs). The obtained closed-form expressions give us clear insights into the exact relations between the frequency errors and the measures of interest. For systems of higher dimension, it can be expected that these relations are basically sustained, and performance simply scales (to some extent) with the number of antennas.

8.3.1 Downlink Analysis
Inter-stream Interference of Spatially Precoded Streams
We consider the downlink of an exemplary CoMP scheme, where M single-antenna BSs transmit to K single-antenna UEs. The K UEs are all assumed to be assigned to the same physical resource block (PRB). In the sequel, we focus on the signal conditions observed at a single sub-carrier of this PRB. According to the notation introduced in Section 3.5, the transmission equation for this sub-carrier can be given as

$$\mathbf{y} = \mathbf{H}^H \mathbf{W} \mathbf{x} + \mathbf{n}. \tag{8.32}$$

If the cooperating BSs are not properly synchronized, then an independent, time-continuous phase rotation is imposed on the signal transmitted from each BS antenna, which results from the carrier frequency offset (CFO) between the transmitting BS and a common reference. This phase rotation is represented by the diagonal matrix

$$\boldsymbol{\Phi}(t) = \mathrm{diag}\left(\exp(j 2\pi \Delta f_1 t), \ldots, \exp(j 2\pi \Delta f_M t)\right), \tag{8.33}$$

where $\Delta f_m$ is the CFO between BS m and the common reference. For notational convenience, we introduce the angular frequencies of the CFOs, $\omega_m = 2\pi \Delta f_m$. The matrix $\boldsymbol{\Phi}(t)$ is incorporated into the transmission equation (8.32) according to

$$\mathbf{y} = \mathbf{H}^H \boldsymbol{\Phi}(t) \mathbf{W} \mathbf{x} + \mathbf{n}. \tag{8.34}$$

In general, there are also CFOs observed at the side of the receiver, representing the offsets of the receivers' oscillators. These could be captured by another CFO matrix, as is done in the equation for the uplink in (8.54). However, those CFOs have been neglected in (8.34), as they can easily be tracked continuously and compensated by standard synchronization techniques for the downlink [MKP07].
We now turn our attention to the joint precoding matrix $\mathbf{W}$. This matrix can be separated into the product of two matrices,

$$\mathbf{W} = \mathbf{C}\mathbf{P} = [p_1 \mathbf{c}_1 \; \ldots \; p_K \mathbf{c}_K], \qquad \mathbf{C} = [\mathbf{c}_1 \; \ldots \; \mathbf{c}_K], \qquad \mathbf{P} = \mathrm{diag}(p_1, \ldots, p_K). \tag{8.35}$$

The matrix $\mathbf{C}$ represents the algebraic function used to diagonalize the effective transmission channel $\mathbf{H}^H \mathbf{W}$, while $\mathbf{P}$ is a diagonal matrix whose diagonal elements $p_k$ represent the scaling of the column vectors $\mathbf{c}_k$ in $\mathbf{C}$ (i.e. the precoding beams for the transmit symbols $x_k$ in vector $\mathbf{x}$) according to the power allocation. As we do not assume the columns $\mathbf{c}_k$ to be normalized, the relation between $p_k$ and the transmit power $P_k$ allocated to beam k can be characterized by $p_k = \|\mathbf{c}_k\|^{-1} \sqrt{P_k}$.
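To analyze the impact of CFOs on the signal conditions at the receivers, we focus on the simplest case of a CoMP system, where two BSs with a single antenna each cooperate to simultaneously serve two single-antenna terminals. Then the channel matrix is given as

$$\mathbf{H}^H = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{h}_1^T \\ \mathbf{h}_2^T \end{pmatrix}, \tag{8.36}$$

where $\mathbf{h}_k$ specify the channel components connected to UE k. To diagonalize the effective channel $\mathbf{H}^H \mathbf{W}$, we use zero-forcing (ZF) precoding. Assuming that both BSs have ideal channel knowledge at time instant t = 0, the ZF matrix $\mathbf{C}$ is calculated according to

$$\mathbf{C} = (\mathbf{H}^H)^{-1} = \frac{1}{h_{11} h_{22} - h_{12} h_{21}} \begin{pmatrix} h_{22} & -h_{12} \\ -h_{21} & h_{11} \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}. \tag{8.37}$$

The constant scaling factor in front of the matrix will in the following be denoted as $\eta$. The effective channel matrix for a time instant $t \neq 0$ then yields

$$\mathbf{H}^H \boldsymbol{\Phi}(t) \mathbf{W} = \begin{pmatrix} \mathbf{h}_1^T \boldsymbol{\Phi}(t) \mathbf{c}_1 p_1 & \mathbf{h}_1^T \boldsymbol{\Phi}(t) \mathbf{c}_2 p_2 \\ \mathbf{h}_2^T \boldsymbol{\Phi}(t) \mathbf{c}_1 p_1 & \mathbf{h}_2^T \boldsymbol{\Phi}(t) \mathbf{c}_2 p_2 \end{pmatrix}. \tag{8.38}$$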
Here, the k-th diagonal element of the matrix represents the effective channel of the transmit symbol $x_k$ intended for the k-th receiver, while the off-diagonal elements in row k at position m represent the interference from symbol $x_m$ on the signal seen at the k-th receiver. As we assume symmetric conditions for all receivers, we focus exemplarily on the signal received at the first receiver, i.e. $y_1 = \mathbf{h}_1^T \boldsymbol{\Phi}(t) \mathbf{W} \mathbf{x}$. The signal $y_1$ will be separated into a desired and an interference part, $y_1 = y_{1,d} + y_{1,i}$, which will be analyzed separately in the following.
The desired signal $y_{1,d}$ contains the contribution from $x_1$; with (8.33), (8.36) and (8.37), and with $\eta = (h_{11} h_{22} - h_{12} h_{21})^{-1}$ being the scaling factor from (8.37), it amounts to

$$\begin{aligned} y_{1,d} &= \mathbf{h}_1^T \boldsymbol{\Phi}(t) \mathbf{c}_1 p_1 x_1 \\ &= p_1 \eta \exp(j \omega_1 t) \left[ h_{11} h_{22} - \exp\big(j \underbrace{(\omega_2 - \omega_1)}_{\omega_{21}} t\big) h_{12} h_{21} \right] x_1 \\ &= p_1 \exp(j \omega_1 t) \left[ 1 - (1 - \exp(j \omega_{21} t)) h_{12} c_{21} \right] x_1. \end{aligned} \tag{8.39}$$

In the above equation, we observe that the ZF pre-compensated channel is distorted by the complex term $(1 - \exp(j \omega_{21} t)) h_{12} c_{21}$. It is reasonable to assume that BSs and UEs are spaced sufficiently far apart from each other, and hence the single channel coefficients $h_{ij}$ can be considered to be mutually independent. Consequently, the product $h_{12} c_{21}$ represents a random complex number with zero mean. The amplitude of the ZF pre-compensated channel may therefore increase or decrease with the same probability.
In a similar manner, we can calculate the interference signal $y_{1,i}$, which reflects the inter-stream interference from the unintended signal $x_2$:

$$\begin{aligned} y_{1,i} &= \mathbf{h}_1^T \boldsymbol{\Phi}(t) \mathbf{c}_2 p_2 x_2 \\ &= p_2 \eta \exp(j \omega_1 t) \left[ (\exp(j \omega_{21} t) - 1) h_{11} h_{12} \right] x_2 \\ &= p_2 \exp(j \omega_1 t) \left[ 0 - (1 - \exp(j \omega_{21} t)) h_{12} c_{22} \right] x_2. \end{aligned} \tag{8.40}$$

Note the similar structure of (8.40) and (8.39), which both exhibit the same weighting factor $(1 - \exp(j \omega_{21} t))$ scaling the complex distortion. The complex distortion is formed here by the product of $h_{12}$ and $c_{22}$, which is independent of the value altering the useful signal part in (8.39), as $c_{21}$ is independent of $c_{22}$. We can therefore conclude that the inter-stream interference introduced by the CFOs has a magnitude that is similar to that of the amplitude change of the desired signal, as long as $c_{21}$ and $c_{22}$ can be assumed to have similar mean power.
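
The structure of (8.38) to (8.40) can be checked numerically. The following Python sketch is a minimal illustration under stated assumptions: the channel coefficients and the CFO difference of 2 Hz are arbitrary example values, the beam scaling factors are set to one, and the function names are hypothetical. It builds the ZF matrix of (8.37), applies the CFO rotation of (8.33), and evaluates the leakage of stream 2 into the signal of the first UE.

```python
import cmath
import math


def effective_channel(h, omegas, t, p=(1.0, 1.0)):
    """Effective 2x2 downlink channel H^H Phi(t) C P, compare (8.38).

    h      -- rows of H^H: ((h11, h12), (h21, h22))
    omegas -- angular CFOs (omega_1, omega_2) of the two BSs in rad/s
    t      -- time since the ZF matrix C was computed, in seconds
    p      -- beam scaling factors (p1, p2); unit values are an assumption
    """
    (h11, h12), (h21, h22) = h
    eta = 1.0 / (h11 * h22 - h12 * h21)             # scaling factor of (8.37)
    c = ((eta * h22, -eta * h12),                   # ZF matrix C = (H^H)^-1
         (-eta * h21, eta * h11))
    phi = [cmath.exp(1j * w * t) for w in omegas]   # diagonal of Phi(t)
    return [[sum(h[k][m] * phi[m] * c[m][i] for m in range(2)) * p[i]
             for i in range(2)]
            for k in range(2)]


def sir_db(g, user=0):
    """Instantaneous SIR at one UE for equal-power symbols, compare (8.41)."""
    desired = abs(g[user][user]) ** 2
    interference = abs(g[user][1 - user]) ** 2
    return 10.0 * math.log10(desired / interference)


# Arbitrary example channel and a CFO difference of 2 Hz (both hypothetical).
h = ((0.8 + 0.3j, -0.4 + 0.5j), (0.2 - 0.6j, 0.9 + 0.1j))
omegas = (0.0, 2.0 * math.pi * 2.0)
g_start = effective_channel(h, omegas, t=0.0)
g_later = effective_channel(h, omegas, t=5e-3)
print("leakage at t = 0:", abs(g_start[0][1]))
print(f"SIR at UE 1 after 5 ms: {sir_db(g_later):.1f} dB")
```

At t = 0 the effective channel is the identity, so there is no leakage; for t > 0 the off-diagonal entry follows the second line of (8.40).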
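From the above results, we will now derive an expression for the SIR between desired and interference signal. From (8.39) and (8.40), the instantaneous $\mathrm{SIR}_i$ can be given as

$$\mathrm{SIR}_i = \frac{|y_{1,d}|^2}{|y_{1,i}|^2} = \left| \frac{p_1 \left( h_{11} h_{22} - \exp(j \omega_{21} t) h_{12} h_{21} \right)}{p_2 \left( 1 - \exp(j \omega_{21} t) \right) h_{11} h_{12}} \right|^2. \tag{8.41}$$

To obtain an estimate for the mean SIR, we use Jensen's inequality, allowing us to determine the mean value for numerator and denominator separately:

$$\overline{\mathrm{SIR}} = \frac{E\{|y_{1,d}|^2\}}{E\{|y_{1,i}|^2\}} \approx \frac{P_1}{P_2 \, |1 - \exp(j \omega_{21} t)|^2} \cdot \frac{E\{\|\mathbf{c}_2\|^2 \, |h_{11} h_{22} - \exp(j \omega_{21} t) h_{12} h_{21}|^2\}}{E\{\|\mathbf{c}_1\|^2 \, |h_{11} h_{12}|^2\}}. \tag{8.42}$$

In addition to the mutual independence, we assume for further simplification that the channel coefficients $h_{ij}$ have the same mean power and zero mean. This then yields for the mean SIR

$$\overline{\mathrm{SIR}} \approx \frac{P_1}{P_2 \, |1 - \exp(j \omega_{21} t)|^2} \cdot \frac{E\{\|\mathbf{c}_2\|^2 \, (|h_{11} h_{22}|^2 + |h_{12} h_{21}|^2)\}}{E\{\|\mathbf{c}_1\|^2 \, |h_{11} h_{12}|^2\}} = \frac{P_1}{P_2 \, (1 - \cos(\omega_{21} t))} \approx \frac{2}{(\omega_{21} t)^2} \frac{P_1}{P_2}, \tag{8.43}$$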

Figure 8.9 SIR degradation due to CFO-induced inter-stream interference of two spatially precoded streams according to (8.43) ($P_1 = P_2$). Axes: SIR [dB] versus $\omega_{21} t$.

where the Taylor expansion of cosine for small angles, $\cos(\varphi) \approx 1 - 0.5 \varphi^2$, was used to obtain the approximation on the right-hand side. An illustration of this SIR relation for $P_1 = P_2$ is given in Fig. 8.9.
Eq. (8.43) reveals that the mean SIR resulting from the CFO-induced inter-stream interference between the spatially precoded streams strongly depends on the difference of the CFOs observed at the transmitters, $\omega_{21} = \omega_2 - \omega_1$, as well as on the ratio of the powers $P_k$ allocated to the single transmission streams. As long as the precoding matrix $\mathbf{C}$ is not updated, the interference grows continuously over time with the factor $(\omega_{21} t)^2$. From the above result, we can thus deduce a requirement for the maximum interval between updates of the precoding matrix $\mathbf{C}$ to achieve a desired SIR constraint $\gamma$ (for $P_1 = P_2$):

$$t = (\omega_{21})^{-1} \sqrt{2 \gamma^{-1}}. \tag{8.44}$$

The CFO difference $\omega_{21}$ can be related to the accuracy a achieved for the oscillators used at the transmitters after synchronization. With the carrier frequency $f_c$, the relation $\omega_{21} = 2\pi a f_c$ holds. To give an example, assume that a CoMP system operates at a carrier frequency $f_c$ = 2 GHz, synchronization of the transmitters has achieved an accuracy of a = 1 part per billion (ppb) = $10^{-3}$ parts per million (ppm), and the SIR target the system should not fall below is $\gamma$ = 20 dB. Then the update interval should be around 11 ms, which is in the order of the duration of only a few radio frames in modern wireless communication systems. This example clearly indicates that synchronization requirements for downlink CoMP precoding are very strict; however, we have seen in Section 8.1 that synchronization methods capable of meeting them are readily available.
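
The numerical example can be reproduced with a few lines of Python. This is a minimal sketch of (8.44) for equal stream powers; the function name and argument names are chosen here for illustration only.

```python
import math


def precoder_update_interval(accuracy: float, carrier_hz: float,
                             sir_target_db: float) -> float:
    """Time after which the mean SIR of (8.43) drops to the target,
    following (8.44) for P1 = P2 (hypothetical helper)."""
    omega_21 = 2.0 * math.pi * accuracy * carrier_hz   # CFO difference, rad/s
    gamma = 10.0 ** (sir_target_db / 10.0)             # SIR target, linear
    return math.sqrt(2.0 / gamma) / omega_21


# Example from the text: 1 ppb accuracy at 2 GHz, SIR target of 20 dB.
interval_s = precoder_update_interval(accuracy=1e-9, carrier_hz=2e9,
                                      sir_target_db=20.0)
print(f"update interval: {interval_s * 1e3:.2f} ms")
```

Since the interval scales inversely with the oscillator accuracy, relaxing the accuracy by a factor of ten shortens the permissible interval by the same factor.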


Inter-Carrier Interference in an OFDM system


To first get insights into the general characteristics of the ICI, we will derive an analytical expression for a single-user single-input single-output (SISO) channel, which is distorted by a single CFO $\Delta f$. Afterwards, we will use the obtained results to characterize the ICI in the downlink CoMP system.
We assume a (static) frequency-selective channel characterized by the discrete channel impulse response $\tilde{h}(l)$, $l \in \{0, \ldots, L-1\}$, where the discrete positions in time, l, are based on the sampling period $T_s$. Given a CFO of $\Delta f$, the time-varying channel seen at the receiver can be given in the time/delay domain according to

$$\tilde{h}(t, \tau) = \sum_{l=0}^{L-1} \tilde{h}(l) \exp(j 2\pi \Delta f t) \, \delta(\tau - l T_s), \tag{8.45}$$

where $\delta$ denotes the Dirac function. As derived in detail in [STJ08, Sch09], the corresponding OFDM channel with Q sub-carriers can be given according to

$$h(q, \Delta q) = \underbrace{\left[ \sum_{l=0}^{L-1} \tilde{h}(l) \exp\left( -j 2\pi l \frac{q}{Q} \right) \right]}_{h(q)} \cdot \exp(j \pi \Delta f Q T_s) \, \frac{\sin(\pi \Delta f Q T_s)}{\pi (\Delta f Q T_s - \Delta q)}, \tag{8.46}$$

with

$$q \in \mathcal{Q} := \{0, \ldots, Q-1\}, \qquad \Delta q \in \mathcal{D} := \{-Q/2 + 1, \ldots, Q/2\}. \tag{8.47}$$

According to this definition, $h(q, \Delta q)$ for $\Delta q \neq 0$ represents the interference that is imposed from the sub-carrier signal at position q on the sub-carrier that is spaced $\Delta q$ sub-carriers apart, whereby the sign of $\Delta q$ specifies the direction. For $\Delta q = 0$, however, $h(q, \Delta q)$ delivers the useful channel seen at sub-carrier q.
The first part of the expression in (8.46) represents the sub-carrier channel $h(q)$, which is seen at sub-carrier q if no CFO distortions are present. We can clearly see that the CFO reduces the amplitude of this useful channel ($\Delta q = 0$) and introduces an additional phase rotation according to the expression on the right-hand side. Accordingly, fractions of the signal power of the channel coefficient $h(q)$ are spread as ICI onto the other sub-carriers in the system ($\Delta q \neq 0$).
Based on the time-variant OFDM channel $h(q, \Delta q)$, the transmission equation for the received signal $y(q)$ at a fixed sub-carrier q can be given as

$$y(q) = h(q, 0) x(q) + \sum_{\Delta q \in \mathcal{D} \setminus 0} h(q - \Delta q, \Delta q) \, x(q - \Delta q) + n(q). \tag{8.48}$$

The first term in this expression represents the useful signal, while the sum term represents the distortions from ICI.

Figure 8.10 SIR degradation due to CFO-induced inter-carrier interference in a SISO-OFDM system according to (8.51). Axes: SIR [dB] versus $\Delta f \, Q \, T_s$.

As derived in detail in [Sch09], it can be shown that the mean power of the useful signal, $P_u$, and the mean power of the ICI, $P_{ICI}$, amount to

$$P_u = P_s \sigma_h^2 \, \mathrm{si}^2(\pi \Delta f Q T_s) \tag{8.49}$$

$$P_{ICI} = P_s \sigma_h^2 \left( 1 - \mathrm{si}^2(\pi \Delta f Q T_s) \right), \tag{8.50}$$

where $P_s$ is the mean power of the transmit signals, $\sigma_h^2$ is the total mean power of the channel and si(x) = sin(x)/x is the si-function. These results are based on the wide sense stationary uncorrelated scattering (WSSUS) assumption for the channel coefficients; further, it has been assumed that a signal with mean power $P_s$ has been transmitted on all available sub-carriers. The SIR resulting from the ICI can therefore be given as

$$\mathrm{SIR} = \frac{P_u}{P_{ICI}} = \frac{\mathrm{si}^2(\pi \Delta f Q T_s)}{1 - \mathrm{si}^2(\pi \Delta f Q T_s)}. \tag{8.51}$$

This SIR degradation caused by ICI is illustrated in Fig. 8.10.
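
As a plausibility check of (8.51), the following Python sketch evaluates the closed-form SIR and compares it with the leakage of a frequency-shifted sub-carrier through a Q-point DFT. The DFT-based variant assumes a flat channel and Q = 256 sub-carriers; both function names and these settings are illustrative assumptions, not part of the derivation in [Sch09].

```python
import cmath
import math


def si(x: float) -> float:
    """si(x) = sin(x) / x with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sir_ici_db(cfo_norm: float) -> float:
    """SIR of (8.51); cfo_norm is the CFO normalized to the sub-carrier
    spacing, i.e. Delta_f * Q * T_s."""
    useful = si(math.pi * cfo_norm) ** 2
    return 10.0 * math.log10(useful / (1.0 - useful))


def sir_ici_dft_db(cfo_norm: float, q_size: int = 256) -> float:
    """Same quantity from the DFT of a phasor rotating with the CFO
    (flat channel, unit power on every sub-carrier)."""
    useful = abs(sum(cmath.exp(2j * math.pi * cfo_norm * n / q_size)
                     for n in range(q_size)) / q_size) ** 2
    # By Parseval's relation the leaked power is the remainder of the total.
    return 10.0 * math.log10(useful / (1.0 - useful))


closed_form_db = sir_ici_db(0.05)
dft_based_db = sir_ici_dft_db(0.05)
print(f"closed form: {closed_form_db:.2f} dB, DFT based: {dft_based_db:.2f} dB")
```

Both values agree closely for a CFO of 5 % of the sub-carrier spacing, which is the operating point discussed for Fig. 8.10.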


ICI for CoMP Transmission in Downlink
To ease analytical derivations and allow for simple analytical expressions, we assume here that on each of the available OFDM sub-carriers, the same set of users is served by the same set of base stations.
In (8.46) we have seen that the ICI from sub-carrier q on a sub-carrier at a distance $\Delta q$ depends on the channel coefficient $h(q)$ at that sub-carrier. Therefore, we can conclude that these relations hold similarly for the effective channels in a CoMP system that include the spatial precoding weights. According to (8.34), the effective channel at sub-carrier q reads $\mathbf{H}(q)^H \mathbf{W}(q)$. As synchronization has to ensure that the inter-stream interference does not grow too large, the majority of the power of this effective channel can be found in its diagonal elements, see (8.38). Thus, for the ICI consideration, the inter-stream interference may be neglected and the effective channel may be understood as a set of independent, orthogonal SISO channels. The sub-carrier channel $h(q)$ seen by the first receiving UE is thus the effective SISO channel

$$h(q) = \mathbf{h}_1^T(q) \, \mathbf{c}_1(q) \, p_1(q) \approx p_1(q). \tag{8.52}$$

Before inserting this effective channel into (8.46) to obtain the useful channel and the ICI channels seen by the first receiving UE, some considerations on the CFOs $\Delta f_m$ are necessary: As both CFOs $\Delta f_m$ have an influence on the exact value of the ICI, we can use their maximum $\Delta f = \max_m \Delta f_m$ to upper bound it. Based on this, (8.46) can be directly translated for the effective channel seen by the first UE to

$$h(q, \Delta q) = p_1(q) \exp(j \pi \Delta f Q T_s) \, \mathrm{si}\left( \pi (\Delta f Q T_s - \Delta q) \right). \tag{8.53}$$

If we assume that the mean power of $p_1(q)$ is identical for all sub-carriers, the bound on the ICI resulting from (8.51) is found to be valid also for the CoMP downlink.
By comparing the SIR for the inter-stream interference from (8.43) and for the ICI from (8.51), we immediately see that the former degrades with the time t, while the latter only depends on the absolute value of the CFO $\Delta f$ normalized to $(Q T_s)^{-1}$, which is, in fact, the sub-carrier spacing. Therefore, the requirement on the synchronization accuracy derived from the SIR for the inter-stream interference is orders of magnitude stricter than that derived from the SIR for the ICI. Resorting to our example from the preceding subsection, the CFO $\Delta f$ to achieve an SIR of 20 dB should be below 5 % of the sub-carrier spacing. If the sub-carrier spacing is 15 kHz, as in current OFDM-based mobile radio systems, the maximum allowed CFO would be 750 Hz. Compared to the maximum allowed CFO of only a few Hertz that is required when the precoding matrices are updated every few milliseconds, this difference amounts to a factor larger than 100. From these considerations, it may be concluded that the ICI in OFDM systems does not play a significant role for the synchronization of the CoMP downlink, and therefore its influence may be neglected.
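
The two requirements can be put side by side numerically. The sketch below is an illustration under the assumptions of the running example (15 kHz sub-carrier spacing, 2 GHz carrier, 1 ppb oscillator accuracy, 20 dB SIR target); the bisection search and all names are choices made here and not part of the original analysis.

```python
import math


def si(x: float) -> float:
    """si(x) = sin(x) / x with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sir_ici_db(cfo_norm: float) -> float:
    """SIR of (8.51) for a CFO normalized to the sub-carrier spacing."""
    useful = si(math.pi * cfo_norm) ** 2
    return 10.0 * math.log10(useful / (1.0 - useful))


def max_cfo_for_ici_hz(sir_target_db: float, spacing_hz: float) -> float:
    """Largest CFO that still meets the SIR target of (8.51), found by
    bisection; the SIR falls monotonically for normalized CFOs in (0, 0.5]."""
    low, high = 1e-6, 0.5
    for _ in range(60):
        mid = 0.5 * (low + high)
        if sir_ici_db(mid) > sir_target_db:
            low = mid
        else:
            high = mid
    return low * spacing_hz


ici_limit_hz = max_cfo_for_ici_hz(sir_target_db=20.0, spacing_hz=15e3)
stream_limit_hz = 1e-9 * 2e9      # CFO difference at 1 ppb and 2 GHz
print(f"ICI-driven CFO limit:          {ici_limit_hz:.0f} Hz")
print(f"inter-stream-driven CFO limit: {stream_limit_hz:.0f} Hz")
print(f"ratio: {ici_limit_hz / stream_limit_hz:.0f}")
```

The ratio of the two limits confirms that, in this example, the ICI requirement is looser by more than two orders of magnitude.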
Techniques for the Compensation of Degradation Effects
Unfortunately, there are no methods available to compensate for the degradation effects due to inter-stream interference at the side of the receiver. The reason is that the dimension of the spatial receive space in the downlink is usually much smaller than the dimension of the spatial transmit space. For an illustration, consider the following: If M BSs with $N_{bs}$ transmit antennas each cooperate, they are able to form up to $M N_{bs}$ orthogonal transmit beams. If the BSs are not properly synchronized, the beams will lose their orthogonality and will thus interfere at each UE. Even if the UE has multiple antennas, it will not be able to resolve the interfering beams, which rules out suitable approaches for proper compensation.


The only remaining solution is to compensate for CFO distortions at the side where they appear, that is, the transmitter. However, this can only be accomplished if knowledge of the CFOs is available there. In [Zar08], it has been proposed that all UEs involved in the CoMP transmission estimate the CFOs and then feed back their estimates to the BSs. Techniques to estimate these CFOs with high accuracy have also been proposed there. However, this approach has the drawback that it requires computationally complex estimators at the UEs and a continuous feedback of the estimates from all UEs. Moreover, it is questionable whether the proposed technique can achieve a better synchronization accuracy in practice than techniques for the synchronization between BSs do (see Section 8.1). To summarize, we can conclude here that tight synchronization between the BSs is of fundamental importance to enable reliable CoMP transmission in the downlink, which calls for inter-BS synchronization techniques achieving the best accuracy possible.

8.3.2 Uplink Analysis
Inter-Stream Interference of Spatially Precoded Streams
The transmission equation for the uplink can be given as

$$\mathbf{y} = \boldsymbol{\Phi}_r(t) \, \mathbf{H} \, \boldsymbol{\Phi}_t(t) \, \mathbf{P} \mathbf{x} + \mathbf{n}. \tag{8.54}$$

To point out the duality to the downlink, we consider the CFO distortion matrices for both the transmitter and the receiver, $\boldsymbol{\Phi}_t(t)$ and $\boldsymbol{\Phi}_r(t)$, here. The diagonal matrix $\mathbf{P}$ now consists of the elements $p_k = \sqrt{P_k}$ only, so it reduces to a simple power allocation matrix. To recover the transmitted signals, the cooperating BSs jointly equalize the received signal $\mathbf{y}$:

$$\hat{\mathbf{x}} = \mathbf{W}^H \mathbf{y} = \mathbf{W}^H \big( \underbrace{\boldsymbol{\Phi}_r(t) \, \mathbf{H} \, \boldsymbol{\Phi}_t(t)}_{\tilde{\mathbf{H}}(t)} \mathbf{P} \mathbf{x} + \mathbf{n} \big). \tag{8.55}$$

As all CFO distortions can be continuously tracked at the receiving BSs, the effective matrix $\tilde{\mathbf{H}}(t)$ can be compensated completely if we choose the equalization matrix at time instant t according to

$$\mathbf{W}^H(t) = \boldsymbol{\Phi}_t^{*}(t) \, \mathbf{H}^{-1} \, \boldsymbol{\Phi}_r^{*}(t). \tag{8.56}$$

However, note that this requires a continuous update of the equalization matrix on a per-symbol basis. If we assume, for some reason, that such a continuous update cannot be conducted and equalization is performed with the matrix $\mathbf{W}^H(t) = \mathbf{H}^{-1}$ instead, then the situation is very similar to the dual downlink case considered in (8.34), and correspondingly we obtain the same SIR expression as derived in (8.43) for the two-user case. (Note that the CFOs in matrix $\boldsymbol{\Phi}_t(t)$ do not have any effect on the SIR, as they only impose a phase rotation on desired and interfering signal. This phase rotation, however, can always be compensated at the receiver by phase tracking after channel equalization.)
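
The difference between the tracked equalizer of (8.56) and the static equalizer can be illustrated with a small numerical check. The following Python sketch uses an arbitrary 2x2 example channel and arbitrary CFO values, all of which are assumptions for illustration; the helper names are hypothetical.

```python
import cmath


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def cfo_matrix(omegas, t, conjugate=False):
    """Diagonal CFO rotation matrix for two antennas, or its conjugate."""
    sign = -1.0 if conjugate else 1.0
    return [[cmath.exp(sign * 1j * omegas[0] * t), 0.0],
            [0.0, cmath.exp(sign * 1j * omegas[1] * t)]]


def residual_channel(h, omegas_r, omegas_t, t, tracked):
    """W^H(t) times the distorted channel Phi_r(t) H Phi_t(t)."""
    distorted = matmul(matmul(cfo_matrix(omegas_r, t), h),
                       cfo_matrix(omegas_t, t))
    if tracked:   # equalizer of (8.56), updated for time instant t
        w_h = matmul(matmul(cfo_matrix(omegas_t, t, True), inverse(h)),
                     cfo_matrix(omegas_r, t, True))
    else:         # static equalizer W^H = H^-1
        w_h = inverse(h)
    return matmul(w_h, distorted)


# Arbitrary example values (hypothetical): CFOs in rad/s, t in seconds.
h = [[0.8 + 0.3j, -0.4 + 0.5j], [0.2 - 0.6j, 0.9 + 0.1j]]
omegas_r, omegas_t, t = (0.0, 12.6), (300.0, -500.0), 5e-3
tracked = residual_channel(h, omegas_r, omegas_t, t, tracked=True)
static = residual_channel(h, omegas_r, omegas_t, t, tracked=False)
tx_only = residual_channel(h, (0.0, 0.0), omegas_t, t, tracked=False)
print("tracked leakage:", abs(tracked[0][1]))
print("static leakage: ", abs(static[0][1]))
print("static leakage with transmit CFOs only:", abs(tx_only[0][1]))
```

With the tracked equalizer the residual channel is the identity. With the static equalizer, leakage appears only through the receiver-side CFOs, while the transmit-side CFOs merely rotate the phase of each stream, in line with the parenthetical note above.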


In any case, the synchronization requirements will be by far not as strict as in the downlink case, as the compensation matrix $\mathbf{W}^H$ can be updated much more frequently, say at least every sub-frame. If a continuous update of the compensation matrix is conducted on a per-symbol basis, then the requirement for the synchronization accuracy rather comes from the ICI conditions, which are addressed in the next paragraph.
Inter-Carrier Interference in an OFDM System
As for the downlink analysis, we assume here that on each of the available OFDM sub-carriers the same set of users transmits to the same set of base stations. Within this subsection, we refer again to the exemplary case of two BSs and two UEs, all being equipped with one antenna each.
For the ICI investigations, it is reasonable to assume that the CFO distortion effects occurring at the side of the UEs dominate those at the side of the BSs, as it is much easier to establish a tight synchronization between BSs than between UEs. Therefore, we neglect matrix $\boldsymbol{\Phi}_r(t)$ in (8.54) for our considerations here. The sub-carrier channel at sub-carrier q then reads $\mathbf{H}(q) \boldsymbol{\Phi}_t(t)$. Correspondingly, the channel function for the OFDM system in (8.46) can be updated for the CoMP uplink as follows:

$$\mathbf{H}(q, \Delta q) = [\mathbf{h}_1(q) \;\; \mathbf{0}] \exp(j \pi \Delta f_1 Q T_s) \, \mathrm{si}\left( \pi (\Delta f_1 Q T_s - \Delta q) \right) + [\mathbf{0} \;\; \mathbf{h}_2(q)] \exp(j \pi \Delta f_2 Q T_s) \, \mathrm{si}\left( \pi (\Delta f_2 Q T_s - \Delta q) \right). \tag{8.57}$$

We observe here that the two different CFOs $\Delta f_1$ and $\Delta f_2$ generate independent ICI from the channel vectors $\mathbf{h}_1(q)$ and $\mathbf{h}_2(q)$, respectively, which are simply superimposed. The transmission equation given for the SISO case in (8.48) is modified to match the CoMP uplink according to

$$\mathbf{y}(q) = \mathbf{H}(q, 0) \mathbf{x}(q) + \underbrace{\sum_{\Delta q \in \mathcal{D} \setminus 0} \mathbf{H}(q - \Delta q, \Delta q) \, \mathbf{x}(q - \Delta q)}_{\text{ICI}} + \mathbf{n}(q). \tag{8.58}$$

If we now jointly equalize the received signal $\mathbf{y}(q)$ at sub-carrier q by using the equalizer matched to the corresponding sub-carrier channel, i.e. $\mathbf{W}^H(q) = \boldsymbol{\Phi}_t^{*}(t) \mathbf{H}^{-1}(q)$, we observe that this equalizer will not be able to suppress any of the ICI in (8.58), because the equalizer is not properly matched to any of the interference channels $\mathbf{H}(q - \Delta q, \Delta q)$. (This relation holds as long as the transmission channels are of frequency-selective nature and the channels between different antenna pairs are mutually independent.) As a result, the ICI from all simultaneously transmitted data streams will affect the useful signal $x_k(q)$ after equalization. Hence, the mean ICI power affecting the useful signal at any sub-carrier q will for the CoMP uplink amount to

$$P_{ICI} = P_1 \left( 1 - \mathrm{si}^2(\pi \Delta f_1 Q T_s) \right) + P_2 \left( 1 - \mathrm{si}^2(\pi \Delta f_2 Q T_s) \right). \tag{8.59}$$

For the power of the useful signal, conditions remain the same as in the SISO case, i.e. according to (8.51) we obtain for the SIR of the useful signal $\hat{x}_k(q)$

$$\mathrm{SIR}(\hat{x}_k) = \frac{P_k \, \mathrm{si}^2(\pi \Delta f_k Q T_s)}{P_1 \left( 1 - \mathrm{si}^2(\pi \Delta f_1 Q T_s) \right) + P_2 \left( 1 - \mathrm{si}^2(\pi \Delta f_2 Q T_s) \right)}. \tag{8.60}$$

Compared to the SIR conditions valid for the CoMP downlink in (8.51), we clearly see that the major difference for the uplink lies in the fact that the ICI from all simultaneously transmitted data streams affects the SIR of each desired signal. The ICI power in the uplink thus scales with the number K of simultaneously transmitting UEs. Considering that it is more difficult to establish a tight synchronization between different UEs, compensation techniques for ICI may become an issue in the uplink.
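
The scaling of the ICI with the number of UEs can be made concrete with a short Python sketch of (8.60). Extending the sum in the denominator from two to K users is an assumption made here, in line with the remark that the ICI power scales with K; the function names are illustrative.

```python
import math


def si(x: float) -> float:
    """si(x) = sin(x) / x with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def uplink_sir_db(k, powers, cfo_norm):
    """SIR of stream k after joint equalization, following (8.60).

    powers   -- transmit powers P_1, ..., P_K of the UEs
    cfo_norm -- CFOs of the UEs normalized to the sub-carrier spacing
    """
    useful = powers[k] * si(math.pi * cfo_norm[k]) ** 2
    ici = sum(p * (1.0 - si(math.pi * e) ** 2)
              for p, e in zip(powers, cfo_norm))
    return 10.0 * math.log10(useful / ici)


single_user_db = uplink_sir_db(0, [1.0], [0.05])
two_users_db = uplink_sir_db(0, [1.0, 1.0], [0.05, 0.05])
print(f"one UE:  {single_user_db:.2f} dB")
print(f"two UEs: {two_users_db:.2f} dB")
```

For two UEs with equal power and equal CFO magnitude, the ICI power doubles, so the SIR of each stream is about 3 dB below the single-user value of (8.51).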
Remark: The brief analysis presented here points out only the most important aspects of the ICI effects in the OFDM-based CoMP uplink. For a more detailed investigation, the interested reader is referred to [SJ09].
Techniques for Compensation of Degradation Effects
As mentioned already in the corresponding subsection, compensation of the degradation effects due to inter-stream interference is simple, as we only have to update the equalization matrix (8.56) continuously. However, we have pointed out above that in the uplink the ICI may be a more serious problem, as the CFOs caused by the low-cost oscillators at the UEs may become much larger than those caused by the oscillators of the BSs. Although compensation of the ICI distortions on the side of the receiver is possible, it turns out to be a complex task. For an overview of existing compensation techniques for CFO-induced ICI, refer to [MKP07, SJ09].
In general, the OFDM channel including the CFO distortions can be represented as a large square matrix of dimension $Q N_{bs} \times Q N_{bs}$, which has a band structure with all elements within this band being non-zero. By zero-forcing this huge channel matrix, the channel including all CFO distortions can be fully compensated. However, this would obviously imply an enormous computational effort. To cut this effort down, several approaches have been proposed that exploit the specific structure of the large channel matrix to divide it into a set of smaller sub-matrices. The remaining computational effort is still considerable, though. Some solutions have also been proposed that try to maintain the sub-carrier-wise processing for which OFDM systems are favored. Although they are not capable of removing the CFO-induced ICI completely, they can compensate for a large amount of it, improving the SIR conditions significantly. One such approach was presented in [SJ09]; the signal processing for the compensation process is depicted in Fig. 8.11: The received signals $r_m$ at antenna m are individually transformed to the frequency domain, where the sub-carrier-wise channel equalization with matrix $\mathbf{G}^H = \mathbf{H}^{-1}$ is applied, which separates the simultaneously transmitted signal streams. After signal separation, the equalized symbols of each stream are convolved in the frequency domain with a function derived from (8.46) to compensate for the CFO distortions. This method fully compensates for the CFO-induced ICI of the transmit symbols; however, as shown analytically in [SJ09], it cannot compensate for the ICI that results from the CFO-induced violation of the periodic property of the cyclic prefix, which is used for OFDM transmission.

Figure 8.11 Receiver processing for simplified signal reconstruction with CFO compensation in uplink.

8.3.3 Summary
In this section, we analyzed the inter-stream interference and the ICI induced by CFO distortions for the downlink and uplink of an OFDM-based CoMP system consisting of two BSs and two UEs, each equipped with a single antenna. For the downlink, it has been shown that the SIR conditions derived for the inter-stream interference set strict requirements for the synchronization accuracy, while the effect of ICI can be neglected. As compensation for CFO distortions at the receivers is not possible, tight synchronization between the simultaneously transmitting BSs is mandatory. In the uplink, we can fully compensate for the inter-stream interference if the CFO distortions are continuously tracked. Therefore, requirements for the synchronization accuracy are rather deduced from the SIR conditions resulting from the ICI. The ICI distortion may still be compensated at the receiver; however, the additional computational complexity required for this purpose is considerable.

Channel Knowledge

In this chapter, we address the issue of how channel knowledge, referring to both desired channels and the channels towards interferers, needed for various CoMP schemes can be made available where it is needed. We first investigate channel estimation techniques at the receiver side in Section 9.1, and then discuss how the obtained channel knowledge can be efficiently fed back to the transmitter side in Section 9.2, which is for example a crucial requirement for the downlink CoMP schemes investigated in Sections 6.3 and 6.4. The chapter shows that standard channel estimation and feedback concepts can in principle be extended to enable CoMP in general. However, it also becomes apparent that large CoMP cooperation sizes may be considered questionable in practice, due to the fact that weak links cannot be estimated accurately, and the involved pilot and channel state information (CSI) feedback overhead may become prohibitive.

9.1 Channel Estimation for CoMP
Wolfgang Zirwas, Lars Thiele, Tobias Weber, Nico Palleit and Volker Jungnickel
One of the main challenges for CoMP schemes like joint transmission (JT) is
to obtain accurate channel information in a multi-cell mobile radio environment
with acceptable overhead for pilot signals.
The section is structured as follows. In Subsection 9.1.1, main characteristics of
the mobile radio channel and state-of-the-art estimation and interpolation techniques like Wiener filtering will be introduced, with a special focus on channel
prediction. For CoMP, the analysis then has to be extended to multiple channel components and multi-cell scenarios, which will be done in Subsections 9.1.2
and 9.1.3, respectively. While most of the section focuses on downlink transmission and hence channel estimation at the terminal side, Subsection 9.1.4 finally
looks into specific aspects of uplink channel estimation.
In this section, we consider an orthogonal frequency division multiplex
(OFDM) system, as used in LTE Release 8 and beyond, having the main
benefit of a very effective orthogonalization of resources in time and frequency.
This simplifies the implementation of complex precoding or equalization schemes
in the context of multiple-input multiple-output (MIMO) or CoMP, as demonstrated
by the first CoMP test beds or demonstrators described in Chapter 13, which are all
OFDM implementations.

9.1.1 Channel Estimation - Single Link

CoMP setups are basically distributed multi-user MIMO (MU-MIMO) systems
involving a large number of radio channel components. However, before analyzing
CoMP including many channel components, one has to understand fundamental
aspects of channel estimation like optimal filtering, sampling or interpolation for
a single channel component, which is the subject of this first subsection.
Note that the structure of this first part is motivated by a proper understanding of the physical aspects of channel estimation based on the time domain
channel impulse response (CIR). The starting point is the multi-path channel itself,
eventually leading to the well-known reference signal design and receiver processing blocks for OFDM systems. The description does not follow exactly the
transmit and receive chain of an OFDM system, but rather gives the reader the
underlying reasoning, e.g. for standardized LTE Release 8 parameters, as well as
the potential for further optimizations. Channel prediction - as one such possible
optimization - will be introduced in more detail, as it is currently an active CoMP
research topic promising to overcome the problem of outdated time-variant channel information.
The Channel Transfer Function
Fig. 9.1 illustrates a typical simplified mobile radio channel comprising a direct
and several reflected multi-path components (MPCs). According to Fig. 9.1(b),
$\tilde{h}(t,\tau) \in \mathbb{R}$ is the resulting CIR varying over time $t$, given as

$$\tilde{h}(t,\tau) = \sum_{n} \alpha_n(t)\,\delta(\tau - \tau_n(t)), \quad \text{with } \tau_n(t) = d_n(t) - d_0(t), \qquad (9.1)$$

where $n \in \{1, \ldots, N_m\}$ are the relevant MPC indices, $\alpha_n(t)$ the amplitude of the
n-th MPC, and $d_0(t)$ is the delay connected to the shortest MPC. Note that
we here use $\tilde{h}$ with a tilde, as we are observing a channel in time domain, as
opposed to $h$, which always refers to a channel coefficient in frequency domain
throughout this book. The length of the CIR $\tilde{h}(t,\tau)$ is $T_{\mathrm{CIR}} = \tau_{N_m} - \tau_1$, which
is directly related to the path length difference $\Delta s$ of the shortest and longest
MPC. For example, a value of $T_{\mathrm{CIR}}$ of 1 µs corresponds to $\Delta s = 1\,\mu\mathrm{s} \cdot c = 300$ m,
with $c$ being the velocity of light of $3 \cdot 10^8$ m/s. Fig. 9.1(c) illustrates the ideally
infinite¹ channel transfer function (CTF) $h(t,f) \in \mathbb{C}$, i.e. the frequency domain
¹ In reality, one can observe a wideband similarity of the radio channel over several 100 MHz,
allowing downlink beamforming based on uplink covariance estimation in frequency division
duplex (FDD) systems.

(a) Illustration of multi-path components.

(b) Development of the CIR over time.

(c) Illustration of down-conversion.

(d) Illustration of undersampling.

Figure 9.1 Typical multi-path radio channel.

representation of $\tilde{h}(t,\tau)$ over frequency $f$:

$$h(t,f) = \sum_{n} \alpha_n(t)\, e^{-j 2\pi f \tau_n(t)}. \qquad (9.2)$$

Any real-valued time-domain signal $s(t)$ requires that its frequency-domain
representation fulfills $s(f) = s^{*}(-f)$, leading to positive and negative complex
conjugate frequency components. Direct conversion will generally shift the frequency bands with positive and negative frequencies together from baseband to
the radio frequency (RF) band, resulting in the so-called lower and upper sidebands
plus additionally the carrier signal at RF frequency $f_c$. Upper and lower sidebands contain basically the same information twice, and therefore for single side
band (SSB) transmission the upper or lower sideband is filtered out, thereby
doubling the spectral efficiency. Standard transmitters and receivers according
to the LTE Release 8 specification are based on in-phase and quadrature modulators/demodulators, which generate/demodulate the SSB signal directly without
filtering and even avoid the carrier signal. The CTF of the baseband signal
$h_{\mathrm{BB}}(t,f)$ is a down-converted copy of the generally complex-valued CTF of the
SSB RF-band:

$$h_{\mathrm{BB}}(t,f) = \mathrm{BPF}(f) \cdot h(t,f)\, e^{-j 2\pi f_c t}. \qquad (9.3)$$

For simplicity, we will assume in the following an ideal rectangular bandpass
filter $\mathrm{BPF}(f) = \mathrm{rect}\!\left(\frac{f - (f_c + B/2)}{B}\right)$ of bandwidth $B$, corresponding in time domain
to the well-known SI-function representing $\sin(x)/x$.
Analog to digital conversion (ADC) leads to $h_{\mathrm{BB}}(t,q) \in \mathbb{C}$, the sampled CTF
of $h_{\mathrm{BB}}(t,f)$ at frequency bins $q \cdot \Delta f$, $q \in \{1, \ldots, Q\}$ with $Q \cdot \Delta f = B$, the bandwidth of the baseband signal. In the sequel, we will write the sampled CTF in
vector notation as $\mathbf{h}_{\mathrm{BB}}(t) = [h_{\mathrm{BB}}(t,1) \ldots h_{\mathrm{BB}}(t,Q)]^T \in \mathbb{C}^{[Q \times 1]}$. Note that this is
different from the notation used in most parts of the book, where channel coefficient vectors reflect the spatial signal domain, but now we use these to reflect
a frequency dimension (i.e. sub-carrier indices).
The equidistantly sampled CIR $\tilde{h}(t,i) \in \mathbb{C}$ with $i \in \{1, \ldots, Q\}$ - or $\tilde{\mathbf{h}}(t) \in \mathbb{C}^{[Q \times 1]}$
in vector notation where the elements represent sample indices - is
obtained by an inverse fast Fourier transform (IFFT) of $h_{\mathrm{BB}}(t,q)$ with the fast
Fourier transform (FFT) matrix $\mathbf{F} \in \mathbb{C}^{[Q \times Q]}$:

$$\tilde{\mathbf{h}}(t) = \frac{1}{Q}\,\mathbf{F}\,\mathbf{h}_{\mathrm{BB}}(t), \quad \text{with } f_{i,q} = e^{j 2\pi i q / Q},\; i, q \in \{1, \ldots, Q\}, \qquad (9.4)$$

where the maximum time duration of the CIR $\tilde{\mathbf{h}}(t)$, i.e. the time duration of one
FFT window, is $T = 1/\Delta f = Q \cdot \Delta t$ with sampling time $\Delta t = T/Q = 1/B$. Note
that, as explained before, we do not strictly follow the structure of an OFDM
transmission chain, but use the time domain CIR as the basis for describing some
main aspects of channel estimation.
For OFDM systems, a guard interval is inserted in the form of a cyclic prefix
(CP), where the first receive samples within the CP of length $T_g = L \cdot \Delta t$ from
indices $-L + 1$ to $0$ are a copy of the samples from $Q - L + 1$ to $Q$ and are
not evaluated at the receiver side. OFDM in combination with a CP results
in a circular convolution of the signal with the channel. In case of a properly
designed CP with a length longer than the maximum expected length of the CIR,
i.e. $T_{\mathrm{CIR,max}} = \tau_{N_m} \le T_g$, inter-symbol interference (ISI) between consecutive
OFDM symbols and inter-carrier interference (ICI) can be avoided completely.
As a direct consequence, the samples $i \in \{L + 1, \ldots, Q\}$ of a CIR $\tilde{h}(t,i)$
fulfilling the above design criteria for $T_g$ can be assumed to be zero or at
least below a certain threshold. The relevant part of the shortened CIR is
$\tilde{\mathbf{h}}_s(t) = [\tilde{h}(t,0), \ldots, \tilde{h}(t,L-1)]^T \in \mathbb{C}^{[L \times 1]}$. Note that in the case of LTE, the
length of an OFDM symbol is $T = T_{\mathrm{symbol}} = 66.7$ µs, and that of the normal CP
is 4.69 µs. This corresponds to a maximum MPC path length difference between
longest and shortest path of $\Delta l = 4.69 \cdot 300\,\mathrm{m} \approx 1400$ m for ISI-free transmission.
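
The relation between guard interval and tolerable path length difference can be checked with a few lines of Python. This is only a sketch of the arithmetic quoted above; the function name and constants are chosen here for illustration and are not taken from the book.

```python
# Speed of light as used in the text (3 * 10^8 m/s).
C_LIGHT_M_PER_S = 3.0e8


def max_path_difference_m(guard_interval_s: float) -> float:
    """Largest path length difference (metres) whose excess delay still fits into the guard interval."""
    return guard_interval_s * C_LIGHT_M_PER_S


NORMAL_CP_S = 4.69e-6  # LTE normal cyclic prefix as quoted in the text
delta_l = max_path_difference_m(NORMAL_CP_S)
print(f"{delta_l:.0f} m")
```

The same function reproduces the earlier example of a 1 µs CIR corresponding to a 300 m path difference.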
Undersampling in Frequency Domain
Undersampling of a CTF by a factor of $\Delta f_{\mathrm{RS}}/\Delta f$ reduces the length of the IFFT
converted time domain signal from $T = Q \cdot \Delta t = 1/\Delta f$ to $T_{\mathrm{RS}} = L_{\mathrm{RS}} \cdot \Delta t = 1/\Delta f_{\mathrm{RS}}$,
meaning that only the first $L_{\mathrm{RS}}$ samples of the CIR are reconstructed,
while the sampling rate $1/\Delta t$ is unaffected (see Fig. 9.1(d)). As a consequence,
a CIR $\tilde{\mathbf{h}}_s(t)$ of length $T_{\mathrm{CIR}} \le T_g$ can be fully reconstructed even for the above
mentioned undersampled frequency domain signal $h(t,q)$. This can be and - in case
of LTE Release 8 - is being used for an efficient design of so-called pilots or
reference signals (RSs), as will be explained later.
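
The following sketch illustrates the undersampling argument numerically in the noise-free case. It assumes a toy system with Q = 64 sub-carriers, an RS on every fourth sub-carrier and a CIR confined to the first 16 samples; all values, the zero-based indexing and the helper `dft` are illustrative choices, not part of the book.

```python
import cmath


def dft(x, sign):
    """Plain O(N^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


Q = 64             # number of sub-carriers (illustrative)
N_RS = 4           # every N_RS-th sub-carrier is sampled
L_RS = Q // N_RS   # number of CIR samples that can be reconstructed

# CIR with 6 non-zero taps, all within the first L_RS samples.
cir = [0j] * Q
for tap, gain in [(0, 1.0), (1, 0.5 - 0.3j), (3, -0.4j), (5, 0.2 + 0.1j), (9, 0.1), (12, -0.05j)]:
    cir[tap] = gain

ctf = dft(cir, -1)                              # full CTF on all Q sub-carriers
ctf_rs = ctf[::N_RS]                            # undersampled CTF (RS positions only)
cir_rec = [v / L_RS for v in dft(ctf_rs, +1)]   # L_RS-point inverse DFT

# Interpolate: zero-pad the reconstructed CIR to Q samples and transform back.
ctf_interp = dft(cir_rec + [0j] * (Q - L_RS), -1)
max_err = max(abs(a - b) for a, b in zip(ctf, ctf_interp))
print(f"max interpolation error: {max_err:.2e}")
```

If a tap were placed beyond sample L_RS - 1, it would alias onto an earlier tap and the reconstruction would no longer be exact.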
Receive Filter
In case of a rectangular bandpass RF (or baseband) filter $\mathrm{BPF}(f)$, the Dirac
functions $\delta(\tau - \tau_n(t))$ of the MPCs will be convolved with the SI-function
$\mathrm{sinc}(t/\Delta t)$, so that each tap of the CIR $\tilde{h}(t,i)$ is a superposition of different
MPCs. Note that in the general case, the MPC delays $\tau_n(t)$ will not coincide with
the sampling timing $i \cdot \Delta t$. Hence, from the measured, quantized and potentially
shortened CIR $\tilde{\mathbf{h}}_s(t)$, one cannot directly derive the real channel MPC delays
$\tau_n$ and amplitudes $\alpha_n$. Accurate knowledge would be desirable - e.g. for channel
prediction - as the superposition of MPCs affects the further evolution of a tap.
Time-varying Radio Channel
Mobile radio channels are time-variant due to movements of the user equipments
(UEs) themselves, moving objects in the environment, or time-variant scatterers.
From Eq. (9.1) and Fig. 9.1(b), the main effects on the CIR for a moving UE
are clearly visible, i.e. at time $t + \Delta t$, the delays of the MPCs will have changed
from $\tau_n(t)$ to $\tau_n(t + \Delta t)$. The delay values $\tau_n(t + \Delta t)$ compared to $\tau_n(t)$ are
determined by the variation of the corresponding path lengths between transmit
and receive antennas, which increase or decrease dependent on the relative - for
downlink (DL) - incident angles at the moving UE for particular MPCs.
Estimation in Time and Frequency - Two-Dimensional Wiener Filter
Mobile radio OFDM or single carrier frequency division multiple access
(SC-FDMA) systems like LTE Release 8 have been designed for training based
channel estimation, i.e. they rely on predefined and standardized pilots or RSs.
To save overhead, RSs are placed only on every $n_{\mathrm{RS}}$-th sub-carrier, where
$n_{\mathrm{RS}} = \Delta f_{\mathrm{RS}}/\Delta f$ as introduced above for undersampling of the radio CTF. In
general, RSs may be allocated to any sub-carrier, but in practical systems regular
sampling of the CTF $h(t,q)$ is most common.
The transmission of a signal vector $\mathbf{s} \in \mathbb{C}^{[Q \times 1]}$ having zero mean
over a mobile radio channel $\mathbf{h}_{\mathrm{BB}}(t)$ in frequency domain can be written as

$$\mathbf{y} = \mathbf{h}_{\mathrm{BB}}(t) \odot \mathbf{s} + \mathbf{n}, \qquad (9.5)$$

where $\odot$ denotes element-wise multiplication, and $\mathbf{n} \in \mathbb{C}^{[Q \times 1]}$ is a zero mean
additive i.i.d. Gaussian noise vector with $E\{\mathbf{n}\mathbf{n}^H\} = \sigma^2 \mathbf{I}$. As mentioned above,
every $n_{\mathrm{RS}}$-th sub-carrier in $\mathbf{s}$ carries a known RS. All these RSs are collected in
the reduced length reference signal vector $\mathbf{s}_{\mathrm{RS}} \in \mathbb{C}^{[(Q/n_{\mathrm{RS}}) \times 1]}$. Undersampling -
at least in the noise-free case - allows full reconstruction of a sampled CIR $\tilde{h}_s(t,i)$
of a length smaller than or equal to $L$, while the overhead for RSs is limited to $Q/n_{\mathrm{RS}}$.

Frequency-spaced RSs with undersampling factor $1/n_{\mathrm{RS}}$ reduce processing overhead for channel estimation, and one can introduce

$$[\mathbf{F}_{\mathrm{RS}}]_{i',q'} = e^{j 2\pi i' q' / Q}, \quad i' \in \{1, \ldots, L, \ldots, Q/n_{\mathrm{RS}}\},\; q' \in \{1, \ldots, Q/n_{\mathrm{RS}}\}. \qquad (9.6)$$

Here, $\mathbf{F}_{\mathrm{RS}} \in \mathbb{C}^{[Q/n_{\mathrm{RS}} \times Q/n_{\mathrm{RS}}]}$ is the row- and column-reduced inverse discrete
Fourier transform (IDFT) matrix $\mathbf{F}$, and $\mathbf{y}_{\mathrm{RS}}(t)$ is the row-reduced receive vector
for the $Q/n_{\mathrm{RS}}$ sub-carriers carrying RSs:

$$\hat{\tilde{\mathbf{h}}}(t) = \mathbf{F}_{\mathrm{RS}}\,\mathbf{y}_{\mathrm{RS}}(t), \qquad (9.7)$$

where $\hat{\tilde{\mathbf{h}}}(t) \in \mathbb{C}^{[Q/n_{\mathrm{RS}} \times 1]}$ is the noisy estimate of the CIR $\tilde{\mathbf{h}}(t)$. For $\hat{\tilde{\mathbf{h}}}_s(t)$, the
elements of $\hat{\tilde{\mathbf{h}}}$ with indices $L + 1$ to $Q/n_{\mathrm{RS}}$ can be set to zero, based on the
assumption that the CIR is limited to $L$ taps and the taps $L + 1, \ldots, Q/n_{\mathrm{RS}}$
carry only noise or at least very low power. This zero setting of elements
or taps is called denoising and improves the signal-to-interference ratio (SIR) of
the estimated CTF after conversion of the CIR into frequency domain.

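$$\hat{\tilde{\mathbf{h}}}_s(t) = \big[\underbrace{\hat{\tilde{h}}(t,1), \ldots, \hat{\tilde{h}}(t,L)}_{\text{excess delay}}, 0, \ldots, 0\big]^T. \qquad (9.8)$$

The shortened CIR $\hat{\tilde{\mathbf{h}}}_s(t)$ can be calculated based on the row-reduced matrix
$\mathbf{F}'_{\mathrm{RS}} = \mathbf{F}_{\mathrm{RS}}|_{u=1 \ldots L,\, q=1 \ldots Q}$. We finally calculate $\hat{\mathbf{h}}(t)$ as the estimated and interpolated CTF for all $Q$ sub-carriers from the noisy time-domain estimate $\hat{\tilde{\mathbf{h}}}_s$:

$$\hat{\mathbf{h}}(t) = \mathbf{F}^H\, \hat{\tilde{\mathbf{h}}}_s(t). \qquad (9.9)$$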

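It is also possible to generate $\hat{\mathbf{h}}(t)$ directly from $\mathbf{y}_{\mathrm{RS}}$ by applying the interpolation matrix $\mathbf{F}_{\mathrm{int}} \in \mathbb{C}^{[Q \times Q/n_{\mathrm{RS}}]}$, which may be pre-calculated for real systems
and known RS positions and signals, i.e.

$$\hat{\mathbf{h}}(t) = \mathbf{F}^H\, \hat{\tilde{\mathbf{h}}}_s(t) = \mathbf{F}^H \mathbf{F}'_{\mathrm{RS}}\, \mathbf{y}_{\mathrm{RS}}(t) = \mathbf{F}_{\mathrm{int}}\, \mathbf{y}_{\mathrm{RS}}(t). \qquad (9.10)$$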
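
The denoising step of (9.7) and (9.8) can be sketched as follows. The example is a simplified illustration under several assumptions that are not from the book: all RS symbols equal 1, the estimate is evaluated on the RS grid only (no interpolation to all Q sub-carriers), indices are zero-based, and the numerical values are arbitrary.

```python
import cmath
import random


def dft(x, sign):
    """Plain O(N^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def mse(est, ref):
    """Mean square error between two equally long complex sequences."""
    return sum(abs(a - b) ** 2 for a, b in zip(est, ref)) / len(ref)


random.seed(1)
P = 32        # number of RS sub-carriers, i.e. Q / n_RS (illustrative)
L = 4         # assumed CIR length in taps
SIGMA = 0.3   # noise standard deviation per real dimension (illustrative)

# True CIR limited to L taps, and its CTF at the RS positions.
cir = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(L)] + [0j] * (P - L)
ctf_rs = dft(cir, -1)

# Noisy per-sub-carrier estimates (RS symbols assumed to be 1, so y_RS = h + n).
y_rs = [h + complex(random.gauss(0, SIGMA), random.gauss(0, SIGMA)) for h in ctf_rs]

# Noisy CIR estimate as in (9.7), then denoising as in (9.8): zero all taps beyond L.
cir_hat = [v / P for v in dft(y_rs, +1)]
cir_hat_s = cir_hat[:L] + [0j] * (P - L)

# Back to frequency domain on the RS grid.
ctf_hat = dft(cir_hat_s, -1)

mse_raw = mse(y_rs, ctf_rs)
mse_denoised = mse(ctf_hat, ctf_rs)
print(f"raw MSE {mse_raw:.4f}, denoised MSE {mse_denoised:.4f}")
```

Because zeroing taps is an orthogonal projection and the true CIR lies entirely within the retained taps, the denoised error can never exceed the raw per-sub-carrier error in this setting.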

Interpolation Gain
Estimation accuracy and overhead for RSs in terms of resources and power
are important design parameters for an OFDM system. For a given signal-to-interference-and-noise ratio (SINR), the achievable channel estimation accuracy
depends on the number of RSs or, equivalently, the undersampling factor of the
CTF. By doubling the number of RSs, the so-called interpolation or processing
gain is increased by 3 dB at the cost of two-fold pilot overhead. Note that the
interpolation gain is the improvement of channel estimation accuracy due to
the above described denoising effect, compared to a baseline channel estimation
performed individually for each sub-carrier and OFDM symbol.
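
The 3 dB figure can be made plausible with a simple model. The sketch below assumes (this is not stated in the book) that white noise on the RS observations is spread evenly over all time-domain taps, so that keeping only the relevant taps reduces the noise power by the ratio of RSs to retained taps.

```python
import math


def interpolation_gain_db(num_rs: int, num_taps: int) -> float:
    """Idealized denoising gain when noise on num_rs estimates is confined to num_taps CIR taps."""
    return 10.0 * math.log10(num_rs / num_taps)


gain_50 = interpolation_gain_db(50, 10)    # illustrative numbers
gain_100 = interpolation_gain_db(100, 10)  # twice as many RSs, same CIR length
print(f"{gain_100 - gain_50:.2f} dB")
```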
The length of the guard interval has been designed for expected worst case
scenarios. In scenarios where $\tau_{N_m}$ is significantly smaller than $L \cdot \Delta t$, further
interpolation gains may be obtained by setting further samples of the CIR within
the guard interval (GI) to zero. For this purpose, the length of the CIR $\hat{\tilde{\mathbf{h}}}(t)$ has to be estimated,
requiring additionally an estimate of the signal-to-noise ratio (SNR) of $\mathbf{y}(t)$ to
find those taps being below the noise level.
Wiener Filter
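A well-known solution for exploiting potential interpolation gains is to apply
Wiener filtering [Hay02], which finds the optimum filter $\mathbf{F}_{\mathrm{int,opt}}$ with respect
to a minimum mean square error (MMSE) criterion, i.e. minimizing
$\sigma_H^2 = 1/Q \cdot E\{\|\hat{\mathbf{h}}(t) - \mathbf{h}(t)\|^2\}$. Wiener filters generally exploit estimated or known statistical properties of the signals, i.e. the auto-covariance matrix $\mathbf{\Phi}_{yy} = E\{\mathbf{y}\mathbf{y}^H\} \in \mathbb{C}^{[Q \times Q]}$
and cross-covariance matrix $\mathbf{\Phi}_{hy} = E\{\mathbf{h}\mathbf{y}^H\} \in \mathbb{C}^{[Q \times Q]}$ to calculate

$$\mathbf{F}_{\mathrm{int,opt}} = \mathbf{\Phi}_{hy}\, \mathbf{\Phi}_{yy}^{-1}. \qquad (9.11)$$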

Note that (9.11) states the general solution, while in the case of sub-sampled
RSs as explained above the dimensionality of $\mathbf{\Phi}_{yy}$ will have to be changed accordingly. The equation leads to the optimal solution under the assumption of fully
uncorrelated channel $\mathbf{h}$ and observation noise $\mathbf{n}$. The interpolation gain of Wiener
filtering is due to noise suppression for radio channels with large coherence bandwidth. Assume as an illustrative example a fully correlated frequency-flat radio
channel, where one single complex value can be estimated from $Q/n_{\mathrm{RS}}$ observations. $\mathbf{\Phi}_{hy}$ is a matrix carrying the SNR values on the diagonal elements. With
decreasing SNR, the estimated covariance values from $\mathbf{\Phi}_{yy}$ will be scaled down
according to their reliability.
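
For the frequency-flat example, the Wiener filter can be evaluated directly. The sketch below is a deliberately reduced, real-valued illustration: it assumes a single channel coefficient of unit variance observed on P sub-carriers in white noise, so the cross-covariance collapses to a single row of ones. The solver and all numerical values are illustrative and not taken from the book.

```python
def solve(a, b):
    """Solve a x = b for a small real system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


P = 8            # number of RS observations of the flat channel (illustrative)
NOISE_VAR = 0.5  # noise variance; the channel variance is normalized to 1

# y_k = h + n_k  =>  Phi_yy = 1 1^T + NOISE_VAR * I,  Phi_hy = 1^T (a single row).
phi_yy = [[1.0 + (NOISE_VAR if i == k else 0.0) for k in range(P)] for i in range(P)]
phi_hy = [1.0] * P

# Row filter w = Phi_hy Phi_yy^{-1}; Phi_yy is symmetric, so solve Phi_yy w = Phi_hy^T.
w = solve(phi_yy, phi_hy)
mmse = 1.0 - sum(w)   # E{|h - w y|^2} = 1 - w Phi_hy^T for unit channel variance
print(w[0], mmse)
```

All filter weights come out equal to 1/(P + NOISE_VAR), i.e. the filter averages the observations and shrinks the result more strongly as the noise variance grows, which matches the scaling by reliability described above.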
The interpolation matrix $\mathbf{F}_{\mathrm{int}}$ as already introduced in (9.10) targets the same
interpolation gain as the Wiener filter. Note there is an inverse relation between
the coherence bandwidth in frequency domain affecting $\mathbf{\Phi}_{yy}$ and the length of
the CIR $\tilde{\mathbf{h}}(t)$. Instead of estimation of $\mathbf{\Phi}_{hy}$, the estimated noise level $\sigma_n^2$ is used
in (9.10) for direct estimation of relevant taps of the CIR above noise level.
In [HSJ+05, SJ06], it has been found that sensitivity to estimation errors
regarding the number of valid taps of $\tilde{\mathbf{h}}(t)$ seems to be relatively small, and a
good compromise with respect to complexity was in the envisaged scenarios to
assume $L/2$ as the standard length of the channel.
The general way of finding the discrete Wiener finite impulse response (FIR) filter
is by solving the Wiener-Hopf equations [Hay02], while the methodology based
on $\mathbf{F}_{\mathrm{int,opt}}$ is an implementation-friendly solution as it makes use of easily available FFT processing blocks. The scheme has also been implemented for the
demonstration system in [HKR+97b].
Two-Dimensional Wiener Filter
For OFDM systems - and specifically for LTE Release 8 - it is possible to extend
the one-dimensional filtering in frequency into the time domain, leading to the
two-dimensional Wiener filter. LTE aims - at least for UEs with low to moderate
mobility - at exploiting opportunistic multi-user scheduling gains. To support
this in LTE, so-called physical resource blocks (PRBs) were defined, consisting of
12 sub-carriers $q = 1, \ldots, 12$ and 14 OFDM symbols $o = 1, \ldots, 14$, forming a subframe of length $t_{\mathrm{SF}} = 14 \cdot 70\,\mu\mathrm{s} \approx 1$ ms. Note that in LTE one frame of length
10 ms consists of 10 subframes.

Figure 9.2 Pattern of LTE Release 8 reference signals (CRS) for 2 antenna ports.

The PRBs are the smallest entity addressed by the medium access control
(MAC) layer - i.e. the scheduler - and might be rescheduled in each transmit time
interval (TTI). For that reason, PRBs have to be decodable independently. At
least in case of dedicated user-specific RSs, channel estimation has to be possible
individually per PRB. In the sequel, one single PRB is assumed.
Fig. 9.2 illustrates the characteristic pilot grid $R_0(q,o) = \{1,1;\, 7,1;\, \ldots;\, 10,12\}$
for antenna port 0 and $R_1(q,o) = \{3,1;\, 9,1;\, \ldots;\, 7,12\}$ for antenna port 1 as standardized for LTE Release 8 for a single PRB (so-called common reference signal (CRS)). $R_0$ and $R_1$ have a regular structure as far as possible for the given
number of sub-carriers and OFDM symbols. Two antenna ports are the baseline
assumption for LTE Release 8, and for optimum channel estimation accuracy,
full orthogonality between the sets $R_0$ and $R_1$ is achieved by mutual muting (crosses marking unused resource elements (REs) in the figure) of REs carrying
RSs for the other port. The resulting overhead for 2 ports is $16/(12 \cdot 14) = 9.5\%$.
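
The overhead figure follows from counting resource elements. The short sketch below only restates this arithmetic for the two-port case; the constant names are illustrative, and the split of the 16 REs into 8 per port is an assumption consistent with the total quoted above.

```python
SUBCARRIERS_PER_PRB = 12
SYMBOLS_PER_SUBFRAME = 14
RS_PER_PORT = 8   # assumed REs per antenna port per PRB, giving the 16 REs quoted for two ports
NUM_PORTS = 2

total_res = SUBCARRIERS_PER_PRB * SYMBOLS_PER_SUBFRAME
overhead_2_ports = NUM_PORTS * RS_PER_PORT / total_res
print(f"{100 * overhead_2_ports:.1f} %")
```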
From Fig. 9.2, we can deduce the effect of two-dimensional filtering, where
channel estimation for any RE is the result of an interpolation in time and
frequency. In case of optimal Wiener filtering and for low mobility UEs, the
staggered allocation of RSs in different OFDM symbols has the special benefit
of almost doubling the frequency resolution.
Fig. 9.2 illustrates a further important aspect of Wiener filtering, i.e. that
dependent on the REs carrying RSs or not, the Wiener filter will have the effect
of smoothing, interpolation or even channel prediction.

Generally the smoothing effect is interesting for those REs carrying RSs which
are in a notch of the CTF, so that without smoothing, channel estimation accuracy for these REs would be poor.
Interpolation allows - as long as the sampling theorem is fulfilled - adapting
the RS overhead to the intended estimation accuracy. With respect to CoMP,
channel prediction might be the most interesting aspect, as it seems to be a viable
option to overcome the issue of channel state information (CSI) outdating. The
current goal is to extend the prediction range for a single PRB for the last 2
OFDM symbols of 70 µs up to at least several milliseconds, as explained later.
The two-dimensional Wiener filtering solution is an extension of the one-dimensional case. In the two-dimensional case it is necessary to stack all $O = 14$
channel vectors within a PRB into one channel vector $\mathbf{h}_2(t)$. The same has to
be applied to the two-dimensional receive signal, which results in the vector $\mathbf{y}_2$.
With these vectors, it is possible to compute the auto-covariance matrix $\mathbf{\Phi}_{2,yy} = E\{\mathbf{y}_2 \mathbf{y}_2^H\}$
and the cross-covariance matrix $\mathbf{\Phi}_{2,hy} = E\{\mathbf{h}_2 \mathbf{y}_2^H\}$ to calculate the
optimum filter $\mathbf{F}_{2,\mathrm{int,opt}}$. More details can be found in [HKR+97b].
Subspace Concept - Channel Prediction
The so-called subspace concept as proposed in [WMZ05b] is closely related to
the optimum Wiener filter solution, i.e. it exploits long-term channel statistics
to improve channel estimation quality. The subspace is spanned by the relevant
MPCs within the excess delay according to (9.8) being unequal to zero or - more
precisely - above a certain threshold. Hence the subspace dimension might vary
between 1 and the maximum number of taps $L$ of the CIR. It is most easily
explained from Fig. 9.1(b), where the CIR $\tilde{h}(t,\tau)$ is depicted for $t_1$ and $t_2 = t_1 + \Delta t$.
For small $\Delta t$, the MPCs $\tau_n$ will change only marginally, i.e. will mainly
change their phases $\varphi(\tau_n)$, while the amplitudes $\alpha(\tau_n)$ remain almost constant.
In other words, the large-scale fading is almost stable, and only the small-scale
fading varies. If e.g. the main MPCs defining the subspace have been properly
identified by corresponding long-term channel observation of the auto-covariance
matrix $\mathbf{\Phi}_{yy}$, it will be sufficient to limit estimation (and reporting) to the short-term
variations of the main MPCs $\varphi(\tau_n)$ or, equivalently, of the subspace defined
by all elements in $\hat{\tilde{\mathbf{h}}}_s(t)$ that exceed a certain power threshold.
In the case of low subspace dimensions of the radio channel, it could be shown
in [WMZ05a] that significant interpolation gains are possible. As an extreme
example, for one single relevant MPC, an interpolation gain of about 15 dB has
been reported, where some of the gain is due to proper identification of additional
irrelevant taps within the CIR, and not only of the maximum length of $\tilde{\mathbf{h}}(t)$.
For low mobility below 3 km/h and a prediction horizon of less than 5 ms,
even simple linear prediction over two previous CSI estimates of the interpolated
CTF might yield a mean square error (MSE) of -20 dB, simulated based on
the spatial channel model extended (SCME). For LTE, a useful target is a prediction horizon of 10-20 ms under real world radio channel conditions. An
interesting candidate is model-based CSI prediction [PW09, PW10]. The idea is
to estimate from long-term observations the evolution of main MPCs under the
assumption of a linear movement of the UE, and to generate a corresponding
artificial channel model reproducing the time-variant CIR. Future channel conditions are predicted by further linear movement of the UE within the model.
This type of prediction works fine as long as the large-scale parameters of the
radio channel remain unchanged. In reality, radio channels are more volatile due
to birth and death of radio channel components, e.g. at street crossings, or due
to varying characteristics of the multi-path reflections.
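
The simple linear prediction over two previous CSI estimates mentioned above can be sketched as a linear extrapolation of a channel coefficient. The single-tap rotating channel, the Doppler value and the time spacings below are illustrative assumptions; this is not the SCME simulation setup referred to in the text, nor the model-based predictor of [PW09, PW10].

```python
import cmath


def linear_predict(h_prev: complex, h_curr: complex, dt_obs: float, horizon: float) -> complex:
    """Extrapolate a channel coefficient linearly from its two most recent estimates."""
    slope = (h_curr - h_prev) / dt_obs
    return h_curr + slope * horizon


# Toy single-tap channel rotating with a small Doppler shift (illustrative values).
DOPPLER_HZ = 7.0   # roughly 3 km/h at a carrier frequency of about 2.6 GHz
DT_OBS = 1e-3      # spacing of the two CSI estimates in seconds
HORIZON = 5e-3     # prediction horizon in seconds


def channel(t: float) -> complex:
    """Noise-free channel coefficient at time t."""
    return cmath.exp(2j * cmath.pi * DOPPLER_HZ * t)


predicted = linear_predict(channel(0.0), channel(DT_OBS), DT_OBS, HORIZON)
outdated = channel(DT_OBS)            # reuse the last estimate without prediction
actual = channel(DT_OBS + HORIZON)

err_predicted = abs(predicted - actual) ** 2
err_outdated = abs(outdated - actual) ** 2
print(err_predicted, err_outdated)
```

In this noise-free toy case the extrapolated coefficient is considerably closer to the true channel than the outdated estimate; with noisy estimates the slope becomes unreliable, which is one motivation for longer observation windows and model-based approaches.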

9.1.2 Channel Estimation for CoMP

Up to now, a single channel component $h(t)$ and its corresponding time domain
representation $\tilde{h}(t)$ have been investigated. For advanced CoMP schemes like JT,
there are specific challenges not sufficiently addressed yet in LTE Release 8. In
the following, the main requirements as well as the status of the discussion
in LTE-A will be given.
Typical CoMP Scenarios
In general, the channel estimation requirements depend on the used CoMP
scheme. In the sequel, we will focus on downlink multi-cell JT, as introduced
in Section 6.3, which is most sensitive to imperfect CSI.
Compared to single cell MIMO, for CoMP the number of channel components
is multiplied by the number of cooperating cells. Hence, the number of channel
components might explode, e.g. under the assumption of $N_{\mathrm{bs}} = 4$ or $N_{\mathrm{bs}} = 8$ antennas per base station (BS) and 3 cooperating cells, there are already 12 or 24
channel coefficients to be estimated at each UE receive antenna. This is a serious challenge, keeping in mind that in LTE Release 8 pilot overhead is already
9.5% for the estimation of only two channel components!
In addition, interference cancelation for cell-edge UEs between uncorrelated
sites is highly sensitive to channel estimation errors, potentially requiring high
frequency granularity, e.g. per PRB or even per half PRB.
Multi-cell JT is generally seen as a very low mobility solution, e.g. for
nomadic users with speeds below 3 km/h, which is also emphasized in Chapter 13. Nonetheless, many simulation results as well as test bed implementations [JFJ+10] have clearly indicated that despite this low mobility, CSI outdating of only a few ms degrades JT performance significantly, due to the time delay
between CSI estimation and precoding. CSI prediction might be a smart way to
overcome CSI outdating, motivating a lot of research in this area lately.
Eective Channel Concept
As mentioned above, CoMP has specific new challenges. The so-called effective
channel concept is a high-level and flexible means to handle the issue of a high
number of channel components and potentially different numbers of antennas at
different BSs. In 3GPP, similar concepts are discussed under the label of implicit
versus explicit channel estimation and feedback, where the latter is more or less
the direct feedback of the quantized channel components, while implicit includes
pre- or postcoders like e.g. beamformers from an LTE Release 8 codebook. In case
of implicit feedback, the UEs might be aware or not of the pre-/postcoders, leading to transparent versus non-transparent solutions, as discussed in more detail
in Section 9.2. As noted before, there might easily be 10-50 explicit channel components, motivating implicit solutions. The most effective approach is to reduce
the number of channel components to the number of transmitted spatial layers
within the cluster, leading to the stated effective channel concept [ZMS+09b].
For more details, see (6.49) to (6.52), where the effective channel concept has
already been introduced. The main idea is to form so-called virtual antennas by
linear combination of antenna elements at transmitter and receiver side.
Typically, antennas of a cell will be co-located and correlated, while correlation
between different cells will be small. Therefore, the currently most promising
scheme uses virtual antennas per cell (implicit CSI) in combination with explicit
inter-cell channel estimation, including phase and amplitude information per
virtual cell-specific effective channel component.
Assume as a simple example one spatial layer per UE and one active UE with
$N_{\mathrm{ue}} = 2$ antennas, then for a cooperation cluster of size $M_c = 3$ with $N_{\mathrm{bs}} = 4$
antenna ports per cell each UE has $3 \cdot 4 \cdot 2 = 24$ explicit estimates and the
overall channel matrix $\mathbf{H}$ would be of size $(3 \cdot 4) \times (2 \cdot 3) = 72$. For the effective
channel concept, each UE just reports one single virtual channel component per
cell, leading to the effective channel matrix $\mathbf{H}_{\mathrm{eff}}$ of size $3 \times 3 = 9$ with

$$\mathbf{H}_{\mathrm{eff}} = \mathbf{V}_{\mathrm{UE}}^H\, \mathbf{H}^H\, \mathbf{V}_{\mathrm{BS}}. \qquad (9.12)$$

$\mathbf{V}_{\mathrm{BS}} \in \mathbb{C}^{[M_c N_{\mathrm{bs}} \times M_c]}$ contains the precoding vectors per BS forming the virtual
transmit antenna ports and $\mathbf{V}_{\mathrm{UE}} \in \mathbb{C}^{[M_c \times M_c N_{\mathrm{ue}}]}$ the UE-specific postcoders for
all UEs. Note that these postcoders might be cell-specific, i.e. in contrast to (6.49),
the UE calculates the left dominant eigenvectors with respect to their serving
cells and not all cooperating cells.
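
The reduction in the number of channel components can be tabulated with a small helper. The function below is a hypothetical illustration that assumes one single-layer UE per cooperating cell, which is the reading under which the matrix sizes quoted above are obtained.

```python
def channel_component_counts(m_c: int, n_bs: int, n_ue: int) -> dict:
    """Count channel coefficients for a cluster with one single-layer UE per cell (assumed)."""
    return {
        "explicit_per_ue": m_c * n_bs * n_ue,
        "explicit_total": (m_c * n_bs) * (n_ue * m_c),
        "effective_per_ue": m_c,
        "effective_total": m_c * m_c,
    }


counts = channel_component_counts(m_c=3, n_bs=4, n_ue=2)
print(counts)
```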
This simplifies channel estimation, and can be motivated by the additional
precoders for cancelation of inter-cell interference within the cooperation area.
For $\mathbf{H}_{\mathrm{eff}} \in \mathbb{C}^{[M_c \times M_c]}$ this linear - potentially zero-forcing (ZF) - precoder is
obtained by the Moore-Penrose pseudo-inverse

$$\mathbf{W}_{\mathrm{eff}} = \mathbf{H}_{\mathrm{eff}}\, (\mathbf{H}_{\mathrm{eff}}^H \mathbf{H}_{\mathrm{eff}})^{-1}. \qquad (9.13)$$

Note that the cell-specific pre- and postcoders lead to an implicit per cell
channel estimation, while the estimation of $\mathbf{H}_{\mathrm{eff}}$ is done explicitly per effective
channel component. This scheme has already achieved some consensus in 3GPP, as
it takes care of the different levels of correlation of antenna elements within one
and between different cells.
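
The pseudo-inverse in (9.13) can be tried out on a small example. The sketch below uses an arbitrary, illustrative 3 x 3 effective channel and plain Python helpers written for this purpose; it only verifies the algebraic property that the resulting precoder satisfies $\mathbf{H}_{\mathrm{eff}}^H \mathbf{W}_{\mathrm{eff}} = \mathbf{I}$. Which transmission model this corresponds to is defined by (6.49) to (6.52), which are not reproduced here.

```python
def hermitian(a):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse of a small, well-conditioned complex matrix."""
    n = len(a)
    m = [list(row) + [1.0 + 0j if i == j else 0j for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


# Illustrative 3 x 3 effective channel (M_c = 3): strong serving links, weaker cross links.
h_eff = [[1.0 + 0.2j, 0.3 - 0.1j, 0.1 + 0.1j],
         [0.2 + 0.1j, 0.9 - 0.3j, 0.3j],
         [0.1 - 0.2j, 0.2 + 0.2j, 1.1 + 0.1j]]

h_h = hermitian(h_eff)
w_eff = matmul(h_eff, inverse(matmul(h_h, h_eff)))   # precoder as in (9.13)

# Check the zero-forcing property H_eff^H W_eff = I.
product = matmul(h_h, w_eff)
residual = max(abs(product[i][j] - (1.0 if i == j else 0.0)) for i in range(3) for j in range(3))
print(f"max deviation from identity: {residual:.2e}")
```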

Precoder Selection Schemes


The effective channel concept is a powerful means to limit channel estimation
overhead, but requires as an additional step the selection of precoders per BS.
There are basically two possible procedures.
One option is to start with the explicit estimation of all channel components
of $\mathbf{H}$ of the serving cell based on an accordingly high number of orthogonal CSI
reference signals (CSI RSs) (for which in this particular case of per-cell channel
estimation the CRSs defined in LTE Release 8 may be used). For the estimated
channels, the UE selects the most suitable wide-band serving cell precoders $\mathbf{V}_{\mathrm{BS}}$
and e.g. left dominant eigenvectors as UE-specific postcoders. This explicit channel estimation is done semi-statically, as spatial correlation can be assumed for
antennas of one cell, limiting overall overhead. In a second step, the BSs apply
their wide-band precoders for broadcasting of dedicated demodulation reference
signals (DRSs), which allows the UEs to estimate the resulting implicit channel
components, which will be estimated continuously.
A second approach is to combine JT with coordinated beamforming, i.e. there
is a set of precoders $\mathbf{V}_{\mathrm{BS}}$ varying over frequency sub-bands and/or time slots in
a predictable manner so that each UE can individually estimate the best fitting
wide-band beamformers (sub-band/slot) and the resulting effective channel components. The BS then schedules the UE onto subsequent sub-bands/slots fitting
the reported beamformers and calculates the precoder for the corresponding
$\mathbf{H}_{\mathrm{eff}}$. The benefit is that there is no extra phase for explicit estimation of $\mathbf{H}$,
but this comes at the price of some scheduling restrictions and possibly some extra
delay as UEs have to be scheduled onto specific sub-bands/slots.
Demodulation Reference Signals
Specifically for transparent solutions, where UEs are not fully aware of the
chosen precoders, precoded or dedicated RSs, so-called DRSs, have to be inserted
for demodulation of each PRB carrying user data over the so-called physical
downlink shared channel (PDSCH).
Dedicated RS means that the DRSs use the same precoders as the PDSCH.
CSI estimation accuracy has to be high, as the highest modulation and coding
schemes (MCSs) up to 64-QAM have to be decodable, generating an overhead
per UE of about 15%. Making things worse, in CoMP mode the overhead scales
with the cooperation size due to the required inter-UE orthogonality. Research is
ongoing for improved and integrated CSI RS and DRS solutions trading performance versus overall overhead.

9.1.3 Multi-Cell Channel Estimation

Up to now, single-cell channel estimation has been analyzed, where the CRSs
defined in LTE Release 8 were used as CSI RS. In case of mobile cellular radio
systems, there is a high number of interfering radio cells, generating a significant
interference floor. Even in case of the effective channel concept, where the number
of relevant channel components is significantly reduced, channel estimation will
suffer from inter-RS interference. It is helpful to localize interference as far
as possible by applying strong antenna tilting [TWB+09, TWS+09]. In addition,
LTE Release 8 foresees different levels of inter-RS orthogonality:
- full orthogonality for antenna ports of the same cell (mutual muting of REs),
- cell-specific frequency shifting of antenna patterns (with and without muting
  in other cells) and
- so-called quasi orthogonality between cells based on cell-specific Zadoff-Chu
  sequences $\mathbf{s}_{\mathrm{ZC}}$ [HT09].
Note that these sequences $\mathbf{s}_{\mathrm{ZC}}$ are a variant of the well-known constant amplitude zero autocorrelation (CAZAC) sequences. While they provide zero
auto- and cross-correlation, the amplitude is not really constant, but at least the
value of the cubic metric is comparable to that of QPSK modulation. Zadoff-Chu sequences $\mathbf{s}_{\mathrm{ZC}}$ run over all CRS of one OFDM symbol and provide good
wide-band orthogonality. For CoMP, the challenge is that in case of shortened
sequence lengths - as required for frequency-selective CSI - the performance
degrades significantly. For example, the cross-correlation for a sequence length
of 50 PRBs achieves an MSE of about $\sigma^2_{\mathrm{CRS}} = -16$ dB for all sequence shifts,
while for a length of 10 PRBs this degrades to about $\sigma^2_{\mathrm{CRS}} = -10$ dB. For one
PRB, the effect will be more detrimental, making improvements in the RS design
mandatory for advanced CoMP.
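
As background to the CAZAC property mentioned above, the sketch below generates a basic textbook Zadoff-Chu sequence of odd length and checks its constant amplitude and zero cyclic autocorrelation. Root index and length are arbitrary illustrative choices; this is the unmodified sequence, not the shortened or otherwise adapted variants used in the standard, for which the text notes that the amplitude is no longer strictly constant.

```python
import cmath
from math import gcd


def zadoff_chu(root: int, length: int):
    """Basic Zadoff-Chu sequence of odd length; root and length must be coprime."""
    if length % 2 == 0 or gcd(root, length) != 1:
        raise ValueError("length must be odd and coprime with root")
    return [cmath.exp(-1j * cmath.pi * root * n * (n + 1) / length) for n in range(length)]


def cyclic_correlation(a, b, shift: int) -> complex:
    """Cyclic correlation of a (shifted by 'shift' samples) with b."""
    n = len(a)
    return sum(a[(k + shift) % n] * b[k].conjugate() for k in range(n))


N_ZC = 63
seq = zadoff_chu(root=25, length=N_ZC)

max_amp_dev = max(abs(abs(x) - 1.0) for x in seq)
max_sidelobe = max(abs(cyclic_correlation(seq, seq, s)) for s in range(1, N_ZC))
print(f"amplitude deviation {max_amp_dev:.1e}, largest autocorrelation sidelobe {max_sidelobe:.1e}")
```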
Due to the required backward compatibility to LTE Release 8, where UEs rely
on a constant transmission of the CRS grid as illustrated in Fig. 9.2, it is obvious
that these existing reference signals have to be kept as they are. A possible
way forward is hence to design new CSI RS, specifically intended for channel
estimation for up to 8 antenna ports in a multi-cell environment with sufficiently
high accuracy. Based on the maximum MCS of 64-QAM with code rate 5/6,
a $\sigma^2_\mathrm{CSI}$ in the range of $-20$ dB seems to be a reasonable, but challenging, target.
For given mobile radio channels, the MSE of channel estimation $\sigma^2_\mathrm{CSI}$ will be
affected by the design parameters according to (9.14), i.e. the number of channel
components $N_\mathrm{CC}$, the number of RSs $Q/n_\mathrm{RS}$, the length of the pilot sequence
$L_\mathrm{RS}$ and its relative power $P_\mathrm{RS}$, where the last point is known as power boosting.

$$\sigma^2_\mathrm{CSI} = \frac{N_\mathrm{CC}\, n_\mathrm{RS}}{L_\mathrm{RS}\, P_\mathrm{RS}} \cdot \text{const.} \tag{9.14}$$
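As a small worked example of the proportionality in (9.14), the following sketch computes how the estimation MSE changes in dB when the design parameters are scaled. The function and argument names are hypothetical, and the unknown constant of (9.14) cancels because only ratios are considered.

```python
import math

def mse_change_db(n_cc_ratio=1.0, n_rs_ratio=1.0, l_rs_ratio=1.0, p_rs_ratio=1.0):
    """Change of the MSE in (9.14), in dB, when each parameter is scaled by the given ratio."""
    return 10 * math.log10(n_cc_ratio * n_rs_ratio / (l_rs_ratio * p_rs_ratio))

# Doubling the pilot sequence length lowers the MSE by about 3 dB.
print(round(mse_change_db(l_rs_ratio=2.0), 2))
# Halving the pilot density (n_RS doubled) is compensated by 3 dB power boosting.
print(round(mse_change_db(n_rs_ratio=2.0, p_rs_ratio=2.0), 2))
```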

The main challenge of (9.14) is to find a good trade-off between overhead and
channel estimation accuracy. For LTE-A for example - based on the assumption
of low mobility for MU-MIMO or CoMP UEs - the current agreement is to make
CSI RS sparse in frequency and time.
Making $n_\mathrm{RS}$ large reduces pilot overhead, but besides limited interpolation
gain, also the frequency selectivity of the radio channel might not be fully
captured. Increasing the length of the pilot sequence $L_\mathrm{RS}$ means to allocate more


mutually orthogonal RSs for different antenna ports as well as different cell IDs.
This can be done by frequency division multiplex (FDM), time division multiplex
(TDM), code division multiplex (CDM), or any hybrid solution, as has
been intensively investigated in 3GPP. Performance-wise there are only minor
differences, but there might be side effects like backward compatibility to LTE
Release 8, muting and corresponding power offset issues etc.
The parameter $N_\mathrm{CC}$ - the number of channel components to be estimated
- can be minimized by applying the effective channel concept as explained in
the previous subsection. In [TSS+ 08], a specific solution allows $L_\mathrm{RS}$ to be
increased without adding extra overhead. For that purpose, CSI RS are multiplied in time
domain with cell-specific Hadamard sequences $s_\mathrm{H}$ (or so-called orthogonal cover
codes) and a suitable regular allocation of $s_\mathrm{H}$ to cells ensures that the required
length of $s_\mathrm{H}$ for full orthogonality increases with increasing inter-cell distance.
Depending on UE mobility, the longer sequences might more or less violate
the coherence time of the radio channel, leading to corresponding inter-code
interference. Statistically, the more distant cells will contribute less interference,
so that a higher sensitivity to mobility due to long sequences is more easily acceptable.
The additional orthogonality comes basically for free, as the Hadamard sequences
are applied to the already available CSI RS. In Fig. 9.3, the normalized MSE of
the channel estimator for correlation over several TTIs for the five strongest
cells is compared with Hadamard or random sequences on top of the reference
signals. With increasing length of the correlation time, Hadamard sequences
provide significantly lower MSE than the random sequences.
In Fig. 9.3 we can observe a further important aspect: the MSE of channel
estimation degrades significantly with decreasing receive power of the estimated
channel components, as already seen in Section 4.2. Fortunately, lower receive
power relates to lower interference and for JT, corresponding simulations verified
a self-scaling effect, i.e. that precoding sensitivity to channel estimation errors
decreases with decreasing receive power.
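The benefit of orthogonal cover codes over several TTIs can be sketched as follows. This is a toy model under strong assumptions: one CSI RS resource element, channels that stay constant over the code length (low mobility), and five interfering cells with hypothetical names and values. It is not the simulation setup of [TSS+ 08].

```python
import numpy as np

def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h

def estimate_channels(codes, channels, noise_std, rng):
    """Correlate the superimposed pilots of all cells with each cell's cover code.

    codes: (cells, L) cover codes over L TTIs, channels: (cells,) static gains.
    """
    length = codes.shape[1]
    noise = noise_std * (rng.standard_normal(length)
                         + 1j * rng.standard_normal(length)) / np.sqrt(2)
    received = channels @ codes + noise        # sum over cells, one sample per TTI
    return received @ codes.conj().T / length  # one estimate per cell

rng = np.random.default_rng(1)
cells, length = 5, 8
channels = rng.standard_normal(cells) + 1j * rng.standard_normal(cells)

orthogonal_codes = hadamard(length)[:cells]
random_codes = rng.choice([-1.0, 1.0], size=(cells, length))

err_hadamard = np.abs(estimate_channels(orthogonal_codes, channels, 0.0, rng) - channels)
err_random = np.abs(estimate_channels(random_codes, channels, 0.0, rng) - channels)
print("Hadamard error:", err_hadamard.max(), "random-code error:", err_random.max())
```

In the noise-free static case the Hadamard codes remove the inter-cell contribution completely, while random codes generally leave residual interference whose level depends on the realized cross-correlations.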

9.1.4 Uplink Channel Estimation

In theory, uplink (UL) and DL channels are reciprocal to each other (see Section 3.5),
but in real systems there are some substantial differences. In the DL, RSs are
continuously broadcast, allowing all UEs to perform channel estimation simultaneously
and over longer periods of time, while for the UL, CSI has to be estimated based on
UE-specific RSs. In LTE Release 8, UEs use SC-FDMA to transmit so-called sounding
RSs as well as DRSs for coherent detection and demodulation. Sounding RSs
are wideband and intended for scheduling, while DRSs are UE-specific, dedicated
and precoded RSs for data demodulation, having the same bandwidth as UL user
data [HT09]. Orthogonality between simultaneously transmitting UEs is achieved
by selecting one out of 30 Zadoff-Chu sequences [HT09] per cell plus one out of 8
cyclic shifts. In case of different bandwidths of the Zadoff-Chu sequences in adjacent
cells, cross-correlation will be poor, requiring either enhancements of the overall
RS scheme or synchronization of the bandwidths between cells, which would lead
to scheduling restrictions.

Figure 9.3 Normalized MSE of the correlation estimator for the five strongest
cells (top-1 to top-5) [TSS+ 08]. © 2008 IEEE. Panels: (a) Conventional interference
estimation; (b) Hadamard sequences on top of reference signals. Axes: normalized
MSE [dB] versus correlation time [TTIs or ms].
Note that Zadoff-Chu sequences provide only wideband orthogonality, as discussed
already for CSI RS. For powerful joint detection (JD) schemes, this might
be a serious limitation in case of frequency-selective radio channels, as
expected for typical cell-edge users. For UL CoMP, one, two or several UEs per
cell transmit their physical uplink shared channel (PUSCH) data in the UL. BSs
will do JD and channel estimation based on the received UE-specific DRSs of all
jointly processed UEs. Compared to the DL, there is the advantage that DRSs
are embedded into user data and are therefore not subject to being outdated at
all. In principle, JD can be done after any delay without performance degradation
(unless hybrid automatic repeat request (HARQ) with limited round-trip
time (RTT) is used, as addressed in Section 11.2, but this constraint is not
connected to channel estimation).
LTE applies open and closed loop power control for the PUSCH, specified
as $P_\mathrm{PUSCH} = \min(P_\mathrm{max},\, 10 \log_{10}(\text{no. PRBs}) + P_0 + \alpha \cdot PL + \ldots)$, with $P_\mathrm{max}$ as
the maximum allowed transmit power and $\alpha$ between 0.4 and 1 as cell-specific
pathloss compensation factor. For $\alpha \neq 1$, the pathloss will not be fully
compensated, leading to a pathloss-dependent and UE-specific receive power per UE,
rendering channel estimation and JD more challenging. For UL CoMP, it might
be useful to set $\alpha$ to values close to 1. Even then, DRSs of cooperating UEs
will arrive with different receive power due to the pathloss differences, leading to
accordingly different channel estimation quality. However, the same self-scaling
effect as in the DL can be observed, i.e. low power channel components will have
accordingly lower influence on JD performance.
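The effect of the pathloss compensation factor on the receive power can be checked with a short sketch of the open-loop part of the power control rule quoted above. The closed-loop and further terms (the "..." in the formula) are deliberately omitted, and all numerical values (P0 = -100 dBm, 10 PRBs, pathloss of 90 and 110 dB, 23 dBm maximum power) are illustrative assumptions.

```python
import math

def pusch_power_dbm(p_max_dbm, num_prbs, p0_dbm, alpha, pathloss_db):
    """Open-loop PUSCH transmit power in dBm (closed-loop terms omitted)."""
    open_loop = 10 * math.log10(num_prbs) + p0_dbm + alpha * pathloss_db
    return min(p_max_dbm, open_loop)

def receive_power_dbm(p_max_dbm, num_prbs, p0_dbm, alpha, pathloss_db):
    """Power arriving at the BS: transmit power minus pathloss."""
    return pusch_power_dbm(p_max_dbm, num_prbs, p0_dbm, alpha, pathloss_db) - pathloss_db

for alpha in (1.0, 0.6):
    near = receive_power_dbm(23.0, 10, -100.0, alpha, 90.0)
    far = receive_power_dbm(23.0, 10, -100.0, alpha, 110.0)
    print(f"alpha={alpha}: receive power gap = {near - far:.1f} dB")
```

With full compensation both UEs arrive at the same power as long as the power cap is not reached; with partial compensation the gap equals (1 - alpha) times the pathloss difference.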


Just as CSI RS suffer from multi-cell interference in the DL, channel estimation
in the UL will be degraded by simultaneously transmitting UEs,
specifically from cells reusing the same Zadoff-Chu sequences for the DRSs.
Therefore, it is important to localize interference by strong BS antenna tilting
and to minimize inter-cluster interference by proper clustering of cells and
user grouping [TSS+ 08], which is still a field of extensive research.

9.1.5 Summary

Accurate channel estimation is the basis for any CoMP scheme like coordinated
beamforming / scheduling or - more importantly - joint precoding or detection.
Channel estimation affects precoding accuracy and defines an upper limit for
possible performance gains.
Specifically, joint transmission faces several further challenges compared to
more conventional single link systems, such as
• a high number of channel components to be estimated,
• strong multi-cell interference due to frequency reuse one, and
• a high sensitivity of joint precoding with respect to estimation errors combined
  with the need for frequency-selective channel information.
A sound understanding of the time-variant radio channel is important, motivating
well-known channel estimation techniques like two-dimensional Wiener
filtering or the so-called sub-space concept. Discussed enhancements are the effective
channel concept, which limits the number of effective channel components
to be estimated to the number of supported data streams, and reference signals
carrying orthogonal time domain sequences, which reduce inter-cell interference for
low mobility users without extra overhead.

9.2 Channel State Information Feedback to the Transmitter

Guido Dietl and Wolfgang Utschick
While the last section was concerned with the estimation and prediction of
channel state information (CSI) at the receiver side, we now want to look into
the problem of feeding back this information to the transmitter side, which is
for example required for the multi-cell joint transmission (JT) schemes discussed
in Sections 6.3 and 6.4. Although we restrict the feedback considerations in this
section to single-cell multi-user MIMO (MU-MIMO) systems, these can be easily extended to CoMP schemes where base stations are cooperating. In fact,
MU-MIMO is seen as one of the main ingredients for CoMP.
In this section, we consider downlink MU-MIMO transmission (cf. Section 6.3)
in frequency division duplex (FDD) systems where CSI is fed back from the user
equipments (UEs) to a single base station (BS). Due to feedback channels of
limited rate, mobiles can only provide imperfect CSI, i.e., a small finite number of
bits is used for the information fed back to BSs (cf., e.g., [MRF10]). The
consideration of systems with limited feedback assuming nonlinear transmit processing
can be found in, e.g., [DLZ06, CJCU07]. However, we are especially interested
in linear precoders because of their implementation advantages over non-linear
schemes, like being in general computationally more efficient, having smaller
processing delays by avoiding successive encoding, and imposing fewer requirements
on hardware like the dynamic range of amplifiers or analog-to-digital
converters. Compared to the investigation of linear precoders in [MSEA03, LHSH04]
(see also references therein) for a single-user system, or in [Jin06] (see also
references therein) for a multi-user system, we consider a multi-user system with
user scheduling in order to fully exploit multi-user diversity.
Note that in the standardization of 3GPP LTE-A, i.e., where non-cooperative
single-user MIMO (SU-MIMO) is considered, two major feedback schemes have
been discussed, viz., implicit and explicit feedback of CSI. Here, implicit feedback
means the feedback of a precoder index from each mobile to its assigned BS. The
corresponding codebook entry is used by the BS as the precoder in the downlink.
Although the index directly represents the precoder, the CSI is still included in an
implicit manner. Contrary to that, explicit feedback denotes the direct feedback
of CSI in terms of an index which represents the codebook entry which is closest
to the exact CSI with respect to some distance criterion. In this section, we focus
solely on explicit CSI feedback because it is the most promising feedback scheme
in the context of MU-MIMO and CoMP (cf., e.g., [DB07]).
Please note that, while we were concerned with a most efficient estimation and
representation of channel characteristics in the time- and frequency domain in
Section 9.1, we are now exploiting spatial dimensions of channel matrices in order
to make CSI feedback as efficient as possible. In this section, we mainly focus on
channel vector quantization (CVQ) schemes for efficiently capturing these spatial
dimensions. Most of the explicit CSI feedback schemes are based on CVQ using
a finite channel codebook as proposed in [3GP06a, 3GP06b, DB07, DLU09].
Precisely speaking, each user quantizes a product of its channel matrix and
an estimate of its receive filter, in the following denoted as the composite
channel vector, and feeds back the corresponding codebook index together with
an approximate signal-to-interference-and-noise ratio (SINR) value. Note that
users need to estimate their receivers because the finally chosen receive filters
depend on the precoder at the BSs, which is determined after quantization.
In fact, the BSs use the quantized composite channel vectors to compute the
precoder based on the zero-forcing (ZF) criterion, and use the available SINR
values to schedule the users by maximizing the sum-rate.
Usually, CVQ is based on choosing the codebook entry with minimum
Euclidean distance to the composite channel vector. However, minimizing the
Euclidean distance is not necessarily related to the final goal of designing a
communications system, i.e., maximizing the sum-rate. Therefore, we propose
to estimate the receive filter and quantize the corresponding composite channel
vector by maximizing the approximate SINR, which is directly related to the
achievable rate of the corresponding user [DLU09]. Note that this idea is strongly
related to the method presented in [WSJ+ 10]; however, the latter approach is
based on two codebooks and a different type of sum-rate approximation.

Figure 9.4 Downlink of a multi-user MIMO system on one OFDMA sub-carrier,
from [DLU09]. © 2010 IEEE.

Before reviewing the state-of-the-art CVQ methods and deriving the proposed
CVQ approaches in Subsections 9.2.3-9.2.6, we introduce our transmission model
in the next two subsections. Finally, we investigate the performance of the proposed schemes when applied to a MU-MIMO system with linear ZF precoding
in Subsections 9.2.7-9.2.9.

9.2.1 Transmission Model

Here, we consider the downlink transmission from one BS with $N_\mathrm{bs}$ antennas
to $U$ UEs with $N_\mathrm{ue}$ receive antennas each, out of which $K \le U$ UEs have been
scheduled to be served on the same resources in time and frequency. Note that in
this section, $N_\mathrm{BS} = N_\mathrm{bs}$ since the number of BSs is assumed to be $M = 1$. In the
sequel, we will capture all $K$ selected users in the set $\mathcal{K} \subseteq \{1, \ldots, U\}$, $|\mathcal{K}| = K$.
As introduced in Chapter 3, the transmission taking place on one exemplary
orthogonal frequency division multiplex (OFDM) sub-carrier of the system can
be stated as

$$\mathbf{y}_k = \mathbf{H}_k^T \mathbf{s} + \mathbf{n}_k, \tag{9.15}$$

where $\mathbf{y}_k \in \mathbb{C}^{[N_\mathrm{ue} \times 1]}$ are the signals received by UE $k$, $\mathbf{H}_k \in \mathbb{C}^{[N_\mathrm{BS} \times N_\mathrm{ue}]}$ is the
channel matrix connected to UE $k$, $\mathbf{s} \in \mathbb{C}^{[N_\mathrm{BS} \times 1]}$ are the signals transmitted from
the BS antennas, and $\mathbf{n}_k \in \mathbb{C}^{[N_\mathrm{ue} \times 1]}$ is additive Gaussian noise at receiver $k$, i.e.,
with $\mathbf{n}_k \sim \mathcal{N}_\mathbb{C}(\mathbf{0}, \sigma^2 \mathbf{I})$. In the following, we assume that at most one data stream
is assigned to each user, regardless of the number $N_\mathrm{ue}$ of antennas each UE has.
The transmit symbols $\mathbf{s}$ are the result of linear precoding, i.e.,

$$\mathbf{s} = \sum_{k \in \mathcal{K}} \mathbf{w}_k x_k = \mathbf{W}\mathbf{x}, \tag{9.16}$$


where $\mathbf{W} \in \mathbb{C}^{[N_\mathrm{BS} \times K]}$ is the precoding matrix, and $\mathbf{x} \in \mathbb{C}^{[K \times 1]}$ are the symbols
of the $K$ scheduled UEs before precoding. As in Section 3.5, we assume that
these symbols have unit power, i.e., $\mathbf{x} \sim \mathcal{N}_\mathbb{C}(\mathbf{0}, \mathbf{I})$, and that the transmit power
assigned to different streams is inherently contained in the precoding matrix $\mathbf{W}$.
Next, we describe the receivers at the mobile stations. We assume that each
user is applying a linear filter $\mathbf{g}_k \in \mathbb{C}^{[N_\mathrm{ue} \times 1]}$ to the receive vector $\mathbf{y}_k$ to get the
estimate

$$\hat{x}_k = \mathbf{g}_k^H \mathbf{y}_k \in \mathbb{C} \tag{9.17}$$

of the symbol $x_k$. In particular, we consider the linear minimum mean square
error (LMMSE) filter obtained via the minimization of the mean square error
(MSE) between $\hat{x}_k$ and $x_k$, where the solution computes as (e.g., [Ver98])

$$\mathbf{g}_k = \left( \mathbf{H}_k^T \mathbf{W}\mathbf{W}^H \mathbf{H}_k^* + \sigma^2 \mathbf{I} \right)^{-1} \mathbf{H}_k^T \mathbf{w}_k. \tag{9.18}$$

Note that we assume throughout this section that each receiver has perfect
CSI connected to its own channel $\mathbf{H}_k$, while it has no CSI about the channels
of the other users. We refer the reader to Section 9.1 for details on receiver-side
channel estimation.

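9.2.2 Sum-Rate Performance Measure

A typical measure for the downlink transmission performance of a MU-MIMO
system is the sum-rate over all users. With the assumptions made in Section 9.2.1,
the SINR at the receive filter output $\mathbf{g}_k$ of the $k$th user can be written as [DLU09]

$$\gamma_k = \frac{\left| \mathbf{g}_k^H \mathbf{H}_k^T \mathbf{w}_k \right|^2}{\sigma^2 \|\mathbf{g}_k\|_2^2 + \sum_{j \neq k} \left| \mathbf{g}_k^H \mathbf{H}_k^T \mathbf{w}_j \right|^2}, \tag{9.19}$$

and the sum-rate computes as

$$R_\mathrm{sum} = \sum_{k \in \mathcal{K}} \log_2 (1 + \gamma_k). \tag{9.20}$$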
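The chain from the LMMSE receive filter (9.18) over the SINR (9.19) to the sum-rate (9.20) can be sketched numerically as follows. This is a minimal illustration assuming that all users in the list are scheduled, so that user k is served by column k of the precoder; the function names and the covariance with the conjugated channel matrix are assumptions of this sketch.

```python
import numpy as np

def lmmse_filter(H, W, k, noise_var):
    """LMMSE receive filter for stream k; H is N_BS x N_ue, W is N_BS x K."""
    Ht = H.T
    # Receive covariance H^T W W^H H^* + sigma^2 I (N_ue x N_ue).
    cov = Ht @ W @ W.conj().T @ H.conj() + noise_var * np.eye(H.shape[1])
    return np.linalg.solve(cov, Ht @ W[:, k])

def sinr(H, W, k, g, noise_var):
    """SINR at the output of receive filter g for stream k."""
    gains = np.abs(g.conj() @ H.T @ W) ** 2   # |g^H H^T w_j|^2 for all streams j
    interference = gains.sum() - gains[k]
    return gains[k] / (noise_var * np.linalg.norm(g) ** 2 + interference)

def sum_rate(channels, W, noise_var):
    """Sum of log2(1 + SINR_k) over all scheduled users with LMMSE receivers."""
    total = 0.0
    for k, H in enumerate(channels):
        g = lmmse_filter(H, W, k, noise_var)
        total += np.log2(1 + sinr(H, W, k, g, noise_var))
    return float(total)
```

A call such as `sum_rate([H1, H2], W, 0.1)` with two 4 x 2 channel matrices and a 4 x 2 precoder returns the sum-rate in bit/s/Hz for that channel realization.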

9.2.3 Channel Vector Quantization (CVQ)

In order to compute the precoder and schedule the users for transmission, the base
station requires information about the channel matrices $\mathbf{H}_k$ for all $k \in \{1, \ldots, U\}$.
This CSI is fed back from the terminals to the base station. Precisely speaking, each
user quantizes its channel based on a channel codebook and feeds back the
corresponding codebook index as the channel direction indicator (CDI) together with an
SINR value as the channel quality indicator (CQI) which includes a rough estimate of
the interference caused by the quantization error [TBT08, 3GP06a, 3GP06b, Jin06].
The base station then computes a ZF precoder based on the CDIs, and allocates


resources and chooses the proper modulation and coding scheme (MCS) using the
CQIs of the different users. Again, since we are especially interested in schemes of
strongly constrained feedback, we restrict the maximum number of transmitted data
symbols per user to be one.
Assume at first that the precoder $\mathbf{W}$ is known at the mobile receivers
such that the LMMSE filters $\mathbf{g}_k$ can be computed according to (9.18). In order
to compute the CDI, each user $k$ quantizes the composite channel vector
$\mathbf{c}_k = \mathbf{H}_k \mathbf{g}_k \in \mathbb{C}^{[N_\mathrm{BS} \times 1]}$, being a combination of the LMMSE filter and the physical
channel matrix, by applying CVQ based on the channel codebook

$$\mathcal{C} = \{\mathbf{u}_1, \ldots, \mathbf{u}_{2^B}\}, \tag{9.21}$$

where $B$ denotes the number of necessary bits for indexing the $2^B$ normalized
codebook vectors $\mathbf{u}_q \in \mathbb{C}^{[N_\mathrm{BS} \times 1]}$, $q \in \{1, \ldots, 2^B\}$. By doing so, only $N_\mathrm{BS}$ entries in
$\mathbf{c}_k$ instead of $N_\mathrm{BS} N_\mathrm{ue}$ entries in $\mathbf{H}_k$ need to be quantized at each UE, leading to
a smaller quantization error if one keeps the feedback amount constant [DLU09].
However, in a real system, the finally chosen precoder $\mathbf{W}$, and therefore the
resulting receive filter $\mathbf{g}_k$, is unknown to UE $k$ at the time CVQ is applied.
This is because each user has no knowledge about channels of other users due
to the non-cooperative nature of the downlink channel. As a consequence, the
quantizer $Q_\mathcal{C}$ needs to compute the quantized composite channel vector
$\hat{\mathbf{c}}_k \in \mathbb{C}^{[N_\mathrm{BS} \times 1]}$ based on an estimate of the receive filter, in the following denoted as
$\hat{\mathbf{g}}_k \in \mathbb{C}^{[N_\mathrm{ue} \times 1]}$, whose estimation quality compared to the finally chosen LMMSE
filter $\mathbf{g}_k$ depends mainly on the chosen quantization method. In the following,
we define the quantizer output as

$$\text{CDI:} \quad (\hat{\mathbf{c}}_k, \hat{\mathbf{g}}_k) = Q_\mathcal{C}(\mathbf{H}_k). \tag{9.22}$$

Moreover, due to the fact that the channels of other users and the finally
chosen precoder are not known when the feedback information is computed at the
mobile, CQI must be approximated as well. This is usually done by taking into
account a rough estimate of the multi-user interference caused by the imperfect
CSI at the base station due to quantization. As derived in [TBT08, 3GP06a],
the CQI of user $k$, which is here a scaled version of the SINR at the $k$th mobile
receiver, is approximated via

$$\text{CQI:} \quad \gamma_k(\hat{\mathbf{c}}_k, \hat{\mathbf{g}}_k, \mathbf{H}_k) = \frac{\|\mathbf{H}_k \hat{\mathbf{g}}_k\|_2^2 \cos^2 \theta_k}{\sigma^2 + K \|\mathbf{H}_k \hat{\mathbf{g}}_k\|_2^2 \sin^2 \theta_k / N_\mathrm{BS}}, \quad \cos \theta_k = \frac{|\hat{\mathbf{c}}_k^H \mathbf{H}_k \hat{\mathbf{g}}_k|}{\|\mathbf{H}_k \hat{\mathbf{g}}_k\|_2}, \tag{9.23}$$

where $\theta_k \in [0, \pi]$ denotes the angle between the normalized composite channel
vector and the quantized version thereof (quantization angle), and where, without
loss of generality, we set $\|\hat{\mathbf{g}}_k\|_2 = 1$. In the following, we present two CVQ
methods based on two different quantization criteria.

9.2.4 Minimum Euclidean Distance Based CVQ


Remember that one problem of CVQ is the fact that the finally chosen LMMSE
filter is not known when the user computes the feedback information due to its
dependency on the finally chosen precoder, and this precoder cannot be computed
at the mobile because of the lack of knowledge about the CSI at other
terminals. To overcome this obstacle, we first assume arbitrary receive filters $\mathbf{g}_k$.
Since the resulting composite channel vector is then an arbitrary linear combination
of the columns of $\mathbf{H}_k$, it lies in the column space of $\mathbf{H}_k$. This fact can
be utilized for CVQ in the sense that the codebook entry is chosen such that its
Euclidean distance to the column space of $\mathbf{H}_k$ is minimized (minimum quantization
angle or error, cf. Fig. 9.5).
Mathematically speaking, with the QR factorization $\mathbf{H}_k = \mathbf{Q}_k \mathbf{R}_k$, where
$\mathbf{Q}_k \in \mathbb{C}^{[N_\mathrm{BS} \times N_\mathrm{ue}]}$ is an orthonormal basis of the column space of $\mathbf{H}_k$, and
$\mathbf{R}_k \in \mathbb{C}^{[N_\mathrm{ue} \times N_\mathrm{ue}]}$ is upper triangular, the quantized composite channel vector reads as

$$\hat{\mathbf{c}}_k^\mathrm{Euclid} := \mathbf{u}_\ell, \quad \ell = \arg\max_{q \in \{1, \ldots, 2^B\}} \left\| \mathbf{Q}_k^H \mathbf{u}_q \right\|_2^2. \tag{9.24}$$

Here, $\ell$ denotes the codebook index which is fed back to the base station as
the CDI using $B$ bits. To compute the CQI, we need not only the quantized
composite channel vector but also an estimate of the receive filter. In the case
of minimum Euclidean distance based CVQ, the receive filter is also chosen such
that the resulting composite channel vector in the column space of $\mathbf{H}_k$ has the
minimum Euclidean distance to the quantized composite channel vector.
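In other words, the estimate of the receive filter is obtained by projecting
$\hat{\mathbf{c}}_k^\mathrm{Euclid}$ back into the column space of $\mathbf{H}_k$ according to

$$\bar{\mathbf{c}}_k = \frac{\mathbf{Q}_k \mathbf{Q}_k^H \hat{\mathbf{c}}_k^\mathrm{Euclid}}{\left\| \mathbf{Q}_k \mathbf{Q}_k^H \hat{\mathbf{c}}_k^\mathrm{Euclid} \right\|_2}, \tag{9.25}$$

and applying the left-hand side pseudo-inverse of the matrix $\mathbf{H}_k$, i.e.,
$\mathbf{H}_k^\dagger = (\mathbf{H}_k^H \mathbf{H}_k)^{-1} \mathbf{H}_k^H$, as well as normalization in order to finally get

$$\hat{\mathbf{g}}_k^\mathrm{Euclid} := \frac{\mathbf{H}_k^\dagger \bar{\mathbf{c}}_k}{\left\| \mathbf{H}_k^\dagger \bar{\mathbf{c}}_k \right\|_2}. \tag{9.26}$$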

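Note that the optimization criterion of the resulting receive filter estimate
is no longer the MSE such as in the finally applied LMMSE receiver but the
Euclidean distance. This leads to a mismatch between the true SINR and the
one fed back as CQI. Finally, minimum Euclidean distance based CVQ can be
summarized as

$$Q_\mathcal{C}^\mathrm{Euclid}: \mathbf{H}_k \mapsto \left( \hat{\mathbf{c}}_k^\mathrm{Euclid}, \hat{\mathbf{g}}_k^\mathrm{Euclid} \right), \quad \begin{cases} \hat{\mathbf{c}}_k^\mathrm{Euclid} = \arg\max_{\mathbf{u} \in \mathcal{C}} \left\| \mathbf{Q}_k^H \mathbf{u} \right\|_2^2, \quad \mathbf{H}_k = \mathbf{Q}_k \mathbf{R}_k, \\[4pt] \hat{\mathbf{g}}_k^\mathrm{Euclid} = \dfrac{\mathbf{H}_k^\dagger \mathbf{Q}_k \mathbf{Q}_k^H \hat{\mathbf{c}}_k^\mathrm{Euclid}}{\left\| \mathbf{H}_k^\dagger \mathbf{Q}_k \mathbf{Q}_k^H \hat{\mathbf{c}}_k^\mathrm{Euclid} \right\|_2}, \end{cases} \tag{9.27}$$

and the corresponding CQI computes as $\gamma_k(\hat{\mathbf{c}}_k^\mathrm{Euclid}, \hat{\mathbf{g}}_k^\mathrm{Euclid}, \mathbf{H}_k)$ according
to (9.23).

Figure 9.5 Quantization of the composite channel vector, from [DLU09]. © 2010 IEEE.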
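The steps of the minimum Euclidean distance quantizer (9.24)-(9.27) and the CQI approximation (9.23) can be sketched as follows. The function names are hypothetical, the random codebook follows the isotropic construction used later in the simulations, and the parameter `interference_scale` is an assumption of this sketch that stands for the factor multiplying the quantization-error term in the denominator of (9.23); its default of 1 corresponds to K equal to the number of BS antennas.

```python
import numpy as np

def random_codebook(bits, n_bs, rng):
    """2^bits unit-norm codebook vectors (as rows), isotropically distributed."""
    shape = (2 ** bits, n_bs)
    vecs = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def cvq_euclid(H, codebook):
    """Return (codebook index, quantized composite vector, receive filter estimate)."""
    Q, _ = np.linalg.qr(H)                       # reduced QR: Q spans range(H)
    # Row q of codebook.conj() @ Q is u_q^H Q, whose norm equals ||Q^H u_q||.
    index = int(np.argmax(np.linalg.norm(codebook.conj() @ Q, axis=1)))
    c_hat = codebook[index]
    projected = Q @ (Q.conj().T @ c_hat)         # projection onto range(H)
    g = np.linalg.pinv(H) @ projected            # left pseudo-inverse
    return index, c_hat, g / np.linalg.norm(g)

def cqi(c_hat, g_hat, H, noise_var, interference_scale=1.0):
    """Approximate (scaled) SINR fed back as CQI."""
    composite = H @ g_hat
    magnitude2 = np.linalg.norm(composite) ** 2
    cos2 = np.abs(np.vdot(c_hat, composite)) ** 2 / magnitude2
    return magnitude2 * cos2 / (noise_var + interference_scale * magnitude2 * (1 - cos2))

rng = np.random.default_rng(3)
H = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
codebook = random_codebook(4, 4, rng)            # B = 4 bits, 4 BS antennas
index, c_hat, g_hat = cvq_euclid(H, codebook)
print("CDI index:", index, "CQI:", cqi(c_hat, g_hat, H, noise_var=0.1))
```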

9.2.5 Maximum SINR Based CVQ

In this section, we propose an alternative CVQ method which is based on the
maximization of the SINR expression [DLU09]. The original CVQ approach
from [Jin06] is only concerned with minimizing the quantization error represented
by the quantization angle $\theta_k$. However, the ultimate objective in our system
should be to target the highest possible sum-rate. Due to (9.19) and (9.20),
this is achieved by maximizing the SINR, which can be approximated via
expressions such as Eq. (9.23).
Due to the above reasons, we argue that an approach based on the maximization
of an SINR expression may yield superior performance in a sum-rate sense
compared to simply minimizing the quantization angle. Our method is solely concerned
with finding the best receive filter estimate $\hat{\mathbf{g}}_k$ and the quantized composite
channel vector $\hat{\mathbf{c}}_k$. In that sense, it provides an alternative to Section 9.2.4.
Let us have a look again at the approximate expression for the scaled SINR
in (9.23). Note that this SINR expression will approach $\cot^2 \theta_k$ as the noise
variance $\sigma^2$ goes to zero. Therefore, at large signal-to-noise ratio (SNR), the
SINR is almost fully determined by the quantization error, which justifies the
minimization of $\theta_k$ in such conditions. However, the situation might be different
at lower SNRs.
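Thus, our proposed approach is to maximize (9.23) over all possible codebook
entries $\mathbf{u} \in \mathcal{C}$ and receiver weights $\mathbf{v} \in \mathbb{C}^{[N_\mathrm{ue} \times 1]}$ of unit norm, i.e.,

$$\left( \hat{\mathbf{c}}_k^\mathrm{SINR}, \hat{\mathbf{g}}_k^\mathrm{SINR} \right) = \arg\max_{(\mathbf{u}, \mathbf{v}) \in \mathcal{C} \times \{\mathbb{C}^{[N_\mathrm{ue} \times 1]} : \|\mathbf{v}\|_2 = 1\}} \gamma_k(\mathbf{u}, \mathbf{v}, \mathbf{H}_k), \tag{9.28}$$

yielding an optimal $\hat{\mathbf{c}}_k^\mathrm{SINR} \in \mathcal{C}$ and $\hat{\mathbf{g}}_k^\mathrm{SINR} \in \mathbb{C}^{[N_\mathrm{ue} \times 1]}$. Finally, the corresponding
CQI fed back to the transmitter side is given by $\gamma_k(\hat{\mathbf{c}}_k^\mathrm{SINR}, \hat{\mathbf{g}}_k^\mathrm{SINR}, \mathbf{H}_k)$.
In order to perform the above maximization, let us re-write $\gamma_k(\mathbf{u}, \mathbf{v}, \mathbf{H}_k)$
according to (9.23) by substituting $\|\mathbf{H}_k \mathbf{v}\|_2^2 \cos^2 \theta_k = \mathbf{v}^H \mathbf{H}_k^H \mathbf{u} \mathbf{u}^H \mathbf{H}_k \mathbf{v}$ and using
the identities $\sin^2 \theta_k = 1 - \cos^2 \theta_k$ and $1 = \mathbf{g}_k^H \mathbf{g}_k$ (unit norm constraint of receive
filters):

$$\gamma_k(\mathbf{u}, \mathbf{v}, \mathbf{H}_k) = \frac{\mathbf{v}^H \mathbf{H}_k^H \mathbf{u} \mathbf{u}^H \mathbf{H}_k \mathbf{v}}{\mathbf{v}^H \left( \sigma^2 \mathbf{I} + \mathbf{H}_k^H (\mathbf{I} - \mathbf{u}\mathbf{u}^H) \mathbf{H}_k \right) \mathbf{v}} = \frac{\mathbf{v}^H \mathbf{A}(\mathbf{u}) \mathbf{v}}{\mathbf{v}^H \mathbf{B}(\mathbf{u}) \mathbf{v}}. \tag{9.29}$$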

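It is well known that expressions in the form of (9.29) are maximized by setting
$\mathbf{v}$ to the eigenvector corresponding to the largest eigenvalue $\lambda_i$ solving the
generalized eigenvalue problem $\mathbf{A}(\mathbf{u}) \mathbf{v}_i = \lambda_i \mathbf{B}(\mathbf{u}) \mathbf{v}_i$ (e.g., [BX05]). Moreover, if $\mathbf{B}(\mathbf{u})$
is invertible, the eigenvalues and eigenvectors are the same as for the regular
eigenvalue decomposition of $\mathbf{B}^{-1}(\mathbf{u}) \mathbf{A}(\mathbf{u})$.
Note that this maximization finds the best $\mathbf{v}$ given a specific codebook entry
$\mathbf{u}$. The optimal $\hat{\mathbf{c}}_k^\mathrm{SINR}$ is the one yielding the largest SINR value over all codebook
entries, and $\hat{\mathbf{g}}_k^\mathrm{SINR}$ is the corresponding optimal weight vector for this $\hat{\mathbf{c}}_k^\mathrm{SINR}$, i.e.,

$$Q_\mathcal{C}^\mathrm{SINR}: \mathbf{H}_k \mapsto \left( \hat{\mathbf{c}}_k^\mathrm{SINR}, \hat{\mathbf{g}}_k^\mathrm{SINR} \right), \quad \begin{cases} \hat{\mathbf{c}}_k^\mathrm{SINR} = \arg\max_{\mathbf{u} \in \mathcal{C}} \; \max_{\mathbf{v} \in \{\mathbb{C}^{[N_\mathrm{ue} \times 1]} : \|\mathbf{v}\|_2 = 1\}} \gamma_k(\mathbf{u}, \mathbf{v}, \mathbf{H}_k), \\[4pt] \hat{\mathbf{g}}_k^\mathrm{SINR} = \arg\max_{\mathbf{v} \in \{\mathbb{C}^{[N_\mathrm{ue} \times 1]} : \|\mathbf{v}\|_2 = 1\}} \gamma_k(\hat{\mathbf{c}}_k^\mathrm{SINR}, \mathbf{v}, \mathbf{H}_k). \end{cases} \tag{9.30}$$

A drawback of this method applied directly is the computational complexity,
since the maximization over $\mathbf{v}$ needs to be performed for all entries of the channel
codebook, and thus requires $2^B$ generalized eigenvalue decompositions per
sub-carrier on which the optimization is performed.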
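The exhaustive search behind the maximum SINR quantizer can be sketched as below: one eigenvalue problem per codebook entry, as described for (9.29) and (9.30). The sketch assumes the matrices A(u) = H^H u u^H H and B(u) = sigma^2 I + H^H (I - u u^H) H read off from (9.29); the function names are hypothetical. Since A(u) has rank one, the largest eigenvalue could also be obtained in closed form, which is used here only as a cross-check idea and not claimed to be the authors' implementation.

```python
import numpy as np

def best_filter_for_entry(H, u, noise_var):
    """Maximize v^H A(u) v / v^H B(u) v over unit-norm v for one codebook entry u."""
    a = H.conj().T @ u                            # H^H u, so that A(u) = a a^H
    A = np.outer(a, a.conj())
    B = noise_var * np.eye(H.shape[1]) + H.conj().T @ H - A
    # B is invertible for noise_var > 0, so solve the regular problem B^{-1} A.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(B, A))
    top = int(np.argmax(eigvals.real))
    v = eigvecs[:, top]
    return float(eigvals[top].real), v / np.linalg.norm(v)

def cvq_max_sinr(H, codebook):
    raise NotImplementedError("use cvq_max_sinr_with_noise; kept to show the signature split")

def cvq_max_sinr_with_noise(H, codebook, noise_var):
    """Search all codebook rows; return (approximate SINR, index, receive filter)."""
    best_value, best_index, best_v = -np.inf, -1, None
    for index, u in enumerate(codebook):
        value, v = best_filter_for_entry(H, u, noise_var)
        if value > best_value:
            best_value, best_index, best_v = value, index, v
    return best_value, best_index, best_v
```

The loop makes the complexity argument explicit: the number of eigenvalue decompositions grows with the codebook size, i.e. exponentially in the number of feedback bits.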

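9.2.6 Pseudo-Maximum SINR Based CVQ

To overcome the high computational burden of the maximum SINR solution, we
propose a pseudo-maximization algorithm as an alternative to the exact
maximization in (9.28) [DLU09]. We note that $\gamma_k$ is only a function of the composite
channel magnitude (CCM) $\|\mathbf{H}_k \mathbf{v}\|_2^2$ and the quantization error $\theta_k$ for a
given transmit power. The only way to increase $\gamma_k$ is by increasing $\|\mathbf{H}_k \mathbf{v}\|_2^2$ or
decreasing $\theta_k$. The critical assumption we will make here is that $\gamma_k$ is close to its
maximum when either $\|\mathbf{H}_k \mathbf{v}\|_2^2$ is maximized or $\theta_k$ is minimized. Therefore, we
evaluate $\gamma_k$ at two specific points, denoted as $(\hat{\mathbf{c}}_k^\mathrm{CCM}, \hat{\mathbf{g}}_k^\mathrm{CCM})$ and
$(\hat{\mathbf{c}}_k^\mathrm{Euclid}, \hat{\mathbf{g}}_k^\mathrm{Euclid})$, where we define

$$\hat{\mathbf{g}}_k^\mathrm{CCM} = \arg\max_{\mathbf{v} \in \{\mathbb{C}^{[N_\mathrm{ue} \times 1]} : \|\mathbf{v}\|_2 = 1\}} \mathbf{v}^H \mathbf{H}_k^H \mathbf{H}_k \mathbf{v}, \qquad \hat{\mathbf{c}}_k^\mathrm{CCM} = \arg\max_{\mathbf{u} \in \mathcal{C}} \left| \mathbf{u}^H \mathbf{H}_k \hat{\mathbf{g}}_k^\mathrm{CCM} \right|, \tag{9.31}$$

and the point $(\hat{\mathbf{c}}_k^\mathrm{Euclid}, \hat{\mathbf{g}}_k^\mathrm{Euclid})$ is the minimum Euclidean distance solution that
we already presented in Section 9.2.4. Note that $\hat{\mathbf{g}}_k^\mathrm{CCM}$ is in the direction of the
eigenvector corresponding to the largest eigenvalue of $\mathbf{H}_k^H \mathbf{H}_k$. The increase in
computational complexity of the pseudo-maximization solution with respect to
the minimum Euclidean distance method therefore includes one (regular)
eigenvalue decomposition and one search for the closest quantization vector $\hat{\mathbf{c}}_k^\mathrm{CCM}$.
We finally summarize the Pseudo-Maximum (PM) solution to the SINR
maximization problem as

$$Q_\mathcal{C}^\mathrm{PM}: \mathbf{H}_k \mapsto \left( \hat{\mathbf{c}}_k^\mathrm{PM}, \hat{\mathbf{g}}_k^\mathrm{PM} \right) := \arg\max_{(\mathbf{u}, \mathbf{v}) \in \left\{ (\hat{\mathbf{c}}_k^\mathrm{CCM}, \hat{\mathbf{g}}_k^\mathrm{CCM}),\, (\hat{\mathbf{c}}_k^\mathrm{Euclid}, \hat{\mathbf{g}}_k^\mathrm{Euclid}) \right\}} \gamma_k(\mathbf{u}, \mathbf{v}, \mathbf{H}_k). \tag{9.32}$$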
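The pseudo-maximization can be sketched as a choice between two candidate pairs, following (9.31) and (9.32). The function names are hypothetical, and the approximate SINR is evaluated in the form of (9.29), which is an assumption of this sketch regarding the exact scaling of the quantization-error term.

```python
import numpy as np

def approx_sinr(u, v, H, noise_var):
    """Approximate SINR for codebook entry u and unit-norm receive filter v."""
    composite = H @ v
    signal = np.abs(np.vdot(u, composite)) ** 2
    return signal / (noise_var + np.linalg.norm(composite) ** 2 - signal)

def candidate_ccm(H, codebook):
    """Filter maximizing the composite channel magnitude, then the closest entry."""
    _, vecs = np.linalg.eigh(H.conj().T @ H)     # eigenvalues in ascending order
    g = vecs[:, -1]                              # dominant eigenvector, unit norm
    index = int(np.argmax(np.abs(codebook.conj() @ (H @ g))))
    return index, g

def candidate_euclid(H, codebook):
    """Minimum Euclidean distance (minimum quantization angle) candidate."""
    Q, _ = np.linalg.qr(H)
    index = int(np.argmax(np.linalg.norm(codebook.conj() @ Q, axis=1)))
    g = np.linalg.pinv(H) @ (Q @ (Q.conj().T @ codebook[index]))
    return index, g / np.linalg.norm(g)

def cvq_pseudo_max(H, codebook, noise_var):
    """Return (index, receive filter, approximate SINR) of the better candidate."""
    candidates = [candidate_ccm(H, codebook), candidate_euclid(H, codebook)]
    scores = [approx_sinr(codebook[i], g, H, noise_var) for i, g in candidates]
    best = int(np.argmax(scores))
    return candidates[best][0], candidates[best][1], float(scores[best])
```

Compared to the exhaustive search, only one regular eigenvalue decomposition and two codebook searches are needed, independent of how the final choice turns out.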

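9.2.7 Application to Zero-Forcing (ZF) Precoding

With the quantized composite channel matrix $\hat{\mathbf{C}} \in \mathbb{C}^{[N_\mathrm{BS} \times K]}$ composed by the
columns $\hat{\mathbf{c}}_k$, $k \in \mathcal{K}$, the ZF precoder at the base station computes as [Ver98,
DLU09]

$$\mathbf{W} = \mathbf{W}' \mathbf{\Lambda}^{1/2}, \quad \mathbf{W}' = \hat{\mathbf{C}}^{T,\dagger} = \hat{\mathbf{C}}^* \left( \hat{\mathbf{C}}^T \hat{\mathbf{C}}^* \right)^{-1}, \tag{9.33}$$

where the diagonal matrix $\mathbf{\Lambda} \in \mathbb{C}^{K \times K}$ represents power loading. For the
simulations in Section 9.2.9, we assume equal power loading according to

$$\mathbf{\Lambda} = \mathrm{diag}\left( \frac{1}{\|\mathbf{W}' \mathbf{e}_k\|_2^2} \right)_{k=1}^{K}, \tag{9.34}$$

where $\mathbf{e}_k \in \{0, 1\}^{[K \times 1]}$ contains a one in element $k$, and otherwise zero.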
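A minimal numerical sketch of the ZF precoder with equal power loading, (9.33) and (9.34), is given below. It assumes that the precoder is the right pseudo-inverse of the transposed quantized composite channel matrix, so that the effective channel seen through the quantized vectors becomes diagonal; the function name is hypothetical.

```python
import numpy as np

def zf_precoder(C_hat):
    """ZF precoder for an N_BS x K matrix of quantized composite channel vectors."""
    # Unnormalized ZF solution: C^T W' = I.
    W_prime = C_hat.conj() @ np.linalg.inv(C_hat.T @ C_hat.conj())
    # Equal power loading: scale every column of W' to unit norm.
    loading = 1.0 / np.linalg.norm(W_prime, axis=0) ** 2   # diagonal of Lambda
    return W_prime * np.sqrt(loading)                      # W = W' Lambda^{1/2}
```

With this loading each stream receives the same transmit power, and the residual inter-user interference at the UEs stems only from the mismatch between the true and the quantized composite channel vectors.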

9.2.8 Resource Allocation

With the codebook indices and the scaled SINR values of all users, the base
station schedules the users and computes the ZF precoder as described in the
previous subsection. To do so, it calculates the SINR approximations based on
the scaled versions thereof. It holds [TBT08, 3GP06a]
= K diag (k' )kK .

(9.35)

Then, it uses these SINR approximations in order to schedule the users according to a greedy algorithm as described in [DLU09, TBT08, 3GP06a], see also
Section 9.1. Finally, the set K of scheduled users is used to compute the ZF
precoder according to (9.33) and (9.34).

9.2.9 Simulation Results

We investigate the proposed schemes in a MIMO OFDM system with the parameters as given in Table 9.1 and assuming the typical urban macro-cell channel
model of the WINNER project [HKK+ 07].

Table 9.1. Simulation parameters, see [DLU09]. © 2010 IEEE.

Parameter                            Variable   Value
Num. of BSs                          M          1
Num. of Tx ant. per BS               Nbs        4
Num. of overall users                U          10
Num. of users sched. to same res.    K          4
Num. of Rx ant.                      Nue        2
Speed of users                                  1 m/s
Carrier frequency                               2.0 GHz
Bandwidth                                       18 MHz
FFT size                                        2048
Num. of sub-carriers                            1200
Num. of feedback bits                           4
Feedback period                                 1.0 ms
SINR quantization                               No
Channel model                                   Typical urban macro
Path loss used                                  No
Ant. spacing at Tx                              0.5 wavelength
Ant. spacing at Rx                              0.5 wavelength

Fig. 9.6 illustrates the performance difference between the CVQ schemes with
Pseudo-Maximization (PM) and full maximization of the CQI (approximate SINR),
and compares them with the minimum Euclidean distance (minimum
quantization error) approach. Here, we assume a random codebook with
$B = 4$, where the elements of $\mathcal{C}$ are chosen from an isotropic distribution on the
$N_\mathrm{BS}$-dimensional unit sphere, i.e., normalized versions of vectors with random
entries that correspond to $\mathcal{N}_\mathbb{C}(0, 1)$.
The maximal gain of the pseudo and full maximization schemes over the
minimum Euclidean distance method seems to be about 1.2 bit/s/Hz at 0 dB
SNR. Moreover, one can see that the pseudo-maximization scheme acceptably
approaches the performance of the full maximization scheme. Indeed, the
performance gap between the two is never more than about 0.7 bit/s/Hz. Surprisingly,
pseudo-maximization and Euclidean distance minimization are both
slightly superior to full maximization in the mid and high SNR regions. This
is possible due to the fact that the SINR measure used as the cost criterion
for maximization does not represent the exact SINR, but is rather an
approximation of this quantity. It appears that Euclidean distance minimization is the
best option at mid and high SNR, and that in that range the SINR
pseudo-maximization achieves the same performance. As pseudo-maximization makes a
choice between the codebook entry that maximizes channel magnitude and the
one that minimizes the quantization angle (Euclidean distance), one can suspect
that at high SNR, the minimum quantization angle solution is chosen most of
the time, such that no performance difference is noticeable with the scheme that
only chooses the entry with the minimum Euclidean distance.

Figure 9.6 Performance comparison in case of random codebook and typical urban
macro-cell, from [DLU09]. © 2010 IEEE. Curves: Max. SINR (SINR), Pseudo-Max.
SINR (PM), Min. Euclidean Distance (Euclid). Axes: sum rate in bits/s/Hz versus
SNR in dB.

Remember that the pseudo-maximization scheme requires much less computational
complexity than full maximization (see the discussion on that effect in
Section 9.2.5). Due to the above results, it also appears that pseudo-maximization
always performs better than or equivalently to quantization angle minimization,
and that not much is gained by using full maximization (it can even cause
slight performance degradation in certain SNR ranges). Therefore, the
pseudo-maximization scheme seems like the preferred alternative to quantization angle
minimization and produces significant sum-rate gains with respect to this scheme
in the low SNR region. In [KDA+ 10], the same conclusion has been drawn on
the basis of system-level simulations.

9.2.10 Summary

In this section, we presented explicit CSI feedback schemes based on CVQ. Whereas
the minimum Euclidean distance method achieves a good performance in the high
SNR region, it degrades enormously in cases of small SNRs. In these cases, CVQ
based on maximizing an estimate of the SINR outperforms Euclidean distance based
schemes. However, the maximum SINR based CVQ method suffers from a very high
computational complexity. Here, the suboptimal solutions presented in this section
provide a good trade-off between complexity and performance. While this section
focussed on a single-cell case, all discussed CSI feedback techniques can in principle
also be applied to multi-cell CSI (possibly capturing both desired and interfering
channels), and hence be used in the context of CoMP.

10 Efficient and Robust Algorithm Implementation

In this chapter, we now look into algorithm implementation aspects connected
to CoMP. In Section 10.1, the issue of numerically robust and flexible multi-cell precoding is addressed, while Section 10.2 looks into the performance of
interference rejection combining filters under practical conditions.

10.1

Robust and Flexible Base Station Precoding Implementation


Jörg Holfeld and Gerhard Fettweis
As we have seen in Sections 6.3 and 6.4, spectral efficiency can be significantly
increased if downlink CoMP schemes based on multi-cell joint signal processing are applied. Here, multiple base stations (BSs) perform a joint and coherent transmission towards multiple user equipments (UEs). Especially cell-edge
users with symmetric links gain from the joint transmission (JT) with minimized
inter-cell and inter-user interference compared to conventional cellular mobile
networks.
In a cellular system, a spatial downlink resource allocation algorithm may
switch between a single-link transmission comprising one BS and one UE, a
single-BS multi-user scenario, or finally a multi-cell joint transmission. The precoder must therefore be able to cope with different system setups and, correspondingly, different matrix dimensions.
Such an implementation requires higher signal processing flexibility than non-cooperative setups. Physical link parameters like line-of-sight (LOS), non-LOS, or correlated transmit and receive antenna patterns influence the CoMP multi-user MIMO (MU-MIMO) eigenvalue spread and finally the
spatial diversity and multiplexing gains.
Furthermore, in real-time hardware, the link parameters interact with finite
precision effects. This must also be considered, since hardware architectures are
constrained by a limited number of processing units and by limited numerical
precision. A precoding implementation for CoMP must therefore preserve high numerical stability and low error propagation between the processing
stages across the heterogeneous CoMP setups.
This section compares several precoder architectures for downlink JT and
focuses on a generalized matrix inverse with assessable link parameters and
controllable error propagation. Initially, the transmission model is stated in Subsection 10.1.1, after which precoding architectures are introduced step by step
in Subsections 10.1.2 to 10.1.4. The section concludes with a numerical example
and a summary in Subsections 10.1.5 and 10.1.6, respectively.

10.1.1

System Model
The considered multi-user downlink system employs an orthogonal frequency
division multiple access (OFDMA) scheme, where $M$ BSs with a total of $N_{\mathrm{BS}}$
transmit antennas jointly transmit data to $K$ UEs with $N_{\mathrm{UE}}$ receive antennas
in total. The notation from Section 3.5 is reused for the baseband model of a
frequency-flat channel in the frequency domain per sub-carrier as

$$\hat{x} = G^H \left( H^H W d(x) + n \right) = G^H \tilde{H} x + \tilde{n}, \tag{10.1}$$

where $x, \hat{x} \in \mathbb{C}^{[N_{\mathrm{UE}} \times 1]}$ are the stacked symbols which are transmitted to the UEs
and estimated after receive processing, respectively.
On the one hand, the statistical quantities of this model are the transmit
symbols, whose transmit covariance matrix is given by $\Phi_{xx} = E\{x x^H\} = \sigma_x^2 I_{N_{\mathrm{UE}}}$. On
the other hand, the vector $n \in \mathbb{C}^{[N_{\mathrm{UE}} \times 1]}$ is assumed to be additive white Gaussian
noise (AWGN) with covariance matrix $\Phi_{nn} = E\{n n^H\}$.
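The transmit symbols are processed by the linear spatial precoder $W \in \mathbb{C}^{[N_{\mathrm{BS}} \times N_{\mathrm{UE}}]}$,
which together with the physical channel $H \in \mathbb{C}^{[N_{\mathrm{BS}} \times N_{\mathrm{UE}}]}$ forms an
effective channel $\tilde{H} \in \mathbb{C}^{[N_{\mathrm{UE}} \times N_{\mathrm{UE}}]}$. Additionally, the transmit power is limited by
the gain $\beta \in \mathbb{R}$ according to the sum-power constraint

$$E\{\|\beta W x\|^2\} = \beta^2 \operatorname{tr}\{W \Phi_{xx} W^H\} = E_{\mathrm{Tx}}. \tag{10.2}$$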
The aim of the linear precoder is to decouple the received data symbols from
each other already at the BS side. Each of the $K$ non-cooperative UEs spatially
filters its received signals independently. Consequently, the receive filter matrix
$G \in \mathbb{C}^{[N_{\mathrm{UE}} \times N_{\mathrm{UE}}]}$ is a block matrix of the compound UEs' receive filters. The
equalized data signals at each UE $k$ can be decomposed into

$$\hat{x}_k = G_k^H \Big( H_k^H W_k x_k + \sum_{j=1,\, j \neq k}^{K} H_k^H W_j x_j + n_k \Big). \tag{10.3}$$

Here, $H_k^H W_k x_k$ describes the desired symbol part. The second term
$\sum_{j=1,\, j \neq k}^{K} H_k^H W_j x_j$ represents the spatial interference. The AWGN
vector $n_k \in \mathbb{C}^{[N_{\mathrm{ue}} \times 1]}$ per UE is defined with $E\{n_k n_k^H\} = \sigma_{n_k}^2 I_{N_{\mathrm{ue}}}$ as local
covariance matrix.
For the sake of simplicity, the compound receive filter is reduced to a scaled
diagonal matrix $G = \beta^{-1} I_{N_{\mathrm{UE}}}$. This means that each user stream is handled independently from other streams, as for single-antenna UEs. Each UE should estimate
this scalar based on the effective channel. Admittedly, real handset implementations have to employ more advanced synchronization and channel estimation
techniques which are not within the scope of this section [SBM+04, KHF09].
The topic of spatial interference rejection combining (IRC) and minimum mean
square error (MMSE) filtering will be discussed in detail in Section 10.2.
Based on the previously mentioned decomposition, the receiver-side signal-to-interference-and-noise ratio (SINR) of one single stream $1 \le l \le N_{\mathrm{UE}}$ is chosen
as performance metric

$$\mathrm{SINR}_l = \left. \frac{h_l^H w_l w_l^H h_l}{\sum_{l' \neq l}^{N_{\mathrm{UE}}} h_l^H w_{l'} w_{l'}^H h_l + \sigma_n^2} \right|_{G = \beta^{-1} I_{N_{\mathrm{UE}}}}. \tag{10.4}$$

This expresses a power ratio based on the effective channel $\tilde{H} = H^H W$, which
describes the relation of the useful signal part $l$ to the interference terms.
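
As a minimal sketch of how (10.4) can be evaluated numerically, the following NumPy snippet computes the per-stream SINR from the effective channel $H^H W$. The function name `stream_sinr`, the random test channel and the two example precoders (a zero-forcing precoder and a matched filter) are illustrative assumptions and not taken from the text.

```python
import numpy as np

def stream_sinr(H, W, noise_var):
    """Per-stream SINR with the structure of (10.4).

    H: (N_BS, N_UE) channel, column l is h_l.
    W: (N_BS, N_UE) precoder, column l is w_l.
    """
    # Effective channel: entry (l, l') equals h_l^H w_l'.
    H_eff = H.conj().T @ W
    power = np.abs(H_eff) ** 2
    desired = np.diag(power)                    # |h_l^H w_l|^2
    interference = power.sum(axis=1) - desired  # sum over l' != l
    return desired / (interference + noise_var)

rng = np.random.default_rng(0)
N_BS, N_UE = 4, 4
H = (rng.standard_normal((N_BS, N_UE))
     + 1j * rng.standard_normal((N_BS, N_UE))) / np.sqrt(2)

# Illustrative zero-forcing precoder (assumption): W = H (H^H H)^{-1}
W_zf = H @ np.linalg.inv(H.conj().T @ H)
sinr_zf = stream_sinr(H, W_zf, noise_var=0.1)

# Matched-filter precoder for comparison (assumption): W = H
sinr_mf = stream_sinr(H, H, noise_var=0.1)
```

With the zero-forcing precoder, the effective channel is the identity matrix, so every stream sees only the noise term in the denominator. The matched filter leaves residual inter-stream interference.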

10.1.2

Transmit Filter Eigendecomposition


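Linear precoding algorithms can be classified into the transmit zero-forcing (ZF)
filter and the Wiener filter (WF) [JUN05] and, not considered here, block diagonalization [SH02] and iterative methods [SB04]. The WF was derived for the MMSE
criterion as

$$W = \beta_{\mathrm{WF}} H \left( H^H H + \frac{\operatorname{tr}\{\Phi_{nn}\}}{E_{\mathrm{Tx}}} I_{N_{\mathrm{UE}}} \right)^{-1} \tag{10.5}$$

with

$$\beta_{\mathrm{WF}} = \sqrt{\frac{E_{\mathrm{Tx}} / \sigma_x^2}{\operatorname{tr}\left\{ H^H H \left( H^H H + \Phi' \right)^{-2} \right\}}}, \tag{10.6}$$

where $\Phi'$ denotes the regularization term $\frac{\operatorname{tr}\{\Phi_{nn}\}}{E_{\mathrm{Tx}}} I_{N_{\mathrm{UE}}}$ from (10.5).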

The term $H^H H$ is regularized with the noise covariance matrix in (10.5). In
interference (and not noise) limited environments with dense BS deployments
this term can be neglected. Hence, only the eigenvalue decomposition (EVD)
needs to be considered, i.e.

$$H^H H = U \Lambda U^H = \sum_{k=1}^{n_{\min}} \lambda_k^2 u_k u_k^H, \tag{10.7}$$

with the unitary matrix $U \in \mathcal{U}^{[N_{\mathrm{UE}} \times N_{\mathrm{UE}}]}$ and $\Lambda \in \mathbb{R}^{[N_{\mathrm{UE}} \times N_{\mathrm{UE}}]}$. This decomposition bounds the system performance.
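
The following NumPy sketch illustrates the Wiener transmit filter of (10.5) together with the scaling of (10.6), and checks that the resulting filter meets the sum-power budget. The function name `wiener_precoder`, the parameter values and the random channel are assumptions made for illustration. The regularization is taken as $\Phi' = (\operatorname{tr}\{\Phi_{nn}\}/E_{\mathrm{Tx}})\, I$, and the returned $W$ already includes $\beta_{\mathrm{WF}}$.

```python
import numpy as np

def wiener_precoder(H, noise_trace, E_tx, sigma_x2=1.0):
    """Wiener transmit filter as in (10.5), scaled as in (10.6).

    H: (N_BS, N_UE) channel; noise_trace: tr{Phi_nn}; E_tx: power budget.
    Returns the scaled precoder W and the scalar beta_WF.
    """
    n_ue = H.shape[1]
    gram = H.conj().T @ H
    reg = (noise_trace / E_tx) * np.eye(n_ue)   # Phi'
    inv = np.linalg.inv(gram + reg)
    beta = np.sqrt((E_tx / sigma_x2) / np.trace(gram @ inv @ inv).real)
    return beta * H @ inv, beta

rng = np.random.default_rng(0)
N_BS, N_UE = 4, 4
H = (rng.standard_normal((N_BS, N_UE))
     + 1j * rng.standard_normal((N_BS, N_UE))) / np.sqrt(2)

W, beta = wiener_precoder(H, noise_trace=0.4, E_tx=1.0)

# Transmit power sigma_x^2 * tr{W W^H}, here with sigma_x^2 = 1
tx_power = np.trace(W @ W.conj().T).real
```

The direct matrix inverse is used only for readability. The decompositions discussed in Subsection 10.1.3 are alternatives for computing the same filter.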
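Spatial conditions are determined, e.g., through antenna correlations, path
loss or shadowing effects, so that reduced rank situations can occur. Consequently, in a multiple-input multiple-output (MIMO) multi-point scenario there
exist at least $n_{\min} = \min(M, K)$ dominant eigenvalues $\lambda_1^2 \ge \lambda_2^2 \ge \dots \ge \lambda_{n_{\min}}^2$, with $u_k$ as the eigenvector corresponding to eigenvalue $\lambda_k^2$. In the full rank
case, there exist at most $n_{\min} = \min(N_{\mathrm{BS}}, N_{\mathrm{UE}})$ non-zero eigenvalues. Related
to the EVD, each user symbol is transmitted on a different eigenmode with
the power allocated according to the corresponding eigenvalues. With the EVD, the
transmit filter can be written as

$$W = \beta_{\mathrm{WF}} H U \left( \Lambda + U^H \Phi' U \right)^{-1} U^H, \tag{10.8}$$

and the scalar $\beta_{\mathrm{WF}}$ as

$$\beta_{\mathrm{WF}} = \sqrt{\frac{E_{\mathrm{Tx}} / \sigma_x^2}{\operatorname{tr}\left\{ \Lambda \left( \Lambda + U^H \Phi' U \right)^{-2} \right\}}} \tag{10.9}$$

with $\Phi' = \frac{\operatorname{tr}\{\Phi_{nn}\}}{E_{\mathrm{Tx}}} I_{N_{\mathrm{UE}}}$ and

$$\lim_{\sigma_n^2 \to 0} \beta_{\mathrm{WF}} = \sqrt{\frac{E_{\mathrm{Tx}} / \sigma_x^2}{\sum_{i=1}^{n_{\min}} \frac{1}{\lambda_i^2}}} \quad \text{and } i \le n_{\min}. \tag{10.10}$$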

The interpretation of (10.10) leads to the following basic results: Under a fixed
number of transmit antennas in CoMP and with a fixed $\beta_{\mathrm{WF}}$, each additional
user stream increases the transmit power. For a fixed ratio $E_{\mathrm{Tx}} / \sigma_x^2$, each additional user stream reduces the SINR of the effective channel. The small eigenvalues affect the transmit power or the SINR the most. They also determine the
performance from a numerical point of view through the condition number

$$\kappa(H) = \sqrt{\lambda_1^2 / \lambda_{n_{\min}}^2} \ge 1. \tag{10.11}$$

If eigenvalues are rounded towards zero due to limited precision, then the
observed numerical rank, which is denoted as $\operatorname{rank}_{\mathrm{num}}(H)$, is smaller than the
rank of $H$. The threshold $\epsilon_{\mathrm{num}}$ depends on the algebraic computations.
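$$\operatorname{rank}(H) = \sum_{i=1}^{\min(N_{\mathrm{BS}}, N_{\mathrm{UE}})} \delta_i \quad \text{with} \quad \delta_i = \begin{cases} 0 & \text{for } \lambda_i^2 = 0 \\ 1 & \text{otherwise} \end{cases} \tag{10.12}$$

$$\operatorname{rank}_{\mathrm{num}}(H) = \sum_{i=1}^{\min(N_{\mathrm{BS}}, N_{\mathrm{UE}})} \delta_{i,\mathrm{num}} \quad \text{with} \quad \delta_{i,\mathrm{num}} = \begin{cases} 0 & \text{for } |\lambda_i^2| < \epsilon_{\mathrm{num}} \\ 1 & \text{otherwise} \end{cases} \tag{10.13}$$

Finally, the number of user streams cannot exceed the number of eigenmodes.
Hence, a spatial resource allocation may not include an additional stream which
would reduce the SINR of the already precoded streams.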
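
To make the difference between the algebraic rank and the numerical rank of (10.13) concrete, the sketch below counts the eigenvalues of $H^H H$ that survive a threshold and evaluates the condition number of (10.11). The function name `rank_and_condition`, the diagonal test matrix and the threshold $2^{-10}$ (one least significant bit of a format with 10 fractional bits) are assumptions chosen for illustration.

```python
import numpy as np

def rank_and_condition(H, eps_num):
    """Numerical rank as in (10.13) and condition number as in (10.11)."""
    n_min = min(H.shape)
    # Eigenvalues lambda_i^2 of H^H H in descending order
    lam2 = np.linalg.eigvalsh(H.conj().T @ H)[::-1][:n_min]
    rank_num = int(np.sum(np.abs(lam2) >= eps_num))
    kappa = np.sqrt(lam2[0] / lam2[-1]) if lam2[-1] > 0 else np.inf
    return rank_num, kappa

# Illustrative channel with singular values 1, 1e-1 and 1e-4
H = np.diag([1.0, 1e-1, 1e-4]).astype(complex)

rank_num, kappa = rank_and_condition(H, eps_num=2.0 ** -10)
rank_exact = int(np.linalg.matrix_rank(H))
```

In this example the weakest eigenvalue ($10^{-8}$) falls below the threshold, so the numerical rank is smaller than the algebraic rank although the matrix is formally invertible.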

10.1.3

Transmit Filter Computations


MU-MIMO algorithms for CoMP must be designed for a maximum number of cooperating BSs that serve a maximum number of UEs. To realize the linear filter (10.5),
one has to choose an appropriate algebraic decomposition [GVL96]. In this section, three basic approaches will be compared before the idea of order-recursions
is presented.
The Cholesky decomposition (CD), which is similar to Gaussian elimination, factorizes a symmetric, positive-definite matrix into the product of a triangular matrix $R$ and its Hermitian transpose.
By calculating the inverse of $R$ with backward recursions and pivoting, CD is a
low-complexity ($O(1/3\, N_{\mathrm{UE}}^3)$) and numerically stable algorithm.

Substitution: $R^H R = H^H H + \Phi'$ (10.14)

Computation: $W = H R^{-1} R^{-H}$ (10.15)
On the other hand, the ability to handle a multitude of MIMO setups requires
a further organizational overhead, because the assignment of sub-matrices
within H necessarily leads to an advanced exception handling within the algorithm.
Another possibility is to use the class of QR decompositions, with realizations such as the modified Gram-Schmidt orthogonalization, the Householder transformation or the Givens rotation. A compound matrix is decomposed into a
unitary and a triangular matrix with

Substitution: $\begin{pmatrix} H \\ \Phi'^{1/2} \end{pmatrix} = \begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix} R$ (10.16)

Computation: $W^H = R^{-1} Q_1^H$. (10.17)

Compared to the CD, the orthogonalization step increases the complexity
to $O(4 N_{\mathrm{BS}} N_{\mathrm{UE}}^2 - 1/3\, N_{\mathrm{UE}}^3)$. These algorithms remain usable in various spatial
setups, because the propagation of null elements is inherently realized.
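
The two routes above can be compared numerically. The sketch below computes the unscaled filter $H (H^H H + \Phi')^{-1}$ once by direct inversion, once via a Cholesky factor, and once via a QR decomposition of the stacked matrix. It assumes $\Phi' = \xi I$ with an illustrative value of $\xi$, omits the scaling $\beta_{\mathrm{WF}}$, and uses general-purpose library inverses of the triangular factors where a real-time implementation would use back substitution.

```python
import numpy as np

rng = np.random.default_rng(1)
N_BS, N_UE, xi = 6, 4, 0.1      # xi: regularization level, Phi' = xi * I (assumed)
H = (rng.standard_normal((N_BS, N_UE))
     + 1j * rng.standard_normal((N_BS, N_UE))) / np.sqrt(2)
Phi = xi * np.eye(N_UE)

# Reference: direct inversion
W_ref = H @ np.linalg.inv(H.conj().T @ H + Phi)

# Cholesky route: R^H R = H^H H + Phi', then W = H R^{-1} R^{-H}
L = np.linalg.cholesky(H.conj().T @ H + Phi)   # lower triangular, L L^H
R = L.conj().T
R_inv = np.linalg.inv(R)
W_chol = H @ R_inv @ R_inv.conj().T

# QR route: [H; Phi'^{1/2}] = [Q1; Q2] R, then W = Q1 R^{-H}
stacked = np.vstack([H, np.sqrt(xi) * np.eye(N_UE)])
Q, R_qr = np.linalg.qr(stacked)                # reduced QR decomposition
Q1 = Q[:N_BS, :]
W_qr = Q1 @ np.linalg.inv(R_qr).conj().T
```

In the QR route the product $H^H H$ is never formed explicitly, which is one reason this family of methods is often preferred when the channel is ill-conditioned.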
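The third method is the Schur complement. Sub-matrices of small dimensions are selected and processed iteratively. These iterations can be realized very
efficiently in pipelined architectures.

Substitution: $\begin{pmatrix} A & U \\ V & C \end{pmatrix}^{-1} = \left( H^H H + \Phi' \right)^{-1}$ (10.18)

Computation: $W = H \begin{pmatrix} A & U \\ V & C \end{pmatrix}^{-1}$ (10.19)

$$\begin{pmatrix} A & U \\ V & C \end{pmatrix}^{-1} = \begin{pmatrix} P & -P K \\ -C^{-1} V P & C^{-1} + C^{-1} V (A - K V)^{-1} U C^{-1} \end{pmatrix} \quad \text{with } K = U C^{-1},\; P = (A - K V)^{-1} \tag{10.20}$$

The drawback is the high error propagation in limited precision arithmetic.
Numerical errors accumulate with each processing stage from $C^{-1}$ to $K$
until the full matrix has been processed.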
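
The block-wise inversion can be sketched with the standard Schur complement identity. The function name `block_inverse`, the split position `p` and the test matrix are assumptions for illustration; the identity itself is the textbook block inversion formula and is checked here against a direct inverse.

```python
import numpy as np

def block_inverse(M, p):
    """Invert M = [[A, U], [V, C]] (A is p x p) via the Schur complement of C."""
    A, U = M[:p, :p], M[:p, p:]
    V, C = M[p:, :p], M[p:, p:]
    C_inv = np.linalg.inv(C)
    K = U @ C_inv
    P = np.linalg.inv(A - K @ V)          # inverse of the Schur complement
    top = np.hstack([P, -P @ K])
    bottom = np.hstack([-C_inv @ V @ P, C_inv + C_inv @ V @ P @ K])
    return np.vstack([top, bottom])

rng = np.random.default_rng(3)
H = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(2)
M = H.conj().T @ H + 0.1 * np.eye(4)      # H^H H + Phi' with Phi' = 0.1 * I (assumed)

M_inv_block = block_inverse(M, p=2)
M_inv_direct = np.linalg.inv(M)
```

Each stage reuses the result of the previous one ($C^{-1}$, then $K$, then $P$), which is exactly why rounding errors made early propagate into all later blocks in finite precision.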


The Idea of Order-Recursive Precoding


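The idea of order-recursive precoding is based on changing the view onto
the matrix elements of the Schur complement. Instead of sub-matrices,
a column-wise or row-wise consideration of a matrix is chosen with

$$H = \begin{pmatrix} A & U \\ V & C \end{pmatrix} \;\Rightarrow\; H = \underbrace{\begin{pmatrix} A & u \end{pmatrix}}_{u \in \mathbb{C}^{[N_{\mathrm{BS}} \times 1]}} = \underbrace{\begin{pmatrix} A \\ v \end{pmatrix}}_{v \in \mathbb{C}^{[1 \times N_{\mathrm{UE}}]}}. \tag{10.21}$$

This principle expresses the transmit filter as a series of column- or row-wise matrix
extensions. Here, each column in $H$ is the algebraic mapping of antenna links
which relate to one column in $W$ for each user data stream.

$$w_1 = \beta_{\mathrm{TxWF}}\, h_1 \left( h_1^H h_1 + \frac{\operatorname{tr}\{\Phi_{nn}\}}{E_{\mathrm{Tx}}} \right)^{-1} \tag{10.22}$$

$$[w_1\; w_2] = \beta_{\mathrm{TxWF}}\, [h_1\; h_2] \left( [h_1\; h_2]^H [h_1\; h_2] + \frac{\operatorname{tr}\{\Phi_{nn}\}}{E_{\mathrm{Tx}}} I_2 \right)^{-1} \tag{10.23}$$

$$\vdots \tag{10.24}$$

$$W = \beta_{\mathrm{TxWF}}\, H \left( H^H H + \Phi' \right)^{-1}. \tag{10.25}$$

In the following, the algorithm is discussed in detail to explain the filter updates
by forward and backward recursion.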

10.1.4

The Order-Recursive Filter in Details


As a preliminary step, an ordered set of stream indices $\mathcal{I} \subseteq \{1, \dots, N_{\mathrm{UE}}\}$ must be
introduced, in order to pick out the columns of $H$ and $W$ which correspond to
the potential spatial streams that can be transmitted. The notation $\mathcal{I}_{[a:b]}$ shall
denote the subset of the $a$-th to $b$-th elements in $\mathcal{I}$. Further, for any matrix $A$,
the notation $A_{[\mathcal{I},\mathcal{J}]}$ yields the sub-matrix of $A$ which corresponds to the rows and
columns indexed in $\mathcal{I}$ and $\mathcal{J}$, respectively. Then, for any set $\mathcal{I}$ of selected spatial
streams, the transmit WF can be expressed as

$$W_{[:,\mathcal{I}]} = H_{[:,\mathcal{I}]} \left( H_{[:,\mathcal{I}]}^H H_{[:,\mathcal{I}]} + \Phi'_{[\mathcal{I},\mathcal{I}]} \right)^{-1}. \tag{10.26}$$

The idea of order-recursive precoding is based on a recursive filter structure
where in each iteration $l$, a previously computed precoding matrix $W_{[:,\mathcal{I}_{[1:l-1]}]}$
connected to stream indices 1 to $l-1$ is extended by a column for the $l$-th
stream index in $\mathcal{I}$. Note that the order in which streams are processed (hence
the order of the elements in $\mathcal{I}$) can be arbitrary. Furthermore, a spatial resource
allocator can switch BSs and UEs on or off based on a selection criterion, as
will be detailed later.
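In each iteration, the new precoding coefficients are determined by an additive matrix update and the concatenated precoding vector $b \in \mathbb{C}^{[N_{\mathrm{BS}} \times 1]}$ for user
stream $\mathcal{I}_{[l]}$ with

$$W_{[:,\mathcal{I}_{[1:l]}]} = \begin{pmatrix} W_{[:,\mathcal{I}_{[1:l-1]}]} - b\, d^T_{[\mathcal{I}_{[1:l-1]}]} & b \end{pmatrix}. \tag{10.27}$$

In the first step, the vector $d \in \mathbb{C}^{[l \times 1]}$ is calculated as

$$d_{[\mathcal{I}_{[1:l]}]} = \begin{pmatrix} \left( H^H_{[:,\mathcal{I}_{[l]}]} W_{[:,\mathcal{I}_{[1:l-1]}]} \right)^T \\ -1 \end{pmatrix}, \tag{10.28}$$

and is followed by the projection vector $e \in \mathbb{C}^{[N_{\mathrm{BS}} \times 1]}$ as

$$e = H_{[:,\mathcal{I}_{[l]}]} - H_{[:,\mathcal{I}_{[1:l-1]}]}\, d^{*}_{[\mathcal{I}_{[1:l-1]}]}. \tag{10.29}$$

Finally, $b$ is determined dependent on an update criterion for order extension

$$b = \begin{cases} 0, \; H_{[:,\mathcal{I}_{[l]}]} := 0 & \text{if the update criterion is not fulfilled} \\ \tilde{b} & \text{otherwise} \end{cases} \tag{10.30}$$

$$\text{with} \quad \tilde{b} = e \Big/ \left( \|e\|^2 + \left\| \Phi'^{1/2}_{[\mathcal{I}_{[1:l]},\mathcal{I}_{[1:l]}]}\, d_{[\mathcal{I}_{[1:l]}]} \right\|^2 \right). \tag{10.31}$$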
The updates are performed with subspace projections (see (10.28)
and (10.29)). Thus

$$e = \underbrace{\left( I_{N_{\mathrm{BS}}} - W_{[:,\mathcal{I}_{[1:l-1]}]} H^H_{[:,\mathcal{I}_{[1:l-1]}]} \right)^H}_{P_C\left( H_{[:,\mathcal{I}_{[1:l-1]}]} \right)} H_{[:,\mathcal{I}_{[l]}]} \tag{10.32}$$

can be expressed with the orthogonal projector $P_C(H_{[:,\mathcal{I}_{[1:l-1]}]})$ onto the complementary space of the row space
of the channel matrix that is spanned by the already considered streams. With this approach, the precoding matrix is additively
extended with rank-one updates of regularized projections of the channel
column vector $H_{[:,\mathcal{I}_{[l]}]}$ into the existing precoding solution.
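
The recursion can be illustrated with a short NumPy sketch that extends the precoder one column at a time and compares the result with the closed-form filter (10.26). This is not the pseudo code of [HKF10]: it is a minimal sketch that assumes a diagonal regularization matrix, always accepts the new stream (the update criterion of (10.30) is omitted), and leaves out the scaling factor. The function name `order_recursive_precoder` and all parameter values are hypothetical.

```python
import numpy as np

def order_recursive_precoder(H, phi, order):
    """Build W = H_I (H_I^H H_I + diag(phi_I))^{-1} column by column (unscaled).

    H: (N_BS, N_UE) channel; phi: diagonal of Phi' (one value per stream);
    order: ordered stream indices, i.e. the set I.
    """
    W = np.zeros((H.shape[0], 0), dtype=complex)
    done = []
    for l in order:
        idx = np.array(done, dtype=int)
        h = H[:, l]
        d = h.conj() @ W                          # h^H W, first l-1 entries of d
        e = h - H[:, idx] @ d.conj()              # projection vector e
        d_full = np.append(d, -1.0)               # d extended by -1
        reg = np.sum(phi[np.append(idx, l)] * np.abs(d_full) ** 2)
        b = e / (np.vdot(e, e).real + reg)        # new precoding column
        W = np.hstack([W - np.outer(b, d), b[:, None]])   # rank-one update
        done.append(l)
    return W

rng = np.random.default_rng(4)
N_BS, N_UE = 6, 4
H = (rng.standard_normal((N_BS, N_UE))
     + 1j * rng.standard_normal((N_BS, N_UE))) / np.sqrt(2)
phi = np.full(N_UE, 0.1)                          # Phi' = 0.1 * I (assumed)
order = [2, 0, 3, 1]                              # arbitrary processing order

W_rec = order_recursive_precoder(H, phi, order)
H_sel = H[:, order]
W_ref = H_sel @ np.linalg.inv(H_sel.conj().T @ H_sel + np.diag(phi[order]))
```

Apart from one scalar division per stream, the loop only uses matrix-vector products, which is consistent with the operation counts discussed for the order-recursive filter.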
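The degree of linear independence between the user stream $\mathcal{I}_{[l]}$ and the
existing precoder $W_{[:,\mathcal{I}_{[1:l-1]}]}$, corresponding to the EVD, is measured by the angle

$$\cos \theta_{\mathcal{I}_{[l]}} = \frac{H^H_{[:,\mathcal{I}_{[l]}]}\, e}{\|H_{[:,\mathcal{I}_{[l]}]}\|\, \|e\|}, \quad 0 \le \theta_{\mathcal{I}_{[l]}}, \tag{10.33}$$

as a function of $H$ and the user stream order in $\mathcal{I}$. If $\mathcal{I}_{[l]}$ is completely dependent
on the already spanned row space, the angle is zero. Vice versa, a fully independent $\mathcal{I}_{[l]}$ is orthogonal. In this case the eigenvalue is equal to $\|H_{[:,\mathcal{I}_{[l]}]}\|^2$. The
eigenvalues in (10.8) bound the observed norms of $e$:

$$\lambda_1^2 \ge \underbrace{H^H_{[:,\mathcal{I}_{[l]}]}\, P_C\left( H_{[:,\mathcal{I}_{[1:l-1]}]} \right) H_{[:,\mathcal{I}_{[l]}]}}_{\|e\|^2} \ge \lambda_{n_{\min}}^2, \quad \|e\| > 0. \tag{10.34}$$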

The main advantage of subspace projections is the ability to propagate
null elements in $H_{[:,\mathcal{I}_{[l]}]}$ to the corresponding position in $W_{[:,\mathcal{I}_{[l]}]}$. Hence, it is
inherently possible to handle multiple transmission scenarios such as single-input
single-output (SISO), multiple-input single-output (MISO) and MIMO. Reorganizing the matrix elements is not necessary with this approach. The full pseudo
code as reference for an implementation in a real-time system and a complexity estimation is given in [HKF10]. Table 10.1 shows the algorithmic complexity,
which is made up of a complex operation count of the required multiplications,
divisions and square root computations.

Table 10.1. Operation count comparison of matrix inversion algorithms [GVL96, HKF10].

              Cholesky dec.                QR dec.                                                          Order-rec.
O-notation    $1/3\, N_{\mathrm{UE}}^3$    $4 N_{\mathrm{BS}} N_{\mathrm{UE}}^2 - 1/3\, N_{\mathrm{UE}}^3$    $3/2\, N_{\mathrm{BS}} N_{\mathrm{UE}}^2$
Divisions     $N_{\mathrm{UE}}$            $N_{\mathrm{UE}}$                                                $N_{\mathrm{UE}}$
Sqr. roots    0                            $N_{\mathrm{UE}}$                                                0

Update Sequence and Criteria


The user streams are decoupled from the iteration numbering through the choice
of $\mathcal{I}$. With each order-recursive iteration, the position of each processed user
stream in the sequence can be determined based on the update criterion already given in (10.30).
Furthermore, quality of service (QoS) oriented precoding criteria like transmit
power, SINRs or user stream priorities, in addition to the condition number, can be used
as the criterion to decide how an additional user affects the system performance. Based
on this, stopping thresholds can be defined. With this minor extension, the proposed solution can be extended towards a simple spatial scheduling or resource
allocation methodology similar to the tree search of [FDGH07].
For example, in ill-conditioned CoMP channels, weak eigenmodes can be
skipped to preserve numerical stability. In this case, from the base station point
of view, the column vector $H_{[:,\mathcal{I}_{[l]}]}$ is filled with zeros. This means that either a
UE will not be served by CoMP or a particular stream to this UE is not included.
More details on this topic can be found in Section 11.1.
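
One possible update criterion is a bound on the condition number of the channel formed by the already accepted streams plus the candidate stream. The sketch below is an illustrative assumption, not the criterion used in the text or in [FDGH07]: the function name `select_streams`, the threshold `kappa_max` and the toy channel are hypothetical, and streams are simply tested in their natural order.

```python
import numpy as np

def select_streams(H, kappa_max):
    """Greedy stream selection: accept stream l only if the channel of the
    accepted streams plus stream l stays below the condition number bound."""
    selected = []
    for l in range(H.shape[1]):
        trial = selected + [l]
        if np.linalg.cond(H[:, trial]) <= kappa_max:   # assumed update criterion
            selected.append(l)
    return selected

# Toy channel: the third column is almost collinear with the first one
H = np.array([[1.0, 0.0, 1.0,  0.0],
              [0.0, 1.0, 0.0,  0.0],
              [0.0, 0.0, 1e-4, 0.0],
              [0.0, 0.0, 0.0,  1.0]], dtype=complex)

selected = select_streams(H, kappa_max=100.0)
```

The nearly dependent third stream is skipped, while the fourth stream, which is orthogonal to the accepted ones, is still added afterwards.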

10.1.5

Example: SINR as Function of the Condition Number


This example illustrates the SINR (10.4) as a function of the condition number for a
physical channel matrix $H$ with independent and identically distributed (i.i.d.)
circularly symmetric complex normal coefficients. In this metric, the ratio of the mean
desired signal power to the mean interference-plus-AWGN power
is given in decibels. Fig. 10.1 compares the SINR under various noise
power levels as well as floating-point and fixed-point precision. The implemented fixed-point number format is two's complement with one sign bit, 5 integer bits and
10 fractional bits. The loss of SINR compared to the floating-point computations
is negligible for well-conditioned channels. In case of ill-conditioning, the previously described effects of rounding and numerical propagation errors become
prominent. If eigenvalues (10.8) are smaller than the machine accuracy $\epsilon_{\mathrm{num}}$ or
the projection norms (10.34) are observed as zeros, then a low desired signal power and
a high spatial interference power result. With increasing noise, the regularization through $\Phi'$ keeps the division operations within a defined range, prevents
numerical instabilities and improves the conditioning of the matrix inversion.
However, the regularized (or approximated) division results in imperfect interference suppression.

[Figure 10.1: Floating and fixed-point SINR for i.i.d. Gaussian MIMO systems as function of the channel condition number. Panels: (a) $N_{\mathrm{BS}} = N_{\mathrm{UE}} = 4$ and $E_{\mathrm{Tx}} = 1$; (b) $N_{\mathrm{BS}} = N_{\mathrm{UE}} = 6$ and $E_{\mathrm{Tx}} = 1$. Axes: SINR [dB] versus channel condition number $\kappa(H)$ from $10^0$ to $10^3$. Curves: floating point and fixed-point for $\sigma_n^2 \in \{0.001, 0.01, 0.1, 1.0\}$.]
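
The fixed-point format described above (one sign bit, 5 integer bits, 10 fractional bits) can be emulated in floating point for quick experiments. The helper names `to_fixed` and `to_fixed_complex` are hypothetical, and round-to-nearest with saturation is an assumption, since the text does not state the rounding or overflow behavior of the implementation.

```python
import numpy as np

FRAC_BITS = 10   # fractional bits
INT_BITS = 5     # integer bits (plus one sign bit, 16 bits in total)

def to_fixed(x):
    """Quantize real values to the two's complement grid of the format above."""
    scale = 2 ** FRAC_BITS
    lo = -(2 ** INT_BITS)
    hi = 2 ** INT_BITS - 1 / scale
    return np.clip(np.round(np.asarray(x, dtype=float) * scale) / scale, lo, hi)

def to_fixed_complex(z):
    """Quantize real and imaginary parts independently."""
    z = np.asarray(z, dtype=complex)
    return to_fixed(z.real) + 1j * to_fixed(z.imag)

q_small = to_fixed(1e-4)      # below half an LSB, rounds to zero
q_value = to_fixed(0.3)       # nearest grid point
q_sat = to_fixed(100.0)       # saturates at the largest representable value
```

Values below half a least significant bit ($2^{-11}$) are rounded to zero, which is the mechanism by which small eigenvalues or projection norms can vanish in the fixed-point curves.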

10.1.6

Summary
This section described an order-recursive algorithm to calculate the Wiener
transmit filter for coherent CoMP transmissions from several decentralized
base stations to several decentralized mobile terminals in cellular radio networks.
State-of-the-art implementations do not offer the flexibility to handle multiple
transmission setups, as required for multi-user precoding, at the reduced
complexity of the proposed approach. The discussion of all signal processing steps shows
how algebraic observations are mapped to physical parameters and how they
reflect the spatial eigenmodes. These observations can be integrated into spatial
resource allocation to extend the precoding coefficients recursively with additional spatial streams.

10.2

Low-Complexity Terminal-Side Receiver Implementation


Udo Wachsmann, Rainer Bachl and Stefan Müller-Weinfurtner
Cellular systems adopting CoMP techniques to increase system capacity impose
new requirements on the terminal-side receiver. One particular aspect presented in this section is the treatment of intra-cell and inter-cell interference in
the downlink. In orthogonal frequency division multiplex (OFDM)-based wireless systems like LTE, intra-cell interference is mainly due to spatial multiplexing
of multiple users. For inter-cell interference, we have to distinguish the two cases
where serving and interfering cell(s) are co-located or located at different sites.
In both cases, it is assumed that CoMP techniques lead to dedicated beams
towards the user equipments (UEs) in order to increase spectral efficiency. This
shall apply for the desired link as well as for the interfering link(s).
A good overview on space-time receivers in general and means to mitigate
co-channel interference can be found in [PNG03]. One particular method in a
terminal-side receiver to exploit the characteristics of the interference signal at
affordable cost is the so-called interference rejection combining (IRC), also known
as optimum combining [Win84]. In this chapter, the potential gains of IRC on
link level are analyzed. Starting from a simple system model with ideal assumptions on channel and interference estimation, the impact of estimation errors
is elaborated in more detail. Particular focus is placed on complexity aspects
of interference estimation when taking the LTE Release 8 signal structure into
consideration [3GP07b].

10.2.1

Introduction to Interference Rejection Combining (IRC)


The aspects of interference handling in the UE are presented based on a simplified downlink system model where each base station (BS) transmits to exactly
one UE. For the considerations in this chapter, we further assume that only
one transmission layer is used per UE, i.e. no single-user spatial multiplexing is
applied. However, the study can easily be extended to accommodate single-user
spatial multiplexing. These assumptions enable the
simple model of having one dedicated beam per BS towards one UE. In terms
of the system model introduced in Chapter 3, we have $K = M$ and investigate
the transmission on an exemplary frequency-flat sub-carrier of an orthogonal
frequency division multiple access (OFDMA) system.
The pair BS 1 and UE 1 represents the desired link, while the remaining (BS,
UE) pairs are treated as interference at UE 1. For the development of a terminal-side receiver implementation, we are only interested in the receive signal of the
desired link. Hence, the simplified system model adapted from (3.6) can
be written as follows:

$$\hat{x}_1 = g^H y_1 = g^H \Big( \underbrace{[H_{11}]^H w_1 x_1}_{\text{desired}} + \underbrace{\sum_{k=2}^{K} [H_{k1}]^H w_k x_k}_{\text{interference}} + \underbrace{n_1}_{\text{noise}} \Big), \tag{10.35}$$

where $g, y_1, n_1 \in \mathbb{C}^{[N_{\mathrm{ue}} \times 1]}$, $\forall k: w_k \in \mathbb{C}^{[N_{\mathrm{bs}} \times 1]}$, and $\forall k: H_{k1} \in \mathbb{C}^{[N_{\mathrm{bs}} \times N_{\mathrm{ue}}]}$.


In (10.35), we further restricted ourselves to linear and unitary preprocessing
at the BS side and replaced the term $d(x)$ in (3.6) by $x$. Furthermore, the index
for the UE receive filter weights $g$ has been skipped for brevity since we deal
exclusively with UE 1 in this section. Please note that due to the single-layer
assumption introduced above, the transmit symbols towards the UEs, $x_k$, are
denoted as scalars. For simplified notation in the sequel, we define the effective
channel for single-layer transmission from BS $k$ to UE 1 as follows:

$$h_k = [H_{k1}]^H w_k. \tag{10.36}$$

Then, the interference-and-noise part from (10.35) that is effective at the receiver
can be formulated as

$$z_1 = \sum_{k=2}^{K} h_k x_k + n_1 \tag{10.37}$$

and the resulting simplified system model reads

$$\hat{x}_1 = g^H y_1 = g^H \left( h_1 x_1 + z_1 \right). \tag{10.38}$$

The covariance matrix of the effective noise $z_1$ is given by

$$\Phi_{z_1 z_1} = E_{x,n}\{z_1 z_1^H\} = \sigma^2 I + \sum_{k=2}^{K} h_k h_k^H E\{|x_k|^2\}. \tag{10.39}$$

The averaging is performed over interfering transmit symbols $x$ and noise $n$, so
that only the dependency on the instantaneous effective channels $h_k$ remains.
At this stage, we did not average over effective channel realizations, since we
assume that they can be estimated with reasonable accuracy in order to exploit
spatial correlations for interference rejection.
The concept of IRC in light of the above-mentioned system model can be
formally described as a cascade of pre-whitening and maximum ratio combining (MRC). The pre-whitening step transforms the colored noise consisting of
interference and white noise into the domain where we have only white noise
with unit power on each receive branch. In case of white noise, the MRC receiver
is well-known to be optimal [Win84]. Hence, the IRC operation in terms of the
UE-side receiver filter can be formulated as

$$g_{\mathrm{IRC}}^H = h_1^H \Phi_{z_1 z_1}^{-1} = \underbrace{h_1^H \Phi_{z_1 z_1}^{-1/2}}_{\text{MRC}}\; \underbrace{\Phi_{z_1 z_1}^{-1/2}}_{\text{pre-whitening}}. \tag{10.40}$$

This conceptual approach of pre-whitening and MRC is also known as whitened
matched filter (WMF), see e.g. [Lar07]. Please note that the MRC part in (10.40)
includes $\Phi_{z_1 z_1}^{-1/2}$ since the channel after pre-whitening is given by $\Phi_{z_1 z_1}^{-1/2} h_1$. Please
note further that the cascade of pre-whitening and MRC is only one way to
introduce and motivate IRC. However, the actual computation of filter weights
is done in a single step.
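
The equivalence of the cascade and the single-step computation can be checked numerically. In the sketch below, the effective channels `h1` and `h2`, the noise power and the symbol power are illustrative assumptions; $\Phi_{z_1 z_1}^{-1/2}$ is obtained from the eigendecomposition of the Hermitian covariance matrix, which is one of several valid ways to build a whitening filter.

```python
import numpy as np

sigma2, Ex2 = 0.01, 1.0                      # noise and interferer symbol power (assumed)
h1 = np.array([1.0, 0.5j])                   # effective desired channel (illustrative)
h2 = np.array([0.3, 1.0], dtype=complex)     # effective interferer channel (illustrative)

# Interference-plus-noise covariance for one interferer (K = 2)
Phi_zz = sigma2 * np.eye(2) + Ex2 * np.outer(h2, h2.conj())

# Pre-whitening filter Phi^{-1/2} via the eigendecomposition of Phi_zz
lam, U = np.linalg.eigh(Phi_zz)
Phi_inv_sqrt = U @ np.diag(lam ** -0.5) @ U.conj().T

# Cascade: whiten the channel, then apply MRC in the whitened domain
h1_white = Phi_inv_sqrt @ h1
g_cascade = Phi_inv_sqrt.conj().T @ h1_white

# Single step: g = Phi^{-1} h1, i.e. g^H = h1^H Phi^{-1}
g_single = np.linalg.solve(Phi_zz, h1)

# Covariance of the whitened interference-plus-noise
whitened_cov = Phi_inv_sqrt @ Phi_zz @ Phi_inv_sqrt.conj().T
```

Solving the linear system directly avoids forming the matrix square root, which matches the remark that the weights are computed in a single step in practice.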
The performance metric used for the evaluation of numerical results is
the residual signal-to-interference-and-noise ratio (SINR) after the UE receive
filter. The motivation for this choice is that the SINR is appropriate to represent the effective signal-to-noise ratio (SNR) on an additive white Gaussian
noise (AWGN) channel. With the effective SNR approach, further performance
metrics like error probability or throughput can easily be derived from known
AWGN figures. As a starting point, the generic expression for the instantaneous
SINR with a general UE receive filter is given by (see e.g. [HSP01])

$$\mathrm{SINR}(g) = \frac{|g^H h_1|^2}{g^H \Phi_{z_1 z_1} g}\, E\{|x_1|^2\}. \tag{10.41}$$

The term instantaneous SINR relates to the fact that only one specific channel
realization of the desired and interfering links is considered. The generic expression in (10.41) is used later on when mismatched UE receive filters due to estimation errors are evaluated. The next step is to insert the UE receive filter weights
for IRC from (10.40) into (10.41), resulting in the simplified SINR expression

$$\mathrm{SINR}(g_{\mathrm{IRC}}) = h_1^H \Phi_{z_1 z_1}^{-1} h_1\, E\{|x_1|^2\} = \rho\, E\{|x_1|^2\}. \tag{10.42}$$

Here, the IRC parameter

$$\rho = h_1^H \Phi_{z_1 z_1}^{-1} h_1 \tag{10.43}$$

is introduced, which can be interpreted as the SINR in case of ideal IRC for unit-power transmit symbols. In order to complete the framework on IRC, the UE
receive filter weights according to the minimum mean square error (MMSE)
approach [Ver98] are derived and compared to the IRC ones. The MMSE solution
works on the covariance of the receive signal, which reads

$$\Phi_{y_1 y_1} = E_{x,n}\{y_1 y_1^H\} = \sigma^2 I + \sum_{k=1}^{K} h_k h_k^H E\{|x_k|^2\}. \tag{10.44}$$

The relation between the covariance of the receive signal $y_1$ and the covariance
of the interference-and-noise signal $z_1$ can be observed as

$$\Phi_{y_1 y_1} = \Phi_{z_1 z_1} + h_1 h_1^H E\{|x_1|^2\}. \tag{10.45}$$

Conceptually, the MMSE weights are obtained by whitening the receive signal
and applying a matched filter afterwards. This results in the following weight
expression

$$g_{\mathrm{MMSE}}^H = h_1^H \Phi_{y_1 y_1}^{-1} = h_1^H \left( \Phi_{z_1 z_1} + h_1 h_1^H E\{|x_1|^2\} \right)^{-1}. \tag{10.46}$$

Applying the Sherman-Morrison-Woodbury formula [GVL96], also known as
matrix inversion lemma, the MMSE weights can then be re-formulated as follows

$$g_{\mathrm{MMSE}}^H = h_1^H \left( \Phi_{z_1 z_1}^{-1} - \frac{\Phi_{z_1 z_1}^{-1} h_1\, E\{|x_1|^2\}\, h_1^H \Phi_{z_1 z_1}^{-1}}{1 + h_1^H \Phi_{z_1 z_1}^{-1} h_1\, E\{|x_1|^2\}} \right). \tag{10.47}$$

Inserting the IRC expressions for weights from (10.40) as well as for SINR
from (10.42), the MMSE weights read

$$g_{\mathrm{MMSE}}^H = \left( 1 - \frac{\mathrm{SINR}(g_{\mathrm{IRC}})}{1 + \mathrm{SINR}(g_{\mathrm{IRC}})} \right) g_{\mathrm{IRC}}^H = \frac{1}{1 + \mathrm{SINR}(g_{\mathrm{IRC}})}\, g_{\mathrm{IRC}}^H. \tag{10.48}$$

This interesting result shows that the MMSE weights $g_{\mathrm{MMSE}}$ are simply a scaled
version of the IRC weights $g_{\mathrm{IRC}}$, hence yielding the same SINR

$$\mathrm{SINR}(g_{\mathrm{MMSE}}) = \mathrm{SINR}(g_{\mathrm{IRC}}). \tag{10.49}$$
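
A short numerical check of (10.48) and (10.49) is sketched below for one interferer at high SNR. The channel vectors and power values are illustrative assumptions. The example also compares the condition numbers of the two covariance matrices that have to be inverted.

```python
import numpy as np

sigma2, Ex1, Ex2 = 1e-3, 1.0, 1.0            # high SNR, unit symbol powers (assumed)
h1 = np.array([1.0, 0.5j])                   # effective desired channel (illustrative)
h2 = np.array([0.3, 1.0], dtype=complex)     # effective interferer channel (illustrative)

Phi_zz = sigma2 * np.eye(2) + Ex2 * np.outer(h2, h2.conj())   # interference plus noise
Phi_yy = Phi_zz + Ex1 * np.outer(h1, h1.conj())               # full receive signal

g_irc = np.linalg.solve(Phi_zz, h1)          # g_IRC^H  = h1^H Phi_zz^{-1}
g_mmse = np.linalg.solve(Phi_yy, h1)         # g_MMSE^H = h1^H Phi_yy^{-1}

def sinr(g):
    """Post-filter SINR for receive weights g (filter output g^H y)."""
    return Ex1 * abs(np.vdot(g, h1)) ** 2 / np.vdot(g, Phi_zz @ g).real

sinr_irc = sinr(g_irc)
sinr_mmse = sinr(g_mmse)
scale = 1.0 / (1.0 + sinr_irc)               # scaling factor between the weights

cond_zz = np.linalg.cond(Phi_zz)
cond_yy = np.linalg.cond(Phi_yy)
```

For these values the receive covariance is far better conditioned than the interference-plus-noise covariance, while both weight vectors lead to the same SINR.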

Moreover, from an implementation point of view, the MMSE weights are
preferable since they provide inherently higher numerical stability. In order to
illustrate this, we consider e.g. the case of one interferer ($K = 2$) and high SNR, i.e.
$\sigma^2 \ll E\{|x_1|^2\}$. Then, the matrix $\Phi_{z_1 z_1}$ used for the IRC weights $g_{\mathrm{IRC}}$ is
almost singular and the inversion tends to become numerically unstable if no manual
countermeasures, e.g. adding a constant, are taken. On the other hand, the
matrix $\Phi_{y_1 y_1}$ used for the MMSE weights $g_{\mathrm{MMSE}}$ is typically well-conditioned (at least for channels $H_1$ that are not too ill-conditioned) and the inversion
is inherently numerically stable. For the numerical results further below in this
chapter, we further consider the MRC receiver as a reference [Win84]. The UE
receive filter weights for MRC (without pre-whitening) can be formulated as

$$g_{\mathrm{MRC}}^H = \sigma^{-2} h_1^H. \tag{10.50}$$

So far, the UE receive filter weights have been derived assuming ideal knowledge of channel and effective noise covariance, which is also the basis for the first
performance study in Section 10.2.2. In later subsections, we will incorporate
different kinds of estimation errors by introducing the following weights:

$$\hat{g}_{\mathrm{IRC,a}}^H = g_{\mathrm{IRC}}^H(\hat{h}, \Phi) = \hat{h}_1^H \Phi_{z_1 z_1}^{-1} \quad \text{with estimation of } h \tag{10.51}$$

$$\hat{g}_{\mathrm{IRC,b}}^H = g_{\mathrm{IRC}}^H(h, \hat{\Phi}) = h_1^H \hat{\Phi}_{z_1 z_1}^{-1} \quad \text{with estimation of } \Phi \tag{10.52}$$

$$\hat{g}_{\mathrm{IRC,c}}^H = g_{\mathrm{IRC}}^H(\hat{h}, \hat{\Phi}) = \hat{h}_1^H \hat{\Phi}_{z_1 z_1}^{-1} \quad \text{with estimation of } h \text{ and } \Phi, \tag{10.53}$$

where $\hat{(\cdot)}$ denotes estimates. Here, the letters a, b, c are used to distinguish different degrees of estimation errors.

10.2.2

Performance Study on IRC with Known Channel and Interference Covariance
In this subsection, numerical results on SINR performance for IRC reception
are presented. Here, ideal weight calculation is assumed, i.e. ideal estimation of channel and interference covariance. The multiple-input multiple-output
(MIMO) channel used for the study is modeled by means of a correlation channel
model (CCM) with independent transmit and receive antenna correlation. The
adopted channel model is taken from the one agreed upon in 3GPP [3GP10h]
with correlation parameters $\alpha$ for transmit correlation and $\beta$ for receive correlation.
As described in the introduction of this section, the study focuses on scenarios where desired and interferer downlink are generated as dedicated beams. For
an appropriate modelling of this beamforming aspect, high correlation at the BS
side is assumed. Therefore, the value $\alpha = 0.9$ is chosen since it also models the
high correlation case in 3GPP [3GP10h]. At the UE side, we assume some polarization diversity and, therefore, low to medium correlation can be achieved. Here,
$\beta = 0.3$ is selected, matching the medium correlation case in 3GPP [3GP10h].
We consider one BS site with $N_{\mathrm{bs}} = 2$ physical antennas and one UE with
$N_{\mathrm{ue}} = 2$ physical antennas. Furthermore, we take $K = 2$, i.e. one desired beam
(layer) and one interferer beam (layer). Please note that a beam in this context is
equivalent to one layer in the LTE sense. For simplicity, no particular pre-coding
is adopted, meaning that one BS antenna serves one UE. Such a model gives
direct insight into the performance of multi-user spatial multiplexing.
The performance measure for the study is the mean SINR where the expectation is taken over all channel samples. Study parameters are the SNR and
signal-to-interference ratio (SIR), respectively. They are defined as

$$\mathrm{SNR} = \frac{E\{|x_1|^2\}}{\sigma^2} \tag{10.54}$$

$$\mathrm{SIR} = \frac{E\{|x_1|^2\}}{E\{|x_2|^2\}}. \tag{10.55}$$

Starting from (10.42), this results in the following SINR expression ($K = 2$) for
IRC:

$$\mathrm{SINR}(g_{\mathrm{IRC}}) = h_1^H \left( \frac{\Phi_{z_1 z_1}}{E\{|x_1|^2\}} \right)^{-1} h_1 = h_1^H \left( \mathrm{SNR}^{-1} I + h_2 h_2^H\, \mathrm{SIR}^{-1} \right)^{-1} h_1. \tag{10.56}$$

The curves are sketched as mean SINR versus SNR with SIR as additional parameter. Numerical results for IRC are depicted in Fig. 10.2 for the parameter
$\mathrm{SIR} \in \{-10, 0, 10\}$ dB. Additionally, for reference, the performance of an MRC
receiver is shown.
It can be seen that the potential gains of IRC versus MRC strongly depend
on the SNR operating point. As soon as the SNR is larger than the SIR, the IRC
gain versus MRC becomes substantial. This behavior is further affected by transmit
and receive correlation. For higher receive correlation values, the point where
IRC pays off is moved to slightly higher SNR values.
As a conclusion for interference-limited scenarios (SIR < SNR) with one dominating interferer, we see substantial gains for the discussed IRC approach. The
following sections elaborate further on the losses incurred when
estimation errors for channel and interference covariance are taken into account.
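
The comparison between IRC and MRC can be reproduced qualitatively with a small Monte Carlo sketch based on (10.56) for IRC and on the generic expression (10.41) for MRC. The correlation model below (real-valued exponential-type 2x2 correlation matrices applied through Cholesky factors) is a simplified stand-in and not the exact 3GPP model of [3GP10h]; the function name `mean_sinr_db`, the sample count and the seed are assumptions, and the mean is taken over linear SINR values before conversion to decibels.

```python
import numpy as np

def mean_sinr_db(snr_db, sir_db, alpha=0.9, beta=0.3, n_samples=2000, seed=0):
    """Mean SINR in dB of IRC and MRC for a 2x2 link with one interferer."""
    rng = np.random.default_rng(seed)
    snr, sir = 10 ** (snr_db / 10), 10 ** (sir_db / 10)
    C_tx = np.linalg.cholesky(np.array([[1.0, alpha], [alpha, 1.0]]))
    C_rx = np.linalg.cholesky(np.array([[1.0, beta], [beta, 1.0]]))
    irc = mrc = 0.0
    for _ in range(n_samples):
        Hw = (rng.standard_normal((2, 2))
              + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
        Hc = C_rx @ Hw @ C_tx.T              # rows: UE antennas, columns: BS antennas
        h1, h2 = Hc[:, 0], Hc[:, 1]          # one BS antenna serves one UE
        Phi = np.eye(2) / snr + np.outer(h2, h2.conj()) / sir
        irc += np.vdot(h1, np.linalg.solve(Phi, h1)).real            # IRC, cf. (10.56)
        mrc += abs(np.vdot(h1, h1)) ** 2 / np.vdot(h1, Phi @ h1).real  # MRC, g = h1
    return 10 * np.log10(irc / n_samples), 10 * np.log10(mrc / n_samples)

irc_hi, mrc_hi = mean_sinr_db(snr_db=20.0, sir_db=0.0)     # interference-limited
irc_lo, mrc_lo = mean_sinr_db(snr_db=-10.0, sir_db=10.0)   # noise-limited
```

For every channel realization the IRC SINR is at least as large as the MRC SINR, so the same holds for the means. How large the gap is at a given SNR and SIR depends on the assumed correlation model.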

[Figure 10.2: Mean SINR versus SNR for ideal IRC and MRC receive filter weights. Correlation channel model from [3GP10h] with $\alpha = 0.9$, $\beta = 0.3$. Axes: average SINR after receive filter [dB] versus SNR [dB]. Curves: IRC and MRC, each for SIR = 10 dB, 0 dB and -10 dB.]

Please note that the underlying channel model assumes a single antenna array at the BS site, which is used by both the desired and the interfering link. Strictly speaking, this model is only applicable to the intra-cell interference case. However, the basic results on algorithm aspects and potential IRC gains can easily be extended to the case of inter-cell interference.

10.2.3

Implementation Losses from Imperfect Channel Estimation


Model for Channel Estimation Error and Resulting Weight Mismatch
We assume reference signal (RS)-based, non-data-aided channel estimation, so that the interference and noise in reference signal processing are independent of the perturbation in data processing. For the error in the channel estimate of the desired link, we assume a simple model with an additive perturbation [Ahn08] that follows the same spatial correlation as interference and noise in data processing. This assumption is based on the fact that in many mobile communications systems like LTE, the reference signals suffer from the same spatial interference as data. In LTE, reference signals from directly neighboring cells are shifted differently in frequency, so that the characteristics of interference on reference signals are close to those on data symbols (see also Section 9.1). With sufficient accuracy, this can be assumed to hold for all transmission modes in LTE. Channel estimation filtering in time and/or frequency direction usually does not remove the spatial correlation. We assume that precoded demodulation reference signals (DRS) are used as introduced in Section 9.1, enabling each UE to directly estimate the effective channel after precoding, yielding an estimate
\[ \hat{\mathbf{h}}_1 = \mathbf{h}_1 + \mathbf{e}, \tag{10.57} \]


Efficient and Robust Algorithm Implementation

where the vector e consists of complex-valued Gaussian noise with zero mean and covariance
\[ \boldsymbol{\Phi}_{ee} = E\{\mathbf{e}\mathbf{e}^H\} = \frac{1}{G}\, \boldsymbol{\Phi}_{z_1 z_1}, \tag{10.58} \]
and where we introduced the interpolation or processing gain G as a linear power ratio (see Section 9.1). The amount of corresponding de-noising by means of channel estimation filtering in time and/or frequency direction is given by 10 log10(G) [dB].
Let us first assume the interference-and-noise covariance estimate to be ideal and deal with its estimation in Section 10.2.4. With the additive error model, the non-ideal weight vector suffering from channel estimation errors reads
\[ \hat{\mathbf{g}}_{\mathrm{IRC},a}^H = \mathbf{h}_1^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} + \mathbf{e}^H \boldsymbol{\Phi}_{z_1 z_1}^{-1}. \tag{10.59} \]

Analysis of SINR with Weight Mismatch


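The instantaneous SINR suffering from combining-weight mismatch can be obtained by using (10.59) to tailor (10.41) to our needs. We average separately over the perturbations in numerator and denominator to obtain
\[ \mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},a}) = \frac{E_e\left\{ \left| \hat{\mathbf{h}}_1^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} \mathbf{h}_1 \right|^2 \right\}}{E_e\left\{ \hat{\mathbf{h}}_1^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} \hat{\mathbf{h}}_1 \right\}}\, E\{|x_1|^2\}. \tag{10.60} \]
Including the additive error model for channel estimation from (10.57) into this formula yields
\[ \mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},a}) = \frac{E_e\left\{ \left| \left( \mathbf{h}_1^H + \mathbf{e}^H \right) \boldsymbol{\Phi}_{z_1 z_1}^{-1} \mathbf{h}_1 \right|^2 \right\}}{E_e\left\{ \left( \mathbf{h}_1^H + \mathbf{e}^H \right) \boldsymbol{\Phi}_{z_1 z_1}^{-1} \left( \mathbf{h}_1 + \mathbf{e} \right) \right\}}\, E\{|x_1|^2\}. \tag{10.61} \]
We now evaluate the expectations by exploiting the zero-mean property of e and by making use of (10.43). Writing $\gamma = \mathbf{h}_1^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} \mathbf{h}_1$ for the quadratic form of the desired channel, we obtain
\[ \mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},a}) = \frac{\gamma^2 + E_e\left\{ \left| \mathbf{e}^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} \mathbf{h}_1 \right|^2 \right\}}{\gamma + E_e\left\{ \mathbf{e}^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} \mathbf{e} \right\}}\, E\{|x_1|^2\}. \tag{10.62} \]
With the identity $\mathbf{a}^H \mathbf{e} = \mathrm{tr}(\mathbf{e}\mathbf{a}^H)$ [HJ99] and the general definition for $\boldsymbol{\Phi}_{ee}$ from (10.58), we obtain the final general SINR result
\[ \mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},a}) = \frac{\gamma^2 + \mathbf{h}_1^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} \boldsymbol{\Phi}_{ee} \boldsymbol{\Phi}_{z_1 z_1}^{-1} \mathbf{h}_1}{\gamma + \mathrm{tr}\left( \boldsymbol{\Phi}_{ee} \boldsymbol{\Phi}_{z_1 z_1}^{-1} \right)}\, E\{|x_1|^2\}. \tag{10.63} \]
The intention now is to put this SINR into a relationship with SINR(g_IRC) from (10.42), to obtain the general SINR ratio
\[ \frac{\mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},a})}{\mathrm{SINR}(\mathbf{g}_{\mathrm{IRC}})} = \frac{1}{\gamma} \cdot \frac{\gamma^2 + \mathbf{h}_1^H \boldsymbol{\Phi}_{z_1 z_1}^{-1} \boldsymbol{\Phi}_{ee} \boldsymbol{\Phi}_{z_1 z_1}^{-1} \mathbf{h}_1}{\gamma + \mathrm{tr}\left( \boldsymbol{\Phi}_{ee} \boldsymbol{\Phi}_{z_1 z_1}^{-1} \right)}. \tag{10.64} \]
In a final step, we make use of our channel estimation error model with the specific definition for $\boldsymbol{\Phi}_{ee}$ from (10.58), which allows for substituting $\boldsymbol{\Phi}_{ee} \boldsymbol{\Phi}_{z_1 z_1}^{-1} = \mathbf{I}/G$ to obtain the SINR ratio
\[ \frac{\mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},a})}{\mathrm{SINR}(\mathbf{g}_{\mathrm{IRC}})} = \frac{1}{\gamma} \cdot \frac{\gamma^2 + \gamma/G}{\gamma + N_{ue}/G} = \frac{\gamma + 1/G}{\gamma + N_{ue}/G}. \tag{10.65} \]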

This is valid for the specific model in which the channel estimation error exhibits the same spatial correlation as the data, scaled only by a scalar processing gain. It needs to be understood that this is purely the ratio of post-combining SINRs. The actual losses measurable in error rate or throughput will usually be higher, and they will also depend on the specific type of signal constellation and the post-combining demodulation strategy. As such, the ratio in (10.65) constitutes a fundamental lower bound for the non-recoverable losses caused during combining by weight mismatch due to imperfect channel estimation. We will treat the additional demodulation loss from phase or amplitude mismatch in the next paragraph.
Further, we can read from (10.65) that the SINR ratio is equal to 1 for N_ue = 1 UE receive antenna, which is reasonable, since the post-combining SINR does not suffer from an amplitude or phase error. It is only the demodulation process itself which later causes losses. Another sanity check is that the SINR ratio converges to 1 for G → ∞, irrespective of a finite N_ue.
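The closed-form ratio in (10.65) can be cross-checked numerically against the averaged expression in (10.60). The sketch below is an illustration under stated assumptions: the covariance matrix `phi`, the channel `h1` and the 6 dB processing gain are arbitrary example values, `gamma` denotes the quadratic form of h1 with the inverse covariance, and the function names are hypothetical.

```python
import numpy as np

def closed_form_ratio(gamma, gain, n_ue):
    """SINR ratio (10.65) with all quantities on a linear scale."""
    return (gamma + 1.0 / gain) / (gamma + n_ue / gain)

def simulated_ratio(h1, phi, gain, n_draws=200_000, seed=1):
    """Monte Carlo estimate of the ratio from (10.60), with e ~ CN(0, phi/G)."""
    rng = np.random.default_rng(seed)
    n_ue = h1.shape[0]
    phi_inv = np.linalg.inv(phi)
    gamma = float(np.real(h1.conj() @ phi_inv @ h1))
    # Color white complex Gaussian noise so that E{e e^H} = phi / G, see (10.58).
    chol = np.linalg.cholesky(phi / gain)
    white = (rng.standard_normal((n_draws, n_ue))
             + 1j * rng.standard_normal((n_draws, n_ue))) / np.sqrt(2)
    e = white @ chol.T
    h_hat = h1 + e  # perturbed channel estimates, one per row, see (10.57)
    numerator = np.mean(np.abs(h_hat.conj() @ (phi_inv @ h1)) ** 2)
    denominator = np.mean(
        np.real(np.einsum("ni,ij,nj->n", h_hat.conj(), phi_inv, h_hat)))
    # E{|x1|^2} cancels when dividing by the ideal SINR gamma * E{|x1|^2}.
    return float(numerator / (denominator * gamma)), gamma

phi = np.array([[1.5, 0.5], [0.5, 1.0]], dtype=complex)  # example covariance
h1 = np.array([1.0 + 0.5j, -0.3 + 1.0j])                 # example channel
gain = 10 ** (6 / 10)                                    # G = 6 dB
ratio_sim, gamma = simulated_ratio(h1, phi, gain)
ratio_theory = closed_form_ratio(gamma, gain, n_ue=2)
print(f"simulated {ratio_sim:.4f}, closed form {ratio_theory:.4f}")
```

Setting `n_ue=1` in `closed_form_ratio` returns exactly 1, and letting `gain` grow drives the ratio towards 1, which mirrors the two sanity checks stated above.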
Demodulation Loss
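In order to consider the impact of the demodulation loss, the channel estimation error is introduced into the system model established in (10.38):
\[ \hat{x}_1 = \mathbf{g}^H \left( \hat{\mathbf{h}}_1 - \mathbf{e} \right) x_1 + \mathbf{g}^H \mathbf{z}_1 = \mathbf{g}^H \hat{\mathbf{h}}_1 x_1 + \mathbf{g}^H \left( \mathbf{z}_1 - \mathbf{e}\, x_1 \right). \tag{10.66} \]
The useful part for demodulation is only $\mathbf{g}^H \hat{\mathbf{h}}_1 x_1$, while $\mathbf{g}^H \mathbf{e}\, x_1$ has to be treated as noise. Then, the SINR expression for arbitrary filter weights g, based on an extension of (10.41) with the model from (10.66), reads
\[ \mathrm{SINR}(\mathbf{g}) = \frac{\left| \mathbf{g}^H \hat{\mathbf{h}}_1 \right|^2}{\mathbf{g}^H \left( \boldsymbol{\Phi}_{z_1 z_1} + \boldsymbol{\Phi}_{ee}\, E\{|x_1|^2\} \right) \mathbf{g}}\, E\{|x_1|^2\}. \tag{10.67} \]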


With the specific channel estimation error model introduced in (10.58), this SINR expression can be further simplified to
\[ \mathrm{SINR}(\mathbf{g}) = \frac{\left| \mathbf{g}^H \hat{\mathbf{h}}_1 \right|^2}{\mathbf{g}^H \boldsymbol{\Phi}_{z_1 z_1} \mathbf{g}} \cdot \frac{1}{1/E\{|x_1|^2\} + 1/G}. \tag{10.68} \]
With the chosen model being valid for not too low SNRs, the demodulation loss is independent of the SNR that data transmission is subject to (but of course dependent on the SNR that channel estimation is subject to) and of the actually chosen filter weights. So, when comparing the achievable SINR for different weight selection approaches, the demodulation loss has no impact, since it affects each approach equally. It is a significant contribution to the overall loss, though, as we will see in Fig. 10.3 in the next paragraph.
Numerical Evaluation of SINR Loss with Weight Mismatch
The channel estimation interpolation gain achievable in a general OFDMA system depends on the topology of reference signals in the time-frequency grid and on the two-dimensional correlations of channel coefficients, parameterized by the time dispersion properties (e.g., delay spread) and the time variation (e.g., Doppler frequency) of the channel (see Section 9.1). It can be shown by analysis of the Wiener solution [HKR+97b] that, for the specific reference signal situation and channel model parameters in LTE, the interpolation gain by means of UE-side implementable filtering with limited span in frequency direction ranges roughly between 3 dB in a fairly dispersive channel still compliant with a normal cyclic prefix and 12 dB in a non-dispersive channel. Further interpolation gain may be achieved by means of filtering in time direction.
Fig. 10.3 shows the SINR ratio for exemplary parameters. With a processing gain of 6 dB for N_ue = 2 UE receive antennas, we have an acceptable combining loss of 0.11 dB when targeting $10 \log_{10}(\gamma) = 10$ dB for the ideal SINR term $\gamma$ in (10.65). For N_ue = 4, the combining loss increases to 0.31 dB. To guarantee the same post-combining SINR with N_ue = 4 instead of N_ue = 2, the processing gain needs to be improved by approximately 5 dB. This result is related to [RCP09], and it shows that the cost of channel estimation rises with the number of antennas when the combining loss with respect to ideal performance is to be limited.
The combining loss to be expected with realistic channel estimation is small compared to the significant gains offered by ideal IRC, as shown in Fig. 10.2. It should also be noted that the requirements on processing gain are dominated by the demodulation loss if we consider N_ue ≤ 4 and the higher SINR range, e.g. above 8 dB. Hence, irrespective of the combining loss due to channel estimation errors, IRC implementation still yields substantial improvements.
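The numbers quoted above follow directly from (10.65) and (10.68). As a small worked example, the sketch below evaluates the combining loss, the demodulation loss for unit symbol power, and the processing gain needed with four antennas to match the two-antenna loss. The function names are illustrative, and `gamma_db` stands for the ideal SINR term in (10.65) expressed in dB.

```python
import math

def db_to_lin(value_db):
    return 10 ** (value_db / 10)

def lin_to_db(value):
    return 10 * math.log10(value)

def combining_loss_db(gamma_db, gain_db, n_ue):
    """Combining loss in dB according to the SINR ratio (10.65)."""
    gamma, gain = db_to_lin(gamma_db), db_to_lin(gain_db)
    return -lin_to_db((gamma + 1 / gain) / (gamma + n_ue / gain))

def demodulation_loss_db(gain_db, symbol_power=1.0):
    """Demodulation loss in dB implied by the last factor of (10.68)."""
    # 1 / (1/P + 1/G) relative to P equals 1 / (1 + P/G).
    return lin_to_db(1 + symbol_power / db_to_lin(gain_db))

def required_gain_db(gamma_db, gain_db, n_ref, n_new):
    """Processing gain for n_new antennas that gives the same ratio as n_ref (n_ref > 1)."""
    gamma, gain = db_to_lin(gamma_db), db_to_lin(gain_db)
    rho = (gamma + 1 / gain) / (gamma + n_ref / gain)
    # Solve (gamma + 1/G') / (gamma + n_new/G') = rho for G'.
    return lin_to_db((n_new * rho - 1) / (gamma * (1 - rho)))

loss_2 = combining_loss_db(10, 6, 2)
loss_4 = combining_loss_db(10, 6, 4)
extra_gain = required_gain_db(10, 6, 2, 4) - 6
print(f"combining loss: {loss_2:.2f} dB (N_ue=2), {loss_4:.2f} dB (N_ue=4)")
print(f"extra processing gain for N_ue=4: {extra_gain:.1f} dB")
print(f"demodulation loss at G=6 dB: {demodulation_loss_db(6):.2f} dB")
```

At a processing gain of 6 dB the demodulation loss is close to 1 dB, several times larger than either combining loss, which is consistent with the remark that the demodulation loss dominates the requirements on processing gain.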

Figure 10.3 SINR ratio versus processing gain G. Combining losses from (10.65) for N_ue ∈ {2, 4} and a selection of typical values for $\gamma$ ($10 \log_{10}(\gamma)$ = 16 dB, 12 dB, 10 dB, 8 dB and 4 dB) are shown together with the demodulation loss from (10.68) for E{|x_1|^2} = 1. Axes: processing gain [dB] (horizontal), SINR ratio [dB] (vertical).

10.2.4

Implementation Losses from Spatial Interference-and-Noise Covariance Estimation
IRC employs a spatial interference-and-noise covariance corresponding to (10.39) and, hence, requires knowledge of either (i) the channel state corresponding to all or at least the dominant interfering signals or (ii) the interference-and-noise spatial covariance matrix. Estimation of the channel state for a certain number of interfering signals was discussed in Sections 5.1 and 6.3, but may not be practical, as it involves rather high computational complexity, especially in the case of multiple interferers. Hence, this subsection focuses on the estimation of the spatial interference-and-noise covariance matrix.
The spatial interference-and-noise covariance matrix can be estimated by averaging the Hermitian outer product of interference-and-noise vector samples, which yields the so-called sample covariance matrix. The sample covariance matrix is the maximum likelihood estimate of the true interference-and-noise covariance matrix, with the properties that the estimator is unbiased and the matrix is positive semi-definite. However, the interference as perceived by the UE depends on the channel, power and time-frequency resources allocated for co-channel interferers. The power and time-frequency resources used for co-channel interferers change with the resource allocation granularity in order to exploit the throughput gains from link adaptation and multi-user diversity. Hence, only samples within the transmission time interval share the same spatial interference-and-noise covariance matrix, which in turn limits any sample averaging to the duration of the scheduling interval, e.g. 1 ms in LTE.


In [JH09, LZLG08, MMK07], some sample covariance matrix enhancement techniques are suggested for MIMO-OFDM systems which make use of the limited delay spread of the channel. However, for numerical robustness it needs to be guaranteed that any smoothed sample covariance matrix remains positive semi-definite. Among the sample covariance matrix enhancement techniques, the approach in [JH09] satisfies this criterion and has significantly lower computational complexity than the alternatives in [LZLG08, MMK07]. In particular, it is suggested in [JH09] to average the entries of the sample covariance matrix over a set of sub-carriers, whereby the averaging length is chosen as a trade-off between noise reduction and distortion of the auto/cross-correlation terms. In practice, however, different interferers may be present per allocated resource block. Hence, only samples within the resource allocation size share the same spatial interference-and-noise covariance matrix, which in turn limits any sample averaging to the resource block size in frequency direction. In LTE, the resource block size corresponds to 12 sub-carriers with 15 kHz sub-carrier spacing.
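The ideal spatial interference-and-noise covariance can be written as
\[ \boldsymbol{\Phi}_{z_1 z_1}(q, o) = E_{x,n}\left\{ \big( \mathbf{y}_1(q, o) - x_1(q, o)\, \mathbf{h}_1(q, o) \big) \big( \mathbf{y}_1(q, o) - x_1(q, o)\, \mathbf{h}_1(q, o) \big)^H \right\}, \tag{10.69} \]
where q denotes the sub-carrier, o denotes the OFDM symbol, and $E_{x,n}\{\cdot\}$ denotes expectation with respect to noise and transmit symbols. Equation (10.69) can be approximated with the sample covariance matrix as
\[ \begin{aligned} \hat{\boldsymbol{\Phi}}_{z_1 z_1}(q, o) &= \frac{1}{N_{scm}} \sum_{\tilde{q} \in \mathcal{Q},\, \tilde{o} \in \mathcal{O}} \big( \mathbf{y}_1(\tilde{q}, \tilde{o}) - x_1(\tilde{q}, \tilde{o})\, \hat{\mathbf{h}}_1(\tilde{q}, \tilde{o}) \big) \big( \mathbf{y}_1(\tilde{q}, \tilde{o}) - x_1(\tilde{q}, \tilde{o})\, \hat{\mathbf{h}}_1(\tilde{q}, \tilde{o}) \big)^H \\ &= \frac{1}{N_{scm}} \sum_{\tilde{q} \in \mathcal{Q},\, \tilde{o} \in \mathcal{O}} \left( \sum_{k=2}^{K} x_k(\tilde{q}, \tilde{o})\, \mathbf{h}_k(\tilde{q}, \tilde{o}) + \mathbf{n}(\tilde{q}, \tilde{o}) - x_1(\tilde{q}, \tilde{o})\, \mathbf{e}(\tilde{q}, \tilde{o}) \right) \left( \sum_{k=2}^{K} x_k(\tilde{q}, \tilde{o})\, \mathbf{h}_k(\tilde{q}, \tilde{o}) + \mathbf{n}(\tilde{q}, \tilde{o}) - x_1(\tilde{q}, \tilde{o})\, \mathbf{e}(\tilde{q}, \tilde{o}) \right)^H, \end{aligned} \tag{10.70} \]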

where the summation is over a set of sub-carriers $\tilde{q} \in \mathcal{Q}$ and a set of OFDM symbols $\tilde{o} \in \mathcal{O}$, and $\hat{(\cdot)}$ denotes estimates. The cardinalities $|\mathcal{Q}|$ and $|\mathcal{O}|$ of the two sets determine the number of samples $N_{scm} = |\mathcal{Q} \times \mathcal{O}|$ used to compute the sample covariance matrix. The number of used sub-carriers $|\mathcal{Q}|$ is limited in practice to a bandwidth smaller than the coherence bandwidth and by the smallest resource block allocation size, while the number of used OFDM symbols $|\mathcal{O}|$ is limited to a time duration smaller than the coherence time and by the scheduling interval. Note that the sample covariance is not stationary across the boundaries of the resource allocations and, hence, needs to be estimated within the scheduling interval in time direction and within the resource block allocation size in frequency direction. Latency constraints may further reduce the number of samples that can be used in estimating the spatial interference-and-noise covariance. Note that interference can also be vastly different between the time-frequency resources used for control signaling and those used for data transmission and, hence, estimation of the spatial interference-and-noise covariance also needs to be restricted to the appropriate time-frequency resources. Furthermore, the computation of the sample covariance matrix requires knowledge of some symbols x_1, either from transmitted reference symbols or from decision feedback of the desired received signal, which may also limit the number of samples that can be used for estimation. For IRC, the sample covariance matrix needs to be invertible and, hence, N_scm ≥ N_ue. Note that there are typically only between 4 and 12 samples available for spatial interference-and-noise covariance estimation when using UE-specific RSs and without decision feedback in the LTE Release 8 downlink [3GP07b].

Figure 10.4 SINR ratio due to covariance matrix mismatch for varying number of samples at SNR = 0 dB, with one MPSK modulated interferer. Curves for N_ue = 2 and N_ue = 4, each for SIR = ∞ (theory and simulation) and for SIR = 10 dB, 0 dB and −10 dB (simulation). Axes: number of samples N_scm (horizontal), SINR ratio [dB] (vertical).
The set $\mathcal{Q}$ of sub-carriers is ideally centered at the sub-carrier q where the sample covariance matrix is computed. Similarly, the set $\mathcal{O}$ of OFDM symbols is ideally centered at the OFDM symbol o where the sample covariance matrix is computed. However, centering the sets $\mathcal{Q}$, $\mathcal{O}$ at q, o is not possible at the boundaries of the time-frequency resource allocations, where the spatial interference-and-noise covariance may change. Moreover, it is desirable to use the same covariance matrix over an entire range of sub-carriers q and OFDM symbols o. The partitioning of the time-frequency resources into tiles with identical sample covariance matrix determines the main computational complexity, as given by the number of sample covariance computations and matrix inversions of size N_ue × N_ue required for IRC.
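The per-tile estimation described above can be sketched as follows. This is a minimal illustration of the sample covariance in (10.70) for one time-frequency tile, assuming a block-fading channel estimate that is constant over the tile; the tile size of 12 samples, the QPSK symbols, the example channels and the function names are all assumptions made for the example.

```python
import numpy as np

def sample_covariance(y, x1, h1_hat):
    """Sample covariance (10.70) over one tile.

    y:      (N_scm, N_ue) received vectors on the tile
    x1:     (N_scm,) known desired symbols (e.g. reference symbols)
    h1_hat: (N_ue,) or (N_scm, N_ue) channel estimate of the desired link
    """
    y = np.asarray(y, dtype=complex)
    x1 = np.asarray(x1, dtype=complex)
    n_scm, n_ue = y.shape
    if n_scm < n_ue:
        raise ValueError("need N_scm >= N_ue samples for an invertible estimate")
    residual = y - x1[:, None] * np.asarray(h1_hat, dtype=complex)
    # Average of the outer products r r^H over all samples of the tile.
    return residual.T @ residual.conj() / n_scm

def irc_weights(phi_hat, h1_hat):
    """Unnormalized IRC weight vector, i.e. phi_hat^{-1} h1_hat."""
    return np.linalg.solve(phi_hat, h1_hat)

def qpsk(rng, n):
    return (rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2)

rng = np.random.default_rng(0)
n_scm, n_ue = 12, 2
h1 = np.array([1.0, 0.5j])       # desired channel (example values)
h2 = np.array([0.3, -0.8 + 0j])  # interferer channel (example values)
x1, x2 = qpsk(rng, n_scm), qpsk(rng, n_scm)
noise = 0.1 * (rng.standard_normal((n_scm, n_ue))
               + 1j * rng.standard_normal((n_scm, n_ue)))
y = x1[:, None] * h1 + x2[:, None] * h2 + noise

phi_hat = sample_covariance(y, x1, h1)
g = irc_weights(phi_hat, h1)
print(np.round(phi_hat, 3))
```

One covariance computation and one N_ue × N_ue solve are needed per tile, so the tile partitioning directly sets the computational load mentioned above.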


Performance Loss from Covariance Estimation


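For the evaluation of the performance impact of the limited number of samples available for spatial interference-and-noise covariance estimation, the channel of the desired signal is assumed to be perfectly known, i.e. $\hat{\mathbf{h}}_1(f, t) = \mathbf{h}_1(f, t)$ for all f, t and, hence, $\mathbf{e}(f, t) = \mathbf{0}$. In particular, the SINR can then be evaluated as
\[ \mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},b}) = \frac{\mathbf{h}_1^H\, E_n\left\{ \hat{\boldsymbol{\Phi}}_{z_1 z_1}^{-1} \mathbf{h}_1 \mathbf{h}_1^H \hat{\boldsymbol{\Phi}}_{z_1 z_1}^{-1} \right\} \mathbf{h}_1}{\mathbf{h}_1^H\, E_n\left\{ \hat{\boldsymbol{\Phi}}_{z_1 z_1}^{-1} \boldsymbol{\Phi}_{z_1 z_1} \hat{\boldsymbol{\Phi}}_{z_1 z_1}^{-1} \right\} \mathbf{h}_1}, \tag{10.71} \]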

where $E_n\{\cdot\}$ denotes expectation with respect to noise as part of the sample covariance matrix. Note that we have dropped the explicit dependency of the sample covariance on the sub-carriers q and OFDM symbols o, because the sample covariance matrix can be assumed to be constant over an entire tile in time-frequency resources. For simplicity of notation, we have also dropped the dependency of the channel for the desired UE on the sub-carriers q and OFDM symbols o, which corresponds to a block-fading approach in frequency and time directions. Equation (10.71) cannot be simplified in general, and the performance loss from imperfect covariance matrices needs to be evaluated by simulations. However, there exists a simple and elegant solution for the SINR loss from covariance estimation in case the sample covariance matrix is complex Wishart distributed, which is the case when the interference-and-noise samples can be assumed to be zero-mean Gaussian distributed [TC94]. This assumption holds (i) for the case of no interferers and (ii) for the case of interferers with a complex Gaussian signal alphabet, i.e. when K = 1 or $x_k(\tilde{q}, \tilde{o})$ is Gaussian distributed in (10.70). Furthermore, the assumption of zero-mean Gaussian distributed interference-and-noise samples is a good approximation for the case of multiple interferers and interferers with higher-order modulation. For complex Wishart distributed sample covariance matrices, the SINR ratio can then be derived [TC94] as
\[ \frac{\mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},b})}{\mathrm{SINR}(\mathbf{g}_{\mathrm{IRC}})} = \frac{N_{scm} - N_{ue} + 1}{N_{scm}} \quad \text{for } N_{scm} > N_{ue} + 1. \tag{10.72} \]

Note that the SINR ratio from estimating the spatial interference-and-noise covariance matrix is independent of the SNR and SIR for zero-mean Gaussian distributed interference-and-noise samples. Equation (10.71) is evaluated by simulations for the case that the interference-and-noise samples are zero-mean Gaussian distributed without interferer, i.e. SIR = ∞, and for the case that there is one interferer with, e.g., MPSK modulation, such that the samples are no longer Gaussian distributed. Fig. 10.4 illustrates the theoretical and simulated SINR losses for the cases of N_ue = 2 and N_ue = 4 as a function of the number of samples N_scm, for SNR = 0 dB. The theoretical results from (10.72) are shown as solid lines, and the simulated results are depicted with crosses for N_ue = 2 and circles for N_ue = 4, respectively. In the simulation results with one interferer, the loss observed from sample covariance estimation is upper bounded by the loss due to zero-mean Gaussian distributed interference-and-noise samples.
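The closed-form ratio in (10.72) makes it easy to see how many samples a given loss budget requires. The short sketch below evaluates the ratio in dB and searches for the smallest N_scm that keeps the loss within a target; it covers the Gaussian case only, and the function names and the search limit are illustrative assumptions.

```python
import math

def covariance_sinr_ratio_db(n_scm, n_ue):
    """SINR ratio (10.72) in dB, valid for N_scm > N_ue + 1."""
    if n_scm <= n_ue + 1:
        raise ValueError("(10.72) requires N_scm > N_ue + 1")
    return 10 * math.log10((n_scm - n_ue + 1) / n_scm)

def min_samples_for_loss(max_loss_db, n_ue, n_max=1000):
    """Smallest N_scm whose loss under (10.72) does not exceed max_loss_db."""
    for n_scm in range(n_ue + 2, n_max + 1):
        if -covariance_sinr_ratio_db(n_scm, n_ue) <= max_loss_db:
            return n_scm
    return None

for n_ue in (2, 4):
    for n_scm in (6, 12, 18):
        ratio = covariance_sinr_ratio_db(n_scm, n_ue)
        print(f"N_ue={n_ue}, N_scm={n_scm:2d}: {ratio:6.2f} dB")
    print(f"N_ue={n_ue}: N_scm >= {min_samples_for_loss(0.5, n_ue)} for <= 0.5 dB loss")
```

Since the simulated losses with one MPSK interferer are upper bounded by the Gaussian case, the sample counts obtained this way are on the conservative side for that scenario.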

10.2.5

Implementation Losses from Channel and Interference Estimation Errors


In the previous two subsections, the impact of channel estimation errors and covariance estimation errors has been investigated separately. In this subsection, these estimation losses are jointly evaluated by simulations, which show the robustness of IRC with respect to receiver imperfections and provide some guidelines for the implementation losses to be expected. The instantaneous SINR resulting from combining-weight mismatch is given as
\[ \mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},c}) = \frac{E_{n,e}\left\{ \left| \hat{\mathbf{h}}_1^H \hat{\boldsymbol{\Phi}}_{z_1 z_1}^{-1} \mathbf{h}_1 \right|^2 \right\}}{E_{n,e}\left\{ \hat{\mathbf{h}}_1^H \hat{\boldsymbol{\Phi}}_{z_1 z_1}^{-1} \boldsymbol{\Phi}_{z_1 z_1} \hat{\boldsymbol{\Phi}}_{z_1 z_1}^{-1} \hat{\mathbf{h}}_1 \right\}}\, E\{|x_1|^2\}, \tag{10.73} \]
and we evaluate the performance loss with the SINR ratio given by $\mathrm{SINR}(\hat{\mathbf{g}}_{\mathrm{IRC},c}) / \mathrm{SINR}(\mathbf{g}_{\mathrm{IRC}})$ by randomizing all relevant variables.
Fig. 10.5 shows the results for the SINR ratio versus SNR for one interferer with MPSK modulation and SIR = 0 dB, N_ue = 2, 4 and processing gains G = 6 dB and 15 dB for the channel scenario outlined in Section 10.2.2. The number of samples for covariance estimation is chosen as N_scm = 6 and N_scm = 18 to cover a range typical for LTE as well as a scenario with reduced losses from covariance mismatch. The combined losses from channel estimation and spatial interference-and-noise covariance estimation are smaller than the sum of the individual losses. Comparing Fig. 10.5 with Fig. 10.4, it can be seen that at low SNR the channel estimation errors even compensate for some covariance matrix mismatch, such that the combined losses become smaller than the losses from covariance matrix mismatch alone. The losses are not negligible, but rather small compared to the gains of IRC over MRC outlined in Section 10.2.2. In particular, in the low SIR regime, the benefits of IRC over MRC outweigh the implementation losses by far. Note that the implementation losses increase substantially with a larger number of antennas N_ue, which may also need to be considered in overall performance and throughput evaluations for MIMO systems.

10.2.6

Summary
This section has focused on the link-level assessment of IRC in the presence of estimation errors due to low-complexity implementation. First, SINR expressions for IRC and MRC were derived, showing that the MMSE approach yields exactly the same SINR value as the studied IRC method. A basic performance analysis in terms of achievable SINR after combining was conducted, comparing IRC and MRC when perfect knowledge of channel and interference characteristics is assumed. In the chosen interference-limited scenarios with one dominating interferer, substantial gains for IRC on link level were observed.

Figure 10.5 SINR loss versus SNR for one interferer with MPSK modulation and SIR = 0 dB, N_ue = 2, 4 and processing gain G = 6 dB and 15 dB, each for N_scm = 6 and N_scm = 18. Axes: SNR [dB] (horizontal), SINR ratio [dB] (vertical).

In order to assess one effect of low-complexity implementation, the SINR loss due to channel estimation errors for the case of IRC was derived, indicating a simple dependency on the channel estimation processing gain and the number of terminal receive antennas. A numerical study showed that, for N_ue = 2 receive antennas and processing gains of 6 dB and higher, the loss due to imperfect channel estimation is always below 0.5 dB, even at low SINR operating points such as 4 dB. So, general statements about the superiority of IRC over MRC still hold if imperfect channel estimation is taken into account.
An additional effect of low-complexity implementation is the erroneous estimation of the interference-and-noise covariance. Again, the SINR loss for the case of IRC was derived as a function of complexity, where the latter is expressed in terms of the number of samples used for estimation. Depending on the operating SINR point, 6 to 10 covariance samples are required to keep the resulting SINR loss below 0.5 dB when N_ue = 2 receive antennas are used.
Finally, the section was wrapped up with a joint assessment of errors from channel and interference covariance estimation. The general finding is that the combined loss for the different estimation aspects is slightly smaller than the sum of the individual ones. An important overall conclusion is that the losses due to estimation errors become significantly larger when N_ue = 4 terminal antennas are considered. Hence, when designing MIMO systems, not only the cost of many antennas at the UE side matters, but the performance degradation due to estimation errors also becomes more and more evident.

11 Scheduling, Signaling and Adaptive Usage of CoMP

In this chapter, we address how CoMP can be applied selectively and adaptively to well-chosen sets of terminals in a mobile communications system. While Section 11.1 focuses on scheduling approaches, where a central scheduling unit performs multi-cell resource allocation in the context of non-cooperative or joint transmission in a cellular downlink, Section 11.2 looks into radio link control and signaling aspects connected to establishing CoMP on demand. Finally, Section 11.3 ventures into the field of ad-hoc CoMP, where cooperation is established flexibly after uplink transmission has already taken place.

11.1

Centralized Scheduling for CoMP


Tarcisio Maciel, Ricardo B. dos Santos and Anja Klein
In this section, we discuss centralized multi-cell scheduling for a system using either non-cooperative transmission or joint transmission in the downlink. After Subsection 11.1.1 motivates the topic, Subsection 11.1.2 presents the studied scenario and its main models. Subsection 11.1.3 introduces some relevant scheduling problems, where the aim is to maximize system throughput. Subsection 11.1.4 analyzes the problems introduced earlier through system level simulations. Finally, Subsection 11.1.5 adds some final remarks on the problem of centralized scheduling and its extension to uplink scenarios.

11.1.1

Introduction
In previous chapters, we have typically observed transmissions between multiple base stations (BSs) and user equipments (UEs) on a single orthogonal frequency division multiplex (OFDM) sub-carrier, assuming that the assignment of system resources to the communicating entities has already taken place. While the question of which BSs should in principle be clustered, and hence enabled to cooperate, was already addressed in Chapter 7, we now want to look into the question of how UEs can be assigned to system resources efficiently, such that the performance under a particular transmission scheme is maximized. More specifically, we will investigate

• which UEs should simultaneously use the same physical resource block (PRB) in different cells in the case of conventional, non-cooperative transmission, or
• which UEs can be efficiently served on the same resources if downlink joint transmission (JT) (see Section 6.3) is used.

Figure 11.1 A CoMP system setup with an exemplary CoMP cluster. Labels: cells = sectors; sites with 3 base stations each; exemplary CoMP cluster; central scheduling unit (CSU).

In this respect, scheduling may explore the degrees of freedom of choosing tuples of UEs whose links will be lightly affected by the mutual co-channel interference in the conventional, non-cooperative transmission case, or who can be served efficiently on the same resources in the JT case. In the following, some heuristic algorithms are described, and their results will show that, by making use of the information fed back by the UEs and made available to a central scheduling unit (CSU) through backhaul links, centralized scheduling provides considerable gains compared to conventional, individual scheduling by the BSs.

11.1.2

System Model
In this subsection, the system model considered for the study of scheduling algorithms is described. The downlink of a CoMP-enabled system with M BSs is considered. These BSs are grouped into C clusters, where the sets of BSs included are denoted as $\mathcal{M}_c$, c = 1, . . . , C. All BSs within one cluster have a backhaul link to a dedicated CSU. A setup where one exemplary CoMP cluster is highlighted is shown in Fig. 11.1.
In the sequel, we shift our focus to this one cluster c, and assume it has a set of UEs $\mathcal{U}_c$ that are served by the BSs in $\mathcal{M}_c$ and may be assigned to R available PRBs, which are indexed by r = 1, 2, . . . , R. For simplicity, in this section only single-antenna BSs and UEs are considered, hence N_bs = N_ue = 1 according to the notation used in previous chapters. Let us denote as $\mathcal{K}_r$ the set of all UEs in the system that are assigned to resource r, and $\mathcal{K}_{c,r} \subseteq \mathcal{K}_r$ as the subset of these UEs that are served by cluster c. Vector $\mathbf{h}_k^c(r) \in \mathbb{C}^{[|\mathcal{M}_c| \times 1]}$ denotes the channel coefficients connected to resource r, representing the links from UE k to each BS m belonging to cluster c. Similarly to the derivation in Section 5.1, the downlink signal-to-interference-and-noise ratio (SINR) experienced by a UE k belonging to cluster c and assigned to resource r can be stated as
\[ \gamma_k(r) = \frac{\left| \left( \mathbf{h}_k^c(r) \right)^H \mathbf{w}_k^c(r) \right|^2}{\underbrace{\sum_{j \in \{\mathcal{K}_{c,r} \setminus k\}} \left| \left( \mathbf{h}_k^c(r) \right)^H \mathbf{w}_j^c(r) \right|^2}_{\text{Intra-cluster interference}} + \underbrace{\sum_{j \in \{\mathcal{K}_r \setminus \mathcal{K}_{c,r}\}} \left| \left( \mathbf{h}_k(r) \right)^H \mathbf{w}_j(r) \right|^2}_{\text{Inter-cluster interference}} + \sigma^2}, \tag{11.1} \]
where $\mathbf{w}_k(r)$ is the precoding vector employed at the BS side to serve UE k on resource r, of which $\mathbf{w}_k^c(r)$ is the sub-part connected to the transmission originating from the BSs in cluster c. Note that in the case of non-cooperative transmission, the transmission to each UE can only originate from one BS, i.e. each precoding vector $\mathbf{w}_k(r)$ is zero in all elements except one, whereas $\mathbf{w}_k(r)$ can be non-zero for all elements connected to one and the same cluster for JT.
Considering (11.1) and the setup in Fig. 11.1, one sees that co-channel interference comes from BSs belonging to the same cluster, denoted as intra-cluster
interference, and from BSs belonging to other clusters, denoted as inter-cluster
interference. Intra-cluster interference can be estimated well or even predicted
by the CSU based on channel state information (CSI) fed back through the
backhaul, thus enabling the use of intelligent resource reuse through centralized
scheduling. On the other hand, inter-cluster interference might only be estimated
and cannot be directly controlled.
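To make the structure of (11.1) concrete, the following sketch evaluates the SINR of one UE inside a single cluster on one PRB. It is only an illustration under stated assumptions: the inter-cluster interference is lumped into a single power value (since it can only be estimated), the zero-forcing precoder used for the JT case is one possible choice that the text does not prescribe, transmit powers are not normalized, and all names are hypothetical.

```python
import numpy as np

def downlink_sinr(h, w, k, inter_cluster_power=0.0, noise_power=1.0):
    """SINR of UE k according to (11.1), restricted to one cluster and one PRB.

    h: (n_ue, n_bs) array, row k holds the channel vector h_k^c
    w: (n_ue, n_bs) array, row j holds the precoding vector w_j^c
    """
    # |h_k^H w_j|^2 for every co-scheduled UE j of the cluster.
    gains = np.abs(h[k].conj() @ w.T) ** 2
    signal = gains[k]
    intra_cluster = gains.sum() - signal
    return float(signal / (intra_cluster + inter_cluster_power + noise_power))

def non_cooperative_precoders(n_ue):
    """Each UE is served by exactly one BS: one non-zero element per vector."""
    return np.eye(n_ue, dtype=complex)

def zero_forcing_precoders(h):
    """Joint transmission with zero forcing (an assumed example precoder)."""
    # Columns of pinv(h^*) satisfy h_k^H w_j = 1 for j = k and 0 otherwise.
    return np.linalg.pinv(h.conj()).T

h = np.array([[1.0, 0.5], [0.5, 1.0]], dtype=complex)  # 2 UEs, 2 BSs (example)
for name, w in (("non-cooperative", non_cooperative_precoders(2)),
                ("joint transmission (ZF)", zero_forcing_precoders(h))):
    sinrs = [downlink_sinr(h, w, k) for k in range(2)]
    print(name, np.round(sinrs, 3))
```

With the one-hot precoders the second BS appears as intra-cluster interference, while the zero-forcing precoders remove that term entirely; only the lumped inter-cluster power and the noise remain in the denominator.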
In the following, models for adaptive modulation and block error rate (BLER)
assessment are presented. Both are based on SINR values estimated using the
CSI available at each CSU. Note that the modulation choice depends on the
SINR, which is an outcome of the resource assignment. Consequently, adaptive
modulation becomes strongly dependent on the resource assignment.
Due to the resource reuse across the system, link quality will vary according to the resource assignments made by other clusters, thus making inter-cluster interference harder to estimate. As a consequence, packet losses might occur due to an imperfect selection of modulation schemes caused by considering wrong inter-cluster interference values. In order to capture packet errors, the following model is employed. According to [Cal04], considering uncoded QAM with q bits/symbol and assuming that co-channel interference is Gaussian-distributed with its power added directly to the additive white Gaussian noise (AWGN) power, the bit error rate (BER) for a link with SINR $\gamma$ can be approximated as
\[ \mathrm{BER}(\gamma) \approx 0.2 \exp\left( -\frac{1.6\, \gamma}{2^q - 1} \right). \tag{11.2} \]

Then, assuming packets of L symbols being transmitted on a single PRB during one transmit time interval (TTI), the BLER can be written as
\[ \mathrm{BLER}(\gamma) = 1 - \left( 1 - \mathrm{BER}(\gamma) \right)^{L q}, \tag{11.3} \]
which is used to model whether a given transmission has been successful. Here, adaptive modulation takes into account 2-, 4-, 16- and 64-QAM as available modulation schemes. Since the focus here is on enhancing total throughput, adaptive modulation selects the QAM of order $Q = 2^q$ yielding the highest average throughput, i.e.,
\[ Q^{\star} = \arg\max_{Q \in 2^{\{1, 2, 4, 6\}}} \left( 1 - \mathrm{BLER}(\gamma) \right) L\, q. \tag{11.4} \]
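The adaptive modulation rule in (11.2) to (11.4) translates almost directly into code. The sketch below is a minimal illustration; the packet length of 100 symbols used in the example and the function names are assumptions, while the constants 0.2 and 1.6 and the candidate set of bits per symbol are taken from the equations above.

```python
import math

def ber(sinr, q):
    """BER approximation (11.2) for uncoded QAM with q bits/symbol (linear SINR)."""
    return 0.2 * math.exp(-1.6 * sinr / (2 ** q - 1))

def bler(sinr, q, n_symbols):
    """Block error rate (11.3) for a packet of n_symbols symbols."""
    return 1 - (1 - ber(sinr, q)) ** (n_symbols * q)

def expected_bits(sinr, q, n_symbols):
    """Average number of correctly delivered bits per packet, the objective in (11.4)."""
    return (1 - bler(sinr, q, n_symbols)) * n_symbols * q

def select_modulation(sinr, n_symbols, bits_options=(1, 2, 4, 6)):
    """Return the number of bits per symbol q that maximizes (11.4)."""
    return max(bits_options, key=lambda q: expected_bits(sinr, q, n_symbols))

n_symbols = 100  # assumed packet length L
for sinr_db in (0, 10, 20, 30, 40):
    sinr = 10 ** (sinr_db / 10)
    q = select_modulation(sinr, n_symbols)
    print(f"SINR {sinr_db:2d} dB -> {2 ** q}-QAM, "
          f"{expected_bits(sinr, q, n_symbols):.1f} bits expected")
```

Because the selection depends on the SINR that results from the resource assignment, a scheduler would call such a function with the SINR estimate available at the CSU; an overly optimistic inter-cluster interference estimate then shows up as a higher realized BLER.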

11.1.3

Centralized Scheduling Problems


In this subsection, some relevant scheduling problems are described. In general, scheduling UEs in a CoMP system is a very complex optimization problem that involves multiple dimensions (BSs, UEs, PRBs, transmit/receive antennas, transmit powers, CoMP schemes, etc.) and incorporates several sub-problems:

• the PRB assignment problem, which selects the PRBs to be allocated to each UE and also defines the reuse in centralized scheduling.
• the CoMP decision problem, i.e. to determine whether a CoMP scheme is to be applied and, if so, which one.
• the power allocation problem, which corresponds to distributing the available power among UEs and PRBs efficiently.
• the precoding problem, which corresponds to determining precoding vectors so as to spatially multiplex signals intended for different UEs.
• the link adaptation problem, e.g. the problem of determining a suitable modulation and coding scheme (MCS) for each transmission.

These sub-problems, especially the first four, cannot be separated without
incurring some loss of optimality. If the channels of the UEs sharing a PRB are
highly spatially uncorrelated, these UEs can be efficiently separated in space
using precoding. However, if their channels are highly correlated, these UEs
cannot be spatially separated efficiently and strongly interfere with each other.
Further, the interference between UEs sharing a given resource through spatial
division multiple access (SDMA) is a function of the power distribution among
the UEs, as well as the power distribution among PRBs. Considering a certain
amount of power available for a PRB, allocating more power to one UE enhances
the quality of its signal, e.g., in terms of SINR, but also reduces the SINR of
the other UEs using the PRB. Analogously, allocating more power to a certain
PRB enhances the SINR of the UEs sharing this PRB, but reduces the SINR of
the UEs allocated on the other PRBs. Finally, the spatial compatibility among
UEs is PRB-dependent, i.e., UEs that are spatially compatible on a given PRB
might be incompatible on another [LZ06, MK10].
These relationships illustrate the strong interdependency among the above
sub-problems, and solving them jointly is usually computationally prohibitive.
All the aspects mentioned above affect signal and/or interference levels and
consequently the system performance. Moreover, even for some individual
sub-problems, optimum solutions can already be very complex. Thus, sub-optimal
scheduling solutions are often preferred [FDH07, MK10].
Different objectives may be pursued by the scheduler (spectral efficiency, fairness,
quality of service (QoS) requirements, etc.) and each objective results in its
own problem, which may not have a known optimal solution. While centralized and
conventional schedulers share the same scheduling objectives, the additional
information available to a centralized scheduler allows it to consider the impact
of the resource allocation at one BS on the other BSs within the same cluster.
Moreover, the additional control offered by joint scheduling introduces further
degrees of freedom, which are exploited, e.g., by adapting the set of
simultaneously transmitting BSs.
In this section, we focus on the resource allocation problem comprising the PRB
assignment and precoding sub-problems, and we aim at maximizing system
throughput. If we assume a fixed and equal power distribution among PRBs, the
problem may be decoupled and solved separately for each PRB. This approach
is, in fact, used in the presentation of all algorithms in this section. We further
consider a fixed power control, explained later, and also fix the CoMP scheme
and choice of precoders to the following two options:
• Conventional transmission (CT): PRBs are reused by multiple BS-UE links within
  a cluster, but each active UE is served exclusively by only one BS.
• Joint transmission (JT): PRBs are reused by multiple BS-UE links within
  a cluster, with all BSs sending linearly and jointly precoded data to all
  scheduled UEs, where the precoding is based on a zero-forcing (ZF) filter.
The remaining resource allocation problem is hence as follows. For the case of
conventional transmission (CT), the CSU needs to determine which BS-UE links
can simultaneously use the same PRB. Assuming that the CSU has CSI on all links
within a cluster, it is able to estimate the impact of the intra-cluster interference
induced by the PRB reuse, and can dynamically determine which PRBs should
be assigned to which UEs served by which BSs. For the JT problem, the CSU
needs to find sets of UEs with good compound channel properties, which can be
efficiently served with spatial multiplexing on the same PRB. Maximizing system
throughput assuming CT or JT becomes a combinatorial problem whose optimal
solution may be computationally complex to find. Therefore, only sub-optimal,
but rather efficient solutions are considered herein.


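Table 11.1. Greedy scheduling algorithm for conventional transmission (CT).

1. Find the BS-UE link with highest gain,
   $\{k^\star, m^\star\} = \arg\max_{k \in \mathcal{U}_c,\, m \in \mathcal{M}_c} |h_k^m(r)|^2$.
2. Assign PRB $r$ to the link $\{m^\star, k^\star\}$ by defining the set of scheduled UEs
   of CoMP cluster $c$ on PRB $r$ as $\mathcal{K}^\star_{c,r} = \{k^\star\}$ and the set of
   scheduled BSs of CoMP cluster $c$ on PRB $r$ as $\mathcal{M}_{c,r} = \{m^\star\}$.
3. Estimate the total rate $R(\mathcal{K}^\star_{c,r}, \mathcal{M}_{c,r})$ using (11.1) to (11.4).
4. Find $\{m^\star, k^\star\} = \arg\max_{k \in \mathcal{U}_c \setminus \mathcal{K}^\star_{c,r},\, m \in \mathcal{M}_c \setminus \mathcal{M}_{c,r}} R(\mathcal{K}^\star_{c,r} \cup \{k\}, \mathcal{M}_{c,r} \cup \{m\})$.
5. If $R(\mathcal{K}^\star_{c,r} \cup \{k^\star\}, \mathcal{M}_{c,r} \cup \{m^\star\}) > R(\mathcal{K}^\star_{c,r}, \mathcal{M}_{c,r})$,
   set $\mathcal{K}^\star_{c,r} = \mathcal{K}^\star_{c,r} \cup \{k^\star\}$, $\mathcal{M}_{c,r} = \mathcal{M}_{c,r} \cup \{m^\star\}$
   and go to the previous step, otherwise finish.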
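The steps of Table 11.1 can be sketched in Python as follows for a single PRB. This is an illustrative sketch under stated assumptions, not the simulator used for the results in this section: the rate estimate of (11.1) to (11.4) is replaced by the Shannon rate log2(1 + SINR) with equal power per link, and the names `total_rate` and `greedy_ct` as well as the layout of the `gain` matrix are hypothetical.

```python
import math


def total_rate(links, gain, power=1.0, noise=1.0):
    """Sum rate of the active (ue, bs) links sharing one PRB.

    Assumption: the Shannon rate stands in for the rate estimate
    obtained from (11.1) to (11.4) in the text.
    """
    rate = 0.0
    for ue, bs in links:
        # Every other active BS of the cluster interferes with this UE.
        interference = sum(power * gain[ue][other]
                           for _, other in links if other != bs)
        sinr = power * gain[ue][bs] / (noise + interference)
        rate += math.log2(1.0 + sinr)
    return rate


def greedy_ct(gain, power=1.0, noise=1.0):
    """Greedy link selection for CT on one PRB (steps of Table 11.1).

    gain[k][m] is the channel gain |h|^2 between UE k and BS m.
    Returns the scheduled (ue, bs) links and their estimated sum rate.
    """
    ues, bss = range(len(gain)), range(len(gain[0]))
    # Steps 1 and 2: start with the strongest link of the cluster.
    first = max(((k, m) for k in ues for m in bss),
                key=lambda link: gain[link[0]][link[1]])
    links = [first]
    # Step 3: rate of the current allocation.
    best = total_rate(links, gain, power, noise)
    while True:
        used_ues = {k for k, _ in links}
        used_bss = {m for _, m in links}
        candidates = [(k, m) for k in ues if k not in used_ues
                      for m in bss if m not in used_bss]
        if not candidates:
            break
        # Step 4: best additional link; ties are broken by channel gain.
        cand = max(candidates,
                   key=lambda link: (total_rate(links + [link], gain, power, noise),
                                     gain[link[0]][link[1]]))
        new_rate = total_rate(links + [cand], gain, power, noise)
        # Step 5: keep the link only if the cluster throughput increases.
        if new_rate <= best:
            break
        links.append(cand)
        best = new_rate
    return links, best
```

With well separated links, e.g. `gain = [[10, 0.01], [0.01, 8]]`, both BSs reuse the PRB, whereas with strong cross-gains, e.g. `gain = [[10, 9], [9, 8]]`, the sketch keeps only the strongest link active.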

Heuristic Scheduling Algorithm for Conventional Transmission


In this subsection, a heuristic scheduling algorithm for the CT case is described.
This introduces flexibility to decide, for each PRB, whether all BSs associated
with the CSU will be used or only some of them, in order to reduce
co-channel interference and transmission failure probabilities. As mentioned
before, equal power assignment among different PRBs is commonly assumed
for simplicity. Although not optimal, it leads to only marginal performance
degradation compared to optimum power allocation if adaptive modulation is
employed [JL03, ZL04], and allows resource allocation to be considered individually
for each PRB whenever throughput maximization is being pursued. Some particular
cases of joint precoding, power allocation, and scheduling are addressed
in [SB04, TC04, SBO06, CC07, MSLT07].
Greedy scheduling algorithms assigning resources to the BS-UE links with
highest gain on each PRB offer a sub-optimal solution for throughput-oriented
scheduling in CoMP systems. Indeed, in the absence of intra-cluster interference,
a greedy algorithm would be even optimal for orthogonal frequency division
multiple access (OFDMA)-based systems. In any case, it is assumed that the
CSU has knowledge about the gains of all BS-UE links, so that it can efficiently
estimate the impact of intra-cluster interference on achievable rates.
Greedy algorithms usually solve a problem in stages where at each stage a
decision for the best solution is made, assuming that decisions taken at previous
stages were optimal. In other words, greedy algorithms make a locally
optimal choice at each stage and hope that these will lead to the global
optimum [CLRS01]. Clearly, this optimum is not necessarily reached in this way.
Nevertheless, for the problem of maximizing the system throughput, a greedy
algorithm can be employed to schedule, within each CoMP cluster, a set of
UEs with high channel gains by estimating their rates after each allocation. In
other words, it starts by scheduling the BS-UE link with highest channel gain
within the whole cluster. Then, based on the resulting SINRs, it calculates the
potential cluster throughput of scheduling each available BS-UE link within the
same cluster together with the previously scheduled link. Finally, the scheduled

link is the one which leads to the highest throughput and, in case of ties, the link
with highest channel gain is chosen. This procedure continues adding new links
as long as the cluster throughput increases. Otherwise, it finishes and goes to
the next PRB. The greedy algorithm for each PRB $r$ in the case of conventional,
non-cooperative transmission can be stated as shown in Table 11.1.

Table 11.2. Greedy scheduling algorithm for joint transmission (JT).

1. Find the UE with highest channel vector norm,
   $k^\star = \arg\max_{k \in \mathcal{U}_c} \|\mathbf{h}_k^c(r)\|$.
2. Assign initially the PRB $r$ to the UE $k^\star$ by defining the set $\mathcal{K}'_{c,r}$
   of scheduled UEs of CoMP cluster $c$ on PRB $r$ as $\mathcal{K}'_{c,r} = \{k^\star\}$.
3. While $K' < K^\star$:
   a. Find $k^\star = \arg\max_{k \in \mathcal{U}_c \setminus \mathcal{K}'_{c,r}} \Psi(\mathcal{K}'_{c,r} \cup \{k\})$.
   b. Set $\mathcal{K}'_{c,r} = \mathcal{K}'_{c,r} \cup \{k^\star\}$.
4. Set $\mathcal{K}^\star_{c,r} = \mathcal{K}'_{c,r}$.
Heuristic Scheduling Algorithm for Joint Transmission
In this subsection, a heuristic, sub-optimal scheduling algorithm for the case of
JT is described, where the discussion is restricted to a single CoMP cluster $c$ and
PRB $r$, on which a group $\mathcal{K}^\star_{c,r}$ is built. For simplicity of notation,
the indices $c$ and $r$ are omitted in the sequel. For JT, the data symbols $x_k$
intended for all scheduled UEs $k$ are made available to all BSs of the CoMP cluster
and are precoded using their associated precoding vectors $\mathbf{w}_k$ before
transmission. For the spatial signal separation, linear precoding can be employed
[PNG03, JJT+09], while the efficiency of such separation strongly depends on the
characteristics of the channel vectors of the scheduled UEs. Therefore, a JT scheduler
that only allows sharing PRBs among UEs with uncorrelated channels is usually
employed [MK10]. Thus, the problem to be solved is choosing a group of
$K^\star \le M$ UEs that are spatially compatible, i.e., that can efficiently share
the same PRB. JT schedulers for this problem are usually heuristic and composed of
two elements: a spatial compatibility metric and a user selection algorithm [MK10].
In the following, the spatial compatibility metric is discussed. It is employed
by the CSU to measure the spatial compatibility among UEs. In general, spatial
compatibility metrics are functions of the CSI (available at the CSU through
the backhaul) that try to map the characteristics of the spatial channels of a
group of UEs to a scalar value quantifying how efficiently these UEs can be
separated in space [MK10]. Such a group of UEs is termed a compatibility group.
When ZF precoding is employed, it has been shown that the sum of channel
gains with null-space successive projections represents an effective measure of
spatial compatibility, especially when aiming at maximizing the system
throughput [MK10, TUBN06, YG05]. For a compatibility group
$\mathcal{K}' = \{1, 2, \ldots, K'\}$, the use of successive null-space projections
imposes that the channel vector $\mathbf{h}_{k'}$ of UE $k' \in \mathcal{K}'$ be projected
onto the null-space of the channels of all UEs $k \in \mathcal{K}'$,
$k = 1, 2, \ldots, k'-1$ [TUBN06, YG05], i.e., a vector space orthogonal to the
channels of all UEs $k = 1, 2, \ldots, k'-1$.
Since signals conveyed through orthogonal channels do not interfere with each
other, the more orthogonal to $\mathbf{h}_k$ the channel vector $\mathbf{h}_{k'}$ of UE $k'$
is, the less its squared norm (i.e., the channel gain) will be affected by the
projection and the more spatially compatible with the UEs $k = 1, 2, \ldots, k'-1$
the UE $k'$ will be.
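Denoting as $\mathbf{T}_{k'} \in \mathbb{C}^{[M \times M]}$ the matrix that projects the channel
vector $\mathbf{h}_{k'}$ of UE $k'$ onto the null-space of the channels of UEs
$1, 2, \ldots, k'-1$ [TUBN06], one has

\[ \mathbf{T}_{k'} = \begin{cases} \mathbf{I}_{[M]}, & \text{for } k' = 1, \\[4pt] \mathbf{T}_{k'-1} - \dfrac{\mathbf{T}_{k'-1} \mathbf{h}_{k'-1}^H \mathbf{h}_{k'-1} \mathbf{T}_{k'-1}^H}{\|\mathbf{h}_{k'-1} \mathbf{T}_{k'-1}\|^2}, & \text{for } k' = 2, \ldots, K', \end{cases} \tag{11.5} \]

where for $k' = 1$ no projections are needed, i.e., $\mathbf{T}_1 = \mathbf{I}_{[M]}$. Then,
using (11.5), the sum of channel gains with null-space successive projections
$\Psi(\mathcal{K}')$ considered in this subsection is written as

\[ \Psi(\mathcal{K}') = \sum_{k'=1}^{K'} \|\mathbf{h}_{k'} \mathbf{T}_{k'}\|^2. \tag{11.6} \]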

Note that according to (11.6), the higher the gains of the channels of the UEs
belonging to $\mathcal{K}'$ are, and the more orthogonal to each other they are, the larger
$\Psi(\mathcal{K}')$ becomes. Altogether, high degrees of orthogonality and high channel
gains result in an increased system throughput, rendering $\Psi(\mathcal{K}')$ a suitable
spatial compatibility metric, especially for algorithms oriented towards throughput
maximization, as is the case herein.
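To make the recursion of (11.5) and the sum of (11.6) concrete, the following Python sketch evaluates the metric for an ordered list of channel row vectors using plain lists of complex numbers. The function name `compat_metric` and the numerical tolerance are assumptions for illustration; the update uses the projected vector g = h T directly, which gives the same numerator as in (11.5) because every T is Hermitian.

```python
def compat_metric(channels, tol=1e-12):
    """Sum of channel gains with successive null-space projections.

    channels: list of channel row vectors (lists of numbers), given in
    the order in which the UEs are considered in the group.
    """
    m = len(channels[0])
    # T_1 is the identity: the first channel is not projected.
    T = [[1.0 + 0j if i == j else 0j for j in range(m)] for i in range(m)]
    total = 0.0
    for h in channels:
        # Projected row vector g = h T and its gain ||h T||^2.
        g = [sum(h[i] * T[i][j] for i in range(m)) for j in range(m)]
        gain = sum(abs(x) ** 2 for x in g)
        total += gain
        if gain > tol:
            # T <- T - g^H g / ||g||^2, which removes the direction of g
            # from the space onto which later channels are projected.
            T = [[T[i][j] - g[i].conjugate() * g[j] / gain
                  for j in range(m)] for i in range(m)]
    return total


if __name__ == "__main__":
    print(compat_metric([[1, 0], [0, 2]]))   # orthogonal channels
    print(compat_metric([[1, 0], [2, 0]]))   # collinear channels
```

Orthogonal channels keep their full gains in the sum, while a channel that is collinear with an earlier one contributes nothing, which is the behavior described for the metric above.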
In the following, the user selection algorithm considered in this subsection is
discussed. Its task is to arrange the UEs of the CoMP cluster in a compatibility
group by using the spatial compatibility metric. Often, the optimum compatibility
group can only be found through an exhaustive search over all possible
groups, so that sub-optimal, but rather efficient user selection algorithms are
desired. One such algorithm is the best fit algorithm [STKL01, DS04, Cal04],
which is also a greedy algorithm.
Starting from a compatibility group containing only an initial UE, the best
fit algorithm extends the group by sequentially admitting the most spatially
compatible UE with respect to those UEs already admitted to the compatibility
group. Let $\mathcal{K}' = \{k'\}$ be the initial compatibility group containing only the
UE $k'$, which is chosen as the UE with the highest channel norm because this leads to
the highest throughput for single-user transmission. Let $K'$ be the size of the
compatibility group $\mathcal{K}'$. Then, the best fit algorithm computes the spatial
compatibility metric $\Psi(\mathcal{K}' \cup \{k\})$ for each UE
$k \in \mathcal{K}_c \setminus \mathcal{K}'$. The UE $k^\star$ which leads to the highest
value of the spatial compatibility metric is then admitted to the group $\mathcal{K}'$.
After that, the same procedure is repeated with the remaining
UEs and an additional UE is admitted to the group, and so on until the group
size $K'$ reaches the target compatibility group size $K^\star$. When using ZF
precoding, the choice of $K^\star$ is fundamentally limited by the number of transmitting
antennas, which is the maximum number of UEs that can be multiplexed in this
case. However, additional restrictions such as a maximum number of scheduled
UEs per PRB may be applied to limit the amount of control information to be
exchanged at each TTI.
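The best fit procedure described above can be sketched as follows for one PRB. This is a hedged illustration rather than the implementation behind the results of this section: the function name `best_fit_group`, the dictionary layout of `channels`, and the early stop when no remaining UE has any gain left after projection are assumptions. Instead of recomputing the full metric for every candidate, the sketch exploits that admitting a UE increases the sum in (11.6) exactly by the gain of its projected channel, so the candidate with the largest projected gain is the one that maximizes the metric of the extended group.

```python
def best_fit_group(channels, target_size, tol=1e-12):
    """Greedy 'best fit' user selection for one PRB.

    channels: dict mapping a UE id to its channel row vector.
    Returns the UE ids of the compatibility group in admission order.
    """
    def norm2(v):
        return sum(abs(x) ** 2 for x in v)

    def project_out(v, basis):
        # Remove the components of v along the orthonormal rows in basis,
        # i.e. project v onto the null-space of the admitted channels.
        for b in basis:
            coeff = sum(x * y.conjugate() for x, y in zip(v, b))
            v = [x - coeff * y for x, y in zip(v, b)]
        return v

    group, basis = [], []
    remaining = sorted(channels)
    while remaining and len(group) < target_size:
        residual = {k: project_out(list(channels[k]), basis) for k in remaining}
        # In the first round basis is empty, so this picks the UE with the
        # highest channel norm, as in the initialization described above.
        best = max(remaining, key=lambda k: norm2(residual[k]))
        gain = norm2(residual[best])
        if gain <= tol:
            break  # assumption: stop if no further UE can be separated
        scale = gain ** 0.5
        basis.append([x / scale for x in residual[best]])
        group.append(best)
        remaining.remove(best)
    return group


if __name__ == "__main__":
    example = {0: [3, 0], 1: [2.9, 0.1], 2: [0, 1]}
    print(best_fit_group(example, target_size=2))
```

In the small example, UE 1 has a much stronger channel than UE 2 but is almost collinear with the initially chosen UE 0, so the weaker yet orthogonal UE 2 is admitted instead.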
Combining the spatial compatibility metric described by (11.6) and the best
fit algorithm, an overall scheduling algorithm for JT can be derived, which is
presented in Table 11.2. Similarly to the scheduling algorithm for the CT case,
PRBs are processed sequentially. It should be noted that the scheduling algorithm
does not compute precoding vectors $\mathbf{w}_k$ for any UE, thus avoiding a
considerable amount of computations [MK10]. Additionally, it does not involve
power allocation. This allows the algorithm to be more easily combined with
different precoding and power allocation schemes. However, precoding and power
allocation should be oriented towards the same objective as the algorithm, namely
throughput maximization. Beyond the single-antenna case, the algorithm can be
straightforwardly adapted to cases considering multiple antennas at the
communicating nodes by extending the channel vectors $\mathbf{h}_k(r)$ accordingly [TUBN06].
The effective channel (including the effect of transmit precoding) might be
estimated at the UEs using pilot symbols, as discussed in Section 9.1. Alternatively,
fixed receive filters at the UEs might be considered at the transmitter using, e.g.,
a receiver-oriented design of the BSs' precoding vectors [MBQ04].

11.1.4 Analyses and Results


The scheduling algorithms described in Section 11.1.3 are now analyzed using system
level simulations. The simulations employ the models described in Section 11.1.2.
The system setup consists of C = 7 CoMP clusters, each composed of 21 BSs organized
in 3-sectored sites with an inter-site distance of 1 km. A CSU controls the operation
of each CoMP cluster. A wrap-around model is used to avoid border effects on
interference among CoMP clusters, as described in detail in Section 14.1.
The CoMP system considers a carrier frequency $f_c$ of 2 GHz and B = 15 PRBs,
each composed of 12 sub-carriers spaced by $\Delta f$ = 15 kHz [3GP07a].
Each sub-carrier transmits 14 symbols per TTI, which has a duration of 1 ms.
There is no power control, i.e., we consider equal power allocation on all
resources. Moreover, perfect CSI on the channels of all links within a CoMP
cluster is assumed to be available at the CSU. It is worth mentioning that,
if only compressed CSI is available at the CSU, e.g., due to backhaul capacity
constraints or limited feedback through the air interface, the performance of
centralized scheduling algorithms might be reduced considerably, as discussed in
Section 5.2.


Table 11.3. Simulation parameters.

Parameter                                          Value
Number of CoMP clusters                            7
Number of sites per CoMP cluster                   7 (i.e. 21 BSs and cells per cluster)
Inter-site distance                                1 km
Minimum BS-UE distance                             50 m
Carrier frequency                                  2 GHz
Sub-carrier spacing                                15 kHz
Number of PRBs                                     15 (with 12 sub-carriers each)
Number of symbols per TTI                          14
Effective TTI duration                             1 ms
Pathloss model                                     35.3 + 37.6 log10(d) in dB
Shadowing standard deviation                       8 dB
UE speed                                           3 km/h
Channel power-delay profile                        TU [3GP08b]
Spatial precoding                                  ZF
Average signal-to-noise ratio (SNR) at cell-edge   6 dB
Snapshot duration                                  1 s
Number of UEs per BS                               3 to 12
Target compatibility group size                    21

The path loss and shadowing are modeled according to [PDF+08], as listed alongside
various other simulation parameters in Table 11.3, and BS antenna patterns are
modeled according to [3GP07a], i.e.,

\[ G^{(a)}(\theta_{k,m,c}) = -\min\left\{ 12 \left( \frac{18}{7\pi}\, \theta_{k,m,c} \right)^2,\ 20 \right\}\ \text{[dB]}. \tag{11.7} \]
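As a small worked example of (11.7), the Python function below evaluates the antenna gain for an angle given in radians. The function name is hypothetical. The factor 18/(7π) normalizes the angle by 7π/18 rad, i.e. by 70 degrees, so the gain drops by 3 dB at an offset of 35 degrees and is limited to an attenuation of 20 dB.

```python
import math


def antenna_gain_db(theta_rad: float) -> float:
    """BS antenna gain in dB for an angle theta in radians, as in (11.7)."""
    normalized = 18.0 * theta_rad / (7.0 * math.pi)  # theta / 70 degrees
    return -min(12.0 * normalized ** 2, 20.0)


if __name__ == "__main__":
    for angle_deg in (0, 35, 70, 180):
        print(angle_deg, antenna_gain_db(math.radians(angle_deg)))
```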
Fast fading considers an average UE speed of 3 km/h and employs the typical
urban (TU) power-delay profile to model frequency selectivity [3GP08b]. When
considering JT, linear ZF precoding is adopted as precoding technique due to its
simplicity and its ability to suppress intra-cell interference [MK10, YG05]. The
precoding vectors $\mathbf{w}_k^c(r)$ computed for the scheduled UEs according to the ZF
criterion are first scaled to become unit-norm vectors. After that, because each
BS has a limited transmit power available per PRB, all precoding vectors within
a CoMP cluster are scaled so that no BS spends more than this total transmit
power per PRB. This is easily accomplished as follows. First, the precoding
vectors $\mathbf{w}_k^c(r)$ for all UEs $k \in \mathcal{K}_{c,r}$ scheduled to receive on PRB $r$
within the CoMP cluster $c$ are organized in a precoding matrix $\mathbf{W}^c(r)$. Then,
the total power spent by a BS $m$ corresponds to the sum of the squared absolute
values of the weights it applies to each transmit signal, i.e.,
$\mathrm{tr}\{\mathbf{W}^m(r)(\mathbf{W}^m(r))^H\}$. Since the power ratio among the elements of
each column must not be changed in order to preserve the properties of ZF, the
precoding matrix $\mathbf{W}^c(r)$ is simply scaled down as a whole so that the
power constraints are fulfilled. Note that this means that the transmit power of
some BSs might not be fully used, which is sub-optimal.
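The two scaling steps described above can be sketched as follows, assuming single-antenna BSs so that row m of the cluster precoding matrix holds the weights applied by BS m. The function name `scale_precoders` and the matrix layout are assumptions; the ZF precoders themselves are taken as given, and the common factor is chosen here such that the most heavily loaded BS exactly meets the per-PRB power limit.

```python
import math


def scale_precoders(W, p_max):
    """Scale a cluster precoding matrix to respect a per-BS power limit.

    W[m][k] is the weight BS m applies to the signal of UE k on one PRB.
    Returns a new matrix in which no BS spends more than p_max.
    """
    num_bs, num_ue = len(W), len(W[0])
    # Step 1: scale each precoding vector (column) to unit norm.
    col_norm = [math.sqrt(sum(abs(W[m][k]) ** 2 for m in range(num_bs)))
                for k in range(num_ue)]
    W = [[W[m][k] / col_norm[k] for k in range(num_ue)] for m in range(num_bs)]
    # Power spent by BS m: sum of squared magnitudes of its row.
    bs_power = [sum(abs(w) ** 2 for w in row) for row in W]
    # Step 2: one common factor for the whole matrix keeps the ratios
    # within every column unchanged, and hence the ZF property.
    scale = math.sqrt(p_max / max(bs_power))
    return [[scale * w for w in row] for row in W]


if __name__ == "__main__":
    scaled = scale_precoders([[1.0, 0.0], [1.0, 1.0]], p_max=1.0)
    print([sum(abs(w) ** 2 for w in row) for row in scaled])
```

In the example, the second BS carries more weight than the first, so after scaling it uses the full power budget while the first BS stays below it, which illustrates the sub-optimality noted in the text.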
For each BS, a number of UEs is uniformly distributed over the coverage
area of the cell. The transmit power of the BSs is set so as to grant an SNR of
at least 6 dB at the cell-edge considering the effects of pathloss, antenna
pattern and shadowing (95% reliability). It is assumed that BSs always have
data to transmit to the UEs, which make use of a best-effort service. Several
snapshots are considered in each simulation. During each snapshot, large-scale
fading is assumed constant while small-scale fading variation is modeled using
Jakes' model [Jak74]. A sufficient number of snapshots is simulated in order to
get reliable statistics about the system throughput.

Figure 11.2 Performance of the schemes compared in this section. (Plot of system
throughput [Mbps] versus load [UEs/sector]; curves: JT, greedy sched.; JT, max.-gain
sched.; CT, greedy sched.; CT, interf.-aware LA; CT.)
Fig. 11.2 shows the total throughput of the system averaged over all simulated
snapshots as a function of the system load in UEs per cell. The following
transmission, scheduling and link adaptation schemes are compared:
• Conventional transmission, scheduling and link adaptation. This corresponds
  to a conventional cellular system in which there is no coordination
  or communication among sites. Each BS uses a local, greedy scheduler. For
  a given PRB, it schedules the UE with the highest channel gain at each BS.
  Thus, a full reuse of frequency resources is observed. In this scenario, link
  adaptation is based on the interference perceived during the last transmission
  to a UE on a PRB.
• Conventional transmission and scheduling, but interference-aware
  link adaptation. Here, the same local schedulers are used, but link adaptation
  is based on the assumption that each BS knows the scheduling decisions of
  the other BSs and can precisely predict intra-cluster interference, as proposed
  in Section 5.2.2 for the uplink.
• Conventional transmission, centralized scheduling and interference-aware
  link adaptation, using the greedy scheduler for CT proposed before.
• Joint transmission, conventional scheduling and interference-aware
  link adaptation, where the UEs with the highest channel gains in the cluster
  are scheduled for JT, and two UEs may not be scheduled to the same BS on
  the same resource.
• Joint transmission, centralized scheduling and interference-aware
  link adaptation, using the greedy scheduler for JT proposed before.
Note that for all schemes, inter-cluster interference is estimated as the interference
perceived during the last transmission to the UE on the PRB.
Regarding non-cooperative transmission, we can see that a large performance
gain of 100% to 120% can already be achieved if knowledge of intra-cluster
interference is used for link adaptation, as also observed in Section 5.2 for the
uplink. Performance can be increased further by around 20% if the proposed
centralized scheduling algorithm for CT is used. Joint transmission in general
performs significantly better than non-cooperative transmission, as also observed
in Sections 6.3 and 6.4, but we can see that a throughput improvement of about
10% can additionally be obtained if the greedy, centralized scheduling scheme
proposed in this section is used, as opposed to classical scheduling at each BS.

11.1.5 Summary
In this section, multi-cell centralized scheduling algorithms oriented towards the
maximization of system throughput have been presented, for a downlink system
based on non-cooperative or multi-cell joint transmission. While both algorithms
are heuristic and have low complexity, the results presented in this section have
illustrated the huge potential of intelligent scheduling to provide high data rates
in CoMP systems.
This section has concentrated on the downlink. However, similar relative
performances are expected in the uplink if a sufficient amount of CSI is available
at each cluster. The studies considered here have assumed relatively idealized
conditions. Real systems have to deal with further challenges such as backhaul
constraints, signaling overhead, limited or outdated CSI and synchronization
issues, which degrade system performance, as discussed in various other parts of
the book.

11.2 Decentralized Radio Link Control and Inter-BS Signaling


Christian Hoymann and Laetitia Falconetti
This section discusses radio link control and signaling aspects of practical
implementations of CoMP schemes in cellular systems, using 3GPP LTE as an example.
The radio link towards a user equipment (UE) is controlled by its serving base
station (BS), and this section discusses the potential modifications of existing
control loops when applying CoMP.
The section focuses on decentralized radio link control¹, where each BS typically
controls the UEs of its cell, though some aspects also apply to centralized
control, where a central node controls all UEs of one or more CoMP clusters. In
general, uplink (UL) and downlink (DL) transmissions need radio link control,
but some control loops only refer to UL transmissions, e.g., UL power control, UL
timing advance, etc. In 3GPP LTE, frequency division duplex (FDD) and time
division duplex (TDD) use the same radio link control and they face basically
the same problems; only the radio link measurements may differ if the channel
is reciprocal. Signaling refers to the direct communication of cooperating BSs
when using decentralized radio link control and to the indirect communication
via a central node when centralized radio link control is applied.

11.2.1 Resource Allocation
Resource allocation, as considered from a physical layer point of view in
Section 11.1, is the process where BSs allocate radio resources in time and
frequency to certain UEs. The BSs need to answer the question of which and how
many resources to allocate. The decision on which resources to allocate is based
on information about the (predicted) radio link quality, e.g., the current radio
channel state including slow and fast fading as well as the interference situation.
The decision on how many resources to allocate is based on information about the
traffic demand, e.g., buffer fill levels and quality of service (QoS) requirements.
All information needs to be available at the serving BS performing resource
allocation.
In cooperative transmission and reception schemes as introduced in Chapters 5
and 6, the resulting radio link quality changes compared to a non-cooperative
transmission, and the resource allocation should be based on the quality of the
cooperative radio link between the UE and the BS antennas in multiple cells,
see also Chapter 9. Inter-BS communication could be used to exchange channel
information between cooperating BSs so that they know the quality of the
cooperative radio link. However, such an information exchange between BSs might
not be possible, e.g., due to the lack of an appropriate interface (limited
capacity and/or long delays). In that case, the serving BS could try to estimate the
improvement of the radio link quality due to cooperation, e.g., an offset could
be added to the estimated signal-to-interference-and-noise ratio (SINR).
¹ Note that the distinction of centralized and decentralized radio link control does
not correspond to the classification of centralized and decentralized CoMP schemes,
which refers to the place where decoding (uplink) and encoding (downlink) are
performed, see Section 4.

Having estimated the quality of the cooperative radio link, each BS can allocate
resources independently of the other BSs, or a set of BSs can do some form of
joint resource allocation in order to better take interference into account, as
proposed in Section 5.2. In the former case, the process does not differ from regular
resource allocation, and strongly interfering transmissions in neighbor cells could
be allocated to the same resource. In the latter case, a hierarchical scheduling
can be applied, which relies on detailed knowledge of co-channel interference.
For instance, a first cell (or group of cells) starts scheduling its own UEs. Then
the schedule, i.e., the allocation of radio resources to UEs, possibly extended by
information on transmit power, precoding vectors etc., is forwarded to a second
cell, which in turn schedules its own UEs considering the known interference
caused by the first cell. The schedules of the first and second cell are forwarded
to a third cell and so on. By doing so, strongly interfering transmissions in
close-by cells could be allocated onto different resources. As a drawback, hierarchical
scheduling requires additional signaling between cooperating BSs and increases
scheduling delay.
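The hierarchical scheme just described can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: each UE requests a single PRB, interference is reduced to a set of strongly interfering cells per UE, and a cell simply avoids PRBs that such cells have already allocated when it can. The function and parameter names are hypothetical and do not correspond to any LTE interface.

```python
def hierarchical_schedule(cell_order, ues_of_cell, strong_interferers, num_prbs):
    """Schedule cells one after another, forwarding schedules downstream.

    cell_order: cells in the order in which they schedule.
    ues_of_cell: dict cell -> list of UE ids, each needing one PRB.
    strong_interferers: dict UE -> set of cells that strongly interfere
        with this UE when they use the same PRB.
    Returns dict cell -> {prb index: UE id}.
    """
    schedules = {}
    for cell in cell_order:
        allocation = {}
        for ue in ues_of_cell[cell]:
            interferers = strong_interferers.get(ue, set())
            # PRBs already taken by earlier, strongly interfering cells.
            blocked = {prb for other, alloc in schedules.items()
                       if other in interferers for prb in alloc}
            free = [p for p in range(num_prbs) if p not in allocation]
            preferred = [p for p in free if p not in blocked]
            if preferred:
                allocation[preferred[0]] = ue
            elif free:
                # No interference-free PRB left: fall back to any free PRB.
                allocation[free[0]] = ue
        # The finished schedule is forwarded to all cells scheduled later.
        schedules[cell] = allocation
    return schedules


if __name__ == "__main__":
    print(hierarchical_schedule(
        ["A", "B"], {"A": ["a1"], "B": ["b1"]}, {"b1": {"A"}}, num_prbs=2))
```

The sequential loop also makes the drawback visible: a cell cannot start before the schedules of all earlier cells have arrived, so the scheduling delay grows with the length of the chain.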
The amount of allocated resources depends on the traffic demand. If the
demand exceeds available capacity, some form of prioritization needs to be
performed. This is particularly important if QoS requirements have to be
guaranteed. The traffic demand in DL can be deduced from the buffer fill levels at
the BS. The traffic demand in UL can be acquired from buffer status reports
or scheduling requests sent by the UEs. With traffic-dependent QoS, knowledge
about the traffic demand is required per service-class. A common differentiation
is to separate signaling, real-time, and best-effort traffic. Each serving
BS is aware of the traffic demand for all attached UEs. Coordinated scheduling
schemes, which would need to take the traffic demand into account, require
access to the traffic demand per service-class per UE and per cooperating BS,
which would require additional signaling.
Having acquired knowledge on the traffic demand, the same QoS-aware
resource allocation schemes as in non-CoMP systems, such as proportional fair
or maximum rate scheduling, can be applied in CoMP-enabled systems.

11.2.2 Link Adaptation
The selection of modulation and coding schemes (MCSs) is carried out by the
serving BS. Based on the estimated SINR, the BS selects the MCS that maximizes
the user throughput. Since CoMP can increase the SINR perceived at the
receiver, link adaptation in CoMP-enabled networks should not be based on the
SINR of the BS-UE link, but on the increased SINR after cooperation. Thereby,
the BS can select a more aggressive MCS resulting in a higher throughput.
One way to estimate the channel quality after cooperation is to measure the
radio links involved in the cooperation and to gather and combine the
measurements at the serving BS, see also Section 9.1 and details provided later in
Section 11.2.3. Besides channel quality, the SINR is also determined by
interference, which can either be estimated from previous transmission attempts, or be
more accurately predicted if cooperating BSs exchange their schedules prior to
link adaptation (see Section 5.2). Exchanging schedules between BSs of course
requires additional signaling and increases delay.
Alternatively, the serving BS could estimate the SINR increase due to
cooperation without additional inter-BS or UE-to-BS signaling. For instance, mobility
measurements, which are reported by a UE anyway, give an indication of
the pathloss to candidate BSs. Such a technique was for instance described
in Section 7.2. For CoMP transmissions that last over several hybrid automatic
repeat request (HARQ) round-trip times (RTTs), the number of HARQ
re-transmissions, which indicates the actual block error rate (BLER), can be
considered when setting the MCS. During such a transmission period, the MCS could
be adapted to better meet the BLER target, which maximizes throughput. If
the MCS selection is not adapted to the increased SINR obtained with
cooperation, CoMP only reduces the BLER, leading to fewer re-transmission requests.
Although this slightly reduces the packet delay, it is desirable to operate the
HARQ at a more spectrally efficient BLER.
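One common way to steer the MCS selection towards a BLER target using only HARQ feedback is an outer-loop correction of the estimated SINR, sketched below. This is an assumption about how the adaptation mentioned above could be realized, not a mechanism specified in the text: the class name, the step size and the 10% BLER target are illustrative choices. The step sizes are related such that the offset stays constant on average exactly when the fraction of NACKs equals the target.

```python
class OuterLoopLinkAdaptation:
    """SINR offset that is driven by HARQ ACK/NACK feedback."""

    def __init__(self, bler_target=0.1, step_down_db=0.5, initial_offset_db=0.0):
        self.offset_db = initial_offset_db
        self.step_down_db = step_down_db
        # In equilibrium: bler * step_down = (1 - bler) * step_up.
        self.step_up_db = step_down_db * bler_target / (1.0 - bler_target)

    def update(self, ack: bool) -> None:
        """Raise the offset slightly on ACK, lower it strongly on NACK."""
        if ack:
            self.offset_db += self.step_up_db
        else:
            self.offset_db -= self.step_down_db

    def effective_sinr_db(self, estimated_sinr_db: float) -> float:
        """SINR value to be used for the MCS selection."""
        return estimated_sinr_db + self.offset_db


if __name__ == "__main__":
    olla = OuterLoopLinkAdaptation()
    for ack in [True] * 20:  # a run of ACKs, e.g. after cooperation starts
        olla.update(ack)
    print(olla.effective_sinr_db(5.0))
```

If cooperation raises the actual SINR without the serving BS knowing by how much, the resulting run of ACKs gradually increases the offset, so that a more aggressive MCS is selected until the BLER approaches the target again.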

11.2.3 Radio Link Measurements


As described above, the CoMP-specic adaptation of control loops may be based
on radio link measurements. In UL, BSs can measure the radio links and exchange
the measurement reports between cooperating BSs. In DL, the UE can measure
the links towards cooperating BSs and report the measurements to the serving
BS. If supported by the radio interface, the UE might report the measurements
to multiple BSs, a concept considered later in Section 13.4. If the channel is
reciprocal, UL measurements could be leveraged for adapting DL transmissions.
In that case, UEs would not need to transmit their reports over the air. However, since interference is not reciprocal, UE reporting and inter-BS signaling is
desirable to obtain an estimate of the DL interference experienced by a UE.
The amount of information contained in the measurement data influences the
accuracy of the radio link control: the more detailed the measurements, the more
accurate the control. However, the required signaling capacity, and with it the
signaling delay, increases as well. Especially
for DL CoMP, where UEs report the measurements over the air, the signaling
overhead is a serious issue. In order to balance control loop accuracy against
signaling overhead, several types of measurements can be defined. First of all, detailed
channel state information (CSI), which includes frequency-selective phase information for all antennas, can be measured and reported. From the CSI reports,
BSs can extract the information required for various purposes: the relative
quality of resource blocks for resource allocation, the channel rank and the
corresponding optimal precoders, and the SINR after cooperation for
link adaptation.
Since CSI reports are large and generate substantial signaling traffic over the air (as
well as on the backhaul), measurements can be tailored for specific purposes. Such
measurements contain only the required information and hence they generate
much less signaling traffic. Examples of such radio link measurements in LTE
are rank indicator (RI), precoding matrix indicator (PMI), and channel quality
indicator (CQI). One could even further reduce the signaling load by reporting
only long-term measurements such as pathloss coefficients.
In general, the information exchange between the nodes can be on-demand or
periodic. In the former case, the serving BS requests the information required
for a specific cooperation attempt with specific BSs. BSs not involved
in the cooperation and UEs attached to those BSs are not required to measure or
report anything. With periodic signaling, all candidate links have to be measured
and reported independently of the actual need for cooperation, and UEs and/or
BSs continuously exchange CoMP-related information.

11.2.4

Uplink Power Control


UL power control is the process of adjusting the UE transmit power so that signals
are received at the BS with an appropriate power level. The power level should
be selected to maximize spectral efficiency by balancing the achieved link bitrate
against the interference generated in co-channel cells. In 3GPP LTE, the UE transmit power
for the UL data channel consists of an open-loop and a closed-loop component,
and the UE sets it according to [3GP09f]:
$$P_{\mathrm{TX}} = \min\left\{P_{\max},\; P_0 + \alpha \cdot PL_{\mathrm{DL}} + 10\log_{10}(R) + \Delta_{\mathrm{MCS}} + \delta\right\}. \qquad (11.8)$$

Here, $P_{\max}$ is the maximum UE transmit power, which is the upper
limit of the actual UE transmit power. $P_0$ can be seen as the (cell-specific) desired
receive power, which is transmitted by the BS as part of the LTE system information. The term $10\log_{10}(R)$ reflects the fact that for a larger number of assigned
resource blocks $R$ a higher received power and thus a correspondingly higher
transmit power is needed. The parameter $\Delta_{\mathrm{MCS}}$, configured by the BS, adds an
MCS-dependent power offset, which reflects the different SINR requirements per
MCS. The term $\alpha \cdot PL_{\mathrm{DL}}$ is part of the open-loop power control component, where each UE
selects an appropriate transmit power to compensate a fraction $\alpha$ of the estimated DL pathloss to the serving cell, $PL_{\mathrm{DL}}$. The DL pathloss is derived from
the signal strength of the DL reference signals. Without cooperation, a UE only
detects the DL reference signals of its serving BS to estimate the pathloss.
With cooperation, this component should also take the pathloss to supporting
BSs into account. This is difficult because UEs would then need to know which
cells actually cooperate, which would require additional BS-UE signaling to pre-configure supporting BSs. This signaling would reduce the serving BS's ability to
react quickly to changing transmission conditions by selecting supporting cells
on-demand on a subframe basis. Furthermore, UL CoMP can be designed to be
transparent to the UE, which would allow the support of legacy UEs. Including
the pathloss to supporting BSs in the open-loop power control component would
not be transparent to UEs, and hence it would not be backward-compatible. See
details on UE-aware clustering concepts in Section 7.2.
The term $\delta$ is part of the closed-loop power control component, where the
UE transmit power is adjusted by the serving BS by means of power control
commands. Without CoMP, the term $\delta$ is used, e.g., to compensate for UE
errors or for UL multi-path fading, which is not reflected in the estimated DL
pathloss [DPSB08]. With CoMP, the closed-loop component could be used to
adapt the UE transmit power to the increased SINR resulting from UL CoMP:
the UE transmit power, and with it the interference level, could be reduced
while keeping the same MCS and BLER.
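As an illustration, Eq. (11.8) can be evaluated in a few lines of Python. This is a minimal sketch and not part of the LTE specification: the function and argument names are hypothetical, `alpha` denotes the fraction of the DL pathloss that is compensated, `delta_mcs_db` the MCS-dependent offset and `delta_cl_db` the closed-loop correction. All numerical values in the usage example are assumptions chosen only for demonstration.

```python
import math


def ul_tx_power_dbm(p_max_dbm, p0_dbm, alpha, pl_dl_db, num_rbs,
                    delta_mcs_db=0.0, delta_cl_db=0.0):
    """Evaluate the UL power control rule of Eq. (11.8) in the dB domain."""
    if num_rbs < 1:
        raise ValueError("at least one resource block must be assigned")
    # Open-loop part: target receive power, fractional pathloss compensation,
    # bandwidth term and MCS-dependent offset.
    open_loop = (p0_dbm + alpha * pl_dl_db
                 + 10.0 * math.log10(num_rbs) + delta_mcs_db)
    # The closed-loop correction is added, then the result is capped at P_max.
    return min(p_max_dbm, open_loop + delta_cl_db)


if __name__ == "__main__":
    # Assumed example values: P_max = 23 dBm, P_0 = -80 dBm, alpha = 0.8.
    no_comp = ul_tx_power_dbm(23.0, -80.0, 0.8, 110.0, 10)
    # With UL CoMP, the BS could command a lower power (here: -3 dB).
    with_comp = ul_tx_power_dbm(23.0, -80.0, 0.8, 110.0, 10, delta_cl_db=-3.0)
    # A UE with a large pathloss is limited by P_max.
    cell_edge = ul_tx_power_dbm(23.0, -80.0, 0.8, 130.0, 10)
    print(no_comp, with_comp, cell_edge)
```

With these assumed numbers, the UE transmits at about 18 dBm without cooperation and about 15 dBm when the closed-loop component passes a 3 dB CoMP gain on as a power reduction, while the UE with 130 dB pathloss is capped at 23 dBm.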

11.2.5

Uplink Timing Advance


The BS advances the UE transmission timing so that UL signals are received
time-aligned at the BS. More specifically, in order to maintain UL orthogonality
between UEs in a cell, any timing misalignment should fall within the cyclic prefix
(CP) duration [DPSB08]. The timing advance depends on the signal propagation
delay, which is essentially determined by the distance between the BS and the UE.
With UL CoMP, the UE signal is received by several BSs and, in general,
the distances from the UE to these BSs differ. If the misalignment due to
the different signal propagation delays is shorter than the CP duration, UL
orthogonality between UEs can be maintained even when a BS receives signals
from UEs of several cells. If the misalignment is larger than the CP duration,
UL orthogonality is degraded and the UEs' signals interfere with each other, as
investigated in detail in Section 8.2. This constraint imposes an upper limit on
the potential distance between cooperating BSs.
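The distance limit can be made tangible with a rough calculation. The sketch below assumes time-synchronized BSs and a UE whose timing advance is set for its serving BS, so that the misalignment seen at a supporting BS equals the difference in propagation delay. The CP durations are the approximate LTE values (about 4.69 µs normal, 16.67 µs extended) and are not taken from this section. The function names and the optional delay-spread argument are illustrative assumptions.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
NORMAL_CP_S = 4.69e-6     # approximate LTE normal cyclic prefix
EXTENDED_CP_S = 16.67e-6  # approximate LTE extended cyclic prefix


def arrival_misalignment_s(dist_serving_m, dist_supporting_m):
    """Timing offset at the supporting BS for a UE aligned to its serving BS."""
    return abs(dist_supporting_m - dist_serving_m) / SPEED_OF_LIGHT_M_PER_S


def fits_in_cp(dist_serving_m, dist_supporting_m,
               cp_s=NORMAL_CP_S, delay_spread_s=0.0):
    """True if misalignment plus channel delay spread stays within the CP."""
    offset = arrival_misalignment_s(dist_serving_m, dist_supporting_m)
    return offset + delay_spread_s <= cp_s


def max_distance_difference_m(cp_s=NORMAL_CP_S, delay_spread_s=0.0):
    """Largest tolerable difference between the two UE-BS distances."""
    return max(0.0, cp_s - delay_spread_s) * SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    print(fits_in_cp(300.0, 1200.0))   # 900 m difference, about 3 us
    print(fits_in_cp(200.0, 2000.0))   # 1800 m difference, about 6 us
    print(round(max_distance_difference_m()))
```

Under these assumptions, the normal CP tolerates a distance difference of roughly 1.4 km before any multi-path delay spread is accounted for, which shows why cooperation between distant sites is constrained.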

11.2.6

HARQ-related Timing Constraints for UL CoMP


Cooperation between distant sites introduces additional delays, which result
mainly from the delay caused by the transmission of data and control information
between cooperating sites. The actual delay depends on the inter-site distance,
on the core network deployment topology to be discussed in Section 12.2, and
on the backhaul technology to be discussed in Section 12.3. This extra CoMP
delay prior to each transmission is a potential threat to the strict timing constraints required by HARQ protocols. In 3GPP LTE, DL HARQ is based on
asynchronous re-transmissions, hence a re-transmission for an erroneous initial
DL transmission can be triggered anytime. In contrast, UL HARQ is based on
synchronous re-transmissions, where a re-transmission can only be triggered in
the subframe associated with the same HARQ process as the initial transmission. The UL re-transmission is triggered by a negative HARQ acknowledgement
(NACK) or a UL grant. In LTE FDD, these messages are transmitted 4 ms
after the initial transmission, i.e., 4 ms before the re-transmission [3GP09f].

Figure 11.3 Suspension of HARQ process for uplink CoMP in 3GPP LTE.

With UL CoMP, the entire process of cooperative UL reception hence needs
to be completed within 4 ms after the UL transmission so that the serving BS can send
relevant HARQ feedback (or a UL grant) to the UE. This timing requirement
is very challenging, and it most likely cannot be met if distant sites
cooperate. However, there are some possibilities to relax the transmission timing
and hence give more time for the cooperative reception. Two alternatives are
briefly described in the following.
In order to gain more time for cooperation, the serving BS could send a positive HARQ acknowledgement (ACK) to the UE, thereby suspending the HARQ
process. Fig. 11.3 shows that the BS sends the ACK in subframe 4 (4 transmission
time intervals (TTIs) after the initial transmission), although the transport
block (TB) has not yet been decoded. Having received an HARQ ACK,
the UE does not perform a re-transmission for the corresponding HARQ process. The BS also cannot trigger a new initial transmission for this HARQ process,
because the UE has to keep the TB in the HARQ
buffer for a potential re-transmission in case the cooperative reception fails. For
the particular UE, the corresponding TTI has to remain empty. By suspending
the HARQ process, the time for cooperative joint detection increases by 8 ms,
and the serving BS has more time to gather the required CoMP information
from supporting sites and to perform the CoMP reception. One HARQ round-trip time (RTT) after the
HARQ feedback, the CoMP processing should be finished, and the serving BS
should send relevant feedback, which either triggers a re-transmission or
acknowledges the successful data transmission. In Fig. 11.3, the initial transmission could not be decoded successfully, and the BS triggers the re-transmission
by transmitting a scheduling grant for the last TB to the UE.
With this alternative, the time for the entire process of UL CoMP can be
extended from 3 to 11 ms without any changes to the LTE specifications. However, this alternative prevents a CoMP UE from using certain TTIs while the
corresponding HARQ process is suspended. This is visible in Fig. 11.3, where
every second TTI of the considered HARQ process is empty.

Figure 11.4 Usage of two transport blocks per HARQ process for uplink CoMP.

Therefore, from the UE perspective, it is beneficial to apply UL CoMP with
HARQ process suspension only if the achievable throughput with CoMP is more
than twice as large as the throughput in a non-CoMP mode, where every TTI can
be used. Nevertheless, from the network perspective, UL CoMP with HARQ process
suspension could improve cell spectral efficiency when the resource blocks (RBs)
in empty TTIs can be assigned to other UEs.
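The timing arithmetic behind HARQ process suspension can be checked with a short sketch. It assumes LTE FDD numbers consistent with the text (1 ms TTIs, feedback 4 TTIs after the start of the transmission, a HARQ round-trip time of 8 TTIs). The function names are hypothetical, and the model ignores all processing details other than the time budget.

```python
TTI_MS = 1                 # duration of one transmission time interval
FEEDBACK_OFFSET_TTIS = 4   # feedback is sent 4 TTIs after the transmission
HARQ_RTT_TTIS = 8          # synchronous UL HARQ round-trip time


def comp_time_budget_ms(suspended_rtts=0):
    """Time between the end of the UL transmission and the real feedback."""
    feedback_subframe = FEEDBACK_OFFSET_TTIS + suspended_rtts * HARQ_RTT_TTIS
    # The transmission itself occupies the first TTI of that interval.
    return (feedback_subframe - 1) * TTI_MS


def usable_tti_fraction(suspended_rtts=0):
    """Fraction of TTIs of the HARQ process the UE may still transmit in."""
    return 1.0 / (1 + suspended_rtts)


def suspension_pays_off_for_ue(rate_no_comp, rate_comp_per_used_tti,
                               suspended_rtts=1):
    """UE-side criterion: CoMP gain must outweigh the empty TTIs."""
    effective = rate_comp_per_used_tti * usable_tti_fraction(suspended_rtts)
    return effective > rate_no_comp


if __name__ == "__main__":
    print(comp_time_budget_ms(0), comp_time_budget_ms(1))
    print(usable_tti_fraction(1))
    print(suspension_pays_off_for_ue(10.0, 25.0))
    print(suspension_pays_off_for_ue(10.0, 18.0))
```

Suspending the process for one round-trip time reproduces the extension from 3 to 11 ms and the factor-of-two throughput threshold stated above, since only every second TTI of the process remains usable.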
Alternatively, a second transport block per HARQ process could be used,
which allows interlacing the transmission of the other transport block while the
cooperative reception is ongoing. Figure 11.4 shows that the UL transmission
starts in subframe 0 with the transmission of the first TB of the HARQ process,
indexed with 1.1. As in the previous scheme, the BS sends an ACK after 4 TTIs
to avoid a non-adaptive retransmission. At the same time, the BS sends a scheduling grant for the second TB indexed 2.1. Eventually, 11 TTIs after the initial
transmission, the BS has to send meaningful feedback for TB 1.1. This solution
also doubles the HARQ RTT, but in contrast to the above alternative, the HARQ
process can be used continuously by the UE. This solution requires that the corresponding control signaling is in place. The concept of having two TBs per
HARQ process is known from the DL multiple-input multiple-output (MIMO)
transmission mode in LTE [3GP07b]. There, either one or two transport blocks
can be transmitted in a single subframe. UL MIMO transmission is currently
being specified in LTE Release 10 [3GP10d], and the corresponding control signaling
might be re-used for UL CoMP.

11.2.7

Handover
In cellular systems such as LTE, a given terminal is associated with one serving cell.
In general, the serving cell is chosen based on the BS-to-UE radio link quality.
UEs regularly measure the signal strength of neighbor cells and report it to their
serving BS. As soon as the signal of the serving cell is received with lower signal
strength than the signal of a neighbor cell, a handover is triggered by the
serving BS. In general, thresholds and a hysteresis are used to avoid ping-pong
effects. After the handover, the target cell with the best radio link is in charge
of the UE and becomes its new serving cell.
By means of CoMP, the effective signal quality and thereby user and cell
throughput increase. In order to cooperate, data and control information have to be exchanged
via the transport network connecting the sites. If the transport network, especially the serving BS's backhaul link, is highly loaded, the serving BS is not able
to perform CoMP. Therefore, the improvement in user and cell throughput due to CoMP
may be limited by the backhaul link capacity. Different BSs may have different
limitations on their backhaul links, e.g., some may be connected via fibre, others
via leased telephone lines (E1/T1).
The handover algorithm could mitigate this limitation by considering the
backhaul capacity and the current backhaul load. For instance, a UE that is not
supported by CoMP due to the limited (copper) backhaul of its serving BS could
be handed over to a different serving BS with free (fibre) backhaul resources.
With CoMP support at the target BS, the performance could be enhanced. An
adapted algorithm could trigger a handover for active UEs with a backhaul-limited serving BS such that the new (target) serving BS has free backhaul
capacity to cooperate. Accordingly, the cell-selection criterion of UEs in idle
mode could be modified. The cell selected in this way may not have the best (non-cooperative) radio link quality, which is typically used as the criterion to select
a serving cell, but due to BS cooperation the resulting user and cell throughput may be
optimized.
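A backhaul-aware handover rule of this kind could look as follows. This is only a sketch under stated assumptions: the data structure, the thresholds (3 dB hysteresis, 2 dB tolerated signal loss, 50 Mbit/s backhaul demand per CoMP UE) and the decision order are hypothetical and are not taken from any standard or from the source.

```python
from dataclasses import dataclass


@dataclass
class CellReport:
    cell_id: str
    rsrp_dbm: float                 # signal strength measured by the UE
    backhaul_capacity_mbps: float
    backhaul_load_mbps: float

    @property
    def free_backhaul_mbps(self):
        return max(0.0, self.backhaul_capacity_mbps - self.backhaul_load_mbps)


def select_target_cell(serving, neighbours, hysteresis_db=3.0,
                       max_rsrp_sacrifice_db=2.0, comp_backhaul_need_mbps=50.0):
    """Return the cell_id of a handover target, or None to stay."""
    # 1) Conventional rule: a neighbour is stronger by more than the hysteresis.
    best = max(neighbours, key=lambda c: c.rsrp_dbm, default=None)
    if best is not None and best.rsrp_dbm > serving.rsrp_dbm + hysteresis_db:
        return best.cell_id
    # 2) Backhaul-aware rule: the serving BS cannot afford CoMP, but a
    #    slightly weaker neighbour with free backhaul capacity can.
    if serving.free_backhaul_mbps < comp_backhaul_need_mbps:
        candidates = [
            c for c in neighbours
            if c.free_backhaul_mbps >= comp_backhaul_need_mbps
            and c.rsrp_dbm >= serving.rsrp_dbm - max_rsrp_sacrifice_db
        ]
        if candidates:
            return max(candidates, key=lambda c: c.rsrp_dbm).cell_id
    return None


if __name__ == "__main__":
    copper = CellReport("A", -95.0, 10.0, 9.0)
    fibre = CellReport("B", -96.0, 1000.0, 200.0)
    print(select_target_cell(copper, [fibre]))
```

In the usage example, the UE is moved from the copper-connected cell A to the fibre-connected cell B although B is received 1 dB weaker, which is the trade-off described above.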

11.2.8

Inter-BS Signaling
According to LTE operation, each UE is associated to one serving cell, and the
corresponding BS controls the radio link, e.g., resource allocation, link adaptation, etc. With CoMP, transmissions in multiple cells are coordinated, and the
radio link control can be done in a centralized or distributed manner.
With centralized radio link control, a master entity controls and coordinates
the transmissions in multiple cells. In order to perform resource allocation, link
adaptation, power control, etc., the master requires the above-mentioned radio
link measures, such as CQI, RI, CSI, buffer fill levels, etc. Such information has
to be centrally collected by the master, while control commands have to be distributed to the sites. In such a setup, where the radio link control is centralized,
the baseband data processing could also be centralized. In that case, UL signals
are forwarded from the antenna sites to the master, which jointly processes them;
DL signals are fed from the master to the sites. The sites are then equipped with
simple nodes, such as remote radio heads (RRHs), and all complex processing
is performed in powerful BSs, see also Section 11.1. With such a scheme, fixed
CoMP clusters are determined by the cells coordinated by one master, see also
Section 7.1. Within that cluster, all transmissions are coordinated, but coordination across cluster boundaries is not possible. Since there is a logical tree
topology between the master and each site of the cluster, UL and DL signals are
transmitted only once between a site and the master (in contrast to the following
distributed radio link control). However, since the network control is located in
the master, data (and control) needs to be transferred between the antenna sites
and the master for every UL and DL transmission irrespective of the CoMP gain
(in contrast to the following distributed radio link control).
With distributed radio link control, the communication with the UE is still
controlled by the serving cell, although the UE signal can be jointly received
or cooperatively transmitted by several cooperating cells. As a result, there is a
logical mesh topology between cooperating cells, which is composed of multiple
individual tree topologies between each serving cell and its supporting cells.
Since a given cell can support multiple serving cells at the same time, UL and
DL signals may be exchanged among multiple cells. However, network control
remains in the BSs, and each BS is capable of handling UL and DL transmissions
on its own, i.e., CoMP can be disabled. This is especially useful for UEs with
very good channel conditions to the serving cell, which would not benefit much
from CoMP. A distributed control scheme allows adapting (i.e., decreasing and
increasing) the cluster size: the selection of supporting cells can be done in a UE- or cell-specific manner.
In a cell-specific selection, cells which benefit most from the CoMP scheme,
e.g., cells with large overlapping areas, would be clustered. This could be done by
the operation and maintenance system based on network planning, and the result
would be fixed clusters, which corresponds to the centralized scheme above. The
cell-specific selection can be made more dynamic by re-configuring the clusters
during operation based on measurements, such as UE location and signal quality.
Here, cells would be clustered so that certain hotspots or certain areas with bad
link quality benefit from cooperation, see also Section 7.2. However, with cell-specific clustering the supporting cells are not optimal for all UEs of a cell.
In a UE-specific cell selection, the serving BS could request cooperation from
one or more supporting BSs for certain UEs. Here, each UE always has the optimal cluster of supporting cells for the given cooperation mode. Consequently,
there are no longer any cluster boundaries across which transmissions cannot be coordinated; each UE is always in the middle of its own cluster.
An example of UE-specific signaling for decentralized UL joint detection based
on IQ sample exchange is shown in Fig. 11.5. Basics on UL joint detection were
provided in Sections 6.1 and 6.2, and a simulative performance evaluation of
the particular scheme considered here can be found in Section 14.3. First, the
serving BS does the scheduling. As described above, resource allocation, link
adaptation and power control can be adapted to the mode of cooperation. Then,
the serving BS requests support from one or more supporting cells for a particular UE transmitting on certain resources. The corresponding message to request
IQ samples for certain RBs is named IQ_req(RBs) in Fig. 11.5. UEs requiring
cooperation and the corresponding supporting cells are selected on-demand. As
described below, UE and BS selection can be based on different parameters, such as location, pathloss, actual channel realization, etc.
Having received the UE's data channel (physical uplink shared channel (PUSCH)) or
control channel (physical uplink control channel (PUCCH)) on the indicated
resources, the supporting BS transfers the requested IQ samples to the serving
BS. The corresponding response carrying IQ samples and other optional parameters is named IQ_rsp(IQ, opt. params) in Fig. 11.5. The serving BS performs
joint reception (using maximum ratio combining (MRC) or interference rejection combining (IRC)) based on the IQ samples received from cooperating BSs
in conjunction with its own received signal. Finally, it checks whether the reception was
correct and prepares the transmission of HARQ feedback.

Figure 11.5 Message sequence chart of requesting IQ samples from a supporting BS [HFG09]. © 2009 IEEE.
[Message sequence between serving BS, UE and supporting BS: scheduling; IQ_req(RBs); PUxCH transmission, received (Rx) at both BSs; IQ_rsp(IQ, opt. params); joint IRC/IC reception; demodulation & decoding; ACK / NACK.]

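The message exchange of Fig. 11.5 can be mimicked with two small message types and a combiner. The sketch below is illustrative only: the class and field names are hypothetical, each RB is represented by a single IQ sample, each link is modeled by one flat, perfectly known channel coefficient without noise, and MRC is used as the combining rule (IRC would additionally require an interference covariance estimate).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class IqRequest:
    """IQ_req(RBs): the serving BS asks for the samples of one UE."""
    ue_id: int
    subframe: int
    resource_blocks: List[int]


@dataclass
class IqResponse:
    """IQ_rsp(IQ, opt. params): samples plus optional parameters."""
    ue_id: int
    subframe: int
    iq_samples: List[complex]
    channel_estimate: Optional[complex] = None  # hypothetical optional field


def handle_iq_request(req: IqRequest,
                      rx_buffer: Dict[Tuple[int, int], complex],
                      channel_estimate: Optional[complex] = None) -> IqResponse:
    """Supporting BS: return the buffered sample of every requested RB."""
    samples = [rx_buffer[(req.subframe, rb)] for rb in req.resource_blocks]
    return IqResponse(req.ue_id, req.subframe, samples, channel_estimate)


def mrc_combine(branches):
    """Maximum ratio combining of (channel, samples) pairs of all BSs."""
    norm = sum(abs(h) ** 2 for h, _ in branches)
    length = len(branches[0][1])
    return [sum(h.conjugate() * s[k] for h, s in branches) / norm
            for k in range(length)]


if __name__ == "__main__":
    tx = [1 + 1j, -1 + 1j]                      # symbols sent by the UE
    h_serving, h_supporting = 0.6 + 0.2j, 0.1 - 0.7j
    own_rx = [h_serving * x for x in tx]        # serving BS reception
    buffer = {(0, rb): h_supporting * x for rb, x in zip([3, 4], tx)}
    rsp = handle_iq_request(IqRequest(7, 0, [3, 4]), buffer, h_supporting)
    print(mrc_combine([(h_serving, own_rx),
                       (rsp.channel_estimate, rsp.iq_samples)]))
```

In this noise-free toy case, the combiner recovers the transmitted symbols; in a real receiver the combined signal would then be demodulated and decoded before the ACK or NACK is prepared.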
A hybrid approach combining a centralized control scheme with a decentralized CoMP scheme has the advantage that the cluster size can be adapted in
the sense that the actual CoMP cluster can be a subset of the control cluster.
However, it still has the drawback of a new network entity requiring a continuous
exchange of control information with the sites.

UE Selection
A UE-specific CoMP scheme with distributed control requires a proper selection
of UEs. In CoMP schemes where the backhaul traffic and the computational
complexity scale with the number of supported UEs, it could be beneficial not
to select all UEs of a cell for cooperation. If too many UEs are selected for
CoMP, the resulting backhaul load might be overwhelming, or processing time
may become a critical resource. If the wrong UEs or too few UEs are selected,
the potential gain of CoMP cannot be fully exploited. Various
methods could be used, aiming at optimizing different parameters:
• Relative channel quality: This method aims at selecting UEs for which the
quality of the channels, e.g., pathloss, towards cooperating cells is relatively
close to the quality of the channel towards the serving cell. This method allows
selecting UEs that are close to the cell edge. These UEs usually suffer
from high co-channel interference.
• Current radio link performance: This method aims at selecting UEs that
have bad radio link performance, e.g., bad absolute channel quality to the
serving cell or very active co-channel cells generating a lot of interference.
• Data rate improvement: This method aims at selecting UEs that would
experience the largest throughput increase due to CoMP. The selection would
be based on the difference of the (estimated) user throughput with and without
CoMP support. Depending on the expression used to measure the throughput
increase, this method can lead to maximum cell capacity.
• Geographic location: This method selects UEs for CoMP mode based on their
geographic location, which can be obtained by means of different techniques,
e.g., cellular location methods (see Section 15.1), or global positioning system
(GPS) measurements reported by outdoor UEs.
• Type of application: This method uses service or application-specific parameters to select UEs to operate under CoMP mode. UEs in CoMP mode perceive
increased data rates, but they might see slightly higher packet delays. Furthermore, the setup of the CoMP mode may take time. CoMP is therefore well
suited to all kinds of services requiring large bit rates over a certain period of
time. Such services are, e.g., file download, file sharing, (high definition) video
streaming, software updates, mailbox synchronization, hard disk backup, etc.
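Two of the criteria above, relative channel quality and data rate improvement, can be combined into a simple selection routine that respects a backhaul budget. The following is a sketch under assumptions: the record fields, the 6 dB pathloss window and the greedy ranking by rate gain per unit of backhaul are hypothetical choices and do not describe a method from the source.

```python
from dataclasses import dataclass


@dataclass
class UeMeasurement:
    ue_id: str
    pathloss_serving_db: float
    pathloss_best_neighbour_db: float
    rate_no_comp_mbps: float        # estimated throughput without CoMP
    rate_comp_mbps: float           # estimated throughput with CoMP
    backhaul_cost_mbps: float       # backhaul load caused by cooperation (> 0)


def is_cell_edge(ue, window_db=6.0):
    """Relative channel quality: neighbour pathloss close to serving pathloss."""
    return ue.pathloss_best_neighbour_db - ue.pathloss_serving_db <= window_db


def select_ues(ues, backhaul_budget_mbps, window_db=6.0):
    """Greedily pick cell-edge UEs with the best rate gain per backhaul unit."""
    candidates = [u for u in ues
                  if is_cell_edge(u, window_db)
                  and u.rate_comp_mbps > u.rate_no_comp_mbps]
    candidates.sort(
        key=lambda u: (u.rate_comp_mbps - u.rate_no_comp_mbps)
        / u.backhaul_cost_mbps,
        reverse=True)
    selected, used = [], 0.0
    for ue in candidates:
        if used + ue.backhaul_cost_mbps <= backhaul_budget_mbps:
            selected.append(ue.ue_id)
            used += ue.backhaul_cost_mbps
    return selected


if __name__ == "__main__":
    ues = [
        UeMeasurement("ue1", 100.0, 103.0, 5.0, 9.0, 20.0),
        UeMeasurement("ue2", 80.0, 110.0, 20.0, 21.0, 20.0),  # cell centre
        UeMeasurement("ue3", 105.0, 106.0, 2.0, 8.0, 15.0),
        UeMeasurement("ue4", 102.0, 104.0, 4.0, 7.0, 30.0),
    ]
    print(select_ues(ues, backhaul_budget_mbps=40.0))
```

A greedy ranking is used here only because it is easy to follow; it does not guarantee the maximum cell capacity mentioned above.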
Supporting BS Selection
A UE-specific CoMP scheme requires the proper selection of supporting BSs for
each selected UE. Even a cell-specific CoMP scheme requires the selection of
supporting BSs for a given serving BS. Again, the careful selection of supporting
BSs is important in order not to overload the backhaul network and the BS processors.
This holds for CoMP schemes where the backhaul traffic and the computational
complexity scale with the number of supported BSs. BSs of the same site are
less critical to select, since the information can be exchanged at no backhaul
expense, whereas information between BSs of different sites is exchanged via
backhaul. As an example, two different approaches to select supporting BSs are
detailed in the following:
• Maximize received signal energy: This approach aims at selecting the supporting BSs that can collect most of the carrier signal energy. It uses the
characteristics of the links between non-serving BSs and UEs of the serving
BS as the selection criterion. By cooperating with BSs having good link quality,
the serving BS can increase the received carrier signal energy. This method
can improve cell-edge throughput, especially in noise-limited scenarios.
• Control severe interference: A second approach aims at selecting the supporting BSs that create most of the interference. This approach is based on the
characteristics of the links between the serving BS and UEs of non-serving BSs.
By cooperating with BSs generating strong interference, the serving BS can
control and mitigate the interfering signals. This method is especially useful
in interference-limited scenarios.
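Both approaches amount to ranking candidate BSs by a link metric and keeping the best ones; they differ only in the metric that is fed in. The sketch below is a hypothetical illustration: the function name, the limit on remote BSs and the rule that same-site BSs do not count against that limit (because they cause no backhaul traffic) are assumptions made for this example.

```python
def select_supporting_bs(metric_db, max_remote_bs, same_site=frozenset(),
                         min_metric_db=float("-inf")):
    """Rank candidate BSs by a link metric and keep the strongest ones.

    metric_db maps a BS id to a value in dB(m). For "maximize received signal
    energy" this would be the gain of the link from the served UE to that BS;
    for "control severe interference" it would be the interference power that
    UEs of that BS cause at the serving BS.
    """
    ranked = sorted((bs for bs, m in metric_db.items() if m >= min_metric_db),
                    key=metric_db.get, reverse=True)
    chosen, remote_used = [], 0
    for bs in ranked:
        if bs in same_site:
            chosen.append(bs)          # no backhaul needed within a site
        elif remote_used < max_remote_bs:
            chosen.append(bs)
            remote_used += 1
    return chosen


if __name__ == "__main__":
    gains = {"A": -90.0, "B": -100.0, "C": -95.0, "D": -85.0}
    print(select_supporting_bs(gains, max_remote_bs=1, same_site={"B"}))
```

In the usage example, only the strongest remote BS is requested over the backhaul, while the co-located BS is included regardless of the limit.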

11.2.9

Summary
In this section, potential modifications of existing radio link control loops
and signaling aspects of practical implementations of CoMP schemes in cellular
systems were discussed, using 3GPP LTE as an example. Most radio link control
loops are affected when introducing CoMP. The biggest challenge is to get the
right radio link measures, i.e., the measures of the radio channel between antennas of multiple BSs and the UE, to the right place, i.e., the place where the radio
link is controlled. Only the HARQ and UL timing advance procedures impose
stricter constraints.
The signaling between cooperating BSs
mainly depends on whether control is centralized or not. When using decentralized radio link control, the corresponding signaling can trade off complexity,
backhaul load and latency. An example signaling scheme for a backhaul-efficient,
UE-specific, on-demand decentralized UL CoMP scheme has been introduced.

11.3

Ad-hoc CoMP
Michael Grieger, Patrick Marsch and Gerhard Fettweis
In this section, the concept of ad-hoc CoMP is introduced for the cellular uplink.
In this concept, a certain cooperation strategy is decided upon after uplink transmission has already taken place. Furthermore, the extent of cooperation may
be progressively adapted until successful decoding is possible. Both concepts
make use of the fact that the channel knowledge available during detection and decoding is more accurate than at the time of scheduling. The topic is motivated in
Subsection 11.3.1, after which concrete schemes are proposed for two particular
scenarios in Subsections 11.3.2 and 11.3.3, respectively. Subsection 11.3.4 discusses to what extent the concept of ad-hoc CoMP may shed a different
light on hybrid automatic repeat request (HARQ), followed by a summary in
Subsection 11.3.5.

11.3.1

Introduction
Opportunistic Communication and Scheduling
The volatile nature of the wireless channel has long been seen as a burden,
complicating the life of wireless system engineers. In recent years, however, the
perspective has changed, bringing wireless channels into a more favorable light,
as summarized by David Tse and Pramod Viswanath in [TV05]. In a cellular system, available time and frequency resources are shared among multiple users. Provided the channel state can be tracked with sufficient accuracy, channel
fluctuations can be exploited by scheduling users on those time and frequency
resources for which channel conditions are best. A particular channel condition can then be exploited by matching modulation and coding schemes (MCSs)
as well as signal parameters such as transmit power to the channel state, a concept referred to as link adaptation. Centralized scheduling schemes that exploit
channel fluctuations for improved CoMP system performance were already presented in Section 11.1.
Imperfect CSI at the Scheduler
The performance of these scheduling concepts depends strongly
on the amount and quality of channel information that is available at the scheduler, which should be as up to date, accurate and extensive as possible, i.e.,
ideally including information on interference, distortion, radio frequency (RF)
imperfections, etc. In reality, however, the picture that is available at the scheduler is fairly noisy, and is akin to an image seen through a narrow lens. For CoMP
systems in particular:
• Channel links to different base stations (BSs) have diverse gains and are, therefore, hard to estimate. Additionally, there might be interference between pilot
signals (see Section 9.1), which further impedes accurate channel estimation.
• Joint scheduling, which promises huge gains in terms of total throughput and
fairness by exploiting multi-user diversity, requires that channel state information (CSI) be forwarded to a central scheduling node and the scheduling
decision be forwarded to the user equipments (UEs). Due to this scheduling
delay, the scheduler bases its decision on outdated information.
Hence, while making its decision, the scheduler relies on imperfect CSI. Consequently, transmission errors are inevitable, because there is a probability that
the scheduler assigns a transmission rate which is not supported by the channel.
As described in Section 11.2, in contemporary standards of cellular systems like
LTE, the impact of transmission errors is reduced by using HARQ techniques.
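The effect of outdated CSI on the error probability can be illustrated with a small Monte Carlo experiment. The sketch rests on assumptions that are not made in the text: a single Rayleigh-fading link, a first-order (Gauss-Markov) model in which the channel at transmission time has correlation `rho` with the channel seen by the scheduler, and a scheduler that assigns the Shannon rate of the outdated channel, optionally with an SNR back-off. All names and values are illustrative.

```python
import math
import random


def outage_probability(rho, snr_linear=10.0, backoff_db=0.0,
                       trials=20000, seed=1):
    """Fraction of transmissions whose scheduled rate exceeds the capacity."""
    rng = random.Random(seed)

    def complex_gaussian():
        # Circularly symmetric complex Gaussian sample with unit variance.
        sigma = math.sqrt(0.5)
        return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

    backoff = 10.0 ** (-backoff_db / 10.0)
    outages = 0
    for _ in range(trials):
        h_old = complex_gaussian()                 # CSI seen by the scheduler
        h_now = rho * h_old + math.sqrt(1.0 - rho * rho) * complex_gaussian()
        scheduled = math.log2(1.0 + snr_linear * abs(h_old) ** 2 * backoff)
        supported = math.log2(1.0 + snr_linear * abs(h_now) ** 2)
        if scheduled > supported:
            outages += 1
    return outages / trials


if __name__ == "__main__":
    for rho in (1.0, 0.99, 0.9):
        print(rho, outage_probability(rho), outage_probability(rho, backoff_db=3.0))
```

With perfectly up-to-date CSI (`rho = 1`) the assigned rate is always supported in this model. As soon as the channel decorrelates, a substantial fraction of transmissions fails unless the scheduler backs off, which is the situation HARQ is meant to handle.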

CoMP under Backhaul Constraints


In Chapter 2, it was already emphasized that a major economic hurdle to a substantial deployment of CoMP is the extensive backhaul infrastructure
that is required for information exchange among BSs. Hence, the identification
of backhaul-efficient cooperation strategies is an important challenge, in order
to keep additional costs for an upgrade of the existing backhaul infrastructure
low. The results presented in Section 4.3.1 show that a flexible usage of different CoMP schemes is beneficial in the uplink of a backhaul-constrained system,
because their performance depends on current channel conditions. For example, it is known that the exchange of decoded information for the purpose of
interference cancelation is very effective (in terms of CoMP gain vs. required
backhaul) for asymmetric scenarios where the most severe interference links can
be canceled. An exchange of compressed receive signals with a joint decoder, however, as observed in Section 6.1, allows achieving higher data rates when the
available backhaul capacity is large. Hence, the scheduler should take all available CoMP options and backhaul constraints into consideration while assigning
physical resources and MCSs.
Scheduling, Ad-hoc CoMP and HARQ
In Section 11.1, we have seen that the functionality of a scheduler may consist of
a variety of tasks, namely resource allocation, decision on CoMP schemes, power
control, choice of precoders, and link adaptation. However, as stated above, in
a mobile environment the scheduler has imperfect CSI, which results in non-optimal scheduling. While most decisions made by a scheduler have to be kept
fixed during transmission, we here want to point out that the choice of a particular CoMP scheme can indeed still be altered after uplink transmission has
already taken place. This choice comprises two main aspects:
• The size of the cooperation cluster: Intuitively, it makes sense to flexibly
choose more and more supporting BSs (see, e.g., Section 11.2.8), as opposed
to a fixed cluster of CoMP cells, until a UE can be decoded successfully.
• The cooperation scheme that is used, and the parameters involved: As shown in
the previous chapters, many different uplink CoMP schemes exist with different performance/backhaul trade-offs, different properties regarding latency,
etc. Furthermore, different uplink CoMP schemes are suitable for different
channel conditions. Hence, it also makes sense to adapt the CoMP strategy
according to the better CSI (and possibly first decoding feedback) available
after transmission.
In this section, we show how an ad-hoc decision on the CoMP mode, which we
refer to as Ad-Hoc CoMP, leads to a more efficient usage of backhaul infrastructure. Thus, system performance can be increased if the availability of backhaul
capacity limits the number of terminals that can be served with CoMP. To this
end, we investigate a model for the cellular uplink that includes three aspects:
scheduling, Ad-Hoc CoMP, and HARQ, as shown in Fig. 11.6.

Figure 11.6 Scheduling, Ad-Hoc CoMP, and HARQ process.
[Flow chart: estimate channel at BSs; forward CSI to scheduler; scheduling (resource allocation, link adaptation, CoMP mode, rate adaptation); send uplink grants to UEs; transmission; Ad-Hoc CoMP: adapt CoMP mode (take more accurate CSI information into account), refine CoMP mode (take decoding success into account); request retransmission (HARQ).]

CSI impairments can be divided into two different classes:
• Impairments that only affect the CSI available at the scheduler, thereby resulting in more accurate CSI being available for Ad-Hoc CoMP.
• Impairments that affect the CSI at the scheduler as well as the CSI available
for Ad-Hoc CoMP.
In the remainder of this section, we will address both of these effects separately.
In Subsection 11.3.2, impairments that only affect the CSI at the scheduler are
considered. Examining a distributed antenna system (DAS), where quantized
receive signals are exchanged between BSs, we analyze the effect that these kinds
of impairments have on achievable throughput, and we propose adaptive compression as a means for more efficient usage of the available backhaul capacity. In
Subsection 11.3.3, we address CSI impairments that affect the CSI available at
the scheduler and for Ad-Hoc CoMP. Once again, we examine these impairments
more closely for the example of a DAS, and observe how feedback on decoding
success can be used for a more efficient backhaul usage. Obviously, in any real
system, both kinds of CSI impairments will occur together. Ad-Hoc CoMP is
thus divided into two successive processes as depicted in Fig. 11.6. First, the
CoMP scheme is adapted by taking new channel knowledge into account. If this
is not sufficient for successful decoding, the usage of CoMP is refined. In a conventional system, the only way to achieve reliable communication despite having
imperfect CSI at the scheduler is to employ HARQ. However, in a CoMP system, there is an additional degree of freedom: the extent of cooperation. The
link between Ad-Hoc CoMP and HARQ is discussed in Subsection 11.3.4.

11.3.2 Ad-Hoc CoMP With More Accurate CSI


Joint scheduling requires the distribution of CSI to a central scheduling node and forwarding of the scheduling decision to the UEs. Due to the resulting delay and the time-varying nature of the mobile channel, the scheduler bases its decision on CSI that does not perfectly describe the channel that is used for the transmission of user data. In cellular systems, the channel varies for two main reasons:

- fast fading,
- time-varying interference, particularly in systems with little interference averaging such as LTE.

Assuming that these are the only kinds of CSI impairments that could occur, the CoMP mode can be adapted such that the available backhaul capacity is used optimally. For all users that can be decoded locally, CoMP would not be used at all. The same is true for users that could not be decoded even with the maximum CoMP support available. At the same time, different uplink CoMP schemes such as distributed interference subtraction (DIS) or DAS as introduced in Section 4.3.1 would be used whenever they are deemed most effective. In the sequel, we study the effect of outdated CSI in a distributed antenna system with centralized decoding (see Section 4.3.1).

[Figure 11.7: timeline. The BSs forward channel estimates to the global scheduler (delay); the scheduler determines the compression code, decoding order, instantaneous achievable rates, and scheduled rates; the scheduling decision is communicated (delay); by the time of transmission the channel state has changed; joint decoding uses adaptive compression.]

Figure 11.7 Scheduling and ad-hoc decoding process.
Example: Adaptive Compression in a Distributed Antenna System
As introduced in Section 4.3.1, in a DAS with centralized decoding, one of the BSs
functions as a joint decoder of codewords transmitted by all UEs in the CoMP
cluster, and all other BSs function as remote radio heads (RRHs), forwarding
their receive signal. We consider that the backhaul connecting the BSs is limited
in its capacity. Thus, the signals received at all RRHs have to be compressed
prior to their exchange over the backhaul.
The scheduling, transmission, and decoding process is depicted in Fig. 11.7.
Since the problem of resource allocation was described in detail in Section 11.1,
we here assume that resources have already been allocated to UEs by an arbitrary algorithm. For this reason, we consider simplified scenarios of few BSs and UEs as sketched in Fig. 4.4(c). The influence of inter-cluster interference is neglected in this model, and we only consider outdated CSI due to fast fading effects.
Since we consider the effect of outdated CSI at the scheduler only, we assume that the channel is perfectly estimated for every transmission block. As depicted in Fig. 11.7, in a mobile time-variant environment, the scheduler has access only to CSI that is outdated by $n_d$ transmission blocks, because certain delays for the exchange of channel estimates and for the communication of uplink grants are inevitable. Based on this outdated channel information, the scheduler estimates achievable transmission rates and assigns appropriate MCSs. For simplicity, in the remainder of this section, we assume that the number of possible MCSs is unlimited, ignoring the fact that in real systems only a certain granularity of MCSs is available. Transmission errors occur if the rate of the assigned MCS is too high to be successfully decoded. Since achievable rates in a DAS depend on the compression accuracy, the scheduler has to find a trade-off between throughput and the required backhaul capacity. Note that in the multi-user case with the employment of successive interference cancelation (SIC) at the decoder, the rate of each UE also depends on the decoding order.
In the following paragraphs, we investigate the benefit of adaptive compression. In particular, two strategies are compared:
1. fixed compression: a fixed backhaul rate is used for the exchange of the compressed signals from the RRH to the decoder.
2. adaptive ad-hoc compression: the updated CSI after transmission is taken into account to decide which backhaul rate (and therefore compression accuracy) is sufficient for successful decoding.
If the adaptive scheme is employed, we exploit the fact that the RRHs have full knowledge of the current channel state after the transmission. They are therefore able to adapt the compression appropriately and to enable successful decoding with as little information exchange as possible. The gain of the adaptive scheme is indeed two-fold: besides achieving higher throughput due to the reduced probability of transmission errors, backhaul consumption is reduced because the adaptive scheme exploits all cases where decoding is possible with a backhaul rate that is lower than the fixed rate. Additionally, in the case where successful decoding could not be achieved even under full cooperation, the backhaul is not used at all, freeing it for use by other terminals.
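The adaptive strategy can be illustrated with a deliberately simplified sketch. It is not the model used for the results in this section: it assumes a single single-antenna UE, two single-antenna BSs, and a Gaussian quantizer at the RRH without Wyner-Ziv side information, so that a backhaul rate $c$ adds quantization noise of variance $(P|h_2|^2 + \sigma_v^2)/(2^c - 1)$. The function names and the cap `c_max` are hypothetical. Given the channel observed after transmission, the RRH picks the smallest backhaul rate that supports the scheduled rate and uses no backhaul when local decoding suffices or when even full cooperation would fail.

```python
import math

def achievable_rate(h1, h2, c, P=1.0, noise_var=0.1):
    """Rate in bit/channel use of one UE decoded at BS 1 while BS 2 (the RRH)
    forwards its receive signal compressed to c bit/channel use.
    Assumption: Gaussian quantization noise, no Wyner-Ziv side information."""
    g1 = P * abs(h1) ** 2
    g2 = P * abs(h2) ** 2
    if c <= 0.0:
        return math.log2(1.0 + g1 / noise_var)
    # Quantization noise variance of a Gaussian test channel at rate c.
    q_var = (g2 + noise_var) / (2.0 ** c - 1.0)
    return math.log2(1.0 + g1 / noise_var + g2 / (noise_var + q_var))

def adaptive_backhaul_rate(h1, h2, r_sched, c_max=8.0, tol=1e-9):
    """Return (backhaul rate, decodable) for the current channel (h1, h2).
    The backhaul stays unused if local decoding works or if even c_max fails."""
    if achievable_rate(h1, h2, 0.0) >= r_sched:
        return 0.0, True
    if achievable_rate(h1, h2, c_max) < r_sched:
        return 0.0, False
    lo, hi = 0.0, c_max  # the rate is increasing in c, so bisection applies
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if achievable_rate(h1, h2, mid) >= r_sched:
            hi = mid
        else:
            lo = mid
    return hi, True
```

A fixed scheme would instead always spend the same backhaul rate, regardless of whether the current channel needs less of it or cannot be rescued at all.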
A comparison of the maximum sum-rate that can be achieved for a certain average backhaul rate is shown in Fig. 11.8, where we consider a scenario with $M = 2$ BSs with $N_{\mathrm{bs}} = 1$ receive antenna each, and either $K = 1$ double-antenna UE (Fig. 11.8(a)) or $K = 2$ single-antenna UEs (Fig. 11.8(b)). The UEs use a fixed per-antenna transmit power $P$, and the received signal is distorted by additive white Gaussian noise (AWGN) with variance $\sigma_v^2$. In general, a rich scattering environment leading to spatially independent complex Gaussian channel realizations (Rayleigh channel) is assumed. The UEs are assumed to be located at the cell edge. In order to model the time variance of the channel, it is assumed that the UEs are moving at a constant speed $v$. We employ the widely used Jakes spectrum to model the effects of the Doppler spread [JC94]. Furthermore, we assume coding over a complete transmission block of 1 ms and a scheduling delay of 3 ms. The results are based on information-theoretic models that include the use of the best-known compression techniques, which also exploit side information through Wyner-Ziv coding [dCS09], and a rather simple scheduler that makes its decision based on the channel that was observed $n_d$ codewords earlier and applies a backoff factor which is chosen such that throughput is maximized. For further details, we refer to [GMF10a].

[Figure 11.8: sum-rate [bit/channel use] vs. average backhaul rate [bit/channel use] for the adaptive and fixed schemes at v = 5, 15, 30, and 45 km/h. Panels: (a) Setup 1 (K = 1, M = 2); (b) Setup 2 (K = 2, M = 2).]

Figure 11.8 Comparison of sum-rate vs. backhaul for the adaptive and fixed schemes for different time-varying Rayleigh channels ($f_c = 2.68$ GHz, $\sigma_v^2 = 0.1$, $P = 1$).
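To get a feeling for how outdated the scheduler's CSI is in this setup, the following sketch evaluates the temporal channel correlation under the classical Jakes model, $\rho = J_0(2\pi f_D \tau)$ with Doppler frequency $f_D = v f_c / c_0$. The carrier frequency and the 3 ms delay are taken from the text; the function names and the numerical evaluation of the Bessel function are assumptions of this illustration, not part of the original simulation.

```python
import math

def bessel_j0(x, steps=2000):
    """J0(x) from its integral form (1/pi) * int_0^pi cos(x sin t) dt,
    evaluated with the midpoint rule."""
    dt = math.pi / steps
    total = sum(math.cos(x * math.sin((k + 0.5) * dt)) for k in range(steps))
    return total * dt / math.pi

def csi_correlation(speed_kmh, delay_s=3e-3, carrier_hz=2.68e9):
    """Correlation between the channel seen by the scheduler and the channel
    during transmission, assuming a Jakes Doppler spectrum."""
    doppler_hz = (speed_kmh / 3.6) * carrier_hz / 299_792_458.0
    return bessel_j0(2.0 * math.pi * doppler_hz * delay_s)

for v in (5, 15, 30, 45):
    print(f"v = {v:2d} km/h: correlation {csi_correlation(v):.3f}")
```

Under these assumptions the correlation stays close to one at walking speed and drops markedly at 45 km/h, which is consistent with the growing throughput loss of the fixed scheme at higher mobility.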
As expected, Fig. 11.8 shows a throughput loss that increases with the time variance of the channel. However, when the ad-hoc cooperation scheme is employed, we see strong gains in terms of the throughput/backhaul trade-off. Indeed, ad-hoc cooperation allows us to achieve almost maximum throughput at much lower backhaul rates than with fixed cooperation, which in the low backhaul regime mitigates the negative impact of time-varying channels on the achievable throughput. As expected, the backhaul savings of the ad-hoc scheme increase with increasing mobility.
When the achievable gains for the one-UE case (Fig. 11.8(a)) and the two-UE case (Fig. 11.8(b)) are compared, we see that the possible gains of Ad-Hoc CoMP are smaller in the two-UE case. The reason is that, in the two-user case, the backhaul rate cannot be adapted such that the achievable rate of each user separately equals its scheduled rate, because the backhaul rate is increased until both users can be decoded successfully. Hence, the gains from the proposed adaptive cooperation scheme decrease with an increase in the number of UEs that are decoded jointly. However, this occurs when only single-antenna BSs are employed. By using multiple BS antennas, the backhaul rate can be distributed over the spatial dimensions of the receive signal in such a way that less backhaul rate is spent on compressing user signals beyond the accuracy required for successful decoding, as shown in [GMF10c].

11.3.3 Ad-Hoc CoMP with CSI Impairments


In the case of outdated CSI, the CSI available for Ad-Hoc CoMP after the transmission is more accurate than the CSI available at the scheduler. However, perfect CSI will never be achieved because of

- channel estimation errors,
- noise covariance estimation errors, and
- RF-impairment estimation errors.

Hence, in practice it is not possible to adapt the CoMP mode perfectly as done in the previous subsection. At the same time, we still strive for efficient backhaul use and a minimization of transmission errors. A possible solution to this problem relies on error detection coding schemes that indicate the decoding success. In practice, this decision is based either on the output of an outer error detection code (e.g. cyclic redundancy check (CRC)) or on reliability information delivered by soft-output decoders. The idea is to increase the use of CoMP techniques progressively until successful decoding is achieved. In the following example, we use this concept in a DAS where compression accuracy is progressively increased by using successive refinements.
Example: Distributed Antenna System
In this example, we concentrate on the effects of channel estimation errors. In this case, the scheduled rates are determined based on the estimated channel state. The channel estimation error, which is constant during the reception of a codeword, leads to an additive transmission impairment with an unknown variance. A transmission error occurs if the scheduled data rate chosen is too high to be supported by the channel. In order to decide on the scheduled rate, the scheduler tries to predict the estimation distortion. In a practical system, a simple solution to this problem is to consider a signal-to-interference-and-noise ratio (SINR) margin. The CSI that is available for Ad-Hoc CoMP is impaired as well. Hence, optimal adaptation of the backhaul rate is not possible. However, the usage of a fixed backhaul rate is still inefficient because we can utilize the information on the decoding success in conjunction with successively refinable source coding schemes [EC91], as demonstrated in the following paragraph.
The algorithm used for the progressive refinement of the CoMP mode is depicted in Fig. 11.9. It is based on the employment of successively refinable source coding schemes [EC91]. The quantization accuracy is progressively refined as long as decoding is unsuccessful. It is known that successive refinements are possible without any rate loss for Gaussian sources [EC91]; however, other sources are also refinable. Prominent examples are all scalar quantizers. Here, some of the most significant bits can be exchanged initially. If the quantization distortion turns out to be too high for successful decoding, the representation can be refined by an exchange of the next most significant bits, and so forth. Outage occurs only if the finest possible quantization accuracy is insufficient for successful decoding. A downside of the proposed scheme is that it requires a feedback mechanism between the decoder and the forwarding BSs, which introduces an additional delay.

[Figure 11.9: flowchart. A coarse version of the received signal is exchanged to decoder BS A; joint decoding is based on the received coarse representations; if decoding is successful, the transmission ends; if not, the probability of successful decoding based on a more exact digital representation of the received signal is (optionally) investigated; if a refined digital representation is requested, a finer version of the received signal is exchanged to the master eNB and decoding is repeated; if not, (H)ARQ is used.]

Figure 11.9 Flowchart of the progressive ad-hoc cooperation algorithm.
For the rest of this subsection, we will compare two approaches:
1. fixed compression: the same backhaul rate $c_{\mathrm{fix}}$ is always used for the exchange of compressed signals from BS 2 (the RRH) to BS 1 (the decoder). The throughput is maximized (in this case) by choosing an optimal signal-to-noise ratio (SNR) gap at the scheduler.
2. progressive cooperation: a progressive refinement of the exchanged signal is used to achieve the lowest backhaul rate $c_{\mathrm{pro}}$ that enables successful decoding.
When latency and complexity are not constrained, the most backhaul-efficient scheme would be to refine the accuracy of the forwarded information in very small successive steps. However, in real-world systems, a good trade-off between throughput, backhaul rate, and latency is desired. Therefore, we need methods that limit the number of iterations. A straightforward approach is a simple three-step scheme. The signal is first quantized with the rate $c_{\mathrm{fix}}/2$. If decoding is unsuccessful, the exchanged signal is refined to a total rate of $c_{\mathrm{fix}}$. If decoding is still not successful, in the last step, further refinement to a rate of $2c_{\mathrm{fix}}$ is used. The transmission is in outage if even this rate is not sufficient for decoding. Further information is given in [GMF10b].
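The control flow of the three-step scheme can be sketched as follows. This is only an illustration of the decision logic: the callback `decodes_at` is a hypothetical stand-in for the decoder's success indication (for example a CRC check after joint decoding at the given total backhaul rate), and the returned tuple format is an assumption of this sketch.

```python
from typing import Callable, Tuple

def three_step_refinement(decodes_at: Callable[[float], bool],
                          c_fix: float) -> Tuple[bool, float, int]:
    """Refine the exchanged signal to total rates c_fix/2, c_fix, 2*c_fix.

    Returns (success, total backhaul rate spent, number of exchanges).
    On outage, the full rate 2*c_fix has been spent without success."""
    total_rates = (0.5 * c_fix, c_fix, 2.0 * c_fix)
    for exchanges, rate in enumerate(total_rates, start=1):
        if decodes_at(rate):  # e.g. CRC passes after joint decoding
            return True, rate, exchanges
    return False, total_rates[-1], len(total_rates)

# Hypothetical codeword that becomes decodable once 3 bit/channel use
# of backhaul have been exchanged in total.
print(three_step_refinement(lambda rate: rate >= 3.0, c_fix=4.0))
```

Each additional exchange costs one feedback round trip between the decoder and the forwarding BS, which is the latency price discussed above.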
As mentioned earlier, we assume a block fading channel, such that the channel (as well as the channel estimate and the channel estimation error) is constant for the transmission of one codeword, and successive channel realizations are assumed to be uncorrelated. Fig. 11.10(a) shows the Monte-Carlo simulation results for the setup that was already considered in Section 11.3.2. In addition to the case of perfect channel estimation ($\sigma_{\mathrm{est}}^2 = 0$), we consider channel estimation errors with variance $\sigma_{\mathrm{est}}^2 \in \{0.02, 0.05, 0.1\}$. The relatively large gap between the throughput for perfect channel estimation and for imperfect CSI is a consequence of the fact that the variance of the estimation distortion is unknown, resulting either in transmission errors or in scheduled rates that are far from the achievable rates.

[Figure 11.10: sum-rate [bit/channel use] vs. average backhaul rate [bit/channel use] for the optimal progressive, three-step, and fixed schemes at $\sigma_{\mathrm{est}}^2 = 0$, 0.02, 0.05, and 0.1. Panels: (a) Setup 1 (K = 1, M = 2); (b) Setup 2 (K = 2, M = 2).]

Figure 11.10 Comparison of sum-rate vs. backhaul for the optimal and the heuristic three-step progressive scheme as well as the fixed scheme ($\sigma_v^2 = 0.1$, $P = 1$).
Fig. 11.10(b) shows that the gain of the progressive ad-hoc cooperation scheme decreases with the number of UEs that are decoded jointly, because the backhaul rate is increased until both users can be decoded successfully. The reasons and potential countermeasures are the same as in the case of imperfect CSI due to a scheduling delay.

11.3.4 Ad-Hoc CoMP and HARQ


Even with the usage of ad-hoc CoMP, there is a certain chance that decoding is not possible. In these cases, a retransmission is required as indicated in Fig. 11.6. In current cellular standards, HARQ is used because the extra costs in terms of system complexity are justified by large throughput gains. As discussed in Section 11.2, when CoMP systems are considered, HARQ is more problematic because the information exchange required over the backhaul network adds delay to the total signal processing time until decoding is successful, potentially violating the demands of existing standards. Although these new requirements could in principle be considered in new versions of cellular standards, this is undesirable, since the application of uplink CoMP does not necessarily require further changes to the standard. Thus, the use of Ad-Hoc CoMP is subject not only to backhaul rate constraints but also to backhaul latency constraints, which need to be considered in the selection of appropriate CoMP modes. In particular, progressive schemes introduce additional delays, and their feasibility needs to be proven in practice. The results presented in Sections 11.3.2 and 11.3.3 show that the number of retransmissions can be reduced by using Ad-Hoc CoMP. Future research will show whether an ad-hoc use of CoMP along with coordinated scheduling might have the potential to enable reliable communication without HARQ on the first two layers.

11.3.5 Summary
In this section, the concept of Ad-Hoc CoMP for the cellular uplink was introduced. The key concept is to adapt the CoMP strategy after transmission has taken place in order to exploit channel information that is more recent than the one available at the time of scheduling. In this way, a more efficient use of backhaul can be achieved. The potential gains of using Ad-Hoc CoMP were shown for the example of a distributed antenna system where base stations exchange quantized receive signals for centralized decoding, and where two particular scenarios were considered.
In the first scenario, assuming that perfect CSI is available to the BSs after transmission, while only inaccurate CSI was available at the time of scheduling, it was shown that the employment of an adaptive backhaul compression rate can greatly increase backhaul efficiency.
In the other scenario, now assuming that the CSI available at the time of decoding is also subject to estimation errors, it could be shown that a progressive ad-hoc cooperation scheme is highly beneficial in terms of backhaul savings. Here, successively refined information is passed over the backhaul in multiple iterations, until successful decoding of the terminal transmissions is possible. Clearly, this leads to a trade-off between latency (number of iterations) and sum backhaul rate. A simple three-step approach was introduced, and its performance relative to the optimal progressive scheme and a naive fixed cooperation scheme was shown.
For the scenarios considered, the results indicate that an adaptive and progressive use of CoMP promises to reduce the required backhaul rate by about 50 %.

12 Backhaul

In this chapter, we address a last, but certainly not least important, challenge connected to CoMP, namely the fact that most base station cooperation schemes require information exchange over a backhaul infrastructure. Depending on the existing infrastructure of a mobile operator, both backhaul capacity and latency requirements of some CoMP schemes may be the main cost drivers or potential show stoppers on the roadmap towards CoMP. The chapter starts by addressing fundamental aspects of backhaul-constrained cooperation in Section 12.1, after which concrete backhaul capacity and latency requirements of various uplink and downlink CoMP schemes and their scaling behavior are derived in Section 12.2. Finally, Section 12.3 gives an overview of existing and upcoming backhaul technology options, and hence gives the reader a feeling of whether particular CoMP schemes can be expected to be technically and commercially feasible in the near future or not.

12.1 Fundamental Limits of Interference Mitigation with Limited Backhaul Cooperation
I-Hsiang Wang and David Tse
As we have seen in previous parts of this book, cooperation among base stations (BSs) via infrastructure backhaul networks can help mitigate interference by forming distributed multiple-input multiple-output (MIMO) systems. However, the rate at which BSs can cooperate is limited in wide-band cellular systems. How much interference can one bit of backhaul cooperation mitigate? In this section, we study the two-user Gaussian interference channel with limited backhaul cooperation to answer this question in a simple setting. We identify two regions pertaining to the fundamental gain from backhaul cooperation: linear and saturation regions. In the linear region, cooperation is efficient and provides a degrees-of-freedom gain, which is either "one cooperation bit buys one more bit" or "two cooperation bits buy one more bit" until saturation. In the saturation region, cooperation is inefficient and provides a power gain, which is at most a constant regardless of the rate at which BSs cooperate. The conclusion is drawn based on the characterization of the capacity region to within a constant gap¹.

12.1.1 Introduction
Why is Backhaul Limited?
One of the common misconceptions about backhaul cooperation is that the backhaul provides near unlimited cooperation capability, so that base stations can cooperate in an unlimited manner. To refute this, we shall use a simple example to illustrate that in a wide-band cellular system, backhaul cooperation is usually limited.
Consider a wide-band orthogonal frequency division multiplex (OFDM)-based cellular system with a bandwidth of 20 MHz. To attain near unlimited cooperation, the received signal at a base station should be quantized finely enough so that it can be recovered with negligible distortion at other base stations. Let us do a back-of-the-envelope calculation to get a sense of the rate that should be used to convey these quantization outputs. Suppose we use 8 bit/s/Hz to quantize the signal and use the backhaul to exchange the result. The total throughput required in the backhaul is then 20 MHz × 8 bit/s/Hz = 160 Mbit/s. Even for optical carriers in synchronous optical network (SONET)/synchronous digital hierarchy (SDH), such a high data rate is only supported from OC-12 upward, not to mention other technologies such as digital subscriber line (DSL), which cannot support it. For wireless technologies with growing bandwidth, since the backhaul link capacity does not increase with the wireless spectrum, from the above calculation we conclude that backhaul cooperation should be considered limited, and understanding how to make use of backhaul cooperation efficiently for interference mitigation becomes important.
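The back-of-the-envelope calculation can be reproduced in a few lines. The bandwidth and quantization resolution are those of the text; the table of SONET optical carrier line rates contains the standard nominal values and is added here only for comparison, and the function names are hypothetical.

```python
# Nominal SONET optical carrier line rates in Mbit/s (standard values,
# added for comparison; they are not taken from the text).
OC_LINE_RATES_MBPS = {"OC-1": 51.84, "OC-3": 155.52, "OC-12": 622.08, "OC-48": 2488.32}

def required_backhaul_mbps(bandwidth_mhz, bits_per_s_per_hz):
    """Backhaul rate needed to forward the quantized receive signal."""
    return bandwidth_mhz * bits_per_s_per_hz

def smallest_sufficient_carrier(rate_mbps):
    """Lowest optical carrier level whose line rate covers rate_mbps."""
    for name, line_rate in sorted(OC_LINE_RATES_MBPS.items(), key=lambda kv: kv[1]):
        if line_rate >= rate_mbps:
            return name
    return None

rate = required_backhaul_mbps(20.0, 8.0)
print(rate, "Mbit/s ->", smallest_sufficient_carrier(rate))
```

The comparison uses raw line rates, so the usable payload is somewhat lower still, which only strengthens the point that the backhaul has to be treated as a limited resource.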
Gaussian Interference Channel with Backhaul Cooperation
The simplest information-theoretic model for studying the fundamental limits of a communication system in the presence of interference is the interference channel (IC). In its simplest form, an interference channel consists of two transmitter-receiver pairs, and each receiver is only interested in retrieving information from its own transmitter. Therefore, one user's information-carrying signal becomes interference for the other user. A Gaussian IC is one where the second user's signal $x_2$ interferes with the first user's signal $x_1$ in an additive fashion and vice versa, along with additive white Gaussian noise at both receivers. Mathematically, the Gaussian interference channel is defined as follows:

$$y_1 = h_{11} x_1 + h_{12} x_2 + z_1, \qquad y_2 = h_{21} x_1 + h_{22} x_2 + z_2 \tag{12.1}$$

are the received signals, where the two mutually independent additive noise processes $\{z_i[k]\}_{k=1}^{N}$ ($i = 1, 2$) are independently and identically distributed (i.i.d.) with $\mathcal{CN}(0, 1)$ over time. In this section, we use $[\cdot]$ to denote time indices. Transmitter $i$ intends to convey message $m_i$ to receiver $i$ by encoding it into a block codeword $\{x_i[k]\}_{k=1}^{N}$, with transmit power constraints

$$\frac{1}{N} \sum_{k=1}^{N} \bigl| x_i[k] \bigr|^2 \le 1, \qquad i = 1, 2, \tag{12.2}$$

for an arbitrary block length $N$. Messages $m_1$ and $m_2$ are independent. Define the channel parameters

$$\mathrm{SNR}_i := |h_{ii}|^2, \qquad \mathrm{INR}_i := |h_{ij}|^2, \qquad i, j = 1, 2, \; i \neq j. \tag{12.3}$$

In the uplink scenario, since BSs serve as receivers, backhaul cooperation is modeled as receiver cooperation. On the other hand, in the downlink scenario, backhaul cooperation is modeled as transmitter cooperation. Backhaul cooperation links are modeled as noise-free links with finite cooperation capacity $C^{\mathrm{B}}_{ij}$ from BS $i$ to BS $j$, for $(i, j) = (1, 2)$ or $(2, 1)$. The models are depicted in Fig. 12.1.

[Figure 12.1: block diagrams of the two-user channel with encoders ENC 1 and ENC 2, decoders DEC 1 and DEC 2, channel gains $h_{11}$, $h_{12}$, $h_{21}$, $h_{22}$, noise terms $z_1$, $z_2$, and backhaul links $C^{\mathrm{B}}_{12}$, $C^{\mathrm{B}}_{21}$. Panels: (a) Uplink scenario; (b) Downlink scenario.]

Figure 12.1 Channel model considered. Dashed lines denote interfering links.

¹ The difference between inner and outer bounds is within a constant number of bits, which does not depend on channel parameters.
Despite the simplicity of this model, even in the scenario without cooperation, exact characterization of the capacity region has remained open since its introduction in the 1960s. In this section, we will not pursue the exact characterization of the capacity region. Instead, we aim at a uniformly approximate characterization where the capacity region is determined to within a constant gap, meaning that the difference between inner and outer bounds is within a constant number of bits. The constant gap is not dependent on channel parameters, and hence the fundamental limit in the interference-limited regime (that is, at high signal-to-noise ratio (SNR)) is fully characterized. Etkin et al. characterize the capacity region of the Gaussian IC to within 1 bit/s/Hz per complex dimension [ETW08]. Wang et al. characterize the capacity region of the Gaussian IC with limited backhaul cooperation to within 2 bit/s/Hz and 6.5 bit/s/Hz, for the uplink receiver cooperation scenario [WT09a] and downlink transmitter cooperation scenario [WT10], respectively.

A Deterministic Approach to the Gaussian Interference Channel


Throughout the section, we shall employ the linear deterministic model [ADT07, BT08] for the Gaussian interference channel to study the problem and illustrate high-level intuitions. For our system, the corresponding linear deterministic model is parameterized by six integers $\{n_{11}, n_{12}, n_{21}, n_{22}, k_{12}, k_{21}\}$, where

$$n_{ij} := \lfloor \log |h_{ij}|^2 \rfloor^{+}, \quad i, j \in \{1, 2\}, \qquad k_{12} := \lfloor C^{\mathrm{B}}_{12} \rfloor, \qquad k_{21} := \lfloor C^{\mathrm{B}}_{21} \rfloor. \tag{12.4}$$

For the interference channel part, the transmit signal at transmitter $i$ is $x_i \in \mathbb{F}_2^q$ for $i = 1, 2$. Here $\mathbb{F}_2$ denotes the binary field $\{0, 1\}$. The received signals are

$$y_1 = S^{q - n_{11}} x_1 + S^{q - n_{12}} x_2, \qquad y_2 = S^{q - n_{21}} x_1 + S^{q - n_{22}} x_2, \tag{12.5}$$

where additions are modulo-two component-wise, $q = \max\{n_{11}, n_{12}, n_{21}, n_{22}\}$, and $S \in \mathbb{F}_2^{q \times q}$ is the shift matrix

$$S = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & 0 \end{bmatrix}. \tag{12.6}$$

An interpretation of this model considers the binary expansion of signals. The effect of additive white Gaussian noise is modeled by truncation of the signal below the noise level. The effect of superposition with interference is modeled by the modulo-two component-wise addition of the bits, where the carry-over in real addition is not captured for simplicity.
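The linear deterministic model is easy to experiment with numerically. The sketch below builds the shift matrix of (12.6) and evaluates the received signals of (12.5) over $\mathbb{F}_2$; the function names and the chosen bit vectors are illustrative assumptions, and signal vectors are ordered from the most significant level (first entry) to the least significant one.

```python
import numpy as np

def shift_matrix(q):
    """q x q down-shift matrix S of (12.6): ones on the first subdiagonal."""
    return np.eye(q, k=-1, dtype=int)

def ldc_outputs(x1, x2, n11, n12, n21, n22):
    """Received signals of (12.5); all arithmetic is modulo two."""
    q = max(n11, n12, n21, n22)
    S = shift_matrix(q)

    def through_link(n, x):
        # S^(q-n) pushes the signal down by q-n levels, so only the n most
        # significant bits of x remain above the noise floor.
        return np.linalg.matrix_power(S, q - n) @ np.asarray(x, dtype=int)

    y1 = (through_link(n11, x1) + through_link(n12, x2)) % 2
    y2 = (through_link(n21, x1) + through_link(n22, x2)) % 2
    return y1, y2

# Direct links carry 3 levels, cross links 2 levels (hypothetical bit values).
y1, y2 = ldc_outputs([1, 0, 1], [1, 1, 0], n11=3, n12=2, n21=2, n22=3)
print(y1, y2)
```

With these parameters the interfering signal arrives one level lower than the desired signal, so its least significant bit falls below the noise floor and is lost.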
Fundamental Gain from Limited Backhaul Cooperation
We identify two regions pertaining to the gain from limited backhaul cooperation: linear and saturation regions, as illustrated by a numerical example in Fig. 12.2. The example is symmetric with $\mathrm{SNR}_1 = \mathrm{SNR}_2 = \mathrm{SNR} = 20$ dB, $\mathrm{INR}_1 = \mathrm{INR}_2 = \mathrm{INR} = 15$ dB, and $C^{\mathrm{B}}_{12} = C^{\mathrm{B}}_{21} = C^{\mathrm{B}}$. In the linear region, backhaul cooperation is efficient, in the sense that the growth of user data rate is roughly linear with respect to the capacity of the backhaul links. The gain in this region is the degrees-of-freedom gain that CoMP systems provide. On the other hand, in the saturation region, backhaul cooperation is inefficient in the sense that the growth of user data rate saturates as one increases the rate in the backhaul links. The gain is a power gain of a constant number of bits at best, and the constant is independent of the channel strength and the backhaul cooperation rate. We will focus on system performance in the linear region, not only because the rate at which base stations can cooperate is limited in most scenarios, but also because the gain from cooperation is more significant.
With the constant-gap-to-optimality result, we find the fundamental gain from cooperation in the linear region as follows: either one cooperation bit buys one more bit or two cooperation bits buy one more bit until saturation, depending on channel parameters. This will be elaborated and explained in the last part of this section.

[Figure 12.2: user data rate [bit/channel use] vs. cooperation rate [bit/channel use] for SNR = 20 dB, INR = 15 dB. A linear region (cooperation is efficient) is followed by a saturation region (cooperation is inefficient).]

Figure 12.2 The gain from limited backhaul cooperation.
The rest of this section is organized as follows. First, we describe the cooperation strategies between base stations that achieve the capacity regions to within 2 bits and 6.5 bits in uplink and downlink scenarios, respectively. Next, we show that there is an uplink-downlink reciprocity between the two scenarios, and hence there is no difference in the fundamental gains obtained from receiver or transmitter cooperation. We then quantify the degrees-of-freedom gain by characterizing the number of generalized degrees of freedom in the system. Finally, we use a couple of linear deterministic examples to illustrate the high-level intuitive reasons why there are two different kinds of behaviors of the gain from backhaul cooperation in the linear region.

12.1.2 Uplink Scenario: Receiver Cooperation


We shall not give the full expression of the inner and outer bounds of the capacity region due to space constraints. We point interested readers to reference [WT09a] for more details. Instead, we first use a linear deterministic example to motivate the strategy and then describe the strategy that achieves capacity to within a constant gap in the Gaussian scenario.
Linear Deterministic Examples
Consider the following symmetric channel: $\mathrm{SNR}_1 = \mathrm{SNR}_2 = \mathrm{SNR}$, $\mathrm{INR}_1 = \mathrm{INR}_2 = \mathrm{INR}$, and $C^{\mathrm{B}}_{12} = C^{\mathrm{B}}_{21} = C^{\mathrm{B}}$. Set INR to be 2/3 of the SNR in dB scale, that is, $\log \mathrm{INR} = \frac{2}{3} \log \mathrm{SNR}$. Set $C^{\mathrm{B}} = \frac{1}{3} \log \mathrm{SNR}$. The corresponding linear deterministic channel (LDC) is depicted in Fig. 12.3. Bits at the levels of transmitters/receivers can be thought of as chunks of binary expansions of the transmitted/received signals. Note that in this example, one bit in the LDC corresponds to $\frac{1}{3} \log \mathrm{SNR}$ in the Gaussian channel.

[Figure 12.3: bit-level diagrams showing the received and exchanged bits at both receivers. Panels: (a) Example 1: without cooperation; (b) Example 1: with cooperation; (c) Example 2: sub-optimal scheme; (d) Example 2: optimal scheme.]

Figure 12.3 Example channels. $\{a_k\}$ denote user 1's bits, while $\{b_k\}$ denote user 2's. Index $k$ denotes the $k$-th level at the corresponding transmitter. [WT09b] © 2009 IEEE.
We begin with the baseline scenario where the two receivers are not allowed to cooperate. Transmit signals are naturally split into two parts: (1) the common levels, which appear at both receivers, and (2) the private levels, which appear only at the transmitter's own receiver. Each transmitter splits its message into common and private parts, which are linearly modulated onto the common and private levels of the signal, respectively. Each receiver then decodes both users' common messages and its own private message by solving the linear equations it received. This is shown to be optimal in the two-user interference channel [BT08]. In this example (Fig. 12.3(a)), bits $a_1$ and $b_1$ are common, while $a_3$ and $b_3$ are private. The sum-capacity without cooperation is 4 bits. Since all levels at both receivers are occupied, one cannot turn on bits $a_2$ or $b_2$ without causing collisions.
With receiver cooperation, the natural split of transmitted signals does not
change. This suggests that the encoding procedure and the aim of each decoder
remain the same. Each receiver, with help from the other receiver, is however
able to decode additional information. Since each user's private message is of
no interest to the other receiver, an obvious scheme for receiver cooperation is
to exchange linear combinations formed by the signals above the private signal
level so that the undesired signal does not pollute the cooperative information.
In this example, as illustrated in Fig. 12.3(b), with one-bit cooperation in each
direction in the LDC, the optimal sum-rate is 5 bits, achieved by turning on one
more bit $a_2$. This causes collisions at the second level at receiver 1 and at the
third level at receiver 2, which can be resolved with cooperation: receiver 1 sends
$b_1 \oplus a_2$ to receiver 2, and receiver 2 sends $b_1$ to receiver 1. Now, receiver 1 can
solve $(a_1, a_2, a_3, b_1)$, and receiver 2 can solve $(b_1, b_3, a_1, a_2)$. In fact, the exchanged
linear combinations are not unique. For example, receiver 1 can send $(b_1 \oplus a_2) \oplus a_1$
and receiver 2 can send $b_1 \oplus a_1$, and this again achieves the same rates. As
long as receiver 1 does not send a linear combination containing the private bit $a_3$
and the sent linear combination is linearly independent of the signals at receiver 2
(and vice versa for the linear combination sent from receiver 2 to receiver 1), the
scheme is optimal for this example channel. The above discussion regarding the
scheme in the LDC naturally leads to an implementable one-round scheme in the
Gaussian channel, where both receivers quantize-and-bin their received signals
at their own private signal level.
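The decodability argument of Example 1 can be checked mechanically with linear algebra over GF(2). The following Python sketch is illustrative only. It assumes the level layout suggested by Fig. 12.3(b), namely that receiver 1 observes $a_1$, $a_2 \oplus b_1$, $a_3$ and receiver 2 observes $b_1$, $a_1$, $b_3 \oplus a_2$ (with $b_2$ switched off). The helper names `gf2_rank` and `can_solve` are hypothetical and not taken from [WT09b]. A bit counts as decodable at a receiver if its indicator vector lies in the span of the equations available there.

```python
# Bits are encoded as bitmasks, so XOR of masks is addition over GF(2).
A1, A2, A3, B1, B3 = (1 << k for k in range(5))


def gf2_rank(rows):
    """Rank over GF(2) of a list of integer bitmasks."""
    basis = []
    for row in rows:
        for b in basis:
            # Clears the leading bit of b in row whenever it is set.
            row = min(row, row ^ b)
        if row:
            basis.append(row)
    return len(basis)


def can_solve(equations, bits):
    """True if every bit in `bits` is determined by the given equations."""
    rank = gf2_rank(equations)
    return all(gf2_rank(equations + [bit]) == rank for bit in bits)


# Assumed received levels with a2 switched on (layout of Fig. 12.3(b)).
rx1 = [A1, A2 ^ B1, A3]
rx2 = [B1, A1, B3 ^ A2]

# Without cooperation the collisions cannot be resolved.
print(can_solve(rx1, [A1, A2, A3]))  # False
print(can_solve(rx2, [B1, B3]))      # False

# One exchanged bit per direction resolves both collisions.
print(can_solve(rx1 + [B1], [A1, A2, A3, B1]))       # True
print(can_solve(rx2 + [B1 ^ A2], [B1, B3, A1, A2]))  # True

# The alternative exchange mentioned in the text works equally well.
print(can_solve(rx1 + [B1 ^ A1], [A1, A2, A3, B1]))       # True
print(can_solve(rx2 + [B1 ^ A2 ^ A1], [B1, B3, A1, A2]))  # True
```

Testing span membership rather than full rank matters here: receiver 1 never learns $b_3$, so its system is rank deficient in the five unknowns even though every bit it needs is determined.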
In the above example, it is optimal for each receiver to send to the other
linear combinations formed by its received signal above its private signal level.
Is this optimal in general? The answer is no. Consider the following asymmetric
example: $\mathrm{SNR}_2 = \mathrm{INR}_2$, $\mathrm{SNR}_1$ is 2/3 of $\mathrm{SNR}_2$ in dB, and $\mathrm{INR}_1$ is 1/3 of $\mathrm{SNR}_2$ in
dB. $C^{B}_{12} = \frac{2}{3} \log \mathrm{SNR}_2$ and $C^{B}_{21} = \frac{1}{3} \log \mathrm{SNR}_2$. The corresponding LDC is depicted
in Figs. 12.3(c) and 12.3(d), where one bit in the LDC corresponds to $\frac{1}{3} \log \mathrm{SNR}_2$
in the Gaussian channel. First consider the same scheme as in the previous
example. Note that if receiver 2 just forwards signals above its private signal
level, it can only forward $a_1$ to receiver 1 and achieves $R_1$ up to 2 bits. On the
other hand, if receiver 2 forwards $a_3$ to receiver 1, which is below user 2's private
signal level, it achieves $R_1 = 3$ bits. From this example, we see that whenever
there is useful information (which should not be polluted by the receiver's own
private bits) that lies at or below the private signal level (in this example, the
bit $a_3$), the one-round scheme described in the previous example is sub-optimal.
To extract the useful information at or below the private signal level, one of
the receivers (in this example, receiver 2) can first decode and then form linear
combinations using (decoded) common messages only.
It turns out that, without loss of generality, the above situation (there is useful
information for the other receiver that lies at or below the private signal level)
occurs at only one of the two receivers. In other words, there exists a receiver
where no useful information (for the other receiver) lies at or below the private
signal level. The reason is the following:
1. It is not difficult to see that the capacity region is convex, and hence if a scheme
can achieve $\max_{(R_1, R_2) \in \mathcal{C}} \{\mu_1 R_1 + \mu_2 R_2\}$ for all $\mu_1, \mu_2 \ge 0$, it is optimal. Here
$\mathcal{C}$ denotes the capacity region.
2. If $\mu_1 \ge \mu_2$, we weigh user 1's rate more. Since the private bits are cheaper to
support in the sense that they do not cause interference at receiver 2, user 1
should be transmitting at its full private rate, which is equal to the number
of levels at or below the private signal level at receiver 1. Therefore, all levels
at or below the private signal level are occupied by user 1's private bits and
there is no useful information for receiver 2 at receiver 1.
3. Similarly, if $\mu_1 \le \mu_2$, there is no useful information for receiver 1 at receiver 2
at or below the private signal level.
Hence, the following two-round strategy is optimal in the LDC: if $\mu_1 \ge \mu_2$,
receiver 1 forms a certain number (no more than the cooperative link capacity)
of linear combinations composed of the signals above its private signal level and
sends them to receiver 2. After receiver 2 decodes, it forms a certain number of
linear combinations composed of the decoded common bits and sends them to
receiver 1. If $\mu_1 \le \mu_2$, the roles of receivers 1 and 2 are interchanged. Depending
on the operating point in the capacity region, we use different configurations,
implying that time-sharing is needed to achieve the full capacity region.
From the above discussion, a natural and implementable two-round strategy
for Gaussian channels emerges. For transmission, we use a superposition Gaussian
random coding scheme with a simple power-split configuration, as described
in [ETW08]. For cooperation, one of the receivers quantizes-and-bins its received
signal at its private signal level and forwards the bin index; after the other
receiver decodes with the help of this side information, it bins-and-forwards the
decoded common messages back to the first receiver and helps it decode.
Coding Strategy
The scenario is depicted in Fig. 12.1(a). The strategy consists of two parts: (1)
the transmission scheme, describing how transmitters encode their messages, and
(2) the cooperation scheme, describing how receivers exchange information and
decode messages. We give an overview of the strategy below.
Transmission Scheme. We use a simple superposition coding scheme with
Gaussian random codebooks. Each transmitter splits its own message into
common and private (sub-)messages. Each common message is aimed at both
receivers, while each private message is aimed at its own receiver. Each message
is encoded into a Gaussian random codeword with a certain power. For transmitter $i$,
the power for its private and common codewords is $Q_{ip}$ and $Q_{ic} = 1 - Q_{ip}$,
respectively, for $i = 1, 2$. As [ETW08] points out, since the private signal is
undesired at the unintended receiver, a reasonable configuration is to make the private
interference at or below the noise level so that it does not cause much damage
and can still convey additional information in the direct link if it is stronger than
the cross link. When the interference is stronger than the desired signal, simply
set the whole message to be common. In other words, for $(i, j) = (1, 2)$ or $(2, 1)$,

$$Q_{ip} = \min\left\{ \frac{1}{\mathrm{INR}_j},\, 1 \right\} \ \text{if } \mathrm{SNR}_i > \mathrm{INR}_j, \quad \text{and } Q_{ip} = 0 \text{ otherwise.}$$
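The power-split rule can be written as a few lines of Python. This is a minimal sketch under stated assumptions: the total transmit power and the noise variance are both normalized to 1, SNR and INR are given on a linear scale, and the function name `power_split` is hypothetical rather than taken from [ETW08].

```python
def power_split(snr_i, inr_j):
    """Return (Q_ip, Q_ic) for transmitter i, assuming unit total power.

    snr_i: SNR of user i's direct link (linear scale).
    inr_j: INR that transmitter i causes at the unintended receiver j
           (linear scale).
    """
    if snr_i > inr_j:
        # min(1 / INR_j, 1): keep the private interference at or below the
        # (unit) noise level; written this way to avoid dividing by zero.
        q_private = 1.0 if inr_j <= 1.0 else 1.0 / inr_j
    else:
        # Interference at least as strong as the desired signal:
        # the whole message is made common.
        q_private = 0.0
    return q_private, 1.0 - q_private


q_p, q_c = power_split(snr_i=1000.0, inr_j=100.0)
print(q_p, q_c)     # private and common power fractions
print(100.0 * q_p)  # private interference power arriving at receiver j
```

With these example values the private signal arrives at the unintended receiver at roughly the noise level, which is the design goal stated in the text.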

Cooperation Scheme. The cooperation scheme is two-round. We briefly
describe it as follows: for $(i, j) = (1, 2)$ or $(2, 1)$, in the first round, receiver $j$
quantizes its received signal and sends out the bin index (described in detail
below). In the second round, receiver $i$ receives this side information, decodes
its desired messages (both users' common messages and its own private message)
with the decoder described in detail below, randomly bins the decoded common
messages, and sends the bin indices to receiver $j$. Finally, receiver $j$ decodes with
the help from the receiver-cooperative link. We call this a two-round strategy
$\mathrm{STG}_{j \to i \to j}$, meaning that the processing order is: receiver $j$ quantizes-and-bins,
receiver $i$ decodes-and-bins, and receiver $j$ decodes. Its achievable rate region is
denoted by $\mathcal{R}_{j \to i \to j}$. By time-sharing, we can obtain an achievable rate region
$\mathcal{R} := \mathrm{conv}\{\mathcal{R}_{2 \to 1 \to 2} \cup \mathcal{R}_{1 \to 2 \to 1}\}$, the convex hull of the union of the two rate regions.
There is a simple way to understand the strategy from an engineering perspective.
To achieve $\max_{(R_1, R_2) \in \mathcal{R}} \{\mu_1 R_1 + \mu_2 R_2\}$ for some non-negative $(\mu_1, \mu_2)$, the
processing configuration can be easily determined: strategy $\mathrm{STG}_{j \to i \to j}$ should be
used, where $i = \arg\min_{l=1,2} \{\mu_l\}$ and $j = \arg\max_{l=1,2} \{\mu_l\}$. To summarize, the
receiver which decodes last is the one we favor the most.
In the following paragraphs, we describe each component in detail, including
quantize-binning, decode-binning, and their corresponding decoders. For simplicity,
we consider strategy $\mathrm{STG}_{2 \to 1 \to 2}$.
Quantize-binning: Upon receiving its signal from the transmitter-receiver
link, receiver 2 does not decode messages immediately. Instead, serving as a
relay, it first quantizes its signal using a pre-generated Gaussian quantization
codebook with a certain distortion, and then sends out a bin index determined by a
pre-generated binning function. How should we set the distortion? As discussed
previously, neither its own private signal nor the noise it encounters
is of interest to receiver 1. Therefore, a natural configuration is to set the
distortion level equal to the aggregate noise plus the private signal power level.
Decoder at receiver 1: After retrieving the receiver-cooperative side information,
that is, the bin index, receiver 1 decodes the two common messages and
its own private message by searching the transmitters' codebooks for a codeword
triple (indexed by the two common messages and the user's own private
message) that is jointly typical [CT06] with its received signal and some
quantization point (codeword) in the given bin. If there is no such unique codeword
triple, it declares an error.
Decode-binning: After receiver 1 decodes, it uses two pre-generated binning
functions to bin the two common messages and sends out these two bin indices
to receiver 2.
Decoder at receiver 2: After receiving these two bin indices, receiver 2
decodes the two common messages and its own private message by searching the
transmitters' codebooks for a codeword triple that is jointly typical [CT06]
with its received signal and whose common messages both lie in the given bins.


12.1.3 Downlink Scenario: Transmitter Cooperation


Once again, the complete expression of the inner and outer bounds of the capacity
region will not be presented. Reference [WT10] contains more details. Instead,
the strategy that achieves capacity to within a constant gap is described.
Coding Strategy
The scenario is depicted in Fig. 12.1(b). A natural cooperation strategy between
transmitters is that, prior to each block of transmission, the two transmitters hold
a conference to tell each other a part of their messages. Hence the messages
are classified into two kinds: (1) cooperative messages, which are known to both
transmitters due to the information exchange, and (2) non-cooperative ones,
which are unknown to the other transmitter since the cooperative link capacities
are finite. On the other hand, messages can also be classified based on their
target receivers: (1) common messages, which are aimed at both receivers, and
(2) private ones, which are aimed at their own receiver. Hence there are, in total,
four kinds of messages for each user, and seven codes for the whole system². Now
the question is, how do we encode these messages?
Our strategy turns out to be a simple superposition coding scheme, consisting
of a pair of non-cooperative common and private codes and a pair of cooperative
common and private codes (similar to the scheme proposed in Section 6.4.2).
For the non-cooperative part, the Han-Kobayashi scheme [HK81] is employed,
and the common-private split is such that the private interference is at or below
the noise level at the unintended receiver [ETW08]. For the cooperative part, we
use a simple linear beamforming strategy for encoding private messages,
superimposed upon the common codewords. Below, we describe the strategy from a
high-level perspective and leave the details to [WT10]. For the cooperative common
message, we modulate it onto a two-dimensional vector code and use both
transmitters to send it. Denote the cooperative common signal by $\mathbf{x}_o$. We choose
$\mathbf{x}_o$ to be Gaussian with zero mean and a covariance matrix whose diagonal
entries (values of transmit power) are comparable with the total transmit
power. We make the cooperative private signal $\mathbf{x}_h$ a superposition of
zero-forcing vectors

$$\mathbf{v}_{1z} = \begin{pmatrix} h_{22} \\ -h_{21} \end{pmatrix}, \quad \mathbf{v}_{2z} = \begin{pmatrix} -h_{12} \\ h_{11} \end{pmatrix} \tag{12.7}$$

and matched-filter vectors

$$\mathbf{v}_{1m} = \begin{pmatrix} h_{11}^{*} \\ h_{12}^{*} \end{pmatrix}, \quad \mathbf{v}_{2m} = \begin{pmatrix} h_{21}^{*} \\ h_{22}^{*} \end{pmatrix}. \tag{12.8}$$

² There is only one cooperative common code carrying both cooperative common messages.


Hence, the overall cooperative signal transmitted by both transmitters combined
is the following:

$$\mathbf{x}_{oh} = \mathbf{x}_o + \underbrace{w_{1z} \mathbf{v}_{1z} + w_{2z} \mathbf{v}_{2z} + w_{1m} \mathbf{v}_{1m} + w_{2m} \mathbf{v}_{2m}}_{\mathbf{x}_h}, \tag{12.9}$$

where $w_{iz}$ and $w_{im}$ are independent Gaussian random codes, each carrying a part
of the cooperative private message for user $i$. For the non-cooperative part, we
simply transmit the superposition of two independent Gaussian random codes $x_{ic}$
and $x_{ip}$ for the non-cooperative common and non-cooperative private messages
for user $i$, respectively. Hence, the overall non-cooperative signal transmitted by
transmitter $i$ is $x_{icp} = x_{ic} + x_{ip}$, for $i = 1, 2$. Overall, the transmit signal from
transmitter $i$ is the superposition of cooperative and non-cooperative signals, i.e.

$$x_i = \mathbf{x}_{oh}(i) + x_{icp}, \quad i = 1, 2. \tag{12.10}$$

For the power allocation, note that the interference caused by the other user's
cooperative private signal should be nulled out approximately, that is, its variance
should be at or below the noise level. Moreover, the interference caused by the other
user's non-cooperative private signal should also be at or below the noise level. With
this guideline, we can determine the power allocation policy. For more details we
point the readers to [WT10].
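The role of the two kinds of beamforming vectors can be illustrated numerically. The Python sketch below rests on several assumptions that are not spelled out in the text: $h_{ij}$ is taken to be the gain from transmitter $j$ to receiver $i$, the zero-forcing vectors are taken as $(h_{22}, -h_{21})^T$ and $(-h_{12}, h_{11})^T$, the matched-filter vectors are taken as the conjugated channel rows, and the channel values themselves are arbitrary. The helper `received` is hypothetical.

```python
# Hypothetical 2x2 channel: H[i][j] is the gain from transmitter j+1
# to receiver i+1 (an assumed indexing convention).
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.4 + 0.1j, 0.9 - 0.7j]]
(h11, h12), (h21, h22) = H


def received(rx, v):
    """Noise-free signal at receiver rx (1 or 2) when the two transmitters
    jointly send the vector v = (signal of tx 1, signal of tx 2)."""
    row = H[rx - 1]
    return row[0] * v[0] + row[1] * v[1]


# Zero-forcing vectors: user i's vector is invisible at the other receiver.
v1z = (h22, -h21)
v2z = (-h12, h11)

# Matched-filter vectors: conjugate of the intended receiver's channel row.
v1m = (h11.conjugate(), h12.conjugate())
v2m = (h21.conjugate(), h22.conjugate())

print(abs(received(2, v1z)))  # leakage of user 1's ZF signal at receiver 2
print(abs(received(1, v2z)))  # leakage of user 2's ZF signal at receiver 1
print(received(1, v1m))       # coherent gain |h11|^2 + |h12|^2 at receiver 1
print(abs(received(2, v1m)))  # matched filtering does leak to receiver 2
```

The zero-forcing components cause no interference regardless of their power, whereas the matched-filter components maximize the gain at the intended receiver but leak to the other one. This is why the power guideline above constrains the variance of the leaked signal.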
The decoding procedure, compared to the uplink scenario, is much simpler.
Each receiver decodes all common messages and its own private messages jointly.

12.1.4 UL-DL Reciprocity and Generalized Degrees of Freedom


It turns out that there is a nice reciprocity between uplink and downlink scenarios.
First, we define the reciprocal downlink (uplink) system with respect to an
uplink (downlink) system.
Definition (Reciprocal Systems). Given an uplink (downlink) system with channel
matrix $H$ and cooperation capacities $C^{B}_{12}$ from BS 1 to 2 and $C^{B}_{21}$ from BS 2
to 1, its reciprocal downlink (uplink) system is specified by channel matrix $H^{\dagger}$ and
cooperation capacities $C^{B}_{21}$ from BS 1 to 2 and $C^{B}_{12}$ from BS 2 to 1.
The reciprocal property is summarized in the following theorem:
Theorem (Uplink-Downlink Reciprocity). The capacity regions of the reciprocal
uplink-downlink systems are within a constant gap to each other. Therefore, their
system performances in the linear region are the same.
Proof. Comparison of the outer bounds proves the result. See [WT10] for details.


Based on reciprocity, we investigate performance in the linear region by
characterizing the optimal generalized degrees of freedom available in the system,
and demonstrate the fundamental gain from limited backhaul cooperation in the
rest of this section. The notion of generalized degrees of freedom was originally
proposed in [ETW08]. For simplicity, we consider a symmetric set-up, where

$$\mathrm{SNR} = \mathrm{SNR}_1 = \mathrm{SNR}_2, \quad \mathrm{INR} = \mathrm{INR}_1 = \mathrm{INR}_2; \quad C^{B} = C^{B}_{12} = C^{B}_{21}, \tag{12.11}$$

and a standard performance measure is the symmetric capacity

$$C_{\mathrm{sym}} := \sup \{R : (R, R) \in \text{capacity region}\}. \tag{12.12}$$

We begin with the definition.


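Definition (Generalized Degrees of Freedom). Let

$$\lim_{\mathrm{SNR} \to \infty} \frac{\log \mathrm{INR}}{\log \mathrm{SNR}} = \alpha; \quad \lim_{\mathrm{SNR} \to \infty} \frac{C^{B}}{\log \mathrm{SNR}} = \kappa, \tag{12.13}$$

and define the number of generalized degrees of freedom per user as

$$d := \lim_{\substack{\text{fix } \alpha, \kappa \\ \mathrm{SNR} \to \infty}} \frac{C_{\mathrm{sym}}}{\log \mathrm{SNR}}, \tag{12.14}$$

if the limit exists³.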


With the constant-gap-to-optimality result, the generalized degrees of freedom
(g.d.o.f.) can be characterized via straightforward calculations:
Theorem (Number of Generalized Degrees of Freedom Per User).

$$d = \begin{cases} \min\{1,\ \max(\alpha, 1 - \alpha) + \kappa,\ 1 - \alpha/2 + \kappa/2\}, & 0 \le \alpha < 1 \\ \min\{\alpha,\ 1 + \kappa,\ \alpha/2 + \kappa/2\}, & \alpha \ge 1 \end{cases} \tag{12.15}$$

Numerical plots for the g.d.o.f. are given in Fig. 12.4. We observe that the gain
from cooperation varies at different values of $\alpha$. By investigating the g.d.o.f., we
conclude that at high SNR, when the interference-to-noise ratio (INR) is below 50%
of SNR (in dB), one-bit cooperation per direction buys roughly one-bit gain per
user until full receiver cooperation performance is reached, while when INR is
between 67% and 200% of SNR (in dB), one-bit cooperation per direction buys
roughly half-bit gain per user until saturation.
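The two slopes quoted above can be reproduced from the g.d.o.f. expression. The following Python sketch assumes that (12.15) reads $d = \min\{1, \max(\alpha, 1-\alpha) + \kappa, 1 - \alpha/2 + \kappa/2\}$ for $0 \le \alpha < 1$ and $d = \min\{\alpha, 1 + \kappa, \alpha/2 + \kappa/2\}$ for $\alpha \ge 1$. The function names `gdof` and `gain` are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction as F


def gdof(alpha, kappa):
    """Generalized degrees of freedom per user, as assumed from (12.15)."""
    if alpha < 0 or kappa < 0:
        raise ValueError("alpha and kappa must be non-negative")
    if alpha < 1:
        return min(1, max(alpha, 1 - alpha) + kappa,
                   1 - alpha / 2 + kappa / 2)
    return min(alpha, 1 + kappa, alpha / 2 + kappa / 2)


def gain(alpha, kappa):
    """g.d.o.f. improvement over the no-cooperation case (kappa = 0)."""
    return gdof(alpha, kappa) - gdof(alpha, F(0))


# INR at 50% of SNR in dB: the gain equals kappa (slope 1).
print(gain(F(1, 2), F(1, 4)))   # 1/4

# INR at 67% of SNR in dB: the gain equals kappa / 2 (slope 1/2).
print(gain(F(2, 3), F(1, 6)))   # 1/12

# Sample points of the curves for the cooperation levels of Fig. 12.4.
for kappa in (F(0), F(1, 6), F(1, 3), F(1, 2)):
    print(kappa, [gdof(a, kappa) for a in (F(1, 2), F(2, 3), F(2))])
```

Setting $\kappa = 0$ recovers the familiar W-shaped curve of the interference channel without cooperation, which serves as a quick sanity check on the assumed expression.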
³ In fact, the limit does not exist when $\alpha = 1$, where the phases of the channel gains matter.
In particular, its value can depend on whether the system MIMO matrix is well-conditioned
or not. To overcome this issue, we impose a reasonable distribution, namely, an i.i.d. uniform
distribution, on the phases, show that the limit exists almost surely, and define the limit to
be the number of generalized degrees of freedom per user. See [WT09a] for more details.


Figure 12.4 Generalized degrees of freedom $d(\alpha, \kappa)$ versus $\alpha$, for $\kappa = 0$, $1/6$, $1/3$, and $1/2$. [WT09b] © 2009 IEEE.

Gain from Limited Cooperation


The fundamental behavior of the gain from limited backhaul cooperation is
explained in the rest of this section, by looking at two particular points: $\alpha = \frac{1}{2}$
and $\alpha = \frac{2}{3}$ in the uplink scenario. We further use the LDC for illustration.
At $\alpha = \frac{1}{2}$, the plot of $d$ versus $\kappa$ is given in Fig. 12.5(a). The slope is 1 until full
receiver cooperation performance is reached, implying that one-bit cooperation
buys one more bit per user. We look at a particular point $\kappa = \frac{1}{4}$ and use its
corresponding LDC (Fig. 12.5(b)) to provide insights. Note that 1 bit in the LDC
corresponds to $\frac{1}{4} \log \mathrm{SNR}$ in the Gaussian channel, and since $C^{B} \approx \frac{1}{4} \log \mathrm{SNR}$, in
the corresponding LDC each receiver is able to send one bit of information to the
other. Without cooperation, the optimal way is to turn on bits not causing
interference, that is, the private bits $a_3$, $a_4$, $b_3$