Sei sulla pagina 1di 4

PERFORMANCE ANALYSIS OF A FIXED-COMPLEXITY SPHERE DECODER IN

HIGH-DIMENSIONAL MIMO SYSTEMS

Luis G. Barbero and John S. Thompson

Institute for Digital Communications


University of Edinburgh
Email: {l.barbero, john.thompson}@ed.ac.uk

ABSTRACT plitude modulation (QAM) constellation of P points forming


an M -dimensional complex constellation C of P M vectors.
The performance of a new detection algorithm for uncoded
The received N -vector, using matrix notation, is given by
multiple input-multiple output (MIMO) systems based on the
complex version of the sphere decoder (SD) is analyzed in r = Hs + v (1)
this paper. The algorithm performs a fixed number of ope-
rations to detect the signal, independent of the noise level. where s = (s1 , s2 , ..., sM )T denotes the vector of transmit-
Simulation results show that it can be applied to high-dimen- ted symbols with E[|si |2 ] = 1/M , v = (v1 , v2 , ..., vN )T is
sional MIMO systems presenting a very small bit error ratio the vector of independent and identically distributed (i.i.d.)
(BER) degradation compared to the original SD. In addition, complex Gaussian noise samples with variance σ 2 = N0 and
its deterministic nature makes it suitable for hardware imple- r = (r1 , r2 , ..., rN )T is the vector of received symbols. H
mentation. denotes the N × M channel matrix where hij is the complex
transfer function from transmitter j to receiver i. The entries
1. INTRODUCTION of H are modelled as i.i.d. Rayleigh fading with E[|hij |2 ] = 1
and are perfectly estimated at the receiver.
The use of multiple input-multiple output (MIMO) techno- Since the elements of H are i.i.d. complex Gaussian, H
logy has become the new frontier of wireless communica- has full rank M and, therefore, the set {Hs} can be consi-
tions. It enables high-rate data transfers and improved link dered as the complex lattice Λ(H) generated by H. The FSD
quality through the use of multiple antennas at both trans- is directly applied to the complex lattice so that it can be used
mitter and receiver [1]. For spatially multiplexed uncoded for complex constellations different from QAM in a similar
MIMO systems, the sphere decoder (SD) is widely consi- way to [7]. In addition, avoiding the more common real de-
dered the most promising approach to obtain optimal ma- composition would result in a more efficient hardware imple-
ximum likelihood (ML) performance with reduced comple- mentation as shown for the SD in [8].
xity [2],[3]. Although its average complexity is believed to be
polynomial [4], the actual complexity depends on the channel 3. FIXED-COMPLEXITY SPHERE DECODER (FSD)
conditions and the noise level. This makes it difficult to inte-
grate in an actual system where data needs to be processed at a The main idea behind the FSD is to perform a search over only
constant rate. In order to overcome this problem, a new fixed- a fixed number of lattice vectors Hs, generated by a small
complexity sphere decoder (FSD) has been recently proposed subset S ⊂ C, around the received vector r. The transmitted
that provides quasi-ML performance [5]. In addition, it has vector s ∈ S with the smallest Euclidean distance is then
considerably lower complexity than the K-Best lattice de- selected as the solution. The process can be written as
coder proposed in [6]. This paper analyzes the performance
and complexity of the FSD in high-dimensional MIMO sys- ŝfsd = arg{min kr − Hsk2 } . (2)
s∈S
tems showing that it is especially suited for integration into
next-generation wireless communication systems. Eq. (2) can also be written, after matrix decomposition
and removal of constant terms, as
2. MIMO SYSTEM MODEL
ŝfsd = arg{min kU(s − ŝ)k2 } (3)
s∈S
The system model considered has M transmit and N receive
antennas, with N ≥ M , denoted as M × N . The transmit- where U is an M × M upper triangular matrix, with en-
ted symbols are taken independently from a quadrature am- tries denoted uij , obtained through Cholesky decomposition
root
3
1 2
-1-j -1+j 1-j 1+j
i=4 n4 = 3
4
R
i=3 n3 = 2

i=2 n2 = 1
Fig. 2. Schematic of the FSD principle for the 2-dimensional
i=1 n1 = 1 case - only the numbered dots inside the circle are searched
Ns = 1·1·2·3 = 6

Fig. 1. Example of vectors s ∈ S in a 4×4 system with 4- Fig. 1 shows a hypothetical subset S in 4×4 system with
QAM modulation 4-QAM constellation where the number of points per level
nS = (n1 , n2 , n3 , n4 )T = (1, 1, 2, 3)T . In each level i, the
ni closest points to zi are considered as components of the
of the Gram matrix G = HH H and ŝ = H† r is the uncon- subset S. In this case, the Euclidean distance of only NS =
strained ML estimate of s where H† = (HH H)−1 HH is the 6 transmitted vectors would be calculated, whereas the total
pseudoinverse of H. number of transmitted vectors 44 = 256 is much larger.
The (squared) Euclidean distance in (3) can be obtained If S is large, the performance will be closer to that of the
recursively starting from i = M and working backwards until original SD but the number of operations and, therefore, the
i = 1 using required computational resources or the processing time will
M
X increase. That makes the FSD suitable for reconfigurable ar-
Di = u2ii |si 2
− zi | + u2jj |sj − zj |2 = di + Di+1 (4) chitectures where the size of S can be made adaptive depen-
j=i+1 ding on the MIMO channel conditions.
Conceptually, the FSD is equivalent to a SD where, for
where DM +1 = 0, D1 = kU(s − ŝ)k2 and every MIMO symbol, the initial radius R is set to the ma-
XM ximum D1 distance among the NS values obtained. In this
uij
zi = ŝi − (sj − ŝj ) . (5) case, the FSD achieves fixed-complexity by searching over
u
j=i+1 ii only NS points Hs inside the hypersphere so that the lattice
point of the ML solution Hŝml is included with high proba-
In (4), the term Di+1 can be seen as an accumulated (squared) bility. Fig. 2 shows the basic principle of the FSD where the
Euclidean distance down to level j = i + 1 and the term di dots represent the noiseless received constellation, the cross
as the partial (squared) Euclidean distance contribution from represents the actual received point contaminated with noise
level i. and only the numbered dots inside the hypersphere are con-
The subset of transmitted vectors S is determined defining sidered as ML candidates (NS = 4).
the number of points si , denoted as ni , that are considered
per level. In [5], it was shown that, in the SD, the number of
candidates considered per level during the tree search follow 3.1. FSD Preprocessing of the Channel Matrix

E[nM ] ≥ E[nM −1 ] ≥ · · · ≥ E[n1 ] (6) The preprocessing of the channel matrix in the FSD deter-
mines the detection order of the signals ŝi according to the
with 1 ≤ ni ≤ P . The FSD, therefore, assigns a fixed distribution of points nS used.
number of points, ni , to be searched per level following (6). It orders iteratively the M columns of the channel matrix.
This can be explained as follows: whereas in the first level, On the i-th iteration, considering only the signals still to be
i = M , more points need to be considered due to interference detected, the signal ŝi with the smallest post-detection noise
from the other levels, the decision-feedback equalization per- amplification, as calculated in [10], is selected if ni < P .
formed on zi reduces the number of points that need to be If ni = P , the signal with the largest noise amplification is
considered in the last levels to approximate the ML solution. selected instead.
The total number of vectors Q whose Euclidean distance is The following heuristic supports this ordering approach:
M
calculated is, therefore, NS = i=1 ni , where simulations if the maximum possible number of candidates, P , is searched
show that quasi-ML performance is achieved with NS ¿ on one level, the robustness of the signal is not relevant to the
P M , i.e. S is a very small subset of C [5]. The ni points final performance, therefore, the signals that suffer the largest
on each level i are selected according to increasing distance noise amplification can be be detected on the levels where
to zi , following the Schnorr-Euchner (SE) enumeration [9]. ni = P .
M=N=4 M=N=8
0 0
10 10
SD SD
FSD−P = (1,1,1,P) 2
FSD−P = (1,1,1,1,1,1,P,P)
−1 −1
10 10

64−QAM
−2 −2 16−QAM
10 10
BER

BER
−3 −3
10 10
4−QAM 4−QAM

−4 −4
10 10
16−QAM

−5 −5
10 10
0 5 10 15 20 25 0 2 4 6 8 10 12 14 16 18
E /N (dB) E /N (dB)
b 0 b 0

Fig. 3. BER performance of the FSD and the SD as a function Fig. 4. BER performance of the FSD and the SD as a function
of the SNR per bit in a 4×4 system. of the SNR per bit in a 8×8 system.

4. RESULTS observed that, for 16-QAM and at a BER=10−3 , the perfor-


mance degradation of the FSD compared to the SD is of 0.06
The performance and complexity of the FSD has been simu- dB while the K-Best decoder (with K = 16) has a degra-
lated for different constellations and MIMO configurations. dation of 0.015 dB. For 64-QAM and at a BER=10−3 , the
The main aim is to evaluate its suitability for quasi-ML de- performance degradation of the FSD compared to the SD is
tection in a fixed number of operations in systems where the of 0.03 dB while the K-Best decoder (with K = 64) has a
maximum likelihood detector (MLD) is unfeasible due to its degradation of 0.05 dB.
complexity. The results have been obtained simulating 20,000 The BER performance of the FSD for a 8×8 system for
channel realizations with 200 uncoded symbols transmitted 4- and 16-QAM modulation is shown in Fig. 4. In this case,
per channel realization. the total number of points searched in the FSD is NS = P 2
Fig. 3 shows the bit error ratio (BER) performance of the for a P -QAM constellation following the distribution nS =
FSD as a function of the signal to noise ratio (SNR) per bit (1, 1, 1, 1, 1, 1, P, P )T . Thus, all the possible P points are
in a 4×4 system compared to the ML performance provided searched in the first two levels (i = M, M − 1) and only
by the SD. The initial radius in the SD has been set accor- the closest point to zi is considered for the remaining levels.
ding to the noise variance per antenna and it is increased if no The channel matrix has also been ordered using the FSD pre-
point is found inside the hypersphere. The results have been processing. The FSD gives close to ML performance while
obtained for 4-,16- and 64-QAM modulation. The total num- calculating even a smaller percentage of Euclidean distances
ber of points searched in the FSD is NS = P for a P -QAM compared to the 4×4 system (P 2 /P 8 ¿ P/P 4 ). In parti-
constellation following the distribution nS = (1, 1, 1, P )T . cular, for 16-QAM modulation, the degradation compared to
Thus, all the possible P points are searched in the first level the SD is of 0.25 dB at a BER = 10−3 .
(i = M ) and only the closest point to zi is considered for the The number of real floating point operations of the FSD
remaining levels. This distribution has the additional advan- is shown in Fig. 5 where its deterministic nature can be ob-
tage that the SE enumeration is not necessary, further simpli- served, indicating the suitability of the algorithm for real-time
fying the receiver. The channel matrix has been ordered using hardware implementation. The FSD is compared to the SE
the FSD preprocessing, minimizing the BER for the selected version of the SD with and without channel matrix ordering
distribution of candidates nS . in a 4×4 system using 16- and 64-QAM modulation. The
It can be observed that the FSD gives practically ML per- 90-percentile is plotted to indicate the number of operations
formance independent of the SNR, especially for larger con- required to perform the detection process in 90% of the cases.
stellations, by calculating only P Euclidean distances. In par- It can be seen that the FSD has lower complexity than
ticular, for 64-QAM modulation, only 64 Euclidean distances the different SDs. Only for 16-QAM and at high SNR is the
are calculated, whereas the total number of distances to be number of operations of the FSD slightly higher than for the
calculated by the MLD is much larger (644 = 16, 777, 216). SD. However, in a parallel implementation of the algorithm,
The performance curves for the K-Best lattice decoder have the FSD requires only one iteration through the levels, from
not been included for clarity purposes. However, we have i = M to i = 1, achieving the maximum throughput (i.e.
6
M=N=4 the FSD overcomes the problem of the SD, where especial
10
SD (no ordering) techniques (for example, early termination strategies [8]) are
SD (VBLAST−ZF ordering) required to guarantee a minimum throughput.
FSD−P = (1,1,1,P)
5
Finally, the structure of the FSD can be adapted to provide
10
soft information (a posteriori probabilities) about the detected
90−percentile real flops

64−QAM
bits, similar to the list-SD used for iterative turbo-decoding
complex K−Best (K = 16)
4
[7]. This last aspect is the main subject of ongoing work.
10

6. ACKNOWLEDGEMENT
3
10
The authors would like to thank Alpha Data Ltd., company
16−QAM
that partially sponsors this research.
2
10
0 5 10 15 20 25 30 7. REFERENCES
E /N (dB)
b 0

[1] G. J. Foschini, “Layered space-time architecture for wireless


Fig. 5. Complexity of the search stage of the FSD and the communication in a fading environment when using multi-
SE-SD as a function of the SNR per bit in a 4×4 system. element antennas,” Bell Labs Technical Journal, pp. 41–59,
Oct. 1996.
[2] E. Viterbo and J. Boutros, “A universal lattice code decoder for
number of bits detected per second) of the SD. It should be fading channels,” IEEE Trans. Inform. Theory, vol. 45, no. 5,
noted that the FSD requires a specific ordering of the chan- pp. 1639–1642, July 1999.
nel matrix, but its complexity is equivalent to the vertical [3] M. O. Damen, H. E. Gamal, and G. Caire, “On maximum-
Bell Labs layered space time-zero forcing (VBLAST-ZF) or- likelihood detection and the search for the closest lattice
dering of the SD [3]. The complexity of the ordering stage point,” IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2389–
could be considered negligible for packet-based communica- 2402, Oct. 2003.
tions where the ordering is only performed once per frame. [4] B. Hassibi and H. Vikalo, “On the expected complexity of
The number of operations of the complex version of the sphere decoding,” in Proc. 35th Asilomar Conference on Sig-
K-Best lattice decoder is also plotted for comparison pur- nals, Systems and Computers, vol. 2, Monterey, CA, Nov.
poses. It can be seen how the complexity of the K-Best lat- 2001, pp. 1051–1055.
tice decoder is considerably higher for both modulations. For [5] L. G. Barbero and J. S. Thompson, “A fixed-complexity
16-QAM (K = 16), the complexity of the K-Best is higher MIMO detector based on the complex sphere decoder,” sub-
by a factor of 13 compared to the FSD, while for 64-QAM mitted to IEEE Workshop on Signal Processing Advances for
(K = 64), the complexity is higher by a factor of 50. There- Wireless Communications (SPAWC ’06).
fore, the FSD achieves a similar quasi-ML performance to [6] K. wai Wong, C. ying Tsiu, R. S. kwan Cheng, and
that of the K-Best lattice decoder while having a lower fixed W. ho Mow, “A VLSI architecture of a K-best lattice decod-
complexity, indicating its suitability for hardware implemen- ing algorithm for MIMO channels,” in Proc. IEEE Interna-
tation. tional Symposium on Circuits and Systems (ISCAS ’02), vol. 3,
Scottsdale, AZ, May 2002, pp. 273–276.
[7] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on
5. CONCLUSION AND FUTURE WORK a multiple-antenna channel,” IEEE Trans. Commun., vol. 51,
no. 3, pp. 389–399, Mar. 2003.
The performance and complexity of a new FSD has been ana-
[8] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Ficht-
lyzed in this paper. The algorithm calculates the Euclidean ner, and H. Bölcskei, “VLSI implementation of MIMO detec-
distances of a very small subset of vectors from the complete tion using the sphere decoding algorithm,” IEEE J. Solid-State
set of all transmitted vectors. It also uses a specific ordering Circuits, vol. 40, no. 7, pp. 1566–1577, July 2005.
of the channel matrix to achieve quasi-ML performance in
[9] C. P. Schnorr and M. Euchner, “Lattice basis reduction: Im-
systems where the number of antennas and the constellation proved practical algorithms and solving subset sum problems,”
order make the MLD unfeasible. Mathematical Programming, vol. 66, pp. 181–199, 1994.
It has been shown that it has a fixed complexity indepen- [10] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A.
dent of the SNR as opposed to the original SD. This is of Valenzuela, “V-BLAST: An architecture for realizing very
special interest for hardware implementation given that the high data rates over the rich-scattering wireless channel,” in
fixed number of operations makes possible a highly-pipelined Proc. URSI International Symposium on Signals, Systems and
parallel implementation of the algorithm that can be integrated Electronics (ISSSE ’98), Atlanta, GA, Sept. 1998, pp. 295–
into complete wireless communication systems. Therefore, 300.

Potrebbero piacerti anche