Abstract: Energy harvesting in cellular networks is an emerging technique to enhance the sustainability of power-constrained wireless devices. This paper considers the co-channel deployment of a macrocell overlaid with small cells. The small cell base stations (SBSs) harvest energy from environmental sources, whereas the macrocell base station (MBS) uses a conventional power supply. Given a stochastic energy arrival process for the SBSs, we derive a power control policy for the downlink transmission of both the MBS and the SBSs such that they can achieve their objectives (e.g., maintain the signal-to-interference-plus-noise ratio (SINR) at an acceptable level) on a given transmission channel. We consider a centralized energy harvesting mechanism for the SBSs, i.e., there is a central energy queue (CEQ) where energy is harvested and then distributed to the SBSs. When the number of SBSs is small, the game between the CEQ and the MBS is modeled as a single-controller stochastic game, and the equilibrium policies are obtained as a solution of a quadratic programming problem. However, when the number of SBSs tends to infinity (i.e., a highly dense network), the centralized scheme becomes infeasible, and therefore, we use a mean field stochastic game to obtain a distributed power control policy for each SBS. By solving a system of partial differential equations, we derive the power control policy of the SBSs given the knowledge of the mean field distribution and the available harvested energy levels in the batteries of the SBSs.

Index Terms: Small cell networks, power control, energy harvesting, stochastic game, mean field game.
I. INTRODUCTION
Energy harvesting from environmental sources (e.g., through solar panels, wind power, or geothermal power) is a potential technique to reduce the energy cost of operating the base stations (BSs) in emerging multi-tier cellular networks. While this solution may not be practically feasible for macrocell base stations (MBSs), due to their high power consumption and the stochastic nature of energy harvesting sources, it is appealing for small cell BSs (SBSs), which typically consume less power [2].

Designing efficient power control policies with different objectives (e.g., maximizing system throughput) is one of the major challenges in energy-harvesting networks. In [3], the authors proposed an offline power control policy for two-hop transmission systems assuming energy arrival information at the nodes. The optimal transmission policy was given by the directional water-filling method. In [4], the authors generalized this idea to the case where many sources supply energy to the other BSs.
The authors are with the Department of Electrical and Computer Engineering at the University of Manitoba, Canada (emails: trant347@myumanitoba.ca, {Ekram.Hossain,Hina.Tabassum}@umanitoba.ca).
The SINR at the user served by SBS i on the given transmission channel is

  \gamma_i(t) = \frac{p_i(t)\,\bar{g}_{i,i}}{\sum_{j \neq i}^{M} p_j(t)\,\bar{g}_{i,j} + p_0(t)\,\bar{g}_{i,0} + N_0},   (1)

and the total power spent by the SBSs during one time slot is limited by the energy distributed by the CEQ:

  \sum_{i=1}^{M} p_i(t) = \frac{K}{T} Q(t).   (2)

Here, \bar{g}_{i,0} is the channel gain between the MBS and the user served by SBS i, \bar{g}_{i,i} represents the channel gain between SBS i and the user it serves, and \bar{g}_{i,j} is the channel gain between SBS j and the user served by SBS i. Finally, p_i(t) represents the transmit power of SBS i during time slot t.
TABLE I: List of symbols used for the single-controller stochastic game model.

The payoff of the MBS is U_0(t) = -\left( p_0(t)\,\bar{g}_0 - \gamma_0 I_0(t) \right)^2, where I_0(t) = \sum_{i=1}^{M} p_i(t)\,\bar{g}_{0,i} is the average interference at the user served by the MBS, and the payoff of the CEQ is

  U_1(t) = -\frac{1}{M} \sum_{i=1}^{M} \left( p_i(t)\,\bar{g}_i - \gamma_1 I_i(t) \right)^2,   (7)

where I_i(t) = \sum_{j \neq i}^{M} p_j(t)\,\bar{g}_{i,j} + p_0(t)\,\bar{g}_{i,0} is the average interference at the user served by SBS i.
Each player's long-run payoff is the average of its per-slot payoffs over the horizon of T time slots,

  \frac{1}{T} \sum_{t=1}^{T} (\cdot),   (8)

and the battery level of the CEQ evolves according to the transition probability

  q(s' \mid s, Q) =
  \begin{cases}
    \Pr(\epsilon = s' - (s - Q)), & \text{if } s' < S, \\
    1 - \sum_{X=0}^{S-(s-Q)-1} \Pr(\epsilon = X), & \text{otherwise}.
  \end{cases}   (9)
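For concreteness, the transition law in (9) can be tabulated numerically. The sketch below is only an illustration, not part of the paper: the Poisson arrival distribution, the battery size S, and the function name are all assumed here.

```python
import math

def battery_transition_row(s, Q, S, arrival_pmf):
    """Row q(. | s, Q) of the battery transition law in (9).

    s: current energy level (quanta), Q: quanta consumed this slot (Q <= s),
    S: battery capacity, arrival_pmf(x): Pr(energy arrival = x quanta).
    """
    base = s - Q  # energy left before the new arrival
    row = [0.0] * (S + 1)
    for s_next in range(S):
        x = s_next - base
        row[s_next] = arrival_pmf(x) if x >= 0 else 0.0
    # the battery saturates at S, which collects all residual probability mass
    row[S] = 1.0 - sum(row[:S])
    return row

# Example: Poisson(2) energy arrivals (an assumed distribution)
pmf = lambda x: math.exp(-2.0) * 2.0**x / math.factorial(x)
row = battery_transition_row(s=3, Q=1, S=5, arrival_pmf=pmf)
assert abs(sum(row) - 1.0) < 1e-12  # each row of q sums to one
```

Note that the battery can never end a slot below s - Q, so the corresponding entries of the row are zero.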
Fig. 1: Graphical illustration of the two BSs A, B, and the user D located
within the disk centered at B.
  \max_{p_1, p_2, \ldots, p_M} \; -\frac{1}{M} \sum_{i=1}^{M} \Bigl( p_i\,\bar{g}_i - \gamma_1 \bigl( \sum_{j \neq i}^{M} p_j\,\bar{g}_{i,j} + p_0\,\bar{g}_{i,0} \bigr) \Bigr)^2,
  \text{s.t.} \quad \sum_{i=1}^{M} p_i = \frac{K}{T} Q, \qquad P_{\max} \ge p_i \ge 0, \quad i = 1, 2, \ldots, M,   (10)
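Problem (10) is a quadratic program with one budget equality and box constraints. A minimal numerical sketch follows; the gains, the target factor gamma1, the budget, and the use of SciPy's general-purpose SLSQP solver in place of a dedicated QP solver are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

M = 4
rng = np.random.default_rng(0)
g_direct = rng.uniform(0.5, 1.0, M)        # \bar{g}_i: SBS i -> its own user
g_cross = rng.uniform(0.01, 0.05, (M, M))  # \bar{g}_{i,j}: SBS j -> user of SBS i
np.fill_diagonal(g_cross, 0.0)
g_mbs = rng.uniform(0.01, 0.05, M)         # \bar{g}_{i,0}: MBS -> user of SBS i
p0, gamma1 = 2.0, 1.5                      # MBS power and SBS SINR target factor
budget, p_max = 3.0, 2.0                   # (K/T) Q and per-SBS power cap

def cost(p):
    # (1/M) * sum_i (p_i g_i - gamma1 * (sum_{j!=i} p_j g_ij + p0 g_i0))^2
    interference = g_cross @ p + p0 * g_mbs
    return np.mean((p * g_direct - gamma1 * interference) ** 2)

res = minimize(cost, np.full(M, budget / M), method="SLSQP",
               bounds=[(0.0, p_max)] * M,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - budget}])
assert res.success and abs(res.x.sum() - budget) < 1e-6
```

Minimizing the squared SINR deviation is equivalent to maximizing the negated objective in (10); the equality constraint enforces that the full energy budget distributed by the CEQ is spent.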
Given the strategy m_0 of the MBS, the best response of the CEQ is obtained from the Markov decision process

  \max_{n} \; \sum_{s=0}^{S} \bar{r}_1(s, m_0, n(s)),
  \text{s.t.} \quad \sum_{j=0}^{s} n(s, j) = 1, \quad n(s, j) \ge 0, \quad \forall s, j, \; 0 \le j \le s \text{ and } 0 \le s \le S,   (11)

with \bar{r}_1(s, m_0, j) = \sum_{p_0 \in P} U_1(p_0, j)\, m_0(s, p_0) being the average payoff for the CEQ at state s when it consumes j quanta of energy. Using the Dirac function \delta, the dual problem can be expressed as

  \max_{x} \; \sum_{s=0}^{S} \sum_{j=0}^{s} \bar{r}_1(s, m_0, j)\, x_{s,j},
  \text{s.t.} \quad x_{s,j} \ge 0, \quad \forall s, j, \; 0 \le j \le s \text{ and } 0 \le s \le S,   (12)

which together form the primal-dual pair

  (P) \quad \ldots \quad \text{s.t.} \; H \lambda_1 \ge R_1^{T} m_0,
  (D) \quad \max_{x} \; m_0^{T} R_1 x, \quad \text{s.t.} \; x^{T} H = \lambda^{T}, \; x \ge 0.

The Nash equilibrium of the game is then characterized by the quadratic program

  \min_{m, x, \lambda_1} \; \bigl[ m (R_0 + R_1) x - \lambda^{T} \mathbf{1} - \mathbf{1}^{T} \lambda_1 \bigr],
  \text{s.t.} \quad H \lambda_1 \ge R_1^{T} m,
  \quad x^{T} H = \lambda^{T},
  \quad R_0^{s} x(s) \le \lambda_s \mathbf{1}, \quad s = 0, \ldots, S,
  \quad m(s)^{T} \mathbf{1} = 1, \quad s = 0, \ldots, S,
  \quad m, x \ge 0,   (13)

where \lambda_s is the maximum average payoff of the MBS at state s. The sub-vector strategy n(s) of the CEQ at state s is calculated from x as:

  n(s) = \frac{x(s)}{x(s)^{T} \mathbf{1}}.   (14)

At the Nash equilibrium, the objective of (13) attains the value zero, i.e.,

  m (R_0 + R_1) x - \lambda^{T} \mathbf{1} - \mathbf{1}^{T} \lambda_1 = 0,   (15)

so the equilibrium can equivalently be computed as a point (m, x, \lambda_1) satisfying

  m (R_0 + R_1) x - \lambda^{T} \mathbf{1} - \mathbf{1}^{T} \lambda_1 = 0   (16)

together with the linear constraints of (13).
The Nash equilibrium needs to be re-computed only when some SBSs go off and some are turned on, when the average channel fading gain \bar{h} changes, or when the distribution of the energy arrival \epsilon(t) at the CEQ changes. Thanks to the centralized design, the SBSs and the MBS only need to send information about their channel gains to the CEQ to calculate the Nash equilibrium at the beginning. Later, in each time slot, only the CEQ and the MBS need to exchange information; therefore, the communication overhead is small.
IV. MEAN FIELD GAME (MFG) FOR A LARGE NUMBER OF SMALL CELLS
The main problem of the two-player single-controller stochastic game is the curse of dimensionality. The time complexity of Algorithm 1 increases exponentially with the number of states S, i.e., the maximum battery size. Note that R_0 and R_1 have dimensions of |P| x S(S+1)/2, so the complexity increases proportionally to S. Moreover, unlike other optimization problems, we are unable to relax the QCQP in (16), because Theorem 1 states that the Nash equilibrium must be the global solution of the quadratic program (13). To tackle these problems, we extend the stochastic game model to an MFG model for a very large number of players.

The main idea of an MFG is the assumption of similarity, i.e., all players are identical and follow the same strategy; they can only be differentiated by their state vectors. If the number of players is very large, we can assume that the effect of a specific player on the other players is nearly negligible. Therefore, in an MFG, a player does not care about the states of individual opponents but only acts according to a mean field m(t, s), which usually is the probability distribution of state s at time instant t [18]. In our energy harvesting game, the state is the battery level E, and the mean field m(t, E) is the probability distribution of energy at time t in the area under consideration. When the number of players M is very large, we can assume that m(t, E) is a smooth continuous distribution function. We will show that the average interference at an SBS can be expressed as a function of the mean field m.

All the symbols that are used in this section are listed in Table II.
A. Formulation of the MFG
Denote by E(v) the available energy in the battery of an SBS at time v. Given the transmission strategies of the other SBSs, each SBS optimizes its long-run generic utility value function by solving the following optimal control problem:

  \min_{p} \; U(0, E(0)) = \mathbb{E}\left[ \int_{0}^{T} \bigl( p(v, E(v))\,\bar{g} - \gamma (I(v) + N_0) \bigr)^2 \, dv \right],   (17)
  \text{s.t.} \quad dE(v) = -p(v, E(v))\, dv + \sigma\, dW_v,   (18)
  \quad E(v) \ge 0, \quad p(v) \ge 0.   (19)
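The controlled battery dynamics in (18) can be simulated directly. The sketch below is an illustration only: the noise coefficient sigma, the horizon, and the proportional placeholder policy are assumed, and the causality constraints (19) are enforced by clipping.

```python
import numpy as np

def simulate_battery(E0, policy, sigma, T=1.0, steps=1000, seed=1):
    """Euler-Maruyama simulation of dE = -p(v, E) dv + sigma dW, as in (18)."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    E = E0
    path = [E]
    for _ in range(steps):
        p = max(policy(E), 0.0)        # transmit power is non-negative, (19)
        p = min(p, E / dt)             # cannot drain the battery below zero in one step
        dW = rng.normal(0.0, np.sqrt(dt))
        E = max(E - p * dt + sigma * dW, 0.0)  # battery state stays non-negative, (19)
        path.append(E)
    return np.array(path)

path = simulate_battery(E0=1.0, policy=lambda E: 0.5 * E, sigma=0.1)
assert path.min() >= 0.0
```

The clipping at zero mirrors the paper's remark that the negative part of the noise can be read as battery leakage and internal consumption.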
TABLE II: List of symbols used for the MFG model.
Denoting by M the number of SBSs in a macrocell and assuming that the other SBSs have the same average channel gain \tilde{g} to the user of the current generic SBS, the average interference I(v) at the user served by a generic SBS can be expressed as I(v) = M \tilde{g} \bar{p}(v), where \bar{p}(v) = \int_{0}^{\infty} p(v, E)\, m(v, E)\, dE can be understood as the average transmit power of another generic SBS. Since the MFG assumes similarity, \bar{p}(v) can be considered as the average transmit power of a generic SBS at time v. To make the notation simpler, we denote \tilde{\gamma} = \gamma \tilde{g} M.

Thanks to similarity, all the SBSs have the same set of equations and constraints, so the optimal control problem for the M SBSs reduces to finding the optimal policy for only one generic SBS. Mathematically, if an SBS has infinite available energy, i.e., E(0) = \infty, it will act as an MBS. However, for simplicity, we will assume that only the SBSs are involved in the game and that the interference from the MBS is constant and included in the noise N_0, as in [19]. Except for that, the system model and the optimization problem here are similar to those of the discrete stochastic game model.

Assuming that the SBSs are uniformly distributed within the macrocell with radius r centered at the MBS, the average interference from the MBS to a generic user served by an SBS can be easily derived by using a method similar to that described in Remark 1. For the MFG model, the energy level E is a continuous non-negative variable. The equality in (18) shows the evolution of the battery, where \sigma is a constant which is proportional to the maximum energy arrival during a time interval. W_v is a Wiener process, thus dW_v = \epsilon_v dv, where \epsilon_v is a Gaussian random variable with zero mean and unit variance. This model of evolution for the battery energy was mentioned in [20]. The inflexibility of the energy arrival is the main disadvantage of the MFG model compared to the discrete stochastic game model: the random arrival of energy is modeled as noise, so it can be either positive or negative. We can consider the negative part as battery leakage and internal energy consumption. The final inequalities are the causality constraints: the battery state E(v) and the transmit power must always be non-negative. To guarantee this positivity, we follow [21] and change the energy variable E(v) to E(v) = e^{R(v)}. This conversion is a bijective map from E(v) to R(v); thus, we can write m(v, E) = m(v, R) and p(v, E) = p(v, R), where \infty > R > -\infty. The new optimal control problem can be rewritten as

  \min_{p(\cdot)} \; U(0, R(0)) = \mathbb{E}\left[ \int_{0}^{T} \bigl( p(v, R(v))\,\bar{g} - \tilde{\gamma} \bar{p}(v) - \gamma N_0 \bigr)^2 \, dv \right],   (20)
  \text{s.t.} \quad dR(v) = -p(v, R(v))\, e^{-R(v)}\, dv + \sigma e^{-R(v)}\, dW_v,   (21)
  \quad p(v) \ge 0.   (22)

To obtain the power control policy p, we first derive the forward-backward differential equations from the above problem. Then, we apply the finite difference method to numerically solve these equations.

B. Forward-Backward Equations of MFG

Let us assume the optimal control above starts at time t with T \ge t \ge 0; we obtain the Bellman function U(t, R) as

  U(t, R(t)) = \mathbb{E}\left[ \int_{t}^{T} \bigl( p(v, R(v))\,\bar{g} - \tilde{\gamma} \bar{p}(v) - \gamma N_0 \bigr)^2 \, dv \right].   (23)

From this function, at time t, we obtain the following Hamilton-Jacobi-Bellman (HJB) [21] equation:

  \min_{p \ge 0} \bigl\{ \bigl( p(t, R)\,\bar{g} - \tilde{\gamma} \bar{p}(t) - \gamma N_0 \bigr)^2 - p(t, R)\, e^{-R}\, \partial_R U(t, R) \bigr\}
  + \partial_t U + \frac{\sigma^2 e^{-2R}}{2} \partial_{RR}^2 U = 0,   (24)

where \bar{p}(t) = \int_{R} e^{R}\, p(t, R)\, m(t, R)\, dR is the average transmit power of a generic SBS. The Hamiltonian \min_{p \ge 0} \{ ( p(t, R)\,\bar{g} - \tilde{\gamma} \bar{p}(t) - \gamma N_0 )^2 - p(t, R)\, e^{-R}\, \partial_R U(t, R) \} is given by Bellman's principle of optimality. By applying the first-order necessary condition, we obtain the optimal power control as follows:

  p^{*}(t, R) = \left[ \frac{\tilde{\gamma} \bar{p}(t) + \gamma N_0}{\bar{g}} + \frac{e^{-R} \partial_R U}{2 \bar{g}^2} \right]^{+}.   (25)

Remark 3. The Bellman function U, if it exists, is a non-increasing function of time and energy. Therefore, we have \partial_R U \le 0 and \partial_t U \le 0.

From equation (25), given the current interference \tilde{\gamma} \bar{p}(t) + \gamma N_0 at a user, the corresponding SBS will transmit less power depending on the future prospect \frac{e^{-R} \partial_R U}{2 \bar{g}^2}. If the future prospect is too small, i.e., \frac{e^{-R} \partial_R U}{2 \bar{g}^2} < -\frac{\tilde{\gamma} \bar{p}(t) + \gamma N_0}{\bar{g}}, the SBS stops transmission to save energy.

Replacing p^{*} back into the HJB equation, we have

  \partial_t U + \frac{\sigma^2 e^{-2R}}{2} \partial_{RR}^2 U + \bigl( \tilde{\gamma} \bar{p}(t) + \gamma N_0 \bigr)^2
  - \left( \left[ \tilde{\gamma} \bar{p}(t) + \gamma N_0 + \frac{e^{-R} \partial_R U}{2 \bar{g}} \right]^{+} \right)^2 = 0,   (26)

which has a simpler form as follows:

  \partial_t U + \frac{\sigma^2 e^{-2R}}{2} \partial_{RR}^2 U = (p^{*} \bar{g})^2 - \bigl( \tilde{\gamma} \bar{p}(t) + \gamma N_0 \bigr)^2.   (27)

Also, from (21), at time t, we have the Fokker-Planck equation [18] as:

  \partial_t m(t, R) = \partial_R \bigl( p\, e^{-R} m \bigr) + \frac{\sigma^2}{2} \partial_{RR}^2 \bigl( m\, e^{-2R} \bigr),   (28)
In summary, the equilibrium of the MFG is characterized by the following system:

  p(t, R) = \left[ \frac{\tilde{\gamma} \bar{p}(t) + \gamma N_0}{\bar{g}} + \frac{e^{-R} \partial_R U}{2 \bar{g}^2} \right]^{+},   (30)

  \partial_t m(t, R) = \partial_R \bigl( p\, e^{-R} m \bigr) + \frac{\sigma^2}{2} \partial_{RR}^2 \bigl( e^{-2R} m \bigr),   (31)

  \bar{p}(t) = \int_{R} e^{R}\, p(t, R)\, m(t, R)\, dR,   (32)

where

  m(t, R) \ge 0.   (33)

Now we substitute (p_1 - p_2) m = f(t) e^{-R}, which results in \int_{R} f(t)\, dR = 0 for all t. This means f(t) = 0, i.e., p_1 = p_2.
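The stationarity condition behind the power control in (25) is elementary to evaluate once U is known on a grid. The sketch below is illustrative only; the values of the aggregate factor gamma_t (standing in for the product of the SINR target, the cross gain, and M), gamma, the direct gain, N0, and the made-up non-increasing U are assumptions.

```python
import numpy as np

def optimal_power(R, dU_dR, p_bar, g_bar=1.0, gamma=1.0, gamma_t=0.5, N0=0.1):
    """Eq. (25): p*(t,R) = [ (gamma_t*p_bar + gamma*N0)/g_bar
                             + exp(-R)*dU_dR / (2*g_bar**2) ]^+ ."""
    val = (gamma_t * p_bar + gamma * N0) / g_bar \
        + np.exp(-R) * dU_dR / (2.0 * g_bar**2)
    return np.maximum(val, 0.0)

R = np.linspace(-2.0, 2.0, 5)
dU = -0.05 * np.ones_like(R)   # Remark 3: the gradient of U in R is non-positive
p = optimal_power(R, dU, p_bar=0.2)
assert np.all(p >= 0.0)
# a strongly negative future prospect shuts the transmitter off entirely
assert optimal_power(np.array([-5.0]), np.array([-10.0]), p_bar=0.2)[0] == 0.0
```

The second assertion illustrates the remark following (25): when the future prospect term is more negative than the interference term, the SBS stops transmitting to save energy.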
Discretizing the Fokker-Planck equation with time step \Delta t and space step \Delta R gives the update

  m(t, R) = \frac{\Delta t}{2 \Delta R} A_2 + \frac{\sigma^2 \Delta t}{2 (\Delta R)^2} B_2 + m(t-1, R),   (40)

where

  A_2 = e^{-(R+1)\Delta R}\, p(t-1, R+1)\, m(t-1, R+1) - e^{-(R-1)\Delta R}\, p(t-1, R-1)\, m(t-1, R-1),   (41)

  B_2 = e^{-2(R+1)\Delta R}\, m(t-1, R+1) - 2 e^{-2R\Delta R}\, m(t-1, R) + e^{-2(R-1)\Delta R}\, m(t-1, R-1).   (42)
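One explicit update of the discretized Fokker-Planck equation (40)-(42) can be written as follows. The grid sizes, sigma, and the frozen power profile are illustrative assumptions, and stability of this explicit scheme requires a sufficiently small time step.

```python
import numpy as np

def fp_step(m, p, dR, dt, sigma, R_grid):
    """One explicit step of (40) at the interior grid points."""
    m_new = m.copy()
    for k in range(1, len(m) - 1):
        # A2, eq. (41): centered first difference of exp(-R) * p * m
        A2 = (np.exp(-R_grid[k + 1]) * p[k + 1] * m[k + 1]
              - np.exp(-R_grid[k - 1]) * p[k - 1] * m[k - 1])
        # B2, eq. (42): second difference of exp(-2R) * m
        B2 = (np.exp(-2 * R_grid[k + 1]) * m[k + 1]
              - 2 * np.exp(-2 * R_grid[k]) * m[k]
              + np.exp(-2 * R_grid[k - 1]) * m[k - 1])
        m_new[k] = m[k] + dt / (2 * dR) * A2 + sigma**2 * dt / (2 * dR**2) * B2
    return np.clip(m_new, 0.0, None)   # the mean field stays non-negative, (33)

R_grid = np.linspace(-1.0, 1.0, 41)
dR = R_grid[1] - R_grid[0]
m = np.exp(-R_grid**2)
m /= m.sum() * dR                      # normalize the initial density on the grid
p = 0.1 * np.ones_like(R_grid)
m1 = fp_step(m, p, dR, dt=1e-4, sigma=0.05, R_grid=R_grid)
assert np.all(m1 >= 0.0)
```

The boundary values are left untouched here; in the paper they are fixed separately through the mass-conservation and zero-power conditions (43)-(45).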
Since the discretized mean field must remain a probability distribution, \sum_{R=-R_{\max}}^{R_{\max}} m(t, R)\, \Delta R = 1, the boundary value can be obtained as

  m(t, -R_{\max}) = \frac{1}{\Delta R} - \sum_{R=-R_{\max}+1}^{R_{\max}} m(t, R).   (43)

At the upper boundary, the optimal power follows (25):

  p(t, R_{\max}) = \left[ \frac{\tilde{\gamma} \bar{p}(t) + \gamma N_0}{\bar{g}} + \frac{e^{-R_{\max}} \partial_R U(t, R_{\max})}{2 \bar{g}^2} \right]^{+}.   (44)

Therefore, if we know U(t, R_{\max} - 1) and \bar{p}, we can calculate U(t, R_{\max}).

Similarly, it must be true that when the available energy is 0, i.e., R = -R_{\max}, an SBS will stop transmission. Therefore, we can assume

  \frac{\tilde{\gamma} \bar{p}(t) + \gamma N_0}{\bar{g}} + \frac{e^{R_{\max}} \partial_R U(t, -R_{\max})}{2 \bar{g}^2} = 0.   (45)

Again, if we know U(t, -R_{\max} + 1), we can calculate the boundary U(t, -R_{\max}).

During simulations, in some cases when the density is very high, we obtain very large (unrealistic) values of transmit power. Therefore, we must put an extra constraint on the upper limit. In this paper, we use E(t) \ge p(t, E(t))\, T, or equivalently e^{R(t)} \ge p(t, R(t))\, T, where T is the duration of one time slot. This means we limit the transmit power during one time step \Delta t to be smaller than the maximum power that can be sustained during one time interval T.
Based on the above considerations, we develop an iterative
algorithm (Algorithm 2) detailed as follows.
D. Implementation of MFG

For the MFG, we do not need the location information of each SBS. However, we need information about the average channel gains \bar{g} and \tilde{g}, the number of SBSs M in one macrocell, and the initial distribution m_0 of the energy of the SBSs in the macrocell. Therefore, some central system should measure this information, solve the differential equations, and then broadcast the power policy p to all the SBSs. This is more efficient than broadcasting all the information to all the SBSs and letting them solve the differential equations by themselves. Again, the central system only needs to re-calculate and broadcast a new power policy to all the SBSs if there are changes in \bar{g}, \tilde{g}, or M.
V. SIMULATION RESULTS AND DISCUSSIONS

A. Single-Controller Stochastic Game

In this section, we quantify the efficacy of the developed stochastic policy in comparison to a greedy power control policy. The stochastic policy is obtained from the QCQP problem. In the greedy policy, on the other hand, we follow a hierarchical method: first, the MBS chooses its transmit power; then, each SBS tries to transmit with the power such that the SINR at its user is nearest to its target value, ignoring the interference from the other SBSs. Next, the MBS records
[Figures: Outage probability of the greedy and stochastic methods versus the number of SBSs (20-100), versus the volume of one energy packet (40-100), and versus the target SINR of the SBSs.]
Fig. 8: Transmit power to serve a generic user using MFG when M = 400
SBSs/cell.
[Figures: Probability distribution of energy (Joules) and transmit power over time (sec) for different energy levels (E = 70, 75, 80, 100 microJ).]
Again, as can be seen from Fig. 11, the SBSs with larger available energy transmit with larger power first; after some time, when there is less energy available in the system, all of them start to use less power. Therefore, as can be seen in Fig. 10, the energy distribution shifts toward the left faster than in the previous case.

For M = 600 SBSs/cell, we have \bar{g} < \tilde{\gamma}. This means each SBS needs to transmit with a power larger than the average \bar{p} to achieve the target SINR. In Fig. 12, we see that the behavior of each SBS is the same as in the previous case. That is, the SBSs with higher energy transmit with larger power first and then reduce it, while the poorer SBSs increase their transmit power over time.

We compare the MFG model against the stochastic discrete model for different values of M. For simplicity, we assume that each SBS has the same link gain to its user, \bar{g} = 0.001. Also, we assume that the channel gain from each SBS to the user of another SBS is \tilde{g} = 0.001. Using Remark 2, it can easily be proven that in this case each SBS will transmit with the same power, i.e., if the CEQ sends QCK J of energy to M SBSs, then each SBS receives QCK/M J. Then, the SINR at each SBS user can be calculated as \bar{g} / \bigl( (M-1)\tilde{g} + N_0 M T / (QCK) \bigr), with multiplier C = 20 and the
[Figures: Probability distribution of energy over time (t = 0, 1, 2.5, 5 s); transmit power for different energy levels (E = 70-150 microJ); average SINR of the MFG and MDP methods versus time, compared with the target SINR.]
Fig. 11: Transmit power for different energy levels when M = 500
SBSs/cell.
where p_0^s \in P and j \in \{0, \ldots, s\} are the transmit power of the MBS and the number of energy packets distributed by the CEQ at state s, respectively, and \bar{I}(p_0^s, j) is the average interference from the SBSs to the MBS user when the CEQ distributes j energy packets and the MBS transmits with power p_0^s. We have \bar{I}(p_0^s, j) = \sum_{i=1}^{M} p_i\, \bar{g}_{0,i} + N, where (p_1, p_2, \ldots, p_M) is the solution of Remark 2 with Q and p_0 replaced by j and p_0^s, respectively. Since \sum_{p_0^s \in P} m(s, p_0^s) = 1 and m_s is a non-negative vector, we have

  \mathbb{E}[U_1] \le \max_{p_0^s \in P} \; -\sum_{j=0}^{s} \bigl( p_0^s\, \bar{g}_0 - \gamma_0 \bar{I}(p_0^s, j) \bigr)^2 n(s, j).   (B-2)

The maximum is attained by putting all probability mass on the power level that maximizes the right-hand side. That means when the game is in state s, the MBS can choose this power level with probability 1. However, obtaining a closed-form p_0^s is difficult, because first we need to find (p_1, \ldots, p_M) in closed form by solving (10).

Nevertheless, if the average channel gains from each SBS to the macrocell user (say \bar{g}_{0,SBS}) are the same, we can obtain p_0^s in closed form by defining \bar{I}(p_0^s, j) = \sum_{i=1}^{M} p_i\, \bar{g}_{0,SBS} + N_0 = \bar{g}_{0,SBS} \sum_{i=1}^{M} p_i + N_0 = \frac{K}{T} j\, \bar{g}_{0,SBS} + N_0. The final equality is from (2). Replacing this result back into (B-2), we have

  \mathbb{E}[U_1] \le \max_{p_0^s \in P} \; -\sum_{j=0}^{s} \Bigl( p_0^s\, \bar{g}_0 - \gamma_0 \bigl( \frac{K}{T} j\, \bar{g}_{0,SBS} + N_0 \bigr) \Bigr)^2 n(s, j).   (B-3)

The right-hand side of this inequality is a strictly concave function (downward parabola) with respect to p_0^s. Note that \sum_{j=0}^{s} n(s, j) = 1. The parabola achieves its maximum value at its vertex, given by

  \bar{p}_0^s = \frac{\gamma_0 \sum_{j=0}^{s} \bigl( \frac{K}{T} \bar{g}_{0,SBS}\, j + N_0 \bigr) n(s, j)}{\bar{g}_0}.   (B-4)

If \bar{p}_0^s is not available in P, since the right-hand side of the inequality above is a parabola w.r.t. p_0^s, the best response p_0^s to n(s) is the element of P nearest to the vertex \bar{p}_0^s.

On the other hand, given the strategy m of the MBS, the problem of finding the best response strategy n of the CEQ reduces to the simple MDP in (11). Then, there always exists a pure stationary strategy n [17, Chapter 2]. This completes the proof.

APPENDIX C
PROOF OF LEMMA 2

Using the stochastic differential equation in (18) at time t, dE(t) = -p(t, E(t))\, dt + \sigma\, dW_t, we get the integral form as follows:

  E(t + t_0) - E(t) = -\int_{t}^{t+t_0} p(v, E(v))\, dv + \int_{t}^{t+t_0} \sigma\, dW_v.   (C-1)

Taking expectations over the distribution of the energy,

  \int_{0}^{\infty} E\, m(t + t_0, E)\, dE - \int_{0}^{\infty} E\, m(t, E)\, dE = -\mathbb{E}\left[ \int_{t}^{t+t_0} p(v, E(v))\, dv \right],   (C-2)

where m(t, E) is the distribution of E in the system at time instant t. Using the fact that W is a Wiener process, W_{t+t_0} - W_t has zero mean. Letting t_0 \to 0,

  \frac{d}{dt} \int_{0}^{\infty} E\, m(t, E)\, dE = -\int_{0}^{\infty} p(t, E)\, m(t, E)\, dE = -\bar{p}(t).   (C-3)

Using m(t, R) = m(t, E), dE = e^{R} dR, and changing the variable E to R, we complete the proof.
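The closed-form best response in (B-4) amounts to computing the parabola vertex and snapping it to the discrete power set P. A small sketch follows; the set P, the gains, gamma_0, K/T, N_0, and the CEQ strategy n(s, .) are all assumed illustrative values.

```python
import numpy as np

def best_response_mbs(P, n_s, g0, g0_sbs, gamma0, K_over_T, N0):
    """Vertex of (B-4) and the nearest element of the discrete power set P."""
    j = np.arange(len(n_s))
    # eq. (B-4): gamma0 * sum_j ((K/T) g0_sbs j + N0) n(s, j) / g0
    vertex = gamma0 * np.sum((K_over_T * g0_sbs * j + N0) * n_s) / g0
    p_best = P[np.argmin(np.abs(P - vertex))]  # nearest feasible power level
    return vertex, p_best

P = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
n_s = np.array([0.2, 0.3, 0.5])        # CEQ strategy at state s (sums to one)
vertex, p_best = best_response_mbs(P, n_s, g0=1.0, g0_sbs=0.05,
                                   gamma0=2.0, K_over_T=10.0, N0=0.1)
assert p_best in P
```

Because the right-hand side of (B-3) is a downward parabola in the MBS power, the nearest element of P to the vertex is indeed the maximizer over the discrete set.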
REFERENCES

[1] K. T. Tran, H. Tabassum, and E. Hossain, "A stochastic power control game for two-tier cellular networks with energy harvesting small cells," in Proc. IEEE Globecom'14, Austin, TX, USA, 8-12 December 2014.
[2] M. Deruyck, D. D. Vulter, W. Joseph, and L. Martens, "Modelling the power consumption in femtocell networks," in IEEE WCNC 2012 Workshop on Future Green Communications.
[3] B. Gurakan, O. Ozel, J. Yang, and S. Ulukus, "Energy cooperation in energy harvesting communication," IEEE Transactions on Communications, vol. 61, no. 12, Dec. 2013, pp. 4884-4898.
[4] Z. Ding, S. Perlaza, I. Esnaola, and H. Poor, "Power allocation strategies in energy harvesting wireless cooperative networks," IEEE Transactions on Wireless Communications, Jan. 2014, pp. 846-860.
[5] N. Michelusi, K. Stamatiou, and M. Zorzi, "Transmission policies for energy harvesting sensors with time-correlated energy supply," IEEE Transactions on Communications, vol. 61, no. 7, Jul. 2013, pp. 2988-3001.
[6] H. Dhillon, Y. Li, P. Nuggehalli, Z. Pi, and J. Andrews, "Fundamentals of heterogeneous cellular networks with energy harvesting," www.arxiv.org/pdf/1307.1524, Jan. 2014.
[7] D. Gunduz, K. Stamatiou, and M. Zorzi, "Designing intelligent energy harvesting communication systems," IEEE Communications Magazine, vol. 52, no. 1, Jan. 2014, pp. 210-216.
[8] P. Semasinghe and E. Hossain, "Downlink power control in self-organizing dense small cells underlaying macrocells: A mean field game," IEEE Transactions on Mobile Computing, to appear.
[9] K. Huang and V. K. Lau, "Enabling wireless power transfer in cellular networks: Architecture, modeling and deployment," IEEE Transactions on Wireless Communications, vol. 13, no. 2, Feb. 2014, pp. 902-912.
[10] H. Tabassum, E. Hossain, A. Ogundipe, and D. I. Kim, "Wireless-powered cellular networks: Key challenges and solution techniques," IEEE Communications Magazine, to appear.
[11] "C-RAN: The road towards green RAN," White Paper, China Mobile Research Institute, October 2011.
[12] P. Semasinghe, E. Hossain, and K. Zhu, "An evolutionary game for distributed resource allocation in self-organizing small cells," IEEE Transactions on Mobile Computing, DOI 10.1109/TMC.2014.2318700.
[13] M. Bowling and M. Veloso, "An analysis of stochastic game theory for multiagent reinforcement learning," Technical Report, Carnegie Mellon University.
[14] J. Hu and M. P. Wellman, "Nash Q-learning for general-sum stochastic games," Journal of Machine Learning Research, vol. 4, Nov. 2003, pp. 1039-1069.
[15] J. A. Filar, "Quadratic programming and the single-controller stochastic games," Journal of Mathematical Analysis and Applications, vol. 113, no. 1, Jan. 1986, pp. 136-147.
[16] V. Chandrasekhar and J. Andrews, "Power control in two-tier femtocell networks," IEEE Transactions on Wireless Communications, vol. 8, no. 8, Aug. 2009, pp. 4316-4328.
[17] J. Filar and K. Vrieze, Competitive Markov Decision Processes. Springer, 1997.
[18] O. Gueant, J. M. Lasry, and P. L. Lions, "Mean field games and applications," Paris-Princeton Lectures on Mathematical Finance, Springer, 2011, pp. 205-266.
[19] A. Y. Al-Zahrani, R. Yu, and M. Huang, "A joint cross-layer and co-layer interference management scheme in hyper-dense heterogeneous networks using mean-field game theory," IEEE Transactions on Vehicular Technology, to appear, Oct. 2014.
[20] H. Tembine, R. Tempone, and P. Vilanova, "Mean field games for cognitive radio networks," in Proc. of American Control Conference, June 2012.