Abstract: This study addresses automatic generation control (AGC) in an isolated microgrid with multiple distributed energy resources. First, the load frequency control (LFC) model of an isolated microgrid containing diesel engine generators, superconducting magnetic energy storage, wind turbines and a photovoltaic power system is established through analysis of the power generation characteristics of each distributed generation (DG) unit. The LFC model of the isolated microgrid is built in MATLAB/Simulink with the diesel generators as the frequency control units. Based on the AGC principle of the power grid, the AGC controller of the microgrid system is designed with a Q-learning algorithm based on the discount compensation model to perform frequency control. The simulation results verify the feasibility of the isolated microgrid model and show the superior dynamic performance of the Q controller compared with a PI controller.
Key Words: Microgrid, Distributed generation, Automatic generation control, Load frequency control, Q-learning algorithm
2.1 Model of DEGs

The DEG used in this work has a rated power of 12.8 kW. The power Pz (kW) of a DEG with fuel consumption h (L) is characterized in (1) [7]:

Pz = 0.009766 h^2 + 0.0625 h + 1.4    (1)

Assuming that the number of DEGs is Z, the number of operating units is z ∈ Φ1 = {0, 1, …, Z}, where z = 0 denotes that all DEGs are turned off. Since only the output power of the DEGs is of interest here, their internal control structure is not considered. The DEGs serve as the primary and secondary frequency control units, whose LFC model is shown in Fig. 2.

For the WTGs, vcut-in denotes the cut-in wind speed and vcut-out the cut-out wind speed; vrated denotes the rated wind speed, beyond which the WTG power is maintained at Pw_rated.

2.4 Model of PV System

The light intensity is easily affected by the climate, the environment and other factors, so it is highly random. Since the PV generation power is directly related to the light intensity at constant ambient temperature, the PV generation power per unit area of the PV panels is characterized as a function of the light intensity in (5) [12]:

Ppv = Pe (G/Gc) [1 + Kt (Ta + 0.0256 G − Tc)] ηc    (5)

where Ppv is the PV power, Pe is the rated output of the PV array, G is the light intensity, Gc is the rated light intensity, Kt is the power temperature coefficient, Ta is the ambient temperature, Tc is the reference cell temperature, and ηc is the conversion efficiency.
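As a check on the two generation models above, equations (1) and (5) can be evaluated numerically. In the sketch below, the DEG coefficients are exactly those of (1); every PV parameter value (Pe, Gc, Kt, Ta, Tc, ηc) is an illustrative assumption, since the paper does not list them in this excerpt.

```python
# Numerical sketch of the DEG fuel-power curve (1) and the PV power
# formula (5). DEG coefficients are from eq. (1); every PV parameter
# value below is an assumed placeholder, not a value from the paper.

def deg_power(h):
    """Power P_z (kW) of one DEG for fuel consumption h (L), eq. (1)."""
    return 0.009766 * h ** 2 + 0.0625 * h + 1.4

def pv_power(G, P_e=0.15, G_c=1000.0, K_t=-0.0045,
             T_a=25.0, T_c=25.0, eta_c=0.95):
    """PV output per unit panel area at light intensity G (W/m^2), eq. (5)."""
    return P_e * (G / G_c) * (1.0 + K_t * (T_a + 0.0256 * G - T_c)) * eta_c

if __name__ == "__main__":
    print(f"P_z at h = 2 L:        {deg_power(2.0):.3f} kW")   # ~1.564 kW
    print(f"P_pv at G = 800 W/m^2: {pv_power(800.0):.4f} kW")
```

Note that with these assumed PV parameters the temperature term in brackets shrinks the output as G grows, reflecting cell heating at high irradiance.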
DDCLS'18
1214
3 AGC Controller Based on Q-learning

3.1 System Mathematical Model

The AGC process of the microgrid can be considered as an uncertain stochastic system, which can be modeled as a DTMDP [14].

In this paper, the average value of Δf collected in each AGC decision cycle is taken as the system state quantity s, which is discretized into finite state intervals; the state set is denoted by S. The controller adjusts the output of the units according to the state interval, and [−ε, ε] is set as the frequency adjustment dead zone. Within the dead zone, Δf is regarded as zero and the controller does not respond. Fmax denotes the system frequency safety critical value.

The adjustment ΔPz sent by the controller to the diesel generators is taken as the system action a, and the action set is denoted by A. ΔPz is discretized into a limited set of output levels between the minimum adjustment ΔPmin and the maximum adjustment ΔPmax, so the action set is divided into 2 Np + 1 levels.

Based on the power quality and security standards of the microgrid system, the cost obtained by the system in the k-th decision cycle is:

C(k) = 0,        s ∈ [−ε, ε]
C(k) = λ1 s^2,   |s| ∈ (ε, Fmax]
C(k) = λ2 s^2,   |s| ∈ (Fmax, +∞)    (6)

In the formula, λ1 and λ2 are the cost weights, set to 5 and 10 respectively, and Fmax is the system frequency safety threshold.

3.2 Solutions

The Q-learning algorithm, proposed by C. Watkins [15], is a basic model-free reinforcement learning method. The state-action value function Q is used as an estimation function during iteration, and the basic form of the Q-learning update is:

Q(s(k), a(k)) = (1 − γ) Q[s(k), a(k)] + γ {C(k) − η + α min_{a∈D} Q*[s(k+1), a]}    (7)

where η is the average cost, γ is the learning step size, α is the discount value, and s(k+1) is the state that follows action a(k) taken in state s(k).

A tracking strategy based on randomly selecting actions by probability is adopted in this work. First, the probability of every action in each state is initialized to be equal. Then, as the Q-value table is updated, the action selection probabilities are updated according to (8):

Ps^{k+1}(ag) = Ps^k(ag) + β [1 − Ps^k(ag)]
Ps^{k+1}(a) = Ps^k(a)(1 − β),   ∀a ∈ A, a ≠ ag
Ps'^{k+1}(a) = Ps'^k(a),   ∀a ∈ A, ∀s' ∈ S, s' ≠ s    (8)

where β ∈ (0, 1) controls the speed of the action-probability update: the larger β is, the closer the control action is to the greedy policy. β is taken as 0.1 in this work. Ps^k(a) denotes the probability of selecting action a when the system state is s after the k-th iteration.

The Q-learning procedure is summarized in Table 1.

Table 1: Learning process of the algorithm
Step 1: Set the AGC decision time Ts; initialize the policy as Ps^0(a) = 1/|D|, ∀a ∈ D; set every element of the initial Q-value table to 0; set the discount factor γ and the number of learning steps N.
Step 2: Set k = 0 and initialize the system state s0 randomly.
Step 3: Observe the current system state sk and select action ak according to the current policy Ps^k(a).
Step 4: Execute action ak, read the disturbances from the load model, and observe the AGC information at the next decision time to obtain sk+1.
Step 5: Calculate the current cost C(k) by (6), update Q(sk, ak) in the Q-value table by (7), and update the current policy by (8).
Step 6: If k = N, the learning process is over; otherwise, return to Step 3.

4 Simulation Results

To verify the feasibility of the Q-learning controller in an isolated microgrid, the simulation model of the standalone microgrid AGC system is established on the MATLAB/Simulink platform, as shown in Fig. 4. To assess the dynamic frequency control performance of the Q controller, the simulation experiments were performed with the Q controller and with a PI controller, respectively.

Fig. 4: Block diagram of AGC system for the isolated microgrid

Assuming that the total adjustable margin of the microgrid is 0.2 pu (per unit), the action set is taken as D = {0.2, 0.1, 0.05, 0.03, 0.01, 0, −0.01, −0.03, −0.05, −0.1, −0.2}. With reference to the national standard for power quality, a system with a capacity of 3000 MW or less has a frequency tolerance of 50 ± 0.5 Hz.
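The cost function (6) and the Q-value update (7) are compact enough to sketch directly. In the sketch below, λ1 = 5, λ2 = 10 and Fmax = 0.5 follow the paper; the dead-zone width EPS and the learning parameters η, γ, α are illustrative assumptions, and the Q-value table is a plain dictionary keyed by (state, action).

```python
# Sketch of the AGC cost function (6) and Q-value update (7).
# lambda1 = 5, lambda2 = 10 and F_MAX = 0.5 follow the paper; EPS and
# the learning parameters eta, gamma, alpha are assumed values.

EPS, F_MAX = 0.05, 0.5      # dead zone (assumed) and safety threshold (Hz)
LAM1, LAM2 = 5.0, 10.0      # cost weights from the paper

def cost(s):
    """Cost C(k) of frequency-deviation state s, eq. (6)."""
    if abs(s) <= EPS:
        return 0.0
    if abs(s) <= F_MAX:
        return LAM1 * s ** 2
    return LAM2 * s ** 2

def q_update(Q, s, a, c, s_next, actions, eta=0.0, gamma=0.1, alpha=0.9):
    """One relaxed average-cost Q-learning step, eq. (7)."""
    best_next = min(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - gamma) * Q[(s, a)] + gamma * (c - eta + alpha * best_next)

if __name__ == "__main__":
    actions = [-0.1, 0.0, 0.1]
    Q = {(s, a): 0.0 for s in (0, 1) for a in actions}
    q_update(Q, 0, 0.1, cost(0.3), 1, actions)
    print(Q[(0, 0.1)])   # gamma * cost(0.3) = 0.1 * 0.45 = 0.045
```

Because costs are minimized rather than rewards maximized, the update uses the minimum over next-state Q-values, matching the min operator in (7).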
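The action-probability update (8) can likewise be sketched as a small routine. Here β = 0.1 follows the paper; the three-action set and single state are illustrative, and the policy is stored per state so that only the visited state's distribution changes, matching the third line of (8).

```python
import random

# Sketch of the tracking-strategy update (8): the greedy action a_g gains
# probability at rate BETA, the other actions of the same state decay by
# (1 - BETA), and the distributions of all other states are untouched.
BETA = 0.1  # beta = 0.1, as in the paper

def update_policy(P, s, a_g):
    """Eq. (8): pull the distribution P[s] toward the greedy action a_g."""
    for a in P[s]:
        if a == a_g:
            P[s][a] += BETA * (1.0 - P[s][a])
        else:
            P[s][a] *= 1.0 - BETA

def select_action(P, s, rng=random):
    """Sample an action from the stochastic policy P[s]."""
    r, acc = rng.random(), 0.0
    for a, p in P[s].items():
        acc += p
        if r <= acc:
            return a
    return a  # guard against rounding of the cumulative sum

if __name__ == "__main__":
    actions = [-0.1, 0.0, 0.1]
    P = {0: {a: 1.0 / len(actions) for a in actions}}
    update_policy(P, 0, a_g=0.1)
    print(P[0])   # greedy action rises to 0.4, the others fall to 0.3
```

Since β(1 − p) is added to the greedy action while every other probability is scaled by (1 − β), the distribution stays normalized after each update.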
Based on this, we can set Fmax = 0.5 and ε = 0.5 and divide the state set into intervals accordingly, the outermost being [−0.5, −∞).

The load disturbance curve used in the simulation is shown in Fig. 5.

Fig. 5: Curve of load disturbances

The learning process of the Q-learning controller tracking the load disturbance is given in Fig. 6. It can be seen that the Q controller is in the strategy exploration stage in the initial phase. From 10000 to 20000 s, the Q controller can roughly track the load disturbances, but still with small deviations, and its output shows occasional spikes. From 30000 to 40000 s, the Q controller accurately tracks the load disturbance on the basis of the previous learning, indicating that it has completed the learning process.

Fig. 6: Learning process of Q controller

The strategy learned by the Q controller is evaluated every 1000 steps, as shown in Fig. 7. The cost curve converges quickly during the first 2000 steps. From 2000 to 9000 steps, the strategy is continuously explored in search of the optimal strategy. At approximately 9000 steps, the cost curve converges to a stable value, indicating that the optimal strategy has been learned.

Fig. 8: Learning process of Q controller with random disturbances

To verify the dynamic response performance of the Q controller, the Q-learning controller is compared with a well-tuned PI controller. A step disturbance with an amplitude of 0.2 pu is added to the system at t = 0.1 s, and the controller output comparison is shown in Fig. 9.

As can be seen from Fig. 9(a), the PI controller responds immediately after the disturbance occurs, but with overshoot; its output does not settle until approximately 5 s after the disturbance, which is unfavorable to the stability of the system frequency and could also reduce the service life of the units. The Q controller outputs power only after the decision cycle; however, it accurately tracks the amplitude of the disturbance and avoids overshooting the unit output.
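The step-disturbance experiment can be reproduced qualitatively with a minimal discrete-time model of the Fig. 4 loop: first-order governor, turbine and power-system blocks, primary droop 1/R, and a constant secondary command standing in for the controller output. All time constants and gains below are assumed illustration values, not the paper's tuned parameters.

```python
# Minimal Euler-discretized sketch of the Fig. 4 LFC loop under a step
# load disturbance. Tg, Tt, Tp, Kp, R and DT are assumed values chosen
# only to illustrate the loop structure, not the paper's parameters.

TG, TT, TP = 0.1, 0.3, 20.0   # governor, turbine, power-system time constants (s)
KP, R = 120.0, 2.4            # power-system gain and droop constant
DT = 0.01                     # Euler integration step (s)

def simulate(d_load=0.2, u_sec=0.2, t_end=30.0):
    """Return the frequency deviation df after t_end seconds."""
    xg = xt = df = 0.0
    for _ in range(int(t_end / DT)):
        u = u_sec - df / R                          # secondary command + primary droop
        xg += DT * (u - xg) / TG                    # governor 1/(Tg s + 1)
        xt += DT * (xg - xt) / TT                   # turbine 1/(Tt s + 1)
        df += DT * (KP * (xt - d_load) - df) / TP   # power system Kp/(Tp s + 1)
    return df

if __name__ == "__main__":
    # A secondary command matching the 0.2 pu step drives df back to zero;
    # droop alone leaves a steady-state deviation of -Kp*d/(1 + Kp/R).
    print(f"df with matching secondary action: {simulate():+.4f}")
    print(f"df with droop only:                {simulate(u_sec=0.0):+.4f}")
```

This illustrates the point made above: only when the secondary action matches the disturbance amplitude, as the trained Q controller does, is the frequency deviation fully removed.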
Fig. 9(b) shows the frequency deviation curves of the Q controller and the PI controller after the disturbance response. It can also be seen that the Q controller stabilizes before the PI controller does.

References
[1] L. Zongxiang, W. Caixia, M. Yong, et al. Overview on microgrid research. Automation of Electric Power Systems, 31(19): 100-107, 2007.
[2] M. Ding, X. Yang, J. Su. Control strategies of inverters based on virtual synchronous generator in a microgrid. Automation of Electric Power Systems, 33(8): 89-93, 2009.
[3] Z. Jingjing, L. Xue, F. Yang. Dynamic frequency control strategy of wind/photovoltaic/diesel microgrid based on DFIG virtual inertia control and pitch angle control. Proceedings of the CSEE, 35(15): 3815-3822, 2015.
[4] L. Chen, J. Zhong, D. Gan. Optimal automatic generation control (AGC) dispatching and its control performance