Abstract: This study addresses automatic generation control (AGC) in an isolated microgrid with multiple distributed energy resources. First, the load frequency control (LFC) model of an isolated microgrid containing diesel engine generators, superconducting magnetic energy storage, wind turbines and a photovoltaic power system is established through analysis of the power generation characteristics of each distributed generation (DG) unit. The LFC model of the isolated microgrid is built in MATLAB/Simulink with the diesel generators as the frequency control units. Based on the AGC principle of the power grid, the AGC controller of the microgrid system is designed with a Q-learning algorithm based on the discount compensation model to perform frequency control. The simulation results verify the feasibility of the isolated microgrid model and show the superior dynamic performance of the Q controller compared with a PI controller.
Key Words: Microgrid, Distributed generation, Automatic generation control, Load frequency control, Q-learning algorithm
2.1 Model of DEGs

The DEG used in this work has a rated power of 12.8 kW. The power Pz (kW) of a DEG with fuel consumption h (L) is characterized in (1) [7]:

Pz = 0.009766 h^2 + 0.0625 h + 1.4    (1)

Assuming that the number of DEGs is Z, the number of operating units is z ∈ Φ1 = {0, 1, …, Z}, where z = 0 denotes that all DEGs are turned off. Since only the output power of the DEGs is of interest here, their internal control structure is not considered. The DEGs serve as the primary and secondary frequency control units, whose LFC model is shown in Fig. 2.

For the WTGs, vcut-in denotes the cut-in wind speed and vcut-out the cut-out wind speed; vrated denotes the rated wind speed, beyond which the WTG power is maintained at Pw_rated.

2.4 Model of PV System

The light intensity is easily affected by the climate, the environment and other factors, so it is highly random. Since the PV generation power is directly related to the light intensity at constant ambient temperature, the PV generation power per unit area of the PV panels is characterized as a function of the light intensity in (5) [12]:

Ppv = Pe (G/Gc) [1 + Kt (Ta + 0.0256 G − Tc)] ηc    (5)

where Ppv is the PV power, Pe is the rated output of the PV array, G is the light intensity, Gc is the rated light intensity, Kt is the power temperature coefficient, Ta is the ambient temperature, Tc is the reference cell temperature, and ηc is the conversion efficiency.
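As a check on the two generation models above, equations (1) and (5) can be evaluated numerically. In the sketch below, the DEG coefficients are exactly those of (1); every PV parameter value (Pe, Gc, Kt, Ta, Tc, ηc) is an illustrative assumption, since the paper does not list them in this excerpt.

```python
# Numerical sketch of the DEG fuel-power curve (1) and the PV power
# formula (5). DEG coefficients are from eq. (1); every PV parameter
# value below is an assumed placeholder, not a value from the paper.

def deg_power(h):
    """Power P_z (kW) of one DEG for fuel consumption h (L), eq. (1)."""
    return 0.009766 * h ** 2 + 0.0625 * h + 1.4

def pv_power(G, P_e=0.15, G_c=1000.0, K_t=-0.0045,
             T_a=25.0, T_c=25.0, eta_c=0.95):
    """PV output per unit panel area at light intensity G (W/m^2), eq. (5)."""
    return P_e * (G / G_c) * (1.0 + K_t * (T_a + 0.0256 * G - T_c)) * eta_c

if __name__ == "__main__":
    print(f"P_z at h = 2 L:        {deg_power(2.0):.3f} kW")   # ~1.564 kW
    print(f"P_pv at G = 800 W/m^2: {pv_power(800.0):.4f} kW")
```

Note that with these assumed PV parameters the temperature term in brackets shrinks the output as G grows, reflecting cell heating at high irradiance.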
DDCLS'18
1214
3 AGC Controller Based on Q-learning

3.1 System Mathematical Model

The AGC process of the microgrid can be considered as an uncertain stochastic system, which can be modeled as a DTMDP [14].

In this paper, the average value of Δf collected in each AGC decision cycle is taken as the system state quantity s, which is discretized into finite state intervals; the state set is denoted by S. The controller adjusts the output of the units according to the state interval, and [−ε, ε] is set as the frequency adjustment dead zone. Within the dead zone, Δf is regarded as zero and the controller does not respond. Fmax denotes the system frequency safety critical value.

The adjustment ΔPz sent by the controller to the diesel generators is taken as the system action a, and the action set is denoted by A. ΔPz is discretized into a limited set of output levels between the minimum adjustment ΔPmin and the maximum adjustment ΔPmax, so the action set is divided into 2 Np + 1 levels.

Based on the power quality and security standards of the microgrid system, the cost obtained by the system in the k-th decision cycle is:

C(k) = 0,        s ∈ [−ε, ε]
C(k) = λ1 s^2,   |s| ∈ (ε, Fmax]
C(k) = λ2 s^2,   |s| ∈ (Fmax, +∞)    (6)

In the formula, λ1 and λ2 are the cost weights, set to 5 and 10 respectively, and Fmax is the system frequency safety threshold.

3.2 Solutions

The Q-learning algorithm, proposed by C. Watkins [15], is a basic model-free reinforcement learning method. The state-action value function Q is used as an estimation function during iteration, and the basic form of the Q-learning update is:

Q(s(k), a(k)) = (1 − γ) Q[s(k), a(k)] + γ {C(k) − η + α min_{a∈D} Q*[s(k+1), a]}    (7)

where η is the average cost, γ is the learning step size, α is the discount value, and s(k+1) is the state that follows action a(k) taken in state s(k).

A tracking strategy based on randomly selecting actions by probability is adopted in this work. First, the probability of every action in each state is initialized to be equal. Then, as the Q-value table is updated, the action selection probabilities are updated according to (8):

Ps^{k+1}(ag) = Ps^k(ag) + β [1 − Ps^k(ag)]
Ps^{k+1}(a) = Ps^k(a)(1 − β),   ∀a ∈ A, a ≠ ag
Ps'^{k+1}(a) = Ps'^k(a),   ∀a ∈ A, ∀s' ∈ S, s' ≠ s    (8)

where β ∈ (0, 1) controls the speed of the action-probability update: the larger β is, the closer the control action is to the greedy policy. β is taken as 0.1 in this work. Ps^k(a) denotes the probability of selecting action a when the system state is s after the k-th iteration.

The Q-learning procedure is summarized in Table 1.

Table 1: Learning process of the algorithm
Step 1: Set the AGC decision time Ts; initialize the policy as Ps^0(a) = 1/|D|, ∀a ∈ D; set every element of the initial Q-value table to 0; set the discount factor γ and the number of learning steps N.
Step 2: Set k = 0 and initialize the system state s0 randomly.
Step 3: Observe the current system state sk and select action ak according to the current policy Ps^k(a).
Step 4: Execute action ak, read the disturbances from the load model, and observe the AGC information at the next decision time to obtain sk+1.
Step 5: Calculate the current cost C(k) by (6), update Q(sk, ak) in the Q-value table by (7), and update the current policy by (8).
Step 6: If k = N, the learning process is over; otherwise, return to Step 3.

4 Simulation Results

To verify the feasibility of the Q-learning controller in an isolated microgrid, the simulation model of the standalone microgrid AGC system is established on the MATLAB/Simulink platform, as shown in Fig. 4. To assess the dynamic frequency control performance of the Q controller, the simulation experiments were performed with the Q controller and with a PI controller, respectively.

Fig. 4: Block diagram of AGC system for the isolated microgrid

Assuming that the total adjustable margin of the microgrid is 0.2 pu (per unit), the action set is taken as D = {0.2, 0.1, 0.05, 0.03, 0.01, 0, −0.01, −0.03, −0.05, −0.1, −0.2}. With reference to the national standard for power quality, a system with a capacity of 3000 MW or less has a frequency tolerance of 50 ± 0.5 Hz.
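The cost function (6) and the Q-value update (7) are compact enough to sketch directly. In the sketch below, λ1 = 5, λ2 = 10 and Fmax = 0.5 follow the paper; the dead-zone width EPS and the learning parameters η, γ, α are illustrative assumptions, and the Q-value table is a plain dictionary keyed by (state, action).

```python
# Sketch of the AGC cost function (6) and Q-value update (7).
# lambda1 = 5, lambda2 = 10 and F_MAX = 0.5 follow the paper; EPS and
# the learning parameters eta, gamma, alpha are assumed values.

EPS, F_MAX = 0.05, 0.5      # dead zone (assumed) and safety threshold (Hz)
LAM1, LAM2 = 5.0, 10.0      # cost weights from the paper

def cost(s):
    """Cost C(k) of frequency-deviation state s, eq. (6)."""
    if abs(s) <= EPS:
        return 0.0
    if abs(s) <= F_MAX:
        return LAM1 * s ** 2
    return LAM2 * s ** 2

def q_update(Q, s, a, c, s_next, actions, eta=0.0, gamma=0.1, alpha=0.9):
    """One relaxed average-cost Q-learning step, eq. (7)."""
    best_next = min(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - gamma) * Q[(s, a)] + gamma * (c - eta + alpha * best_next)

if __name__ == "__main__":
    actions = [-0.1, 0.0, 0.1]
    Q = {(s, a): 0.0 for s in (0, 1) for a in actions}
    q_update(Q, 0, 0.1, cost(0.3), 1, actions)
    print(Q[(0, 0.1)])   # gamma * cost(0.3) = 0.1 * 0.45 = 0.045
```

Because costs are minimized rather than rewards maximized, the update uses the minimum over next-state Q-values, matching the min operator in (7).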
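The action-probability update (8) can likewise be sketched as a small routine. Here β = 0.1 follows the paper; the three-action set and single state are illustrative, and the policy is stored per state so that only the visited state's distribution changes, matching the third line of (8).

```python
import random

# Sketch of the tracking-strategy update (8): the greedy action a_g gains
# probability at rate BETA, the other actions of the same state decay by
# (1 - BETA), and the distributions of all other states are untouched.
BETA = 0.1  # beta = 0.1, as in the paper

def update_policy(P, s, a_g):
    """Eq. (8): pull the distribution P[s] toward the greedy action a_g."""
    for a in P[s]:
        if a == a_g:
            P[s][a] += BETA * (1.0 - P[s][a])
        else:
            P[s][a] *= 1.0 - BETA

def select_action(P, s, rng=random):
    """Sample an action from the stochastic policy P[s]."""
    r, acc = rng.random(), 0.0
    for a, p in P[s].items():
        acc += p
        if r <= acc:
            return a
    return a  # guard against rounding of the cumulative sum

if __name__ == "__main__":
    actions = [-0.1, 0.0, 0.1]
    P = {0: {a: 1.0 / len(actions) for a in actions}}
    update_policy(P, 0, a_g=0.1)
    print(P[0])   # greedy action rises to 0.4, the others fall to 0.3
```

Since β(1 − p) is added to the greedy action while every other probability is scaled by (1 − β), the distribution stays normalized after each update.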
Based on this, we can set Fmax = 0.5 and ε = 0.5 and divide the state set into intervals accordingly, the outermost being [−0.5, −∞).

The load disturbance curve used in the simulation is shown in Fig. 5.

Fig. 5: Curve of load disturbances

The learning process of the Q-learning controller tracking the load disturbance is given in Fig. 6. It can be seen that the Q controller is in the strategy exploration stage in the initial phase. From 10000 to 20000 s, the Q controller can roughly track the load disturbances, but still with small deviations, and its output shows occasional spikes. From 30000 to 40000 s, the Q controller accurately tracks the load disturbance on the basis of the previous learning, indicating that it has completed the learning process.

Fig. 6: Learning process of Q controller

The strategy learned by the Q controller is evaluated every 1000 steps, as shown in Fig. 7. The cost curve converges quickly during the first 2000 steps. From 2000 to 9000 steps, the strategy is continuously explored in search of the optimal strategy. At approximately 9000 steps, the cost curve converges to a stable value, indicating that the optimal strategy has been learned.

Fig. 8: Learning process of Q controller with random disturbances

To verify the dynamic response performance of the Q controller, the Q-learning controller is compared with a well-tuned PI controller. A step disturbance with an amplitude of 0.2 pu is added to the system at t = 0.1 s, and the controller output comparison is shown in Fig. 9.

As can be seen from Fig. 9(a), the PI controller responds immediately after the disturbance occurs, but with overshoot; its output does not settle until approximately 5 s after the disturbance, which is unfavorable to the stability of the system frequency and could also reduce the service life of the units. The Q controller outputs power only after the decision cycle; however, it accurately tracks the amplitude of the disturbance and avoids overshooting the unit output.
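The step-disturbance experiment can be reproduced qualitatively with a minimal discrete-time model of the Fig. 4 loop: first-order governor, turbine and power-system blocks, primary droop 1/R, and a constant secondary command standing in for the controller output. All time constants and gains below are assumed illustration values, not the paper's tuned parameters.

```python
# Minimal Euler-discretized sketch of the Fig. 4 LFC loop under a step
# load disturbance. Tg, Tt, Tp, Kp, R and DT are assumed values chosen
# only to illustrate the loop structure, not the paper's parameters.

TG, TT, TP = 0.1, 0.3, 20.0   # governor, turbine, power-system time constants (s)
KP, R = 120.0, 2.4            # power-system gain and droop constant
DT = 0.01                     # Euler integration step (s)

def simulate(d_load=0.2, u_sec=0.2, t_end=30.0):
    """Return the frequency deviation df after t_end seconds."""
    xg = xt = df = 0.0
    for _ in range(int(t_end / DT)):
        u = u_sec - df / R                          # secondary command + primary droop
        xg += DT * (u - xg) / TG                    # governor 1/(Tg s + 1)
        xt += DT * (xg - xt) / TT                   # turbine 1/(Tt s + 1)
        df += DT * (KP * (xt - d_load) - df) / TP   # power system Kp/(Tp s + 1)
    return df

if __name__ == "__main__":
    # A secondary command matching the 0.2 pu step drives df back to zero;
    # droop alone leaves a steady-state deviation of -Kp*d/(1 + Kp/R).
    print(f"df with matching secondary action: {simulate():+.4f}")
    print(f"df with droop only:                {simulate(u_sec=0.0):+.4f}")
```

This illustrates the point made above: only when the secondary action matches the disturbance amplitude, as the trained Q controller does, is the frequency deviation fully removed.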
Fig. 9(b) shows the frequency deviation curves of the Q controller and the PI controller after the disturbance response. It can also be seen that the Q controller stabilizes before the PI controller does.

References
[1] L. Zongxiang, W. Caixia, M. Yong, et al. Overview on microgrid research. Automation of Electric Power Systems, 31(19): 100-107, 2007.
[2] M. Ding, X. Yang, J. Su. Control strategies of inverters based on virtual synchronous generator in a microgrid. Automation of Electric Power Systems, 33(8): 89-93, 2009.
[3] Z. Jingjing, L. Xue, F. Yang. Dynamic frequency control strategy of wind/photovoltaic/diesel microgrid based on DFIG virtual inertia control and pitch angle control. Proceedings of the CSEE, 35(15): 3815-3822, 2015.
[4] L. Chen, J. Zhong, D. Gan. Optimal automatic generation control (AGC) dispatching and its control performance