$$\Delta V_i = \alpha \cdot \max\left(0,\; S_i \left(\mathrm{stress} - \sum_j A_j\right)\right)$$
Proceedings of the 4th International Conference on Autonomous Robots and Agents, Feb 10-12, 2009, Wellington, New Zealand
but this adaptation rule has a biological basis: in the
Amygdala, once an emotional reaction is learned, it should
become permanent. The Orbitofrontal cortex inhibits
inappropriate reactions of the Amygdala.
The Orbitofrontal cortex learning rule is very similar to the
Amygdala rule:
$$\Delta W_i = \beta \, S_i \left(E - \mathrm{stress}\right) \qquad (6)$$
The reinforcement signal for the O nodes is defined as the
difference between the model output E and the stress signal. In
other words, the O nodes compare the expected and the received
reinforcement signals and inhibit the model output if there is
a mismatch. The main difference between the adaptation rules of
the Orbitofrontal cortex and the Amygdala is that the
Orbitofrontal connection weights can be both increased and
decreased as needed to track the required inhibition of the
Amygdala. The parameter $\beta$ is another learning rate constant.
As discussed, BELBIC learns from its emotional signal and
produces its output based on its sensory inputs and connection
weights. In [19] the stability of BELBIC is demonstrated
using the cell-to-cell mapping method.
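The two adaptation rules can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name `BELUnit`, the learning-rate values, and the output form E = sum(A) - sum(O) with A_i = V_i S_i and O_i = W_i S_i are assumptions taken from the usual BEL formulation rather than from this section, and the Amygdala update is written in its monotonic max(0, .) form.

```python
from dataclasses import dataclass, field


@dataclass
class BELUnit:
    """Illustrative brain-emotional-learning unit (hypothetical name)."""
    n_inputs: int
    alpha: float = 0.1  # Amygdala learning rate (illustrative value)
    beta: float = 0.1   # Orbitofrontal learning rate (illustrative value)
    V: list = field(default_factory=list)  # Amygdala weights
    W: list = field(default_factory=list)  # Orbitofrontal weights

    def __post_init__(self):
        if not self.V:
            self.V = [0.0] * self.n_inputs
        if not self.W:
            self.W = [0.0] * self.n_inputs

    def output(self, S):
        # Assumed output form: E = sum(A) - sum(O).
        A = sum(v * s for v, s in zip(self.V, S))
        O = sum(w * s for w, s in zip(self.W, S))
        return A - O

    def learn(self, S, stress):
        """Apply one step of both learning rules and return the output E."""
        A_sum = sum(v * s for v, s in zip(self.V, S))
        E = self.output(S)
        for i, s in enumerate(S):
            # Amygdala: the max(0, .) term is never negative, so V never decreases.
            self.V[i] += self.alpha * max(0.0, s * (stress - A_sum))
            # Orbitofrontal: signed update, so W can rise or fall.
            self.W[i] += self.beta * s * (E - stress)
        return E
```

Calling `learn` repeatedly with a constant sensory vector and a stress that is later removed shows the asymmetry described above: the Amygdala weights stay where they are, while the Orbitofrontal weights move in both directions to adjust the output.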
III. CONTROLLER DESIGN
As mentioned before, the test bed for evaluating the
controller performance is an inverted pendulum, a well-known
single-input multi-output (SIMO) system. Controlling the
inverted pendulum is a hard and interesting task: the controller
must track a reference signal while stabilizing the pendulum.
The system used to evaluate the controller is a nonlinear model
of a laboratory inverted pendulum provided by Feedback Ltd.
Because of the nonlinearity of the system's state equations and
the nonlinear properties of the driving motor and friction,
designing a model-based controller is hard. We used BELBIC as a
model-free controller for the inverted pendulum. The main
challenge in using BELBIC as a model-free controller for
unstable systems, or for stable systems with an unstable
equilibrium point such as our test bed, is the learning phase at
the beginning. As BELBIC has no information about the system
dynamics, the performance of the controlled system may be poor
at the beginning of the learning process, and the pendulum falls
down. BELBIC learns fast, and in theory it should learn the
proper control action from its sensory inputs and emotional
stress within a short time. In this task, however, the pendulum
angle is very sensitive to the control signal, and any wrong
change in the control signal makes the system unstable. We first
used BELBIC as the only controller of the system. BELBIC may be
able to learn the proper control strategy this way, but in our
simulations the process took too long, and it is probably
infeasible in real applications.
To accelerate the learning process and avoid destabilizing the
system, we propose a new approach consisting of two parts. In
the first part, a simple stabilizing controller acts as the main
controller, and BELBIC learns to imitate its behavior. In the
second part, after BELBIC has learned by imitation to stabilize
the system, the initial controller is replaced with BELBIC.
Owing to its learning capability, BELBIC then learns to improve
the system's performance.
A. Proposed Controller
Following the idea of a hierarchical controller structure,
designing a controller that satisfies several objectives starts
from the assumption that the objectives can be decoupled. A
separate controller is then designed for each objective, and the
outputs of these controllers are fused. Fig. 2 shows the
proposed BELBIC structure. As there are two major objectives,
position tracking and pendulum angle regulation, two BELBICs are
employed. The cart position error and its first derivative are
the sensory signals of one BELBIC, and the pendulum angle and
its first derivative are those of the other. Most previously
reported BELBIC structures have only one neuron, because the
sensory input signal was one-dimensional. In our structure each
BELBIC has two sensory inputs, so it needs more than one neuron;
for this task two neurons seem to be adequate.
Figure 2. Structure of controller
The emotional stress signal, described below, couples the two
separate controllers. With this kind of stress signal there is
no need for a complex fusion block to combine the controller
outputs; a summation operator is adequate [16]. The
computational cost of output fusion is thereby reduced to the
cost of fusing the main and auxiliary objectives in the stress
generator block. Likewise, changing the control objectives, or
switching from imitative learning to normal learning, requires
no change to the controller structure; changing the emotional
stress signal is enough.
B. Stress generation
As stated before, BELBIC shows different behaviors under
different stress signals. To satisfy a given control objective,
a proper stress signal must therefore be defined for it, and
further objectives can be addressed by defining further stress
signals.
1) Imitative learning
In imitative learning, the objective is for BELBIC to produce
a control signal similar to that of the initial controller.
Reducing the difference between these two control signals is
therefore the main goal in this part, and there are no other
control objectives. The emotional stress signal thus consists of
two signals: the error of the control signal and the first
derivative of this error.
$$\mathrm{Stress} = w_1 e_u + w_2 \dot{e}_u + w_3 e_u \dot{e}_u \qquad (7)$$
The reason for adding $\dot{e}_u$ to the sum above is that
$e_u$ may at times become zero while the two control signals
behave completely differently.
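A small numerical illustration of this point, written as a hedged Python sketch. It uses only the two signals named in the text, the control-signal error and its first derivative. The function names, the weights `w1` and `w2`, the absolute values (so that the stress is non-negative), and the backward-difference derivative are all assumptions of the sketch, not the paper's definition.

```python
import math


def imitative_stress(e_u, de_u, w1=1.0, w2=0.5):
    """Weighted sum of |e_u| and |de_u|; weights are illustrative values."""
    return w1 * abs(e_u) + w2 * abs(de_u)


def control_error(t, u_init, u_belbic, dt=1e-3):
    """Return (e_u, de_u) at time t, with de_u from a backward difference."""
    e_now = u_init(t) - u_belbic(t)
    e_prev = u_init(t - dt) - u_belbic(t - dt)
    return e_now, (e_now - e_prev) / dt


# Two hypothetical control signals that coincide at t = 0
# but move in opposite directions there.
e_u, de_u = control_error(0.0, math.sin, lambda t: -math.sin(t))

stress_error_only = imitative_stress(e_u, de_u, w2=0.0)
stress_with_derivative = imitative_stress(e_u, de_u)
```

At the crossing instant the error-only stress vanishes even though the two signals are diverging, while the stress that includes the derivative term stays clearly positive.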
2) Improving the performance
After BELBIC has learned the control action from the initial
controller by imitation, it replaces the initial controller and
becomes the main controller of the system. At this point the
control objectives change: reducing the position tracking error
and the angle error become the new objectives, so the stress
signal must be modified.
To satisfy more than two objectives, a more elaborate
combination of the stress signals associated with each objective
is necessary. To make BELBIC capable of satisfying more
objectives, the emotional stress signal is formed by combining
these stresses, as described below. To generate the proper
stress signal for all objectives, the more important objective
at any given time must receive more attention than the others.
We use linguistic rules to implement this attention mechanism.
a) Fuzzy stress generation
To improve the behavior of the system, the major objectives
are defined first, and some extra objectives are then added. The
tracking error of the cart position and the error of the
pendulum angle are the main concerns and must be decreased as
much as possible. One extra objective is to keep the cart away
from the edges of the rail, since reaching them interrupts the
operation. To impose this behavior on the cart, closeness to the
rail edges is punished via the stress signal. Another important
index, which should be considered in every control task, is the
energy of the control force and its variations. This objective
is imposed on the stress generation unit as follows: when the
stresses of the previous parts are small, BELBIC tries to
decrease the control force; when these stresses are significant,
the limit on the control force is relaxed to allow fast
responses. Figure 3 shows the stress generating function. With a
set of linguistic rules, the most salient objectives can be
reinforced so that the emotional stress reflects the current
situation.
To generate the stress signal, we wrote linguistic rules and
imported them into a Sugeno fuzzy inference system [21].
Generating the stress signal this way lets BELBIC attend to the
important parts of the stress at any time. Four input variables
(the errors of the cart position and the pendulum angle, the
control force, and its first derivative) and 16 rules are
employed to generate the emotional stress. Fig. 3 shows some of
the resulting fuzzy surfaces.
[Figure 3: three surface plots of Stress over (Position-Error, Angle-Error), (Position-Error, Control-Force), and (Control-Force, Derivation-of-Control-Force)]
Figure 3. Resulting fuzzy surfaces
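To make the idea of a Sugeno-type stress generator concrete, the following Python sketch implements a zero-order Sugeno inference with two inputs and four rules. It is a reduced illustration only: the paper uses four inputs and 16 rules, which are not listed, so the membership functions `low` and `high`, the input ranges, and the rule consequents below are hypothetical.

```python
def low(x, x_max):
    """Membership in 'small': 1 at x = 0, falling linearly to 0 at x_max."""
    x = min(abs(x), x_max)
    return 1.0 - x / x_max


def high(x, x_max):
    """Membership in 'large': complement of 'small'."""
    return 1.0 - low(x, x_max)


def sugeno_stress(pos_err, ang_err, pos_max=0.1, ang_max=0.01):
    """Zero-order Sugeno inference: weighted average of constant consequents."""
    # Each rule is (firing strength, consequent); product acts as the AND operator.
    # Consequent values are illustrative and give the angle error more weight.
    rules = [
        (low(pos_err, pos_max) * low(ang_err, ang_max), 0.0),
        (high(pos_err, pos_max) * low(ang_err, ang_max), 0.4),
        (low(pos_err, pos_max) * high(ang_err, ang_max), 0.8),
        (high(pos_err, pos_max) * high(ang_err, ang_max), 1.0),
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * value for strength, value in rules) / total
```

Because the two membership functions of each input sum to one, the total firing strength is always one and the division is safe. A rule base like this yields a smooth surface between the corner values, which is the kind of shape a fuzzy stress surface would have.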
C. Switching between controllers and stress signals
As we observed in the experimental results, hard switching
between the controllers and between the stress signals makes the
BELBICs unstable. Soft switching must be employed instead: the
BELBIC control system gradually replaces the initial controller
while its emotional stress is gradually changed at the same
time. To do this we employed a fuzzy inference system, a common
solution for soft switching. Human linguistic rules can easily
be imported into a Sugeno fuzzy inference system [21]. Fig. 4
shows the fuzzy surface for this task. The inputs of this fuzzy
system are the two stresses mentioned above (for imitative
learning and for improving performance) and the error in the
imitative learning phase (the difference between the initial
controller output and the BELBIC output). The same fuzzy switch
is used for switching both the controllers and the stresses.
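The effect of soft switching can be sketched in Python as a convex blend driven by a single factor. This is only an illustration of the principle: in the paper the switching factor comes from a fuzzy inference system whose rules are not given, so the function `ramp` below is a hypothetical time-based stand-in, and all names are assumptions.

```python
def soft_switch(lam, u_init, u_belbic, stress_imit, stress_perf):
    """Blend both controller outputs and both stresses with lam in [0, 1]."""
    lam = min(max(lam, 0.0), 1.0)  # clamp so the blend stays convex
    u = (1.0 - lam) * u_init + lam * u_belbic
    stress = (1.0 - lam) * stress_imit + lam * stress_perf
    return u, stress


def ramp(t, t_start=30.0, t_end=50.0):
    """Hypothetical linear ramp standing in for the fuzzy switch output."""
    return (t - t_start) / (t_end - t_start)


# Before the ramp the initial controller acts alone, afterwards BELBIC does.
before = soft_switch(ramp(20.0), 1.0, 3.0, 0.2, 0.6)
midway = soft_switch(ramp(40.0), 1.0, 3.0, 0.2, 0.6)
after = soft_switch(ramp(60.0), 1.0, 3.0, 0.2, 0.6)
```

Using the same factor for the control signal and for the stress keeps the learning objective aligned with whichever controller currently dominates the plant input.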
IV. RESULTS
To validate the proposed controller, its results are compared
with those of the originally supplied controller, which consists
of two PID controllers [22]. This original controller also
serves as the initial controller in the imitative learning
phase. Without imitative learning, BELBIC did not learn the
proper control signal in more than 150 seconds of training, and
the pendulum fell down many times during this period.
[Figure 4: three surface plots of Stress over (time, Imitative-Stress), (time, Performance-Stress), and (time, error)]
Figure 4. Resulting fuzzy surfaces
To evaluate the ability of the controller to reject
disturbances, a random voltage drawn from a Gaussian
distribution with zero mean and variance 0.1 is applied to the
motor at certain instants. The time at which this voltage is
applied is a random variable drawn from a uniform distribution.
The disturbance is applied 8 times between the 55th second and
the end of the simulation. The results of the original PID
controller and of the proposed controller are depicted in
Figs. 5 and 6, respectively. BELBIC imitates the behavior of the
original controller completely within about 10 seconds of the
start. After that, according to the fuzzy switch structure, both
controllers control the system from 30 s to 50 s, and from then
on BELBIC controls the system alone. After imitative learning,
BELBIC reduces the tracking and angle errors markedly better
than the original controller, and it clearly shows better
tracking and disturbance rejection, which results from its
learning capability.
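The disturbance schedule described above can be sketched in Python as follows. The function name, the seed, and in particular the simulation end time `t_end` are assumptions, since the text gives only the start time, the count, and the distribution parameters.

```python
import random


def make_disturbances(t_start=55.0, t_end=150.0, count=8, variance=0.1, seed=0):
    """Return a sorted list of (time, voltage) disturbance events."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # random.gauss expects a standard deviation
    # Application times are uniform over the disturbance window.
    times = sorted(rng.uniform(t_start, t_end) for _ in range(count))
    # Voltages are zero-mean Gaussian with the stated variance.
    return [(t, rng.gauss(0.0, sigma)) for t in times]


events = make_disturbances()
```

Note that a variance of 0.1 corresponds to a standard deviation of about 0.316, which is the value a Gaussian sampler parameterized by standard deviation needs.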
[Figure 5: two panels, Cart Position (Desired Position and Actual Position) and Pendulum Angle]
Figure 5. Results of the originally supplied controller (Double PID) in the
presence of disturbance
[Figure 6: two panels, Cart Position (Desired Position and Actual Position) and Pendulum Angle]
Figure 6. Results of BELBIC with fuzzy stress in the presence of disturbance
For a meaningful comparison of these controllers, four
performance measures are defined as follows and calculated for
both control systems, the originally supplied controller and
BELBIC. As the disturbance is applied at randomly selected
times, the experiments were carried out 20 times, and the mean
and standard deviation of the following measures are
calculated.
IAE: Integral Absolute Error (for cart position and
pendulum angle)
IACF: Integral of Absolute values of Control Force
IADCF: Integral of Absolute values of the Derivative of
Control Force (shows the fluctuations of the control force)
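For sampled signals, the measures listed above could be computed as in this Python sketch. The rectangle-rule integration, the forward-difference derivative, and all function and key names are assumptions, as the paper does not state how the integrals were evaluated.

```python
def integral_abs(samples, dt):
    """Rectangle-rule approximation of the integral of |x(t)| over time."""
    return sum(abs(x) for x in samples) * dt


def performance_measures(pos_err, ang_err, force, dt):
    """Return IAE (position, angle), IACF and IADCF for uniformly sampled data."""
    # Forward differences approximate the derivative of the control force.
    d_force = [(b - a) / dt for a, b in zip(force, force[1:])]
    return {
        "IAE_position": integral_abs(pos_err, dt),
        "IAE_angle": integral_abs(ang_err, dt),
        "IACF": integral_abs(force, dt),
        "IADCF": integral_abs(d_force, dt),
    }


# Tiny synthetic example with a sampling period of 0.5 s.
measures = performance_measures(
    pos_err=[1.0, -1.0, 1.0, -1.0],
    ang_err=[0.1, 0.0, -0.1, 0.0],
    force=[0.0, 1.0, 0.0, 1.0],
    dt=0.5,
)
```

A force signal that alternates between two values has a modest IACF but a large IADCF, which is why the last measure is a useful indicator of chattering.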
These performance measures are calculated for the two
controllers in normal operation, without disturbance, and the
results are given in Table I.
TABLE I. PERFORMANCE MEASURES OF VARIOUS CONTROLLERS WITHOUT DISTURBANCE

No disturbance         IAE (position)   IAE (angle)   IACF     IADCF
BELBIC, fuzzy stress   3.301            0.367         9.432    9.262
Double PID             3.925            0.457         12.219   14.353
Table I shows that BELBIC tracks better, reflecting its fast
learning ability. Also, the control force, which is penalized by
the stress signal, is lower than that of the other controller
and oscillates less.
The same measures are calculated in the presence of
disturbance, and the results are presented in Table II. BELBIC
again performs better; although its performance degrades
slightly compared with normal operation, it shows better
disturbance rejection and robustness than the Double PID
controller.
TABLE II. PERFORMANCE MEASURES OF VARIOUS CONTROLLERS WITH DISTURBANCE
(E: mean, STD: standard deviation)

Employing disturbance   IAE (position)    IAE (angle)      IACF              IADCF
                        E        STD      E       STD      E        STD      E        STD
BELBIC, fuzzy stress    5.699    0.915    0.476   0.174    73.247   3.152    151.26   6.245
Double PID              11.725   2.141    1.692   0.371    82.705   3.247    160.58   7.831
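As a worked reading of the two tables, the relative reduction of each measure achieved by BELBIC over the Double PID controller can be computed as below. The numbers are the mean values from Tables I and II; the dictionary layout and function names are illustrative, and the last column of Table II is taken to be IADCF (an assumption based on the measures defined in the text).

```python
# Mean values copied from the tables as (Double PID, BELBIC) pairs.
NO_DISTURBANCE = {
    "IAE_position": (3.925, 3.301),
    "IAE_angle": (0.457, 0.367),
    "IACF": (12.219, 9.432),
    "IADCF": (14.353, 9.262),
}
WITH_DISTURBANCE = {
    "IAE_position": (11.725, 5.699),
    "IAE_angle": (1.692, 0.476),
    "IACF": (82.705, 73.247),
    "IADCF": (160.58, 151.26),
}


def reduction_percent(pid, belbic):
    """Relative reduction of a measure with respect to the PID value."""
    return 100.0 * (pid - belbic) / pid


def summarize(table):
    """Map each measure name to its reduction in percent, rounded to 0.1."""
    return {
        name: round(reduction_percent(pid, belbic), 1)
        for name, (pid, belbic) in table.items()
    }


summary_nominal = summarize(NO_DISTURBANCE)
summary_disturbed = summarize(WITH_DISTURBANCE)
```

With these values the gap in tracking quality widens under disturbance: the position and angle IAE are about 16% and 20% lower without disturbance, and about 51% and 72% lower with it, while the reductions in the control-force measures shrink.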
V. CONCLUSIONS
In this paper a new approach to employing a model-free
controller with learning ability is introduced, based on soft
switching between two phases of learning. Although BELBIC has a
rapid and powerful learning capability, it cannot simply be used
to control systems with unstable equilibria. The experimental
results show that, by employing imitative learning in the first
phase, BELBIC can rapidly learn to produce a proper control
signal for a system with an unstable equilibrium point. After
BELBIC has learned by imitation from a simple, classically
designed controller, gradually altering the objectives and
stress signals reduces the tracking and angle errors. Moreover,
BELBIC is more robust to disturbances. Another advantage of
BELBIC is that it produces a smoother control force with lower
energy.
Fuzzy stress generation leads to better performance in terms
of tracking and angle error than the alternative method of
stress generation. Another interesting result is that BELBIC
performs better in the presence of disturbance than the
originally supplied controller, a model-based controller that is
well tuned specifically for this task. This is the effect of
BELBIC's learning capability, which lets it produce a more
appropriate control force under various working conditions.
REFERENCES
[1] A.P. Shon, D.B. Grimes, C.L. Baker, R.P.N. Rao, A.N. Meltzoff, A
Model-Based Goal-Directed Bayesian Framework for Imitation
Learning in Humans and Machines, Cognitive Science, 2004.
[2] M.I. Kuniyoshi and I. Inoue, Learning by watching: extracting reusable
task knowledge from visual observation of human performance, in
Proc. of IEEE Trans. on Robotics and Automation, vol. 10, no. 6, pp.
799-822, 1994.
[3] Alissandrakis, C. L. Nehaniv and K. Dautenhahn, Learning how to do
things with imitation, in Proc. Learning How to Do Things, AAAI Fall
Symp. Series, Sea Crest Oceanfront Resort & Conf. Center, North
Falmouth, MA, USA, pp. 1-8, Nov. 3-5, 2000.
[4] H. Mobahi, M. N. Ahmadabadi, B. N. Araabi, Concept Oriented
Imitation, Towards Verbal Human-Robot Interaction, Proc. of IEEE
International Conference on Robotics and Automation, Barcelona,
Spain, April 2005, pp.1507-12.
[5] Lopes, M., Santos-Victor, J. Visual learning by imitation with motor
representations. IEEE Transactions on Systems, Man and Cybernetics,
Part B, Vol. 35, No. 3 (2005) 438-449.
[6] D. Shahmirzadi, COMPUTATIONAL MODELING OF THE BRAIN
LIMBIC SYSTEM AND ITS APPLICATION IN CONTROL
ENGINEERING, MSc thesis, Texas A&M University, USA, 2005.
[7] C. Balkenius, and J. Moren, Emotional learning: a computational model
of the Amygdala, Cybernetics and Systems, 2001.
[8] C. Balkenius, J. Moren, A Computational Model of Emotional
conditioning in the Brain, workshop on Grounding Emotions in
Adaptive Systems, Zurich, 1998.
[9] J. Moren, Emotion and Learning: A computational model of the
amygdala, PhD thesis, Lund University, Lund, Sweden, 2002.
[10] J. Moren, C. Balkenius, A Computational Model of Emotional
Learning in the Amygdala: From animals to animats, Proc. of 6th
International Conference on the Simulation of Adaptive Behavior,
Cambridge, MIT Press, pp.383-391, 2000.
[11] C. Lucas, D. Shahmirzadi, N. Sheikholeslami, Introducing BELBIC:
Brain Emotional Learning Based Intelligent Controller, International
Journal of Intelligent Automation and Soft Computing, pp. 11-22,
2004.
[12] R.M. Milasi, C. Lucas, B. N. Araabi, Intelligent Modeling and Control
of Washing Machine Using Locally Linear Neuro-Fuzzy (LLNF)
Modeling and Modified Brain Emotional Learning Based Intelligent
Controller (BELBIC), Asian Journal of Control, 8 (4), 2005.
[13] R.M. Milasi, M.R. Jamali, C. Lucas, Intelligent Washing Machine: A
Bioinspired and MultiObjective Approach, Accepted in International
Journal of Control, Automation, and Systems.
[14] N. Sheikholeslami, D. Shahmirzadi, E. Semsar, C. Lucas, M. J.
Yazdanpanah, Applying Brain Emotional Learning Algorithm for
Multivariable Control of HVAC Systems, International Journal
of Intelligent and Fuzzy Systems, 17 (1), pp. 35-46, 2006.
[15] A. Gholipour, C. Lucas, D. Shahmirzadi, Purposeful prediction of
space weather phenomena by simulated emotional learning, IASTED
International Journal of Modelling and Simulation, Vol. 24, No. 2,
pp. 65-72, 2004.
[16] D. Shahmirzadi, C. Lucas, R. Langari, Intelligent signal fusion
algorithm using BEL- brain emotional learning, 7th Joint Conference
on Information Sciences, JCIS03, 1st Symposium on Brain-Like
Computer Architecture, September 26-30, Cary, NC, 2003.
[17] R. M. Milasi, C. Lucas, B. N. Araabi, T. S. Radwan, M. A. Rahman,
Implementation of Emotional Controller for Interior Permanent Magnet
Synchronous Motor Drive, IEEE / IAS 41st Annual Meeting: Industry
Applications, Tampa, Florida, USA, October 8-12, 2006.
[18] M. R. Jamali, A. Arami, B. Hosseini, B. Moshiri, C. Lucas, Real Time
Emotional Control for Anti-Swing and Positioning Control of SIMO
Overhead Travelling Crane, Int. Journal of Innovative Computing,
Information and Control (IJICIC), Vol.4, No. 9, pp. 2333-2344,
September 2008.
[19] D. Shahmirzadi and R. Langari, Stability of Amygdala Learning
System using Cell-to-Cell Mapping Algorithm, Journal of Intelligent
system and control, 497-119, (2005)
[20] A. Arami, M. Javan Roshtkhari and C. Lucas, A Fast Model Free
Intelligent Controller Based on Fused Emotions: A Practical Case
Implementation, 16th IEEE Mediterranean Conference on Control and
Automation June 25-27, 2008, Corsica, France.
[21] T. Takagi and M. Sugeno, Derivation of fuzzy control rules from human
operator's control actions, Proc. of IFAC Symp. on Fuzzy Information,
Knowledge Representation and Decision Analysis, pp. 55-60, July 1983.
[22] Feedback Instrument Ltd, Digital Pendulum Control Experiments,
Manual: 33-935/936-1V60, 2002.