
Automatic control application recovery in distributed IEC 61499 based automation and control systems

R. Froschauer, F. Auinger, G. Grabmair
University of Applied Sciences Wels
{r.froschauer, f.auinger, gu.grabmair}@fh-wels.at

T. Strasser
PROFACTOR Produktionsforschungs GmbH
thomas.strasser@profactor.at

Abstract
Modern industrial automation systems are expected to execute applications distributed across heterogeneous networks. In addition, the market demands downtime-free operation and modification of the automation and control functions of such systems. Consequently, appropriate concepts for recovering and reconfiguring devices during full operation are needed. The open standard IEC 61499 provides a scalable architecture for modelling applications for such distributed control systems. It supports interoperability, configurability and portability of control applications and therefore provides the basis for online configuration and recovery of heterogeneous systems. The main goal of this paper is to present a concept for the autonomous recovery of applications in distributed systems. The proposed concept facilitates the exchange of devices without any need for extra configuration. It supports the automatic recognition of new components in a distributed automation and control system and the automated upload and download of control applications, user data and configuration data. The introduced approach is tested on an Ethernet-interconnected distributed demonstration system consisting of standard personal computers running the IEC 61499 Function Block Development Kit.

1. Introduction
Today many automation and control systems exhibit an increasing and almost overwhelming complexity. Only highly skilled engineers can cope with these requirements, and therefore the engineering costs for replacing faulty components are increasing as well. Currently many vendors in the field of automation and control focus their research activities on new mechanisms for assembling and maintaining industrial plants without any special programming skills. In the context of distributed systems, the control logic of industrial plants is distributed over several small devices which are interconnected through a flat communication network. The FIT-IT project Crons [1] introduces a middleware approach which enables application-centred development on top of mechatronic devices called Crons. These devices combine the mechanical or electrical functionality with the corresponding control logic. To enable this approach, each mechatronic device (e.g. drives, pneumatic cylinders, linear axes) has to have its own microcontroller executing the Crons middleware. Furthermore, each Cron has a standardized communication interface and a hardware abstraction layer, which allows interaction with the physical part without any special programming skills. Within the next ten years distributed systems are expected to be built from Crons or a comparable technology and will therefore support interoperability, configurability and portability. Furthermore, zero-downtime operation within a real-time execution environment will become a common technology. The first step towards zero-downtime operation is to find an approach for replacing faulty devices without stopping and restarting the whole system [7, 8]. This paper presents an approach for a basic Plug & Work functionality on the basis of IEC 61499.

1.1. IEC 61499 in the context of the Crons approach

IEC 61499 [4, 5] is intended to be the successor of IEC 61131-3 [6], with a special focus on distributed systems. By supporting a standardized communication protocol which enables remote configuration of devices, IEC 61499 offers the possibility of building middleware systems and applications as described above. In order to enable the automatic recovery of faulty devices, several guidelines and requirements have to be identified. Especially in heterogeneous networks a set of standardized interfaces is necessary. The following list presents a minimum set of interfaces [1, 9, 10, 11]:

- Communication interface: This interface defines methods which ensure that all devices, and even all applications, are able to communicate with each other, especially within heterogeneous networks.
- Application execution interface: This interface defines a kind of hardware abstraction layer which enables a device-independent representation of applications. Especially on heterogeneous devices every application should be executable without recompiling (i.e. a middleware approach).
- Application transfer interface: This interface defines methods for transferring applications between different devices and between devices and an engineering environment.

Proceedings of the IEEE Workshop on Distributed Intelligent Systems: Collective Intelligence and Its Applications (DIS06) 0-7695-2589-X/06 $20.00 © 2006 IEEE

IEC 61499 does not define how to implement a recovery system, but it paves the way for new concepts and enables defined interfaces as introduced above [4]. The device-independent representation of automation hardware, applications and function blocks is fundamental for a heterogeneous concept of control application and device recovery. The use of the mentioned interfaces and the capabilities of IEC 61499 open up many possibilities, but with regard to the shortcomings of current fieldbus technologies, IEC 61499 may only be able to provide all the missing features if several extensions are made to the standard [1]. In particular, the configuration management interface has to be extended with commands for device operations, such as identification and storage of configuration data as well as of the operational states of control applications and function blocks. Assuming these changes are made, the communication scheme for organizing the automatic recovery of faulty devices is described in the next section.

2. Communication scheme
As stated in [2, 13], traditional recovery concepts, which are mostly used in storage networks for databases or even control systems, use a client/server architecture. Every client stores its valuable data on a central recovery server, either client- or server-driven. In case of failure the client device can restore the application from the server. For this type of system numerous replica propagation algorithms have been developed. The big disadvantage of this approach is its vulnerability to failures of the central server. Therefore the preferred approach is a multi-master concept, which allows the integration of more than one recovery server in the network. Due to the changed feature set, the server is called recovery master (RM) or master device (MD) in the following. Depending on the requested redundancy, the number of MDs can be increased arbitrarily. This requires a special kind of network communication feature and, of course, management rules to determine the execution behaviour.

The approach described in this paper uses a multiple-access network structure in which every device is able to send messages to and receive messages from every other device (for example using Ethernet or CAN). Each of these network technologies supports a kind of multicast communication, which is used instead of, or in addition to, numerous direct point-to-point connections.

With regard to a software-based implementation, the master device and the slave device (SD) can be represented by a master component and a slave component. The term master device is therefore equivalent to the term device with master component, where the functionality is provided by an additional software component such as a master application. Similarly, the slave device can also be called device with slave component, where the slave component is supposed to be part of each standard device. The whole set of devices is called a recovery group (RG) (see Fig. 1), because this group contains all participants necessary for a system recovery.

[Figure: a recovery group comprising Master 1 and Master 2 (each with Management, Other application 1 to n and a Replication Master), a router, Slave 1 (Redundancy: 2) and Slave 2 (Redundancy: 3) (each with Management, Other application 1 to n and a Replication Slave), and one other device (Management, Other application 1 to n).]
Fig. 1. Definition of a recovery group

The RG approach can be divided into two main operational sequences:

- Device registration process: The device registration process describes the communication between MDs and SDs in the network and how new SDs are assigned to one or more MDs which are responsible for them in case of a recovery. The replica placement strategy used comes close to the well-known client-based replica or pull approach, although some aspects are similar to the push approach.
- Application query and transfer process: The application process describes how a slave application is queried, stored and transferred back to the SD by one or more MDs.
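As an illustration of the recovery group described in this section, the following minimal Python sketch models master devices, slave devices with a requested redundancy, and a multicast that reaches every other member of the group. All class, field and address names are hypothetical, and the in-memory inbox merely stands in for the Ethernet or CAN multicast mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Device:
    """Any member of the recovery group (hypothetical model)."""
    address: str
    # Received messages as (sender address, message name) pairs.
    inbox: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class SlaveDevice(Device):
    """Device with slave component; asks to be stored on `redundancy` MDs."""
    redundancy: int = 1


@dataclass
class MasterDevice(Device):
    """Device with master component; keeps applications of the SDs it serves."""
    slave_list: Dict[str, str] = field(default_factory=dict)


@dataclass
class RecoveryGroup:
    """All devices taking part in a recovery (assumed to share one network)."""
    devices: List[Device] = field(default_factory=list)

    def multicast(self, sender: Device, message: str) -> int:
        """Deliver `message` to every device except the sender."""
        delivered = 0
        for device in self.devices:
            if device is not sender:
                device.inbox.append((sender.address, message))
                delivered += 1
        return delivered
```

A slave would call `multicast` once instead of opening a point-to-point connection to each master; which masters react to the message is decided by the algorithms of section 3.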

3. Recovery group algorithms
The devices use a pre-defined set of messages to communicate with each other. These messages are called

- Master register request
- Slave register request
- Slave update request
- Slave recovery request
- Slave delete request

and are used by several system processes. These processes and the associated algorithms are described in the following subsections.

3.1. Master-component registration algorithm
The master-component registration process describes how devices with master components (i.e. MDs) communicate with other MDs, especially when a new MD is added to the network. Right after all MDs have started up there is no organization among the devices, and no device knows whether it should respond to a message or not. Therefore the main goal is that the MDs automatically build a hierarchy among themselves by using self-generated random numbers, called master-IDs. The master-ID is generated by every MD right after the initialization of the network interface for multicast-capable communication. The master-ID can be any positive integer from 1 to n, where n depends on the highest integer representable on the specific device. After this step a master register request is multicast to the other members of the recovery group. Every other MD receives this message and adds the information about the new device to its own master-list. The message contains the physical address of the new MD, such as an IP address in an Ethernet-based network, and its master-ID. If the message contains information about an MD which is already in the list, its information is updated. Each MD performs these steps, which are also shown in Fig. 2, and can therefore check whether its own master-ID is unique. If an MD detects that its master-ID exists twice, it changes its own master-ID to a new random number and sends a new master register request message. This process may be repeated until every master-ID is unique. With regard to deterministic execution the master-list may be implemented statically, because otherwise the list might grow too large when too many devices are added. The size of the list then has to be defined before start-up. If an MD receives a message with a master-ID equal to -1, the sending device is removed from the receiver's list.

[Figure: flowchart for a device with master application (Management, Other application 1 to n, Replication Master). Steps shown: Startup; Get (random) master-id; Send: Master register request; Wait for message; Receive: Master register request; decision "My own message?"; decision "Master already in list?"; if not in list: Add master info to my list, Set TTLC to sizeof(Masterlist); if in list: Check ID / TTLC, with ID = -1: Delete master, and ID != -1: Update master info in list, increase specific TTLC; Decrease each TTLC.]
Fig. 2. Master-component registration sequence
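The registration rules of section 3.1 can be sketched in Python as follows. This is a simplified, single-process illustration and not the function block implementation: the multicast is replaced by a FIFO queue, the names `MasterDevice`, `on_master_register` and `run_registration` are hypothetical, and the TTLC bookkeeping shown in Fig. 2 is not modelled.

```python
import random
from dataclasses import dataclass, field

REMOVED = -1  # master-ID that asks receivers to drop the sender from their list


@dataclass
class MasterDevice:
    address: str
    max_id: int = 1000  # assumed upper bound n for master-IDs
    rng: random.Random = field(default_factory=random.Random)
    master_id: int = 0  # 0 means "not drawn yet" in this sketch
    master_list: dict = field(default_factory=dict)  # address -> master-ID

    def start(self):
        """Draw a random master-ID and return the register request to multicast."""
        self.master_id = self.rng.randint(1, self.max_id)
        return (self.address, self.master_id)

    def on_master_register(self, sender, sender_id):
        """Handle a master register request; return a new request or None."""
        if sender == self.address:
            return None  # own message, nothing to learn
        if sender_id == REMOVED:
            self.master_list.pop(sender, None)
            return None
        self.master_list[sender] = sender_id  # add or update the entry
        if sender_id == self.master_id:
            # Duplicate ID detected: draw a new one and announce it.
            self.master_id = self.rng.randint(1, self.max_id)
            return (self.address, self.master_id)
        return None


def run_registration(masters):
    """Deliver every request to every master until no request is pending."""
    queue = [m.start() for m in masters]
    while queue:
        sender, sender_id = queue.pop(0)
        for m in masters:
            reply = m.on_master_register(sender, sender_id)
            if reply is not None:
                queue.append(reply)
```

Because each master re-draws and re-announces its ID whenever it sees its own value in a request from another device, the queue only drains once all master-IDs differ, and every master-list then holds the final ID of every other master.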

3.2. Slave-component registration algorithm
After the master-component registration process is completed, the devices with slave components (i.e. SDs) need to register themselves at the RG to establish the relationship between SD and MD. They therefore start sending registration messages with the purpose of being assigned to one or more MDs which are responsible for them in case of failure. Accepting SDs is mainly done by the MDs, very similarly to the master-component registration process. The slave-component registration process works on the basis of lists in each MD containing information about registered SDs. In contrast to the master-list, which contains the same data in each MD, the slave-list contains only those devices which should be served by the specific MD. In case of a slave register request a hierarchy is derived from the master-IDs of the MDs. Depending on the position of each MD in this hierarchy and on the requested redundancy, the information of an SD is kept only by those MDs which have the highest master-IDs. The algorithm (shown in Fig. 3) assumes that the RG contains enough MDs to cope with all redundancy requests and that the MDs are registered with each other. Furthermore each MD has to listen and wait for incoming messages.

If a device wants to register for a recovery, it has to publish a slave register request message which contains its network address, its desired redundancy and an application identifier. Every MD receives this message and checks whether it is allowed to accept the SD registration. On the basis of the master-list and the master-IDs the MD determines its own position within the hierarchy; if this position lies between the highest position and the highest position minus the redundancy, the MD is allowed to accept the SD. The redundancy therefore determines the number of MDs accepting a new SD. After accepting an SD the master-ID is automatically lowered by a defined value or to a random number between the old ID and zero. The change of the ID is published to the RG using an extra master registration message (this process is similar to the master-component registration process described in section 3.1). If the MD receives a message from an SD which is already in its list, the stored information is deleted. Every slave register request therefore causes a new distribution of the SD information across the whole RG; all previously stored information about this SD is deleted to protect the system from version conflicts. After a successful registration of an SD, the MD tries to retrieve the requested application either from the requesting device or from a local repository.

[Figure: flowchart for a device with master application (Management, Other application 1 to n, Replication Master). Steps shown: Send: Master register request; Wait for message; Receive: Slave register request; Count master with higher id; decision "amount < Redundancy" with branches yes (Add slave to my list) and no (Delete slave from list); Decrease master-id; Increase master-id; Publish new master-id with a master registration request.]
Fig. 3. Slave-component registration sequence

3.3. Slave-component update algorithm
This mechanism is used to update existing device information for SDs which are already registered at one or more MDs. The redundancy cannot be changed, because the storage position is not changed. In case of an application update on an SD, this device may send a slave update request message to the RG, which contains the same information as a slave register request message. In contrast to the register request, which would result in a completely new distribution of the device information, the update request forces the MD to update its information, including the stored application data. The RG has no other means of detecting an application update; this message can therefore be used to keep every application up to date and to prevent the RG from delivering old versions to devices.

3.4. Slave-component recovery algorithm
Following the registration and update processes, the slave recovery request starts the prepared recovery process, as shown in Fig. 4. Every MD which has the requesting SD in its list starts to transfer the locally stored application to the requesting device. The multiple connections between the devices are handled on a first come, first served basis. Therefore no additional algorithm is necessary, and the number of MDs transferring an application to an SD can be increased without any need for further configuration. The way in which the application is transferred depends on the implementation and is not covered in further detail in this work.
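The acceptance rule of section 3.2 ("count the masters with a higher ID and accept if that count is below the requested redundancy") and the recovery step described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: all names are hypothetical, the lowering of the master-ID after acceptance is left out, and "first come, first served" is approximated by taking the first offer in list order.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    address: str
    master_id: int
    slave_list: dict = field(default_factory=dict)  # slave address -> application


def may_accept(own_id, all_ids, redundancy):
    """True if fewer than `redundancy` masters have a higher master-ID."""
    higher = sum(1 for other in all_ids if other > own_id)
    return higher < redundancy


def register_slave(masters, slave_address, redundancy, application):
    """Process one slave register request on every master of the group."""
    ids = [m.master_id for m in masters]  # content of the shared master-list
    accepted = []
    for m in masters:
        # A new register request always discards previously stored information.
        m.slave_list.pop(slave_address, None)
        if may_accept(m.master_id, ids, redundancy):
            m.slave_list[slave_address] = application
            accepted.append(m.address)
    return accepted


def recover_slave(masters, slave_address):
    """Collect the offers of all responsible masters and return the first one."""
    offers = [(m.address, m.slave_list[slave_address])
              for m in masters if slave_address in m.slave_list]
    return offers[0] if offers else None
```

With three masters holding the IDs 10, 30 and 20 and a requested redundancy of 2, only the masters with the IDs 30 and 20 store the application; a later recovery request is answered by both of them, and the slave uses whichever transfer arrives first.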

[Figure: Master 1, Master 2 and Master 3 (each with Management, Other application 1 to n and a Replication Master) keep a master-list and a slave-list; the three slave-lists shown contain (Slave 1, Slave 2, Slave 3), (Slave 1, Slave 2) and (Slave 2, Slave 3). Slave 2 (Redundancy: 3) and Slave 3 (Redundancy: 2) each run Management, Other application 1 to n and a Replication Slave. One slave has been replaced by a new one with the same identification. Labelled steps: recovery request, "accepted" or "not accepted" at the masters, application transfer, and first come, first served handling.]
Fig. 4. Slave-component recovery sequence

3.5. Slave-component delete algorithm
There are two ways of deleting an SD entry from the slave-list of an MD:

- Normal device deletion: A normal device deletion simply deletes a specified device information entry from the slave-list of a specified MD. In a distributed system this kind of deletion should be avoided, because the information of the deleted SD is not stored by any other MD. This procedure can therefore result in undefined behaviour of the RG, due to inconsistent information about the desired redundancy and the actual number of slave-list entries.
- Redundancy-keeping device deletion: The second approach solves the problem mentioned above and automatically keeps the redundancy level of an application at the desired value. Using the slave-component registration process described in section 3.2, an SD information entry can be deleted without changing the redundancy level of an application or an SD. The MD generates an alias slave delete request, which is sent to its own message dispatcher. On receiving this message the MD deletes the SD from its list and generates an alias slave register request containing the information about the deleted SD. The MD then lowers its master-ID to zero by sending a new master register request. After this the prepared slave register request message is sent to the RG. By temporarily setting its master-ID to zero, the deleting MD prevents itself from accepting the SD again, without storing the old device information. The other MDs react to the slave register request and distribute the information as in the slave-component registration process. After this procedure the deleting MD generates a new random master-ID and sends another master register request message to tell the other devices that it is accepting messages again.

4. Implementation
The concepts described in sections 2 and 3 formed the basis for the development of a prototype IEC 61499 function block library. This library contains the function blocks for basic communication, such as Replication Master and Replication Slave (see Fig. 5 and Fig. 6), and several further function blocks for transferring, storing and parsing a control application.

[Figure: interface of the ReplicationMaster function block (instance "Master"). Event inputs: INIT, LISTEN, DEL, ID. Event outputs: INITO, RECOVER, DISCOVER, CNFID. Data inputs and outputs of type BOOL, WSTRING and INT, named QI, QO, DEST, MASTERID, LCLAPP, LCLID, DeltaID, MSTDELTA, MSTCACHE, RMTAPP, SLVCACHE, Automatic, MSTLIST and SLVLIST.]
Fig. 5. Replication Master FB

[Figure: interface of the ReplicationSlave function block (instance "Slave"). Event inputs: REGISTER, RECOVER, UPDATE. Event outputs: REGOUT, RECOUT, UPDOUT. Data inputs: QI (BOOL), MGR_ID (WSTRING), REDUNDANCY (INT), APPNAME (WSTRING). Data outputs: QO (BOOL), STATUS (WSTRING).]
Fig. 6. Replication Slave FB

The library has been designed using the Function Block Development Kit (FBDK) and the Function Block Runtime (FBRT) [3], together with some additional Java code. For testing purposes a simple application has been designed which runs on two common Wintel computers; the faulty behaviour of a device is simulated simply by closing the running application. As depicted in Fig. 7, the user has to start the basic master and slave services on the devices. These services may be part of the firmware of an embedded device and be started automatically after power-on. After ensuring that all services are running properly, the user can load the application onto the SD. When the user application is started, the SD automatically registers itself at the RG and one or more MDs accept the register request. Next the MD retrieves the user application from the SD and stores it for a future recovery. If the SD malfunctions, it is replaced by a new, empty SD (empty meaning that only the basic services are included as firmware). The empty device registers itself at the RG and asks for a possible recovery. If one or more MDs hold an appropriate application, the recovery process is started: the application is transferred back to the new SD and started.
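The test scenario described above (register, store, replace the device, recover) can be walked through with a small Python sketch. It is only an illustration with hypothetical names: one master and in-memory objects replace the FBDK/FBRT function blocks, and the application is represented by a plain string.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SlaveDevice:
    identifier: str                    # identification that survives a device swap
    application: Optional[str] = None  # None models an empty replacement device


@dataclass
class MasterDevice:
    stored: Dict[str, str] = field(default_factory=dict)  # identifier -> application

    def on_register(self, slave: SlaveDevice) -> None:
        """Retrieve the user application from the SD and keep it for recovery."""
        if slave.application is not None:
            self.stored[slave.identifier] = slave.application

    def on_recovery_request(self, slave: SlaveDevice) -> bool:
        """Transfer the stored application back if one matches the identifier."""
        application = self.stored.get(slave.identifier)
        if application is None:
            return False
        slave.application = application
        return True


# Usage: the original device registers, fails, and is replaced by an empty one.
master = MasterDevice()
original = SlaveDevice("axis-1", "conveyor-control")
master.on_register(original)
replacement = SlaveDevice("axis-1")
recovered = master.on_recovery_request(replacement)
```

The lookup by `identifier` is what lets the empty replacement device receive the application of its predecessor; a device with a different identifier gets no application from this master.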
[Figure: sequence diagram with the participants User, Development tool, Device (with master application) and two Devices (with slave application). Messages and actions shown: implement system; start master application; start slave application and load user application; register request; check responsibility; query user application; send user application; store user application; remove slave from network; add new slave (with slave application) to network with no user applications on it; recover request; detect existing application; load slave user application; load and parse user application into script; start recovered application.]
Fig. 7. Complete recovery sequence diagram

The concept above only works if the devices provide a unique identifier which determines whether they are appropriate for a recovery.

5. Summary and future work
In this paper we presented an approach for the automatic recovery of distributed control applications based on IEC 61499 function blocks. In a first step the concepts have been implemented with the FBDK and executed on the Java-based runtime environment FBRT. The basic concept has been tested successfully. The next steps in our research cover detailed testing with a larger number of devices. Furthermore, the concepts introduced in this paper are not yet capable of real-time execution. Therefore research in the field of distributed real-time execution and secure communication has to be carried out in the future.

Acknowledgements
This work is supported by the FIT-IT: Embedded System program, an initiative of the Austrian Federal Ministry of Transport, Innovation and Technology (bm:vit), within the Crons project [1] under contract number FFG 808205. Further information is available at www.microns.org. PROFACTOR is a core member of the I*PROMS consortium (www.iproms.org).

References
[1] Micro Holons for Next Generation Distributed Embedded Automation and Control, web page, http://www.microns.org, Profactor GmbH, accessed December 05, 2005.
[2] Robert Spalding, Storage Networks: The Complete Reference, McGraw-Hill & Osborne, 2003.
[3] HOLOBLOC, Inc. - Resources for the new generation of automation and control, web page, http://www.holobloc.com, HoloBloc Inc., accessed December 05, 2005.
[4] Robert Lewis, Modelling control systems using IEC 61499, The Institution of Electrical Engineers, London, 2001.
[5] IEC 61499: Function blocks for industrial-process measurement and control systems, Publication, International Electrotechnical Commission IEC Standard (2005).
[6] IEC 61131-3: Programmable controllers - Part 3: Programming languages, Publication, International Electrotechnical Commission IEC Standard (2003).
[7] Kramer J., Magee J., Dynamic configuration for distributed systems, IEEE Transactions on Software Engineering, 1985.
[8] Wills L.M., Kannan S., Sander S., Guler M., Heck B.S., Prasad J.V.R., Schrage D., Vachtsevanos G.J., An Open Platform For Reconfigurable Control, IEEE Control Systems Magazine, 2001.
[9] Shelton C.P., Koopman P., Nace W., A Framework for Scalable Analysis and Design of System-wide Graceful Degradation in Distributed Embedded Systems, Proceedings of the 8th IEEE International Workshop on Object Oriented Real-Time Dependable Systems, 2003.
[10] Garcia H.E., Ray A., Edwards R.M., A reconfigurable hybrid supervisory system for process control, Proceedings of the 33rd Conference on Decision and Control, Lake Buena Vista, 1994.
[11] Guler M., Clements S., Wills L.M., Heck B.S., Vachtsevanos G.J., Transition Management for Reconfigurable Hybrid Control Systems, IEEE Control Systems Magazine, February 2003.
[12] Feiler P., Li J., Consistency in dynamic reconfiguration, Proceedings of the 4th International Conference on Configurable Distributed Systems, 1998.
[13] Tanenbaum A.S., van Steen M., Distributed Systems: Principles and Paradigms, Prentice Hall, New Jersey, ISBN 0-13-088893-1, 2002.
