
2010 Fourth International Conference on Sensor Technologies and Applications

Clinic: A Service-oriented Approach for Fault Tolerance in Wireless Sensor Networks


Mohammad Hammoudeh, Sarah Mount, Omar Aldabbas, Martin Stanton

Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK
{m.hammoudeh,m.stanton}@mmu.ac.uk

School of Computing and IT, University of Wolverhampton, Wolverhampton, UK
s.mount@wlv.ac.uk

Faculty of Engineering, Al-Balqa Applied University, Al-Salt, Jordan
o.aldabbas@bau.edu.jo

Abstract: With the size and complexity of modern Wireless Sensor Network (WSN) systems, a system's ability to recover from faults is becoming more important. A self-healing system is one that can recover from faults during execution without human intervention. Since WSNs are inherently fault-prone and their on-site maintenance is infeasible, scalable self-healing is crucial for enabling the deployment of large-scale sensor network applications. Previous work has typically dealt with single faults in isolation, has imposed constraints on systems, or has required new protocol elements. In this paper, we attempt to solve some of these problems through the use of service-oriented architecture. We propose a service-oriented self-healing approach, called Clinic, that works with existing network components (e.g. routing protocols) and resources without adding extra overhead on the network. In Clinic, different network capabilities are viewed as services of the network instead of being isolated capabilities of individual nodes. This view of the network promotes collaboration among nodes and information reuse by sharing information collected by one service with other network services. Preliminary evaluation showed that Clinic achieved fault tolerance while keeping communication overhead low by reusing only the information collected by other network services to heal from faults.

Keywords: Service-oriented Architecture; Self-healing; Wireless Sensor Networks

I. INTRODUCTION

Wireless Sensor Networks (WSNs) are a new technology in wireless networking that employs new, complex, and smart methods of communication. WSNs are failure-prone by nature, and since their on-site maintenance is not always possible, it is critical that such networks are equipped with a reliable fault tolerance service to maintain overall network functionality. Fault tolerance is one of the most important factors that need to be considered when designing and implementing
978-0-7695-4096-2/10 $26.00 2010 IEEE DOI 10.1109/SENSORCOMM.2010.98

large-scale WSNs and their applications. Fault tolerance has never been defined clearly, especially in the context of WSNs. Generally speaking, fault tolerance means reliability and availability. Reliability is the fraction of time a system, service, or hardware is able to perform and maintain its intended functionality satisfactorily in exceptional conditions, whereas availability can be defined as the fraction of time a network service, component, application, or hardware is available to the user or to other system components and services. de Souza et al. [1] defined a fault as any kind of defect that leads to an error, where an error corresponds to an incorrect system event, and such an event may lead to a failure. Thus, according to [1], a failure is the (observable) manifestation of an error, which occurs when the system deviates from its specification and cannot deliver its intended functionality.

Faults in WSNs can be classified into three main categories: sensor node faults, network faults, and application faults. Sensor nodes may fail due to software errors (e.g., timing failures [2] or software bugs) or hardware errors (e.g., physical damage, lack of power, or environmental interference). Network faults are mainly due to communication errors. Communication faults are caused by power exhaustion, software and hardware faults, node mobility, etc. Communication faults can have serious effects on the usability of WSNs because the communication service is responsible for timely delivery of messages to the end user. Finally, application faults most often originate from design or programming faults: the programmer may misinterpret a specification or merely make a mistake. A failure in any of the above categories should not block the whole network; instead, fault tolerance methods are used to sustain the network functionality with minimum service

interruption.

Many algorithms have been proposed to deal with different faults in WSNs; however, most of them are presented as isolated algorithms that require special system resources and optimisation. Since all fault types can occur at different rates in every WSN system, there is a need to implement a large number of fault-healing measures in every system. This option is unrealistic and infeasible due to the large resources and effort required to implement, run, and maintain such algorithms. Different algorithms may have conflicting requirements, and thus an architectural design is required to provide a fundamental solution to such design issues. In this paper we advocate service-oriented architecture to implement a comprehensive self-healing service that can reuse most of the approaches proposed in the literature with minimum system optimisation and resource utilisation.

The paper is structured as follows: Section II presents related work, followed by Section III, where a service-oriented architecture for WSNs is presented. Section IV describes the proposed self-healing service. Section V gives a Clinic implementation example, and Section VI presents the application and evaluation of Clinic. Finally, Section VII concludes the paper and highlights future work.

II. RELATED WORK

Coles et al. [3] proposed the Bayesian Network Mobility (BNM) algorithm, which executes when wireless coverage holes appear in WSNs. In its simplest form, the BNM algorithm is a distributed mobility control algorithm that enables sensor nodes to compute their optimum direction of movement with a view to maintaining or increasing the WSN coverage. Nodes move in particular directions according to a predicted probability coverage model that is computed from information obtained through the reconfiguration process. The reconfiguration process is the communication between neighbouring nodes to perform localisation, neighbour discovery, synchronisation, navigation, etc. The accuracy of coverage predictions depends on the reconfiguration rate.
The reconfiguration process involves a considerable amount of communication overhead, which might deplete nodes' energy. Furthermore, the mobility motors incur an additional energy overhead.

In [4], a self-healing scheme for WSNs to guarantee the network quality of service is proposed. The author evaluated an autonomic computing paradigm and some concepts of the IT Infrastructure Library (ITIL) to discover, examine, diagnose, and react to failures. This article by Lanthaler provides an excellent literature survey of autonomic computing paradigms and the ITIL. Self-healing is based on a new component called the autonomic manager, which is implemented in three locations: outside the network, on nodes, and on cluster heads. The autonomic computing approach helps to heal from faults without human intervention in a quasi-distributed manner. However, the centralised

autonomic manager is vital to this approach. Central services can cause communication bottlenecks and deplete nodes' energy, and they might be unsuitable for mission-critical applications, amongst other disadvantages of centralisation. Moreover, results showed that this approach fits event-driven networks only. Finally, the definition and maintenance of policies is a challenging task.

Older fault tolerance techniques are summarised and compared in [5]. Most of the previous work has dealt with single faults in isolation, has imposed constraints on systems (e.g. a particular routing protocol), or has required new network elements (e.g. software radios). These techniques indicate the need for a dynamic, hardware-independent, and application-independent self-healing service. In this paper we propose a service-oriented architecture for detection and diagnosis of faults in WSNs.

There has been sustained research into service-oriented architectures for sensor networks over the last few years. Atlas is a service-oriented sensor platform with middleware based on the concepts of a self-integrative, programmable pervasive space [6]. So-called Atlas nodes expose themselves as services to other components. Each sensor registers its service automatically with the controlling server, and the services can then be invoked. Most of the service functionality is executed on the controlling server, and power consumption is not a major concern. Delicato et al. [7] introduced an approach based on a service-oriented development model, a standard interface to access network data, and a set of configurable service components to support the development of applications and to manage the network behaviour at execution time. Resource Aware Service Architecture (RASA) [8] is another service-oriented architecture that supports software changes by injecting services at runtime. The design of RASA simplifies local collaboration of sensor nodes and adapts software to dynamically changing processes.
Service messages consist of public data and code. The code is installed by the receiver node and is used to process the message's public data. RASA assumes homogeneous sensor nodes. Implementation details and a performance evaluation are not provided. This approach involves increased communication as a result of the regular transmission of code.

The abovementioned research focused on architectures for sensor-level communication in WSNs. Very little work has been done on the architecture and functioning of the sensor node itself. This paper focuses on node-local service composition to provide new functionalities, namely fault detection and diagnosis.

III. SERVICE-ORIENTED ARCHITECTURE FOR WSNs

There has been a consistent effort to change the way certain capabilities are used in WSNs, to simplify and abstract them, turning them into services of the network rather than being the result of coordinating the services of


important in WSN systems that deploy special-purpose nodes, e.g., big motes [10], to provide a specific network functionality. SOA has other advantages that are less related to the design of WSN systems, such as network-addressable interfaces.

In Figure 1, a wireless sensor node is modelled as a set of services. Each node is equipped with a set of sensors and a wireless communication interface. All data collected by the sensors and received over the wireless interface is buffered at a central unit that is accessed by different network services. The data sharing unit of the node enables multiple services and applications to read from, and write to, the data concurrently. For example, a security service monitors the buffered data to detect any anomaly. A service may provide services to user applications or to other node services. For instance, the data aggregation service might be used by the routing service or by the visualisation service to generate maps that can be viewed by the user. In the following, the functionality of each layer of the proposed SOA is summarised.

Figure 1. A service-oriented architecture model of a wireless sensor node.

individual nodes. TinyDB's retrieval of data [9], for example, uses the abstraction of SQL to effectively hide the details of data collection, buffering, and transmission.

A service is a function that is well-defined and does not depend on the context of other services. Service-Oriented Architecture (SOA) promotes an environment for system development where system components are loosely coupled and interoperable. For example, several researchers may develop and deploy different services, e.g. sensor calibration and power management services, in different languages that can collaborate and share information to accomplish a certain task.

SOA has several unique characteristics. The most important aspect of SOA is that it separates a service's implementation from its interface to maximise reuse. SOA supports self-healing because it allows dynamic binding and execution of components at runtime. If a service fails, the node may find and bind to a different service as long as the other service provides the same or similar information. In SOA, services and applications interact through interfaces, which allow varying implementations. Therefore, different services or algorithms implemented in different languages by different vendors can be deployed in one system to achieve a predefined functionality in a collaborative manner. A service supports a set of interfaces. An interface is a protocol and a data format that each potential client service or application understands. SOA also advocates loose coupling between services and applications: the more loosely coupled a component is, the fewer modifications a change in that component will require in client services. Another important benefit of SOA is service composition. Developers compose services into applications and into other services. However, it is difficult to predict future applications and how services will be used in those applications.
Therefore, it is important that interfaces are correct and that future requirements are anticipated when defining the structure of an interface. Service clients and applications do not know the location of a service until the service is actually needed. This is particularly

Data sharing unit: The data sharing unit is a nontrivial component that requires careful specification, deployment, and management. The concurrency model for accessing the data sharing unit is studied by the first and the third authors in [11]; however, work on other issues is still in progress. The data sharing unit can monitor and organise the use of shared data resources, group buffer pools, and maintain a catalogue and a directory of available resources. It also generates reports and traces that are presented for the entire system. The traces show read/write events in chronological order and indicate which service is reporting each event, and the reports show events summarised for the entire system. WSNs have complex data structures as a result of multimodal sensing and distributed applications deployed on heterogeneous nodes. System designers seeking increased agility and reuse through SOA find that making sense of widely distributed and disparate data is a key barrier to reaching the benefits of SOA. To build a successful SOA, designers have to begin with a data abstraction layer that makes sense of an otherwise imperfect data setting. Data abstraction develops the ability to leverage sense data, regardless of its structure, as new, logical schemas that exist only in middleware. The abstraction layer provides a common data layer that system designers can reorganise as needed, instead of making expensive transformations to the sense data or core services.

Messaging: Services use defined protocols that describe how services forward and parse messages using description metadata rather than embedding calls to each other in their source code. Communication protocols increase interoperability between different services developed by different vendors through messages across


defined message channels. This decreases the complexity of the end application, thus allowing application developers to focus on true application functionality.

Services: The service repository, which contains the descriptions of the services, is accessed by service clients to discover the services being provided. The service bus is an SOA integration platform that mediates, connects, and manages interaction between heterogeneous services.

Application: WSN application developers combine and reuse services in the production of applications without knowing the underlying service implementations. The developer associates services in a non-hierarchical arrangement to build an application utilising these sources.

Figure 1 describes the Node Service Layer (NoSL), which provides middleware that serves a set of applications at a higher Application Layer (AL). The NoSL also provides an interface to the Access Layer (AcL), which is the core network resources layer. In addition, there is a layer called the Network Service Layer (NeSL), which is a conceptual service layer within the network architecture. The NeSL is used to establish and organise collaboration amongst nodes.

IV. CLINIC: SELF-HEALING SERVICE FOR WSN

The primary purpose of a WSN is to collect and transmit data, but other capabilities have arisen to support this goal. Just as clustering, routing, and aggregation allow for more sophisticated and efficient use of the network resources, a self-healing service would support other network services and make many more applications possible with little extra effort. For instance, applying the concepts of self-stabilisation was found to be a promising solution for achieving fault tolerance in WSNs where nodes lose synchrony and programs reach arbitrary states [12].
Rather than resolving these types of isolated concerns, in this work the WSN is expected to use exchanged messages not only to produce responses to user queries but also to make use of the data supporting the queries for more effective routing, self-healing, further intelligent data aggregation and information extraction, power scheduling, and other network processes. Following on from such work and moving to a slightly higher level of abstraction, we propose a simple self-healing service called Clinic. The Clinic self-healing service is proposed to demonstrate how SOA design can maximise information reuse and facilitate collaboration between services. The proposed service uses the information made available by other network services, e.g. the routing service. The main objectives of the new self-healing service are detecting faults and isolating them using only the information made available by other network services. Information reuse through interfaces and service composition reduces energy consumption, as composite services can share information instead of recollecting it. This

Figure 2. Self-healing service anatomy.

reduction in the number of exchanged messages improves the network performance and lifetime.

The self-healing model described here is based around the metaphor of a clinic offering diagnosis and treatment to patients. The clinic is the self-healing service, and the patients are services suffering from faults. The concept is to deploy a designated service that provides comprehensive instruction in a particular field or activity detected by the clinic, or reported by various node services, as faulty behaviour. The clinic employs several specialists working in cooperation and sharing the same facilities. Each specialist corresponds to a specific fault model that addresses various faults that may occur in one or more services. Specialists perform critical analysis of the nature of a disease or fault through evaluation of patient history, examination, and review of fault data. The evaluation involves a series of message exchanges with the diseased service to gather a precise description of the nature of the fault. Then, the clinic may contact other node services to aid in finding solutions. In some cases, the local clinic may contact services located on neighbouring nodes to propose solutions. The healing instructions are derived from such an evaluation. For example, if a node loses contact with its cluster-head due to the death or mobility of nodes in the path to that cluster-head, the clinic might instruct the routing service to join a new cluster, or it might instruct the local mobility service to move to a new location to establish a new link with the cluster-head. The decisions taken by the clinic might be based on information collected from other network services such as the localisation service, which knows the number and location of neighbouring nodes. The main advantage of implementing a node-centralised service is the ability to use information collected from various node services to help solve a fault that occurred in an independent service.
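The clinic metaphor can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FaultReport`, `Clinic`, and `link_specialist`, the symptom string, the `neighbour_count` key, and the rule for choosing between the two healing instructions are all hypothetical and chosen only to mirror the cluster-head example in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FaultReport:
    """A fault detected by the clinic or reported by a service (hypothetical)."""
    service: str   # name of the "patient" service
    symptom: str   # e.g. "cluster_head_unreachable"
    details: dict = field(default_factory=dict)


# A specialist is one fault model: it inspects a report together with
# node-local information shared by other services and returns a healing
# instruction, or None when the fault is outside its field.
Specialist = Callable[[FaultReport, dict], Optional[str]]


def link_specialist(report: FaultReport, node_info: dict) -> Optional[str]:
    """Hypothetical fault model for a lost link to the cluster-head."""
    if report.symptom != "cluster_head_unreachable":
        return None
    # Assumed rule: reuse the neighbour count already collected by the
    # localisation service to choose between the two instructions.
    if node_info.get("neighbour_count", 0) > 0:
        return "routing: join a new cluster"
    return "mobility: move to re-establish a link"


class Clinic:
    """Node-local self-healing service that consults its specialists in turn."""

    def __init__(self) -> None:
        self.specialists: List[Specialist] = []

    def register(self, specialist: Specialist) -> None:
        self.specialists.append(specialist)

    def diagnose(self, report: FaultReport, node_info: dict) -> Optional[str]:
        for specialist in self.specialists:
            instruction = specialist(report, node_info)
            if instruction is not None:
                return instruction
        return None  # no fault model matched the report
```

A node would register one specialist per fault model and call `diagnose` whenever a service reports a fault. Keeping each fault model as a separate callable reflects the idea that new specialists can be added without changing the clinic itself.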
The authors believe that utilising node-local information to solve a problem should come before collaborating with neighbouring nodes to solve that problem, although the proposed solution might require collaboration with the surrounding nodes, as in the cluster-head example above. The self-healing service is composed of an uncomplicated


fault healing model, which detects and heals faults. Figure 2 shows the anatomy of the self-healing service. In this figure, we distinguish three layers of the service, namely the Monitor layer, the Fault Model layer, and the Interface layer.

The Monitor layer: The concept of system state is fundamental to self-healing. The service developer has to choose the abstraction of states for the service and define a state machine to track the service's state at that level of abstraction. The execution of the service, or of applications running over it, generates local events that trigger transitions in the state machine. The current state of any service is defined as its local state, and the vector of the local states of all the services in the system is the global state. The main function of the Monitor layer is to keep track of the partial view of the global state necessary for its functionality: fault monitoring and detection. Maintaining only a partial view of the global state, by sending only the required state change notifications to the upper layers of the Monitor, helps to decrease the interruption to the services.

The Fault Model layer: A fault model contains a schematic description of a service fault that accounts for its known properties and may be used for recovering from that fault. The Fault Model layer may contain a number of models, each dealing with one or more faults that may occur in a service. For example, the fault model proposed in [13] could be deployed at this layer.

The Interface layer: The interface is the layer through which services can interact. Each service exposes a very simple interface that receives messages on the inbound channel, processes the messages, generates instructions or events, and passes them to the outbound channel.

V. CLINIC IMPLEMENTATION EXAMPLE

Given the above information (list of neighbours, their locations, and sensing values), each node uses the readings of its surrounding nodes to predict its own reading value ($y'$) using the weighting function defined by the interpolation service. Then, $y'$ is compared to the value actually measured by the node, $y$. It is desirable that the estimate $y'$ minimises the value defined as

$$\epsilon = (y - y')^2$$

The fault model can be defined as follows:

$$\text{if } \epsilon > \tau \implies \text{run } S_c$$

where $\tau$ is a predefined threshold that represents the tolerable level of inaccuracy and imprecision in the system, and $S_c$ is a self-calibration service. The system could employ any suitable calibration approach from the literature, e.g., the approach defined in [16]. While this scheme for self-calibration utilises a number of services, it does not require significant development work.

VI. APPLICATION AND EVALUATION

In order to test the feasibility of using Clinic to provide self-healing to other network services, we apply it to communication faults. This allowed us to verify the benefits in development time and ease of use that programmers would find using our proposed SOA. We have chosen a routing protocol called MuMHR [17], which is described as a reliable communication technique. In MuMHR, self-healing is achieved by each node learning multiple paths to its cluster-head and by the election of cluster-head backup node(s). The self-healing and routing functionalities are separated from each other: the routing functionality is deployed as the routing service, and the fault-tolerance model is moved to the self-healing service. Due to its lack of global knowledge and its distributed manner of operation, MuMHR does not make globally optimal decisions.

During the MuMHR setup phase, Clinic monitors the paths by looking at exchanged packets to build a routing tree that it uses to find alternative routes in case of path failure. Each node uses a very small amount of its memory to maintain the routing tree. When Clinic detects a communication link break, it refers to the routing tree to find an alternative path without causing any delay. This tree can be used to solve other problems, e.g. reducing delay by finding the shortest path, or to serve other services such as the mapping service [18]. Moreover, we define the radio service, which Clinic uses to improve recovery from communication faults.
MuMHR does not use such a service. The radio service uses the radio transmissions that a node overhears in its vicinity to determine its neighbours. When node N1 forwards a message to node N2, which is on the path to the destination, node N1 should hear node N2 forward the message to the next hop, N3, on the path to the final destination. Node N1 will buffer the message

In this section we provide a simple example to demonstrate how Clinic works and how fault models can be defined. Distributed sensor calibration was chosen as a working example because it is an extremely difficult task, especially in large-scale WSNs where individual device calibration is intractable. Suppose that there is a WSN system that has an interpolation capability such as the Inverse Distance Weighted interpolation method [14]. An interpolation service could be deployed to be used by visualisation applications such as that described in [15]. The information made available by the routing and interpolation capabilities of the network is exploited by Clinic to detect errors in sensor measurements and to provide solutions for such errors in a distributed manner. Using the routing service, each node can keep a list of its neighbouring nodes and their relative locations. Due to the nature of broadcast communication in wireless networks, a sensing node can also get a copy of its neighbours' measurements of a certain sensed modality.
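The calibration check in this example can be sketched in a few lines of Python. The sketch assumes inverse distance weighting with a power parameter of 2 (a common default that the paper does not specify), and the names `idw_predict` and `needs_calibration` are illustrative rather than part of Clinic. A node predicts its own reading from its neighbours' positions and readings, and the fault model fires when the squared difference between the measured and predicted values exceeds the threshold.

```python
import math
from typing import Sequence, Tuple

# Each neighbour is (x, y, reading): a relative location and a sensed value,
# as could be gathered from the routing service and overheard broadcasts.
Neighbour = Tuple[float, float, float]


def idw_predict(node_xy: Tuple[float, float],
                neighbours: Sequence[Neighbour],
                power: float = 2.0) -> float:
    """Predict the reading at node_xy by inverse distance weighting.

    power=2.0 is an assumed default, not a value taken from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for nx, ny, reading in neighbours:
        distance = math.hypot(nx - node_xy[0], ny - node_xy[1])
        if distance == 0.0:
            # A co-located neighbour determines the prediction outright.
            return reading
        weight = 1.0 / distance ** power
        weighted_sum += weight * reading
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one neighbour is required")
    return weighted_sum / weight_total


def needs_calibration(measured: float, predicted: float,
                      threshold: float) -> bool:
    """Fault model: trigger self-calibration when the squared error
    between measured and predicted readings exceeds the threshold."""
    error = (measured - predicted) ** 2
    return error > threshold
```

For instance, a node at the origin with two neighbours at equal distance reading 20.0 and 22.0 predicts the midpoint of the two readings. A measured value close to that prediction passes the check, while a value far from it would trigger the self-calibration service.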


NeSL, for collaboration amongst nodes. Such integration of information proved to be powerful enough to enable the development of solutions to various outstanding problems within WSN systems.

We have presented above our very early research into SOA for self-healing in WSNs. Leading directly from the work in this paper, there are a number of avenues that need to be followed. Most importantly, all of the architectural layers defined above need full specification and implementation. Then, this approach needs to be applied to and tested on more services. Finally, the computation and communication requirements are to be evaluated.

Figure 3. DDR of the standard MuMHR routing vs. DDR of MuMHR with Clinic.

REFERENCES

[1] L. M. S. de Souza, "Ft-cowisenets: A fault tolerance framework for wireless sensor networks," in SENSORCOMM '07: Proceedings of the 2007 International Conference on Sensor Technologies and Applications. Washington, DC, USA: IEEE Computer Society, 2007, pp. 289-294.
[2] C. Almeida and P. Verissimo, "Timing failure detection and real-time group communication in quasi-synchronous systems," Real-Time Systems, Euromicro Conference on, vol. 0, p. 0230, 1996.
[3] M. D. Coles, D. Azzi, and B. P. Haynes, "A self healing mobile wireless sensor network using predictive reasoning," Sensor Review Journal, vol. 28, no. 4, pp. 326-333, 2008.
[4] M. Lanthaler, "Self-healing wireless sensor networks," http://www.cs.helsinki.fi/u/niklande/opetus/SemK07/paper/lanthaler.pdf, 2008, [Online; accessed 07-Feb-2010].
[5] L. Paradis and Q. Han, "A survey of fault management in wireless sensor networks," J. Netw. Syst. Manage., vol. 15, no. 2, pp. 171-190, 2007.
[6] J. C. King, "Atlas: a service-oriented sensor and actuator network platform to enable programmable pervasive computing spaces," Ph.D. dissertation, 2007, adviser: Helal, Abdelsalam (Sumi).
[7] D. F, P. L, P. P, and de Rezende J, "Web technologies to build automatic wireless sensor networks," 8th IFIP IEEE international conference on mobile and wireless communication networks, pp. 99-114, 2006.
[8] "Sara: A service architecture for resource aware ubiquitous environments," Pervasive and Mobile Computing, vol. 6, no. 1, pp. 1-20, 2010.
[9] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, "Tinydb: an acquisitional query processing system for sensor networks," ACM Trans. Database Syst., vol. 30, no. 1, pp. 122-173, 2005.
[10] R. Newman and E. Gaura, "Size does matter: the case for big motes," in the proceedings of Nanotech, 2006.
[11] S. Mount, M. Hammoudeh, S. Wilson, and R. Newman, "Csp as a domain-specific language embedded in python and jython," Communicating Process Architectures 2009 (WoTUG), 2009.

until it hears node N2 forward the message to node N3. If an adequate time passes without node N1 hearing the retransmission of its message by node N2, then N1 decides that there is a problem with node N2 and rebroadcasts the message, requesting a different neighbour to forward the message along.

Clinic was simulated using Python-based simulation software called Dingo [19]. Dingo proved to be not only easy to use, but also powerful enough to model and simulate the Clinic service at various design stages. It provides an easy way to develop system models, enabling users to quickly manipulate hardware elements and achieve the desired results without having to build a full hardware prototype. Random graphs were dispersed in a 100 m × 100 m region such that no two nodes share the same location, and the transmission range of each node is bounded to 75 m. A simple model for radio hardware energy dissipation is also assumed. Figure 3 shows that Clinic increases the DDR in the presence of communication failures. The Data Delivery Ratio (DDR) [20] is the ratio of successful distinct payload octets received to attempted payload octets transmitted.

VII. CONCLUSION

In this paper we demonstrated that SOA could provide a single integrated solution for the development of WSN systems. SOA allows the creation of composite self-healing applications using a set of existing services that expose their functionality via standard interfaces. When a number of different services can all share the same structure, and where the relationships between the parts of the structure are standardised, new unforeseen applications that reuse the same set of services are possible with minimum effort. As the number of services increases, failure becomes a critical issue. In order for the system not to fail, it must be self-healing.
Clinic provides a peer-to-peer approach to self-healing by integrating information collected by separate services to determine possible solutions for various faults. It also provides a mechanism, at the


[12] V. Turau and C. Weyer, "Fault tolerance in wireless sensor networks through self-stabilisation," Int. J. Commun. Netw. Distrib. Syst., vol. 2, no. 1, pp. 78-98, 2009.
[13] K. Ozaki, K. Watanabe, T. Enokido, and M. Takizawa, "A fault-tolerant model of wireless sensor-actuator network," Int. J. Distrib. Sen. Netw., vol. 4, no. 2, pp. 110-128, 2008.
[14] D. Shepard, "A two-dimensional interpolation function for irregularly-spaced data," in Proceedings of the 1968 23rd ACM national conference. ACM Press, 1968, pp. 517-524.
[15] M. Hammoudeh, R. Newman, S. Mount, and C. Dennett, "A combined inductive and deductive sense data extraction and visualisation service," in ICPS '09: Proceedings of the 2009 international conference on Pervasive services. New York, NY, USA: ACM, 2009, pp. 159-168.
[16] P. Loden, Q. Han, L. Porta, T. Illangasekare, and A. P. Jayasumana, "A wireless sensor system for validation of real-time automatic calibration of groundwater transport models," J. Syst. Softw., vol. 82, no. 11, pp. 1859-1868, 2009.
[17] M. Hammoudeh, A. Kurz, and E. Gaura, "Mumhr: Multipath, multi-hop hierarchical routing," in SENSORCOMM '07: Proceedings of the 2007 International Conference on Sensor Technologies and Applications. Washington, DC, USA: IEEE Computer Society, 2007, pp. 140-145.
[18] J. Shuttleworth, M. Hammoudeh, E. Gaura, and R. Newman, "Experimental applications of mapping services in wireless sensor networks," in Fourth International Conference on Networked Sensing Systems, June 2007.
[19] S. Mount, "Dingo wireless sensor networks simulator," http://code.google.com/p/dingo-wsn/, 2008, [Online; accessed 26-Feb-2010].
[20] J. Dunn and C. Martin, "Terminology for frame relay benchmarking," in Internet informational RFC 3133, June 2001.

