This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts
for publication in the IEEE GLOBECOM 2005 proceedings.
Clustering-Based Correlation Aware Data Aggregation for Distributed Sensor Networks
Ramanan Subramanian, Hossein Pishro-Nik and Faramarz Fekri
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, Georgia 30332-0250
Emails: {ramanan, hossein, fekri}@ece.gatech.edu
Abstract— Temporal and spatial correlation in the sensed data in Wireless Distributed Sensor Networks gives room for better energy efficiency in the network. Several data aggregation schemes have been suggested in the literature. However, a clear-cut solution that quantitatively describes the most energy-efficient routing scheme is still lacking. In this paper, we propose a novel, generalized clustering-based aggregation scheme, called "Annular Slicing-based Clustering (ASC)", and show that by varying the cluster size and the distribution of clusters in the deployment area, one can approach the most energy-efficient aggregation scheme. Analytical expressions for the optimal cluster size and distribution are derived for a specific correlation model and a cost function based on the Euclidean distance traversed by the transmitted data. With the help of numerical simulation, it has been found that the proposed aggregation technique can achieve optimality over a wide range of correlation values.

I. INTRODUCTION

Recent advances in wireless communication and networking, as well as in hardware technology such as nanotechnology and MEMS, have opened up the potential for large-scale applications of Distributed Sensor Networks in fields such as defense, environment sensing/tracking and power system monitoring, wherein the sensor nodes coordinate to accomplish specified sensing tasks. The foremost task of the Sensor Network in any of its applications is to jointly collect and transmit back the sensed data in response to query requests by the sink(s). However, to be practically viable, any application involving Sensor Networks should also take into account the node constraints, namely the limited availability of energy, computational power and memory. Hence, much emphasis has recently been laid on the design of efficient routing protocols for such networks.

By nature, physical phenomena are spatially and temporally correlated. This spatial correlation results in redundancy in the data transmitted back to the sink node. This gives room for improving the energy efficiency of the network by compressing the incoming data at key "junction" nodes (thus eliminating all redundancy) before further transmission towards the sink node. This technique is termed Data Aggregation, Data Fusion or Routing with Compression. In general, it is more efficient than locally optimal techniques such as shortest-path routing at individual nodes. The primary focus of this paper is to provide an energy-efficient clustering-based aggregation scheme for a random deployment of sensor nodes, to route sensed data to the sink node. We consider a large number of sensor nodes randomly deployed in a circular field. We then divide the region into several layers, and divide each layer into sectors, thus defining the cluster boundaries.

Using this scheme, we quantify the "cost" of transmission by a metric based on the number of bits required for each hop and the Euclidean distance involved in the hop. Hence, our problem of finding the optimum transmission structure results in a non-linear optimization problem with certain constraints.

The rest of the paper is organized as follows. In Section II we present a brief introduction to some of the work related to our own. In Section III we discuss the assumptions on the network we work with and the analytical models we use for the energy-metric cost function and the correlation model. In Section IV we formulate the problem mathematically and show how we arrive at a non-linear optimization problem. In Section V we describe a scheme from [1] and relate it to our setup. In Section VI we propose our clustering scheme for the problem. In Section VII we discuss the performance of our ASC scheme in various settings, present the results from numerical simulation, and show analytically that our method improves upon the scheme of [1]. Section VIII summarizes our conclusions and the contributions of this work.

II. RELATED WORK

It is well established in the literature that, due to the inherent spatial correlation in the sensed data, aggregation (also called Data Fusion) techniques have to be incorporated into the routing protocols. Several aggregation-based data gathering techniques have been suggested. In [2], two strategies for optimal rate allocation using entropy-based coding, such as Slepian-Wolf coding, have been suggested, as well as algorithms to optimize transmission using minimum energy. In [3], Networked Slepian-Wolf is described in detail, and the authors show that finding an optimal transmission structure for an arbitrary traffic matrix is NP-hard. In the same article, an approximate optimization algorithm for rate allocation in the single-sink data gathering case is provided.

In [4], a clustering-based scheme for aggregation with a correlation model similar to ours was analyzed, and it was shown that there exists a single cluster size for which near-optimal aggregation can be achieved for a wide range of correlation coefficients. Here, it was assumed that all the sources are located inside clusters, each of which is at D hops
IEEE Globecom 2005 3253 0-7803-9415-1/05/$20.00 © 2005 IEEE
away from the sink node. In contrast with [4], however, we assume a case wherein sensors are randomly "sprinkled" in the deployment area. Also, the Euclidean distance traversed by the transmitted bits is incorporated within the cost metric, which, to our knowledge, is very realistic. This results in a complicated non-linear optimization problem. Using heuristics, we then arrive at a scheme with which, we argue, near-optimal aggregation can be achieved.

In [1], a similar scheme wherein the deployment area is divided into annuli and sectors was considered. It was shown that the problem of finding the minimum-cost routing scheme reduces to the problem of finding the minimum-weight Steiner Tree on the network for a random choice of k vertices (corresponding to source sensors). The Steiner Tree problem can be summarized as follows [5]:

Let $G = (V, E)$ be a complete graph on $n$ vertices and $m : E \to \mathbb{R}$ be a metric. Then, for a given choice of $S \subseteq V$, we need to find a subtree $T \subseteq G$ spanning the vertex set $S$ such that $\sum_{e \in E(T)} m(e)$ is minimum. In general, solving a Steiner Tree problem on a graph $G$ is NP-hard, even for a 95/94-approximate solution [6], [7]. A Steiner-Tree-like solution proposed for the above problem is termed the "Semantic Correlation-aware Tree" (SCT) in [1]. The authors of [1] analyzed the case of perfect correlation in the sensed data, i.e., correlation coefficient $\rho = 1$, and showed that this solution has the same order of magnitude as the globally optimal solution. However, we show that this optimality property breaks down when the correlation coefficient is strictly below 1. By a judicious choice of the clustering scheme, we show that our methodology achieves a solution closer to the optimum than the one described in [1].

III. ASSUMPTIONS AND MODELS

Here, we present an abstract, general model of a distributed sensor network, enabling our formulation to work for a wide range of applications. We assume a model similar to the one in [1]. The deployment area is a circular region of unit radius, with a single sink located at the center. $n$ sensor nodes are placed uniformly at random in the deployment area, i.e., the expected number of nodes in any region of area $A$ inside the circle is $\frac{A}{\pi} n$.

Fig. 1. Distribution of Sensor Nodes in the deployment area. (Figure legend: Source Node, Sensor Nodes.)

The sink node broadcasts queries to the source nodes, requesting the sensed data. It is also assumed that, on average, $k$ sensors respond to any single query, and that these $k$ sensors are uniformly distributed at random in the circular area. Hence the expected number of source nodes in any region of area $A$ is $\frac{A}{\pi} k$. We assume that all the sensor nodes are identical, and so is the entropy of each source.

A. Correlation Model

Here, we describe a model for the joint entropy of a collection of sources that is similar to the one in [4]; the model in [4], however, is based on empirical results. Here, the joint entropy depends upon a correlation parameter $\rho \in [0, 1]$. In general, $\rho$ depends upon the conditions of the environment during sensing and upon the nature of the sensed data. We assume that $\rho$ can be estimated and is constant during one full query-response correspondence. Let $H_0$ be the entropy of any single source node, with a "correlated part" amounting to entropy $\rho H_0$ and an "uncorrelated part" amounting to $(1-\rho)H_0$. The joint entropy of any two sources is calculated thus:
1) Uncorrelated part $= 2(1-\rho)H_0$,
2) Correlated part $= \rho H_0$.
Hence, the joint entropy is $H_0 + (1-\rho)H_0$. In general, if there are $q_1, q_2, \ldots, q_j$ sources at each of $j$ levels of aggregation, then the joint entropy is given by
$$H_0\{1 + (q-1)(1-\rho)\}, \quad \text{where } q = \sum_{i=1}^{j} q_i.$$

B. Cost Function

It is evident that the solution to our general problem is also a Steiner tree. We need to define the cost of each edge. Let $\xi_1, \xi_2, \ldots, \xi_k$ be the locations (i.e., 2-D vectors) of the nodes responding to a query in the deployment area. Define $G = (V, E)$ to be the complete graph on $V = \{\xi_1, \xi_2, \ldots, \xi_k\}$. Then, define the metric $M$ mapping $E$ to $\mathbb{R}$ as
$$M(e) = \|u - v\| \, H_e,$$
where $e = \{u, v\}$ and $H_e$ is the expected entropy of the part of the query response that originates at the end of $e$ farther from the origin. We choose this metric because it gives a more sophisticated measure of the energy spent in the network: distances are taken into account along with the number of hops, unlike previous work such as [4], which does not take the actual distances into account.

Let $T = (V_T, E_T)$ be a minimum Steiner-Tree solution to our problem. Then the cost function to be minimized is
$$C = \sum_{e \in E_T} M(e). \qquad (1)$$
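To make the correlation model and the edge metric concrete, the following is a minimal Python sketch that evaluates (1) for a tree that is already given; finding the minimum Steiner tree itself (NP-hard) is not attempted. It rests on assumptions not stated in the paper: each edge is stored as a child-to-parent link pointing towards the sink, and $H_e$ is taken to be the joint entropy, under the model of Section III-A, of all sources in the subtree below the child end, i.e., data are assumed to be fully aggregated before crossing an edge. The names `joint_entropy` and `tree_cost`, and the example nodes, are hypothetical.

```python
import math


def joint_entropy(q, rho, h0=1.0):
    """Joint entropy of q sources: H0 * {1 + (q - 1)(1 - rho)}; 0 if q == 0."""
    if q == 0:
        return 0.0
    return h0 * (1.0 + (q - 1) * (1.0 - rho))


def tree_cost(pos, parent, sources, rho, h0=1.0):
    """Evaluate C = sum over edges of ||u - v|| * H_e for a given tree.

    pos     -- dict: node -> (x, y) position
    parent  -- dict: child -> parent (towards the sink); the sink has no entry
    sources -- set of nodes responding to the query
    Assumption: H_e is the joint entropy of all sources in the subtree
    below the child end of the edge (data fully aggregated per edge).
    """
    # Number of sources in the subtree rooted at each node.
    below = {v: 0 for v in pos}
    for s in sources:
        v = s
        while True:
            below[v] += 1
            if v not in parent:  # reached the root (the sink)
                break
            v = parent[v]

    cost = 0.0
    for child, par in parent.items():
        length = math.dist(pos[child], pos[par])  # Euclidean ||u - v||
        cost += length * joint_entropy(below[child], rho, h0)
    return cost


# Hypothetical example: sink at the origin, relay "a", sources "b" and "c".
pos = {"sink": (0.0, 0.0), "a": (0.5, 0.0), "b": (1.0, 0.0), "c": (0.5, 0.5)}
parent = {"a": "sink", "b": "a", "c": "a"}
print(tree_cost(pos, parent, {"b", "c"}, rho=1.0))
print(tree_cost(pos, parent, {"b", "c"}, rho=0.0))
```

In this example every edge has length 0.5, so the cost is 1.5 for ρ = 1 (the relay-to-sink edge carries entropy 1) and 2.0 for ρ = 0 (it carries entropy 2), which shows how correlation reduces the cost of the shared edge only.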
IV. PROBLEM FORMULATION

Let the deployment area be divided into circular annuli of radii $\{r_j\}_{j=1}^{m}$, where $r_j$ is the outer radius of the $j$th annulus from the sink. Each annulus corresponds to a level of aggregation. Furthermore, the $j$th annulus is divided into $S(j)$ equi-angular sectors. The sensors within a sector of an annulus form a cluster. The sensor at the geometric center of the inner circular arc of the sector declares itself a "Steiner Node" [1]. The other nodes in the cluster try to transmit the incoming message (or the sensed data itself, if the node is a source) through the shortest path to the Steiner Node of that cluster using Greedy Perimeter Stateless Routing (GPSR) [1], [8].

We note that the expected number of source nodes in any cluster is equal to $k(r_j^2 - r_{j-1}^2)/S(j)$. The point of a sector farthest from its Steiner Node is an outer corner, at angular offset $\pi/S(j)$, so by the law of cosines the worst-case distance traveled by a message from a source to the corresponding Steiner Node is
$$f(j) = \sqrt{(r_j - r_{j-1})^2 + 4 r_j r_{j-1} \sin^2\frac{\pi}{2S(j)}}. \qquad (2)$$
Note that there are $S(j)$ Steiner Nodes in the $j$th aggregation level. Thus, assuming each source sensor has unit entropy, the total entropy $H(j)$ at the $j$th level, given by the sum of the joint entropies of the Steiner Nodes, is
$$H(j) = \rho S(j) + k(1-\rho)\left(1 - r_j^2\right) + k\left(r_j^2 - r_{j-1}^2\right), \quad j = 1, \ldots, m. \qquad (3)$$

Hence, the problem of determining the minimum-cost routing scheme reduces to the following non-linear optimization problem. Find $m$ and the sequences $\{r_j\}_{1}^{m}$, $\{S(j)\}_{1}^{m}$ such that the upper bound
$$C \le \sum_{j=1}^{m} H(j) f(j) \qquad (4)$$
is minimum, subject to the constraints
$$m \in \mathbb{Z}^{+}_{*}, \qquad (5)$$
$$S(j) \in \mathbb{Z}^{+}_{*}, \quad \forall j \in \{1, 2, \ldots, m\}, \qquad (6)$$
$$0 = r_0 < r_1 < r_2 < \cdots < r_{m-1} < r_m = 1. \qquad (7)$$

For example, the solution proposed in [1], called the Semantic Correlation-aware Tree (SCT), is a feasible solution for our approach, wherein $r_j = \frac{j}{m}$ and $S(j) = \frac{2j-1}{m^2}\frac{k}{s}$. In other words, the clustering is based on a division of the deployment area into annular rings of equal width, and each sector has a certain expected number $s$ of sources, from which data is routed to the Steiner Node at the center of the sector's inner arc. The choice of the sequence $S(j)$ is legitimate for this choice of $r_j$, since it ensures that the clusters are equal in area. The optimal $m$ was determined for the case $\rho = 1$. The message complexity of this solution was also computed and, using graph-theoretic results, it was shown that this solution has the same order of cost as the optimal solution (determining which is NP-hard), namely $O(\sqrt{k})$. The constant factor was found to be $\frac{2}{3}\sqrt{3\pi}$ when $\rho = 1$.

The total entropy of the above scheme when $\rho < 1$ can be determined as follows.

V. ANALYSIS OF THE SCT AS A FUNCTION OF THE CORRELATION FACTOR

We first derive an expression for the total entropy at the $i$th level. At this level, the expected number of covered source nodes is $\frac{m^2 - i^2}{m^2} k$, and the number of Steiner Nodes that are covered is $\frac{2i-1}{m^2 s} k$. We assume that the sensor network uses Greedy Perimeter Stateless Routing (GPSR) [4] to transmit messages from a source to the next level. Let $f(s, m)$ be the average path length from a source node to a Steiner node. The total entropy at this level is given by
$$H(i) = \frac{2i-1}{m^2} k + \rho \frac{2i-1}{m^2 s} k + (1-\rho)\frac{m^2 - i^2}{m^2} k. \qquad (8)$$
Hence the cost function is
$$C \le \sum_{i=1}^{m} f(s,m) H(i) \qquad (9)$$
$$= k f(s,m)\left[1 + \frac{\rho}{s} + (1-\rho)\frac{4m^2 - 3m - 1}{6m}\right]. \qquad (10)$$
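The generic bound (4) is straightforward to evaluate for any candidate choice of $m$, $\{r_j\}$ and $\{S(j)\}$. The Python sketch below is a minimal illustration, not the authors' code: it implements (2) and (3) with unit source entropy, sums their product as in (4), and builds the SCT parameters $r_j = j/m$, $S(j) = (2j-1)k/(m^2 s)$. Because it uses the worst-case distance $f(j)$ of (2) rather than the average path length $f(s,m)$ of (9), it checks only the entropy part of (10): under the SCT parameters the level entropies of (3) coincide with (8), and their sum is $k[1 + \rho/s + (1-\rho)(4m^2-3m-1)/(6m)]$. All function names are hypothetical, and $S(j)$ is not rounded, so constraint (6) holds only when the parameters divide evenly.

```python
import math


def worst_case_distance(r_in, r_out, sectors):
    """Eq. (2): from the Steiner Node (midpoint of the inner arc) to an
    outer corner of a sector spanning 2*pi/sectors radians."""
    half = math.pi / (2 * sectors)
    return math.sqrt((r_out - r_in) ** 2 + 4 * r_out * r_in * math.sin(half) ** 2)


def level_entropy(j, radii, S, k, rho):
    """Eq. (3) with unit entropy per source; radii[0] = 0, radii[m] = 1."""
    r_prev, r_j = radii[j - 1], radii[j]
    return (rho * S[j]                        # correlated part, once per cluster
            + k * (1 - rho) * (1 - r_j ** 2)  # uncorrelated parts from outer levels
            + k * (r_j ** 2 - r_prev ** 2))   # raw data of sources in annulus j


def cost_bound(radii, S, k, rho):
    """Right-hand side of Eq. (4): sum over levels of H(j) * f(j)."""
    m = len(radii) - 1
    return sum(level_entropy(j, radii, S, k, rho)
               * worst_case_distance(radii[j - 1], radii[j], S[j])
               for j in range(1, m + 1))


def sct_parameters(m, k, s):
    """SCT choice of [1]: r_j = j/m and S(j) = (2j - 1) k / (m^2 s).
    S[0] is an unused placeholder so that S[j] matches S(j)."""
    radii = [j / m for j in range(m + 1)]
    S = [None] + [(2 * j - 1) * k / (m * m * s) for j in range(1, m + 1)]
    return radii, S


radii, S = sct_parameters(m=4, k=400, s=5)
total_H = sum(level_entropy(j, radii, S, 400, 0.5) for j in range(1, 5))
print(S[1:], total_H)
print(cost_bound(radii, S, k=400, rho=0.5))
```

With $m = 4$, $k = 400$ and $s = 5$ the sector counts are 5, 15, 25 and 35, and for $\rho = 0.5$ the level entropies sum to 865, which matches $k[1 + \rho/s + (1-\rho)(4m^2-3m-1)/(6m)]$.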
Fig. 2. Division of deployment area into Annuli in the generic scheme. (Figure labels: annulus radii $r_1, r_2, \ldots, r_{n-2}, r_{n-1}, r_n$.)

The drawbacks of the above scheme are evident: this solution is conjecture-based, and it does not describe how the annuli are aligned with respect to each other. A discussion of the performance of this scheme for $\rho < 1$ is also lacking. Moreover, the sequence $S(j)$ is chosen such that messages that have already gone through several levels of aggregation are made to follow a circuitous path towards the sink, in order to facilitate aggregation of "fresh" sensed data from the sources within each cluster. This detour is unnecessary and increases the energy cost. However, in the general framework described in Section IV, we can show that several of the above drawbacks can be avoided by a judicious choice of the $r_j$ and $S(j)$
sequences. Furthermore, within this framework we can investigate the performance of the scheme for $\rho < 1$.

VI. PROPOSED SCHEME - ANNULAR SLICING-BASED CLUSTERING

In the proposed Annular Slicing-based Clustering (ASC), each annulus is divided into the same number $l$ of sectors, and the annuli are oriented in such a way that corresponding sectors overlap. To ensure that the sectors of every annulus have equal area, we need $r_j = \sqrt{j/m}$. This ensures that the aggregation load on each Steiner Node is uniform, and that no Steiner Node is under-utilized or over-utilized.

Further, it is reasonable to expect that the division of clusters in the optimal solution would be "matched" to the radial and angular marginal density functions of the probability density function describing the distribution of the sensors. In this paper, we assume a uniform distribution of sensors. Hence, the probability density function is given by
$$p_{R,\Theta}(r, \theta) = \frac{1}{\pi}, \quad 0 \le r \le 1, \; 0 \le \theta \le 2\pi. \qquad (11)$$
Hence, the marginal pdfs are given by
$$p_R(r) = \int_0^{2\pi} p_{R,\Theta}(r, \theta)\, r \, d\theta = 2r, \qquad (12)$$
$$p_\Theta(\theta) = \int_0^1 p_{R,\Theta}(r, \theta)\, r \, dr = \frac{1}{2\pi}. \qquad (13)$$
This justifies dividing each annulus into sectors uniformly, since the marginal pdf of $\theta$ does not depend on $\theta$. Furthermore, the expected number of nodes within an annulus is proportional to
$$\int_{r_{j-1}}^{r_j} 2r \, dr = r_j^2 - r_{j-1}^2 = \frac{1}{m}, \qquad (14)$$
which is independent of $j$, as desired.

VII. ANALYTICAL AND NUMERICAL RESULTS

We now analyze the performance of the proposed ASC scheme. We first define
$$f(x) = \frac{1}{\sqrt{m}}\sqrt{(2x-1) - 2\sqrt{x}\sqrt{x-1}\cos\frac{\pi}{l}}, \qquad (15)$$
$$MC(r) = \rho l + k(1-\rho)\frac{m-r}{m}\cdot\frac{\sqrt{r} - \sqrt{r-1}}{\sqrt{m}} + \frac{k}{m} f(r). \qquad (16)$$
We observe that $f(r)$, obtained from (2) with $r_j = \sqrt{j/m}$ and $S(j) = l$, bounds the path length for transmitting a message from a sensor to the Steiner node in the $r$th level. Then the cost function is given by
$$C \le \sum_{r=1}^{m} H(r) f(r) \qquad (17)$$
$$= \rho l + \frac{k(1-\rho)}{m\sqrt{m}} \sum_{r=1}^{m} (m-r)\left(\sqrt{r} - \sqrt{r-1}\right) + \frac{k}{m}\sum_{r=1}^{m} f(r). \qquad (18)$$
Now,
$$\sum_{r=1}^{m} (m-r)\left(\sqrt{r} - \sqrt{r-1}\right) = \sum_{r=1}^{m} \left\{ m\left(\sqrt{r} - \sqrt{r-1}\right) - r\left(\sqrt{r} - \sqrt{r-1}\right) \right\} \qquad (19)$$
$$\le m\sqrt{m} - \int_0^m \sqrt{u}\, du \qquad (20)$$
$$= \frac{1}{3} m\sqrt{m}. \qquad (21)$$
By the Cauchy-Schwarz inequality, we have
$$\frac{1}{\sqrt{m}}\sum_{r=1}^{m} f(r) \le \sqrt{\sum_{r=1}^{m} f^2(r)} \le \sqrt{f^2(1) + \int_1^m f^2(x)\, dx} \qquad (22)$$
$$= \sqrt{\frac{1 + \int_1^m \left[(2x-1) - 2\sqrt{x}\sqrt{x-1}\cos\frac{\pi}{l}\right] dx}{m}}, \qquad (23)$$
where
$$\int_1^m \left[(2x-1) - 2\sqrt{x}\sqrt{x-1}\cos\frac{\pi}{l}\right] dx = m(m-1) - \left(m - \frac{1}{2}\right)\sqrt{m(m-1)}\cos\frac{\pi}{l} + \frac{1}{4}\ln\left[(2m-1) + 2\sqrt{m(m-1)}\right]\cos\frac{\pi}{l}. \qquad (24)$$
One can numerically show that the above expression is closely approximated by
$$f\left(\frac{m}{2}\right) = \frac{1}{\sqrt{m}}\sqrt{(m-1) - \sqrt{(m-1)^2 - 1}\cos\frac{\pi}{l}}. \qquad (25)$$
Hence
$$\frac{k}{m}\sum_{r=1}^{m} f(r) \approx k\sqrt{\frac{m-1}{m}\left(1 - \cos\frac{\pi}{l}\right) + \frac{\cos\frac{\pi}{l}}{2m(m-1)}}. \qquad (26)$$
To minimize the cost function, we therefore need to minimize an expression of the form $\alpha\frac{m-1}{m} + \frac{\beta}{m(m-1)}$:
$$\frac{d}{dm}\left[\alpha\frac{m-1}{m} + \frac{\beta}{m(m-1)}\right] = 0 \qquad (27)$$
$$\Rightarrow m = \frac{\sqrt{\alpha+\beta}}{\sqrt{\alpha+\beta} - \sqrt{\beta}}. \qquad (28)$$
The second derivative is
$$\frac{d^2}{dm^2}\left[\alpha\frac{m-1}{m} + \frac{\beta}{m(m-1)}\right] = 2\left[\frac{\beta}{(m-1)^3} - \frac{\alpha+\beta}{m^3}\right], \qquad (29)$$
which, evaluated at the value of $m$ in (28), becomes
$$\frac{2\beta}{(m-1)^3}\left(1 - \frac{\sqrt{\beta}}{\sqrt{\alpha+\beta}}\right) > 0. \qquad (30)$$
Hence, the cost function has a minimum with respect to $m$.
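The stationary point (28) and the convexity argument (29)-(30) can be checked numerically. The Python sketch below is a minimal illustration, assuming the identification $\alpha = 1 - \cos(\pi/l)$ and $\beta = \frac{1}{2}\cos(\pi/l)$ read off from (26); the helper names `coefficients`, `h` and `m_opt` are hypothetical. Note that (28) yields a real number, whereas constraint (5) requires an integer $m$, and that $l \ge 3$ is needed for $\beta > 0$.

```python
import math


def coefficients(l):
    """alpha = 1 - cos(pi/l) and beta = cos(pi/l)/2, as read off from (26)."""
    c = math.cos(math.pi / l)
    return 1 - c, c / 2


def h(m, l):
    """The expression minimised in (27): alpha (m-1)/m + beta / (m (m-1))."""
    alpha, beta = coefficients(l)
    return alpha * (m - 1) / m + beta / (m * (m - 1))


def m_opt(l):
    """Eq. (28): real-valued stationary point of h for a given l >= 3."""
    alpha, beta = coefficients(l)
    root = math.sqrt(alpha + beta)
    return root / (root - math.sqrt(beta))


for l in (10, 25, 50):
    m = m_opt(l)
    # h should be larger on either side of the stationary point.
    print(l, round(m, 1), h(m, l) < h(0.9 * m, l) and h(m, l) < h(1.1 * m, l))
```

Setting the first derivative to zero is equivalent to $\alpha(m-1)^2 = \beta(2m-1)$, whose larger root $1 + (\beta + \sqrt{\beta(\alpha+\beta)})/\alpha$ is algebraically identical to (28); the sketch can be used to confirm this for particular values of $l$.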

Therefore, the optimum number of annuli, i.e., the optimum number of aggregation levels, is
$$m_{opt} = \frac{\sqrt{\alpha+\beta}}{\sqrt{\alpha+\beta} - \sqrt{\beta}},$$
where $\alpha = 1 - \cos\frac{\pi}{l}$ and $\beta = \frac{1}{2}\cos\frac{\pi}{l}$.

Hence the minimum energy cost $C_{min}$ can be obtained as
$$C_{min} \le \rho l + \frac{1}{3}k(1-\rho) + k\sqrt{\frac{m_{opt}-1}{m_{opt}}\left(1 - \cos\frac{\pi}{l}\right) + \frac{\cos\frac{\pi}{l}}{2m_{opt}(m_{opt}-1)}}. \qquad (31)$$
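Since (31) depends on $l$ through the term $\rho l$ and through the path-length term, the optimal $l$ can be located by a one-dimensional search. The sketch below, with hypothetical names and an arbitrary illustrative $k = 300$, evaluates the right-hand side of (31) using the real-valued $m_{opt}$ of (28) and scans integer $l$ from 3 upwards ($l = 1, 2$ give $\beta \le 0$, for which (28) breaks down). It illustrates the trade-off under these assumptions and is not the procedure used to produce the paper's figures.

```python
import math


def c_min_bound(l, k, rho):
    """Right-hand side of (31), with m_opt from (28); requires l >= 3."""
    c = math.cos(math.pi / l)
    alpha, beta = 1 - c, c / 2
    root = math.sqrt(alpha + beta)
    m = root / (root - math.sqrt(beta))
    # beta / (m (m - 1)) equals cos(pi/l) / (2 m (m - 1)) in (31).
    path = k * math.sqrt((m - 1) / m * alpha + beta / (m * (m - 1)))
    return rho * l + k * (1 - rho) / 3 + path


def best_sector_count(k, rho, l_max=200):
    """Integer l in [3, l_max] with the smallest bound (31)."""
    return min(range(3, l_max + 1), key=lambda l: c_min_bound(l, k, rho))


for rho in (0.25, 0.5, 1.0):
    l_best = best_sector_count(k=300, rho=rho)
    print(rho, l_best, round(c_min_bound(l_best, 300, rho), 2))
```

At $\rho = 0$ the $\rho l$ term vanishes, so the penalty on large $l$ disappears and the scan is driven only by the path-length term; an interior optimum in $l$ arises only for $\rho > 0$.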
We can then determine the optimal $l$, the number of sectors into which each annulus is "sliced", that minimizes the energy cost.

The term $\frac{1}{3}k(1-\rho)$ corresponds to the uncorrelated components of the message. The corresponding term in the previous scheme, wherein the annuli are of the same width, is $\frac{2}{3}mk(1-\rho)f_{avg} \ge \frac{2}{3}k(1-\rho)$. Hence the suggested aggregation scheme gives a considerable improvement over the scheme in [1] when $\rho$ is smaller than 1. Using numerical computation for the case $\rho = 1$, we can show that our scheme still improves on [1] by about 33%. A plot of the cost functions in this case is shown in Fig. 3.

By plotting the energy cost as a function of the number of sectors for various $\rho$, we can see how the optimum $l$ behaves as a function of the correlation factor. Whenever the correlation is above 0.5, the optimum value of $l$ does not change much. Hence, $l$ around 25-30 would be near optimal for a wide range of $\rho$. We have assumed in our derivations that the function bounding $C_{min}$ is well-behaved, having only one extremum (with respect to both the number of aggregation levels $m$ and the number of sectors $l$). A plot of this function versus the number of sectors $l$ for various $\rho$ is given in Fig. 4. Clearly, the energy cost function has exactly one minimum in each case. Hence, our assumption that the bounding function is well-behaved is supported by these plots.

Fig. 3. Comparison of the two aggregation schemes for ρ = 1. (Axes: minimum cost function $C_{min}$ versus number of sources $k$; curves: ASC and SCT.)

Fig. 4. Minimum Energy Cost as a function of ρ. (Axes: energy cost versus number of sectors; curves for ρ = 0 and ρ = 1.)

VIII. CONCLUSIONS

In this paper, we have described a general framework for determining energy-efficient data aggregation schemes for Distributed Sensor Networks. Using the formulation of Section IV, we have demonstrated how this framework works, and we have described a near-optimal solution to the problem. These heuristics are also expected to work for other interesting extensions (possibly application-specific), such as sensors distributed non-uniformly at random, or settings where multiple sinks and actors handle query responses. More complicated correlation models for the data could also be considered with the same approach. Our scheme provides a basic framework for handling such generalizations, for which a similar clustering based on annuli and sectors can also be found.

REFERENCES

[1] Y. Zhu and R. Sivakumar, "Enabling efficient aggregation in distributed sensor networks," Technical Report, Georgia Institute of Technology.
[2] B. Beferull-Lozano, R. Cristescu and M. Vetterli, "On network correlated data gathering," in Proc. IEEE INFOCOM, Hong Kong, March 2004.
[3] B. Beferull-Lozano, R. Cristescu and M. Vetterli, "Networked Slepian-Wolf: Theory, algorithms and scaling laws," IEEE Transactions on Information Theory, submitted December 2003.
[4] B. Krishnamachari, S. Pattem and R. Govindan, "The impact of spatial correlation on routing with compression in wireless sensor networks," in Proc. ACM/IEEE International Symposium on Information Processing in Sensor Networks (IPSN), Berkeley, CA, April 2004.
[5] D. R. Dreyer and M. L. Overton, "Two heuristics for the Steiner tree problem," Journal of Global Optimization, vol. 13, pp. 95–106, 1998.
[6] M. Chlebik and J. Chlebikova, "Approximation hardness of the Steiner tree problem on graphs," in Proc. 8th Scandinavian Workshop on Algorithm Theory (SWAT), Springer-Verlag, 2002, pp. 170–179.
[7] S. S. Skiena, The Algorithm Design Manual, Springer-Verlag, 1997, pp. 339–342.
[8] B. Karp and H. T. Kung, "GPSR: Greedy perimeter stateless routing for wireless networks," in Proc. 6th Annual International Conference on Mobile Computing and Networking (MobiCom 2000), Boston, Massachusetts, August 2000, ACM SIGMOBILE.
