1045-9219 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2018.2870890, IEEE
Transactions on Parallel and Distributed Systems
TABLE I
AN ILLUSTRATIVE EXAMPLE OF THE OVERFLOW PROBLEM IN THE CASE OF BINARY DIGITS

  A            b              c
  1 1 1
  1 2 3    (1, 1, 0)^T    (0, 3, 5)^T = (0, 11, 101)^T
  1 4 5

the bit length of c is extended to six. We call it the overflow problem in this paper.

In the overflow problem, bandwidth expansion and redundant computation occur if the data representation formats of the original data and the coded data are mismatched. This is mainly because the existing encoding process (e.g., splitting the original data) does not consider i) how to place the coded data among multiple clouds and ii) the risk of being decoded by the eavesdropper. Clearly, the extended coded data would waste storage space and degrade coding efficiency.

To solve this overflow problem, we develop a systematic design method to calculate the appropriate parameters of a network-coded cloud storage system, such as the size of the encoding matrix. The key idea of the NCSS scheme is to adopt a dynamic-length alphabet representation of network-coded data. The original data are regrouped before the encoding process. A complete encoding procedure and data distribution scheme are jointly designed for secure cloud storage. Our contributions are described as follows.
• Formulate the overflow problem of a network-coded cloud storage system. To the best of our knowledge, the overflow problem for a network coding storage system has not been investigated in the literature yet.
• Propose an overflow-avoidance network coding based secure storage (NCSS) scheme.
• Analyze the minimum storage cost subject to different security levels and derive the upper bound of the amount of encoded data that can be stored in cloud databases to achieve perfect secrecy.
• Provide the design guidelines for the appropriate size of the encoding matrix so that the network coding process can be accelerated.

The rest of this paper is organized as follows. In Section II, we give a literature survey on the related works of network coding storage systems. In Section III, the overflow problem of a network-coded cloud storage system is formulated. In Section IV, we analyze the overflow problem and present the overflow-avoidance NCSS scheme. In Sections V and VI, we analyze the security and storage performances of the proposed scheme, respectively. Section VII shows the experimental results. Finally, we give our concluding remarks in the last section.

II. RELATED WORK
Network coding is a generalized store-and-forward network routing principle. Messages from different source nodes are combined and regenerated at the intermediate nodes according to algebraic encoding. In addition to throughput enhancement [6]–[9] and data robustness [10], the other advantages of network coding are reliability and security.

A. Network Coding for Data Recovery
Network coding can make the data recovery process more efficient, especially in distributed storage systems. In contrast to erasure coding [11], the repaired data fragments in network coding are mixed in the intermediate nodes. Hence, network coding can recover data with a smaller amount of information communicated during a repair process. As proved by [12], the data recovery problem of distributed storage systems can be translated to the routing problem of multicasting networks. A new class of storage codes based on network coding, namely regenerating codes, was proposed in [12]. For coding complexity reduction, the authors of [13] proposed a low-complexity regenerating code using a new form of coding matrix with a small field size.

As in the distributed storage systems, recent studies [14]–[16] demonstrated the feasibility of storing coded data to multiple clouds even with multiple node failures. The authors of [14] applied network coding to optimize the reliability performance of frequently accessed data in cloud storage systems. To simplify the repair procedures, network coding with network structure based on the general erasure codes was shown to reduce the repair traffic significantly [15]. A new type of regenerating code that can reconstruct coded data from multiple failures in batches rather than separately was proposed in [16].

B. Network Coding for Data Security
Network coding can prevent data from being eavesdropped in a wiretap network where a wiretapper can access any one of subsets of wiretap channels [17]. The goal of a secure network coding scheme for a wiretap network is to ensure a wiretapper obtains no information about the original message, while all the legitimate receivers can decode the message. A network coding system was built so that a wiretapper cannot obtain any information [18]. The construction of a secure linear network code for a wiretap network was presented in [19].

For network-coded distributed storage systems, the secrecy capacity was used to quantify the secure storage capacity [20]–[23]. The secrecy capacity is defined as the maximum amount of data that can be securely stored under the perfect secrecy condition. Perfect secrecy means that the eavesdropper cannot obtain any information about the source data. In other words, perfect secrecy requires that the entropy of the plaintext is equal to the conditional entropy of the plaintext given the eavesdropped data. A network coding scheme for approaching the storage upper bound under perfect secrecy was proposed in [20]. Different secure regenerating codes to achieve perfect secrecy against eavesdropping were reported in [21]–[23].

For secure storage over multiple clouds, the authors of [4] proposed a security protection scheme to prevent eavesdroppers from decoding any symbol. In [24], a link eavesdropping problem was investigated in a network-coded cloud storage
system in which transmission links between the local datacenter and its remote backup site are eavesdropped. In the considered link eavesdropping problem, the security level is defined as the probability that coded data cannot be decoded correctly. In addition to eavesdropping attacks, some recent works [25], [26] investigated how to detect when the coded data are modified.

C. Performance Issue of Network Coding
Two major challenges for designing a practical network coding system include i) the computational cost of encoding and decoding and ii) the storage cost of coded data. The destination node can decode the received packet if and only if the coefficient matrix of the packet is full rank. To decrease the probability of receiving linearly dependent packets, the coding parameters including the field size and the encoding matrix size are assumed to be large. However, larger values of the coding parameters lead to higher computational cost [27]. In addition, to decode a received codeword, the destination node requires the coding vector, which results in additional packet overhead. In particular, the computation and storage costs would be severe for a huge number of input packets [28].

To overcome the above issues, it is proposed to separate a large file into a number of small chunks to which the network coding is applied [29]. This design is also used in the network coding storage system in which the information bits are divided into groups (chunks) before encoding. However, it is still an open issue to jointly optimize the design of chunked network codes and the chunk transmission scheme [30].

D. Objective of This Paper
In this paper, we focus on the performance issue of network coding when applying network coding in multiple untrusted clouds. The objective of this work is to develop a systematic design methodology of a network-coded cloud storage system. Similar methodology for the joint coding and placement problem can be found in [31]–[34]. The authors of [31] considered the relations among the clouds during the encoding process and proposed an encoding-aware data placement scheme to achieve throughput gains of encoding operations. An adaptive network coding storage scheme was proposed in [32]. The encoding strategy is adjusted according to the transmission conditions (e.g., packet loss rate). However, the storage cost of the coded data is not considered. In [33], the authors proposed to encode chunks using binary addition and bitwise cyclic shift in order to reduce encoding complexity. It is shown that the optimal tradeoff between storage capacity and repair bandwidth can be achieved. The most relevant one to our work is [34]. It investigated how to store data reliably in multiple clouds and provided the optimal amount of data to be stored in the clouds. The storage cost is shown to be highly affected by the potential number of colluding cloud databases. However, the number of colluding cloud databases in [34] is assumed to be known, which is impractical in many applications. Compared with these previous works, our proposed methodology has the following unique features.
• We investigate the overflow problem in chunked network coding. Through analyses, we show that the number of bits to represent a symbol is an important factor related to the overflow problem.
• Different from the previous work [33] considering binary operation, we extend the performance characterization of chunked network codes using a general finite field.
• The encoding process and the data placement are jointly designed in the proposed network coding framework in consideration of the storage cost as well as security requirements. Extending [34], we consider a probabilistic security model of a network-coded cloud storage system. Our results provide a comprehensive understanding for finding the best combination of coding and storage parameters.

III. SYSTEM MODEL AND PROBLEM SETUP
Now we discuss the coding scheme and define the overflow problem.

A. System Model for NCSS
Consider the original base-$d$ data vector $b = (b_1, \ldots, b_n)^T$, where the elements $b_i$ are independent discrete uniformly distributed integers over $\{0, \ldots, d-1\}$. To securely store $b$ to multiple cloud databases, a network coding scheme that encodes symbols by linear transformation is considered in this paper [4].

Let an $n \times n$ Vandermonde matrix $A$ be the encoding matrix, where $[A_{i,j}] = (a_j^{i-1})$ and $a_i$ are distinct nonzero elements over a finite field $F_q$ for $q = 2^k > n$. Then a cloud user encodes data $c = (c_1, \ldots, c_n)^T = Ab$ and splits the encoded data into $p$ segments. It is assumed that the cloud user can arbitrarily store any piece of the encoded data to any cloud database. Let $\tilde{c}_i$ $(i = 1, \ldots, p)$ be the encoded data vector stored in the $i$-th cloud database. A legitimate user can collect $\tilde{c}_i$ from the cloud databases and obtain the original data by performing $A^{-1}c$.

We consider the security threat from an eavesdropper having infinite computing power and the knowledge of the encoding matrix, but with access to less than half of the cloud databases [4]. The objective of the eavesdropper is to guess the original data. The considered cloud storage system can support different security levels in different databases [35]. Define $P_{e_i}$ as the probability that the $i$-th cloud database is compromised. Also, the cloud user specifies a security requirement $P_u$, which represents the maximum probability that an eavesdropper can guess the original data. Next, we will show the overflow problem when distributing encoded symbols to multiple cloud databases.

B. Overflow Problem
Although a network coding scheme can prevent eavesdroppers from obtaining the information of the original data [1], the length of encoded data in digital format may become larger than the length of the original data. This phenomenon is called overflow in this paper and is formally defined as follows.
Definition 1 (Strictly Non-overflow) Let $l_d(a)$ be the number of digits that represents $a$ in base $d$. A piece of encoded data $c = (c_1, \ldots, c_n)^T$ is strictly non-overflow if and only if $l_d(c_i) \le l_d(b_i)$ for each $i$. Note that the length of the encoded data is equal to that of the plaintext for a strictly non-overflow encoding process.

Definition 2 ($\alpha$-bounded Non-overflow) Let $|\tilde{c}_i|$ denote the number of elements in $\tilde{c}_i$. A piece of encoded data $c = (c_1, \ldots, c_n)^T$ is $\alpha$-bounded non-overflow if and only if
$$\sum_{j=1}^{|\tilde{c}_i|} l_d(c_j) \le |\tilde{c}_i|\, \alpha\, l_d(b_i),$$
for $1 \le i \le p$.

Assume the encoded data are stored in cloud databases randomly. The increasing cost of storage or computation resources can be measured by the extension degree $\alpha = \frac{l_d(c_i)}{l_d(b_i)}$. Table II shows an example for the two different overflow cases with $d = 2$ and $p = 2$. The extension degree is bounded by 3 in case 2, compared to the strictly non-overflow case 1. Note that all the coding operations in the example are performed in Galois field GF($2^3$), constructed with the primitive polynomial $P(x) = x^3 + x + 1$. Table III summarizes the notations used in this paper.

IV. OVERFLOW-AVOIDANCE NCSS SYSTEM
A. Overflow Analysis
Now we analyze the conditions that cause the overflow problem of a network-coded cloud storage system. Then we show how the overflow problem can be avoided by selecting the proper data length in the encoding process. We investigate the conditions of distributing coded data for achieving various security levels. Based on the above analysis, we describe the system design methods of the NCSS scheme.

The encoding parameters in NCSS are related to the overflow problem. To avoid the overflow problem, the encoding parameters can be designed according to the following theorems.

Theorem 1 Let $s_i$ be the number of digits in the base-$d$ plaintext $b_i$ and $2^k$ be the Galois field size of encoding matrix $A$. Then, the NCSS system is strictly non-overflow if $s_i = s = \frac{k}{\log_2 d}$.

Proof: First, we assume that $s_i < \frac{k}{\log_2 d}$. Then, we have
$$\frac{k}{\log_2 d} = k \log_d 2 = \log_d 2^k. \tag{1}$$
Because the coding process deals with integers, we have $s_i \le \log_d(2^k - 1)$. Since $c_i$ is distributed over $\{0, \ldots, 2^k - 1\}$, the maximum number of digits used to represent an encoded element is $l_d(c_i)_{\max} = \log_d(2^k - 1)$. Furthermore, the number of digits in $b_i$ can be represented as $l_d(b_i)$. Thus, we have
$$s_i = l_d(b_i) \le \log_d(2^k - 1) = l_d(c_i)_{\max}. \tag{2}$$
As a result, the length of encoded data can be larger than the length of the original data, and the overflow problem occurs.

Secondly, we assume that $s_i > \frac{k}{\log_2 d}$. Taking exponentiation with base $d$ on both sides, we have $d^{s_i} > d^{\log_d 2^k} = 2^k$ from (1). Since $b_i = d^{s_i}$ contradicts the fact that the maximum value of $b_i$ is $2^k - 1$, $s_i = s = \frac{k}{\log_2 d}$.

Theorem 2 The NCSS system is $\alpha$-bounded non-overflow if $s_i \ge \frac{1}{\alpha} \log_d(2^k - 1)$ for every $i$.

Proof: Since $s_i = l_d(b_i)$ and $l_d(c_i)_{\max} = \log_d(2^k - 1)$, we have
$$
\begin{aligned}
\sum_{j=1}^{|\tilde{c}_i|} l_d(c_j) &\le |\tilde{c}_i| \log_d(2^k - 1) \\
&= \alpha \cdot \frac{1}{\alpha} |\tilde{c}_i| \log_d(2^k - 1) \\
&\le \alpha\, |\tilde{c}_i|\, s_i \\
&= \alpha\, |\tilde{c}_i|\, l_d(b_i).
\end{aligned} \tag{3}
$$

Theorems 1 and 2 provide the criteria for selecting the length of the plaintext. Next, we discuss the relation between the security requirement and the amount of encoded stored data.

Theorem 3 The NCSS system satisfies the security requirement $P_u$ if
$$\sum_{j=1}^{|\tilde{c}_i|} l_d(\tilde{c}_i(j)) \le \sum_{t=1}^{n} l_d(c_t) + \log_d \frac{P_u}{P_{e_i}},$$
for $1 \le i \le p$.

Proof: Without loss of generality, we consider an eavesdropper that can access only one of the two cloud databases. Thus, the probability that an eavesdropper can guess the original data (denoted by $P_g$) is the product of the intrusion probability of the cloud database and the probability of guessing the remaining encoded digits. It follows that
$$
\begin{aligned}
P_g &= P_{e_i}\, d^{-\left(\sum_{t=1}^{n} l_d(c_t) - \sum_{j=1}^{|\tilde{c}_i|} l_d(\tilde{c}_i(j))\right)} \\
&\le P_{e_i}\, d^{\log_d P_u - \log_d P_{e_i}} \\
&= P_u.
\end{aligned} \tag{4}
$$

B. Proposed Scheme
Now we present our proposed overflow-avoidance NCSS scheme with the required security level $P_u$. Our proposed scheme is executed in three steps. First, a dynamic-length alphabet representation of network-coded data is adopted based on Theorems 1 and 2. Second, the original data are preprocessed and regrouped. Third, the regrouped data are encoded and distributed to the distributed cloud databases.

Figure 2 shows the system flow of the proposed overflow-avoidance NCSS scheme. Assume that a cloud user wants to store a single-digit data array $b = (b_1, \ldots, b_m)^T$ with base $d$ to the $p$ cloud databases. We first choose a power $k$ for the field characteristics according to the following condition:
TABLE II
EXAMPLE OF THE DEFINITIONS FOR OVERFLOW PROBLEM

TABLE III
NOTATIONS IN THIS PAPER
Assume that the original data are $b = (0, 0, 1, 0, 1, 1, 1, 0, 1)$ and the encoded data are stored to two cloud databases with $P_{e_1} = 0.5$, $P_{e_2} = 0.25$, and $P_u = \frac{1}{64}$. According to Theorem 1, we have $s = 3$. Hence, the original data are regrouped to (001, 011, 101) in the dynamic-length alphabet representation process. The resulting coded data is (111, 011, 001). Next, from Theorem 3, we can calculate that the maximal numbers of digits that can be stored in the first and the second cloud database are four and five, respectively. As a result, the coded data stored in the first and the second cloud database are 1110 and 11001, respectively.

V. SECURITY ANALYSIS
In this section, we analyze the proposed overflow-avoidance NCSS scheme in terms of security level and storage cost. First, we discuss the issue of enhancing security level from a system design aspect. Then, we derive the upper bound on data size that can be stored in the cloud with unconditional security.

To begin with, from (4) we know that the lower bound of the security requirement $P_u$ is
$$P_{e_i}\, d^{-\left(\sum_{t=1}^{n} l_d(c_t) - \sum_{j=1}^{|\tilde{c}_i|} l_d(\tilde{c}_i(j))\right)} \le P_u. \tag{8}$$
Since $l_d(c_t)$ is proportional to the size of the Galois field, a larger encoding matrix size $n$ and a larger value of the power $k$ of the field characteristics can result in higher security levels. However, enlarging the encoding parameters causes higher coding complexity. Next, we show that the security level can be enhanced to the unconditional security level by storing a certain amount of encoded data in the local machine. In the considered NCSS with an eavesdropper, unconditional security is equivalent to perfect secrecy, which means that the eavesdropper can get no information about the original message [36].

Definition 3 (Perfect Secrecy Criterion [37]) Denote $S$ as the random variable associated with the secret data fragments and $E$ as the random variable associated with the encoded fragments observed by the eavesdropper. Perfect secrecy requires
$$H(S|E) = H(S),$$
where $H(X)$ represents the entropy of a random variable $X$.

In the worst case, an eavesdropper can access the encoded data of all the cloud databases. The following theorem can be applied to specify the maximal amount of encoded data fragments that can be stored in the cloud, while keeping the rest of the data in a local machine to ensure perfect secrecy.

Theorem 4 Assume that $w$-digit secret information is encoded with $(n-w)$-digit data $b$. For both strictly non-overflow and $\alpha$-bounded non-overflow schemes, a cloud user can store at most $\sum_{j=1}^{n} l_d(c_j) - w$ digits of encoded data to the cloud under the perfect secrecy criterion.

Proof: Let $e^{(h)}$ represent a subset containing any $h$ components of vector $e$. We denote $e_{i:j}$ as the subvector formed from the $i$-th to the $j$-th position of vector $e$. The set of rows from the $i$-th to the $j$-th position of matrix $D$ is represented as $D_{i:j}$. In addition, $b_i$ are independent random variables uniformly distributed over $F_q$ with entropy $H(b_i) = H(b)$.

For simplicity, without loss of generality, assume that $t$ contiguous components of the encoded data $c_{p+1:p+t}$ are stored to the clouds. Then we can obtain
$$
\begin{aligned}
H(b^{(w)}) &= H(b^{(w)}|c_{p+1:p+t}) - H(b^{(w)}|c) &\quad (9) \\
&= I(b^{(w)}; c) - I(b^{(w)}; c_{p+1:p+t}) \\
&= H(c) - H(c_{p+1:p+t}) - H(c|b^{(w)}) + H(c_{p+1:p+t}|b^{(w)}) \\
&\le H(c) - H(c_{p+1:p+t}). &\quad (10)
\end{aligned}
$$
In the above equations, (9) holds because of the perfect secrecy criterion and due to the fact that the secret information can be reconstructed if the entire codewords are given. In (10), we have $H(c_{p+1:p+t}|b^{(w)}) - H(c|b^{(w)}) \le 0$ since
$$H(c|b^{(w)}) - H(c_{p+1:p+t}|b^{(w)}) = H(c_{p+t+1:n}|b^{(w)}, c_{p+1:p+t}).$$
Since $b_i$ are i.i.d. random variables, it follows that
$$H(b^{(w)}) = H\left(b_{q(1)}, b_{q(2)}, \ldots, b_{q(w)}\right) = wH(b), \tag{11}$$
where $q(j)$ is the $j$-th element of a random integer sequence ranging from 1 to $n$. Because the encoded data vector $c$ contains the entire information of $b$ at most, we can obtain
$$H(c) \le nH(b). \tag{12}$$
Moreover, an $n \times n$ Vandermonde matrix $A$ is nonsingular [5]. Thus, the eavesdropper can apply Gaussian elimination to obtain the reduced row echelon form of the submatrix $S$, whose elements are $[S_{i,j}] = [A_{i,j}]$ for $p+1 \le i, j \le p+t$. The Eavesdropper Reduced Matrix $M$ can be obtained as
$$
M_{p+1:p+t} = \left[\begin{array}{ccc|c|ccc}
m_1^{p} & \cdots & & & & \cdots & m_n^{p} \\
\vdots & \ddots & & I_t & & \ddots & \vdots \\
m_1^{p+t-1} & \cdots & & & & \cdots & m_n^{p+t-1}
\end{array}\right], \tag{13}
$$
where the other elements of $M$ are the same as those of $A$. Hence, the eavesdropper has $t$ equations to solve $n$ unknown elements. It implies that
$$H(c_{p+1:p+t}) = tH(b). \tag{14}$$
Substituting (11), (14) and (12) into (10), we obtain
$$tH(b) \le nH(b) - wH(b). \tag{15}$$
The above equation shows that we can store at most $n-w$ components of encoded data to the clouds under the perfect secrecy criterion. For the strictly non-overflow scheme, we have only one digit in each component of encoded data. Thus, we can store at most $\sum_{j=1}^{n} l_d(c_j) - w$ digits of encoded data to the clouds, while keeping the remaining $w$ digits in the local machines. However, we may have multiple digits in each component of encoded data for the $\alpha$-bounded non-overflow scheme. Let $e^{(\tilde{h})}$ represent a subset containing any $w$
TABLE IV
EXAMPLE OF ADOPTING OVERFLOW-AVOIDANCE NCSS SCHEME IN STORING ENCODED DATA TO TWO CLOUD DATABASES

  Symbol   Value
  b        (0, 0, 1, 0, 1, 1, 1, 0, 1)
  d        2
  k        3
  s        3
  b'       (001, 011, 101)
  r        3
  n        3
  A        [1 1 1; 1 2 3; 1 4 5]
  c        (111, 011, 001)
  c̃        (1110, 11001)
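The entries of Table IV can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes GF(2^3) arithmetic with the primitive polynomial x^3 + x + 1 stated for the examples, and the helper names `gf_mul`, `encode`, and `digit_budget` are hypothetical. It regroups b with s = 3 (Theorem 1), encodes with the Vandermonde matrix A, and splits the coded digits using the bound of Theorem 3.

```python
import math

PRIM_POLY = 0b1011  # x^3 + x + 1, the primitive polynomial stated for GF(2^3)
K = 3               # the field is GF(2^K)

def gf_mul(x, y):
    """Multiply two GF(2^K) elements: shift-and-XOR, reduced by PRIM_POLY."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x & (1 << K):
            x ^= PRIM_POLY
    return result

def encode(matrix, symbols):
    """Compute c = A b over GF(2^K); field addition is XOR."""
    coded = []
    for row in matrix:
        acc = 0
        for a, b in zip(row, symbols):
            acc ^= gf_mul(a, b)
        coded.append(acc)
    return coded

def digit_budget(total_digits, d, pe, pu):
    """Theorem 3 bound (hypothetical helper): total digits + floor(log_d(pu / pe))."""
    return total_digits + math.floor(math.log(pu / pe) / math.log(d) + 1e-9)

bits = "001011101"                     # b = (0, 0, 1, 0, 1, 1, 1, 0, 1)
s = K                                  # Theorem 1 with d = 2: s = k / log2(d) = 3
symbols = [int(bits[i:i + s], 2) for i in range(0, len(bits), s)]  # regrouped b'
A = [[1, 1, 1], [1, 2, 3], [1, 4, 5]]  # Vandermonde matrix from Table IV
coded = encode(A, symbols)
coded_bits = "".join(f"{c:0{s}b}" for c in coded)

n1 = digit_budget(len(coded_bits), 2, 0.5, 1 / 64)   # first cloud, Pe1 = 0.5
n2 = digit_budget(len(coded_bits), 2, 0.25, 1 / 64)  # second cloud, Pe2 = 0.25
cloud1, cloud2 = coded_bits[:n1], coded_bits[n1:n1 + n2]
print(coded, cloud1, cloud2)
```

Under these assumptions the coded symbols come out as (111, 011, 001) and the two stored segments as 1110 and 11001, matching the last two rows of the table.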
fragmentary components of vector $e$. With at least $n$ unknown digits, knowing $c^{(\tilde{w})}$ cannot help solve $b$. As a result, it follows that
$$I\left(c^{(\tilde{w})}; b\right) = 0. \tag{16}$$
Note that we still have $t$ equations to solve $n$ unknown elements. That is,
$$H(b^{(w)}|c_{p+1:p+t}, c^{(\tilde{w})}) = H(b^{(w)}|c_{p+1:p+t}). \tag{17}$$
Finally, we obtain
$$I\left(c_{p+1:p+t}, c^{(\tilde{w})}; b^{(w)}\right) = I\left(c_{p+1:p+t}; b^{(w)}\right). \tag{18}$$
Consequently, we can select $w$ digits of encoded data from $w$ different components, i.e., select one digit for each component. These $w$-digit encoded data can be stored in the local machines, while the remaining $\sum_{j=1}^{n} l_d(c_j) - w$ digits are stored to the clouds.

VI. STORAGE ANALYSIS
We here analyze the amount of stored encoded data with the security requirement in terms of the probability that an eavesdropper can obtain the original data. This is because only a certain amount of encoded data fragments are stored in the local machines to enhance the security level, as shown in the previous section. As the required security level increases, the amount of encoded data stored at the local site increases.

Let a cloud user keep the length-$l$ encoded data in each encoding operation and store the remaining encoded data to $p$ cloud databases as shown in Fig. 3. We assume all the cloud databases have the same probability of being compromised (i.e., $P_{e_i} = P_e$) and the security requirement is $P_u$, which specifies the maximum probability that an eavesdropper can guess the original message. In addition to the encoded data, the encoding matrix is stored at the local site.

Let $m$ and $\alpha$ be the length of the original message and the number of encoding operations, respectively. In addition to the encoded data, the user needs to keep the encoding matrix for decoding. In the case of strictly non-overflow storage, the storage cost at the local site is a function of the encoding matrix size $n$ and the amount of stored encoded data $l$. As a result, the storage space used to store the encoded data and the encoding matrix at the local site is
$$f(n, l) = n^2 s + \alpha l. \tag{19}$$

Subject to the security requirement $P_u$, the storage cost minimization problem can be expressed as
$$
\begin{aligned}
\min \quad & f(n, l) &\quad (20a) \\
\text{s.t.} \quad & (1 - P_e)^p\, d^{-\alpha l} \le P_u &\quad (20b) \\
& 2 \le n \le 2^k &\quad (20c) \\
& l \le n &\quad (20d) \\
& \alpha \times n \times s = m &\quad (20e) \\
& n, l \in \mathbb{Z}^+, &\quad (20f)
\end{aligned}
$$
where $s$ is defined in Theorem 1. An eavesdropper can guess the original message only if he/she can intrude all the cloud databases and guess the encoded data in the local machine. It is observed that the optimization problem is nonconvex even if we relax the nonconvex constraints $n, l \in \mathbb{Z}^+$. The complete algorithm for solving this optimization problem is given in the Appendix.

Figure 4 shows the optimal parameter setting for the encoding matrix size $n$ versus the original message length $m$ for $d = 2$, $P_e = 0.5$, $p = 3$, and $P_u = 10^{-6}$. As the message length increases, the size of the encoding matrix increases. A smaller encoding matrix size is preferred if the Galois field size is large. Due to the integer constraints in the optimization problem, the encoding matrix size increases in a step-like fashion.

Figure 5 shows the storage cost $f(n, l)$ versus message length $m$ for $d = 2$, $P_e = 0.5$, and $p = 3$. Intuitively, we need more storage for lower $P_u$. However, the storage costs for various $P_u$ are the same when $m$ exceeds a certain threshold. This is because the considered system is in the case of lower bound cost (i.e., $l = 1$). Notably, a larger $k$ can yield a smaller lower bound when $m > 1000$. In general, $k \in [8, 16]$ [38]. For $m < 1000$, it is suggested that $k = 8$; otherwise, $k = 16$.

The amount of stored encoded data $l$ is another important design parameter for the proposed NCSS. In practice, the NCSS system with large $l$ requires a large memory to store all the coding coefficients. Fig. 6 shows the required $l$ under different $P_u$. To achieve a higher security requirement, the user needs to store more encoded data at the local site. In addition, $l$ can be reduced by up to 80% if a large file is encoded. It is observed that the file size plays a bigger role in determining $l$ compared to the Galois field size.

Note that we consider a secure network coding system with no redundancy as in [1], i.e., $n$ input symbols are encoded to $n$ coded symbols, and we need all the $n$ coded symbols to recover the data. As shown in [12], network coding can achieve optimal storage-bandwidth tradeoff in erasure coded-distributed storage systems. The proposed scheme can be
Fig. 3. An illustrative example of a user keeping a certain amount of encoded data at the local site in order to enhance security protection.

Fig. 4. Optimal parameter setting for encoding matrix size versus message length under different Galois field sizes $2^k$. (Horizontal axis: m in KB.)

Fig. 5. Storage cost versus message length for different Galois field sizes $2^k$ and security requirement $P_u$. (Axes: f in bits versus m in bits.)

Fig. 6. The amount of stored encoded data l versus security requirement $P_u$ for different message lengths m. (Axes: l in MB versus $P_u$; curves for k = 8 and k = 16 with m = 50 MB and m = 500 MB.)

VII. EXPERIMENTAL RESULTS
Since the encoding process is performed at local machines, processing delay may be the performance bottleneck. Thus, it is important to investigate the impact of the system design parameters of a secure network coding scheme on its delay performance. To implement the user application and cloud storage, we develop the coding layer and storage layer of NCSS. Each original file is associated with metadata that includes the coding information (e.g., encoding coefficients). The goal of our experiments is to explore the encoding performance of the proposed NCSS in terms of the file encoding time and the storage cost. Our experiments are conducted on a commodity computer with an Intel Core i5 processor running at 2.4 GHz, 8 GB of RAM, and a 5,400 RPM Hitachi 500 GB Serial ATA drive with an 8 MB buffer. Table V shows the parameter settings for the experiments. Note that, in our setting, different cloud databases are geographically separated. Hence, the presented results are equivalent to those with p clouds, each

TABLE V
PARAMETER SETTING

  Parameter                                                      Value
  Original file size                                             2 MB
  Base of b_i (d)                                                2
  Galois field size (k)                                          8 to 16
  Number of cloud databases (p)                                  2 or 3
  Probability of the cloud databases being compromised (P_e)    0.5
  Security requirement (P_u)                                     10^-6
Fig. 7. Processing time versus the number of multiplications for different
Galois fields 2^k. (Axes: Processing Time (Seconds) versus Number of
Multiplications (10^5 times); curves: GF(2^10), GF(2^5).)

Fig. 8. Comparison of processing time between the strictly non-overflow
and the α-bounded non-overflow schemes versus matrix size n with p = 2.
(Horizontal axis: Matrix Size n; curves: strictly non-overflow (k = 8, k = 16)
and α-bounded non-overflow (α = 5; k = 8, k = 16).)

[Figure: Processing Time (minute) versus Power of GF Characteristic k;
curves: strictly non-overflow (n = 15, 31, 127) and α-bounded non-overflow
(α = 5; n = 15, 31, 127).]

We begin by estimating the cost of the basic field operation.
Fig. 7 shows the multiplication processing time of the network
coding storage system with different sizes of Galois field.
Although network coding requires O(n^2) modular
multiplications, we find that the field size affects the
processing time only slightly, which supports our design
methodology of selecting k. Specifically, it indicates that the
security level can be enhanced significantly by selecting an
appropriate value of k at a small computational cost.
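To make the basic field operation concrete, the following is a minimal Python sketch of multiplication in GF(2^k) and of encoding a message vector with an n x n coefficient matrix, which takes n^2 field multiplications. It is an illustration under stated assumptions, not the implementation measured in Fig. 7: the irreducible polynomials, the shift-and-add method, and the names gf_mul, encode and time_multiplications are all hypothetical choices, since the text does not specify how the field arithmetic is realized.

```python
import random
import time

# Irreducible polynomials, one per field size (assumed choices; the paper
# does not state which polynomials its implementation uses).
IRREDUCIBLE = {
    5: 0x25,       # x^5 + x^2 + 1
    8: 0x11B,      # x^8 + x^4 + x^3 + x + 1
    10: 0x409,     # x^10 + x^3 + 1
    16: 0x1100B,   # x^16 + x^12 + x^3 + x + 1
}


def gf_mul(a, b, k):
    """Multiply a and b (both < 2^k) in GF(2^k) by shift-and-add."""
    poly = IRREDUCIBLE[k]
    high_bit = 1 << k
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^k) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & high_bit:
            a ^= poly            # reduce modulo the irreducible polynomial
    return result


def encode(matrix, message, k):
    """Return matrix * message over GF(2^k); uses n*n field multiplications."""
    coded = []
    for row in matrix:
        acc = 0
        for coeff, symbol in zip(row, message):
            acc ^= gf_mul(coeff, symbol, k)
        coded.append(acc)
    return coded


def time_multiplications(k, count, seed=0):
    """Time `count` multiplications of random nonzero elements of GF(2^k)."""
    rng = random.Random(seed)
    pairs = [(rng.randrange(1, 1 << k), rng.randrange(1, 1 << k))
             for _ in range(count)]
    start = time.perf_counter()
    for a, b in pairs:
        gf_mul(a, b, k)
    return time.perf_counter() - start


if __name__ == "__main__":
    for k in (5, 10):
        elapsed = time_multiplications(k, 100_000)
        print(f"GF(2^{k}): {elapsed:.3f} s for 10^5 multiplications")
```

The bitwise loop in this sketch runs up to k iterations per multiplication, so its timings need not match the trend reported in Fig. 7; a table-based implementation, for example, would behave differently.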
To evaluate the computational efficiency of the proposed
NCSS scheme, we conduct an encoding test with the proposed
network coding scheme. Fig. 8 compares the processing
time of the strictly non-overflow and the α-bounded
and file recovery, which is an interesting topic to study further
in the future.

REFERENCES

[1] P. F. Oliveira, L. Lima, T. T. V. Vinhoza, J. Barros, and M. Medard, “Trusted storage over untrusted networks,” IEEE Global Communication Conference, 2010.
[2] H. C. H. Chen, Y. Hu, P. P. C. Lee, and Y. Tang, “NCCloud: A network-coding-based storage system in a cloud-of-clouds,” IEEE Transactions on Computers, vol. 63, no. 1, pp. 31–44, 2014.
[3] F. Chen, T. Xiang, Y. Yang, and S. S. M. Chow, “Secure cloud storage meets with secure network coding,” IEEE Transactions on Computers, vol. 65, no. 6, pp. 1936–1948, 2016.
[4] P. F. Oliveira, L. Lima, T. T. Vinhoza, J. Barros, and M. Medard, “Coding for trusted storage in untrusted networks,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1890–1899, 2012.
[5] A. Klinger, “The Vandermonde matrix,” The American Mathematical Monthly, 1967.
[6] P. Li, S. Guo, S. Yu, and A. V. Vasilakos, “Reliable multicast with pipelined network coding using opportunistic feeding and routing,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 12, pp. 3264–3273, 2014.
[7] W. Qiao, J. Li, and J. Ren, “An efficient error-detection and error-correction scheme for network coding,” IEEE Global Telecommunications Conference, pp. 1–5, 2011.
[8] D. Zeng, S. Guo, Y. Xiang, and H. Jin, “On the throughput of two-way relay networks using network coding,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 191–199, 2014.
[9] Y. Wu and S.-Y. Kung, “Distributed utility maximization for network coding based multicasting: A shortest path approach,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1475–1488, 2006.
[10] C. Fragouli and J. L. Boudec, “Network coding: An instant primer,” ACM SIGCOMM Computer, vol. 36, no. 1, pp. 63–68, 2006.
[11] Y. Hu, H. Chen, P. Lee, and Y. Tang, “NCCloud: Applying network coding for the storage repair in a cloud-of-clouds,” in Proc. of the 10th USENIX Conf. on File and Storage Tech, vol. 1, 2012.
[12] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
[13] S.-J. Lin and W.-H. Chung, “Novel repair-by-transfer codes and systematic exact-MBR codes with lower complexities and smaller field sizes,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 12, pp. 3232–3241, 2014.
[14] Y. Lu, J. Hao, X.-J. Liu, and S.-T. Xia, “Network coding for data-retrieving in cloud storage systems,” International Symposium on Network Coding, pp. 51–55, 2015.
[15] H. Zhang, H. Li, and S.-Y. Li, “Repair tree: Fast repair for single failure in erasure-coded distributed storage systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 6, pp. 1728–1739, 2017.
[16] J. Li and B. Li, “Beehive: Erasure codes for fixing multiple failures in distributed storage systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 5, pp. 1257–1270, 2017.
[17] L. Ozarow and A. Wyner, “Wire-tap channel II,” Advances in Cryptology, pp. 33–50, 1985.
[18] N. Cai and R. Yeung, “Secure network coding,” in IEEE International Symposium on Information Theory, 2002.
[19] N. Cai and R. W. Yeung, “Secure network coding on a wiretap network,” IEEE Transactions on Information Theory, vol. 57, no. 1, pp. 424–435, 2011.
[20] A. S. Rawat, N. Silberstein, O. O. Koyluoglu, and S. Vishwanath, “Secure distributed storage systems: Local repair with minimum bandwidth regeneration,” International Symposium on Communications, Control and Signal Processing, pp. 5–8, 2014.
[21] R. Tandon, S. Amuru, T. C. Clancy, and R. M. Buehrer, “Toward optimal secure distributed storage systems with exact repair,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3477–3492, 2016.
[22] A. Agarwal and A. Mazumdar, “Security in locally repairable storage,” IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 6204–6217, 2016.
[23] K. Huang, U. Parampalli, and M. Xian, “On secrecy capacity of minimum storage regenerating codes,” IEEE Transactions on Information Theory, vol. 63, no. 3, pp. 1510–1524, 2017.
[24] Y.-J. Chen, L.-C. Wang, and C.-H. Liao, “Eavesdropping prevention for network coding encrypted cloud storage systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, pp. 2261–2273, 2016.
[25] H. C. Chen and P. P. Lee, “Enabling data integrity protection in regenerating-coding-based cloud storage: Theory and implementation,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 2, pp. 407–416, 2014.
[26] F. Chen, T. Xiang, Y. Yang, and S. S. Chow, “Secure cloud storage meets with secure network coding,” IEEE Transactions on Computers, vol. 65, no. 6, pp. 1936–1948, 2016.
[27] P. Chau, T. D. Bui, Y. Lee, and J. Shin, “Efficient data uploading based on network coding in LTE-Advanced heterogeneous networks,” IEEE International Conference on Advanced Communication Technology (ICACT), pp. 252–257, 2017.
[28] S. Wunderlich, J. A. Cabrera, F. H. P. Fitzek, and M. Reisslein, “Network coding in heterogeneous multicore IoT nodes with DAG scheduling of parallel matrix block operations,” IEEE Internet of Things Journal, vol. 4, no. 4, pp. 917–933, 2017.
[29] S. Yang and R. W. Yeung, “Batched sparse codes,” IEEE Transactions on Information Theory, vol. 60, no. 9, pp. 5322–5346, 2014.
[30] B. Tang and S. Yang, “An LDPC approach for chunked network codes,” IEEE/ACM Transactions on Networking, vol. 26, no. 1, pp. 605–617, 2018.
[31] R. Li, Y. Hu, and P. P. Lee, “Enabling efficient and reliable transition from replication to erasure coding for clustered file systems,” IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, pp. 1–1, 2017.
[32] J. Li, Y. Liu, Z. Zhang, J. Ren, and N. Zhao, “Towards green IoT networking: Performance optimization of network coding based communication and reliable storage,” IEEE Access, vol. 5, pp. 8780–8791, 2017.
[33] H. Hou, K. W. Shum, M. Chen, and H. Li, “BASIC codes: Low-complexity regenerating codes for distributed storage systems,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3053–3069, 2016.
[34] P. Hu, C. W. Sung, S.-W. Ho, and T. H. Chan, “Optimal coding and allocation for perfect secrecy in multiple clouds,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 388–399, 2016.
[35] M. Barua, X. Liang, R. Lu, and X. Shen, “ESPAC: Enabling security and patient-centric access control for eHealth in cloud computing,” International Journal of Security and Networks, vol. 6, no. 2, pp. 67–76, 2011.
[36] D. Chen, N. Zhang, R. Lu, X. Fang, K. Zhang, Z. Qin, and X. Shen, “An LDPC code based physical layer message authentication scheme with perfect security,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 4, pp. 748–761, 2018.
[37] J. L. Massey, “An introduction to contemporary cryptology,” Proceedings of the IEEE, vol. 76, no. 5, pp. 533–549, 1988.
[38] G. Angelopoulos, M. Médard, and A. P. Chandrakasan, “Energy-aware hardware implementation of network coding,” International Conference on Research in Networking, pp. 137–144, 2011.