Shannon Coding for the Discrete Noiseless Channel and Related Problems

Man DU    Mordecai GOLIN    Qin ZHANG
HKUST

Barcelona, Sept 16, 2009

Overview

This talk's punchline: Shannon Coding can be algorithmically useful.

Shannon Coding was introduced by Shannon as a proof technique in his
noiseless coding theorem. Shannon-Fano coding is what's primarily used
for algorithm design.

Outline

Huffman Coding and Generalizations
Previous Work & Background
New Work
A Counterexample
Open Problems

Prefix-free coding

Let Σ = {σ_1, σ_2, ..., σ_r} be an encoding alphabet.
Word w is a prefix of word w′ if w′ = wu where u is a non-empty word.
A code over Σ is a collection of words C = {w_1, ..., w_n}.

Code C is prefix-free if for all i ≠ j, w_i is not a prefix of w_j.
{0, 10, 11} is prefix-free. {0, 00, 11} isn't.

A prefix-free code can be modelled as (the leaves of) a tree.
[Figure: a binary tree with 0/1 edge labels whose leaves are
w_1 = 00, w_2 = 010, w_3 = 011, w_4 = 10, w_5 = 110, w_6 = 111.]

The prefix coding problem

Let cost(w) be the length, or number of characters, in w. Let
P = p_1, p_2, ..., p_n be a fixed discrete probability distribution (P.D.).

Define cost(C) = ∑_{i=1}^n cost(w_i) p_i.

The prefix coding problem, sometimes known as the Huffman encoding
problem, is to find a prefix-free code over Σ of minimum cost.

This is equivalent to finding a tree with minimum external path-length.
For the example tree with leaf probabilities 1/4, 1/4 at depth 2 and
1/8, 1/8, 1/8, 1/8 at depth 3, the cost is
2 (1/4 + 1/4) + 3 (1/8 + 1/8 + 1/8 + 1/8) = 5/2.

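As a quick illustration of this cost function, here is a minimal Python
sketch (function and variable names are ours, not the talk's):

```python
# Expected cost of a prefix-free code: codeword lengths weighted by
# the symbol probabilities -- cost(C) = sum_i cost(w_i) * p_i.

def code_cost(codewords, probs):
    return sum(len(w) * p for w, p in zip(codewords, probs))

# The example tree: two words of length 2, four of length 3.
C = ["00", "10", "010", "011", "110", "111"]
P = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]
print(code_cost(C, P))   # 2*(1/4 + 1/4) + 3*(4 * 1/8) = 2.5
```
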
The prefix coding problem

Useful for data transmission/storage and for modelling search problems.
Very well studied.

What's known

Sub-optimal codes:
Shannon coding (from the noiseless coding theorem): there exists a
prefix-free code with word lengths ℓ_i = ⌈−log_r p_i⌉, i = 1, 2, ..., n.
Shannon-Fano coding (probability splitting): try to put 1/r of the
probability in each node.
Both methods have cost within 1 of optimal.

Optimal codes:
Huffman 1952: a well-known O(r n log n)-time greedy algorithm
(O(rn)-time if the p_i are sorted in non-decreasing order).

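A short sketch of Shannon coding's length computation (our own code,
assuming a binary alphabet r = 2):

```python
import math

# Shannon coding word lengths: l_i = ceil(-log_r p_i).

def shannon_lengths(probs, r=2):
    return [math.ceil(-math.log(p, r)) for p in probs]

P = [0.4, 0.3, 0.2, 0.1]
L = shannon_lengths(P)
print(L)                             # [2, 2, 3, 4]
# Kraft inequality: a prefix-free code with these lengths exists.
print(sum(2 ** -l for l in L) <= 1)  # True (0.6875 <= 1)
```
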
What's not as well known

The fact that the greedy Huffman algorithm works is quite amazing.
Almost any possible modification or generalization of the original
problem causes greedy to fail. For some simple modifications, we don't
even have polynomial-time algorithms.

Generalizations: min-cost prefix coding

Unequal-cost coding:
Allow letters to have different costs, say, c(σ_j) = c_j.

Discrete Noiseless Channels (in Shannon's original paper):
This can be viewed as a strongly connected aperiodic directed graph
with k vertices (states).
1. Each edge leaving a vertex is labelled by an encoding letter σ, with
   at most one σ-edge leaving each vertex.
2. An edge labelled by σ leaving vertex i has cost c_{i,σ}.

Language restrictions:
Require all codewords to be contained in some given language L.

Generalizations: prefix-free coding with unequal-cost letters

Example: c_1 = 1, c_2 = 2 for letters a and b, giving, as
(p_i, w_i, c(w_i)):
(2/6, aaa, 3), (1/6, aab, 4), (1/6, ab, 3), (2/6, b, 2).

Corresponds to different letter transmission/storage costs, e.g., the
Telegraph Channel. Also to different costs for evaluating test outcomes
in, e.g., group testing.

The size of the encoding alphabet, Σ, could be countably infinite!

Generalizations: prefix-free coding in a Discrete Noiseless Channel

[Figure: a three-state channel S_1, S_2, S_3 with labelled edge costs,
and the code tree rooted at start state S_1 giving, as (p_i, w_i, c(w_i)):
(1/6, aaa, 4), (1/6, aab, 5), (1/6, aba, 4), (1/6, abb, 5), (2/6, b, 3).]

The cost of a letter depends upon the current state.
In Shannon's original paper, k = # states and |Σ| are both finite.

A codeword has both start and end states. In a coded message, a new
codeword must start from the final state of the preceding one.
We need k code trees, each one rooted at a different state.

Generalizations: prefix-free coding with language restrictions

Find a min-cost prefix code in which all words belong to a given
language L.

Example: L = (0 + 1)*1, all binary words ending in 1.
Used in constructing self-synchronizing codes.

One of the problems that motivated this research:
let L be the set of all binary words that do not contain a given
pattern, e.g., 010. There was no previous good way of finding a
min-cost prefix code with such restrictions.

Generalizations: prefix-free coding with regular language restrictions

In this case, there is a DFA M accepting the language L.

[Figure: a four-state DFA S_1, ..., S_4 over {0, 1} for
L = ((0 + 1)* 000)^C = binary strings not ending in 000.]

Erasing the non-accepting states, M can be drawn with a finite # of
states but a countably infinite encoding alphabet (edges labelled by
words such as 1, 0, 01, 0*1, ...).

Note: the graph doesn't need to be strongly connected. It might even
have sinks!

Generalizations: prefix-free coding with regular language restrictions

This can still be rewritten as a min-cost tree problem: unroll the
reduced automaton into an infinite tree whose nodes are labelled by
states and whose edges are labelled by the words 0, 1, 01, 001, ...

[Figure: the reduced automaton and the first levels of the
corresponding tree, rooted at S_1.]

Outline

Huffman Coding and Generalizations
Previous Work & Background
New Work
A Counterexample
Open Problems

Previous Work: Unequal Cost Coding

Letters in Σ have different costs c_1 ≤ c_2 ≤ c_3 ≤ ... ≤ c_r.
Models different transmission/storage costs.

Karp (1961): Integer Linear Programming solution
Blachman (1954), Marcus (1957), Gilbert (1995): heuristics
G., Rote (1998): O(n^{c_r + 2}) DP solution
Bradford et al. (2002), Dumitrescu (2006): O(n^{c_r})
G., Kenyon, Young (2002): a PTAS

Big Open Question: we still don't know if it's NP-hard, in P, or
something in between.

Most practical solutions are additive-error approximations.

Previous Work: Unequal Cost Coding

Efficient algorithms (O(n log n) or O(n)) that create codes which are
within an additive error of optimal: COST ≤ OPT + K.

Krause (1962)
Csiszar (1969)
Cott (1977)
Altenkamp and Mehlhorn (1980)
Mehlhorn (1980)
G. and Li (2007)

K is a function of the letter costs c_1, c_2, c_3, ...
The K(c_1, c_2, c_3, ...) are incomparable between different algorithms.
K is often a function of the longest letter length c_r, a problem when
r = ∞.

All the algorithms above are Shannon-Fano type codes; they differ in
how they define the approximate split.

Previous Work

The Discrete Noiseless Channel: the only previous result seems to be
Csiszar (1969), who gives an additive approximation to the optimal
code, again using a generalization of Shannon-Fano splitting.

Language constraints:

1-ended codes:
Capocelli et al. (1994), Berger, Yeung (1990): exponential search
Chan, G. (2000): O(n^3) DP algorithm

"Sound of Silence" (binary codes with at most k zeros):
Dolev et al. (1999): n^{O(k)} DP algorithm

General regular language constraint:
Folk theorem: given a DFA with m states accepting L, an optimal code
can be built in n^{O(m)} time. (O(m) ≈ 3m.)

No good efficient algorithm known.

Previous Work

Pre-Huffman, there were two sub-optimal constructions for the basic case.

Shannon coding (from the noiseless coding theorem): there exists a
prefix-free code with word lengths ℓ_i = ⌈−log_r p_i⌉, i = 1, 2, ..., n.

Shannon-Fano coding (probability splitting): try to put 1/r of the
probability in each node.

Shannon Coding vs. Shannon-Fano Coding

Shannon Coding: set ℓ_i = ⌈−log_r p_i⌉.
Given the depths ℓ_i, we can build the tree via a top-down linear scan:
when moving down a level, expand all non-used leaves to be parents.

Shannon-Fano Coding: recursively split p_1, p_2, ..., p_n into r
consecutive groups
p_1, ..., p_{i_1} | p_{i_1+1}, ..., p_{i_2} | ... | p_{i_{r−1}+1}, ..., p_n,
putting approximately 1/r of the probability into each group.

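A minimal sketch of this top-down scan for r = 2 (our own names; the
depths must satisfy the Kraft inequality):

```python
# Build codewords from prescribed depths by a top-down linear scan:
# keep the frontier of unused nodes at the current level; a node either
# becomes a codeword leaf or is expanded into its two children.

def code_from_lengths(lengths):
    frontier, depth, words = [""], 0, []
    for l in sorted(lengths):
        while depth < l:                  # move down a level
            frontier = [w + b for w in frontier for b in "01"]
            depth += 1
        if not frontier:
            raise ValueError("lengths violate the Kraft inequality")
        words.append(frontier.pop(0))     # this node becomes a leaf
    return words

print(code_from_lengths([2, 2, 4, 4, 4, 4]))
# ['00', '01', '1000', '1001', '1010', '1011']
```
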
Shannon Coding vs. Shannon-Fano Coding

Example: p_1 = p_2 = 1/3, p_3 = p_4 = p_5 = p_6 = 1/12.

Shannon coding:
ℓ_1 = ℓ_2 = 2 = ⌈log_2 3⌉ and ℓ_3 = ℓ_4 = ℓ_5 = ℓ_6 = 4 = ⌈log_2 12⌉.
The resulting tree has empty slots, so it can be improved.

[Figure: the Shannon tree (leaf depths 2, 2, 4, 4, 4, 4) next to the
Shannon-Fano tree for the same distribution.]

Shannon Coding vs. Shannon-Fano Coding

Example: p_1 = p_2 = 1/3, p_3 = p_4 = p_5 = p_6 = 1/12.

Shannon-Fano: first, sort the items and insert them at the root.
While a node contains more than 1 item, split its items' weights as
evenly as possible, with at most 1/2 of the node's weight in the left
child.

[Figure: the root {1/3, 1/3, 1/12, 1/12, 1/12, 1/12} splits into {1/3}
and {1/3, 1/12, 1/12, 1/12, 1/12}; the right child splits into {1/3}
and {1/12, 1/12, 1/12, 1/12}; the latter splits into {1/12, 1/12} and
{1/12, 1/12}, and those into single leaves.]

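A sketch of this splitting rule in Python (binary alphabet; our own
code, not from the talk):

```python
# Recursive Shannon-Fano splitting: divide the sorted probabilities into
# two consecutive groups, with at most half of the weight going left.

def shannon_fano(probs, prefix=""):
    if len(probs) == 1:
        return [prefix]
    total, acc, split = sum(probs), 0.0, 0
    while split < len(probs) - 1 and acc + probs[split] <= total / 2:
        acc += probs[split]
        split += 1
    split = max(split, 1)                 # left group must be non-empty
    return (shannon_fano(probs[:split], prefix + "0") +
            shannon_fano(probs[split:], prefix + "1"))

P = sorted([1/3, 1/3] + [1/12] * 4, reverse=True)
print(shannon_fano(P))
# ['0', '10', '1100', '1101', '1110', '1111']  (cost 7/3 vs. 8/3 for Shannon)
```
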
Previous Work: Unequal Cost Codes
Shannon-Fano coding for unequal-cost codes

α: the unique positive root of ∑_i α^{−c_i} = 1.

Recursively split p_1, p_2, ..., p_n into consecutive groups
p_1, ..., p_{i_1} | p_{i_1+1}, ..., p_{i_2} | ... | p_{i_{k−1}+1}, ..., p_n
so that approximately an α^{−c_i} fraction of the probability in a node
is put into its i-th child (reached by an edge of cost c_i).

Note: this can work for infinite alphabets, as long as α exists.

All previous algorithms were Shannon-Fano like; they differed in how
they implemented the approximate split.

Shannon-Fano coding for unequal-cost codes

α: the unique positive root of ∑_i α^{−c_i} = 1. Split the
probabilities so that approximately α^{−c_i} of the probability in a
node is put into its i-th child.

Example, the Telegraph Channel: c_1 = 1, c_2 = 2, so
α^{−1} = (√5 − 1)/2. Put an α^{−1} fraction of the root's weight W in
the left subtree (W/α) and an α^{−2} fraction in the right (W/α²).

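A small sketch for computing α numerically (the talk doesn't prescribe
a method; bisection on x = 1/α is one simple choice):

```python
# Find the capacity alpha: the unique positive root of
# sum_i alpha**(-c_i) = 1.  Bisect on x = 1/alpha in (0, 1), where
# f(x) = sum_i x**c_i - 1 is increasing (assumes >= 2 letters).

def capacity(costs, iters=60):
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(mid ** c for c in costs) < 1.0:
            lo = mid          # root is at a larger x
        else:
            hi = mid
    return 1.0 / hi

print(capacity([1, 2]))       # telegraph channel: the golden ratio 1.618...
```
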
Shannon-Fano coding for unequal-cost codes

α: the unique positive root of ∑_i α^{−c_i} = 1. Split the
probabilities so that approximately α^{−c_i} of the probability in a
node is put into its i-th child.

Example, 1-ended coding: for all i > 0, c_i = i.
∑_i α^{−c_i} = 1 gives α^{−1} = 1/2. Put 2^{−i} of a node's weight into
its i-th subtree; the i-th encoding letter denotes the string 0^{i−1}1
(e.g., 1, 01, 001, 0001, ...).

Previous Work: a well-known lower bound

Given coding letter lengths c = c_1, c_2, c_3, ... with gcd(c_i) = 1,
let α be the unique positive root of g(z) = 1 − ∑_j z^{−c_j}.
Note: α is sometimes called the capacity.

For a given P.D., set H_α = −∑_i p_i log_α p_i.
Note: if c_1 = c_2 = 1 then α = 2 and H_α is the standard entropy.

Theorem: let OPT be the cost of a min-cost code for the given P.D. and
letter costs. Then H_α ≤ OPT.

Note: if c_1 = c_2 = 1 then α = 2 and this is the classic Shannon
information-theoretic lower bound.

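For concreteness, a sketch of this lower bound computation (our own
code):

```python
import math

# Generalized entropy H_alpha = -sum_i p_i * log_alpha(p_i),
# a lower bound on the cost of any prefix-free code.

def h_alpha(probs, alpha):
    return -sum(p * math.log(p, alpha) for p in probs)

alpha = (1 + 5 ** 0.5) / 2            # telegraph channel, c = (1, 2)
P = [1/3, 1/3, 1/12, 1/12, 1/12, 1/12]
print(h_alpha(P, alpha))              # ~3.24 <= OPT
# With alpha = 2 this is the standard Shannon entropy:
print(h_alpha(P, 2))                  # ~2.25
```
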
Outline

Huffman Coding and Generalizations
Previous Work & Background
New Work
A Counterexample
Open Problems

New Work

Shannon coding only seems to have been used in the proof of the
noiseless coding theorem. It never seems to have actually been used as
an algorithmic tool.

All of the (additive-error) approximation algorithms for unequal-cost
coding, and Csiszar's (1969) approximation algorithm for coding in a
Discrete Noiseless Channel, were variations of Shannon-Fano coding.

The main idea behind our new results is that Shannon-Fano splitting is
not necessary; Shannon coding suffices.

This yields efficient additive-error approximation algorithms for
unequal-cost coding and the Discrete Noiseless Channel, as well as for
regular language constraints.

New Results for Unequal Cost Coding

Given coding letter lengths c, let α be the capacity.
Then there exists K > 0, depending only upon c, such that if
1. P = p_1, p_2, ..., p_n is any P.D., and
2. ℓ_1, ℓ_2, ..., ℓ_n is any set of integers such that for all i,
   ℓ_i ≥ K + ⌈−log_α p_i⌉,
then there exists a prefix-free code for which the ℓ_i are the word
lengths.

⇒ ∑_i p_i ℓ_i ≤ K + 1 + H_α(P) ≤ OPT + K + 1.

This gives an additive approximation of the same type as Shannon-Fano
splitting, without the splitting (same time complexity, but many fewer
operations on reals).

The same result holds for DNC and regular language restrictions; there
α is a function of the DNC or L-accepting automaton graph.

Proof of the Theorem

We first prove the following lemma.

Lemma: given c and the corresponding α, there exists ε > 0, depending
only upon c, such that if ∑_{i=1}^n α^{−ℓ_i} ≤ ε, then there exists a
prefix-free code with word lengths ℓ_1, ℓ_2, ..., ℓ_n.

Note: if c_1 = c_2 = 1 then α = 2. Let ε = 1 and the condition becomes
∑_i 2^{−ℓ_i} ≤ 1; the lemma then becomes one direction of the Kraft
inequality.

Proof of the Lemma

Let L(n) be the number of nodes on level n of the infinite tree
corresponding to c. One can show there exist t_1, t_2 s.t.
t_1 α^n ≤ L(n) ≤ t_2 α^n.

[Figure: the infinite tree with marked levels ℓ_1, ℓ_2, ..., ℓ_i. The
grey regions are the parts of the infinite tree that are erased when a
node on level ℓ_k becomes a leaf.]

A node on level ℓ_k has L(ℓ_i − ℓ_k) descendants on level ℓ_i.
A node on level ℓ_i can become a leaf iff the grey regions do not cover
all nodes on level ℓ_i, i.e., iff
∑_{k=1}^{i−1} L(ℓ_i − ℓ_k) < L(ℓ_i).

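The level counts L(n) and this leaf-placement condition are easy to
compute; here is a sketch (our own names, reading the slide's condition
as a test applied to the sorted lengths):

```python
# L(n) = number of nodes on level n of the infinite tree for letter
# costs c: L(0) = 1 and L(n) = sum_j L(n - c_j).

def level_counts(costs, max_n):
    L = [1] + [0] * max_n
    for n in range(1, max_n + 1):
        L[n] = sum(L[n - c] for c in costs if c <= n)
    return L

# The slide's condition for sorted lengths l_1 <= ... <= l_n: each new
# leaf needs an uncovered node on its level, i.e.
# sum_{k < i} L(l_i - l_k) < L(l_i) for every i.
def lengths_feasible(costs, lengths):
    ls = sorted(lengths)
    L = level_counts(costs, ls[-1])
    return all(sum(L[ls[i] - ls[k]] for k in range(i)) < L[ls[i]]
               for i in range(len(ls)))

print(level_counts([1, 2], 6))   # [1, 1, 2, 3, 5, 8, 13]: Fibonacci ~ alpha^n
```
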
Proof of the Lemma

We just need to show that 0 < L(ℓ_i) − ∑_{k=1}^{i−1} L(ℓ_i − ℓ_k):

L(ℓ_i) − ∑_{k=1}^{i−1} L(ℓ_i − ℓ_k)
    ≥ t_1 α^{ℓ_i} − t_2 ∑_{k=1}^{i−1} α^{ℓ_i − ℓ_k}
    = α^{ℓ_i} ( t_1 − t_2 ∑_{k=1}^{i−1} α^{−ℓ_k} )
    ≥ α^{ℓ_i} ( t_1 − t_2 ε ).

Choosing ε < t_1 / t_2 makes this > 0.

Proof of the Main Theorem

Set K = −log_α ε. (Recall ℓ_i ≥ K + ⌈−log_α p_i⌉.) Then

∑_{i=1}^n α^{−ℓ_i} ≤ ∑_{i=1}^n α^{−K − ⌈−log_α p_i⌉}
                   ≤ ε ∑_{i=1}^n α^{log_α p_i}
                   = ε ∑_{i=1}^n p_i = ε.

From the previous lemma, a prefix-free code with those word lengths
ℓ_1, ℓ_2, ..., ℓ_n exists, and we are done.

Example: c_1 = 1, c_2 = 2

α = (√5 + 1)/2, K = 1.

Consider p_1 = p_2 = p_3 = p_4 = 1/4.
Note that ⌈−log_α p_i⌉ = 3.

No tree with all ℓ_i = 3 exists.
But a tree with ℓ_i = ⌈−log_α p_i⌉ + 1 = 4 does!

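Re-using the lengths_feasible sketch from the lemma, we can check this
example mechanically:

```python
import math

# Assumes level_counts / lengths_feasible from the earlier sketch.
alpha = (5 ** 0.5 + 1) / 2
print(math.ceil(-math.log(1 / 4, alpha)))       # 3
print(lengths_feasible([1, 2], [3, 3, 3, 3]))   # False: L(3) = 3 < 4 leaves
print(lengths_feasible([1, 2], [4, 4, 4, 4]))   # True:  L(4) = 5 >= 4 leaves
```
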
The Algorithm

A valid K could be found by working through the proof of the Theorem.
Technically this is O(1) but, practically, it would require some
complicated operations on reals.

Alternatively, perform a doubling search for K*, the smallest K for
which the theorem is valid:
Set K = 1, 2, 2², 2³, ...
Test whether ℓ_i = K + ⌈−log_α p_i⌉ has a valid code (this can be done
efficiently), until K is good but K/2 is not.
Note that K/2 < K* ≤ K.
Now set a = K/2, b = K, and binary search for K* in [a, b].

Subtle point: the search will find some K′ ≤ K* for which a code exists.

Time complexity: O(n log K*).

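A sketch of the search (our own code, with the lengths_feasible sketch
from before standing in for the efficient validity test):

```python
import math

# Doubling-plus-binary search for an additive constant K that admits a
# code; validity need not be monotone in K, so, as the slide notes, we
# may find some good K' <= K*.

def find_K(costs, probs, alpha):
    base = [math.ceil(-math.log(p, alpha)) for p in probs]
    good = lambda K: lengths_feasible(costs, [K + b for b in base])
    K = 1
    while not good(K):                 # doubling phase: 1, 2, 4, 8, ...
        K *= 2
    lo, hi = K // 2, K                 # hi is good; lo (= K/2) was not
    while lo + 1 < hi:                 # binary search in (lo, hi]
        mid = (lo + hi) // 2
        if good(mid):
            hi = mid
        else:
            lo = mid
    return hi

alpha = (5 ** 0.5 + 1) / 2
print(find_K([1, 2], [1 / 4] * 4, alpha))   # 1, matching the example above
```
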
The Algorithm for infinite encoding alphabets

The proof assumed two things:
(i) a root α of ∑_i α^{−c_i} = 1 exists;
(ii) there exist t_1, t_2 s.t. t_1 α^n ≤ L(n) ≤ t_2 α^n, where L(n) is
the number of nodes on level n of the infinite tree.

This is always true for a finite encoding alphabet, but not necessarily
true for infinite encoding alphabets; we will see a simple example in
the next section.

But if (i) and (ii) are true for an infinite alphabet, the theorem and
algorithm hold.

Example: 1-ended codes, c_i = i. Here α^{−1} = 1/2 and (ii) is true, so
the theorem and algorithm hold.

Extensions to DNC and Regular Language Restrictions

Discrete Noiseless Channels

[Figure: the three-state channel and its code trees from before.]

Let L(n) be the number of nodes on level n of the infinite tree.
The fact that the graph is biconnected and aperiodic implies that there
exist α, t_1, t_2 s.t. t_1 α^n ≤ L(n) ≤ t_2 α^n.

The algorithm will still work for ℓ_i ≥ K + ⌈−log_α p_i⌉.

Note: the algorithm must construct k different coding trees, one for
each state (tree root).

A subtle point is that any node on level ℓ_i can be chosen for p_i,
independent of its state! The algorithm still works.

Extensions to DNC and Regular Language Restrictions

Regular Language Restrictions

Assumption: the language is aperiodic, i.e., there exists N such that
for all n > N there is at least one word of length n.

[Figure: the DFA for binary strings not ending in 000 and its reduced
automaton, as before.]

Let L(n) be the number of nodes on level n of the infinite tree.
The fact that the language is aperiodic implies that there exist
α, t_1, t_2 s.t. t_1 α^n ≤ L(n) ≤ t_2 α^n, where α is the largest
dominant eigenvalue of a connected component of the DFA.

The algorithm will still work for ℓ_i ≥ K + ⌈−log_α p_i⌉.
Again, any node at level ℓ_i can be labelled with p_i, independent of
its state.

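As an illustration of where α comes from, here is a sketch computing the
dominant eigenvalue of a transition-count matrix (the matrix below is
our reconstruction of the 000-avoiding DFA, not taken from the talk):

```python
import numpy as np

# alpha from the DFA: dominant eigenvalue of the transition-count matrix
# M[i][j] = number of letters moving state i to state j.  States track
# the length of the trailing run of 0s (capped at 3).

M = np.array([[1, 1, 0, 0],   # on 1 -> s0, on 0 -> s1
              [1, 0, 1, 0],   # on 1 -> s0, on 0 -> s2
              [1, 0, 0, 1],   # on 1 -> s0, on 0 -> s3
              [1, 0, 0, 1]])  # on 1 -> s0, on 0 -> s3

alpha = max(abs(np.linalg.eigvals(M)))
print(alpha)   # ~2.0: the language keeps a constant fraction of all strings
```
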
Outline

Huffman Coding and Generalizations
Previous Work & Background
New Work
A Counterexample
Conclusion and Open Problems

A Counterexample

Let c be the countably infinite cost set defined by
|{j | c_j = i}| = 2 C_{i−1},
where C_i = (1/(i+1)) (2i choose i) is the i-th Catalan number.

Constructing prefix-free codes with these c can be shown to be
equivalent to constructing balanced binary prefix-free codes, in which,
for every word, the number of 0s equals the number of 1s.

For this problem, the length of a balanced word = # of 0s in the word,
e.g., |10| = 1, |001110| = 3.

No efficient additive-error approximation is known.

A Counterexample

Let L be the set of all balanced binary words.
Set Q = {01, 10, 0011, 1100, 000111, ...}, the language of all balanced
binary words without a proper balanced prefix.

Then L = Q*, and every word in L can be uniquely decomposed into a
concatenation of words in Q.

The # of words of length i in Q is 2 C_{i−1}.

Prefix coding in L is equivalent to prefix coding with the infinite
alphabet Q.

A Counterexample

Note: the characteristic equation is
g(z) = 1 − ∑_j z^{−c_j} = 1 − ∑_i 2 C_{i−1} z^{−i} = √(1 − 4/z),
for which the root α does not exist (α = 4 is an algebraic
singularity).

One can prove that for α ≥ 4 and any K, we can always find
p_1, p_2, ..., p_n s.t. there is no prefix code with lengths
ℓ_i = K + ⌈−log_α p_i⌉.

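A quick numeric check of this series (our own sketch; using the term
ratio C_i / C_{i−1} = 2(2i−1)/(i+1) avoids huge integers):

```python
# Partial sums of sum_i 2*C_{i-1} * z**(-i) = 1 - sqrt(1 - 4/z):
# the sum stays below 1 for z > 4 and only reaches 1 in the limit at
# the singularity z = 4, so no usable root exists.

def char_sum(z, terms=100_000):
    total, t = 0.0, 2.0 / z            # t = 2*C_{i-1} * z**(-i) at i = 1
    for i in range(1, terms):
        total += t
        t *= 2 * (2 * i - 1) / ((i + 1) * z)
    return total

for z in (8.0, 5.0, 4.5, 4.0):
    print(z, round(char_sum(z), 4))
# 8.0 0.2929   5.0 0.5528   4.5 0.6667   4.0 0.998... (-> 1 very slowly)
```
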
A Counterexample

α = 4 is an algebraic singularity of the characteristic equation.

Can prove that for α ≥ 4 and any K, we can always find
p_1, p_2, ..., p_n s.t. there is no prefix code with lengths
ℓ_i = K + ⌈−log_α p_i⌉.

Can also prove that for α < 4 and any K and Δ, we can always find
p_1, p_2, ..., p_n s.t. if a prefix code with lengths
ℓ_i ≥ K + ⌈−log_α p_i⌉ exists, then ∑_i ℓ_i p_i − OPT > Δ.

⇒ No Shannon-Coding type algorithm can guarantee an additive-error
approximation for a balanced prefix code.

Outline

Huffman Coding and Generalizations
Previous Work & Background
New Work
A Counterexample
Conclusion and Open Problems

Conclusion and Open Problems

We saw how to use Shannon Coding to develop efficient approximation
algorithms for prefix-coding variants, e.g., unequal-cost coding,
coding in the Discrete Noiseless Channel, and coding with regular
language constraints.

Old Open Question: is unequal-cost coding NP-complete?

New Open Question: is there an additive-error approximation algorithm
for prefix coding using balanced strings?
We just saw that Shannon Coding doesn't work.
G. & Li (2007) proved that (a variant of) Shannon-Fano doesn't work.
Perhaps no such algorithm exists.

The End

Thank You

Q and A