The Information Bottleneck Method
presented by
Boris Epshtein & Lena Gorelick
Advanced Topics in Computer and Human Vision
Spring 2004
Agenda
• Motivation
• Information Theory - Basic Definitions
• Rate Distortion Theory
– Blahut-Arimoto algorithm
• Information Bottleneck Principle
• IB algorithms
– iIB
– dIB
– aIB
• Application
Motivation
The Clustering Problem
Complexity-Precision Trade-off
A good clustering must trade off the complexity (compactness) of the representation against the precision with which it describes the original data: a coarser clustering is simpler but loses detail, a finer one is precise but not compact.
• Examples of approaches:
– SRM – Structural Risk Minimization
– MDL – Minimum Description Length
– Rate Distortion Theory
Definitions…
Entropy: $H(X) = -\sum_x p(x)\log_2 p(x)$ measures the uncertainty in $X$.
Entropy - Example
– Fair coin ($p = 0.5$): $H = 1$ bit (maximal uncertainty)
– Unfair coin ($p \ne 0.5$): $H < 1$ bit; the more biased the coin, the lower the entropy
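A quick numeric check of the coin example (the 0.9 bias below is an illustrative choice, not a value from the slides):

```python
import math

def entropy(p):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))   # fair coin   -> 1.0 bit
print(entropy([0.9, 0.1]))   # biased coin -> ~0.469 bits
```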
Definitions…
Entropy - Illustration
[Figure: entropy is highest for a uniform distribution and lowest for a peaked, near-deterministic one.]
Definitions…
Conditional Entropy: $H(X|Y) = -\sum_{x,y} p(x,y)\log_2 p(x|y)$, the uncertainty left in $X$ once $Y$ is known.
Mutual Information: $I(X;Y) = H(X) - H(X|Y) = \sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}$
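These quantities are easy to compute directly from a joint table; a small sketch with a made-up 2×2 joint distribution:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

print(mutual_information([[0.4, 0.1], [0.1, 0.4]]))      # correlated  -> ~0.278
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # independent -> 0.0
```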
Rate Distortion Theory
Compress $X$ into clusters $T$.
• Extreme case 1: a single cluster for all points
– high distortion
– very compact
• Extreme case 2: each point is its own cluster
– minimal distortion
– not compact at all
We need a trade-off between compression and distortion.
Rate Distortion Theory – Cont.
• The quality of a clustering $T$ of $X$ is determined by two quantities:
– Complexity is measured by the mutual information $I(T;X)$ (a.k.a. Rate)
– Distortion is measured by the expected distortion $E[d(X,T)] = \sum_{x,t} p(x)\,p(t|x)\,d(x,t)$
Rate Distortion Plane
[Plot: rate $I(T;X)$ versus expected distortion $E[d(X,T)]$; $D$ is the distortion constraint. One corner corresponds to minimal distortion, the other to maximal compression.]
Rate Distortion Function
• Let $D$ be an upper-bound constraint on the expected distortion: $E[d(X,T)] \le D$
• The rate distortion function is the minimal achievable rate under this constraint:
$R(D) = \min_{\{p(t|x):\, E[d(X,T)] \le D\}} I(T;X)$
• Equivalently, minimize the Lagrangian
$\mathcal{F}[p(t|x)] = \underbrace{I(T;X)}_{\text{Complexity term}} + \beta\,\underbrace{E[d(X,T)]}_{\text{Distortion term}}$
where $\beta$ is a Lagrange multiplier. Minimize!
Rate Distortion Curve
[Plot: the rate distortion curve $R(D)$ in the $(E[d(X,T)],\, I(T;X))$ plane, running from minimal distortion to maximal compression.]
Rate Distortion Function
Minimize $\mathcal{F} = I(T;X) + \beta\,E[d(X,T)]$ over $p(t|x)$
Subject to the normalization constraint $\sum_t p(t|x) = 1$ for every $x$
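For completeness, a sketch of how the solution quoted on the next slide follows from this constrained problem; the algebra is standard variational calculus and is not spelled out on the original slides:

```latex
% Minimize F = I(T;X) + \beta E[d(X,T)] subject to \sum_t p(t|x) = 1,
% holding the marginal p(t) fixed (as in the alternating scheme later on).
% Adding multipliers \lambda(x) for the normalization constraints:
\frac{\delta \mathcal{F}}{\delta p(t|x)}
  = p(x)\Big[\log\frac{p(t|x)}{p(t)} + 1 + \beta\, d(x,t)\Big] + \lambda(x) = 0
% Solving for p(t|x) and absorbing all constants into a normalizer:
p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta\, d(x,t)},
\qquad
Z(x,\beta) = \sum_{t'} p(t')\, e^{-\beta\, d(x,t')}
```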
Solution - Analysis
Solution: $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$, with the marginal $p(t) = \sum_x p(x)\,p(t|x)$ known (self-consistent).
For a fixed $t$: when $x$ is similar to $t$, $d(x,t)$ is small, so closer points are attached to $t$ with higher probability.
Solution - Analysis
Solution: $p(t|x) = \frac{p(t)}{Z(x,\beta)}\, e^{-\beta d(x,t)}$
Fix $t$: as $\beta \to 0$, the exponent reduces the influence of distortion; $p(t|x) \approx p(t)$ does not depend on $x$; this + maximal compression ⇒ a single cluster.
Fix $x$: as $\beta \to \infty$, most of the conditional probability goes to some $t$ with the smallest distortion ⇒ hard clustering.
Solution - Analysis
Varying $\beta$ between these extremes traces the whole rate distortion curve, from a single cluster (maximal compression) to hard clustering (minimal distortion).
Blahut – Arimoto Algorithm
Input: $p(x)$, the distortion measure $d(x,t)$, $\beta$, and the cardinality of $T$.
Randomly init $p(t|x)$, then alternate until convergence:
$p(t) = \sum_x p(x)\,p(t|x)$
$p(t|x) = \frac{p(t)\,e^{-\beta d(x,t)}}{\sum_{t'} p(t')\,e^{-\beta d(x,t')}}$
Each step minimizes the same functional in one argument with the other held fixed, so the functional decreases monotonically.
• Drawback: slow convergence
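A minimal runnable sketch of these alternating updates (the array shapes and the toy test data below are my own choices, not from the slides):

```python
import numpy as np

def blahut_arimoto(px, d, beta, n_iter=200):
    """Alternate the two update equations for a fixed beta.

    px : (n,) marginal p(x);  d : (n, m) distortion d(x, t).
    Returns the soft assignment p(t|x) as an (n, m) array.
    """
    n, m = d.shape
    pt_x = np.random.rand(n, m)          # random init of p(t|x)
    pt_x /= pt_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        pt = px @ pt_x                   # p(t) = sum_x p(x) p(t|x)
        pt_x = pt * np.exp(-beta * d)    # p(t|x) ∝ p(t) exp(-beta d(x,t))
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    return pt_x

# Toy usage: 6 points on a line, 2 clusters, squared-difference distortion.
x = np.linspace(0, 1, 6)
t = np.array([0.2, 0.8])
d = (x[:, None] - t[None, :]) ** 2
px = np.full(6, 1 / 6)
print(blahut_arimoto(px, d, beta=50.0).round(2))
```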
Rate Distortion Theory –
Additional Insights
– The distortion measure $d(x,t)$ is not part of the solution; it must be chosen in advance, and the result depends entirely on this choice.
– Another problem would be to find optimal
representatives given the clustering.
Information Bottleneck - Example
Cluster the words of a document collection; the topic is the relevant variable. The two competing quantities are I(Word;Cluster) (compactness) and I(Cluster;Topic) (informativeness).
Extreme case 1: all words in a single cluster
I(Cluster;Topic) = 0 → Not Informative
I(Word;Cluster) = 0 → Very Compact
Information Bottleneck - Example
Extreme case 2: each word is its own cluster
I(Cluster;Topic) = max → Very Informative
I(Word;Cluster) = max → Not Compact
Goal: minimize I(Word;Cluster) & maximize I(Cluster;Topic)
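The two extremes are easy to verify numerically. Below, a toy word-topic joint distribution of my own construction, comparing the identity clustering against the single-cluster mapping:

```python
import math

def mi(joint):
    """I(A;B) in bits from a joint table joint[a][b]."""
    pa = [sum(r) for r in joint]
    pb = [sum(c) for c in zip(*joint)]
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for a, r in enumerate(joint) for b, p in enumerate(r) if p > 0)

# Toy joint p(word, topic): 4 words, 2 topics.
pwt = [[0.20, 0.05], [0.15, 0.10], [0.05, 0.20], [0.10, 0.15]]

def cluster_joint(pwt, assign, k):
    """p(cluster, topic) induced by a hard word -> cluster assignment."""
    out = [[0.0] * len(pwt[0]) for _ in range(k)]
    for w, row in enumerate(pwt):
        for y, p in enumerate(row):
            out[assign[w]][y] += p
    return out

print(mi(cluster_joint(pwt, [0, 1, 2, 3], 4)))  # each word alone: max, = I(Word;Topic)
print(mi(cluster_joint(pwt, [0, 0, 0, 0], 1)))  # single cluster: 0.0
```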
Information Bottleneck
Compress the words $X$ into clusters $T$ so that $T$ stays informative about the topics $Y$:
minimize $I(T;X)$ (Compactness) while keeping $I(T;Y)$ high (Relevant Information).
Relevance Compression Curve
[Plot: compression $I(T;X)$ versus relevant information $I(T;Y)$; $D$ is the relevance constraint. Extremes: maximal relevant information and maximal compression.]
Relevance Compression Function
• Let $D$ be the minimal allowed value of the relevant information $I(T;Y)$: a smaller $D$ means a more relaxed relevant-information constraint.
• Minimize the IB functional
$\mathcal{L}[p(t|x)] = \underbrace{I(T;X)}_{\text{Compression term}} - \beta\,\underbrace{I(T;Y)}_{\text{Relevance term}}$
where $\beta$ is a Lagrange multiplier. Minimize!
Relevance Compression Curve
[Plot: the relevance-compression curve, from maximal compression to maximal relevant information.]
Relevance Compression Function
Minimize $\mathcal{L} = I(T;X) - \beta\,I(T;Y)$ over $p(t|x)$
Subject to the normalization constraint $\sum_t p(t|x) = 1$ for every $x$
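The derivation parallels the rate distortion case; a sketch following the standard IB derivation (Tishby, Pereira & Bialek, 1999), with the KL divergence emerging as the effective distortion:

```latex
% Setting the variational derivative of L = I(T;X) - \beta I(T;Y) to zero
% (with p(t) and p(y|t) treated as induced by p(t|x)) gives
\frac{\delta \mathcal{L}}{\delta p(t|x)} = 0
\quad\Longrightarrow\quad
p(t|x) = \frac{p(t)}{Z(x,\beta)}
  \exp\!\Big(-\beta\, D_{KL}\big[\,p(y|x)\,\big\|\,p(y|t)\,\big]\Big)
% where  D_KL[p(y|x) || p(y|t)] = \sum_y p(y|x) \log( p(y|x) / p(y|t) )
% plays exactly the role the fixed distortion d(x,t) played before --
% but here it emerges from the problem instead of being chosen by hand.
```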
Solution - Analysis
Solution: $p(t|x) = \frac{p(t)}{Z(x,\beta)}\,\exp\!\big(-\beta\,D_{KL}[p(y|x)\,\|\,p(y|t)]\big)$
with $p(t) = \sum_x p(x)\,p(t|x)$ and $p(y|t) = \frac{1}{p(t)}\sum_x p(x,y)\,p(t|x)$; the joint $p(x,y)$ is known.
The solution is implicit: all three equations must hold simultaneously (self-consistent equations).
Solution - Analysis
For a fixed $t$: when $p(y|x)$ is similar to $p(y|t)$, the KL divergence is small, so such points $x$ are attached to $t$ with higher probability.
Solution:
Fix $t$: as $\beta \to 0$, the exponent reduces the influence of the KL term; $p(t|x) \approx p(t)$ does not depend on $x$; this + maximal compression ⇒ a single cluster.
Fix $x$: as $\beta \to \infty$, most of the conditional probability goes to some $t$ with the smallest KL divergence (hard mapping).
Relevance Compression Curve
[Plot: along the curve, $\beta \to 0$ gives maximal compression (a single cluster) and $\beta \to \infty$ approaches the hard-mapping regime with maximal relevant information.]
Iterative Optimization Algorithm (iIB)
• Input: joint prior $p(x,y)$, $\beta$, cardinality of $T$
• Randomly init $p(t|x)$, then iterate the three self-consistent equations until convergence:
$p(t) = \sum_x p(x)\,p(t|x)$   (update p(cluster))
$p(y|t) = \frac{1}{p(t)}\sum_x p(x,y)\,p(t|x)$
$p(t|x) = \frac{p(t)}{Z(x,\beta)}\,\exp\!\big(-\beta\,D_{KL}[p(y|x)\,\|\,p(y|t)]\big)$

iIB simulation
• Given:
– 300 instances of $X$ with prior $p(x)$
– Binary relevant variable $Y$
– Joint prior $p(x,y)$
– $\beta$
• Obtain:
– Optimal clustering (with minimal value of the IB functional $\mathcal{L}$)
iIB simulation…
• Analogy to K-means or EM: alternate between assigning points to clusters and re-estimating the cluster distributions
• “Semantic change” in the clustering solution
Iterative Optimization Algorithm (iIB)
Advantages:
• Defining a relevant variable is often easier and more
intuitive than defining a distortion measure
• Guaranteed to converge to a (local) minimum of the IB functional
Iterative Optimization Algorithm (iIB)
Drawbacks:
• Finds only a local minimum (suboptimal solutions)
• Need to specify the parameters $\beta$ and the cardinality of $T$ in advance
• Slow convergence
• A large data sample is required to estimate $p(x,y)$ reliably
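Despite the drawbacks, the iteration itself is short. A compact runnable sketch of the three-equation loop (the shapes, the KL guard, and the toy data are my own; the slides specify only the update equations):

```python
import numpy as np

def iib(pxy, beta, m, n_iter=300, rng=np.random.default_rng(0)):
    """Iterative IB: returns p(t|x) for a joint pxy of shape (n_x, n_y)."""
    px = pxy.sum(axis=1)                       # p(x)
    py_x = pxy / px[:, None]                   # p(y|x)
    pt_x = rng.random((len(px), m))
    pt_x /= pt_x.sum(axis=1, keepdims=True)    # random init of p(t|x)
    for _ in range(n_iter):
        pt = px @ pt_x                         # p(t)
        py_t = (pt_x * px[:, None]).T @ py_x / (pt[:, None] + 1e-12)  # p(y|t)
        # KL[p(y|x) || p(y|t)] for every (x, t) pair; epsilons guard zeros
        kl = (py_x[:, None, :] * np.log((py_x[:, None, :] + 1e-12) /
                                        (py_t[None, :, :] + 1e-12))).sum(-1)
        pt_x = pt * np.exp(-beta * kl)
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    return pt_x

# Toy joint: two groups of x with different p(y|x); expect 2 clear clusters.
pxy = np.array([[0.20, 0.05], [0.18, 0.07], [0.05, 0.20], [0.07, 0.18]])
print(iib(pxy / pxy.sum(), beta=20.0, m=2).round(2))
```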
Deterministic Annealing-like algorithm (dIB)
Start with a single cluster and increase $\beta$ gradually; at each stage, split every cluster in two by applying a small perturbation to its copy of $p(y|t)$.
Deterministic Annealing-like algorithm (dIB)
After re-converging the iterations at the new $\beta$:
if the two copies $p(y|t_1)$, $p(y|t_2)$ are different
leave the split
else
use the old cluster
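A rough self-contained sketch of this split-and-test annealing loop. The perturbation scheme, tolerance, and $\beta$ schedule below are my own illustrative choices, not values from the talk:

```python
import numpy as np

def ib_iterate(pxy, beta, pt_x, n_iter=200):
    """Run the iIB self-consistent updates from a given starting p(t|x)."""
    px = pxy.sum(axis=1)
    py_x = pxy / px[:, None]
    for _ in range(n_iter):
        pt = px @ pt_x
        py_t = (pt_x * px[:, None]).T @ py_x / (pt[:, None] + 1e-12)
        kl = (py_x[:, None, :] * np.log((py_x[:, None, :] + 1e-12) /
                                        (py_t[None, :, :] + 1e-12))).sum(-1)
        pt_x = pt * np.exp(-beta * kl)
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    pt = px @ pt_x                                 # refresh p(y|t) one last time
    py_t = (pt_x * px[:, None]).T @ py_x / (pt[:, None] + 1e-12)
    return pt_x, py_t

def dib(pxy, betas, eps=1e-2, tol=1e-3, rng=np.random.default_rng(1)):
    """Anneal beta upward; keep a split only if the two copies drift apart."""
    pt_x = np.ones((pxy.shape[0], 1))              # start: a single cluster
    for beta in betas:                             # slowly increasing schedule
        # duplicate every cluster, perturbing the assignments slightly
        noise = eps * rng.random(pt_x.shape)
        cand = np.hstack([pt_x * (0.5 + noise), pt_x * (0.5 - noise)])
        cand, py_t = ib_iterate(pxy, beta, cand)
        k = pt_x.shape[1]
        keep = []                                  # decide split by split
        for j in range(k):
            if np.abs(py_t[j] - py_t[j + k]).max() > tol:
                keep += [j, j + k]                 # copies differ: leave the split
            else:
                keep += [j]                        # copies identical: use the old cluster
        pt_x = cand[:, keep]
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    return pt_x

# Toy usage: same joint as in the iIB sketch; two clusters should emerge
# once beta is high enough.
pxy = np.array([[0.20, 0.05], [0.18, 0.07], [0.05, 0.20], [0.07, 0.18]])
print(dib(pxy / pxy.sum(), betas=np.geomspace(1.0, 50.0, 8)).round(2))
```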
Deterministic Annealing-like algorithm (dIB)
Illustration
[Figure: as $\beta$ increases, the same data is described at different resolutions; clusters that look the same at low $\beta$ become different and split at higher $\beta$.]
Slonim & Tishby 1999
Agglomerative Algorithm (aIB)
Fix $\beta$
Start with $T = X$: every point in its own cluster, merged greedily from there
Agglomerative Algorithm (aIB)
Repeat until the desired number of clusters is reached:
For each pair $t_i$, $t_j$:
Compute the new $\mathcal{L}$ that results from merging them
Merge the $t_i$ and $t_j$ that produce the smallest increase in $\mathcal{L}$
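A greedy sketch of this procedure. The merge cost used here is the standard weighted Jensen-Shannon form from Slonim & Tishby's agglomerative IB (the information lost by merging two clusters); the toy data is my own:

```python
import numpy as np

def js_merge_cost(p1, p2, py1, py2):
    """Information loss from merging two clusters: (p1+p2) * JS divergence."""
    p = p1 + p2
    pi1, pi2 = p1 / p, p2 / p
    pym = pi1 * py1 + pi2 * py2                    # merged p(y|t)
    def kl(a, b):
        m = a > 0
        return (a[m] * np.log2(a[m] / b[m])).sum()
    return p * (pi1 * kl(py1, pym) + pi2 * kl(py2, pym))

def aib(pxy, n_clusters):
    """Greedy agglomerative IB on a joint table pxy of shape (n_x, n_y)."""
    clusters = [[i] for i in range(pxy.shape[0])]  # start: every x alone
    pt = list(pxy.sum(axis=1))                     # cluster weights p(t)
    py_t = [row / row.sum() for row in pxy]        # cluster conditionals p(y|t)
    while len(clusters) > n_clusters:
        # find the pair whose merge loses the least relevant information
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: js_merge_cost(pt[ab[0]], pt[ab[1]],
                                                py_t[ab[0]], py_t[ab[1]]))
        p = pt[i] + pt[j]
        py_t[i] = (pt[i] * py_t[i] + pt[j] * py_t[j]) / p
        pt[i] = p
        clusters[i] += clusters[j]
        del clusters[j], pt[j], py_t[j]
    return clusters

pxy = np.array([[0.20, 0.05], [0.18, 0.07], [0.05, 0.20], [0.07, 0.18]])
print(aib(pxy / pxy.sum(), n_clusters=2))          # -> [[0, 1], [2, 3]]
```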
Agglomerative Algorithm (aIB)
Advantages:
• Non-parametric (no parameters such as $\beta$ to tune; the merges build a full cluster hierarchy)
• Simple
Agglomerative Algorithm (aIB)
Drawbacks:
• Greedy – not guaranteed to produce even locally
minimal solutions along the tree of merges
• A large data sample is required
Applications…
Unsupervised Clustering of Images (Shiri Gordon et al., 2003)
Modeling assumption: for a fixed image, the colors and
their spatial distribution are generated by a
mixture of Gaussians in 5-dim space (color coordinates + pixel position).
Applications…
Unsupervised Clustering of
Images
• Assume a uniform prior $p(x)$ over the images
• Calculate the conditional distributions $p(y|x)$
• Apply the aIB algorithm to cluster the images
Information Bottleneck - cont’d
• Assume the Markov relation $T \leftrightarrow X \leftrightarrow Y$: the compressed variable $T$ depends on the relevant variable $Y$ only through $X$, i.e. $p(t|x,y) = p(t|x)$.