Abstract
This document presents details of major image tagging models.
1 Problem Formulation
2 Tag Propagation
Tag propagation consists of two steps:
• Finding nearest neighbors using visual information. Concretely, the k nearest neighbors are found using VGG features.
• Propagating tags from those neighbors.
The propagation uses the following formula:
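The two steps above can be sketched in NumPy. The uniform vote over the k neighbors' tags used here is an assumption for illustration, not necessarily the document's propagation formula, and the function name is my own:

```python
import numpy as np

def propagate_tags(query_feat, neighbor_feats, neighbor_tags, k=3):
    """Score tags for a query image from its k visually nearest neighbors.

    query_feat:     (d,) visual feature of the query (e.g. a 4096-d VGG feature).
    neighbor_feats: (n, d) visual features of the training images.
    neighbor_tags:  (n, t) binary tag indicator matrix for the training images.

    Returns a (t,) vector: the fraction of the k nearest neighbors
    (by Euclidean distance) that carry each tag.
    """
    dists = np.linalg.norm(neighbor_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]             # step 1: k nearest neighbors
    return neighbor_tags[nearest].mean(axis=0)  # step 2: uniform tag vote
```

With a weighted vote (e.g. by inverse distance), only the final line would change.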
2.1 Details
Assume that we have data from two different spaces, X1 and X2. We want to learn transformations W1 and W2 that map them into a joint embedding space E, where the two representations match each other. These spaces are also called different views, or different modalities. In addition, the number of spaces can be generalized from 2 to N.
In our case, X1 has dimension 4096 (the VGG feature), and X2 has dimension 300 (the tag embedding).
The optimization criterion is the canonical correlation, defined as follows:
$$\operatorname*{argmax}_{W_1, W_2} \; \frac{\langle W_1 X_1, W_2 X_2 \rangle}{\lVert W_1 X_1 \rVert \, \lVert W_2 X_2 \rVert}$$
The KKT conditions state that the optimal solution $W_1^*, W_2^*$ must satisfy the following (necessary) conditions:
$$\frac{\partial L}{\partial W_1} = \Sigma_{12} W_2 - \lambda_1 \Sigma_{11} W_1 = 0 \tag{2}$$
$$\frac{\partial L}{\partial W_2} = \Sigma_{21} W_1 - \lambda_2 \Sigma_{22} W_2 = 0 \tag{3}$$
Equivalently:
$$W_2 = \Sigma_{22}^{-1} \Sigma_{21} W_1 / \lambda \tag{5}$$
$$\Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} W_1 = \lambda^2 \Sigma_{11} W_1 \tag{6}$$
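Equation (6) can be solved directly as an eigenvalue problem. A minimal NumPy sketch, assuming a small ridge term `reg` for numerical stability (the ridge and the function name are my additions, not part of the derivation):

```python
import numpy as np

def cca(X1, X2, dim=1, reg=1e-6):
    """Fit linear CCA by solving the eigenproblem in equation (6).

    X1: (n, d1) view 1, X2: (n, d2) view 2.
    Returns projection matrices W1 (d1, dim) and W2 (d2, dim).
    """
    X1 = X1 - X1.mean(axis=0)                      # center both views
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    S11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])
    S22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    S12 = X1.T @ X2 / n                            # Sigma_12; Sigma_21 = S12.T
    # (6): Sigma_12 Sigma_22^{-1} Sigma_21 W1 = lambda^2 Sigma_11 W1,
    # rewritten as an ordinary eigenproblem for Sigma_11^{-1} (...).
    M = np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S12.T))
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)                 # largest lambda^2 first
    W1 = vecs[:, order[:dim]].real
    lam = np.sqrt(np.clip(vals.real[order[:dim]], 0.0, None))
    # (5): W2 = Sigma_22^{-1} Sigma_21 W1 / lambda
    W2 = np.linalg.solve(S22, S12.T @ W1) / lam
    return W1, W2
```

The eigenvalues of `M` are the squared canonical correlations $\lambda^2$, so the top eigenvector pair recovers the most correlated pair of projections.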
5 Ranking Loss