
Learning Multiple Maps from Conditional Ordinal Triplets

(Supplementary Material)

1 K-dimensional Coordinate Transformation Map

Though SCORE operates in (K+1)-dimensional space, the embedding map for each aspect t ∈ T on the tangent hyperplane is effectively K-dimensional. A necessary step is to transform the (K+1)-dimensional coordinates of objects on t's tangent hyperplane, i.e., {Proj_xt(y_i)}_{i∈I}, to their corresponding K-dimensional coordinates, i.e., {y_i^t}_{i∈I}. For the purpose of visualizing the embedding for an aspect t on a scatterplot, we describe below how to transform the 3D coordinates of objects on t's tangent hyperplane to their corresponding 2D coordinates. However, the analysis also applies to higher-dimensional embedding spaces, i.e., when K > 2.

Figure 1: Transformation of objects' coordinates from 3D to 2D.

Since x_t, Proj_xt(y_i), Proj_xt(y_j), Proj_xt(y_k) lie on the tangent hyperplane T_xt S^K of the task t, the three vectors

  u = Proj_xt(y_i) - x_t,
  v = Proj_xt(y_j) - x_t,
  w = Proj_xt(y_k) - x_t

are on T_xt S^K as well.

As illustrated in Fig. 1, the cross product x_t × u is a vector on T_xt S^K perpendicular to both x_t and u. Let us denote:

  e_1 = u / ||u||,    e_2 = (x_t × u) / ||x_t × u||.

We can see that e_1, e_2 form a basis of T_xt S^K (since ||e_1|| = ||e_2|| = 1 and e_1^T e_2 = 0). From linear algebra, for each point y ∈ T_xt S^K, there exist unique a_y, b_y ∈ R such that:

  (y - x_t) = a_y · e_1 + b_y · e_2.

Consider the following transformation map, where a_y, b_y ∈ R are defined as above:

  Tr_t : T_xt S^K → R^2
         y ↦ Tr_t(y) = (a_y, b_y)                                        (1)

Let (a_j, b_j) and (a_k, b_k) be the transformations of Proj_xt(y_j) and Proj_xt(y_k) respectively:

  ||Proj_xt(y_j) - Proj_xt(y_k)||
    = ||(Proj_xt(y_j) - x_t) - (Proj_xt(y_k) - x_t)||
    = ||v - w||
    = ||(a_j · e_1 + b_j · e_2) - (a_k · e_1 + b_k · e_2)||
    = ||(a_j - a_k) · e_1 + (b_j - b_k) · e_2||
    = sqrt((a_j - a_k)^2 + (b_j - b_k)^2)
    = ||Tr_t(Proj_xt(y_j)) - Tr_t(Proj_xt(y_k))||.                       (2)

Equation 2 implies that the L2-norm between points on T_xt S^K is preserved through the transformation map Tr_t. Therefore, the ordinal relations between points are also preserved through the transformation. Hence, we express y_i^t = Tr_t(Proj_xt(y_i)), for all i ∈ I.

2 Detailed Algorithm

In this supplementary material, we describe SCORE with the partial derivative computations provided in Algorithm 1.

3 Exploration on the Split Ratio

To better understand the benefits of multi-aspect modeling, we show the accuracies with varying r for the complete set of aspects in Fig. 2.

The disjoint learning baselines perform poorly for low values of r. This is expected since the amount of observed data is insufficient for a single task to learn its own map effectively. For extremely high r, e.g., 0.7, the disjoint learning baselines tend to do well. For Zoo, HouseVote, and Paris Attractions, r = 0.7 respectively corresponds to approximately 1.1M, 16.1M, and 155K triplets in training, which are
Algorithm 1 SCORE
 1: Initialize x_t for t ∈ T and y_i for i ∈ I.
 2: While not converged {
 3:   Draw a triplet <i, j, k>_t randomly from N.
 4:
 5:   Compute the likelihood:
 6:     l_ijk = δ_t · σ^t_ijk + (1 - δ_t) · σ_ijk.
 7:   Denote p^t_ij = (Proj_xt(y_j) - Proj_xt(y_i)) / d^t_ij (∀t ∈ T; i, j, k ∈ I).
 8:
 9:   Compute the partial derivatives:
10:     Δx_t = (α / l_ijk) [ δ_t σ^t_ijk (1 - σ^t_ijk) ( (∂Proj_xt(y_j - y_k) / ∂x_t) p^t_kj - (∂Proj_xt(y_j - y_i) / ∂x_t) p^t_ij ) ] + κμ;
11:     Δy_i = (α / l_ijk) [ δ_t σ^t_ijk (1 - σ^t_ijk) (∂Proj_xt(y_i) / ∂y_i) p^t_ij + (1 - δ_t) σ_ijk (1 - σ_ijk) y_j ] + κμ;
12:     Δy_j = (α / l_ijk) [ δ_t σ^t_ijk (1 - σ^t_ijk) (∂Proj_xt(y_j) / ∂y_j) (p^t_kj - p^t_ij) + (1 - δ_t) σ_ijk (1 - σ_ijk) (y_i - y_k) ] + κμ;
13:     Δy_k = (α / l_ijk) [ δ_t σ^t_ijk (1 - σ^t_ijk) (-∂Proj_xt(y_k) / ∂y_k) p^t_kj + (1 - δ_t) σ_ijk (1 - σ_ijk) (-y_j) ] + κμ;
14:
15:   Update the model parameters:
16:     z ← R_z(ε · Proj_z(Δz)) for each z ∈ {x_t, y_i, y_j, y_k};
17:     δ_t ← δ_t + ε · (σ^t_ijk - σ_ijk);  δ_t = argmin_{δ ∈ [0,1]} |δ_t - δ|;
18: }
19:
20: Return {x_t}_{t∈T} and {y_i}_{i∈I}.
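The transformation map Tr_t of Section 1 is straightforward to check numerically. The following NumPy sketch (all names are hypothetical, not taken from the SCORE implementation) builds the basis e_1, e_2 for K = 2 and verifies the distance preservation of Equation 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def tangent_basis(x_t, u):
    # Orthonormal basis of the tangent plane at x_t (Section 1):
    # e1 along u, e2 along the cross product x_t x u.
    e1 = u / np.linalg.norm(u)
    e2 = np.cross(x_t, u)
    return e1, e2 / np.linalg.norm(e2)

def Tr(y, x_t, e1, e2):
    # Tr_t(y) = (a_y, b_y): coordinates of y - x_t in the basis (e1, e2).
    d = y - x_t
    return np.array([d @ e1, d @ e2])

# Toy points on the tangent plane of the unit sphere at x_t.
x_t = np.array([0.0, 0.0, 1.0])

def tangent_point():
    t = rng.normal(size=3)
    t -= (t @ x_t) * x_t          # drop the component along x_t
    return x_t + t

p_i, p_j, p_k = tangent_point(), tangent_point(), tangent_point()
e1, e2 = tangent_basis(x_t, p_i - x_t)

# Equation (2): Euclidean distances on the tangent plane are preserved.
d3 = np.linalg.norm(p_j - p_k)
d2 = np.linalg.norm(Tr(p_j, x_t, e1, e2) - Tr(p_k, x_t, e1, e2))
assert np.isclose(d3, d2)
```

Since (e_1, e_2) is orthonormal, the map is an isometry of the tangent plane, which is exactly why ordinal relations survive the change of coordinates.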
34.23%, 34.17%, and 44.42% of all possible triplets. With sufficiently large data that cover the majority of objects, each aspect has more flexibility to specialize, with little risk of missing out on information. Also in Fig. 2, SCORE shows significantly better performance than MVTE and MVMDS, for the same reasons discussed in the manuscript.

Importantly, SCORE is robust across values of r. It is the best around 0.4-0.6, and never the worst. This result has two implications. First, it reiterates the benefit of the collaborative approach when the data is under-sampled, yet sufficient to learn the relatedness and specialization of tasks. Second, in practice it is often unclear whether the data is sufficient. Under such ambiguity, the multi-aspect approach ameliorates the risk of performing badly, while providing reasonable performance.

Figure 2: Overall preservation accuracies at various split ratios. (a) Zoo (17 aspects); (b) HouseVote (16 aspects); (c) Paris Attractions (237 aspects).

4 Illustrative Case Study for HouseVote

In the main manuscript, we show the visualizations for three aspects of the Zoo dataset: type, #legs, and predator. Derived from the spherical coordinates of objects and projected onto each task's hyperplane, they provide different perceptions of similarity over the objects.

Here, we provide another example from the HouseVote dataset. For HouseVote, each object is a congressman. Fig. 3 provides visualizations of these objects for three binary attributes: immigration, education-spending, and crime. Some objects overlap as they may have similar attribute values. We use size, shape, and color to represent the attribute values of immigration, education-spending, and crime respectively. According to the sizes, the shapes, and the colors in Fig. 3, instances with the same attribute values are grouped intuitively. For example, Fig. 3(a) visualizes the immigration attribute (represented by size: small indicates a 'no' vote, while large indicates a 'yes' vote). We see a clear separation between points of similar size: small shapes on the right, large shapes on the left. Similarly, Fig. 3(b) is a visualization for education spending (represented by shape: inverted triangles for 'no' and circles for 'yes'). Triangular objects tend to flock to the left, while circular objects flock to the right. Finally,
Figure 3: Visualizations for three attributes: immigration, education-spending, crime (HouseVote).
in Fig. 3(c), a map of the crime vote (represented by color: blue for 'no' and red for 'yes'), it is evident that blue points tend to be on the left, while red points tend to be on the right. These are multiple maps defined over the same set of objects, yet reflecting different perceptions of similarity.
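As a toy illustration of this point (the coordinates below are made up, not taken from the HouseVote embeddings), the nearest neighbour of an object can change from one aspect map to another:

```python
import math

# Hypothetical 2D coordinates for the same three objects in two aspect maps.
maps = {
    "education-spending": {"a": (0.0, 0.0), "b": (0.2, 0.0), "c": (2.0, 0.0)},
    "crime":              {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (0.2, 0.0)},
}

def nearest(aspect, obj):
    # Closest other object under the given aspect's map.
    coords = maps[aspect]
    return min((o for o in coords if o != obj),
               key=lambda o: math.dist(coords[obj], coords[o]))

# "a" is most similar to "b" on one aspect, but to "c" on the other.
assert nearest("education-spending", "a") == "b"
assert nearest("crime", "a") == "c"
```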
Upon a closer inspection of the data, we uncover further insights. For instance, the HouseVote data contains the political affiliation of congressmen (Republican or Democrat), which was not used for learning the embedding maps. Overall, there is a tendency for Democrats to vote 'no' on education spending and crime, while Republicans tend to vote 'yes' on both counts. In Fig. 3(b), most of the blue triangles on the left are in fact Democrats, while most of the red circles on the right are Republicans. Interestingly, the red triangles on the left are likely to be Republicans voting with Democrats, whereas the red triangles on the right are likely to be Democrats voting with Republicans. Thus the map helps highlight the varying similarities between congressmen depending on the voting issues.
Fig. 3(d) is produced by SCORE running in single-map mode. Again, a single visualization map is not sufficient to capture diverse similarity perceptions. This highlights the need for multiple maps, one for each similarity aspect.
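For reference, the preservation accuracy reported in Fig. 2 can be sketched as follows. This is a hedged reconstruction: we assume here that a held-out triplet <i, j, k>_t counts as preserved when i lies closer to j than to k in aspect t's map; the exact metric is defined in the main manuscript, and all names below are hypothetical.

```python
import math

def preservation_accuracy(triplets, maps):
    # triplets: list of (t, i, j, k); maps: {t: {object: (x, y)}}.
    # A triplet is preserved when i is embedded closer to j than to k
    # in aspect t's map (illustrative assumption).
    hits = 0
    for t, i, j, k in triplets:
        m = maps[t]
        if math.dist(m[i], m[j]) < math.dist(m[i], m[k]):
            hits += 1
    return hits / len(triplets)

# Toy data: one aspect map with three objects on a line.
toy_maps = {"crime": {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 0.0)}}
toy_triplets = [("crime", "a", "b", "c"), ("crime", "a", "c", "b")]
assert preservation_accuracy(toy_triplets, toy_maps) == 0.5
```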
