(Supplementary Material)
1 K-dimensional Coordinate Transformation Map
Though SCORE operates in $(K+1)$-dimensional space, the embedding map for each aspect $t \in T$ on the tangent hyperplane is effectively $K$-dimensional. A necessary step is to transform the $(K+1)$-dimensional coordinates of objects on $t$'s tangent hyperplane, i.e., $\{\mathrm{Proj}_{x_t}(y_i)\}_{i \in I}$, to their corresponding $K$-dimensional coordinates, i.e., $\{y_i^t\}_{i \in I}$. For the purpose of visualizing the embedding for an aspect $t$ on a scatterplot, we describe in the following how to transform the 3D coordinates of objects on $t$'s tangent hyperplane to their corresponding 2D coordinates. However, the analysis below is also applicable to a high-dimensional embedding space, i.e., when $K > 2$.

Figure 1: Transformation of objects' coordinates from 3D to 2D.

Since $x_t$, $\mathrm{Proj}_{x_t}(y_i)$, $\mathrm{Proj}_{x_t}(y_j)$, $\mathrm{Proj}_{x_t}(y_k)$ lie on the tangent hyperplane $T_{x_t}S^K$ of the task $t$, the three vectors

$$u = \mathrm{Proj}_{x_t}(y_i) - x_t, \quad v = \mathrm{Proj}_{x_t}(y_j) - x_t, \quad w = \mathrm{Proj}_{x_t}(y_k) - x_t$$

are on $T_{x_t}S^K$ as well.

As illustrated in Fig. 1, the cross product $x_t \times u$ is a vector on $T_{x_t}S^K$ perpendicular to both $x_t$ and $u$. Let us denote:

$$e_1 = \frac{u}{\|u\|}, \quad e_2 = \frac{x_t \times u}{\|x_t \times u\|}.$$

We can see that $e_1$, $e_2$ form a basis of $T_{x_t}S^K$ (since $\|e_1\| = \|e_2\| = 1$ and $e_1^\top e_2 = 0$). From linear algebra, for each point $y \in T_{x_t}S^K$, there exist unique $a_y, b_y \in \mathbb{R}$ such that:

$$(y - x_t) = a_y e_1 + b_y e_2.$$

Consider the following transformation map, where $a_y, b_y \in \mathbb{R}$ are defined as above:

$$\mathrm{Tr}_t : T_{x_t}S^K \to \mathbb{R}^2, \quad y \mapsto \mathrm{Tr}_t(y) = (a_y, b_y). \quad (1)$$

Let $(a_j, b_j)$ and $(a_k, b_k)$ be the transformations of $\mathrm{Proj}_{x_t}(y_j)$ and $\mathrm{Proj}_{x_t}(y_k)$ respectively:

$$\begin{aligned}
\|\mathrm{Proj}_{x_t}(y_j) - \mathrm{Proj}_{x_t}(y_k)\|
&= \|(\mathrm{Proj}_{x_t}(y_j) - x_t) - (\mathrm{Proj}_{x_t}(y_k) - x_t)\| \\
&= \|v - w\| \\
&= \|(a_j e_1 + b_j e_2) - (a_k e_1 + b_k e_2)\| \\
&= \|(a_j - a_k)\, e_1 + (b_j - b_k)\, e_2\| \\
&= \sqrt{(a_j - a_k)^2 + (b_j - b_k)^2} \\
&= \|\mathrm{Tr}_t(\mathrm{Proj}_{x_t}(y_j)) - \mathrm{Tr}_t(\mathrm{Proj}_{x_t}(y_k))\|. \quad (2)
\end{aligned}$$

Equation 2 implies that the $L_2$-norm between points on $T_{x_t}S^K$ is preserved through the transformation map $\mathrm{Tr}_t$. Therefore, the ordinal relations between points are also preserved through the transformation. Hence, we express $y_i^t = \mathrm{Tr}_t(\mathrm{Proj}_{x_t}(y_i))$ for all $i \in I$.

2 Detailed Algorithm

In this supplementary material, we describe SCORE with the partial derivative computations provided in Algorithm 1.

3 Exploration on the Split Ratio

To better understand the benefits of multi-aspect modeling, we show the accuracies with varying $r$ for the complete set of aspects in Fig. 2.

The disjoint learning baselines perform poorly for low values of $r$. This is expected, since the amount of observed data is insufficient for a single task to learn its own map effectively. For extremely high $r$, e.g., 0.7, the disjoint learning baselines tend to do well. For Zoo, HouseVote, and Paris Attractions, $r = 0.7$ respectively corresponds to approximately 1.1M, 16.1M, and 155K triplets in training, which are
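As a concrete illustration of the coordinate transformation map $\mathrm{Tr}_t$ from Section 1, the following minimal NumPy sketch builds the basis $e_1, e_2$ at an anchor $x_t$ and checks that pairwise distances are preserved, as in Equation 2. The points are synthetic, and the projection operator is simplified here to plain orthogonal projection onto the tangent plane, which is an assumption standing in for SCORE's actual $\mathrm{Proj}$ operator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Aspect anchor x_t: a point on the unit sphere S^2.
x_t = np.array([0.0, 0.0, 1.0])

def proj_tangent(y, x):
    """Orthogonal projection of y onto the tangent plane of S^2 at x
    (a simplifying assumption; SCORE's Proj operator may differ)."""
    return x + (y - x) - np.dot(y - x, x) * x

# Synthetic embedded objects y_i, projected onto the tangent plane.
ys = rng.normal(size=(4, 3))
ps = np.array([proj_tangent(y, x_t) for y in ys])

# Basis of the tangent plane, following Section 1.
u = ps[0] - x_t
e1 = u / np.linalg.norm(u)
c = np.cross(x_t, u)
e2 = c / np.linalg.norm(c)

def Tr_t(p):
    """Tr_t: 3D tangent-plane point -> 2D coordinates (a_y, b_y).
    Since e1, e2 are orthonormal, the unique coefficients are dot products."""
    d = p - x_t
    return np.array([np.dot(d, e1), np.dot(d, e2)])

coords2d = np.array([Tr_t(p) for p in ps])

# Equation (2): pairwise L2 distances are preserved by Tr_t.
for a in range(len(ps)):
    for b in range(len(ps)):
        d3 = np.linalg.norm(ps[a] - ps[b])
        d2 = np.linalg.norm(coords2d[a] - coords2d[b])
        assert abs(d3 - d2) < 1e-9
```

Because $e_1, e_2$ are orthonormal, the coefficients $a_y, b_y$ are simply inner products with the basis vectors, so the map is an isometry between the tangent plane and $\mathbb{R}^2$.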
Algorithm 1 SCORE
1: Initialize $x_t$ for $t \in T$ and $y_i$ for $i \in I$.
2: While not converged {
3:   Draw a triplet $\langle i, j, k \rangle_t$ randomly from $N$.
4:   Compute the likelihood: $l^t_{ijk} = \delta_t \sigma^t_{ijk} + (1 - \delta_t)\,\sigma_{ijk}$.
5:   Denote $p^t_{ij} = \frac{\mathrm{Proj}_{x_t}(y_j) - \mathrm{Proj}_{x_t}(y_i)}{d^t_{ij}}$ $(\forall t \in T;\ i, j, k \in I)$.
6:   Compute the partial derivatives:
7:   $\Delta x_t = \frac{\alpha}{l^t_{ijk}} \left[ \delta_t \sigma^t_{ijk}(1 - \sigma^t_{ijk}) \left( \frac{\partial \mathrm{Proj}_{x_t}(y_j - y_k)}{\partial x_t}\, p^t_{kj} - \frac{\partial \mathrm{Proj}_{x_t}(y_j - y_i)}{\partial x_t}\, p^t_{ij} \right) \right] + \kappa\mu$;
8:   $\Delta y_i = \frac{\alpha}{l^t_{ijk}} \left[ \delta_t \sigma^t_{ijk}(1 - \sigma^t_{ijk})\, \frac{\partial \mathrm{Proj}_{x_t}(y_i)}{\partial y_i}\, p^t_{ij} + (1 - \delta_t)\sigma_{ijk}(1 - \sigma_{ijk})\, y_j \right] + \kappa\mu$;
9:   $\Delta y_j = \frac{\alpha}{l^t_{ijk}} \left[ \delta_t \sigma^t_{ijk}(1 - \sigma^t_{ijk})\, \frac{\partial \mathrm{Proj}_{x_t}(y_j)}{\partial y_j} \left( p^t_{kj} - p^t_{ij} \right) + (1 - \delta_t)\sigma_{ijk}(1 - \sigma_{ijk})\,(y_i - y_k) \right] + \kappa\mu$;
10:  $\Delta y_k = \frac{\alpha}{l^t_{ijk}} \left[ \delta_t \sigma^t_{ijk}(1 - \sigma^t_{ijk}) \left( -\frac{\partial \mathrm{Proj}_{x_t}(y_k)}{\partial y_k} \right) p^t_{kj} + (1 - \delta_t)\sigma_{ijk}(1 - \sigma_{ijk})\,(-y_j) \right] + \kappa\mu$;
11:  Update the model parameters:
12:  $z \leftarrow R_z\left(\eta\, \mathrm{Proj}_z(\Delta z)\right)$ for each $z \in \{x_t, y_i, y_j, y_k\}$;
13:  $\delta_t \leftarrow \delta_t + \eta\,(\sigma_{ijk} - \sigma^t_{ijk})$; $\delta_t \leftarrow \arg\min_{\delta \in [0,1]} |\delta_t - \delta|$;
14: }
15: Return $\{x_t\}_{t \in T}$ and $\{y_i\}_{i \in I}$.
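The parameter update in Algorithm 1 follows a project-then-retract pattern: the raw gradient $\Delta z$ is projected onto the tangent space at $z$, scaled by a step size, and the result is retracted back onto the sphere; the mixing weight $\delta_t$ is then clipped to $[0, 1]$. The sketch below illustrates only this update pattern, not the full gradient computation: the gradient is a placeholder, and the retraction $R_z$ is taken to be simple renormalization, which is an assumption rather than the paper's stated choice.

```python
import numpy as np

def proj_tangent_vec(x, g):
    """Project an ambient-space gradient g onto the tangent space
    of the unit sphere at x (the Proj_z step in the update)."""
    return g - np.dot(g, x) * x

def retract(x, step):
    """A retraction: move in the tangent direction, then renormalize
    back onto the sphere. (One common choice; Algorithm 1's exact R_z
    is not specified in this sketch.)"""
    z = x + step
    return z / np.linalg.norm(z)

def sphere_sgd_step(x, grad, lr):
    """One update in the pattern of Algorithm 1: x <- R_x(lr * Proj_x(grad))."""
    return retract(x, lr * proj_tangent_vec(x, grad))

def clip_delta(delta):
    """Keep the mixing weight delta_t in [0, 1], as in the
    arg-min projection of Algorithm 1."""
    return min(max(delta, 0.0), 1.0)

# Toy usage: the parameter stays on the unit sphere after each step.
x = np.array([1.0, 0.0, 0.0])
grad = np.array([0.3, -0.2, 0.5])  # placeholder for Delta_z
x = sphere_sgd_step(x, grad, lr=0.1)
assert abs(np.linalg.norm(x) - 1.0) < 1e-12
```

Projecting before retracting keeps the step within the manifold's local geometry, so repeated updates never drift off the sphere regardless of the gradient's ambient direction.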
34.23%, 34.17%, and 44.42% of all possible triplets. With sufficiently large data that cover the majority of objects, each aspect has more flexibility to specialize, with little risk of missing out on information. Also in Fig. 2, SCORE shows significantly better performance than MVTE and MVMDS, for the same reasons discussed in the manuscript.

Importantly, SCORE is robust across values of $r$. It is the best around 0.4-0.6, and never the worst. This result has two implications. First, SCORE is an effective approach when the data is under-sampled, yet sufficient to learn the relatedness and specialization of tasks. Second, in practice it is often unclear whether the data is sufficient. Under such ambiguity, the multi-aspect approach ameliorates the risk of performing badly, while providing reasonable performance.

Figure 2: Overall preservation accuracies at various split ratios (panels include (c) Paris Attractions, 237 aspects; axes: split ratio vs. preservation accuracy).

4 Illustrative Case Study for HouseVote

In the main manuscript, we show the visualizations for three aspects of the Zoo dataset: type, #legs, and predator. Derived from the spherical coordinates of objects and projected onto each task's hyperplane, they provide different perceptions of similarity over the objects.

Here, we provide another example from the HouseVote dataset. For HouseVote, each object is a congressman. Fig. 3 provides visualizations of these objects for three binary attributes: immigration, education-spending, and crime. Some objects overlap as they may have similar attribute values. We use size, shape, and color to represent the attribute values of immigration, education-spending, and crime, respectively. According to the sizes, the labels, and the colors in Fig. 3, instances with the same attribute values are grouped intuitively. For example, Fig. 3(a) visualizes the immigration attribute (represented by size: small indicates a 'no' vote, while large indicates a 'yes' vote). We see a clear separation between points of similar size: small shapes on the right, large shapes on the left. Similarly, Fig. 3(b) is a visualization for education-spending (represented by shape: inverted triangles for 'no' and circles for 'yes'). Triangular objects tend to flock to the left, while circular objects flock to the right. Finally,