arXiv:1901.08150v1 [cs.LG] 23 Jan 2019

Abstract

Recently, graph neural networks have attracted great attention and achieved prominent performance.
1) Hypergraph convolution defines a basic convolutional operator in a hypergraph. It enables an efficient information propagation between vertices by fully exploiting the high-order relationship and local clustering structure therein. We mathematically prove that graph convolution is a special case of hypergraph convolution when the non-pairwise relationship degenerates to a pairwise one.

2) Apart from hypergraph convolution, where the underlying structure used for propagation is pre-defined, hypergraph attention further exerts an attention mechanism to learn a dynamic connection of hyperedges. Then, the information propagation and gathering is done in task-relevant parts of the graph, thereby generating more discriminative node embeddings.

3) Both hypergraph convolution and hypergraph attention are end-to-end trainable, and can be inserted into most variants of graph neural networks as long as non-pairwise relationships are observed. Extensive experimental results on benchmark datasets demonstrate the efficacy of the proposed methods for semi-supervised node classification.

2. Related Work

Graphs are a classic kind of data structure, whose vertices represent objects and whose edges, linking two adjacent vertices, describe the relationship between the corresponding objects.

Graph Neural Network (GNN) is a methodology for learning deep models or embeddings on graph-structured data, first proposed by (Scarselli et al., 2009). One key aspect of GNN is to define the convolutional operator in the graph domain. (Bruna et al., 2013) first define convolution in the Fourier domain using the graph Laplacian matrix, and generate non-spatially localized filters with potentially intense computations. (Henaff et al., 2015) make the spectral filters spatially localized using a parameterization with smooth coefficients. (Defferrard et al., 2016) focus on the efficiency issue and use a Chebyshev expansion of the graph Laplacian to avoid an explicit use of the graph Fourier basis. (Kipf & Welling, 2016) further simplify the filtering by only using the first-order neighbors and propose the Graph Convolutional Network (GCN), which has demonstrated impressive efficiency and effectiveness on semi-supervised classification tasks.

Meanwhile, some spatial algorithms directly perform convolution on the graph. For instance, (Duvenaud et al., 2015) learn different parameters for nodes with different degrees, then average the intermediate embeddings over the neighborhood structures. (Niepert et al., 2016) propose the PATCHY-SAN architecture, which selects a fixed-length sequence of nodes as the receptive field and generates local normalized neighborhood representations for each of the nodes in the sequence. (Atwood & Towsley, 2016) demonstrate that diffusion-based representations can serve as an effective basis for node classification. (Zhuang & Ma, 2018) further explore a joint usage of diffusion and adjacency bases in a dual graph convolutional network. (Gilmer et al., 2017) define a unified framework via a message passing function, where each vertex sends messages based on its states and updates its states based on the messages of its immediate neighbors. (Hamilton et al., 2017) propose GraphSAGE, which customizes three aggregating functions, i.e., element-wise mean, long short-term memory and pooling, to learn embeddings in an inductive setting.

Moreover, some other works concentrate on the gate mechanism (Li et al., 2016), skip connection (Pham et al., 2017), jumping connection (Xu et al., 2018), attention mechanism (Velickovic et al., 2018), sampling strategy (Chen et al., 2018a;b), hierarchical representation (Ying et al., 2018b), generative models (You et al., 2018b; Bojchevski et al., 2018), adversarial attack (Dai et al., 2018), etc. As a thorough review is simply infeasible due to the space limitation, we refer interested readers to surveys for more representative methods. For example, (Zhang et al., 2018) and (Zhou et al., 2018) present two systematic and comprehensive surveys over a series of variants of graph neural networks. (Bronstein et al., 2017) provide a review of geometric deep learning. (Battaglia et al., 2018) generalize and extend various approaches and show how graph neural networks can support relational reasoning and combinatorial generalization. (Lee et al., 2018) particularly focus on attention models for graphs, and introduce three intuitive taxonomies. (Monti et al., 2017) propose a unified framework called MoNet, which summarizes Geodesic CNN (Masci et al., 2015), Anisotropic CNN (Boscaini et al., 2016), GCN (Kipf & Welling, 2016) and Diffusion CNN (Atwood & Towsley, 2016) as its special cases.

As analyzed above, most existing variants of GNN assume pairwise relationships between objects, while our work operates on a high-order hypergraph (Berge, 1973; Li & Milenkovic, 2017) where the between-object relationships are beyond pairwise. Hypergraph learning methods differ in the structure of the hypergraph, e.g., clique expansion and star expansion (Zien et al., 1999), and in the definition of hypergraph Laplacians (Bolla, 1993; Rodriguez, 2003; Zhou et al., 2007). Following (Defferrard et al., 2016), (Feng et al., 2019) propose a hypergraph neural network using a Chebyshev expansion of the graph Laplacian. By analyzing the incidence structure of a hypergraph, our work directly defines two differentiable operators, i.e., hypergraph convolution and hypergraph attention, which are intuitive and flexible in learning more discriminative deep embeddings.
Hypergraph Convolution and Hypergraph Attention
B_jj = Σ_{i=1}^{N} H_ij. (2)

Note that D ∈ R^{N×N} and B ∈ R^{M×M} are both diagonal matrices.

In the following, we define the operator of convolution on the hypergraph G.

X(l+1) = σ(D−1 H W B−1 HT X(l) P), (6)

which enjoys similar mathematical properties as Eq. (5), except that the propagation is directional and asymmetric in this case.

As X(l+1) is differentiable with respect to X(l) and P, we can use hypergraph convolution in end-to-end training and optimize it via gradient descent.
     e1  e2
x1    1   0
x2    1   0
x3    1   1
x4    0   1
x5    0   1
Figure 2. Schematic illustration of hypergraph convolution with 5 vertices and 2 hyperedges. With an optional attention mechanism,
hypergraph convolution upgrades to hypergraph attention.
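As the caption notes, an optional attention module can replace the binary incidence matrix with learned connection strengths. A minimal NumPy sketch of this idea follows; representing each hyperedge by the mean of its member vertices, using a dot-product similarity, and restricting the softmax to each vertex's incident hyperedges are all illustrative assumptions made here for concreteness:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_incidence(X, H, P):
    """Replace the binary H with per-vertex attention scores over its
    incident hyperedges. Hyperedge features are the (normalized) mean of
    their member vertices; sim(.) is a dot product -- both assumptions."""
    Xp = X @ P                             # projected vertex features
    E = (H / H.sum(axis=0)).T @ Xp         # projected hyperedge features
    H_att = np.zeros_like(H)
    for i in range(H.shape[0]):
        idx = np.nonzero(H[i])[0]          # hyperedges incident to vertex i
        H_att[i, idx] = softmax(np.tanh(Xp[i] @ E[idx].T))
    return H_att

H = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
X = np.random.randn(5, 4)
P = np.random.randn(4, 3)
H_att = attention_incidence(X, H, P)
# H_att keeps the sparsity pattern of H, and each nonzero row sums to 1,
# so it can be plugged into the convolution rule unchanged.
```

The learned H_att is differentiable with respect to X and P, which is what allows gradients to flow into the incidence structure during end-to-end training.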
One natural solution is to exert an attention learning module on H. In this circumstance, instead of treating each vertex as being connected by a certain hyperedge or not, the attention module presents a probabilistic model, which assigns non-binary, real values to measure the degree of connectivity. We expect that the probabilistic model can learn more category-discriminative embeddings, and that the relationship between vertices can be more accurately described.

Nevertheless, hypergraph attention is only feasible when the vertex set and the hyperedge set are from (or can be projected to) the same homogeneous domain, since only in this case are the similarities between vertices and hyperedges directly comparable. In practice, it depends on how the hypergraph G is constructed. For example, (Huang et al., 2010) apply hypergraph learning to image retrieval, where each vertex collects its k-nearest neighbors to form a hyperedge, which is also the way we construct hypergraphs in our experiments. When the vertex set and the edge set are comparable, we define the procedure of hypergraph attention inspired by (Velickovic et al., 2018). For a given vertex x_i and its associated hyperedge x_j, the attentional score is

H_ij = exp(σ(sim(x_i P, x_j P))) / Σ_{k∈N_i} exp(σ(sim(x_i P, x_k P))), (7)

where σ(·) is a non-linear activation function and sim(·) is a similarity function that computes the pairwise similarity between two vertices. N_i is the neighborhood set of x_i, which can be pre-accessed on some benchmarks, such as the Cora and Citeseer datasets (Sen et al., 2008).

With the incidence matrix H enriched by an attention module, one can also follow Eq. (5) and Eq. (6) to learn the intermediate embedding of vertices layer-by-layer. Note that hypergraph attention also propagates gradients to H in addition to X(l) and P.

In some applications, the vertex set and the hyperedge set are from two heterogeneous domains. For instance, (Zhou et al., 2007) assume that attributes are hyperedges to connect objects like newspaper or text. Then, it is problematic to directly learn an attention module over the incidence matrix H. We leave this issue for future work.

3.4. Summary and Remarks

The pipeline of the proposed hypergraph convolution and hypergraph attention is illustrated in Fig. 2. Both operators can be inserted into most variants of graph neural networks when non-pairwise relationships are observed, and used for end-to-end training.

As the only difference between hypergraph convolution and hypergraph attention is an optional attention module on the incidence matrix H, below we take hypergraph convolution as a representative to further analyze the properties of our methods. Note that the analyses also hold for hypergraph attention.

Relationship with Graph Convolution. We mathematically prove that graph convolution (Kipf & Welling, 2016) is a special case of hypergraph convolution.

Let A ∈ R^{N×N} be the adjacency matrix used in graph convolution. When each edge only links two vertices in a hypergraph, the hyperedge degree matrix B = 2I. Assuming equal weights for all the hyperedges (i.e., W = I), we have an interesting observation of hypergraph convolution. Based on Eq. (5), the definition of hypergraph convolution then becomes

X(l+1) = σ((1/2) D−1/2 H HT D−1/2 X(l) P)
       = σ((1/2) D−1/2 (A + D) D−1/2 X(l) P)
       = σ((1/2)(I + D−1/2 A D−1/2) X(l) P)
       = σ(Â X(l) P), (8)

where Â = (1/2)Ã and Ã = I + D−1/2 A D−1/2. As we can see, the derived Â plays the same role as the re-normalized adjacency used in graph convolution (Kipf & Welling, 2016), defined as

Â = D̃−1/2 Ã D̃−1/2, (9)

where D̃ is the degree matrix of Ã.
In the specific case of hypergraph convolution, we are using a simplified solution, that is, dividing Ã by 2.

With GCN as a bridge and springboard to the family of graph neural networks, it then becomes feasible to build connections with other frameworks, e.g., MoNet (Monti et al., 2017), and to develop the higher-order counterparts of those variants to deal with non-pairwise relationships.

Implementation in Practice. The implementation of hypergraph convolution appears sophisticated, as 6 matrices are multiplied for symmetric convolution (see Eq. (5)) and 5 matrices are multiplied for asymmetric convolution (see Eq. (6)). However, it should be mentioned that D, W and B are all diagonal matrices, which makes it possible to implement it efficiently in commonly used deep learning platforms.

For asymmetric convolution, we have from Eq. (6) that

D−1 H W B−1 HT = (D−1 H) W (H B−1)T, (10)

where D−1 H and H B−1 perform L1 normalization of the incidence matrix H over rows and columns, respectively. In space-saving applications where matrix-form variables are allowed, normalization can be simply done using standard built-in functions in public neural network packages.

In the case of space-consuming applications, one can readily implement a sparse version of hypergraph convolution as well. Since H is usually a sparse matrix, Eq. (10) does not necessarily decrease the sparsity too much. Hence, we can conduct normalization only on the non-zero indices of H, resulting in a sparse transition matrix.

Symmetric hypergraph convolution defined in Eq. (5) can be implemented similarly, with a minor difference in normalization using the vertex degree matrix D.

Skip Connection. Hypergraph convolution can be integrated with skip connection (He et al., 2016) as

X(l+1) = HConv(X(l), H, P) + X(l), (11)

where HConv(·) represents the hypergraph convolution operator defined in Eq. (5) (or Eq. (6)). Some similar structures (e.g., the highway structure (Srivastava et al., 2015) adopted in Highway-GCN (Rahimi et al., 2018)) can also be applied.

It has been demonstrated (Kipf & Welling, 2016) that deep graph models cannot improve the performance even with skip connections, since the receptive field grows exponentially with respect to the model depth. In the experiments, we will verify the compatibility of the proposed operators with skip connections in model training.

Multi-head. To stabilize the learning process and improve the representative power of networks, a multi-head (a.k.a. multi-branch) architecture is suggested in relevant works, e.g., (Xie et al., 2017; Vaswani et al., 2017; Velickovic et al., 2018). Hypergraph convolution can also be extended in that way, as

Xk(l+1) = HConv(X(l), Hk, Pk),
X(l+1) = Aggregate_{k=1}^{K} Xk(l+1), (12)

where Aggregate(·) is a certain aggregation like concatenation or average pooling. Hk and Pk are the incidence matrix and weight matrix corresponding to the k-th head, respectively. Note that only in hypergraph attention is Hk different over different heads.

4. Experiments

In this section, we evaluate the proposed hypergraph convolution and hypergraph attention in the task of semi-supervised node classification.

4.1. Experimental Setup

Datasets. We employ three citation network datasets, including the Cora, Citeseer and Pubmed datasets (Sen et al., 2008), following previous representative works (Kipf & Welling, 2016; Velickovic et al., 2018). Table 1 presents an overview of the statistics of the three datasets.

The Cora dataset contains 2,708 scientific publications divided into 7 categories. There are 5,429 edges in total, with each edge being a citation link from one article to another. Each publication is described by a binary bag-of-words representation, where 0 (or 1) indicates the absence (or presence) of the corresponding word in the dictionary. The dictionary consists of 1,433 unique words.

Like the Cora dataset, the Citeseer dataset contains 3,327 scientific publications, divided into 6 categories and linked by 4,732 edges. Each publication is described by a binary bag-of-words representation of 3,703 dimensions.

The Pubmed dataset is comprised of 19,717 scientific publications
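Returning to the factorization in Eq. (10): D−1 H and H B−1 amount to L1-normalizing the incidence matrix over rows and columns, which maps directly onto sparse-matrix primitives. A sketch with SciPy follows (an illustrative sketch under the assumption W = I, not the authors' implementation):

```python
import numpy as np
import scipy.sparse as sp

# Sparse incidence matrix: 5 vertices, 2 hyperedges (W = I for simplicity).
H = sp.csr_matrix(np.array([[1, 0], [1, 0], [1, 1],
                            [0, 1], [0, 1]], dtype=float))

row_sums = np.asarray(H.sum(axis=1)).ravel()  # vertex degrees, diag of D when W = I
col_sums = np.asarray(H.sum(axis=0)).ravel()  # hyperedge degrees, diag of B

H_row = sp.diags(1.0 / row_sums) @ H          # D^-1 H: rows of H sum to 1
H_col = H @ sp.diags(1.0 / col_sums)          # H B^-1: columns of H sum to 1

# Sparse transition matrix of Eq. (10): (D^-1 H) W (H B^-1)^T with W = I.
T = H_row @ H_col.T
```

Since the normalizations touch only the non-zero entries of H, the transition matrix stays sparse, matching the space-saving discussion above.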
One may also be interested in another question, i.e., does it bring performance improvements if using hypergraph convolution (or attention) in conjunction with graph convolution (or attention)? We further investigate this by averaging the transition probability learned by non-pairwise models and pairwise models with equal weights, and report the results in Table 2. As it shows, a positive synergy is only observed on the Cora dataset, where the best results of the convolution operator and the attention operator are improved to 82.63 and 82.74, respectively. By contrast, our methods encounter a slight performance decrease on the Citeseer dataset. From another perspective, it also supports our above claim that different data may fit different structural graph models.

Analysis of Skip Connection. We study the influence of skip connection (He et al., 2016) by adding an identity mapping in the first hypergraph convolution layer. We report in Table 3 two settings of the weight decay, i.e., λ=3e-4 (default setting) and λ=1e-3.

As it shows, both GCN* and GAT* report lower recognition rates when integrated with skip connection compared with those reported in Table 2. In comparison, the proposed non-pairwise models, especially hypergraph convolution,

Analysis of Hidden Representation. Table 4 presents the performance comparison between GCN* and hypergraph convolution with an increasing length of the hidden representation. The number of heads is set to 1.

It is easy to find that the performance keeps increasing with an increase of the length of the hidden representation, then peaks when the length is 16. Moreover, hypergraph convolution consistently beats GCN* with a variety of feature dimensions. As the only difference between GCN* and hypergraph convolution is the graph structure used, the performance gain purely comes from a more robust way of establishing the relationships between objects. It firmly demonstrates the ability of our methods in graph knowledge embedding.

4.3. Comparison with State-of-the-art

We compare our method with the state-of-the-art algorithms, which have followed the experimental setting in (Yang et al., 2016) and reported classification accuracies on the Cora, Citeseer and Pubmed datasets. Note that the results are directly quoted from the original papers, instead of being re-implemented in this work. Besides GCN and GAT, the selected algorithms also include Manifold Regularization (Belkin et al., 2006), Semi-supervised Embedding (Weston et al., 2012), Label Propagation (Zhu et al., 2003), DeepWalk (Perozzi et al., 2014), Iterative Classification Algorithm (Lu & Getoor, 2003), Planetoid (Yang et al., 2016), Chebyshev (Defferrard et al., 2016), MoNet (Monti et al., 2017), and Variance Reduction (Chen et al., 2018b).

Table 5. Comparison with the state-of-the-art methods in terms of classification accuracy (%). The best and second best results are marked in red and blue, respectively.

Method                                                  Cora dataset  Citeseer dataset  Pubmed dataset
Multilayer Perceptron                                   55.1          46.5              71.4
Manifold Regularization (Belkin et al., 2006)           59.5          60.1              70.7
Semi-supervised Embedding (Weston et al., 2012)         59.0          59.6              71.7
Label Propagation (Zhu et al., 2003)                    68.0          45.3              63.0
DeepWalk (Perozzi et al., 2014)                         67.2          43.2              65.3
Iterative Classification Algorithm (Lu & Getoor, 2003)  75.1          69.1              73.9
Planetoid (Yang et al., 2016)                           75.7          64.7              77.2
Chebyshev (Defferrard et al., 2016)                     81.2          69.8              74.4
Graph Convolutional Network (Kipf & Welling, 2016)      81.5          70.3              79.0
MoNet (Monti et al., 2017)                              81.7          -                 78.8
Variance Reduction (Chen et al., 2018b)                 82.0          70.9              79.0
Graph Attention Network (Velickovic et al., 2018)       83.0          72.5              79.0
Ours                                                    82.7          71.2              78.4

As presented in Table 5, our method achieves the second best performance on the Cora and Citeseer datasets, which is slightly inferior to GAT (Velickovic et al., 2018) by 0.3 and 1.3, respectively. The performance gap is attributed to multiple factors, such as the difference in deep learning platforms and better parameter tuning. As shown in Sec. 4.2, thorough experimental comparisons under the same setting have demonstrated the benefit of learning deep embeddings using the proposed non-pairwise models. Nevertheless, we emphasize again that pairwise and non-pairwise models have different application scenarios, and existing variants of graph neural networks can be easily extended to their non-pairwise counterparts with the proposed two operators.

On the Pubmed dataset, hypergraph attention reports 78.4 in classification accuracy, better than the 78.1 achieved by GAT*. As described in Sec. 4.1, the original implementation of GAT adopts 8 output attention heads while only 1 is used in GAT* to ensure the consistency of model architectures. Even so, hypergraph attention achieves a comparable performance with the state-of-the-art methods.

5. Conclusion and Future Work

In this work, we have contributed two end-to-end trainable operators to the family of graph neural networks, i.e., hypergraph convolution and hypergraph attention. While most variants of graph neural networks assume pairwise relationships of objects of interest, the proposed operators handle non-pairwise relationships modeled in a high-order hypergraph. We theoretically demonstrate that some recent representative works, e.g., the graph convolution network (Kipf & Welling, 2016) and the graph attention network (Velickovic et al., 2018), are special cases of our methods. Hence, our proposed hypergraph convolution and hypergraph attention are more flexible in dealing with arbitrary orders of relationships and diverse applications, where both pairwise and non-pairwise formulations are likely to be involved. Thorough experimental results with semi-supervised node classification demonstrate the efficacy of the proposed methods.

There are still some challenging directions that can be further investigated. Some of them are inherited from the limitations of graph neural networks, such as training substantially deeper models with more than a hundred layers (He et al., 2016), handling dynamic structures (Zhang et al., 2018; Zhou et al., 2018), batch-wise model training, etc. Meanwhile, some issues are directly related to the proposed methods in high-order learning. For example, although hyperedges are equally weighted in our experiments, it is promising to exploit a proper weighting mechanism when extra knowledge of data distributions is accessible, and even to adopt a learnable module in a neural network and then optimize the weights with gradient descent. The current implementation of hypergraph attention cannot be executed when the vertex set and the hyperedge set are from two heterogeneous domains. One possible solution is to learn a joint embedding to project the vertex features and edge features (Simonovsky & Komodakis, 2017; Schlichtkrull et al., 2018) into a shared latent space, which requires further exploration.

Moreover, it is also interesting to plug hypergraph convolution and hypergraph attention into other variants of graph neural networks, e.g., MoNet (Monti et al., 2017), GraphSAGE (Hamilton et al., 2017) and GCPN (You et al., 2018a), and to apply them to other domain-specific applications, e.g., 3D shape analysis (Boscaini et al., 2016), visual question answering (Narasimhan et al., 2018), chemistry (Gilmer et al., 2017; Liu et al., 2018) and NP-hard problems (Li et al., 2018).
References

Agarwal, S., Branson, K., and Belongie, S. Higher order learning with graphs. In ICML, pp. 17–24, 2006.

Atwood, J. and Towsley, D. Diffusion-convolutional neural networks. In NeurIPS, pp. 1993–2001, 2016.

Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.

Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.

Belkin, M., Niyogi, P., and Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399–2434, 2006.

Berge, C. Graphs and hypergraphs. 1973.

Bojchevski, A., Shchur, O., Zügner, D., and Günnemann, S. NetGAN: Generating graphs via random walks. In ICML, 2018.

Bolla, M. Spectra, euclidean representations and clusterings of hypergraphs. Discrete Mathematics, 117(1-3):19–39, 1993.

Boscaini, D., Masci, J., Rodolà, E., and Bronstein, M. Learning shape correspondence with anisotropic convolutional neural networks. In NeurIPS, pp. 3189–3197, 2016.

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.

Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

Chen, J., Ma, T., and Xiao, C. FastGCN: fast learning with graph convolutional networks via importance sampling. In ICLR, 2018a.

Chen, J., Zhu, J., and Song, L. Stochastic training of graph convolutional networks with variance reduction. In ICML, pp. 941–949, 2018b.

Clevert, D.-A., Unterthiner, T., and Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). In ICLR, 2016.

Dai, H., Li, H., Tian, T., Huang, X., Wang, L., Zhu, J., and Song, L. Adversarial attack on graph structured data. In ICML, 2018.

Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In NeurIPS, pp. 3844–3852, 2016.

Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In NeurIPS, pp. 2224–2232, 2015.

Feng, Y., You, H., Zhang, Z., Ji, R., and Gao, Y. Hypergraph neural networks. In AAAI, 2019.

Fout, A., Byrd, J., Shariat, B., and Ben-Hur, A. Protein interface prediction using graph convolutional networks. In NeurIPS, pp. 6530–6539, 2017.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In ICML, 2017.

Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In NeurIPS, pp. 1024–1034, 2017.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, pp. 770–778, 2016.

Henaff, M., Bruna, J., and LeCun, Y. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.

Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

Huang, Y., Liu, Q., Zhang, S., and Metaxas, D. N. Image retrieval via probabilistic hypergraph ranking. In CVPR, pp. 3376–3383, 2010.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In ICLR, 2015.

Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In NeurIPS, pp. 1097–1105, 2012.

Lee, J. B., Rossi, R. A., Kim, S., Ahmed, N. K., and Koh, E. Attention models in graphs: A survey. arXiv preprint arXiv:1807.07984, 2018.

Li, P. and Milenkovic, O. Inhomogeneous hypergraph clustering with applications. In NeurIPS, pp. 2308–2318, 2017.

Li, Y. and Gupta, A. Beyond grids: Learning graph representations for visual recognition. In NeurIPS, pp. 9245–9255, 2018.

Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. In ICLR, 2016.

Li, Z., Chen, Q., and Koltun, V. Combinatorial optimization with graph convolutional networks and guided tree search. In NeurIPS, pp. 537–546, 2018.

Liu, Q., Allamanis, M., Brockschmidt, M., and Gaunt, A. Constrained graph variational autoencoders for molecule design. In NeurIPS, pp. 7806–7815, 2018.

Lu, Q. and Getoor, L. Link-based classification. In ICML, pp. 496–503, 2003.

Maas, A. L., Hannun, A. Y., and Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. In ICML, 2013.

Masci, J., Boscaini, D., Bronstein, M., and Vandergheynst, P. Geodesic convolutional neural networks on riemannian manifolds. In ICCVW, pp. 37–45, 2015.

Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. Geometric deep learning on graphs and manifolds using mixture model CNNs. In CVPR, 2017.

Narasimhan, M., Lazebnik, S., and Schwing, A. Out of the box: Reasoning with graph convolution nets for factual visual question answering. In NeurIPS, pp. 2659–2670, 2018.

Niepert, M., Ahmed, M., and Kutzkov, K. Learning convolutional neural networks for graphs. In ICML, pp. 2014–2023, 2016.

Perozzi, B., Al-Rfou, R., and Skiena, S. DeepWalk: Online learning of social representations. In KDD, pp. 701–710, 2014.

Pham, T., Tran, T., Phung, D. Q., and Venkatesh, S. Column networks for collective classification. In AAAI, pp. 2485–2491, 2017.

Rahimi, A., Cohn, T., and Baldwin, T. Semi-supervised user geolocation via graph convolutional networks. In ACL, 2018.

Rodriguez, J. A. On the laplacian spectrum and walk-regular hypergraphs. Linear and Multilinear Algebra, 51(3):285–297, 2003.

Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.

Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., and Welling, M. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607, 2018.

Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 29(3):93, 2008.

Simonovsky, M. and Komodakis, N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In CVPR, 2017.

Srivastava, R. K., Greff, K., and Schmidhuber, J. Highway networks. arXiv preprint arXiv:1505.00387, 2015.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In NeurIPS, pp. 5998–6008, 2017.

Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. In ICLR, 2018.

Wang, X., Ye, Y., and Gupta, A. Zero-shot recognition via semantic embeddings and knowledge graphs. In CVPR, pp. 6857–6866, 2018.

Weston, J., Ratle, F., Mobahi, H., and Collobert, R. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pp. 639–655. 2012.

Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. Aggregated residual transformations for deep neural networks. In CVPR, pp. 5987–5995, 2017.

Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. Representation learning on graphs with jumping knowledge networks. In ICML, 2018.

Yang, Z., Cohen, W. W., and Salakhutdinov, R. Revisiting semi-supervised learning with graph embeddings. In ICML, 2016.

Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. Graph convolutional neural networks for web-scale recommender systems. In KDD, 2018a.

Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In NeurIPS, pp. 4805–4815, 2018b.

You, J., Liu, B., Ying, R., Pande, V., and Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. In NeurIPS, 2018a.

You, J., Ying, R., Ren, X., Hamilton, W., and Leskovec, J. GraphRNN: Generating realistic graphs with deep auto-regressive models. In ICML, pp. 5694–5703, 2018b.

Zhang, C., Hu, S., Tang, Z. G., and Chan, T. Re-revisiting learning on hypergraphs: confidence interval and subgradient method. In ICML, pp. 4026–4034, 2017.

Zhang, M. and Chen, Y. Link prediction based on graph neural networks. In NeurIPS, pp. 5171–5181, 2018.

Zhang, Z., Cui, P., and Zhu, W. Deep learning on graphs: A survey. arXiv preprint arXiv:1812.04202, 2018.

Zhou, D., Huang, J., and Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. In NeurIPS, pp. 1601–1608, 2007.

Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., and Sun, M. Graph neural networks: A review of methods and applications. arXiv preprint arXiv:1812.08434, 2018.

Zhu, X., Ghahramani, Z., and Lafferty, J. D. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, pp. 912–919, 2003.