Figure 1: A syntax-tree and a DAG representation of the formula ∀A, B, C. r(A, B) ∧ p(A) ∨ ¬q(B, f(A)) ∨ q(C, f(A)). Only clause (sub)graphs are embedded in our system.

Beyond simple features (e.g., symbol statistics), formula characteristics like variable quantifications and shared subexpressions are quite important, and these are not captured by the syntax tree. Observe, for instance, that the syntax tree of the example contains several nodes for A, while, as mentioned above, all contexts A occurs in should influence its interpretation. A simple iterative approach based on occurrences of A also does not fully overcome the issue, since a variable A may be interpreted differently in the context of different quantifiers/formulas. For that reason, the most recent works focus on graph structures [Crouse et al., 2019a; Olšák et al., 2019; Paliwal et al., 2019] and apply graph neural networks (GNNs) for encoding. Figure 1 (right) depicts an exemplary graph representation with subexpression sharing. While such more sophisticated embeddings seem to be reasonable, they also incur an increase in runtime, which might in turn result in an increased number of proof failures (given a time constraint).
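To make subexpression sharing concrete, the following minimal sketch (our own illustration, not the representation used by any of the cited systems) builds a DAG by caching subterms, so that the two occurrences of f(A) in the example formula map to a single node:

```python
# Illustrative hash-consing sketch: building a DAG in which syntactically
# identical subterms (e.g., the two occurrences of f(A)) share one node.
# The nested-tuple term representation here is our own, for illustration only.

def build_dag(term, nodes, edges, cache):
    """term: (symbol, (subterms...)). Returns the node id for `term`,
    reusing an existing node when the same subterm was seen before."""
    if term in cache:
        return cache[term]                # shared subexpression: reuse node
    symbol, args = term
    node_id = len(nodes)
    nodes.append(symbol)
    cache[term] = node_id
    for arg in args:
        edges.append((node_id, build_dag(arg, nodes, edges, cache)))
    return node_id

# The subformula ¬q(B, f(A)) ∨ q(C, f(A)): both literals point to one f(A) node.
fA = ("f", (("A", ()),))
clause = ("or", (("not", (("q", (("B", ()), fA)),)), ("q", (("C", ()), fA))))
nodes, edges, cache = [], [], {}
build_dag(clause, nodes, edges, cache)
print(nodes)   # ['or', 'not', 'q', 'B', 'f', 'A', 'q', 'C'] (a single 'f' node)
```

Variable nodes are shared by the same cache, which is exactly what lets a graph encoder see all contexts in which a variable occurs.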
It is not clear how these different encodings compare directly and which kind of embedding is generally best, or whether there is such an embedding at all. The missing evidence is mainly due to the fact that evaluations of more advanced representations do not consider similarly involved embeddings in the context of the same environment, but rather simple or very similar baselines (which are easier to implement).

In this paper, we conduct an experimental study of encodings of FOL formulas for automated theorem proving. The goal is to shed light on the differences between them and to evaluate their usefulness in the context of one exemplary neural ATP; we use the TRAIL system [Crouse et al., 2019b]. Our results may influence and guide future development in general. In summary, our contributions are as follows:

• We implemented term walks [Jakubuv and Urban, 2017], the pattern-based encoding proposed in [Crouse et al., 2019b], and several variants of graph neural networks which, to the best of our knowledge, have not been considered in this context before, and integrated them into the TRAIL theorem prover.

• We evaluated the embeddings on the standard benchmarks Mizar [Grabowski et al., 2010] and TPTP [Sutcliffe, 2009].

• We show that there is no single best-performing encoding, but that there are considerable differences in terms of the problems solved.

2 Related Work

Most machine learning enhanced large-theory ATP systems extract symbol- and structure-based features from their input formulas (e.g., the depth or symbol count of a clause), in addition to hand-designed features known to be relevant to proof search (e.g., the age of a clause, referring to the time point when it was generated). While the symbol-based features for a formula are generally just the multiset of symbols found in that formula, structure-based features vary in their design and implementation. Earlier large-theory ATP systems like Flyspeck [Kaliszyk and Urban, 2012] and MaLARea [Urban et al., 2008] derived their features from the multisets of terms and sub-terms present in the formulas. These subsequently inspired the development of term walks (see Section 4) found in MaSh [Kühlwein et al., 2013] and Enigma [Jakubuv and Urban, 2017; Chvalovský et al., 2019]. In [Kaliszyk et al., 2015], pattern-based features were introduced from discrimination and substitution trees that could capture the notion of a term matching, unifying, or generalizing another term. In a similar vein, pattern-based features are also used in TRAIL, where the features extracted for a literal are argument-order-preserving templates generated from the complete set of paths between the root and each leaf of the literal.
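As an illustration of such path-based patterns, the sketch below (our own simplification, not TRAIL's actual implementation) enumerates the root-to-leaf paths of a literal while recording the argument position taken at each step, which is what makes the templates argument-order preserving:

```python
# Illustrative sketch only: extracting root-to-leaf path features from a
# literal, in the spirit of TRAIL's pattern-based templates. The nested-tuple
# term representation and naming are assumptions, not TRAIL's actual API.

def paths_to_leaves(term, prefix=()):
    """Enumerate all root-to-leaf paths; each step records the symbol and
    the argument position taken, which preserves argument order."""
    symbol, args = term
    if not args:                      # leaf: constant or variable
        yield prefix + (symbol,)
    else:
        for i, arg in enumerate(args):
            yield from paths_to_leaves(arg, prefix + ((symbol, i),))

# The literal q(B, f(A)) as (symbol, (subterms...)) tuples:
lit = ("q", (("B", ()), ("f", (("A", ()),))))

for p in paths_to_leaves(lit):
    print(p)
# (('q', 0), 'B')
# (('q', 1), ('f', 0), 'A')
```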
Recently, deep learning methods have demonstrated viability in the setting of real-time theorem proving, with the work of [Loos et al., 2017] being the first to show how a deep neural network could be incorporated into an ATP without incurring insurmountable computational overhead. Since then, there has been a flurry of activity [Chvalovský et al., 2019; Bansal et al., 2019; Olšák et al., 2019; Crouse et al., 2019b] surrounding the applicability of neural networks in the ATP domain. One of the main goals of these approaches is to also learn the formula embeddings in a fully automated way.

The latest developments for representing logical formulas as vectors have revolved around graph neural networks [Wang et al., 2017; Paliwal et al., 2019; Olšák et al., 2019]. These networks are appealing in the automated theorem proving domain because of the inherent graph structure of logical formulas and the potential for such neural representations to require less expert knowledge to achieve results than more traditional hand-crafted representations. Thus far, they have been applied in both offline tasks [Wang et al., 2017; Crouse et al., 2019a] and online theorem proving [Paliwal et al., 2019; Olšák et al., 2019]. However, [Paliwal et al., 2019] focus on higher-order logic formulas, where the corresponding graphs are very different from those for FOL; the latter were only considered very recently, in [Olšák et al., 2019], for the first time. [Wang et al., 2017] focus on a graph representation that only slightly extends parse trees by shared constant and variable names, while [Crouse et al., 2019a] extend them by special edges for quantification and subexpression sharing (see also Figure 1 (right)), and [Olšák et al., 2019] propose a special hypergraph representation. All these works use variants of message-passing neural networks [Gilmer et al., 2017] to learn and finally obtain a single numerical vector representation for each formula.
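As a rough sketch of this shared recipe (purely illustrative; the dimensions, update rule, and pooling below stand in for whatever each cited architecture actually uses), a message-passing network updates node states from their neighbors for a few rounds and then pools them into one formula vector:

```python
# Minimal message-passing sketch (not any cited architecture): node states
# are updated from neighbor messages for a few rounds, then pooled into one
# formula vector. Shapes and the update rule are illustrative placeholders.
import numpy as np

def message_passing(node_feats, edges, W_self, W_msg, rounds=3):
    """node_feats: (n, d) initial node embeddings (e.g., from symbol labels).
    edges: list of (src, dst) pairs of the formula graph."""
    h = node_feats
    for _ in range(rounds):
        msgs = np.zeros_like(h)
        for src, dst in edges:            # aggregate messages along edges
            msgs[dst] += h[src] @ W_msg
        h = np.tanh(h @ W_self + msgs)    # combine with each node's own state
    return h.mean(axis=0)                 # pool node states into one vector

rng = np.random.default_rng(0)
d = 8
# Tiny DAG for q(B, f(A)): nodes q, B, f, A with edges toward the root.
feats = rng.normal(size=(4, d))
edges = [(1, 0), (2, 0), (3, 2)]          # B->q, f->q, A->f
vec = message_passing(feats, edges, rng.normal(size=(d, d)) * 0.1,
                      rng.normal(size=(d, d)) * 0.1)
print(vec.shape)                           # (8,)
```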
As of yet, little is known about the trade-offs between each of the aforementioned vectorization strategies as they would be used in a neural-guided theorem prover. This is in part due to some features not lending themselves well to the neural-guided theorem proving setting (e.g., features defined for full terms would be far too sparse, which is why recent approaches apply feature hashing [Chvalovský et al., 2019]). The work of [Kaliszyk et al., 2015] provides an extensive comparative analysis of various non-neural vectorization approaches; however, their evaluation focuses on sparse learners, and it evaluates features in the setting of offline premise selection (measuring both theorem prover performance and binary classification accuracy) rather than as part of the internal guidance of an ATP system.
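Feature hashing, as referenced here, can be sketched in a few lines (our own illustration, not the cited implementation): each feature is hashed into a fixed number of buckets, so the vector length stays bounded regardless of how many distinct terms occur:

```python
# Illustrative feature-hashing sketch (not the cited implementation):
# arbitrarily many distinct features are folded into a fixed-size vector.
import hashlib

def hashed_vector(features, dim=1024):
    vec = [0] * dim
    for f in features:
        # stable hash of the feature string, reduced to a bucket index
        idx = int(hashlib.md5(f.encode()).hexdigest(), 16) % dim
        vec[idx] += 1
    return vec

v = hashed_vector(["(⊖,q,*)", "(⊖,q,f)", "(q,f,*)"])
print(sum(v))  # 3 features counted, in a vector of fixed length 1024
```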
We evaluate term walks, which have been used in [Jakubuv and Urban, 2017; Goertzel et al., 2018; Chvalovský et al., 2019], and TRAIL patterns, as representatives of patterns that capture entire literals (vs. parts of fixed depth or length).

4.1 Term Walks

As outlined in the introduction, literals in the clauses are considered as trees in which all variables are replaced by one special symbol and all Skolem terms by another. Note that the latter helps reflect structural similarities between literals that are indicative of unifiability. Additionally, a root node is added and labelled by either ⊖ or ⊕, depending on whether the literal appears negated or not. Every directed node path of length 3 in these trees (oriented from the root) represents a feature. For a negative literal such as ¬q(B, f(A)), we would thus count the term walks (⊖, q, ∗), (⊖, q, f), and (q, f, ∗). The multiset of features for a clause consists of all features of its literals; the final embedding vector for the clause has one position per feature in this set and records each feature's multiplicity at the corresponding position.
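The following minimal sketch reproduces this feature extraction for the example literal; the tuple-based term encoding and helper names are our own illustrative choices:

```python
# Minimal sketch of length-3 term walks (our own encoding, for illustration):
# variables are replaced by '*', a root node marks polarity, and every
# directed path of 3 node labels (oriented from the root) is a feature.
# (Skolem-term normalization from the text is omitted for brevity.)
from collections import Counter

def normalize(term):
    """Replace variables (capitalized leaf symbols, by convention) by '*'."""
    symbol, args = term
    if not args and symbol[0].isupper():
        return ("*", ())
    return (symbol, tuple(normalize(a) for a in args))

def walks(term, counter):
    """Count all directed 3-label paths in the tree rooted at `term`."""
    s1, args1 = term
    for s2, args2 in args1:
        for s3, _ in args2:
            counter[(s1, s2, s3)] += 1
        walks((s2, args2), counter)

def literal_features(literal, negated):
    root = ("⊖" if negated else "⊕", (normalize(literal),))
    c = Counter()
    walks(root, c)
    return c

# ¬q(B, f(A)) yields the walks (⊖, q, *), (⊖, q, f), and (q, f, *):
lit = ("q", (("B", ()), ("f", (("A", ()),))))
print(literal_features(lit, negated=True))
# each of ('⊖','q','*'), ('⊖','q','f'), ('q','f','*') is counted once
```

Summing these counters over all literals of a clause gives exactly the multiplicity vector described above.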