- Beagle 0.9.47
- CSE 1.6
- CSE_E 1.5
- cvc5 1.0
- cvc5 1.0.5
- Drodi 3.5.1
- Duper 1.0
- E 3.0
- E 3.1
- GKC 0.8
- iProver 3.8
- Lash 1.13
- LEO-II 1.7.0
- Leo-III 1.7.8
- Princess 230619
- Prover9 1109a
- Satallax 3.4
- Toma 0.4
- Twee 2.4.1
- Twee 2.4.2
- Vampire 4.7
- Vampire 4.8
- Zipperposition 2.1.999
- Zipperposition 2.1.9999

Data61, Australia

Beagle first converts the input formulas into clause normal form. Pure arithmetic (sub-)formulas are treated by eager application of quantifier elimination. The core reasoning component implements the Hierarchic Superposition Calculus with Weak Abstraction (HSPWA) [BW13]. Extensions are a splitting rule for clauses that can be divided into variable-disjoint parts, a partial instantiation rule for variables with finite domains, and two kinds of background-sorted variables that trade off completeness against search space.
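
The splitting rule for variable-disjoint parts can be illustrated with a minimal Python sketch (not Beagle's actual code): literals that share a variable are grouped with union-find, and each resulting group becomes an independent part.

```python
# Sketch (not Beagle's code): split a clause into variable-disjoint parts.
# Literals sharing a variable must stay in the same part (union-find).

def split_clause(literals, vars_of):
    """Partition literals so that parts share no variables.

    literals: list of hashable literal labels
    vars_of:  dict mapping literal -> set of variable names
    """
    parent = {lit: lit for lit in literals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Union any two literals that share a variable.
    owner = {}  # variable -> first literal seen with it
    for lit in literals:
        for v in vars_of[lit]:
            if v in owner:
                union(lit, owner[v])
            else:
                owner[v] = lit

    parts = {}
    for lit in literals:
        parts.setdefault(find(lit), []).append(lit)
    return list(parts.values())

# Example: P(x) | Q(x) | R(y) splits into {P(x), Q(x)} and {R(y)}.
clause = ["P(x)", "Q(x)", "R(y)"]
vars_of = {"P(x)": {"x"}, "Q(x)": {"x"}, "R(y)": {"y"}}
print(split_clause(clause, vars_of))
```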

The HSPWA calculus generalizes the superposition calculus by integrating theory reasoning in a black-box style. For the theories mentioned above, Beagle combines quantifier elimination procedures and other solvers to dispatch proof obligations over these theories. The default solvers are an improved version of Cooper's algorithm for linear integer arithmetic, and the CVC4 SMT solver for linear real/rational arithmetic. Non-linear integer arithmetic is treated by partial instantiation and additional lemmas.

Beagle uses strategy scheduling by trying (at most) three flag settings sequentially.

https://bitbucket.org/peba123/beagle

JiangXi University of Science and Technology, China

- A multi-clause dynamic deduction algorithm that makes full use of clauses, driven by a new type of clause adequacy evaluation method.
- A multi-clause synergized deduction algorithm that makes full use of synergized clauses, allowing the substitution of candidate literals involved in deduction to proceed from simple to complex.

- Clause Activity and Complexity Strategy. A measure and calculation method of clause activity and complexity is added by analyzing the properties of the variable terms and the structure of function terms of clauses, which can evaluate clauses with different term structures.
- Literal Matching Degree Strategy. A literal measurement and calculation method based on term matching degree is added, which can evaluate the complexity of candidate literals with different term structures participating in multi-clause deduction.

**Acknowledgement:**
Development of CSE 1.6 has been partially supported by the General Research Project of Jiangxi
Education Department (Grant No. GJJ200818).

Southwest Jiaotong University, China

- Lemma filtering mainly based on deduction weight of binary clauses.
- Fine-grained dynamic time allocation scheme in different run stages.

**Acknowledgement:**
Development of CSE_E 1.5 has been partially supported by the National Natural Science Foundation
of China (NSFC) (Grant No. 61976130).
Thanks to Stephan Schulz for his kind permission to use his E prover, which makes CSE_E possible.

University of Iowa, USA

cvc5 has native support for problems in higher-order logic, as described in [BR+19]. It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting. The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and E-matching. For the former, the theory solver for UF in cvc5 now handles equalities between functions using an extensionality inference. Partial applications of functions are handled using a (lazy) applicative encoding where some function applications are equated to the applicative encoding. For the latter, several of the data structures for E-matching have been modified to incorporate matching in the presence of equalities between functions, function variables, and partial function applications.
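
Lambda lifting as described can be sketched on a toy term language (an illustration, not cvc5's implementation): each lambda becomes a fresh top-level definition applied to the free variables of its body. In this toy encoding, constants such as `f` are represented as free `var` nodes.

```python
# Sketch (not cvc5's code): eager lambda lifting on a toy term language.
# Terms: ('var', name) | ('app', f, arg) | ('lam', name, body)

defs = []          # collected top-level definitions: (fresh_name, params, body)
fresh = iter(f"_lift{i}" for i in range(10**6))  # fresh symbol supply

def free_vars(t, bound=frozenset()):
    tag = t[0]
    if tag == 'var':
        return set() if t[1] in bound else {t[1]}
    if tag == 'app':
        return free_vars(t[1], bound) | free_vars(t[2], bound)
    if tag == 'lam':
        return free_vars(t[2], bound | {t[1]})
    return set()

def lift(t):
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'app':
        return ('app', lift(t[1]), lift(t[2]))
    if tag == 'lam':
        body = lift(t[2])               # lift inner lambdas first
        fvs = sorted(free_vars(t))
        name = next(fresh)
        defs.append((name, fvs + [t[1]], body))
        # Replace the lambda by the fresh symbol applied to its free variables.
        result = ('var', name)
        for v in fvs:
            result = ('app', result, ('var', v))
        return result
    return t

# Example: \x. f y x  lifts to  _lift0 applied to its free symbols f and y,
# with _lift0(f, y, x) = f y x defined at top level.
term = ('lam', 'x', ('app', ('app', ('var', 'f'), ('var', 'y')), ('var', 'x')))
print(lift(term), defs)
```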

https://github.com/cvc5/cvc5

University of Iowa, USA

Like other SMT solvers, cvc5 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in cvc5 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, cvc5 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".
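
The two-tiered loop described above can be sketched as follows (a hedged illustration, not cvc5's code; `ground_solver` and `instantiate` are placeholder callables standing in for cvc5's internal components):

```python
# Sketch (not cvc5's code) of the two-tiered quantifier handling loop.

def solve(ground_clauses, quantified, ground_solver, instantiate, max_rounds=100):
    """Alternate ground solving and quantifier instantiation."""
    clauses = list(ground_clauses)
    for _ in range(max_rounds):
        model = ground_solver(clauses)          # None signals ground-level UNSAT
        if model is None:
            return "unsatisfiable"
        # Ask the instantiation module for instances in conflict with the model.
        new_instances = instantiate(quantified, model)
        if not new_instances:
            return "satisfiable"                # model satisfies all quantifiers
        clauses.extend(new_instances)
    return "unknown"

# Toy run: one quantified fact is instantiated once, then the model is accepted.
def toy_ground_solver(clauses):
    return set(clauses)                         # everything is satisfiable here

def toy_instantiate(quantified, model):
    return ["P(a)"] if "P(a)" not in model else []

print(solve([], ["forall x. P(x)"], toy_ground_solver, toy_instantiate))
# prints "satisfiable"
```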

cvc5 has native support for problems in higher-order logic, as described in [BR+19]. It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting. The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and E-matching. For the former, the theory solver for UF in cvc5 now handles equalities between functions using an extensionality inference. Partial applications of functions are handled using a (lazy) applicative encoding where some function applications are equated to the applicative encoding. For the latter, several of the data structures for E-matching have been modified to incorporate matching in the presence of equalities between functions, function variables, and partial function applications.

https://github.com/cvc5/cvc5

Amateur Programmer, Spain

The following features have been added since the last CASC competition:

- Improved literal selection including lookahead as in [HR+16].
- SinE distance for clauses and symbols as in [Sud19].
- Layered clause selection as in [GS20].
- Stochastic strategy inspired by [Sud22].

- Otter, Discount and Limited Resource Strategy [RV03] saturation algorithms.
- A basic implementation of AVATAR architecture [Vor14].
- Several literal and term reduction orderings.
- Several literal selection options [HR+16].
- Several layered clause selection heuristics with adjustable selection ratios [GS20].
- Classical clause relevancy pruning.
- Some strategies are run a second time after applying a randomization to the problem [Sud22].
- Drodi can generate a learning file from successful proofs and use it to guide the clause selection strategy. This is based on an enhanced ENIGMA method. However, unlike ENIGMA, the learning file is completely general and can be used with any kind of problem; this generality allows the same learning file to be used in both the FOF and UEQ CASC competition divisions. The learning file is generated over a set of TPTP problems before the CASC competition using built-in Drodi functions that include an L2 Support Vector Machine. Drodi's integrated learning functions are a generalization of ENIGMA [JU17, JU18]. Literal polarity, equality, skolem, and variable occurrences are stored in clause feature vectors. Unlike ENIGMA, instead of storing the specific functions and predicates themselves, only the SinE distance and arity of functions and non-equality predicates are stored in clause feature vectors, with different features assigned to predicates and functions.
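
The kind of symbol-agnostic clause feature vector described above can be sketched as follows (an illustrative Python reconstruction, not Drodi's code; the bucket sizes and the `sine_distance` map are assumptions of this sketch):

```python
# Sketch (not Drodi's code): a clause feature vector in the spirit described
# above.  Symbol names are abstracted away; only polarity, equality, skolem
# occurrences, variable counts, and bucketed (SinE distance, arity) counts
# enter the vector.

def clause_features(literals, sine_distance, max_arity=4, max_dist=4):
    """literals: list of dicts with keys 'positive' (bool),
    'is_equality' (bool), 'symbol' (str or None for equality),
    'arity' (int), 'is_skolem' (bool), 'n_vars' (int)."""
    vec = {'pos': 0, 'neg': 0, 'eq': 0, 'skolem': 0, 'vars': 0}
    # Bucketed (SinE distance, arity) counts replace concrete symbol names.
    for d in range(max_dist + 1):
        for a in range(max_arity + 1):
            vec[f'd{d}a{a}'] = 0
    for lit in literals:
        vec['pos' if lit['positive'] else 'neg'] += 1
        vec['eq'] += lit['is_equality']
        vec['skolem'] += lit['is_skolem']
        vec['vars'] += lit['n_vars']
        if lit['symbol'] is not None:
            d = min(sine_distance.get(lit['symbol'], max_dist), max_dist)
            a = min(lit['arity'], max_arity)
            vec[f'd{d}a{a}'] += 1
    return vec

# A two-literal clause: p(X) | a != b
lits = [
    {'positive': True,  'is_equality': False, 'symbol': 'p',
     'arity': 1, 'is_skolem': False, 'n_vars': 1},
    {'positive': False, 'is_equality': True,  'symbol': None,
     'arity': 2, 'is_skolem': False, 'n_vars': 2},
]
v = clause_features(lits, {'p': 0})
print(v['pos'], v['neg'], v['eq'], v['vars'], v['d0a1'])
```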

Carnegie Mellon University, USA

Duper's core architecture is based on saturation with a set of inference and simplification rules that operate on dependently-typed terms but primarily perform first-order and some higher-order reasoning. The calculus is closely based on Zipperposition's [BB+21] though it has been adapted to be sound in Lean's type theory. The unification procedure extends [VBN20] to dependent type theory, and remains complete for the higher order fragment of Lean. Duper uses streams to represent conclusions of inference rules and interleaves between unification and inference, as in [BB+21].
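
The stream interleaving can be sketched with Python generators (an illustration, not Duper's Lean code): each inference rule yields a lazy stream of conclusions, and a round-robin loop consumes the live streams fairly.

```python
# Sketch (not Duper's code): conclusions of inference rules as lazy streams,
# consumed fairly by round-robin interleaving (dovetailing).

from collections import deque

def interleave(streams):
    """Yield one element from each live stream in turn."""
    queue = deque(iter(s) for s in streams)
    while queue:
        s = queue.popleft()
        try:
            yield next(s)
        except StopIteration:
            continue                    # stream exhausted; drop it
        queue.append(s)                 # still live; back of the queue

# Two "inference rule" streams producing candidate conclusions.
rule_a = (f"a{i}" for i in range(3))
rule_b = (f"b{i}" for i in range(2))
print(list(interleave([rule_a, rule_b])))
# ['a0', 'b0', 'a1', 'b1', 'a2']
```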

https://github.com/leanprover-community/duper

DHBW Stuttgart, Germany

For CASC-J11, E implements a multi-core strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, possibly presence of certain axiom patterns...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.
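
The greedy schedule construction can be sketched as follows (an illustration, not E's code; `solved` is an assumed map from strategy to the set of problems it solves within one slice, precomputed from experiments):

```python
# Sketch (not E's code): greedy schedule construction from evaluation data.

def greedy_schedule(solved, n_slices):
    schedule, covered = [], set()
    for _ in range(n_slices):
        # Pick the strategy solving the most problems not yet covered.
        best = max(solved, key=lambda s: len(solved[s] - covered))
        if not solved[best] - covered:
            break                       # no strategy adds anything new
        schedule.append(best)
        covered |= solved[best]
    return schedule

solved = {
    's1': {1, 2, 3, 4},
    's2': {3, 4, 5},
    's3': {6},
}
print(greedy_schedule(solved, 3))      # ['s1', 's2', 's3']
```

Note that s2 is picked second on the strength of problem 5 alone, since problems 3 and 4 are already covered by s1.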

About 140 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.3.0. We have also explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.

https://www.eprover.org

DHBW Stuttgart, Germany

For CASC-29, E implements a two-stage multi-core strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, possibly presence of certain axiom patterns...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy. The strategies are then scheduled onto the available cores and run in parallel.

About 140 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.3.0. We have also explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.

https://www.eprover.org

Tallinn University of Technology, Estonia

GKC is used as a foundation (GK Core) for building a common-sense reasoner GK. In particular, GK can handle inconsistencies and perform probabilistic and nonmonotonic reasoning [Tam21, Tam22].

The WASM version of the previous GKC 0.6 is used as the prover engine in the educational `http://logictools.org` system. It can read and output proofs in TPTP, simplified TPTP, and JSON formats, the JSON format being compatible with JSON-LD [TS21].

GKC only looks for proofs and does not try to show non-provability. These standard inference rules have been implemented in GKC:

- Binary resolution with optionally the set of support strategy, negative or positive ordered resolution or unit restriction.
- Hyperresolution.
- Factorization.
- Paramodulation and demodulation with the Knuth-Bendix ordering.

We perform the selection of a given clause using several queues, in order to spread the selection relatively uniformly over these categories of derived clauses and their descendants: axioms, external axioms, assumptions, and goals. The queues are organized in two layers. The first layer uses the common ratio-based algorithm of alternating between selecting *n* clauses from a weight-ordered queue and one clause from the FIFO queue in derivation order. The second layer uses four separate queues based on the derivation history of a clause. Each queue in the second layer contains the two sub-queues of the first layer.
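
The first-layer ratio-based selection can be sketched in Python (an illustration, not GKC's code):

```python
# Sketch (not GKC's code): the first-layer pick-given loop, alternating
# `ratio` picks from a weight-ordered queue with one pick from a FIFO queue.

import heapq
from collections import deque

def selection_order(clauses, ratio=4):
    """clauses: list of (name, weight) pairs in derivation (age) order.
    Returns the order in which clauses are selected."""
    heap = [(w, age, name) for age, (name, w) in enumerate(clauses)]
    heapq.heapify(heap)
    fifo = deque(name for name, _ in clauses)
    done, order, picks = set(), [], 0
    while len(done) < len(clauses):
        if picks % (ratio + 1) == ratio:        # every (ratio+1)-th pick: FIFO
            while fifo[0] in done:              # skip already-selected clauses
                fifo.popleft()
            name = fifo.popleft()
        else:                                   # otherwise: lightest clause
            while heap[0][2] in done:
                heapq.heappop(heap)
            name = heapq.heappop(heap)[2]
        done.add(name)
        order.append(name)
        picks += 1
    return order

# With ratio=2: two lightest clauses, then the oldest unpicked one, repeating.
clauses = [("c0", 10), ("c1", 1), ("c2", 5), ("c3", 2), ("c4", 7)]
print(selection_order(clauses, ratio=2))
# ['c1', 'c3', 'c0', 'c2', 'c4']
```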

https://github.com/tammet/gkc/

University of Manchester, United Kingdom

University of Innsbruck, Austria

University of Greifswald, Germany

Unfortunately, the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs, instead of the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

http://www.leoprover.org

University of Greifswald, Germany

Leo-III cooperates with external first-order ATPs that are called asynchronously during proof search; a focus is on cooperation with systems that support typed first-order (TFF) input. For this year's CASC, CVC4 [BC+11] and E [Sch02, Sch13] are used as external systems. However, cooperation is in general not limited to first-order systems. Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or counter model generators) may be included using simple command-line arguments. If the saturation procedure loop (or one of the external provers) finds a proof, the system stops, generates the proof certificate and returns the result.

https://tptp.org/NonClassicalLogic/

The term data structure of Leo-III uses a polymorphically typed spine term representation augmented with explicit substitutions and De Bruijn-indices. Furthermore, terms are perfectly shared during proof search, permitting constant-time equality checks between alpha-equivalent terms.
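
Perfect sharing of this kind can be sketched with hash-consing (an illustration, not Leo-III's Scala code): interning De Bruijn term nodes makes alpha-equivalent terms the *same* object, so equality is a constant-time identity check.

```python
# Sketch (not Leo-III's code): perfect sharing via hash-consing of
# De Bruijn terms.

_table = {}

def mk(*node):
    """Intern a term node: ('bound', i) | ('const', name) |
    ('app', f, a) | ('lam', body)."""
    if node not in _table:
        _table[node] = node
    return _table[node]

# \x. f x  and  \y. f y  have the same De Bruijn form: lam (app f #0),
# so both builds return the identical shared object.
t1 = mk('lam', mk('app', mk('const', 'f'), mk('bound', 0)))
t2 = mk('lam', mk('app', mk('const', 'f'), mk('bound', 0)))
print(t1 is t2)        # True: equality is a pointer comparison
```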

Leo-III's saturation procedure may at any point invoke external reasoning tools. To that end, Leo-III includes an encoding module which translates (polymorphic) higher-order clauses to polymorphic or monomorphic typed first-order clauses, whichever is supported by the external system. While LEO-II relied on cooperation with untyped first-order provers, Leo-III exploits the native type support of first-order provers (TFF logic), which removes clutter during translation and, in turn, makes external cooperation more effective.

Leo-III is available on GitHub:

https://github.com/leoprover/Leo-III

University of Regensburg, Germany

The list of contributors is available on
`https://github.com/uuverifiers/princess/blob/master/AUTHORS`.

The following options are used in the competition:

-portfolio=casc +threads +printProof -inputFormat=tptp

The version 230619 submitted to CASC-29 is the standard version of Princess. This is in contrast to CASC-26 (2017), the last time Princess participated, when a version tailor-made for CASC was used.

Sources and binaries are available from

https://github.com/uuverifiers/princess

University of New Mexico, USA

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.
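
The splitting step can be sketched on a toy formula language (an illustration, not Prover9's code): since a universal quantifier distributes over conjunction, a miniscoped conjunctive goal splits into one subproblem per conjunct.

```python
# Sketch (not Prover9's code): splitting a miniscoped goal into independent
# subproblems, e.g.  forall x (P(x) & Q(x))  ~>  forall x P(x), forall x Q(x).

def miniscope_split(goal):
    """goal: ('forall', var, body) | ('and', left, right) | atom string."""
    if isinstance(goal, str):
        return [goal]
    if goal[0] == 'and':
        return miniscope_split(goal[1]) + miniscope_split(goal[2])
    if goal[0] == 'forall':
        var, body = goal[1], goal[2]
        # forall distributes over conjunction, so split underneath it.
        return [('forall', var, sub) for sub in miniscope_split(body)]
    return [goal]

goal = ('forall', 'x', ('and', 'P(x)', 'Q(x)'))
print(miniscope_split(goal))
# [('forall', 'x', 'P(x)'), ('forall', 'x', 'Q(x)')]
```

Each returned subproblem would then be clausified and given to the search procedure separately.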

http://www.cs.unm.edu/~mccune/prover9/

Universität Innsbruck, Austria

Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates lambda-free higher-order logic (lfHOL) formulae in addition to the propositional clauses [VB+19]. If this option is used, then Satallax periodically calls the theorem prover E [Sch13] to test for lfHOL unsatisfiability. If the set of lfHOL formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.

http://cl-informatik.uibk.ac.at/~mfaerber/satallax.html

Japan Advanced Institute of Science and Technology, Japan

https://www.jaist.ac.jp/project/maxcomp/

Chalmers University of Technology, Sweden

Twee features ground joinability testing [MN90] and a connectedness test [BD88], which together eliminate many redundant inferences in the presence of unorientable equations. The ground joinability test performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering.

- Select and normalise the lowest-scored critical pair, and if it is not redundant, add it as a rewrite rule to the active set.
- Normalise the active rules with respect to each other.
- Normalise the goal with respect to the active rules.

For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using different parameters (e.g., with the goal-directed transformation on or off).

Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel. The translation from Horn clauses to equations is not yet certified.

Twee can be downloaded as open source from:

http://nick8325.github.io/twee

Chalmers University of Technology, Sweden

Twee features ground joinability testing [MN90] and a connectedness test [BD88], which together eliminate many redundant inferences in the presence of unorientable equations. The ground joinability test performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering.

- Select and normalise the lowest-scored critical pair, and if it is not redundant, add it as a rewrite rule to the active set.
- Normalise the active rules with respect to each other.
- Normalise the goal with respect to the active rules.

Each critical pair is scored using a weighted sum of the weight of both of its terms. Terms are treated as DAGs when computing weights, i.e., duplicate subterms are counted only once per term.
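
The DAG-based weight can be sketched as follows (an illustration, not Twee's Haskell code; the 2:1 weighting of the two terms is an assumption for the example):

```python
# Sketch (not Twee's code): scoring a critical pair by a weighted sum of the
# DAG weights of both terms; repeated subterms count once per term.

def dag_weight(term, seen=None):
    """term: a nested tuple (f, arg1, ..., argn) or a string (variable or
    constant); each *distinct* subterm contributes 1."""
    if seen is None:
        seen = set()
    if term in seen:
        return 0                        # duplicate subterm: already counted
    seen.add(term)
    if isinstance(term, str):
        return 1
    return 1 + sum(dag_weight(arg, seen) for arg in term[1:])

def score(lhs, rhs, w_lhs=2, w_rhs=1):
    return w_lhs * dag_weight(lhs) + w_rhs * dag_weight(rhs)

# f(g(x), g(x)): the duplicate g(x) counts once, so the DAG weight is 3,
# while the plain tree weight would be 5.
t = ('f', ('g', 'x'), ('g', 'x'))
print(dag_weight(t), score(t, 'x'))     # 3 7
```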

For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using different parameters (e.g., with the goal-directed transformation on or off).

The passive set is represented compactly (12 bytes per critical pair) by storing only the information needed to reconstruct the critical pair, not the critical pair itself. Because of this, Twee can run for an hour or more without exhausting memory.
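
The compact passive representation can be sketched with fixed-width packing (an illustration, not Twee's code; the exact field layout below is an assumption): store only the identifiers needed to re-derive the critical pair, not the pair itself.

```python
# Sketch (not Twee's code): packing a passive critical pair into 12 bytes.
# Assumed layout: two 4-byte parent rule ids, a 2-byte overlap position,
# and a 2-byte score -- enough to reconstruct the critical pair on demand.

import struct

PACK = struct.Struct('<IIHH')           # 4 + 4 + 2 + 2 = 12 bytes

def pack_cp(rule1, rule2, pos, score):
    return PACK.pack(rule1, rule2, pos, score)

def unpack_cp(blob):
    return PACK.unpack(blob)

blob = pack_cp(17, 42, 3, 250)
print(PACK.size, unpack_cp(blob))       # 12 (17, 42, 3, 250)
```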

Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel.

Twee can be downloaded as open source from:

https://nick8325.github.io/twee

University of Manchester, United Kingdom

There are only small changes between Vampire 4.7 and Vampire 4.6 in the tracks relevant to CASC. As TFA did not run in 2021, the updates related to the paper "Making Theory Reasoning Simpler" [RSV21] that were present last year should have an impact this year. This work introduces a new set of rules for the evaluation and simplification of theory literals. We have also added some optional preprocessing steps inspired by Twee (see "Twee: An Equational Theorem Prover" [Sma21]) but these have not been fully incorporated into our strategy portfolio so are unlikely to make a significant impact.

- Choices of saturation algorithm:
- Limited Resource Strategy [RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference

- Splitting via AVATAR [Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of comparing literals [HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection. This has been extended with a layered clause selection approach [GS20].
- Set-of-support strategy with extensions for theory reasoning.
- For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions [RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches [RB+16].
- Specialised theory instantiation and unification [RSV18].
- Extensionality resolution with detection of extensionality axioms

- For higher-order problems:
- Translation to polymorphic first-order logic using applicative form and combinators
- A superposition calculus [BR20] utilising a KBO-like ordering [BR20] for orienting combinator equations. The calculus introduces an inference, narrow, for rewriting with combinator equations.
- Proof search heuristics targeting the growth of clauses resulting from narrowing.
- An extension of unification with abstraction to deal with functional and boolean extensionality.
- Various inferences to deal with booleans

https://vprover.github.io/

TU Wien, Austria

There have been a number of changes and improvements since Vampire 4.7, although it is still the same beast. Most significant from a competition point of view are long-awaited refreshed strategy schedules. As a result, several features present in previous competitions will now come into full force, including new rules for the evaluation and simplification of theory literals. A large number of completely new features and improvements also landed this year: highlights include a significant refactoring of the substitution tree implementation, the arrival of encompassment demodulation to Vampire, and support for parametric datatypes.

Vampire's higher-order support has also been re-implemented from the ground up. The new implementation is still at an early stage and its theoretical underpinnings are being developed. There is currently no documentation of either.

- Choices of saturation algorithm:
- Limited Resource Strategy [RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference

- Splitting via AVATAR [Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of comparing literals [HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection. This has been extended with a layered clause selection approach [GS20].
- Set-of-support strategy with extensions for theory reasoning.
- For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions [RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches [RB+16].
- Specialised theory instantiation and unification [RSV18].
- Extensionality resolution with detection of extensionality axioms

Vrije Universiteit Amsterdam, The Netherlands

Zipperposition's code can be found at

https://github.com/sneeuwballen/zipperposition

and is entirely free software (BSD-licensed).

Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Ludwig-Maximilians-Universität München, Germany

Zipperposition's code can be found at

https://github.com/sneeuwballen/zipperposition

and is entirely free software (BSD-licensed).

Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].