- Darwin CASC-J2
- DCTP 1.3-EPR (CASC-19 EPR Division winner)
- DCTP 1.31 and 10.21p
- Dilemma 0.1
- E and EP 0.82
- E-SETHEO csp04
- Gandalf c-2.6-SAT (CASC-19 SAT Division, Assurance class, winner)
- Mace2 2.2
- Mace4 2004-D
- Octopus 2004
- Otter 3.3
- Paradox 1.0 (CASC-19 SAT Division, Model class, winner)
- Paradox 1.1-casc
- SOS 1.0
- THEO J2004
- Vampire 5.0 (CASC-19 FOF Division winner)
- Vampire 6.0 (CASC-19 MIX Division winner)
- Vampire 7.0
- Waldmeister 702 (CASC-19 UEQ Division winner)
- Waldmeister 704

Darwin [BFT04] is an automated theorem prover for first order clausal logic. It is the first implementation of the Model Evolution Calculus [BT03], which lifts the propositional DPLL procedure to first-order logic. One of the main motivations for this approach was the possibility of migrating to the first-order level some of the very effective search techniques developed by the SAT community for the DPLL procedure.
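The propositional procedure being lifted can be sketched compactly. The following is a minimal illustrative DPLL with unit propagation and splitting, not Darwin's code (the clause encoding is hypothetical):

```python
def dpll(clauses, assignment=None):
    """Minimal propositional DPLL (illustrative sketch, not Darwin's code).
    Clauses are frozensets of non-zero integer literals; -n negates n.
    Returns a satisfying set of literals, or None if unsatisfiable."""
    if assignment is None:
        assignment = set()
    changed = True
    while changed:                       # unit propagation to a fixpoint
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue                 # clause already satisfied
            open_lits = [l for l in clause if -l not in assignment]
            if not open_lits:
                return None              # clause falsified: conflict
            if len(open_lits) == 1:
                assignment.add(open_lits[0])
                changed = True
    for clause in clauses:               # split on an unassigned literal
        for l in clause:
            if l not in assignment and -l not in assignment:
                for choice in (l, -l):
                    result = dpll(clauses, assignment | {choice})
                    if result is not None:
                        return result
                return None
    return assignment                    # all clauses satisfied

# (p or q) and (not p) and (not q or r)
model = dpll([frozenset({1, 2}), frozenset({-1}), frozenset({-2, 3})])
```

The Model Evolution Calculus replaces the propositional assignment above by a context of first-order literals, as described below.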

The current version of Darwin implements first-order versions of unit propagation inference rules analogously to a restricted form of unit resolution and subsumption by unit clauses. To retain completeness, it includes a first-order version of the (binary) propositional splitting inference rule.

Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved towards a model or until a refutation is found.

The central data structure is the *context*. A context
represents an interpretation as a set of first-order literals.
The context is grown by using instances of literals from the input clauses.
The implementation of Darwin is intended to support basic operations
on contexts in an efficient way. This involves the handling of large
sets of candidate literals for modifying the current context. The
candidate literals are computed via simultaneous unification between
given clauses and context literals. This process is sped up by
storing partial unifiers for each given clause and merging them for
different combinations of context literals, instead of redoing the
whole unifier computations. For efficient filtering of unneeded
candidates against context literals, discrimination tree or substitution
tree indexing is employed.
The splitting rule generates choice points in the derivation which are
backtracked using a form of backjumping similar to the one used in
DPLL-based SAT solvers.
A generalized version of backjumping, called *Dynamic Backtracking*, is
implemented by representing the derivation tree as a graph of states and
their dependencies.

Darwin is implemented in OCaml and has been tested under Linux. It is available from:

http://www.mpi-sb.mpg.de/~baumgart/DARWIN/

Darwin traverses the search space by iterative deepening over the term depth of candidate literals. Darwin employs a uniform search strategy for all problem classes.
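The traversal can be sketched as follows (a hypothetical reconstruction; `search` stands for a depth-limited proof attempt that reports whether the depth limit was hit):

```python
def iterative_deepening(search, max_depth=100):
    """Iterative deepening over a depth bound, as Darwin iterates over the
    term depth of candidate literals (illustrative sketch only).
    `search(limit)` returns (result_or_None, limit_was_exceeded)."""
    for limit in range(1, max_depth + 1):
        result, exceeded = search(limit)
        if result is not None:
            return result, limit         # proof found at this depth bound
        if not exceeded:
            return None, limit           # space exhausted below the limit
    return None, max_depth               # gave up at the maximum depth

# Toy depth-limited search that only succeeds once the bound reaches 4.
toy = lambda limit: ("refutation", False) if limit >= 4 else (None, True)
```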

Darwin is a first prototype implementation for the Model Evolution calculus. We expect its performance to be strong in the EPR division; we anticipate performance below average in the MIX division, and weak performance in the SAT division.

Max-Planck-Institut für Informatik, Germany

DCTP 1.3 [Ste02a] is an automated theorem
prover for first order clause logic.
It is an implementation of the disconnection calculus described in
[Bil96,LS01,Ste02b].
The disconnection calculus is a proof confluent and inherently
cut-free tableau calculus with a weak connectedness condition.
The inherently depth-first proof search is guided by a literal selection
based on literal instantiatedness or literal complexity and a heavily
parameterised link selection.
The pruning mechanisms mostly rely on different forms of *variant
deletion* and *unit based strategies*.
Additionally the calculus has been augmented by full tableau pruning.

The new DCTP 1.3 has been enhanced with respect to clause preprocessing, selection functions and closure heuristics. Most prominent among the improvements is the introduction of a unification index for finding connections, which also replaces the connection graph hitherto used.

DCTP 1.3 has been implemented as a monolithic system in the Bigloo dialect of the Scheme language. The most important data structures are perfect discrimination trees, which are used in many variations. It runs under Solaris and Linux.

DCTP 1.3 is a single strategy prover.

DCTP 1.3-EPR is the CASC-19 EPR division winner.

Technische Universität München, Germany

DCTP 1.31 [Ste02a] is an automated theorem prover
for first order clause logic.
It is an implementation of the disconnection calculus described in
[Bil96,LS01,
Ste02b].
The disconnection calculus is a proof confluent and inherently
cut-free tableau calculus with a weak connectedness condition.
The inherently depth-first proof search is guided by a literal selection
based on literal instantiatedness or literal complexity and a heavily
parameterised link selection.
The pruning mechanisms mostly rely on different forms of
*variant deletion* and *unit based strategies*.
Additionally the calculus has been augmented by full tableau pruning.

DCTP 10.21p is a strategy parallel version using the technology of E-SETHEO [SW99] to combine several different strategies based on DCTP 1.31.

DCTP 1.31 has been implemented as a monolithic system in the Bigloo dialect of the Scheme language. The most important data structures are perfect discrimination trees, which are used in many variations. DCTP 10.21p has been derived from the Perl implementation of E-SETHEO and includes DCTP 1.31 as well as the E prover as its CNF converter. Both versions run under Solaris and Linux.

We are currently integrating a range of new techniques into DCTP which are mostly based on the results described in [LS04], as well as a certain form of unit propagation. We are hopeful that these improvements will be ready in time for CASC-J2.

DCTP 1.31 is a single strategy prover. Individual strategies are started by DCTP 10.21p using the schedule based resource allocation scheme known from the E-SETHEO system. Of course, different schedules have been precomputed for the syntactic problem classes. The problem classes are more or less identical with the sub-classes of the competition organisers. Again, we have no idea whether or not this conflicts with the organisers' tuning restrictions.

We expect both DCTP 1.31 and DCTP 10.21p to perform reasonably well, in particular in the EPR (in any case) and SAT (depending on the selection of problems for the competition) categories.

Chalmers University of Technology, Sweden

Dilemma 0.1 is a theorem prover based on an extension of Stålmarck's
method to first order classical logic
[Bjö03a, Bjö03b,
Bjö04].
This calculus is proof confluent and does not use backtracking.
It incorporates all non-branching rules of both KE and KI, and also
the dilemma rule, which is a branch-and-merge rule. It works like the
analytic cut rule, except that the branches can be recombined into one
branch again, keeping the intersection of all new consequences.
Dilemma formulas may contain free variables (called *rigid*),
that may not be substituted in the branches, but are turned into
universal variables after a merge. Rigid variables are not instantiated
destructively, but possible instantiations are detected and stored
for later, when they will be used to find more specialized dilemma
formulas.

The system is currently being developed in C++, and if a working prototype exists at the time of CASC-J2 then it will enter the demonstration division.

The prototype will have no special strategies.

The prototype will probably not be very competitive. We enter mostly for the possibility of getting feedback.

Technische Universität München, Germany, and ITC/irst, Italy

E 0.82 [Sch02, Sch04a] is a purely equational theorem prover. The core proof procedure operates on formulas in clause normal form, using a calculus that combines superposition (with selection of negative literals) and rewriting. No special rules for non-equational literals have been implemented, i.e., resolution is simulated via paramodulation and equality resolution. The basic calculus is extended with rules for AC redundancy elimination, some contextual simplification, and pseudo-splitting.

E is based on the DISCOUNT-loop variant of the *given-clause*
algorithm, i.e. a strict separation of active and passive facts.
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering.
The prover supports a large number of preprogrammed literal
selection strategies, many of which are only experimental.
Clause evaluation heuristics can be constructed on the fly by combining
various parameterized primitive evaluation functions, or can be
selected from a set of predefined heuristics.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

E uses a preprocessing step to convert formulas in full first order format to clause normal form. Preprocessing also unfolds equational definitions and performs some simplifications on the clause level.

The automatic mode selects the literal selection strategy, term ordering, and search heuristic based on simple problem characteristics of the preprocessed clausal problem.

EP 0.82 is just a combination of E 0.82 in verbose mode and a proof analysis tool extracting the used inference steps.

E is implemented in ANSI C, using the GNU C compiler.
The most outstanding feature is the global sharing of rewrite steps.
Current versions of E add *rewrite links* from rewritten to new terms.
In effect, E is caching rewrite operations as long as sufficient memory
is available.
Other important features are the use of *perfect discrimination trees*
with age and size constraints for rewriting and unit-subsumption,
*feature vector indexing* [Sch04b]
for forward and backward subsumption and contextual literal cutting,
and a new polynomial implementation of LPO
[Loe04b].

The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:

http://www.eprover.org

EP 0.82 is a simple Bourne shell script calling E and the postprocessor in a pipeline.

E's automatic mode is optimized for performance on TPTP 2.6.0. The optimization is based on about 90 test runs over the library (and previous experience) and consists of the selection of one of about 50 different strategies for each problem. All test runs have been performed on SUN Ultra 60/300 machines with a time limit of 300 seconds (or roughly equivalent configurations). All individual strategies are general purpose, the worst one solves about 49% of TPTP 2.6.0, the best one about 60%.

E distinguishes problem classes based on a number of features, all of which have between two and four possible values. The most important ones are:

- Is the most general non-negative clause unit, Horn, or Non-Horn?
- Is the most general negative clause unit or non-unit?
- Are all negative clauses unit clauses?
- Are all literals equality literals, are some literals equality literals, or is the problem non-equational?
- Are there a few, some, or many clauses in the problem?
- Is the maximum arity of any function symbol 0, 1, 2, or greater?
- Is the sum of function symbol arities in the signature small, medium, or large?

Wherever there is a three-way split on a numerical feature value, the limits are selected automatically with the aim of splitting the set of problems into approximately equal sized parts based on this one feature.

For classes above a threshold size, we assign the absolute best heuristic to the class. For smaller, non-empty classes, we assign the globally best heuristic that solves the same number of problems on this class as the best heuristic on this class does. Empty classes are assigned the globally best heuristic. Typically, most selected heuristics are assigned to more than one class.
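This assignment scheme can be sketched as follows (an illustrative reconstruction from the description above, not E's actual code; all names and data shapes are hypothetical):

```python
def assign_heuristics(class_sizes, class_results, global_solved, threshold):
    """Assign a heuristic to each problem class (hypothetical sketch).
    class_sizes:   {class: number of problems in the class}
    class_results: {class: {heuristic: problems solved on that class}}
    global_solved: {heuristic: problems solved on the whole library}"""
    globally_ranked = sorted(global_solved, key=global_solved.get, reverse=True)
    assignment = {}
    for cls, size in class_sizes.items():
        results = class_results.get(cls, {})
        if not results:
            # Empty class: fall back to the globally best heuristic.
            assignment[cls] = globally_ranked[0]
        elif size > threshold:
            # Large class: the heuristic that is absolutely best on it.
            assignment[cls] = max(results, key=results.get)
        else:
            # Small class: the globally best heuristic that matches the
            # local optimum's solved count on this class.
            best_local = max(results.values())
            assignment[cls] = next(h for h in globally_ranked
                                   if results.get(h, 0) == best_local)
    return assignment

chosen = assign_heuristics(
    {"big": 100, "small": 5, "empty": 0},
    {"big": {"h1": 60, "h2": 70}, "small": {"h1": 3, "h2": 3}, "empty": {}},
    {"h1": 500, "h2": 400},
    threshold=20,
)
```

Preferring the globally best heuristic on small classes guards against overfitting to a handful of problems.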

In the last year, E performed well in the MIX category of CASC and came in third in the UEQ division. We believe that E will again be among the strongest provers in the MIX category, in particular due to its good performance for Horn problems. In UEQ, E will probably be beaten only by Waldmeister. We cannot predict performance on FOF problems yet, but hope that E will be competitive.

EP 0.82 will be hampered by the fact that it has to analyse the
inference step listing, an operation that typically is about as
expensive as the proof search itself. Nevertheless, it should be
competitive among the MIX^{*} and FOF^{*} systems.

Technische Universität München, Germany

E-SETHEO is a compositional theorem prover for formulae in first order clause logic, combining the systems E [Sch01] and SETHEO [MI+97]. It incorporates different calculi and proof procedures like superposition, model elimination and semantic trees (the DPLL procedure). Furthermore, the system contains transformation techniques which may split the formula into independent subparts or which may perform a ground instantiation. Finally, advanced methods for controlling and optimizing the combination of the subsystems are applied. The first-order variant of E-SETHEO no longer uses Flotter [WGR96] as a preprocessing module for transforming non-clausal formulae to clausal form. Instead, a more primitive normal form transformation is employed.

Since version 99csp of E-SETHEO, the different strategies are run sequentially, one after the other. E-SETHEO csp04 incorporates the new version of the disconnection prover DCTP [Ste02a] with modified unification and a form of unit propagation as well as a new version of the E prover [Sch01]. The new E prover also replaces TPTP2X as the CNF conversion tool. The new Scheme version of SETHEO that is in use features local unit failure caching [LS01] and lazy root paramodulation, an optimisation of lazy paramodulation which is complete in the Horn case [LS02]. Other than that (and a new resource distribution scheme), E-SETHEO csp04 is identical to E-SETHEO csp03.

According to the diversity of the contained systems, the modules of E-SETHEO are implemented in different programming languages like C, Scheme, and Shell tools. Due to the replacement of TPTP2X, E-SETHEO is finally Prolog-free.

The program runs under Solaris and Linux. Sources are available from the authors.

Individual strategies are started by E-SETHEO depending on the allocation
of resources to the different strategies, so-called *schedules*,
which have been computed from experimental data using machine learning
techniques as described in [SW99].
Schedule selection depends on syntactic characteristics of the input
formula such as the Horn-ness of formulae, whether a problem contains
equality literals or whether the formula is in the Bernays-Schönfinkel
class.
The problem classes are more or less identical with the sub-classes of the
competition organisers.
We have no idea whether or not this conflicts with the organisers'
tuning restrictions.

We expect E-SETHEO to perform well in all categories it participates in.

Tallinn Technical University, Estonia

Gandalf [Tam97,Tam98] is a family of automated theorem provers, including classical, type theory, intuitionistic and linear logic provers, plus a finite model builder. The version c-2.6 contains the classical logic prover for clause form input and the finite model builder. One distinguishing feature of Gandalf is that it contains a large number of different search strategies and is capable of automatically selecting suitable strategies and experimenting with these strategies.

The finite model building component of Gandalf uses the Zchaff propositional logic solver by L.Zhang [MM+01] as an external program called by Gandalf. Zchaff is not free, although it can be used freely for research purposes. Gandalf is not optimised for Zchaff or linked together with it: Zchaff can be freely replaced by other satisfiability checkers.

Gandalf is implemented in Scheme and compiled to C using the Hobbit Scheme-to-C compiler. Version scm5d6 of the Scheme interpreter scm by A.Jaffer is used as the underlying Scheme system. Zchaff is implemented in C++.

Gandalf has been tested on Linux, Solaris, and MS Windows under Cygwin.

Gandalf is available under GPL from:

http://www.ttu.ee/it/gandalf

One of the basic ideas used in Gandalf is time-slicing: Gandalf typically runs a number of searches with different strategies one after another, until either the proof is found or time runs out. Also, during each specific run Gandalf typically modifies its strategy as the time limit for this run starts coming closer. Selected clauses from unsuccessful runs are sometimes used in later runs.
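A minimal sketch of such a time-slicing loop, with hypothetical strategy functions (Gandalf's real scheduler also adapts each strategy as its slice runs out):

```python
def time_slice(strategies, total_limit):
    """Gandalf-style time-slicing (illustrative sketch): try strategies one
    after another until a proof is found or the total budget runs out.
    Each strategy takes (budget, hint_clauses) and returns a triple
    (proof_or_None, clauses_to_carry_over, time_used)."""
    remaining = total_limit
    hints = []   # selected clauses kept from unsuccessful runs
    for strategy in strategies:
        if remaining <= 0:
            break
        proof, carried, used = strategy(remaining, hints)
        if proof is not None:
            return proof
        hints.extend(carried)
        remaining -= used
    return None

# Hypothetical strategies: the first fails but contributes a lemma that
# lets the second succeed.
s1 = lambda budget, hints: (None, ["lemma"], 10)
s2 = lambda budget, hints: ("proof" if "lemma" in hints else None, [], 5)
```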

In the normal mode Gandalf attempts to find only unsatisfiability.
It has to be called with a `-sat` flag to find satisfiability.
The following strategies are run:

- Finite model building by incremental search through function symbol interpretations.
- Ordered binary resolution (term depth): only for problems not containing equality.
- Finite model building using MACE-style flattening and the external propositional prover.

Gandalf c-2.6-SAT is the CASC-19 SAT division, Assurance class, winner.

Argonne National Laboratory, USA

Mace2 [McC01] searches for finite models of first-order (including equality) statements. Mace2 iterates through domain sizes, starting with 2. For a given domain size, a propositional satisfiability problem is constructed from the ground instances of the statements, and a DPLL procedure is applied.
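The grounding step can be illustrated as follows (a sketch with a hypothetical clause encoding; Mace2 additionally encodes function symbols and equality into propositional variables):

```python
from itertools import product

def ground_instances(clauses, variables, size):
    """Generate all ground instances of the clauses over domain {0..size-1}.
    A clause is a list of literals; a literal is (sign, predicate, args),
    where args may contain variable names (illustrative sketch only)."""
    for values in product(range(size), repeat=len(variables)):
        env = dict(zip(variables, values))
        for clause in clauses:
            yield [(sign, pred, tuple(env.get(a, a) for a in args))
                   for sign, pred, args in clause]

# p(X) v ~q(X, Y) over a 2-element domain: 2*2 = 4 ground instances.
clauses = [[(True, "p", ("X",)), (False, "q", ("X", "Y"))]]
instances = list(ground_instances(clauses, ["X", "Y"], 2))
```

The resulting ground clauses are then handed to a DPLL procedure, which either finds a model of this size or forces the next domain size to be tried.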

Mace2 is an entirely different program from Mace4 [McC03b], in which the ground problem for a given domain size contains equality and is decided by rewriting.

Mace2 is coded in ANSI C. It uses the same code as Otter [McC03a] for parsing input, and (for the most part) accepts the same input files as Otter. Mace2 is packaged and distributed with Otter, which is available at the following URL:

http://www.mcs.anl.gov/AR/otter

Mace2 has been evolving slowly for about ten years. Two important strategies have been added recently. In 2001, a method to reduce the number of isomorphic models was added; this method is similar in spirit to the least number optimization used in rewrite-based methods, but it applies only to the first five constants. In 2003, a clause-parting method (based on the variable occurrences in literals of flattened clauses) was added to improve performance on inputs with large clauses. Although Mace2 has several experimental features, it uses one fixed strategy for CASC.

Mace2 is not expected to win any prizes, because it uses one fixed strategy, and no tuning has been done with respect to the TPTP problem library. Also, Mace2 does not accept function symbols with arity greater than 3 or predicate symbols with arity greater than 4. Overall performance, however, should be respectable.

Argonne National Laboratory, USA

Mace4 [McC03b] searches for finite models of first-order (unsorted, with equality) statements. Given input clauses, it generates ground instances over a finite domain, then it uses a decision procedure based on rewriting to try to determine satisfiability. If there is no model of that size, it increments the domain size and tries again. Input clauses are not "flattened" as they are in procedures that reduce the problem to propositional satisfiability without equality, for example, Mace2 [McC01].

Mace4 is an entirely different program from Mace2, in which the problem for a given domain size is reduced to a purely propositional SAT problem that is decided by DPLL.

Mace4 is coded in ANSI C and is available at the following URL:

http://www.mcs.anl.gov/AR/mace4

The two main parts of the Mace4 method are (1) selecting the next empty cell in the tables of functions being constructed and deciding which values need to be considered for that cell, and (2) propagating assignments. Mace4 uses the basic least number heuristic (LNH) to reduce isomorphism. The LNH was introduced in Falcon [Zha96] and is also used in SEM. Effective use of the LNH requires careful cell selection. Propagation is by ground rewriting and inference rules to derive negated equalities.
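The core idea of the LNH can be sketched in a few lines (an illustration of the general principle, not Mace4's code):

```python
def lnh_candidates(used_values, domain_size):
    """Least Number Heuristic (sketch of the idea, not Mace4's code): when
    filling an empty cell, only the domain elements used so far plus one
    fresh element need to be tried, because all still-unused elements are
    interchangeable under isomorphism."""
    max_used = max(used_values, default=-1)
    return list(range(min(max_used + 2, domain_size)))
```

For example, if only elements 0 and 1 have appeared so far in a domain of size 6, a new cell need only be tried with 0, 1 and 2, cutting the branching factor in half.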

Mace4 is not expected to win any prizes, because it uses one fixed strategy, and no tuning has been done with respect to the TPTP problem library. Overall performance, however, should be respectable. An early version of Mace4 competed in CASC-2002 under the name ICGNS.

McGill University

Octopus is a parallel ATP system. It is an improved version of the single-processor ATP system THEO [New01]. Inference rules used by Octopus include binary resolution, binary factoring, instantiation, demodulation, and hash table resolutions. Octopus performs 5000-20000 inferences/second on each processor.

Octopus is implemented in C and currently runs under Linux and FreeBSD. In the competition, it will run on a network of 150-200 PCs housed in various laboratories at McGill University's School of Computer Science. The processors communicate using PVM [GB+94].

Octopus begins by determining a number of weakened versions of the given theorem, and then assigns one such version to each computer. Each computer then attempts to prove the weakened version of the theorem assigned to it. If successful, the computer then uses the proof found for the weakened theorem to help prove the given theorem. In essence, Octopus combines learning and parallel theorem proving.

In the current version of Octopus, a weakened version of a theorem consists of the same clauses as the given theorem, except for one. In that clause, a constant or function is replaced by a variable that does not appear elsewhere in the clause. If a proof of the given theorem exists, a proof of the weakened version exists, and often, though far from always, the proof of the weakened version is easier to find.
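The weakening step can be sketched as follows (an illustration of the idea with a hypothetical clause encoding, not Octopus's C code):

```python
def weakenings(clause, fresh="Z"):
    """Enumerate weakened versions of a clause, one per constant occurrence
    replaced by a fresh variable (illustrative sketch; the clause encoding
    is hypothetical).  A literal is (predicate, args); lowercase argument
    strings stand for constants, uppercase for variables."""
    out = []
    for i, (pred, args) in enumerate(clause):
        for j, arg in enumerate(args):
            if arg.islower():                        # constant occurrence
                new_args = args[:j] + (fresh,) + args[j + 1:]
                out.append(clause[:i] + [(pred, new_args)] + clause[i + 1:])
    return out

# p(a, X) v q(b) has two constant occurrences, hence two weakenings.
ws = weakenings([("p", ("a", "X")), ("q", ("b",))])
```

Since the variable does not occur elsewhere in the clause, each weakening subsumes the original clause, which is why any proof of the original implies a proof of the weakening.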

When a proof is found to a weakened version, certain clauses in the proof are added to the base clauses of the given theorem. In addition, base clauses that participated in the proof of the weakened version are placed in the set-of-support. Octopus then tries to prove the given theorem with the augmented set of base clauses.

Each processor in the system uses different values for the maximum number of literals and terms in inferences generated when looking for a proof. Thus while two computers may try to solve the same weakened version of the given theorem, they do it with different values for the maximum number of literals and terms in derived inferences.

Octopus is only marginally better than it was last year, although it will run on about 4 times as many computers.

Argonne National Laboratory, USA

Otter 3.3 [McC03a] is an ATP system for statements in first-order (unsorted) logic with equality. Otter is based on resolution and paramodulation applied to clauses. An Otter search uses the "given clause algorithm", and typically involves a large database of clauses; subsumption and demodulation play an important role.
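The given clause algorithm itself has a simple skeleton, sketched here with toy propositional binary resolution standing in for Otter's full rule set (all names are illustrative, not Otter's code):

```python
def given_clause_loop(initial, infer, redundant, pick):
    """Skeleton of the given-clause algorithm: repeatedly move one clause
    from the passive set to the active set and infer all consequences
    between it and the active clauses (illustrative sketch)."""
    active, passive = [], list(initial)
    while passive:
        given = pick(passive)
        passive.remove(given)
        if not given:                          # empty clause: refutation
            return "proof"
        active.append(given)
        for new in infer(given, active):
            if not redundant(new, active + passive):
                passive.append(new)
    return "saturated"

def resolve(given, active):
    # Toy propositional binary resolution on frozensets of integer literals.
    for c in active:
        for l in given:
            if -l in c:
                yield (given - {l}) | (c - {-l})

result = given_clause_loop(
    [frozenset({1, 2}), frozenset({-1}), frozenset({-2})],
    resolve,
    lambda new, pool: new in pool,         # crude redundancy: duplicates
    lambda passive: min(passive, key=len), # prefer the lightest clause
)
```

In Otter the `redundant` test is where subsumption and demodulation do most of the work of keeping the clause database manageable.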

Otter is written in C. Otter uses shared data structures for clauses and terms, and it uses indexing for resolution, paramodulation, forward and backward subsumption, forward and backward demodulation, and unit conflict. Otter is available from:

http://www-unix.mcs.anl.gov/AR/otter/

Otter's original automatic mode, which reflects no tuning to the TPTP problems, will be used.

Otter has been entered into CASC-J2 as a stable benchmark against which progress can be judged (there have been only minor changes to Otter since 1996 [MW97], nothing that really affects its performance in CASC). This is not an ordinary entry, and Otter is not expected to do well in the competition.

*Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and
Rusty Lusk contributed to the development of Otter.*

Chalmers University of Technology and Gothenburg University, Sweden

Paradox 1.0 [CS03] is a finite-domain model generator. It is based on a MACE-style [McC94] flattening and instantiating of the FO clauses into propositional clauses, and then the use of a SAT solver to solve the resulting problem.
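The flattening step can be illustrated as follows (a sketch of the general MACE-style transformation with a hypothetical term encoding; Paradox's actual implementation is in Haskell and considerably more refined):

```python
from itertools import count

def flatten_clause(clause):
    """MACE-style flattening (illustrative sketch).  Every nested function
    term is pulled out into a fresh variable V plus a guard literal
    f(...) != V, so each literal mentions at most one function symbol.
    Terms are variables (strings) or ("f", arg, ...) tuples."""
    fresh = (f"V{i}" for i in count())
    guards, flat = [], []

    def pull(term):
        if isinstance(term, str):
            return term                        # already a variable
        args = tuple(pull(a) for a in term[1:])
        v = next(fresh)
        guards.append(("!=", (term[0],) + args, v))
        return v

    for sign, pred, args in clause:
        flat.append((sign, pred, tuple(pull(a) for a in args)))
    return guards + flat

# p(f(g(X)))  becomes  g(X) != V0  v  f(V0) != V1  v  p(V1)
flat = flatten_clause([(True, "p", (("f", ("g", "X")),))])
```

After flattening, instantiating the variables over a finite domain yields purely propositional clauses for the SAT solver.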

Paradox incorporates the following novel features: New polynomial-time
*clause splitting heuristics*, the use of *incremental SAT*,
*static symmetry reduction* techniques, and the use of *sort
inference*.

The main part of Paradox is implemented in Haskell using the GHC compiler. Paradox also has a built-in incremental SAT solver which is written in C++. The two parts are linked together on the object level using Haskell's Foreign Function Interface. Paradox uses the following non-standard Haskell extensions: local universal type quantification and hash-consing.

There is only one strategy in Paradox:

- Analyze the problem, finding an upper bound N on the domain size of models, where N is possibly infinite. A finite such upper bound can for example be found for EPR problems.
- Flatten the problem, and split clauses and simplify as much as possible.
- Instantiate the problem for domain sizes 1 up to N, applying the SAT solver incrementally for each size. Report "SATISFIABLE" when a model is found.
- When no model of sizes smaller or equal to N is found, report "CONTRADICTION".

In this way, Paradox can be used both as a model finder and as an EPR solver.

Paradox 1.0 is the CASC-19 SAT division, Model class, winner.

Chalmers University of Technology and Gothenburg University, Sweden

Paradox 1.1-casc [CS03] is a finite-domain model generator. It is based on a MACE-style [McC94] flattening and instantiating of the first-order clauses into propositional clauses, and then the use of a SAT solver to solve the resulting problem.

Paradox incorporates the following features: Polynomial-time
*clause splitting heuristics*, the use of *incremental SAT*,
*static symmetry reduction* techniques, and the use of *sort
inference*.

The main differences with Paradox 1.0 are: a different SAT-solver, better memory behaviour, and a faster clause instantiation algorithm.

The main part of Paradox is implemented in Haskell using the GHC compiler. Paradox also has a built-in incremental SAT solver which is written in C++. The two parts are linked together on the object level using Haskell's Foreign Function Interface.

There is only one strategy in Paradox:

- Analyze the problem, finding an upper bound N on the domain size of models, where N is possibly infinite. A finite such upper bound can for example be found for EPR problems.
- Flatten the problem, and split clauses and simplify as much as possible.
- Instantiate the problem for domain sizes 1 up to N, applying the SAT solver incrementally for each size. Report "SATISFIABLE" when a model is found.
- When no model of sizes smaller or equal to N is found, report "CONTRADICTION".

Paradox 1.1-casc should perform slightly better than Paradox 1.0.

Australian National University

SOS (Son Of SCOTT) Version 1.0 is yet another attempt to enhance Otter
[McC03a] by adding semantic guidance taken
from models generated by a finite domain constraint solver.
Its range of inference rules is the same as that of Otter: for CASC,
that means it uses hyperresolution for most problems with non-unit
clauses together with paramodulation and dynamic demodulation for
equational reasoning.
SOS does not impose any restriction on the inference rules of Otter.
Like its predecessors [HS02] it maintains a
model of some subset of the clauses deduced and updates this model
from time to time during the search.
Unlike its predecessors, it treats most of the clauses as *soft*
constraints, thus maintaining a single approximate model of all of the
clauses rather than many exact models of different subsets. Thus it
represents a new solution to the problem of trading off the
computational cost of computing models against the guidance resulting
from them.

Because it contains Otter as a sub-program, all proofs produced by SOS are Otter proofs. The soundness of the system therefore follows trivially from the soundness of Otter.

For more information on the architecture and performance of SOS, see the forthcoming paper [SBP04] in ECAI 2004. At the time when that paper was written, the program was provisionally called "SOFTIE".

The system is written in C. Most of the code is that of Otter and of the constraint solver FINDER [Sla94], both of which use very standard libraries. At present it has been compiled only under Solaris and Linux. The sources are available from:

http://csl.anu.edu.au/~jks/software/sos.tar.gz

Note that the program is very new and still at an experimental stage, so substantial changes are expected in future versions.

All formulae in the set of support are tested against the guiding
model and marked as true or false. Most given clauses are chosen to be
the oldest of the shortest of the *false* clauses, though some
proportion of choices default to Otter's criteria. The guiding model
is repeatedly updated to make true as many instances (on its domain)
of the usable clauses as possible. For CASC there is a time limit of 2
seconds on each model search, resulting in indeterminism because on
different runs, different models may be the best found within the
limit. SOS behaves differently from Otter not only because of the
preference for false clauses but also because the weight reduction
strategy is somewhat different: weight restriction begins more
aggressively than in Otter, but subsequently each weight limit is
phased in more gradually.

SOS should beat its parent Otter overall. It is expected to perform as well as the previous versions of SCOTT on UEQ problems, and tolerably well on Horn clause problems (HEQ and HNE). In other subsections of MIX it will fail for the same reasons as Otter. It is not expected to threaten more modern high-performance provers except possibly in UEQ. Its performance depends somewhat on the time limit, because its inference speed is lower than those of the other provers.

McGill University

THEO [New01] is a resolution-refutation theorem prover for first order clause logic. It uses binary resolution, binary factoring, instantiation, demodulation, and hash table resolutions.

THEO is written in C and runs under both Linux and FreeBSD. It contains about 35000 lines of source code. Originally it was called The Great Theorem Prover.

THEO uses a large hash table (16 million entries) to store clauses. This permits complex proofs to be found, some as long as 500 inferences. It uses what might be called a brute-force iteratively deepening depth-first search for a contradiction while storing information about clauses - unit clauses in particular - in its hash table.
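The role of the hash table can be sketched as follows (a toy analogue, not THEO's C code; the literal encoding is hypothetical):

```python
class UnitTable:
    """Toy analogue of THEO's unit-clause hash table (illustrative sketch):
    ground unit clauses are stored under a hash of the literal, so a newly
    derived clause can be checked in constant time for a stored
    complementary unit (a "hash table resolution")."""
    def __init__(self, size=1 << 24):      # THEO uses ~16 million entries
        self.size = size
        self.slots = {}
    def _slot(self, literal):
        return hash(literal) % self.size
    def add_unit(self, literal):
        # A literal is (sign, atom); store it under its hash slot.
        self.slots[self._slot(literal)] = literal
    def complementary(self, literal):
        # Look up a stored unit with opposite sign and the same atom.
        sign, atom = literal
        return self.slots.get(self._slot((not sign, atom)))

units = UnitTable()
units.add_unit((True, "p(a)"))   # store the unit clause p(a)
```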

In the last year or so, THEO has been modified so that when given a theorem, it first determines a list of potential ways to "weaken" the theorem. It then randomly selects one of the weakenings, tries to prove the weakened version of the theorem, and then uses the results from this effort to help prove the given theorem. A weakened version is created in a number of different ways, including modifying one clause by replacing a constant or function by a variable. Certain clauses from the proof of the weakened version are added to the base clauses when THEO next attempts to prove the given theorem. In addition, base clauses that participated in the proof of the weakened version are placed in the set-of-support. THEO then attempts to prove the given theorem with the revised set of base clauses.

THEO is only marginally better than it was last year.

University of Manchester, England

Vampire [RV01, RV02] 5.0 is an automatic theorem prover for first-order classical logic. Its kernel implements the calculi of ordered binary resolution and superposition for handling equality. The splitting rule is simulated by introducing new predicate symbols. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The only term ordering used in Vampire at the moment is a special non-recursive version of the Knuth-Bendix ordering that allows efficient approximation algorithms for solving ordering constraints. By the system installation deadline we may implement the standard Knuth-Bendix ordering. A number of efficient indexing techniques are used to implement all major operations on sets of terms and clauses. Although the kernel of the system works only with clausal normal forms, the preprocessor component accepts a problem in the full first-order logic syntax, clausifies it and performs a number of useful transformations before passing the result to the kernel.

Vampire 5.0 is implemented in C++. The main supported compiler version is gcc 2.95.3, although in the near future we plan to move to gcc 3.x. The system has been successfully compiled for Linux and Solaris. It is available from:

http://www.cs.man.ac.uk/~riazanoa/Vampire/

The Vampire kernel provides a fairly large number of features for strategy selection. The most important ones are:

- Choice of the main saturation procedure: (i) OTTER loop, with or without the Limited Resource Strategy, (ii) DISCOUNT loop.
- A variety of optional simplifications.
- Parameterised simplification ordering.
- A number of built-in literal selection functions and different modes of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
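
Both loop variants share the given-clause skeleton: repeatedly select a clause from the passive set, simplify it, move it to the active set, and add all inferences between it and the active clauses back to the passive set. The sketch below is illustrative Python, DISCOUNT-flavoured in that passive clauses are simplified only on selection; `generate` and `simplify` are caller-supplied stand-ins, not Vampire's API:

```python
import heapq

def saturate(axioms, generate, simplify, max_iter=10000):
    """Schematic given-clause loop. 'generate' yields inferences between
    the given clause and the active set; 'simplify' normalizes a clause
    against the active set, returning None for redundant clauses."""
    active = []
    # priority queue ordered by clause weight (here: literal count)
    passive = [(len(c), i, c) for i, c in enumerate(axioms)]
    heapq.heapify(passive)
    tick = len(axioms)
    for _ in range(max_iter):
        if not passive:
            return active, 'saturated'
        _, _, given = heapq.heappop(passive)
        given = simplify(given, active)   # lazy (DISCOUNT-style) simplification
        if given is None:
            continue
        if given == ():                   # empty clause: refutation found
            return active, 'unsatisfiable'
        active.append(given)
        for new in generate(given, active):
            heapq.heappush(passive, (len(new), tick, new))
            tick += 1
    return active, 'timeout'
```

In the OTTER loop, by contrast, the passive set is also kept interreduced with respect to newly activated clauses, trading extra simplification work for a smaller search space.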

The standalone executables for Vampire 5.0 and Vampire 5.0-CASC use very simple time slicing to make sure that several kernel strategies are tried on a given problem.

The automatic mode of Vampire 5.0 is primitive. Seven problem classes are distinguished corresponding to the competition divisions HNE, HEQ, NNE, NEQ, PEQ, UEQ and EPR. Every class is assigned a fixed schedule consisting of a number of kernel strategies called one by one with different time limits.
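
Such a fixed schedule can be sketched as a simple runner over (strategy, time-limit) pairs. The `prove` callback, strategy names, and budgets below are hypothetical, and real wall-clock enforcement is elided:

```python
def run_schedule(schedule, prove, total_limit):
    """Run (strategy, seconds) pairs one by one until a definite answer is
    found or the overall time budget is exhausted. 'prove' is assumed to
    return True (proved), False (countersatisfiable), or None (gave up
    within its slot)."""
    remaining = total_limit
    for strategy, slot in schedule:
        if remaining <= 0:
            break
        budget = min(slot, remaining)     # never exceed the overall budget
        result = prove(strategy, budget)
        remaining -= budget
        if result is not None:
            return strategy, result
    return None, None
```

The later slots only matter when the earlier, usually stronger, strategies fail, so the schedule degrades gracefully under a tight overall limit.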

Vampire 5.0 is the CASC-19 FOF division winner.

University of Manchester, England

Vampire [RV02] 6.0 is an automatic theorem prover for first-order classical logic. Its kernel implements the calculi of ordered binary resolution, superposition for handling equality and ordered chaining for transitive predicates. The splitting rule is simulated by introducing new predicate symbols. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction orderings used are the standard Knuth-Bendix ordering and a special non-recursive version of the Knuth-Bendix ordering that allows efficient approximation algorithms for solving ordering constraints. A number of efficient indexing techniques are used to implement all major operations on sets of terms and clauses. Although the kernel of the system works only with clausal normal forms, the preprocessor component accepts a problem in the full first-order logic syntax, clausifies it and performs a number of useful transformations before passing the result to the kernel.

Vampire 6.0 is implemented in C++. The supported compilers are gcc 2.95.3, 3.x and Microsoft Visual C++. The system has been successfully compiled for Linux, Solaris and Win32. It is available (conditions apply) from:

http://www.cs.man.ac.uk/~riazanoa/Vampire/

The Vampire kernel provides a fairly large number of features for strategy selection. The most important ones are:

- Choice of the main saturation procedure: (i) OTTER loop, with or without the Limited Resource Strategy, (ii) DISCOUNT loop.
- A variety of optional simplifications.
- Parameterised reduction orderings.
- A number of built-in literal selection functions and different modes of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.

The standalone executable for Vampire 6.0 uses very simple time slicing to make sure that several kernel strategies are tried on a given problem.

The automatic mode of Vampire 6.0 is primitive. Seven problem classes are distinguished corresponding to the competition divisions HNE, HEQ, NNE, NEQ, PEQ, UEQ and EPR. Every class is assigned a fixed schedule consisting of a number of kernel strategies called one by one with different time limits.

Vampire 6.0 is the CASC-19 MIX division winner.

University of Manchester, England

Vampire [RV02] 7.0 is an automatic theorem prover for first-order classical logic. Its kernel implements the calculi of ordered binary resolution and superposition for handling equality. The splitting rule and negative equality splitting are simulated by the introduction of new predicate definitions and dynamic folding of such definitions. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion (optionally modulo commutativity), subsumption resolution, rewriting by ordered unit equalities, basicness restrictions and irreducibility of substitution terms. The reduction orderings used are the standard Knuth-Bendix ordering and a special non-recursive version of the Knuth-Bendix ordering. A number of efficient indexing techniques are used to implement all major operations on sets of terms and clauses. Run-time algorithm specialisation is used to accelerate some costly operations, e.g., checks of ordering constraints. Although the kernel of the system works only with clausal normal forms, the preprocessor component accepts a problem in the full first-order logic syntax, clausifies it and performs a number of useful transformations before passing the result to the kernel. When a theorem is proven, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF. The current release features a built-in proof checker for the clausifying phase, which will be extended to check complete proofs.

Vampire 7.0 is implemented in C++. The supported compilers are gcc 3.2.x, gcc 3.3.x, and Microsoft Visual C++. This version has been successfully compiled for Linux, but has not been fully tested on Solaris and Win32. It is available (conditions apply) from:

http://www.cs.man.ac.uk/~riazanoa/Vampire

The Vampire kernel provides a fairly large number of features for strategy selection. The most important ones are:

- Choice of the main saturation procedure: (i) OTTER loop, with or without the Limited Resource Strategy, (ii) DISCOUNT loop.
- A variety of optional simplifications.
- Parameterised reduction orderings.
- A number of built-in literal selection functions and different modes of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
- Set-of-support strategy.

The automatic mode of Vampire 7.0 is derived from extensive experimental data obtained on problems from TPTP v2.6.0. Input problems are classified taking into account simple syntactic properties, such as being Horn or non-Horn, presence of equality, etc. Additionally, we take into account the presence of some important kinds of axioms, such as set theory axioms, associativity and commutativity. Every class of problems is assigned a fixed schedule consisting of a number of kernel strategies called one by one with different time limits.
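
A classification of this kind can be sketched as a feature test over the clause set. The representation and class names below mirror the competition divisions but are otherwise illustrative (the EPR case and the axiom-set detection are omitted):

```python
def classify(clauses):
    """Map a clause set to a coarse problem class from simple syntactic
    features, in the spirit of the divisions HNE/HEQ/NNE/NEQ/PEQ/UEQ.
    A clause is a list of (sign, predicate, args) literals."""
    # Horn: every clause has at most one positive literal
    horn = all(sum(1 for s, p, a in c if s) <= 1 for c in clauses)
    has_eq = any(p == '=' for c in clauses for s, p, a in c)
    # unit equality: every clause is a single equality literal
    unit_eq = has_eq and all(len(c) == 1 and c[0][1] == '=' for c in clauses)
    # pure equality: equality is the only predicate that occurs
    pure_eq = has_eq and all(p == '=' for c in clauses for s, p, a in c)
    if unit_eq:
        return 'UEQ'
    if pure_eq:
        return 'PEQ'
    if horn:
        return 'HEQ' if has_eq else 'HNE'
    return 'NEQ' if has_eq else 'NNE'
```

The class key then serves as a lookup into the table of precomputed schedules.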

We expect the new version to perform much better than the last MIX winner Vampire 6.0. The kernel implementation has undergone a number of significant changes, and several new features have been added, such as basicness restrictions and memory defragmentation. Another source of improvement over Vampire 6.0 is a better selection of strategies, based on extensive experiments.

Waldmeister 702 is an implementation of unfailing Knuth-Bendix completion [BDP89] with extensions towards ordered completion (see [AHL00]) and basicness [BG+92, NR92]. The system saturates the input axiomatization, distinguishing active facts, which induce a rewrite relation, and passive facts, which are the one-step conclusions of the active ones up to redundancy. The saturation process is parameterized by a reduction ordering and a heuristic assessment of passive facts.
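
Two basic ingredients of completion-based saturation - orienting equations by a reduction ordering and rewriting with the active facts - can be hinted at with a toy sketch. The size-based comparison and the restriction to ground rules are simplifications for illustration, not Waldmeister's actual machinery:

```python
def size(t):
    """Number of symbols in a term represented as nested tuples,
    e.g. ('f', 'e', ('f', 'e', 'a'))."""
    return 1 if isinstance(t, str) else sum(size(s) for s in t)

def orient(s, t):
    """Orient an equation s = t into a rewrite rule (lhs, rhs), using a
    crude size comparison as a stand-in for a real reduction ordering."""
    if size(s) > size(t):
        return (s, t)
    if size(t) > size(s):
        return (t, s)
    return None  # unorientable: kept as an (ordered) equation

def normalize(term, rules):
    """Rewrite 'term' to normal form with ground rules, innermost first.
    Terminates here because every oriented rule is size-decreasing."""
    if isinstance(term, tuple):
        term = tuple(normalize(s, rules) for s in term)
    for lhs, rhs in rules:
        if term == lhs:
            return normalize(rhs, rules)  # the rhs may itself be reducible
    return term
```

In the real system, rules come from the active facts, matching is done modulo substitution, and the passive facts are the rewritten one-step conclusions.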

Only recently, we designed a thorough refinement of the system architecture concerning the representation of passive facts [HL02]. The aim of that work - the next Waldmeister loop - is, besides gaining more structural clarity, to cut down memory consumption, especially in long-lasting proof attempts; this benefit is hence less relevant in the CASC setting.

The system is implemented in ANSI-C and runs under Solaris and Linux. The central data structures are: perfect discrimination trees for the active facts; element-wise compressions for the passive ones; and sets of rewrite successors for the conjectures. Waldmeister can be found on the Web at:

http://www-avenhaus.informatik.uni-kl.de/waldmeister
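
Perfect discrimination trees index terms by their preorder symbol string, so retrieval is a single root-to-leaf walk. The minimal trie below illustrates only exact lookup of stored (ground) terms; the real data structure also treats variables specially and retrieves generalizations of the query term:

```python
def flatten(term):
    """Preorder symbol sequence of a term; terms are ('f', arg, ...) tuples."""
    if isinstance(term, str):
        return [term]
    out = []
    for s in term:
        out.extend(flatten(s))
    return out

class DiscriminationTree:
    """Minimal trie over preorder symbol strings: enough to show why a
    stored term is found by one root-to-leaf walk over its symbols."""
    def __init__(self):
        self.children = {}
        self.value = None          # payload stored at the leaf, if any

    def insert(self, term, value):
        node = self
        for sym in flatten(term):
            node = node.children.setdefault(sym, DiscriminationTree())
        node.value = value

    def lookup(self, term):
        node = self
        for sym in flatten(term):
            node = node.children.get(sym)
            if node is None:
                return None        # no stored term with this prefix
        return node.value
```

Because the tree is "perfect", terms sharing a prefix of symbols share a path, which keeps both memory use and traversal cost low for large active sets.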

Our approach to controlling the proof search is to choose the search parameters according to the algebraic structure given in the problem specification [HJL99]. This is based on the observation that proof tasks sharing major parts of their axiomatization often behave similarly. Hence, for a number of domains, the influence of different reduction orderings and heuristic assessments has been analyzed experimentally; in most cases it has been possible to identify a strategy uniformly superior on the whole domain.
In essence, every such strategy consists of an instantiation of the first parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and an instantiation of the second parameter to one of the weighting functions *addweight*, *gtweight*, or *mixweight*, which, when called on an equation *s* = *t*, return |*s*| + |*t*|, |max_>(*s*, *t*)|, or |max_>(*s*, *t*)| · (|*s*| + |*t*| + 1) + |*s*| + |*t*|, respectively, where |*s*| denotes the number of symbols in *s*.
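
Stated in code, the three weighting functions are straightforward. This sketch represents terms as nested tuples and approximates max_> by symbol count rather than by the actual reduction ordering:

```python
def symbols(t):
    """|s|: the number of symbols in a term, represented as nested tuples
    such as ('f', 'x', ('g', 'y'))."""
    return 1 if isinstance(t, str) else sum(symbols(s) for s in t)

def addweight(s, t):
    # |s| + |t|
    return symbols(s) + symbols(t)

def gtweight(s, t):
    # |max_>(s, t)|; the ordering-maximum is approximated by symbol count
    return max(symbols(s), symbols(t))

def mixweight(s, t):
    # |max_>(s, t)| * (|s| + |t| + 1) + |s| + |t|
    m = max(symbols(s), symbols(t))
    return m * (symbols(s) + symbols(t) + 1) + symbols(s) + symbols(t)
```

Note that *mixweight* dominates *gtweight*, which in turn is bounded by *addweight*, so the three heuristics weight the passive facts with increasing emphasis on the larger side of the equation.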

Waldmeister 702 is the CASC-19 UEQ division winner.

Waldmeister 704 is a system for unit equational deduction. Its theoretical basis is unfailing completion in the sense of [BDP89] with refinements towards ordered completion (cf. [AHL03]). The system saturates the input axiomatization, distinguishing active facts, which induce a rewrite relation, and passive facts, which are the one-step conclusions of the active ones up to redundancy. The saturation process is parameterized by a reduction ordering and a heuristic assessment of passive facts [HJL99]. For an in-depth description of the system, see [Hil03].

Waldmeister 704 improves over last year's version in several respects. Firstly, the detection of redundancies in the presence of associative-commutative operators has been strengthened (cf. [Loe04a]). In a class of AC-equivalent equations, an element is redundant if each of its ground instances can be rewritten, with the ground convergent rewrite system for AC, into an instance of another element. Instead of elaborately checking this kind of reducibility explicitly, it can be rephrased in terms of ordering constraints and be efficiently approximated with a polynomial test. Secondly, the last teething troubles of the implementation of the Waldmeister loop have been overcome. Thirdly, a number of strategies have been slightly revised.

The prover is coded in ANSI-C. It runs on Solaris and Linux, and now also on MacOS X. In addition, it is available for Windows users via the Cygwin platform. The central data structures are: perfect discrimination trees for the active facts; group-wise compressions for the passive ones; and sets of rewrite successors for the conjectures. Visit the Waldmeister Web pages at:

http://www.waldmeister.org

The approach taken to control the proof search is to choose the search parameters according to the algebraic structure given in the problem specification [HJL99]. This is based on the observation that proof tasks sharing major parts of their axiomatization often behave similarly. Hence, for a number of domains, the influence of different reduction orderings and heuristic assessments has been analyzed experimentally; in most cases it has been possible to identify a strategy uniformly superior on the whole domain. In essence, every such strategy consists of an instantiation of the first parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and an instantiation of the second parameter to one of the weighting functions
*addweight*, *gtweight*, or *mixweight*, which, when called on an equation *s* = *t*, return |*s*| + |*t*|, |max_>(*s*, *t*)|, or |max_>(*s*, *t*)| · (|*s*| + |*t*| + 1) + |*s*| + |*t*|, respectively, where |*s*| denotes the number of symbols in *s*.

We hope to finally outperform Waldmeister 702.

- AHL00
- Avenhaus J., Hillenbrand T., Löchner B. (2000),
**On Using Ground Joinable Equations in Equational Theorem Proving**, Baumgartner P., Zhang H.,*Proceedings of the 3rd International Workshop on First Order Theorem Proving*(St Andrews, Scotland), pp.33-43. - AHL03
- Avenhaus J., Hillenbrand T., Löchner B. (2003),
**On Using Ground Joinable Equations in Equational Theorem Proving**,*Journal of Symbolic Computation*36(1-2), pp.217-233, Elsevier Science. - BDP89
- Bachmair L., Dershowitz N., Plaisted D.A. (1989),
**Completion Without Failure**, Ait-Kaci H., Nivat M.,*Resolution of Equations in Algebraic Structures*, pp.1-30, Academic Press. - BG+92
- Bachmair L., Ganzinger H., Lynch C., Snyder W. (1992),
**Basic Paramodulation and Superposition**, Kapur D.,*Proceedings of the 11th International Conference on Automated Deduction*(Saratoga Springs, USA), pp.462-476, Lecture Notes in Artificial Intelligence 607, Springer-Verlag. - BT03
- Baumgartner P., Tinelli C. (2003),
**The Model Evolution Calculus**, Baader F.,*Proceedings of the 19th International Conference on Automated Deduction*(Miami, USA), pp.350-364, Lecture Notes in Artificial Intelligence 2741, Springer-Verlag. - BFT04
- Baumgartner P., Fuchs A., Tinelli C. (2004),
**Darwin - A Theorem Prover for the Model Evolution Calculus**, Sutcliffe G., Schulz S., Tammet T.,*Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning*(Cork, Ireland), Electronic Notes in Theoretical Computer Science. - Bil96
- Billon J-P. (1996),
**The Disconnection Method: A Confluent Integration of Unification in the Analytic Framework**, Miglioli P., Moscato U., Mundici D., Ornaghi M.,*Proceedings of TABLEAUX'96: the 5th Workshop on Theorem Proving with Analytic Tableaux and Related Methods*(Palermo, Italy), pp.110-126, Lecture Notes in Artificial Intelligence 1071, Springer-Verlag. - Bjö03a
- Björk M. (2003),
**Extending Stålmarck's Method to First Order Logic**, Cialdea Mayer M., Pirri F.,*TABLEAUX 2003 Position Papers and Tutorials*(Rome, Italy), pp.23-36, Dipartimento di Informatica e Automazione, Universita degli Studi di Roma Tre. - Bjö03b
- Björk M. (2003),
**Stålmarck's Method for Automated Theorem Proving in First Order Logic**, PhD thesis, Department of Computing Science, Chalmers University of Technology, Gothenburg, Sweden. - Bjö04
- Björk M. (2004),
**Adding Equivalence Classes to Stålmarck's Method in First Order Logic**,*IJCAR 2004 Doctoral Programme*(Cork, Ireland). - CS03
- Claessen K., Sorensson N. (2003),
**New Techniques that Improve MACE-style Finite Model Finding**, Baumgartner P., Fermueller C.,*Proceedings of the CADE-19 Workshop: Model Computation - Principles, Algorithms, Applications*(Miami, USA). - GB+94
- Geist A., Beguelin A., Dongarra J., Jiang W., Manchek R., and
Sunderam V. (1994),
**PVM: Parallel Virtual Machine: A Users Guide and Tutorial for Network Parallel Computing**, MIT Press. - HJL99
- Hillenbrand T., Jaeger A., Löchner B. (1999),
**Waldmeister - Improvements in Performance and Ease of Use**, Ganzinger H.,*Proceedings of the 16th International Conference on Automated Deduction*(Trento, Italy), pp.232-236, Lecture Notes in Artificial Intelligence 1632, Springer-Verlag. - HL02
- Hillenbrand T., Löchner B. (2002),
**The Next Waldmeister Loop**, Voronkov A.,*Proceedings of the 18th International Conference on Automated Deduction*(Copenhagen, Denmark), pp.486-500, Lecture Notes in Artificial Intelligence 2392, Springer-Verlag. - Hil03
- Hillenbrand T. (2003),
**Citius altius fortius: Lessons Learned from the Theorem Prover Waldmeister**, Dahn I., Vigneron L.,*Proceedings of the 4th International Workshop on First-Order Theorem Proving*(Valencia, Spain), Electronic Notes in Theoretical Computer Science 86.1, Elsevier Science. - LS01
- Letz R., Stenz G. (2001),
**Model Elimination and Connection Tableau Procedures**, Robinson A., Voronkov A.,*Handbook of Automated Reasoning*, pp.2015-2114, Elsevier Science. - LS02
- Letz R., Stenz G. (2002),
**Integration of Equality Reasoning into the Disconnection Calculus**, Fermüller C., Egly U.,*Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic Tableaux and Related Methods*(Copenhagen, Denmark), pp.176-190, Lecture Notes in Artificial Intelligence 2381, Springer-Verlag. - LS04
- Letz R., Stenz G. (2004),
**Generalised Handling of Variables in Disconnection Tableaux**, Rusinowitch M., Basin D.,*Proceedings of the 2nd International Joint Conference on Automated Reasoning*(Cork, Ireland), pp.289-306, Lecture Notes in Artificial Intelligence 3097. - Loe04a
- Loechner B. (2004),
**A Redundancy Criterion Based on AC Ground Reducibility**, Rusinowitch M., Basin D.,*Proceedings of the 2nd International Joint Conference on Automated Reasoning*(Cork, Ireland), pp.45-59, Lecture Notes in Artificial Intelligence 3097. - Loe04b
- Loechner B. (2004),
**What to know when implementing LPO**, Sutcliffe G., Schulz S., Tammet T.,*Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning*(Cork, Ireland), Electronic Notes in Theoretical Computer Science. - McC94
- McCune W.W. (1994),
**A Davis-Putnam Program and its Application to Finite First-Order Model Search: Quasigroup Existence Problems**, ANL/MCS-TM-194, Argonne National Laboratory, Argonne, USA. - McC01
- McCune W.W. (2001),
**MACE 2.0 Reference Manual and Guide**, ANL/MSC-TM-249, Argonne National Laboratory, Argonne, USA. - McC03a
- McCune W.W. (2003),
**Otter 3.3 Reference Manual**, ANL/MSC-TM-263, Argonne National Laboratory, Argonne, USA. - McC03b
- McCune W.W. (2003),
**Mace4 Reference Manual and Guide**, ANL/MSC-TM-264, Argonne National Laboratory, Argonne, USA. - MW97
- McCune W.W., Wos L. (1997),
**Otter: The CADE-13 Competition Incarnations**,*Journal of Automated Reasoning*18(2), pp.211-220. - MI+97
- Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J.,
Mayr K. (1997),
**SETHEO and E-SETHEO: The CADE-13 Systems**,*Journal of Automated Reasoning*18(2), pp.237-246. - MM+01
- Moskewicz M., Madigan C., Zhao Y., Zhang L., Malik S. (2001),
**Chaff: Engineering an Efficient SAT Solver**, Blaauw D., Lavagno L.,*Proceedings of the 39th Design Automation Conference*(Las Vegas, USA), pp.530-535. - New01
- Newborn M. (2001),
**Automated Theorem Proving: Theory and Practice**, Springer. - NR92
- Nieuwenhuis R., Rivero J.M. (1992),
**Basic Superposition is Complete**, Krieg-Brückner B.,*Proceedings of the 4th European Symposium on Programming*(Rennes, France), pp.371-390, Lecture Notes in Computer Science 582, Springer-Verlag. - RV01
- Riazanov A., Voronkov A. (2001),
**Vampire 1.1 (System Description)**, Gore R., Leitsch A., Nipkow T.,*Proceedings of the International Joint Conference on Automated Reasoning*(Siena, Italy), pp.376-380, Lecture Notes in Artificial Intelligence 2083, Springer-Verlag. - Sch01
- Schulz S. (2001),
**System Abstract: E 0.61**, Gore R., Leitsch A., Nipkow T.,*Proceedings of the International Joint Conference on Automated Reasoning*(Siena, Italy), pp.370-375, Lecture Notes in Artificial Intelligence 2083, Springer-Verlag. - Sch02
- Schulz S. (2002),
**E: A Brainiac Theorem Prover**,*AI Communications*15(2-3), pp.111-126. - Sch04a
- Schulz S. (2004),
**System Abstract: E 0.81**, Rusinowitch M., Basin D.,*Proceedings of the 2nd International Joint Conference on Automated Reasoning*(Cork, Ireland), pp.223-228, Lecture Notes in Artificial Intelligence 3097. - Sch04b
- Schulz S. (2004),
**Simple and Efficient Clause Subsumption with Feature Vector Indexing**, Sutcliffe G., Schulz S., Tammet T.,*Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning*(Cork, Ireland), Electronic Notes in Theoretical Computer Science. - SBP04
- Slaney J.K., Binas A., Price D. (2004),
**Guiding a Theorem Prover with Soft Constraints**,*Proceedings of the European Conference on Artificial Intelligence*(Valencia, Spain). - RV02
- Riazanov A., Voronkov A. (2002),
**The Design and Implementation of Vampire**,*AI Communications*15(2-3), pp.91-110. - SW99
- Stenz G., Wolf A. (1999),
**E-SETHEO: Design, Configuration and Use of a Parallel Automated Theorem Prover**, Foo N.,*Proceedings of AI'99: The 12th Australian Joint Conference on Artificial Intelligence*(Sydney, Australia), pp.231-243, Lecture Notes in Artificial Intelligence 1747, Springer-Verlag. - Ste02a
- Stenz G. (2002),
**DCTP 1.2 - System Abstract**, Fermüller C., Egly U.,*Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic Tableaux and Related Methods*(Copenhagen, Denmark), pp.335-340, Lecture Notes in Artificial Intelligence 2381, Springer-Verlag. - Ste02b
- Stenz G. (2002),
**The Disconnection Calculus**, PhD thesis, Institut für Informatik, Technische Universität München, Munich, Germany. - SZS04
- Sutcliffe G., Zimmer J., Schulz S. (2004),
**TSTP Data-Exchange Formats for Automated Theorem Proving Tools**, Sorge V., Colton S., Fisher M., Gow J.,*Distributed and Multi-Agent Reasoning*, Frontiers in Artificial Intelligence and Applications, IOS Press. - Tam97
- Tammet T. (1997),
**Gandalf**,*Journal of Automated Reasoning*18(2), pp.199-204. - Tam98
- Tammet T. (1998),
**Towards Efficient Subsumption**, Kirchner C., Kirchner H.,*Proceedings of the 15th International Conference on Automated Deduction*(Lindau, Germany), pp.427-440, Lecture Notes in Artificial Intelligence 1421, Springer-Verlag. - WGR96
- Weidenbach C., Gaede B., Rock G. (1996),
**SPASS and FLOTTER**, McRobbie M., Slaney J.K.,*Proceedings of the 13th International Conference on Automated Deduction*(New Brunswick, USA), pp.141-145, Lecture Notes in Artificial Intelligence 1104, Springer-Verlag. - Zha96
- Zhang J. (1996),
**Constructing Finite Algebras with FALCON**,*Journal of Automated Reasoning*17(1), pp.1-22.