Entrants' System Descriptions
CHewTPTP 1.0
Eric McGregor
Clarkson University, USA
Architecture
ChewTPTP version 1.0 implements the method described in
[BK+08,DH+07].
ChewTPTP transforms a set of first order clauses into a propositional encoding
(modulo recursive type theory) of the existence of a rigid first order
connection tableau and the satisfiability of unification constraints, which
is then fed to Yices.
For the unification constraints, terms are represented as recursive datatypes,
and unification constraints are equations on terms.
Additional instances of the first order clauses may be added for the
non-rigid case.
Strategies
The strategy is as follows:
- Encode the existence of a rigid connection tableau with unification
constraints.
- Run SMT solver on encoding.
- If the solver returns satisfiable, we verify that the model represents
a rigid tableau.
If it does we return unsatisfiable, otherwise we add additional clauses
and goto 2.
- If the solver returns unsatisfiable we add additional instances of the
initial clauses to the problem and goto 1.
Implementation
ChewTPTP is written in C++ and uses Yices SMT solver.
Yices requires GMP, the GNU Multiprecision Library.
Expected Competition Performance
The system is a demonstration of the method described above and is not
expected to win its division.
Darwin 1.3
Peter Baumgartner^{1},
Alexander Fuchs^{2},
Cesare Tinelli^{2}
^{1}National ICT Australia, Australia,
^{2}The University of Iowa, USA
Architecture
Darwin [BFT04,BFT06a]
is an automated theorem prover for first order clausal logic.
It is an implementation of the Model Evolution Calculus
[BT03].
The Model Evolution Calculus lifts the propositional DPLL procedure to
first-order logic. One of the main motivations for this approach
was the possibility of migrating to the first-order level some of
those very effective search techniques developed by the SAT community
for the DPLL procedure.
The current version of Darwin implements first-order versions of unit
propagation inference rules analogously to a restricted form of unit
resolution and subsumption by unit clauses.
To retain completeness, it includes a first-order version of the (binary)
propositional splitting inference rule.
Proof search in Darwin starts with a default interpretation for a
given clause set, which is evolved towards a model or until a
refutation is found.
Strategies
Darwin traverses the search space by iterative deepening over the term
depth of candidate literals.
Darwin employs a uniform search strategy for all problem classes.
Implementation
The central data structure is the context.
A context represents an interpretation as a set of first-order literals.
The context is grown by using instances of literals from the input clauses.
The implementation of Darwin is intended to support basic operations
on contexts in an efficient way.
This involves the handling of large sets of candidate literals for modifying
the current context.
The candidate literals are computed via simultaneous unification between
given clauses and context literals.
This process is sped up by storing partial unifiers for each given clause
and merging them for different combinations of context literals, instead of
redoing the whole unifier computations.
For efficient filtering of unneeded candidates against context literals,
discrimination tree or substitution tree indexing is employed.
The splitting rule generates choice points in the derivation which are
backtracked using a form of backjumping similar to the one used in
DPLL-based SAT solvers.
Improvements to the previous version include additional preprocessing steps,
less memory requirements, and lemma learning [BFT06b].
Darwin is implemented in OCaml and has been tested under various Linux
distributions.
It is available from:
http://goedel.cs.uiowa.edu/Darwin/
Expected Competition Performance
Darwin 1.3 is the CASC-21 EPR division winner.
E 1.0pre and EP 1.0pre
Stephan Schulz
Institut für Informatik, Technische Universität, Germany
Architecture
E 1.0pre [Sch02,Sch04a]
is a purely equational theorem prover. The core proof procedure operates on
formulas in clause normal form, using a calculus that combines
superposition (with selection of negative literals) and rewriting. No
special rules for non-equational literals have been implemented, i.e.,
resolution is simulated via paramodulation and equality
resolution. The basic calculus is extended with rules for AC
redundancy elimination, some contextual simplification, and
pseudo-splitting with definition caching. The latest versions of E
also supports simultaneous paramodulation, either for all inferences
or for selected inferences.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering. The prover supports a large number of preprogrammed literal
selection strategies, many of which are only experimental. Clause
evaluation heuristics can be constructed on the fly by combining
various parameterized primitive evaluation functions, or can be
selected from a set of predefined heuristics. Supported term orderings
are several parameterized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO).
The prover uses a preprocessing step to convert formulas in full first
order format to clause normal form. This step may introduce
(first-order) definitions to avoid an exponential growth of the
formula. Preprocessing also unfolds equational definitions and
performs some simplifications on the clause level.
The automatic mode determines literal selection strategy, term
ordering, and search heuristic based on simple problem characteristics
of the preprocessed clausal problem.
EP 1.0pre is just a combination of E 1.0pre in verbose mode and
a proof analysis tool extracting the used inference steps.
Strategies
E has been optimized for performance over the TPTP. The automatic mode
of E 1.0pre is inherited from previous version and is based on
about 90 test runs over TPTP 3.1.1. It consists of the selection of
one of about 40 different strategies for each problem. All test runs
have been performed on SUN Ultra 60/300 machines with a time limit of
300 seconds (or roughly equivalent configurations). All individual
strategies are refutationally complete. The worst one solves about 49%
of TPTP 3.0.1, the best one about 60%.
E distinguishes problem classes based on a number of features, all of
which have between 2 and 4 possible values. The most important ones
are:
- Is the most general non-negative clause unit, Horn, or Non-Horn?
- Is the most general negative clause unit or non-unit?
- Are all negative clauses unit clauses?
- Are all literals equality literals, are some literals equality
literals, or is the problem non-equational?
- Are there a few, some, or many clauses in the problem?
- Is the maximum arity of any function symbol 0, 1, 2, or greater?
- Is the sum of function symbol arities in the signature small,
medium, or large?
Wherever there is a three-way split on a numerical feature value,
the limits are selected automatically with the aim of
splitting the set of problems into approximately equal
sized parts based on this one feature.
For classes above a threshold size, we assign the absolute best
heuristic to the class. For smaller, non-empty classes, we assign the
globally best heuristic that solves the same number of problems on
this class as the best heuristic on this class does. Empty classes are
assigned the globally best heuristic. Typically, most selected
heuristics are assigned to more than one class.
Implementation
E is implemented in ANSI C, using the GNU C compiler. At the core is a
implementation of aggressively shared first-order terms in a term
bank data structure. Based on this, E supports the global sharing
of rewrite steps. Rewriting is implemented in the form of rewrite
links from rewritten to new terms. In effect, E is caching
rewrite operations as long as sufficient memory is available. Other
important features are the use of perfect discrimination
trees with age and size constraints for rewriting and
unit-subsumption, feature vector
indexing [Sch04b] for forward- and
backward subsumption and contextual literal cutting, and a new
polynomial implementation of LPO [Loe04].
The program has been successfully installed under SunOS 4.3.x,
Solaris 2.x, HP-UX B 10.20, MacOS-X, and various
versions of Linux. Sources of the latest released version are
available freely from:
http://www.eprover.org
EP 1.0pre is a simple Bourne shell script calling E and the
postprocessor in a pipeline.
Expected Competition Performance
In the last years, E performed well in most proof categories. We
believe that E will again be among the stronger provers in the FOF and
CNF categories. We hope that E will at least be a useful complement to
dedicated systems in the other categories.
EP 1.0p will be hampered by the fact that it has to analyse the
inference step listing, an operation that typically is about as
expensive as the proof search itself. Nevertheless, it should be
competitive in the FOF proof class.
E-Darwin 1.0
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-Darwin 1.0 is an automated theorem prover for first order clausal logic
with equality.
It is a modified version of the Darwin prover [BFT04],
and it implements the Model Evolution Calculus with Equality
[BT05].
The general operation of the original Darwin has been extended by inferences
for handling equality, specifically a paramodulation and a reflexivity
inference.
Whereas the original Darwin derived units only, the new inferences in
E-Darwin allow the derivation of multi-literal clauses.
A more extensive detection and handling of redundancy has been added as well.
Strategies
The uniform search strategy is identical to the one employed in the
original Darwin.
Implementation
E-Darwin is implemented in the functional/imperative language OCaml.
The system has been tested on Unix and is available under the GNU Public
License from the E-Darwin website at:
http://www.uni-koblenz.de/~bpelzer/edarwin
Darwin's method of storing partial unifiers has been adapted to equations
and subterm positions for the paramodulation inference in E-Darwin.
A combination of perfect and non-perfect discrimination tree indexes is
used to store the context and the clauses.
Expected Competition Performance
The original Darwin has performed very well in the previous years, but
since E-Darwin differs significantly from the original, it should be
regarded as a new system.
Very little testing has been done so far, making it difficult to estimate
the performance.
E-KRHyper 1.1
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-KRHyper 1.1 [PW07] is a theorem proving and
model generation system for first-order logic with equality.
It is an implementation of the E-hyper tableau calculus
[BFP07], which integrates a superposition-based
handling of equality [BG98] into the hyper
tableau calculus [BFN96].
The system is an extension of the KRHyper theorem prover
[Wer03], which implements the original hyper tableau
calculus.
An E-hyper tableau is a tree whose nodes are labeled with clauses and which is
built up by the application of the inference rules of the E-hyper tableau
calculus.
The calculus rules are designed such that most of the reasoning is performed
using positive unit clauses.
A branch can be extended with new clauses that have been derived from the
clauses of that branch.
A positive disjunction can be used to split a branch, creating a new branch for
each disjunct.
No variables may be shared between branches, and if a case-split creates
branches with shared variables, then these are immediately substituted
by ground terms.
The grounding substitution is arbitrary as long as the terms in its range
are irreducible: the branch being split may not contain a positive equational
unit which can simplify a substituting term, i.e., rewrite it with one that
is smaller according to a reduction ordering.
When multiple irreducible substitutions are possible, each of them must be
applied in consecutive splittings in order to preserve completeness.
Redundancy rules allow the detection and removal of clauses that are redundant
with respect to a branch.
The hyper extension inference from the original hyper tableau calculus is
equivalent to a series of E-hyper tableau calculus inference applications.
Therefore the implementation of the hyper extension in KRHyper by a
variant of semi-naive evaluation [Ull89] is retained
in E-KRHyper, where it serves as a shortcut inference for the resolution of
non-equational literals.
Strategies
E-KRHyper uses a uniform search strategy for all problems.
The E-hyper tableau is generated depth-first, with E-KRHyper always working
on a single branch.
Refutational completeness and a fair search control are ensured by an
iterative deepening strategy with a limit on the maximum term weight of
generated clauses.
Implementation
E-KRHyper is implemented in the functional/imperative language OCaml, and
Runs on Unix and MS-Windows platforms.
The system accepts input in the TPTP format and in the TPTP-supported
Protein format.
The calculus implemented by E-KRHyper works on clauses, so first order
formula input is converted into CNF by an algorithm which follows the
transformation steps as used in the clausification of Otter
[McC94].
E-KRHyper operates on an E-hyper tableau which is represented by linked
node records.
Several layered discrimination-tree based indexes (both perfect and
non-perfect) provide access to the clauses in the tableau and support
backtracking.
E-KRHyper is available under the GNU Public License from:
http://www.uni-koblenz.de/~bpelzer/ekrhyper
Expected Competition Performance
Most work on E-KRHyper since version 1.0 has been concerning its interfaces
and embedding in natural language processing applications, so only a minor
improvement in comparison to last year's performance can be expected.
Equinox 3.0
Koen Claessen
Chalmers University of Technology, Sweden
Architecture
Equinox is an experimental theorem prover for pure first-order logic
with equality. It finds ground proofs of the input theory, by solving
successive ground instantiations of the theory using an incremental
SAT-solver. Equality is dealt with using a Nelson-Oppen framework.
Implementation
The main part of Equinox is implemented in Haskell using the GHC
compiler. Equinox also has a built-in incremental SAT solver (MiniSat)
which is written in C++. The two parts are linked together on the
object level using Haskell's Foreign Function Interface.
Strategies
There is only one strategy in Equinox:
- Give all ground clauses in the problem to a SAT solver.
- Run the SAT-solver.
- If a contradiction is found, we have a proof and we terminate.
- If a model is found, we use the model to indicate which new
ground instantiations should be added to the SAT-solver.
- Goto 2.
Expected Competition Performance
Equinox should perform reasonably well. There should be problems that
it can solve that few other provers can handle.
iProver 0.5
Konstantin Korovin
The University of Manchester, England
Architecture
iProver is an automated theorem prover which is based on an instantiation
calculus Inst-Gen [GK03,
Kor08a], complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat for reasoning with
ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution, see
[Kor08b] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
We use non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented:
blocking non-proper instantiations; dismatching constraints
[GK04,Kor08b];
global subsumption [Kor08b]; resolution-based
simplifications and propositional-based simplifications.
Equality is dealt with (internally) by adding the necessary axioms of equality.
iProver has a satisfiablity mode which includes a finite model finder,
based on an adaptation of ideas from DarwinFM and Paradox to the iProver
setting.
Strategies
iProver v0.5 has around 40 options to control the proof
search including options for literal selections,
passive clause selections, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At the CASC competition iProver
will execute a fixed schedule of selected options.
For the LTB division a small script partitions
the time available to the given problem set and runs
iProver with the allocated time limit, and
time is repartitioned after each solved problem.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts cnf and fof formats.
In the case of fof format, either Vampire or E prover is used for
clausification.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
The instantiation method behind iProver is a decision procedure for the
EPR class, and we expect a good performance in this class.
We also expect a reasonable performance in FOF and CNF classes.
MaLARea 0.3
Josef Urban
Charles University in Prague, Czech Republic
Architecture
MaLARea 0.3 [Urb07,
US+08] is a metasystem for ATP in large theories
where symbol and formula names are used consistently.
It uses several deductive systems (now E,SPASS,Paradox,Mace), as well
as complementary AI techniques like machine learning (the SNoW system)
based on symbol-based similarity, model-based similarity, term-based
similarity, and obviously previous successful proofs.
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use the most
relevant axioms for next ATP attempts.
This is iterated, using different timelimits and axiom limits.
Various features are used for learning, and the learning is complemented
by other criteria like model-based reasoning, symbol and term-based
similarity, etc.
Implementation
The metasystem is implemented in ca. 2500 lines of Perl.
It uses many external programs - the above mentioned ATPs and machine
learner, TPTP utilities, LADR utilities for work with models, and some
standard Unix tools.
MaLARea is available at
http://kti.ms.mff.cuni.cz/cgi-bin/viewcvs.cgi/MPTP2/MaLARea/
The metasystem's Perl code is released under GPL2.
Expected Competition Performance
Thanks to machine learning, MaLARea is strongest on batches of many
related problems with many redundant axioms where some of the problems
are easy to solve and can be used for learning the axiom relevance.
MaLARea is not very good when all problems are too difficult (nothing to
learn from), or the problems (are few and) have nothing in common.
Some of its techniques (selection by symbol and term-based similarity,
model-based reasoning) could however make it even there slightly stronger
than standard ATPs.
MaLARea has a very good performance on the MPTP Challenge, which is a
predecessor of the LTB division.
MetaProver 1.0
Matthew Streeter
Carnegie Mellon University, USA
Architecture
MetaProver is a hybridization of solvers entered in last year's CASC
competition.
When given a problem instance, MetaProver runs one or more solvers subject to
time limits, according to a fixed schedule.
The following is an example of such a schedule.
- Run E 0.999 for 0.01 seconds.
- Run Geo 2007f for 0.02 seconds.
- Run Metis 2.0 for 0.01 seconds.
- Run E 0.999 for 0.03 seconds.
- Run Geo 2007f for 0.06 seconds.
- Run Paradox 2.2 for 0.3 seconds.
- Run E 0.999 for 0.07 seconds.
- Run Paradox 2.2 for 1.48 seconds.
- Run Metis 2.0 for 0.43 seconds.
- Run Paradox 2.2 for 3.34 seconds.
- Run iProver 0.2 for 32.75 seconds.
- Run E 0.999 for 164.54 seconds.
Note that the average time such a schedule requires to solve a problem instance
can be much lower than that of any of the algorithms used in the schedule.
To compute an effective schedule, the solvers entered in last year's CASC
competition were run on all the TPTP-v3.4.1 benchmark instances in the SAT
and FNT divisions, subject to a five minute time limit per instance (the SAT
and FNT divisions were chosen because preliminary experiments indicated that
a scheduling approach would work well in those divisions).
Using the timing data obtained from these runs, a schedule was then computed
that, if run on each benchmark instance, would solve the instances in the
minimum average time.
Two schedules were computed, one for the FNT instances and one for the SAT
instances.
The schedules were computed using the greedy approximation algorithm described
in [SGS07].
When executing each step of a schedule, we run the specified solver from
scratch and terminate it once it has run for the specified amount of time.
Alternatively, we could suspend the solver (rather than terminate it) at
the end of the step and resume the solver if and when it is used by a later
step.
We took the former approach for simplicity, and because we were concerned
about having multiple solvers in memory at the same time.
Because the time required to solve a particular benchmark instance varies
across machines, a schedule must be calibrated for use on a particular machine.
This calibration is performed as part of MetaProver's installation script.
The example schedule shown above is the schedule used by MetaProver for
instances in the FNT division, calibrated for execution on an Intel
Xeon 3.6 GHz machine with 4 GB of memory.
Strategies
The strategies used by MetaProver include those used by its component
algorithms: DarwinFM 1.4.1, E 0.999, Geo 2007f, iProver 0.2, Metis 2.0,
Paradox 1.3, and Paradox 2.2.
At a higher level, MetaProver adopts the strategy of running its component
algorithms according to a schedule derived from the algorithms' performance
on the relevant TPTP benchmark instances, as described in the previous
section.
This schedule is computed using performance data for a large number of
benchmark instances, and thus it is reasonable to expect the schedule's
performance to generalize well to new, previously unseen instances.
Implementation
MetaProver is implemented as a set of bash scripts, and runs on Linux.
It is available online at:
http://www.cs.cmu.edu/~matts/MetaProver
Expected Competition Performance
MetaProver outperforms last year's SAT division winner (Paradox 1.3) and
last year's FNT division winner (Paradox 2.2) on the relevant TPTP benchmarks.
Metis 2.1
Joe Hurd
Galois, Inc., USA
Architecture
Metis 2.1 [Hur03] is a proof tactic used in the
HOL4 interactive theorem
prover. It works by converting a higher order logic goal to a set of
clauses in first order logic, with the property that a refutation of
the clause set can be translated to a higher order logic proof of the
original goal.
Experiments with various first order calculi
[Hur03] have shown a given clause
algorithm and ordered resolution to best suit this application,
and that is what Metis 2.1 implements.
Since equality often appears in interactive theorem prover goals,
Metis 2.1 also implements the ordered paramodulation
calculus.
Strategies
Metis 2.1 uses a fixed strategy for every input problem.
Negative literals are always chosen in favour of positive literals, and
terms are ordered using the Knuth-Bendix ordering with uniform symbol
weight and precedence favouring reduced arity.
Implementation
Metis 2.1 is written in Standard ML, for ease of integration with
HOL4. It uses indexes for resolution, paramodulation, (forward)
subsumption and demodulation. It keeps the Active clause set
reduced with respect to all the unit equalities so far derived.
In addition to standard size and distance measures, Metis 2.1 uses
finite models to weight clauses in the Passive set. When
integrated with higher order logic, a finite model is manually
constructed to interpret standard functions and relations in such a
way as to make many axioms true and negated goals false. Non-standard
functions and relations are interpreted randomly, but with a bias towards
making negated goals false. Since it is part of
the CASC competition rules that standard functions and relations are
obfuscated, Metis 2.1 will back-off to interpreting all functions and
relations randomly (except equality), using a finite model with 4 elements.
Metis 2.1 reads problems in TPTP format and outputs detailed proofs in
TSTP format, where each proof step is one of 6 simple inference rules.
Metis 2.1 implements a complete calculus, so when the set of clauses is
saturated it can soundly declare the input problem to be unprovable (and
outputs the saturation set).
Metis 2.1 is free software, released under the GPL. It can be
downloaded from:
http://www.gilith.com/software/metis
Expected Competition Performance
The major change between Metis 2.0, which was entered into CASC-21, and
Metis 2.1 is the TSTP proof format. There were only minor changes to the
core proof engine, so Metis 2.1 is expected to perform at approximately
the same level and end up in the lower half of the table.
Muscadet 3.0
Dominique Pastre
Université René Descartes Paris‑5, France
Architecture
The MUSCADET theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of
[Ble71] and
[Pas78],
and uses methods which resembles those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
Strategies
There are specific strategies for existential, universal, conjonctive or
disjunctive hypotheses and conclusions.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one) or
refutation (only if the conclusion is a negation).
The system is also able to work with second order statements. It
may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and
[Pas93]. These two possibilities
are not used while working with the TPTP Library.
Implementation
Muscadet 3.0 [Pas01] is implemented in SWI-Prolog.
Rules are written as declarative Prolog clauses.
Metarules are written as sets of Prolog clauses, more or less declarative.
The inference engine includes the Prolog interpreter and some procedural Prolog
clauses.
Proofs are given in natural style
(for each step, that is for each action or rule application,
the system gives the new fact, the precedent facts its comes from
and an explanation).
Muscadet 3.0 is available from:
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
Expected Competition Performance
The best performances of Muscadet will be for problems
manipulating many concepts in which all statements (conjectures,
definitions, axioms) are expressed in a manner similar to the
practice of humans, especially of mathematicians
[Pas02,Pas07].
It will have poor performances for problems using few concepts but large
and deep formulas leading to many splittings.
Muscadet 3.0 will probably have the same performances as Muscadet 2.7a
(2007 CASC-21 version + bugfixed),
but will give an out proof for most solved problems.
OSHL-S 0.1
Hao Xu, David Plaisted
University of North Carolina at Chapel Hill, USA
Architecture
OSHL-S is a theorem prover based on the architecture and strategy
introduced in [PZ00] with a few improvements.
A preliminary form of type inference is employed to reduce the number of
instances that are generated before a contradicting instance is found.
Strategies
OSHL-S employs a uniform strategy for all problems, which is
similar to OSHL:
It starts with a all negative or all positive model.
In each iteration, it finds a contradicting instance according to a
relaxed ordering and type information, and then modifies the model to
make the instance satisfiable, if possible.
Implementation
OSHL-S is implemented in Java using Java SE 6 SDK and the Netbeans IDE.
The profiler and the GUI provided in Java SE 6 and the Netbeans
IDE are employed to profile the system, which provide performance
data that are used for optimization.
The implementation of OSHL-S combines a few strategies for improving the
performance of the system, including caching of terms, clauses, and solutions
to integer programming problems, lazy-evaluation, and backtracking.
Expected Competition Performance
The performance should be good for near propositional problems;
otherwise, the competition performance is unknown.
Otter 3.3
William McCune
Argonne National Laboratory, USA
Architecture
Otter 3.3 [McC03a] is an ATP system for
statements in first-order (unsorted) logic with equality.
Otter is based on resolution and paramodulation applied to clauses.
An Otter search uses the "given clause algorithm", and typically involves
a large database of clauses; subsumption and demodulation play an important
role.
Strategies
Otter's original automatic mode, which reflects no tuning to the TPTP
problems, will be used.
Implementation
Otter is written in C.
Otter uses shared data structures for clauses and terms, and it uses
indexing for resolution, paramodulation, forward and backward subsumption,
forward and backward demodulation, and unit conflict.
Otter is available from:
http://www-unix.mcs.anl.gov/AR/otter/
Expected Competition Performance
Otter has been entered into CASC as a stable benchmark against which
progress can be judged (there have been only minor changes to Otter since
1996 [MW97], nothing that really affects
its performace in CASC).
This is not an ordinary entry, and we do not hope for Otter to do well
in the competition.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and
Rusty Lusk contributed to the development of Otter.
Paradox 1.3
Koen Claessen, Niklas Sörensson
Chalmers University of Technology and
Gothenburg University, Sweden
Architecture
Paradox [CS03] is a
finite-domain model generator. It is based on a MACE-style
[McC94] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
Paradox incorporates the following features: Polynomial-time
clause splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
The main differences with Paradox 1.0 are: a better SAT-solver,
better memory behaviour, and a faster clause instantiation algorithm.
Strategies
There is only one strategy in Paradox:
- Analyze the problem, finding an upper bound N on the domain size of
models, where N is possibly infinite.
A finite such upper bound can be found, for example, for EPR problems.
- Flatten the problem, and split clauses and simplify as much as possible.
- Instantiate the problem for domain sizes 1 up to N, applying the SAT
solver incrementally for each size.
Report "SATISFIABLE" when a model is found.
- When no model of sizes smaller or equal to N is found, report
"CONTRADICTION".
In this way, Paradox can be used both as a model finder and as an EPR solver.
Implementation
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Expected Competition Performance
Paradox 1.3 is the CASC-21 SAT division winner.
Paradox 2.2 and 3.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology, Sweden
Architecture
Paradox 2.2 is a rewrite of Paradox 1.3.
Paradox 2.2 does not have all
the features yet that Paradox 1.3 has. Some experimental features,
such as type-based model finding, have been added.
Paradox 3.0 has the same description as Paradox 2.2.
See the description of Paradox 1.3 for
general information.
Expected Competition Performance
Paradox 2.2 is the CASC-21 FNT division winner.
randoCoP 1.1
Jens Otten, Thomas Raths
University of Potsdam, Germany
Architecture
randoCoP [RO08] is an automated theorem prover
for classical first-order logic. It is an extension of
the leanCoP [Ott08a,OB03]
prover, which is a very compact implementation of the connection calculus.
It integrates a technique that randomly alters the proof
search order by reordering the axioms of the given problem and
the literals within its clausal form.
Strategies
The shell script of randoCoP repeatedly reorders the
axioms and the literals within the clausal form, before
invoking the two most successful variants of the leanCoP 2.0
core prover. These variants use restricted backtracking
[Ott08b] and benefit significantly from the
reordering techniques used by randoCoP.
Implementation
randoCoP is implemented in Prolog. The essential
extensions consist of a modified shell script, which
is used to invoke the leanCoP 2.0 core prover,
and a few predicates that realize the reordering of axioms
and literals.
A preprocessing component translates first-order formulas
into a (definitional) clausal form.
Equality can be handled by adding the equality axioms.
Version 1.1 of randoCoP returns a compact connection
proof, which is then translated into a readable proof.
The source code of randoCoP is available at
http://www.leancop.de
Expected Competition Performance
The performance of randoCoP is in particular good
on problems involving large theories, i.e., problems
that contain a large number of axioms.
SInE 0.3 and SInE-VD 0.3
Krystof Hoder
Charles University in Prague, Czech Republic
Architecture
SiNE 0.3 is an axiom selection system for first order theories.
It uses a syntactic approach based on symbols presence in axioms and
conjecture.
(When we say symbols, we mean functional, predicate and constant symbols
taken together.)
A relation D (as in "Defines") is created between symbols and axioms which
represents the fact that for a symbol there are some axioms that "give it
its meaning".
When the relation is constructed, the actual axiom selection starts.
At the beginning only the conjecture is selected, in each iteration the
selection is extended by all axioms that are D-related to any symbol used
in already selected axioms.
The iteration goes until no more axioms are selected.
Then the selected axioms are handed to an underlying inference engine.
Strategies
The construction of the D relation is inspired by the idea that general
symbols are more likely to define the meaning of more specific axioms than
vice-versa.
So given a generality measure on symbols, SiNE puts each axiom into the
relation with the least general of its symbols.
(When there are more of them, all are put in the relation.)
The generality measure used is the number of axioms in which the symbol
occurs.
(General symbols as s_Entity are likely to be used more often
than specific symbols like s_Monday.)
One slight optimisation for SUMO problems is also used: When we run into
a symbol ending with "_M" we remove the suffix for the selection process.
The "_M" suffix is used when a predicate symbol should be used as functional.
This strategy selects about 2% of axioms on problems CSR(075-109).
Implementation
The axiom selection is implemented in Python.
At first all problem files are read and include directives are extracted,
then problems which include the same sets of axioms are grouped together.
After that an iteration over phases starts.
Each phase defines one proving attempt on each problem.
It specifies the amount of time (relative to remaining time and unsolved
problem count), whether axioms from other successful proofs (in the same
group of problems) are included, and benevolence.
Benevolence greater than 1 means that not only the least general symbols can
be D-related to their formulas.
In each phase groups are iterated through, in each group the axioms which
would be included by problems are loaded and preprocessed, constructing the
D relation.
(This takes most of time of the whole axiom selection.)
Then for each problem a set of axioms is selected based on all symbols that
occur in the problem file, and an underlying prover is called.
There are two underlying provers supported: E and Vampire9.
E will be used in the competition division and Vampire9 in the demonstration
division (as the usage of Vampire in the competition division was not
allowed by its developers).
SInE is available from:
http://www.ms.mff.cuni.cz/~hodek4am/sine.html
Expected Competition Performance
The batch mode was run on the problems from CSR[075-109]+[1-3].p whose status
was Theorem.
Using Vampire SInE proved 67 of 69 problems with an average time 130 seconds
per problem.
Using E, 64 problems were proved which took 20 seconds per problem.
Vampire 8.1
Andrei Voronkov
University of Manchester, England
No system description supplied.
However, see the
description of Vampire 8.0 for general information.
Minor changes have been made, including a bugfix in the FOF to
CNF conversion.
Expected Competition Performance
Vampire 8.1 is the CASC-21 CNF division winner.
Vampire 9.0
Andrei Voronkov
University of Manchester, England
None supplied.
Expected Competition Performance
Vampire 9.0 is the CASC-21 FOF division winner.
Waldmeister 806
Thomas Hillenbrand^{1}, Bernd Löchner^{2}
^{1}Max-Planck-Institut für Informatik Saarbrücken, Germany
^{2}Technische Universität Kaiserslautern, Germany,
No system description supplied.
However, see the
description of Waldmeister 704 for general information.
Expected Competition Performance
Waldmeister 806 is the CASC-21 UEQ division winner.
Zenon 0.5.0
Damien Doligez
INRIA, France
Architecture
Zenon 0.5.0 [BDD07] is based on the tableau method
with free variables.
It uses a nondestructive way of handling free variables, which enables
a purely local search procedure: each branch is closed before the
next one is explored.
Zenon outputs totally formal proofs that can be checked by Coq.
Strategies
Implementation
Zenon is written in Objective Caml.
It can be downloaded from:
http://focal.inria.fr/zenon
Expected Competition Performance
Zenon is still in the prototype stage and we don't really expect
brilliant results at this point.
References
- BG98
- Bachmair L., Ganzinger H. (1998),
Equational Reasoning in Saturation-Based Theorem Proving,
Bibel W., Schmitt P.H.,
Automated Deduction, A Basis for Applications,
pp.352-397,
Applied Logic Series 10,
Kluwer Academic Publishers.
- BFN96
- Baumgartner P., Furbach U., Niemelä I. (1996),
Hyper Tableaux,
Alferes J., Pereira L., Orlowska E.,
Proceedings of JELIA'96: European Workshop on Logic in Artificial
Intelligence
(Evora, Portugal),
pp.1-17,
Lecture Notes in Artificial Intelligence 1126,
Springer-Verlag.
- BT03
- Baumgartner P., Tinelli C. (2003),
The Model Evolution Calculus,
Baader F.,
Proceedings of the 19th International Conference on Automated
Deduction
(Miami, USA),
pp.350-364,
Lecture Notes in Artificial Intelligence 2741,
Springer-Verlag.
- BFT04
- Baumgartner P., Fuchs A., Tinelli C. (2004),
Darwin - A Theorem Prover for the Model Evolution
Calculus,
Sutcliffe G., Schulz S., Tammet T.,
Proceedings of the Workshop on Empirically Successful First Order
Reasoning, 2nd International Joint Conference on Automated Reasoning
(Cork, Ireland).
- BT05
- Baumgartner P., Tinelli C. (2005),
The Model Evolution Calculus with Equality,
Nieuwenhuis R.,
Proceedings of the 20th International Conference on Automated
Deduction
(Tallinn, Estonia),
pp.392-408,
Lecture Notes in Artificial Intelligence 3632,
Springer-Verlag.
- BFT06a
- Baumgartner P., Fuchs A., Tinelli C. (2006),
Implementing the Model Evolution Calculus,
International Journal on Artificial Intelligence Tools 15(1),
pp.21-52.
- BFT06b
- Baumgartner P., Fuchs A., Tinelli C. (2006),
Lemma Learning in the Model Evolution Calculus,
Hermann M., Voronkov A.,
Proceedings of the 13th International Conference on Logic for
Programming, Artificial Intelligence, and Reasoning
(Phnom Penh, Cambodia),
Lecture Notes in Artificial Intelligence.
- BFP07
- Baumgartner P., Furbach U., Pelzer B. (2007),
Hyper Tableaux with Equality,
Pfenning F.,
Proceedings of the 21st International Conference on Automated
Deduction
(Bremen, Germany),
pp.492-507,
Lecture Notes in Artificial Intelligence 4603,
Springer-Verlag.
- Ble71
- Bledsoe W.W. (1971),
Splitting and Reduction Heuristics in Automatic Theorem
Proving,
Artificial Intelligence 2,
pp.55-77.
- BK+08
- Bongio J., Katrak C., Lin H., Lynch C., McGregor R. (2008),
Encoding First Order Proofs in SMT,
Krstic S., Oliveras A.,
Proceedings of the 5th International Workshop on Satisfiability
Modulo Theories
(Berlin, Germany),
pp.71-84,
Electronic Notes in Theoretical Computer Science 198(2).
- BDD07
- Bonichon R., Delahaye D., Doligez D. (2007),
Zenon : An Extensible Automated Theorem Prover Producing Checkable
Proofs,
Dershowitz N., Voronkov A.,
Proceedings of the 14th International Conference on Logic for
Programming, Artificial Intelligence, and Reasoning
(Yerevan, Armenia),
pp.151-165,
Lecture Notes in Artificial Intelligence 4790.
- CS03
- Claessen K., Sorensson N. (2003),
New Techniques that Improve MACE-style Finite Model
Finding,
Baumgartner P., Fermueller C.,
Proceedings of the CADE-19 Workshop: Model Computation - Principles,
Algorithms, Applications
(Miami, USA).
- DH+07
- Deshane T., Hu W., Jablonski P., Lin H., Lynch C., McGregor R. (2007),
Encoding First Order Proofs in SAT,
Pfenning F.,
Proceedings of the 21st International Conference on Automated
Deduction
(Bremen, Germany),
pp.476-491,
Lecture Notes in Artificial Intelligence 4603,
Springer-Verlag.
- GK03
- Ganzinger H., Korovin K. (2003),
New Directions in Instantiation-Based Theorem Proving,
Kolaitis P.,
Proceedings of the 18th IEEE Symposium on Logic in Computer
Science
(Ottawa, Canada),
pp.55-64.
- GK04
- Ganzinger H., Korovin K. (2004),
Integrating Equational Reasoning into Instantiation-Based Theorem
Proving,
Marcinkowski J., Tarlecki A.,
Proceedings of the 18th International Workshop on Computer Science
Logic, 13th Annual Conference of the EACSL
(Karpacz, Poland),
pp.71-84,
Lecture Notes in Computer Science 3210.
- Hur03
- Hurd J. (2003),
First-Order Proof Tactics in Higher-Order Logic Theorem
Provers,
Archer M., Di Vito B., Munoz C.,
Proceedings of the 1st International Workshop on Design and
Application of Strategies/Tactics in Higher Order Logics
(Rome, Italy),
pp.56-68,
NASA Technical Reports NASA/CP-2003-212448.
- Kor08a
- Korovin K. (2008),
An Invitation to Instantiation-Based Reasoning: From Theory to
Practice,
Podelski A., Voronkov A., Wilhelm R.,
Volume in Memoriam of Harald Ganzinger,
Springer-Verlag.
- Kor08b
- Korovin K. (2008),
iProver - An Instantiation-Based Theorem Prover for First-order
Logic (System Description),
Baumgartner P., Armando A., Gilles D.,
Proceedings of the 4th International Joint Conference on Automated
Reasoning
(Sydney, Australia),
Lecture Notes in Artificial Intelligence.
- Loe04
- Loechner B. (2004),
What to Know When Implementing LPO,
Sutcliffe G., Schulz S., Tammet T.,
Proceedings of the Workshop on Empirically Successful First Order
Reasoning, 2nd International Joint Conference on Automated Reasoning
(Cork, Ireland).
- MW97
- McCune W.W., Wos L. (1997),
Otter: The CADE-13 Competition Incarnations,
Journal of Automated Reasoning 18(2),
pp.211-220.
- McC94
- McCune W.W. (1994),
A Davis-Putnam Program and its Application to Finite First-Order
Model Search: Quasigroup Existence Problems,
ANL/MCS-TM-194,
Argonne National Laboratory, Argonne, USA.
- McC03a
- McCune W.W. (2003),
Otter 3.3 Reference Manual,
ANL/MSC-TM-263,
Argonne National Laboratory, Argonne, USA.
- OB03
- Otten J., Bibel W. (2003),
leanCoP: Lean Connection-Based Theorem Proving,
Journal of Symbolic Computation 36(1-2),
pp.139-161.
- Ott08a
- Otten J. (2008),
leanCoP 2.0 and {\sf ileancop 1.2}: High Performance Lean Theorem
Proving in Classical and Intuitionistic Logic,
Baumgartner P., Armando A., Gilles D.,
Proceedings of the 4th International Joint Conference on Automated
Reasoning
(Sydney, Australia),
pp.Accepted,
Lecture Notes in Artificial Intelligence.
- Ott08b
- Restricting Backtracking in Connection Calculi,
Technical Report,
Institut für Informatik, University of Potsdam, Potsdam, Germany.
- Pas78
- Pastre D. (1978),
Automatic Theorem Proving in Set Theory,
Artificial Intelligence 10,
pp.1-27.
- Pas89
- Pastre D. (1989),
Muscadet : An Automatic Theorem Proving System using Knowledge
and Metaknowledge in Mathematics,
Artificial Intelligence 38,
pp.257-318.
- Pas93
- Pastre D. (1993),
Automated Theorem Proving in Mathematics,
Annals of Mathematics and Artificial Intelligence 8,
pp.425-447.
- Pas01
- Pastre D. (2001),
Muscadet 2.3 : A Knowledge-based Theorem Prover based on Natural
Deduction,
Gore R., Leitsch A., Nipkow T.,
Proceedings of the International Joint Conference on Automated
Reasoning
(Siena, Italy),
pp.685-689,
Lecture Notes in Artificial Intelligence 2083,
Springer-Verlag.
- Pas02
- Pastre D. (2002),
Strong and Weak Points of the Muscadet Theorem Prover,
AI Communications 15(2-3),
pp.147-160.
- Pas07
- Pastre D. (2007),
Complementarity of a Natural Deduction Knowledge-based Prover
and Resolution-based Provers in Automated Theorem Proving,
http://www.math-info.univ-paris5.fr/~pastre/compl-NDKB-RB.pdf.
- PW07
- Pelzer B., Wernhard C. (2007),
System Description: E-KRHyper,
Pfenning F.,
Proceedings of the 21st International Conference on Automated
Deduction
(Bremen, Germany),
pp.508-513,
Lecture Notes in Artificial Intelligence 4603,
Springer-Verlag.
- PZ00
- Plaisted D.A., Zhu Y. (2000),
Ordered Semantic Hyper-linking,
Journal of Automated Reasoning 25(3),
pp.167-217.
- Sch02
- Schulz S. (2002),
E: A Brainiac Theorem Prover,
AI Communications 15(2-3),
pp.111-126.
- Sch04a
- Schulz S. (2004),
System Abstract: E 0.81,
Rusinowitch M., Basin D.,
Proceedings of the 2nd International Joint Conference on Automated
Reasoning
(Cork, Ireland),
pp.223-228,
Lecture Notes in Artificial Intelligence 3097.
- Sch04b
- Schulz S. (2004),
Simple and Efficient Clause Subsumption with Feature Vector
Indexing,
Sutcliffe G., Schulz S., Tammet T.,
Proceedings of the Workshop on Empirically Successful First Order
Reasoning, 2nd International Joint Conference on Automated Reasoning
(Cork, Ireland).
- SGS07
- Streeter M., Golovin D., Smith S.F. (2007),
Combining Multiple Heuristics Online,
Holte R.C., Howe A.,
Proceedings of the 22nd Conference on Artificial Intelligence
(Vancouver, Canada),
pp.1197-1203,
AAAI Press.
- Ull89
- Ullman J. (1989),
Principles of Database and Knowledge-Base Bystems,
Computer Science Press, Inc..
- US+08
- Urban J., Sutcliffe G., Pudlak P., Vyskocil J. (2008),
MaLARea SG1: Machine Learner for Automated Reasoning with
Semantic Guidance,
Baumgartner P., Armando A., Gilles D.,
Proceedings of the 4th International Joint Conference on Automated
Reasoning
(Sydney, Australia),
Lecture Notes in Artificial Intelligence.
- Urb07
- Urban J. (2007),
MaLARea: a Metasystem for Automated Reasoning in Large
Theories,
Urban J., Sutcliffe G., Schulz S.,
Proceedings of the CADE-21 Workshop on Empirically Successful
Automated Reasoning in Large Theories
(Bremen, German),
pp.45-58,
CEUR Workshop Proceedings 257.
- Wer03
- Wernhard C. (2003),
System Description: KRHyper,
Fachberichte Informatik 14--2003,
Universität Koblenz-Landau, Koblenz, Germany.