The Use of Max-Sat for Optimal Choice of Automated Theory Repairs

The ABC system repairs faulty Datalog theories using a combination of abduction, belief revision and conceptual change via reformation. Abduction adds axioms or deletes preconditions from rules; belief revision deletes axioms or adds preconditions to rules. Reformation repairs theories by changing the language of the faulty theory. Unfortunately, the ABC system overproduces repair suggestions. Our aim is to prune these suggestions to leave only a Pareto front of the optimal ones. We apply an algorithm for solving Max-Sat problems, which we call the Partial Max-Sat algorithm, to form this Pareto front.


Introduction
We model the environment as a logical theory. Such a theory will need to be repaired when: errors are detected; the environment changes; or it needs to be re-tuned to cope with new tasks. The ABC system repairs faulty logical theories [9]. It is given a theory, T, as a set of axioms in the decidable logic Datalog [2], and some observations S, represented as a pair of sets of ground propositions. One set, T(S), is of propositions observed to be true of the environment and the other, F(S), of those observed to be false. T is used to make predictions about the environment. When these predictions conflict with the observations in S, the ABC system applies a sequence of repairs to T until it is fault free. ABC is unique in repairing the language of T as well as the axioms.
T's predictions are wrong if it proves something in F(S) (incompatibility) or fails to prove something in T(S) (insufficiency). The ABC system then tries to repair T either by adding/deleting axioms, deleting/adding preconditions to rules or changing T's language. Language changes are implemented by reformation [1] and consist of splitting/merging predicates or constants, or adding/deleting arguments of predicates. Unfortunately, ABC produces too many repair options. In this paper, we describe and evaluate the use of Partial Max-Sat to detect and prune sub-optimal repairs. Optimal repairs are those that minimise the number of any remaining or newly introduced faults. Our hypothesis then is:
Our Partial Max-Sat based algorithm prunes sub-optimal repairs from ABC's output. It usually terminates successfully with a significantly smaller set of fault-free, optimal repaired theories.
The results supporting this claim are discussed in §5.
Note that a Pareto front is required because there are conflicting requirements on the repair process. Repairing an incompatibility reduces the number of theorems in order to remove the false one. Repairing an insufficiency, on the other hand, increases the number of theorems to add the true but unprovable one. So, there may be multiple incomparable and conflicting optimal repairs.
Several of the algorithms we use have worst-case exponential complexity in time and/or space. These complexities do not compound and our scaling experiment at the end of §2.4 shows a quadratic time complexity.
The ABC system is not intended to be a stand-alone system, but to be a component of a larger system, for instance, with sensors, planners, actuators, etc. The wider context will help address some of the current gaps in ABC, e.g., Where do the observations come from? How to choose the best optimal, fault-free repair? How to assign meaningful names to newly created concepts?
Acknowledgements: Marius Urbonas was funded by a studentship from the Student Awards Agency Scotland, Alan Bundy was funded by EPSRC grant F14R10199 and Juan Casanova by an EPSRC CDT in Data Science and a Brainnwave studentship. We are grateful to Joshua Knowles for suggesting this project, and to several anonymous reviewers for suggestions that improved the paper.

Background
We first describe the ABC system. We define: what we mean by a fault; the Datalog theories that ABC repairs; SL Resolution, which ABC uses for deduction; the repair operations it uses; and we illustrate the overproduction problem that this paper addresses.

Faults as Reasoning Failures
Both incompatibility and insufficiency arise from reasoning failures: mismatches between the theorems of a theory T and the observations of the environment ⟨T(S), F(S)⟩. A ground proposition is a formula of the form P(C_1, ..., C_n), where P is an n-ary predicate and the C_i are constants. So, ideally:

$$\forall \phi \in T(S).\ T \vdash \phi \qquad\qquad \forall \phi \in F(S).\ T \nvdash \phi$$

That is, the true ground propositions are theorems of T and the false ones are not. The language of T is given in Definition 2 and the inference system in §2.3.

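Definition 1 (Incompatible and Insufficient)
A theory T is incompatible with the observations S if it proves some member of F(S), and insufficient if it fails to prove some member of T(S):

$$\text{Incompatible: } \exists \phi \in F(S).\ T \vdash \phi \qquad\qquad \text{Insufficient: } \exists \phi \in T(S).\ T \nvdash \phi$$

The ABC system detects and repairs both kinds of faults.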

Datalog Theories
To ensure termination of proof search, it is convenient to limit the ABC System to a decidable logic. Datalog theories are not only decidable but are sufficiently expressive to admit a wide range of practical applications, although reformation has also been implemented for richer logics [1,11]. Datalog is a logic programming language consisting of Horn clauses in which there are no functions except constants. We represent clauses in Kowalski normal form: an implication between a conjunction of the negated propositions and a disjunction of the positive propositions, i.e., in Kowalski normal form, a clause:

$$\neg Q_1 \vee \ldots \vee \neg Q_m \vee R_1 \vee \ldots \vee R_n$$

is written as:

$$Q_1 \wedge \ldots \wedge Q_m \implies R_1 \vee \ldots \vee R_n$$

In Horn clauses n = 0 or n = 1, so they fit one of the four forms in Definition 2.

Definition 2 (Datalog Formulae)
Let the language of a Datalog theory T be a triple ⟨P, C, V⟩, where P are the propositions, C are the constants and V are the variables. We will adopt the convention that variables are written in lower case, and constants and predicates start with a capital letter. A proposition is a formula of the form P(t_1, ..., t_n), where t_j ∈ C ∪ V for 1 ≤ j ≤ n, i.e., there are no compound terms. Let R ∈ P and Q_i ∈ P for 0 ≤ i ≤ m in T. Datalog formulae, as clauses in Kowalski normal form, take one of four forms:
Implication: Q_1 ∧ ... ∧ Q_m ⟹ R, where m > 0. R is called the head of the clause and the conjunction of the Q_i forms the body. These usually represent the rules of T.
Assertion: ⟹ R. These usually represent the facts of T.
Goal: Q_1 ∧ ... ∧ Q_m ⟹, where m > 0. These usually arise from the negation of the conjecture to be proved and from subsequent subgoals in a derivation.
Empty Clause: ⟹. This represents false, which is the target of a refutation-style proof. Deriving it, therefore, represents success in proving a conjecture.
Repairs operate on the language of T and on both its implications and assertions. The Datalog safety condition requires that every variable that appears in the head of a clause also appears in the body. Variables in the head but not the body are called orphans. There are other Datalog restrictions, but these are to make it behave efficiently as a programming language and we do not need to adopt them. As we will see, despite these restrictions, Datalog is sufficiently expressive for many practical applications.
A small Datalog theory is given in Example 1. The axioms assert that all birds can fly and are feathered, penguins are birds, and Tweety and Polly are both birds.
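The theory described above can be made concrete with a short sketch. Example 1 is not reproduced in this text, so the axioms below are an assumption: they are the repaired theory of §2.4 with the added Normal/Abnormal argument removed. The names `TWEETY`, `match` and `theorems` are hypothetical, and the sketch enumerates theorems by naive bottom-up evaluation, which is a different technique from the SL Resolution that ABC uses.

```python
# Convention from the paper: variables are lower case; constants and
# predicates start with a capital letter. An atom is (predicate, args) and
# a clause is (body, head), read as body_1 ∧ ... ∧ body_m ⟹ head.
def is_var(term):
    return term[0].islower()

# Assumed reconstruction of the Tweety theory (not copied from Example 1).
TWEETY = [
    ([("Bird", ("x",))], ("Fly", ("x",))),
    ([("Bird", ("x",))], ("Feathered", ("x",))),
    ([("Penguin", ("y",))], ("Bird", ("y",))),
    ([], ("Penguin", ("Tweety",))),
    ([], ("Bird", ("Polly",))),
]

def match(atom, fact, binding):
    """Extend binding so that atom becomes the ground fact, else None."""
    if atom[0] != fact[0] or len(atom[1]) != len(fact[1]):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1], fact[1]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def theorems(theory):
    """Naive bottom-up evaluation: fire rules until no new facts appear."""
    facts = set()
    changed = True
    while changed:
        changed = False
        for body, head in theory:
            bindings = [{}]
            for atom in body:
                bindings = [b2 for b in bindings for fact in facts
                            if (b2 := match(atom, fact, b)) is not None]
            for b in bindings:
                new = (head[0], tuple(b.get(t, t) for t in head[1]))
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

for pred, args in sorted(theorems(TWEETY)):
    print(f"{pred}({', '.join(args)})")
```

Because the safety condition guarantees that every head variable also occurs in the body, each derived head is ground, so the loop terminates once the finite set of ground facts stops growing.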

Deduction by SL Resolution
Deduction in Datalog is decidable but exponential. So, if there is no proof of a conjecture, the search will eventually terminate without success, and we can be sure that the conjecture is not a theorem. Such finite failure is important for detecting insufficiencies, and so was one of the technical reasons for choosing Datalog.
However, if the minimal proof is long then the search for it could exhaust the available resources. Fortunately, in many practical applications, the number of rules is small compared to the number of facts. So proofs are quite short, even when the number of axioms is large. Resolution proofs work by refutation: the conjecture to be proved is negated and added to the axioms. If the empty clause, ⟹, is derived then the conjecture has been proved by reductio ad absurdum. In Horn clauses, the negated conjecture takes the form of a goal clause. For deduction, we use SL Resolution [8], a deductive rule that is particularly well suited to fault diagnosis. A single SL Resolution step takes the following form:

$$\frac{R_1 \wedge \ldots \wedge R_i \wedge \ldots \wedge R_k \implies \qquad\quad Q_1 \wedge \ldots \wedge Q_m \implies P}{(R_1 \wedge \ldots \wedge R_{i-1} \wedge Q_1 \wedge \ldots \wedge Q_m \wedge R_{i+1} \wedge \ldots \wedge R_k)\sigma \implies} \tag{4}$$

where R_i is the selected goal, P is the rule head it is resolved with and σ is the most general substitution of terms for variables that will make P and R_i identical. Note that, to prevent the same variable appearing in both the selected proposition and the head of the axiom, the variables in the axiom should be renamed to new variables. To aid readability, we will do this conservatively.
An SL Resolution refutation on Horn clauses takes the form of a linear sequence of SL Resolution steps (4) in which a goal in each goal clause is resolved with either the head of an implication (rule) or an assertion (fact). This has the advantage that we can apply any repair directly to the axiom involved in either the current or an earlier SL Resolution step in the current branch, so we do not need to inherit the repair back up through derived clauses to an axiom. This advantage follows from restricting to Datalog, as all its formulae are Horn clauses, which is the second technical reason for choosing Datalog.
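A linear refutation of this kind can be sketched as a depth-first search over goal clauses. The code below is a minimal illustration under stated assumptions, not ABC's implementation: the Tweety axioms are inferred from the repaired theory in §2.4, the first goal is always the selected one, axiom variables are renamed with a numeric suffix, and a hypothetical `depth` bound guards against unbounded recursion. It returns the axioms used in each refutation, which is the information a repair needs in order to target an axiom directly.

```python
import itertools

def is_var(term):
    return term[0].islower()  # lower-case initial marks a variable

def walk(term, subst):
    """Follow variable bindings until a constant or unbound variable is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Most general unifier of two flat atoms, extending subst, or None."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return None
    subst = dict(subst)
    for x, y in zip(a[1], b[1]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None  # two different constants
    return subst

_fresh = itertools.count()

def rename(clause):
    """Rename the axiom's variables apart from every earlier variable."""
    n = next(_fresh)
    def ren(atom):
        return (atom[0], tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1]))
    body, head = clause
    return [ren(a) for a in body], ren(head)

def refute(goals, theory, subst=None, depth=20):
    """Yield, for each refutation of the goal clause, the list of axioms used."""
    subst = {} if subst is None else subst
    if not goals:            # empty clause derived: the conjecture is proved
        yield []
        return
    if depth == 0:
        return
    selected, rest = goals[0], goals[1:]
    for axiom in theory:
        body, head = rename(axiom)
        extended = unify(selected, head, subst)
        if extended is not None:
            for proof in refute(body + rest, theory, extended, depth - 1):
                yield [axiom] + proof

# Assumed Tweety theory; a clause is (body, head).
TWEETY = [
    ([("Bird", ("x",))], ("Fly", ("x",))),
    ([("Bird", ("x",))], ("Feathered", ("x",))),
    ([("Penguin", ("y",))], ("Bird", ("y",))),
    ([], ("Penguin", ("Tweety",))),
    ([], ("Bird", ("Polly",))),
]

proofs = list(refute([("Fly", ("Tweety",))], TWEETY))
```

Here `proofs` holds the axioms of each refutation of the goal Fly(Tweety) ⟹; any one of those axioms is a candidate target for a blocking repair.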

Repair Operations
Incompatibility and insufficiency faults are diagnosed and repaired in a dual way. F(S) and T(S) are both finite sets. The ABC system tries to prove each member of these sets.
If a member of F(S) is proved then we have discovered an incompatibility. Similarly, if a member of T(S) is not proved then we have discovered an insufficiency. Incompatibilities can be repaired by blocking the unwanted proof. Insufficiencies can be repaired by unblocking a wanted, but failed proof. The repair operations used by the ABC system are listed in Definitions 3 and 4. They are drawn from the literature on abduction and belief revision, plus our own work on reformation. Note that a single repair application may not produce a fault-free theory. Several applications may be required.
New applications, however, occasionally reveal the opportunity or necessity of new kinds of repair operations or the generalisation of existing operations. So, the space of repair operations seems open-ended and we make no claim to have exhausted the possibilities. In fact, given the unbounded nature of ingenuity, we doubt that an exhaustive classification of repair operations exists or, even if one did, that it could be proved to be exhaustive.
Definition 3 (Repair Operations for Incompatibility) In the case of incompatibility, the unwanted proof can be blocked by causing any of the resolution steps to fail. Suppose the targeted resolution step is between a goal P(s_1, ..., s_n) and an axiom Body ⟹ P(t_1, ..., t_n), where each s_i and t_i pair can be unified. Possible repair operations are as follows:
Belief Revision 1: Delete the targeted axiom.
Belief Revision 2: Add an additional precondition to the body of an earlier rule axiom which will become an unprovable subgoal in the unwanted proof.
Note that we disallow repairs that would change S. This is because S consists of observations of the environment. Our goal is to repair the theory T so that it predicts our observations S of the environment, not the other way around. There is also the practical consideration that if a predicate, say, P(C) ∈ T(S), were changed to, say, P(C, Normal) and P(C, Abnormal) then we would have no basis to say whether either of them belonged to T(S) or F(S). This would make it difficult to track the progress of a sequence of repairs. This restriction is implemented by a mechanism that protects nominated predicates and constants from being changed by repairs [9].
A repair of an incompatibility can be illustrated with T_Tw from Example 1 and the refutation in Example 2. Suppose we observe that Tweety cannot fly, i.e., that Fly(Tweety) ∈ F(S). Since the refutation in Example 2 proves Fly(Tweety), we have an incompatibility. Suppose we decide to break this unwanted refutation at the highlighted resolution step. One repair suggestion is to apply Reformation 2 from Definition 3. This will give the repaired theory ν(T_Tw):
Bird(x, Normal) ⟹ Fly(x)
Bird(x, y) ⟹ Feathered(x)
Penguin(y) ⟹ Bird(y, Abnormal)
⟹ Penguin(Tweety)
⟹ Bird(Polly, Normal)
where Normal and Abnormal are two new constants. Fly(Tweety) is no longer a theorem of this repaired theory.
The naming of these two new constants was suggested by the observation that new constants introduced by repair Reformation 2, i.e., by giving P a new argument, often distinguish two kinds of P, where the abnormal kind was from the axiom in the now broken resolution step.
These repair operations have been applied to a wide range of examples, some of which can be found in Table 1. In addition, we have evaluated the scalability of the ABC system by applying it to the alignment of two commercial databases with sample sizes up to 1020 entries. Known misalignments were put into F(S) and the remainder into T(S). The time taken to find all repairs for a sample was shown to be a quadratic function of the size of the sample, so the ABC system was shown experimentally to have a feasible computational complexity.

Overproduction of Repair Suggestions
The main problem with the theory repair mechanism outlined in §2.4 is overproduction, i.e., it makes too many repair suggestions. The contribution of this paper is a Partial Max-Sat-based mechanism for pruning sub-optimal repair suggestions. To illustrate the problem, let us consider some of the other repair suggestions that the ABC system generates for repairing the incompatibility in the theory T_Tw from Example 1.
Note that the ABC system can break the unwanted proof in Example 2 at each of the 3 resolution steps, and those steps can be broken using each of the 5 repair operations described in Definition 3, sometimes in more than one way. For the purposes of analysis, let us additionally assume the observations Feathered(Tweety) ∈ T(S) and Fly(Polly) ∈ T(S). Note that both Feathered(Tweety) and Fly(Polly) are theorems of T. So a new insufficiency will be introduced if either of them is not a theorem of the repaired theory ν(T). Consider the following repair suggestions to T.
Belief Revision 1: Delete axiom (2), for instance. Note that Feathered(Tweety) is no longer a theorem, so this deletion will cause an insufficiency.
Belief Revision 2: Add an additional precondition to the body of axiom (2). User interaction is required to suggest a suitable precondition. Moreover, Feathered(Tweety) is no longer a theorem, so this repair will also cause an insufficiency.
Reformation 1: Rename Bird in axiom (2) to the new predicate Bird′. Note that Feathered(Tweety) is no longer a theorem, which causes the same insufficiency as in the previous two repairs. If, instead, Bird in axiom (1) were renamed, then Fly(Polly) would cease to be a theorem, which would cause a different insufficiency.
Reformation 2: This is the repair described in §2.4. Note that Feathered(Tweety) and Fly(Polly) are still theorems, so this repair avoids the insufficiencies caused by the other four repairs.
Reformation 3: This is not applicable to axiom (2), but could be applied to axiom (3) to rewrite it to ⟹ Penguin(Tweety′). Note that Feathered(Tweety) is no longer a theorem. In addition, a new incompatibility will be caused if it is observed that Fly(Tweety′) ∈ F(S).
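The side effects listed above can be checked mechanically. The following sketch compares the faulty theory, a Belief Revision 1 repair and the Reformation 2 repair of §2.4 against the observations assumed in this section. It rests on assumptions: the original axioms are inferred from the repaired theory, the deleted axiom is taken to be Penguin(y) ⟹ Bird(y) (the deletion consistent with losing Feathered(Tweety)), theorems are computed bottom-up instead of by SL Resolution, and all identifiers are hypothetical.

```python
def is_var(term):
    return term[0].islower()  # lower-case initial marks a variable

def match(atom, fact, binding):
    """Extend binding so that atom becomes the ground fact, else None."""
    if atom[0] != fact[0] or len(atom[1]) != len(fact[1]):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1], fact[1]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def theorems(theory):
    """All ground facts derivable from the theory (naive bottom-up)."""
    facts, changed = set(), True
    while changed:
        changed = False
        for body, head in theory:
            bindings = [{}]
            for atom in body:
                bindings = [b2 for b in bindings for fact in facts
                            if (b2 := match(atom, fact, b)) is not None]
            for b in bindings:
                new = (head[0], tuple(b.get(t, t) for t in head[1]))
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

def faults(theory, true_obs, false_obs):
    """Return (incompatibilities, insufficiencies) of theory w.r.t. observations."""
    proved = theorems(theory)
    return ({o for o in false_obs if o in proved},
            {o for o in true_obs if o not in proved})

PENGUIN_RULE = ([("Penguin", ("y",))], ("Bird", ("y",)))
ORIGINAL = [
    ([("Bird", ("x",))], ("Fly", ("x",))),
    ([("Bird", ("x",))], ("Feathered", ("x",))),
    PENGUIN_RULE,
    ([], ("Penguin", ("Tweety",))),
    ([], ("Bird", ("Polly",))),
]
# Belief Revision 1: delete one axiom of the unwanted proof.
DELETED = [axiom for axiom in ORIGINAL if axiom != PENGUIN_RULE]
# Reformation 2: the repaired theory given in §2.4.
REFORMED = [
    ([("Bird", ("x", "Normal"))], ("Fly", ("x",))),
    ([("Bird", ("x", "y"))], ("Feathered", ("x",))),
    ([("Penguin", ("y",))], ("Bird", ("y", "Abnormal"))),
    ([], ("Penguin", ("Tweety",))),
    ([], ("Bird", ("Polly", "Normal"))),
]

TRUE_OBS = {("Feathered", ("Tweety",)), ("Fly", ("Polly",))}
FALSE_OBS = {("Fly", ("Tweety",))}

for name, theory in [("original", ORIGINAL), ("deleted", DELETED), ("reformed", REFORMED)]:
    print(name, faults(theory, TRUE_OBS, FALSE_OBS))
```

Under these assumptions the deletion trades the incompatibility for a new insufficiency, whereas the reformed theory has neither fault, which is the behaviour the list above describes.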
Without pruning sub-optimal repairs, the ABC System makes 10 repair suggestions for this faulty theory. For incompatibilities with several or longer unwanted proofs, the number of repair suggestions can be much larger. The pruning mechanism described in §4 will prune all but the Reformation 2 repair described in §2.4.

Pruning out Sub-Optimal Repairs
The ABC System is applied to Datalog theories, whereas Partial Max-Sat, which is the main component of our pruning mechanism, and similar Sat-based algorithms, are designed for propositional logic. The theory behind reducing Datalog-like theories to propositional ones is well known, but is briefly discussed in §3.1. This is followed by a brief introduction to Partial Max-Sat in §3.2 and how we use it in §4.

Turning First-Order Theories into Propositional Logic
All Datalog theories can be converted into equivalent propositional ones. Note that if we ground all axioms in a theory T by instantiating their variables in all possible ways with constants we will get another theory Ground(T) in which all the axioms are variable-free Horn clauses. Since Datalog theories have no non-nullary functions, Ground(T) has only a finite number of axioms. Moreover, Ground(T) has a model iff T has one [6]. We can view Ground(T) as a propositional theory, so SAT-related algorithms can be applied to it to solve problems about T. Since every occurrence of each variable in T must be instantiated in |C| ways, this grounding is exponential in time and space.
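The grounding step can be sketched in a few lines. This is an illustrative reading of the paragraph above and not ABC's code: `ground` is a hypothetical name, clauses are (body, head) pairs over flat atoms, and the Tweety axioms are assumed from the repaired theory in §2.4. A clause with v distinct variables yields |C|^v ground instances, which is where the exponential cost comes from.

```python
from itertools import product

def is_var(term):
    return term[0].islower()  # lower-case initial marks a variable

def ground(theory, constants):
    """Instantiate the variables of every clause with constants in all possible ways."""
    grounded = []
    for body, head in theory:
        variables = sorted({t for _, args in body + [head] for t in args if is_var(t)})
        # product(..., repeat=0) yields one empty tuple, so facts are kept as they are.
        for values in product(constants, repeat=len(variables)):
            binding = dict(zip(variables, values))
            def inst(atom):
                return (atom[0], tuple(binding.get(t, t) for t in atom[1]))
            grounded.append(([inst(a) for a in body], inst(head)))
    return grounded

# Assumed Tweety theory; a clause is (body, head).
TWEETY = [
    ([("Bird", ("x",))], ("Fly", ("x",))),
    ([("Bird", ("x",))], ("Feathered", ("x",))),
    ([("Penguin", ("y",))], ("Bird", ("y",))),
    ([], ("Penguin", ("Tweety",))),
    ([], ("Bird", ("Polly",))),
]

GROUND_TWEETY = ground(TWEETY, ["Tweety", "Polly"])
for body, head in GROUND_TWEETY:
    print(" ∧ ".join(f"{p}({', '.join(a)})" for p, a in body), "⟹", f"{head[0]}({', '.join(head[1])})")
```

Each ground atom such as Bird(Tweety) can then be treated as a single propositional variable.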
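Definition 5 (Grounding a Datalog Theory)

$$Ground(T) = \{\phi\sigma \mid \phi \in T \text{ and } \sigma \text{ substitutes a constant in } C \text{ for each variable of } \phi\}$$

The Ground function is illustrated in Example 3.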
Example 3 (Grounding a Theory) Let T_pqr be the following set of axioms: Then Ground(T_pqr) is the set:

Partial Max-Sat
Partial Max-Sat (pMaxSat) specifies the problem in which, given two arguments, ϕ_h and ϕ_s, denoting sets of ground hard and soft clauses respectively, the goal is to find all assignments of truth values to their ground propositions such that: (a) all clauses in ϕ_h are satisfied, i.e., have a model, and (b) the maximum number of clauses in ϕ_s are satisfied. We use Herbrand models instead of Tarskian models. Herbrand [6] has shown that a theory has a Tarskian model iff it has a Herbrand model. A Herbrand model that meets this specification is called optimal.
Definition 6 (Optimal Herbrand Models) A Herbrand model assigns a truth value to each propositional variable. In our case these are the ground propositions created by the Ground function.
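The Herbrand Base HB(T) of a Datalog theory T is the set of all ground propositions of its language:

$$HB(T) = \{P(C_1, \ldots, C_n) \mid P \text{ is an } n\text{-ary predicate of } T \text{ and } C_1, \ldots, C_n \in C\}$$

The Herbrand Models HM(T) of T are subsets of HB(T) for which every clause of Ground(T) is true when exactly the members of the subset are assigned true.
Let pMaxSat be an algorithm, specified in Definition 7, that returns the size of the subset of ϕ_s that is not satisfied by an optimal Herbrand model. Note that, as a consequence of (6), this size will be the same for all such models.

$$pMaxSat(\phi_h, \phi_s) = |\{c \in \phi_s \mid hm \not\models c\}|$$

where hm is any optimal Herbrand Model.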
We augmented the ABC system with a third-party Partial Max-Sat solver [7], based on the Fu & Malik algorithm [3].
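To make the specification concrete, here is a brute-force sketch of the quantity pMaxSat returns. It is deliberately not the Fu & Malik algorithm used by the solver: it enumerates every candidate Herbrand model, so it is only usable on tiny inputs. The representation (a clause as a `(body, head)` pair over string atoms, with `head=None` for a goal clause) and the name `p_max_sat` are assumptions made for illustration.

```python
from itertools import combinations

def satisfied(clause, model):
    """A clause body ⟹ head is false only when the whole body is true and the head is not."""
    body, head = clause
    return not all(atom in model for atom in body) or head in model

def p_max_sat(hard, soft):
    """Fewest soft clauses left unsatisfied by any model of the hard clauses.

    Returns None when the hard clauses have no model at all.
    """
    atoms = sorted({a for body, head in hard + soft
                    for a in list(body) + ([head] if head else [])})
    best = None
    for size in range(len(atoms) + 1):
        for chosen in combinations(atoms, size):
            model = set(chosen)              # the atoms assigned true
            if all(satisfied(c, model) for c in hard):
                unsat = sum(not satisfied(c, model) for c in soft)
                if best is None or unsat < best:
                    best = unsat
    return best

# Hard clauses: P ⟹ Q and ⟹ P.  Soft clauses: the goals Q ⟹ and R ⟹.
HARD = [(["P"], "Q"), ([], "P")]
SOFT = [(["Q"], None), (["R"], None)]
print(p_max_sat(HARD, SOFT))
```

Every model of the hard clauses must make Q true, so the soft goal Q ⟹ cannot be satisfied, while R ⟹ is satisfied by leaving R false.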

Evaluating Fitness of Repairs
This section discusses which repairs are considered to be sub-optimal and how to detect them using automated reasoning.
We want to find repairs ν(T) of a faulty T so as to maximise the size of {φ ∈ T(S) | ν(T) ⊢ φ} and minimise the size of {φ ∈ F(S) | ν(T) ⊢ φ}. It will not, in general, be possible to achieve both of these requirements with a single repair, so we need to find all repairs ν that are optimal w.r.t. some measures of these potentially conflicting requirements.

Pareto Optimality
It suffices to define what it means for one theory to strictly dominate another. The Pareto front of optimal repairs is then just the maximal set of repairs such that no member is strictly dominated by any other repair. Any repair not in the Pareto front is sub-optimal.
We will first need to define the insufficiency set IS(T, S) of members of T(S) that are not theorems and the incompatibility set IC(T, S) of members of F(S) that are theorems:

$$IS(T, S) = \{\phi \in T(S) \mid T \nvdash \phi\} \qquad\qquad IC(T, S) = \{\phi \in F(S) \mid T \vdash \phi\}$$

Then IC(ν_k(T), S) is empty for both repairs and IS(ν_k(T), S) is empty for ν_r2, but for ν_b1, IS(ν_k(T), S) = {Feathered(Tweety), Fly(Polly)}. Therefore, ν_r2 strictly dominates ν_b1 and, hence, ν_b1 is sub-optimal.
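The comparison above can be sketched as code. The definition of strict domination is not reproduced in this text, so the version below is an assumption: each repair is scored by the pair (|IC|, |IS|), and one repair strictly dominates another when it is no worse on both counts and strictly better on at least one. The names `strictly_dominates` and `pareto_front` are hypothetical.

```python
def strictly_dominates(a, b):
    """a and b are (|IC|, |IS|) fault counts; fewer faults is better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores):
    """Names of the repairs that no other repair strictly dominates."""
    return [name for name, score in scores.items()
            if not any(strictly_dominates(other, score) for other in scores.values())]

# Fault counts taken from the comparison above: both repairs leave IC empty,
# nu_r2 leaves IS empty and nu_b1 leaves two insufficiencies.
SCORES = {"nu_r2": (0, 0), "nu_b1": (0, 2)}
print(pareto_front(SCORES))

# Two repairs that each win on a different measure are incomparable,
# so both stay on the front.
print(pareto_front({"nu_a": (0, 1), "nu_b": (1, 0)}))
```

Repairs with identical scores never strictly dominate each other, so ties are all retained on the front.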

Pruning mechanism
The pruning mechanism provides a way to reduce the search space of repairs of a given faulty theory in an automatic way. The inputs to the pruning mechanism are given by the ABC theory-repair algorithm: a Datalog-like theory T, a set of repairs {ν_1, ν_2, ..., ν_k} and a pair of sets of environmental observations ⟨T(S), F(S)⟩. The output is the sub-set of repair suggestions that are Pareto optimal: {ν_n1, ..., ν_nj}. Figure 1 shows the high-level components of the mechanism. At each step the ABC system generates a set of repairs of a faulty theory. C1 applies each generated repair and converts the resulting Datalog-like theory to propositional logic using Ground (see §3.1). C2 is the central part of the mechanism, which uses pMaxSat to determine how many faults were fixed by each repair ν, and how many new faults it introduces to the repaired theory ν(T). In C3 the set of Pareto optimal repairs is returned to the cycle as (possibly only partially) repaired Datalog theories. This process is repeated on the repaired theories until no faults remain or no further repairs are generated. Any fault-free theories are returned to the user. S remains unchanged throughout.
Even though, given unbounded resources, the constituent processes of this cycle each terminate, there is the possibility of non-termination of the whole cycle. ABC might reach a situation in which faults still remain, but each repair of them fails to decrease the overall number of faults. This can happen because, as we saw in §2.5, a repair can introduce new faults when fixing an old one. This has not happened in any of our test examples, but it remains a theoretical possibility. It is this kind of whole cycle non-termination that gives rise to the 'usually' caveat in our hypothesis.

Evaluation
In this section we evaluate the hypothesis:
Our Partial Max-Sat based algorithm prunes sub-optimal repairs from ABC's output. It usually terminates successfully with a significantly smaller set of fault-free, optimal repaired theories.
By construction, the Pareto fronts generated by the Partial Max-Sat based filter consist solely of optimal repairs. Table 1 shows the result of repairing a test set of 10 faulty theories. It shows the reductions in size between the set of repair suggestions originally generated by the ABC system and these Pareto fronts.
No one standard benchmark test set is available that could be used to evaluate the diverse abilities of the ABC system. In order to show the generality of our techniques and avoid bias in the evaluation, these test examples were instead drawn from benchmark test and development sets used in research papers in a diverse range of areas of AI, including non-monotonic reasoning, belief revision, etc.
As previously noted, even these optimal repairs may not eliminate all the faults in the input theory. Further rounds of repair may be required to the resulting partially repaired theories. The size reductions are given for only the first round of repairs. This recursive process may be viewed as a search tree, where the nodes are labelled with theories and the arcs between them with optimal repairs. For success, we require only one branch of this search tree to terminate with a leaf node labelled with a fault-free theory, but sometimes multiple fault-free theories are found. Table 1 shows that success was achieved in all 10 examples.
The columns of Table 1 give the following statistics:
Name: The names of the 10 faulty theories in our test set, plus Tweety (1).
#A: The number of axioms in each faulty theory.
#Unfil: The number of first-round, unfiltered repair suggestions.
#Fil: The size of the Pareto front after the first round. The percentages in parentheses indicate the reduction achieved.
#H and S: The size of the initial hard and soft clause sets.
#PV: The size of the initial propositional variable set.
Time: The average, over 3 runs, of the time (µs) to generate a fault-free theory.
Succ(n): n is the number of fault-free theories generated, if any.
Reference: The citations of the source of the example, with a note on any adaptations.
Note that the repair process terminates with success for all our 11 examples. The size reduction achieved by our filtering process varies widely from 0% to 93%. This variation can be partially explained by the number of fault-free theories that are eventually returned: where a large number of fault-free repairs exist, at least that number of repair sequences is needed to find them all. From these results we conclude that our hypothesis has been empirically confirmed.

Fig. 1. The components (C1-C3) of the pruning mechanism, where T is a Datalog-like theory, {ν_1, ..., ν_k} is a set of repairs generated by the ABC algorithm, ⟨T(S), F(S)⟩ are the observations from the environment and the output is a set of optimal, fault-free repairs.

Definition 9 is used to determine whether a repair ν_k is sub-optimal and should be pruned. This requires us to calculate the sizes of the incompatibility and insufficiency sets: |IC(ν_k(T), S)| and |IS(ν_k(T), S)|. Definition 10 calculates N_C and N_S by specifying the ϕ_h and ϕ_s to apply pMaxSat to. Theorem 1 proves N_C and N_S to be |IC(ν_k(T), S)| and |IS(ν_k(T), S)|, respectively.

Definition 10 (Calculating N_C and N_S for ν_k using pMaxSat)
Let N_C = pMaxSat(ϕ_h, ϕ_s), where:

$$\phi_h = Ground(\nu_k(T)) \quad\wedge\quad \phi_s = \{\beta \implies \;\mid\; \beta \in F(S)\}$$

Let N_S = |T(S)| − pMaxSat(ϕ_h, ϕ_s), where:

$$\phi_h = Ground(\nu_k(T)) \quad\wedge\quad \phi_s = \{\beta \implies \;\mid\; \beta \in T(S)\}$$

Theorem 1 (Correctness of Definition 10) Definition 10 correctly calculates the size of the incompatibility and insufficiency sets of a repaired theory ν_k(T), i.e.

$$N_C = |IC(\nu_k(T), S)| \quad\wedge\quad N_S = |IS(\nu_k(T), S)|$$

Proof Summary The proofs for N_C and N_S are similar. First apply the definitions of ϕ_h given in Definition 10. Use Definition 7 to apply the definitions of N_C and N_S. Then use Definition 8 to show the equivalences, appealing to the consistency of Datalog theories.
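The two calculations in Definition 10 can be sketched on an already grounded theory. This is an illustration under assumptions and not the ABC implementation: the brute-force `p_max_sat` stands in for the third-party solver, the ground Tweety clauses are inferred from the repaired theory in §2.4 with constants Tweety and Polly, ground atoms are plain strings, and `fault_counts` is a hypothetical name.

```python
from itertools import combinations

def satisfied(clause, model):
    """body ⟹ head fails only when the body is true and the head is not."""
    body, head = clause
    return not all(atom in model for atom in body) or head in model

def p_max_sat(hard, soft):
    """Fewest unsatisfied soft clauses over all models of the hard clauses (brute force)."""
    atoms = sorted({a for body, head in hard + soft
                    for a in list(body) + ([head] if head else [])})
    counts = [sum(not satisfied(c, set(m)) for c in soft)
              for size in range(len(atoms) + 1)
              for m in combinations(atoms, size)
              if all(satisfied(c, set(m)) for c in hard)]
    return min(counts) if counts else None

def fault_counts(ground_theory, true_obs, false_obs):
    """Return (N_C, N_S) as specified in Definition 10."""
    # Soft clauses are the goal clauses β ⟹ for each observation β.
    n_c = p_max_sat(ground_theory, [([b], None) for b in false_obs])
    n_s = len(true_obs) - p_max_sat(ground_theory, [([b], None) for b in true_obs])
    return n_c, n_s

# Assumed grounding of the faulty Tweety theory.
GROUND_TWEETY = [
    (["Bird(Tweety)"], "Fly(Tweety)"), (["Bird(Polly)"], "Fly(Polly)"),
    (["Bird(Tweety)"], "Feathered(Tweety)"), (["Bird(Polly)"], "Feathered(Polly)"),
    (["Penguin(Tweety)"], "Bird(Tweety)"), (["Penguin(Polly)"], "Bird(Polly)"),
    ([], "Penguin(Tweety)"), ([], "Bird(Polly)"),
]
# A Belief Revision 1 repair: drop the instances of Penguin(y) ⟹ Bird(y).
GROUND_DELETED = [c for c in GROUND_TWEETY
                  if not (c[0] and c[0][0].startswith("Penguin"))]

TRUE_OBS = ["Feathered(Tweety)", "Fly(Polly)"]
FALSE_OBS = ["Fly(Tweety)"]

print(fault_counts(GROUND_TWEETY, TRUE_OBS, FALSE_OBS))
print(fault_counts(GROUND_DELETED, TRUE_OBS, FALSE_OBS))
```

A soft goal β ⟹ can be satisfied only by a model in which β is false, so the soft clauses that remain unsatisfied in an optimal model are exactly the observations that every model of the theory forces to be true.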
Definition 3 (Repair Operations for Incompatibility), continued:
Reformation 1: Rename P in the targeted axiom to the new predicate P′.
Reformation 2: Increase the arity of all occurrences of P in the axioms by one. Ensure, recursively, that the new arguments, s_{n+1} and t_{n+1}, in the targeted occurrence of P, are not unifiable.
Reformation 3: For some i, suppose s_i is C. Since s_i and t_i unify, t_i is either C or a variable. Change t_i to the new constant C′.
Definition 4 (Repair Operations for Insufficiency) In the case of insufficiency, the wanted but failed proof can be unblocked by causing a currently failing resolution step to succeed. Suppose the chosen resolution step is between a goal P(s_1, ..., s_m) and an axiom Body ⟹ P′(t_1, ..., t_n), where either P ≠ P′ or, for some i, s_i and t_i cannot be unified. Possible repair operations are:
Abduction 1: Add a new axiom whose head unifies with the goal P(s_1, ..., s_m).
Abduction 2: Locate the rule whose body proposition created this goal and delete this proposition from the rule.
Reformation 4: Replace P′(t_1, ..., t_n) in the axiom with P(s_1, ..., s_m).
Reformation 5: Suppose s_i and t_i are not unifiable. Remove the i-th argument from all occurrences of P.
Reformation 6: If s_i and t_i are not unifiable, then they are unequal constants, say, C and C′. Either (a) rename all occurrences of C′ in the axioms to C or (b) replace the offending occurrence of C′ in the targeted axiom by a new variable.