Features and subtasks - Human-Oriented Automatic Theorem Proving

Using features of a problem state to choose appropriate moves.

In this post, I assume that the reader is broadly familiar with what a problem state is and with how move types can be used to transform one problem state into another. An introduction to these concepts can be found in the previous post to this one.

The algorithm described (not with full precision) in that post was quite crude in the sense that it used a waterfall architecture, meaning that one fixes an order of priority for the move types and simply "cascades" down the list until one finds a move type that can be applied to the problem state. In particular, if more than one move was possible, the choice made by the program did not depend on the problem state. It is a surprising fact that an algorithm of this kind can solve some not completely trivial problems.

However, it also fails on problems that ought to be easy, and it seems that allowing the choice of moves to be guided by suitable features of the problem state, and by suitably chosen intermediate subgoals (such as, for example, obtaining additional information about some term that has arisen naturally, even if one does not know precisely what that information will be or how it will be used) has the potential to increase the power of the program very substantially. At the time of writing, we are working on a "second-generation" program of that type, which we hope will provide a proof of principle to back up this claim.

Here I shall work through an example in order to illustrate the kind of problem for which the basic waterfall architecture gets rapidly stuck, but a program guided by features and subgoals will probably be able to find a solution. Since the word "subgoal" is used for several purposes already in automatic theorem proving, we shall use the word "subtask" instead, even though "subgoal" probably captures the meaning better.

If $f(\overline A)\subset\overline{f(A)}$ for every set $A$, then $f$ is continuous.

This is a slightly tricky problem, so let me begin by writing out a conventional proof.

Proposition. Let $X$ and $Y$ be topological spaces and let $f:X\to Y$ be a function such that $f(\overline A)\subset\overline{f(A)}$ for every subset $A\subset X$. Then $f$ is continuous.

Proof. Let $B$ be a closed subset of $Y$. We would like to show that $f^{-1}(B)$ is closed in $X$. For this it is sufficient to prove that $\overline{f^{-1}(B)}\subset f^{-1}(B)$. By hypothesis, we know that $f(\overline{f^{-1}(B)})\subset\overline{f(f^{-1}(B))}$. But $f(f^{-1}(B)\subset B$, and the operation of taking the closure is monotone, so $\overline{f(f^{-1}(B))}\subset\overline{B}$. Since $B$ is closed, $\overline B=B$. By transitivity of set inclusion, we find that $f(\overline{f^{-1}(B)})\subset B$, which implies that $\overline{f^{-1}(B)}\subset f^{-1}(B)$ as desired. $\square$

For the remainder of this post, I shall present a sequence of moves that get from the initial problem state to TRUE, giving in each case a "motivation" for the move -- that is, pointing to some aspect of the problem state that "improves" when the move is applied. The hope is that these motivations are sufficiently precise that they can be used as the basis for the design of a more powerful algorithm.

Before I start, I want to specify some of the results that I am assuming will be in the library. That is important, because I want to present a high-level proof that uses basic properties of the closure operation rather than a low-level proof that works directly from the definition of the closure of a set as the intersection of all closed sets that contain it. The basic properties I have in mind are things like the following. (I shall state these results informally, but of course they would be represented in a more formal way in the actual library.)

A set $B$ is closed if and only if $\overline B\subset B$.
If $B\subset C$, then $\overline B\subset\overline C$.
If $B\subset C$ and $C$ is closed, then $\overline B\subset C$.
$\overline{\overline B}=\overline B$.

I will also make use of various basic set-theoretic facts about images and inverse images.

If $B\subset C$, then $f^{-1}(B)\subset f^{-1}(C)$.
If $B\subset C$, then $f(B)\subset f(C)$.
$f(f^{-1}(B)\subset B)$.
$f(A)\subset B$ if and only if $A\subset f^{-1}(B)$.

Finally, another result that plays an important role is the "non-standard" expansion of "is_continuous" given by the following equivalent definition.

A function $f:X\to Y$ is continuous if the inverse image of every closed subset of $Y$ is closed in $X$.

Now let us write down in tableau form a suitable initial problem state.

$$ \begin{array}{|c|} \hline X: \text{topological space}\\ Y: \text{topological space}\\ f:X\to Y\\ \hline \\ \hline \forall A\subset X\ f(\overline{A})\subset\overline{f(A)}\\ \hline \\ \hline \text{is_continuous}(f)\\ \hline \end{array} $$

It is far from obvious at this stage how to use the hypothesis, but it is premature to turn $A$ into a metavariable, since there is not yet any hint of a subset of $X$, and turning $A$ into a metavariable doesn't help us do anything. Instead, there's an obvious move, which is to expand the definition of continuity. Furthermore, this does something very good for us -- it creates a subset of $X$!

That said, there's a very interesting issue that arises, which is that if we expand the definition in the obvious way using open sets, we end up with a subset of $X$ that is of no use to us. What makes this interesting is that many a human mathematician would not realize this until they had tried out the open-sets expansion and got stuck, and even then it might well feel like an "Aha!" moment. (Whether or not it did so would depend on the experience of the mathematician in question.)

Nevertheless, it is not all that hard to justify choosing the closed-sets definition. Indeed, the hypothesis is about closures, and closures are to do with closed sets, so there is more of a chance of a good match between hypothesis and target if we use the closed-sets definition.

That justification makes good sense in human terms, but it is not completely obvious how to translate it into a general criterion that an algorithm can use. For us, the fact that the words "closure" and "closed" are related probably plays an important role, but it doesn't feel very mathematical to use linguistic relationships like that. More important is that there is a reason that the two words are related: the definition of closure and two of the elementary properties listed above mention closed sets and they do not mention open sets. So there's a clear advantage to using the closed-sets definition: not only does it produce a subset of $X$, but we want to prove that that subset is closed, and that potentially relates it to the closure operation. Note that for the program to realize this, it will have to spot that the closure operation is related to closed sets, which means it will have to look into the library and see that it contains statements that involve both the closure operation and the predicate "is closed".

The next steps are tidying ones, so are straightforward and don't really need justifying (though it might still be interesting to understand why not).

$$ \begin{array}{|c|} \hline X: \text{topological space}\\ Y: \text{topological space}\\ B: \text{subset of }Y\\ f:X\to Y\\ \hline \\ \hline \forall A\subset X\ f(\overline{A})\subset\overline{f(A)}\\ \text{closed_in}(B,Y)\\ \hline \\ \hline \text{closed_in}(f^{-1}(B),X)\\ \hline \end{array} $$

At this stage there are a few things we can do, so let us briefly assess our options. One is to instantiate $A$ with $f^{-1}(B)$. Another is to replace either of the statements of the form closed_in($F,Z$) by the statement $\overline F\subset F$, using one of the library results. Or we could replace them by the information that their complements are open. If we were working in a metric space context instead, there would be even more options, such as expanding the definition of "closed_in". Another option we have available is to replace any closure $\overline E$ by a double closure $\overline{\overline E}$.

To decide between these, let us look at the benefits that each might bring.

The target is a statement about $f^{-1}(B)$, and if we instantiate $A$ with $f^{-1}(B)$, then we will have a hypothesis about $f^{-1}(B)$ to work with. Moreover, that hypothesis involves closures, so there is a hope (given a quick glance at the library) that the hypothesis will be useful.
If we replace the target by $\overline B\subset B$, then we have eliminated the predicate "is_closed" from the target and made it all about closures. This could align the target better with the hypothesis.
If we replace the hypothesis "is_closed($B$)" by $\overline B\subset B$, then again we may have more chance of matching it with the first hypothesis, which is about closures rather than closed sets.
Expanding the definition of closed sets (in a metric space context) seems premature, as we have high-level options available.
Replacing a closure by a double closure introduces new and more complicated terms into the picture, for no apparent gain.

Looking at these reasons, the first three are all quite good ones. The first is probably the best, since having the same variable or term appearing in different statements is almost always useful (at least in problems of this level of difficulty), whereas involving the same operation (in this case the closure operation) is not quite so obviously going to be useful. But this is not to talk down the second option -- it's a good one but the first seems even better. I stress also that this judgment is more like a hunch backed up by my mathematical experience and not a claim that I can back up with strong theoretical arguments. But let's go with it, and let's eliminate the uninstantiated statement and the type declarations to keep things tidy.

$$ \begin{array}{|c|} \hline f(\overline{f^{-1}(B)})\subset\overline{f(f^{-1}(B))}\\ \text{closed_in}(B,Y)\\ \hline \\ \hline \text{closed_in}(f^{-1}(B),X)\\ \hline \end{array} $$

I should also point out that the above move has a potential disadvantage, which is that it creates a pretty complicated term $\overline{f(f^{-1}(B))}$, which is not matched anywhere else. So it will become a priority to simplify it or to match it.

At this point it seems like an excellent idea to rewrite the target in terms of closures, as previously suggested. Note that now it has become a stronger move than it was before, because not only does it introduce the concept of a closure into the target, but it creates a term $\overline{f^{-1}(B)}$ that matches an occurrence of that term among the hypotheses, raising the chance of some reasoning being possible. So let's go ahead.

$$ \begin{array}{|c|} \hline f(\overline{f^{-1}(B)})\subset\overline{f(f^{-1}(B))}\\ \text{closed_in}(B,Y)\\ \hline \\ \hline \overline{f^{-1}(B)}\subset f^{-1}(B)\\ \hline \end{array} $$

Again we have options. We could expand the definition of set containment, introducing elements into the picture. But that would be a pity -- always we would like to argue at a high level if we can. We could rewrite the second hypothesis in terms of closures, but that doesn't seem to match anything just yet. Or another possibility would be to rewrite the target using the rule $A\subset f^{-1}(B)\iff f(A)\subset B$. That last option looks excellent because it creates a match for the term $f(\overline{f^{-1}(B)})$ and also the variable $B$ not as part of a bigger expression, raising the hope of a match with the second hypothesis. We could also use the same rule to rewrite the first hypothesis, but that creates a monster term $f^{-1}(\overline{f(f^{-1}(B))}$, so seems less desirable, though not terrible. So let's go with rewriting the target.

$$ \begin{array}{|c|} \hline f(\overline{f^{-1}(B)})\subset\overline{f(f^{-1}(B))}\\ \text{closed_in}(B,Y)\\ \hline \\ \hline f(\overline{f^{-1}(B)})\subset B\\ \hline \end{array} $$

And now we've got ourselves into a situation where we can do library modus tollens. The first hypothesis, together with the library statement that set inclusion is transitive, allows us to replace the target. This is quite clearly the best thing to do as it eliminates a rather complicated term from the problem state and justifies the effort we put into finding a match.

$$ \begin{array}{|c|} \hline \text{closed_in}(B,Y)\\ \hline \\ \hline \overline{f(f^{-1}(B))}\subset B\\ \hline \end{array} $$

We have now reached the point where rewriting the hypothesis "closed_in$(B,Y)$" looks like a very good idea. It will create a match for $B$, and better still it will do so on the right side of a set inclusion, so will allow us to do another library modus tollens. So let's go ahead.

$$ \begin{array}{|c|} \hline \overline B\subset B\\ \hline \\ \hline \overline{f(f^{-1}(B))}\subset B\\ \hline \end{array} $$

And now the library modus tollens move.

$$ \begin{array}{|c|} \hline \\ \hline \overline{f(f^{-1}(B))}\subset\overline{B}\\ \hline \end{array} $$

Now we are in a strange situation where we have no hypotheses. What are our options? One is to expand the set inclusion in terms of elements, but again we are very reluctant to do that. Another is to go hunting in the library for hypotheses that we could bring in. And there's an obvious one, namely the monotonicity of the closure operation, which matches this target extremely well. So library modus tollens reduces the problem to the following.

$$ \begin{array}{|c|} \hline \\ \hline f(f^{-1}(B))\subset B\\ \hline \end{array} $$

Note that this is clear progress, because it has eliminated the notion of closure from the problem, thereby reducing it to a set-theoretic one.

And now we finish off very quickly, since the set-theoretic statement in question is already a library result!

Translating the above sequence of deductions into natural language

We are trying to build programs that do mathematics in a human way. A sign of success will be if it is easy to convert its raw output into a form that reads in a similar way to what a human mathematician might say if giving a running commentary on how they were solving the problem. The above sequence of moves translates into something like the following. (I'm not using an algorithm to do this, but from my experience with ROBOT I think it would not be too hard to create one that produced natural-language output that was a reasonable approximation to what I will now write.)

Let $X$ and $Y$ be topological spaces and let $f:X\to Y$. We are assuming that $f(\overline A)\subset\overline{f(A)}$ for every subset $A$ of $X$, and we would like to prove that $A$ is continuous.
Since the hypothesis is about closures, and closures are about closed sets, let us use the closed-sets expansion of continuity. That is, we would like to prove that for every subset $B\subset Y$, if $B$ is closed in $Y$, then $f^{-1}(B)$ is closed in $X$.
So let $B$ be a closed set in $Y$. We would like to prove that $f^{-1}(B)$ is closed in $X$.
The only information we seem to be able to obtain about $f^{-1}(B)$ is what we get if we apply the hypothesis, so let's do that. It tells us that $f(\overline{f^{-1}(B)}\subset\overline{f(f^{-1}(B))}$.
Our hypothesis involves $\overline{f^{-1}(B)}$. It would therefore be nice to reformulate the target in a way that also involves $\overline{f^{-1}(B)}$, which we can do, as it is equivalent to the statement $\overline{f^{-1}(B)}\subset f^{-1}(B)$.
Continuing in this vein, we can reformulate the target in such a way that it involves the expression $f(\overline{f^{-1}(B)})$, as it is equivalent to the statement $f(\overline{f^{-1}(B)})\subset B$.
That allows us to reduce the problem to that of showing that $\overline{f(f^{-1}(B)}\subset B$.
Since $B$ is closed, it is sufficient to prove that $f(f^{-1}(B))\subset B$. (Here I have used a library result that I forgot to include above, which allows me to streamline the argument slightly. But I could have followed the argument above more closely.) This is a clear simplification of the problem.
In fact, we are done, since it is a standard fact (and easy to prove) that $f(f^{-1}(B))\subset B$ for any set $B$ (assuming everything is defined appropriately).

Concluding remarks

What makes this problem difficult for a crude algorithm of the type sketched out in the previous post is that it relies quite heavily on library reasoning, and in some places there are several options for what can be done with library reasoning. In such a situation, it helps enormously to take into account desirable features of problem states that might be introduced (or undesirable features eliminated) when a given result is applied.

Of course, one could just do a brute-force search, and for this problem that would probably be manageable. But what is nice about features such as "contains a match for term T" or "eliminates the unmatched term U" is that they give us a good way of predicting which library results will be fruitful, without our having to go to the lengths of actually applying them.

A small remark here is that not all matches of terms are useful, and one can if one wishes bring in more refined considerations. For example, suppose we want to prove that $\overline{f^{-1}(B)}$ is contained in some other set. If we manage to obtain an additional hypothesis of the form $C\subset\overline{f^{-1}(B)}$, that will not help us, as it is the wrong way round. So we need a match to be "on the right side". Such monotonicity considerations can help us avoid getting excited by spurious matches, though for many problems it seems that when matches are generated, they naturally appear on the right side, so one can often get away with not worrying about monotonicity.

My aim in this post was not to present (even informally) an algorithm that uses features of the problem state, but simply to highlight some of the features that such an algorithm might well want to use. There is no single right approach to exactly how those features should single out the best move at each step, and I imagine that until our theoretical understanding advances, we will end up using quite a lot of trial and error to find good approaches (such as scoring systems that weight moves according to the quality of the features they introduce or remove).

The post would have been very different if I had not assumed that the library contained several useful results. It is interesting to think about how a proof might be found in their absence. For example, at the stage where the target was reduced to $\overline{f(f^{-1}(B))}\subset\overline B$, if the library did not contain the result $C\subset D\implies\overline C\subset\overline D$, what might an algorithm be expected to do? Is it reasonable to expect it to conjecture both that that monotonicity holds and that $f(f^{-1}(B))\subset B$ for all sets $B$? Probably not. It is more likely that if it was asked to prove such a statement, it would resort to lower-level arguments, and in the process of applying those lower-level arguments would discover for itself that closure is monotone and that $f(f^{-1}(B))\subset B$. It would be wonderful if it could then go back and formulate those statements as lemmas. I believe that it will be possible to develop an algorithm with that kind of capability, but that is a longer-term goal, and discussion of it belongs in a future post.

I have ended up talking much more about features than about subtasks, but I do not see a sharp distinction between the two. Given any desirable feature, one can create a subtask "obtain that feature", and given a subtask one can define a feature "achieves that subtask". A feature becomes a subtask when one temporarily forgets about the main goal of reducing the problem state to TRUE and prioritizes the subgoal of obtaining the feature. If the use of a feature is more to increase one's confidence in a proposed move, then it remains more of a feature than a subtask.

← Previous Post Next Post →