Artificial Intelligence

Name: Abram Demski
Location: Reed City, Michigan, United States

I am just now graduating (this sunday). On the other hand, I might not edit this blurb for a while. I will be attending Central Michigan University in the fall, where I will study cognitive science and computer programming.

Tuesday, May 13, 2008

An interesting collection of videos--

http://videolectures.net/Top/

A great number of these are related to AI, although only a few are actually in the "ai" section. For example, there is a large computer vision section, and a larger one about data mining.

Wednesday, April 16, 2008

More AGI Goodies

http://people.csail.mit.edu/kersting/plmr/

The above website has a list of AI systems, all created with the goal of integrating logic and probability theory. This goal is more interesting than it sounds at first. There are trivial integrations of logic and probability theory (such as using logic to reason about probability) and there are non-trivial integrations. Nontrivial integrations are of great interest, because they allow algorithms from both arenas to be applied, they allow easy "softening" of existing hard-logic knowledge bases that prove to be too inflexible (amazingly, Cyc has started taking that strategy), and they open up new possibilities for learning and inference. All of the above systems are openly available (which is amazing!).

Of particular interest is Alchemy. Alchemy has a deceivingly simple-sounding and intuitive scheme: take a set of logical statements and attach a weight to each, representing how much the system should endorse each claim. The weights don't just range between 0 and 1 like probabilities, they can be as large as you like (infinity would mean absolutely true). Together, all of the propositions and there weights are transformed by the system onto a standard type of probabilistic model (a markov random process), which can be reasoned about using well-established algorithms. However, the group also has invented some amazing-sounding algorithms of their own...

Labels: , ,

Friday, April 11, 2008

This is old news now, but the First Annual AGI Conference occurred recently. This is fairly exciting in and of itself, but more exciting is that all the papers are available free online. This provides a very interesting snapshot of what the emerging "AGI Community" is and will be about.

For those who don't know, AGI is a term created to distinguish general AI (aiming at human-level intelligence in a broad range of tasks) from narrow AI (which aims for high performance on some single, specialized task). Typical narrow-AI applications include chess playing programs, stock-market forecasters, face recognition software, and generally anything else AI is used for these days. A typical example of AGI (standing for artificial general intelligence) might be a project aiming to make a "baby machine" (an AI that is good at nothing, but could be trained to do anything).

http://www.agiri.org/wiki/index.php?title=Artificial_General_Intelligence

Friday, March 28, 2008

Godel incompleteness is not a sufficient measure of completeness.




To begin this argument, I need to talk about Skolem's Paradox, an intriguing result concerning the foundation of mathematics.

http://en.wikipedia.org/wiki/Skolem's_paradox

Skolem's Paradox shows that the logical constructions of set theory will always be insufficient, in a way slightly worse than Godel's Incompleteness. The problem is that any logical characterization of set theory fails to characterize sets completely. The same rule applies to many entities. For example, any attempt at a logical description of the notion "Turing Machine" will similarly fail. Also "number", "connectedness", "higher-order predicate", and "logic" itself, to name a few. All of these cannot be described logically.

The basis for this claim lies in the fact that these entities cannot be described in first-order logic, which (if you haven't heard of it) is a restricted logic that doesn't fall prey to Godel's theorem (because it is not sufficiently powerful to represent its own rules, and so doesn't contain self-reference). 1st-order logic is often taught as if it were Logic, period. In fact, some have argued that this is the truth: after all, what is logic if there isn't a complete set of deduction rules? However, 1st order logic is too weak to describe the majority of mathematical entities.

Nonetheless, mathematicians try. What Skolem did was essentially call them on the error, using a nice theorem he and Lowenheim had developed. In fact, the same error is committed *whenever* a logic attempts to be anything more than 1st-order: all you're really doing is trying to make 1st-order logic talk about higher-order entities. (In a sense, this is *why* Godel's Theorem is true; that is, it's *why* the deduction rules are incomplete: they can't ever really do anything more than 1st-order logic.)

But this leaves a big gaping hole. How DO we reason about mathematical objects? If logical constructions fail, what are we doing? Math SEEMS logical. The easy way out is to act like humans have a magical ability to talk about these things, while any formal construction is doomed to fail. Luckily, this is not the only way out.

But the other way out is *much* more complicated! Sorry.

The other way begins with the notion of "limit computation". This is an idea in theoretical computer science, which (sort of) allows a computer to calculate things that it cannot normally calculate. The prime example of something a computer cannot calculate is the halting problem.

http://en.wikipedia.org/wiki/Halting_problem

The halting problem is very much like Godel's theorem. The two seem to me to be the same theorem developed in different settings. Godel's theorem says that there will be truths about formal logic that formal logic cannot deduce, while the halting problem shows that there will be facts about computer programs that computer programs will be unable to calculate.

Perhaps it seems odd that we can have a mathematically well-defined value but be unable to compute it, just as it seems odd that we have well-defined mathematical entities with no proper logical characterization. The intriguing thing, though, is that the halting problem is "limit-computable".

http://en.wikipedia.org/wiki/Hypercomputation
http://en.wikipedia.org/wiki/Zeno_machine

A limit-computer is a computer that is allowed to revise its output. The program never halts; instead, the output "converges" to the correct answer.

This convergence occurs in a finite amount of time, but the problem is that we don't know for sure *when* it has occurred; so to get guaranteed results, we've got to wait forever. But giving up the sureness of finite-time computation allows us to construct programs that do what others cannot.

So, translate this back to logic. We can increase the number of entities we can reason about by allowing facts to be revised: we may make a wrong inference, so long as any wrong inferences will be corrected along the way. This is called a trial-and-error predicate.

http://www.jstor.org/view/00224812/di985142/98p0302e/0

But there's more! :)

Just as no halting program can solve the halting problem, no converging program can solve the "convergence problem". Just as we can ask if a normal program halts, we can ask if a limit-program converges to an answer or keeps revising its answer forever.

But just as we can solve the halting problem by resorting to limit-computers, we can solve the convergence problem by resorting to an augmented limit-computer that has access to another computer (meaning it can run as many limit-computations as it likes). Equivalently, we can give the computer as many infinities as it needs, rather than just one, to converge (which again amounts to it being able to run as many limit computations as it likes). In fact, we can give a computer larger and larger infinities of computation time, resulting in the ability to compute more and more things.

The question arises: if we give the computer "as large an infinity as it likes", can we compute any mathematically well-defined value? I do not know the answer. But it *sounds* reasonable...

If we're willing to grant this wild assumption, then we can again transfer this development to the logical domain. Essentially, we allow trial-and-error predicates to be defined in terms of other trial-and-error predicates. This gives up the guarantee that all fallible inferences will be eventually corrected (unless by "eventually" we mean "after arbitrarily large infinities of time have passed").

Why is all this in a blog about AI?

Well, if I'm right, then the "magical" quality that humans posses and formal systems do not is the ability to make fallible inferences. Any AI based in infallible logic would be unable to understand mathematics, but an AI that included a good fallible reasoning system would be able to. Perhaps this comes automatically with any good learning algorithm, but perhaps not; perhaps only learning systems with very specific properties are sufficient. This needs further research! One avenue is "nonmonotonic logic", which is very similar to the logic I'm proposing.

http://en.wikipedia.org/wiki/Nonmonotonic_logic

However, standard nonmonotonic logic doesn't have quite as much machinery as I want... I think it is equal to normal limit-computation, rather than the forms of computation involving larger infinities.

But that's enough speculation for today.

Friday, March 07, 2008

I recently read an article called "Complex Systems, Artificial Intelligence and Theoretical Psychology" by Richard Loosemore. The argument it makes goes something like this:

1. A "complex system" (referring to complex systems science) is one that displays behavior that cannot be predicted analytically using the system's defining rules. (This is called, in the paper, a global-local disconnect: the local rules that play the role of the "physical laws" of the system are disconnected analytically from the global behavior displayed by the system.)

2. The mind seems to be a complex system, and intelligence seems to be a global phenomenon that is somewhat disconnected with the local processes that create it. (Richard Loosemore argues that no real proof of this can be given, even in principle; such proofs are blocked by the global-local disconnect. However, he thinks it is the case, partly because no analytical solution for the mind's behavior has been found so far.)

3. The mind therefore has global-local disconnect. Richard Loosemore argues that, therefore, artificial intelligence cannot be achieved by a logical approach that attempts to derive the local rules from the list of desired global properties. Instead, he proposes an experimental approach to artificial intelligence: researchers should produce and test a large number of systems based on intuitions about what will produce results, rather than devoting years to the development of single systems based on mathematical proofs of what will produce results.


I agree with some points and disagree with others, so I'll try to go through the argument approximately in order.

First, I do take the idea of a complex system seriously. Perhaps the idea that the global behavior of some systems cannot be mathematically predicted seems a bit surprising. It IS surprising. But it seems to be true.

My acceptance of this idea is due in part to an eye-opening book I recently read, called "Meta math: the quest for omega", by Gregory Chaitin.

Chaitin's motivation is to determine why Godel's Theorem, proving the incompleteness of mathematics, is true. He found Godel's proof convincing, but not very revealing: it only gives a single counterexample, a single meaningful theorem that is true but mathematically unprovable. But this theorem is a very strange-sounding one, one that nobody would ever really want. Perhaps it was the only place where math failed, or perhaps math would only fail in similarly contrived cases. Chaitin wanted some indication of how bad the situation was. So he found an infinite class of very practical-sounding, useful theorems, all but a handful of which are unreachable by any formal logic! Terrible!

Perhaps I'll go through the proof in another post.

Chaitin shows us, then, that there really are global properties that are analytically unreachable. In fact, in addition to his infinite class of unreachable-but-useful theorems, he talks about a global property that any given programming language will have, but which is analytically unreachable: the probability that a randomly generated program will ever produce any output. This probability has some very interesting properties, but I won't go into that.

I think the term Chaitin used for math's failure is somewhat more evocative than "global-local disconnect". He uses the term "irreducible mathematical fact", a fact of mathematics that is "true for no reason", at least no logical reason. Notice that he still refers to them as mathematical facts, because they are true statements about mathematical entities. As with other mathematical facts, it still seems as if their truth would be unchanged in any possible world. Yet, they are "irreducible": logically disconnected from the body of provable facts of mathematics.

So, math sometimes cannot tell us what we want to know, even when we are asking seemingly reasonable questions about mathematically defined entities. Does this mean anything about artificial intelligence?


Another term related to global-local disconnect, this one mentioned in the paper by Loosemore, is "computational irreducibility". This term was introduced by Stephan Wolfram. The idea is that physics is able to predict the orbits of the planets and the stress on a beam in an architectural design because these physical systems are computationally reducible; we can come up with fairly simple equations that abbreviate (with high accuracy) a huge number of physical interactions. If they were not reducible, we would be forced to simulate each atom to get from the initial state to the final result. This is the situation that occurs in complex systems. Unlike the "mathematically irreducible" facts just discussed, there IS a way to get the answer: run the system. But there are no shortcuts. This form of global-local disconnect is considerably easier to deal with, but it's still bad enough.

It's this second kind of irreducibility, computational irreducibility, that I see as more relevant to AI. Take the example of a planning AI. To find the best plan, it must search through a great number of possibilities. Smarter AIs will try to reduce the computation by ruling out some possibilities, but significant gains can be made only if we're willing to take a chance and rule out possibilities we're not sure are bad. The computation, in other words, is irreducible-- we'll have a sort of global-local disconnect simply because if we could predict the result from the basic rules, we wouldn't need the AI to find it for us.

So it seems Loosemore was right: intelligence does seem to involve complexity, and unavoidably so. But this sort of irreducibility clearly doesn't support the conclusion that Loosemore draws! The global-local disconnect seems caused by taking a logical approach to AI, rather than somehow negating it.

But there is a way to salvage Loosemore's position, at least to an extent. I mentioned briefly the idea of shortcutting an irreducible computation by compromising, allowing the system to produce less-than-perfect results. In the planning example, this meant the search didn't check some plans that may have been good. But for more complicated problems, the situation can be worse; as we tackle harder problems, the methods must become increasingly approximate.

When Loosemore says AI, he means AGI: artificial general intelligence, the branch of AI devoted to making intelligent machines that can deal with every aspect of the human world, rather than merely working on specialized problems like planning. In other words, he's talking about a really complicated problem, one where approximation will be involved in every step.

Whereas methods of calculating the answer to a problem often seem to follow logically from the problem description, approximations usually do not. Approximation is hard. It's in this arena that I'm willing to grant that, maybe, the "logical approach" fails, or at least becomes subservient to the experimental approach Loosemore argues for.

So, I think there is a sort of split: the "logical" approach applies to the broad problem descriptions (issues like defining the prior for a universal agent), and to narrow AI applications, but the "messy" approach must be used in practice on difficult problems, especially AGI.

Friday, February 15, 2008

Well, I think I've come up with something of an answer. I want a prior to be interpreted as a frequency estimate on possible worlds. This sounds funny, because we can't possibly estimate such a frequency: we only live in one world. But this is actually just fine: we shouldn't be estimating it, because it's our prior. It's what we use to estimate.

Anything more we learn, we learn using our prior. So can't improve upon our prior. If you've got a bad prior, tough luck.

A prior is an estimate of the frequency of alternative worlds. The perfect prior would contain all knowledge we ever needed; it would give our actual world-of-birth a probability of 1, and all other worlds, 0. But no two people are born to the same world, so evolution couldn't find this prior. (By the way, we could also view evolution as using a prior-- this prior is given to it by the very nature of chemistry and physics, and is not very good, but far better then it might have been.) So a slightly weaker and more useful notion of the perfect prior would be one that would do a fair job if all humans had it. Forcing all humans to have the same prior (which is close to being true) causes the perfect prior to have far more interesting structure (ie some learning occurs), although it still would have freakish foresight for things common to all humans (it would know the one true physics, for example).

Since what I'm interested in is learning, I want some way of ruling out this freakish foresight: I want to talk about a universal prior, one that will learn well no matter what the true physics turns out to be (and so on). I'm rejecting the Solomonoff prior because I think computability is too strict a requirement, but I also know that some restrictions are needed (otherwise there is no structure for the prior to take advantage of). What kind of a universal prior is this? And once I've figured that out, is it really of any use?

Labels:

Saturday, February 09, 2008

More on the Interpretation of Probability

In my previous discussion, I have failed to distinguish between logical necessity and physical necessity. This distinction is critical for my analysis requirement 2: a probability is a statement of uncertainty that could always be turned into certainty given more information. In my first post on the interpretation of probability, I talked about alternatives (A) and (B):

(A): The universe has some set of uncaused events.

(B): The universe does not have uncaused events.

(A) implies that there are actual alternatives: the universe could have happened differently, if these uncaused events happened differently.

(B) implies that there are no such alternatives. However, this comes at the cost of implying an infinite chain of causes. Worse, as I reasoned before, we should still desire a cause even of this infinite chain; an infinite past history is not enough, because we also need an explanation of why that particular infinite past is the one we have. Furthermore, we need a cause for this cause, and so on.

Now comes the new stuff. When I stated this, I had in mind that the explanation for the infinite past was physics. This isn't quite sufficient: physics needs to be supplemented by a single completely-specified time-slice. Then a deterministic physics can specify the infinite past and future for us. So allowing this, we further ask: why does the universe contain this particular physics? To answer this, we create a meta-physics that specifies for us an explanation of why physics is the way it is. (My impression is that there is work in theoretical physics corresponding to this desire.) Again, in addition to a meta-physics, we need something like a physics-slice sufficient to specify the rest of physics (that is, a minimal number of variables that determine the rest via the meta-physical laws).

(I should note that my concept of "meta-physics" is not the typical concept of "metaphysics": metaphysics is a philosophical pursuit, but meta-physics is essentially very abstract physics, which worries about things like how we might predict the existence of electrons from base principles.)

The chain continues to meta-meta-physics, meta-meta-meta-physics, et cetera ad infinitum.

All of these things are akin to physical necessity. Each fact is determined by some law of a corresponding physics (except for the determining slices). However, as the sets of rules get more abstract, it seems as if we will hit the ceiling of mathematical necessity. This may have happened already: the physical theory of A. Garrett Lisi (which from what I know looks like what I called a meta-physics) describes our (meta-) physics as an algebra, so that the obvious alternatives are different algebras. (As I understand it: our physics obeys the algebra E8, but E8 does not fully determine our physics; in addition, we need information about symmetry-breaking. So E8 is the meta-physics and symmetry breaking information is the minimal set of variables needed to determine everything else about the physics. But don't take my word for fact.) The algebraic alternatives are governed by mathematical laws. The mathematical laws are governed by logic. So logic may be as little as 3 metas away! (E8 is meta-physics, so math is meta-meta, so logic is meta-meta-meta-physics.) This seems to stop the questionable infinite regress: it seems at least plausible for logic to be uncaused and unexplained.

But even this doesn't tie us down to one possible world, namely because of all those minimal slices that we specify along the way. These do the real work of specifying everything; logic's being "at the top" is only a convenient trick. Presumably we could forever play the game of navigating plausible infinite regresses of explanation and plausible places for the regress to stop, but it seems that we could never find a real end to it. (This is particularly chronic if we start to question which logic is at the top, i.e. classical vs intuitionistic vs many other possibilities, and why that logic is at the top, i.e. do we need a meta-logical theory?)

Therefore we are forced into (A)! There must be fundamentally unexplained things, and therefore actual alternatives.

By the way, I don't think that exempts humanity from the investigation of the hierarchy I've described ultimately topped by logic: this hierarchy seems very important despite its ultimate futility (we cannot in principle explain everything, but we must still explain as much as possible).

Concluding (A) does not force me to give up (1) or (2). In particular, I had previously assumed that it went against (2), because I took "more information" to mean causal (or explanatory) information. This is unnecessary; all I need to say is that all meaningful statements are either true or false. Thus the "more information" may merely be the fact itself. As an example, suppose that some physical events really are random: physical law dictates probabilities but not the definite outcome. What I'm saying is that there still is a definite fact of the matter, although not determined by physical law; if we knew everything there is to know, we would know the outcome.

There are still some clarifications needed. I have dealt with issues arising from my first post, but there are also difficulties with what I said in my second post. Basically, in that post I trapped myself into a fundamental confusion that always will arise for those that try to do away with the concept of a bayesian prior. This is hinted at in the infinite regress I get into when I try to take into account uncertainty about relevant information. At some point, if the probability estimate given is to be coherent, the person must invoke a prior belief concerning each probability. So to revise the conclusion I reached there:

A probability (used by a person) is (1) a belief concerning a frequency, (2) a statement of uncertainty given limited information which can always be turned into certainty given more information, and (3) based on some prior belief (updated by the limited information available).

The image here is that our limited information narrows down the space of possible worlds somewhat, and that we hold beliefs about the frequencies of events in the remaining possible worlds. So, for example, if we assert a 50% probability that our friend will buy a pink car, we mean that (given our prior) our information narrows us down to a space of possible worlds such that in about half our friend will get a pink car.

I want to interpret the prior probability in a way consistent with the way I interpret the beliefs formed using the prior plus evidence. Otherwise, it doesn't seem like a full interpretation. I think the best way of doing this, in line with the idea that a probability is a frequency estimate, is to say that the prior is the person's estimate of the frequency of all possible worlds.

I admit this sounds strange. Accepting (A) forces me to say that there are actual alternatives, meaning possible worlds. It seems somewhat reasonable to attach probabilities to these (by attaching probabilities to the uncaused facts). But to go a step further and call these frequencies? Does this make sense?

I suppose I'm forced to leave this question open for now. I think the answer is yes, but my only reason for thinking so is the way it simplifies the whole scheme.

Labels: