Artificial Intelligence as Creative Pattern
Recognition is not Enough
Sao Paulo - Brazil
In this paper, I present some hypotheses that investigate intelligence as being dependent not only on the recognition of regularities in sensory inputs but also on the creative manipulation of these recognized regularities. It is argued that common sense is a by-product of intelligence and that it should not be thought of as the starting point in the construction of an artificial system. Some evidence from neuroscience and cognitive psychology is used to suggest that the human mind does not work using logical principles and that the creative grouping of patterns into higher level structures is the root of the intelligent process. I briefly discuss the role of creativity and analogy in these pattern manipulations and their importance for the overall process. The grouping of patterns into templates, and of these into rules, shows one alternative for the growth of intelligence from the basic percepts of the agent. A simple model of the mind is presented to consolidate the proposed architecture.
In this somewhat speculative paper I try to put together some ideas about AI, its problems and potential solutions. Often, my viewpoint will be that of cognitive psychology. At other times it will be that of an engineer trying to solve a problem, or that of a scientist in search of a formalizing path using the "clues" given by former work on the matter. But frequently it will be that of an Artificial Intelligence practitioner trying to combine the best ideas of several distinct fields. This multidisciplinary and somewhat disorganized way of seeing AI is, I think, perhaps the best way to tackle this enormously complex problem.
2. Common Sense is not Intelligence
The greatest difficulty of Artificial Intelligence to date is that of making computers reason with ordinary common sense. Everybody knows that to enter a room one must first open the door. When we talk about common sense and AI, one of the first things that comes to mind is Douglas B. Lenat's project CYC (CYC 1998, Lenat 1990, Whitten 1995, Mahesh et al. 1996, Yuret 1996). It is, without any doubt, the most representative and ambitious project on common sense mechanization developed so far. The first paragraphs of the Preface of Lenat and Guha's book are well worth quoting:
"In this book, we would like to present a surprisingly compact, powerful, elegant set of reasoning methods that form a set of first principles that explain creativity, humor, and common-sense reasoning -- a sort of "Maxwell's Equations" of Thought. Yes, we'd like very much to present them but, sadly, we don't believe they exist. We don't believe that there's any shortcut to being intelligent; the "secret" is to have lots of knowledge. Knowledge doesn't have to be just dry almanac-like facts, of course; much of what we need to know to get by in the real world is dynamic: problem-solving methods and rules of thumb." (Lenat 1990) page xvii
Yes, I believe that the secret is to have lots of knowledge. Common sense reasoning demands a great quantity of intertwined knowledge of facts about the real world, facts with subtleties that we are so used to dealing with that we hardly perceive them. However, is this knowledge (or the mere possession of this knowledge) all that is necessary to achieve intelligence? In other words, is intelligence the result of just a bunch of interconnected facts about the world? I don't think so, and in this paper I will try to present some thoughts about this point of view. Let me start by introducing a short story.
Imagine a farmer working in the fields. He was raised in a humble family, with few possessions, but since childhood he kept a vivid interest in everything that happened around him. He loved designing mechanical artifacts with levers, pulleys and strings. He was always curious, exploring his neighborhood and noticing the diversity of plants and animals around his house. He never had a chance to get a formal education, being barely capable of signing his own name, but nevertheless he managed to grow up with a solid understanding of how the things of the world work. As an adult, because of his inventiveness and the success of his enterprises, he was able to buy his own farm and had a comfortable and productive life.
At the same time as the farmer, another child was born, into a rich family living in a large city. The child grew up surrounded by all the care available in modern society. Unfortunately, because of the constraints imposed by large and violent cities, the child had to spend most of his time inside a safe (but boring) apartment. Although the child didn't demonstrate any special curiosity about the things that surrounded him, he managed to study at the best schools and eventually graduated from a well-known university. After that, he got his Ph.D. in an obscure field, with the bare minimum to pass. His life went on in much the same way, with nothing that could be classified as an interesting achievement.
This fictitious story sets the scene for our (difficult) question: who is the more intelligent person, the humble farmer or the Ph.D.? Did the formal education and knowledge of the Ph.D. help him to be more intelligent? Faced with the same new and unknown problem, which of them would present the better solution?
The quest for mechanized common sense can lead us into thinking that intelligence emerges from lots of accumulated knowledge. But it does not seem to be so. It seems to be exactly the opposite: the intelligent entity is the one who will end up accumulating more valuable knowledge, because it will be more capable of deriving useful information from the flood of raw data that it receives through its senses. As we will see, this notion is of crucial importance to the main argument of this paper.
2.1 Knowledge Representation and Inference
One of the most basic principles of symbolicist approaches to AI so far requires that there be a place for two distinct components in an intelligent architecture: the knowledge representation and the inference engine. This division is in fact very useful for our approach to the problem, because modularity seems to facilitate the design and implementation of complex systems. However, this division can mask the real nature of intelligence. Is it the knowledge representation? Is it the inference procedures? Is it a magic effect of the combination of both?
Later in this paper I will present some speculations on a different point of view: that knowledge representation and inference should be almost indistinguishable from one another and that the intelligence a system presents is the result of a simultaneous growth in its capacity to represent and to reason. This approach seems to point in the direction of connectionist systems, in which this integration is achieved naturally. In fact, several connectionists see no other way of obtaining intelligence. However, I will suggest that it is possible to find non-connectionist mechanisms that exhibit similar behavior, with the added benefit of being more "economical" in terms of hardware requirements(1).
3. Limits of Disembodied Systems
We usually forget that the hardest problems our brain has to solve are perceptual and motor ones. This awareness is the result of recent contributions (of the last decades) from neuroscience. Coordinating limbs adaptively and with precision, walking, running, identifying objects through vision, processing sound, identifying thousands of smells: all these tasks are very difficult problems that consume a lot of the neurons of our brain and occupy most of our efforts during childhood. Yet these are not the capacities that distinguish us from other animals. Monkeys, for example, may have perceptual and motor mechanisms comparable to (or even better than) those of most of us, because they live in an environment in which these abilities are fundamental for survival (jumping between trees, using their tail as a fifth limb, sensing audible cues indicating danger, etc).
Thus, it seems unreasonable to ascribe our distinguished intelligence just to better perceptual capacity. But the fact is that we are more intelligent than any other animal. There is, certainly, something more. What is it? Is it language? Could language explain our intelligence?
There's been a lot of thought on this matter. The main problem with this line of thought is that it amounts to saying that the most important (or even the only) form of thinking is language-related. This, too, has been the object of intense discussion. However, it is increasingly clear that a lot of our thinking does not come close to being language-related. We use a lot of visual analogies, we utter phrases using onomatopoeia, we often visualize spatial situations. Our qualitative thinking about physical systems is rarely linguistic (although the expression of what we were thinking frequently is). And we make abstractions and comparisons that are hard to express in language.
If language were at the center of our thinking, it would obviously be all we need to transfer any kind of knowledge from one person to another. This, indeed, seems obvious to a lot of people. But this is not what happens in practice. Language's most problematic characteristic is its difficulty in conveying sensory experiences. The next section groups some thoughts addressed to those who believe otherwise.
3.1 Limits of Language
Have you tried to teach somebody to ride a bicycle using just words? No, it's not possible. Without experimentation, without feeling the difficulty of balancing and the problem of coordinating hands, feet, etc., one would not learn how to ride a bicycle. The same happens when one is learning how to drive a car: you can't do it "by the book", you have to get inside one and exercise the controls, make errors and learn to correct them. Ditto for flying a plane. This knowledge is not acquirable just by reading. This knowledge is not transferable through language alone.
How is it that a mechanic can detect the malfunction of an engine just by hearing it? Could he teach us how to do this recognition without demonstrating the noise and without resorting to analogies between similar sounds? (You'll see in a moment why he can't use other sounds as examples for our thought experiment to be worthwhile.) This knowledge cannot be written down; it is not purely expressible in words.
This is why it is necessary to have laboratory experiments in college. This is why we have some kind of hands-on practice sessions in most technical subjects (mathematics being one noble exception, although the "practice" sessions are done with paper and pencil and inside one's mind).
I have never climbed up the mast of a large sailboat but I can imagine how it would feel (the wind blowing in my face, the slight dizziness due to the sway of the sea, the sun warming the body, the view of the horizon, the fear of heights, etc). Of course, these "envisioned feelings" are a far cry from the real sensations, but they are certainly a good approximation. An approximation that reveals how wide and sophisticated our internal "simulation" of the world can be. You probably understood my description because you have a set of sensations (acquired through other sensory experiences) that can compose this scenery for you. Now take a native of the Sahara desert, a person who has never climbed a tree and never seen the sea (not difficult to find there). He will have a hard time trying to understand what we felt, because he will not be able to compose the "picture", he won't be able to "run the movie" in his head, no matter how convincing and expressive our words are. The same happens when a man blind from birth tries to imagine our description of the variations in tone of the green in a forest. He will never fully understand it. That knowledge is not in the words. It is somewhere else.
Recall the sound the mechanic is trying to teach us: this is easy if he uses an analogy with a similar sounding event (squeaky, high pitched, etc). But what happens if his apprentice is deaf?
This kind of knowledge cannot be captured by classical disembodied AI. There's nothing we can do to put it inside a workstation PC (other than, of course, linking it to a video camera and a pair of arms).
Without further considerations, this could mean the end of the "traditional" approaches to AI, something that has been raised by several critics before. Should we all join the robotics guys?
3.2 Embodying Intelligence
Are robots the only way to achieve intelligent mechanisms? Do we need a body to have intelligence? No, I don't think so. I propose here to find another "culprit" for our intelligence. It is a kind of abstraction that we naturally derive from the repeating sequences of events that we perceive occurring in the world. It is a kind of abstraction that can have parallels with the plasticity of our brain. It is an abstraction that, with some obvious limitations, can be reproduced inside a conventional desktop computer. This is, perhaps, one of the most essential proposals of this paper.
As we suggested in the previous section, human level intelligence demands human level sensory perception. But artificial intelligence (in the same way that an airplane is an artificial bird) may eventually be constructed even without all that sensory equipment. It will not be able to perform like a human being, it will never understand some of our more "sensory" analogies, it will never understand our world the way we do. But my bet is that it will be able to do a lot of useful things, like a reasonable level of natural language understanding, an intelligent (and comprehensible) teaching tutor, a creative (and amazing) assistant to the scientist, a (serious) centralizer of corporate knowledge, a worthwhile technical help desk (with infinite patience), etc. These are subsets of the old dreams of AI. And all of these applications seem to fundamentally require the basic principles that stand behind intelligence.
3.3 Is Interaction With the World Necessary?
Some researchers put interaction with the world at the top of the priorities for the emergence of intelligence. Others think that just perceiving, watching the events unfolding before one's eyes, is enough. My position rests somewhere between the two.
I think that just perceiving the regularities (and working internally on the recognized perceptions) is enough. Just perceiving the regularities of the information brought by the "input" channels is enough to gradually develop some kinds of associations that end up as being considered intelligent, provided that this is done by a capable agent (which means, with enough "machinery" to perceive those regularities and group them into significant "chunks" and, from these, to be able to predict things).
On the other hand, just this "contemplative" position is not enough to provoke two other desirable aspects: measurability and efficacy.
Measurability: With a purely contemplative position, the agent may grow in intelligence but nobody external to it will be able to evaluate it. Evaluation is done by observing the agent interacting with the rest of the world. Let's take a look at one extreme example: say a rock is able to alter its internal quantum properties based on the "information" it captures from the environment (light, temperature, humidity, vibrations, etc). These internal quantum properties have a special "mechanics" to detect and organize the received patterns into rules and "theories" that can be used to give the rock predictive abilities. However, the rock does not tell us anything about this. So, for us, it remains a dumb, ignorant rock.
Efficacy: I believe that the possibility of interacting with the environment is a wonderful mechanism to "disambiguate" the information we (or any other agent that proposes to be intelligent) receive from the world. Just look at what we do when our computer stops working. After exhausting "external" inspection we may eventually open it and (suppose we know nothing about electronics) look inside, trying to figure out which of the various causes we are able to imagine is behind the problem. We are looking for things that can help us exclude several alternatives, to reduce our state-space search to a more manageable size. Even knowing very little about electronics, we may detect a bent pin in a flat cable. Or we may see that the gold contacts of one expansion board look a little bit dirty, which can provoke contact failure. What would happen if we tried to remove this dirt? Remove the dirt, delete it, erase it. And what if I used a rubber eraser on the gold contacts? Presto! We have just discovered one of the "tricks" of experienced technicians. This discovery is the result of several creative interactions with the environment (and a little bit of good luck too...).
A child may use a similar process when learning to recognize an unfamiliar animal: look from other angles, ask questions about characteristics of the animal, group it with what she previously knew about similar specimens, and so on. There is an important point that will receive further attention later: a fraction of this "reasoning" appears to happen unconsciously and automatically.
Summarizing: intelligence can emerge from a receive-only machine (sort of an eternal "voyeur") but it will be much more effective if this agent can ask questions, propose experiments, interact with the world, drop the ball, punch our nose to see if we bleed, etc.
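The receive-only machine described in this section can be illustrated with a very small sketch. It is only an illustration under strong assumptions, not a proposed architecture: the class name `PassiveObserver`, the event names and the choice of counting which symbol follows which are all hypothetical, standing in for "perceiving regularities" and "predicting things" in the simplest possible way.

```python
from collections import Counter, defaultdict


class PassiveObserver:
    """Hypothetical receive-only agent: it only watches a stream of symbols."""

    def __init__(self):
        # For each symbol, count which symbols have followed it so far.
        self.followers = defaultdict(Counter)
        self.previous = None

    def perceive(self, symbol):
        """Record one more observation coming from the input channel."""
        if self.previous is not None:
            self.followers[self.previous][symbol] += 1
        self.previous = symbol

    def predict(self, symbol):
        """Return the most frequently seen successor of `symbol`, or None."""
        seen = self.followers.get(symbol)
        if not seen:
            return None
        return seen.most_common(1)[0][0]


# The agent never acts on the world; it only receives events (assumed data).
observer = PassiveObserver()
for event in ["cloud", "rain", "wet", "cloud", "rain", "wet", "cloud", "sun"]:
    observer.perceive(event)

print(observer.predict("cloud"))  # the successor seen most often after "cloud"
print(observer.predict("sun"))    # nothing has been seen after "sun" yet
```

Note that the only way to evaluate what this observer has learned is to call `predict`, that is, to make it interact with something external, which is the measurability point made above.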
4. The Symbol Grounding Problem
Among the criticisms of traditional AI we have strong contenders like John Searle's Chinese Room argument (Searle 1980). Searle's argument is strong, but I want to highlight one initial observation:
"My discussion here will be directed at the claims I have defined as those of strong AI, specifically the claim that the appropriately programmed computer literally has cognitive states and that the programs thereby explain human cognition." (Searle 1980) page 67, Boden's edition
It would be very unproductive to enter into this old and extremely debated subject here. I just want to raise one doubt about the disbelief that computers could have cognitive states. Obviously, this is highly dependent on our definition of cognitive states. But as will be clear later, if the term is not taken anthropomorphically, it is not unwise to ascribe some cognitive abilities to some artificial architectures.
Stevan Harnad's Symbol Grounding Problem (Harnad 1990) is, in one view, an extension of Searle's criticism. It starts by proposing the "Chinese dictionary" as the first and only reference offered to an intelligent entity. From this, Harnad develops valid considerations about discrimination and identification to finally propose the necessity of "grounding" symbols in iconic representations and these, in turn, in the projections of distal objects on our sensory surfaces. This is enough, for Harnad, to propose a hybrid architecture in which symbolic elements sit on top of connectionist modules, which are responsible for supporting the concepts on the sensory ground.
Harnad uses the example of a "horse" as a symbol created by our mind in an effort of identification and categorization of the world (in this case, all the sensory impressions that we have about horses). It is interesting to me that an ordinary dog may not come up with an equivalent "symbol" in its brain, but it is obviously able to categorize things: domestic animals are very proficient at telling known people from unknown ones. They learn very fast what is animate and what is inanimate. This seems to be on the right track, if we assume that dogs also possess some sort of intelligence.
As we will see later, my proposal resembles some of Harnad's ideas, except for the important "point of support". Harnad is concerned, in other words, with building a house on solid ground. It will be clear later that my proposition lacks this naturally constructed solid ground. But that does not mean it won't give rise to a usable house: my proposal is to put it in orbit, just like a satellite: no ground, but stable, because it uses different principles for its support. What I'm after is the house, not its support.
Sensory perceptions are clearly, in the case of humans, the main driving force behind the emergence of intelligence, but they are not a fundamental part of its mechanics. It is interesting to conjecture what could happen if one could provide a "brain in a vat" (see Dennett 1978) with a series of impulses coming from a virtual, contrived world. If this virtual world had any kind of structure and regularity, it is reasonable to think that the brain would work toward finding and understanding it.
This paper is concerned with finding what this core, this essential mechanism, is and what basic (and irreplaceable) elements comprise it. It seems possible to argue that, within the limitations of specialized, dedicated intelligent artifacts, one can anchor that mechanism in an artificial form of symbol grounding, just like the virtual world mentioned. This issue, however, will be tackled in another paper.
4.1 What is the limit
One of the important questions addressed here seems to be the support of the strong AI hypothesis: can a text-only AI system develop a human-like kind of intelligence? As we have seen, without some sort of sensory input from the world the answer is, definitely, no, by no means. This conclusion took decades to be understood by the pioneers of the field and it is possible that even today some of them are not fully persuaded. It was this discovery that supported the flood of criticism that AI faced.
It is important to observe, however, that this does not mean that an AI system couldn't develop another kind of intelligence: a different kind that uses this text-only input as its unique "sensory percept" and, using properly constructed, artificially grounded symbols, is able to reason and understand enough of the world to be useful to us. It is certainly not a bird, but it could be our airplane!
5. Bottom-Up x Top-Down, Symbolicists x Connectionists
Now that I have equated, for the purposes of our AI system, text input with sensory input, it's time to delve into another highly debated issue. The Top-Down approach, the method used since AI was born in the mid-1950s, tries to model intelligent systems starting from their highest cognitive functions, hoping that they can later be mapped to each sensory input by a robot. The Bottom-Up approach starts from the basic sensory inputs and hopes that intelligence emerges naturally as the system gets more and more complex. Top-Down approaches are usually associated with the brittleness of Expert Systems, while Bottom-Up followers are seen as builders of insect-like "intelligence". One could think that both methods should meet halfway. According to general consensus, this is unlikely.
Most of the Top-Down approaches are built using symbolicist techniques and most Bottom-Up ones using connectionist techniques. This division created two opposing views of AI that rarely talk to one another.
On one side, for example, Fodor and Pylyshyn (Fodor 1988) defend with good arguments the symbolicist viewpoint of the productivity, systematicity and inferential coherence of symbol systems. On the other side are the connectionists (Rumelhart 1989, Elman, Miikkulainen, Smolensky 1989 and several others), with lots of good results in low level perception (speech perception, language parsing, past tense acquisition, aphasia-like effects, etc).
This kind of contrast also appears between psychology and neuroscience. But here, things started to get a little bit different. Interdisciplinary fields began to appear, giving rise to branches such as Cognitive Psychology and Behavioral Neurology, and culminating in the creation of a new science, Cognitive Neuroscience (for a complete and up to date introduction to the field see Gazzaniga 1998), which is developing rapidly nowadays.
It is not, though, totally insane to propose a similar kind of thing for the problems of AI: one sub-discipline that could focus on the intermediary area, between top-down and bottom-up.
That is one of the things I am proposing in this paper: a middle model. The most important part of intelligence, it will be suggested, is not in the lower level perceptual mechanism. Other animals have perceptual capabilities comparable to ours but they don't have comparable intelligence. Nor does the secret seem to be in the high level cognitive abilities of humans, as modeled by purely psychological or cognitive/symbolic methods (Newell's SOAR or Anderson's ACT). These capabilities seem to be the result of other, lower level capabilities. Besides, this is what has been done for decades in traditional AI, with little truly general progress.
The secret may be lurking in this middle level, the level that gives rise to higher level behaviors but, in humans, emerges from low level perceptions. In this paper, I suggest that this low level perception does not have to be "sensory", at least in the meaning we usually employ this word. It can be anything that is intrinsically coherent.
6. Memorization, Understanding and Knowledge
As we learn more and more about the world around us, we start to notice things that are not directly visible from our raw perceptions of it. Let's think like a child for a while.
As a child, I look at a tree and I'm told that it grows towards the sky. That's interesting, because if I drop a rock it tends to fall to the ground. Trees somehow defy this tendency. My parents are bigger than I am, and they say they were once small like me. They too grew up toward the sky. Both my parents and trees are said to be alive. So (among several other "wrong" conclusions the child infers), there's something that associates life with a direction opposite to that of other natural things, like this tendency to fall (gravity) and the usual immobility of rocks(2).
How does this conclusion survive while the wrong ones don't? Maybe it is because we get reinforcements, like when we see our dog: it moves, barks, does all sorts of things that our plastic superman toy doesn't. There's something very different about my dog and my plastic toys. Something different makes my dog walk and run and never stay quiet in one position. My plastic toy always stays in the position I left it. Dogs are like myself, and like my parents and like the trees. My plastic toys are like rocks.
This perception, hard to put into words, may remain buried inside a child's mind for quite some time. But the effect of its presence will certainly influence all the reasoning and future perceptions of this child. Life, that elusive word even to adults like us, happens to be an intuitive and natural concept in the mind of a child. Long before this child is able to learn all the words involved in this description, he will have the root of that knowledge firmly planted inside his brain. A knowledge that will add significantly to his common sense arsenal, to the point of suggesting what is alive and what is not by a simple visual inspection. Something that's utterly difficult for our computers to do (so far). So what am I trying to suggest here? Let me use someone else's words:
"As I see it, one does not learn information; one learns from information. But after learning from the information, one can very well forget all of the information, yet retain the knowledge acquired from that information. And one does not learn much by reading a textbook. You have to think about and analyze what you are reading in order to learn from it; reading is rarely sufficient." Neil Rickert, Northern Illinois University, newsgroup discussion
Yes! There seems to be something strangely incompatible between the declarative expressions of knowledge (no matter if given by language or first-order logic or semantic networks) and the kind of "intuition" that seems to be so natural in a child. That difference can also be noticed between somebody who has just read something and somebody who has read and thought about it. Something happens between the reading and the "thinking" that makes the difference between rote memorization of information and really understanding what it conveys. The mechanism involved in this process is what I'm willing to call intelligence, no matter how elusive it may appear at first.
7. Learning, Memory, Reasoning
What could be the minimum components of a system that can be said to present intelligent behavior? It must be able to learn, it must be able to store and retrieve information and it must be able to reason, to produce new conclusions. Let's spend some time on each of these aspects.
Frame-based knowledge representation systems were important introductions to the Knowledge Representation arsenal of a few decades ago. The ideas of inheritance, default values and others were seminal to several other KR schemes. But it is surprising that other important thoughts of one of its proponents, Marvin Minsky, were easily forgotten:
"Thinking always begins with suggestive but imperfect plans and images; these are progressively replaced by better -- but usually still imperfect -- ideas." (Minsky 1974) page 3
Everybody who works with AI knows about Tweety. It is that flightless bird, the penguin, used in almost all textbooks about AI, mainly in the section that deals with logic. They all seem to explain how logic solves the problem that Tweety, although a bird, is not able to fly. They say Tweety is abnormal, and the logical expression constructed with this new condition is this:
$\forall x,\ (\mathrm{Bird}(x) \land \lnot\mathrm{Abnormal}(x)) \rightarrow \mathrm{Fly}(x)$
Things start to get a little bit complicated if we buy a ticket from United Airlines and give it to Tweety (Ginsberg 1993). Is this the way a child learns things, through logical entailment? Is this the way our AI systems are supposed to work?
Imagine if our AI system started learning about birds and the first example given was that of a penguin. It can't fly, the system is told by the "teacher". So the system will assume that birds don't fly (A bad teacher? Maybe, but the system must be resilient to this). Later, the system learns about eagles. It will mark eagles as an exception, because it is told that they fly. Now it is told about pigeons, then owls, then flamingos, and so on. Should it mark all of them as exceptions? Or should it revise its default value?
I offer here another way to see the problem. The system does not learn that birds fly and that there are exceptions. The system learns that it has examples of birds that fly and examples of birds that don't, and that the number of flying birds appears to be much greater than the number of those that don't. This conclusion is also the result of perception: a perception of its own internal knowledge, in a reflexive manner. It may find out that "birds fly" is a good guess, but not a certain one. There's a good chance that a newly encountered bird will fly, but this is not for sure. If we are talking about birds that live in the desert, the answer will possibly be different. This is the way the world works: vague, inexact, cross-linked, context-dependent facts.
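This alternative to defaults and exceptions can be sketched in a few lines. The sketch below is only one possible reading, under stated assumptions: the class `ExampleMemory`, the context labels and the use of a simple fraction of examples as the "good guess" are all hypothetical choices made for illustration. The point it tries to show is that a bad first example (the penguin) does no lasting harm, because nothing is ever marked as a rule or as an exception.

```python
from collections import defaultdict


class ExampleMemory:
    """Hypothetical store of examples; no default values, no exception flags."""

    def __init__(self):
        self.examples = []  # every (name, flies, context) ever told to us
        # context -> [number of flying examples, number of non-flying examples]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, name, flies, context="anywhere"):
        """Record one example under its own context and under the general one."""
        self.examples.append((name, flies, context))
        for key in {context, "anywhere"}:
            self.counts[key][0 if flies else 1] += 1

    def chance_of_flying(self, context="anywhere"):
        """Fraction of known examples that fly; None when nothing is known."""
        flying, grounded = self.counts.get(context, (0, 0))
        total = flying + grounded
        return flying / total if total else None


memory = ExampleMemory()

# A "bad teacher" starts with the penguin (contexts are assumed labels).
memory.observe("penguin", flies=False, context="antarctica")
first_guess = memory.chance_of_flying()    # birds seem not to fly at all

# Later examples revise the guess without any exception marking.
memory.observe("eagle", flies=True, context="mountains")
memory.observe("pigeon", flies=True, context="city")
memory.observe("owl", flies=True, context="forest")
memory.observe("flamingo", flies=True, context="lake")

print(first_guess)                              # guess after the penguin only
print(memory.chance_of_flying())                # guess after all five birds
print(memory.chance_of_flying("antarctica"))    # context-dependent answer
print(memory.chance_of_flying("desert"))        # no examples for this context
```

The answer for a given context can differ from the general one, and a context with no examples yields no answer at all, which seems closer to "vague, inexact, context-dependent" than a single default value would be.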
And this seems to be the way children learn. They take a book and read about kiwis, the bird from New Zealand with vestigial wings and hair-like feathers. How should the child proceed from here? He/she will generalize (wrongly, in this case) that birds have hair instead of feathers, but will leave this "learning" open for future revisions, adapting the generalization when he/she finds other information or asks a "teacher" (parents, friends). We know about this: just try to remember how often we laugh and have fun with the things children say and ask. They seem to be fast at making some wrong (but funny) generalizations, but they are equally fast at correcting the wrong ones.
7.1.1 Learning Wrong Things
There's nothing wrong with learning the wrong things, as long as you keep your mind open to correct that information in the future. In fact, this kind of wrong learning is also necessary in experts (Minsky 1994).
My main point here is that there is no such thing as a good teacher. If the system is supposed to interact with the world, then it must be prepared to learn fragments of information that, more often than not, don't make sense at all. The best learning mechanism appears to be one that, besides being good at generalization (induction again!) and at extracting information from almost anything, also supports future completion and correction of past fragments.
Speaking of memory reminds any computer buff of things such as indexing and retrieval. We think about SQL, and B+Tree databases. We think about RAM and address lines. Hash tables and linked lists. Are these concepts useful to AI at the level we are discussing things? Let me propose one experiment.
Send your 5-year-old child to the jungle. Give him a cellular phone and tell him to find two different birds. Now, over the phone, tell him to describe the birds to you in a way that lets you identify and name both. He will tell you about the length of the legs, the size of the head, the color of the feathers, the curvature of the beak and more. Eventually, you will identify them as being, for example, a flamingo and an owl. The interesting point to note here is how the names popped up in your mind suddenly: they emerged at once (even if just as a strong suggestion), just as if we had found the position of the last piece of a jigsaw puzzle.
This is memory, in human terms. It seems to be a collection of "items" linked to one another in such a way that pulling one item brings lots of related stuff (as if we touched our finger on a very viscous fluid and slowly lifted it; see figure). Pulling in another point will bring to the surface another "peak". The moment of the discovery seems to happen when we find one (or more) links between these peaks.
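The "pulling" of linked items can be sketched very roughly as follows. The feature-to-bird links below are invented for illustration, and simple vote counting is only an assumed stand-in for whatever mechanism the brain actually uses: each described feature pulls every concept linked to it, and the concept pulled by most features rises to the surface.

```python
from collections import Counter

# Hypothetical links from perceived features to stored concepts (invented for illustration).
LINKS = {
    "long legs": ["flamingo", "heron"],
    "pink feathers": ["flamingo"],
    "bent beak": ["flamingo"],
    "large head": ["owl"],
    "hooked beak": ["owl", "eagle"],
    "brown feathers": ["owl", "sparrow"],
}


def recall(cues):
    """Each cue pulls every concept linked to it; return concepts ordered by total pull."""
    activation = Counter()
    for cue in cues:
        for concept in LINKS.get(cue, []):
            activation[concept] += 1
    return activation.most_common()


print(recall(["long legs", "pink feathers", "bent beak"]))       # flamingo comes first
print(recall(["large head", "hooked beak", "brown feathers"]))   # owl comes first
```

No single cue identifies the bird; the name emerges only when several peaks turn out to be linked to the same concept.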
When we think about all the links between the pieces of world knowledge that a person has, not only of birds but of cities, liquids, trees, time, geography, airplanes, history, dimensions, climate, metals, politics, animals, cars, peoples, computers, factories, professions, etc., we end up with a body of knowledge that immensely transcends the mere words in a lexicon and the ontological links between nodes. It is simply astronomical, and it is inside the head of an ordinary person, who spent several years acquiring it. My point here is that the words and the meaning connections between them don't seem to be enough (this is obviously not new). What seems important is not only the semantic connections but also the "usage" of connections, the links that clearly reflect the way the concepts appear in the world and the daily usage of phrases.
Logic has been proposed as the fundamental mechanism of thought for centuries. The logicist approach to AI, with John McCarthy among the most important proponents, is seductive in several aspects. After all, is there anything so elegant and simple to understand as modus ponens?
If x is a bird, then x flies     Bird(x) -> Fly(x)
Tweety is a bird                 Bird(Tweety)
So, Tweety flies                 Fly(Tweety)
Modus tollens may be a little bit less intuitive, but is still very appealing:
If x is a bird, then x flies     Bird(x) -> Fly(x)
Tweety does not fly              ~Fly(Tweety)
So, Tweety is not a bird         ~Bird(Tweety)
However, there are several psychological indications that our brain does not work this way. Modus tollens is particularly problematic. Note that one could claim that it is not fundamental that an AI system duplicate the way our brain works and that logic could be the "natural" way to evolve toward better thinking mechanisms. But given our advantage in common sense and analogical reasoning and the failures of logicist AI so far, it seems worthwhile to look closer at what psychologists say.
One of the contributions of Cognitive Science to AI was a new way to understand our reasoning. Cognitive scientists developed several experiments which shed new light on old subjects. One of the best known of these experiments is the Wason Selection Task (Thagard 1996, Wason 1966; see also Wharton 1998). The subjects are given a set of cards with a letter on one side and a number on the other. Only one rule is presented: if a card has an 'A' on one side, then it must have the number '4' on the other side. Now consider the following set of four cards (only one side shown):
A B 4 7
The subjects are asked to determine which cards one must turn to see if the rule still holds. The test shows that most people choose to turn card 'A', to see if there's a corresponding '4' on the other side. This is an application of modus ponens, which in our example says "If 'A', then '4'". Point to the logicists.
However, besides suggesting a check of the card with '4' on it (which logic does not require, since the rule says nothing about what must be behind a '4'), lots of people failed to check the card with '7', which would violate the rule if it had an 'A' on the other side. This should be very straightforward if our minds were logical, because it is an application of modus tollens (If 'A' then '4'; '7' is not '4', so by modus tollens, the other side must not be 'A'). This suggests that people are not oriented toward logical conclusions. More importantly, this could lead us to dismiss psychologically oriented investigations in AI and to correct this deficiency of human reasoning by implementing a more reliable reasoning mechanism in AI systems. However, there's more to know about this experiment.
Further investigations of the Wason Selection Task revealed new and interesting results. When the experiment is repeated using not meaningless letters and numbers but concepts that can be placed in a familiar context, the outcomes are different.
Imagine an adults-only bar that does not admit people under 18 years old. The equivalent of the rule "If A, then 4" shown above is: "If you are inside the bar, then your age is at least 18". Now suppose that we put on one side of each card the words INSIDE-BAR or OUTSIDE-BAR and on the other side the age of the person. The question is, which of the cards below would we have to turn to know if the rule has been broken?
INSIDE-BAR OUTSIDE-BAR 25 16
It is clear that we will have to check the card 'INSIDE-BAR', to see if there is somebody younger than 18 on the other side of the card. It's also clear that we should check the card with '16': it can be a minor trying to stay INSIDE-BAR. So why is this easier to deduce than in the first experiment? If our minds were ruled essentially by logic, there would be no significant difference between these experiments.
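From a purely logical standpoint the two versions of the task are the same problem, as the short Python sketch below tries to show. The function name cards_to_turn and the two predicates are illustrative assumptions, and the bar rule is read here as "if inside the bar, then at least 18". For a rule "if P then Q", only a card showing P (modus ponens) or a card showing not-Q (modus tollens) can reveal a violation.

```python
def cards_to_turn(cards, shows_p, shows_not_q):
    """For a rule "if P then Q", keep the cards whose visible side shows P or not-Q."""
    return [card for card in cards if shows_p(card) or shows_not_q(card)]


# Abstract version: "if 'A' on one side, then '4' on the other".
abstract = cards_to_turn(
    ["A", "B", "4", "7"],
    shows_p=lambda c: c == "A",
    shows_not_q=lambda c: c.isdigit() and c != "4",
)

# Bar version: "if inside the bar, then at least 18 years old".
bar = cards_to_turn(
    ["INSIDE-BAR", "OUTSIDE-BAR", "25", "16"],
    shows_p=lambda c: c == "INSIDE-BAR",
    shows_not_q=lambda c: c.isdigit() and int(c) < 18,
)

print(abstract)  # ['A', '7']
print(bar)       # ['INSIDE-BAR', '16']
```

The same one-line function solves both versions, which is precisely why the difference in human performance between them is so telling.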
Several explanations have been proposed for this kind of result. Cheng and Holyoak (Cheng 1985) introduced Pragmatic Reasoning Schemas. Johnson-Laird and Byrne (Johnson-Laird 1983) argue that deductive reasoning is done through mental models.
For our purpose in the present paper I will introduce one word that, hopefully, may subsume both terms: it is "pattern". As we will see later, my use of this word is diverse, depending on the "level" in which it is applied. Unfortunately, this word is also used in contexts different than the one I'm proposing here. For now, let's assume that pattern means "symbolic pattern".
8. Why Patterns?
Artificial neural networks (ANNs) are the connectionists' answer to the AI enigma. Because their constructs more closely resemble the neurons we have in our brains, it is argued that this is a better way to go than the symbolic approaches. Even though they are a far cry from real neurons, ANNs demonstrate some important concepts related to patterns: classification and recognition. The fact that ANNs use distributed representations of concepts is argued, by some, to be their main advantage. I tend to stick with another point: again, patterns. But instead of focusing just on the patterns we can find in raw input signals, I would like to propose finding a place for patterns at all levels of the cognitive architecture. This is different from the usual approach of using patterns just at the initial levels (closer to the sensory signals).
We are able to identify the mood of somebody just by reading what he writes, the "way" he writes. How does that work?
We are able to easily determine the best way to solve a hard problem if we happen to have solved similar problems in the past. Sometimes, we just can't explain why we solved a problem using a particular technique. How did we know it would work? It's not always clear. That's why specialist knowledge is hard to transfer: a "hands-on period" is necessary, in which the student learns something that cannot be put into words.
I envision patterns at all levels of our cognition. At one level, we may have thousands, maybe millions of learned patterns about, for example, discourse structure. At another, we will have hundreds of thousands of heuristic patterns, telling us what works and what doesn't. If we happen to come up very fast with the answer to a problem, it is as if we already had it solved in our memory.
As a simple example of patterns in high level positions, consider the following sequence of phrases:
i) I told you, this will not work
ii) We told you it's not going to happen
iii) As I told you, our department is not effective
After some time listening to these phrases and associating their occurrence with emotional responses from whoever uttered them (anger, impatience or criticism), it is natural that one comes up with the following internal "rule":
[somebody] "told you" [something] => Negative Emotional Involvement
This is a rule because it has one antecedent and one consequent. But it is also a pattern, because it presents conditions that match several occurrences. If one is listening "live" to a person uttering these phrases, the emotional involvement will be even clearer because of other details, such as voice tone, gestures and facial expression. Under those circumstances, you may perceive additional details that can make the difference between an angry response and an ironic comment. But if you are reading this on paper, you don't have these visual or auditory cues. Nevertheless, we are able to detect that there is some kind of emotional condition involved, even if we just read it.
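To make the double nature of this construct concrete, here is a minimal Python sketch that treats it as a pattern with two open slots and one consequent. The regular expression, the short list of possible speakers and the function name match_told_you are illustrative assumptions; a real agent would need something far more flexible than a regular expression.

```python
import re

# [somebody] "told you" [something]; the list of speakers is an illustrative assumption.
TOLD_YOU = re.compile(
    r"\b(?P<somebody>I|we|he|she|they)\s+told\s+you\b[,\s]*(?P<something>.+)",
    re.IGNORECASE,
)


def match_told_you(phrase):
    """Return the filled slots plus the rule's consequent, or None if there is no match."""
    found = TOLD_YOU.search(phrase)
    if found is None:
        return None
    return {
        "somebody": found.group("somebody"),
        "something": found.group("something").strip(),
        "consequent": "negative emotional involvement",
    }


for phrase in (
    "I told you, this will not work",
    "We told you it's not going to happen",
    "As I told you, our department is not effective",
    "She told me a story",
):
    print(phrase, "=>", match_told_you(phrase))
```

The first three phrases fill the slots and trigger the consequent; the last one does not match, so nothing is inferred from it.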
What I propose in this paper is that the majority of our knowledge, that "lots of knowledge" that Lenat referred to, is not in the form of logic expressions, explicit declarative assertions or simple rules but is, in fact, in the form of vague, fluid and flexible patterns, similar (but not limited) to the one shown. Not one, ten or one hundred, but millions of them. Of course, their organization, structure, format and links with one another are very important. To further define these aspects, we should keep one eye on the clues offered by human cognitive capabilities.
One of the most compelling and interesting knowledge representation formalisms introduced some decades ago was frames (Minsky 1974, Rosenberg 1977, Goldstein 1977). If you asked me which characteristic of frames I liked most, I would answer inheritance. It is a mechanism that naturally supports the beginning of inductive reasoning and seems to have some psychological plausibility. For the sake of the ideas presented in this paper, I will equate inheritance reasoning with pattern copying.
9. Structures and Stories
Suppose you started talking with a man at a bus stop. He told you some things about human relationships and the importance of education. He spoke to you about the difficulty of understanding children sometimes and the difficulty of making them understand you. After some time of dialog, you will have the feeling that this person speaks as if he had a kid. You ask him and he confirms: he is a "fresh" father, with a newborn baby.
This kind of "intuitive perception" happens frequently during our daily life. Instead of ascribing this to something elusive or mystical (frequently called our "inner voice"), we can get better explanations if we analyze the importance of stories in our lives and the way information is conveyed through them. Speaking of stories and Artificial Intelligence is the same as speaking of Roger C. Schank:
"It is likely that the bulk of what passes for intelligence is no more than a massive indexing and retrieval scheme that allows an intelligent entity to determine what information it has in its memory that is relevant to the situation at hand, to search for and find that information" (Schank 90) page 84
It is easy to note that any story can be seen, from a sufficiently high level, as one single pattern. Then, each element of that pattern can be "broken" into successive parts (which are also patterns), until we end with phrases which, in turn, can be thought of as being composed of syntactic patterns. Most of the words can be further broken into morphological structures.
Stories as patterns are not new. Scott Turner (Turner 1994) presented a computer model called MINSTREL in which storytelling is done in a creative way. However, one of the first observations Turner made was that form alone is not enough. Patterns of stories may convey form, but may fail miserably on content, to the point of being incomprehensible or even laughable. Meaning (or world knowledge) is the glue that holds each part of the form together. But what if meaning itself is also made of patterns?
10. Analogies and Fluid Concepts
Douglas Hofstadter wrote a whole book about concepts and analogies. Not simple concepts, but fluid concepts. Not simple analogies, but creative ones. It is still not entirely clear how creative ideas flourish in our brain, but we are getting closer:
"The essence of perception is the awakening from dormancy of a relatively small number of prior concepts -- precisely the relevant ones. The essence of understanding a situation is very similar; it is the awakening from dormancy of a relatively small number of prior concepts -- again, precisely the relevant ones -- and applying them judiciously so as to identify the key entities, roles and relationships in the situation. Creative human thinkers manifest an exquisite selectivity of this sort -- when they are faced with a novel situation, what bubbles up from their unconscious and pops to mind is typically a small set of concepts that "fit the glove", without a host of extraneous and irrelevant concepts being consciously activated or considered." Hofstadter (1995) page 210
Hofstadter's work gave rise to Copycat, an analogy-making program that operates on letter sequences (Hofstadter 1995, Mitchell 1993). Copycat operates using a process known as parallel terraced scan, a term coined by Hofstadter. This method, in summary, works by exploring several courses of action in parallel, initially in a highly nondeterministic, random order. As the depth of exploration increases, the method begins to give more and more attention to selected entries, those that seem to be more relevant. The random component is still present, but it is no longer preponderant. After some time, the essence of the processing goes from highly parallel and random to more serial and deterministic.
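The general flavor of such a scan (random and broad at first, focused and nearly deterministic later) can be illustrated with the toy Python sketch below. This is not Copycat's implementation: the function terraced_scan, the falling "temperature" schedule, the exponential weighting and the probe callback are all assumptions made only for illustration.

```python
import math
import random


def terraced_scan(candidates, probe, steps=200, seed=0):
    """Explore candidates broadly at first, then concentrate on the promising ones."""
    rng = random.Random(seed)
    estimate = {c: 0.0 for c in candidates}  # running average of probe results
    visits = {c: 0 for c in candidates}
    for step in range(steps):
        # High temperature: nearly uniform random choice. Low: nearly greedy.
        temperature = max(0.05, 1.0 - step / steps)
        weights = [math.exp(estimate[c] / temperature) for c in candidates]
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        visits[chosen] += 1
        # One shallow look at the chosen candidate refines its estimated promise.
        estimate[chosen] += (probe(chosen, rng) - estimate[chosen]) / visits[chosen]
    best = max(candidates, key=lambda c: estimate[c])
    return best, visits


# Hypothetical "true" promise of three courses of action, observed through noise.
promise = {"a": 0.1, "b": 0.2, "c": 0.9}
best, visits = terraced_scan(
    list(promise), probe=lambda c, rng: promise[c] + rng.uniform(-0.05, 0.05)
)
print(best, visits)
```

In this toy version the visit counts show the intended shift: every candidate receives some attention early on, while most of the later steps go to the candidate that has looked best so far.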
Copycat uses a slipnet, which is a connectionist network of concepts(3). Several codelets work over this network. Codelets are small programs that operate in a specific part of the problem and are run in parallel.
With this architecture, Copycat is able to develop interesting solutions to analogy problems in the letter domain (for instance, if abc maps to abd, what does ijk map to?). According to the proponents of Copycat's methods, these analogy problems are directly related to intelligence. Here is what Robert French, who was a member of Hofstadter's group, says:
"Perhaps the most fundamental assumption underlying the research presented in this book is that the cognitive mechanisms that give rise to human analogy-making form one of the key components of intelligence. In other words, our ability to perceive and create analogies is made possible by the same mechanisms that drive our ability to categorize, to generalize, and to compare different situations. These abilities are all manifestations of what I will refer to as 'high-level perception'. Unlike low-level perception, which involves primarily the modality-specific processing of raw sensory information, high-level perception is that level of processing at which concepts begin to play a significant role" (French 1995) page 1
This is a very important clue to explore: intelligence (at least in the human-like sense) appears to be directly related to this high-level perception of concepts, to which analogies contribute importantly.
But Copycat has some problems. The most important, in my opinion, is the fact that it does not learn. That is, it does not get better results with experience. Given the same problem, its answers may differ a little from run to run, because Copycat carries a random component. But one cannot say that it improves its answers on successive runs.
The other problem is that Copycat is not able to develop its own categorization. It does not create new concepts in its slipnet. As Gary McGraw says (McGraw 1995), Copycat uses "spoon-fed" roles, which causes it to be "doomed to succeed". We have seen that this is not good, because there's much to learn from one's mistakes. What would we get if we adapted Copycat's mechanism with other techniques that discovered new forms of combining the letters?
I'm proposing a mechanism that receives information from the external world and applies to this information pattern discovery algorithms that work using the concepts of Copycat's terraced scan and parallel exploration. But instead of using just "spoon-fed" rules, it uses "invented" ones too. From a set of input percepts (for example, a spoken phrase), the agent tries to generate lots of possible candidates for relations. Maybe tens, hundreds or even thousands of attempted correlations, not only between the concepts within the phrase itself but, more importantly, with everything else that is "near" the context in which it was uttered (the proximity of the context is very important). This somewhat "dumb" process, with a high random and mindless component at first, is somewhat similar to the initial phases of Copycat and to the perceptions of a newborn baby.
Now the important part is that the invented correlations are submitted to the same process: verification of regularities and things that can be seen again as patterns occurring regularly. This seems to be a mechanism that is able to discover which methods give good results. The methods themselves were produced by this "random induction". Their evaluation will follow the same steps.
After some time, most of the attempted correlations will die away, because there will be no reinforcements(4).
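A toy version of this "generate many, keep what is reinforced" cycle is sketched below in Python. The class CorrelationPool, the initial strength, the decay factor and the forgetting threshold are arbitrary assumptions chosen only to show the dynamics: a candidate that is confirmed grows stronger, and one that is never seen again fades until it is dropped.

```python
class CorrelationPool:
    """Holds attempted correlations; unreinforced ones fade and are forgotten."""

    def __init__(self, decay=0.8, boost=0.3, threshold=0.1):
        self.decay = decay          # multiplicative fading per experience
        self.boost = boost          # reward when a correlation is confirmed
        self.threshold = threshold  # below this strength a correlation dies
        self.strength = {}

    def propose(self, correlation, initial=0.5):
        """Add a freshly invented (possibly random) correlation if it is new."""
        self.strength.setdefault(correlation, initial)

    def experience(self, confirmed):
        """Process one experience: reinforce what was confirmed, fade the rest."""
        for correlation in list(self.strength):
            if correlation in confirmed:
                self.strength[correlation] = min(1.0, self.strength[correlation] + self.boost)
            else:
                self.strength[correlation] *= self.decay
            if self.strength[correlation] < self.threshold:
                del self.strength[correlation]


pool = CorrelationPool()
pool.propose(("told you", "negative emotion"))   # will keep being confirmed
pool.propose(("told you", "rainy weather"))      # a spurious coincidence
for _ in range(10):
    pool.experience(confirmed={("told you", "negative emotion")})
print(pool.strength)  # only the reinforced correlation is left
```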
Making a parallel with the human brain, this would be the stage in which the baby is now a child, trying to learn the first spoken words (I will return to this issue later). As the number of experiences grows, so does the number of "confirmed" patterns and "slipnet concepts" added. This is where the child's experience with the world helps a lot: the more the child interacts with it, the more confirmed patterns the child will get.
I like to compare this refinement process with that of disambiguation: sometimes, the child will receive conflicting information. Instead of throwing out one of the instances and keeping only the most probable, the child keeps both of them. In the future, there will be another significant experience (or the child may decide to ask somebody) that will provide the necessary disambiguation. Even when one pattern is preferred over the other, the "loser" pattern doesn't seem to be discarded: the child may end up discovering that the confirmation she received happened to be wrong. In fact, this happens frequently and reinforces the suspicion in the child that even confirmed things can be altered in the future. I believe that the perception of the conditions that caused this "bogus" confirmation in the first place is another important thing to be learned by the agent, in a way that will add up to the "natural" critical thinking that, on different scales, all human beings have.
It is in this phase that I believe analogy begins to be applied, although not consciously by the child. This mechanism appears to benefit a lot from comparison with known experiences. Facing new situations that appear to be similar to previous ones tempts one to use the same mechanisms that were used successfully before. When an analogy is found that almost fits the case at hand, it seems wise to apply to it a small "transformation" that may make it fit the current problem. This transformation can be "discovered" using the same principles we've been using so far: at first, randomly and in parallel, then successively in a more deterministic way. If one of these transformations (or a group of them) leads to good results, this transformation will be learned and stored as one method (heuristic?) with a fair chance of giving good results in forthcoming experiences (reinforcement may, in the future, classify this transformation as even more important). Again, patterns of transformations are discovered and confirmed mostly in the same way as their lower level cousins.
11. Patterns in, Rules out
In the previous section, we saw how patterns can be used to provide learning of experiences. But the process does not appear to stop there. After some time, the child's brain will perceive great similarities between discovered patterns. This perception may use the same mechanisms that were used to discover the initial patterns. The end result is finding a set of patterns that seem to have common points. Then, it is perceived that some patterns can be summarized into one or more templates.
A template may be seen as a special pattern that matches several other patterns. It is an expression of an inductive, generalizing mechanism. Recall that we have suggested that some of the patterns subjected to this process will not be the product of direct sensory input: they may have been generated, "invented" by the agent. The same process of invention should be applied to templates.
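One very simple way to picture a template is sketched below in Python, assuming (only for illustration) that patterns are equal-length tuples of tokens. The names generalize, matches and WILDCARD are hypothetical: positions on which all patterns agree are kept, and every other position becomes an open slot.

```python
WILDCARD = "*"


def generalize(patterns):
    """Build a template: keep positions where all patterns agree, open a slot elsewhere."""
    if not patterns:
        raise ValueError("at least one pattern is needed")
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("this sketch only handles patterns of equal length")
    return tuple(
        column[0] if len(set(column)) == 1 else WILDCARD
        for column in zip(*patterns)
    )


def matches(template, pattern):
    """A template matches a pattern if every non-slot position is identical."""
    return len(template) == len(pattern) and all(
        slot == WILDCARD or slot == item for slot, item in zip(template, pattern)
    )


patterns = [
    ("eagle", "has", "wings"),
    ("pigeon", "has", "wings"),
    ("owl", "has", "wings"),
]
template = generalize(patterns)
print(template)                                         # ('*', 'has', 'wings')
print(matches(template, ("flamingo", "has", "wings")))  # True
```

Because a template is itself a tuple, generalize can be applied again to a group of templates, which gives a crude picture of how the same grouping mechanism could be reused one level up.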
The mechanism that grouped patterns is now applied to templates: several coherent templates may give origin to a higher level structure, a rule. A rule summarizes what several templates and patterns indicate individually. Although a rule is much more precise, it also is, because of this, more inflexible(5).
This rule doesn't need to be definitive; it may be just tentative. This seems to happen frequently with children: they often reach those unusual (and sometimes funny) conclusions that we must patiently correct (or else future experiences will).
The emergence of rules from patterns and templates can also be seen as an attempt by our brains to control or reduce combinatorial explosion. The process of internal pattern generation proposed in the preceding sections is prone to cause this problem, so these mechanisms of reduction seem to be highly necessary. This can also be understood as a search, by the agent, for the best method to compress the information. This view is discussed extensively by J. Gerard Wolff (Wolff 1993).
Rules don't seem to be static objects. Future reinforcements may further strengthen or weaken any rule. After some time (which means, after several other experiences), the rules that end up being bad misfits will be thrown out (they will be forgotten, as vanishing synapses and dendrites in the case of the human brain, or as reclaimed "memory space" in computers; the same process obviously occurs with patterns and templates). The rules that survive will gain a higher position, an indication that they hold often.
The principle of patterns originating templates and these subsuming into rules does not seem to be the end of the story either. If a person happens to have several interrelated and solid rules, he may be tempted to group them into a single substrate. I found a good name for this: a theory.
Theories are made of solid rules that not only hold by themselves (which means, have enough templates and patterns to support them), but also support one another in a coherent way. Good theories can be traced back to the templates or patterns that support them(6). Bad theories contain fragile constructions that are easily broken: just take a problematic rule and ask "Why?". If, to support this rule, you use only other rules of the same level, then your theory does not stand by itself. Again, this brings us back to the importance of symbol grounding, which is what is able to support this edifice.
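The "ask Why?" test can be phrased as a small graph search, sketched below in Python. The representation is an assumption made for illustration: supports maps each rule or template to the items offered in its support, and patterns is the set of items grounded directly in experience. A rule stands only if some chain of support reaches a pattern; rules that merely support each other in a circle never do.

```python
def is_grounded(node, supports, patterns, _seen=None):
    """True if following support links from node eventually reaches a pattern."""
    seen = set() if _seen is None else _seen
    if node in patterns:
        return True
    if node in seen:       # already examined, or part of a circular justification
        return False
    seen.add(node)
    return any(
        is_grounded(supporter, supports, patterns, seen)
        for supporter in supports.get(node, ())
    )


# Hypothetical theory fragment: names are invented for illustration.
patterns = {"p1", "p2"}
supports = {
    "rule_a": ["template_1"],
    "template_1": ["p1", "p2"],
    "rule_b": ["rule_c"],   # rule_b and rule_c only justify each other
    "rule_c": ["rule_b"],
}
print(is_grounded("rule_a", supports, patterns))  # True
print(is_grounded("rule_b", supports, patterns))  # False
```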
12. Rules in, Patterns out
We could stop with the preceding section, as it seems to complete the perception / learning / reasoning cycle. However, there's more to explore, and what suggests this is one of the fronts on which AI has been severely criticized. Hubert Dreyfus is a long-time critic of AI. His critiques in (Dreyfus 1992, Dreyfus 1979) can make us think. However, I will take one specific focus of his criticisms (Dreyfus 1986).
What is the fundamental difference between a novice who has just finished a specialization course in an exotic matter and an expert in that same subject? Why, after reading a book about this subject, does one not have the depth and understanding of another person who read the book and thought about it? What happens in our mind after we learn the concepts, after we are exposed to the rules and theories of the subject at hand?
As Dreyfus pointed out, the difference seems to lie in the fact that an advanced novice just knows the rules, he "knows that", but an expert knows more than the rules, he "knows how". The comparison with bicycle riding is, as I said before, very illustrative and Dreyfus uses it: nobody can be proficient in bicycle riding without experimenting and directly feeling all the related sensations.
I have said before that this can also be a deficiency of our language: it is not powerful enough to communicate the deepest and most subtle sensations. This suggests, again, that we have in our brain one "level" that is additional to language, something that is behind the memory of the rules we learn and consciously remember. This level seems to be closely associated with rules, but is not as clear and visible as the rules themselves. Dreyfus also comments on the "intuition" of experts. His phrase about this tells us what appears to happen in their minds:
"Proficient performers do not rely on detached deliberation in going about their tasks. Instead, memories of similar experiences in the past seem to trigger plans like those that worked before. Proficient performers recall whole situations from the past and apply them to the present without breaking them down into components or rules." (Dreyfus 1986) page 33
So an expert does not need to break a previous experience into rules. What are, then, the individual components of this previous experience? May I suggest patterns and templates?
Facing a problem in a specific subject forces the novice to think a lot, consciously using the rules and methods he has just learned, trying to adjust each rule individually (antecedent/consequent) to the problem at hand. This takes time and effort. On the other hand, the expert seems to look at the problem and come up with a solution easily, as if it popped into his mind. "An expert is one who does not have to think. He knows", says Frank Lloyd Wright.
How can we understand the mechanisms that are involved in these processes? Is it replicable mechanically? I am about to suggest one possibility.
This is, by far, the most speculative part of this paper. Here I will propose that the things we learn, no matter if by reading a book or doing a lab experiment, end up being accumulated, after some time, in the same "common store", using basically the same structures, which are inaccessible to our conscious thinking. This deep store of information is what is usually called "intuition", a subject, as we have seen, often associated with mystical beliefs. It seems to resist all our efforts at conscious scrutiny and that is why it is so difficult to formalize. We consciously know for sure only the "tip of the iceberg", exactly that part of the knowledge that traditional symbolicist AI has been trying to implement, unsuccessfully, for decades.
Observations of nature fill our brain with sensory experiences. From these experiences, we derive the regularities that make the patterns. But, as adults, we also learn a great deal through language. In this case, knowledge may enter as rules, when we read a book, see a diagram or listen to a lecture. But it turns into expert knowledge only after it is "interiorized" and becomes part of the intuitive (and unconscious) mechanisms of our mind. I suspect that this mechanism is strongly dependent on pattern recognition, generation, matching, adaptation and analogy, running in just the reverse direction of sensory perception. Unfortunately, getting evidence to support these conjectures is not easy and because of that this paper will not be able to get out of the speculative closet for a while. But indications that this supposition may be reasonable are starting to appear.
A research article published in Science shows some interesting results (see Shadmehr 1997). The abstract of this paper reads:
"Computational studies suggest that acquisition of a motor skill involves learning an internal model of the dynamics of the task, which enables the brain to predict and compensate for mechanical behavior. During the hours that follow completion of practice, representation of the internal model gradually changes, becoming less fragile with respect to behavioral interference. Here, functional imaging of the brain demonstrates that within 6 hours after completion of practice, while performance remains unchanged, the brain engages new regions to perform the task; there is a shift from prefrontal regions of the cortex to the premotor, posterior parietal, and cerebellar cortex structures. This shift is specific to recall of an established motor skill and suggests that with the passage of time, there is a change in the neural representation of the internal model and that this change may underlie its increased functional stability." (Shadmehr 1997) page 821
In short, a novel mechanical task performed by a subject under PET (Positron Emission Tomography) revealed that after some time this task became progressively better executed (stiffness of the limb decreases, movements become smoother, etc.) and the internal (neural) model responsible for accomplishing the task changes position within the brain. The mechanical task, novel and irregularly performed at first, gives rise, after some time, to a better performed one, in a different position in the brain. The results obtained by this research are similar to other discoveries. Cohen et al. (Cohen 1997) showed that people blind from an early age had their primary visual cortex activated by Braille reading and other tactile discrimination tasks. Normally sighted people don't show this effect. Similar results were obtained by (Hamilton 1998). In all these cases, the initial conscious sensations were replaced by automatic, more effective, unconscious ones.
Thus, it is not unwise to suppose that this same mechanism can be responsible for the interiorizing of knowledge that happens with the transition from novice to expert. In the same way that a mechanical movement turns out being more precise and smoother, the reasoning of an expert turns out to be deeper, faster, more effective. Here is a way to experiment personally with this.
If you drive the same car for some time (some months, for example) you end up getting used to it, just as if you were "wearing" it. You are able to control it perfectly and to avoid obstacles skillfully. But you don't seem to think very much about these tasks, you just do them, so much so that you can occupy your mind with other thoughts or talk with somebody.
However, things are different when you change cars. It will take some time to get used to the new car. It seems that you "feel" more of this new car and sometimes don't know exactly how to handle it. The accelerator feels different, the sensitivity of the brakes is strange, the precision of the steering wheel is all new. You're a novice in that car, and when you're driving it you must focus your attention on what you're doing, with little time to spare for other thoughts. But after some time (days or weeks) you again get used to it, to the point of no longer noticing those "strange" feelings you had previously. They have disappeared; they have become internalized, invisible to the conscious mind. Now you may think again about anything else and keep driving well. It is as if you had an automatic pilot that drives the car for you while your conscious mind is free to do other things. You have become an expert.
Let's get back to Shadmehr's article. The important point here is not only the alteration of the part of the brain responsible for the sensorimotor coordination, but also the improvement in performance. The progressive smoothness of a movement, no matter what the mechanism involved in the brain, may be considered as an enhancement, a completion of previously irregular, inflexible movements. This completion suggests the rounding of rigid trajectories, like the interpolation between points with a smooth curve.
In the case of acquired knowledge, this "filling" may be seen as the bridging of each newly learned concept or rule with something that smoothly joins it to the others (recently learned or already acquired). I suggest that this is done through a descending path that arrives at the templates and patterns. These bridges or links are what I propose to be the fundamental difference between novices and experts.
And these bridges are the essential mechanisms to be implemented in computers to transform them into intelligent entities. This mechanism can also be seen to operate, in humans, during the acquisition of rules and methods through reading, verbal exposition, diagrams or any other medium. After some time of exposure to these methods, it is up to the brain to sediment these rules into a fertile ground that contains the necessary links to previously known concepts and to one another.
I envision this process as being exactly the opposite of what has been discussed in the previous sections. Recall that we started from patterns that turned into templates, and these turned into rules. Here the rules, learned mostly through language, visual diagrams or audible information, are grounded after some time in templates and patterns that stand firmly rooted in the previous knowledge and sensory experiences of the agent. The same mechanism used to ground a new rule in existing templates can also be used to link one new rule to another, solidifying the whole group.
13. The Importance of Analogies
One of the most important aspects delineated in the previous sections is the pattern completion mechanism, the "bridging". According to this hypothesis, new knowledge received, for example, through language, must have several holes filled with bridges. As we have seen, the brain appears to automatically construct those bridges during the "sedimentation" process.
This demands, obviously, a lot of effort. Couldn't nature find a way to accelerate or improve this pattern bridging? I suggest that one good way to do this is to use good analogies. However, unlike the way we often consider analogies, the use I make here is internal, unconscious and automatic.
The external way of using analogies is well known. One way to understand the movement of electrons in a copper wire is to think about water flowing in a pipe. This mechanism is useful, according to what I'm suggesting in this paper, because it allows one to copy already established structures to another position, filling lots of holes at once. This does not mean that the copied structures will be definitive: they are just tentative, a suggestion that may eventually guide the agent's search for confirming details. This often happens consciously, but I believe that it also happens unconsciously.
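The copying of established structures can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than part of any system cited in this paper: the relation triples, the concept `mapping` between the water and electricity domains, and the function `transfer`. The only point it tries to capture is that transferred relations enter the target domain marked as tentative, awaiting confirmation.

```python
# Known structure in the source domain (water in a pipe),
# written as (subject, relation, object) triples.
source = [
    ("pressure", "drives", "flow"),
    ("narrow pipe", "reduces", "flow"),
    ("pump", "creates", "pressure"),
]

# What the agent already knows about the target domain (electricity).
target = [("voltage", "drives", "current")]

# Hypothetical correspondence between concepts of the two domains.
mapping = {
    "pressure": "voltage",
    "flow": "current",
    "narrow pipe": "resistor",
    "pump": "battery",
}


def transfer(source, target, mapping):
    """Copy source relations into the target domain, marking new ones as tentative."""
    known = set(target)
    result = [(fact, "confirmed") for fact in target]
    for subj, rel, obj in source:
        if subj in mapping and obj in mapping:
            candidate = (mapping[subj], rel, mapping[obj])
            if candidate not in known:
                result.append((candidate, "tentative"))
                known.add(candidate)
    return result


for fact, status in transfer(source, target, mapping):
    print(status, fact)
```

One analogy fills two holes at once here (what a resistor does, what a battery does), and both remain open to later revision by experience.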
To see how analogies may be used unconsciously, let's look at one example. Can you tell me what the similarity is between money and gasoline? Can you tell me what the difference is between them and, say, a shirt? I don't think we can easily find any good analogy to "pack" these things together (money similar to gasoline and both different from shirts). But that does not mean that the unconscious mind of a child can't see the relationships!
Some time ago, my 7-year-old son asked me what happens when we lend money. He was curious that the borrower returns not the exact piece of paper we lent, but another one, maybe even several pieces of paper instead of just one. We know that all that matters in this transaction is the value. The same happens with gasoline: if your car runs out of it, you can borrow some liters from a friend. Later, you will return other gasoline, not the same liquid. You may even give him the equivalent in money instead. But when you lend your shirt to a friend who has just fallen into mud, you expect the return, later, of the same shirt, not another.
It is this similarity, on the one hand, and the difference from shirts, on the other, that captures some very vague concepts like disposable, consumable and reference value, as well as concepts like personal, possessed and non-transferable goods. These concepts are hard to define, but they find their way into our minds through simple analogies, which appear to be the result of unconscious perceptions of similarity and difference. Such analogies will be used internally by the agent as "leads" when it faces new concepts that involve something similar. They may help substantially during the categorization process.
It is the continual perception of these similarities and the use of previous experiences that I am willing to ascribe to intelligent entities. And as I said previously, not one, ten or a hundred. But millions of them.
14. Creativity as Generation of New Patterns
Creativity has been the subject of several investigations by cognitive psychologists. It is considered as one of the fundamental aspects of human intelligence. Yet, so little is known and that is because creativity is such an elusive phenomenon: it is unpredictable and hard to control. Most creative people don't have a single clue to explain how it works. As (Boden 1994) puts it, creativity is a mysterious, unconscious process. Unconscious? That may remind us of our recent proposals.
Boden brings up conceptual spaces as the area where a generative system defines a range of possibilities (examples of conceptual spaces are the set of chess moves, the allowed combinations of molecular structures, a set of jazz melodies). Creativity operates on this space, looking for unexplored areas, "borders" with infrequently used concepts. Then, with a small change, a major transformation may take place.
Very often, creativity seems to be the result of novel combinations of old ideas: improbable, unusual combinations that have not been tried before and that are immediately recognized as having high value.
There are several changes that can be used to affect the conceptual space: you can drop rules of the generative system, negate a constraint, suppress a constraint, etc. The point is to introduce slight variations that may produce outstanding results. Although one can consciously try these methods when analyzing a problem, in general this does not pay off. Most of the time the process produces valuable results when it occurs spontaneously, unconsciously. After some time "experimenting", the unconscious discovers something that emerges in the form of an "Aha!".
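To make the idea of dropping a constraint concrete, here is a small Python sketch. The five-note scale, the three named constraints and the function `conceptual_space` are assumptions invented for this illustration; they are not taken from Boden. The generative system enumerates every three-note melody allowed by the active constraints, and each constraint is then dropped in turn to see how much unexplored territory it opens.

```python
from itertools import product

NOTES = ["C", "D", "E", "F", "G"]

# Each constraint is a named predicate over a candidate melody (a tuple of notes).
constraints = {
    "starts_on_C": lambda m: m[0] == "C",
    "ends_on_C": lambda m: m[-1] == "C",
    "stepwise": lambda m: all(
        abs(NOTES.index(a) - NOTES.index(b)) == 1 for a, b in zip(m, m[1:])
    ),
}


def conceptual_space(active, length=3):
    """Enumerate every melody allowed by the active constraints."""
    return {
        m
        for m in product(NOTES, repeat=length)
        if all(constraints[name](m) for name in active)
    }


base = conceptual_space(constraints)
for dropped in constraints:
    relaxed = conceptual_space([n for n in constraints if n != dropped])
    print(f"drop {dropped}: {len(relaxed - base)} new melodies")
```

With all three constraints active the space is tiny; removing a single one enlarges it, and different constraints enlarge it by different amounts. An agent could use that difference to decide which "border" is worth exploring first.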
Chomsky described language generation as creative because of its infinite fecundity. In my opinion, creativity appears not only in language generation or during the "invention" periods that we experience at one time or another. I think creativity is a process that happens almost continuously, which means it also happens during the "internalization" of learning, the phase in which knowledge is transformed from rules into templates and patterns. This bears some relation to what we saw before, when we talked about Copycat.
Roger Schank proposed a mechanism called Dynamic Memory which can be used to mechanize creativity (Schank 1988). The main idea of dynamic memory is the integration of new information not only with current knowledge structures but also with relevant previous experiences of the agent. The basic unit of dynamic memory is the MOP (Memory Organization Packet), which can be seen as generalizations of experiences like going to your office, making a hotel reservation, etc. All this seems similar to our previous suggestions: templates and patterns.
My proposal for the core of creativity depends a lot on the initial phases of Copycat: random and non-deterministic exploration. The idea that this process happens frequently at the unconscious level leaves us (I mean, our consciousness) out of the process. But a computer implementation of this method seems straightforward: random "experiments" with open areas of our web of patterns and templates, trying unusual combinations, using uncommon analogies. Then, recursive application of these processes to the obtained patterns and templates. Among several suggestions that we can try to enrich the creative process (in all levels), let's list a few (Chernow 1997):
a) Generalize: Go beyond the set of problems to a more general area.
b) Specialize: Look to details of the problem that may point to a solution.
c) Miniaturize: Use the solution of a larger problem as model for the current one.
d) Change Direction: Start by the end, or by the middle.
e) Suspend judgment: "forget" momentarily one or more restrictive rules.
Starting with such a list may seem artificial. This, of course, does not prevent the agent from discovering, by itself, more methods that can be used to "trigger" the appearance of good results. This should happen naturally, because the agent should perceive which methods worked best in the past and try to use them again in new experiences.
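One simple way an agent might keep track of which methods worked best is sketched below in Python. The class `HeuristicChooser`, its weights and its update step are hypothetical choices made for this illustration, not a mechanism described by Chernow or by Copycat. Heuristics are drawn at random, in proportion to a weight that grows with success and shrinks with failure.

```python
import random


class HeuristicChooser:
    """Pick a creative heuristic at random, favoring those that worked before."""

    def __init__(self, names, seed=None):
        self.weights = {name: 1.0 for name in names}
        self.rng = random.Random(seed)

    def choose(self):
        names = list(self.weights)
        return self.rng.choices(names, weights=[self.weights[n] for n in names])[0]

    def feedback(self, name, success, step=0.5):
        # Reinforce on success; decay on failure, but never down to zero, so
        # that every heuristic keeps some chance of being tried again.
        if success:
            self.weights[name] += step
        else:
            self.weights[name] = max(0.1, self.weights[name] - step / 2)


chooser = HeuristicChooser(
    ["generalize", "specialize", "miniaturize", "change direction", "suspend judgment"],
    seed=42,
)
heuristic = chooser.choose()
chooser.feedback(heuristic, success=True)
print(heuristic, chooser.weights[heuristic])
```

The random draw preserves the non-deterministic exploration discussed above, while the weights let past experience bias it.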
15. Poverty of Stimulus Revisited
One of the supporting points of the theory of Innateness of Language is Chomsky's argument from the poverty of the stimulus. Although this is still a highly debated area, it seems that the innateness argument is losing ground. Several authors who defend the language organ (see for example Pinker 1994) use the rate of grammar learning by children as one of the indications of innate mechanisms in action in the brain. They say that conventional inductive mechanisms cannot explain this rate of learning.
There are, of course, several challengers, such as (Deacon 1997), with arguments from neuroscience and evolutionary anthropology, and (Elman et al. 1996), with some indications that recurrent neural networks show learning abilities similar to those shown by children. Although I sympathize with both (Deacon and Elman), I see alternatives to the premises of the latter (use of neural networks). It seems that the mechanisms delineated previously (initial random creation of experimental patterns, terraced scan, analogical mapping, recursive application of the mechanisms on invented patterns) can eventually offer symbolicists alternatives to purely connectionist methods.
Another good source of information about this subject is Morten Christiansen's Ph.D. thesis (Christiansen 1994), in which he also defends recurrent connectionist models as being able to "learn complex recursive regularities". Let me highlight here two words: recursive and regularities. As we have seen, the proposal explained earlier is an interesting way of obtaining these characteristics.
(Elman et al. 1996) also presents connectionist solutions to the "English past tense" problem. It is known that children often "overregularize" past tense formation with constructions such as "go-ed". Pinker and others explain this phenomenon by using two different mechanisms, one that accounts for the rule of adding "-ed" to the verb and another that treats the exceptions (irregular verbs) through rote memorization. Elman presents a connectionist system that is capable of treating, within a single mechanism, the regular and irregular forms of the verbs. This could be taken as an indication that a symbolic rule-based mechanism in language should be dismissed. But recent evidence involving aphasic patients (Marslen-Wilson 1997) indicates that regular and irregular verb forms are processed in different parts of our brain. And new evidence of this from fMRI is also appearing.
All this is to say that this tendency toward overregularization of the past tense may further indicate that the use of rules, templates and patterns as distinct (but collaborative and interwoven) mechanisms is reasonable(7).
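The division of labor among the three mechanisms can be sketched in Python, following the examples given in footnote 7. This is only an illustration under stated assumptions: the function names, the tiny word lists and the spelling shortcut (adding just "-d" after a final "e") are mine, and nothing here models how a child actually acquires these forms.

```python
import re

ROTE = {"go": "went", "take": "took"}      # static patterns, memorized whole
OUGHT_FAMILY = {"buy", "think", "bring"}   # verbs that fill the "-ought" template


def ought_template(verb):
    """Fill the template <onset> + 'ought' with the verb's leading consonants."""
    onset = re.match(r"[^aeiou]*", verb).group()
    return onset + "ought"


def regular_rule(verb):
    """The general rule: append '-ed' (just '-d' after a final 'e')."""
    return verb + ("d" if verb.endswith("e") else "ed")


def past_tense(verb, rote=ROTE, family=OUGHT_FAMILY):
    """Try the most specific mechanism first: pattern, then template, then rule."""
    if verb in rote:
        return rote[verb]
    if verb in family:
        return ought_template(verb)
    return regular_rule(verb)


for verb in ["walk", "bring", "go"]:
    print(verb, "->", past_tense(verb))

# An agent that has not yet stored the rote pattern falls back on the rule.
print("go ->", past_tense("go", rote={}))
```

In this reading, overregularization is simply what happens when the rule is already in place but the more specific pattern has not yet been memorized or is not yet strong enough to win.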
The Poverty of Stimulus has also been the subject of another important study (Landauer 1997). Latent Semantic Analysis (LSA) is a technique being employed in several applications involving text (one-to-many comparison, essay grading, pairwise comparison, cross-language retrieval, etc.; for more information see LSA 1998). LSA is based on LSI (Latent Semantic Indexing, see Berry et al. 1996). Both rely on a linear algebra technique called SVD (Singular Value Decomposition, more information in Berry 1994). This mathematical technique involves no direct handling of meaning. But the conclusions are bold (Landauer 1997):
a) LSA learns a great deal about word meaning similarities from text.
b) More than half of the word knowledge learned is the result of indirect induction, effect of exposure to texts that don't contain the original words.
c) It appears safe to conclude that there is enough information contained in human language texts to allow comparison to the learning exhibited by humans, when subjected to similar conditions. In particular, LSA seems to present comparable performance to human induction without the need of language-specific innate knowledge.
d) The rate at which LSA acquires word knowledge is comparable to the rate at which school-children gain it when reading.
This method, like all techniques that are sensorially disconnected from the world, suffers from the symbol grounding problem. However, this does not prevent it from acquiring some induced semantic regularities (according to Landauer, LSA could, for example, place the words dog, animal, object, furry, cute, fast, ears, etc. close to rabbit).
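The indirect induction mentioned in item (b) can be seen even on a toy scale. The following Python sketch, which assumes NumPy is available, builds a term-document count matrix, truncates its SVD and compares words in the reduced space. The five-word vocabulary, the four "documents", the choice of k = 2 and the helper `cosine` are all assumptions made for illustration; real LSA applications also weight the counts before the decomposition, a step omitted here.

```python
import numpy as np

terms = ["doctor", "physician", "hospital", "car", "engine"]
docs = [
    "doctor hospital",
    "physician hospital",
    "car engine",
    "car engine",
]

# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Truncated SVD: keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vectors = U[:, :k] * s[:k]  # each row is a term in the reduced space

i, j, c = terms.index("doctor"), terms.index("physician"), terms.index("car")
print("raw    doctor~physician:", cosine(A[i], A[j]))
print("latent doctor~physician:", cosine(term_vectors[i], term_vectors[j]))
print("latent doctor~car:      ", cosine(term_vectors[i], term_vectors[c]))
```

In this corpus "doctor" and "physician" never appear in the same document, so their raw similarity is zero. Both co-occur with "hospital", however, and in the reduced space they end up pointing in essentially the same direction, while remaining unrelated to "car". No meaning was supplied at any point; the regularity was induced from the distribution of words alone.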
It seems inevitable to conclude that there is something more in text than just words linked by meaning and grammar. It is the perception of these patterns and recursive regularities that can bring to the surface interesting information from which an AI system can develop part of its intelligence(8).
16. A Simple Model of the Mind
We have seen that it is tempting to assume that most of our unconscious level operates by matching and filling patterns and templates. Somehow these patterns and templates end up emerging as rules, and some of them grow strong enough to be consciously perceived. As discussed in a previous section, I find it interesting also to evaluate the possibility of this route working backwards (consciously learned rules being "dismantled" into templates and patterns). This can explain why a good teacher is one who uses lots of easy-to-understand analogies. As we've seen, known analogies are "ready to use" templates that can be effortlessly copied to fill in existing holes, bridging previously unrelated rules and easing the learning process. At first, these bridges will be somewhat rough, but they will nevertheless allow the agent to have a first "understanding" of the subject. Later, these bridges will be gradually transformed into solid knowledge, with the reception (from experience or other rules) of small snippets that slightly improve (or modify) the bridges.
The following figure summarizes much of what I have been saying so far. Perceptions seem to affect not only our conscious mind, but also our inner "pattern processor". Expressions are usually the result of our conscious deliberation but may occasionally be commanded by unconscious mechanisms (such as when an expert solves a problem without being able to easily explain why his solution works).
Perception by unconscious methods is the primary activity of babies and children. After some time, conscious mechanisms appear to start working. By the time the child gets into school, the template-to-rule transition is already working, ready to put to work the development of personal theories of the world. From this moment, the child's brain starts receiving a lot of rules through language (spoken, written, diagrammatic, audible, etc.). It is a new learning period, in which the brain must discover how to link all those rules to the grounded patterns captured previously.
This model is, of course, utterly simple. It does not take into account several other extremely important aspects like emotions, mental models of the listener, visual forms of reasoning, etc. My intention here is not to develop a theory of the mind, but just to propose another kind of foundation on which AI could start experimenting.
17. Scientific and Intuitive Discoveries
The mechanisms proposed in this paper may also be used to explain the way scientists discover things. They start with intuitive suggestions. When those suggestions are strong enough, the scientist starts devising experiments to confirm his impressions. The experiments either confirm the expected behavior or fail to. Often, however, an experiment presents unexpected results that can usually only be interpreted by that same intuition. After some time cycling (imperceptibly) through these methods, the scientist gains enough confidence and understanding to try to formalize his theory for the first time (this step is still provisional; it is not the final formalization). Then it is time to think about rules, methods, theories, the high-level reasoning that demands conscious attention. In the course of this formalization, the scientist may face new problems that eventually demand new experiments and new insights, which often restart the investigation process. It is not odd, thus, to propose that our computer implementations of intelligence should operate under the same principles.
This does not rule out reasoning purely "by the rules". Some reasoning (much of mathematics, for instance) depends on long sequences of logical steps. What is important to note is that whenever one finds a problem difficult (or even apparently impossible) in logical terms, it is time to resort to the intuitive part of reasoning. I propose that for this logic/intuition mechanism to work correctly, most of the rules being used must stay grounded on patterns and templates. As I have said, it is that grounding that explains the difference between a novice and a proficient thinker (a novice is usually not a good researcher in that matter), and it is the free cycling of ideas between these levels that produces the level of performance that we humans have.
It may seem easy to suppose that creativity is the single most important characteristic one has to possess to be successful in producing new and exciting discoveries in science. However, things are not exactly this way. On the contrary, great discoveries are usually the result of hard work. Most good ideas were the result of a combination of factors that included, besides creativity, persistence and knowledge (not to mention the occasional participation of luck).
As one example, Watson and Crick's discovery of the double-helix structure of DNA (Weisberg 1988) was the outcome of a long sequence of hard work. Creativity in this case, contrary to popular belief, was just one additional resource used, although a very important one. In summary, if creativity alone is not enough, neither is logic alone. Both are necessary simultaneously, and it appears reasonable to assume that this could also be the best modus operandi of an artificial system.
18. Future Work
This paper raises more questions than it answers. There are several untouched topics that must be addressed in future work. For instance, a clear definition of what constitutes a "pattern" is, obviously, needed. How patterns are stored in a computer's memory, how to judge their relative importance and which learning algorithms to use are points in need of development. At which levels are patterns suitable? How can induction be used effectively (without its harms), and how can causal relationships and correlations be detected? What is the architecture that wraps all patterns together? How can meaningful concepts be created, and how can the world be categorized? Is the mechanism of analogy manipulation the same at all levels (patterns, templates, rules)? These are just a few points that need further work.
There's one point made by Lenat that seems to be inevitable, though: common sense, besides being different from intelligence, demands lots of knowledge. The process used in obtaining patterns may easily lead, as noted previously, to combinatorial explosion, which could render the system useless even on the most powerful computers of the future. It is therefore worthwhile to establish methods for clustering similar patterns into templates, and these into rules, as a way of reducing that explosion of combinations to manageable chunks. This clustering can arise naturally once the system begins to exhaust the resources available (or allotted) to the task. This mechanism also appears to be related to the natural "forgetting" of things that are no longer useful.
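A rough Python sketch of such resource-driven clustering follows. Since the definition of a "pattern" is still open, the sketch makes strong simplifying assumptions of its own: patterns are fixed-length tuples of symbols, a template is a pattern with one slot replaced by a wildcard, and usefulness is measured by a simple usage count. The names `PatternStore`, `merge` and `WILDCARD` are hypothetical.

```python
from itertools import combinations

WILDCARD = "*"


def merge(p, q):
    """Return a template generalizing p and q if they differ in one slot, else None."""
    if len(p) != len(q):
        return None
    diff = [i for i, (a, b) in enumerate(zip(p, q)) if a != b]
    if len(diff) != 1:
        return None
    return p[:diff[0]] + (WILDCARD,) + p[diff[0] + 1:]


class PatternStore:
    """A store that generalizes or forgets entries once its capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # pattern or template -> usage count

    def observe(self, pattern):
        self.counts[pattern] = self.counts.get(pattern, 0) + 1
        while len(self.counts) > self.capacity:
            self._compress()

    def _compress(self):
        # Prefer generalizing two similar entries into a single template ...
        for p, q in combinations(list(self.counts), 2):
            template = merge(p, q)
            if template is not None:
                total = self.counts.pop(p) + self.counts.pop(q)
                self.counts[template] = self.counts.get(template, 0) + total
                return
        # ... otherwise "forget" the least used entry.
        weakest = min(self.counts, key=self.counts.get)
        del self.counts[weakest]


store = PatternStore(capacity=2)
for pattern in [("see", "cat"), ("see", "dog"), ("eat", "fish"), ("run", "far")]:
    store.observe(pattern)
print(store.counts)
```

Nothing is clustered while memory is plentiful. Only when the third pattern exceeds the budget are the two similar ones folded into a template, and only when no such generalization is possible is something forgotten. A fuller version would also match new observations against existing templates instead of always storing them verbatim.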
Another point left unexplored is the general architecture of such an AI system. It is my belief that the fundamental needs of such an architecture have been established in Minsky's Society of Mind (Minsky 1986). It is tempting to think of each "mindless" agent as using its own pattern, template, rule repository in a relatively independent (although collaborative) manner.
When we try to solve a problem that systematically refuses to be solved, we have several options. We may try again and again. Or we can try to review our goal. In this paper, I have presented a mixed bag of known viewpoints, thoughts by respectable scientists and some already implemented techniques. Some speculations have been launched. Perhaps what is new is the way of grouping these things together in the attempt to look for unexplored territories. But one of the main purposes of this paper was to rethink our targets.
Artificial Intelligence is a relatively new "science" but one that has received considerable attention. Initial attempts concentrated on the "upper level" of our thinking abilities (mainly logic). Then, because failures were accumulating, the line of research changed to small, specific domains, where success was easily attained. But failure emerged again when these small-world experiments proved difficult to expand to real-world problems.
Some bold attempts at unifying cognitive architectures were made but failed, once more, on the most basic assumptions about the origin of intelligence. I consider this humble contribution as a way to review this problem from the beginning. More importantly, it is an attempt to reassess one of the goals of AI: human-like intelligence is not achievable through these methods. But intelligence as the capacity to creatively solve a large range of problems doesn't have to be human-like to be useful, and this line of research is probably one of the ways to let us experiment with the most important concepts behind the workings of our own brain.
I would like to thank Jean-Loup Komarower, from Monash University, for helpful comments on a draft of this paper. This work is part of research being done with funds from SCN Informatica S/C Ltda.
Berry, Michael W. and Dumais, Susan T. and Letsche, Todd A. (1996) Computational Methods for Intelligent Information Access.
Berry, Michael W. and Dumais, Susan T. and O'Brien, G. W. (1994) Using Linear Algebra for Intelligent Information Retrieval. Report CS-94-270, Comp. Science Dept. University of Tennessee.
Bloom, Paul and Markson, Lori (1998) Capacities Underlying Word Learning. Trends in Cognitive Sciences Vol. 2, No. 2, Feb 1998, p 67-73.
Boden, Margaret A. (1994) What is Creativity?. In Dimensions of Creativity, Bradford Book, MIT Press.
Cheng, P. W. and Holyoak, Keith J. (1985) Pragmatic reasoning schemas. Cognitive Psychology 17, 391-416.
Chernow, Fred B. (1997) The Sharper Mind. Prentice-Hall, Inc.
Christiansen, Morten H. (1994) Infinite Languages, Finite Minds: Connectionism, Learning and Linguistic Structure. Ph.D. thesis, University of Edinburgh.
Cohen, Leonardo G. et al. (1997) Functional Relevance of Cross-Modal Plasticity in Blind Humans. Nature Vol 389, 11 Sept 1997, p 180.
CYC (1998) Cycorp web site. http://www.cyc.com
Deacon, Terrence W. (1997) The Symbolic Species. W. W. Norton & Company, Inc
Dennett, Daniel C. (1978) Where am I? in Brainstorms, Philosophical Essays on Mind and Psychology. Bradford books.
Dreyfus, Hubert L. (1992) What Computers Still Can't Do. The MIT Press, London, England. Revised edition of "What Computers Can't Do" (1979).
Dreyfus, Hubert L. and Dreyfus, Stuart (1986) Why Computers May Never Think Like People. Technology Review v. 89 (Jan 86). Also appeared in Ruggles III, Rudy L. (ed) (1997) Knowledge Management Tools. Butterworth-Heinemann.
Elman, Jeffrey L. et al. (1996) Rethinking Innateness. Bradford Book, MIT Press.
Fodor, Jerry A. and Pylyshyn, Zenon W. (1988) Connectionism and Cognitive Architecture: A Critical Analysis. Cognition 28, 3-71. Also published in Haugeland, John (ed) (1997) Mind Design II, MIT Press.
French, Robert M. (1995) The Subtlety of Sameness. Bradford Book, MIT Press.
Gazzaniga, Michael S. and Ivry, Richard B. and Mangun, George R., (1998) Cognitive Neuroscience, The Biology of Mind. W. W. Norton & Company, Inc.
Ginsberg, Matt (1993) Essentials of Artificial Intelligence. Morgan Kaufmann Publishers, Inc.
Goldstein, Ira P. and R. Bruce Roberts (1977) The FRL Primer. MIT memo 408, July 1977.
Hamilton, Roy H. and Pascual-Leone, Alvaro (1998) Cortical Plasticity Associated With Braille Learning. Trends in Cognitive Sciences, Vol 2, No. 5, May 1998, p 168.
Harnad, Stevan (1990) The Symbol Grounding Problem. Physica D 42: 335-346.
Hofstadter, Douglas (1995) Fluid Concepts and Creative Analogies. BasicBooks, HarperCollins Publishers, Inc.
Johnson-Laird, Philip N. and Byrne, R. M. (1991) Deduction. Erlbaum, Hillsdale, N.J.
Landauer, Thomas K. and Dumais, Susan T. (1997) A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge. Psychological Review 104, p 211-240.
Lenat, Douglas B. and Guha, R. V. (1990), Building Large Knowledge-Based Systems. Addison-Wesley Publishing Company, Inc.
LSA (1998), Latent Semantic Analysis Web Site, http://lsa.colorado.edu/content.html, Institute of Cognitive Science, Dept. of Psychology, University of Colorado at Boulder.
Mahesh, Kavi and Nirenburg, S. and Cowie, J. and Farwell, D. (1996) An Assessment of CYC for Natural Language Processing. Computing Research Laboratory, New Mexico State University http://crl.nmsu.edu/Research/Pubs/MCCS/Postscript/mccs-96-302.ps
Marslen-Wilson, W.D. and Tyler, L. K. (1997) Dissociating Types of Mental Computation. Nature 386, p 592-594.
McGraw Jr., Gary E. (1995) Letter Spirit: Emergent High-Level Perception of Letters Using Fluid Concepts. Ph.D. thesis Dept. Comp. Science/Cognitive Science Program, Indiana University, Sept 1995.
Minsky, Marvin (1986) The Society of Mind. Touchstone Book, Simon & Schuster.
Minsky, Marvin (1994) Negative Expertise. International Journal of Expert Systems, Vol. 7, No. 1. This paper also appears in: Feltovich, Paul J.(ed.) et al (1997) Expertise in Context. American Association for Artificial Intelligence.
Minsky, Marvin (1974) A Framework for Representing Knowledge. MIT Memo 306.
Mitchell, Melanie (1993) Analogy-Making as Perception, A Computer Model. Bradford Book, MIT Press.
Pinker, Steven (1994) The Language Instinct. William Morrow and Company Inc.
Rosenberg, Steven T. (1977) Frame-based Text Processing. MIT memo AIM 431, Nov. 1977.
Rumelhart, David E. (1989) The Architecture of Mind: A Connectionist Approach. Also published in Haugeland, John (ed) (1997) Mind Design II, MIT Press.
Schank, Roger C. (1988) Creativity as a Mechanical Process, in The Nature of Creativity, Sternberg, Robert (ed), Cambridge University Press.
Schank, Roger C. (1990) Tell Me A Story, Narrative and Intelligence. Northwestern University Press (1995 edition).
Searle, John R. (1980) Minds, Brains, and Programs. Behavioral and Brain Sciences 3, 417-24, also published in Boden, Margaret A. (ed) The Philosophy of Artificial Intelligence, Oxford Readings in Philosophy, (1990) Oxford University Press.
Shadmehr, Reza and Holcomb, Henry H. (1997) Neural Correlates of Motor Memory Consolidation. SCIENCE Vol 277, Aug 1997, pp 821-825.
Smolensky, Paul (1989) Connectionist Modeling: Neural Computation / Mental Connections. Also published in Haugeland, John (ed) (1997) Mind Design II, MIT Press.
Thagard, Paul (1996) Mind, Introduction to Cognitive Science. Massachusetts Institute of Technology, Bradford Book.
Turner, Scott R. (1994) The Creative Process, a computer model of storytelling. Lawrence Erlbaum Associates, Inc.
Wason, Peter C. (1966) Reasoning. In B. M. Foss (ed.), New horizons in psychology. Harmondsworth, Penguin.
Weisberg, Robert W. (1988) Problem Solving and Creativity, in The Nature of Creativity, Sternberg, Robert (ed), Cambridge University Press.
Wolff, J. Gerard (1993) Computing, Cognition and Information Compression. In AI Communications 6 (2), 107-127, 1993.
Wharton, Charles W. and Grafman, Jordan (1998) Deductive Reasoning and the Brain. Trends in Cognitive Sciences Vol 2, No. 2, Feb 1998, p 54-59.
Whitten, David (1995) Unofficial, Unauthorized CYC FAQ http://www.mcs.com/~drt/software/cycfaq
Yuret, Deniz (1996) The Binding Roots of Symbolic AI: A brief review of the CYC project. MIT Artificial Intelligence Laboratory.
1. Connectionist systems demand a lot of "hardware". A typical PC workstation of today is not able to process artificial neural networks greater than tens of thousands of nodes, with just tens of connections between them. Yet this is insignificant, because to simulate the capacity of a cat one has to build neural nets with hundreds of millions of nodes.
2. This is obviously an inductive form of reasoning. Detractors of induction usually stick with the problems caused by the wrong conclusions without observing the advantages of the correct ones.
3. This connectionist network is not the same as Artificial Neural Networks. In the latter, simple nodes are linked to others with weighted connections. The former uses links to "concepts" close to the problem at hand (such as first, last, predecessor, leftmost, opposite, etc). Another important point in Copycat is the activation level, which can be thought of as resembling Donald Hebb's neural activation level.
4. Reinforcement may occur in several forms. Some patterns that occur frequently will be naturally reinforced but a single occurrence of one pattern may receive strong reinforcement if it is accompanied by a strong stress. This strong stress, in "human" terms, is frequently an emotion.
5. Good Old-Fashioned Artificial Intelligence (GOFAI) tried to start modeling intelligence at this level of rules. The inflexibility of the rules is, perhaps, the explanation for the failures at generalizing that these attempts suffered. This also seems to indicate why Expert Systems are so brittle.
6. Unfortunately, as we will see, most of the patterns and templates are at the level of the unconscious, which prevents us from "tracing them down". However, one way or another, we "know" when our theory is well grounded.
7. In this case, the automatic addition of the "-ed" is the rule. Templates may be in the form buy -> bought, think -> thought, bring -> brought, and static patterns account for the more "rote" aspect of irregular verbs (go, went, gone; take, took, taken).
8. The other part is, as I said earlier, one mechanism that provides adequate grounding, similar to the one we naturally develop because of our sensory nature.