Unnaturally Long Attention Span

A blog about Grad School at Stanford, Working in a Silicon Valley Internet Company, and Statistical Machine Learning. mike AT ai.stanford.edu

The Right Cognitive Testbed for AI - Babies

The AI community has had a hard enough time defining what AI is, let alone defining milestones for achieving a functional AI. I believe the obvious choice for a functional milestone is to achieve a functional AI equivalent of a newborn child. You might think this is quite a natural choice, but it differs from a lot of the AI community's historical "milestones." An effective robotic baby is not going to help streamline your corporate environment, drive a war vehicle through enemy desert terrain, or handle urban assault situations. But then again, if you don't think a baby can cause mass destruction, you haven't spent enough time with one.

It's uncertain whether developing a functional baby cognitive model is the right direction towards human-level adult AI, but at least the progress is measurable, which can't be said for a lot of other approaches, such as animal models, games, or Turing-test-like setups. For example, just look at how the competition to create a Turing-test-passing chat bot has turned out. Has creating a world-champion chess computer advanced our knowledge of building a human-like AI at all? Not by much.

If you buy into my argument so far that the baby model is the right approach, what does it actually involve? I will try to break down what I think the work in this track involves. I've cited the sources of the information at the end of this article.


[Image from [1]: how a baby's developing eye sees the world.]

Below I have a timeline of an infant's cognitive development up to 1 year, and my own comments on what AI work is involved in emulating that functional behavior.

  • Between 1 and 2 months of age, infants become interested in new objects and will turn their gaze toward them. They also gaze longer at more complex objects and seem to thrive on novelty, as though trying to learn as much about the world as possible.
If you look at the image above, it suggests that during this time period, the sensors and the brain interfaces necessary to support them are still being constructed. An interesting cognitive feature--the ability to determine what is new--develops during this time. This ability to highlight "what is new" is the defining feature of what makes us alive. Basically, living things respond to changes, not to steady states, so the central survival trait is the ability to detect and track changes. This feature is very complicated, affects us at many levels, and really deserves its own discussion. At a lower level, this ability allows us to detect that predator lurking in the field or in the dark alley. At a higher level, why does that new song sound so good now, but so lame a year later?

The ability to detect changes also implies the ability to filter out what's old, i.e. pattern recognition. Old things are, by definition, things that fall into a pattern. So I believe that the first step of AI is to have generalized pattern recognition (knowing what's old) and differencing (tracking the new changes); a toy version of this idea appears in the first sketch after this list.
  • At around 3 months of age, infants are able to anticipate coming events. For example, they may pull up their knees when placed on a changing table or smile with gleeful anticipation when put in a front pack for an outing.
The second cognitive ability that makes us living things is an internal prediction engine. The prediction engine kicks in at 3 months, which is when the sensors finally start collecting reliable data. Prediction implies that there is an internal mental model of the world at this point, however primitive. A lot of work has actually been done on this component; for narrow, well-defined tasks we now have methods that can make predictions better than humans can. The key challenge, however, has always been in defining the inputs (how is this represented in the mind?), the outputs (how does this get translated into behavior?), and the structure of the prediction (does context play a role, and over what time periods?).
  • At around 4 months, babies develop keener vision. Babies' brains now are able to combine what they see with what they taste, hear, and feel (sensory integration). Infants wiggle their fingers, feel their fingers move, and see their fingers move. This contributes to an infant's sense of being an individual.
Sensory input development has finally stabilized, and now we start refining the outputs (fingers and toes). Up until this point, we have not seen any fruits from our labors--there are no outputs! AI research has been stunted because there is so much upfront cost in developing a cognitive model, while the benefits (driving a war machine through enemy towns, translating natural languages) rely on the outputs. The point where a baby sees his own finger move and realizes what's going on is an important one. It's the point that completes the loop between sensors, internal model, and actuators, and this loop creates a very powerful feedback cycle: do something, predict the output, see the result, match it against the internal prediction, and repeat. This is the fundamental property of local optimization, and it's the loop shown in the first sketch after this list.
  • Between 6 and 9 months of age, synapses grow rapidly. Babies become adept at recognizing the appearance, sound, and touch of familiar people. Also, babies are able to recall the memory of a person, like a parent, or object when that person or object is not present. This cognitive skill is called object permanence.
In the last step, I hinted at some kind of learning going on, and this leads naturally to the development of a memory to store learned results. The key questions here are "what do you store?" and "what do you forget?". There have been many different approaches to answering the question of what to store. One approach that has been popularized by the press is to create a large "commonsense" database of knowledge that an AI can draw upon to do reasoning; the best-known example of this approach is the Cyc project. However, I don't think this approach is compatible with developing a functional baby AI. Most people, even adults, don't know the length of the Amazon river or who the 25th president of the United States was, so this type of knowledge is not a prerequisite for intelligence. A key feature of human cognition is the ability to forget, and these are exactly the kinds of facts a functional AI should forget (i.e. filter out); the second sketch after this list illustrates this kind of decay.


  • Babies observe others' behavior around 9 to 12 months of age. During this time, they also begin a discovery phase and become adept at searching drawers, cabinets, and other areas of interest. Your baby reveals more personality, becomes curious, and demonstrates varied emotions.
This marks the point where the baby is able to acquire completely new pieces of knowledge on its own. I think this is the point where it is effectively an "adult" AI. At this point, the baby has enough capability to learn to be a rocket scientist or computer programmer. The AI equivalent, I think, is one that can learn by simply crawling, reading, and understanding the entire internet.
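
To make the pattern-recognition, prediction, and sense-predict-compare ideas above concrete, here is a minimal Javascript sketch of that loop (the actuator half is left out to keep it short). It's only a toy illustration under assumptions of my own--the names are invented and the "world model" is just an exponentially weighted average--not a claim about how a real cognitive architecture should be built: prediction error doubles as the novelty signal, and the learning rate acts as a crude form of forgetting.

// Toy sense -> predict -> compare loop (illustrative only).
// The "world model" is a single exponentially weighted moving average.
function makeAgent(learningRate) {
  var model = null; // internal estimate of what the sensor usually reports

  return {
    // Predict the next sensor reading from the internal model.
    predict: function () {
      return model === null ? 0 : model;
    },
    // Compare an actual reading against the prediction; a large error means novelty.
    observe: function (reading) {
      var prediction = this.predict();
      var novelty = Math.abs(reading - prediction);
      // Fold the new reading into the model. The learning rate is also a
      // forgetting rate: old observations fade as new ones are absorbed.
      model = model === null ? reading : (1 - learningRate) * model + learningRate * reading;
      return novelty;
    }
  };
}

// A steady signal quickly becomes "old news"; a sudden change stands out.
var agent = makeAgent(0.2);
[5, 5, 5, 5, 5, 20, 20, 20].forEach(function (reading) {
  console.log("reading", reading, "novelty", agent.observe(reading).toFixed(2));
});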

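In the same spirit, the "what do you store, what do you forget?" question can be illustrated with a toy memory whose entries decay unless they are reinforced. Again, every name here is my own invention; this is a sketch of forgetting-as-filtering, not a model of human memory.

// Toy memory: items that are not reinforced decay and are eventually forgotten.
function makeMemory(decay, threshold) {
  var strengths = {}; // item -> current strength

  return {
    // Seeing an item again reinforces it.
    remember: function (item) {
      strengths[item] = (strengths[item] || 0) + 1;
    },
    // Time passing weakens everything; weak items get filtered out entirely.
    tick: function () {
      Object.keys(strengths).forEach(function (item) {
        strengths[item] *= decay;
        if (strengths[item] < threshold) { delete strengths[item]; }
      });
    },
    recall: function () {
      return Object.keys(strengths);
    }
  };
}

// A parent seen every day sticks around; trivia seen once fades away.
var memory = makeMemory(0.5, 0.1);
memory.remember("25th president of the United States");
for (var day = 0; day < 5; day++) {
  memory.remember("mom");
  memory.tick();
}
console.log(memory.recall()); // ["mom"]
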
Sources:
[1] Gizmodo - Seeing the world through the eyes of a baby
[2] Yahoo! Health - Cognitive development between 1 and 12 months of age

LED Letters!

Want to use a cool LED-looking font while fooling spammers? Read on...

The other day I was doing some work in Javascript to try to fix some things in Diffbot, when I re-discovered a cool thing about element borders in HTML: adjacent borders actually come together at a 45° angle in most browsers. Here's what I mean:

[a bordered div containing the text "This is a div element with borders."]

Now, if you take two of these blocks and simply stack them on top of each other, you get a pattern that resembles the LED "8":

[two stacked bordered divs, whose adjoining borders form the LED-style "8"]
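
To reproduce the two stacked blocks yourself, here is a minimal Javascript sketch; the sizes and colors are arbitrary choices of mine, not the markup from the original demo.

// Build one "LED block": a div whose four thick borders meet at 45-degree angles.
function makeBlock() {
  var block = document.createElement("div");
  block.style.width = "40px";
  block.style.height = "40px";
  block.style.borderStyle = "solid";
  block.style.borderWidth = "12px";
  // A different color on each side makes the 45-degree joins easy to see.
  block.style.borderColor = "red green blue orange"; // top right bottom left
  return block;
}

// Divs are block-level, so two of them stack vertically into the "8" outline.
// (Run this after the page has loaded.)
document.body.appendChild(makeBlock());
document.body.appendChild(makeBlock());
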
Like its circuit-based cousin, each of these HTML LEDs consists of seven segments, which can be turned on or off to create a variety of characters. Having spent countless hours in the circuits lab during my undergrad working with these dreaded LEDs, I realized that you could now design an entire display system using this as a base--you could go as far as creating a scrolling stock ticker! I wrote a quick Javascript demo that turns any text into this form. To try it out, simply include led.js (less than 2k) and add the following call to your HTML <body>:
makeText("hello", parentElement);

Below you see an example output:

[rendered LED-style "hello"]

Try to select the above "hello" with your mouse--it's neither an image nor text.
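
led.js itself is not reproduced in this post, so the following is only a rough sketch of how a makeText-style function could work; the segment map, sizes, and colors are my own guesses for illustration, not the actual internals of led.js.

// Hypothetical sketch, not the actual led.js implementation.
// Each character is two stacked divs; each border acts as one LED segment,
// and a segment is switched "off" by leaving its border transparent.
//
//      a          segment -> [which div, which border]
//    f   b
//      g
//    e   c
//      d
var SEGMENT_BORDERS = {
  a: [0, "Top"], b: [0, "Right"], g: [0, "Bottom"], f: [0, "Left"],
  c: [1, "Right"], d: [1, "Bottom"], e: [1, "Left"]
};

// Which segments light up for each character (just enough for the demo).
var CHAR_SEGMENTS = { h: "bcefg", e: "adefg", l: "def", o: "abcdef" };

function makeChar(ch) {
  var wrapper = document.createElement("span");
  wrapper.style.display = "inline-block";
  wrapper.style.margin = "2px";
  var halves = [];
  for (var i = 0; i < 2; i++) {
    var half = document.createElement("div");
    half.style.width = "12px";
    half.style.height = "12px";
    half.style.border = "4px solid transparent"; // every segment starts "off"
    wrapper.appendChild(half);
    halves.push(half);
  }
  var lit = CHAR_SEGMENTS[ch.toLowerCase()] || "";
  for (var j = 0; j < lit.length; j++) {
    var seg = SEGMENT_BORDERS[lit.charAt(j)];
    halves[seg[0]].style["border" + seg[1] + "Color"] = "red";
  }
  return wrapper;
}

function makeText(text, parentElement) {
  for (var k = 0; k < text.length; k++) {
    parentElement.appendChild(makeChar(text.charAt(k)));
  }
}

makeText("hello", document.body); // usage, as in the call above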

The interesting thing about this is that you can use it to render text without actually having that text in the source code. This is great for preventing crawling robots and spammers from reading your text, while still allowing your human readers to see things fine. Some applications of this might be to cloak your email address, generate CAPTCHAs, or do evil search-engine optimization by hiding text from Googlebot. This method might be better than the straightforward method of rendering your text as images because it requires the robot/spammer to have
  1. a javascript interpreter/browser
  2. the ability to snapshot/render a certain region of the screen
  3. optical character recognition (OCR)-like capability
The image-rendering obfuscation method, on the other hand, only requires #3. Obviously, a specific implementation can be defeated by reverse-engineering the HTML/Javascript without these three components, but the resulting spamming algorithm would be implementation-specific, which would not scale well for the spammer.