Unnaturally Long Attention Span

A blog about Grad School at Stanford, Working in a Silicon Valley Internet Company, and Statistical Machine Learning. mike AT ai.stanford.edu

Politicians and Moms are Right, but Partially So

Once every three years, the Department of Education participates in the Program for International Student Assessment (PISA), which is a study comparing 15-year-olds' performance in reading literacy, mathematics literacy, and science literacy in 57 participating countries. In particular, this latest assessment focused on science literacy, which the test's methodology defines as

an individual’s scientific knowledge and use of that knowledge to identify questions, to acquire new knowledge, to explain scientific phenomena, and to draw evidence-based conclusions about science-related issues, understanding of the characteristic features of science as a form of human knowledge and enquiry, awareness of how science and technology shape our material, intellectual, and cultural environments, and willingness to engage in science-related issues, and with the ideas of science, as a reflective citizen (OECD 2006, p.12).


Now, whenever I hear discussion concerning the academic performance of American students in the context of international comparison, it is usually from two types of sources: politicians and moms.

By politicians, I mean mainstream news sources and personalities that draw upon the popular urban legend meme of the US being the worst in education around the world in order to work a crowd.


[sample politician at a rally]

"...the US has the lowest reading and math scores worldwide. Our children are our future and we need more funding to pay our teachers today!"

Crowd: "Yeah!!"


Since I'm an immigrant, by moms I of course mean this:


"You know when I was your age in Soviet Russia, we were doing vector calculus and number theory in middle school. In the snow. Uphill. This American education is rotting your brain."


So, what does the report show? US students scored 489 on a test where the mean and standard deviation have been normalized to 500 and 100, respectively. So yes, the report does seem to support the idea that US students underperform their international peers. However, if you break down the data more closely, you'll find two interesting features.
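To put that 489 in perspective, here is a minimal sketch that converts a score on this scale into a distance from the mean in standard deviations. It uses only the numbers quoted above (mean 500, standard deviation 100); the function name `standardized_gap` is made up for illustration.

```python
def standardized_gap(score, mean=500.0, sd=100.0):
    """Return how many standard deviations `score` sits from the scale mean."""
    return (score - mean) / sd


# US science score quoted in the post, on a scale normalized to 500 +/- 100.
us_gap = standardized_gap(489)
print(f"US science score: {us_gap:+.2f} standard deviations from the mean")
# prints: US science score: -0.11 standard deviations from the mean
```

In other words, the US average sits roughly a tenth of a standard deviation below the mean.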

First, the report includes a categorical analysis of the data by racial group, which shows statistically significant disparities in scores between groups. The averages run the gamut from 409 for black students to 523 for white non-Hispanic students. Educational performance in a nation that is itself multi-cultural and multi-societal is a complex and non-homogeneous issue. There are definitely serious problems and widening gaps in educational performance developing in America, but the data suggest to me that uniform, across-the-board policies, such as No Child Left Behind or universal minimum wages for teachers, may not be all that beneficial, and may even be harmful, to the overall big picture. Instead, the areas that are weighing down our national average should be pinpointed for a closer look, and the best policies for those specific areas considered. But of course, this type of fine-grained debate never happens in mainstream discourse.

Second, another feature of this study, and of similar studies, that further aggravates this effect (American students' apparent underperformance) is that the set of participating countries is heavily skewed. In the list of member countries shown in the report, I noticed that the vast majority of the regions tested are in Europe and Australia. Not only are these countries much more racially homogeneous, making the comparison to a heavily multi-racial US a poor one, but most of them are composed primarily of the "non-Hispanic" white population that scored the highest in the US breakdown. Let's not even address the issue of bias in question construction. Sadly, the entire continent of Africa is not included in the study.

What can moms learn from this study? The report also presents a limited version of what's called a decile analysis of the scores. That means they break down the average scores by range: 0%-10%, 10%-20%, ..., 90%-100%. What the study found is that in the 90th percentile, a.k.a. the 90%-100% range, US students scored 628, compared to a lower 622 for international students in the same range. That means, if you are an immigrant mom in the position of wondering which country would provide the best education for your child, you should have no qualms about having your children educated in the US. There are many good schools, and almost any immigrant hub (read: major metropolitan area) in the US has some of the best schools in the world.
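For the curious, a decile breakdown as described above can be sketched in a few lines. The scores below are synthetic, drawn from a normal distribution with mean 500 and standard deviation 100 purely for illustration; they are not PISA data, and the helper name `decile_means` is made up. The report's own method may differ in its details.

```python
import random
from statistics import mean


def decile_means(scores):
    """Sort the scores, split them into ten equal-sized bands, average each band."""
    ordered = sorted(scores)
    n = len(ordered)
    bands = [ordered[i * n // 10:(i + 1) * n // 10] for i in range(10)]
    return [mean(band) for band in bands]


# Synthetic scores (NOT real PISA results), seeded so the run is repeatable.
rng = random.Random(0)
synthetic_scores = [rng.gauss(500, 100) for _ in range(10_000)]

means = decile_means(synthetic_scores)
for i, m in enumerate(means):
    print(f"{i * 10:>2}%-{(i + 1) * 10}%: {m:.0f}")
```

Comparing only the last row across two populations is the kind of top-end comparison made above, and it can point the opposite way from a comparison of overall averages.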

Read the report for yourself. It has some fun sample questions from the test that was given to the 15-year-olds.

Highlights from PISA 2006: Performance of US 15-Year-Old Students

Animals of the Internet Refuse to be Adorable in Support of the Writers' Strike

Oh, those snarky animals!

iRobot Packbot in Action

"The Solution"

The following is a link to a transcript, released today, of a speech titled "The Solution" delivered by Usama Bin Laden, the FBI's most wanted terrorist. The speech is addressed to the American people, commemorating the sixth anniversary of the 9/11 attacks. It is Bin Laden's first public communication in almost three years. I have posted the link to the PDF of the transcript here because, ironically, it is quite hard to find on any of the news sites.

The speech itself is well written and worth a read. Keep in mind that it is a translation from Arabic.

"The Solution" transcript.pdf

The Automatic Brain

Have you ever driven to work, or walked to class, and later had no memory of how you actually got there? Have you ever really needed to study for a test, but just found it impossible to concentrate, no matter what you tried? One theory that can explain this seeming irrationality is that there is a second brain, a parallel brain, which operates below the observable threshold of consciousness. This is the primitive brain, whose structure we share with other animals.

This primitive brain has a much larger memory capacity. In contrast, the conscious brain has a rather limited memory; studies have shown that people can generally keep only about 7 numbers in their heads. That is why phone numbers in the US have that length.

Some would argue that this subconscious brain is what distinguishes people from each other mentally. This is what some people would define as experience. Observe people’s reasoning and rationalization patterns. Are they all that different? If they were, we would not be able to hold a logical discussion. It is the primitive brain that distinguishes two people.

But what distinguishes humans’ and machines’ "brains"? Machines have an enormous advantage over the conscious "reasoning" brain. A machine can store much more than 7 items in short-term memory. Also, the execution speed of sequential reasoning operations in a machine is much faster than in a human’s conscious brain. This is because human brain circuits are limited in their operation by chemical "neurotransmitters" that are physically bound by diffusion speed. Therefore the latency of information transfer in humans is higher.

Humans are no match for machines in what you might consider the highest form of human ability, "logical reasoning." In fact there are many well-known efficient algorithms for performing this process.

However, humans do have advantages over machines. The advantage is in this primitive, animal brain. This subconscious brain has massive parallelism, which allows data storage and computation to happen simultaneously across all circuits. This tradeoff of latency for massive bandwidth has allowed humans to outperform computers in most interesting tasks.

This advantage may be only temporary, however. Although the machines were initially designed for sequential execution, recently we have seen more and more growth in developing parallel computation. Large data-intensive parallel systems have been developed. Parallel hardware and parallel algorithms have allowed new types of programs, such as entire-genome mappers, world-champion chess programs, and search engines. Human brain evolution is relatively fixed. Machine brain evolution seems to be exponential. The cross-over point will be where amazing things start to happen.

© 2005

Meme Representation in Communities

I wrote previously on clip culture, the trend of consuming content in smaller snippets (blogs, TV news, YouTube, Twitter) rather than in longer form (books, journal articles, films, Ph.D. dissertations). One perhaps non-trivial consequence is that shortening the form of communication actually changes the representation and content of the message. The shortened form is not simply a summarized version of the original events and information.

Many years ago, when I was young and even more foolish than I am now, I tried writing an intelligent software algorithm for automatically trading stocks. These types of automated trading programs are commonly utilized by hedge funds and constitute a good portion of the current trading activity in our stock market. My idea was to have the software automatically analyze online news and market data and predict the direction of stocks using statistical machine learning methods. The program would be able to react much faster than a human could read and understand news articles from thousands of sources.

Sounds like a lovely idea, right?

Well, I'm not a billionaire today, so it obviously didn't work. And I'll tell you why. Essentially, predicting a stock's value reduces to predicting what people, in aggregate, think of a stock. In this regard, the content on the web is not an accurate representation of reality. Here's a crude demo. Consider, for example, the occurrences of the phrase "Microsoft is good" vs. "Linux is good" on the web. (Go ahead, Google it.) You might falsely conclude from this data that you should be buying Red Hat stock and dumping MS shares. However, we all know that this is just selection bias. For another example, check out We Feel Fine by Sepandar Kamvar (a Stanford professor, Google employee, and great guy, btw). This is a cool design experiment and has some neat graphs, but I doubt that those percentages are an accurate representation of how LiveJournal users actually feel. These kinds of aggregations always tend towards the manic-depressive, while most of us just feel normal most of the time. The content of the web is not an accurate representation of reality, and this selection bias is exacerbated by clip culture.

This is not a new disease of the blogosphere, though, but a condition that has always existed in media. Anyone who's browsed a bookstore or watched TV can observe that it's the loud and the sensationalistic that get facetime. And this misrepresentation is an important issue, because you can't expect every single person to have the time to check all the facts. It's not practical. Whether you like it or not, people go by what they hear and see, and this slowly shapes and molds their perspectives and behaviors.

What is new though, is that for the first time in history we might be able to address these problems effectively. Because of the internet, collecting and aggregating all the information together is no longer an issue. Now that the information can be aggregated to one place, the problems of representation and fact-checking can be attacked head on. In the future, intelligent software agents will be able to do this fact-checking for us at a scale that no human reader could possibly do in a lifetime. These agents will classify all of the viewpoints on a topic and determine which are legitimate arguments and which are just re-hashings of old propaganda.

Finally, the average citizen will have a weapon against the rising influence of mass media.

I Named a Chinese Book

So, one of my Aunties is quite a renowned chef and cooking instructor in Taipei. She has previously published several cookbooks and now lives in Toronto, where she continues to give master classes in Chinese cooking. For the past few years, she has been working on her latest book, which is also a Chinese cookbook, but presents the dishes in a style reminiscent of French cuisine. Her book is titled "中菜西吃", which literally translates into something like "Chinese-Dishes-Western-Eating". Basically, the connotation is that while the foods and recipes are traditional Chinese dishes, the presentation, or display, is in a more non-traditional, Western-like style.

Anyways, the literal translation obviously doesn't make for a very attractive book title, and she asked many of her Canadian friends for suggestions for the English title of the book, the book being completely bilingual throughout. Not satisfied with any of the suggestions, she called me up when I was in Taiwan to get my recommendation. I actually gave this quite a bit of thought, in order to come up with something catchy. Anyways, below is my end result, which should be published later this year. What do you think?


Congratulations, Auntie!

Speech Synthesis + Wikipedia

Ever simply wanted an MP3 of a Wikipedia article that you could take on the go on your iPod? Ever wanted to save a bunch of articles so that you could listen to them during rush hour traffic?

Yeah, neither did I.

However, today I found myself writing a quick script to do just that and if you by chance answered "yes" to any of those questions above then you are in luck! It's almost as simple as it could possibly be. You just supply the topic in the URL and the script generates a direct download of the MP3 in response.

The basic format I use looks like this: http://madfast.com/wikiread.cgi?q=[YOUR QUERY]. There are also a bunch of secret parameters I added that can be used to change the voice and file format.

Here are some examples:

http://madfast.com/wikiread.cgi?q=Robot
http://madfast.com/wikiread.cgi?q=Computational Biology
http://madfast.com/wikiread.cgi?q=台灣
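If you want to generate these links from a program, topics with spaces or non-ASCII characters should be percent-encoded first. Here is a minimal sketch using Python's standard library. The helper name `wikiread_url` is made up for illustration, and the assumption that the script accepts UTF-8 percent-encoded queries is mine; browsers do this encoding for you when you type the URLs above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://madfast.com/wikiread.cgi"


def wikiread_url(topic):
    """Build a request URL for `topic`, percent-encoding it as UTF-8.

    Assumption: the script decodes standard percent-encoded query strings.
    """
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{BASE_URL}?{urlencode({'q': topic}, quote_via=quote)}"


for topic in ["Robot", "Computational Biology", "台灣"]:
    print(wikiread_url(topic))
```

The secret voice and file-format parameters are left out here, since their names are not given above.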

Go crazy.

On Clip Culture

One of the things that concerns me is that lately the vast majority of my information processing consists of reading these “snippets” of information instead of longer, more meaningful discussions. It’s not just a consequence of using Diffbot; I think it’s where internet culture is headed. YouTube epitomizes the short-attention-span cinema that is the trend. Perhaps due to the sheer number of news sources, these “info clips” are the only way to aggregate all these disparate sources sanely. Certainly it’s not for lack of longer original sources on the web. Plenty of journals and books are available online, and corporations and governments now publish many of their proceedings as electronic documents online. The problem is that current technology can't really deal with these types of sources. Have you ever had Google return a search result pointing to the U.S. Constitution, or perhaps to some company's SEC filing, where the "real answers" might lie?

The hope of AI is the hope that we will eventually have a technology that can synthesize all these threads of information from the original sources into a longer, coherent story instead of relying on the "he said that he said that he said that he said" that is the current blogosphere. Theoretically, it would be able to synthesize over broader and deeper sets of data due to the increased temporary RAM compared to a human brain.

On an unrelated note, if you haven't already, check out the book God's Debris by Scott Adams. Yes, that's Scott Adams the Dilbert comic guy. It has some interesting ideas and even some pertaining to AI. It's also a free download.

An Unnatural Birth Has Occurred



In other primate news, a female chimpanzee at the Chimp Haven Home for Former Research Animals has given birth to a baby girl [Source]. This is surprising due to the fact that all of the chimp males at the facility have had vasectomies. Management at Chimp Haven is now planning to do DNA testing on all of the chimp males in order to identify the cause of the unauthorized birthing.

However, this testing is largely unnecessary, as I am already certain what the results will be. If I may call your attention to the case documented in the film Jurassic Park, this is obviously a case where the genetic experimentation conducted on these former lab animals has permanently altered their DNA, resulting in the activation of the latent reptilian genes, producing a sex reversal of the female specimens, and ultimately spawning the creation of the super-raptors, er, monkeys.

I, for one, welcome our new mutant-monkey overlords.