Unnaturally Long Attention Span

A blog about Grad School at Stanford, Working in a Silicon Valley Internet Company, and Statistical Machine Learning. mike AT ai.stanford.edu

Meme Representation in Communities

I wrote previously on clip culture, the trend of consuming content in smaller snippets (blogs, TV news, YouTube, Twitter) rather than in longer form (books, journal articles, films, Ph.D. dissertations). One perhaps non-trivial consequence is that shortening the form of communication actually changes the representation and content of the message. The shortened form is not simply a summarized version of the original events and information.

Many years ago, when I was young and even more foolish than I am now, I tried writing an intelligent software algorithm for automatically trading stocks. These types of automated trading programs are commonly used by hedge funds and account for a good portion of the current trading activity in our stock market. My idea was to have the software automatically analyze online news and market data and predict the direction of stocks using statistical machine learning methods. The program would be able to react much faster than a human could read and understand news articles from thousands of sources.

Sounds like a lovely idea, right?

Well, I'm not a billionaire today, so it obviously didn't work. And I'll tell you why. Essentially, predicting a stock's value reduces to predicting what people, in aggregate, think of a stock. In this regard, the content on the web is not an accurate representation of reality. Here's a crude demo. Consider, for example, the occurrences of the phrase "Microsoft is good" vs. "Linux is good" on the web. (Go ahead, Google it.) You might falsely conclude from this data that you should be buying Red Hat stock and dumping MS shares. However, we all know that this is just selection bias: the people who bother to write such a sentence are not a representative sample of the people who hold an opinion. For another example, check out We Feel Fine by Sepandar Kamvar (a Stanford professor, Google employee, and great guy, btw). This is a cool design experiment and has some neat graphs, but I doubt that those percentages are an accurate representation of how Livejournal users actually feel. These kinds of aggregations always tend towards the manic-depressive, while most of us just feel normal most of the time. The content of the web is not an accurate representation of reality, and this selection bias is exacerbated by clip culture.
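To make the selection bias concrete, here is a toy calculation in Python. Everything in it is an assumption made up for illustration: the three mood labels, the share of people in each mood, and the chance that someone in that mood bothers to post about it. None of these numbers come from We Feel Fine or any real data. It only shows how a mostly "normal" population can look manic-depressive once you count posts instead of people.

```python
# Hypothetical numbers, invented purely for illustration (not real data).
# Share of people in each mood:
population_share = {"elated": 0.10, "normal": 0.80, "miserable": 0.10}
# Assumed chance that a person in that mood posts about how they feel:
post_probability = {"elated": 0.50, "normal": 0.05, "miserable": 0.50}


def observed_share(population_share, post_probability):
    """Expected share of posts per mood when posting depends on mood."""
    # Expected volume of posts contributed by each mood group.
    weights = {
        mood: population_share[mood] * post_probability[mood]
        for mood in population_share
    }
    total = sum(weights.values())
    # Normalize so the shares of posts sum to 1.
    return {mood: weight / total for mood, weight in weights.items()}


posts = observed_share(population_share, post_probability)
for mood in population_share:
    print(f"{mood:>9}: {population_share[mood]:.0%} of people, "
          f"{posts[mood]:.0%} of posts")
```

With these made-up inputs, the "normal" group is 80% of the people but only about 29% of the posts, so anything that learns from the posts alone sees a much more extreme crowd than the one that actually exists.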

This is not a new disease of the blogosphere, though, but a condition that has always existed in media. Anyone who's browsed a bookstore or watched TV can observe that it's the loud and the sensationalistic that get facetime. And this misrepresentation is an important issue, because you can't expect every single person to have the time to check all the facts. It's not practical. Whether you like it or not, people go by what they hear and see, and this slowly shapes and molds their perspectives and behaviors.

What is new, though, is that for the first time in history we might be able to address these problems effectively. Because of the internet, collecting and aggregating all the information is no longer an issue. Now that the information can be aggregated in one place, the problems of representation and fact-checking can be attacked head on. In the future, intelligent software agents will be able to do this fact-checking for us at a scale that no human reader could possibly manage in a lifetime. These agents will classify all of the viewpoints on a topic and determine which are legitimate arguments and which are just re-hashings of old propaganda.

Finally, the average citizen will have a weapon against the rising influence of mass media.