Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and group behavior in online communities and social networks.
Michael was voted a 2010 Influential Leader by CRM Magazine for his work on predictive social analytics and its application to Social CRM. He's a regular blogger on the Lithosphere's Building Community blog and previously wrote in the Analytic Science blog. You can follow him on Twitter or Google+.
Hello and welcome back. I hope you all had a wonderful holiday. It looks like 2012 is going to be an exciting year – we kicked it off this week with the great news of our Series D funding. For this blog, I would like to put a little more emphasis on data this year, so I thought it would be appropriate to start a mini-series on analytics.
History of Data Explosion
Throughout history, we have experienced many epochs of data explosion, and each time the amount of data grew by many orders of magnitude at an accelerating rate. Basically, whenever there is a new mechanism for people to record information, it is usually followed by a period of data explosion. This happened when paper and the printing press were invented. It has since been repeated several times with the invention of various analog media, such as physical storage (e.g. film-based photography and records) and magnetic media (e.g. tapes). And it has repeated again with the introduction of electronics and modern digital media.
What is different about the social media revolution is that it didn’t create any new mechanism for data storage. Social data are still stored in digital media; our current technologies cannot yet tap the power of holographic and quantum data storage. Instead, social media have created many new mechanisms for data creation. Moreover, they democratize data creation. That is, data generated by you and me are recorded, indexed and searchable in the same way as content created by professionals (e.g. journalists, musicians and filmmakers). Therefore, the most recent data explosion is driven primarily by the explosion of user-generated content (UGC).
In 2011, IDC’s annual study of the digital universe estimated the amount of information we create in a year to be ~1.8 zettabytes (that is, 1.8 sextillion bytes or 1.8 trillion GB). But what good are all these data? And what do they mean for businesses and enterprises?
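As a quick sanity check on those units, here is a small Python sketch of the conversion. It assumes decimal (SI) prefixes, where 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes, which is what the "1.8 trillion GB" figure implies; the function names are just illustrative.

```python
# Decimal (SI) prefixes, assumed here: 1 ZB = 10**21 bytes, 1 GB = 10**9 bytes.
BYTES_PER_ZB = 10**21
BYTES_PER_GB = 10**9


def zettabytes_to_bytes(zb: float) -> float:
    """Convert zettabytes to bytes (1 ZB = 10**21 bytes, i.e. one sextillion)."""
    return zb * BYTES_PER_ZB


def zettabytes_to_gigabytes(zb: float) -> float:
    """Convert zettabytes to gigabytes (1 ZB = 10**12 GB, i.e. one trillion GB)."""
    return zettabytes_to_bytes(zb) / BYTES_PER_GB


if __name__ == "__main__":
    zb = 1.8  # the IDC estimate quoted in the text
    print(f"{zb} ZB = {zettabytes_to_bytes(zb):.1e} bytes")
    print(f"{zb} ZB = {zettabytes_to_gigabytes(zb):.1e} GB")
```

With binary prefixes (1 ZiB = 2^70 bytes) the same number of "zettabytes" would come out about 18% larger, so it helps to be explicit about which convention a figure uses.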
Data Reduction: The Promise of Analytics
The simple answer is that data help us make better decisions. In fact, many of the national labs have decision support divisions and departments. I know, because many of my statistician/mathematician friends are working there now. And what they do is exactly what we call analytics and business intelligence in the industry. So, the primary function of analytics is to support decision making. As my witty PhD advisor once said, “analytics that doesn’t help people make better decisions is just mental masturbation.”
Now, the million-dollar question is “how?” There is little doubt that no human can consume 1.8 zettabytes of information. Not even close. Although our brain has the capacity to store information in the range of hundreds of terabytes to a petabyte, the capacity of our working memory is very limited. As a result, we never access all our memories at once. George A. Miller, a renowned psychologist from Princeton, estimated the capacity of our working memory. Surprisingly, our working memory can only process 7±2 different recallable chunks of information from our long-term memory at any instant. That is roughly 3 bits of information (less than 1 byte, which is 8 bits)!
Although subsequent studies have found quite a bit of variation around the magic number 7, the upper limit is never more than 100, which is about 7 bits and still less than 1 byte! Regardless of the capacity of our working memory, most business decisions come down to a choice between several options, and the most common scenario is a decision between “go” and “no go.” That decision flips a single bit of data that we can act on.
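To see where the "roughly 3 bits" and "about 7 bits" figures come from, the sketch below assumes each item held in working memory picks out one of n equally likely alternatives, so the information involved is log2(n) bits. That equal-likelihood framing is a simplifying assumption for illustration, and the helper name `bits_to_distinguish` is hypothetical.

```python
import math


def bits_to_distinguish(n_alternatives: int) -> float:
    """Bits needed to single out one of n equally likely alternatives: log2(n)."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)


if __name__ == "__main__":
    cases = [
        ("go / no go", 2),
        ("Miller, lower end (7 - 2)", 5),
        ("Miller's magic number", 7),
        ("Miller, upper end (7 + 2)", 9),
        ("upper limit from later studies", 100),
    ]
    for label, n in cases:
        bits = bits_to_distinguish(n)
        # 1 byte = 8 bits, so every case here stays under a single byte.
        print(f"{label:32s} n = {n:3d} -> {bits:.2f} bits (under 1 byte: {bits < 8})")
```

Under this framing, 7 alternatives come to about 2.81 bits and 100 to about 6.64 bits, in line with the rough figures above, while a go/no-go decision is exactly 1 bit.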
Since we do not have the resources to act on too many choices, good analytics must reduce big data down to a few bits that let us evaluate and compare the utility of each option (e.g. value, ROI, engagement, influence, or whatever we are interested in). This in turn helps us make an informed and hopefully intelligent choice among the few available options.
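Here is a toy sketch of that reduction in Python. Everything in it is hypothetical: the option names, the synthetic ROI numbers, and the use of a simple mean as the "utility" are stand-ins for whatever model a real business would use. The point is only the shape of the pipeline: many raw observations per option collapse to one score each, the scores collapse to a single choice (log2 of the number of options, in bits), and a go/no-go check collapses everything to one bit.

```python
import math
import random


def reduce_to_utility(observations: list[float]) -> float:
    """Collapse many raw observations for one option into one utility score.

    A plain mean is a stand-in for a real utility model (hypothetical choice).
    """
    if not observations:
        raise ValueError("need at least one observation")
    return math.fsum(observations) / len(observations)


def choose(options: dict[str, list[float]]) -> tuple[str, float]:
    """Pick the option with the highest utility.

    Also returns log2(number of options): the bits needed to encode the choice.
    """
    scores = {name: reduce_to_utility(obs) for name, obs in options.items()}
    best = max(scores, key=scores.get)
    return best, math.log2(len(options))


def go_no_go(observations: list[float], threshold: float) -> bool:
    """Reduce a whole data set to a single bit: go if utility clears the threshold."""
    return reduce_to_utility(observations) >= threshold


if __name__ == "__main__":
    random.seed(42)  # reproducible synthetic data
    # Hypothetical options with synthetic, made-up per-customer ROI observations.
    options = {
        "campaign_a": [random.gauss(1.10, 0.3) for _ in range(100_000)],
        "campaign_b": [random.gauss(1.25, 0.3) for _ in range(100_000)],
        "campaign_c": [random.gauss(0.95, 0.3) for _ in range(100_000)],
    }
    best, bits = choose(options)
    print(f"300,000 observations -> choice '{best}' ({bits:.2f} bits)")
    print("go" if go_no_go(options[best], threshold=1.0) else "no go")
```

Swapping in a different reducer (a median, a regression model, an influence score) would only change `reduce_to_utility`; the overall reduction from many observations to a few bits stays the same.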
Conclusion
Alright, this is just the introductory post of my mini-series on data and analytics. But we did cover quite a few bits of information in this post. To summarize the key points:
We probably won’t need to deal with 1.8 zettabytes of data (at least for now), because we don’t have all the data in the world. However, it’s pretty common for enterprise data or social data to reach tens or hundreds of terabytes. That is still a lot of data to reduce down to a few bits, and that is the challenge of big data analytics.
Fortunately, there are many ways to reduce data depending on the specific business needs. Let’s talk about that next time.
