Big Data Analytics: Reducing Zettabytes of Data Down to a Few Bits

By MikeW

by Lithium Guru ‎01-06-2012 11:43 AM - edited ‎09-07-2012 02:07 AM

Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and group behavior in online communities and social networks.

 

Michael was voted a 2010 Influential Leader by CRM Magazine for his work on predictive social analytics and its application to Social CRM. He's a regular blogger on the Lithosphere's Building Community blog and previously wrote for the Analytic Science blog. You can follow him on Twitter or Google+.

 


 

Hello and welcome back. I hope you all had a wonderful holiday. It looks like 2012 is certainly going to be an exciting year – we kicked off the year with some great news this week from our Series D funding. In terms of this blog, I would like to put a little more emphasis on data this year and thought it would be appropriate to start a mini-series on analytics.

 

History of Data Explosion

Throughout history, we have experienced many epochs of data explosion, and each time the volume of data grew by many orders of magnitude at an accelerating rate. Basically, whenever there is a new mechanism for people to record information, it is usually followed by a period of data explosion. This happened when paper and the printing press were invented. Subsequently, it was repeated several times with the invention of various analog media, such as physical storage (e.g. film-based photography and records) and magnetic media (e.g. tapes). And it has been repeated over and over again with the introduction of electronics and modern digital media.

 

What is different about the social media revolution is that it didn’t really create any new mechanism for data storage. Social data are still stored in digital media, and our current technologies are not yet able to tap the power of holographic and quantum data storage. Instead, social media have created many new mechanisms for data creation. Moreover, they democratize data creation. That is, data generated by you and me are recorded, indexed and searchable in the same way as content created by professionals (e.g. journalists, musicians and filmmakers). Therefore, the most recent data explosion is driven primarily by the explosion of user generated content (UGC).

 

In 2011, IDC’s annual study of the digital universe estimated the amount of information we create to be ~1.8 zettabytes (that is 1.8 sextillion bytes or 1.8 trillion GB). But what good are all these data? And what do they mean to businesses and enterprises?
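
As a quick sanity check, the unit conversion above can be reproduced in a few lines of Python. This is only a sketch: the variable names are made up for this example, and it assumes decimal (SI) units, where 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes.

```python
# Decimal (SI) unit sizes, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
ZETTABYTE = 10**21

# 1.8 ZB written as an exact integer (18 tenths of a zettabyte),
# which avoids floating-point rounding on such a large number.
digital_universe_bytes = 18 * ZETTABYTE // 10

# The same quantity expressed in gigabytes.
digital_universe_gb = digital_universe_bytes // GIGABYTE

print(f"{digital_universe_bytes:,} bytes")
print(f"{digital_universe_gb:,} GB")
```

The first number has 21 zeros' worth of magnitude (sextillions of bytes) and the second is in the trillions of GB, which matches the figures quoted from the study.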

 

Data Reduction: The Promise of Analytics

The simple answer is that data help us make better decisions. In fact, many of the national labs have decision support divisions and departments. I know, because many of my statistician/mathematician friends are working there now. And what they do is exactly what we call analytics and business intelligence in the industry. So, the primary function of analytics is to support decision making. As my witty PhD advisor once said, “analytics that doesn’t help people make better decisions is just mental masturbation.”

 

Now, the million dollar question is “how?” There is little doubt that no human can consume 1.8 zettabytes of information. Not even close. Although our brain is estimated to have the capacity to store information in the hundreds of terabytes to petabyte range, the capacity of our working memory is very limited. As a result, we never access all our memories at once. George A. Miller, a renowned psychologist from Princeton, estimated the capacity of our working memory. Surprisingly, our working memory can only hold 7±2 different recallable chunks of information from our long-term memory at any instant. Distinguishing among 7 alternatives takes log2(7) ≈ 2.8 bits, so that is roughly 3 bits of information (less than 1 byte = 8 bits)!

 

Although subsequent studies have found quite a bit of variation around the magic number 7, the upper limit is never more than 100, which is about 7 bits, still less than 1 byte! Regardless of the capacity of our working memory, most business decisions come down to a choice between several options. And the most common scenario is a decision between “go,” or “no go.” That is a decision that flips a single bit of data that we can act on.
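
To make the bit counts concrete, here is a minimal sketch of the arithmetic. It follows this post's reading, in which the number of chunks is treated as a number of equally likely alternatives, so the information needed to single one out is log2(n) bits. The function name `bits_needed` is just an illustrative choice.

```python
import math


def bits_needed(n_alternatives: int) -> float:
    """Bits required to single out one of n equally likely alternatives."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)


# A go/no-go decision, the 7±2 range, and the upper limit of 100.
for n in (2, 5, 7, 9, 100):
    print(f"{n:>3} alternatives -> {bits_needed(n):.2f} bits")
```

Even the most generous limit of 100 stays below the 8 bits in a single byte, and a go/no-go choice is exactly 1 bit.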

 

Since we do not have the resources to act on too many choices, good analytics must reduce the big data down to a few bits to facilitate the evaluation and comparison of their utility (i.e. value, ROI, engagement, influence, or whatever it is that we are interested in). This would in turn help us make an informed and hopefully intelligent choice between the few available options.
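
As a toy illustration of this idea, the sketch below collapses a stream of records into a single decision bit. Everything in it is hypothetical: the `go_no_go` function, the ROI numbers, and the threshold are invented for the example, and averaging against a threshold is just one simple reduction among many, not a recommended method.

```python
from typing import Iterable


def go_no_go(values: Iterable[float], threshold: float) -> bool:
    """Reduce a stream of per-record utility values to one decision bit.

    Returns True ("go") when the average utility meets the threshold.
    The stream is consumed one record at a time, so it never has to
    fit in memory all at once.
    """
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        return False  # no evidence at all: default to "no go"
    return total / count >= threshold


# Hypothetical ROI figures; real data would have far more records.
roi_samples = [0.12, 0.30, -0.05, 0.22, 0.18]
decision = go_no_go(roi_samples, threshold=0.10)
print("go" if decision else "no go")
```

However large the input, the output is a single bit that someone can act on.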

 

Conclusion

Alright, this is just the introductory post of my mini-series on data and analytics. But we did cover quite a few bits of information in this post. To summarize the key points:

  1. The social media revolution is a new age of big data, because it created many new mechanisms for data creation rather than a new mechanism for data storage
  2. The primary function of analytics is to enable better decision making; otherwise, you know what my advisor said
  3. Since our working memory is very limited and most business decisions involve a choice among only a few options, good analytics must reduce the big data down to a few bits that we can decide and act on

 

We probably won’t need to deal with 1.8 zettabytes of data (at least for now), because we don’t have all the data in the world. However, it’s pretty common for enterprise data or social data to go up to tens and hundreds of terabytes. That is still a lot of data to reduce down to a few bits, and that is the challenge of big data analytics.

 

Fortunately, there are many ways to reduce data depending on the specific business needs. Let’s talk about that next time.

 

 

comments
Teamaction on ‎01-11-2012 05:04 AM

Interesting start, now looking forward to the next part of the discussion on how to reduce the data into meaningful and actionable insight.

Lithium Guru on ‎01-11-2012 10:01 AM

Hello Teamaction,

 

Glad you find this an interesting start. 

Analytics is a very fascinating subject, but at the same time very complex and can get rather technical. I hope people will find value in these discussions.

 

Thank you for commenting and see you next time.

 

MartyThompson on ‎01-13-2012 07:46 AM

Thanks for your work in this important area. As always, you provide some fascinating insights into the future of big data, social, etc. I'm wondering if the advent of social technologies has acted as a catalyst for the push for better analytics across the board, not just for "social" data.

The US government is investing heavily in social data analytics, as well as game technologies, and I wonder if the result will be rapid, crossover improvements in technologies that can be applied in the civilian marketplace. Your thoughts?

Lithium Guru on ‎01-15-2012 12:37 PM

Hello Marty,

 

Thank you for the comment and sorry for the late reply. It's been a super busy week for me, with the book pre-launch and everything else.

 

As alluded to in this post, social technologies definitely facilitate the creation of a huge amount of data, which would naturally call for the development of better and more powerful analytics.

 

The US government is definitely trying to invest in this area, especially in gamification: there were representatives from the White House at our first gamification symposium at Wharton last year. However, I am not sure how rapidly these technologies will develop, as the complexity of the government process to get anything going has often slowed innovation. I'm not blaming the government for its inefficiency, since it also has a much bigger system to manage and a much longer time scale to worry about.

 

Another thing is that governments are less tolerant of failure in consumer technologies. If there is even a small chance of failure, they would rather wait for the technology to mature further, reducing the risk, before investing in it.

 

I believe the government will one day adopt social/gamification technologies and strategies. However, because these technologies and data often involve people and raise security and privacy concerns, I believe the adoption will be relatively slow.

 

Thank you for your inquiry. 

Hope to see you around next time.

 

Lithium Technologies XavierJ on ‎01-19-2012 06:17 PM

"We probably won’t need to deal with 1.8 zettabytes of data" - you may eat those words sooner than you think, my friend! :-)

Lithium Guru on ‎01-19-2012 06:21 PM

Hello Xavier,

 

Thanks for the comment.

 

Yes, that is why I qualified it with "(at least for now)." But every time we have a data explosion, it really explodes and data volume increases by many orders of magnitude. So you are definitely right. We may eventually have to deal with zettabytes of data and it may be sooner than we think.

 

Thanks again for the comment and hope to see you next time.
