Voted a 2010 Influential Leader by CRM Magazine for his work on predictive social analytics + its application to Social CRM. Follow him @mich8elwu or Google+.

science of social blog

the official blog of Dr Michael Wu
Insights into social customer behaviors, big data, superfans, gamification, influence, relationships, and more…

 

The 2nd Fallacy of Big Data - Information ≠ Insights

By MikeW

by Lithium Guru ‎12-19-2012 06:06 AM - edited ‎02-01-2013 11:51 AM

Since we digressed into the topic of influence over the past few weeks, it’s time to return to big data and talk about another big data fallacy.

 

In my previous Big Data posts, we discussed the data-information inequality (a.k.a. the Big Data Fallacy): information << data. We talked about what it is, how to quantify it, and why it is the way it is. We delved pretty deeply into some nontrivial concepts and statistical properties of big data, so the discussion got a little mathematical. If you like the technical details, have a quick read of the following posts:

  1. The Big Data Fallacy: Data ≠ Information
  2. How Much Information is in Your Big Data and How Can You Measure It?
  3. Why is there so Much Statistical Redundancy in Big Data?

 

Today, I want to talk about the second fallacy of big data and discuss the distinction between information and insights. I promise I won’t go too deep into the statistics. But before I begin, I want to tie up a few loose ends concerning the statistical redundancy in big data.

 

Validating the Influence Model: How do You Know Your Influence Score is Correct – Part 2

Last time, I illustrated the predictive validation framework with a toy problem where the goal was to predict the stock price of Apple. Today we will apply this framework to analyze algorithms that compute people’s influence scores. Since this is the second part of a two-part article, you will need a solid understanding of the first post to make sense of today’s discussion.

 

To validate any model used to predict someone’s influence, influence vendors must have an independent measure of that person’s influence. But as we discussed before, nobody has any measured data on influence. So how can influence vendors be sure of the validity of their model?