Voted a 2010 Influential Leader by CRM Magazine for his work on predictive social analytics + its application to Social CRM. Follow him @mich8elwu or Google+.

science of social blog

the official blog of Dr Michael Wu
Insights into social customer behaviors, big data, superfans, gamification, influence, relationships, and more…

 

Big Data Reduction 3: From Descriptive to Prescriptive

By MikeW

by Lithium Guru ‎04-10-2013 12:19 PM - edited ‎04-11-2013 07:54 AM

Today we will cover the last class of analytics for finding that needle of information in an ocean of big data: prescriptive analytics. Remember that information << data: the information anyone can extract from big data will always be much less than the sheer volume of the big data itself. The difference is even more dramatic if we are talking about relevant and useful information.

 

Prescriptive analytics not only predicts a possible future, it predicts multiple futures based on the decision maker’s actions. Therefore a prescriptive model is, by definition, also predictive.
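
To make the "multiple futures" idea concrete, here is a minimal sketch, not the author's model. It assumes a toy predictive model (`predict_revenue`, with made-up coefficients) and a hypothetical helper `prescribe` that runs the prediction once per candidate action and recommends the action with the best predicted outcome.

```python
from typing import Callable, Dict, Iterable, Tuple


def predict_revenue(price: float) -> float:
    """Hypothetical predictive model: demand falls linearly as price rises."""
    demand = max(0.0, 100.0 - 8.0 * price)  # made-up coefficients, for illustration only
    return price * demand


def prescribe(
    actions: Iterable[float], model: Callable[[float], float]
) -> Tuple[float, Dict[float, float]]:
    """Predict one future per candidate action and return the best action."""
    futures = {action: model(action) for action in actions}
    best_action = max(futures, key=futures.get)
    return best_action, futures


best, futures = prescribe([4.0, 5.0, 6.0, 7.0, 8.0], predict_revenue)
print(best, futures[best])
```

The predictive part is `predict_revenue`; the prescriptive part is only the loop over the decision maker's options, which is why a prescriptive model is necessarily also predictive.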

Big Data Reduction 2: Understanding Predictive Analytics

By MikeW

by Lithium Guru ‎03-25-2013 03:55 PM - edited ‎03-26-2013 09:41 AM

Last time we described the simplest class of analytics (i.e. descriptive analytics) that you can use to reduce your big data into much smaller but consumable bites of information. Remember, most raw data, especially big data, are not suitable for human consumption, but the information we derive from the data is.

 

Today we will talk about the second class of analytics for data reduction: predictive analytics. First, let me clarify two subtle points about predictive analytics that are often confusing.

  1. The purpose of predictive analytics is NOT to tell you what will happen in the future. No analytics can do that.
  2. Predictive analytics are not limited to the time domain. Some of the most interesting predictive analytics in social media are non-temporal in nature.

 

Big Data Reduction 1: Descriptive Analytics

By MikeW

by Lithium Guru ‎03-14-2013 01:58 PM - edited ‎03-27-2013 06:54 PM

Now that SxSW interactive is over, it’s time to get back and do some serious business. For me, that means I’ll return to the world of big data. But let me tell you a little secret: although I work with big data all the time, I never actually look at any big data, because big data isn’t made for human consumption.

 

No one can make any sense out of direct examination of petabytes of data, not even analysts or data scientists. You can’t even plot them on a monitor, because even the highest-resolution monitors are nowhere near a petapixel. We may look at several small samples of the big data during exploratory data analysis (EDA), but that’s not big data per se, since each sample is just a tiny fraction of the whole. Frankly, I don’t know anyone who actually looks through an entire set of big data with the naked eye. Instead, we apply many sophisticated analytics to big data and let our computers crunch it down to consumable digests. And that’s where we spend most of our time: looking at the results of analyses.
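
As a rough illustration of crunching data down to a consumable digest, the sketch below makes a single pass over a stream of numbers and keeps only four summary values. The `Digest` type and `summarize` function are hypothetical names chosen for this example; real descriptive analytics would compute far richer summaries.

```python
from dataclasses import dataclass
from math import inf
from typing import Iterable


@dataclass
class Digest:
    count: int
    mean: float
    minimum: float
    maximum: float


def summarize(values: Iterable[float]) -> Digest:
    """Reduce an arbitrarily long stream to a handful of descriptive statistics."""
    count, total, lowest, highest = 0, 0.0, inf, -inf
    for value in values:  # one pass; the stream never has to fit in memory
        count += 1
        total += value
        lowest = min(lowest, value)
        highest = max(highest, value)
    if count == 0:
        raise ValueError("cannot summarize an empty stream")
    return Digest(count, total / count, lowest, highest)


# A million values in, four numbers out.
print(summarize(range(1, 1_000_001)))
```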

 

Adaptive Influence Model: Fixing the Influence Irony

By MikeW

by Lithium Guru ‎02-21-2013 05:50 AM - edited ‎03-18-2013 01:40 PM

Last time we took a quick peek at the history of SEO, and we saw that influence engine optimization (IEO) is an inevitable consequence of scoring people’s influence. What’s worse is that IEO leads to the influence irony, where it actually changes people’s behavior in a way that drives them further away from being truly influential (if you missed this crucial point from my last post, you should read The Influence Irony – Influence Engine Optimization).

 

This sounds disappointing, but today, we are going to fix it!

 

The Influence Irony – Influence Engine Optimization

By MikeW

by Lithium Guru ‎02-07-2013 10:15 AM - edited ‎03-18-2013 01:40 PM

In my previous writing on digital influence, we had a rather scientific and statistical discussion about validating algorithms that predict people’s influence. When you dig deeper into what influence vendors actually do to validate their algorithms, you quickly find that most influence scores cannot be trusted, mainly because vendors either don’t validate their algorithms at all, overgeneralize, or validate them using flawed circular logic.

 

Another serious problem with most influence scoring models is “IEO.” You see the title; I really meant influence engine optimization (IEO) as opposed to search engine optimization (SEO). What is IEO? That will be the topic of discussion today and I promise it will be much less technical than my last post.

 

Exploratory Data Analysis: Playing with Big Data

By MikeW

by Lithium Guru ‎01-28-2013 09:40 AM - edited ‎02-07-2013 09:01 AM

In my previous big data post, we discussed the three necessary criteria for information to provide insights that are valuable. Through this discussion, we learned the key to insight discovery.

 

By definition, an insight must provide something we don’t already know. However, we typically don’t know what we don’t know, so we can’t really look for insights: we won’t know what to look for if we don’t know what it is a priori. What we need to do is temporarily forget about the value proposition of the data analysis and look beyond what’s relevant to the immediate problem we are trying to solve. There is no guarantee that we will find anything in the land of irrelevance, but ironically that is usually where insights are discovered.

 

How exactly do we do this? That is the topic we will discuss today.

 

The Key to Insight Discovery: Where to Look in Big Data to Find Insights

Last time we talked about the second fallacy of big data: insights << information. This inequality arises because there are three criteria that information must meet to provide valuable insights. The information must be:

  1. Interpretable
  2. Relevant
  3. Novel

Today we will examine these criteria to get a deeper understanding of what they really mean. Since these criteria narrow the location where insights are found to a tiny subset of the extractable information from big data, a clear understanding of these criteria will help us discover insights from big data.

 

The 2nd Fallacy of Big Data - Information ≠ Insights

By MikeW

by Lithium Guru ‎12-19-2012 06:06 AM - edited ‎02-01-2013 11:51 AM

Since we digressed into the topic of influence over the past few weeks, it’s time to return to big data and talk about another big data fallacy.

 

In my previous Big Data posts, we discussed the data-information inequality (a.k.a. the Big Data Fallacy): information << data. We talked about what it is, how to quantify it, and why it is the way it is. We delved pretty deeply and talked about some nontrivial concepts and statistical properties of big data. As a result, the discussion got a little mathematical. However, if you like the technical details, you should have a quick read of the following posts:

  1. The Big Data Fallacy: Data ≠ Information
  2. How Much Information is in Your Big Data and How Can You Measure It?
  3. Why is there so Much Statistical Redundancy in Big Data?

 

Today, I want to talk about the second fallacy of big data and discuss the distinction between information and insights. I promise I won’t go too deep into the statistics. But before I begin, I want to tie up a few loose ends concerning the statistical redundancy in big data.
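
One crude way to see statistical redundancy for yourself is to compress the data. The sketch below is an assumption-laden illustration, not the measure used in the posts listed above: it treats the zlib compression ratio as a rough proxy for how much of a byte stream is redundant. The sample log line and the helper name `compression_ratio` are made up for the example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means more redundancy."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)


# Made-up, highly repetitive activity log versus incompressible random bytes.
repetitive = b"user42,like\n" * 10_000
random_like = os.urandom(len(repetitive))

print(f"repetitive log: {compression_ratio(repetitive):.4f}")
print(f"random bytes:   {compression_ratio(random_like):.4f}")
```

A general-purpose compressor only catches some kinds of redundancy, so the ratio is an upper bound on the information content rather than an exact measurement.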

 

Validating the Influence Model: How do You Know Your Influence Score is Correct – Part 2

Last time, I illustrated the predictive validation framework in a toy problem where we are supposed to predict the stock price of Apple. Today we will apply this framework to analyze algorithms that compute people’s influence score. Since this is the second part of a two-part article, you will need a solid understanding of the first post in order to make sense of today’s discussion.

 

To validate any model that influence vendors use to predict someone’s influence, they must have an independent measure of that person’s influence. But as we discussed before, nobody has any measured data on influence. So how can influence vendors be sure of the validity of their model?
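
The full validation framework is described in the two posts themselves and is not reproduced in this excerpt. As a generic sketch of the underlying idea, comparing predictions against independently measured outcomes, the code below fits a straight-line trend to the early part of a series and scores it against held-out later values. The helper names (`fit_line`, `holdout_mae`) and the synthetic series are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def holdout_mae(series: List[float], train_size: int) -> float:
    """Fit on the first train_size points, then measure error on the rest."""
    if not 2 <= train_size < len(series):
        raise ValueError("need at least 2 training points and 1 held-out point")
    xs = list(range(len(series)))
    slope, intercept = fit_line(xs[:train_size], series[:train_size])
    errors = [
        abs(slope * x + intercept - y)
        for x, y in zip(xs[train_size:], series[train_size:])
    ]
    return sum(errors) / len(errors)


# Synthetic series: a linear trend with a small alternating wiggle.
series = [2.0 * t + 1.0 + (0.5 if t % 2 else -0.5) for t in range(20)]
print(holdout_mae(series, train_size=15))
```

The key point is that the held-out values are never seen during fitting. Without such an independent measurement, which is exactly what is missing for influence, there is nothing to compare the model's output against.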

 

Learning the Science of Prediction: How do You Know Your Influence Score is Correct – Part 1

This post is the first of a two-part article addressing the question: How do you know if your influence score is correct? Today, I won’t actually answer this question, but I will show you a step-by-step procedure that we will use next time to address it.

 

Because nobody actually has any data on influence (i.e. data that explicitly says who actually influenced whom, when, where, how, etc.), all influence scores are computed from users’ social activity data based on some models and algorithms of how influence works. However, anyone can create these models and algorithms. So who is right, and who has the best model? More importantly, how can we tell whether your influence score is correct? In other words, how can we validate the models that influence vendors use to predict people’s influence?