Big Data Reduction 3: From Descriptive to Prescriptive
Prescriptive analytics not only predicts a possible future but also predicts multiple futures, one for each action the decision maker might take. Therefore a prescriptive model is, by definition, also predictive.
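To make the "predictive inside prescriptive" relationship concrete, here is a minimal Python sketch. It assumes a hypothetical toy predictive model (`predict_revenue`, a made-up demand/discount formula, not a real model from this post). The prescriptive step (`prescribe`, also a hypothetical name) simply runs that predictive model once per candidate action and picks the action whose predicted future is best.

```python
from typing import Callable, Dict, Iterable, Tuple

def predict_revenue(base_demand: float, discount: float) -> float:
    """Hypothetical predictive model: a discount raises volume but lowers margin."""
    volume = base_demand * (1.0 + 2.0 * discount)
    margin = 10.0 * (1.0 - discount)
    return volume * margin

def prescribe(base_demand: float,
              actions: Iterable[float],
              model: Callable[[float, float], float]) -> Tuple[float, Dict[float, float]]:
    """Prescriptive step: predict one future per candidate action, then choose the best."""
    futures = {a: model(base_demand, a) for a in actions}   # one prediction per action
    best_action = max(futures, key=lambda a: futures[a])     # highest predicted revenue
    return best_action, futures

best, futures = prescribe(100.0, [0.0, 0.1, 0.25, 0.5], predict_revenue)
print(best, futures)
```

Remove the `max` step and what remains is a purely predictive model; the prescriptive layer only adds the loop over actions and the choice among the predicted futures.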
Big Data Reduction 2: Understanding Predictive Analytics
Today we will talk about the second class of analytics for data reduction: predictive analytics. First, let me clarify two subtle points about predictive analytics that are often confusing.
Big Data Reduction 1: Descriptive Analytics
No one can make any sense of petabytes of data by examining them directly, not even analysts or data scientists. You can't even plot them on a monitor, because even the highest-resolution monitors are nowhere near a petapixel. We may look at several small samples of the big data during exploratory data analysis (EDA), but that's not big data per se, since each sample is just a tiny fraction of the whole. Frankly, I don't know anyone who actually looks through an entire big data set with the naked eye. Instead, we apply many sophisticated analytics to big data and let our computers crunch it down to consumable digests. That's where we spend most of our time: looking at the results of those analyses.
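As an illustration of "crunching data down to a consumable digest," here is a minimal Python sketch of a one-pass descriptive summary using Welford's online algorithm. The `Digest` class and `summarize` function are hypothetical names. The sketch assumes the data arrives as a stream of numbers; however long the stream, the digest is only five numbers and uses constant memory.

```python
import math
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Digest:
    """Running summary of a numeric stream: count, mean, variance, min, max."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0              # sum of squared deviations from the running mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        # Welford's online update: a single pass, constant memory
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two values
        return self.m2 / (self.count - 1) if self.count > 1 else math.nan

def summarize(values: Iterable[float]) -> Digest:
    """Reduce an arbitrarily long stream of numbers to a fixed-size digest."""
    d = Digest()
    for v in values:
        d.update(v)
    return d

# The generator is never materialized in memory; only the digest is kept.
d = summarize(range(1, 100_001))
print(d.count, d.mean, d.variance, d.minimum, d.maximum)
```

Welford's update is used here instead of accumulating raw sums of squares because it avoids the catastrophic cancellation that the naive formula suffers on large streams.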
Adaptive Influence Model: Fixing the Influence Irony
This sounds disappointing, but today, we are going to fix it!
The Influence Irony – Influence Engine Optimization
Another serious problem with most influence scoring models is “IEO.” As the title suggests, I really do mean influence engine optimization (IEO), as opposed to search engine optimization (SEO). What is IEO? That will be today's topic, and I promise it will be much less technical than my last post.
Exploratory Data Analysis: Playing with Big Data
By definition, an insight must tell us something we don't already know. However, we typically don't know what we don't know, so we can't really go looking for insights: we can't search for something when we don't know a priori what it is. What we need to do is temporarily forget about the value proposition of the data analysis and look beyond what's relevant to the immediate problem we are trying to solve. There is no guarantee that we will find anything in the land of irrelevance, but ironically, that is usually where insights are discovered.
How exactly do we do this? That is the topic we will discuss today.
The Key to Insight Discovery: Where to Look in Big Data to Find Insights
Today we will examine these criteria to get a deeper understanding of what they really mean. Since these criteria narrow the location where insights are found to a tiny subset of the extractable information from big data, a clear understanding of these criteria will help us discover insights from big data.
The 2nd Fallacy of Big Data - Information ≠ Insights
In my previous Big Data posts, we discussed the data-information inequality (a.k.a. the Big Data Fallacy): information << data. We talked about what it is, how to quantify it, and why it is the way it is. We delved pretty deeply into some nontrivial concepts and statistical properties of big data, so the discussion got a little mathematical. However, if you like the technical details, you should have a quick read of the following posts:
Today, I want to talk about the second fallacy of big data and discuss the distinction between information and insights. I promise I won’t go too deep into the statistics. But before I begin, I want to tie up a few loose ends concerning the statistical redundancy in big data.
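For readers who want a concrete feel for "information << data," here is a minimal Python sketch of one standard way to quantify the gap: compare raw storage size with Shannon entropy. This is not necessarily the quantification used in the earlier posts. The event log is hypothetical, the one-byte-per-record storage cost is an assumption, and `entropy_bits_per_symbol` and `information_vs_data` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Dict, Sequence

def entropy_bits_per_symbol(symbols: Sequence[str]) -> float:
    """Empirical (order-0) Shannon entropy H = -sum(p * log2(p)) over distinct symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_vs_data(symbols: Sequence[str], raw_bits_per_symbol: int = 8) -> Dict[str, float]:
    """Compare raw storage size with an entropy-based estimate of information content."""
    h = entropy_bits_per_symbol(symbols)
    raw_bits = len(symbols) * raw_bits_per_symbol   # assumption: fixed-width, one byte per record
    info_bits = len(symbols) * h                    # order-0 estimate of information
    return {"raw_bits": float(raw_bits), "info_bits": info_bits, "ratio": info_bits / raw_bits}

# Hypothetical event log: 900 "ok" records and 100 "error" records
log = ["ok"] * 900 + ["error"] * 100
report = information_vs_data(log)
print(report)
```

Even under the generous one-byte assumption, this log carries well under a tenth of its raw size in information. The order-0 estimate also ignores any dependence between consecutive records, and accounting for that redundancy typically lowers the estimate further.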
Validating the Influence Model: How do You Know Your Influence Score is Correct – Part 2
To validate any model they use to predict someone's influence, influence vendors must have an independent measure of that person's influence. But as we discussed before, nobody has any measured data on influence. So how can influence vendors be sure their models are valid?
Learning the Science of Prediction: How do You Know Your Influence Score is Correct – Part 1
This post is the first of a two-part article addressing the question: How do you know if your influence score is correct? Today, I won't actually answer this question. Instead, I will show you a step-by-step procedure that we will use next time to address it.
Because nobody actually has any data on influence (i.e., data that explicitly says who influenced whom, when, where, how, etc.), all influence scores are computed from users' social activity data using some model or algorithm of how influence works. However, anyone can create such models and algorithms. So who is right, and who has the best model? More importantly, how can we tell, and how can you be sure your influence score is correct? In other words, how can we validate the models that influence vendors use to predict people's influence?
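For context on what "validating a predictive model" usually means, here is a minimal Python sketch of generic holdout validation. This is not necessarily the step-by-step procedure these posts describe. The model is fitted on part of the data and scored on observed outcomes it never saw. `fit_line` and `holdout_rmse` are hypothetical names, and a simple least-squares line stands in for whatever model a vendor might actually use. The catch for influence is the `ys` argument: holdout validation needs measured outcomes, which is exactly what nobody has.

```python
import random
from typing import Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def holdout_rmse(xs: Sequence[float], ys: Sequence[float],
                 test_fraction: float = 0.3, seed: int = 0) -> float:
    """Fit on a random training split, then report RMSE on the held-out split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Predictions are compared against *measured* outcomes the model never saw
    errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
    return (sum(errors) / len(errors)) ** 0.5

xs = [float(x) for x in range(100)]
rng = random.Random(1)
ys_noisy = [2 * x + 1 + rng.gauss(0, 1) for x in xs]   # synthetic data, noise sd = 1
print(holdout_rmse(xs, ys_noisy))
```

On this synthetic data, a held-out error close to the noise level suggests the model generalizes. Without measured outcomes there is nothing to put in `ys`, and the check cannot be run at all.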
