
The Numbers That Matter

by Community Manager on 04-24-2009 03:46 PM - last edited on 04-24-2009 03:46 PM

Apples and Oranges - They Dont Compare.jpg

There is a lot happening on the Lithosphere with the new design coming up, so I've been spending a lot of time in another blog recently.


But today I'd like to return to the topic of numbers - specifically, the numbers that matter.


When I was at the Web 2.0 Expo at the beginning of this month, I had a good opportunity to see what numbers a lot of social media experts and vendors were using to make their case: numbers of posts, views, and registrations, or even numbers of communities.


And recently there has been a great deal of fuss over the race between Ashton Kutcher and CNN to reach 1,000,000 followers on Twitter, as well as Oprah's entry onto the Twitter scene and what it means.


People are looking at the social media space and still trying to figure out how to keep score, when the real measure of success is the same as it's always been: are the companies engaged in social media using it to improve profits through increased revenue and decreased costs? For vendors, are your products and services doing this better than the others?


So in the spirit of the times, here is a collection of numbers from the Lithium Technologies website I'd like to share:

 


Maybe those numbers help explain why Lithium Technologies is putting up some numbers of its own in this tough economy.


To paraphrase Will Hunting: Do you like numbers? I got ROI - How d'you like them numbers?

 

 

Photo by TheBusyBrain

Guest Post: Formulate, Predict, and Reformulate

by Community Manager on 04-08-2009 10:07 AM - last edited on 04-08-2009 10:07 AM

GPS - Flight Speed Data.jpg

Michael Wu returns for the last installment in his series describing how the new Community Health Index was developed:

 

We've come a long way. This is the last post in the series that describes the development of the community health index. Earlier posts on this topic are listed here:

 

  1. From the Brain to Community Analytics
  2. Criteria for Creating the Community Health Index
  3. Crunching Numbers for the Community Health Index
  4. Interpreting the Statistics for CHI

 

Last time, we talked about the selection of predictive variables and the tedious process of nonlinear analysis. Once we have the variables and the nonlinearities, we must combine them into a single function which, when evaluated, gives us the proper health level of a community. But the hard work is not over yet. This process culminated in a health function, which is a product of 6 health factors that are important in determining the health of online communities. These health factors are referred to as:

 

  • Members: the number of registered members over time,
  • Content: a function of posts weighted by member and guest viewership,
  • Traffic: the number of page views over time taking into account search crawlers,
  • Liveliness: a function of the number of posts per board over time taking into account user expectations for engagement,
  • Interaction: the number of unique participants weighted by the amount of conversation between them within a thread, and
  • Responsiveness: a measure of time to respond between successive message posts within a thread taking into account expected response time.

 

Each of these health factors usually involves one or more metrics with some nonlinear function applied to them.
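
As a rough illustration of what "a product of 6 health factors" implies, here is a minimal Python sketch. The factor values, the `health` function name, and the plain unweighted product are all assumptions made for illustration; the actual formulation is the one disclosed in the white paper.

```python
from math import prod

# Hypothetical factor scores for one community in one week.
# The real factors are built from one or more metrics with nonlinear
# functions applied; these are just made-up positive numbers.
factors = {
    "members": 1.2,
    "content": 0.9,
    "traffic": 1.1,
    "liveliness": 0.8,
    "interaction": 1.0,
    "responsiveness": 0.5,
}

def health(factor_scores):
    """Combine factor scores multiplicatively (an assumed, unweighted product)."""
    return prod(factor_scores.values())

raw = health(factors)

# Because the factors multiply, doubling one weak factor doubles the result,
# and a factor near zero drags the whole product toward zero.
improved = dict(factors, responsiveness=1.0)
print(raw, health(improved))
```

One consequence of a multiplicative form is that no single strong factor can compensate for a factor that is close to zero.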

 

The health function is smoothed to give the health trend, much like smoothing the daily stock price to give a better indication of the underlying direction of movement. The health function is then normalized to remove some of the bias introduced by the size of the community. I did not remove the size bias completely because human experts also have such bias and tend to rate larger communities healthier. The normalization process takes into account the health history of the community, weighting the recent health more heavily, as well as the volatility of health, so that consistent progression of the health trend will result in a greater value of CHI. By design, the community health index is robust to outliers yet sensitive: if there is a consistent signal for a change in health, it will be reflected in the weekly value of CHI.
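
To make the smoothing idea concrete, here is a small Python sketch using an exponentially weighted moving average, which weights recent values more heavily. The post does not say which smoother CHI uses, so the choice of technique, the `alpha` value, and the weekly numbers are assumptions for illustration only.

```python
def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; larger alpha favors recent points."""
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # start the trend at the first observation
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Made-up weekly health values: one outlier week, then a steady climb.
weekly_health = [10.0, 10.0, 30.0, 10.0, 10.0, 12.0, 14.0, 16.0]
trend = ewma(weekly_health)
print([round(t, 2) for t in trend])
```

In this sketch the one-week spike is damped in the trend, while the consistent rise over the last few weeks gradually pulls the trend upward.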

 

The final step of any mathematical modeling is model validation. Basically, this means that we must test the model on a data set that we did not use to build the model, and make sure that the model still performs as expected. Lithium now hosts roughly 170 communities, and I developed the community health index using data derived from 16 communities of varying size, age, and purpose, where we have plenty of non-metric data. Then I tested the resulting model on 4 other communities. As with any scientific discovery process, this went through several iterations before the model began to perform well during all the stages of the modeling process. Once the model-predicted health started matching the health assessed by human experts, I computed CHI for all our communities and gathered more data to refine the initial formulation. The computation published in our white paper is actually the result of three iterations of major reformulation; each introduced just a few minor but important tweaks that increase the prediction accuracy of community health for a greater variety of communities. And we are already working on future refinements as we continue to learn from the data we collect.
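
The hold-out idea described here can be sketched in a few lines of Python. The community names, the scores, and the choice of mean absolute error as the comparison are placeholders and assumptions; only the 16/4 split comes from the text.

```python
def holdout_split(community_ids, n_test):
    """Reserve the last n_test communities for validation only."""
    return community_ids[:-n_test], community_ids[-n_test:]

def mean_abs_error(predicted, observed):
    """Average absolute gap between model output and expert assessment."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

communities = [f"community_{i:02d}" for i in range(1, 21)]  # 20 placeholder names
build_set, test_set = holdout_split(communities, n_test=4)

# Placeholder numbers standing in for model output and expert ratings
# on the held-out communities (never used while building the model).
model_scores = {"community_17": 6.5, "community_18": 3.0,
                "community_19": 8.0, "community_20": 5.5}
expert_scores = {"community_17": 7.0, "community_18": 3.5,
                 "community_19": 8.0, "community_20": 4.5}

error = mean_abs_error([model_scores[c] for c in test_set],
                       [expert_scores[c] for c in test_set])
print(len(build_set), len(test_set), error)
```

The key property is that the two sets never overlap, so the error on the held-out communities says something about communities the model has not seen.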

 

Hopefully this series of posts has given you a peek at the development process behind the community health index and the effort that went into it. If you have any questions I'd be more than happy to address them in the comments, or feel free to ask me on Twitter at mich8elwu.

 

 

Photo by LaertesCTB


Calipers.jpg

Welcome back once more to Michael Wu, here for the penultimate installment in his series describing how the new Community Health Index was developed:

 

This is my fourth post in the series that describes the development of the community health index. Previous posts can be found here:

 

  1. From the Brain to Community Analytics
  2. Criteria for Creating the Community Health Index
  3. Crunching Numbers for the Community Health Index

 

Last time, I crunched some numbers and talked about some of the mathematical challenges that I have overcome. Now, it is time to interpret the results.

 

Running the regression analysis is the easy part. Although it is fairly technical to set up the nonlinear regression equation, it is mechanical in the sense that anyone with a background in math and statistics can do it. The remaining part of the analysis involves interpreting the results to derive meaning and insights. This is often the most challenging aspect of any statistical analysis because it is more an art than a science; yet it must have all the rigor, objectivity and accuracy of science. For example, I would have to decide which predictor variable to remove among those with similar predictive power. When a set of variables is found not to be predictive, is it a failure of the model to harness their predictive power, or is it the case that these variables are truly independent of the response (in this case, health)? Interpretability of the final model becomes important, and looking at numbers alone is no longer sufficient. In statistics this process is called variable selection.
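
One simple way to picture "keeping only what is consistently predictive" is the Python sketch below. It is not the procedure used for CHI: the correlation-based criterion, the 0.7 threshold, the function names, and the data are all assumptions chosen to keep the example small.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def consistently_predictive(communities, threshold=0.7):
    """Keep metrics whose |correlation| with health clears the threshold in every community."""
    names = communities[0]["metrics"].keys()
    return sorted(
        name for name in names
        if all(abs(pearson(c["metrics"][name], c["health"])) >= threshold
               for c in communities)
    )

# Made-up data: two communities, each with a health series and two metrics.
communities = [
    {"health": [1, 2, 3, 4, 5],
     "metrics": {"posts": [10, 20, 30, 40, 50],
                 "avg_post_length": [5, 3, 6, 2, 4]}},
    {"health": [2, 4, 3, 5, 6],
     "metrics": {"posts": [15, 35, 30, 45, 60],
                 "avg_post_length": [2, 4, 3, 5, 6]}},
]

kept = consistently_predictive(communities)
print(kept)
```

Here `avg_post_length` tracks health in the second community but not in the first, so it is dropped. A mechanical rule like this only narrows the field; the judgment calls described above still remain.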

 

After eliminating the predictor variables that are not consistently predictive of health, we have only answered the question of which variables are predictive. But we still don't know how these variables predict health. For example, suppose we know that post count is predictive of health; will the health level increase by 10% if the post count is increased by 10%? Or will the health level increase by 30% if we observe a 10% increase in post count? Or perhaps the health level depends more strongly on post count initially, but becomes less dependent as the post count increases. To answer these questions, we must analyze the nonlinear relationship between health and the variables that we decide to keep. Not to complicate things, but it is often necessary to repeat the process of variable selection and nonlinear analysis for different subsets of variables and different nonlinearities, and to perform them in different orders.
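
The three possibilities in this paragraph correspond to three different shapes of response. The Python sketch below compares them; the specific functions (a proportional one, a cubic, and a logarithm) are hypothetical stand-ins, not the nonlinearities used in CHI.

```python
import math

def pct_change(f, x, bump=0.10):
    """Percent change in f when x grows by `bump` (10% by default)."""
    return 100.0 * (f(x * (1 + bump)) - f(x)) / f(x)

def proportional(posts):
    """Health proportional to post count: 10% more posts, 10% more health."""
    return 0.5 * posts

def amplified(posts):
    """Power law with exponent 3: a 10% increase yields roughly 33%."""
    return posts ** 3

def saturating(posts):
    """Logarithmic: strong dependence at first, weaker as posts grow."""
    return math.log1p(posts)

for name, f in [("proportional", proportional),
                ("amplified", amplified),
                ("saturating", saturating)]:
    print(name, round(pct_change(f, 10), 2), round(pct_change(f, 1000), 2))
```

For the first two shapes the percent response is the same at every post count, while for the saturating shape the same 10% increase matters less and less as the community gets busier.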

 

We are almost done! Next week we'll bring this all together into the new Community Health Index! If you have any questions I'd be more than happy to address them in the comments, or feel free to ask me on Twitter at mich8elwu.

 

 

Photo by Thomas Claveirole


Guest Post: Crunching Numbers for the Community Health Index

by Community Manager on 03-19-2009 12:18 PM - last edited on 03-20-2009 02:12 PM

Numbers.jpg

Welcome back Michael Wu! Here is his third installment in a series describing how the new Community Health Index was developed:

 

To begin the analysis of the previously collected data set, I gathered the non-metric data from various sources by talking to the moderators, the customer success managers (CSMs), and our best practice advocates, which included Joe Cothrel and his team. As I mentioned earlier, these data are extremely important because they serve as the ground truth for our prediction problem. It is through the eyes of the moderators and the CSMs, who monitor and interact with the community every day, that we know how healthy a community is. Tabulating these non-metric data gives us a time series of the health level for each community. Since all the recorded metrics are already in the form of time series, we can now turn to statistics and begin the number crunching.

 

The idea is very simple. We know the health level of the community from the non-metric data; now we simply want to know which of the 20 commonly available metrics can best predict community health. This can be achieved by running a sequence of linear and nonlinear regression analyses using the 20 metrics as the predictor variables and the tabulated non-metric data as the response variable.
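
As a toy version of this setup, the Python sketch below fits a one-predictor least-squares line for each metric and ranks the metrics by R². The metric names and all numbers are invented, and a real analysis would use many metrics jointly and nonlinear terms as well; this only shows the predictor/response framing.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Response: health levels tabulated from non-metric data (made-up values).
health = [3, 4, 5, 6, 7, 8]

# Predictors: two hypothetical metrics recorded over the same weeks.
metrics = {
    "page_views": [300, 410, 490, 600, 720, 790],
    "registrations": [12, 9, 14, 10, 13, 11],
}

ranked = sorted(((name, r_squared(values, health)) for name, values in metrics.items()),
                key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: R^2 = {score:.3f}")
```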

 

This, however, is not trivial. Some of the issues that must be dealt with include the correlation among the predictor variables, the nonlinearity between the predictors and the response, and the nonstationarity of the time series data.

 

That's quite a mouthful, so here is a bit of explanation about what I mean by that:

 

The problem of correlations among the predictors is known as multicollinearity. If some of the predictor variables are highly correlated, it is very difficult to determine which predictor actually drives the response. Computationally, this shows up as large regression coefficients that may jump randomly between the correlated predictors. These jumps are highly sensitive to the data, making it difficult to determine which of the correlated predictors is most predictive. This is a very prominent problem in community data, as many of the metrics are highly correlated. For example, if a community has a lot of traffic, it tends to gain more members and achieve a higher level of activity. I have used partial least squares and boosting to try to overcome this problem.
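
A first step in spotting multicollinearity is simply to look at pairwise correlations among the predictors. The Python sketch below does that; the metric names, the data, and the 0.9 cutoff are assumptions, and it only detects the problem. It does not implement partial least squares or boosting.

```python
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlated_pairs(predictors, cutoff=0.9):
    """Return predictor pairs whose absolute correlation exceeds the cutoff."""
    return [
        (a, b)
        for a, b in combinations(sorted(predictors), 2)
        if abs(pearson(predictors[a], predictors[b])) > cutoff
    ]

# Made-up weekly metrics: traffic and members rise together.
predictors = {
    "traffic": [100, 200, 300, 400, 500],
    "members": [11, 19, 32, 41, 50],
    "avg_post_length": [40, 90, 55, 70, 45],
}

flagged = correlated_pairs(predictors)
print(flagged)
```

Any flagged pair is a warning that a regression may shift weight between those two predictors almost arbitrarily.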

 

Nonlinearity means that the predictors and the response may not be related in a linear fashion. That means a fixed change in a predictor doesn't always lead to the same change in the response. The response also depends on the history of the predictor as well as its interactions with other predictors. There is no out-of-the-box solution for nonlinearity. I just have to try some nonlinearities, plot the data, look at them, reformulate the model, and see which one fits and predicts best.
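
The "try some nonlinearities and see which fits best" loop can be illustrated with the Python sketch below. The candidate transforms, the data, and the use of R² as the only yardstick are assumptions for the example; in practice plotting and held-out prediction matter just as much.

```python
import math

def r_squared(xs, ys):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Candidate nonlinearities to apply to the predictor before fitting a line.
transforms = {
    "identity": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log10,
}

# Made-up data in which health grows with the logarithm of post count.
posts = [1, 10, 100, 1000, 10000]
health = [1, 2, 3, 4, 5]

scores = {name: r_squared([t(p) for p in posts], health)
          for name, t in transforms.items()}
best = max(scores, key=scores.get)
print(scores, best)
```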

 

Finally, nonstationarity means that the behavior of the system, in this case the community, depends on absolute time. This makes prediction of any time series data very difficult. In layman's terms, it means that any statistical pattern we have learned may change from one time to another (this is what dependence on absolute time means). In other words, knowing the history does not by itself predict the future. For example, if we want to accurately predict stock market prices, any pattern we learn from the history had better continue into the future. If there is a trend (or seasonality) in the history, the exact same trend (or seasonality) must persist in order for us to predict the future. If the trend changes in the future, then following the historical trend will lead to a wrong prediction. This is a very prevalent problem in communities, because communities are constantly changing due to management decisions, product launches, marketing efforts, etc. There is also no way to predict a completely nonstationary system, as seen by the fact that no one can predict the stock market. We can only make some assumptions about how nonstationary our system is, proceed, and hope for the best. To deal with this problem, researchers typically assume one of several weaker forms of stationarity, and I have assumed wide-sense stationarity in the analysis of our community data.
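
A series is commonly called wide-sense stationary when its mean and its autocovariance do not change over time. The Python sketch below is a crude diagnostic in that spirit: it compares the mean and variance of the first and second half of a series. It is an illustration with made-up data, not a formal statistical test and not the analysis used for CHI.

```python
from statistics import mean, pvariance

def window_stats(series, n_windows=2):
    """Mean and population variance of each consecutive, equal-sized window."""
    size = len(series) // n_windows
    windows = [series[i * size:(i + 1) * size] for i in range(n_windows)]
    return [(mean(w), pvariance(w)) for w in windows]

def mean_shift(series):
    """Absolute difference between the means of the two halves."""
    (first_mean, _), (second_mean, _) = window_stats(series, 2)
    return abs(second_mean - first_mean)

steady = [5.0, 6.0, 5.0, 4.0, 5.0, 6.0, 5.0, 4.0]    # same pattern throughout
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # the mean drifts upward

print(window_stats(steady), mean_shift(steady))
print(window_stats(trending), mean_shift(trending))
```

A large shift between windows is a hint that patterns learned from the early part of the series may not carry over to the later part.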

 

That is a lot to digest! If you have any questions I'd be more than happy to address them in the comments, or feel free to ask me on Twitter at mich8elwu.

 

Next time: Interpreting the results!

 

 

Photo by lrargerich

 

Note: edited to correct a typo I added to Michael's post by mistake.

Message Edited by ScottD on 03-20-2009 02:12 PM


Guest Post: Criteria for Creating the Community Health Index

by Community Manager on 03-10-2009 11:15 PM - last edited on 03-10-2009 11:15 PM

microscope_head.jpg

Michael Wu joins us again for the second installment describing how the new Community Health Index was developed:

 

I wrote previously about how I came to start the development of the Community Health Index (CHI), through my background in the science of the brain and through Lithium's extensive data set of online communities. Picking up the task, I will start by defining what it means when we talk about community health.

 

The performance of any enterprise community has two dimensions:

 

  1. meeting the needs of members (customers), and
  2. meeting the needs of the business (enterprise).

 

Community health addresses the first dimension: it measures how well the community meets the needs of its members. It is very important, because without customer satisfaction, there is no business success.

 

With this understanding of community health, I set two basic criteria to narrow down the data we must plow through. Otherwise, the most complete picture of community health would be a composite of all the data about the community. First, because our objective is to make the community health index universal, we must use basic data that every community has. This eliminated many of the metrics that only Lithium keeps, bringing the number down to about 20 (I actually analyzed more than 20, but only about 20 are universally available). Among these are the usual metrics plus some less common ones such as percent of unanswered threads, average thread depth, average number of unique participants in a thread, average post length, etc. Although these metrics might not be recorded explicitly by every community platform, they can easily be computed by aggregating and summarizing the records of all the messages and user data that every community must have.
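
To show how such metrics can be derived from nothing more than raw message records, here is a small Python sketch. The record fields (`thread_id`, `author`, `body`), the sample messages, and the exact metric definitions (for example, treating a thread with a single message as unanswered and thread depth as messages per thread) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical message records; any platform stores something equivalent.
messages = [
    {"thread_id": "t1", "author": "alice", "body": "How do I reset my password?"},
    {"thread_id": "t1", "author": "bob", "body": "Use the link on the login page."},
    {"thread_id": "t1", "author": "alice", "body": "Thanks!"},
    {"thread_id": "t2", "author": "carol", "body": "Is there an API?"},
    {"thread_id": "t3", "author": "dave", "body": "Feature request: dark mode."},
    {"thread_id": "t3", "author": "erin", "body": "+1"},
]

def thread_metrics(records):
    """Aggregate per-thread statistics from a flat list of message records."""
    threads = defaultdict(list)
    for record in records:
        threads[record["thread_id"]].append(record)

    n_threads = len(threads)
    unanswered = sum(1 for posts in threads.values() if len(posts) == 1)
    return {
        "pct_unanswered": 100.0 * unanswered / n_threads,
        "avg_thread_depth": sum(len(p) for p in threads.values()) / n_threads,
        "avg_unique_participants": sum(
            len({m["author"] for m in p}) for p in threads.values()) / n_threads,
        "avg_post_length": sum(len(r["body"]) for r in records) / len(records),
    }

summary = thread_metrics(messages)
print(summary)
```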

 

After establishing the initial data set, the second criterion we applied is known as Occam's razor. The goal is to come up with a minimum set of data that gives the greatest predictive power. This is a challenging problem in statistics, known as the bias-variance tradeoff. In plain English, it means that there is a tradeoff between the complexity of the model and the predictive power of the model. Although complex models that use many variables will always have greater explanatory power for the available data, their predictive power for unseen future data degrades. On the other hand, simpler models with few variables may not explain the current data as well, but they are more predictive of future trends. Why is that? A model with many variables has enough freedom to fit the random noise in the data it was built on, and that noise does not repeat in future data. That is just the nature of uncertainty and how it works, much like why gravity always attracts.
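
The tradeoff can be seen in a tiny Python example. A nearest-neighbor "memorizer" stands in for an overly complex model and a straight line stands in for a simple one; both choices, and the made-up data, are assumptions for illustration rather than the models used for CHI.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: y = 2x plus alternating noise of +1 and -1.
train_x = [0.0, 2.0, 4.0, 6.0]
train_y = [1.0, 3.0, 9.0, 11.0]
# Unseen data from the same underlying relationship, without noise.
test_x = [0.9, 2.9, 4.9]
test_y = [1.8, 5.8, 9.8]

def memorizer(x):
    """'Complex' model: returns the training value nearest to x, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

intercept, slope = fit_line(train_x, train_y)

def line(x):
    """'Simple' model: a single straight line through the training data."""
    return intercept + slope * x

print("train:", mse(memorizer, train_x, train_y), mse(line, train_x, train_y))
print("test: ", mse(memorizer, test_x, test_y), mse(line, test_x, test_y))
```

The memorizer explains the training data perfectly but does worse than the line on the unseen points, which is the pattern the paragraph above describes.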

 

Next time we'll start the journey through the Lithium community data set. And I'll turn the number crunching crank to identify areas with the greatest predictive power!

 

For updates and discussion between Michael's posts, leave your comments here or you can follow Michael on Twitter at mich8elwu.

 

 

Photo by xmatt

Guest Post: From the Brain to Community Analytics

by Community Manager on 03-04-2009 10:18 PM - last edited on 03-05-2009 11:46 AM

brain cell(s).jpg

Another treat for you today: Michael Wu, resident scientist and chief number wrangler behind the Community Health Index, has agreed to drop by and tell the story of how this new open standard was developed. Enjoy part one of this special peek behind the scenes!

 

 

For the past six months, I have been engaged in a massive data analysis project at Lithium to develop an index that measures the health of online communities. I refer to this index as the community health index (CHI), which I like to denote with the Greek letter Χ. This project began shortly after I joined Lithium, when I received my Ph.D. in Biophysics at UC Berkeley. Although it was a dramatic transition from academia to industry, I thought that analyzing community data shouldn't be that difficult. After all, data are just numbers, and the math and statistics required to gain insight from them are just equations and symbols, which are universal across all disciplines. I was in for quite a surprise.

 

I was a brain scientist during my academic years, and I focused on an esoteric area called computational visual neuroscience. Basically, that just means that I used a lot of math, statistics, and techniques from physics to model, study and ultimately understand how the brain processes visual information. Coming from this background, I see an obvious connection between a community and the brain: they are both complex networked dynamical systems.

 

  1. The brain is made up of approximately 100 billion neurons talking to each other through a language of their own (action potentials, which are impulses much like the Morse code).
  2. Each neuron also networks with other neurons and forms connections that create local cliques of friends and buddies.
  3. The interactivity between the neurons is what makes the brain (viewed as a community of neurons) work. Without this interactivity the brain will wither and die of atrophy.

 

Although there are many more interesting analogies between the brain and a community, now that you see the connection, it is time for the surprise. To my astonishment, Lithium actually has a huge data set spanning the 10 years of its SaaS business operation. This is compounded by the fact that Lithium keeps about 240 different metrics that monitor every moving part of the community, and the metric list is growing as new features are being added. Moreover, there are copious non-metric data. These include moderator log files, notes from customer engagement, and annotations of PR or any event related to the customer. To my surprise, it turned out that these non-metric data accumulated over the years through active community management, moderation and customer engagement are most valuable and informative for the development of the community health index.

 

In later posts I'll describe my journey through this large and complex data set. But today I'd like to hear from you - what do you most want to know about the Community Health Index? What next steps would you like to see?

 

 

Photo by jepoirrier

 

Updated to fix the CHI symbol (Χ) display.

Message Edited by ScottD on 03-05-2009 11:46 AM

Guest Post: The Community Health Index for Online Communities

by Community Manager on 02-25-2009 07:18 AM - last edited on 02-25-2009 07:18 AM

NeilB.jpg

Something special today: a guest post from Lithium's own Neil Beam, Director of Enterprise Programs in our Client Services group. Neil has recently been spending all his time immersed in metrics and numbers to help us describe what makes your community tick (and more importantly, how to keep it going strong). Today he's here to share some of the thought that went into developing Lithium's newly announced analytics offering:

 

We are releasing today our new Lithium Insights suite, and Scott invited me to talk about it in more detail here on his blog.

We have amazing projects in the wings but today the first thing I want to show you is the Community Health Index.

The Index is the first step towards an industry standard, focused on what community practitioners should measure, report and be held accountable to in their daily practice. It does two things: 1) serves as an absolute measure of communities, so you can stand them up side-by-side and say “okay, now I know how to compare these communities”, and 2) provides actionable measures, specific and relative to the individual community, that tell practitioners what to focus on first and what to do next. The 6 health factors do this: three are predictive and three are diagnostic. Note that we picked these 6 factors (Liveliness, Interaction, Responsiveness, Members, Content, Traffic) because they are a universal denominator common to most community platforms; even Twitter could fit this paradigm.

Case in point, when I was the project owner of a community in my past position I quickly discovered that simple metrics (page views, posts, registrations) only gave my management team a partial snapshot and single dimension of a very complex and dynamic system. It never felt good. How did I compare for my executives the other 5 communities on the Lithium platform at this company which were built for different products and completely different audiences and launched at different times? The Community Health Index addresses this.

So what does the Community Health Index look like?

 

CHI Acme sample.jpg

There is a lot going on here (it is the front page of a longer analysis), but this report shows a Community Health Index of 672 on a 0 to 1000 scale, which represents a range of ‘healthiness’. We intentionally didn't scale the Index into the negative because this would immediately imply an unhealthy/healthy dichotomy, which isn't the case. Instead, the healthier a community is, the more likely it will accomplish the goals of the members and the company. Obviously a community with a Community Health Index around 100 or 200 is not accomplishing as much for the members, guests and the company as a community with a score of 700, 800 or even 900. You can always improve your health, and what is really important is that the 6 health factors tell you exactly what to focus on first. Here we point out Responsiveness and Interaction as target areas in the Compass. This customer got specific recommendations based on those health factors.

Finally, you'll notice that the methodology and formulation of the 6 health factors and the Community Health Index are fully disclosed in the white paper. We did this so practitioners could land on a common dialogue.

So, how did we do? We welcome the feedback because the point is to continue to improve methods for helping our customers derive value from their communities, and help the industry grow as a whole.

 

About the Author
  • Scott is a Client Services Engagement Manager at Lithium and the Community Manager for the Lithosphere community. In this role he helps enterprise organizations using social media to locate and engage their brand advocates and influencers to address real business challenges.