
Guest Post: Formulate, Predict, and Reformulate

by Community Manager on 04-08-2009 10:07 AM - last edited on 04-08-2009 10:07 AM

Michael Wu returns for the last installment in his series describing how the new Community Health Index was developed:

 

We've come a long way. This is the last blog in the series that describes the development of the community health index. Earlier posts in this series are listed here:

 

  1. From the Brain to Community Analytics
  2. Criteria for Creating the Community Health Index
  3. Crunching Numbers for the Community Health Index
  4. Interpreting the Statistics for CHI

 

Last time, we talked about the selection of predictive variables and the tedious process of nonlinear analysis. But the hard work is not over yet. Once we have the variables and the nonlinearities, we must combine them into a single function which, when evaluated, gives us the proper health level of a community. This process culminated in a health function, which is a product of six health factors that are important in determining the health of online communities. These health factors are referred to as:

 

  • Members: the number of registered members over time,
  • Content: a function of posts weighted by member and guest viewership,
  • Traffic: the number of page views over time taking into account search crawlers,
  • Liveliness: a function of the number of posts per board over time, taking into account user expectations for engagement,
  • Interaction: the number of unique participants weighted by the amount of conversation between them within a thread, and
  • Responsiveness: a measure of the time to respond between successive message posts within a thread, taking into account expected response time.

 

Each of these health factors usually involves one or more metrics with some nonlinear function applied to them.

 

The health function is smoothed to give the health trend, much like smoothing the daily stock price to give a better indication of the underlying direction of movement. The health function is then normalized to remove some of the bias introduced by the size of the community. I did not remove the size bias completely because human experts also have such a bias and tend to rate larger communities as healthier. The normalization process takes into account the health history of the community, weighting recent health more heavily, as well as the volatility of health, so that consistent progression of the health trend results in a greater value of CHI. By design, the community health index is robust to outliers yet still sensitive: if there is a consistent signal for a change in health, it will be reflected in the weekly value of CHI.
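To make the shape of this computation concrete, here is a minimal Python sketch. It is only an illustration and not the published CHI formula: the factor values, the `health` and `smooth` helper names, and the choice of exponential smoothing with weight `alpha` are all assumptions, picked to show a product of six factors followed by a smoothing step that weights recent weeks more heavily.

```python
from math import prod

# The six health factors named above; the numeric values fed in are hypothetical.
FACTORS = ("members", "content", "traffic",
           "liveliness", "interaction", "responsiveness")

def health(factors):
    """Raw health for one week: the product of the six factor values."""
    return prod(factors[name] for name in FACTORS)

def smooth(series, alpha=0.3):
    """Exponentially weighted moving average.

    Each new point is blended with the previous trend value, so recent
    weeks carry more weight than older ones (an assumed smoothing scheme).
    """
    trend = []
    for value in series:
        previous = trend[-1] if trend else value
        trend.append(alpha * value + (1 - alpha) * previous)
    return trend
```

Because the factors are multiplied, a factor near zero pulls the whole health value down no matter how strong the others are, which is one plausible reason to prefer a product over a sum.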

 

The final step of any mathematical modeling is model validation. Basically, this means that we must test the model on a data set that we did not use to build the model, and make sure that the model still performs as expected. Lithium now hosts roughly 170 communities, and I developed the community health index using data derived from 16 communities of varying size, age, and purpose, where we have plenty of non-metric data. Then I tested the resulting model on 4 other communities. As with any scientific discovery process, this went through several iterations before the model began to perform well during all the stages of the modeling process. Once the model-predicted health started matching the health assessed by human experts, I computed CHI for all our communities and gathered more data to refine the initial formulation. The computation published in our white paper is actually the result of three iterations of major reformulation; each introduced just a few minor but important tweaks that increase the prediction accuracy of community health for a greater variety of communities. And we are already working on future refinements as we continue to learn from the data we collect.
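The hold-out idea can be sketched in a few lines of Python. Everything here is hypothetical: the `communities` layout (a name mapped to weekly metrics and expert ratings), the `model` callable, and the use of mean absolute error as the score are assumptions for illustration, not the procedure used for CHI.

```python
def mean_absolute_error(predicted, observed):
    """Average size of the gap between predictions and expert ratings."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def holdout_error(model, communities, held_out):
    """Score `model` only on communities that were not used to build it.

    communities: dict mapping a community name to a pair
                 (weekly_metrics, expert_ratings), both hypothetical.
    held_out:    names of the communities reserved for testing.
    """
    errors = []
    for name in held_out:
        weekly_metrics, expert_ratings = communities[name]
        predictions = [model(week) for week in weekly_metrics]
        errors.append(mean_absolute_error(predictions, expert_ratings))
    return sum(errors) / len(errors)
```

If the error on the held-out communities is much larger than on the communities used for fitting, the model has probably learned quirks of the fitting data rather than something general.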

 

Hopefully this series of blogs has given you a peek at the development process behind the community health index and the effort that went into it. If you have any questions I'd be more than happy to address them in the comments, or feel free to ask me on Twitter at mich8elwu.

 

 

Photo by LaertesCTB

Love and Fear Online

by Community Manager on 03-27-2009 12:15 PM - last edited on 03-27-2009 01:54 PM

We've had a lot of math on my blog recently, so I thought I'd take a break and talk about some of the more touchy-feely aspects of community today. :smileywink:

 

Do you love your customers? What makes you think they don't love you?


Ever since the Cluetrain, a lot has been said about the new power people have to be heard with social media. However, it seems that most companies believe customers will use this power to do them evil rather than good. After all, the #1 concern I hear from customers considering building a community is some version of "how do we keep people from saying bad things about us on our site"?


I find it equally odd that the common retort I hear is a riff on "Well, they're going to say bad things about you anyway, so why not let them say it where you can see them?" 


Why are we so convinced that our customers hate us? Is this what all those customer surveys, Net Promoter scores and market research have told us over the years? Are there hundreds or thousands of people who have been just chomping at the bit for us to open our doors so they can yell at us? Then why in the world is anybody actually buying our products, much less buying them again and again?


I think this crisis of organizational self-confidence needs a quick dose of Stuart Smalley: "I'm good enough, I'm smart enough, and doggone it, people like me!"

I'm treating the issue kind of lightly here, but there does seem to be a lot of irrationality in how companies perceive online conversations with customers. Perhaps it stems from the venerable old adages that 'no news is good news' and 'you only hear from people when things go wrong'. We deal with so many fires and issues in our daily lives, isolated from customers inside our corporate brand, that we think that is all there is. But unless you are a monopoly or a fascist state, the reason you are still in business is that customers generally think they get good value from your products and services. When companies actually do invite their customers to tell them what they think, they are often pleasantly surprised by the quality of the responses they receive.


I'm not saying people won't complain about your products and services in your blogs or forums, or that online attacks on your brand don't happen. But I am saying that they happen a lot less than companies expect, and careful preparation in advance will both prevent the worst and enable you to respond quickly to address issues before they become crises (for a quick primer on preparing for negativity in public, see this excerpt from Andy Sernovitz’ book, Word of Mouth Marketing: How Smart Companies Get People Talking).

 

Think for a moment of the brands you use in your own life. What would you like to say to them if they asked? What would you tell your peers about them?

Now think of the brands you despise. Is a part of your anger their unwillingness to listen respectfully to your needs? If they actually did pay attention, would that soften your opinion?

 

 

Photo by aussiegall

Message Edited by ScottD on 03-27-2009 01:54 PM


Welcome back once more to Michael Wu, here for the penultimate installment in his series describing how the new Community Health Index was developed:

 

This is my fourth blog in the series that describes the development of the community health index. Previous blog posts can be found here:

 

  1. From the Brain to Community Analytics
  2. Criteria for Creating the Community Health Index
  3. Crunching Numbers for the Community Health Index

 

Last time, I crunched some numbers and talked about some of the mathematical challenges that I have overcome. Now, it is time to interpret the results.

 

Running the regression analysis is the easy part. Although it is fairly technical to set up the nonlinear regression equation, it is mechanical in the sense that anyone with a background in math and statistics can do it. The remaining part of the analysis involves interpreting the results to derive meaning and insights. This is often the most challenging aspect of any statistical analysis because it is more an art than a science; yet it must have all the rigor, objectivity and accuracy of science. For example, I would have to decide which predictor variable to remove among those with similar predictive power. When a set of variables is found not to be predictive, is it a failure of the model to harness their predictive power, or is it the case that these variables are truly independent of the response (in this case, health)? Interpretability of the final model becomes important, and looking at numbers alone is no longer sufficient. In statistics this process is called variable selection.

 

After eliminating the predictor variables that are not consistently predictive of health, we have only answered the question of which variables are predictive. But we still don't know how these variables predict health. For example, suppose we know that post count is predictive of health; will the health level increase by 10% if the post count is increased by 10%? Or will the health level increase by 30% if we observe a 10% increase in post count? Or perhaps the health level depends more strongly on post count initially, but becomes less dependent as the post count increases. To answer these questions, we must analyze the nonlinear relationship between health and the variables that we decide to keep. Not to complicate things, but it is often necessary to repeat the process of variable selection and nonlinear analysis for different subsets of variables and different nonlinearities, and to perform them in different orders.
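The three possibilities in the post count example can be illustrated with a short Python sketch. The functional forms below (a straight line, a power law with exponent 2.75, and a logarithm) are hypothetical stand-ins chosen to reproduce the three behaviors described; they are not the relationships found in the actual analysis.

```python
import math

def relative_change(f, x, increase=0.10):
    """Fractional change in f when x grows by `increase` (10% by default)."""
    return f(x * (1 + increase)) / f(x) - 1

def linear(posts):
    # Health proportional to post count: +10% posts gives +10% health.
    return 2.0 * posts

def steep(posts):
    # Power law with an assumed exponent: +10% posts gives roughly +30% health.
    return posts ** 2.75

def saturating(posts):
    # Logarithmic growth: strong dependence at first, weaker as posts grow.
    return math.log1p(posts)

for name, f in (("linear", linear), ("steep", steep), ("saturating", saturating)):
    print(name, [round(relative_change(f, x), 3) for x in (10, 100, 1000)])
```

For the first two forms the percentage response is the same at every post count, while for the saturating form it shrinks as the post count grows, which is the third scenario above.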

 

We are almost done! Next week we'll bring this all together into the new Community Health Index! If you have any questions I'd be more than happy to address them in the comments, or feel free to ask me on Twitter at mich8elwu.

 

 

Photo by Thomas Claveirole


Guest Post: Crunching Numbers for the Community Health Index

by Community Manager on 03-19-2009 12:18 PM - last edited on 03-20-2009 02:12 PM

Welcome back Michael Wu! Here is his third installment in a series describing how the new Community Health Index was developed:

 

To begin the analysis of the previously collected data set, I gathered the non-metric data from various sources by talking to the moderators, the customer success managers (CSMs), and our best practice advocates, which included Joe Cothrel and his team. As I mentioned earlier, these data are extremely important because they serve as the ground truth for our prediction problem. It is through the eyes of the moderators and the CSMs, who monitor and interact with the community every day, that we know how healthy a community is. Tabulating these non-metric data gives us a time series of the health level for each community. Since all the recorded metrics are already in the form of time series, we can now turn to statistics and begin the number crunching.

 

The idea is very simple. We know the health level of the community from the non-metric data; now we simply want to know which of the 20 commonly available metrics can best predict community health. This can be achieved by running a sequence of linear and nonlinear regression analyses using the 20 metrics as the predictor variables and the tabulated non-metric data as the response variable.

 

This, however, is not trivial. Some of the issues that must be dealt with include the correlation among the predictor variables, the nonlinearity between the predictors and the response, and the nonstationarity of the time series data.

 

That's quite a mouthful, so here is a bit of explanation about what I mean by that:

 

The problem of correlations among the predictors is known as multicollinearity. If some of the predictor variables are highly correlated, it is very difficult to determine which predictor actually drives the response. Computationally, this shows up as large regression coefficients that may jump seemingly at random between the correlated predictors. These jumps are highly sensitive to the data, making it difficult to determine which of the correlated predictors is most predictive. This is a very prominent problem in community data, as many of the metrics are highly correlated. For example, a community that has a lot of traffic tends to gain more members and achieve a higher level of activity. I have used partial least squares and boosting to try to overcome this problem.
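A simple way to spot candidate multicollinearity before running any regression is to compute pairwise correlations between the metrics. The Python sketch below does that with a hand-written Pearson correlation. It is a diagnostic only, not the partial least squares or boosting approach mentioned above, and the metric names, sample values, and the 0.9 threshold are all assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlated_pairs(metrics, threshold=0.9):
    """Return metric pairs whose absolute correlation meets the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(metrics.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical weekly values for three metrics.
metrics = {
    "traffic": [100, 200, 300, 400],
    "members": [10, 21, 29, 41],
    "post_length": [5, 1, 4, 2],
}
print(correlated_pairs(metrics))
```

With these made-up numbers only the traffic and members pair is flagged, mirroring the example in the text where traffic and membership move together.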

 

Nonlinearity means that the predictors and the response may not be related in a linear fashion. That means a fixed change in a predictor doesn't always lead to the same change in the response. The response may also depend on the history of the predictor as well as on interactions with other predictors. There is no out-of-the-box solution for nonlinearity. I just have to try some nonlinearity, plot the data, look at them, reformulate the model, and see which one fits and predicts best.

 

Finally, nonstationarity means that the behavior of the system, in this case the community, depends on absolute time. This makes prediction of any time series data very difficult. In layman's terms, it means that any statistical pattern that we have learned may change from one time to another (this is what dependence on absolute time means). In other words, knowing the history does not guarantee that we can predict the future. For example, if we want to accurately predict the stock market price, any pattern we learn from the history had better continue in the future. If there is a trend (or seasonality) in the history, the exact same trend (or seasonality) should persist in order for us to predict the future. If the trend changes in the future, then following the historical trend will lead to a wrong prediction. This is a very prevalent problem in communities, because communities are constantly changing due to management decisions, product launches, marketing efforts, etc. There is also no way to predict a completely nonstationary system, as seen by the fact that no one can predict the stock market. We can only make some assumption about how nonstationary our system is, proceed, and hope for the best. To deal with this problem, researchers typically assume one of several weaker forms of stationarity, and I have assumed wide-sense stationarity in the analysis of our community data.
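One informal way to see whether a metric's statistics drift over time is to compare its mean and variance across consecutive windows. The Python sketch below does exactly that. The `window_stats` helper and the window length are hypothetical, and this is a quick eyeball check rather than a formal statistical test.

```python
from statistics import mean, pvariance

def window_stats(series, window):
    """Mean and population variance of each consecutive, non-overlapping window.

    If these values change a lot from window to window, the series is
    unlikely to be stationary over the period examined.
    """
    return [
        (mean(series[i:i + window]), pvariance(series[i:i + window]))
        for i in range(0, len(series) - window + 1, window)
    ]

# Hypothetical weekly post counts: stable at first, then a jump
# (for example after a product launch).
posts = [10, 12, 11, 9, 30, 33, 29, 32]
for window_mean, window_var in window_stats(posts, window=4):
    print(window_mean, window_var)
```

Here the mean of the second window is roughly three times that of the first, which is the kind of shift that makes patterns learned from the early history unreliable for later weeks.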

 

That is a lot to digest! If you have any questions I'd be more than happy to address them in the comments, or feel free to ask me on Twitter at mich8elwu.

 

Next time: Interpreting the results!

 

 

Photo by lrargerich

 

Note: edited to correct a typo I added to Michael's post by mistake.

Message Edited by ScottD on 03-20-2009 02:12 PM


Guest Post: Criteria for Creating the Community Health Index

by Community Manager on 03-10-2009 11:15 PM - last edited on 03-10-2009 11:15 PM

Michael Wu joins us again for the second installment describing how the new Community Health Index was developed:

 

I wrote previously about how I came to start the development of the Community Health Index (CHI), through my background in the science of the brain and through Lithium's extensive data set of online communities. Picking up the task, I will start by defining what it means when we talk about community health.

 

The performance of any enterprise community has two dimensions:

 

  1. meeting the needs of members (customers), and
  2. meeting the needs of the business (enterprise).

 

Community health addresses the first dimension: it measures how well the community meets the needs of its members. It is very important, because without customer satisfaction, there is no business success.

 

With this understanding of community health, I set two basic criteria to narrow down the data we must plow through. Otherwise, the most complete picture of community health would be the sum total of all the data about the community. First, because it is our objective to make the community health index universal, we must use basic data that every community has. This eliminated many of the metrics that only Lithium keeps, bringing the number down to about 20 (I actually analyzed more than 20, but only about 20 are universally available). Among these are the usual metrics plus some less common ones such as percent of unanswered threads, average thread depth, average number of unique participants in a thread, average post length, etc. Although these metrics might not be recorded explicitly by every community platform, they can be easily computed by aggregating and summarizing the records of all the messages and user data that every community must have.
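As a rough illustration of how such metrics can be derived from raw message records, here is a minimal Python sketch. The record layout (`thread_id`, `author`, `body` tuples) is hypothetical, and so are the definitions: an "unanswered" thread is taken to be one with a single message, thread depth is taken to be the number of messages in a thread, and post length is counted in characters. Real platforms may define these differently.

```python
from collections import defaultdict

def thread_metrics(messages):
    """Summarize (thread_id, author, body) records into four thread metrics."""
    threads = defaultdict(list)
    for thread_id, author, body in messages:
        threads[thread_id].append((author, body))

    thread_count = len(threads)
    total_posts = sum(len(posts) for posts in threads.values())
    unanswered = sum(1 for posts in threads.values() if len(posts) == 1)
    participants = sum(len({author for author, _ in posts})
                       for posts in threads.values())
    characters = sum(len(body) for posts in threads.values()
                     for _, body in posts)

    return {
        "percent_unanswered": 100.0 * unanswered / thread_count,
        "avg_thread_depth": total_posts / thread_count,
        "avg_unique_participants": participants / thread_count,
        "avg_post_length": characters / total_posts,
    }

# Hypothetical message log with two threads.
messages = [
    (1, "ann", "help"),
    (1, "bob", "sure!"),
    (1, "ann", "thanks"),
    (2, "cat", "hello?"),
]
print(thread_metrics(messages))
```

Nothing here needs platform-specific counters: a plain log of who posted what in which thread is enough, which is the point of restricting the index to universally available data.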

 

After establishing the initial data set, the second criterion we applied is known as Occam's razor. The goal is to come up with a minimum set of data that gives the greatest predictive power. This is a challenging problem in statistics, known as the bias-variance tradeoff. In plain English, it means that there is a tradeoff between the complexity of the model and the predictive power of the model. Although complex models that use many variables will always have greater explanatory power for the available data, their predictive power for unseen future data degrades. On the other hand, simpler models with few variables may not explain the current data as well, but they are more predictive of future trends. Why is that? That is just the nature of uncertainty and how it works, much like why gravity always attracts.

 

Next time we'll start the journey through the Lithium community data set. And I'll turn the number crunching crank to identify areas with the greatest predictive power!

 

For updates and discussion between Michael's posts, leave your comments here or you can follow Michael on Twitter at mich8elwu.

 

 

Photo by xmatt

Why Does 90-9-1 Matter Anyway?

by Community Manager on 03-07-2009 01:33 PM - last edited on 03-07-2009 01:33 PM

If you've looked into online communities in any way, chances are you've heard of 90-9-1, also called the 1% Rule of Participation Inequality. What it describes is that about 90% of visitors will rarely contribute content to online communities at all, 9% will post infrequently, and a small proportion of members, the 1%, will tend to post the majority of all content in the community. At Lithium, we call that 1% the Super Users of your community.
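To check how closely a given community follows this pattern, one can compute the share of content produced by the most active 1% and the fraction of visitors who never post. The Python sketch below assumes a hypothetical list of post counts per visitor; the helper names and sample numbers are made up for illustration.

```python
def top_share(post_counts, top_fraction=0.01):
    """Share of all posts written by the most active `top_fraction` of visitors."""
    ranked = sorted(post_counts, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

def lurker_fraction(post_counts):
    """Fraction of visitors who have never posted."""
    return sum(1 for count in post_counts if count == 0) / len(post_counts)

# Hypothetical community of 100 visitors: 1 heavy poster,
# 9 occasional posters, and 90 who only read.
post_counts = [60] + [4] * 9 + [0] * 90
print(top_share(post_counts))        # share of posts from the top 1%
print(lurker_fraction(post_counts))  # fraction who never post
```

In this made-up example the single most active visitor writes 60 of the 96 posts, so the top 1% accounts for a clear majority of the content.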

 

But so what? Why does this matter so much that nearly everyone in the social media world feels compelled to talk about it?

 

Some folks seem to regard this as a challenge or opportunity - if we can just figure out the right magical formula, they say, we can unlock all that potential activity from those 90%ers to make our communities successful. Like turning lead into gold, but perhaps just as hard to do.

 

Others try to use 90-9-1 as a benchmark or average by which to measure their success, and spend a lot of time and effort raising their 'scores' one or two percentage points closer to the mark or above it, even though studies have indicated that this ratio tends to vary by both scale and modality.

 

And finally there are those who seem to take it as an excuse to avoid online communities altogether, and perhaps marginalize them as the fringe that only represents the minority view. This view forgets or purposely ignores the other 90-99% who are paying attention to what's going on.

 

There is still a lot of work to be done to determine why 90-9-1 seems to occur over and over again and whether it can be influenced or altered in any way. But until that day, there are some ways this knowledge can actually help us to build healthier and more effective communities. Here are three things 90-9-1 means to you:

 

  1. If you want to increase quantity of activity in your community, it’s more effective to increase the total population who visit your site than to try to get current members to participate more (not that you shouldn't do both, but the former will typically be more effective than the latter).
  2. If you want to increase the overall quality of activity in your community, it is generally more effective to focus your efforts on those 1% who contribute the most.
  3. If you want to find out what the total reach is of your community, be sure to count the 90% or so who are spectators as well as the 10% who are posting.

 

Are you worrying about 90-9-1? Or are you using it to your advantage?

Guest Post: From the Brain to Community Analytics

by Community Manager on 03-04-2009 10:18 PM - last edited on 03-05-2009 11:46 AM

Another treat for you today: Michael Wu, resident scientist and chief number wrangler behind the Community Health Index, has agreed to drop by and tell the story about how this new open standard was developed. Enjoy part one of this special peek behind the scenes!

 

 

For the past six months, I have been engaged in a massive data analysis project at Lithium to develop an index that measures the health of online communities. I have since referred to this index as the community health index (CHI), which I like to denote with the Greek letter Χ. This project began shortly after I joined Lithium, when I received my Ph.D. in Biophysics at UC Berkeley. Although it was a dramatic transition from academia to industry, I thought that analyzing community data shouldn't be that difficult. After all, data are just numbers, and the math and statistics required to gain insight from them are just equations and symbols, which are universal across all disciplines. I was in for quite a surprise.

 

I was a brain scientist during my academic years, and I focused on an esoteric area called computational visual neuroscience. Basically, that just means that I used a lot of math, statistics, and techniques from physics to model, study and ultimately understand how the brain processes visual information. Coming from this background, I see an obvious connection between a community and the brain: they are both complex networked dynamical systems.

 

  1. The brain is made up of approximately 100 billion neurons talking to each other through a language of their own (action potentials, which are impulses much like the Morse code).
  2. Each neuron also networks with other neurons and forms connections that create local cliques of friends and buddies.
  3. The interactivity between the neurons is what makes the brain (viewed as a community of neurons) work. Without these interactivities the brain will wither and die of atrophy.

 

Although there are many more interesting analogies between the brain and a community, now that you see the connection, it is time for the surprise. To my astonishment, Lithium actually has a huge data set spanning the 10 years of its SaaS business operation. This is compounded by the fact that Lithium keeps about 240 different metrics that monitor every moving part of the community, and the metric list is growing as new features are being added. Moreover, there are copious non-metric data. These include moderator log files, notes from customer engagement, and annotations of PR or any event related to the customer. To my surprise, it turned out that these non-metric data accumulated over the years through active community management, moderation and customer engagement are most valuable and informative for the development of the community health index.

 

In later posts I'll describe my journey through this large and complex data set. But today I'd like to hear from you - what do you most want to know about the Community Health Index? What next steps would you like to see?

 

 

Photo by jepoirrier

 

Updated to fix the CHI symbol (Χ) display.

Message Edited by ScottD on 03-05-2009 11:46 AM


A Look Inside Reputation

by Community Manager on 02-05-2009 05:50 PM

One of the things I love about being a best practices manager for a SaaS company is how we are constantly learning and refining what we know. Even though we've been building and managing successful communities for over 10 years, we are always finding out new things from our customers or finding new ways to look at old topics.

 

A case in point: Reputation. We used to think that every community was unique in how it should set up its ranks, the more creative the better. And for some communities, particularly gaming communities, this strategy works very well. But early on we noticed an interesting trend: the higher members traveled up the rank structure, the more serious they became about the rank they were given. In some cases we saw members putting these achievements on their resumes! So while creativity is still good, the upper ranks had to be meaningful for these members.

 

Stemming from our original ideas about creativity, our product defaults at first began with just 7 ranks to cover the different areas of progression, the idea being that you start with a framework and then fill it in with your own creative names and progressions. But while some of our customers went to town with their rank structure, others were content to rename the existing ranks and leave the structure more or less intact. What we saw was a pretty dramatic difference between these two approaches over time:

 

  1. The average number of ranks across our most successful, mature communities was 31.3
  2. There is a correlation between the number of ranks you have and the ratio of super users in your population (the more ranks we saw, the higher the ratio).

 

To be clear, we haven't shown a causal relationship between these two factors, but it was very interesting to see this pattern across so many communities. And using this information we have been able to redesign our default recommendations for our customers' initial rank structures and assist them in implementing the improvements.

 

Are your ranks effective? How does your community compare with the above?

 

 

Photo by gruntzooki


Light a Fire in Your Community

by Community Manager on 01-10-2009 11:58 PM

Activity is the fuel that powers the engine of your community. And interestingly enough, building activity is a lot like building a fire.

 

For instance, when you start a fire you start with your wood placed close together, concentrated in a stack or pyramid. This makes sure that the fuel is arranged neatly around the initial flame. You supply the first spark, often from a couple of different angles in case one or more goes out, and you use plenty of tinder, lighter fluid or other quick combustibles to make sure the fire takes hold.

 

Your content structure is your log pyramid. You keep it tight together, concentrating your activity where it can best take hold. Then you seed the site with initial content and known users to spark the discussion, often from a couple of different angles in case some of them fizzle out. And finally, you pour on the gas through promotion to make sure there's enough activity to sustain the blaze until the larger conversation takes hold.

 

Of course, this is just a metaphor. Your community is made of people, not stacks of wood. But a successful community manager learns how to coax embers in the hearts of their members, nurturing the first wisps of smoke into a roaring blaze that spreads to others.

 

Is your community smoldering or sputtering? How are your fire-starting skills?

 

 

photo by herval


Fishing in Your Own Backyard

by Community Manager on 12-01-2008 11:00 AM

There is a school of thought in social media which believes that communities cannot be made, they are only discovered. On the other side, some folks believe that you can build it and they will come. The answer probably exists somewhere in the middle.

 

Of course, we at Lithium are in the business of helping companies build a community around their products or services. Leaving aside the argument about whether these are communities that were 'made' or 'discovered', there has been a recurring outcry from some folks in the social media scene against the soundness of this strategy. "Fish where the fish are" is a common refrain to companies considering the build vs. join decision, and I'd be the first to agree this is very sound advice. But I'd argue that this isn't really an argument against building a community site of your own, for two reasons:

 

First, there's nothing to prevent you from building a community site of your own and reaching out to groups in other places; in fact, it should be a key part of your outreach strategy.

 

Second, and more importantly, chances are good that your customers visit your company site today with some frequency for a number of reasons: support, product news and updates, etc. If these folks aren't the core of your new community, who is?

 

Sometimes you don't need to go very far to find the fish: it's quite possible they may be swimming in a lake in your own backyard.

 

 

photo by libookperson

About the Author
  • Scott is a Client Services Engagement Manager at Lithium and the Community Manager for the Lithosphere community. In this role he helps enterprise organizations using social media to locate and engage their brand advocates and influencers to address real business challenges.