The Economics of 90-9-1: The Lorenz Curve

by Lithium Guru on 03-24-2010 12:06 AM

Dr. Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and online communities.

 

He's a regular blogger on the Lithosphere and previously wrote in the Analytic Science blog.

 

You can follow him on Twitter at mich8elwu.

 


 

In last week's post, I presented summary data on what is often referred to as the 90-9-1 rule. Although participation inequality is not news in itself, we have found that the extent of this inequality is not uniform across communities. Honestly, the only rule that 90-9-1 represents is a rule of thumb--definitely not something that should be taken literally.

 

90-9-1 is really just a packaged way of saying:

  • The majority of the users in a community are lurkers
  • Among the contributors, there is a small fraction of hyper-contributors that produce disproportionately large amounts of content

But exactly how big is that "majority", what percentage of the users are hyper-contributors, and precisely how much content do they actually generate? Answers to these questions vary greatly from community to community.

 

Today, most if not all community managers probably don't need precise numbers to keep their communities running. But as social media enters the mainstream, there is a growing need to rigorously quantify social ROI. No one likes an ROI estimate with a large margin of error. As businesses incorporate and become integrated with social CRM, there is no room to be sloppy about social metrics. So if the 90-9-1 rule is only a rule of thumb, how do we accurately quantify the participation inequality in a community? Is there something more accurate than the 90-9-1 rule?

 

Learning from the Economists

In my opinion, the answer comes from the field of economics. Economists have been dealing with skewed and unequal quantities for centuries, because many micro- and macroeconomic quantities (such as income distribution, wealth distribution, GDP, etc.) are by their nature highly unequal across individuals and countries. Since this is exactly the same phenomenon we've observed in community participation data, we can borrow some of the well-tested analytical tools from economics for our analysis.

 

The Lorenz curve and the associated Gini coefficient are both excellent tools for quantifying the participation inequality in communities. I will illustrate how this works with data from our own Lithosphere community.

 

The Lorenz Curve

[Figure: Lithosphere Lorenz curve 1]

To quantify the inequality of posting activity in Lithosphere, I've plotted the Lorenz curve for Lithosphere's post count data up to Feb 28, 2010. (Note: You can do the same for any activity metric; I simply use posts as an example.) To keep things simple, I've included only the participants and excluded the lurkers in this analysis. If you are interested in seeing the full data, please ask me and I'd be more than happy to show you data with lurkers included.

 

The x-axis shows the cumulative percentage of users, sorted from least active to most active, and the y-axis shows the cumulative percentage of content they created. So the Lorenz curve tells you precisely how much content is contributed by what percentage of the community population.
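For readers who want to build this curve from their own data, here is a minimal Python sketch of the construction described above. The function name `lorenz_curve` and the sample post counts are my own illustrative assumptions; they are not Lithosphere data and not necessarily how the plots in this post were produced.

```python
def lorenz_curve(post_counts):
    """Return (x, y): cumulative share of users vs. cumulative share of posts.

    Users are sorted from least active to most active, so the last
    stretch of the x-axis holds the top contributors.
    """
    counts = sorted(post_counts)   # least active first
    total = sum(counts)
    n = len(counts)
    x, y = [0.0], [0.0]            # the curve always starts at (0, 0)
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        x.append(i / n)            # share of users covered so far
        y.append(running / total)  # share of content they produced
    return x, y


# Hypothetical post counts for ten participants (illustration only).
sample = [1, 1, 1, 2, 2, 3, 5, 8, 20, 57]
xs, ys = lorenz_curve(sample)
for px, py in zip(xs, ys):
    print(f"{px:.0%} of users -> {py:.0%} of posts")
```

Plotting `xs` against `ys` with any charting tool gives the curve; the more it sags below the diagonal, the more unequal the participation.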

 

 

[Figure: Lithosphere Lorenz curve 2]

Let's answer some questions that you might have about Lithosphere with this Lorenz curve.

 

  • How much content does the top 10% of the population produce? Answer: 71.5%.

See the yellow area in the figure: Start on the x-axis by identifying the top 10%, which are the users between 90% and 100%. Follow the dotted line up to the Lorenz curve, then to the left until you hit the y-axis. The span of the y-axis from that point up to 100% is the share of content contributed by this 10% of the population.

 

  • How much content does the bottom 50% of the population produce? Answer: only 4.57%. (See the cyan area in the figure to the left.)

Therefore, if we defined hyper-contributors to be the top 10% of the participants, then we know they've contributed precisely 71.5% of content in Lithosphere.

 

Notice that you can be very specific about which part of the population you want to learn about. From this Lorenz curve, we can see that the top 30% of the population produced 88.9% of the content. But the middle 30% of the population (from 35% to 65%) contributed only 6.26% of the total content.
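The readings above all come down to one operation: the difference between two points on the Lorenz curve. Here is a rough Python sketch of that idea. The helper names `cumulative_share` and `band_share` and the sample data are hypothetical, and I am assuming straight-line interpolation between users, which is what reading a plotted curve amounts to.

```python
def cumulative_share(post_counts, p):
    """Lorenz curve value at p: share of content produced by the
    least active fraction p (0..1) of users, linearly interpolated."""
    counts = sorted(post_counts)      # least active first
    n = len(counts)
    total = sum(counts)
    position = p * n                  # number of users covered, may be fractional
    whole = int(position)             # users counted in full
    partial = position - whole        # fraction of the next user
    covered = sum(counts[:whole])
    if whole < n:
        covered += partial * counts[whole]
    return covered / total


def band_share(post_counts, lo, hi):
    """Share of content from users between percentiles lo and hi (0..1)."""
    return cumulative_share(post_counts, hi) - cumulative_share(post_counts, lo)


# Hypothetical post counts for ten participants (illustration only).
sample = [1, 1, 1, 2, 2, 3, 5, 8, 20, 57]
top_10 = band_share(sample, 0.90, 1.00)      # about 0.57
bottom_50 = band_share(sample, 0.00, 0.50)   # about 0.07
middle_30 = band_share(sample, 0.35, 0.65)   # about 0.085
print(f"top 10%: {top_10:.1%}, bottom 50%: {bottom_50:.1%}, middle 30%: {middle_30:.1%}")
```

The same two-point subtraction works for any band, so there is no need for a separate function per question.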

 

[Figures: Lithosphere Lorenz curves 3 and 4]

 

And if you're interested in the specific population whose productivity ranks between the 60th and 90th percentiles, then this particular 30% of the population contributed 21.2% of the content on Lithosphere. Moreover, we can turn the problem around, as we discussed last time, and ask what fraction of the top participants contributed 50% of the community content. By reading the graph from the y-axis first, we can easily get the answer, which turns out to be the top 3.36% of participants.
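Reading the graph from the y-axis first can also be done in code. The sketch below walks down from the most active user until the target share of content is reached. The function name `top_fraction_for_share` and the sample data are illustrative assumptions, and the interpolation within a single user mirrors reading a point off a continuous curve rather than counting whole users.

```python
def top_fraction_for_share(post_counts, target):
    """Smallest fraction of the most active users whose posts add up
    to `target` (0..1) of all content, interpolating within a user."""
    counts = sorted(post_counts, reverse=True)   # most active first
    total = sum(counts)
    needed = target * total
    running = 0.0
    for i, c in enumerate(counts):
        if running + c >= needed:
            # Only part of this user's posts is needed to hit the target.
            frac_of_user = (needed - running) / c if c else 0.0
            return (i + frac_of_user) / len(counts)
        running += c
    return 1.0   # target not reached earlier (e.g. rounding); everyone is needed


# Hypothetical post counts for ten participants (illustration only).
sample = [1, 1, 1, 2, 2, 3, 5, 8, 20, 57]
half = top_fraction_for_share(sample, 0.50)
print(f"Top {half:.2%} of participants produce 50% of the posts")
```

If you prefer whole people to fractions of a person, round the result up to the next user.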

 

[Figures: Lithosphere Lorenz curves 5 and 6]

 

So, with the Lorenz curve, we can ask very specific questions and get very precise answers about the participation level in a community. There is no need to use vague and ambiguous words anymore (e.g., "there is a small fraction of hyper-contributors that produces a disproportionately large amount of content"). We will know exactly what that "small fraction" is and precisely how much a "disproportionately large amount" is.

 

Next time, I will show you how to make use of the Gini coefficient to quantify the degree of participation inequality in a community. In the meantime, please feel free to ask questions or leave comments on my analysis.

 

 
