Just how big is the bloggernacle?
This question occurred to me after reading some discussion of what constitutes a “big” blog during voting for the 2006 Niblets. That discussion focused on traffic and type of content, but it inspired me to think that there’s lots of easily accessible information about size in the bloggernacle: it’s in the posts and comments. So I’ve gathered some information on posts and comments from several blogs, and I plan to use that data in a short series of posts to answer some numerical questions about the bloggernacle.
Unfortunately, as the bloggernacle is big and growing, and I have limited energy, I’ve had to limit my data gathering to 11 group blogs. These include the big three–By Common Consent, Feminist Mormon Housewives, and Times and Seasons–and eight others: Exponent II, Faith Promoting Rumor, Millennial Star, Mormon Mentality, Mormon Mommy Wars, New Cool Thang, Nine Moons, and of course Zelophehad’s Daughters. I’m sure my sample is biased by what I like to read, but given that it includes the most trafficked blogs in the bloggernacle, I think it’s probably not too unrepresentative.
So, how big is the bloggernacle sample? In 2007, the 11 blogs in my sample had a total of 3,525 posts, with a total of 1,807,946 words¹. Just to give you an idea of how long that is, the Book of Mormon (Project Gutenberg edition) has 293,472 words, so the total length of all the posts was over six Books of Mormon. Or, following the old convention of 250 words per manuscript page, this would make for about 7,232 pages of text. The mean post length was 513 words, and the median² was 399 words.
These posts drew a total of 126,796 comments³. The mean number of comments per post was 36; the median was 22. The total length of the comments was 12,840,199 words (over 43 Books of Mormon or 51,361 manuscript pages). The mean comment length was 101 words, and the median was 65 words.
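For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the conversions above from the totals quoted in the post. The function and variable names are made up for illustration; only the numbers come from the text.

```python
# Totals quoted in the post; names below are illustrative, not from the original analysis.
BOOK_OF_MORMON_WORDS = 293_472  # Project Gutenberg edition
WORDS_PER_PAGE = 250            # old manuscript-page convention


def summarize(total_words, item_count):
    """Convert a word total into the comparison units used in the post."""
    return {
        "books_of_mormon": total_words / BOOK_OF_MORMON_WORDS,
        "manuscript_pages": round(total_words / WORDS_PER_PAGE),
        "mean_words": round(total_words / item_count),
    }


posts = summarize(1_807_946, 3_525)
comments = summarize(12_840_199, 126_796)
comments_per_post = round(126_796 / 3_525)

print(posts)
print(comments)
print(comments_per_post)
```

The medians can't be recovered this way, since they depend on the full list of post and comment lengths rather than on the totals.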
Next question: Does the amount of activity vary by day of the week? Here are counts of posts from blogs in the sample on each day of the week:
It looks like weekdays are much busier than weekends, lending credence to Eve’s theory that blogging is a popular procrastination strategy. The plots for comments and for total words look similar.
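A tally like the one behind that plot could be sketched as follows. This is only an illustration: the dates are invented, and the function name `posts_per_weekday` is hypothetical rather than anything used in the original analysis.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def posts_per_weekday(post_dates):
    """Count how many posts fall on each day of the week."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(d.weekday() for d in post_dates)
    return {name: counts.get(i, 0) for i, name in enumerate(DAY_NAMES)}


# Invented sample dates, not data from the blogs in the sample.
sample_dates = [
    date(2007, 1, 1),  # Monday
    date(2007, 1, 2),  # Tuesday
    date(2007, 1, 2),  # Tuesday
    date(2007, 1, 6),  # Saturday
    date(2007, 1, 8),  # Monday
]
weekday_counts = posts_per_weekday(sample_dates)
print(weekday_counts)
```

The same function would work for comments, given a list of comment dates instead of post dates.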
Okay, how about this one: Does blog activity vary by day of the year? When I plotted activity (by posts, comments, or words) by day of the year, the figure was so dominated by the within-week pattern in the plot above that it was very difficult to see any larger trend across weeks or months. To get around this, I collapsed each week to a single value, the total number of words per day in all posts and comments during that week. This plot is a little more comprehensible:
Clearly there are distinct highs and lows. I’ve labeled a few of the highest high points with the single topic that seemed to be driving most of the discussion (although my labels are based on nothing more than looking at post titles for each week). The single busiest week of the year came in the aftermath of Relief Society General President Julie Beck’s “Mothers Who Know” talk in October Conference. The quietest week, not surprisingly, was the week of Christmas.
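The weekly collapsing described above could be sketched roughly like this. It assumes the input is a mapping from each date to that day's total word count (posts plus comments), with quiet days included as zeros, and it assumes weeks are ISO weeks starting on Monday. The post doesn't say how weeks were delimited, so treat both as guesses. The name `weekly_words_per_day` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_words_per_day(daily_words):
    """Collapse daily word totals to one value per week: mean words per day.

    daily_words maps a date to the total words written that day.
    Weeks are ISO weeks (an assumption, not stated in the post).
    """
    totals = defaultdict(int)
    days_seen = defaultdict(int)
    for day, words in daily_words.items():
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[key] += words
        days_seen[key] += 1
    # Divide by the number of days present in the input for that week.
    return {key: totals[key] / days_seen[key] for key in sorted(totals)}


# Invented numbers for illustration only.
sample = {
    date(2007, 1, 1): 700,
    date(2007, 1, 2): 1400,
    date(2007, 1, 8): 900,
}
weekly = weekly_words_per_day(sample)
print(weekly)
```

Averaging within each week removes the weekday/weekend swing, so whatever variation is left reflects differences between weeks.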
One other question that a plot like this might be used to answer is whether the bloggernacle is growing. It seems pretty clear that it’s growing in the creation of new blogs. But what’s not clear is whether established blogs like the ones in the sample are growing. From this plot, it’s difficult to see any trend across time for these blogs. Unfortunately, there could be consistent within-year trends that can’t be seen because there is only one year’s worth of data. So I think answering that question will require more data. The most we can say based on this plot is that if there is growth in established blogs, it’s probably not dramatic.
In Part 2 of ‘Nacle Numbers, I’ll be back with some more interesting questions, those that compare the blogs in the sample to one another. In the meantime, if you enjoy this type of navel-gazing and have any questions you’d like to see answered in a future post, go ahead and ask and I’ll see what I can do to answer.
1. I didn’t actually count words. I just counted characters plus spaces and divided by six, the same strategy used in Wikipedia to compare encyclopedia sizes.
2. The mean of a set of numbers is just the arithmetic average: the sum divided by the number of numbers. The median is the middle number once they’ve all been lined up in order by size. In a case like this, where there are many smaller values (short-to-medium length posts) and fewer larger values (longer posts), the median will be smaller than the mean. This happens because the median is unaffected by the sizes of the numbers away from the middle (it doesn’t care if the largest number is 5000 or 5,000,000) while the mean is affected by the values of all the numbers. The median is often a better summary of the numbers than is the mean in such circumstances.
3. This includes comments made both in 2007 and 2008, but only those made on posts written in 2007. Of course, the number of comments can always continue to increase as long as the threads remain open. My impression is that the vast majority of posts are actively commented on for no more than a week. To be safe, I didn’t gather comment data from any of the posts until they were at least a month old. In answering some questions, I’ve looked at all comments written in 2007, regardless of what year the post they’re attached to was written (although I applied the same one month criterion as before, so I actually only looked at comments written in 2007 on posts written between December 1, 2006 and December 31, 2007). This set of comments, which largely overlaps with the set of comments on posts written in 2007, includes 127,093 comments and 12,885,711 words.
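To make footnotes 1 and 2 concrete, here is a small Python sketch. The first part shows the characters-divided-by-six word estimate; the second shows why the median barely notices an extreme value while the mean does. The post lengths are invented, and `estimate_words` is a hypothetical name, not the tool actually used to gather the data.

```python
from statistics import mean, median


def estimate_words(text, chars_per_word=6):
    """Approximate a word count as characters (spaces included) divided by six."""
    return len(text) / chars_per_word


title = "Just how big is the bloggernacle?"
print(estimate_words(title), len(title.split()))  # estimate vs. actual word count

# Invented post lengths (in words): mostly short, one long outlier.
post_lengths = [200, 300, 400, 500, 5000]
# Same list with the largest value made absurdly large.
stretched = [200, 300, 400, 500, 5_000_000]

print(mean(post_lengths), median(post_lengths))
print(mean(stretched), median(stretched))
```

In both lists the median is 400, while the mean jumps from 1,280 to over a million once the largest value is inflated, which is the behavior footnote 2 describes.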