December 2007

December 31, 2007

Public and private spaces, and why YouTube comments are so awful

Why do Reddit and YouTube comment areas suck so bad?
Have any of you guys read YouTube comments lately? They are just really awful - just click through to the YouTube site and read them. Here are a couple of examples that I randomly pulled from one of the most popular videos for today:

GORTONclinicalp289 (7 seconds ago)
World's largest sex and swinger personals with over
20,000,000 members looking to hook up with someone just like you!
Enter [_SexDati*ng4Free.com_] to Join for FREE
Just remove * and enjoy

burningtheinternets (52 minutes ago)
This video is being autorefreshed from myspace.
These cheating bitches should burn in hell.
Where do they live?

psychopathick (1 hour ago)
My 'good' videos keep getting deleted here. No biggie. I put them up on another website. If you click on my name, you'll see the link. ;-)

MrDoodyHead (1 hour ago)
chris crocker is ANNOYING. smosh is FUNNY. boh3m3 is DELUSIONAL. iancrossland is A DIRTY HIPPIE. come see who will be what next...

lechampiones (2 hours ago)
i don't know what to say... why did you do that?

okthen (3 hours ago)
that was stupid!

Similarly, there's been a lot of interesting discussion about the decline of Reddit in terms of quality, and how it's been hijacked by biased groups of people. (You know who I'm talking about)

To me, they sound like thousands of people talking past each other. Obviously YouTube has its own specific incarnation of this problem, but think about no-registration internet forums, open chatrooms, global chat within games, and other types of public spaces. It's really all bad.

But let's step back from the massive-scale public areas and look at a mini-version of this.

The dreaded "Reply All"
The issue is very pronounced, even at a small level - let me ask you:

How many people can you put on the CC of a conversation and still expect a reasonable e-mail thread with everyone hitting "REPLY ALL"?

I'd guess 5-7? Once you get very much beyond that, you're introducing people you probably don't know very well, and everything falls apart. Woe unto the office worker who sends a message to the entire company and gets dozens of replies from dozens of semi-random people.

Of course, the effect from e-mail is exaggerated because we're more sensitive to messaging that's push rather than pull. It's less annoying to open an inbox with a whole bunch of these already in there, versus the scenario where a continual stream of irrelevant e-mails is being pushed to you.

The Dunbar number
Perhaps this has to do with the Dunbar number, which governs how well people are able to maintain relationships. Quick refresher on Dunbar number, if it doesn't ring a bell:

Dunbar's number, which is 150, represents a theoretical maximum number of individuals with whom a set of people can maintain a social relationship, the kind of relationship that goes with knowing who each person is and how each person relates socially to every other person.[1] Group sizes larger than this generally require more restricted rules, laws, and enforced policies and regulations to maintain a stable cohesion. Dunbar's number is a significant value in sociology and anthropology.

As an aside, Chris Allen has done some interesting blogging on the Dunbar number in massively multiplayer games. Definitely worth reading.

Essentially, when you move from small private environments where people know each other, or can at least get to know each other over time, and transition to large public spaces, then reputation is drowned out.

All of a sudden, there's zero cost to your non-existent reputation to say whatever you want - and it becomes easy to act like an ass, or flame people who are different, or anything else you want to do. You start running into people who are from a different culture than you, and then arguments ensue, leading to the website LearnToSpell.net getting posted.

So the key issue is that in large, public spaces, you end up with the lowest common denominator of communication. People then begin to drive other folks out, because the public space is a homogenizing force, rather than a diversifying one.

Social network audience convergence
Jeremy Liew describes a different version of this problem, writing about a social networking company in an odd situation:

I recently met the CEO of a company who claim to be one of the most popular social networks in Turkey with several million monthly visitors from Turkey. This happened by accident - the founders are Americans who have no prior connection to Turkey.

This is just one of many examples of how difficult it can be to predict or control the growth of viral social media. Google’s Orkut, is a better known example - a social network started by a Turkish engineer working in the US that now dominates in Brazil and India. Friendster and hi5 fall into this bucket as well. As I’ve noted before, the online advertising market in the US is bigger than that in the rest of the world combined. The senior management of these companies know this, and all would love to see more US traffic, but it is now beyond their control.

I don't know which company he was thinking of, but let me make a hypothesis: The product was designed in such a way that the entire site was a public space, where anyone could browse anyone's profiles.

The end result of that process is a customer lifecycle that looks like this, using a hypothetical user:

  1. Becca logs into the site
  2. Becca browses around the site
  3. "Hmm, these people don't look like they're American"
  4. "What are people writing to each other?"
  5. "OK, this is NOT my crowd"
  6. Becca then churns out

Contrast this with a user who is part of the in-group, who would respond well to the social signals exhibited by people's profiles, and then opt themselves into the site. In this way, the public areas are self-reinforcing, which is good or bad depending on whether the group they attract is the target group you wanted.

Let's look at an approach that works much better.

Where Facebook succeeds at this problem
This public/private problem is an area where Facebook really excels. The following are true statements about Facebook:

  • Facebook has a lot of high school students
  • Facebook has a lot of Canadian, British, and other non-US users
  • Facebook has people who type and speak in different languages

Even though those are all true, and millions of teeny boppers are taking to Facebook, it doesn't affect the space around me, because Facebook creates dynamic, semi-private spaces based on my "Networks." As a result, even if a lot of very different folks are showing up, I only see people I know (or people who are likely to be similar to me based on location/school/etc.). This creates an experience which is less likely to be polluted.

Furthermore, it's also less likely for Facebook to converge to a specific demographic or group, because it's actually quite hard to get the "This is not my crowd" response based on normal usage of the site.

Some questions to ask yourself about public and private spaces
There's a lot of interesting things you can learn about communities that do this right - looking at everything from Craigslist, eBay, Facebook, etc., you see interesting static or dynamic segmentations that break public spaces into semi-public areas. In summary, as your site grows, it makes sense to ask questions like:

  • Are there different "groups" on the site that are interested in different topics? What's the best way to give each group their own area?
  • Is there a reason why people or content should be grouped by geography, language, or otherwise? In many cases, like classifieds or social networks, there absolutely is.
  • Going through the onion layers of relationships, is there a way for people to privately interact with their best friends? How about the next circle of friends after that? How about people they are likely to become friends with? How about the next layer?
  • If one group "blows up" inside your social site, how heavily does it affect the other users? Will it drive them out?
  • Where does reputation play into how people can use your public spaces? Does it make sense to require that users participate for a certain amount of time, or do a certain number of positive things, before they can post? (Look at forums for many successful variations of this)

Happy 2008 everyone!

December 23, 2007

Do you use Google Reader? I need your help!

Calling all Google Reader users...
Are you reading this in Google Reader? If so, I need your help on a science experiment.

Click the SHARE button and share this blog post out to your friends, like this:

Why do this?
This may be your first time reading my blog, and you might be asking "Why would I want to do that?" I'm guessing that when Google built this feature, they didn't think people would use it to propagate chain letter-like things, like this post ;-)

Take part in the first Viral Marketing "science experiment" inside of Google Reader! ;-) Let's answer the question, "Can it be done??"

(This experiment originally inspired by Scoble, who linked to me and said, "How did I find it? My friends on Google Reader shared it with me. You can add me on Google Reader too")

Blogs can now easily jump from user-to-user with just one click, whereas before it was hard to "infect" another user virally. More on the viral marketing topic here. After this experiment runs its course, I'll post a longer analysis and explanation, depending on how successful it is.

In the meantime, don't forget to click on the SHARE button below! (Click here to go to Google Reader)

December 21, 2007

5 ways to break past the San Francisco echo-chamber

The Bay Area echo-chamber
It's now been a year since I moved down from Seattle, and one of the most interesting things I've experienced has been the "tech echo-chamber" here. Driven by blogs, friends, co-workers, and all the other channels of information, it's very easy to get excited about the next new thing rather than realize the eternal truth of technology:

Every new technology takes longer to permeate the world than you'd think

Whether it's the iPhone, podcasting, Facebook apps, AJAX desktop, OpenSocial, data portability, microformats, or the other legions of buzzwords, there's a LOT of information inefficiency between Silicon Valley and the rest of America.

How to break past the Silicon Valley echo-chamber
The question is, when everyone here talks about this stuff, how do you keep yourself from falling into the trap of building products for a niche tech audience? I honestly don't have a great answer to this question, but here are a couple ideas:

1. Read some books about American demographics, and how you fit into the world
If you haven't yet, I'd highly recommend that you read Bobos in Paradise, which is about you ;-) It talks a lot about a culture that values functional things, is into outdoor sports, and all that stuff. Furthermore, it ties this culture into its roots in the SAT score and new meritocracy that emerged in the last century. Absolutely a great read.

Other related books:

The idea here is to read about some of these groups, and realize how weird and skewed technology folks really are. In a country where the median HOUSEHOLD income is $48k, the average SF engineer in his late 20s making $130k might want to read a little more about how the rest of the country is split up.

2. Spend a lot of time wandering around the top sites online
At my last company, Revenue Science, one of the most educational things we did was to buy the Alexa 100k list, hire a bunch of guys fresh out of college, and begin cold e-mailing and cold dialing them until we had talked to a good chunk of the top 10k US sites. It was a great experience because you figure out that there are HUGE sites out there, with 100s of millions of pageviews, run by 2- or 3-person teams out in the middle of nowhere, that are growing quite fast. In fact, when you have enough of these conversations, you'll start to discount Techcrunch and other sources for breaking news about "successful" websites.

In fact, in late 2004, we happened on a site that no one in the blogosphere was talking about (I checked on Feedster and Technorati) yet was adding 40k users per day on a base of 15 million registered users. We had talked to a random company called Intermix that seemed to mostly deal in e-cards, toolbars, herbal supplements, and other internet-marketing programs. But in talking to their sales folks, we were told of a sister property that was exploding, but no one knew why. This site, of course, was called MySpace.com. 200 million users later, I still don't think the property gets the respect that it deserves, just because it doesn't cater to the tech community.

You can do a similar thing now by viewing the Quantcast list here. I would be shocked if you didn't find a ton of sites in the top 500 or so that you've never heard of before.

Other sources of information like this are comScore, Nielsen, and other analytics sources, which can tell you specifically about what sites are the most common for women 35 or older, or teenagers, or other people outside of your demographic. Hugely useful. I'm also really interested in sorting large groups of sites by "longest time on site" or "high growth rate in the last month" because you always find interesting outliers there as well.

3. Visit unfamiliar retail stores, or even better, retail locations way out of your geography
Think about your average Wal-mart. It's a well-oiled machine, stocked with products optimized to the ZIP code where it was placed. Now go walk around one, and you'll be surprised by what people are buying: lots of outdoor equipment and BB guns, and a book section that is mostly self-help, diet/exercise, and cookbooks. Or look at the types of magazines that are stocked. Overall, the square footage of the Wal-mart will correlate with the $ per square foot in revenue that it generates, so find the places that seem to be huge (and uninteresting).

Another example of this is to go to teenage stores - when's the last time you went to a Hot Topic? or a Pacsun? In these stores, you can learn a lot of random things about teen culture, what kinds of influences are being exerted, and so on. Hot Topic is a great one because you definitely see a lot of "video game culture" being shown, as well as a lot of Japanese and Asian stuff being imported, reinterpreted, and then sold to the American audience.

4. Talk to a lot of people different than you - pay them if necessary
I've learned a TON from talking to people who are much different than me - the best way to recruit them is off of a survey form with traffic driven in from Google or Craigslist. If you qualify them and make sure they are sufficiently different, you can learn a ton of information. Or maybe you have a friend or two out in Middle America? Recruit them and their friends if possible.

Ask them about their technology usage - what websites do they use, what technologies are they excited about, what their daily schedule is, etc. I'm sure you'll be surprised by the answers.

Even better is if you can actually get a look at their computer! I'm sure you'd learn more about consumer technology working a week at Geek Squad at Best Buy than anywhere else. You'd see desktops clogged with icons, taskbars with hundreds of open IE windows, spyware everywhere, and everything else that a "typical" user is likely to do.

5. Visit the underbelly of the internet ;-)
And finally, make sure you visit the underbelly of the internet:

Websites/forums like these really epitomize the core of "Internet culture." A lot of different memes get started there, and it's where you can make observations about how internet culture is changing. For example, it's fascinating to note how sharing files has changed - it used to be mostly open FTP servers, then open directories (hosted in Apache), then Gnutella links, then BitTorrent, and now more often than not, it's mostly upload sites like RapidShare or Megaupload.

Similarly, you can see the origins of such phenomena as lolcatz or other fun themes. Just don't stay there long, or else your IQ will slowly plummet ;-)

An open question... any tips?
I'm sure many folks out there have their own ways of staying above the frothiness. What are they? What do you do? Would appreciate the thoughts - feel free to e-mail or comment.

 

December 20, 2007

Is your website a leaky bucket? 4 scenarios for user retention

Do you have happy, smiling users?
I've previously written a lot about metrics and user acquisition - just look at the left bar of this blog - but have not written much about metrics and user retention. By retention, I mean the process in which you convert new users who don't care about your site into recurring users that are loyal and continually drive pageviews.

In general, I would say that more people care about this than pure user acquisition, which is great, but they are often using aggregate numbers to measure this retention. By aggregate data, I mean looking at an overall Google Analytics number, or looking at an Alexa rank, or some other rolled-up metric which doesn't differentiate between new users that are discovering your site for the first time versus loyal users that are returning to your site.

In general, I think of websites as "leaky buckets" where users are constantly getting poured into the top, and the site is constantly leaking users. You can imagine that if you pour 1,000 users into any website and then stop additional new users from joining, that 1,000 can only decrease. Some users become loyal and throw off pageviews for a while, but over time, they disappear too. The rate at which this happens can be turned into a metric just like any other number.
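To make the bucket picture concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a fixed fraction of the remaining users comes back each week; the 80% figure and the function name `leaky_bucket` are made up for the example, not measurements from any real site.

```python
def leaky_bucket(initial_users, weekly_retention, weeks):
    """Return the users left in the bucket at the start of each week.

    Assumes no new users are poured in and that a constant fraction
    (weekly_retention) of the remaining users returns each week.
    """
    survivors = [float(initial_users)]
    for _ in range(weeks):
        survivors.append(survivors[-1] * weekly_retention)
    return survivors


# Hypothetical numbers: 1,000 users poured in once, 80% kept per week.
remaining = leaky_bucket(1000, 0.8, 4)
for week, users in enumerate(remaining, start=1):
    print(f"week {week}: {users:.0f} users")
```

With any retention below 100%, the count only goes down, which is the whole point of the analogy.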

Pop quiz: Is Twitter retaining users?
First off, take a look at this graph and tell me if you think Twitter is retaining its userbase month over month. What do you think?

Think you have an answer?

The growth disambiguation problem
And of course, it was a trick question. In fact, it's basically impossible from purely outside data to disambiguate the following scenarios:

  1. Pageviews are coming ONLY from new users
  2. Pageviews are coming ONLY from one generation of users (like early adopters)
  3. Pageviews are coming ONLY from retained users
  4. Pageviews are coming from new users and retained users

This should be totally obvious to people, but instead I see people pointing at Alexa graphs and saying that site A or site B is doing well, when in fact they could have a deep systemic problem.

In fact, let me argue the following in this post:

From aggregate data (like Alexa), you can figure out what sites are doing poorly at retention, but not what sites are doing well

Let's start with the first scenario:

1. Pageviews are coming ONLY from new users
In this first scenario, the retention on your site totally sucks, meaning that you lose all your people after the first session. Of 1,000 users flowing in, all 1,000 drop off and 0 remain. Your retention rate is 0% from week 1 to week 2 :)

That said, how could you still get pageviews? First off, you obviously get any pageviews a user might create in the first session, even if they never come back. I think the most common scenarios are the following:

  • Users create text content which is SEO'd and placed in the Google index
  • Users send invites via e-mail which are then accepted

In either case, there is some form of "viral loop" that attracts new users even if the original user is never retained. In fact, I bet you that a lot of sites out there are buoyed by their search engine traffic, even when they have really terrible retention rates. All that matters is that each generation of users does enough work to generate a couple of pageviews and then brings in the next generation.

Using the bucket analogy, this is a bucket that has a firehose filling it, but all the water leaks out almost immediately. With a big enough firehose, the aggregate stats could look good when they are in fact rather shitty.
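A rough sketch of this firehose scenario, in Python: retention is exactly 0%, but each week's visitors attract the next week's visitors through search or invites. The seed size and the 1.1 "users attracted per user" factor are hypothetical values chosen only to show the shape of the curve.

```python
def weekly_uniques_zero_retention(seed_users, attracted_per_user, weeks):
    """Weekly uniques when nobody ever returns.

    Every visitor in a given week is new; the only thing linking one
    week to the next is that each visitor indirectly brings in
    `attracted_per_user` new visitors (via SEO'd content or invites).
    """
    uniques = []
    new_users = float(seed_users)
    for _ in range(weeks):
        uniques.append(new_users)  # all of this week's traffic is new users
        new_users = new_users * attracted_per_user
    return uniques


growth = weekly_uniques_zero_retention(1000, 1.1, 4)
for week, users in enumerate(growth, start=1):
    print(f"week {week}: {users:.0f} uniques, 0 returning")
```

The aggregate line goes up and to the right every week, even though not a single user was retained.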

2. Pageviews are coming ONLY from one generation of users (like early adopters)
3. Pageviews are coming ONLY from retained users
Similar to the first scenario, you might have a situation where the numbers look great, but only because the bucket filled up well with the first group of users, and after that, the site sucks at retention. Or the inverse, where there's no growth at all, but the retention is great.

In either case, this might hint at a bad systemic condition within the site, but ultimately the aggregate numbers hide the problem. Not being able to both acquire and retain brand new users is a problem, and without measuring the groups separately, it seems impossible to assess the true situation.

Back to Twitter for a second
So in fact, looking at the Twitter chart, the right answer is "we don't know." A plateau'd chart like that could mean that Twitter is doing fine at retaining some set of users and has stalled on new users, or that it's acquiring new users like crazy but not retaining them, or anything in the middle.

That said, given the fact that Twitter pages show up in Google, which will provide them with a steady stream of new users, and that the average time on site looks closer to a heavily SEO'd site like Yelp than a social site like MySpace (5min instead of 30min, according to Compete.com), I'd guess that they are actually bleeding users pretty rapidly. Again, it's hard to do an analysis like this without a lot more data to back it up, but that'd be my high-level analysis.

How do you figure out the health of the site then? Measuring "cohorts"
In general, the solution to the retention measurement problem lies in separating out NEW users and RETURNING users within the analytics. So at the minimum, you'd have to be able to talk about the following:

  • 1 million uniques to the site
  • 100,000 new uniques
  • 900,000 returning uniques from the month before

That'd give you a sense that the site was actually retaining users well. But to take this further, what you really care about is to carve up your userbase into "cohorts," and measure drop-off rates from time period to time period. Here's the definition of a time-based cohort:

A cohort is all the users that joined during a particular time period

Only then can you track the retention rate of a SPECIFIC set of users, and then measure other users experiencing an independent scenario. In the "cohort model" you'd end up with a group like:

Users that joined in Week 1
week 1 uniques: 100,000
week 2 uniques: 50,000
week 3 uniques: 25,000

In this model, you'd see that 100k users joined in week 1, and if you follow that "cohort" through, you end up with a 50% drop-off rate from week to week.
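As a sketch of how numbers like these could be pulled from raw data, the Python below groups a tiny activity log into cohorts. The log format (user, week joined, week active) and the user names are assumptions made for the example; your analytics data will look different.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, week_joined, week_active).
visits = [
    ("ann", 1, 1), ("bob", 1, 1), ("cat", 1, 1), ("dan", 1, 1),
    ("ann", 1, 2), ("bob", 1, 2),
    ("ann", 1, 3),
    ("ann", 1, 3),  # a repeat visit in the same week counts once
]


def cohort_uniques(visits):
    """Count unique active users per (week_joined, week_active) pair."""
    seen = defaultdict(set)
    for user_id, week_joined, week_active in visits:
        seen[(week_joined, week_active)].add(user_id)
    return {key: len(users) for key, users in seen.items()}


counts = cohort_uniques(visits)
for (joined, active), uniques in sorted(counts.items()):
    print(f"joined week {joined}, active week {active}: {uniques} uniques")
```

This toy week 1 cohort goes 4, 2, 1, the same 50% week-to-week drop-off as the 100k, 50k, 25k example.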

But then, in week 2, new users joined as well, which creates a week 2 cohort. Of course, in your aggregate metrics, the site would have 100k uniques in week 1, then 125k+50k uniques in week 2.

Users that joined in Week 2
week 2 uniques: 125,000
week 3 uniques: 50,000

Note that this cohort only goes through 2 weeks because it starts at week 2 and ends at week 3, whereas the week 1 cohort is able to run 3 weeks.

When you compare the week 2 cohort to the week 1 cohort, you can tell that 1) there was a 25% increase in new users (100k to 125k), and 2) the retention rate DECREASED from 50% to 40% (50k/100k versus 50k/125k). This would be a red flag that your site was sucking, even if your aggregate stats looked good:

Total site stats
week 1 uniques: 100,000
week 2 uniques: 175,000
week 3 uniques: N/A*
(*the week 3 cohort is not defined here; the total would be 25k + 50k + the week 3 cohort's uniques)
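The comparison between the cohort view and the aggregate view can be checked with a few lines of Python. The numbers are the ones from the example; the dictionary layout and function names are just one hypothetical way to organize them.

```python
# Uniques per cohort, keyed by the week in which they were active.
cohorts = {
    "week 1": {1: 100_000, 2: 50_000, 3: 25_000},
    "week 2": {2: 125_000, 3: 50_000},
}


def first_period_retention(weekly_uniques):
    """Fraction of a cohort still active one period after joining."""
    weeks = sorted(weekly_uniques)
    return weekly_uniques[weeks[1]] / weekly_uniques[weeks[0]]


def site_total(cohorts, week):
    """Aggregate uniques for a week, summed over the cohorts we know about."""
    return sum(cohort.get(week, 0) for cohort in cohorts.values())


for name, weekly_uniques in cohorts.items():
    print(f"{name} cohort retention: {first_period_retention(weekly_uniques):.0%}")

for week in (1, 2):
    print(f"aggregate week {week} uniques: {site_total(cohorts, week):,}")
```

The aggregate grows from 100,000 to 175,000 while per-cohort retention slips from 50% to 40%, which is exactly the problem the rolled-up number hides.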

It's not clear what your time period should be - perhaps weeks, perhaps days, perhaps months. Probably it depends on the average time between your users logging in, or something similar.

Is there a retention coefficient?
In fact, one might argue from analyzing these cohorts that, in addition to the "viral coefficient" measured in viral marketing, there's a "retention coefficient" that measures how well you are able to keep hold of users.

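This would be true if the cohorts you chose typically lose a constant % from week to week. That would mean that every cohort decays exponentially, which would give you a coefficient (i.e., f(x) = e^(-ax), where x is the number of weeks since the cohort joined, f(x) is the fraction of the cohort still active, and a is the retention coefficient, so a larger a means faster decay).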
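If you want to try this on your own cohorts, here is a minimal Python sketch that backs the coefficient out of a cohort's weekly uniques. It assumes the decay really is exponential and simply uses the first and last data points; the function name is hypothetical and a proper curve fit over all points would be more robust.

```python
import math


def retention_coefficient(uniques):
    """Estimate a, assuming uniques[t] is roughly uniques[0] * exp(-a * t).

    Uses only the first and last observations of the cohort.
    """
    periods = len(uniques) - 1
    return -math.log(uniques[-1] / uniques[0]) / periods


# Week 1 cohort from the example above: 100k, 50k, 25k.
a = retention_coefficient([100_000, 50_000, 25_000])
print(f"retention coefficient a = {a:.3f}")

# Under the same assumption, project the cohort one more week out.
projected_week_4 = 100_000 * math.exp(-a * 3)
print(f"projected week 4 uniques: {projected_week_4:,.0f}")
```

For a cohort that halves every week, a works out to ln 2, since e^(-a) has to equal 0.5.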

Please measure and e-mail me your findings ;-)

December 15, 2007

Glitter obsession, both online and offline


What's going on with all the glitter text, glitter backgrounds, and glitter effects?
In fact, if you search for "glitter" on Google, you'll see one of the most organized, comprehensive and SEO'd sets of websites ever. And they're all "ugly" and full of ads. In fact, it's hard to go to any MySpace-affiliated site, even Bay Area darlings Slide and Rockyou, without seeing a bunch of glitter-themed stuff everywhere.

It seems as though glitter is playing an interesting role as a design motif - similar to the edgy graffiti look, or brushed aluminum, or rounded corners, or any other motif. But instead of conveying "futuristic" it conveys "fun" or "pretty." Strange, I know! And furthermore, this aesthetic is being driven by a completely different demographic than the folks that exist here in the Bay Area.

And in fact, here are some pictures from a recent excursion to a scrapbooking store that show you some offline variations of this stuff - keep in mind all this stuff is like 3-4 dollars per bag, crazy!

Why your friends list gets polluted over time

Best friends forever?
I've recently been doing some qualitative research into how people use social networks, and I've learned a great deal of interesting stuff through these interviews. Typically, I'm spending about an hour at a time having folks go through exercises like describing 2-3 items that represent them, drawing out their social network, talking about meeting new people, and a bunch of other random things.

While doing this, I've been paying attention to something that's been bothering me over time as people friend me on Facebook:

Why does my friend list get so polluted over time with people I don't know at all?

As a great rush of people in SF have gotten on Facebook, I've gotten regular friend requests - mostly legit, but some completely random strangers - and over time, I've collected a pretty large group. However, my group is much larger than the so-called Dunbar number, which estimates the largest group size that humans can have social relationships with. (It's 150, by the way)

In fact, this entire issue of "real friends" versus "fake friends" has been an issue in social networks for a long time. First, with Fakesters on Friendster, then talk of "fake friends" on MySpace, and even certifications for folks on dating sites. In the past, it's been said that Facebook actually reflects your "real life" friends because of all the geographical semi-private network stuff they do, but over time, I've found my own personal network saturated.

Friendships are complex
The first underpinning of this discussion is that friendship networks are actually very complex, and are poorly approximated by the "friends" versus "not friends" paradigm, or even the "friends", "top friends", and then "not friends" paradigm. In fact, you'll see that a lot of social maps look like this:

This is my sister's social map that she drew out for me, and is one of about half a dozen I've seen so far. What you'll see is several overlapping networks based on geographical location (SF versus Seattle), organizational affiliation (school/work/etc), sub-organizational affiliation (fraternity at school), and strength of relationship.

And in fact, once you have this social map drawn out, one of the most interesting questions you can ask people is how they figure out in what situations they should:

  • call someone
  • text someone
  • e-mail someone
  • poke them
  • write on their wall
  • write them a message
  • meet them in person
  • etc

What you'll find, in that discussion, is that there's a steady progression of "commitment" that it takes to go from writing on a wall (the least burdensome thing) to meeting them in person (the most burdensome thing). In fact, one of the really useful things that social networks provide that e-mail doesn't is a range of expressiveness in your communication such that you can use it for more things than sending notes or data across the wire.

Where does adding a friend go into this?
Interestingly enough, if you ask people where "adding a friend" fits into the spectrum of interaction, where do they put it?

That's right: They put it in the very beginning as the EASIEST and LEAST burdensome interaction they have with people. So in fact, if you don't know someone at all, or you are just acquaintances with them, the first thing you do is add them to your friends.

And folks, that is not much of a filter at all ;-)

So what you'll find is that as your social network evolves online, you'll accumulate more and more acquaintances as a % of your total friends, until your friend list is mostly people you don't know (or that you knew in the past), whose pictures, app installs, and all that stuff you don't really care to see.

It's very unclear how to come up with a solution for this - you certainly don't want people to need to describe their social networks at the level I asked them to, yet there's enough complexity and detail in your social relationships that you need to capture a lot of detail in order to fix the friends problem.

December 04, 2007

Tech bubble music video

Okay, everyone's seen this but I'm blogging it anyway because it's so awesome:

ABOUT THIS BLOG

  • Futuristic Play

    My name is Andrew Chen and I'm an entrepreneur living in San Francisco, CA. This blog covers my thoughts on metrics, viral marketing, user experience, game design, and online advertising.

    I don't write often, so sometimes the easiest thing to do is to subscribe to my blog.