Right, so Google Reader collects and presents feeds from all over the world, on a variety of subjects. It lets you read them. If you like them you can share them. Other people can then read your shared items. Of course, they could be reading them in Google Reader. The circle goes on.
Why’s this interesting? Think of it from an information perspective. Google have x million regular Reader users. Each of those readers is subscribed to y feeds. Simply from this alone you can work out things like the most popular feeds. Hey – that’s not new. Add a time factor and you can work out the most interesting new feeds, as well as the all-time favourite feeds.
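To make that concrete, here is a minimal sketch of the counting involved. The subscription log, the user names and the dates are all made up for illustration, and the function names are my own; nothing here reflects how Google actually stores or queries this data.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical subscription log: (user, feed, date subscribed).
subscriptions = [
    ("alice", "Slashdot", datetime(2006, 1, 10)),
    ("bob", "Slashdot", datetime(2006, 2, 3)),
    ("carol", "Slashdot", datetime(2006, 3, 15)),
    ("alice", "Granny Buttons", datetime(2006, 10, 1)),
    ("dave", "Granny Buttons", datetime(2006, 10, 2)),
    ("bob", "BBC News", datetime(2006, 10, 3)),
]


def most_popular(subs, top=3):
    """All-time favourites: count subscribers per feed."""
    return Counter(feed for _, feed, _ in subs).most_common(top)


def most_popular_since(subs, now, window, top=3):
    """Add the time factor: only count subscriptions made within the window."""
    cutoff = now - window
    recent = [s for s in subs if s[2] >= cutoff]
    return most_popular(recent, top)


now = datetime(2006, 10, 5)
all_time = most_popular(subscriptions)
trending = most_popular_since(subscriptions, now, timedelta(days=7))
```

With this toy data, Slashdot tops the all-time list but drops out of the last-week list entirely, which is the difference between "favourite" and "interesting new".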
Now, according to Bloglines, Slashdot is the most popular feed (other than Bloglines News…). That’s hardly a surprise considering how many readers Slashdot gets, who are tech savvy and probably want all their headlines in their favourite reader. Wired News, Dilbert, the BBC. All frequent stuff.
Let’s look at the BBC’s headlines. A bit about Syria; something about Bush and Iraq; Europe eases Algeria visa rules apparently; French rugby player receives knee injury. How could we tell which of these are interesting? Well, we could use our favourite buzz aggregators, which watch links from oodles of blogs. One problem with these is that they take time to update. 13 minutes is good, but not good enough (and that was Google). They also rely on picking up information from third parties (the bloggers themselves). You have to accumulate the inbound blog links by visiting each and every newly posted blog (by the way, the pingmesh/pingosphere which is supposed to notify websites of post updates is absolutely crammed with spam).
Let’s come back to Google. With their sharing capability, you can immediately publicise blog posts you think are interesting. Because this is all internal to Google, the act of sharing is immediately known to Google. They could build up a list of top new posts in something close to real time. With enough posts and user contributions, you’ve basically got yourself a Digg. If you get enough of a user base sharing their ‘favourite’ posts, that’s a pretty good statistical sample to determine ‘what’s hot’ in the blogosphere (and the wider web – let’s not forget feeds are not just synonymous with blogs…).
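As a rough sketch of what a real-time ‘what’s hot’ list might look like: keep a running count of shares per post and forget any share older than some window. The class name, the post identifiers and the one-hour window are all assumptions of mine, and the sketch assumes share events arrive in timestamp order.

```python
from collections import Counter, deque


class ShareTally:
    """Running count of shares per post over a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, post), oldest first
        self.counts = Counter()  # post -> shares still inside the window

    def record_share(self, timestamp, post):
        # Assumes timestamps are non-decreasing.
        self.events.append((timestamp, post))
        self.counts[post] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop shares that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_post = self.events.popleft()
            self.counts[old_post] -= 1
            if self.counts[old_post] == 0:
                del self.counts[old_post]

    def top(self, n=3):
        return self.counts.most_common(n)


# Hypothetical share events: (seconds since start, post identifier).
tally = ShareTally(window_seconds=3600)
tally.record_share(0, "bbc/syria")
tally.record_share(100, "bbc/rugby-knee")
tally.record_share(200, "bbc/rugby-knee")
tally.record_share(3650, "bbc/rugby-knee")
tally.record_share(3700, "bbc/algeria-visas")
hot = tally.top()
```

The point is that every update is cheap and happens at the moment of sharing, with no crawling of third-party blogs needed.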
Then we have folders aka tags. These allow the reader to specify under which headings their feeds appear. Again, with enough of a sample base you’ve just obtained a huge human-edited categorisation tool – a folksonomy. These things fascinate me because humans are far better at categorising things than computers. If 2000 people all refer to Granny Buttons with the tag canals, for instance, it’s a pretty sure bet that Mr. Denny over at Granny Buttons has a word or two to say about canals.
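Here is a small sketch of how that inference could work: pool everyone’s folder names per feed and keep the ones a decent share of people agree on. The data, the function name and the 50% threshold are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (user, feed, folder) assignments.
folder_assignments = [
    ("alice", "Granny Buttons", "canals"),
    ("bob", "Granny Buttons", "Canals"),
    ("carol", "Granny Buttons", "boats"),
    ("dave", "Granny Buttons", "canals"),
    ("alice", "Slashdot", "tech"),
    ("bob", "Slashdot", "news"),
    ("carol", "Slashdot", "tech"),
]


def consensus_tags(assignments, min_share=0.5):
    """Return, per feed, the tags used by at least min_share of its taggers."""
    per_feed = defaultdict(Counter)
    for _, feed, folder in assignments:
        # Normalise case and whitespace so "Canals" and "canals" agree.
        per_feed[feed][folder.strip().lower()] += 1

    result = {}
    for feed, counts in per_feed.items():
        total = sum(counts.values())
        result[feed] = [
            tag for tag, n in counts.most_common() if n / total >= min_share
        ]
    return result


tags = consensus_tags(folder_assignments)
```

With a sample of seven this proves nothing, of course; the argument only holds once thousands of readers are doing the filing.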
As I said earlier, I think this shares a lot with OPML reading lists.
More interesting yet is the ability for users to subscribe to other users’ tags. This starts really delving into Voice of Authority territory, which as it happens was the focus of my final year dissertation at uni… My reckoning here is simple. If one person tags canals and 500 other people subscribe to that tag, you can probably infer that that person has pretty much nailed the category of canals. The VoA stuff really starts kicking in (in my view) when the ‘authority’ has editorial control over what does and does not get added to their ‘output’. Mix that in with some of their own content (hey, Google have that covered too!) and you’ve got a content stream you can rely on. See also the similar filtering in Google Co-Op.
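The simplest version of that reckoning is just a head count: for each tag, rank the people who publish it by how many distinct users subscribe to their version. This is only a sketch under my own assumptions (made-up names, a made-up data shape, and subscriber count as the only signal), not the approach from my dissertation or anything Google has described.

```python
from collections import defaultdict

# Hypothetical tag subscriptions: (subscriber, author, tag).
tag_subscriptions = [
    ("bob", "andrew", "canals"),
    ("carol", "andrew", "canals"),
    ("dave", "andrew", "canals"),
    ("dave", "andrew", "canals"),   # duplicate, should count once
    ("bob", "erin", "canals"),
    ("carol", "erin", "rugby"),
]


def authorities(subs):
    """Map each tag to its authors, ranked by distinct subscriber count."""
    followers = defaultdict(set)
    for subscriber, author, tag in subs:
        followers[(tag, author)].add(subscriber)

    ranking = defaultdict(list)
    for (tag, author), people in followers.items():
        ranking[tag].append((author, len(people)))

    for tag in ranking:
        # Highest count first; ties broken alphabetically by author.
        ranking[tag].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(ranking)


ranked = authorities(tag_subscriptions)
```

Here ‘andrew’ comes out as the voice of authority on canals with three subscribers to erin’s one.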
Now, here comes the crunch. You have an enormous number of people effectively ‘voting with their subscriptions’ for the best content on the Internet. You have a whole bunch of people who, in various ways, are adding some smart filters to this mass of content and tagging it as they go. You have a whole bunch of people also deciding whether these tags were any good or not. Net result: Google can tell what feeds are good sources for information in particular areas. Not just mechanically good (as in has lots of text and a high PageRank), but really good (as in people really want to read it). Google control the environment in which this occurs, and to some degree they control the input too. Somewhere in amongst all of this the engineers at Google are sitting on a goldmine of human-filtered information, and they probably know it!
Spam, by the way, is something I have even more to say about, and I’ll tackle it shortly. For now I need sleep!