
TagLines

14 January 2006

I’m such a lazy git sometimes: when I wrote TagLines, I never bothered to write a purge script.

TagLines works like this: I have a list of about 20 RSS feeds, such as MSN and Engadget. A little engine grabs the content from these feeds and stores it in my database. It then parses each piece of content and passes it to the excellent Yahoo Content Term Extraction service, which returns a list of relevant tags for that piece of content. I store these tags with a reference back to the original story. Finally, I present the tags based on the order in which they occur. That way you might find “War” at or near the top for feeds like the BBC or MSNBC, and “Apple” or “Microsoft” at the top for feeds like MSDN.
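For anyone who wants to picture the moving parts, here is a rough sketch of that pipeline in Python. It is not the actual TagLines code. The table layout, the function names and the sample feed are all made up for illustration, `extract_terms` is a crude local stand-in for the Yahoo service (it just picks capitalised words), and ranking tags by how often they occur is my assumption about what "order" means here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny made-up RSS 2.0 document so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Demo feed</title>
<item><title>Apple ships update</title><description>Apple fixes bugs.</description></item>
<item><title>Microsoft responds</title><description>Apple and Microsoft compete.</description></item>
</channel></rss>"""


def parse_rss(xml_text):
    """Return (title, description) pairs for every <item> in an RSS document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title", ""), item.findtext("description", ""))
            for item in root.iter("item")]


def extract_terms(text):
    """Hypothetical stand-in for the term extraction service.

    Keeps words that start with a capital letter; the real service is smarter.
    """
    return [w.strip(".,!?\"'") for w in text.split() if w[:1].isupper()]


def open_db():
    """Create an in-memory database with assumed 'stories' and 'tags' tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, feed TEXT, title TEXT, body TEXT)")
    db.execute("CREATE TABLE tags (story_id INTEGER REFERENCES stories(id), tag TEXT)")
    return db


def ingest(db, feed_name, xml_text, extractor=extract_terms):
    """Store each story, then store its tags with a reference back to the story."""
    for title, body in parse_rss(xml_text):
        cur = db.execute(
            "INSERT INTO stories (feed, title, body) VALUES (?, ?, ?)",
            (feed_name, title, body),
        )
        story_id = cur.lastrowid
        db.executemany(
            "INSERT INTO tags (story_id, tag) VALUES (?, ?)",
            [(story_id, tag) for tag in extractor(title + " " + body)],
        )
    db.commit()


def top_tags(db, feed_name, limit=10):
    """Return (tag, count) pairs for one feed, most frequent first."""
    return db.execute(
        "SELECT t.tag, COUNT(*) AS n FROM tags t "
        "JOIN stories s ON s.id = t.story_id "
        "WHERE s.feed = ? GROUP BY t.tag ORDER BY n DESC, t.tag ASC LIMIT ?",
        (feed_name, limit),
    ).fetchall()


if __name__ == "__main__":
    db = open_db()
    ingest(db, "demo", SAMPLE_RSS)
    print(top_tags(db, "demo"))
```

Passing the extractor in as a parameter keeps the remote call swappable, so the storing and ranking parts can be tried out without touching the real service.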

Anyway, the tags just keep accumulating. In the last week my engine gathered roughly 30,000 tags for 9,000 stories. Amazing how active the Blogosphere is.
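The purge script I never wrote could look something like the sketch below. It assumes a hypothetical schema in which every story carries a `fetched_at` Unix timestamp and every tag points at its story through `story_id`; those names are illustrative, not the real TagLines tables.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def make_demo_db():
    """Create an in-memory database with the assumed (hypothetical) schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT, fetched_at INTEGER)")
    db.execute("CREATE TABLE tags (story_id INTEGER, tag TEXT)")
    return db


def purge_older_than(db, max_age_days, now=None):
    """Delete stories older than max_age_days, plus their tags.

    Returns the number of stories removed. 'now' is a Unix timestamp and
    defaults to the current time; passing it in makes the function testable.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with db:  # commit both deletes together, or roll back if one fails
        # Remove the tags first, while the old stories can still be looked up.
        db.execute(
            "DELETE FROM tags WHERE story_id IN "
            "(SELECT id FROM stories WHERE fetched_at < ?)",
            (cutoff,),
        )
        removed = db.execute(
            "DELETE FROM stories WHERE fetched_at < ?", (cutoff,)
        ).rowcount
    return removed
```

Run from a daily scheduled job with something like `purge_older_than(db, 7)`, this would keep roughly a week of stories and tags around, though the right cutoff is a guess.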

I’ve also claimed this blog on Technorati. Technorati Profile should prove it.
