TagLines
I’m such a lazy git sometimes: when I wrote TagLines, I never bothered to write a purge script.
TagLines works like this: I have a list of about 20 RSS feeds, like MSN and Engadget and so on. Then I run a little engine, which grabs the content from these feeds and stores it in my database. Then it parses each piece of content and passes it to the excellent Yahoo Content Term Extraction service. Yahoo returns a list of relevant tags for a piece of content. I store these with a reference back to the original story. Finally, I present the tags based on the order in which they occur. In this manner you might find “War” appearing at the top or close to it for feeds like the BBC or MSNBC. You might find “Apple” or “Microsoft” at the top for feeds like MSDN and so on.
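For the curious, here is a minimal sketch of that store-and-tag flow in Python. It is not the actual TagLines code. The table layout and the function names (`make_db`, `store_story`, `top_tags`) are made up for illustration, `extract_terms` is a crude stand-in for the term extraction service (it just keeps capitalised words), and ranking tags by how often they occur is my assumption about what "top" means here. Fetching the feeds themselves is left out.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one row per story, one row per (story, tag) pair.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, feed TEXT, body TEXT)")
    db.execute("CREATE TABLE tags (story_id INTEGER REFERENCES stories(id), tag TEXT)")
    return db


def extract_terms(text):
    # Stand-in for the real term extraction service (an assumption):
    # keep only words that start with a capital letter.
    words = [w.strip(".,!?\"'") for w in text.split()]
    return [w for w in words if w[:1].isupper()]


def store_story(db, feed, body, extractor=extract_terms):
    # Store the story, then store each tag with a reference back to it.
    cur = db.execute("INSERT INTO stories (feed, body) VALUES (?, ?)", (feed, body))
    story_id = cur.lastrowid
    db.executemany(
        "INSERT INTO tags (story_id, tag) VALUES (?, ?)",
        [(story_id, tag) for tag in extractor(body)],
    )
    return story_id


def top_tags(db, feed, limit=10):
    # Most frequent tags for one feed; ties are broken alphabetically.
    rows = db.execute(
        "SELECT t.tag, COUNT(*) AS n FROM tags t "
        "JOIN stories s ON s.id = t.story_id "
        "WHERE s.feed = ? GROUP BY t.tag "
        "ORDER BY n DESC, t.tag ASC LIMIT ?",
        (feed, limit),
    )
    return rows.fetchall()
```

Calling `store_story(db, "tech", "...")` once per fetched item and then `top_tags(db, "tech")` gives a list of `(tag, count)` pairs for that feed.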
Anyway, the tags just keep accumulating. In the last week my engine gathered roughly 30,000 tags for 9,000 stories. Amazing how active the Blogosphere is.
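The purge script I never wrote could look something like the sketch below. Again, this is only an illustration under assumptions: it uses SQLite, a hypothetical schema where each story carries a `fetched_at` timestamp stored as UTC text, and made-up names (`make_db`, `purge_older_than`). The real database layout may well differ.

```python
import sqlite3


def make_db():
    # Hypothetical schema; fetched_at holds 'YYYY-MM-DD HH:MM:SS' in UTC.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE stories (id INTEGER PRIMARY KEY, feed TEXT, body TEXT, "
        "fetched_at TEXT DEFAULT (datetime('now')))"
    )
    db.execute("CREATE TABLE tags (story_id INTEGER REFERENCES stories(id), tag TEXT)")
    return db


def purge_older_than(db, days):
    # Delete stories fetched more than `days` days ago, together with their tags.
    # Tags go first so no tag is left pointing at a story that no longer exists.
    cutoff = f"-{int(days)} days"
    db.execute(
        "DELETE FROM tags WHERE story_id IN "
        "(SELECT id FROM stories WHERE fetched_at < datetime('now', ?))",
        (cutoff,),
    )
    cur = db.execute(
        "DELETE FROM stories WHERE fetched_at < datetime('now', ?)",
        (cutoff,),
    )
    db.commit()
    return cur.rowcount  # number of stories removed
```

Run from a scheduled job with something like `purge_older_than(db, 7)`, this would keep only the last week of stories and tags.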
I’ve also claimed this blog on Technorati. Technorati Profile should prove it.








