Kurt McKee

lessons learned in production

Revisiting comment scraping

Posted 25 July 2008 in comment, programming, and python

Earlier this month I wrote about scraping LiveJournal comments. What was I thinking?

While I was able to account for a number of variables in the page by tweaking my XPath statements, it became obvious early on that screen scraping for comments should be a last resort. So I decided to use Flickr as my first comment source. Flickr has a delightful API for retrieving comments, and I had working code in short order. No corner cases. No special possibilities. Just functional code.
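For a sense of why an API beats scraping here, this is a minimal sketch of building a request for a photo's comments. It is not the code from this project. It assumes Flickr's REST endpoint and the `flickr.photos.comments.getList` method; the function name, the placeholder key, and the photo ID are made up for illustration, and fetching and decoding the response is left out.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for the Flickr API.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_comments_url(api_key, photo_id):
    """Return a URL that asks Flickr for the comments on one photo.

    `build_comments_url` is a hypothetical helper, not part of any library.
    """
    params = {
        "method": "flickr.photos.comments.getList",
        "api_key": api_key,
        "photo_id": photo_id,
        # Ask for plain JSON rather than the default XML or a JSONP wrapper.
        "format": "json",
        "nojsoncallback": "1",
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


# Placeholder values; a real call needs a valid API key and photo ID.
url = build_comments_url("YOUR_API_KEY", "1234567890")
```

One structured request replaces a pile of XPath expressions, which is the whole appeal.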

Now I'm working on the ability to track comments on Digg. It's a new challenge, because Digg supports nested comments, and that's metadata I don't yet have a way to store. After that I'll add a couple of basic blogging providers, and then I'll start soliciting help for creating beautiful templates. Hopefully I'll have a release readied shortly thereafter.
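One common way to store nesting is to give every comment an optional parent ID and rebuild the tree when it is needed. The sketch below shows that idea only; it is not how this project stores comments, and the `Comment` fields and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Comment:
    comment_id: str
    author: str
    body: str
    # None marks a top-level comment; otherwise the ID of the parent comment.
    parent_id: Optional[str] = None
    replies: List["Comment"] = field(default_factory=list)


def build_thread(comments: List[Comment]) -> List[Comment]:
    """Attach each comment to its parent and return the top-level comments."""
    by_id = {c.comment_id: c for c in comments}
    roots = []
    for c in comments:
        parent = by_id.get(c.parent_id) if c.parent_id is not None else None
        if parent is None:
            # Top-level comment, or an orphan whose parent was not fetched.
            roots.append(c)
        else:
            parent.replies.append(c)
    return roots


def walk(roots: List[Comment], depth: int = 0) -> Iterator[Tuple[int, Comment]]:
    """Yield (depth, comment) pairs in display order, depth first."""
    for c in roots:
        yield depth, c
        yield from walk(c.replies, depth + 1)


flat = [
    Comment("a", "alice", "First!"),
    Comment("b", "bob", "Not quite.", parent_id="a"),
    Comment("c", "alice", "Close enough.", parent_id="b"),
    Comment("d", "dave", "Unrelated thought."),
]
for depth, comment in walk(build_thread(flat)):
    print("    " * depth + f"{comment.author}: {comment.body}")
```

Flat sources such as Flickr fit the same shape: every comment simply has no parent, so the same storage works for both.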

Caleb, it may eventually be possible to get a LiveJournal comments feed, but I hope that my comment tracking software will one day do you one better than just a simple feed.
