Revisiting comment scraping
Posted 25 July 2008 in comment, programming, and python
Earlier this month I wrote about scraping LiveJournal comments. What was I thinking?
While I was able to account for a number of variables in the page by tweaking my XPath statements, it became obvious early on that screen scraping for comments should be a last resort. So I decided to use Flickr as my first comment source. Flickr has a delightful API for retrieving comments, and I had working code in short order. No corner cases. No special possibilities. Just functional code.
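For the curious, here is a rough sketch of what the API route can look like. This is not my actual code. It assumes Flickr's REST endpoint and the flickr.photos.comments.getList method, and the sample response is made up for illustration, with attribute names as I remember them from the API documentation. The function names and the SAMPLE constant are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed REST endpoint; check the current Flickr API docs before relying on it.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def comments_url(api_key, photo_id):
    """Build the request URL for one photo's comments."""
    query = urlencode({
        "method": "flickr.photos.comments.getList",
        "api_key": api_key,
        "photo_id": photo_id,
    })
    return f"{REST_ENDPOINT}?{query}"


def parse_comments(xml_text):
    """Turn a REST response into a list of plain dicts, one per comment."""
    root = ET.fromstring(xml_text)
    if root.get("stat") != "ok":
        raise ValueError("Flickr reported an error")
    return [
        {
            "id": node.get("id"),
            "author": node.get("authorname"),
            "created": int(node.get("datecreate")),
            "permalink": node.get("permalink"),
            "text": node.text or "",
        }
        for node in root.iter("comment")
    ]


# Illustrative response only; every value here is invented.
SAMPLE = """<rsp stat="ok">
  <comments photo_id="123">
    <comment id="123-1" author="00000000@N00" authorname="example_user"
             datecreate="1216944000"
             permalink="https://www.flickr.com/photos/example/123/#comment1">Nice shot!</comment>
  </comments>
</rsp>"""

comments = parse_comments(SAMPLE)
```

Compare that with XPath against a themed journal page: every field arrives as a named attribute, so there is nothing to guess at.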
Now I'm working on the ability to track comments on Digg. It's a new challenge, because Digg supports nested comments, and that's metadata I don't yet have a way to store. After that I'll add a couple of basic blogging providers, and then I'll start soliciting help for creating beautiful templates. Hopefully I'll have a release readied shortly thereafter.
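One way I might store the nesting is to give every comment an optional parent id and rebuild the tree when it's time to render. This is only a sketch of that idea, not what will ship. The Comment class, its fields, and the thread function are all hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    id: str
    author: str
    text: str
    # None marks a top-level comment; otherwise the id of the comment replied to.
    parent_id: Optional[str] = None


def thread(comments):
    """Return (depth, comment) pairs in reading order, replies under parents."""
    children = defaultdict(list)
    for comment in comments:
        children[comment.parent_id].append(comment)

    ordered = []

    def walk(parent_id, depth):
        # Visit each child in its original order, then descend into its replies.
        for comment in children.get(parent_id, []):
            ordered.append((depth, comment))
            walk(comment.id, depth + 1)

    walk(None, 0)
    return ordered


sample = [
    Comment("1", "alice", "Top-level comment"),
    Comment("2", "bob", "Reply to alice", parent_id="1"),
    Comment("3", "carol", "Another top-level comment"),
    Comment("4", "dave", "Reply to bob", parent_id="2"),
]

for depth, comment in thread(sample):
    print("  " * depth + f"{comment.author}: {comment.text}")
```

A flat source such as Flickr fits the same shape, since every comment simply leaves parent_id unset.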
Caleb, it may eventually be possible to get a LiveJournal comments feed, but I hope that my comment tracking software will one day do you one better than just a simple feed.