Kurt McKee

lessons learned in production

Hey there! This article was written in 2011.

It might not have aged well for any number of reasons, so keep that in mind when reading (or clicking outgoing links!).

Improving feedparser's unit tests

Posted 6 January 2011 in feedparser

Feedparser has a bunch of unit tests: somewhere in the vicinity of 4400 (you can tell how long ago the last release was when the homepage says there are only 3000!). My concern is that the vast majority are duplicates that differ in one key aspect: one version is well-formed XML and the other is ill-formed. No, I'm not talking about false advertising; my concern is simply that the duplication is inefficient and, at this scale, unmanageable (example to follow).

The unit test framework is designed around a simple principle: all unit tests are contained in XML files. If you want to test any area of the code, the test has to be expressed in an XML file that gets run through parse(). Further, there are two parsers the XML could run through: a strict XML parser and a loose sgmllib-based parser. Thus, all tests require two versions. I think that this is wildly inefficient at scale.
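To make the two-parser arrangement concrete, here is a minimal sketch of a strict-then-loose fallback. It is not feedparser's code: it assumes the loose parser is only used when the strict one rejects the document, it uses the standard library's xml.sax as the strict parser and html.parser as a stand-in for the sgmllib-based one, and the names StrictTags, LooseTags, and parse_tags are made up for illustration.

```python
import xml.sax
from html.parser import HTMLParser


class StrictTags(xml.sax.ContentHandler):
    """Collect element names using a strict XML parser."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


class LooseTags(HTMLParser):
    """Collect element names using a forgiving parser (stand-in for sgmllib)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_tags(text):
    """Return (parser_used, tag_names), falling back to the loose parser."""
    handler = StrictTags()
    try:
        xml.sax.parseString(text.encode("utf-8"), handler)
        return "strict", handler.tags
    except xml.sax.SAXParseException:
        # The strict parser rejected the document, so start over loosely.
        loose = LooseTags()
        loose.feed(text)
        loose.close()
        return "loose", loose.tags


if __name__ == "__main__":
    good = '<rss version="2.0"><channel><language>en</language></channel></rss>'
    bad = good.replace("</rss>", "</rs>")  # purposefully broken closing tag
    print(parse_tags(good))
    print(parse_tags(bad))
```

Because both code paths have to produce the same result for the same feed, anything tested through XML files ends up needing one file per path.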

Take, for example, the following two test cases:

  • illformed/lang/feed_language.xml
  • wellformed/lang/feed_language.xml

These are identical files, save for a purposefully broken closing tag to ensure that the illformed version is run through the sgmllib-based parser. But remember what I said about manageability? It turns out that there are actually accidental duplicates of these already-duplicated tests!

  • illformed/lang/feed_language_override.xml
  • wellformed/lang/feed_language_override.xml

Undoubtedly these quadruplicates came from one file, but sadly the author forgot to make all of the changes to the copies, so feedparser ended up with four tests that test exactly one thing.
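One way to stop keeping two copies of every test would be to store only the well-formed file and derive the ill-formed variant when the tests run. The following is only a sketch under assumptions: it assumes that removing the final closing tag is enough to push a document off the strict parser, the sample feed is invented, and make_illformed and is_wellformed are hypothetical helpers rather than anything in feedparser.

```python
import xml.etree.ElementTree as ET

# Invented sample standing in for a well-formed test file.
WELLFORMED = '<rss version="2.0"><channel><language>en</language></channel></rss>'


def make_illformed(xml_text):
    """Return a copy of xml_text with its final closing tag removed."""
    start = xml_text.rfind("</")
    if start == -1:
        raise ValueError("no closing tag to break")
    return xml_text[:start]


def is_wellformed(xml_text):
    """Report whether a strict XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    broken = make_illformed(WELLFORMED)
    print(is_wellformed(WELLFORMED), is_wellformed(broken))
```

With something like this, each test would exist once on disk, and a forgotten edit could no longer leave the two variants out of sync.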

What I'd like to do is revamp the feedparser unit test framework so this duplication can be removed, test times can be reduced, and test coverage can be increased. However, there are other things to consider:

  1. I've already written that I'd like to try adding namespace support through dynamic inheritance; each additional namespace would need separate tests, and the test framework would need to support this.
  2. The date and time parsers need to be more rigorously tested. A recent bug report confirms that the date/time parsers need a code review, but I'd like to back up such a review with non-XML-based tests.
  3. FeedParserDict needs to be thoroughly tested. While creating a git branch to remove UserDict references, I wrote extensive tests and discovered a bug: accessing the description key when it doesn't exist causes FeedParserDict to raise a TypeError instead of a KeyError.
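Tests like the ones in items 2 and 3 don't need an XML file at all; they can call the code directly. Here is a rough sketch of what that could look like for the missing-key case. AliasDict is a hypothetical stand-in, not FeedParserDict itself, and its alias table is an assumption made up for the example.

```python
import unittest


class AliasDict(dict):
    """Hypothetical stand-in for a dict that resolves legacy key names."""

    # Assumed alias table: one legacy key may map to several candidates.
    aliases = {"description": ["summary", "subtitle"]}

    def __getitem__(self, key):
        for candidate in self.aliases.get(key, [key]):
            if dict.__contains__(self, candidate):
                return dict.__getitem__(self, candidate)
        # Always report the key the caller asked for.
        raise KeyError(key)


class TestMissingKeys(unittest.TestCase):
    def test_alias_resolves(self):
        self.assertEqual(AliasDict(summary="hi")["description"], "hi")

    def test_missing_alias_raises_keyerror(self):
        with self.assertRaises(KeyError):
            AliasDict()["description"]

    def test_missing_plain_key_raises_keyerror(self):
        with self.assertRaises(KeyError):
            AliasDict()["title"]
```

Saved as a module, these run with python -m unittest, with no feed document and no parse() call involved.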

It will be painstaking work (even with help from the near-miraculous coverage!), but I absolutely do not want to begin until feedparser is moved to a decentralized version control system. I'm looking forward to the next release, discontinuing Subversion use, and then improving the unit tests!