Kurt McKee

lessons learned in production

Hey there! This article was written in 2010.

It might not have aged well for any number of reasons, so keep that in mind when reading (or clicking outgoing links!).

Extending feedparser using dynamic inheritance

Posted 14 December 2010 in feedparser and python

After going through many, many feedparser bugs (over 100 closed since I started working on the project, with more waiting to be reviewed!), I've seen a lot of reports asking for support for additional namespaces. Off the top of my head, the list includes GeoRSS, the iTunes Music Store (which is different from the iTunes namespace), Google History, and OpenSearch.

To understand this you might need a little background. When feedparser encounters an XML element, it looks for a method whose name is based on the element name. An opening <itunes:title> element will result in a search for a _start_itunes_title() method (and there will probably be a corresponding _end_itunes_title() method for the closing tag). If no such method is found, then _unknown_starttag() or _unknown_endtag() will be called instead.
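
The lookup described above can be sketched in a few lines. This is a toy illustration rather than feedparser's actual code: the `MiniParser` class, its `handle_start()` method, and the `events` list are names invented for the example.

```python
class MiniParser(object):
    """Toy stand-in for feedparser's element dispatch (not the real code)."""

    def __init__(self):
        self.events = []

    def handle_start(self, tag):
        # Turn 'itunes:title' into the method name '_start_itunes_title'
        method_name = '_start_' + tag.replace(':', '_')
        method = getattr(self, method_name, None)
        if method is None:
            # No dedicated handler exists, so use the fallback
            return self._unknown_starttag(tag)
        return method()

    def _start_itunes_title(self):
        self.events.append('itunes title handler')

    def _unknown_starttag(self, tag):
        self.events.append('unknown: ' + tag)


parser = MiniParser()
parser.handle_start('itunes:title')   # a dedicated handler exists
parser.handle_start('georss:point')   # no handler, falls through
print(parser.events)
# ['itunes title handler', 'unknown: georss:point']
```

Anything without a dedicated `_start_*` method ends up in the fallback, which is why unsupported namespaces currently get no special treatment.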

The problem in my opinion is that it's infeasible to add support for all of those namespaces, and so I've been pondering how the project could accommodate these developers' needs without adding volumes of code. Originally I had been thinking that I could write support for .ini configuration files that could educate the _unknown_starttag() and _unknown_endtag() methods.

The trouble is that this solution feels complicated, and complicated isn't my favorite kind of solution. It requires developers to learn a new syntax, it requires a new kind of unit test methodology in feedparser (and God knows I'm already dreading fixing feedparser's unit tests), and it will likely be too brittle to actually meet developers' needs in the first place.

Suddenly it occurred to me that perhaps it's possible to modify the _FeedParserMixin class dynamically. What if developers could just code the additional namespace support in a Python class, and then make the _FeedParserMixin class inherit that support at runtime?

After some quick searches I tried the following code as a proof-of-concept, and it seems to work like a charm (I've used actual feedparser names to illustrate the point):

class _FeedParserMixin(object):
    def where(self):
        # print() with a single string works in both Python 2 and 3
        print("in _FeedParserMixin")

# Classes registered by developers to extend the parser
additional_support = []

def register_additional_support(cls):
    # cls is a class containing additional functions named in the form
    # `_start_$namespace_$element` and `_end_$namespace_$element`
    additional_support.append(cls)

def parse():
    # parse() is the main public function in feedparser, but here
    # it just returns an instance of a class that inherits from
    # `_FeedParserMixin` and the other registered base classes
    bases = tuple([_FeedParserMixin] + additional_support)
    return type('BlendedMixin', bases, {})()

Now then, a developer comes along and says "Gee, I like the where() function, but I wish I had support for how(), too!" So the developer creates the following class in a separate file:

class SupportHow(object):
    def how(self):
        print("using magic")

Then, when the developer wants to add support for how(), it's easy to register that additional support with feedparser:

import feedparser
import extra_support

feedparser.register_additional_support(extra_support.SupportHow)
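
To tie registration back to namespaces, here is a self-contained sketch in which a registered class supplies a `_start_georss_point()` handler. It reuses the names from the proof-of-concept, but `make_parser()`, `start()`, `seen`, and `GeoRSSSupport` are hypothetical names made up for this illustration and are not part of feedparser.

```python
class _FeedParserMixin(object):
    def __init__(self):
        self.seen = []

    def start(self, tag):
        # Hypothetical dispatcher: look up '_start_<namespace>_<element>'
        handler = getattr(self, '_start_' + tag.replace(':', '_'), None)
        if handler is None:
            self.seen.append('unknown ' + tag)
        else:
            handler()

additional_support = []

def register_additional_support(cls):
    additional_support.append(cls)

def make_parser():
    # Build the blended class at call time so that later
    # registrations are picked up by new parser instances
    bases = tuple([_FeedParserMixin] + additional_support)
    return type('BlendedMixin', bases, {})()

class GeoRSSSupport(object):
    # Hypothetical extension class written by a third-party developer
    def _start_georss_point(self):
        self.seen.append('georss point')

before = make_parser()
before.start('georss:point')
print(before.seen)   # ['unknown georss:point']

register_additional_support(GeoRSSSupport)
after = make_parser()
after.start('georss:point')
print(after.seen)    # ['georss point']
```

Only instances created after registration see the new handler, because the blended class is built each time a parser is requested.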

So here's a complete example that shows how this looks at an interactive prompt:

>>> import feedparser
>>> i = feedparser.parse()
>>> i.where()
in _FeedParserMixin
>>> i.how()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'BlendedMixin' object has no attribute 'how'
>>> import extra_support
>>> feedparser.register_additional_support(extra_support.SupportHow)
>>> i = feedparser.parse()
>>> i.where()
in _FeedParserMixin
>>> i.how()
using magic

Obviously this isn't a wholly accurate example, since it claims to be using magic! The truth is that it relies on Python's type() function, which, when called with three arguments, is "essentially a dynamic form of the class statement". Happily, this approach works in both Python 2.4 and Python 3.1, which are the two extremes of feedparser's supported Python versions.
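
The equivalence between type() and the class statement is easy to check. The sketch below uses made-up classes (`Base` and `Extra`) and also shows one consequence of the base class order: whichever class is listed first wins when two bases define the same method.

```python
class Base(object):
    def where(self):
        return "in Base"

class Extra(object):
    def where(self):
        return "in Extra"

    def how(self):
        return "using Extra"

# Static form: an ordinary class statement
class StaticBlend(Base, Extra):
    pass

# Dynamic form: the same class built at runtime with type()
DynamicBlend = type('DynamicBlend', (Base, Extra), {})

static_mro = [c.__name__ for c in StaticBlend.__mro__]
dynamic_mro = [c.__name__ for c in DynamicBlend.__mro__]
print(static_mro)    # ['StaticBlend', 'Base', 'Extra', 'object']
print(dynamic_mro)   # ['DynamicBlend', 'Base', 'Extra', 'object']

blend = DynamicBlend()
print(blend.where())  # in Base (Base is listed first, so it wins)
print(blend.how())    # using Extra
```

In the proof-of-concept, _FeedParserMixin is listed first, so a registered class can add new handlers but cannot override existing ones. Reversing the order would allow overrides, which is a design decision rather than a technical limitation.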

I hope this proves useful to someone out there. Whether anything like this ever goes into feedparser isn't up to me. Although feedparser already has a similar way to add support for different date and time formats, this is just a speculative option that seems to balance the maintainability and readability of the core feedparser code against the needs of other developers. Plus, researching and writing this has been an informative exercise, which I'm always a fan of.