It's been five years since feedparser 4.1 was released, and a lot of work has gone into feedparser since then.
- HTML5, XHTML, SVG, MathML, and CSS can now be sanitized. Over the years Sam Ruby has added a lot of code that whitelists elements and attributes known to be safe. By default, feedparser runs content through this gauntlet so that developers can concentrate more on creating great applications and less on safeguarding against malicious content and poor XML escaping.
- Ill-formed XML is now interpreted better. Ill-formed XML forces feedparser to guess what the feed author intended, and it now makes more educated guesses.
- Microformats are now supported. Two days after version 4.1 was released, Mark Pilgrim added microformat support so that authors can designate tags and enclosures within the content itself. Even if an aggregator strips the Atom or RSS elements containing the original tags and enclosures, feedparser can still recreate them! XFN and hCards are also supported.
- Python 3 is now supported. Feedparser 4.1 supported Python 2.1 and up; feedparser 5.0 supports Python 2.4 through 3.1, and support for new releases in the Python 3 line will be added as they become available. Note that using feedparser under Python 3 currently requires installing a ported version of sgmllib, which was removed from the standard library in the 3.x releases.
- HTTP response headers can now be passed to parse(). If you're doing heavy-duty feed parsing, you should probably use a heavy-duty HTTP client library. Happily, all of the HTTP headers that feedparser uses for content decoding, decompression, and relative URL resolution can be passed in after your HTTP client library has requested the URL. Be sure to thank Joshua Bronson for this feature.
- HTTP request headers can also be passed to parse(). If you're relying on feedparser to make your HTTP requests, you can pass in a dictionary of HTTP headers to send with the request. This is useful for negotiating with caching servers and for sending cookies, and you should thank Martin Pool for this feature!
In addition, almost every reported bug was fixed for this release. Yes, that includes the big one you really wanted fixed. You're welcome.
Feedparser 5.0 is available for immediate download in three formats. The packages now include the unit tests, so after extracting the files you can run feedparsertest.py and make sure everything's running correctly.
The microformat support currently requires BeautifulSoup, but note that only the 3.0.x series and version 3.2.0 have been tested. The BeautifulSoup author considers the 3.1 releases a "failed experiment", so feedparser hasn't been tested with them at all.
Feedparser 5.0 is not currently available on feedparser.org or PyPI, but Mark has been notified about the problem, and hopefully it will be resolved soon.
We need your help telling people about feedparser 5.0! If you're a feedparser user, please consider posting the news to your blog, your favorite news website, or your social network of choice. The one thing to keep in mind is that feedparser.org has not yet been updated, so please only link to the Google Code project:
If you're a package maintainer for a Linux distribution and there's something we can do to expedite getting a new package uploaded, please let us know.
If you get bitten by a bug in feedparser, we'd like to know about it. Just visit the issue tracker, do a quick search to make sure it hasn't already been reported, and if it hasn't, file a new report. The best bug reports include, as an attachment, the simplest sample feed that triggers the bug. Well, that or a link to a publicly accessible feed.
Again, this release is chock full of exciting features and bug fixes, but the project relies on people like you to report bugs, write patches, and suggest big new features, so join the mailing list and help us get started on a 5.1 release!