Kurt McKee

lessons learned in production

Hey there! This article was written in 2012.

It might not have aged well for any number of reasons, so keep that in mind when reading (or clicking outgoing links!).

Announcing feedparser 5.1.2

Posted 3 May 2012 in feedparser and release

Howdy everybody,

I'm pleased to announce that feedparser 5.1.2 is available for immediate download, and I want to stress the word "immediate"! This is a security release, and all users and developers are strongly encouraged to upgrade immediately.

Security fix

Yesterday, while working on a character encoding bug, I noticed that it was possible to slip XML ENTITY declarations past feedparser's filter by encoding the document in a non-ASCII-compatible character encoding. ENTITY declarations are like variables or placeholders in XML. When an XML parser finds an entity reference (like &Aacute;) it substitutes whatever text corresponds to the entity (like Á). The problem is that entities can reference other entities, which can lead to exponential memory consumption, even in small documents.

As an example, the following series of ENTITY declarations can demonstrate the problem:

<!ENTITY exponential1 "long text to be repeated">
<!ENTITY exponential2 "&exponential1;&exponential1;">
<!ENTITY exponential3 "&exponential2;&exponential2;">
...

Each additional entity doubles the number of times the text of 'exponential1' has to be repeated, so entity number n expands to 2^(n-1) copies. This can be used as an attack vector, so feedparser has a filter in place to strip dangerous ENTITY declarations from documents before parsing them. Unfortunately, the filter was being run before the character decoding code, which let ENTITY declarations through if the document's encoding was not ASCII-compatible.
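
To make the doubling and the ordering problem concrete, here is a minimal Python sketch. It is not feedparser's actual code: strip_entities, its regular expression, and the sample document are hypothetical stand-ins that only illustrate why a byte-level filter has to run after decoding.

```python
import re


def expanded_length(base_text: str, level: int) -> int:
    """Length of entity number `level` once fully expanded, assuming each
    entity references the previous one twice (as in the example above)."""
    return len(base_text) * 2 ** (level - 1)


# Hypothetical byte-level filter (an assumption, not feedparser's real one).
ENTITY_PATTERN = re.compile(rb"<!ENTITY[^>]*>")


def strip_entities(raw: bytes) -> bytes:
    """Remove anything that looks like an ENTITY declaration in ASCII bytes."""
    return ENTITY_PATTERN.sub(b"", raw)


document = (
    '<!DOCTYPE feed [<!ENTITY exponential1 "long text to be repeated">]>'
    "<feed/>"
)

# In an ASCII-compatible encoding the filter sees the declaration.
ascii_bytes = document.encode("utf-8")
filtered_ascii = strip_entities(ascii_bytes)

# In UTF-16 every character is interleaved with extra bytes, so the ASCII
# byte pattern never matches and the declaration passes straight through.
utf16_bytes = document.encode("utf-16")
filtered_utf16 = strip_entities(utf16_bytes)

# Decoding first, then filtering, closes the gap.
decoded = utf16_bytes.decode("utf-16")
filtered_after_decoding = strip_entities(decoded.encode("utf-8"))

print(b"<!ENTITY" in filtered_ascii)
print("<!ENTITY" in filtered_utf16.decode("utf-16"))
print(b"<!ENTITY" in filtered_after_decoding)
print(expanded_length("long text to be repeated", 30))
```

The last line shows how quickly the expansion grows: thirty short declarations are enough to demand gigabytes of memory from a parser that expands them naively.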

This is now fixed, and you should upgrade immediately!

Other fixes

After the last release I received reports that the RFC822 date parser couldn't handle single-digit days. This has been fixed. feedparser can now also handle feeds that have been compressed with the deflate algorithm but that are missing headers and checksum data. It will also try to continue parsing if it encounters a decompression error that might be recoverable (which can happen if a feed claims to be compressed but is not).
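
For the curious, the deflate handling described above can be approximated with Python's standard zlib module. This is a sketch under my own assumptions, not the code that shipped in feedparser: the function name decompress_deflate and the exact fallback order are hypothetical.

```python
import zlib


def decompress_deflate(data: bytes) -> bytes:
    """Best-effort deflate decoding (hypothetical helper)."""
    try:
        # A well-formed zlib stream: 2-byte header plus Adler-32 checksum.
        return zlib.decompress(data)
    except zlib.error:
        pass
    try:
        # Raw deflate: negative wbits tells zlib to expect no header
        # and no trailing checksum.
        return zlib.decompress(data, -zlib.MAX_WBITS)
    except zlib.error:
        # The feed may have claimed to be compressed when it was not,
        # so hand the bytes back unchanged and let parsing continue.
        return data


feed = b"<rss version='2.0'><channel><title>Example</title></channel></rss>"

# Build a raw deflate stream with no header and no checksum.
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_deflate = compressor.compress(feed) + compressor.flush()

print(decompress_deflate(zlib.compress(feed)) == feed)  # normal zlib stream
print(decompress_deflate(raw_deflate) == feed)          # headerless stream
print(decompress_deflate(feed) == feed)                 # not compressed at all
```

Trying the strict form first keeps the checksum verification whenever the server sent a complete stream, and only falls back to the looser interpretations when that fails.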

Finally, magnet URIs are now considered safe, and there have been a few other minor fixes.

Go get it

As always, feedparser can be downloaded from PyPI or from Google Code:

http://pypi.python.org/pypi/feedparser
https://code.google.com/p/feedparser/downloads/list

Remember, this is a security release, so hurry and upgrade before it goes out of style!
