I misspoke, apparently: the Unicode problem in the first tool stemmed from something odd going on in the xmlrpclib.Binary.decode() function. Extracting the raw UTF-8 data and decoding that myself gets me the data I expect. New problem: some of my entries are not fully HTML. The paragraphs are not wrapped in <p> tags, so converting to Markdown produces one massive blob of text, because html2text discards the newlines. I have to add the HTML tags before converting to Markdown.
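
A minimal sketch of that tag-adding step, not necessarily what the final tool does. It assumes paragraphs in the stored entries are separated by blank lines, and that a chunk already starting with a block-level tag should be left alone. The name add_paragraph_tags and the list of block tags are hypothetical, chosen only for illustration.

```python
import re

# Assumption: a chunk that already starts with one of these tags is a
# block of its own and should not be wrapped a second time.
BLOCK_TAG = re.compile(r"<(p|div|ul|ol|pre|blockquote|table|h[1-6])\b", re.IGNORECASE)


def add_paragraph_tags(text):
    """Wrap blank-line-separated chunks of text in <p> tags."""
    # Assumption: one or more blank lines mark a paragraph boundary.
    chunks = re.split(r"\n\s*\n", text.strip())
    wrapped = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        if BLOCK_TAG.match(chunk):
            # Already block-level HTML, keep it untouched.
            wrapped.append(chunk)
        else:
            wrapped.append("<p>" + chunk + "</p>")
    return "\n\n".join(wrapped)
```

The result could then be handed to html2text.html2text() in place of the raw entry body, so each paragraph survives as its own block in the Markdown output. If the entries turn out to use single newlines between paragraphs, the split pattern would need to change.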