Baldur Bjarnason writes about Mark Pilgrim’s “Parsing RSS at all costs” article at XML.com:
Mark fails to say the obvious. All RSS is automatically generated by an application of some sort. The webloggers are end-users themselves of weblogging applications.
So we should lobby the application developers to automatically validate the RSS feed every single time it is generated and automatically fix the most common errors (unescaped ampersands).
This is a sensible thought. If weblogging tools enforced at least the well-formedness of their RSS output, a large part of the problem with invalid feeds would be solved. Enforcing the validity of the feeds themselves with respect to the specifications is a minor problem, because SAX-based approaches to parsing can cope with almost all incorrect uses.
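To make the idea concrete, here is a minimal sketch of the kind of check a weblogging tool could run each time it generates a feed: test for well-formedness and, if that fails, escape bare ampersands and test again. The function names and the regular expression are my own illustration, not taken from any existing tool, and the sketch assumes the feed is available as a Python string.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: matches an '&' that does not begin an entity
# reference (&name;) or a character reference (&#38; or &#x26;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def repair_feed(xml_text):
    """Return the feed unchanged if it is well-formed; otherwise escape
    bare ampersands and check again, raising if that was not enough."""
    if is_well_formed(xml_text):
        return xml_text
    repaired = BARE_AMP.sub("&amp;", xml_text)
    if not is_well_formed(repaired):
        raise ValueError("feed is still not well-formed after escaping ampersands")
    return repaired
```

This only covers the single most common error. The substitution is applied to the whole text, so in a feed that is broken for some other reason it would also rewrite ampersands inside CDATA sections. A real tool would want to escape text at the point where it is written into the feed rather than patch the output afterwards.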
But there are still some problems. Obviously, not all tool developers care about generating well-formed feeds. Some even have a history of providing feeds that are almost garbage, if feeds are provided at all. That’s the problem I see with any solution other than parsing at all costs: it may be harder to convince some providers to create tools that generate correct XML than to write liberal parsers. And even if those developers could be convinced, there is still a large number of deployed tools that won’t be upgraded and will continue to generate invalid feeds.
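On the consuming side, "parsing at all costs" can be sketched as a strict parse with a liberal fallback. The example below is only an illustration under stated assumptions: it handles RSS 2.0 style feeds with un-namespaced `item` and `title` elements, extracts nothing but item titles, and the name `item_titles` is hypothetical. It is not the approach described in Mark Pilgrim's article.

```python
import html
import re
import xml.etree.ElementTree as ET

# Crude patterns used only when the strict XML parse fails.
ITEM_RE = re.compile(r"<item\b[^>]*>(.*?)</item>", re.DOTALL | re.IGNORECASE)
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.DOTALL | re.IGNORECASE)


def item_titles(feed_text):
    """Return the title of every item, parsing strictly when possible
    and falling back to pattern matching for ill-formed feeds."""
    try:
        root = ET.fromstring(feed_text)
        return [(item.findtext("title") or "").strip() for item in root.iter("item")]
    except ET.ParseError:
        titles = []
        for item in ITEM_RE.finditer(feed_text):
            match = TITLE_RE.search(item.group(1))
            # html.unescape decodes references such as &amp; and leaves
            # bare ampersands untouched.
            titles.append(html.unescape(match.group(1)).strip() if match else "")
        return titles
```

The fallback recovers something useful from a feed with unescaped ampersands or a missing closing tag, at the price of guessing: it silently skips any item whose closing tag is missing, and it knows nothing about CDATA sections or namespaces.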
I don’t like this situation. As Dare Obasanjo said, it’s quite depressing. XML was supposed to bring a new age to information processing. But real life is always different, and now we must deal with those problems in a way that benefits users. After all, there is no point in creating technology that cannot add value to people’s lives.