RSS standard lameness

Submitted by: etlgfx

RSS (most commonly expanded as "Really Simple Syndication") is a family of web feed formats used to publish frequently updated works—such as blog entries, news headlines, audio, and video—in a standardized format.

So I've been working on fixing up my RSS code so that my blog can automatically post to Facebook, which led me to notice a lot of stupidity in the RSS specs.

First -- and most nitpicky -- of all, what the hell kind of standard keeps changing its own name? RSS has stood for:

  • RDF Site Summary

  • Rich Site Summary

  • Really Simple Syndication

Second, apparently for an RSS feed to be valid, it needs to declare the Atom namespace (Atom being an alternative news feed format) and include an element in that namespace containing a URL that references the feed itself. I should add that this is not part of the specification, BUT it is required to pass the w3.org validator test. While I can see a use for this, the actual RSS specification already requires a link to the website containing the feed, so it seems a bit redundant for a validator to 'require' it.
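For reference, here is a minimal Python sketch of what that self-referencing element looks like in practice. The URLs, the feed title, and the `build_feed` helper are made up for illustration; the only parts taken from the real formats are the Atom namespace URI and the `rel="self"` attribute. Note that the RSS `<link>` points at the website while the Atom link points at the feed document itself.

```python
import xml.etree.ElementTree as ET

# The Atom namespace URI; registering it makes the serializer use the "atom:" prefix.
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def build_feed(site_url, feed_url):
    """Build a bare RSS 2.0 channel with an atom:link pointing back at the feed."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")

    # The three channel elements the RSS 2.0 spec itself requires.
    ET.SubElement(channel, "title").text = "Example blog"  # placeholder title
    ET.SubElement(channel, "link").text = site_url          # the website
    ET.SubElement(channel, "description").text = "Placeholder feed"

    # The extra element the validator asks for: the feed's own URL.
    ET.SubElement(
        channel,
        f"{{{ATOM_NS}}}link",
        {"href": feed_url, "rel": "self", "type": "application/rss+xml"},
    )
    return ET.tostring(rss, encoding="unicode")


# Hypothetical URLs, purely for illustration.
feed_xml = build_feed("http://example.com/", "http://example.com/feed.xml")
print(feed_xml)
```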

Third, the specification encourages you to put encoded HTML inside the RSS XML data, and apparently the user agent should render that as regular HTML. So basically you can put whatever XML you want inside the XML document; it just needs to be escaped.

There are so many things wrong with this. How does a parser ensure that the escaped XML is well formed? I can pass in a bunch of random tags that don't match each other. How does a parser ensure the escaped XML is valid? There's no way you can validate the escaped content with any kind of DTD (well, to be fair you can, but I'm pretty sure no RSS reader does). And besides all the logical reasons: it's ugly!
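To make the complaint concrete, here is a rough Python sketch. The feed snippet and the `is_well_formed` helper are invented for the example. The outer document parses without complaint even though the escaped HTML inside it has mismatched tags, because to the XML parser the payload is just a string. Checking it takes a second, separate parse that nothing in the format obliges a reader to do.

```python
import xml.etree.ElementTree as ET

# A made-up feed whose description holds escaped HTML with a <b> that never closes.
feed = """<rss version="2.0"><channel><item>
<description>&lt;p&gt;unclosed &lt;b&gt;bold&lt;/p&gt;</description>
</item></channel></rss>"""

# The outer document is well formed, so this succeeds.
root = ET.fromstring(feed)

# After unescaping, the payload is plain text as far as the parser is concerned.
payload = root.find("./channel/item/description").text
print(payload)


def is_well_formed(fragment):
    """Hypothetical helper: re-parse the unescaped payload as XML in its own right."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


# Only this second pass notices the mismatched tags.
print(is_well_formed(payload))
```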

Fourth: Is the validator created / maintained by the same people who wrote the specification? Doesn't look like it to me, but I'd love to be proven wrong.

1:26p Feb 10th, '10

what is a RSS

2:41p Feb 14th, '10

The history of RSS is long, storied and filled with infighting. It would make for good TV if not for the fact that it consists almost entirely of people sitting at computers.

From what little I remember of the genesis of Atom, from around when I still cared about what Dave Winer had to say:

1. People kept forking RSS, hence the name changes. RSS 0.9x was a Netscape / UserLand Software thing (depending on the version), and for 1.x the RDF people got their hands on it, turning it into an unholy mess of namespaces (I'm not a huge fan of XML and the W3C approach to it - wasn't a major advantage of XML supposed to be human readability? Explain XQuery in that frame of reference). RSS 1.x also broke compatibility with RSS 0.9x, which RSS 2.x maintains.

At any rate, according to the latest version of the RSS spec (http://www.rssboard.org/rss-specification) it's Really Simple Syndication now.

2. The Atom namespace stuff (and this goes to answer #4 as well) is simply because the validator, IIRC, was written by people on the Atom project who in general don't like Dave Winer, and opted to use their own namespace rather than the features built into RSS 2.

This is fairly typical, from what I recall, of the kind of shit that went down during the development of Atom and the controversy over the RSS spec lo those many years ago. Some big egos on either side, and whenever egos fight, people act like dicks. Of course, when everyone acts like dicks, everyone gets screwed.

3. That was one of the issues surrounding the RSS 2 controversy that sparked the creation of Atom. I think the best practice here is to wrap the content in CDATA (so long, human readability!). Other major issues included date formats (since resolved, I think?) and that the RSS spec didn't specify character encoding.

4. As mentioned earlier: Not the same people.
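To illustrate the CDATA suggestion in point 3, here is a rough Python sketch (the sample strings are made up). An XML parser hands back the same text whether the HTML is entity-escaped or wrapped in CDATA, so CDATA only changes how the feed source reads, not what the feed reader receives.

```python
import xml.etree.ElementTree as ET

# The same HTML payload, written two ways (both strings invented for the example).
escaped = "<description>&lt;p&gt;Hello &amp;amp; welcome&lt;/p&gt;</description>"
cdata = "<description><![CDATA[<p>Hello &amp; welcome</p>]]></description>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# Both parse to the identical string of HTML.
print(escaped_text)
print(escaped_text == cdata_text)
```

Either way the payload is still an opaque string to the XML parser, so CDATA doesn't fix the validation problem; it just saves you from double-escaping.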

This is all from (hazy) memory, so take it with a grain of salt or ten.