Named HTML Entities in RSS

A feed that wouldn't load in Sage alerted me to how character references should be used in RSS feeds.

Despite the feed validating with feedvalidator.org and the W3C feed validator, Firefox's XML parser failed on it, as did Sage. Firefox reported an undefined entity in the RSS: &hellip;, a horizontal ellipsis. I wasn't sure why that would break the feed, so I went digging a little…

Naughty names

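RSS feeds generally don't declare a DTD (document type definition), and XML itself only predefines five named entities: &amp;, &lt;, &gt;, &quot; and &apos;. So as far as a feed reader's XML parser is concerned, a named entity such as &hellip; is undefined, and a conforming parser has to stop with a fatal error, just as Firefox did. &hellip; is fine in an XHTML document because its doctype references the XHTML DTD, which declares the HTML named entities.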
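To see the failure for yourself, here's a minimal sketch using Python's built-in ElementTree parser (which sits on top of expat). The two cut-down RSS snippets are made up for illustration; the only difference between them is &hellip; versus its numeric equivalent, &#8230;.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down RSS documents with no DOCTYPE, so the parser
# only knows XML's five predefined named entities.
NAMED = "<rss><channel><title>Wait&hellip;</title></channel></rss>"
NUMERIC = "<rss><channel><title>Wait&#8230;</title></channel></rss>"


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        print("ParseError:", err)  # an "undefined entity" error
        return False


print(parses(NAMED))    # False: &hellip; was never declared
print(parses(NUMERIC))  # True: numeric references need no declaration

# The numeric reference is decoded to the real ellipsis character (U+2026).
title = ET.fromstring(NUMERIC).find("channel/title").text
print(title == "Wait\u2026")  # True
```

The same snippet with &#8230; sails through, which is the whole point of the next bit.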

So if you use data from a content management system to generate RSS, you need to ensure that all named entities are converted into numeric character references. Edit: And as Robert Wellock points out in the comments below, it’s advisable to stick to using numeric character references when using XHTML anyway.
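If your CMS output is full of named entities, the conversion can be scripted. This is a minimal sketch, assuming the text arrives as a string containing HTML entity names; the function name `named_to_numeric` is hypothetical, and it leans on Python's `html.entities.name2codepoint`, which covers the HTML 4 entity names. It leaves XML's five predefined entities alone, and also anything it doesn't recognise, rather than guessing.

```python
import re
from html.entities import name2codepoint

# XML predefines these, so they're safe to keep as named entities.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity such as &hellip; and captures the name.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace HTML named entities with decimal numeric character references.

    XML's predefined entities and unknown names are left untouched.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_RE.sub(replace, text)


print(named_to_numeric("Wait&hellip; fish &amp; chips &ndash; &mdash; &bogus;"))
# Wait&#8230; fish &amp; chips &#8211; &#8212; &bogus;
```

Running something like this over item titles and descriptions just before the feed is written out keeps the fix in one place, rather than relying on every author to avoid named entities.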

Numeric character references

As far as I know, numeric character references are better supported than named entities; in XML they always work, because they don't depend on any declaration. I tend to use numeric character references anyway when I code, as I'm sad and have a bunch of the numbers committed to memory after years of use, which is a scary thought!

Having said that, I do have Dave Child's HTML Character Entities Cheat Sheet stuck up in my office [1], with a few I've added myself, including:

  • horizontal ellipsis (…)
  • en dash and em dash (– and —)
  • left and right double quote (“ and ”)
  • left and right single quote (‘ and ’)
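If you haven't got the numbers memorised, there's no need to look them up: a decimal numeric character reference is just the character's Unicode code point. A small sketch, assuming Python 3, that prints the references for the characters in the list above alongside their HTML named equivalents (taken from `html.entities.codepoint2name`):

```python
import unicodedata
from html.entities import codepoint2name

# The characters from the list above.
CHARS = ["\u2026", "\u2013", "\u2014", "\u201c", "\u201d", "\u2018", "\u2019"]


def numeric_ref(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


for ch in CHARS:
    named = f"&{codepoint2name[ord(ch)]};"
    # Print the Unicode name rather than the character itself, so the
    # output stays ASCII-only whatever the terminal encoding.
    print(f"{unicodedata.name(ch):<28} {numeric_ref(ch):<8} {named}")
```

For text rather than single characters, `s.encode("ascii", "xmlcharrefreplace")` does the same job in one go, turning every non-ASCII character into a decimal reference.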

Of course, the real geeks among us will look things up in the full list of character entity references in the HTML 4 specification!

Other Resources

More useful info at:

Footnotes

  1. Dave, I can't really afford anything for you off Amazon at the moment, but I certainly owe you a drink or two by now!