When should one use HTML entities?


Html Problem Overview


This has been confusing me for some time. With the advent of UTF-8 as the de facto standard in web development, I'm not sure in which situations I should use HTML entities and in which I should just use the UTF-8 character. For example:

  • em dash (—, &mdash;)
  • ampersand (&, &amp;)
  • 3/4 fraction (¾, &frac34;)

Please do shed light on this issue. It will be appreciated.

Html Solutions


Solution 1 - Html

Based on the comments I have received, I looked into this a little further. It seems that currently the best practice is to forgo using HTML entities and use the actual UTF-8 character instead. The reasons listed are as follows:

  1. UTF-8 encodings are easier to read and edit for those who understand what the character means and know how to type it.
  2. UTF-8 encodings are just as unintelligible as HTML entity encodings for those who don't understand them, but they have the advantage of rendering as special characters rather than as hard-to-understand decimal or hex encodings.

As long as your page's encoding is properly set to UTF-8, you should use the actual character instead of an HTML entity. I read several documents about this topic, but the most helpful were:

From the UTF-8: The Secret of Character Encoding article:

> Wikipedia is a great case study for an application that originally used ISO-8859-1 but switched to UTF-8 when it became far too cumbersome to support foreign languages. Bots will now actually go through articles and convert character entities to their corresponding real characters for the sake of user-friendliness and searchability.

That article also gives a nice example involving Chinese encoding. Here is the abbreviated example for the sake of laziness:

UTF-8:

這兩個字是甚麼意思

HTML Entities:

&#36889;&#20841;&#20491;&#23383;&#26159;&#29978;&#40636;&#24847;&#24605;

The UTF-8 and HTML entity encodings are both meaningless to me, but at least the UTF-8 encoding is recognizable as a foreign language, and it will render properly in an edit box. The article goes on to say the following about the HTML entity-encoded version:

> Extremely inconvenient for those of us who actually know what character entities are, totally unintelligible to poor users who don't! Even the slightly more user-friendly, "intelligible" character entities like &theta; will leave users who are uninterested in learning HTML scratching their heads. On the other hand, if they see θ in an edit box, they'll know that it's a special character, and treat it accordingly, even if they don't know how to write that character themselves.
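
To make the relationship between the two forms concrete, here is a minimal Python sketch using only the standard library `html` module. The variable names are illustrative and not from the article; the point is only that the numeric references and the real characters carry the same text.

```python
import html

text = "這兩個字是甚麼意思"

# Build a decimal numeric character reference for every character.
as_entities = "".join(f"&#{ord(ch)};" for ch in text)

# html.unescape turns the references back into the real characters.
round_trip = html.unescape(as_entities)

print(as_entities)
print(round_trip == text)
```

This is roughly the kind of conversion a clean-up bot would perform, although the actual Wikipedia tooling is not shown in the source.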

As others have noted, you still have to use HTML entities for reserved XML characters (ampersand, less-than, greater-than).
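
As a small illustration of that last point, the sketch below (Python, standard library only; the sample string is made up) escapes only the reserved characters and leaves every other UTF-8 character alone.

```python
import html

raw = "AT&T <b> ¾"

# html.escape replaces &, < and > (and quotes, unless quote=False);
# other characters such as ¾ pass through unchanged.
escaped = html.escape(raw, quote=False)

print(escaped)  # AT&amp;T &lt;b&gt; ¾
```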

Solution 2 - Html

You don't generally need to use HTML character entities if your editor supports Unicode. Entities can be useful when:

  • Your keyboard does not support the character you need to type. For example, many keyboards do not have em-dash or the copyright symbol.
  • Your editor does not support Unicode (very common some years ago, but probably not today).
  • You want to make it explicit in the source what is happening. For example, the &nbsp; code is clearer than the corresponding white space character.
  • You need to escape HTML special characters like <, &, or ".
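
A quick way to see why an explicit entity can be clearer than the character it stands for: in the Python sketch below (standard library only, names are illustrative), `&nbsp;` decodes to U+00A0, which looks like an ordinary space in most editors but is a different character.

```python
import html

nbsp = html.unescape("&nbsp;")
copyright_sign = html.unescape("&copy;")

# The decoded character is U+00A0; it renders like a space but is not one.
is_plain_space = nbsp == " "
is_no_break_space = nbsp == "\u00a0"

print(repr(nbsp), repr(copyright_sign), is_plain_space, is_no_break_space)
```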

Solution 3 - Html

Entities may buy you some compatibility with brain-dead clients that don't understand encodings correctly. I don't believe that includes any current browsers, but you never know what other kinds of programs might be hitting you up.

More useful, though, is that HTML entities protect you from your own errors: if you misconfigure something on the server and you end up serving a page with an HTTP header that says it's ISO-8859-1 and a META tag that says it's UTF-8, at least your &mdash;es will always work.
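
The failure mode described here can be reproduced in a few lines of Python. This is only a sketch: it assumes the client decodes the page as windows-1252 (which is how browsers treat an ISO-8859-1 label), and the sample strings are made up.

```python
import html

literal_page = "a \u2014 b"   # em dash typed as a real character
entity_page = "a &mdash; b"   # same text written with an entity

# Serve both as UTF-8 bytes, then decode them with the wrong charset.
literal_misread = literal_page.encode("utf-8").decode("cp1252")
entity_misread = entity_page.encode("utf-8").decode("cp1252")

# The literal dash has turned into three junk characters. The entity is
# untouched because it consists only of ASCII bytes.
literal_survived = literal_misread == literal_page
entity_survived = html.unescape(entity_misread) == literal_page

print(repr(literal_misread), literal_survived, entity_survived)
```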

Solution 4 - Html

I would not use UTF-8 for characters that are easily confused visually. For example, it is difficult to distinguish an em dash from a minus, or especially a non-breaking space from a space. For these characters, definitely use entities.

For characters that are easily understood visually (such as the Chinese examples above), go ahead and use UTF-8 if you like.
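
One way to apply this policy mechanically is sketched below in Python. The `CONFUSABLE` table and the function name are hypothetical and only cover a handful of characters; a real project would choose its own list.

```python
# Hypothetical table of characters that are hard to tell apart in source.
CONFUSABLE = {
    "\u2014": "&mdash;",  # em dash
    "\u2013": "&ndash;",  # en dash
    "\u2212": "&minus;",  # minus sign
    "\u00a0": "&nbsp;",   # non-breaking space
}

def mark_confusables(source: str) -> str:
    """Replace only the confusable characters; leave everything else as UTF-8."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in source)

print(mark_confusables("10\u00a0km \u2014 是"))  # 10&nbsp;km &mdash; 是
```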

Solution 5 - Html

Personally, I have done everything in UTF-8 for a long time. However, in an HTML page you always need to convert ampersand (&), greater-than (>), and less-than (<) characters to their equivalent entities: &amp;, &gt;, and &lt;.

Also, if you intend to do some programming with UTF-8 text, there are a few things to watch for:
  • XML needs some extra lines to validate when using named entities, because it predefines only &amp;, &lt;, &gt;, &apos;, and &quot;.
  • Some libraries do not play nicely with UTF-8. For instance, PHP in some Linux distributions dropped full support for UTF-8 in their regular expression libraries.
  • It is harder to limit the number of characters in a text that uses HTML entities, because a single entity uses many characters. There is also always the risk of cutting an entity in half.
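
The last point can be illustrated with a short Python sketch. The helper name `truncate_escaped` is hypothetical; the idea is simply to decode first, cut on real characters, and re-escape afterwards.

```python
import html

def truncate_escaped(escaped: str, limit: int) -> str:
    """Truncate entity-encoded text to `limit` visible characters.

    Entities other than &amp;, &lt; and &gt; come back as literal
    characters, so the output should be served as UTF-8.
    """
    visible = html.unescape(escaped)
    return html.escape(visible[:limit], quote=False)

snippet = "Fish &amp; chips"

naive = snippet[:8]                   # cuts the entity in half
safe = truncate_escaped(snippet, 8)   # counts the ampersand as one character

print(naive)  # Fish &am
print(safe)   # Fish &amp; c
```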

Solution 6 - Html

HTML entities are useful when you want to generate content that is going to be included (dynamically) into pages with (several) different encodings. For example, we have white label content that is included both into ISO-8859-1 and UTF-8 encoded web pages...

If character set conversion from/to UTF-8 weren't such a big unreliable mess (you always stumble over some characters and some tools that don't convert properly), standardizing on UTF-8 would be the way to go.
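
As a rough sketch of the white-label approach in Python: the `xmlcharrefreplace` error handler turns every non-ASCII character into a numeric reference, so the resulting bytes read the same under either page encoding. The sample fragment is made up, and reserved characters such as & and < would still need to be escaped separately.

```python
import html

fragment = "Caf\u00e9 \u2014 \u00be"

# Non-ASCII characters become numeric character references; the result
# is pure ASCII and therefore identical in ISO-8859-1 and UTF-8.
ascii_safe = fragment.encode("ascii", "xmlcharrefreplace")

in_latin1_page = ascii_safe.decode("iso-8859-1")
in_utf8_page = ascii_safe.decode("utf-8")

print(ascii_safe)  # b'Caf&#233; &#8212; &#190;'
print(html.unescape(in_latin1_page) == html.unescape(in_utf8_page) == fragment)
```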

Solution 7 - Html

If your pages are correctly encoded in UTF-8, you should have no need for HTML entities; just use the characters you want directly.

Solution 8 - Html

All of the previous answers make sense to me.

In addition: it mostly depends on the editor you intend to use and the document language. The minimum requirement for the editor is that it supports the document language. That means that if your text is in Japanese, beware of using an editor that cannot display the characters (i.e. no entities for the document text itself). If it's English, you can even use an old vim-like editor and use entities only for the relatively rare &copy; and friends. Of course, &gt; for > and the other HTML special characters still need escaping. But even with the other Latin-1 languages (German, French, etc.), writing &auml; is a pain in you know where...

In addition, I personally write entities for invisible characters and for those that look similar to standard ASCII characters and are therefore easily confused. For example, there is u1173 (which looks like a dash in some charsets) or u1175, which looks like the vertical bar. I'd use entities for those in any case.
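
A small Python sketch of this habit follows. It assumes that "u1173" and "u1175" in the answer refer to the code points U+1173 and U+1175, and the helper names are hypothetical. It writes non-ASCII characters as hexadecimal references and uses `unicodedata` to show what a suspicious character really is.

```python
import unicodedata

def to_hex_reference(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"

def describe(ch: str) -> str:
    """Pair the reference with the Unicode name, to help spot look-alikes."""
    return f"{to_hex_reference(ch)} ({unicodedata.name(ch, 'UNNAMED')})"

def mark_non_ascii(source: str) -> str:
    """Replace every non-ASCII character with its numeric reference."""
    return "".join(ch if ord(ch) < 128 else to_hex_reference(ch) for ch in source)

print(describe("\u1173"))
print(mark_non_ascii("a\u1175b"))  # a&#x1175;b
```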

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | allesklar | View Question on Stackoverflow
Solution 1 - Html | William Brendel | View Answer on Stackoverflow
Solution 2 - Html | JacquesB | View Answer on Stackoverflow
Solution 3 - Html | Jim Puls | View Answer on Stackoverflow
Solution 4 - Html | Ned Batchelder | View Answer on Stackoverflow
Solution 5 - Html | Marco Luglio | View Answer on Stackoverflow
Solution 6 - Html | mjy | View Answer on Stackoverflow
Solution 7 - Html | Otávio Décio | View Answer on Stackoverflow
Solution 8 - Html | blabla999 | View Answer on Stackoverflow