Why does HTML require that multiple spaces show up as a single space in the browser?

Html, Formatting, Whitespace

Html Problem Overview


I have long recognized that any run of whitespace in an HTML file is displayed as a single space. For instance, this:

<p>Hello.        Hello. Hello. Hello.                       Hello.</p>

displays as:

Hello. Hello. Hello. Hello. Hello.

This is perfectly fine, since you can use the <pre> tag whenever you need multiple spaces in pre-formatted text. But what is the reason for it? More precisely, why is this in the HTML specification?

Html Solutions


Solution 1 - Html

Spaces are compacted in HTML because there's a distinction between how HTML is formatted and how it should be rendered. Consider a page like this:

<html>
	<body>
		<a href="mylink">A link</a>
	</body>
</html>

If whitespace were not collapsed and the HTML were indented using spaces, for example, the link would be preceded by several spaces on the rendered page.

Solution 2 - Html

As others have said, it's in the HTML specification.

If you want to preserve whitespace in output, you can use the <pre> tag:

<pre>This     text has              extra spaces

and

    newlines</pre>

But this will also generally display the text in a different font.
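
If the font change is unwanted, one option is to let the <pre> element inherit the font of the surrounding text. The following is only a minimal sketch, assuming an inline style is acceptable on the page:

```html
<!-- font-family: inherit makes <pre> use the font of its parent element
     instead of the browser's default monospace font -->
<pre style="font-family: inherit">This     text has              extra spaces
and keeps the surrounding font</pre>
```

The spaces and the line break are still preserved, because only the font is overridden here.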

Solution 3 - Html

To try to address the "why": it may be because HTML was based on SGML, which had specified it that way. SGML was in turn based on GML, from the late 1960s. The reason for the whitespace handling could very well be that data was entered one "card" at a time back then, which could result in undesired breakup of sentences and paragraphs. One difference in the old GML is that it specified that there have to be two spaces between sentences (like the old typewriter rules), which may have established a precedent that spaces are independent of the markup.

Solution 4 - Html

Not only is it in the specification, but there is some sense to it. If spaces weren't compacted, you would have to put all your HTML on a single line, so something like this:

<div>
    <h1>Title</h1>
    <p>
       This is some text
       <a href="#">Read More</a>
    </p>
</div>

would have some strange alignment, with spaces all over the place. The only way to get it right would be to compact that code, which would make it difficult to maintain.

Solution 5 - Html

"Why are multiple spaces converted to single spaces?"

First, "why" questions are hard to answer. It's in the spec. That's pretty much the end of it.

Consider that there are several kinds of white space.

  • White space between tags. <p>\n<b>hi</b>\n</p>

  • White space in the content within a tag. <p>Hi <i>everyone</i>.</p>

  • White space in a <pre> or CDATA section.

The first two are hard to distinguish. Whitespace between tags, even in XML, is "optional". But when you have what is called a "mixed content model" -- tags intermixed with content -- the subtlety of "between tags" and "in the content but between tags" and "in the content but not between tags" is impossible to sort out.

So they don't sort it out. Whitespace between tags and whitespace in the content is all optional.
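
The three kinds of white space listed above can be seen side by side in a small fragment. This is an illustrative sketch only; the element contents are made up for the example:

```html
<!-- 1. White space between tags: the line breaks and indentation here
        only separate block-level elements and produce no visible gaps -->
<div>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
</div>

<!-- 2. White space inside mixed content: the run of spaces before <i>
        is rendered as a single space -->
<p>Hi      <i>everyone</i>.</p>

<!-- 3. White space inside <pre>: every space and line break is kept -->
<pre>Hi      everyone.
    Indented line.</pre>
```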

Solution 6 - Html

If browsers did not do this, it could be difficult to format your HTML code to make it easily readable. For example, you might want to format your code like this:

<html>
<body>
    <div>
        I like to indent all content that is inside div tags.
    </div>
</body>
</html>

If the browser does not ignore the eight or so spaces before the text inside the div tag, your webpage might not look the way you intended it to look.

Solution 7 - Html

Usually, these design decisions are not documented in any specification and can only be gleaned from working group discussion archives that happen to be publicly accessible, or explained by the spec authors themselves. However, in this particular case, HTML 3.2 does state the following:

> Except within literal text (e.g. the PRE element), HTML treats contiguous sequences of white space characters as being equivalent to a single space character (ASCII decimal 32). These rules allow authors considerable flexibility when editing the marked-up text directly. Note that future revisions to HTML may allow for the interpretation of the horizontal tab character (ASCII decimal 9) with respect to a tab rule defined by an associated style sheet.

The behavior you see today is of course much more complicated than what was specified in HTML 3.2, but I believe the reasoning still applies. One example of where this flexibility can be useful is when you have a long paragraph that you intend to hard-wrap and indent:

<H1>Lorem ipsum</H1>
<P>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Fastidii oportere
   consulatu no quo. Vix saepe labores an, pri illud mentitum et, ex suas quas
   duo. Sit utinam volutpat ea, id vis cibo meis dolorum, eam docendi
   accommodare voluptatibus no. Id quaeque electram vim, ut sed singulis
   neglegentur, ne graece alterum has. Simul partiendo quaerendum et his.

If whitespace weren't collapsed, you would end up with a paragraph with unusually large gaps wherever the text is hard-wrapped, due to the indentation.

No other HTML specification suggests any sort of reasoning behind this design decision. In particular HTML 4 only describes the collapsing behavior, and HTML5 and the living spec both defer to CSS, which doesn't explain anything either. Earlier versions of HTML also do not contain any explanation, although the following excerpt does appear in an example snippet in HTML 2.0:

<OL>
...
  <UL COMPACT>
  ...
  <LI> Whitespace may be used to assist in reading the
       HTML source.
  </UL>
...
</OL>

Solution 8 - Html

It's in the HTML spec. It's the part about inter-word spaces being rendered as an ASCII space.

http://www.w3.org/TR/html401/struct/text.html

Solution 9 - Html

Simple, it's in the specification.

From the HTML specification, section 9.1:

> In particular, user agents should collapse input white space sequences when producing output inter-word space.

Solution 10 - Html

The HTML specification clearly states that excess whitespace should be ignored.

If you want to include extra spaces, use either the <pre> tag or &nbsp;
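
As a rough illustration of the &nbsp; approach (the text is borrowed from the question, and the number of entities is arbitrary):

```html
<!-- Ordinary spaces: the run collapses to one space when rendered -->
<p>Hello.     Hello.</p>

<!-- Non-breaking spaces are not collapsed, so this shows a wider gap.
     They also prevent a line break at that position. -->
<p>Hello.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Hello.</p>
```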

Solution 11 - Html

To answer "why is this in the specification for HTML?", you have to consider the origins of HTML.

Tim Berners-Lee designed HTML for sharing of scientific documents. He based it on pre-existing syntax ideas in SGML, which also has similar treatments of whitespace.

One can imagine that earlier writers of HTML at CERN did so without the aid of WYSIWYG tools, and so the ability to treat whitespace in this way aids legibility of such hand-written source files.

Solution 12 - Html

There's also a typographic answer: words and sentences should have only one space between them, regardless of what your typing teacher in school may have told you.

Use One Space Between Sentences

Use A Single Word Space Between Sentences

Solution 13 - Html

You can also use the CSS declaration white-space: pre; on a <div>, so you can keep your existing formatting and styling.

More about whitespace on https://developer.mozilla.org/fr/docs/Web/CSS/white-space
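
A minimal sketch of the CSS approach follows. The class name keep-spaces is hypothetical and chosen only for illustration:

```html
<style>
  /* "keep-spaces" is a hypothetical class name used for illustration */
  .keep-spaces {
    white-space: pre;  /* preserve spaces and line breaks, no wrapping */
  }
</style>

<div class="keep-spaces">Hello.        Hello.
    Hello.</div>
```

If long lines should still wrap at the edge of the container, white-space: pre-wrap preserves the spacing while allowing wrapping.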

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Rudd Zwolinski | View Question on Stackoverflow
Solution 1 - Html | tristan | View Answer on Stackoverflow
Solution 2 - Html | Zach Hirsch | View Answer on Stackoverflow
Solution 3 - Html | Turnkey | View Answer on Stackoverflow
Solution 4 - Html | enobrev | View Answer on Stackoverflow
Solution 5 - Html | S.Lott | View Answer on Stackoverflow
Solution 6 - Html | Michael | View Answer on Stackoverflow
Solution 7 - Html | BoltClock | View Answer on Stackoverflow
Solution 8 - Html | Chris Farmer | View Answer on Stackoverflow
Solution 9 - Html | casperOne | View Answer on Stackoverflow
Solution 10 - Html | TheTXI | View Answer on Stackoverflow
Solution 11 - Html | Paul Dixon | View Answer on Stackoverflow
Solution 12 - Html | Barry Brown | View Answer on Stackoverflow
Solution 13 - Html | assayag.org | View Answer on Stackoverflow