Is ASCII code in matter of fact 7 bit or 8 bit?

Character EncodingAscii

Character Encoding Problem Overview


My teacher told me ASCII is an 8-bit character coding scheme. But it is defined only for 0-127 codes which means it can be fitted into 7 bits. So can't it be argued that ASCII is actually a 7-bit code?

And what do we mean to say at all when saying ASCII is a 8-bit code at all?

Character Encoding Solutions


Solution 1 - Character Encoding

ASCII was indeed originally conceived as a 7-bit code. This was done well before 8-bit bytes became ubiquitous, and even into the 1990s you could find software that assumed it could use the 8th bit of each byte of text for its own purposes ("not 8-bit clean"). Nowadays people think of it as an 8-bit coding in which bytes 0x80 through 0xFF have no defined meaning, but that's a retcon.

There are dozens of text encodings that make use of the 8th bit; they can be classified as ASCII-compatible or not, and fixed- or variable-width. ASCII-compatible means that regardless of context, single bytes with values from 0x00 through 0x7F encode the same characters that they would in ASCII. You don't want to have anything to do with a non-ASCII-compatible text encoding if you can possibly avoid it; naive programs expecting ASCII tend to misinterpret them in catastrophic, often security-breaking fashion. They are so deprecated nowadays that (for instance) HTML5 forbids their use on the public Web, with the unfortunate exception of UTF-16. I'm not going to talk about them any more.

A fixed-width encoding means what it sounds like: all characters are encoded using the same number of bytes. To be ASCII-compatible, a fixed-with encoding must encode all its characters using only one byte, so it can have no more than 256 characters. The most common such encoding nowadays is Windows-1252, an extension of ISO 8859-1.

There's only one variable-width ASCII-compatible encoding worth knowing about nowadays, but it's very important: UTF-8, which packs all of Unicode into an ASCII-compatible encoding. You really want to be using this if you can manage it.

As a final note, "ASCII" nowadays takes its practical definition from Unicode, not its original standard (ANSI X3.4-1968), because historically there were several dozen variations on the ASCII 127-character repertoire -- for instance, some of the punctuation might be replaced with accented letters to facilitate the transmission of French text. All of those variations are obsolete, and when people say "ASCII" they mean that the bytes with value 0x00 through 0x7F encode Unicode codepoints U+0000 through U+007F. This will probably only matter to you if you ever find yourself writing a technical standard.

If you're interested in the history of ASCII and the encodings that preceded it, start with the paper "The Evolution of Character Codes, 1874-1968" (samizdat copy at <http://falsedoor.com/doc/ascii_evolution-of-character-codes.pdf>;) and then chase its references (many of which are not available online and may be hard to find even with access to a university library, I regret to say).

Solution 2 - Character Encoding

On Linux man ascii says:

> ASCII is the American Standard Code for Information Interchange. It is a 7-bit code.

Solution 3 - Character Encoding

The original ASCII table is encoded on 7 bits, and therefore it has 128 characters.

Nowadays, most readers/editors use an "extended" ASCII table (from ISO 8859-1), which is encoded on 8 bits and enjoys 256 characters (including Á, Ä, Œ, é, è and other characters useful for European languages as well as mathematical glyphs and other symbols).

While UTF-8 uses the same encoding as the basic ASCII table (meaning 0x41 is A in both codes), it does not share the same encoding for the "Latin Extended-A" block. Which sometimes causes weird characters to appear in words like à la carte or piñata.

Solution 4 - Character Encoding

ASCII encoding is 7-bit, but in practice, characters encoded in ASCII are not stored in groups of 7 bits. Instead, one ASCII is stored in a byte, with the MSB usually set to 0 (yes, it's wasted in ASCII).

You can verify this by inputting a string in the ASCII character set in a text editor, setting the encoding to ASCII, and viewing the binary/hex:
enter image description here

Aside: the use of (strictly) ASCII encoding is now uncommon, in favor of UTF-8 (which does not waste the MSB mentioned above - in fact, an MSB of 1 indicates the code point is encoded with more than 1 byte).

Solution 5 - Character Encoding

The original ASCII code provided 128 different characters numbered 0 to 127. ASCII and 7-bit are synonymous. Since the 8-bit byte is the common storage element, ASCII leaves room for 128 additional characters which are used for foreign languages and other symbols.

But the 7-bit code was original made before the 8-bit code. ASCII stand for American Standard Code for Information Interchange. In early Internet mail systems, it only supported 7-bit ASCII codes.

This was because it then could execute programs and multimedia files over such systems. These systems use 8 bits of the byte, but then it must then be turned into a 7-bit format using coding methods such as MIME, uucoding and BinHex. This means that the 8-bit characters has been converted to 7-bit characters, which adds extra bytes to encode them.

Solution 6 - Character Encoding

When we call ASCII a 7-bit code, the left-most bit is used as the sign bit, so with 7 bits we can write up to 127.

That means from -126 to 127, because the maximum values of ASCII is 0 to 255. This can be only satisfied with the argument of 7 bit if the last bit is considered as the sign bit.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAnurag KaliaView Question on Stackoverflow
Solution 1 - Character EncodingzwolView Answer on Stackoverflow
Solution 2 - Character EncodingBeniBelaView Answer on Stackoverflow
Solution 3 - Character EncodingGuillaumeView Answer on Stackoverflow
Solution 4 - Character Encodingflow2kView Answer on Stackoverflow
Solution 5 - Character EncodingbrookeyView Answer on Stackoverflow
Solution 6 - Character EncodingajuView Answer on Stackoverflow