Why is base128 not used?

Encoding, Language Agnostic, Binary

Encoding Problem Overview


Why is base64, rather than base128, used to transmit binary data on the web? The ASCII character set has 128 characters, which in theory could serve as the digits of a base-128 encoding, yet base64 is what gets used in most cases.

Encoding Solutions


Solution 1 - Encoding

The problem is that at least 32 characters of the ASCII character set are 'control characters', which may be interpreted by the receiving terminal. E.g., the BEL (bell) character makes the receiving terminal chime. There are the STX (Start Of Text) and EOT (End Of Transmission) characters, which do exactly what their names imply. And don't forget CR and LF, which may have special meanings in how data structures are serialized/flattened into a stream.
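To put numbers on this, a small Python sketch can classify the 128 ASCII code points using only the standard `unicodedata` module (the helper names are made up for illustration). It counts the control characters (Unicode category "Cc", which includes BEL, STX, EOT, CR and LF) and the printable ones.

```python
import unicodedata

def ascii_control_codes() -> list[int]:
    # Unicode general category "Cc" marks control characters:
    # 0x00-0x1F plus DEL (0x7F) within the ASCII range.
    return [c for c in range(128) if unicodedata.category(chr(c)) == "Cc"]

def ascii_printable_codes() -> list[int]:
    # str.isprintable() is True for space (0x20) through tilde (0x7E).
    return [c for c in range(128) if chr(c).isprintable()]

controls = ascii_control_codes()
printable = ascii_printable_codes()
print(len(controls), len(printable))  # prints "33 95"
```

Only 95 printable ASCII characters remain, which is fewer than 128, so a base128 alphabet drawn purely from ASCII would have to include some control characters.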

Adobe adopted the Base85 (Ascii85) encoding in PostScript and PDF to use more of the ASCII character set, but AFAIK it's protected by patents.
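Python's standard `base64` module happens to include an Ascii85 codec (`base64.a85encode` / `base64.a85decode`), so the gain from a larger alphabet is easy to see. A minimal comparison using arbitrary sample bytes:

```python
import base64

data = bytes(range(12))  # 12 arbitrary sample bytes

b64 = base64.b64encode(data)  # base64: every 3 bytes become 4 characters
a85 = base64.a85encode(data)  # Ascii85: every 4 bytes become 5 characters

print(len(b64), len(a85))  # prints "16 15"
assert base64.a85decode(a85) == data  # round trip restores the input
```

Ascii85's 85 digits ('!' through 'u') still fit inside printable ASCII, which is why it can squeeze out more than base64 without touching control characters.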

Solution 2 - Encoding

Because some of those 128 characters are unprintable (mainly those below code point 0x20), they can't reliably be transmitted as a string over the wire. And if you go above code point 127, you can run into encoding issues because different systems use different character encodings.
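The problem above code point 127 is easy to reproduce: the same byte means different things under different encodings. A minimal Python sketch, assuming one side treats the data as ISO 8859-1 (Latin-1) and the other as UTF-8:

```python
raw = bytes([0xE9])  # a single byte above 127

# Read as Latin-1, the byte is 'é'; written back out as UTF-8 it becomes two bytes.
text = raw.decode("latin-1")
reencoded = text.encode("utf-8")
print(reencoded)  # prints "b'\xc3\xa9'"

# Read directly as UTF-8, the lone byte is not even a valid sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # prints "False"
```

Either way, the bytes that arrive are not the bytes that were sent, which is exactly what a binary-to-text encoding must never allow.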

Solution 3 - Encoding

As already stated in the other answers, the key point is to restrict the character set to printable characters. A more efficient scheme is basE91: it uses a larger alphabet (91 characters) while still avoiding control and whitespace characters in the low ASCII range. The basE91 webpage has a nice comparison of the encoding efficiency of binary vs. base64 vs. basE91.
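The efficiency comparison can be approximated with a little arithmetic: a symbol from an N-character alphabet carries log2(N) bits, so encoding one 8-bit byte needs at least 8 / log2(N) symbols. The sketch below computes that theoretical lower bound; it is not a measurement of any particular implementation, and real codecs can only match or exceed it.

```python
import math

def expansion(alphabet_size: int) -> float:
    # Minimum number of output symbols per input byte for this alphabet size.
    return 8 / math.log2(alphabet_size)

for name, size in [("base64", 64), ("base85", 85), ("basE91", 91), ("base128", 128)]:
    overhead = (expansion(size) - 1) * 100
    print(f"{name:8} {expansion(size):.3f}  (+{overhead:.0f}%)")
```

Roughly half of the gain from base64 to base128 is already available with basE91's 91 printable characters.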

I once cleaned up the Java implementation. If people are interested, I could push it to GitHub.

Update: It's now on GitHub.

Solution 4 - Encoding

That the first 32 characters are control characters has absolutely no relevance, because you don't have to use them to get 128 characters. We have 256 byte values to choose from, and only 65 of them are control characters (0x00 to 0x1F, DEL at 0x7F, and 0x80 to 0x9F in ISO 8859-1). That leaves 191 characters, so 128 is completely possible without using control characters.
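The count can be checked with Python's `unicodedata` module by decoding all 256 byte values as ISO 8859-1 (Latin-1), which maps each byte to the Unicode code point with the same value:

```python
import unicodedata

# Decode every possible byte value as Latin-1 (one character per byte).
latin1 = bytes(range(256)).decode("latin-1")

# Category "Cc": C0 controls (0x00-0x1F), DEL (0x7F) and C1 controls (0x80-0x9F).
controls = [ch for ch in latin1 if unicodedata.category(ch) == "Cc"]
usable = len(latin1) - len(controls)
print(len(controls), usable)  # prints "65 191"
```

191 comfortably exceeds 128. Note, though, that some of the remaining characters, such as the no-break space (0xA0) and the soft hyphen (0xAD), are easy to lose or alter when text is copied around.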

Here is the reason: the encoded text has to look the same, and survive copy and paste, no matter where it goes. Therefore it has to consist of characters that are displayed the same in any forum, chat, email, and so on. That rules out characters that forum, chat, or email clients typically use for formatting or discard. It also has to consist of characters that are the same regardless of font, language, and regional settings.

That is the reason!

Solution 5 - Encoding

Base64 is common because it solves a variety of issues (it works nearly everywhere you can think of):

  • You don't need to worry whether the transport is 8-bit clean or not.

  • All the characters in the encoding are printable. You can see them. You can copy and paste them. You can use them in URLs (with particular variants, such as the URL-safe alphabet), etc.

  • Fixed encoding size. You know that m bytes can always encode to n bytes.

  • Everyone has heard of it - it's widely supported, lots of libraries, so easy to interoperate with.
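The fixed encoding size is simple to state: base64 turns every 3 input bytes into 4 output characters and pads a final partial group with '=' to a full 4 characters, so the output length depends only on the input length. A quick check against Python's `base64` module (the helper name `b64_length` is just for illustration):

```python
import base64

def b64_length(n: int) -> int:
    # 4 output characters per started group of 3 input bytes.
    return 4 * ((n + 2) // 3)

for n in range(20):
    # bytes(n) is n zero bytes; the content doesn't affect the length.
    assert len(base64.b64encode(bytes(n))) == b64_length(n)

print([b64_length(n) for n in range(7)])  # prints "[0, 4, 4, 4, 8, 8, 8]"
```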

Base128 doesn't have all those advantages.

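Base128 looks like it would avoid the 8-bit-clean problem, since it fits in 7 bits, but recall that base64 uses 65 symbols (the 64 digits plus the '=' padding character). Without an out-of-band character you can't have the benefit of a fixed encoding size, and if you use an out-of-band character, the encoding no longer fits in 7 bits, so the transport has to be 8-bit clean after all.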
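The 65th symbol is the '=' padding character, which marks how many bytes the last group actually holds. Encoding one, two, and three bytes with Python's `base64` module shows it:

```python
import base64

for chunk in (b"a", b"ab", b"abc"):
    # 1 byte -> two '=' pads, 2 bytes -> one pad, 3 bytes -> no padding
    print(chunk, base64.b64encode(chunk))
# prints:
# b'a' b'YQ=='
# b'ab' b'YWI='
# b'abc' b'YWJj'
```

A base128 alphabet that already uses all 128 ASCII code points has no code point left over to play the role of '='.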

It's not all negative though.

  • base128 is easier to encode/decode than base64: you just use shifts and masks. This can be important for embedded implementations.

  • base128 makes slightly more efficient use of the transport than base64 by using more of the available bits.
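To illustrate the shifts-and-masks point, here is a minimal base128 sketch. The names `base128_encode` and `base128_decode` are hypothetical, not from any library. It packs the input bits into 7-bit symbols with values 0 to 127; mapping those values onto an actual character alphabet, and deciding which characters are safe to use, is left out. Every 7 input bytes become 8 symbols, an expansion of 8/7 compared with base64's 4/3.

```python
def base128_encode(data: bytes) -> bytes:
    """Pack 8-bit bytes into 7-bit symbols (0-127), most significant bits first."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of not-yet-emitted bits in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append((acc >> nbits) & 0x7F)  # emit the top 7 pending bits
        acc &= (1 << nbits) - 1                # keep only the pending bits
    if nbits:
        # Flush leftover bits, left-aligned and zero-padded, as one last symbol.
        out.append((acc << (7 - nbits)) & 0x7F)
    return bytes(out)


def base128_decode(symbols: bytes) -> bytes:
    """Reverse base128_encode: unpack 7-bit symbols back into 8-bit bytes."""
    out = bytearray()
    acc = 0
    nbits = 0
    for sym in symbols:
        acc = (acc << 7) | (sym & 0x7F)
        nbits += 7
        if nbits >= 8:  # at most one byte completes per 7-bit symbol
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    # Any leftover bits (fewer than 8) are the encoder's zero padding.
    return bytes(out)


encoded = base128_encode(b"\xff")
print(encoded)  # prints "b'\x7f@'"
```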

People do use base128 - I'm using it for something now. It's just not as common.

Solution 6 - Encoding

Not sure, but I think the lower values (control codes and the like) are not reliably transferred as text inside HTTP requests/responses, and the values above 127 may be locale- or code-page-specific, so there aren't 128 different characters that can be expected to work across all browsers and platforms.

Solution 7 - Encoding

esaj is right. Base64 is used to encode binary data for transmission over a protocol that expects only text. It's stated right in the wiki entry.

Solution 8 - Encoding

Check out the base128 PHP class. It encodes and decodes using the ISO 8859-1 character set.

GoogleCode PHP-Class Base128

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | gmadar | View Question on Stackoverflow
Solution 1 - Encoding | pepoluan | View Answer on Stackoverflow
Solution 2 - Encoding | driis | View Answer on Stackoverflow
Solution 3 - Encoding | Benedikt Waldvogel | View Answer on Stackoverflow
Solution 4 - Encoding | user3119289 | View Answer on Stackoverflow
Solution 5 - Encoding | John La Rooy | View Answer on Stackoverflow
Solution 6 - Encoding | esaj | View Answer on Stackoverflow
Solution 7 - Encoding | Russell Troywest | View Answer on Stackoverflow
Solution 8 - Encoding | seizu | View Answer on Stackoverflow