Why is x86 ugly? Why is it considered inferior when compared to others?

Tags: Assembly, X86, MIPS, X86-64, CPU Architecture

Assembly Problem Overview


I've been reading some SO archives and encountered statements against the x86 architecture, and many more comments along those lines.

I tried searching but didn't find any reasons. I don't find x86 bad probably because this is the only architecture I'm familiar with.

Can someone kindly give me reasons for considering x86 ugly/bad/inferior compared to others?

Assembly Solutions


Solution 1 - Assembly

A couple of possible reasons for it:

  1. x86 is a relatively old ISA (its progenitor was the 8086, after all)

  2. x86 has evolved significantly several times, but hardware is required to maintain backwards compatibility with old binaries. For example, modern x86 hardware still contains support for running 16-bit code natively. Additionally, several memory-addressing models exist to allow older code to interoperate on the same processor, such as real mode, protected mode, virtual 8086 mode, and (amd64) long mode. This can be confusing to some.

  3. x86 is a CISC machine. For a long time this meant it was slower than RISC machines like MIPS or ARM, because instructions have data interdependencies and flags, making most forms of instruction-level parallelism difficult to implement. Modern implementations translate the x86 instructions into RISC-like instructions called "micro-ops" under the covers to make these kinds of optimizations practical to implement in hardware.
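
     For example, a single memory-operand instruction splits into RISC-like pieces. A sketch of the idea (illustrative only; real micro-op formats are undocumented and vary by microarchitecture):

     add eax, [ebx]        ; one x86 instruction: a load followed by an add
                           ; roughly becomes two internal micro-ops:
                           ;   uop1: tmp <- load [ebx]
                           ;   uop2: eax <- eax + tmp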

  4. In some respects, the x86 isn't inferior, it's just different. For example, input/output is handled via memory mapping on the vast majority of architectures, but not on the x86. (NB: Modern x86 machines typically have some form of DMA support, and communicate with other hardware through memory mapping; but the ISA still has port I/O instructions like IN and OUT.)
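
     For example, a port read next to a memory-mapped read (a minimal sketch; the MMIO address is a hypothetical device address):

     in  al, 0x60          ; port-mapped: read a byte from the legacy keyboard controller port
     out 0x80, al          ; port-mapped: write a byte to the POST diagnostic port
     mov eax, [0xFEC00000] ; memory-mapped: a device register read like ordinary memory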

  5. The x86 ISA has very few architectural registers, which can force programs to round-trip through memory more frequently than would otherwise be necessary. The extra instructions needed to do this take execution resources that could be spent on useful work, although efficient store-forwarding keeps the latency low. Modern implementations with register renaming onto a large physical register file can keep many instructions in flight, but the lack of architectural registers was still a significant weakness for 32-bit x86. x86-64's increase from 8 to 16 integer and vector registers is one of the biggest factors in 64-bit code being faster than 32-bit (along with the more efficient register-call ABI), not the increased width of each register. A further increase from 16 to 32 integer registers would help some, but not as much. (AVX512 does increase to 32 vector registers, though, because floating-point code has higher latency and often needs more constants.)
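
     A minimal sketch of that round-tripping under 32-bit register pressure (registers and stack offsets are arbitrary):

     mov [esp+4], esi      ; spill: all 8 GPRs are live, so park ESI on the stack
     mov esi, [ebp+8]      ; reuse ESI for a different live value
     add eax, [esp+4]      ; a later use of the spilled value reloads it from memory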

  6. x86 assembly code is complicated because x86 is a complicated architecture with many features. An instruction listing for a typical MIPS machine fits on a single letter-sized piece of paper. The equivalent listing for x86 fills several pages, and the instructions just do more, so you often need a bigger explanation of what they do than a listing can provide. For example, the MOVSB instruction needs a relatively large block of C code to describe what it does:

     // DF is the direction flag; SI and DI are the source and destination
     // pointer registers
     if (DF==0) 
       *(byte*)DI++ = *(byte*)SI++;   // copy forward, then advance both pointers
     else 
       *(byte*)DI-- = *(byte*)SI--;   // copy backward, then decrement both pointers
    

That's a single instruction doing a load, a store, and two adds or subtracts (controlled by a flag input), each of which would be separate instructions on a RISC machine.
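
A MIPS-like expansion of the forward case might look like this (a rough sketch; register choices are arbitrary):

    lbu   $t0, 0($s0)     # load the byte at the source pointer
    sb    $t0, 0($s1)     # store it at the destination pointer
    addiu $s0, $s0, 1     # advance the source pointer
    addiu $s1, $s1, 1     # advance the destination pointer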

While the simplicity of MIPS (and similar architectures) doesn't necessarily make them superior, for teaching an introductory assembler class it makes sense to start with a simpler ISA. Some assembly classes teach an ultra-simplified subset of x86 called y86, which is simplified to the point of not being useful for real work (e.g. no shift instructions), while others teach just the basic x86 instructions.

  7. The x86 uses variable-length opcodes, which add hardware complexity with respect to the parsing of instructions. In the modern era this cost is becoming vanishingly small as CPUs become more and more limited by memory bandwidth than by raw computation, but many "x86 bashing" articles and attitudes come from an era when this cost was comparatively much larger.

     Update 2016: Anandtech has posted a discussion regarding opcode sizes under x64 and AArch64.
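
     A few sample encodings show the spread (bytes per the IA-32 reference; illustrative):

     C3                                ret                ; 1 byte
     31 C0                             xor eax, eax       ; 2 bytes
     B8 05 00 00 00                    mov eax, 5         ; 5 bytes
     81 84 88 44 33 22 11 78 56 34 12  add dword ptr [eax+ecx*4+0x11223344], 0x12345678  ; 11 bytes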


EDIT: This is not supposed to be a "bash the x86!" party. I had little choice but to do some amount of bashing given the way the question's worded. But with the exception of (1), all these things were done for good reasons. Intel designers aren't stupid -- they wanted to achieve some things with their architecture, and these are some of the taxes they had to pay to make those things a reality.

Solution 2 - Assembly

The main knock against x86 in my mind is its CISC origins - the instruction set contains a lot of implicit interdependencies. These interdependencies make it difficult to do things like instruction reordering on the chip, because the artifacts and semantics of those interdependencies must be preserved for each instruction.

For example, most x86 integer add & subtract instructions modify the flags register. After performing an add or subtract, the next operation is often to look at the flags register to check for overflow, sign bit, etc. If there's another add after that, it's very difficult to tell whether it's safe to begin execution of the 2nd add before the outcome of the 1st add is known.

On a RISC architecture, the add instruction would specify the input operands and the output register(s), and everything about the operation would take place using only those registers. This makes it much easier to decouple add operations that are near each other because there's no bloomin' flags register forcing everything to line up and execute single file.
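
A minimal sketch of the contrast (x86 on top, a MIPS-like equivalent below; registers are arbitrary):

    ; x86: both adds write EFLAGS, and the branch reads it
    add eax, ebx        ; sets OF, SF, ZF, CF, ... as a side effect
    jo  overflowed      ; consumes the flags the first add produced
    add ecx, edx        ; also writes EFLAGS; the hardware must keep the ordering straight

    # MIPS-like: no flags register, so the two adds are trivially independent
    addu $t0, $t0, $t1
    addu $t2, $t2, $t3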

The DEC Alpha AXP chip, a MIPS style RISC design, was painfully spartan in the instructions available, but the instruction set was designed to avoid inter-instruction implicit register dependencies. There was no hardware-defined stack register. There was no hardware-defined flags register. Even the instruction pointer was OS defined - if you wanted to return to the caller, you had to work out how the caller was going to let you know what address to return to. This was usually defined by the OS calling convention. On the x86, though, it's defined by the chip hardware.

Anyway, over 3 or 4 generations of Alpha AXP chip designs, the hardware went from being a literal implementation of the spartan instruction set with 32 int registers and 32 float registers to a massively out of order execution engine with 80 internal registers, register renaming, result forwarding (where the result of a previous instruction is forwarded to a later instruction that is dependent on the value) and all sorts of wild and crazy performance boosters. And with all of those bells and whistles, the AXP chip die was still considerably smaller than the comparable Pentium chip die of that time, and the AXP was a hell of a lot faster.

You don't see those kinds of bursts of performance boosting things in the x86 family tree largely because the x86 instruction set's complexity makes many kinds of execution optimizations prohibitively expensive if not impossible. Intel's stroke of genius was giving up on implementing the x86 instruction set directly in hardware -- all modern x86 chips are actually RISC cores that to a certain degree interpret the x86 instructions, translating them into internal micro-ops which preserve all the semantics of the original x86 instruction, but allow a little of that RISC-style out-of-order execution and other optimizations over the micro-ops.

I've written a lot of x86 assembler and can fully appreciate the convenience of its CISC roots. But I didn't fully appreciate just how complicated x86 was until I spent some time writing Alpha AXP assembler. I was gobsmacked by AXP's simplicity and uniformity. The differences are enormous, and profound.

Solution 3 - Assembly

The x86 architecture dates from the design of the 8008 microprocessor and relatives. These CPUs were designed in a time when memory was slow and if you could do it on the CPU die, it was often a lot faster. However, CPU die-space was also expensive. These two reasons are why there are only a small number of registers that tend to have special purposes, and a complicated instruction set with all sorts of gotchas and limitations.

Other processors from the same era (e.g. the 6502 family) also have similar limitations and quirks. Interestingly, both the 8008 series and the 6502 series were intended as embedded controllers. Even back then, embedded controllers were expected to be programmed in assembler and in many ways catered to the assembly programmer rather than the compiler writer. (Look at the VAX chip for what happens when you cater to the compiler writer.) The designers didn't expect them to become general purpose computing platforms; that's what things like the predecessors of the POWER architecture were for. The Home Computer revolution changed that, of course.

Solution 4 - Assembly

I have a few additional points:

Consider the operation "a = b / c". x86 would implement this as:

  mov eax,b           ; load the dividend's low 32 bits
  xor edx,edx         ; zero the high 32 bits of the 64-bit dividend edx:eax
  div dword ptr c     ; unsigned divide edx:eax by c; the quotient lands in eax
  mov a,eax           ; store the quotient

As an additional bonus of the div instruction, edx will contain the remainder.

A RISC processor would require first loading the addresses of b and c, loading b and c from memory into registers, doing the division, loading the address of a, and then storing the result. In dst, src syntax:

  mov r5,addr b       ; load the address of b
  mov r5,[r5]         ; load b itself
  mov r6,addr c       ; load the address of c
  mov r6,[r6]         ; load c itself
  div r7,r5,r6        ; r7 = r5 / r6
  mov r5,addr a       ; load the address of a
  mov [r5],r7         ; store the result

Here there typically won't be a remainder.

If any variables are to be loaded through pointers, both sequences may become longer, though this is less likely for the RISC because it may have one or more pointers already loaded in another register. x86 has fewer registers, so the likelihood of the pointer being in one of them is smaller.

Pros and cons:

The RISC instructions may be mixed with surrounding code to improve instruction scheduling; this is less of a possibility with x86, which instead does this work (more or less well depending on the sequence) inside the CPU itself. The RISC sequence above will typically be 28 bytes long (7 instructions of 32-bit/4-byte width each) on a 32-bit architecture. This will cause the off-chip memory to work more when fetching the instructions (seven fetches). The denser x86 sequence contains fewer instructions, and though their widths vary you're probably looking at an average of 4 bytes/instruction there too. Even if you have instruction caches to speed this up, seven fetches means that you will have a deficit of three elsewhere to make up for compared to the x86.
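
For a concrete sense of the density difference, here are plausible encodings for the x86 sequence above (assuming absolute 32-bit address operands; exact bytes vary by assembler, and xx stands for the variable's address):

    A1 xx xx xx xx       mov eax, b        ; 5 bytes
    31 D2                xor edx, edx      ; 2 bytes
    F7 35 xx xx xx xx    div dword ptr c   ; 6 bytes
    A3 xx xx xx xx       mov a, eax        ; 5 bytes
                                           ; total: 18 bytes vs. 28 for the RISC sequence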

Because x86 has fewer registers to save/restore, it will probably do thread switches and handle interrupts faster than RISC. More registers to save and restore require more temporary RAM stack space for interrupts and more permanent stack space to store thread states. These aspects should make x86 a better candidate for running a pure RTOS.

On a more personal note I find it more difficult to write RISC assembly than x86. I solve this by writing the RISC routine in C, compiling and modifying the generated code. This is more efficient from a code production standpoint and probably less efficient from an execution standpoint. All those 32 registers to keep track of. With x86 it is the other way around: 6-8 registers with "real" names makes the problem more manageable and instills more confidence that the code produced will work as expected.

Ugly? That's in the eye of the beholder. I prefer "different."

Solution 5 - Assembly

I think this question has a false assumption. It's mainly just RISC-obsessed academics who call x86 ugly. In reality, the x86 ISA can do in a single instruction operations which would take 5-6 instructions on RISC ISAs. RISC fans may counter that modern x86 CPUs break these "complex" instructions down into micro-ops; however:

  1. In many cases that's only partially true or not true at all. The most useful "complex" instructions in x86 are things like mov %eax, 0x1c(%esp,%edi,4), i.e. addressing modes, and these are not broken down (see the sketch after this list).
  2. What's often more important on modern machines is not the number of cycles spent (because most tasks are not CPU-bound) but the instruction cache impact of code. 5-6 fixed-size (usually 32-bit) instructions will impact the cache a lot more than one complex instruction that's rarely more than 5 bytes.
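
To make the addressing-mode point concrete, here is a rough MIPS-like expansion of mov %eax, 0x1c(%esp,%edi,4) (register assignments are arbitrary):

    sll  $t0, $a1, 2      # scale the index (%edi) by 4
    addu $t0, $t0, $sp    # add the base (%esp)
    sw   $v0, 0x1c($t0)   # store %eax with the 0x1c displacement

That's three 4-byte instructions (12 bytes) against a single x86 instruction that encodes in 4 bytes.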

x86 really absorbed all the good aspects of RISC about 10-15 years ago, and the remaining qualities of RISC (actually the defining one - the minimal instruction set) are harmful and undesirable.

Aside from the cost and complexity of manufacturing CPUs and their energy requirements, x86 is the best ISA. Anyone who tells you otherwise is letting ideology or agenda get in the way of their reasoning.

On the other hand, if you are targeting embedded devices where the cost of the CPU counts, or embedded/mobile devices where energy consumption is a top concern, ARM or MIPS probably makes more sense. Keep in mind, though, that you'll still have to deal with the extra RAM and binary size needed to handle code that's easily 3-4 times larger, and you won't be able to get near the performance. Whether this matters depends a lot on what you'll be running on it.

Solution 6 - Assembly

x86 assembler language isn't so bad. It's when you get to the machine code that it starts to get really ugly. Instruction encodings, addressing modes, etc. are much more complicated than the ones for most RISC CPUs. And there's extra fun built in for backward compatibility purposes -- stuff that only kicks in when the processor is in a certain state.

In 16-bit modes, for example, addressing can seem downright bizarre; there's an addressing mode for [BX+SI], but not one for [AX+BX]. Things like that tend to complicate register usage, since you need to ensure your value's in a register that you can use as you need to.
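
A quick illustration (the last line would simply be rejected by the assembler):

    mov al, [bx+si]       ; valid: base BX or BP, index SI or DI
    mov al, [bp+di+8]     ; valid: a displacement may be added
    mov al, [ax+bx]       ; invalid: AX cannot appear in a 16-bit address at all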

(Fortunately, 32-bit mode is much saner (though still a bit weird itself at times -- segmentation for example), and 16-bit x86 code is largely irrelevant anymore outside of boot loaders and some embedded environments.)

There are also leftovers from the olden days, when Intel was trying to make x86 the ultimate processor: instructions a couple of bytes long that performed tasks no one actually does any more, because they were frankly too freaking slow or complicated. The ENTER and LOOP instructions, for two examples -- note that the C stack frame code is like "push ebp; mov ebp, esp" and not "enter" for most compilers.
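
For example, a typical frame setup next to its ENTER equivalent (a minimal sketch):

    push ebp              ; save the caller's frame pointer
    mov  ebp, esp         ; establish the new frame
    sub  esp, 16          ; reserve 16 bytes for locals

    enter 16, 0           ; one instruction doing all three -- and slower on most CPUs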

Solution 7 - Assembly

I'm not an expert, but it seems that many of the features people dislike it for may be the reasons it performs well. Several years ago, having registers (instead of a stack), register frames, etc. were seen as nice solutions for making the architecture seem simpler to humans. However, nowadays what matters is cache performance, and x86's variable-length words allow it to store more instructions in cache. The "instruction decode", which I believe opponents once pointed out took up half the chip, no longer takes up nearly as much.

I think parallelism is one of the most important factors nowadays -- at least for algorithms that already run fast enough to be usable. Expressing high parallelism in software allows the hardware to amortize (or often completely hide) memory latencies. Of course, the farther reaching architecture future is probably in something like quantum computing.

I have heard from nVidia that one of Intel's mistakes was that they kept the binary formats close to the hardware. CUDA's PTX does some fast register use calculations (graph coloring), so nVidia can use a register machine instead of a stack machine, but still have an upgrade path that doesn't break all old software.

Solution 8 - Assembly

Besides the reasons people have already mentioned:

  • x86-16 had a rather strange memory addressing scheme which allowed a single memory location to be addressed in up to 4096 different ways, limited RAM to 1 MB, and forced programmers to deal with two different sizes of pointers. Fortunately, the move to 32-bit made this feature unnecessary, but x86 chips still carry the cruft of segment registers.

  • While not a fault of x86 per se, x86 calling conventions weren't standardized the way MIPS's were (mostly because MS-DOS didn't come with any compilers), leaving us with the mess of __cdecl, __stdcall, __fastcall, etc.
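
For example, the cleanup difference between two of those conventions (a minimal sketch; _f is a hypothetical function taking 8 bytes of arguments):

    ; __cdecl: the caller pops the arguments after the call
    call _f
    add  esp, 8

    ; __stdcall: the callee pops them instead (its final instruction is "ret 8");
    ; the decorated name _f@8 encodes the argument byte count
    call _f@8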

Solution 9 - Assembly

I think you'll get to part of the answer if you ever try to write a compiler that targets x86, or if you write an x86 machine emulator, or even if you try to implement the ISA in a hardware design.

Although I understand the "x86 is ugly!" arguments, I still think it's more fun writing x86 assembly than MIPS (for example) - the latter is just plain tedious. It was always meant to be nice to compilers rather than to humans. I'm not sure a chip could be more hostile to compiler writers than the x86 if it tried...

The ugliest part for me is the way (real-mode) segmentation works - that any physical address has up to 4096 segment:offset aliases. When did you last need that? Things would have been so much simpler if the segment part were strictly the higher-order bits of a 32-bit address.
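
For example, all of these segment:offset pairs name the same physical byte, since real mode computes physical = segment * 16 + offset:

    1234:0005  ->  0x12340 + 0x0005 = 0x12345
    1000:2345  ->  0x10000 + 0x2345 = 0x12345
    1233:0015  ->  0x12330 + 0x0015 = 0x12345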

Solution 10 - Assembly

  1. x86 has a very, very limited set of general purpose registers

  2. it promotes a very inefficient style of development on the lowest level (CISC hell) instead of an efficient load / store methodology

  3. Intel made the horrifying decision to introduce the plainly stupid segment/offset memory addressing model to stay compatible with (at that time already!) outdated technology

  4. At a time when everyone was going 32-bit, the x86 held back the mainstream PC world by being a meager 16-bit CPU (most of them -- the 8088 -- even with only 8-bit external data paths, which is even scarier!)


For me (and I'm a DOS veteran who has seen each and every generation of PCs from a developer's perspective!) point 3 was the worst.

Imagine the following situation we had in the early 90s (mainstream!):

a) An operating system that had insane limitations for legacy reasons (640kB of easily accessible RAM) - DOS

b) An operating system extension (Windows) that could do more in terms of RAM, but was limited when it came to stuff like games, etc... and was not the most stable thing on Earth (luckily this changed later, but I'm talking about the early 90s here)

c) Most software was still DOS and we often had to create boot disks for special software, because there was this EMM386.exe that some programs liked and others hated (gamers especially -- and I was an AVID gamer at this time -- know what I'm talking about here)

d) We were limited to MCGA 320x200x8 bits (ok, there was a bit more with special tricks, 360x480x8 was possible, but only without runtime library support), everything else was messy and horrible ("VESA" - lol)

e) But in terms of hardware we had 32 bit machines with quite a few megabytes of RAM and VGA cards with support of up to 1024x768

Reason for this bad situation?

A simple design decision by Intel: machine-instruction-level (NOT binary-level!) compatibility with something that was already dying; I think it was the 8085. The other, seemingly unrelated problems (graphics modes, etc...) were related for technical reasons and because of the very narrow-minded architecture the x86 platform brought with itself.

Today, the situation is different, but ask any assembler developer or people who build compiler backends for the x86. The insanely low number of general purpose registers is nothing but a horrible performance killer.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type           | Original Author             | Original Content on Stackoverflow
Question               | claws                       | View Question on Stackoverflow
Solution 1 - Assembly  | Billy ONeal                 | View Answer on Stackoverflow
Solution 2 - Assembly  | dthorpe                     | View Answer on Stackoverflow
Solution 3 - Assembly  | staticsan                   | View Answer on Stackoverflow
Solution 4 - Assembly  | Olof Forshell               | View Answer on Stackoverflow
Solution 5 - Assembly  | R.. GitHub STOP HELPING ICE | View Answer on Stackoverflow
Solution 6 - Assembly  | cHao                        | View Answer on Stackoverflow
Solution 7 - Assembly  | gatoatigrado                | View Answer on Stackoverflow
Solution 8 - Assembly  | dan04                       | View Answer on Stackoverflow
Solution 9 - Assembly  | Bernd Jendrissek            | View Answer on Stackoverflow
Solution 10 - Assembly | Turing Complete             | View Answer on Stackoverflow