What is the difference between the ARM, Thumb and Thumb 2 instruction encodings?

Arm Problem Overview

I am a bit confused about instruction sets. There are Thumb, ARM and Thumb 2. From what I have read Thumb instructions are all 16-bit but inside the ARMv7M user manual (page vi) there are Thumb 16-bit and Thumb 32-bit instructions mentioned.

Now I have to overcome this confusion. It is said that Thumb 2 supports 16-bit and 32-bit instructions. So is ARMv7M in fact supporting Thumb 2 instructions and not just Thumb?

One more thing. Can I say that Thumb (32-bit) is the same as ARM instructions, which are also 32-bit?

Arm Solutions

Solution 1 - Arm

Oh, ARM and their silly naming...

It's a common misconception, but officially there's no such thing as a "Thumb-2 instruction set".

Ignoring ARMv8 (where everything is renamed and AArch64 complicates things), from ARMv4T to ARMv7-A there are two instruction sets: ARM and Thumb. They are both "32-bit" in the sense that they operate on up-to-32-bit-wide data in 32-bit-wide registers with 32-bit addresses. In fact, where they overlap they represent the exact same instructions - it is only the instruction encoding which differs, and the CPU effectively just has two different decode front-ends to its pipeline which it can switch between. For clarity, I shall now deliberately avoid the terms "32-bit" and "16-bit"...

ARM instructions have fixed-width 4-byte encodings which require 4-byte alignment. Thumb instructions have variable-length (2 or 4-byte, now known as "narrow" and "wide") encodings requiring 2-byte alignment - most instructions have 2-byte encodings, but bl and blx have always had 4-byte encodings*. The really confusing bit came in ARMv6T2, which introduced "Thumb-2 Technology". Thumb-2 encompassed not just adding a load more instructions to Thumb (mostly with 4-byte encodings) to bring it almost to parity with ARM, but also extending the execution state to allow for conditional execution of most Thumb instructions, and finally introducing a whole new assembly syntax (UAL, "Unified Assembly Language") which replaced the previous separate ARM and Thumb syntaxes and allowed writing code once and assembling it to either instruction set without modification.
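To make the narrow/wide distinction concrete, here is a minimal Python sketch of how a decoder can tell the two apart, assuming the ARMv6T2-and-later rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 4-byte encoding. The function names and the sample halfwords are only illustrative; they are not taken from any toolchain.

```python
def thumb_instruction_length(first_halfword: int) -> int:
    """Return 2 or 4: the byte length of the Thumb instruction that
    starts with this 16-bit halfword (ARMv6T2-and-later rule)."""
    top5 = (first_halfword >> 11) & 0x1F
    # These three prefixes mark the first half of a wide (4-byte) encoding.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_thumb_stream(halfwords):
    """Group a list of halfwords into instructions (hypothetical helper)."""
    instructions = []
    i = 0
    while i < len(halfwords):
        count = thumb_instruction_length(halfwords[i]) // 2
        instructions.append(tuple(halfwords[i:i + count]))
        i += count
    return instructions


# Illustrative stream: push {r4, lr}; bl <target>; pop {r4, pc}
stream = [0xB510, 0xF000, 0xF801, 0xBD10]
for ins in split_thumb_stream(stream):
    print(len(ins) * 2, "bytes:", " ".join(f"{h:04x}" for h in ins))
```

Note that the 2-byte alignment requirement follows directly from this scheme: a wide instruction is just two consecutive halfwords, so it can start on any halfword boundary.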

The Cortex-M architectures only implement the Thumb instruction set - ARMv7-M (Cortex-M3/M4/M7) supports most of "Thumb-2 Technology", including conditional execution and encodings for VFP instructions, whereas ARMv6-M (Cortex-M0/M0+) only uses Thumb-2 in the form of a handful of 4-byte system instructions.

Thus, the new 4-byte encodings (and those added later in ARMv7 revisions) are still Thumb instructions - the "Thumb-2" aspect of them is that they can have 4-byte encodings, and that they can (mostly) be conditionally executed via the it (If-Then) instruction (and, I suppose, that their mnemonics are only defined in UAL).
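As a rough illustration of the If-Then mechanism, the sketch below decodes the 2-byte it instruction into its mnemonic. It assumes the encoding 0xBF00 | firstcond << 4 | mask, where the lowest set bit of the mask terminates the block and each mask bit above it reads "then" if it equals the low bit of firstcond, "else" otherwise. The function name and table are hypothetical helpers, and this is a sketch rather than a full decoder.

```python
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al"]


def decode_it(halfword: int):
    """Return the mnemonic of a Thumb it instruction, or None if the
    halfword is not one (hypothetical helper, sketch only)."""
    if (halfword & 0xFF00) != 0xBF00:
        return None
    firstcond = (halfword >> 4) & 0xF
    mask = halfword & 0xF
    if mask == 0 or firstcond == 0xF:
        return None  # mask 0 is the hint space (nop etc.), 0b1111 is not valid
    # Position of the lowest set bit: everything above it describes one slot.
    terminator = (mask & -mask).bit_length() - 1
    suffix = ""
    for bit in range(3, terminator, -1):
        same = ((mask >> bit) & 1) == (firstcond & 1)
        suffix += "t" if same else "e"
    return f"it{suffix} {CONDITIONS[firstcond]}"


for hw in (0xBF08, 0xBF0C, 0xBF14, 0xBF18):
    print(f"{hw:04x}: {decode_it(hw)}")
```

The point of the sketch is that the condition lives in the it instruction, not in the following instructions themselves, which is how Thumb gets conditional execution without a condition field in every encoding.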

* Before ARMv6T2, it was actually a complicated implementation detail as to whether bl (or blx) was executed as a 4-byte instruction or as a pair of 2-byte instructions. The architectural definition was the latter, but since they could only ever be executed as a pair in sequence there was little to lose (other than the ability to take an interrupt halfway through) by fusing them into a single instruction for performance reasons. ARMv6T2 just redefined things in terms of the fused single-instruction execution.

Solution 2 - Arm

In addition to Notlikethat's answer, and as it hints at, ARMv8 introduces some new terminology to try to reduce the confusion (of course adding even more new terminology):

There is a 32-bit execution state (AArch32) and a 64-bit execution state (AArch64).

The 32-bit execution state supports two different instruction sets: T32 ("Thumb") and A32 ("ARM"). The 64-bit execution state supports only one instruction set - A64.

All A64, like all A32, instructions are 32-bit (4 byte) in size, requiring 4-byte alignment.

Many/most A64 instructions can operate on both 32-bit and 64-bit registers (or arguably 32-bit or 64-bit views of the same underlying 64-bit register).

All ARMv8 processors (like all ARMv7 processors) that implement AArch32 support Thumb-2 instructions in the T32 instruction set.

Not all ARMv8-A processors implement AArch32, and some don't implement AArch64. Some processors support both, but only support AArch32 at lower exception levels.

Solution 3 - Arm

Thumb: 16 bit instruction set

ARM: 32 bit wide instruction set, hence more flexible instructions and lower code density

Thumb2 (mixed 16/32 bit): a compromise between ARM and Thumb (16-bit) that aims for the performance/flexibility of ARM together with the instruction density of Thumb. A Thumb2 instruction is therefore either a 32-bit-wide instruction (covering a subset of what ARM offers, though with its own encoding) or a 16-bit-wide Thumb instruction.

Solution 4 - Arm

Please refer to https://developer.arm.com/documentation/ddi0344/c/programmer-s-model/thumb-2-instruction-set for a detailed explanation of the Thumb2 enhancements. The same page implicitly covers the description of the ARM, Thumb and Thumb2 instruction sets.

Solution 5 - Arm

It was confusing to me that the Cortex M3 has 4-byte instructions, yet does not execute ARM instructions, and that other CPUs have 2-byte and 4-byte opcodes, yet can execute ARM instructions too. So I read a book about Arm and now I understand it slightly better. Still, the naming and the overlap remain confusing to me. I thought it would be interesting to compare a few CPUs first and then talk about the ISAs.

To compare a few CPUs and what they can do and how they overlap:

Cortex M0/M0+/M1/M23 are considered Thumb (Thumb-1) and can execute the 2-byte opcodes, which are limited compared to the others. However, some instructions such as mrs, msr, bl, dmb, dsb and isb are from Thumb-2 and are 4-byte. The Cortex M0/M0+/M1 are ARMv6-M, while the Cortex M23 is ARMv8-M. The Thumb-1 instruction set was extended in ARMv7, so it can be said that the ARMv8 Cortex M23 supports a fuller Thumb-1 (except the it instruction) while the ARMv6 Cortex M0/M0+ support only a subset of the ISA (they are missing specifically the it, cbz and cbnz instructions). I might be wrong (please correct me if this is not right), but I noticed something funny: the only CPUs I see which support Thumb-1 fully are CPUs that already support Thumb-2 as well; I do not know of a Thumb-1-only CPU which supports 100% of Thumb-1. I think it's because of the it instruction, which could be seen as a Thumb-2 opcode that is 2-byte and was in essence added to Thumb-1. On the Thumb-1 CPUs, a 4-byte opcode can be seen as two 2-byte halves that together represent the 4-byte opcode.
Cortex M3/M4/M7/M33/M35P/M55 can execute 2-byte and 4-byte opcodes; they implement both Thumb-1 and Thumb-2 and support the full set of both ISAs. The 2-byte and 4-byte opcodes are mixed more evenly, while the Cortex M0/M0+/M1/M23 above are biased to use 2-byte opcodes most of the time. Cortex M3/M4/M7 are ARMv7, while Cortex M33/M35P/M55 are ARMv8.
Cortex A/R can accept both ARM and Thumb opcodes and therefore have both 2-byte and 4-byte encodings available. To switch between the two states, the lowest bit of a branch target address is used as a flag rather than as part of the address (the instruction itself stays aligned): for example, the branch instruction bx copies that bit into the T bit of the CPSR and so selects Thumb state when it is 1 and ARM state when it is 0. This works well for subroutines: when calling one, the return address is saved together with the caller's state, the subroutine can run in Thumb state, and on return the PC and the T bit are restored, so execution continues in whatever state the caller was in (ARM or Thumb) without any issue.
ARM7 only supports ARMv3 4-byte ISA
ARM7T supports both Thumb-1 and ARM ISAs (2-byte and 4-byte)
ARM11 (ARMv6, ARMv6T2, ARMv6Z, ARMv6K) supports Thumb-1 and ARM ISAs, plus Thumb-2 on the ARMv6T2 variant
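The state switch described for Cortex A/R above can be modelled in a few lines. This is only a sketch of the rule for an interworking branch such as bx, assuming the ARMv7 behaviour: bit 0 of the target selects Thumb state and is cleared before the address is used, and an ARM-state target must be word aligned. The function name and the sample addresses are made up for illustration.

```python
def interworking_branch(target: int):
    """Model what bx does with a 32-bit target address.

    Returns (pc, thumb_state). Hypothetical helper, not a real API.
    """
    target &= 0xFFFFFFFF
    if target & 1:
        # Bit 0 set: continue in Thumb state; the bit itself is not fetched from.
        return target & ~1, True
    if target & 2:
        # ARM instructions need 4-byte alignment; the architecture leaves
        # this case unpredictable, so the sketch simply rejects it.
        raise ValueError("ARM-state target is not word aligned")
    return target, False


print(interworking_branch(0x08000101))  # Thumb code at 0x08000100
print(interworking_branch(0x00008000))  # ARM code at 0x00008000
```

This is also why function pointers to Thumb code, and the reset and interrupt vectors on Cortex M parts, carry an odd value even though the code itself sits at an even address.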

The book I referenced stated that in ARMv7 and newer the architecture switched from von Neumann (data and instructions sharing a bus) to Harvard (dedicated buses) to get better performance. However, the absolute term "and newer" is not true, because ARMv8 is newer, yet the ARMv8 Cortex M23 is von Neumann.

The ISAs are:

ARM has 16 registers (R0-R12, SP, LR, PC) and only 4-byte opcodes; there are revisions to the ISA, but they all keep to 4-byte opcodes.
Thumb (aka Thumb-1) splits the 16 registers into a lower set (R0-R7) and a higher set (R8-R12, SP, LR, PC); most instructions can access the lower set only, while only some can access the higher set. It has only 2-byte opcodes. Low-end devices which have a 16-bit bus (and have to do a 32-bit word access in two steps) perform better when they execute 2-byte opcodes, as these match their bus width. The naming confuses me: Thumb can be used as the family term for Thumb-1 together with Thumb-2, or sometimes for Thumb-1 only. I think Thumb-1 is not an official Arm term, just something I have seen people use to make the distinction between the Thumb family of both ISAs and the first Thumb ISA clearer. Instructions in ARM can take the optional s suffix to update the flags in the CPSR register (for example ands, orrs, movs, adds, subs), while in Thumb-1 the s behaviour is always on for most data processing instructions, so the flags are updated every time. Some older toolchains do not require the implicit s to be written, but with Unified Assembly Language (UAL) it is now a requirement to specify the s explicitly, even when there is no option to omit it.
Thumb-2 is an extension to Thumb and can access all registers like ARM does; it has 4-byte opcodes with some differences compared to ARM. In assembly, the 2-byte narrow encoding and the 4-byte wide encoding can be forced with the .n and .w suffixes (for example orr.w). The ARM and Thumb-2 opcode formats/encodings are different and their capabilities differ too. Conditional execution of instructions can be used, but only when an it (If-Then) instruction precedes the block. This can be written explicitly or implied (and inserted by the toolchain behind the user's back). The confusion might actually be a good sign, as Arm (the company) wanted the two to be similar: a lot of effort went into Unified Assembly Language (UAL) so that assembly files written for ARM could be assembled for Thumb-2 without change. If I understand this correctly, that can't be 100% guaranteed, and edge cases can probably be constructed where ARM assembly can't be assembled as Thumb-2, so this is another absolute statement that is not fully true. For example, the ARM7 bl instruction can reach +-32MB while on Cortex M3 it can only reach +-16MB. The situation should still be much better than with Thumb-1, where ARM assembly is far more likely to need rewriting. Another difference is in the immediates of data processing instructions. Both ARM and Thumb-2 encode an 8-bit immediate plus a rotation, but ARM can only rotate it right by an even number of bits, while Thumb-2 can place it at even or odd bit positions and on top of that allows repeated byte patterns such as 0xXYXYXYXY, 0x00XY00XY or 0xXY00XY00. Because these are rotations rather than plain shifts, a left rotation can be expressed as a right rotation that wraps around: rotating left by 32 - n bits gives the same result as rotating right by n bits.
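The immediate rules in the last entry above are easier to compare side by side. The following Python sketch checks whether a 32-bit constant fits each scheme, assuming the usual definitions: ARM takes an 8-bit value rotated right by an even amount (0 to 30), while the Thumb-2 modified immediate takes either one of the byte patterns or an 8-bit value with its top bit set rotated right by 8 to 31. The function names are hypothetical, and the sketch ignores instruction-specific exceptions.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rol32(value: int, amount: int) -> int:
    """Rotate left, expressed as a right rotation that wraps around."""
    return ror32(value, (32 - amount) % 32)


def arm_encodable(const: int) -> bool:
    """True if const is an 8-bit value rotated right by an even amount."""
    const &= MASK32
    # Undo each candidate rotation and see whether 8 bits are left.
    return any(rol32(const, rot) <= 0xFF for rot in range(0, 32, 2))


def thumb2_encodable(const: int) -> bool:
    """True if const fits the Thumb-2 modified immediate scheme."""
    const &= MASK32
    low = const & 0xFF
    high = (const >> 8) & 0xFF
    patterns = (
        low,                       # 0x000000XY
        (low << 16) | low,         # 0x00XY00XY
        (high << 24) | (high << 8),  # 0xXY00XY00
        low * 0x01010101,          # 0xXYXYXYXY
    )
    if const in patterns:
        return True
    # 8-bit value with the top bit set, rotated right by 8..31 (odd allowed).
    return any(0x80 <= rol32(const, rot) <= 0xFF for rot in range(8, 32))


for c in (0xFF, 0x1FE, 0x00AB00AB, 0xF000000F):
    print(f"{c:#010x}: ARM={arm_encodable(c)} Thumb-2={thumb2_encodable(c)}")
```

Under these assumptions 0x1FE (0xFF shifted by an odd amount) and 0x00AB00AB only fit Thumb-2, while 0xF000000F, which wraps around the word boundary, only fits ARM. This is one concrete reason why source written for one instruction set may need a different constant-loading sequence on the other.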

So in conclusion some Arm CPUs can do:

only 4-byte opcode instructions which are pure ARM ISA
2-byte/4-byte Thumb-1/Thumb-2 ISAs with a focus to use the 2-byte most of the time with only a few 4-byte opcodes, these often are labeled as Thumb (Thumb-1) 2-byte opcode CPUs (and the few 4-byte opcodes are sometimes not mentioned)
2-byte/4-byte Thumb-1/Thumb-2 ISAs and are more evenly mixed between 2-byte and 4-byte opcodes, often labeled as Thumb-2
2-byte/4-byte opcodes by switching between ARM/Thumb modes

Reference for this information: ARM Assembly Language Programming & Architecture, Muhammad Ali Mazidi et al., 2016. The book was written before the company name change from ARM to Arm, so sometimes it was confusing when it was referencing the company Arm and when the ARM ISA.

Content Type	Original Author	Original Content on Stackoverflow
Question	71GA	View Question on Stackoverflow
Solution 1 - Arm	Notlikethat	View Answer on Stackoverflow
Solution 2 - Arm	unixsmurf	View Answer on Stackoverflow
Solution 3 - Arm	ERF4N	View Answer on Stackoverflow
Solution 4 - Arm	Maddy	View Answer on Stackoverflow
Solution 5 - Arm	Anton Krug	View Answer on Stackoverflow