What are the uses of self modifying code?

Executable Problem Overview


Is there any real use for self modifying code?

I know that it can be used to build worms and viruses, but I was wondering whether there is some good reason a programmer might have to use self-modifying code.

Any ideas? Hypothetical situations are welcome too.

Executable Solutions


Solution 1 - Executable

Turns out that the Wikipedia entry on "self-modifying code" has a great list:

> 1. Semi-automatic optimization of a state dependent loop.
> 2. Runtime code generation, or specialization of an algorithm in runtime or loadtime (which is popular, for example, in the domain of real-time graphics) such as a general sort utility preparing code to perform the key comparison described in a specific invocation.
> 3. Altering of inlined state of an object, or simulating the high-level construction of closures.
> 4. Patching of subroutine address calling, as done usually at load time of dynamic libraries, or, on each invocation patching the subroutine's internal references to its parameters so as to use their actual addresses. Whether this is regarded as 'self-modifying code' or not is a case of terminology.
> 5. Evolutionary computing systems such as genetic programming.
> 6. Hiding of code to prevent reverse engineering, as through use of a disassembler or debugger.
> 7. Hiding of code to evade detection by virus/spyware scanning software and the like.
> 8. Filling 100% of memory (in some architectures) with a rolling pattern of repeating opcodes, to erase all programs and data, or to burn-in hardware.
> 9. Compression of code to be decompressed and executed at runtime, e.g., when memory or disk space is limited.
> 10. Some very limited instruction sets leave no option but to use self-modifying code to achieve certain functionality. For example, a "One Instruction Set Computer" machine that uses only the subtract-and-branch-if-negative "instruction" cannot do an indirect copy (something like the equivalent of "*a = **b" in the C programming language) without using self-modifying code.
> 12. Altering instructions for fault-tolerance

On the point about thwarting hackers using self-modifying code:

Over the course of several firmware updates, DirecTV slowly assembled a program on their smart cards to destroy cards that had been hacked to receive channels without paying. See Jeff's Coding Horror article on the Black Sunday Hack for more information.

Solution 2 - Executable

I've seen self-modifying code used for:

  1. speed optimisation, by having the program write more code for itself on the fly

  2. obfuscation, to make reverse engineering much harder
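As a rough illustration of the first point, here is a minimal Python sketch of a program writing code for itself on the fly. The helper name `make_poly` and the choice of polynomial evaluation are assumptions made for the example; the answer does not say what kind of code was generated.

```python
def make_poly(coeffs):
    """Generate a function that evaluates a polynomial with its
    coefficients baked into the source (highest degree first)."""
    # Build a Horner-scheme expression such as "((2) * x + 0) * x + -3".
    expr = repr(coeffs[0])
    for c in coeffs[1:]:
        expr = f"({expr}) * x + {c!r}"
    source = f"def poly(x):\n    return {expr}\n"

    namespace = {}
    exec(source, namespace)  # compile the generated source at run time
    return namespace["poly"], source


# Specialise for 2x^2 - 3; the generated function has no loop and no
# coefficient lookups left in it.
poly, source = make_poly([2, 0, -3])
print(source)
print(poly(2))
```

Native-code systems do the same thing with machine instructions rather than source text, but the idea of specialising code to the data at hand is the same.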

Solution 3 - Executable

In former times, when RAM was limited, self-modifying code was used to save memory. Nowadays, executable compression utilities such as UPX use it to decompress and patch the program's own code after the compressed image of the application has been loaded.
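A loose Python analogue of the unpack-then-run idea, assuming nothing about how UPX itself works internally: the code is stored compressed and only becomes executable after it is decompressed at run time. The names `packed` and `load_packed` are made up for the sketch.

```python
import zlib

# The "application" as it would be shipped: compressed, not directly runnable.
original_source = "def greet(name):\n    return 'hello, ' + name\n"
packed = zlib.compress(original_source.encode("utf-8"))


def load_packed(blob):
    """Decompress a code image and execute it into a fresh namespace."""
    source = zlib.decompress(blob).decode("utf-8")
    namespace = {}
    exec(compile(source, "<unpacked>", "exec"), namespace)
    return namespace


module = load_packed(packed)
print(module["greet"]("world"))
```

A real packer operates on native machine code and overwrites memory in place; this sketch only mirrors the order of operations.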

Solution 4 - Executable

Because the Commodore 64 doesn't have many registers and has a 1 MHz processor, when you need to read a memory address offset by a value it is easier to modify the instruction itself.

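@Reader:
LDA $C000      ; load from the address stored in this instruction's operand
STA $D020      ; write the value to the border colour register
INC Reader+1   ; bump the low byte of the LDA operand (the self-modification)
JMP Reader     ; repeat with the patched address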

That's the last time I wrote self-modifying code anyway :-)
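For readers who don't know 6502 assembly, the loop above can be imitated with a toy interpreter in Python. This is only a sketch under simplifying assumptions: the instruction names `OUT` and `INC` are invented stand-ins, the program is held as a mutable list rather than real memory, and the 8-bit wrap-around of `INC` is ignored.

```python
def run_reader(data, count):
    """Run a four-instruction program that patches its own load operand."""
    # Each instruction is [opcode, operand] and lives in mutable "memory".
    program = [
        ["LDA", 0],     # operand = address to read; patched on every pass
        ["OUT", None],  # stands in for STA $D020
        ["INC", 0],     # increment the operand of instruction 0
        ["JMP", 0],     # loop back to the (now modified) load
    ]
    acc, pc, out = 0, 0, []
    while len(out) < count:
        op, arg = program[pc]
        pc += 1
        if op == "LDA":
            acc = data[arg]
        elif op == "OUT":
            out.append(acc)
        elif op == "INC":
            program[arg][1] += 1  # the self-modification
        elif op == "JMP":
            pc = arg
    return out


print(run_reader([7, 8, 9], 3))
```

The load instruction never changes its opcode; only its address operand walks through memory, which is exactly what `INC Reader+1` does.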

Solution 5 - Executable

Artificial Intelligence?

Solution 6 - Executable

Because it's really really cool, and sometimes that's reason enough.

Solution 7 - Executable

1960s-era assembly languages used self-modifying code to implement function calls without a stack.

Knuth, v1, 1ed p.182:

MAX100  STJ   EXIT   ;Subroutine linkage
        ENT3  100    ;M1. Initialize
        JMP   2F
1H      CMPA  X,3    ;M3. Compare
        JGE   *+3
2H      ENT2  0,3    ;M4. Change m
        LDA   X,3    ;(New maximum found)
        DEC3  1      ;M5. Decrease k
        J3P   1B     ;M2. All tested?
EXIT    JMP   *      ;Return to main program

> In a larger program containing this coding as a subroutine, the single instruction "JMP MAX100" would cause register A to be set to the current maximum value of locations X + 1 through X + 100, and the position of the maximum would appear in rI2. Subroutine linkage in this case is achieved by the instructions "MAX100 STJ EXIT" and, later, "EXIT JMP *". Because of the way the J-register operates, the exit instruction will then jump to the location following the place where the original reference to MAX100 was made.

Edit: It may be hard to see what's going on, even with the brief explanation here. In the line MAX100 STJ EXIT, MAX100 is a label for the instruction (and thus for the procedure as a whole), STJ means store the jump register (which holds where we just came from), and EXIT names the memory location that is the target of the store. EXIT, we see later, is the label for the last instruction. So it's overwriting code! But many instructions (including STJ here) implicitly overwrite only the operand (address) portion of the instruction word. So the JMP opcode remains untouched, and the * is a dummy token: there's really nothing meaningful to put there, since it would only get overwritten.
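The linkage trick can be sketched with a toy machine in Python. This is not MIX: the opcodes `NOTE` and `HLT` and the list-based program layout are assumptions made for the illustration. Only the behaviour of `JMP` (which records the following address in the jump register) and `STJ` (which stores that register into another instruction's operand) follows the description above.

```python
def run(program, start=0):
    """Execute a toy program where STJ patches the subroutine's exit jump."""
    pc, rj, trace = start, 0, []
    while True:
        op, arg = program[pc]
        next_pc = pc + 1
        if op == "JMP":
            rj = pc + 1            # jump register remembers the return point
            next_pc = arg
        elif op == "STJ":
            program[arg][1] = rj   # overwrite only the operand of the target
        elif op == "NOTE":
            trace.append(arg)
        elif op == "HLT":
            return trace
        pc = next_pc


program = [
    ["JMP", 5],        # 0: first call
    ["NOTE", "ret1"],  # 1
    ["JMP", 5],        # 2: second call
    ["NOTE", "ret2"],  # 3
    ["HLT", None],     # 4
    ["STJ", 7],        # 5: subroutine entry, like "MAX100 STJ EXIT"
    ["NOTE", "sub"],   # 6: subroutine body
    ["JMP", None],     # 7: like "EXIT JMP *", operand filled in by STJ
]
print(run(program))
```

The same exit instruction returns to two different callers because its operand is rewritten on every entry, so no stack is needed (and, as a consequence, the subroutine cannot be recursive).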


Self-modifying code is also used where register-indirect addressing is not available, and yet the address you need is sitting right there in the register. PDP-1 LISP:

dap .+1  ;deposit address part of accumulator in (IP+1)
lac xy   ;load accumulator with (ADDRESS) [xy is a dummy symbol, just like * above]

These two instructions perform ACC := (ACC) by modifying the operand of the load instruction.

Modifications like these are relatively safe, and on antique architectures, they are necessary.

Solution 8 - Executable

Lots of reasons. Off the top of my head:

  • Runtime class construction and metaprogramming. For example, having a class factory that takes a connection to an SQL table and generates a client class specialized for that table (with accessors for the columns, find methods, etc.).

  • Then of course there's the famous bitblt example, and the regexp analogs.

  • Dynamic optimization based on run-time information, à la tracing JITs.

  • Subtype specialization of Ada-style generic functions in an accretive environment.

-- MarkusQ
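The class-factory idea from the first bullet might look something like this in Python. It is a sketch, not the setup the answer had in mind: the helper `make_row_class`, the `users` table, and the use of an in-memory SQLite database are all assumptions chosen so the example is self-contained.

```python
import sqlite3


def make_row_class(conn, table):
    """Build a class at run time whose attributes mirror a table's columns."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # NOTE: the table name is interpolated directly, so it must be trusted.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    def __init__(self, *values):
        for name, value in zip(columns, values):
            setattr(self, name, value)

    def find(cls, **criteria):
        where = " AND ".join(f"{col} = ?" for col in criteria)
        sql = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
        return [cls(*row) for row in conn.execute(sql, tuple(criteria.values()))]

    # type(name, bases, namespace) creates the class dynamically.
    return type(table.capitalize(), (object,), {
        "__init__": __init__,
        "find": classmethod(find),
        "columns": tuple(columns),
    })


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

Users = make_row_class(conn, "users")
print(Users.columns, Users.find(name="ada")[0].id)
```

Nothing about `Users` exists in the source file; its shape is decided by the schema the program finds when it runs.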

Solution 9 - Executable

Dynamic linking is a kind of self-modification (patching absolute and/or relative jump locations) ... that's normally done by the O/S's program loader, though.
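One flavour of this is lazy binding, where a call slot initially points at a resolver stub that patches the slot with the real target the first time it is used. The Python sketch below is only an analogy under stated assumptions: `table`, `make_stub`, and the `calls` counter are invented for the example, and a dictionary stands in for the loader's jump table.

```python
import importlib

calls = {"resolver": 0}
table = {}  # stands in for the table of patched jump targets


def make_stub(slot, module_name, attr):
    """Return a stub that resolves the real function and patches the slot."""
    def stub(*args):
        calls["resolver"] += 1
        target = getattr(importlib.import_module(module_name), attr)
        table[slot] = target  # later calls bypass the stub entirely
        return target(*args)
    return stub


table["sqrt"] = make_stub("sqrt", "math", "sqrt")
first = table["sqrt"](9.0)    # goes through the stub, which patches the slot
second = table["sqrt"](16.0)  # goes straight to math.sqrt
print(first, second, calls["resolver"])
```

The caller's code never changes; only the slot it jumps through is rewritten, which is why some people don't count this as self-modification at all.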

Solution 10 - Executable

Neural networks are kind of self-modifying code.

Then there are evolutionary algorithms which modify themselves.

Solution 11 - Executable

Mike Abrash described the Pixomatic code generator in Dr. Dobb's Journal a while back: http://www.ddj.com/architect/184405807 . That's a software 3D rasterizer, DX7(?) compatible.

Solution 12 - Executable

LOL - I've written self-modifying code on two occasions:

  1. when first learning assembly language, before I understood indirect indexed access
  2. accidentally, as pointer bugs in assembly language and C

I can imagine that there may be scenarios where self-modifying code would be more efficient than the alternatives, but nothing obvious leaps to mind. In general, this is something to avoid (debugging nightmare, etc.) unless you are deliberately trying to obfuscate, as mentioned above.

Solution 13 - Executable

Applications which implement their own scripting languages often do this. For example, database servers often compile stored procedures (or queries) this way.

Solution 14 - Executable

Dynamic code generation in SwiftShader is a form of self modifying code that enables it to efficiently implement Direct3D 9 on the CPU.

Solution 15 - Executable

Modern example: I have a script that needs a JWT token to work. Requesting a token requires either an interactive login or a refresh token, and a new refresh token is issued with each new JWT. It would be nice to store the refresh token in the script itself and update it each time the script is executed.
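A minimal sketch of how such a script could rewrite its own token line, assuming the token is kept in a single top-level assignment named `REFRESH_TOKEN` (that name, the helper `store_token_in_script`, and the demo file are all hypothetical). The demonstration runs against a throwaway file; a real script would pass its own `__file__`.

```python
import re
import tempfile
from pathlib import Path

# Matches the whole line holding the token assignment.
TOKEN_LINE = re.compile(r"^REFRESH_TOKEN = .*$", re.MULTILINE)


def store_token_in_script(script_path, new_token):
    """Rewrite the REFRESH_TOKEN assignment inside the given script file."""
    path = Path(script_path)
    source = path.read_text(encoding="utf-8")
    # A function replacement avoids backslash processing of the new text.
    updated, count = TOKEN_LINE.subn(
        lambda m: f"REFRESH_TOKEN = {new_token!r}", source, count=1
    )
    if count != 1:
        raise ValueError("no REFRESH_TOKEN line found")
    path.write_text(updated, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "client.py"
    demo.write_text(
        'REFRESH_TOKEN = "old"\nprint("using", REFRESH_TOKEN)\n',
        encoding="utf-8",
    )
    store_token_in_script(demo, "new-token")
    rewritten = demo.read_text(encoding="utf-8")

print(rewritten)
```

Keeping a secret inside the script means the file's permissions and any version-control history need the same care as a credentials file would.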

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Niyaz | View Question on Stackoverflow |
| Solution 1 - Executable | Zach Scrivena | View Answer on Stackoverflow |
| Solution 2 - Executable | Alnitak | View Answer on Stackoverflow |
| Solution 3 - Executable | Kosi2801 | View Answer on Stackoverflow |
| Solution 4 - Executable | Peter Morris | View Answer on Stackoverflow |
| Solution 5 - Executable | Al Katawazi | View Answer on Stackoverflow |
| Solution 6 - Executable | Bruce McGee | View Answer on Stackoverflow |
| Solution 7 - Executable | luser droog | View Answer on Stackoverflow |
| Solution 8 - Executable | MarkusQ | View Answer on Stackoverflow |
| Solution 9 - Executable | ChrisW | View Answer on Stackoverflow |
| Solution 10 - Executable | Georg Schölly | View Answer on Stackoverflow |
| Solution 11 - Executable | MSN | View Answer on Stackoverflow |
| Solution 12 - Executable | Steven A. Lowe | View Answer on Stackoverflow |
| Solution 13 - Executable | Craig Stuntz | View Answer on Stackoverflow |
| Solution 14 - Executable | Nick | View Answer on Stackoverflow |
| Solution 15 - Executable | Krzysiek Minior | View Answer on Stackoverflow |