how does malloc understand alignment?

C++CMemory Alignment

C++ Problem Overview


following excerpted from here

> pw = (widget *)malloc(sizeof(widget)); > > allocates raw storage. Indeed, the malloc call allocates storage > that's big enough and suitably aligned to hold an object of type > widget

also see fast pImpl from herb sutter, he said:

> Alignment. Any memory Alignment. Any memory that's allocated > dynamically via new or malloc is guaranteed to be properly aligned for > objects of any type, but buffers that are not allocated dynamically > have no such guarantee

I am curious about this, how does malloc know alignment of the custom type?

C++ Solutions


Solution 1 - C++

Alignment requirements are recursive: The alignment of any struct is simply the largest alignment of any of its members, and this is understood recursively.

For example, and assuming that each fundamental type's alignment equals its size (this is not always true in general), the struct X { int; char; double; } has the alignment of double, and it will be padded to be a multiple of the size of double (e.g. 4 (int), 1 (char), 3 (padding), 8 (double)). The struct Y { int; X; float; } has the alignment of X, which is the largest and equal to the alignment of double, and Y is laid out accordingly: 4 (int), 4 (padding), 16 (X), 4 (float), 4 (padding).

(All numbers are just examples and could differ on your machine.)

Therefore, by breaking it down to the fundamental types, we only need to know a handful of fundamental alignments, and among those there is a well-known largest. C++ even defines a type max_align_t whose alignment is that largest alignment.

All malloc() needs to do is to pick an address that's a multiple of that value.

Solution 2 - C++

I think the most relevant part of the Herb Sutter quote is the part I've marked in bold:

> Alignment. Any memory Alignment. Any memory that's allocated dynamically via new or malloc is guaranteed to be properly aligned for objects of any type, but buffers that are not allocated dynamically have no such guarantee

It doesn't have to know what type you have in mind, because it's aligning for any type. On any given system, there's a maximum alignment size that's ever necessary or meaningful; for example, a system with four-byte words will likely have a maximum of four-byte alignment.

This is also made clear by the malloc(3) man-page, which says in part:

> The malloc() and calloc() functions return a pointer to the allocated memory that is suitably aligned for any kind of variable.

Solution 3 - C++

The only information that malloc() can use is the size of the request passed to it. In general, it might do something like round up the passed size to the nearest greater (or equal) power of two, and align the memory based on that value. There would likely also be an upper bound on the alignment value, such as 8 bytes.

The above is a hypothetical discussion, and the actual implementation depends on the machine architecture and runtime library that you're using. Maybe your malloc() always returns blocks aligned on 8 bytes and it never has to do anything different.

Solution 4 - C++

  1. Align to the least common multiple of all alignments. e.g. if ints require 4 byte alignment, but pointers require 8, then allocate everything to 8 byte alignment. This causes everything to be aligned.

  2. Use the size argument to determine correct alignment. For small sizes you can infer the type, such as malloc(1) (assuming other types sizes are not 1) is always a char. C++ new has the benefit of being type safe and so can always make alignment decisions this way.

Solution 5 - C++

Previous to C++11 alignment was treated fairly simple by using the largest alignment where exact value was unknown and malloc/calloc still work this way. This means malloc allocation is correctly aligned for any type.

Wrong alignment may result in undefined behavior according to the standard but I have seen x86 compilers being generous and only punishing with lower performance.

Note that you also can tweak alignment via compiler options or directives. (pragma pack for VisualStudio for example).

But when it comes to placement new, then C++11 brings us new keywords called alignof and alignas. Here is some code which shows the effect if compiler max alignment is greater then 1. The first placement new below is automatically good but not the second.

#include <iostream>
#include <malloc.h>
using namespace std;
int main()
{
        struct A { char c; };
        struct B { int i; char c; };

        unsigned char * buffer = (unsigned char *)malloc(1000000);
        long mp = (long)buffer;

        // First placment new
        long alignofA = alignof(A) - 1;
        cout << "alignment of A: " << std::hex << (alignofA + 1) << endl;
        cout << "placement address before alignment: " << std::hex << mp << endl;
        if (mp&alignofA)
        {
            mp |= alignofA;
            ++mp;
        }
        cout << "placement address after alignment : " << std::hex <<mp << endl;
        A * a = new((unsigned char *)mp)A;
        mp += sizeof(A);

        // Second placment new
        long alignofB = alignof(B) - 1;
        cout << "alignment of B: " <<  std::hex << (alignofB + 1) << endl;
        cout << "placement address before alignment: " << std::hex << mp << endl;
        if (mp&alignofB)
        {
            mp |= alignofB;
            ++mp;
        }
        cout << "placement address after alignment : " << std::hex << mp << endl;
        B * b = new((unsigned char *)mp)B;
        mp += sizeof(B);
}

I guess performance of this code can be improved with some bitwise operations.

EDIT: Replaced expensive modulo computation with bitwise operations. Still hoping that somebody finds something even faster.

Solution 6 - C++

malloc has no knowledge of what it is allocating for because its parameter is just total size. It just aligns to an alignment that is safe for any object.

Solution 7 - C++

You might find out the allocation bits for your malloc()-implementation with this small C-program:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	size_t
		find = 0,
		size;
	for( unsigned i = 1000000; i--; )
		if( size = rand() & 127 )
			find |= (size_t)malloc( size );
	char bits = 0;
	for( ; !(find & 1); find >>= 1, ++bits );
	printf( "%d", (int)bits );
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionChangView Question on Stackoverflow
Solution 1 - C++Kerrek SBView Answer on Stackoverflow
Solution 2 - C++ruakhView Answer on Stackoverflow
Solution 3 - C++Greg HewgillView Answer on Stackoverflow
Solution 4 - C++PubbyView Answer on Stackoverflow
Solution 5 - C++Patrick FrombergView Answer on Stackoverflow
Solution 6 - C++John PaulView Answer on Stackoverflow
Solution 7 - C++Bonita MonteroView Answer on Stackoverflow