C/C++ maximum stack size of program on mainstream OSes

C++ Problem Overview

I want to do DFS on a 100 X 100 array. (Say elements of array represents graph nodes) So assuming worst case, depth of recursive function calls can go upto 10000 with each call taking upto say 20 bytes. So is it feasible means is there a possibility of stackoverflow?

What is the maximum size of stack in C/C++?

> Please specify for gcc for both
> 1) cygwin on Windows
> 2) Unix

What are the general limits?

C++ Solutions

Solution 1 - C++

In Visual Studio the default stack size is 1 MB i think, so with a recursion depth of 10,000 each stack frame can be at most ~100 bytes which should be sufficient for a DFS algorithm.

Most compilers including Visual Studio let you specify the stack size. On some (all?) linux flavours the stack size isn't part of the executable but an environment variable in the OS. You can then check the stack size with ulimit -s and set it to a new value with for example ulimit -s 16384.

Here's a link with default stack sizes for gcc.

DFS without recursion:

std::stack<Node> dfs;
dfs.push(start);
do {
    Node top = dfs.top();
    if (top is what we are looking for) {
       break;
    }
    dfs.pop();
    for (outgoing nodes from top) {
        dfs.push(outgoing node);
    }
} while (!dfs.empty())

Solution 2 - C++

Stacks for threads are often smaller. You can change the default at link time, or change at run time also. For reference, some defaults are:

glibc i386, x86_64: 7.4 MB
Tru64 5.1: 5.2 MB
Cygwin: 1.8 MB
Solaris 7..10: 1 MB
MacOS X 10.5: 460 KB
AIX 5: 98 KB
OpenBSD 4.0: 64 KB
HP-UX 11: 16 KB

Solution 3 - C++

Platform-dependent, toolchain-dependent, ulimit-dependent, parameter-dependent.... It is not at all specified, and there are many static and dynamic properties that can influence it.

Solution 4 - C++

Yes, there is a possibility of stack overflow. The C and C++ standard do not dictate things like stack depth, those are generally an environmental issue.

Most decent development environments and/or operating systems will let you tailor the stack size of a process, either at link or load time.

You should specify which OS and development environment you're using for more targeted assistance.

For example, under Ubuntu Karmic Koala, the default for gcc is 2M reserved and 4K committed but this can be changed when you link the program. Use the --stack option of ld to do that.

Solution 5 - C++

I just ran out of stack at work, it was a database and it was running some threads, basically the previous developer had thrown a big array on the stack, and the stack was low anyway. The software was compiled using Microsoft Visual Studio 2015.

Even though the thread had run out of stack, it silently failed and continued on, it only stack overflowed when it came to access the contents of the data on the stack.

The best advice i can give is to not declare arrays on the stack - especially in complex applications and particularly in threads, instead use heap. That's what it's there for ;)

Also just keep in mind it may not fail immediately when declaring the stack, but only on access. My guess is that the compiler declares stack under windows "optimistically", i.e. it will assume that the stack has been declared and is sufficiently sized until it comes to use it and then finds out that the stack isn't there.

Different operating systems may have different stack declaration policies. Please leave a comment if you know what these policies are.

Solution 6 - C++

I am not sure what you mean by doing a depth first search on a rectangular array, but I assume you know what you are doing.

If the stack limit is a problem you should be able to convert your recursive solution into an iterative solution that pushes intermediate values onto a stack which is allocated from the heap.

Solution 7 - C++

(Added 26 Sept. 2020)

On 24 Oct. 2009, as @pixelbeat first pointed out here, Bruno Haible empirically discovered the following default thread stack sizes for several systems. He said that in a multithreaded program, "the default thread stack size is" as follows. I added in the "Actual" size column because @Peter.Cordes indicates in his comments below my answer, however, that the odd tested numbers shown below do not include all of the thread stack, since some of it was used in initialization. If I run ulimit -s to see "the maximum stack size" that my Linux computer is configured for, it outputs 8192 kB, which is exactly 8 MB, not the odd 7.4 MB listed in the table below for my x86-64 computer with the gcc compiler and glibc. So, you can probably add a little to the numbers in the table below to get the actual full stack size for a given thread.

Note also that the below "Tested" column units are all in MB and KB (base 1000 numbers), NOT MiB and KiB (base 1024 numbers). I've proven this to myself by verifying the 7.4 MB case.

Thread stack sizes

System and std library  Tested  Actual
----------------------  ------  ------
- glibc i386, x86_64    7.4 MB  8 MiB (8192 KiB, as shown by `ulimit -s`)
- Tru64 5.1             5.2 MB  ?
- Cygwin                1.8 MB  ?
- Solaris 7..10           1 MB  ?
- MacOS X 10.5          460 KB  ?
- AIX 5                  98 KB  ?
- OpenBSD 4.0            64 KB  ?
- HP-UX 11               16 KB  ?

Bruno Haible also stated that:

> 32 KB is more than you can safely allocate on the stack in a multithreaded program

And he said:

> And the default stack size for sigaltstack, SIGSTKSZ, is > - only 16 KB on some platforms: IRIX, OSF/1, Haiku. > - only 8 KB on some platforms: glibc, NetBSD, OpenBSD, HP-UX, Solaris. > - only 4 KB on some platforms: AIX. > > Bruno

He wrote the following simple Linux C program to empirically determine the above values. You can run it on your system today to quickly see what your maximum thread stack size is, or you can run it online on GDBOnline here: https://onlinegdb.com/rkO9JnaHD.

Explanation: It simply creates a single new thread, so as to check the thread stack size and NOT the program stack size, in case they differ, then it has that thread repeatedly allocate 128 bytes of memory on the stack (NOT the heap), using the Linux alloca() call, after which it writes a 0 to the first byte of this new memory block, and then it prints out how many total bytes it has allocated. It repeats this process, allocating 128 more bytes on the stack each time, until the program crashes with a Segmentation fault (core dumped) error. The last value printed is the estimated maximum thread stack size allowed for your system.

Important note: alloca() allocates on the stack: even though this looks like dynamic memory allocation onto the heap, similar to a malloc() call, alloca() does NOT dynamically allocate onto the heap. Rather, alloca() is a specialized Linux function to "pseudo-dynamically" (I'm not sure what I'd call this, so that's the term I chose) allocate directly onto the stack as though it was statically-allocated memory. Stack memory used and returned by alloca() is scoped at the function-level, and is therefore "automatically freed when the function that called alloca() returns to its caller." That's why its static scope isn't exited and memory allocated by alloca() is NOT freed each time a for loop iteration is completed and the end of the for loop scope is reached. See man 3 alloca for details. Here's the pertinent quote (emphasis added):

> DESCRIPTION
> The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller. > > RETURN VALUE
> The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.

Here is Bruno Haible's program from 24 Oct. 2009, copied directly from the GNU mailing list here:

Again, you can run it live online here.

// By Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html

// =============== Program for determining the default thread stack size =========

#include <alloca.h>
#include <pthread.h>
#include <stdio.h>
void* threadfunc (void*p) {
  int n = 0;
  for (;;) {
    printf("Allocated %d bytes\n", n);
    fflush(stdout);
    n += 128;
    *((volatile char *) alloca(128)) = 0;
  }
}

int main()
{
  pthread_t thread;
  pthread_create(&thread, NULL, threadfunc, NULL);
  for (;;) {}
}

When I run it on GDBOnline using the link above, I get the exact same results each time I run it, as both a C and a C++17 program. It takes about 10 seconds or so to run. Here are the last several lines of the output:

> Allocated 7449856 bytes > Allocated 7449984 bytes > Allocated 7450112 bytes > Allocated 7450240 bytes > Allocated 7450368 bytes > Allocated 7450496 bytes > Allocated 7450624 bytes > Allocated 7450752 bytes > Allocated 7450880 bytes > Segmentation fault (core dumped)

So, the thread stack size is ~7.45 MB for this system, as Bruno mentioned above (7.4 MB).

I've made a few changes to the program, mostly just for clarity, but also for efficiency, and a bit for learning.

Summary of my changes:

[learning] I passed in BYTES_TO_ALLOCATE_EACH_LOOP as an argument to the threadfunc() just for practice passing in and using generic void* arguments in C.
1. Note: This is also the required function prototype, as required by the pthread_create() function, for the callback function (threadfunc() in my case) passed to pthread_create(). See: https://www.man7.org/linux/man-pages/man3/pthread_create.3.html.
[efficiency] I made the main thread sleep instead of wastefully spinning.
[clarity] I added more-verbose variable names, such as BYTES_TO_ALLOCATE_EACH_LOOP and bytes_allocated.

[clarity] I changed this:

 *((volatile char *) alloca(128)) = 0;

to this:

 volatile uint8_t * byte_buff = 
         (volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
 byte_buff[0] = 0;

Here is my modified test program, which does exactly the same thing as Bruno's, and even has the same results:

You can run it online here, or download it from my repo here. If you choose to run it locally from my repo, here's the build and run commands I used for testing:

Build and run it as a C program:

 mkdir -p bin && \
 gcc -Wall -Werror -g3 -O3 -std=c11 -pthread -o bin/tmp \
 onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
 time bin/tmp

Build and run it as a C++ program:

 mkdir -p bin && \
 g++ -Wall -Werror -g3 -O3 -std=c++17 -pthread -o bin/tmp \
 onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
 time bin/tmp

It takes < 0.5 seconds to run locally on a fast computer with a thread stack size of ~7.4 MB.

Here's the program:

// =============== Program for determining the default thread stack size =========

// Modified by Gabriel Staples, 26 Sept. 2020

// Originally by Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html

#include <alloca.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h> // sleep

/// Thread function to repeatedly allocate memory within a thread, printing
/// the total memory allocated each time, until the program crashes. The last
/// value printed before the crash indicates how big a thread's stack size is.
///
/// Note: passing in a `uint32_t` as a `void *` type here is for practice,
/// to learn how to pass in ANY type to a func by using a `void *` parameter.
/// This is also the required function prototype, as required by the
/// `pthread_create()` function, for the callback function (this function)
/// passed to `pthread_create()`. See:
/// https://www.man7.org/linux/man-pages/man3/pthread_create.3.html
void* threadfunc(void* bytes_to_allocate_each_loop)
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP =
            *(uint32_t*)bytes_to_allocate_each_loop;

    uint32_t bytes_allocated = 0;
    while (true)
    {
        printf("bytes_allocated = %u\n", bytes_allocated);
        fflush(stdout);
        // NB: it appears that you don't necessarily need `volatile` here,
        // but you DO definitely need to actually use (ex: write to) the
        // memory allocated by `alloca()`, as we do below, or else the
        // `alloca()` call does seem to get optimized out on some systems,
        // making this whole program just run infinitely forever without
        // ever hitting the expected segmentation fault.
        volatile uint8_t * byte_buff =
                (volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
        byte_buff[0] = 0;
        bytes_allocated += BYTES_TO_ALLOCATE_EACH_LOOP;
    }
}

int main()
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP = 128;

    pthread_t thread;
    pthread_create(&thread, NULL, threadfunc,
                   (void*)(&BYTES_TO_ALLOCATE_EACH_LOOP));
    while (true)
    {
        const unsigned int SLEEP_SEC = 10000;
        sleep(SLEEP_SEC);
    }

    return 0;
}

Sample output (same results as Bruno Haible's original program):

> bytes_allocated = 7450240
> bytes_allocated = 7450368
> bytes_allocated = 7450496
> bytes_allocated = 7450624
> bytes_allocated = 7450752
> bytes_allocated = 7450880
> Segmentation fault (core dumped)

Content Type	Original Author	Original Content on Stackoverflow
Question	avd	View Question on Stackoverflow
Solution 1 - C++	Andreas Brinck	View Answer on Stackoverflow
Solution 2 - C++	pixelbeat	View Answer on Stackoverflow
Solution 3 - C++	DrPizza	View Answer on Stackoverflow
Solution 4 - C++	paxdiablo	View Answer on Stackoverflow
Solution 5 - C++	Owl	View Answer on Stackoverflow
Solution 6 - C++	Dave Kirby	View Answer on Stackoverflow
Solution 7 - C++	Gabriel Staples	View Answer on Stackoverflow