A way to determine a process's "real" memory usage, i.e. private dirty RSS?

LinuxMacosMemoryMemory ManagementFreebsd

Linux Problem Overview


Tools like 'ps' and 'top' report various kinds of memory usages, such as the VM size and the Resident Set Size. However, none of those are the "real" memory usage:

  • Program code is shared between multiple instances of the same program.
  • Shared library program code is shared between all processes that use that library.
  • Some apps fork off processes and share memory with them (e.g. via shared memory segments).
  • The virtual memory system makes the VM size report pretty much useless.
  • RSS is 0 when a process is swapped out, making it not very useful.
  • Etc etc.

I've found that the private dirty RSS, as reported by Linux, is the closest thing to the "real" memory usage. This can be obtained by summing all Private_Dirty values in /proc/somepid/smaps.

However, do other operating systems provide similar functionality? If not, what are the alternatives? In particular, I'm interested in FreeBSD and OS X.

Linux Solutions


Solution 1 - Linux

On OSX the Activity Monitor gives you actually a very good guess.

Private memory is for sure memory that is only used by your application. E.g. stack memory and all memory dynamically reserved using malloc() and comparable functions/methods (alloc method for Objective-C) is private memory. If you fork, private memory will be shared with you child, but marked copy-on-write. That means as long as a page is not modified by either process (parent or child) it is shared between them. As soon as either process modifies any page, this page is copied before it is modified. Even while this memory is shared with fork children (and it can only be shared with fork children), it is still shown as "private" memory, because in the worst case, every page of it will get modified (sooner or later) and then it is again private to each process again.

Shared memory is either memory that is currently shared (the same pages are visible in the virtual process space of different processes) or that is likely to become shared in the future (e.g. read-only memory, since there is no reason for not sharing read-only memory). At least that's how I read the source code of some command line tools from Apple. So if you share memory between processes using mmap (or a comparable call that maps the same memory into multiple processes), this would be shared memory. However the executable code itself is also shared memory, since if another instance of your application is started there is no reason why it may not share the code already loaded in memory (executable code pages are read-only by default, unless you are running your app in a debugger). Thus shared memory is really memory used by your application, just like private one, but it might additionally be shared with another process (or it might not, but why would it not count towards your application if it was shared?)

Real memory is the amount of RAM currently "assigned" to your process, no matter if private or shared. This can be exactly the sum of private and shared, but usually it is not. Your process might have more memory assigned to it than it currently needs (this speeds up requests for more memory in the future), but that is no loss to the system. If another process needs memory and no free memory is available, before the system starts swapping, it will take that extra memory away from your process and assign it another process (which is a fast and painless operation); therefor your next malloc call might be somewhat slower. Real memory can also be smaller than private and physical memory; this is because if your process requests memory from the system, it will only receive "virtual memory". This virtual memory is not linked to any real memory pages as long as you don't use it (so malloc 10 MB of memory, use only one byte of it, your process will get only a single page, 4096 byte, of memory assigned - the rest is only assigned if you actually ever need it). Further memory that is swapped may not count towards real memory either (not sure about this), but it will count towards shared and private memory.

Virtual memory is the sum of all address blocks that are consider valid in your apps process space. These addresses might be linked to physical memory (that is again private or shared), or they might not, but in that case they will be linked to physical memory as soon as you use the address. Accessing memory addresses outside of the known addresses will cause a SIGBUS and your app will crash. When memory is swapped, the virtual address space for this memory remains valid and accessing those addresses causes memory to be swapped back in.

Conclusion:
If your app does not explicitly or implicitly use shared memory, private memory is the amount of memory your app needs because of the stack size (or sizes if multithreaded) and because of the malloc() calls you made for dynamic memory. You don't have to care a lot for shared or real memory in that case.

If your app uses shared memory, and this includes a graphical UI, where memory is shared between your application and the WindowServer for example, then you might have a look at shared memory as well. A very high shared memory number may mean you have too many graphical resources loaded in memory at the moment.

Real memory is of little interest for app development. If it is bigger than the sum of shared and private, then this means nothing other than that the system is lazy at taken memory away from your process. If it is smaller, then your process has requested more memory than it actually needed, which is not bad either, since as long as you don't use all of the requested memory, you are not "stealing" memory from the system. If it is much smaller than the sum of shared and private, you may only consider to request less memory where possible, as you are a bit over-requesting memory (again, this is not bad, but it tells me that your code is not optimized for minimal memory usage and if it is cross platform, other platforms may not have such a sophisticated memory handling, so you may prefer to alloc many small blocks instead of a few big ones for example, or free memory a lot sooner, and so on).

If you are still not happy with all that information, you can get even more information. Open a terminal and run:

sudo vmmap <pid>

where is the process ID of your process. This will show you statistics for EVERY block of memory in your process space with start and end address. It will also tell you where this memory came from (A mapped file? Stack memory? Malloc'ed memory? A __DATA or __TEXT section of your executable?), how big it is in KB, the access rights and whether it is private, shared or copy-on-write. If it is mapped from a file, it will even give you the path to the file.

If you want only "actual" RAM usage, use

sudo vmmap -resident <pid>

Now it will show for every memory block how big the memory block is virtually and how much of it is really currently present in physical memory.

At the end of each dump is also an overview table with the sums of different memory types. This table looks like this for Firefox right now on my system:

REGION TYPE             [ VIRTUAL/RESIDENT]
===========             [ =======/========]
ATS (font support)      [   33.8M/   2496K]
CG backing stores       [   5588K/   5460K]
CG image                [     20K/     20K]
CG raster data          [    576K/    576K]
CG shared images        [   2572K/   2404K]
Carbon                  [   1516K/   1516K]
CoreGraphics            [      8K/      8K]
IOKit                   [  256.0M/      0K]
MALLOC                  [  256.9M/  247.2M]
Memory tag=240          [      4K/      4K]
Memory tag=242          [     12K/     12K]
Memory tag=243          [      8K/      8K]
Memory tag=249          [    156K/     76K]
STACK GUARD             [  101.2M/   9908K]
Stack                   [   14.0M/    248K]
VM_ALLOCATE             [   25.9M/   25.6M]
__DATA                  [   6752K/   3808K]
__DATA/__OBJC           [     28K/     28K]
__IMAGE                 [   1240K/    112K]
__IMPORT                [    104K/    104K]
__LINKEDIT              [   30.7M/   3184K]
__OBJC                  [   1388K/   1336K]
__OBJC/__DATA           [     72K/     72K]
__PAGEZERO              [      4K/      0K]
__TEXT                  [  108.6M/   63.5M]
__UNICODE               [    536K/    512K]
mapped file             [  118.8M/   50.8M]
shared memory           [    300K/    276K]
shared pmap             [   6396K/   3120K]

What does this tell us? E.g. the Firefox binary and all library it loads have 108 MB data together in their __TEXT sections, but currently only 63 MB of those are currently resident in memory. The font support (ATS) needs 33 MB, but only about 2.5 MB are really in memory. It uses a bit over 5 MB CG backing stores, CG = Core Graphics, those are most likely window contents, buttons, images and other data that is cached for fast drawing. It has requested 256 MB via malloc calls and currently 247 MB are really in mapped to memory pages. It has 14 MB space reserved for stacks, but only 248 KB stack space is really in use right now.

vmmap also has a good summary above the table

ReadOnly portion of Libraries: Total=139.3M resident=66.6M(48%) swapped_out_or_unallocated=72.7M(52%)
Writable regions: Total=595.4M written=201.8M(34%) resident=283.1M(48%) swapped_out=0K(0%) unallocated=312.3M(52%)

And this shows an interesting aspect of the OS X: For read only memory that comes from libraries, it plays no role if it is swapped out or simply unallocated; there is only resident and not resident. For writable memory this makes a difference (in my case 52% of all requested memory has never been used and is such unallocated, 0% of memory has been swapped out to disk).

The reason for that is simple: Read-only memory from mapped files is not swapped. If the memory is needed by the system, the current pages are simply dropped from the process, as the memory is already "swapped". It consisted only of content mapped directly from files and this content can be remapped whenever needed, as the files are still there. That way this memory won't waste space in the swap file either. Only writable memory must first be swapped to file before it is dropped, as its content wasn't stored on disk before.

Solution 2 - Linux

On Linux, you may want the PSS (proportional set size) numbers in /proc/self/smaps. A mapping's PSS is its RSS divided by the number of processes which are using that mapping.

Solution 3 - Linux

Top knows how to do this. It shows VIRT, RES and SHR by default on Debian Linux. VIRT = SWAP + RES. RES = CODE + DATA. SHR is the memory that may be shared with another process (shared library or other memory.)

Also, 'dirty' memory is merely RES memory that has been used, and/or has not been swapped.

It can be hard to tell, but the best way to understand is to look at a system that isn't swapping. Then, RES - SHR is the process exclusive memory. However, that's not a good way of looking at it, because you don't know that the memory in SHR is being used by another process. It may represent unwritten shared object pages that are only used by the process.

Solution 4 - Linux

You really can't.

I mean, shared memory between processes... are you going to count it, or not. If you don't count it, you are wrong; the sum of all processes' memory usage is not going to be the total memory usage. If you count it, you are going to count it twice- the sum's not going to be correct.

Me, I'm happy with RSS. And knowing you can't really rely on it completely...

Solution 5 - Linux

You can get private dirty and private clean RSS from /proc/pid/smaps

Solution 6 - Linux

Take a look at smem. It will give you PSS information

http://www.selenic.com/smem/

Solution 7 - Linux

Reworked this to be much cleaner, to demonstrate some proper best practices in bash, and in particular to use awk instead of bc.

find /proc/ -maxdepth 1 -name '[0-9]*' -print0 | while read -r -d $'\0' pidpath; do
  [ -f "${pidpath}/smaps" ] || continue
  awk '!/^Private_Dirty:/ {next;}
       $3=="kB" {pd += $2 * (1024^1); next}
       $3=="mB" {pd += $2 * (1024^2); next}
       $3=="gB" {pd += $2 * (1024^3); next}
       $3=="tB" {pd += $2 * (1024^4); next}
       $3=="pB" {pd += $2 * (1024^5); next}
       {print "ERROR!!  "$0 >"/dev/stderr"; exit(1)}
       END {printf("%10d: %d\n", '"${pidpath##*/}"', pd)}' "${pidpath}/smaps" || break
done

On a handy little container on my machine, with | sort -n -k 2 to sort the output, this looks like:

        56: 106496
         1: 147456
        55: 155648

Solution 8 - Linux

Use the mincore(2) system call. Quoting the man page:

DESCRIPTION
     The mincore() system call determines whether each of the pages in the
     region beginning at addr and continuing for len bytes is resident.  The
     status is returned in the vec array, one character per page.  Each
     character is either 0 if the page is not resident, or a combination of
     the following flags (defined in <sys/mman.h>):

Solution 9 - Linux

For a question that mentioned Freebsd, surprised no one wrote this yet :

If you want a linux style /proc/PROCESSID/status output, please do the following :

mount -t linprocfs none /proc
cat /proc/PROCESSID/status

Atleast in FreeBSD 7.0, the mounting was not done by default ( 7.0 is a much older release,but for something this basic,the answer was hidden in a mailing list!)

Solution 10 - Linux

Check it out, this is the source code of gnome-system-monitor, it thinks the memory "really used" by one process is sum(info->mem) of X Server Memory(info->memxserver) and Writable Memory(info->memwritable), the "Writable Memory" is the memory blocks which are marked as "Private_Dirty" in /proc/PID/smaps file.

Other than linux system, could be different way according to gnome-system-monitor code.

static void
get_process_memory_writable (ProcInfo *info)
{
    glibtop_proc_map buf;
    glibtop_map_entry *maps;

    maps = glibtop_get_proc_map(&buf, info->pid);

    gulong memwritable = 0;
    const unsigned number = buf.number;

    for (unsigned i = 0; i < number; ++i) {
#ifdef __linux__
        memwritable += maps[i].private_dirty;
#else
        if (maps[i].perm & GLIBTOP_MAP_PERM_WRITE)
            memwritable += maps[i].size;
#endif
    }

    info->memwritable = memwritable;

    g_free(maps);
}

static void
get_process_memory_info (ProcInfo *info)
{
    glibtop_proc_mem procmem;
    WnckResourceUsage xresources;

    wnck_pid_read_resource_usage (gdk_screen_get_display (gdk_screen_get_default ()),
                                  info->pid,
                                  &xresources);

    glibtop_get_proc_mem(&procmem, info->pid);

    info->vmsize	= procmem.vsize;
    info->memres	= procmem.resident;
    info->memshared	= procmem.share;

    info->memxserver = xresources.total_bytes_estimate;

    get_process_memory_writable(info);

    // fake the smart memory column if writable is not available
    info->mem = info->memxserver + (info->memwritable ? info->memwritable : info->memres);
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionHongliView Question on Stackoverflow
Solution 1 - LinuxMeckiView Answer on Stackoverflow
Solution 2 - LinuxJustin L.View Answer on Stackoverflow
Solution 3 - LinuxChrisView Answer on Stackoverflow
Solution 4 - LinuxalexView Answer on Stackoverflow
Solution 5 - LinuxAnkur AgarwalView Answer on Stackoverflow
Solution 6 - LinuxgheeseView Answer on Stackoverflow
Solution 7 - LinuxBMDanView Answer on Stackoverflow
Solution 8 - LinuxEdward Tomasz NapieralaView Answer on Stackoverflow
Solution 9 - LinuxArvindView Answer on Stackoverflow
Solution 10 - Linuxhttp8086View Answer on Stackoverflow