Which functions in the C standard library commonly encourage bad practice?

C Problem Overview

This is inspired by this question and the comments on one particular answer in that I learnt that strncpy is not a very safe string handling function in C and that it pads zeros, until it reaches n, something I was unaware of.

Specifically, to quote R..

> strncpy does not null-terminate, and > does null-pad the whole remainder of > the destination buffer, which is a > huge waste of time. You can work > around the former by adding your own > null padding, but not the latter. It > was never intended for use as a "safe > string handling" function, but for > working with fixed-size fields in Unix > directory tables and database files. > snprintf(dest, n, "%s", src) is the > only correct "safe strcpy" in standard > C, but it's likely to be a lot slower. > By the way, truncation in itself can > be a major bug and in some cases might > lead to privilege elevation or DoS, so > throwing "safe" string functions that > truncate their output at a problem is > not a way to make it "safe" or > "secure". Instead, you should ensure > that the destination buffer is the > right size and simply use strcpy (or > better yet, memcpy if you already know > the source string length).

And from Jonathan Leffler

> Note that strncat() is even more > confusing in its interface than > strncpy() - what exactly is that > length argument, again? It isn't what > you'd expect based on what you supply > strncpy() etc - so it is more error > prone even than strncpy(). For copying > strings around, I'm increasingly of > the opinion that there is a strong > argument that you only need memmove() > because you always know all the sizes > ahead of time and make sure there's > enough space ahead of time. Use > memmove() in preference to any of > strcpy(), strcat(), strncpy(), > strncat(), memcpy().

So, I'm clearly a little rusty on the C standard library. Therefore, I'd like to pose the question:

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?

In the interests of objectivity, I have a number of criteria for an answer:

Please, if you can, cite design reasons behind the function in question i.e. its intended purpose.
Please highlight the misuse to which the code is currently put.
Please state why that misuse may lead towards a problem. I know that should be obvious but it prevents soft answers.

Please avoid:

Debates over naming conventions of functions (except where this unequivocably causes confusion).
"I prefer x over y" - preference is ok, we all have them but I'm interested in actual unexpected side effects and how to guard against them.

As this is likely to be considered subjective and has no definite answer I'm flagging for community wiki straight away.

I am also working as per C99.

C Solutions

Solution 1 - C

> What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies ?

I'm gonna go with the obvious :

char *gets(char *s);

With its remarkable particularity that it's simply impossible to use it appropriately.

Solution 2 - C

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by making subsequent calls to it, until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may induce some nasty suprises, if strtok() is called from multiple threads at the same time.

The CERT C Secure Coding Standard lists many of these pitfalls you asked about.

Solution 3 - C

In almost all cases, atoi() should not be used (this also applies to atof(), atol() and atoll()).

This is because these functions do not detect out-of-range errors at all - the standard simply says "If the value of the result cannot be represented, the behavior is undefined.". So the only time they can be safely used is if you can prove that the input will certainly be within range (for example, if you pass a string of length 4 or less to atoi(), it cannot be out of range).

Instead, use one of the strtol() family of functions.

Solution 4 - C

Let us extend the question to interfaces in a broader sense.

errno:

technically it is not even clear what it is, a variable, a macro, an implicit function call? In practice on modern systems it is mostly a macro that transforms into a function call to have a thread specific error state. It is evil:

because it may cause overhead for the caller to access the value, to check the "error" (which might just be an exceptional event)
because it even imposes at some places that the caller clears this "variable" before making a library call
because it implements a simple error return by setting a global state, of the library.

The forthcoming standard gets the definition of errno a bit more straight, but these uglinesses remain

Solution 5 - C

There is often a strtok_r.

For realloc, if you need to use the old pointer, it's not that hard to use another variable. If your program fails with an allocation error, then cleaning up the old pointer is often not really necessary.

Solution 6 - C

I would put printf and scanf pretty high up on this list. The fact that you have to get the formatting specifiers exactly correct makes these functions tricky to use and extremely easy to get wrong. It's also very hard to avoid buffer overruns when reading data out. Moreover, the "printf format string vulnerability" has probably caused countless security holes when well-intentioned programmers specify client-specified strings as the first argument to printf, only to find the stack smashed and security compromised many years down the line.

Solution 7 - C

Any of the functions that manipulate global state, like gmtime() or localtime(). These functions simply can't be used safely in multiple threads.

EDIT: rand() is in the same category it would seem. At least there are no guarantees of thread-safety, and on my Linux system the man page warns that it is non-reentrant and non-threadsafe.

Solution 8 - C

One of my bêtes noire is strtok(), because it is non-reentrant and because it hacks the string it is processing into pieces, inserting NUL at the end of each token it isolates. The problems with this are legion; it is distressingly often touted as a solution to a problem, but is as often a problem itself. Not always - it can be used safely. But only if you are careful. The same is true of most functions, with the notable exception of gets() which cannot be used safely.

Solution 9 - C

There's already one answer about realloc, but I have a different take on it. A lot of time, I've seen people write realloc when they mean free; malloc - in other words, when they have a buffer full of trash that needs to change size before storing new data. This of course leads to potentially-large, cache-thrashing memcpy of trash that's about to be overwritten.

If used correctly with growing data (in a way that avoids worst-case O(n^2) performance for growing an object to size n, i.e. growing the buffer geometrically instead of linearly when you run out of space), realloc has doubtful benefit over simply doing your own new malloc, memcpy, and free cycle. The only way realloc can ever avoid doing this internally is when you're working with a single object at the top of the heap.

If you like to zero-fill new objects with calloc, it's easy to forget that realloc won't zero-fill the new part.

And finally, one more common use of realloc is to allocate more than you need, then resize the allocated object down to just the required size. But this can actually be harmful (additional allocation and memcpy) on implementations that strictly segregate chunks by size, and in other cases might increase fragmentation (by splitting off part of a large free chunk to store a new small object, instead of using an existing small free chunk).

I'm not sure if I'd say realloc encourages bad practice, but it's a function I'd watch out for.

Solution 10 - C

How about the malloc family in general? The vast majority of large, long-lived programs I've seen use dynamic memory allocation all over the place as if it were free. Of course real-time developers know this is a myth, and careless use of dynamic allocation can lead to catastrophic blow-up of memory usage and/or fragmentation of address space to the point of memory exhaustion.

In some higher-level languages without machine-level pointers, dynamic allocation is not so bad because the implementation can move objects and defragment memory during the program's lifetime, as long as it can keep references to these objects up-to-date. A non-conventional C implementation could do this too, but working out the details is non-trivial and it would incur a very significant cost in all pointer dereferences and make pointers rather large, so for practical purposes, it's not possible in C.

My suspicion is that the correct solution is usually for long-lived programs to perform their small routine allocations as usual with malloc, but to keep large, long-lived data structures in a form where they can be reconstructed and replaced periodically to fight fragmentation, or as large malloc blocks containing a number of structures that make up a single large unit of data in the application (like a whole web page presentation in a browser), or on-disk with a fixed-size in-memory cache or memory-mapped files.

Solution 11 - C

On a wholly different tack, I've never really understood the benefits of atan() when there is atan2(). The difference is that atan2() takes two arguments, and returns an angle anywhere in the range -π..+π. Further, it avoids divide by zero errors and loss of precision errors (dividing a very small number by a very large number, or vice versa). By contrast, the atan() function only returns a value in the range -π/2..+π/2, and you have to do the division beforehand (I don't recall a scenario where atan() could be used without there being a division, short of simply generating a table of arctangents). Providing 1.0 as the divisor for atan2() when given a simple value is not pushing the limits.

Solution 12 - C

Another answer, since these are not really related, rand:

it is of unspecified random quality
it is not re-entrant

Solution 13 - C

Some of this functions are modifying some global state. (In windows) this state is shared per single thread - you can get unexpected result. For example, the first call of rand in every thread will give the same result, and it requires some care to make it pseudorandom, but deterministic (for debug purposes).

Solution 14 - C

basename() and dirname() aren't threadsafe.

Content Type	Original Author	Original Content on Stackoverflow
Question	user257111	View Question on Stackoverflow
Solution 1 - C	icecrime	View Answer on Stackoverflow
Solution 2 - C	makes	View Answer on Stackoverflow
Solution 3 - C	caf	View Answer on Stackoverflow
Solution 4 - C	Jens Gustedt	View Answer on Stackoverflow
Solution 5 - C	davep	View Answer on Stackoverflow
Solution 6 - C	templatetypedef	View Answer on Stackoverflow
Solution 7 - C	j_random_hacker	View Answer on Stackoverflow
Solution 8 - C	Jonathan Leffler	View Answer on Stackoverflow
Solution 9 - C	R.. GitHub STOP HELPING ICE	View Answer on Stackoverflow
Solution 10 - C	R.. GitHub STOP HELPING ICE	View Answer on Stackoverflow
Solution 11 - C	Jonathan Leffler	View Answer on Stackoverflow
Solution 12 - C	Jens Gustedt	View Answer on Stackoverflow
Solution 13 - C	crazylammer	View Answer on Stackoverflow
Solution 14 - C	arsenm	View Answer on Stackoverflow