How Random is System.Guid.NewGuid()? (Take two)

.NetRandomGuid

.Net Problem Overview


Before you start marking this as a duplicate, read me out. The other question has a (most likely) incorrect accepted answer.

I do not know how .NET generates its GUIDs, probably only Microsoft does, but there's a high chance it simply calls CoCreateGuid(). That function however is documented to be calling UuidCreate(). And the algorithms for creating an UUID are pretty well documented.

Long story short, be as it may, it seems that System.Guid.NewGuid() indeed uses version 4 UUID generation algorithm, because all the GUIDs it generates matches the criteria (see for yourself, I tried a couple million GUIDs, they all matched).

In other words, these GUIDs are almost random, except for a few known bits.

This then again raises the question - how random IS this random? As every good little programmer knows, a pseudo-random number algorithm is only as random as its seed (aka entropy). So what is the seed for UuidCreate()? How ofter is the PRNG re-seeded? Is it cryptographically strong, or can I expect the same GUIDs to start pouring out if two computers accidentally call System.Guid.NewGuid() at the same time? And can the state of the PRNG be guessed if sufficiently many sequentially generated GUIDs are gathered?

Added: To clarify, I'd like to find out how random can I trust it to be and thus - where can I use it. So, let's establish a rough "randomness" scale here:

  1. Basic randomness, taking current time as the seed. Usable for shuffling cards in Solitaire but little else as collisions are too easy to come by even without trying.
  2. More advanced randomness, using not only the time but other machine-specific factors for seed. Perhaps also seeded only once on system startup. This can be used for generating IDs in a DB because duplicates are unlikely. Still, it's not good for security because the results can be predicted with sufficient effort.
  3. Cryptograhpically random, using device noise or other advanced sources of randomness for seed. Re-seeded on every invocation or at least pretty often. Can be used for session IDs, handed out to untrusted parties, etc.

I arrived at this question while thinking if it would be OK to use them as DB IDs, and whether the Guid.comb algorithm implementation together with System.Guid.NewGuid() (like NHibernate does it) would be flawed or not.

.Net Solutions


Solution 1 - .Net

The answer is: You should not need to know this. As stated in the accepted answer to a related question:

> A GUID doesn't make guarantees about randomness, it makes guarantees around uniqueness.

An even stronger statement on security and randomness is made in RFC4122, which speficies the UUID format:

> Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation.

Anything else is an implementation detail (and might be subject change).

Windows specifics

Often, people claim that the behavior on Windows is documented and that it is therefore guaranteed that GUIDs are cryptographically secure.

The now archived [MS-SECO] Windows Security Overview document mentions in Appendix A:

> Although only a small minority of version 4 GUIDs require cryptographic randomness, the random bits for all version 4 GUIDs built in Windows are obtained via the Windows CryptGenRandom cryptographic API or the equivalent, the same source that is used for generation of cryptographic keys.

Moreover, section 2.5.5 of the same document explicitly mentions the use of "secret GUID" values as nonce or authenticator.

BUT: This piece of product behavior documentation is not a specification you can generally base the security of your product on (in particular in the context of .NET).

In fact, the document above describes an implementation detail of a particular product. Even if the current Windows and .NET Framework 4.x implementations produce truly random version 4 UUID values on Windows, there is no guarantee that System.Guid.NewGuid will do so in the future or on other .NET platforms (e.g. Mono, Silverlight, CF, .NET Core, etc).

Just as an example, the UUID algorithm used in earlier versions of .NET Core depends on the platform and you might get a version 1 UUID (on BSD).

Solution 2 - .Net

Some people have already hinted at that but I want to repeat it since there appears to be a misconception there:

Randomness and uniqueness are orthogonal concepts.

Random data can be unique or redundant, and likewise unique data can use a random source or a deterministic source (think a global counter that is locked and incremented for every GUID ever created).

GUIDs were designed to be unique, not random. If the .NET generator appears to use random input, fine. But don’t rely on it as a source of randomness, neither for cryptographical nor for any other purposes (in particular, what distribution function do you expect to get?). On the other hand, you can be reasonably sure that GUIDs created by .NET, even in large volumes, will be unique.

Solution 3 - .Net

APIs that produce random bytes but which are not explicitly documented to produce cryptographically strong random bytes cannot be trusted to produce cryptographically strong random bytes.

If you need cryptographically strong random bytes, then you should be using an API which is explicitly documented to produce them.

public Guid CreateCryptographicallyStrongGuid() {
    var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
    var data = new byte[16];
    rng.GetBytes(data);
    return new Guid(data);
}

These GUIDs are simply 128 bits of cryptographic randomness. They are not structured, and they will not collide.

See this article for some of the math. Using "The General Birthday Formula", rearranging gives

> n = sqrt(-2T * ln(p))

where n is the number of chosen elements, T is the total number of elements (2^128), and p is the target probability that all n chosen elements will be different. With p = .99, this gives n = 2.61532104 * 10^18. This means that we can generate a billion truly random GUIDs per second within a system for a billion seconds (32 years), and have better than 99% chance at the end that each one is unique within the system.

Solution 4 - .Net

The definition of Random in no way relates to the definition of Globally Unique.

Flipping a coin twice and getting HH, HT, TH, TT are all random. HH is just as random as HT.

Flipping a "special" coin twice and guaranteeing that you will only get HT or TH is uniqueness.

Solution 5 - .Net

According to https://msdn.microsoft.com/en-us/library/bb417a2c-7a58-404f-84dd-6b494ecf0d13#id11, since Windows 2000 back in 1999,

> "the random bits for all version 4 GUIDs built in Windows are obtained > via the Windows CryptGenRandom cryptographic API or the equivalent, > the same source that is used for generation of cryptographic keys"

So I'd consider them cryptographically secure -- at least to the extent of the 122 bits of entropy they provide.

Also see https://stackoverflow.com/a/35384818/284704, where Will Dean verified through a debug-step that the CLR is calling the proper secure OS random generator.

Solution 6 - .Net

GUIDs are designed to be at number 2 on your scale, i.e. "can be used for generating IDs in a DB because duplicates are unlikely*."

As for security, the problem isn't "it's not good for security because the results can be predicted with sufficient effort.". The problem is that no-one gives you a documented security guarantee.

In practise, according to this comment and this one, the GUID generation is implemented in terms of a cryptographically secure RNG (CryptGenRandom). But that appears to be an undocumented implementation detail. (And I haven't verified this - it's random comments on the Internet, take with a truckload of salt).

(* Where "unlikely" means something like "the chances of anyone finding a duplicate GUID before the end of the universe are less than the chances of you personally winning the lottery." Implementation bugs excepted, of course.)

Solution 7 - .Net

They are random so that it is mathematcially provable that collisions should not occur for a very long time, so that you can assume that they are unique globally. However, they are not cryptographically strong, since this would require a true randomness, which isn't really possible in computers without dedicated hardware.

Solution 8 - .Net

Focussing on your question of using GUID's as row identifiers:

GUID's are for databases geared towards replication, or generating rows ahead of time before adding them to the DB. If you do not need GUID's to solve a particular problem, try stick to incremental numbering. GUID's complicate debugging and testing a little bit.

The COMB method in the article you mention seems pretty great, actually. I never realized, thanks for that one! (p.s. the printer friendly version of that article reads much better)

So if you don't need to generate a GUID ahead of time, you can let the database handle the GUID generation for you. The speed differences you'll only notice if you start adding 10,000's of records in one go, which you shouldn't do anyway, that's what bulk-importing is for.

Also have a look at Jeff on ID's vs GUID's

create table #temp ([id] uniqueidentifier primary key default(newid()), [name] varchar(20))
insert into #temp (name) values ('apple')
insert into #temp (name) values ('orange')
insert into #temp (name) values ('banana')
select * from #temp
drop table #temp

id                                   name
------------------------------------ --------------------
911B0CBD-4EED-4EB0-8488-1B2CDD915C02 banana
56CF3A80-A2DE-4949-9C9B-5F890824EA9C orange
5990B9FD-143D-41B0-89D1-957B2C57AB94 apple

Solution 9 - .Net

I read somewhere that the chances of winning the lottery would be equivalent to 2 4-byte "GUIDs" colliding. The standard 16-byte GUIDs would offer much less chance of collision.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionVilx-View Question on Stackoverflow
Solution 1 - .NetDirk VollmarView Answer on Stackoverflow
Solution 2 - .NetKonrad RudolphView Answer on Stackoverflow
Solution 3 - .NetyfeldblumView Answer on Stackoverflow
Solution 4 - .NetRobin DayView Answer on Stackoverflow
Solution 5 - .NetJordan RiegerView Answer on Stackoverflow
Solution 6 - .Netuser9876View Answer on Stackoverflow
Solution 7 - .NetLuceroView Answer on Stackoverflow
Solution 8 - .NetinvertView Answer on Stackoverflow
Solution 9 - .NetcjkView Answer on Stackoverflow