Are GUID collisions possible?

Sql ServerGuid

Sql Server Problem Overview


I'm working on a database in SQL Server 2000 that uses a GUID for each user that uses the app it's tied to. Somehow, two users ended up with the same GUID. I know that microsoft uses an algorithm to generate a random GUID that has an extremely low chance of causing collisons, but is a collision still possible?

Sql Server Solutions


Solution 1 - Sql Server

Basically, no. I think someone went mucking with your database. Depending on the version GUID you're using the value is either unique (for things like version 1 GUIDs), or both unique and unpredictable (for things like version 4 GUIDs). SQL Server's implementation for their NEWID() function appears to use a 128-bit random number, so you're not going to get a collision.

For a 1% chance of collision, you'd need to generate about 2,600,000,000,000,000,000 GUIDs.

Solution 2 - Sql Server

Basically they are not possible !, the chances are astronomically low.

But... I'm the only person I the world that I know of, that had a GUID colision once (yep!).

And I'm sure of it, and that it wasn't a mistake.

How did it happen, in a small application that was running on Pocket PC, at the end of an operation a command that has an generated GUID must be issued. The command after it was executed on the server it was stored in a command table on the server along with the execution date. One day when I was debugging I issued the module command (with the newly generated GUID attached) and nothing happened. I did it again (with the same guid, because the guid was generated only once at the beginning of the operation), and again, and nothing, finally trying to find out why the command isn't executing, I checked the command table, and the same GUID as the current one was inserted 3 weeks ago. Not believing this, I restored a database from 2 weeks backup, and the guid was there. Checked the code, the new guid was freshly generated no doubt about it. Pow guid collision, happened only once, but I really wish I would have won at lotto instead,the chance is greater :).

Edit: there are some factors that could have greatly increased the chance of this happening, the application was running on the PocketPC emulator, and the emulator has a save state feature, which means that every time the state is restored the local time is restored also and the guid is based on on the internal timer....also the guid generating algorithm for compact framework might be less complete than for example the COM one...

Solution 3 - Sql Server

They are theoretically possible, but with 3.4E38 possible numbers, if you create tens of trillions of GUIDs in a year the chance of having one duplicate is 0.00000000006 (Source).

If two users ended up with the same GUID, I would wager that there is a bug in the program which is causing the data to be copied or shared.

Solution 4 - Sql Server

First lets look at the chance of collision of two GUIDs. It is not, as other answers have stated, 1 in 2^128 (10^38) because of the birthday paradox, which means that for a 50% chance of two GUIDs colliding the probability is actually 1 in 2^64 (10^19) which is a lot smaller. However, this is still a very large number, and as such the probability of collision assuming you are using a reasonable number of GUIDs is low.

Note also that GUIDs do not contain a timestamp or the MAC address as many people also seem to believe. This was true for v1 GUIDs but now v4 GUIDs are used, which are simply a pseudo-random number which means that possibility of collision is arguably higher because they are no longer unique to a time and a machine.

So essentially the answer is yes, collisions are possible. But they are highly unlikely.

Edit: fixed to say 2^64

Solution 5 - Sql Server

The chances of two random GUIDs colliding (~1 in 10^38) is lower than the chance of not detecting a corrupt TCP/IP packet (~1 in 10^10). http://wwwse.inf.tu-dresden.de/data/courses/SE1/SE1-2004-lec12.pdf, page 11. This is also true of disk drives, cd drives, etc...

GUIDs are statistically unique and the data you read from the db is only statistically correct.

Solution 6 - Sql Server

Are you a mathematician? Then yes.

Are you an engineer? Then no.

Solution 7 - Sql Server

I would consider Occam's razor as a good guide in this case. It is incredibly unlikely that you have a GUID collision. It is much more likely you have a bug, or someone messing with your data.

Solution 8 - Sql Server

See Wikipedia's [Globally Unique Identifier](http://en.wikipedia.org/wiki/Guid "Globally Unique Identifier") article. There are several ways to generate GUIDs. Apparently the old (?) way used Mac address, a timestamp down to a very short unit and a unique counter (to manage fast generations on the same computer), so making them duplicate is nearly impossible. But these GUIDs were dropped because they could be used to track down users...

I am not sure of the new algorithm used by Microsoft (the article says a sequence of GUIDs can be predicted, looks like they no longer use timestamp? The Microsoft article linked above says something else...).

Now, GUIDs are carefully designed to be, by name, globally unique, so I will risk it is impossible, or of very very very low probability. I would look elsewhere.

Solution 9 - Sql Server

Two Win95 machines that have ethernet cards with duplicate MAC addresses will issue duplicate GUIDS under tightly controlled conditions, especially if, for example, the power goes off in the building and they both boot at exactly the same time.

Solution 10 - Sql Server

I'll preface this with "I'm not a networking person, so I may make completely incoherent sentences following.".

When I worked at Illinois State University, we had two Dell desktops, ordered at different times. We put the first one on the network, but when we tried to put the second one on the network we started receiving crazy errors. After much troubleshooting, it was determined that both machines were producing the same GUID (I'm not sure exactly what for, but it made them both unusable on the network). Dell actually replaced both machines as defective.

Solution 11 - Sql Server

I know people like the feel-good answer that GUIDs are magical and guaranteed to be unique, but in reality, most GUIDs are just 121-bit random numbers (seven of the bits are wasted on formatting). If you wouldn't feel comfortable using a big random number, then you shouldn't feel comfortable using a GUID.

Solution 12 - Sql Server

Could the code used to generate a GUID have a bug in it? Yes, of course it could. But the answer is the same as it would be for a compiler bug - your own code is orders of magnitude more likely to be buggy, so look there first.

Solution 13 - Sql Server

Generalized formula

There's a formula that estimates how many values of size S to generate to get a collision between two of them with probability P.

Variables:

  • bits - how many bits in your data type.
  • probability - target probability for the collision.

To get a collision, you have to generate around:

2^{\frac{bits + 1}{2}} * \sqrt{-log_2(1 - probability)}

Or in Python:

from math import sqrt, log

def how_many(bits, probability):
    return 2 ** ((bits + 1) / 2) * sqrt(-log(1 - probability))

GUIDs

For GUIDs (128 bits), to get a collision with probability 1% (0.01), you'll need:

In [2]: how_many(bits=128, probability=0.01)
Out[2]: 2.6153210405530885e+18

...around 2.6 * 10^18 GUIDs (that's 42 exabytes of GUIDs).

Note that this probability grows rapidly. No matter the number of bits, for 99.99% probability you'll need only 30x more GUIDs than for 1%!

In [3]: how_many(bits=128, probability=0.9999)
Out[3]: 7.91721721556706e+19

Int64

Same numbers, but for the int64 datatype:

In [4]: how_many(bits=64, probability=0.01)
Out[4]: 608926881

In [5]: how_many(bits=64, probability=0.9999)
Out[5]: 18433707802

For 1% collision probability you'll need 5 gigabytes of int64-s. Still a lot but compared to the GUIDs that is a much more comprehensible number.


It's the so called birthday problem - and in this Wikipedia article you can find more precise estimation formulas than this one.

Solution 14 - Sql Server

Of course its possible....Probable? Not likely, but it is possible.

Remember, the same machine is generating every GUID (the server), so a lot of the "randomness" that is based on machine specific information is lost.

Solution 15 - Sql Server

Just for grins, try the following script... (works on SQL 2005, not sure about 2000)

declare @table table
(
	column1 uniqueidentifier default (newid()),
	column2 int,
	column3 datetime default (getdate())
)

declare @counter int

set @counter = 1

while @counter <= 10000
begin
	insert into @table (column2) values (@counter)
	set @counter = @counter + 1
end

select * from @table

select * from @table t1 join @table t2 on t1.column1 = t2.column1 and t1.column2 != t2.column2

Running this repeatedly (takes less than a second) produces a fairly wide range from the first select, even with an EXTREMELY short time gap. So far the second select hasn't produced anything.

Solution 16 - Sql Server

Impossible if the users have different machines with network cards, and even if not it is still an extremely marginal almost theoretical risk.

Personally I'd look elsewhere as it is more likely a bug rather than a GUID clash...

Providing of course that you don't chop bits off the GUID to make it shorter.

Solution 17 - Sql Server

It's highly unlikely that you'll run into GUID collisions if you're generating them through something like the NEWID() function in SQL Server (though of course possible, as other answers have emphasized). One thing they haven't pointed out is that it's actually quite likely that you'll run into collisions if you're generating GUIDs in JavaScript on browsers in the wild. Not only are there sometimes problems in the RNG in different browsers, but I've also run into problems where the Google spiders seem to cache the results of functions like that, and ended up repeatedly passing the same GUID up to our systems.

See the various answers here for more details:

https://stackoverflow.com/questions/6906916/collisions-when-generating-uuids-in-javascript

Solution 18 - Sql Server

Don’t worry about what it is. Make it impossible. Mix the improbability of GUID with the impossibility of sequential. Just add a database sequential I’d to the GUID and call it done. You may need to change the data type from GUID to String-ish but they are not that different storage wise.

Solution 19 - Sql Server

Sure it's possible, and maybe even likely. It's not like each GUID is in a random portion of the possible number space. In the event that two threads attempted to generate one simultaneously, barring some kind of centralized GUID function with a semaphore around it, they could end up with the same value.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJason BakerView Question on Stackoverflow
Solution 1 - Sql ServerTom RitterView Answer on Stackoverflow
Solution 2 - Sql ServerPop CatalinView Answer on Stackoverflow
Solution 3 - Sql ServerBen HoffsteinView Answer on Stackoverflow
Solution 4 - Sql ServerGreg BeechView Answer on Stackoverflow
Solution 5 - Sql ServerTony LeeView Answer on Stackoverflow
Solution 6 - Sql ServerEneroth3View Answer on Stackoverflow
Solution 7 - Sql ServerJason JacksonView Answer on Stackoverflow
Solution 8 - Sql ServerPhiLhoView Answer on Stackoverflow
Solution 9 - Sql ServerJoshuaView Answer on Stackoverflow
Solution 10 - Sql ServerJohn KraftView Answer on Stackoverflow
Solution 11 - Sql ServerRick YorgasonView Answer on Stackoverflow
Solution 12 - Sql ServerMark RansomView Answer on Stackoverflow
Solution 13 - Sql ServergukoffView Answer on Stackoverflow
Solution 14 - Sql ServerFlySwatView Answer on Stackoverflow
Solution 15 - Sql ServerGalacticCowboyView Answer on Stackoverflow
Solution 16 - Sql ServerRichard HarrisonView Answer on Stackoverflow
Solution 17 - Sql ServerKen SmithView Answer on Stackoverflow
Solution 18 - Sql ServerWiredLessInTXView Answer on Stackoverflow
Solution 19 - Sql ServerKirk StrauserView Answer on Stackoverflow