Likelihood of collision using most significant bits of a UUID in Java

JavaCollisionUuid

Java Problem Overview


If I'm using Long uuid = UUID.randomUUID().getMostSignificantBits() how likely is it to get a collision. It cuts off the least significant bits, so there is a possibility that you run into a collision, right?

Java Solutions


Solution 1 - Java

According to the documentation, the static method UUID.randomUUID() generates a type 4 UUID.

This means that six bits are used for some type information and the remaining 122 bits are assigned randomly.

The six non-random bits are distributed with four in the most significant half of the UUID and two in the least significant half. So the most significant half of your UUID contains 60 bits of randomness, which means you on average need to generate 2^30 UUIDs to get a collision (compared to 2^61 for the full UUID).

So I would say that you are rather safe. Note, however that this is absolutely not true for other types of UUIDs, as Carl Seleborg mentions.

Incidentally, you would be slightly better off by using the least significant half of the UUID (or just generating a random long using SecureRandom).

Solution 2 - Java

Raymond Chen has a really excellent blog post on this:

GUIDs are globally unique, but substrings of GUIDs aren't

Solution 3 - Java

I thinks this is the best example for using randomUUID :

http://www.javapractices.com/topic/TopicAction.do?Id=56

Solution 4 - Java

You are better off just generating a random long value, then all the bits are random. In Java 6, new Random() uses the System.nanoTime() plus a counter as a seed.

There are different levels of uniqueness.

If you need uniqueness across many machines, you could have a central database table for allocating unique ids, or even batches of unique ids.

If you just need to have uniqueness in one app you can just have a counter (or a counter which starts from the currentTimeMillis()*1000 or nanoTime() depending on your requirements)

Solution 5 - Java

Use Time YYYYDDDD (Year + Day of Year) as prefix. This decreases database fragmentation in tables and indexes. This method returns byte[40]. I used it in a hybrid environment where the Active Directory SID (varbinary(85)) is the key for LDAP users and an application auto-generated ID is used for non-LDAP Users. Also the large number of transactions per day in transactional tables (Banking Industry) cannot use standard Int types for Keys

private static final DecimalFormat timeFormat4 = new DecimalFormat("0000;0000");

public static byte[] getSidWithCalendar() {
    Calendar cal = Calendar.getInstance();
    String val = String.valueOf(cal.get(Calendar.YEAR));
    val += timeFormat4.format(cal.get(Calendar.DAY_OF_YEAR));
    val += UUID.randomUUID().toString().replaceAll("-", "");
    return val.getBytes();
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestiondlinsinView Question on Stackoverflow
Solution 1 - JavaRasmus FaberView Answer on Stackoverflow
Solution 2 - JavaCarl SeleborgView Answer on Stackoverflow
Solution 3 - JavaKannikaView Answer on Stackoverflow
Solution 4 - JavaPeter LawreyView Answer on Stackoverflow
Solution 5 - JavaDr BobView Answer on Stackoverflow