What's the best way to create a short hash, similar to what TinyURL does?

C#, .Net, Hash, Short Url

C# Problem Overview


I'm currently using MD5 hashes but I would like to find something that will create a shorter hash that uses just [a-z][A-Z][0-9]. It only needs to be around 5-10 characters long.

Is there something out there that already does this?

Update 1:

I like the CRC32 hash. Is there a clean way of calculating it in .NET?

Update 2:

I'm using the CRC32 function from the link Joe provided. How can I convert the uint into the characters defined above?

C# Solutions


Solution 1 - C#

The .NET string class has a GetHashCode() method that returns a 32-bit integer. Format it as hexadecimal to get a string of up to 8 characters (use the X8 format specifier if you need exactly 8).

Like so:

string hashCode = String.Format("{0:X}", sourceString.GetHashCode());

More on that: http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx

UPDATE: Added the remarks from the link above to this answer:

> The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode.
>
> If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.
>
> Notes to Callers
>
> The value returned by GetHashCode is platform-dependent. It differs on the 32-bit and 64-bit versions of the .NET Framework.

Solution 2 - C#

Is your goal to create a URL shortener or to create a hash function?

If your goal is to create a URL shortener, then you don't need a hash function. In that case, you just want to pre-generate a sequence of cryptographically secure random numbers, and then assign each URL to be encoded a unique number from the sequence.

You can do this using code like:

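using System.Collections.Generic;
using System.Security.Cryptography;

const int numberOfNumbersNeeded = 100;
const int numberOfBytesNeeded = 8;
var randomNumbers = new List<byte[]>();
using (var randomGen = RandomNumberGenerator.Create())
{
     for (int i = 0; i < numberOfNumbersNeeded; ++i)
     {
          var bytes = new byte[numberOfBytesNeeded];
          randomGen.GetBytes(bytes);   // fill with cryptographically secure random bytes
          randomNumbers.Add(bytes);    // keep each number so it can be assigned to a URL
     }
}
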

Using the cryptographic number generator will make it very difficult for people to predict the strings you generate, which I assume is important to you.

You can then convert the 8-byte random number into a string using the chars in your alphabet. This is basically a change-of-base calculation (from base 256 to base 62).
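The change of base described above could be sketched as follows. This is a minimal illustration, not the answerer's code: the class and method names (RandomKey, ToBase62, NewKey) and the ordering of the alphabet are assumptions made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class RandomKey
{
    // The 62 characters the question asks for: [0-9][A-Z][a-z].
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Change of base: repeatedly divide by 62 and collect the remainders.
    public static string ToBase62(ulong value)
    {
        if (value == 0) return "0";

        var sb = new StringBuilder();
        while (value > 0)
        {
            // Remainders come out least significant digit first, so prepend.
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        return sb.ToString();
    }

    // Draws 8 cryptographically secure random bytes and encodes them.
    public static string NewKey()
    {
        var bytes = new byte[8];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return ToBase62(BitConverter.ToUInt64(bytes, 0));
    }
}
```

An 8-byte value needs up to 11 base-62 characters. If the 5-10 character target from the question matters, drawing fewer random bytes shortens the key, at the cost of a smaller key space.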

Solution 3 - C#

I don't think URL shortening services use hashes; I think they just have a running alphanumeric string that is incremented with every new URL and stored in a database. If you really need to use a hash function, have a look at this link: some hash functions. Also, a bit off-topic, but depending on what you are working on, this might be interesting: Coding Horror article.

Solution 4 - C#

Just take a Base36 (case-insensitive) or Base64 of the ID of the entry.

So, let's say I wanted to use Base36:

(ID - Base36)
1 - 1
2 - 2
3 - 3
10 - A
11 - B
12 - C
...
10000 - 7PS
22000 - GZ4
34000 - Q8G
...
1000000 - LFLS
2345000 - 1E9EW
6000000 - 3KLMO

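You could keep these even shorter if you went with Base64, but then the URLs would be case-sensitive (and standard Base64 adds two non-alphanumeric characters, '+' and '/'). You can see you still get your nice, neat alphanumeric key, with a guarantee that there will be no collisions, because every ID maps to exactly one key.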
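A minimal sketch of that ID-to-Base36 mapping, assuming the ID is a non-negative integer such as a database identity column. The Base36 class and its Encode/Decode methods are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

public static class Base36
{
    private const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Converts a non-negative ID to its Base36 representation.
    public static string Encode(long id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        if (id == 0) return "0";

        var sb = new StringBuilder();
        while (id > 0)
        {
            sb.Insert(0, Digits[(int)(id % 36)]); // prepend the next digit
            id /= 36;
        }
        return sb.ToString();
    }

    // Converts a Base36 key back to the ID; case-insensitive.
    public static long Decode(string key)
    {
        long id = 0;
        foreach (char c in key.ToUpperInvariant())
        {
            int digit = Digits.IndexOf(c);
            if (digit < 0) throw new FormatException("Not a Base36 digit: " + c);
            id = id * 36 + digit;
        }
        return id;
    }
}
```

For example, Base36.Encode(10000) gives "7PS", and Base36.Decode("7ps") gives 10000 again, so the short key can be turned straight back into the row ID without a lookup table.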

Solution 5 - C#

You cannot use a short hash, as you need a one-to-one mapping from the short version to the actual value. For a short hash, the chance of a collision would be far too high. Normal, long hashes would not be very user-friendly (and even though the chance of a collision would probably be small enough then, it still wouldn't feel "right" to me).

TinyURL.com seems to use an incremented number that is converted to Base 36 (0-9, A-Z).
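To put a rough number on the collision risk mentioned above, the standard birthday approximation can be used. The sketch below assumes hash outputs are uniformly distributed over the key space, which real hash functions only approximate; CollisionEstimate and Probability are hypothetical names.

```csharp
using System;

public static class CollisionEstimate
{
    // Approximate probability that at least two of `items` keys collide
    // when each key is drawn uniformly from `keySpace` possible values.
    // Birthday approximation: p ≈ 1 - exp(-n(n-1) / (2N)).
    public static double Probability(double items, double keySpace)
    {
        return 1.0 - Math.Exp(-items * (items - 1.0) / (2.0 * keySpace));
    }
}
```

With 100,000 URLs and a 32-bit hash (a key space of 2^32), CollisionEstimate.Probability(100000, Math.Pow(2, 32)) comes out at roughly 0.69, so at least one collision is more likely than not.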

Solution 6 - C#

First I generate a list of random numbers (the code does not force them to be distinct). Then I pick the character at each index from the base string, append it, and return the result. I'm selecting 5 chars from a base of 62, which gives 62^5 = 916,132,832 possible strings. The second part is to check against the DB to see if the key already exists; if not, save the short URL.

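 // Requires: using System; using System.Collections.Generic; using System.Linq;
 const string BaseUrlChars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

 // Shared generator: a new Random per call can reuse the same time-based seed
 // on the .NET Framework and return the same string twice in a row.
 private static readonly Random Rnd = new Random();

 private static string ShortUrl
 {
     get
     {
         const int numberOfCharsToSelect = 5;
         int maxNumber = BaseUrlChars.Length;

         var numList = new List<int>();

         // Pick a random index into BaseUrlChars for each output character.
         for (int i = 0; i < numberOfCharsToSelect; i++)
             numList.Add(Rnd.Next(maxNumber));

         return numList.Aggregate(string.Empty, (current, num) => current + BaseUrlChars.Substring(num, 1));
     }
 }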
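The "check against the DB" step is not shown in the answer. A minimal sketch of it might look like the following, with an in-memory Dictionary standing in for the database table; ShortUrlStore, Save and TryResolve are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

public class ShortUrlStore
{
    private const string BaseUrlChars =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private readonly Random _rnd = new Random();

    // Stand-in for the database table: short key -> original URL.
    private readonly Dictionary<string, string> _urls =
        new Dictionary<string, string>();

    private string RandomKey(int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
            chars[i] = BaseUrlChars[_rnd.Next(BaseUrlChars.Length)];
        return new string(chars);
    }

    // Generates keys until an unused one is found, then saves the mapping.
    public string Save(string url)
    {
        string key;
        do
        {
            key = RandomKey(5);
        } while (_urls.ContainsKey(key));

        _urls[key] = url;
        return key;
    }

    // Looks up the original URL for a short key.
    public bool TryResolve(string key, out string url)
    {
        return _urls.TryGetValue(key, out url);
    }
}
```

Against a real database, the check-then-insert pair would presumably need a unique constraint on the key column (retrying on violation), since two requests could otherwise pick the same key at the same time.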

Solution 7 - C#

You can decrease the number of characters from the MD5 hash by encoding it as alphanumerics. MD5 output is usually written in hex, where each character has 16 possible values (4 bits). [a-zA-Z0-9] offers 62 possible values, almost 6 bits per character, so the 128-bit hash fits in about 22 characters instead of 32.

EDIT:

Here's a function that takes a 6-bit number (0 to 63, i.e. one and a half hex digits) and returns a character from [0-9a-zA-Z]. This should give you an idea of how to implement it. Note that there may be some issues with the types; I didn't test this code.

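// Requires: using System;
static char num2char(uint x)
{
    if (x < 26) return (char)('a' + (int)x);
    if (x < 52) return (char)('A' + (int)x - 26);
    if (x < 62) return (char)('0' + (int)x - 52);
    // 62 and 63 reuse '0' and '1', so the mapping cannot be reversed.
    if (x == 62) return '0';
    if (x == 63) return '1';
    throw new ArgumentOutOfRangeException(nameof(x), "Expected a value from 0 to 63.");
}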
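An alternative to mapping 6 bits at a time is to treat the whole MD5 digest as one large number and convert it to base 62, which avoids the duplicated '0' and '1' for the values 62 and 63. The sketch below is one possible way to do that using System.Numerics.BigInteger; the Md5Base62 class, the alphabet ordering and the little-endian interpretation of the digest are all choices made for this example.

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

public static class Md5Base62
{
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static string Hash(string input)
    {
        byte[] digest;
        using (var md5 = MD5.Create())
        {
            digest = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
        }

        // BigInteger reads the bytes as little-endian two's complement;
        // the extra zero byte at the end keeps the value non-negative.
        var bytes = new byte[digest.Length + 1];
        Array.Copy(digest, bytes, digest.Length);
        var value = new BigInteger(bytes);

        // Change of base: divide by 62 and collect the remainders.
        var sb = new StringBuilder();
        while (value > 0)
        {
            value = BigInteger.DivRem(value, 62, out BigInteger remainder);
            sb.Insert(0, Alphabet[(int)remainder]);
        }
        return sb.Length == 0 ? "0" : sb.ToString();
    }
}
```

The result is at most 22 characters long. Truncating it to 5-10 characters is possible, but that reduces the key space accordingly and makes collisions more likely.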



        

Solution 8 - C#

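You can use CRC32. It produces a 32-bit value (8 characters in hex) and, like MD5, maps arbitrary input to a fixed-length result, although it is not cryptographically secure. To get a different value for each entry, add a timestamp to the actual value before hashing.

So it will look like http://foo.bar/abcdefg12.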

Solution 9 - C#

If you're looking for a library that generates tiny unique hashes from integers, I can highly recommend http://hashids.org/net/. I use it in many projects and it works fantastically. You can also specify your own character set for custom hashes.

Solution 10 - C#

If you don't care about cryptographic strength, any of the CRC functions will do.

Wikipedia lists a bunch of different hash functions, including length of output. Converting their output to [a-z][A-Z][0-9] is trivial.
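As an illustration of both steps (computing a CRC and converting the resulting uint to [a-z][A-Z][0-9]), here is a self-contained sketch. It implements the common table-driven CRC-32 (reflected polynomial 0xEDB88320) by hand rather than relying on any particular library; Crc32Key and its method names are hypothetical, and this is not the function from the link mentioned in the question.

```csharp
using System;
using System.Text;

public static class Crc32Key
{
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private static readonly uint[] Table = BuildTable();

    // Precomputes the CRC of every possible byte value.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Standard CRC-32: initial value and final XOR are both 0xFFFFFFFF.
    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // Converts the 32-bit value to base 62 (at most 6 characters).
    public static string ToBase62(uint value)
    {
        if (value == 0) return "0";

        var sb = new StringBuilder();
        while (value > 0)
        {
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        return sb.ToString();
    }

    public static string ShortHash(string text)
    {
        return ToBase62(Compute(Encoding.UTF8.GetBytes(text)));
    }
}
```

A quick sanity check for the CRC part is the well-known test vector: the CRC-32 of the ASCII string "123456789" is 0xCBF43926.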

Solution 11 - C#

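You could encode your MD5 hash with Base64 instead of hexadecimal; this way you get a shorter URL (22 characters instead of 32, once the '=' padding is stripped). Note that standard Base64 also uses '+' and '/' in addition to [a-z][A-Z][0-9], so those two characters need to be replaced if you want a URL-safe result.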
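A minimal sketch of this approach, using the common URL-safe substitution of '-' and '_' for '+' and '/'. Md5Base64 is a hypothetical name, and the output is URL-safe rather than strictly alphanumeric.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5Base64
{
    public static string Hash(string input)
    {
        byte[] digest;
        using (var md5 = MD5.Create())
        {
            digest = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
        }

        // 16 bytes encode to 24 Base64 characters, the last two being '='.
        return Convert.ToBase64String(digest)
            .TrimEnd('=')        // drop the padding
            .Replace('+', '-')   // URL-safe replacements
            .Replace('/', '_');
    }
}
```

The result is always 22 characters long for an MD5 digest.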

Solution 12 - C#

There's a wonderful but ancient program called btoa which converts binary to ASCII using upper- and lower-case letters, digits, and two additional characters. There's also the MIME base64 encoding; most Linux systems probably have a program called base64 or base64encode. Either one would give you a short, readable string from a 32-bit CRC.

Solution 13 - C#

You could take the first alphanumeric 5-10 characters of the MD5 hash.

Solution 14 - C#

If you need the hash to change on every call, you can do something like:

string hash = String.Format("{0:X}", DateTime.Now.GetHashCode());

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Arron S | View Question on Stackoverflow
Solution 1 - C# | Vlad | View Answer on Stackoverflow
Solution 2 - C# | Scott Wisniewski | View Answer on Stackoverflow
Solution 3 - C# | jörg | View Answer on Stackoverflow
Solution 4 - C# | KingNestor | View Answer on Stackoverflow
Solution 5 - C# | Arjan | View Answer on Stackoverflow
Solution 6 - C# | Filix Mogilevsky | View Answer on Stackoverflow
Solution 7 - C# | Paul | View Answer on Stackoverflow
Solution 8 - C# | Joe | View Answer on Stackoverflow
Solution 9 - C# | herostwist | View Answer on Stackoverflow
Solution 10 - C# | Kevin Montrose | View Answer on Stackoverflow
Solution 11 - C# | codymanix | View Answer on Stackoverflow
Solution 12 - C# | Norman Ramsey | View Answer on Stackoverflow
Solution 13 - C# | M4N | View Answer on Stackoverflow
Solution 14 - C# | viniciusalvess | View Answer on Stackoverflow