YouTube URL algorithm?

Algorithm

Algorithm Problem Overview


How would you go about generating the unique video URLs that YouTube uses?

Example:

Algorithm Solutions


Solution 1 - Algorithm

YouTube uses Base64 encoding to generate an ID for each video. The characters used in these IDs are:

> (A-Z) + (a-z) + (0-9) + (-) + (_). (64 Characters).

Using Base64 encoding and only 11 characters, they can generate more than 73 quintillion unique IDs. How large a pool of IDs is that?

Well, it's enough for everyone on Earth to produce a video every single minute for roughly 18,000 years.

They reach this huge number with only 11 characters (64^11, that is, eleven factors of 64 multiplied together). If they ever need more IDs, they just have to add one more character, which multiplies the pool by 64.
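The arithmetic behind these figures can be checked with a few lines of Python. This is only a rough sketch: the population figure and the one-video-per-minute rate are assumptions chosen for illustration, not numbers from YouTube.

```python
# Size of the ID space for 11 characters drawn from a 64-character alphabet.
ALPHABET_SIZE = 64
ID_LENGTH = 11
total_ids = ALPHABET_SIZE ** ID_LENGTH  # 73,786,976,294,838,206,464

# Assumption for illustration: about 7.8 billion uploaders, one video per minute each.
ASSUMED_POPULATION = 7_800_000_000
MINUTES_PER_YEAR = 60 * 24 * 365
years_to_exhaust = total_ids / (ASSUMED_POPULATION * MINUTES_PER_YEAR)

print(f"{total_ids:,} possible IDs")
print(f"about {years_to_exhaust:,.0f} years to exhaust")
print(f"with 12 characters: {ALPHABET_SIZE ** (ID_LENGTH + 1):,} possible IDs")
```

Under those assumptions the pool lasts on the order of 18,000 years, which matches the estimate above.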

So when a video is uploaded to YouTube, they essentially pick one of the 73+ quintillion possibilities at random and check whether it is already taken. If not, they use it; otherwise they pick another one.
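A minimal sketch of that pick-and-check loop might look like the following. The `taken` set is a hypothetical stand-in for whatever lookup the real system uses, and nothing here is confirmed to be YouTube's actual implementation.

```python
import secrets

# The 64 characters listed above: A-Z, a-z, 0-9, "-" and "_".
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)
ID_LENGTH = 11


def new_video_id(taken: set) -> str:
    """Draw random IDs until one is not in `taken`, then reserve it."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
        if candidate not in taken:
            taken.add(candidate)
            return candidate


taken_ids = set()  # hypothetical stand-in for a database lookup
first = new_video_id(taken_ids)
second = new_video_id(taken_ids)
```

Because the pool is so large compared with the number of IDs in use, the loop almost always finishes on its first pass.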

Refer to this video for a detailed explanation.

Solution 2 - Algorithm

Use some non-trivial hash function. The probability of collision is very low, depending on the function, its parameters, and the input domain. Keep in mind that cryptographic hashes were specifically designed to have very low collision rates for non-random input (i.e. completely different hashes for two close-but-unequal inputs).

This post by Jeff Atwood is a nice overview of the topic.
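As an illustration, here is a small sketch that hashes an input with SHA-256 and keeps the first 11 URL-safe Base64 characters. The choice of SHA-256, the 11-character cut, and the `video-1000` style inputs are all assumptions made for the example. Truncating the digest raises the collision probability compared with the full hash, so a real system would still need a uniqueness check.

```python
import base64
import hashlib


def hashed_id(data: bytes, length: int = 11) -> str:
    """Hash the input with SHA-256 and keep the first `length` URL-safe Base64 characters."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


# Two close-but-unequal inputs (hypothetical values) give unrelated-looking IDs.
a = hashed_id(b"video-1000")
b = hashed_id(b"video-1001")
print(a, b)
```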

And here is an online hash calculator you can play with.

Solution 3 - Algorithm

There is no need to use a hash. It is probably just a quasi-random 64-bit value passed through base64 or some equivalent.

By quasi-random, I mean it is just a one-to-one mapping with the counting integers, just shuffled.

For example, you could take a monotonically increasing database ID, multiply it by some prime near 2^64 (keeping only the low 64 bits of the product), then base64 the result. If you did not want people to be able to guess, you might choose a more complex mapping or just pick a random number that is not in the database yet.

Normal base64 would add an equals sign as padding at the end, but in this case it is implied because the size is known. The character mapping could easily be something other than the standard one.
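The mapping described above can be sketched as follows. The multiplier is a hypothetical odd constant rather than a prime: any odd multiplier is invertible modulo 2^64, which is what makes the mapping one-to-one. The function names are made up for this example, and `pow` with a negative exponent needs Python 3.8 or later.

```python
import base64

MASK64 = (1 << 64) - 1
# Hypothetical odd multiplier; any odd constant is invertible modulo 2**64.
MULTIPLIER = 0x9E3779B97F4A7C15
INVERSE = pow(MULTIPLIER, -1, 1 << 64)  # modular inverse (Python 3.8+)


def encode_id(db_id: int) -> str:
    """Shuffle a sequential ID and render it as 11 URL-safe Base64 characters."""
    shuffled = (db_id * MULTIPLIER) & MASK64
    raw = shuffled.to_bytes(8, "big")
    # 8 bytes encode to 11 characters plus one "=" of padding, which is dropped.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_id(video_id: str) -> int:
    """Undo encode_id: restore the padding, decode, and multiply by the inverse."""
    raw = base64.urlsafe_b64decode(video_id + "=")
    shuffled = int.from_bytes(raw, "big")
    return (shuffled * INVERSE) & MASK64


for db_id in (1, 2, 3):
    print(db_id, encode_id(db_id))
```

A plain multiplication like this is easy to reverse for anyone who guesses the constant, so it hides the sequence only from casual inspection.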

Solution 4 - Algorithm

Eli's link to Jeff's article is, in my opinion, irrelevant. URL shortening is not the same thing as presenting an ID to the world. Instead, a nicer way would be to convert your existing integer ID to a different radix.

An example in PHP:

$id = 9999;
// base_convert() only accepts bases from 2 to 36, so base 62 is rejected:
// $url_id = base_convert($id, 10, 26 + 26 + 10);
// Base 36 works, but only uses digits and lowercase letters:
$url_id = base_convert($id, 10, 26 + 10); // "7pr"

Sadly, PHP's base_convert() only supports bases up to 36 (digits + lowercase letters). Base 62 would allow both upper-case and lower-case letters.
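For comparison, a base 62 conversion is short to write by hand. The sketch below uses Python rather than PHP, and the digit order (digits, then lowercase, then uppercase) is an arbitrary choice made for this example.

```python
import string

# Digits, then lowercase, then uppercase: 10 + 26 + 26 = 62 symbols.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(number: int) -> str:
    """Convert a non-negative integer to a base 62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def from_base62(text: str) -> int:
    """Convert a base 62 string back to an integer."""
    number = 0
    for char in text:
        number = number * 62 + BASE62.index(char)
    return number


url_id = to_base62(9999)
print(url_id)               # 2Bh
print(from_base62(url_id))  # 9999
```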


People are talking about these other systems:

  • Random numbers/letters - Why? If you don't want people to see the next video (id+1), then just make it private. On a website like YouTube, which actively shows any video it has, why bother with random IDs?
  • Hashing an ID - This design concept really stinks. Think about it: you have an ID guaranteed by your database software to be unique, and you hash it (introducing a collision factor)? Give me one reason to even consider this idea.
  • Using the ID in the URL - To be honest, I don't see any problems with this either, though it will grow to be large when in fact you can express the same number with fewer characters (hence my solution).
  • Using Base64 - Base64 expects bytes of data, literally anything from nulls to spaces. Why use this function when your data consists of a number (i.e. a mix of 10 different characters, instead of 256)?

Solution 5 - Algorithm

You could generate a GUID and use that as the ID for the video. GUIDs are very unlikely to collide.
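In Python, for instance, the standard `uuid` module produces random (version 4) GUIDs. The compact URL-safe form shown here is an extra step added for illustration, not something the answer itself proposes.

```python
import base64
import uuid

video_guid = uuid.uuid4()    # 128-bit random (version 4) UUID
canonical = str(video_guid)  # 36 characters, including hyphens

# Optional: a shorter URL-safe form. 16 bytes encode to 22 characters once
# the two "=" padding characters are dropped.
compact = base64.urlsafe_b64encode(video_guid.bytes).decode("ascii").rstrip("=")

# The compact form is reversible once the padding is restored.
restored = uuid.UUID(bytes=base64.urlsafe_b64decode(compact + "=="))

print(canonical)
print(compact)
```

Either form is noticeably longer than an 11-character ID, which is the price of the larger 128-bit space.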

Solution 6 - Algorithm

Your best bet is probably to simply generate random strings, and keep track (in a DB for example) of which strings you've already used so you don't duplicate. This is very easy to implement and it cannot fail if properly implemented (no duplicates, etc).
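One way to do the bookkeeping is to let the database enforce uniqueness and retry when an insert is rejected. The sketch below assumes SQLite, a hypothetical `videos` table, and an arbitrary retry limit.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")  # hypothetical in-memory stand-in for the real database
conn.execute("CREATE TABLE videos (video_id TEXT PRIMARY KEY, title TEXT)")


def insert_video(title: str, max_attempts: int = 10) -> str:
    """Insert a row under a fresh random ID, retrying if the ID is already taken."""
    for _ in range(max_attempts):
        candidate = secrets.token_urlsafe(8)  # 8 random bytes -> 11 characters
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO videos (video_id, title) VALUES (?, ?)",
                    (candidate, title),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # duplicate key: draw another ID
    raise RuntimeError("could not find a free ID")


id_a = insert_video("first upload")
id_b = insert_video("second upload")
```

Relying on the primary key constraint, rather than a separate "does it exist?" query, avoids a race between two uploads that happen to draw the same string at the same time.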

Solution 7 - Algorithm

You can use any library for this; some languages, such as Python, provide one in the standard library.

Example:

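import secrets

# token_urlsafe() takes a number of random bytes, not a character count.
# Each byte yields roughly 1.3 characters, so 12 bytes give a 16-character ID.
num_bytes = 12
random_video_id = secrets.token_urlsafe(num_bytes)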

Solution 8 - Algorithm

I don't think that the URL v parameter has anything to do with the content (video properties, title, description etc).

It's a randomly generated string of fixed length and contains a very specific set of characters. No duplicates are allowed.

Solution 9 - Algorithm

I suggest using a perfect hash function:

https://stackoverflow.com/questions/9551091/perfect-hash-function-for-human-readable-order-codes

As the accepted answer indicates, take a number, then apply a sequence of "bijective" (or reversible) operations on the number to get a hashed number.

The input numbers should be in sequence: 0, 1, 2, 3, and so on.
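As a toy illustration of chaining reversible operations, the sketch below scrambles 16-bit numbers and checks that no two inputs collide. The width, the constants, and the particular steps are hypothetical choices for this example and are not taken from the linked answer.

```python
BITS = 16                # small width so the whole domain can be checked
MASK = (1 << BITS) - 1
ODD_MULTIPLIER = 0x9E37  # hypothetical constant; must be odd to be invertible
XOR_KEY = 0x5BD1         # hypothetical constant


def scramble(n: int) -> int:
    """Apply three reversible steps, so distinct inputs give distinct outputs."""
    n = (n * ODD_MULTIPLIER) & MASK  # odd multipliers have inverses modulo 2**BITS
    n ^= n >> 7                      # xor with a right-shifted copy is reversible
    n ^= XOR_KEY                     # xor with a constant is its own inverse
    return n


# Every input in the sequence 0, 1, 2, ... maps to a different output.
outputs = {scramble(n) for n in range(1 << BITS)}
print(len(outputs) == 1 << BITS)  # True
print([scramble(n) for n in range(5)])
```

Since each step can be undone, the composition is a permutation of the input range, which is why no collision check is needed.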

Solution 10 - Algorithm

Just pick random values until you have one never seen before.

Randomly picking and exhausting all values from a set runs in expected time O(n log n): https://stackoverflow.com/questions/1293939/what-is-o-value-for-naive-random-selection-from-finite-set/2169615#2169615

In your case you wouldn't exhaust the set, so you should get constant time picks. Just use a fast data structure to do the duplication lookups.
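To see why picks stay close to constant time: if a fraction p of the ID space is still free, the number of uniform random draws needed to hit a free ID follows a geometric distribution with mean 1/p. A quick sketch, assuming an ID space of 64^11 and made-up usage figures:

```python
TOTAL_IDS = 64 ** 11  # assumed ID space: 11 characters from a 64-symbol alphabet


def expected_attempts(used: int, total: int = TOTAL_IDS) -> float:
    """Mean number of uniform random draws needed to hit an unused ID."""
    free_fraction = (total - used) / total
    return 1 / free_fraction


print(expected_attempts(10_000_000_000))  # hypothetical: ten billion IDs already taken
print(expected_attempts(TOTAL_IDS // 2))  # 2.0, even with half the space used
```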

Solution 11 - Algorithm

Typically you're hiding a numeric identifier in the form of something that doesn't look numeric. One simple method is something like base-36 encoding the number. You should be able to pull that off with one or another variant of itoa() in the language of your choice.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | BenB | View Question on Stackoverflow |
| Solution 1 - Algorithm | Sunil Kumar Jha | View Answer on Stackoverflow |
| Solution 2 - Algorithm | Eli Bendersky | View Answer on Stackoverflow |
| Solution 3 - Algorithm | drawnonward | View Answer on Stackoverflow |
| Solution 4 - Algorithm | Christian | View Answer on Stackoverflow |
| Solution 5 - Algorithm | Telavian | View Answer on Stackoverflow |
| Solution 6 - Algorithm | Cam | View Answer on Stackoverflow |
| Solution 7 - Algorithm | Avm-x | View Answer on Stackoverflow |
| Solution 8 - Algorithm | cherouvim | View Answer on Stackoverflow |
| Solution 9 - Algorithm | Peter O. | View Answer on Stackoverflow |
| Solution 10 - Algorithm | Thomas Ahle | View Answer on Stackoverflow |
| Solution 11 - Algorithm | Ian | View Answer on Stackoverflow |