What's the Hi/Lo algorithm?

DatabaseAlgorithmPrimary KeyIdentifierHilo

Database Problem Overview


What's the Hi/Lo algorithm?

I've found this in the NHibernate documentation (it's one method to generate unique keys, section 5.1.4.2), but I haven't found a good explanation of how it works.

I know that Nhibernate handles it, and I don't need to know the inside, but I'm just curious.

Database Solutions


Solution 1 - Database

The basic idea is that you have two numbers to make up a primary key- a "high" number and a "low" number. A client can basically increment the "high" sequence, knowing that it can then safely generate keys from the entire range of the previous "high" value with the variety of "low" values.

For instance, supposing you have a "high" sequence with a current value of 35, and the "low" number is in the range 0-1023. Then the client can increment the sequence to 36 (for other clients to be able to generate keys while it's using 35) and know that keys 35/0, 35/1, 35/2, 35/3... 35/1023 are all available.

It can be very useful (particularly with ORMs) to be able to set the primary keys on the client side, instead of inserting values without primary keys and then fetching them back onto the client. Aside from anything else, it means you can easily make parent/child relationships and have the keys all in place before you do any inserts, which makes batching them simpler.

Solution 2 - Database

In addition to Jon's answer:

It is used to be able to work disconnected. A client can then ask the server for a hi number and create objects increasing the lo number itself. It does not need to contact the server until the lo range is used up.

Solution 3 - Database

The hi/lo algorithm splits the sequences domain into hi groups. A hi value is assigned synchronously. Every hi group is given a maximum number of lo entries, that can be assigned off-line without worrying about concurrent duplicate entries.

  1. The hi token is assigned by the database, and two concurrent calls are guaranteed to see unique consecutive values

  2. Once a hi token is retrieved we only need the incrementSize (the number of lo entries)

  3. The identifiers range is given by the following formula:

     [(hi -1) * incrementSize) + 1, (hi * incrementSize) + 1)
    

    and the “lo” value will be in the range:

     [0, incrementSize)
    

    being applied from the start value of:

     [(hi -1) * incrementSize) + 1)
    
  4. When all lo values are used, a new hi value is fetched and the cycle continues

And this visual presentation is easy to follow as well:

enter image description here

While hi/lo optimizer is fine for optimizing identifier generation, it doesn't play well with other systems inserting rows into our database, without knowing anything about our identifier strategy.

Hibernate offers the pooled-lo optimizer, which offers the advantages of the hi/lo generator strategy while also providing interoperability with other 3rd-party clients that are not aware of this sequence allocation strategy.

Being both efficient and interoperable with other systems, the pooled-lo optimizer is a much better candidate than the legacy hi/lo identifier strategy.

Solution 4 - Database

Lo is a cached allocator that splits the keyspace into large chunks, typically based on some machine word size, rather than the meaningfully-sized ranges (eg obtaining 200 keys at a time) which a human might sensibly choose.

Hi-Lo usage tends to waste large numbers of keys on server restart, and generate large human-unfriendly key values.

Better than the Hi-Lo allocator, is the "Linear Chunk" allocator. This uses a similar table-based principle but allocates small, conveniently-sized chunks & generates nice human-friendly values.

create table KEY_ALLOC (
    SEQ varchar(32) not null,
    NEXT bigint not null,
    primary key (SEQ)
);

To allocate the next, say, 200 keys (which are then held as a range in the server & used as needed):

select NEXT from KEY_ALLOC where SEQ=?;
update KEY_ALLOC set NEXT=(old value+200) where SEQ=? and NEXT=(old value);

Providing you can commit this transaction (use retries to handle contention), you have allocated 200 keys & can dispense them as needed.

With a chunk-size of just 20, this scheme is 10x faster than allocating from an Oracle sequence, and is 100% portable amongst all databases. Allocation performance is equivalent to hi-lo.

Unlike Ambler's idea, it treats the keyspace as a contiguous linear numberline.

This avoids the impetus for composite keys (which were never really a good idea) and avoids wasting entire lo-words when the server restarts. It generates "friendly", human-scale key values.

Mr Ambler's idea, by comparison, allocates the high 16- or 32-bits, and generates large human-unfriendly key values as the hi-words increment.

Comparison of allocated keys:

Linear_Chunk       Hi_Lo
100                65536
101                65537
102                65538
.. server restart
120                131072
121                131073
122                131073
.. server restart
140                196608

Design-wise, his solution is fundamentally more complex on the number-line (composite keys, large hi_word products) than Linear_Chunk while achieving no comparative benefit.

The Hi-Lo design arose early in OO mapping and persistence. These days persistence frameworks such as Hibernate offer simpler and better allocators as their default.

Solution 5 - Database

I found the Hi/Lo algorithm is perfect for multiple databases with replication scenarios based in my experience. Imagine this. you have a server in New York (alias 01) and another server in Los Angeles (alias 02) then you have a PERSON table... so in New York when a person is create... you always use 01 as the HI value and the LO value is the next secuential. por example.

  • 010000010 Jason
  • 010000011 David
  • 010000012 Theo

in Los Angeles you always use the HI 02. for example:

  • 020000045 Rupert
  • 020000046 Oswald
  • 020000047 Mario

So, when you use the database replication (no matter what brand) all the primary keys and data combine easily and naturally without to worry about duplicate primary keys, collissions, etc.

This is the best way to go in this scenario.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDiegoCofreView Question on Stackoverflow
Solution 1 - DatabaseJon SkeetView Answer on Stackoverflow
Solution 2 - DatabaseStephan EggermontView Answer on Stackoverflow
Solution 3 - DatabaseVlad MihalceaView Answer on Stackoverflow
Solution 4 - DatabaseThomas WView Answer on Stackoverflow
Solution 5 - DatabaseTheoView Answer on Stackoverflow