How to find gaps in sequential numbering in mysql?

MysqlSqlGaps and-Islands

Mysql Problem Overview


We have a database with a table whose values were imported from another system. There is an auto-increment column, and there are no duplicate values, but there are missing values. For example, running this query:

select count(id) from arrc_vouchers where id between 1 and 100

should return 100, but it returns 87 instead. Is there any query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there are no records with id's of 71-82. I want to return 71, 72, 73, etc.

Is this possible?

Mysql Solutions


Solution 1 - Mysql

Update

ConfexianMJS provided much better answer in terms of performance.

The (not as fast as possible) answer

Here's version that works on table of any size (not just on 100 rows):

SELECT (t1.id + 1) as gap_starts_at, 
       (SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
  • gap_starts_at - first id in current gap
  • gap_ends_at - last id in current gap

Solution 2 - Mysql

This just worked for me to find the gaps in a table with more than 80k rows:

SELECT
 CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
 SELECT
  @rownum:=@rownum+1 AS expected,
  IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
 FROM
  (SELECT @rownum:=0) AS a
  JOIN YourTable
  ORDER BY YourCol
 ) AS z
WHERE z.got!=0;

Result:

+------------------+
| missing          |
+------------------+
| 1 thru 99        |
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)

Note that the order of columns expected and got is critical.

If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace

(SELECT @rownum:=0) AS a

with

(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a

New result:

+------------------+
| missing          |
+------------------+
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)

If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in bash.

SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM (  SELECT   @rownum:=@rownum+1 AS expected,   IF(@rownum=height, 0, @rownum:=height) AS got  FROM   (SELECT @rownum:=0) AS a   JOIN block   ORDER BY height  ) AS z WHERE z.got!=0;

This produces an output like so

$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)

You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID

for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
  echo $ID
  # fill the gaps
done

It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.

Solution 3 - Mysql

Quick and Dirty query that should do the trick:

SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM 
 (
SELECT a1.id AS a , MIN(a2.id) AS b 
FROM arrc_vouchers  AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab

WHERE 
b > a + 1

This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between...e.g.

id  next_id  missing_inbetween
1        4                  2
68       70                  1
75       87                 11

Solution 4 - Mysql

If you are using an MariaDB you have a faster (800%) option using the sequence storage engine:

SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);

Solution 5 - Mysql

An alternative solution that requires a query + some code doing some processing would be:

select l.id lValue, c.id cValue, r.id rValue 
  from 
  arrc_vouchers l 
  right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
  left  join arrc_vouchers r on r.id=c.id+1
where 1=1
  and c.id > 0 
  and (l.id is null or r.id is null)
order by c.id asc;

Note that the query does not contain any subselect that we know it's not handled performantly by MySQL's planner.

That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), ie:

lValue |cValue|rValue
-------+------+-------
{null} | 2    | 3      
8      | 9    | {null} 
{null} | 22   | 23     
23     | 24   | {null} 
{null} | 29   | {null} 
{null} | 33   | {null} 


Without going into further details (we'll see them in next paragraphs) this output means that:

  • No values between 0 and 2
  • No values between 9 and 22
  • No values between 24 and 29
  • No values between 29 and 33
  • No values between 33 and MAX VALUE

So the basic idea is to do a RIGHT and LEFT joins with the same table seeing if we have adjacents values per value (ie: if central value is '3' then we check for 3-1=2 at left and 3+1 at right), and when a ROW has a NULL value at RIGHT or LEFT then we know there is no adjacent value.

The complete raw output of my table is:

select * from arrc_vouchers order by id asc;

0  
2  
3  
4  
5  
6  
7  
8  
9  
22 
23 
24 
29 
33 

Some notes:

  1. The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, therefore it will not allow you to decrease it under zero. This is not strictly necessary if you keep the c.value > 0 as it's stated in the next note, but I'm including it just as doc.
  2. I'm filtering the zero central value as we are not interested in any previous value and we can derive the post value from the next row.

Solution 6 - Mysql

Create a temporary table with 100 rows and a single column containing the values 1-100.

Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.

Coding this blind, but should work.

select tempid from temptable 
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id 
where arrc_vouchers.id is null

Solution 7 - Mysql

If there is a sequence having gap of maximum one between two numbers (like 1,3,5,6) then the query that can be used is:

select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
  • table_name - source1
  • column_name - id

Solution 8 - Mysql

Although these all seem to work, the result set returns in a very lengthy time when there are 50,000 records.

I used this, and it find the gap or the next available (last used + 1) with a much faster return from the query.

SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;

Solution 9 - Mysql

based on the answer given above by Lucek this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables &/or columns in a stored procedure.

create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);

set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);

set @strsql=concat("select 
	( t1.",@col," + 1 ) as starts_at, 
  ( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
	from ",@tbl," t1
		where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
		having ends_at is not null");

prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end

Solution 10 - Mysql

I tried it in different manners and the best performance that I found was this simple query:

select a.id+1 gapIni
	,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
	from arrc_vouchers a
	left join arrc_vouchers b on b.id=a.id+1
	where b.id is null
	order by 1
;

... one left join to check if the next id exists, only if next if is not found, then the subquery find the next id that exists to find the end of gap. I did it because query with equal (=) is better performance than greater than (>) operator.

Using the sqlfiddle it not show so different performance of others query but in a real database this query above result 3 times more fast than others.

The schema:

CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;

Follow bellow all query that I made to compare the performance:

select a.id+1 gapIni
	,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
	from arrc_vouchers a
	left join arrc_vouchers b on b.id=a.id+1
	where b.id is null
	order by 1
;
select *, (gapEnd-gapIni) qt
	from (
		select id+1 gapIni
		,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
		from arrc_vouchers a
		order by id
	) a where gapEnd <> gapIni
;
select id+1 gapIni
	,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
	#,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
	from arrc_vouchers a
	where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
	order by id
;
select id+1 gapIni
	,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
	from arrc_vouchers a
	order by id
;
select id+1 gapIni
	,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
	from arrc_vouchers a
	order by id
;

Maybe it helps someone and useful.

You can see and test my query using this sqlfiddle:

http://sqlfiddle.com/#!9/6bdca7/1

Solution 11 - Mysql

Probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post, that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e. next available number), and this seems to work fine.

SELECT MIN(l.number_sequence + 1) as nextavabile 
from patients as l 
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL

Several other scenarios and solutions discussed there, from 2005!

How to Find Missing Values in a Sequence With SQL

Solution 12 - Mysql

A simple yet effective solution to find the missing auto-increment values

SELECT `id`+1 
FROM `table_name` 
WHERE `id`+1 NOT IN (SELECT id FROM table_name)

Solution 13 - Mysql

Another simple answer that identifies the gaps. We do a query selecting just the odd numbers and we right join it to a query with all the even numbers. As long as you're not missing id 1; this should give you a comprehensive list of where the gaps start.

You'll still have to take a look at that place in the database to figure out how many numbers the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.

SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE a.id IS NULL;

Solution 14 - Mysql

Starting from the comment posted by user933161,

select l.id + 1 as start from sequence as l inner join sequence as r on l.id + 1 = r.id where r.id is null;

is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.) Also,

insert into sequence (id) values (#);

where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.) You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.

Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.

Solution 15 - Mysql

This works for me:

SELECT distinct(l.membership_no + 1) as nextavabile
from Tablename as l
LEFT OUTER JOIN Tablename as r on l.membership_no + 1 = r.membership_no
WHERE r.membership_no is NULL and l.membership_no is not null order by nextavabile asc;

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionEmmySView Question on Stackoverflow
Solution 1 - MysqlmattView Answer on Stackoverflow
Solution 2 - MysqlConfexianMJSView Answer on Stackoverflow
Solution 3 - MysqlBenView Answer on Stackoverflow
Solution 4 - MysqlMoshe LView Answer on Stackoverflow
Solution 5 - Mysqlmgo1977View Answer on Stackoverflow
Solution 6 - MysqlamelvinView Answer on Stackoverflow
Solution 7 - MysqlPrakhar GuptaView Answer on Stackoverflow
Solution 8 - MysqlRobView Answer on Stackoverflow
Solution 9 - MysqlProfessor AbronsiusView Answer on Stackoverflow
Solution 10 - Mysqllynx_74View Answer on Stackoverflow
Solution 11 - MysqlSScottiView Answer on Stackoverflow
Solution 12 - MysqlJoji MathewView Answer on Stackoverflow
Solution 13 - MysqlDougView Answer on Stackoverflow
Solution 14 - Mysqlagerber85View Answer on Stackoverflow
Solution 15 - MysqlssbhattaraiView Answer on Stackoverflow