search for interval overlap in list of intervals?

Algorithm Problem Overview

Say [a,b] represents the interval on the real line from a to b, a < b, inclusive (ie, [a,b] = set of all x such that a<=x<=b). Also, say [a,b] and [c,d] are 'overlapping' if they share any x such that x is in both [a,b] and [c,d].

Given a list of intervals, ([x1,y1],[x2,y2],...), what is the most efficient way to find all such intervals that overlap with [x,y]?

Obviously, I can try each and get it in O(n). But I was wondering if I could sort the list of intervals in some clever way, I could find /one/ overlapping item in O(log N) via a binary search, and then 'look around' from that position in the list to find all overlapping intervals. However, how do I sort intervals such that such a strategy would work?

Note that there may be overlaps between elements in the list items itself, which is what makes this hard.

I've tried it by sorting intervals by their left end, right end, middle, but none seem to lead to an exhaustive search.

Help?

Algorithm Solutions

Solution 1 - Algorithm

For completeness' sake, I'd like to add that there is a well-known data structure for just this sort of problem, known (surprise, surprise) as an interval tree. It's basically an augmented balanced tree (red-black, AVL, your pick) that stores intervals sorted by their left (low) endpoint. The augmentation is that each node stores the largest right (high) endpoint in its subtree. This tree allows you to find all overlapping intervals in O(log n) time.

It's described in CLRS 14.3.

Solution 2 - Algorithm

[a, b] overlaps with [x, y] iff b > x and a < y. Sorting intervals by their first elements gives you intervals matching the first condition in log time. Sorting intervals by their last elements gives you intervals matching the second condition in log time. Take the intersections of the resulting sets.

Solution 3 - Algorithm

A 'quadtree' is a data structure often used to improve the efficiency of collision detection in 2 dimensions.

I think you could come up with a similar 1-d structure. This would require some pre-computation but should result in O(log N) performance.

Basically you start with a root 'node' that covers all possible intervals, and when adding a node to the tree, you decide if it falls on the left or the right of the midpoint. If it crosses the mid point, you break it into two intervals (but record the original parent) and recursively proceed from there. You can set a limit on the depth of the tree, which can save memory and improve performance, but comes at the expense of complicating things a little (you need to store a list of intervals in your nodes).

Then when checking an interval, you basically find all leaf nodes that it would be inserted into were it inserted, check the partial intervals within those nodes for intersection, and then report the interval that is recorded against them as the 'original' parent.

Solution 4 - Algorithm

Just a quick thought 'off the cuff' so to speak.

Could you organize them into 2 lists, one for start of intervals and the other for end of intervals.

This way, you can compare y to the items in the start of interval list (say by binary search) to cut down the candidates based on that.

You can then compare x to the items in the end of interval list.

EDIT

Case: Once Off

If you are comparing only single interval to the list of intervals in a once-off situation, I don't believe sorting will help you out since ideal sorting is O(n).

By doing a linear search through all x's to trim out any impossible intervals then doing another linear search through the remaining y's you can reduce your total work. While this is still O(n), without this you would be doing 2n comparisons, whereas on average, you would only do (3n-1)/2 comparisons this way.

I believe this is the best you can do for an unsorted list.

Case: Pre-sorting doesn't count

In the case where you will be repeatedly comparing single intervals to this list of intervals and your pre-sort your list, you can achieve better results. The process above still applies, but by doing a binary search on the first list then the second you can get O(m log n) as opposed to O(mn), where m is the number of single intervals being compared. Note, still still gives you the advantage of reducing total comparisons. [2m log n compared to m(3(log n) - 1)/2]

Solution 5 - Algorithm

You could sort by both left end and right end at the same time and use both lists to eliminate none overlapping values. If the list is sorted by the left end then none of the intervals to the right of the right end of the test range can overlap. If the list is sorted by the right end then none of the intervals to the left of the left end of the test range can overlap.

For example if the intervals are

[1,4], [3,6], [4,5], [2,8], [5,7], [1,2], [2,2.5]

and you're finding overlap with [3,4] then sorting by left end and marking position of the right end of the test (with the right end as just greater than its value so that 4 is included in the range)

[1,4], [1,2], [2,2.5], [2,8], [3,6], [4,5], *, [5,7]

you know [5,7] can't overlap, then sorting by right end and marking position of the left end of the test

[1,2], [2,2.5], *, [1,4], [4,5], [3,6], [5,7], [2,8]

you know [1,2] and [2,2.5] can't overlap

Not sure how efficient this would be since you're having to do two sorts and searches.

Solution 6 - Algorithm

As you can see in other answers, most algorithms come together with a special data structure. For example, for unsorted list of intervals as input O(n) is best that you'll get. (And usually it's easier to think in terms of data structure that dictates the algorithm).

In this case, your question is not complete:

Are you given the whole list or it is you who actually creates it?
Do you have to perform just one such lookup or many of them?
Do you have any estimations for operations it should support and their frequencies?

For example, if you have to perform just one such lookup, then it's not worthy to sort the list before. If many, then the more expensive sorting or generation of an "1D quadtree" would be amortized.

However, it would be difficult to solve it, because a simple quadtree (as I understand it) is able just to detect the collistion, but it's not able to create the list of all the segments that are overlapping with your input.

One simple implementation would be an ordered (by coordonate) list where you insert all the segment ends with flag start/end and with segment number. In this way, by parsing it (still O(n), but I doubt you can make it faster if you also need the list of all the segments that overlaps), and keeping the track of all opened segments that were not closed at "check points".

Content Type	Original Author	Original Content on Stackoverflow
Question	cespinoza	View Question on Stackoverflow
Solution 1 - Algorithm	Ben	View Answer on Stackoverflow
Solution 2 - Algorithm	gdj	View Answer on Stackoverflow
Solution 3 - Algorithm	sje397	View Answer on Stackoverflow
Solution 4 - Algorithm	Dan McGrath	View Answer on Stackoverflow
Solution 5 - Algorithm	Nemo157	View Answer on Stackoverflow
Solution 6 - Algorithm	ruslik	View Answer on Stackoverflow