Good algorithm for finding the diameter of a (sparse) graph?

AlgorithmMathGraph Theory

Algorithm Problem Overview


I have a large, connected, sparse graph in adjacency-list form. I would like to find two vertices that are as far apart as possible, that is, the diameter of the graph and two vertices achieving it.

I am interested in this problem in both the undirected and directed cases, for different applications. In the directed case, I of course care about directed distance (the shortest directed path from one vertex to another).

Is there a better approach than computing all-pairs shortest paths?

Edit: By "as far apart as possible", I of course mean the "longest shortest path" -- that is, the maximum over all pairs of vertices of the shortest distance from one to the other.

Algorithm Solutions


Solution 1 - Algorithm

Well, I've put a little bit of thought on the problem, and a bit of googling, and I'm sorry, but I can't find any algorithm that doesn't seem to be "just find all pairs shortest path".

However, if you assume that Floyd-Warshall is the only algorithm for computing such a thing (Big-Theta of |V|^3), then I have a bit of good news for you: Johnson's Algorithm for Sparse Graphs (thank you, trusty CLRS!) computes all pairs shortest paths in (Big-Oh (|V|^2 * lgV + VE)), which should be asymptotically faster for sparse graphs.

Wikipedia says it works for directed (not sure about undirected, but at least I can't think of a reason why not), here's the link.

Is there anything else about the graph that may be useful? If it can be mapped easily onto a 2D plane (so, its planar and the edge weights obey the triangle inequality [it may need to satisfy a stricter requirement, I'm not sure]) you may be able to break out some geometric algorithms (convex-hull can run in nlogn, and finding the farthest pair of points is easy from there).

Hope this helps!

  • Agor

Edit: I hope the link works now. If not, just google it. :)

Solution 2 - Algorithm

I don't know of a better method for computing diameter other than all shortest paths, but Mathematica uses the following approximation for PseudoDiameter:

  • A graph geodesic is the shortest path between two vertices of a graph. The graph diameter is the longest possible length of all graph geodesics of the graph. PseudoDiameter finds an approximate graph diameter. It works by starting from a vertex u, and finds a vertex v that is farthest away from u. This process is repeated by treating v as the new starting vertex, and ends when the graph distance no longer increases. A vertex from the last level set that has the smallest degree is chosen as the final starting vertex u, and a traversal is done to see if the graph distance can be increased. This graph distance is taken to be the pseudo-diameter.

http://reference.wolfram.com/mathematica/GraphUtilities/ref/PseudoDiameter.html

Solution 3 - Algorithm

Edit I'm undeleting again, simply so I can continue commenting. I have some comments on Johnson's Algorithm below this answer. - Aaron

My original comment : I too am curious about this problem, but don't have an answer. It seems related to the [Minimum Spanning Tree][1], the subgraph connecting all vertices but having fewest (or lowest weight) edges. That is an old problem with a number of algorithms; some of which seem quite easy to implement.

I had initially hoped that the diameter would be obvious once the MST had been found, but I'm losing hope now :-( Perhaps the MST can be used to place a reasonable upper bound on the diameter, which you can use to speed up your search for the actual diameter?

[1]: http://en.wikipedia.org/wiki/Minimum_spanning_tree "Minimum Spanning Tree"

Solution 4 - Algorithm

Here's some thoughts on doing better than all pairs shortest paths in an undirected graph, although I'm not sure just how much of an improvement it would be.

Here's a subroutine that will find two nodes distance D apart, if there are any. Pick an arbitrary node x and compute M[x] = maximum distance from x to any other node (using any single source shortest path algorithm). If M[x] >= D, then x is one of our nodes and the other is easy to find. However, if M[x] < D, then neither endpoint we're looking for can be less than distance D - M[x] from x (because there are paths from that node to all other nodes, through x, of distance < D). So find all nodes of distance less than D-M[x] from x and mark them as bad. Pick a new x, this time making sure we avoid all nodes marked as bad, and repeat. Hopefully, we'll mark lots of nodes as bad so we'll have to do many fewer than |V| shortest path computations.

Now we just need to set D=diam(G) and run the above procedure. We don't know what diam(G) is, but we can get a pretty tight range for it, as for any x, M[x] <= diam(G) <= 2M[x]. Pick a few x to start, compute M[x] for each, and compute upper and lower bounds on diam(G) as a result. We can then do binary search in the resulting range, using the above procedure to find a path of the guessed length, if any.

Of course, this is undirected only. I think you could do a similar scheme with directed graphs. The bad nodes are those which can reach x in less than D-M[x], and the upper bound on diam(G) doesn't work so you'd need a larger binary search range.

Solution 5 - Algorithm

Not sure if it fits the bill, but interesting:

HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop

U. Kang, C. Tsourakakis, A. P. Appel, C. Faloutsos, J. Leskovec, “HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop”, CMU ML Tech Report CMU-ML-08-117, 2008.

Solution 6 - Algorithm

if the graph is a tree (and undirected). you can simply run 2 dfs's. Start at a random node u and dfs to find farthest node v. Then start at v and find farthest length. That length is optimal

Solution 7 - Algorithm

Pick a vertex v and do BFS(v), this will calculate the distance from v for all vertices. Get the longest distance. This is O(V+E)

Now run this algorithm for all v vertices and pick the maximum of these longest distances. Overall complexity: O(V*(V+E))

Solution 8 - Algorithm

I really doubt that there is any method of finding the longest-shortest path without having to use some sort of all pairs shortest path algorithm (finding single source shortest path repeatedly is basically doing all pairs in the worst case).

'Diameter' becomes hard to define in terms of the 'longest path' if the graph is not a tree or a DAG. The 'longest' path can be infinite if there are cycles in the graph. Hence a simple traversal of the graph cannot yield the longest path over all nodes. Since you have already stated that your graph is not necessarily acyclic, and you are interested in the "longest shortest" path, there does not seem to be any method that can avoid finding the shortest path for all nodes. Using Johnson's Algorithm, as Agor suggested, is probably best for doing that.

You can of course go with a heuristics based approach. The algorithm that uses pseudo-peripheral vertex appears to be the most commonly used approach.

Solution 9 - Algorithm

Forgive me if my answer is not correct in terms of syntax but my Algorithm course was a while ago (and not in English).

If I understand your problem correctly, you want to know what is the highest number you can count up to by starting from node A and reaching node B without "retracing" your steps. If this is the case, I would imagine your graph as acyclic (the cyclic option comes later).

First of all, the upper limit is the number of edges. How I see the thing is: take one node, create a tree where the node is at the root and each subsequent node you can reach is at the next level. The height of the tree you build is the diameter, and the leaves are the nodes that are at that distance. If that distance = number of edges you're done. If not, pick another node and repeat.

I think it's similar to the building of a breadth-first search. Not knowing much else about the graph you could employ some heuristics to see which tree orientation (i.e. which node should be picked first) would be better, but that's another topic.

Regarding the cyclicity of the graph -- as others have pointed out those can lead to infinite loops. A way to get rid of those could be to 'rule out' the nodes that belong to a cycle and adding the longest path between them as the value you'll get by entering the cycle and coming out of it, touch each node only once.

Now, as I said this method could very easily be the same as doing all-paires shortest path. Worst case complexity is certainly the same, and could not be otherwise.

Solution 10 - Algorithm

One way to obtain an estimate of this number is to start at a random point, and do a breadth-first "grassfire" algorithm, marking the shortest distance to each node. The longest distance here is your estimate.

Running this extremely fast algorithm multiple times with different starting points and then taking the maximum will increase the accuracy of the estimate, and, of course, give you a decent lower bound. Depending on the distribution and connectivity of your graph, this estimate may even be accurate!

If your graph is large enough, asymptotic analysis of how the estimate changes as you add more samples might allow you to project to an even better guess.

If you're interested in an exact answer, it seems unlikely that you can get away with cutting corners too much unless your graph is easy to partition into components that are weakly connected with each other - in which case you can restrict your search to shortest path between all pairs of vertices in different components.

Solution 11 - Algorithm

You may not have to compute ALL the pairs, because in an non-directed graph there is an upper bound available, and it can be driven downward.

When a Breath-First-Search (BFS) is done from an arbitrary node, it can yield a list of all other nodes sorted by distance. Of course, the longest distance is a lower bound on the diameter and a candidate for it.

The top two distances added together is an upper bound on the diameter you seek. When taking these top two, you may exclude any nodes for which you have already done the BFS, since you already know the diameter candidates that use those nodes as an endpoint. By prioritizing the longer distance nodes to be the next nodes to do the BFS, the upper bound will eventually reach the lower bound. This may happen before you've done all the pairs.

Solution 12 - Algorithm

A dirty method:

We know that for a graph G(V,E) with |V|=n and |E|=m, Dijkstra algorithm runs in O(m+nlogn) and this is for a single source. For your all-pairs problem, you need to run Dijkstra for each node as a starting point.

However, if you have many machines, you can easily parallel this process.

This method is easiest to implement, definitely not very good.

Solution 13 - Algorithm

Yes there is a better method for finding the Diameter of the graph. Here I made a simple class to demonstrate it. The Vertices would be the Points on your graph.

public class MyTestClass
{
	//Simple Point struct
	struct Vertex
	{
		public float X, Y;
		public Vertex(float pX, float pY)
		{
			X = pX;
			Y = pY;
		}
	}

	//For getting the bounds of your graph
	struct BoundingBox
	{
		public float Left, Right, Bottom, Top;
		public BoundingBox(float pLeft, float pRight, float pBottom, float pTop)
		{
			Left = pLeft;
			Right = pRight;
			Bottom = pBottom;
			Top = pTop;
		}
	}

	//Properties
	Vertex[] vertices;
	BoundingBox bound;
	float diameter;

	//Constructor
	//Here is the fastest way to get the diameter >>
	public MyTestClass()
	{
		//Init objects
		vertices = new Vertex[100];
		for(int i = 0; i != vertices.Length; ++i) vertices[i] = new Vertex(i, i);
		bound = new BoundingBox(vertices[0].X, vertices[0].X, vertices[0].Y, vertices[0].Y);
		//Calculate BoundingBox
		for(int i = 0; i != vertices.Length; ++i)
		{
			bound.Left = (vertices[i].X <= bound.Left) ? vertices[i].X:bound.Left;
			bound.Right = (vertices[i].X >= bound.Right) ? vertices[i].X:bound.Right;
			bound.Bottom = (vertices[i].Y <= bound.Bottom) ? vertices[i].Y:bound.Bottom;//NOTE: If Y is faces down, then flip bottom & top comparison
			bound.Top = (vertices[i].Y >= bound.Top) ? vertices[i].Y:bound.Top;
		}
		//Messure Size of the BoundingBox
		float vecX = (bound.Right-bound.Left);
		float vecY = (bound.Top-bound.Bottom);
		diameter = (float)System.Math.Sqrt((vecX*vecX) + (vecY*vecY));
	}
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionA. RexView Question on Stackoverflow
Solution 1 - AlgorithmagorenstView Answer on Stackoverflow
Solution 2 - AlgorithmjobView Answer on Stackoverflow
Solution 3 - AlgorithmAaron McDaidView Answer on Stackoverflow
Solution 4 - AlgorithmKeith RandallView Answer on Stackoverflow
Solution 5 - AlgorithmGalacticJelloView Answer on Stackoverflow
Solution 6 - AlgorithmpeterView Answer on Stackoverflow
Solution 7 - AlgorithmKaroly HorvathView Answer on Stackoverflow
Solution 8 - AlgorithmMAKView Answer on Stackoverflow
Solution 9 - AlgorithmlorenzogView Answer on Stackoverflow
Solution 10 - AlgorithmJustin WView Answer on Stackoverflow
Solution 11 - AlgorithmJeffWilkinsonView Answer on Stackoverflow
Solution 12 - AlgorithmYin ZhuView Answer on Stackoverflow
Solution 13 - Algorithmzezba9000View Answer on Stackoverflow