HTTP keep-alive in the modern age

Http, Webserver, Keep-Alive, Haproxy

Http Problem Overview


So according to the haproxy author, who knows a thing or two about http:

> Keep-alive was invented to reduce CPU usage on servers when CPUs were 100 times slower. But what is not said is that persistent connections consume a lot of memory while not being usable by anybody except the client who opened them. Today in 2009, CPUs are very cheap and memory is still limited to a few gigabytes by the architecture or the price. If a site needs keep-alive, there is a real problem. Highly loaded sites often disable keep-alive to support the maximum number of simultaneous clients. The real downside of not having keep-alive is a slightly increased latency to fetch objects. Browsers double the number of concurrent connections on non-keepalive sites to compensate for this.

(from http://haproxy.1wt.eu/)

Is this in line with other people's experience? I.e., without keep-alive, is the result barely noticeable now? (It's probably worth noting that with WebSockets etc. a connection is kept "open" regardless of keep-alive status anyway, for very responsive apps.) Is the effect greater for people who are remote from the server, or when there are many artifacts to load from the same host when loading a page? (I would think things like CSS, images and JS are increasingly coming from cache-friendly CDNs.)

Thoughts?

(not sure if this is a serverfault.com thing, but I won't cross post until someone tells me to move it there).

Http Solutions


Solution 1 - Http

Hey, since I'm the author of this citation, I'll respond :-)

There are two big issues on large sites: concurrent connections and latency. Concurrent connections are caused by slow clients which take ages to download content, and by idle connection states. Those idle connection states are caused by connection reuse to fetch multiple objects, known as keep-alive, and they are further increased by latency. When the client is very close to the server, it can make intensive use of the connection and ensure it is almost never idle. However, when the sequence ends, nobody bothers to quickly close the channel and the connection remains open and unused for a long time. That's the reason why many people suggest using a very low keep-alive timeout. On some servers like Apache, the lowest timeout you can set is one second, and that is often far too long to sustain high loads: if you have 20000 clients in front of you and they fetch on average one object every second, you'll have those 20000 connections permanently established. 20000 concurrent connections on a general-purpose server like Apache is huge, will require between 32 and 64 GB of RAM depending on what modules are loaded, and you can probably not hope to go much higher even by adding RAM. In practice, for 20000 clients you may even see 40000 to 60000 concurrent connections on the server, because browsers will try to set up 2 to 3 connections if they have many objects to fetch.
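To make those figures concrete, here is a rough back-of-the-envelope calculation; it is only a sketch using the numbers quoted above, not measurements.

```python
# Back-of-the-envelope arithmetic using only the numbers quoted above.
clients = 20_000                  # clients, each fetching ~1 object per second
ram_low_gb, ram_high_gb = 32, 64  # RAM range quoted for Apache at this load

# With a 1 s keep-alive timeout and ~1 request per second per client,
# every client keeps a connection permanently established.
keepalive_connections = clients

# Implied memory cost of each established connection (process + modules).
low_mb = ram_low_gb * 1024 / keepalive_connections
high_mb = ram_high_gb * 1024 / keepalive_connections
print(f"~{low_mb:.1f} to {high_mb:.1f} MB of RAM per keep-alive connection")

# Browsers opening 2 to 3 parallel connections make it worse:
print(f"{2 * clients:,} to {3 * clients:,} concurrent connections in practice")
```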

If you close the connection after each object, the number of concurrent connections will dramatically drop. Indeed, it will drop by a factor corresponding to the ratio of the average time needed to download an object to the time between objects. If you need 50 ms to download an object (a miniature photo, a button, etc.), and you download on average 1 object per second as above, then you'll only have 0.05 connections per client, which is only 1000 concurrent connections for 20000 clients.
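The same arithmetic, as a sketch in Python, again using only the figures from the answer:

```python
# Figures from the answer: 50 ms to download an object, one object per second per client.
download_time_s = 0.050          # time a connection is actually busy per object
time_between_objects_s = 1.0
clients = 20_000

# Closing after each object means a connection only exists while downloading,
# so the average number of open connections per client is the duty cycle:
connections_per_client = download_time_s / time_between_objects_s   # 0.05
concurrent_connections = clients * connections_per_client
print(f"{connections_per_client} connections per client on average")
print(f"{concurrent_connections:.0f} concurrent connections for {clients:,} clients")
```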

Now the time to establish new connections is going to count. Clients far from the server will experience unpleasant latency. In the past, browsers used large numbers of concurrent connections when keep-alive was disabled. I remember figures of 4 on MSIE and 8 on Netscape. This effectively divided the average per-object latency by that much. Now that keep-alive is present everywhere, we're not seeing such high numbers anymore, because doing so further increases the load on remote servers, and browsers take care of protecting the Internet's infrastructure.

This means that with today's browsers, it's harder to make non-keep-alive services as responsive as keep-alive ones. Also, some browsers (e.g. Opera) use heuristics to try to use pipelining. Pipelining is an efficient way of using keep-alive, because it almost eliminates latency by sending multiple requests without waiting for a response. I have tried it on a page with 100 small photos, and the first access is about twice as fast as without keep-alive, but the next access is about 8 times as fast, because the responses are so small that only latency counts (only "304" responses).
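As a rough way to feel the difference the answer describes (a sketch, not the author's benchmark), the Python standard library lets you compare one reused keep-alive connection against a fresh connection per request; the host name is a placeholder for a server you control.

```python
import http.client
import time

HOST = "www.example.com"   # placeholder host; substitute a server you control
N = 20                     # number of small objects to fetch

# One persistent (keep-alive) connection reused for every request.
start = time.perf_counter()
conn = http.client.HTTPConnection(HOST, 80)
for _ in range(N):
    conn.request("GET", "/")
    conn.getresponse().read()   # the body must be read before the next request
conn.close()
print(f"keep-alive:    {time.perf_counter() - start:.2f}s for {N} requests")

# A brand-new TCP connection (handshake included) for every request.
start = time.perf_counter()
for _ in range(N):
    conn = http.client.HTTPConnection(HOST, 80)
    conn.request("GET", "/")
    conn.getresponse().read()
    conn.close()
print(f"no keep-alive: {time.perf_counter() - start:.2f}s for {N} requests")
```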

I'd say that ideally we should have some tunables in the browsers to make them keep the connections alive between fetched objects, and drop them immediately when the page is complete. But unfortunately we're not seeing that.

For this reason, some sites which need to install general-purpose servers such as Apache on the front side and which have to support large numbers of clients generally have to disable keep-alive. And to force browsers to increase the number of connections, they use multiple domain names so that downloads can be parallelized. It's particularly problematic on sites making intensive use of SSL, because the connection setup cost is even higher, as there is one additional round trip.

What is more commonly observed nowadays is that such sites prefer to install light frontends such as haproxy or nginx, which have no problem handling tens to hundreds of thousands of concurrent connections. They enable keep-alive on the client side and disable it on the Apache side. On that side, the cost of establishing a connection is almost zero in terms of CPU, and not noticeable at all in terms of time. This way you get the best of both worlds: low latency thanks to keep-alive with very low timeouts on the client side, and a low number of connections on the server side. Everyone is happy :-)
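A minimal haproxy sketch of that split (keep-alive towards clients, connections closed towards the Apache backends); this is an illustration only, with placeholder addresses and timeouts, not a configuration taken from the answer:

```
# Minimal sketch, not a production configuration; addresses and timeouts are placeholders.
defaults
    mode http
    option http-server-close      # keep-alive towards clients, close towards servers
    timeout http-keep-alive 2s    # very low client-side keep-alive timeout
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind :80
    default_backend apache_pool

backend apache_pool
    server apache1 192.0.2.10:80
    server apache2 192.0.2.11:80
```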

Some commercial products further improve this by reusing connections between the front load balancer and the servers and multiplexing all client connections over them. When the servers are close to the LB, the gain is not much higher than with the previous solution, but it will often require adaptations on the application side to ensure there is no risk of session crossing between users due to the unexpected sharing of a connection by multiple users. In theory this should never happen. Reality is much different :-)

Solution 2 - Http

In the years since this was written (and posted here on stackoverflow) we now have servers such as nginx which are rising in popularity.

nginx, for example, can hold open 10,000 keep-alive connections in a single process with only 2.5 MB (megabytes) of RAM. In fact it's easy to hold open many thousands of connections with very little RAM, and the only limits you'll hit will be other system limits, such as the maximum number of open file handles or TCP connections.
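For scale, here is the per-connection memory implied by that figure, next to the Apache figures quoted in Solution 1; again a sketch using only the numbers from the answers, not measurements.

```python
# Per-connection memory implied by the figures quoted in the answers above.
nginx_ram_bytes = 2.5 * 1024 * 1024   # ~2.5 MB for 10,000 idle keep-alive connections
nginx_connections = 10_000
print(f"nginx:  ~{nginx_ram_bytes / nginx_connections:.0f} bytes per idle connection")

apache_ram_bytes = 32 * 1024**3       # lower bound quoted for 20,000 connections in Solution 1
apache_connections = 20_000
print(f"apache (prefork, per Solution 1): "
      f"~{apache_ram_bytes / apache_connections / 1024**2:.1f} MB per connection")
```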

Keep-alive was a problem not because of any problem with the keep-alive spec itself, but because of Apache's process-based scaling model and because keep-alive was hacked into a server whose architecture wasn't designed to accommodate it.

Especially problematic is Apache prefork + mod_php + keep-alive. This is a model where every single connection continues to occupy all the RAM that a PHP process occupies, even if it's completely idle and only remains open as a keep-alive connection. This is not scalable. But servers don't have to be designed this way: there's no particular reason a server needs to keep every keep-alive connection in a separate process (especially not when every such process contains a full PHP interpreter). PHP-FPM and an event-based server processing model such as that in nginx solve the problem elegantly.
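A minimal nginx sketch of that separation, where idle keep-alive connections sit cheaply in nginx's event loop and a PHP-FPM worker is only tied up while actually handling a request; paths and timeouts are placeholders, not taken from the answer:

```
# Minimal sketch, not a production configuration; paths and timeouts are placeholders.
server {
    listen 80;
    keepalive_timeout 5s;          # idle keep-alive connections are cheap to hold here

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;   # PHP-FPM only works while a request is in flight
    }
}
```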

Update 2015:

SPDY and HTTP/2 replace HTTP's keep-alive functionality with something even better: the ability not only to keep a connection alive and make multiple requests and responses over it, but to multiplex them, so the responses can be sent in any order, and in parallel, rather than only in the order they were requested. This prevents slow responses from blocking faster ones and removes the temptation for browsers to hold open multiple parallel connections to a single server. These technologies further highlight the inadequacies of the mod_php approach and the benefits of an event-based (or at the very least, multi-threaded) web server coupled separately with something like PHP-FPM.
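As a small client-side illustration of that multiplexing (this assumes the third-party httpx library, which the answer does not mention; install it with `pip install 'httpx[http2]'`, and the URL is a placeholder):

```python
import asyncio
import httpx

# Five requests issued concurrently over a single HTTP/2 connection, so the
# responses can arrive interleaved rather than strictly one after another.
async def main():
    async with httpx.AsyncClient(http2=True) as client:
        responses = await asyncio.gather(
            *(client.get("https://www.example.com/") for _ in range(5))
        )
        for r in responses:
            print(r.http_version, r.status_code, len(r.content))

asyncio.run(main())
```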

Solution 3 - Http

My understanding was that it had little to do with CPU, but rather with the latency of opening repeated sockets to the other side of the world. Even if you have infinite bandwidth, connect latency will slow down the whole process, and this is amplified if your page has dozens of objects. Even a persistent connection has request/response latency, but it's reduced when you have 2 sockets because, on average, one should be streaming data while the other could be blocking. Also, a router is never going to assume a socket is connected before letting you write to it. It needs the full round-trip handshake. Again, I don't claim to be an expert, but this is how I always saw it. What would really be cool is a fully ASYNC protocol (no, not a fully sick protocol).

Solution 4 - Http

Very long keep-alives can be useful if you're using an "origin pull" CDN such as CloudFront or CloudFlare. In fact, this can work out to be faster than no CDN, even if you're serving completely dynamic content.

If you have long keep-alives, such that each PoP basically has a permanent connection to your server, then the first time users visit your site they can do a fast TCP handshake with their local PoP instead of a slow handshake with you. (Light itself takes around 100 ms to go half-way around the world via fiber, and establishing a TCP connection requires three packets to be passed back and forth. SSL requires three round trips.)
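Those figures translate into rough connection-setup costs as follows; the ~100 ms one-way time is from the answer, while the few milliseconds to a nearby PoP is an assumption for illustration.

```python
# Rough handshake-cost arithmetic using the figures from the answer above.
one_way_far_ms = 100      # ~100 ms one way half-way around the world (from the answer)
one_way_pop_ms = 5        # assumed one-way time to a nearby CDN PoP (illustrative)

def setup_cost(one_way_ms, tcp_round_trips=1, ssl_round_trips=3):
    """Time spent on handshakes before the first request can be sent."""
    rtt = 2 * one_way_ms
    return (tcp_round_trips + ssl_round_trips) * rtt

print(f"far origin: ~{setup_cost(one_way_far_ms)} ms before the first request")
print(f"nearby PoP: ~{setup_cost(one_way_pop_ms)} ms before the first request")
```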

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Michael Neale | View Question on Stackoverflow |
| Solution 1 - Http | Willy Tarreau | View Answer on Stackoverflow |
| Solution 2 - Http | thomasrutter | View Answer on Stackoverflow |
| Solution 3 - Http | catchpolenet | View Answer on Stackoverflow |
| Solution 4 - Http | mjs | View Answer on Stackoverflow |