How to prevent hangs on SocketInputStream.socketRead0 in Java?

Java, Sockets, Http, Timeout, Apache Httpclient-4.x

Java Problem Overview


Performing millions of HTTP requests with different Java libraries leaves me with threads hanging on:

java.net.SocketInputStream.socketRead0()

which is a native method.

I tried to set up Apache HttpClient and RequestConfig to have timeouts on (I hope) everything that is possible, but I still have (probably infinite) hangs on socketRead0. How do I get rid of them?

The hang ratio is roughly 1 per 10,000 requests (to 10,000 different hosts), and a hang can apparently last forever (I've confirmed a thread still hung after 10 hours).

JDK 1.8 on Windows 7.

My HttpClient factory:

SocketConfig socketConfig = SocketConfig.custom()
        .setSoKeepAlive(false)
        .setSoLinger(1)
        .setSoReuseAddress(true)
        .setSoTimeout(5000)
        .setTcpNoDelay(true).build();

HttpClientBuilder builder = HttpClientBuilder.create();
builder.disableAutomaticRetries();
builder.disableContentCompression();
builder.disableCookieManagement();
builder.disableRedirectHandling();
builder.setConnectionReuseStrategy(new NoConnectionReuseStrategy());
builder.setDefaultSocketConfig(socketConfig);

return builder.build();

My RequestConfig factory:

HttpGet request = new HttpGet(url);

RequestConfig config = RequestConfig.custom()
        .setCircularRedirectsAllowed(false)
        .setConnectionRequestTimeout(8000)
        .setConnectTimeout(4000)
        .setMaxRedirects(1)
        .setRedirectsEnabled(true)
        .setSocketTimeout(5000)
        .setStaleConnectionCheckEnabled(true).build();
request.setConfig(config);

return request;

OpenJDK socketRead0 source

Note: I actually have a "trick": I can schedule .getConnectionManager().shutdown() in another thread, cancelling the Future if the request finishes properly, but it is deprecated and it also kills the whole HttpClient, not only that single request.

Java Solutions


Solution 1 - Java

Though this question mentions Windows, I have the same problem on Linux. It appears there is a flaw in the way the JVM implements blocking socket timeouts:

To summarize, timeout for blocking sockets is implemented by calling poll on Linux (and select on Windows) to determine that data is available before calling recv. However, at least on Linux, both methods can spuriously indicate that data is available when it is not, leading to recv blocking indefinitely.

From poll(2) man page BUGS section:

> See the discussion of spurious readiness notifications under the BUGS section of select(2).

From select(2) man page BUGS section:

> Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

The Apache HTTP Client code is a bit hard to follow, but it appears that connection expiration is only set for HTTP keep-alive connections (which you've disabled) and is indefinite unless the server specifies otherwise. Therefore, as pointed out by oleg, the Connection eviction policy approach won't work in your case and can't be relied upon in general.

Solution 2 - Java

As Clint said, you should consider a non-blocking HTTP client, or (seeing that you are using Apache HttpClient) implement Multithreaded request execution to prevent possible hangs of the main application thread (this does not solve the problem, but it is better than restarting your app because it froze). In any case, you set the setStaleConnectionCheckEnabled property, but the stale connection check is not 100% reliable; from the Apache HttpClient tutorial:

> One of the major shortcomings of the classic blocking I/O model is that the network socket can react to I/O events only when blocked in an I/O operation. When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end).
>
> HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable and adds 10 to 30 ms overhead to each request execution.

The Apache HttpComponents crew recommends the implementation of a Connection eviction policy:

> The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity. The monitor thread can periodically call the ClientConnectionManager#closeExpiredConnections() method to close all expired connections and evict closed connections from the pool. It can also optionally call the ClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.

Take a look at the sample code in the Connection eviction policy section and try to implement it in your application along with Multithreaded request execution; I think implementing both mechanisms will prevent your undesired hangs.
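
For reference, here is a minimal sketch of such an eviction monitor, assuming a shared PoolingHttpClientConnectionManager (the HttpClient 4.3+ counterpart of the ClientConnectionManager named in the tutorial); the 5-second wake-up interval and 30-second idle threshold are illustrative values, not recommendations from the answer:

import java.util.concurrent.TimeUnit;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

class IdleConnectionMonitorThread extends Thread {
    private final PoolingHttpClientConnectionManager connMgr;
    private volatile boolean shutdown;

    IdleConnectionMonitorThread(PoolingHttpClientConnectionManager connMgr) {
        this.connMgr = connMgr;
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(5000);
                    // close connections whose keep-alive timeout has expired
                    connMgr.closeExpiredConnections();
                    // optionally close connections idle for more than 30 seconds
                    connMgr.closeIdleConnections(30, TimeUnit.SECONDS);
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public void requestShutdown() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
}

Pass the same connection manager to HttpClientBuilder#setConnectionManager(...) so the monitor and the client operate on a single pool.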

Solution 3 - Java

You should consider a non-blocking HTTP client like Grizzly or Netty, which do not have blocking operations that can hang a thread.
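
The answer names Grizzly and Netty; purely as an illustration of the non-blocking style, here is a rough sketch using Apache HttpAsyncClient instead (my own substitution, since the question already uses the Apache stack). No worker thread sits in a blocking socketRead0, and the caller decides how long it is willing to wait on the Future:

import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;

public class NonBlockingGetSketch {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
            client.start();
            // execute() returns immediately; the I/O reactor services the socket
            Future<HttpResponse> future =
                    client.execute(new HttpGet("http://example.com/"), null);
            // the caller enforces a hard deadline instead of relying on SO_TIMEOUT alone
            HttpResponse response = future.get(10, TimeUnit.SECONDS);
            System.out.println(response.getStatusLine());
        }
    }
}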

Solution 4 - Java

I have more than 50 machines that make about 200k requests/day/machine. They are running Amazon Linux AMI 2017.03. I previously had jdk1.8.0_102, now I have jdk1.8.0_131. I am using both Apache HttpClient and OkHttp as scraping libraries.

Each machine was running 50 threads, and sometimes the threads would get lost. After profiling with the YourKit Java profiler I got:

ScraperThread42 State: RUNNABLE CPU usage on sample: 0ms
java.net.SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) SocketInputStream.java (native)
java.net.SocketInputStream.socketRead(FileDescriptor, byte[], int, int, int) SocketInputStream.java:116
java.net.SocketInputStream.read(byte[], int, int, int) SocketInputStream.java:171
java.net.SocketInputStream.read(byte[], int, int) SocketInputStream.java:141
okio.Okio$2.read(Buffer, long) Okio.java:139
okio.AsyncTimeout$2.read(Buffer, long) AsyncTimeout.java:211
okio.RealBufferedSource.indexOf(byte, long) RealBufferedSource.java:306
okio.RealBufferedSource.indexOf(byte) RealBufferedSource.java:300
okio.RealBufferedSource.readUtf8LineStrict() RealBufferedSource.java:196
okhttp3.internal.http1.Http1Codec.readResponse() Http1Codec.java:191
okhttp3.internal.connection.RealConnection.createTunnel(int, int, Request, HttpUrl) RealConnection.java:303
okhttp3.internal.connection.RealConnection.buildTunneledConnection(int, int, int, ConnectionSpecSelector) RealConnection.java:156
okhttp3.internal.connection.RealConnection.connect(int, int, int, List, boolean) RealConnection.java:112
okhttp3.internal.connection.StreamAllocation.findConnection(int, int, int, boolean) StreamAllocation.java:193
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(int, int, int, boolean, boolean) StreamAllocation.java:129
okhttp3.internal.connection.StreamAllocation.newStream(OkHttpClient, boolean) StreamAllocation.java:98
okhttp3.internal.connection.ConnectInterceptor.intercept(Interceptor$Chain) ConnectInterceptor.java:42
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.internal.http.BridgeInterceptor.intercept(Interceptor$Chain) BridgeInterceptor.java:93
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(Interceptor$Chain) RetryAndFollowUpInterceptor.java:124
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.RealCall.getResponseWithInterceptorChain() RealCall.java:198
okhttp3.RealCall.execute() RealCall.java:83

I found out that they have a fix for this

https://bugs.openjdk.java.net/browse/JDK-8172578

in JDK 8u152 (early access). I have installed it on one of our machines. Now I am waiting to see some good results.

Solution 5 - Java

Given no one else has responded so far, here is my take:

Your timeout settings look perfectly OK to me. The reason certain requests appear to be permanently blocked in a java.net.SocketInputStream#socketRead0() call is likely a combination of misbehaving servers and your local configuration. The socket timeout defines a maximum period of inactivity between two consecutive I/O read operations (in other words, two consecutive incoming packets). Your socket timeout setting is 5,000 milliseconds. As long as the opposite endpoint keeps sending a packet every 4,999 milliseconds for a chunk-encoded message, the request will never time out and will end up spending most of its time blocked in java.net.SocketInputStream#socketRead0(). You can find out whether this is the case by running HttpClient with wire logging turned on.
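
If it helps, one way to turn wire logging on with HttpClient 4.x is through commons-logging's SimpleLog system properties (any logging backend that routes the org.apache.http.wire category at DEBUG works equally well); this is a sketch of that configuration, set before any HttpClient class is loaded:

// route commons-logging to SimpleLog and raise the HttpClient categories to DEBUG;
// the org.apache.http.wire category prints every byte sent and received
System.setProperty("org.apache.commons.logging.Log",
        "org.apache.commons.logging.impl.SimpleLog");
System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http", "DEBUG");
System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http.wire", "DEBUG");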

Solution 6 - Java

I bumped into the same issue using the Apache HTTP client.

There's a pretty simple workaround (which doesn't require shutting the connection manager down):

To apply it, execute the request from the question in a new thread, paying attention to these details:

  • run the request in a separate thread, close the request and release its connection from a different thread, and interrupt the hanging thread
  • don't run EntityUtils.consumeQuietly(response.getEntity()) in the finally block (because it hangs on a 'dead' connection)

First, add the interface

interface RequestDisposer {
    void dispose();
}

Execute an HTTP request in a new thread

final AtomicReference<RequestDisposer> requestDisposer = new AtomicReference<>(null);

final Thread thread = new Thread(() -> {
    final HttpGet request = new HttpGet("http://my.url");
    final RequestDisposer disposer = () -> {
        request.abort();
        request.releaseConnection();
    };
    requestDisposer.set(disposer);

    try (final CloseableHttpResponse response = httpClient.execute(request)) {
        ...
    } finally {
        disposer.dispose();
    }
});
thread.start();

Call dispose() in the main thread to close the hanging connection:

requestDisposer.get().dispose(); // better check if it's not null first
thread.interrupt();
thread.join();

That fixed the issue for me.

My stacktrace looked like this:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)

For anyone interested, the hang is easily reproducible: interrupt the thread without aborting the request and releasing the connection (the ratio is about 1/100). Windows 10, version 10.0, jdk8.151-x64.

Solution 7 - Java

For the blocking Apache HTTP Client, I found the best solution is to call getConnectionManager().shutdown().

So in a high-reliability solution I simply schedule the shutdown in another thread, and if the request does not complete in time, that other thread shuts the client down, as sketched below.
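
A rough sketch of that watchdog, assuming HttpClient 4.x and a fresh client per request; the 30-second budget and the method name getWithHardDeadline are illustrative choices, not part of the original answer:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;

public class WatchdogSketch {
    private static final ScheduledExecutorService watchdog =
            Executors.newSingleThreadScheduledExecutor();

    static HttpResponse getWithHardDeadline(String url) throws Exception {
        HttpClient client = HttpClientBuilder.create().build();
        // if the request is still running after 30 seconds, shut the whole client down;
        // getConnectionManager() is deprecated and this kills every connection owned by
        // this client, which is why a dedicated client per request is used here
        ScheduledFuture<?> killer = watchdog.schedule(
                () -> client.getConnectionManager().shutdown(), 30, TimeUnit.SECONDS);
        try {
            return client.execute(new HttpGet(url));
        } finally {
            killer.cancel(false); // the request completed (or failed) in time
        }
    }
}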

Solution 8 - Java

I faced the same issue today. Based on @Sergei Voitovich's answer, I've tried to make it work while still using Apache Http Client.

Since I am using Java 8, it is simpler to implement a timeout that aborts the connection.

Here's is a draft of the implementation:

private HttpResponse executeRequest(Request request){
    InterruptibleRequestExecution requestExecution = new InterruptibleRequestExecution(request, executor);
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    try {
        return executorService.submit(requestExecution).get(<your timeout in milliseconds>, TimeUnit.MILLISECONDS);
    } catch (TimeoutException | ExecutionException e) {
        // Your request timed out, you can throw an exception here if you want
        throw new UsefulExceptionForYourApplication(e);
    } catch (InterruptedException e) {
        // Always remember to call interrupt after catching InterruptedException
        Thread.currentThread().interrupt();
        throw new UsefulExceptionForYourApplication(e);
    } finally {
        // This method forces the thread pool (with a single thread) created by
        // Executors.newSingleThreadExecutor() to stop and makes the pending request
        // abort inside the thread. So if the request is hanging in socketRead0,
        // it will stop and the thread will be terminated as well.
        forceStopIdleThreadsAndRequests(requestExecution, executorService);
    }
}

private void forceStopIdleThreadsAndRequests(InterruptibleRequestExecution execution,
                                             ExecutorService executorService) {
    execution.abortRequest();
    executorService.shutdownNow();
}

The code above creates a new thread to execute the request using org.apache.http.client.fluent.Executor. The timeout can be easily configured.

The execution of the thread is defined in InterruptibleRequestExecution which you can see below.

private static class InterruptibleRequestExecution implements Callable<HttpResponse> {
    private final Request request;
    private final Executor executor;
    private final RequestDisposer disposer;

    public InterruptibleRequestExecution(Request request, Executor executor) {
        this.request = request;
        this.executor = executor;
        this.disposer = request::abort;
    }

    @Override
    public HttpResponse call() {
        try {
            return executor.execute(request).returnResponse();
        } catch (IOException e) {
            throw new UsefulExceptionForYourApplication(e);
        } finally {
            disposer.dispose();
        }
    }

    public void abortRequest() {
        disposer.dispose();
    }

    @FunctionalInterface
    interface RequestDisposer {
        void dispose();
    }
}
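
For completeness, a possible way to wire it up, assuming the fluent Executor is created once and shared; the URL and the field placement below are placeholders of mine, not taken from the answer:

import org.apache.http.HttpResponse;
import org.apache.http.client.fluent.Executor;
import org.apache.http.client.fluent.Request;

// the executor field referenced by executeRequest(...) above
private final Executor executor = Executor.newInstance();

// somewhere in the application code
HttpResponse response = executeRequest(Request.Get("http://example.com/"));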

The results are really good. We've had times where some connections were hanging in socketRead0 for 7 hours! Now it never exceeds the defined timeout, and it's working in production with millions of requests per day without any problems.

Solution 9 - Java

I feel that all these answers are way too specific.

We have to note that this is probably a real JVM bug. It should be possible to get the file descriptor and close it. All this timeout talk is too high level. You do not want a timeout to the extent that the connection fails; what you want is the ability to hard-break this stuck thread and stop or interrupt it.

The way the JVM should implement the SocketInputStream.socketRead function is to set some internal default timeout, which could be as low as 1 second, and when that timeout expires, immediately loop back into socketRead0. Between those iterations, Thread.interrupt and Thread.stop would have a chance to take effect.

The even better way of doing this, of course, is not to do any blocking wait at all, but instead to use the select(2) system call with a list of file descriptors and, when any one of them has data available, perform the read operation on it.

Just look all over the internet at all the people having trouble with threads stuck in java.net.SocketInputStream#socketRead0; it's the most popular topic about java.net.SocketInputStream, hands down!

So, while the bug is not fixed, I wonder about the dirtiest trick I can come up with to break out of this situation. Something like connecting via the debugger interface to get to the stack frame of the socketRead call, grabbing the FileDescriptor, digging out the int fd number, and then making a native close(2) call on that fd.

Do we have a chance to do that? (Don't tell me "it's not good practice") -- if so, let's do it!

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: Piotr Müller (View Question on Stackoverflow)
Solution 1 - Java: Trevor Robinson (View Answer on Stackoverflow)
Solution 2 - Java: vzamanillo (View Answer on Stackoverflow)
Solution 3 - Java: Clint (View Answer on Stackoverflow)
Solution 4 - Java: Stefan Matei (View Answer on Stackoverflow)
Solution 5 - Java: ok2c (View Answer on Stackoverflow)
Solution 6 - Java: Sergei Voitovich (View Answer on Stackoverflow)
Solution 7 - Java: Piotr Müller (View Answer on Stackoverflow)
Solution 8 - Java: Eduardo Brito (View Answer on Stackoverflow)
Solution 9 - Java: Gunther Schadow (View Answer on Stackoverflow)