Jsoup SocketTimeoutException: Read timed out

Java Problem Overview


I get a SocketTimeoutException when I try to parse a lot of HTML documents using Jsoup.

For example, I have a list of links:

<a href="www.domain.com/url1.html">link1</a>
<a href="www.domain.com/url2.html">link2</a>
<a href="www.domain.com/url3.html">link3</a>
<a href="www.domain.com/url4.html">link4</a>

For each link, I parse the document linked to the URL (from the href attribute) to get other pieces of information in those pages.

I can imagine that this takes a lot of time, but how can I prevent this exception? Here is the whole stack trace:

java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(Unknown Source)
	at java.io.BufferedInputStream.fill(Unknown Source)
	at java.io.BufferedInputStream.read1(Unknown Source)
	at java.io.BufferedInputStream.read(Unknown Source)
	at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
	at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
	at java.net.HttpURLConnection.getResponseCode(Unknown Source)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
	at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
	at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
	at app.ForumCrawler.crawl(ForumCrawler.java:50)
	at Main.main(Main.java:15)

Java Solutions


Solution 1 - Java

I think you can do

Jsoup.connect("...").timeout(10 * 1000).get(); 

which sets the timeout to 10 seconds.
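The stack trace in the question shows the request going through java.net.HttpURLConnection, so the same exception can be reproduced with the JDK alone. The following is a minimal sketch, not jsoup's implementation: the class name ReadTimeoutDemo, the method timesOut, and the silent local server are all assumptions made for illustration. It shows that a finite read timeout turns a server that never answers into a SocketTimeoutException.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

// Hypothetical demo class; none of these names come from jsoup.
final class ReadTimeoutDemo {

    /** Returns true if a request to a silent local server hits the read timeout. */
    static boolean timesOut(int timeoutMillis) throws IOException {
        // The server socket is bound but never accepts or replies; the OS still
        // completes the TCP handshake, so the client connects and then waits.
        try (ServerSocket silent = new ServerSocket(0)) {
            String address = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                conn.getResponseCode(); // blocks until a response arrives or the timeout elapses
                return false;
            } catch (SocketTimeoutException e) {
                return true;            // no response within timeoutMillis
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + timesOut(500));
    }
}
```

Raising the timeout only helps when the server is slow rather than silent, which is why a longer value such as the 10 seconds above is a reasonable first thing to try.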

Solution 2 - Java

Ok - so, I tried to offer this as an edit to MarcoS's answer, but the edit was rejected. Nevertheless, the following information may be useful to future visitors:

According to the javadocs, the default timeout for an org.jsoup.Connection is 30 seconds.

As has already been mentioned, this can be changed using timeout(int millis).

As the OP notes in the edit, it can also be set using timeout(0). However, as the javadocs state:

> A timeout of zero is treated as an infinite timeout.
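Because an infinite timeout can leave a crawler blocked on a single unresponsive page, one alternative is to keep a finite timeout and retry a few times. This is a sketch under assumptions, not part of jsoup: the class TimeoutRetry and the method withRetry are hypothetical names, and the fetch is passed in as a Callable so the helper does not depend on any particular HTTP library.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper; not part of jsoup.
final class TimeoutRetry {

    /**
     * Runs fetch up to maxAttempts times, retrying only when it throws
     * SocketTimeoutException. Any other exception is propagated immediately.
     */
    static <T> T withRetry(Callable<T> fetch, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.call();
            } catch (SocketTimeoutException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt timed out
    }
}
```

With jsoup, a call might look like TimeoutRetry.withRetry(() -> Jsoup.connect(url).timeout(10 * 1000).get(), 3), where url is whichever link is being crawled.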

Solution 3 - Java

I had the same error:

java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)

and only setting .userAgent("Opera") worked for me.

So I used the userAgent(String userAgent) method of the Connection class to set the jsoup user agent.

Something like:

Jsoup.connect("link").userAgent("Opera").get();

Solution 4 - Java

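There is a mistake at https://jsoup.org/apidocs/org/jsoup/Connection.html. The default timeout is not 30 seconds; it is 3 seconds. Just look at the Javadoc in the source code, which says 3000 ms. The default may differ between jsoup versions, so check the Javadoc for the version you are using.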

Solution 5 - Java

This should work: Jsoup.connect(url.toLowerCase()).timeout(0).get();

Solution 6 - Java

Set a timeout on the jsoup connection, using timeout(int millis), before calling get().

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | C. Maillard | View Question on Stack Overflow
Solution 1 - Java | MarcoS | View Answer on Stack Overflow
Solution 2 - Java | amaidment | View Answer on Stack Overflow
Solution 3 - Java | invzbl3 | View Answer on Stack Overflow
Solution 4 - Java | Bartek | View Answer on Stack Overflow
Solution 5 - Java | Prasanna Mendon | View Answer on Stack Overflow
Solution 6 - Java | Gaurab Pradhan | View Answer on Stack Overflow