403 Forbidden with Java but not web browser?

Java, Http Status-Code-403

Java Problem Overview


I am writing a small Java program to get the number of results for a given Google search term. For some reason, the request returns 403 Forbidden from Java, but the same URL gives the right results in web browsers. Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;


public class DataGetter {

	public static void main(String[] args) throws IOException {
		getResultAmount("test");
	}
	
	private static int getResultAmount(String query) throws IOException {
		BufferedReader r = new BufferedReader(new InputStreamReader(new URL("https://www.google.com/search?q=" + query).openConnection()
				.getInputStream()));
        String line;
        String src = "";
        while ((line = r.readLine()) != null) {
        	src += line;
        }
        System.out.println(src);
        return 1;
	}

}

And the error:

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/search?q=test
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
	at DataGetter.getResultAmount(DataGetter.java:15)
	at DataGetter.main(DataGetter.java:10)

Why is it doing this?

Java Solutions


Solution 1 - Java

You just need to set a User-Agent header for it to work:

URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();

BufferedReader r  = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));

StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
	sb.append(line);
}
System.out.println(sb.toString());

The SSL was handled transparently for you, as can be seen from the HttpsURLConnectionImpl frame in your exception stack trace.
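For reference, the snippet above can be wrapped into a self-contained sketch. The class name UserAgentFetch and its helper methods are illustrative, not part of the original answer. The query is also URL-encoded here (using the Java 10+ overload of URLEncoder.encode), which the original code does not do. Whether Google still accepts this particular User-Agent value is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; class and method names are hypothetical.
class UserAgentFetch {

    // Browser-like value taken from the answer above.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) "
            + "AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";

    // Builds the search URL, encoding the query so spaces and symbols are safe.
    static String buildSearchUrl(String query) {
        return "https://www.google.com/search?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Creates the connection and sets the header; no network I/O happens yet.
    static URLConnection open(String query) throws IOException {
        URLConnection connection = new URL(buildSearchUrl(query)).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    // Sends the request and returns the response body as one string.
    static String fetch(String query) throws IOException {
        URLConnection connection = open(query);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Only touches the network when a query is passed on the command line.
        if (args.length > 0) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

The try-with-resources block closes the reader even if reading fails, and StringBuilder avoids the repeated string concatenation of the original loop.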

Getting the result count is not quite this simple, though. After this request you have to pretend to be a browser by fetching the cookie and parsing the redirect token link:

String response = sb.toString(); // body of the first response, read above
String cookie = connection.getHeaderField( "Set-Cookie").split(";")[0];
Pattern pattern = Pattern.compile("content=\\\"0;url=(.*?)\\\"");
Matcher m = pattern.matcher(response);
if( m.find() ) {
	String url = m.group(1);
	connection = new URL(url).openConnection();
	connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
	connection.setRequestProperty("Cookie", cookie );
	connection.connect();
	r  = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
	sb = new StringBuilder();
	while ((line = r.readLine()) != null) {
		sb.append(line);
	}
	response = sb.toString();
	pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
	m = pattern.matcher(response);
	if( m.find() ) {
		long amount = Long.parseLong(m.group(1).replaceAll(",", ""));
		return amount;
	}

}

Running the full code I get 2930000000L as a result.
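The three parsing steps used above (cookie pair, redirect link, result count) can be tried without any network access. The sketch below is only an illustration: the class name ResultPageParser and its method names are made up, the patterns mirror the ones in this answer, and Google's markup may well have changed since it was written.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; names are hypothetical, patterns mirror the answer above.
class ResultPageParser {

    private static final Pattern META_REFRESH =
            Pattern.compile("content=\"0;url=(.*?)\"");
    private static final Pattern RESULT_STATS =
            Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");

    // Keeps only the leading name=value pair of a Set-Cookie header.
    // Returns null when the server sent no cookie at all.
    static String firstCookiePair(String setCookieHeader) {
        return setCookieHeader == null ? null : setCookieHeader.split(";")[0];
    }

    // Returns the meta-refresh target, or null if the page has none.
    static String redirectUrl(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Returns the result count, or empty if the expected markup is absent.
    static OptionalLong resultCount(String html) {
        Matcher m = RESULT_STATS.matcher(html);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        String page = "<div id=\"resultStats\">About 2,930,000,000 results</div>";
        System.out.println(resultCount(page).getAsLong());
    }
}
```

Returning OptionalLong instead of falling through makes the "markup not found" case explicit, which the snippet above leaves unhandled.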

Solution 2 - Java

For me, it worked after adding the header "Accept": "*/*".
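With the URLConnection API used in the question, that would look roughly like the sketch below. The class and method names are illustrative, and whether this header alone is enough for a given server is an assumption.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Illustrative sketch; class and method names are hypothetical.
class AcceptHeaderExample {

    // Opens a connection that tells the server any content type is acceptable.
    // Nothing is sent until the caller reads from the connection.
    static URLConnection openWithAccept(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        connection.setRequestProperty("Accept", "*/*");
        return connection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection connection = openWithAccept("https://www.google.com/search?q=test");
        System.out.println(connection.getRequestProperty("Accept"));
    }
}
```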

Solution 3 - Java

You probably aren't setting the correct headers. Use LiveHttpHeaders (or equivalent) in the browser to see what headers the browser is sending, then emulate them in your code.
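One way to do the emulation is to keep the copied headers in a map and apply them all to the connection. This is a minimal sketch: the class name BrowserHeaders and the header values are placeholders for illustration, so substitute whatever your own browser actually sends.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch; names and header values are hypothetical placeholders.
class BrowserHeaders {

    // Headers as they might be copied from the browser's header inspector.
    static Map<String, String> browserLikeHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
                + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        headers.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "en-US,en;q=0.9");
        return headers;
    }

    // Applies every header to a fresh connection; no request is sent yet.
    static URLConnection open(String url, Map<String, String> headers) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            connection.setRequestProperty(header.getKey(), header.getValue());
        }
        return connection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection connection =
                open("https://www.google.com/search?q=test", browserLikeHeaders());
        // Print what will be sent, for comparison with the browser's headers.
        System.out.println(connection.getRequestProperties());
    }
}
```

Printing getRequestProperties() before connecting gives a quick way to compare the Java request against the browser's, header by header.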

Solution 4 - Java

It's because the site uses SSL. Try using the Jersey HTTP Client. You will probably also have to learn a little about HTTPS and certificates, but I think Jersey can be set to ignore most of the details relating to the actual security.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author   Original Content on Stackoverflow
Question            tckmn             View Question on Stackoverflow
Solution 1 - Java   Esailija          View Answer on Stackoverflow
Solution 2 - Java   rpajaziti         View Answer on Stackoverflow
Solution 3 - Java   Kevin Day         View Answer on Stackoverflow
Solution 4 - Java   user785262        View Answer on Stackoverflow