jsoup posting and cookies

Tags: Java, Screen Scraping, Jsoup

Java Problem Overview


I'm trying to use jsoup to log in to a site and then scrape information, but I've run into a problem: I can log in successfully and create a Document from index.php, but I can't get any other pages on the site. I know I need to save a cookie after I post and then send it when I open another page on the site, but how do I do this? The following code lets me log in and get index.php:

Document doc = Jsoup.connect("http://www.example.com/login.php")
               .data("username", "myUsername", 
                     "password", "myPassword")
               .post();

I know I could use Apache HttpClient for this, but I'd rather not.

Java Solutions


Solution 1 - Java

When you log in to the site, it is probably setting an authorised session cookie, which needs to be sent on every subsequent request to maintain the session.

You can get the cookie like this:

Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
    .data("username", "myUsername", "password", "myPassword")
    .method(Method.POST)
    .execute();

Document doc = res.parse();
String sessionId = res.cookie("SESSIONID"); // you will need to check what the right cookie name is

Then send it with the next request:

Document doc2 = Jsoup.connect("http://www.example.com/otherPage")
    .cookie("SESSIONID", sessionId)
    .get();
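Putting Solution 1 together as one program may make the flow easier to follow. This is a minimal sketch, assuming jsoup is on the classpath and that the login form uses the field names username and password from the question. Because the session cookie's name differs from site to site, it prints every cookie the login response set instead of assuming SESSIONID; the class name and the command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Sketch of Solution 1. Assumes the form fields are "username" and "password".
class SessionCookieLogin {

    // POSTs the login form and returns the whole response (status, cookies, body).
    static Connection.Response login(String loginUrl, String user, String pass) throws IOException {
        return Jsoup.connect(loginUrl)
                .data("username", user, "password", pass)
                .method(Connection.Method.POST)
                .execute();
    }

    // Builds (but does not send) a GET request that carries one named cookie.
    static Connection withCookie(String url, String name, String value) {
        return Jsoup.connect(url).cookie(name, value);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.err.println("usage: SessionCookieLogin <loginUrl> <pageUrl> <username> <password>");
            return;
        }
        Connection.Response res = login(args[0], args[2], args[3]);

        // List every cookie so you can see which one actually holds the session.
        for (Map.Entry<String, String> cookie : res.cookies().entrySet()) {
            System.out.println(cookie.getKey() + " = " + cookie.getValue());
        }

        String sessionId = res.cookie("SESSIONID"); // replace with the name printed above
        if (sessionId == null) {
            System.err.println("No SESSIONID cookie; check the cookie names printed above.");
            return;
        }
        Document page = withCookie(args[1], "SESSIONID", sessionId).get();
        System.out.println(page.title());
    }
}
```

Run it with the login URL, the protected page URL, and your credentials as arguments; with fewer arguments it only prints the usage line and makes no request.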

Solution 2 - Java

// This gets you the login response.
Connection.Response res = Jsoup
    .connect("loginPageUrl")
    .data("loginField", "myEmail", "passField", "pass1234")
    .method(Method.POST)
    .execute();

// This gets you every cookie the login response set.
Map<String, String> loginCookies = res.cookies();

// Sending all of them back is the easiest way I've found to stay in the session.
Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess")
      .cookies(loginCookies)
      .get();
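If you can use jsoup 1.14.1 or later, Jsoup.newSession() may remove the manual cookie copying altogether: requests created from one session with newRequest() share a cookie store, so cookies set by the login response should be sent on later requests automatically. This is a sketch under that version assumption, reusing the question's form field names; the class and method names are hypothetical.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Assumes jsoup 1.14.1+ (Jsoup.newSession and Connection.newRequest).
class JsoupSessionLogin {

    // The session object owns the cookie store shared by every request made from it.
    static Connection createSession() {
        return Jsoup.newSession();
    }

    static Document loginAndFetch(Connection session, String loginUrl, String pageUrl,
                                  String user, String pass) throws IOException {
        // Cookies set by the login response are saved in the session's store.
        session.newRequest()
                .url(loginUrl)
                .data("username", user, "password", pass)
                .post();
        // The stored cookies go out with this request without copying them by hand.
        return session.newRequest()
                .url(pageUrl)
                .get();
    }
}
```

Keeping one session object per logged-in user is the natural fit for this design, since all of its requests read and write the same cookies.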

Solution 3 - Java

Where the code was:

Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess").cookies().get(); 

I was having difficulties until I changed it to pass in the cookie map saved from the login response (for example, cookies = res.cookies()):

Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess").cookies(cookies).get();

Now it is working flawlessly.
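That fix only helps if cookies really holds the map from the login response. A small guard can make a failed login visible instead of quietly fetching the logged-out version of the page. This is a sketch assuming jsoup is on the classpath; the class name and error message are hypothetical.

```java
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;

class AuthenticatedRequest {

    // Builds (but does not send) a request for a protected page.
    // An empty cookie map usually means the login did not set a session,
    // so fail fast rather than fetch the page as an anonymous visitor.
    static Connection forPage(String url, Map<String, String> cookies) {
        if (cookies == null || cookies.isEmpty()) {
            throw new IllegalStateException("No session cookies; did the login request succeed?");
        }
        return Jsoup.connect(url).cookies(cookies);
    }
}
```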

Solution 4 - Java

Here is something you can try:

import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Connection.Response res = null;
try {
    res = Jsoup
            .connect("http://www.example.com/login.php")
            .data("username", "your login id", "password", "your password")
            .method(Connection.Method.POST)
            .execute();
} catch (IOException e) {
    e.printStackTrace();
}

Now save all the cookies and make a request to the other page you want.

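// Store the cookies (none if the login request above failed and res is still null)
Map<String, String> cookies = (res != null) ? res.cookies() : Map.of();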

Make the request to the other page:

try {
    Document doc = Jsoup.connect("your-second-page-link").cookies(cookies).get();
} catch (Exception e) {
    e.printStackTrace();
}

Ask if you need further help.

Solution 5 - Java

// Connect to the server with the login details
Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
    .data("username", "myUsername")
    .data("password", "myPassword")
    .method(Connection.Method.POST)
    .execute();
// The page the login redirected to (jsoup follows redirects by default)
Document doc = res.parse();
// The cookies set by the login response
Map<String, String> cookies = res.cookies();
// Fetch the required page with those cookies
Document otherPage = Jsoup.connect("http://www.example.com/otherPage")
    .cookies(cookies)
    .get();
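The solutions above read the cookies once, right after login. If a site adds or refreshes cookies on later responses (an assumption about the particular site, not something jsoup requires), one option is to fold each response's cookies back into the map before the next request. A sketch assuming jsoup is on the classpath; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CookieCarryingFetcher {

    // Fetches each URL in order. Every request sends the current cookies, and any
    // cookies set or updated by that response are merged back into the same map,
    // so the next request sees the newest values.
    static List<Document> fetchAll(Map<String, String> cookies, List<String> urls) throws IOException {
        List<Document> pages = new ArrayList<>();
        for (String url : urls) {
            Connection.Response res = Jsoup.connect(url).cookies(cookies).execute();
            cookies.putAll(res.cookies());
            pages.add(res.parse());
        }
        return pages;
    }
}
```

Mutating the caller's map keeps the sketch short; returning a fresh map instead would avoid surprising callers that reuse the original elsewhere.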

Solution 6 - Java

Why reconnect? When a site needs its cookies (for example, to avoid a 403 status), I reuse the same connection:

import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = null;
int statusCode = -1;
String statusMessage = null;
String strHTML = null;

try {
    // Connect one time
    Connection con = Jsoup.connect(urlString);
    // Get the response
    Connection.Response res = con.execute();
    // Get the cookies
    Map<String, String> loginCookies = res.cookies();

    // Print the cookie contents and the status
    if (loginCookies != null) {
        for (Map.Entry<String, String> entry : loginCookies.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue() + "\n");
        }
    }

    statusCode = res.statusCode();
    statusMessage = res.statusMessage();
    System.out.print("Status CODE\n" + statusCode + "\n\n");
    System.out.print("Status Message\n" + statusMessage + "\n\n");

    // Set the cookies (and a browser user agent) on the same connection
    con.cookies(loginCookies).userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0");

    // Now do whatever you want, for example get the document
    doc = con.get();
    // Get the HTML of the <head> element
    strHTML = doc.head().html();

} catch (org.jsoup.HttpStatusException hse) {
    hse.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
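When chasing a 403 like the one described above, it may help to compare the cookies jsoup received with the Cookie header your browser sends (visible in its developer tools). This stdlib-only helper, whose name is hypothetical, renders a cookie map in that "name=value; name2=value2" form.

```java
import java.util.Map;
import java.util.StringJoiner;

class CookieFormat {

    // Joins cookies the way a browser's Cookie request header lists them,
    // in the map's iteration order, separated by "; ".
    static String asCookieHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            header.add(entry.getKey() + "=" + entry.getValue());
        }
        return header.toString();
    }
}
```

For example, printing CookieFormat.asCookieHeader(loginCookies) right after the execute() call gives one line to hold up against the browser's request.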

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Gwindow | View Question on Stackoverflow
Solution 1 - Java | Jonathan Hedley | View Answer on Stackoverflow
Solution 2 - Java | Igor Brusamolin Lobo Santos | View Answer on Stackoverflow
Solution 3 - Java | user1935501 | View Answer on Stackoverflow
Solution 4 - Java | iamvinitk | View Answer on Stackoverflow
Solution 5 - Java | Sandesh | View Answer on Stackoverflow
Solution 6 - Java | Heinrich Siggelkow | View Answer on Stackoverflow