How to manage log in session through headless chrome?

Javascript, Cookies, Web Scraping, Headless, Puppeteer

Javascript Problem Overview


I want to create a scraper that:

  1. opens a headless browser,
  2. goes to a url,
  3. logs in (the site uses Steam OAuth),
  4. fills some inputs,
  5. and clicks 2 buttons.

My problem is that every new instance of the headless browser clears my login session, so I have to log in again and again.

How can I persist the session across instances? (I am using Puppeteer with headless Chrome.)

Or how can I open a headless Chrome instance that is already logged in (given that I am already logged in in my main Chrome window)?

Javascript Solutions


Solution 1 - Javascript

You can persist user data by passing the userDataDir option when launching Puppeteer. Chrome then stores the session (cookies and other profile data) in that directory and reuses it on the next launch.

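const puppeteer = require('puppeteer');
// Inside an async function: reuse the same profile directory on every launch.
const browser = await puppeteer.launch({
  userDataDir: "./user_data"
});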

The docs don't go into great detail, but here is the link: https://pptr.dev/#?product=Puppeteer&version=v1.6.1&show=api-puppeteerlaunchoptions
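As a minimal sketch of the userDataDir approach, the helper below resolves the profile folder to an absolute path and creates it before launching. The names buildLaunchOptions and launchWithProfile are made up for this example and are not part of Puppeteer; headless and userDataDir are real launch options.

```javascript
const fs = require('fs');
const path = require('path');

// Build launch options that point Chrome at a persistent profile folder.
// `profileDir` is a hypothetical folder name chosen for this sketch.
function buildLaunchOptions(profileDir) {
  const userDataDir = path.resolve(profileDir);
  // Create the folder up front so a bad path fails here, not inside Chrome.
  fs.mkdirSync(userDataDir, { recursive: true });
  return { headless: true, userDataDir };
}

// Launch Chrome with that profile. Puppeteer is required lazily so the
// helper above can be used without the package being loaded.
async function launchWithProfile(profileDir) {
  const puppeteer = require('puppeteer');
  return puppeteer.launch(buildLaunchOptions(profileDir));
}
```

On the first run you would call launchWithProfile('./user_data'), log in, and close the browser with browser.close(); later runs that pass the same folder should start with the stored session, provided the site's cookies have not expired. As far as I know, Chrome locks a profile folder while it is in use, so two instances should not share one folder at the same time.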

Solution 2 - Javascript

In Puppeteer you have access to the session cookies through page.cookies().

So once you log in, you can get every cookie and save it to a JSON file:

const fs = require('fs');
const cookiesFilePath = 'cookies.json';
// Save session cookies
const cookiesObject = await page.cookies();
// Write cookies to a file so later runs can reuse them
fs.writeFile(cookiesFilePath, JSON.stringify(cookiesObject), function (err) {
  if (err) {
    console.log('The file could not be written.', err);
    return; // do not report success after a failed write
  }
  console.log('Session has been successfully saved');
});

Then, on your next run, right before calling page.goto(), you can call page.setCookie() to load the cookies from the file one by one:

const previousSession = fs.existsSync(cookiesFilePath)
if (previousSession) {
  // If the file exists, load the cookies
  const cookiesString = fs.readFileSync(cookiesFilePath, 'utf8');
  const parsedCookies = JSON.parse(cookiesString);
  if (parsedCookies.length !== 0) {
    for (let cookie of parsedCookies) {
      await page.setCookie(cookie)
    }
    console.log('Session has been loaded in the browser')
  }
}

Check out the Puppeteer docs for page.cookies() and page.setCookie().
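A saved cookie file can go stale between runs, so it may help to drop expired cookies before restoring them. The sketch below assumes the cookie objects look like the ones page.cookies() returns, with expires as a Unix timestamp in seconds and -1 (or no expires field) for session cookies. dropExpiredCookies is a hypothetical helper name, not a Puppeteer function.

```javascript
// Keep only cookies that are still valid at time `nowSeconds`.
// Assumption: `expires` is in seconds since the Unix epoch, and session
// cookies carry -1 or have no `expires` field at all.
function dropExpiredCookies(cookies, nowSeconds = Date.now() / 1000) {
  return cookies.filter((cookie) => {
    const isSessionCookie = cookie.expires === undefined || cookie.expires === -1;
    return isSessionCookie || cookie.expires > nowSeconds;
  });
}
```

With the loading code above, this could be used as await page.setCookie(...dropExpiredCookies(parsedCookies)), since page.setCookie() accepts several cookies in one call. If the filtered list comes back empty, the script would need to log in again.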

Solution 3 - Javascript

Here is a version of the above solution that works without jsonfile, using Node's built-in fs module instead:

Setup:

const fs = require('fs');
const cookiesPath = "cookies.txt";

Reading the cookies (put this code first):

// If the cookies file exists, read the cookies.
const previousSession = fs.existsSync(cookiesPath)
if (previousSession) {
  const content = fs.readFileSync(cookiesPath);
  const cookiesArr = JSON.parse(content);
  if (cookiesArr.length !== 0) {
    for (let cookie of cookiesArr) {
      await page.setCookie(cookie)
    }
    console.log('Session has been loaded in the browser')
  }
}

Writing the cookies:

// Write Cookies
const cookiesObject = await page.cookies()
fs.writeFileSync(cookiesPath, JSON.stringify(cookiesObject));
console.log('Session has been saved to ' + cookiesPath);
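The read and write snippets above can be wrapped into two small reusable functions. This is only a sketch: saveCookies and loadCookies are names invented here, and page is assumed to be any object exposing Puppeteer's page.cookies() and page.setCookie() methods. It uses fs.promises so that nothing blocks the event loop.

```javascript
const fsp = require('fs').promises;

// Write the page's current cookies to `filePath` as JSON.
// Returns the number of cookies written.
async function saveCookies(page, filePath) {
  const cookies = await page.cookies();
  await fsp.writeFile(filePath, JSON.stringify(cookies, null, 2));
  return cookies.length;
}

// Restore cookies from `filePath` into the page.
// Returns how many cookies were set (0 if the file does not exist yet).
async function loadCookies(page, filePath) {
  let raw;
  try {
    raw = await fsp.readFile(filePath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return 0; // first run: nothing saved yet
    throw err;
  }
  const cookies = JSON.parse(raw);
  if (cookies.length > 0) {
    await page.setCookie(...cookies);
  }
  return cookies.length;
}
```

A typical call order would be await loadCookies(page, 'cookies.txt') before page.goto(), and await saveCookies(page, 'cookies.txt') once the login has succeeded.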

Solution 4 - Javascript

To write previously saved cookies into the page (restoring the session):

async function writingCookies() {
  // C.cookieFile can be replaced by a path such as './filename.json'
  const cookieArray = require(C.cookieFile);
  await page.setCookie(...cookieArray);
  // C.feedUrl can be replaced by a URL such as 'https://example.com'
  await page.cookies(C.feedUrl);
}

To read the cookies from the page and save them to a file, first install jsonfile in your project: npm install jsonfile

const jsonfile = require('jsonfile');

async function getCookies() {
  const cookiesObject = await page.cookies();
  jsonfile.writeFile('linkedinCookies.json', cookiesObject, { spaces: 2 },
    function (err) {
      if (err) {
        console.log('The Cookie file could not be written.', err);
        return; // do not report success after a failed write
      }
      console.log("Cookie file has been successfully saved in current working Directory : '" + process.cwd() + "'");
    });
}

Call these two functions with await (getCookies after logging in, writingCookies at the start of later runs) and it will work for you.
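The order in which the two functions get called matters, so here is a rough sketch of one scraping run. runWithSession and its four hooks (restore, isLoggedIn, login, save) are hypothetical names for this example. The hooks are supplied by the caller, for instance writingCookies as restore and getCookies as save; only page.goto() is a Puppeteer call.

```javascript
// Orchestrate one run: restore the session, check it, log in only if needed.
// All four hooks are hypothetical callbacks; none is part of Puppeteer.
async function runWithSession(page, url, { restore, isLoggedIn, login, save }) {
  await restore(page); // e.g. read the cookie file and call page.setCookie()
  await page.goto(url);
  if (await isLoggedIn(page)) {
    return 'reused'; // the restored cookies were still valid
  }
  await login(page); // e.g. the site's OAuth login steps
  await save(page); // e.g. write page.cookies() back to the file
  return 'fresh';
}
```

How isLoggedIn decides is site-specific. One option is to look for an element that only appears for signed-in users.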

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Anton Kurtin | View Question on Stackoverflow
Solution 1 - Javascript | meatherly | View Answer on Stackoverflow
Solution 2 - Javascript | EcoVirtual | View Answer on Stackoverflow
Solution 3 - Javascript | Daniel Porteous | View Answer on Stackoverflow
Solution 4 - Javascript | Meet Mahajan | View Answer on Stackoverflow