Downloading with Chrome headless and Selenium

Python, Google Chrome, Selenium, Google Chrome-Headless

Python Problem Overview


I'm using python-selenium and Chrome 59 and trying to automate a simple download sequence. When I launch the browser normally, the download works, but when I do so in headless mode, the download doesn't work.

# Headless implementation
from selenium import webdriver

chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("headless")

driver = webdriver.Chrome(chrome_options=chromeOptions)

driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download doesn't start

# Normal Mode
from selenium import webdriver

driver = webdriver.Chrome()

driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download works normally

I've even tried adding a default path:

prefs = {"download.default_directory" : "/Users/Chetan/Desktop/"}
chromeOptions.add_argument("headless")
chromeOptions.add_experimental_option("prefs",prefs)

Adding a default path works in the normal implementation, but the same problem persists in the headless version.

How do I get the download to start in headless mode?

Python Solutions


Solution 1 - Python

Yes, it's a "feature", for security. As mentioned before, here is the bug discussion: https://bugs.chromium.org/p/chromium/issues/detail?id=696481

Support for enabling downloads in headless mode was added in Chrome 62.0.3196.0.
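If you are unsure whether your Chrome build is new enough, a small version comparison can help. This is a sketch: it assumes you obtain the version string yourself, for example from driver.capabilities, where W3C sessions report it under "browserVersion" (older sessions used "version"). The helper names are hypothetical.

```python
# Minimum Chrome version mentioned above for headless download support.
MIN_VERSION = (62, 0, 3196, 0)


def parse_version(version):
    """Turn a dotted version string such as '62.0.3196.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_headless_download(version):
    """Hypothetical helper: True if `version` is at least MIN_VERSION."""
    # Tuples compare element by element, which matches dotted version ordering.
    return parse_version(version) >= MIN_VERSION


if __name__ == "__main__":
    print(supports_headless_download("59.0.3071.115"))  # False
    print(supports_headless_download("68.0.3440.75"))   # True
```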

Here is a Python implementation. I had to register the "send_command" endpoint with the webdriver's command executor, because Selenium did not include it. I will try to submit a PR so it is included in the library in the future.

def enable_download_in_headless_chrome(driver, download_dir):
    # add missing support for Chrome "send_command" to the Selenium webdriver
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')

    # ask Chrome (via the DevTools protocol) to allow downloads into download_dir
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    command_result = driver.execute("send_command", params)
    return command_result

For reference, here is a small repo that demonstrates how to use this: https://github.com/shawnbutton/PythonHeadlessChrome

Update 2020-05-01: There have been comments saying this no longer works. Given that this patch is now over a year old, it's quite possible the underlying library has changed.
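Newer Selenium versions, including Selenium 4, give Chrome/Chromium drivers an execute_cdp_cmd(cmd, cmd_args) method that forwards a DevTools Protocol command through ChromeDriver, so registering a custom send_command endpoint should not be necessary there. The sketch below sends the same Page.setDownloadBehavior command as the answer above; it assumes a Chromium-based Selenium driver, and the helper name is hypothetical.

```python
def allow_headless_downloads(driver, download_dir):
    """Hypothetical helper: let headless Chrome save downloads into download_dir.

    Assumes `driver` is a Selenium Chrome/Chromium driver that provides
    execute_cdp_cmd(cmd, cmd_args). download_dir should be an absolute path.
    """
    params = {"behavior": "allow", "downloadPath": download_dir}
    # Same DevTools command as the send_command workaround, sent via the built-in API.
    return driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
```

Call it once after creating the driver and before clicking the download link, e.g. allow_headless_downloads(driver, "/tmp/downloads").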

Solution 2 - Python

Here's a working example for Python based on Shawn Button's answer. I've tested this with Chromium 68.0.3440.75 and chromedriver 2.38.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option("prefs", {
  "download.default_directory": "/path/to/download/dir",
  "download.prompt_for_download": False,
})

chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)

driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': "/path/to/download/dir"}}
command_result = driver.execute("send_command", params)

driver.get('http://download-page.url/')
driver.find_element_by_css_selector("#download_link").click()
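Note that click() returns as soon as the click is sent, so a script that quits the driver right away may close Chrome before the file is written. One way to wait is to poll the download directory. This sketch assumes the directory starts empty and relies on Chrome's convention of giving in-progress downloads a ".crdownload" suffix; the function name and timeout values are illustrative.

```python
import os
import time


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Hypothetical helper: wait until download_dir holds only finished downloads.

    Returns the names of the completed files, or raises TimeoutError.
    Assumes the directory was empty before the download started.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        # Chrome keeps a ".crdownload" suffix until the file is complete.
        partial = [n for n in names if n.endswith(".crdownload")]
        finished = [n for n in names if not n.endswith(".crdownload")]
        if finished and not partial:
            return finished
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {download_dir!r} after {timeout}s")
```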

Solution 3 - Python

This is a Chrome feature that prevents software from downloading files to your computer. There is a workaround, though. Read more about it here.

What you need to do is enable downloads via the DevTools protocol, something like this:

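// CDP here is the chrome-remote-interface package
const CDP = require('chrome-remote-interface');

async function setDownload () {
  const client = await CDP({tab: 'ws://localhost:9222/devtools/browser'});
  // allow downloads browser-wide and save them to /tmp/
  const info = await client.send('Browser.setDownloadBehavior', {behavior: "allow", downloadPath: "/tmp/"});
  await client.close();
}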
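The WebSocket URL in this snippet is incomplete: Chrome's browser-level endpoint normally has the form ws://localhost:9222/devtools/browser/ followed by an id. When Chrome is started with --remote-debugging-port=9222, the full URL is reported in the webSocketDebuggerUrl field of the /json/version HTTP endpoint. A standard-library Python sketch for reading it; the port default and function names are assumptions.

```python
import json
import urllib.request


def parse_browser_ws_url(version_json):
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    return json.loads(version_json)["webSocketDebuggerUrl"]


def fetch_browser_ws_url(host="localhost", port=9222, timeout=5):
    """Ask a Chrome started with --remote-debugging-port for its browser WebSocket URL."""
    url = f"http://{host}:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_browser_ws_url(response.read().decode("utf-8"))
```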

This is the solution someone gave in the mentioned topic. Here is his comment.

Solution 4 - Python

UPDATED PYTHON SOLUTION - TESTED Mar 4, 2021 on chromedriver v88 and v89

This will allow you to click to download files in headless mode.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    # Instantiate headless driver
    chrome_options = Options()

    # Windows path
    chromedriver_location = 'C:\\path\\to\\chromedriver_win32\\chromedriver.exe'
    # Mac path. You may have to allow the chromedriver developer in macOS System Preferences.
    # chromedriver_location = '/Users/path/to/chromedriver'

    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    
    chrome_prefs = {"download.default_directory": r"C:\path\to\Downloads"} # (windows)
    chrome_options.experimental_options["prefs"] = chrome_prefs

    driver = webdriver.Chrome(chromedriver_location,options=chrome_options)

    # Download your file
    driver.get('https://www.mockaroo.com/')
    driver.find_element_by_id('download').click()
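The answer above sets only download.default_directory. Below is a sketch of a slightly fuller preference set, using keys that appear elsewhere on this page (download.prompt_for_download, download.directory_upgrade). The helper names are hypothetical; make_headless_driver assumes Selenium 4, where options= is the keyword to use, and imports Selenium inside the function so the preference helper works on its own.

```python
def download_prefs(download_dir):
    """Hypothetical helper: Chrome preferences for saving downloads to download_dir.

    download_dir should be an absolute path.
    """
    return {
        "download.default_directory": download_dir,
        # Save directly instead of showing a "Save as" prompt.
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
    }


def make_headless_driver(download_dir):
    """Hypothetical helper: start headless Chrome with the preferences above."""
    from selenium import webdriver  # imported lazily; assumes Selenium 4

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_experimental_option("prefs", download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```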

Solution 5 - Python

Maybe the website you are automating returns different HTML to different browsers, which means the XPath or ID you want may differ in the headless browser. Try saving the page source in the headless browser and opening it as an HTML page to check the ID or XPath you need. You can see a C# example here: https://stackoverflow.com/questions/45233065/how-to-hide-firefoxdriver-using-selenium-without-findelement-function-error-in/45649146#45649146
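In Python, the HTML the headless browser currently sees is available through Selenium's driver.page_source property. A minimal sketch that writes it to a file you can open in a normal browser; the function and file names are illustrative.

```python
from pathlib import Path


def save_page_source(driver, path="headless_page.html"):
    """Write the HTML the (headless) browser currently sees to `path`.

    Open the file in a normal browser to check which IDs or XPaths exist.
    Returns the absolute path of the written file.
    """
    target = Path(path)
    target.write_text(driver.page_source, encoding="utf-8")
    return target.resolve()
```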

Solution 6 - Python

Usually it's redundant to see the same thing written in another language, but because this issue drove me crazy, I hope I'm saving someone else the pain. So here's the C# version of Shawn Button's answer (tested with headless chrome=71.0.3578.98, chromedriver=2.45.615279, platform=Linux 4.9.125-linuxkit x86_64):

var enableDownloadCommandParameters = new Dictionary<string, object>
{
    { "behavior", "allow" },
    { "downloadPath", downloadDirectoryPath }
};
var result = ((OpenQA.Selenium.Chrome.ChromeDriver)driver).ExecuteChromeCommandWithResult("Page.setDownloadBehavior", enableDownloadCommandParameters);

Solution 7 - Python

A full working example for JavaScript with selenium-cucumber-js / selenium-webdriver:

const chromedriver = require('chromedriver');
const selenium = require('selenium-webdriver');
const command = require('selenium-webdriver/lib/command');
const chrome = require('selenium-webdriver/chrome');

module.exports = function() {

  const chromeOptions = new chrome.Options()
    .addArguments('--no-sandbox', '--headless', '--start-maximized', '--ignore-certificate-errors')
    .setUserPreferences({
      'profile.default_content_settings.popups': 0, // disable download file dialog
      'download.default_directory': '/tmp/downloads', // default file download location
      "download.prompt_for_download": false,
      'download.directory_upgrade': true,
      'safebrowsing.enabled': false,
      'plugins.always_open_pdf_externally': true,
      'plugins.plugins_disabled': ["Chrome PDF Viewer"]
    })
    .windowSize({width: 1600, height: 1200});

  const driver = new selenium.Builder()
    .withCapabilities({
      browserName: 'chrome',
      javascriptEnabled: true,
      acceptSslCerts: true,
      path: chromedriver.path
    })
    .setChromeOptions(chromeOptions)
    .build();

  driver.manage().window().maximize();

  driver.getSession()
    .then(session => {
      const cmd = new command.Command("SEND_COMMAND")
        .setParameter("cmd", "Page.setDownloadBehavior")
        .setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
      driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
      return driver.execute(cmd);
    });

  return driver;
};

The key part is:

  driver.getSession()
    .then(session => {
      const cmd = new command.Command("SEND_COMMAND")
        .setParameter("cmd", "Page.setDownloadBehavior")
        .setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
      driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
      return driver.execute(cmd);
    });

Tested with:

  • Chrome 67.0.3396.99
  • Chromedriver 2.36.540469
  • selenium-cucumber-js 1.5.12
  • selenium-webdriver 3.0.0

Solution 8 - Python

Following is the equivalent in Java, using Selenium, ChromeDriver and Chrome v71.x. The key parts are the headless argument and the block that sends Page.setDownloadBehavior to ChromeDriver's send_command endpoint, which is what allows downloads to be saved. Additional jars: com.fasterxml.jackson.core, com.fasterxml.jackson.annotation, com.fasterxml.jackson.databind (plus Apache HttpClient for the HTTP request).

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.HttpClientBuilder;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;

// Inside your setup method:
System.setProperty("webdriver.chrome.driver", "C:\\libraries\\chromedriver.exe");

String downloadFilepath = "C:\\Download";
HashMap<String, Object> chromePreferences = new HashMap<String, Object>();
chromePreferences.put("profile.default_content_settings.popups", 0);
chromePreferences.put("download.prompt_for_download", false);
chromePreferences.put("download.default_directory", downloadFilepath);

ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setBinary("C:\\pathto\\Chrome SxS\\Application\\chrome.exe");
chromeOptions.addArguments("start-maximized");
chromeOptions.addArguments("disable-infobars");

// HEADLESS CHROME (key part)
chromeOptions.addArguments("headless");
chromeOptions.setExperimentalOption("prefs", chromePreferences);

// Key part: start the driver through an explicit service so its URL is known,
// then POST Page.setDownloadBehavior to ChromeDriver's send_command endpoint.
ChromeDriverService driverService = ChromeDriverService.createDefaultService();
ChromeDriver driver = new ChromeDriver(driverService, chromeOptions);

Map<String, Object> commandParams = new HashMap<>();
commandParams.put("cmd", "Page.setDownloadBehavior");
Map<String, String> params = new HashMap<>();
params.put("behavior", "allow");
params.put("downloadPath", downloadFilepath);
commandParams.put("params", params);

ObjectMapper objectMapper = new ObjectMapper();
HttpClient httpClient = HttpClientBuilder.create().build();
String u = driverService.getUrl().toString() + "/session/" + driver.getSessionId() + "/chromium/send_command";
try {
    // serializing and building the entity can also throw IOException subclasses
    String command = objectMapper.writeValueAsString(commandParams);
    HttpPost request = new HttpPost(u);
    request.addHeader("content-type", "application/json");
    request.setEntity(new StringEntity(command));
    httpClient.execute(request);
} catch (IOException e) {
    e.printStackTrace();
}

// Continue using the driver for automation
driver.manage().window().maximize();

Solution 9 - Python

I solved this problem by using the workaround shared by @Shawn Button and passing a full (absolute) path for the 'downloadPath' parameter. Using a relative path did not work and gave me an error.

Versions:
Chrome Version 75.0.3770.100 (Official Build) (32-bit)
ChromeDriver 75.0.3770.90
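Turning a relative directory into an absolute one before handing it to Chrome is straightforward in Python. The sketch below also creates the directory if it does not exist; the helper name is hypothetical, and the returned dict mirrors the params in Shawn Button's answer.

```python
import os


def download_behavior_params(download_dir):
    """Hypothetical helper: Page.setDownloadBehavior params with an absolute path."""
    # Expand "~" and resolve against the current working directory.
    path = os.path.abspath(os.path.expanduser(download_dir))
    os.makedirs(path, exist_ok=True)  # make sure the target folder exists
    return {
        "cmd": "Page.setDownloadBehavior",
        "params": {"behavior": "allow", "downloadPath": path},
    }
```

With the send_command registration from Shawn Button's answer in place, it could be used as driver.execute("send_command", download_behavior_params("downloads")).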

Solution 10 - Python

Using: google-chrome-stable amd64 86.0.4240.111-1, chromedriver 86.0.4240.22, selenium 3.141.0, Python 3.8.3

I tried multiple proposed solutions and nothing really worked for headless Chrome; also, my test website opens a new blank tab and only then downloads the data.

In the end I gave up on headless mode and used pyvirtualdisplay with Xvfb to emulate an X server, something like:

from selenium.webdriver.chrome.options import Options
import selenium.webdriver as webdriver
from pyvirtualdisplay import Display
import tempfile

url = "https://really_badly_programmed_website.org"

tmp_dir = tempfile.mkdtemp(prefix="hamster_")

driver_path="/usr/bin/chromedriver"

chrome_options = Options() 
chrome_options.binary_location = "/usr/bin/google-chrome"

prefs = {'download.default_directory': tmp_dir,}
chrome_options.add_experimental_option("prefs", prefs)

with Display(backend="xvfb",size=(1920,1080),color_depth=24) as disp:

    driver = webdriver.Chrome(options=chrome_options, executable_path=driver_path)
    driver.get(url)

In the end everything worked, and the downloaded file was in the tmp folder.

Solution 11 - Python

I finally got it to work by upgrading to Chromium 90! I previously had versions 72-78, but I saw that it had been fixed recently (https://bugs.chromium.org/p/chromium/issues/detail?id=696481), so I decided to give it a shot.

After upgrading, which took a while (Homebrew on macOS is so slow...), I simply did the following, without setting any options (this is a JavaScript example):

await driver.findElement(By.className('download')).click();

And it worked! The PDF I had been trying to download for a long time showed up in my working folder.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | TheChetan | View Question on Stack Overflow
Solution 1 - Python | Shawn Button | View Answer on Stack Overflow
Solution 2 - Python | Fayçal | View Answer on Stack Overflow
Solution 3 - Python | Some1Else | View Answer on Stack Overflow
Solution 4 - Python | gannagainz | View Answer on Stack Overflow
Solution 5 - Python | Hazem | View Answer on Stack Overflow
Solution 6 - Python | victorvartan | View Answer on Stack Overflow
Solution 7 - Python | Mykhailo | View Answer on Stack Overflow
Solution 8 - Python | Manasi Vora | View Answer on Stack Overflow
Solution 9 - Python | Matheus Araujo | View Answer on Stack Overflow
Solution 10 - Python | Jorge Mendes | View Answer on Stack Overflow
Solution 11 - Python | Jason | View Answer on Stack Overflow