Pyspark: Exception: Java gateway process exited before sending the driver its port number

Java Problem Overview

I'm trying to run pyspark on my MacBook Air. When I try starting it up, I get the error:

Exception: Java gateway process exited before sending the driver its port number

when sc = SparkContext() is being called upon startup. I have tried running the following commands:

./bin/pyspark
./bin/spark-shell
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

to no avail. I have also looked here:

https://stackoverflow.com/questions/31825911/spark-python-java-gateway-process-exited-before-sending-the-driver-its-port

but the question has never been answered. Please help! Thanks.

Java Solutions

Solution 1 - Java

One possible reason is that JAVA_HOME is not set because Java is not installed.

I encountered the same issue. The full output was:

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 51.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
	at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:296)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:406)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/spark/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/opt/spark/python/pyspark/context.py", line 243, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

at sc = pyspark.SparkConf(). I solved it by running

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

which is from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04

Solution 2 - Java

this should help you

One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

A change in python/pyspark/java_gateway.py requires PYSPARK_SUBMIT_ARGS to include pyspark-shell whenever the variable is set by the user.
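As a minimal sketch of the same fix done from Python rather than the shell: the helper name with_pyspark_shell is made up for illustration, and it only appends the pyspark-shell token when the variable is already set and the token is missing.

```python
import os


def with_pyspark_shell(submit_args):
    """Return submit_args with 'pyspark-shell' appended if it is missing."""
    if "pyspark-shell" in submit_args.split():
        return submit_args
    return (submit_args.strip() + " pyspark-shell").strip()


# Only rewrite the variable if the user has set it; an unset variable is left alone.
current = os.environ.get("PYSPARK_SUBMIT_ARGS")
if current is not None:
    os.environ["PYSPARK_SUBMIT_ARGS"] = with_pyspark_shell(current)
```

This has to run before the first SparkContext is created, because the variable is read when the Java gateway is launched.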

Solution 3 - Java

I had this error message when running pyspark on Ubuntu and got rid of it by installing the openjdk-8-jdk package.

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("MyApp").setMaster("local"))
^^^ error

Install Open JDK 8:

apt-get install openjdk-8-jdk-headless -qq

On MacOS

Same on Mac OS, I typed in a terminal:

$ java -version
No Java runtime present, requesting install.

I was prompted to install Java from Oracle's download site, chose the macOS installer, clicked on jdk-13.0.2_osx-x64_bin.dmg, and after that checked that Java was installed:

$ java -version
java version "13.0.2" 2020-01-14

EDIT To install JDK 8 you need to go to https://www.oracle.com/java/technologies/javase-jdk8-downloads.html (login required)

After that I was able to start a Spark context with pyspark.

Checking if it works

In Python:

from pyspark import SparkContext 
sc = SparkContext.getOrCreate() 

# check that it really works by running a job
# example from http://spark.apache.org/docs/latest/rdd-programming-guide.html#parallelized-collections
data = range(10000) 
distData = sc.parallelize(data)
distData.filter(lambda x: not x&1).take(10)
# Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Note that you might need to set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON and they have to be the same Python version as the Python (or IPython) you're using to run pyspark (the driver).
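One way to keep the two variables in sync is to set both from the interpreter that is already running. This is only a sketch: the helper name pin_pyspark_python is invented here, and it assumes the driver and the workers can use the same executable (which holds for local mode).

```python
import os
import sys


def pin_pyspark_python(env=os.environ, interpreter=sys.executable):
    """Point both PySpark interpreter variables at the same Python executable."""
    env["PYSPARK_PYTHON"] = interpreter
    env["PYSPARK_DRIVER_PYTHON"] = interpreter
    return env


# Call this before the first SparkContext is created.
pin_pyspark_python()
```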

Solution 4 - Java

I use Mac OS. I fixed the problem!

Below is how I fixed it.

JDK 8 seems to work fine. (https://github.com/jupyter/jupyter/issues/248)

So I checked my JDKs in /Library/Java/JavaVirtualMachines; I only had jdk-11.jdk in this path.

I installed JDK 8 (following the link) with:

brew tap caskroom/versions
brew cask install java8

After this, I added

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"

to my ~/.bash_profile file. (You should check your jdk1.8 file name.)

It works now! Hope this helps :)

Solution 5 - Java

I will repost how I solved it here just for future references.

How I solved my similar problem

Prerequisite:

anaconda already installed
Spark already installed (https://spark.apache.org/downloads.html)
pyspark already installed (https://anaconda.org/conda-forge/pyspark)

Steps I did (NOTE: set the folder paths according to your system)

1. Set the following environment variables.
2. Set SPARK_HOME to 'C:\spark\spark-3.0.1-bin-hadoop2.7'
3. Set HADOOP_HOME to 'C:\spark\spark-3.0.1-bin-hadoop2.7'
4. Set PYSPARK_DRIVER_PYTHON to 'jupyter'
5. Set PYSPARK_DRIVER_PYTHON_OPTS to 'notebook'
6. Add 'C:\spark\spark-3.0.1-bin-hadoop2.7\bin;' to the PATH system variable.
7. Move the Java installation folder directly under C: (previously Java was installed under Program Files, so I re-installed it directly under C:).
8. My JAVA_HOME then became 'C:\java\jdk1.8.0_271'.

Now it works!

Solution 6 - Java

Had the same issue with my IPython notebook (IPython 3.2.1) on Linux (Ubuntu).

What was missing in my case was setting the master URL in the $PYSPARK_SUBMIT_ARGS environment like this (assuming you use bash):

export PYSPARK_SUBMIT_ARGS="--master spark://<host>:<port>"

e.g.

export PYSPARK_SUBMIT_ARGS="--master spark://192.168.2.40:7077"

You can put this into your .bashrc file. You can find the correct URL in the log of the Spark master (the location of this log is reported when you start the master with sbin/start-master.sh).

Solution 7 - Java

After spending hours and hours trying many different solutions, I can confirm that Java 10 SDK causes this error. On Mac, please navigate to /Library/Java/JavaVirtualMachines then run this command to uninstall Java JDK 10 completely:

sudo rm -rf jdk-10.jdk/

After that, download JDK 8 and the problem should be solved.

Solution 8 - Java

I had the same error with PySpark, and setting JAVA_HOME to Java 11 worked for me (it was originally set to 16). I'm using macOS and PyCharm. You can check which Java installation JAVA_HOME currently points to with echo $JAVA_HOME.

Below is what worked for me. On my Mac I used the following homebrew command, but you can use a different method to install the desired Java version, depending on your OS.

# Install Java 11 (I believe 8 works too)
$ brew install openjdk@11

# Set JAVA_HOME by assigning the path where your Java is
$ export JAVA_HOME=/usr/local/opt/openjdk@11

Note: If you installed using homebrew and need to find the location of the path, you can do $ brew --prefix openjdk@11 and it should return a path like this: /usr/local/opt/openjdk@11

At this point, I could run my PySpark program from the terminal - however, my IDE (PyCharm) still had the same error until I globally changed the JAVA_HOME variable.

To update the variable, first check whether you're using the zsh or bash shell by running echo $SHELL on the command line. For zsh, you'll edit the ~/.zshenv file and for bash you'll edit the ~/.bash_profile.

# open the file
$ vim ~/.zshenv
OR
$ vim ~/.bash_profile

# once inside the file, set the variable with your Java path, then save and close the file
export JAVA_HOME=/usr/local/opt/openjdk@11

# test if it was set successfully
$ echo $JAVA_HOME
/usr/local/opt/openjdk@11

After this step, I could run PySpark through my PyCharm IDE as well.

Solution 9 - Java

Spark is very picky about the Java version you use. It is highly recommended that you use Java 1.8 (the open source AdoptOpenJDK 8 works well too). After installing it, set JAVA_HOME in your shell profile. The /usr/libexec/java_home helper below is macOS-specific; on Linux, point JAVA_HOME at the JDK directory instead:

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

export PATH=$JAVA_HOME/bin:$PATH
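Since so many of the answers here come down to the Java version, it can help to read the major version programmatically. The following is a rough sketch, not part of Spark or PySpark: java_major_version is a made-up helper that parses the text printed by java -version, handling both the old "1.8.0_241" style and the newer "11.0.11" style.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Releases up to Java 8 report themselves as "1.x" (for example "1.8.0_241").
    if first == "1" and second is not None:
        return int(second)
    return int(first)
```

To feed it real output, note that java -version writes to stderr, so something like subprocess.run(["java", "-version"], capture_output=True, text=True).stderr is the string to pass in.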

Solution 10 - Java

I got the same Java gateway process exited......port number exception even though I set PYSPARK_SUBMIT_ARGS properly. I'm running Spark 1.6 and trying to get pyspark to work with IPython4/Jupyter (OS: ubuntu as VM guest).

While I got this exception, I noticed an hs_err_*.log was generated and it started with:

There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (malloc) failed to allocate 715849728 bytes for committing reserved memory.

So I increased the memory allocated to my Ubuntu guest in the VirtualBox settings and restarted it. The Java gateway exception then went away and everything worked fine.

Solution 11 - Java

I got the same Exception: Java gateway process exited before sending the driver its port number in Cloudera VM when trying to start IPython with CSV support with a syntax error:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10.1.4.0

will throw the error, while:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.4.0

will not.

The difference is the last colon in the second (working) example, separating the Scala version number from the package version number.
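A quick sanity check for this kind of typo can be sketched in Python. The helper names below are invented for illustration; the only assumption is that each --packages entry is a Maven coordinate of the form groupId:artifactId:version, with entries separated by commas.

```python
def is_valid_package_coordinate(coordinate):
    """Return True if coordinate has the form groupId:artifactId:version."""
    parts = coordinate.split(":")
    return len(parts) == 3 and all(part.strip() for part in parts)


def malformed_packages(packages):
    """Return the comma-separated --packages entries that are malformed."""
    return [c for c in packages.split(",") if not is_valid_package_coordinate(c)]
```

For the two commands above, the first coordinate is reported as malformed and the second passes.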

Solution 12 - Java

In my case this error appeared for a script that had been running fine before, so I figured it might be due to my Java update. I had been using Java 1.8 but had accidentally updated to Java 1.9. When I switched back to Java 1.8, the error disappeared and everything ran fine. For those who get this error for the same reason but do not know how to switch back to an older Java version on Ubuntu, run

sudo update-alternatives --config java

and select the Java version you want.

Solution 13 - Java

I figured out the problem on a Windows system. The installation directory for Java must not have blanks in its path, as in C:\Program Files. I re-installed Java in C:\Java, set JAVA_HOME to C:\Java, and the problem went away.

Solution 14 - Java

If you are trying to run Spark without the Hadoop binaries, you might encounter the above-mentioned error. One solution is to:

download Hadoop separately.
add Hadoop to your PATH.
add the Hadoop classpath to your Spark install.

The first two steps are trivial; the last step is best done by adding the following to $SPARK_HOME/conf/spark-env.sh on each Spark node (master and workers):

### in conf/spark-env.sh ###

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

for more info also check: https://spark.apache.org/docs/latest/hadoop-provided.html

Solution 15 - Java

I got this error because I was running low on disk space.
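To rule this cause out quickly, the free space on the volume that holds the temporary directory can be checked from Python. This is only a sketch: free_gib is a made-up helper, and it assumes Spark writes its scratch files under the system temp directory, which may not match your configuration.

```python
import shutil
import tempfile


def free_gib(path=None):
    """Return the free space, in GiB, on the volume containing path."""
    usage = shutil.disk_usage(path or tempfile.gettempdir())
    return usage.free / 2**30


print(f"Free space in temp directory: {free_gib():.1f} GiB")
```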

Solution 16 - Java

I had the same issue; installing Java with the lines below solved it:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Solution 17 - Java

I have the same error.

My troubleshooting procedure was:

Check out the Spark source code.
Follow the error message. In my case: pyspark/java_gateway.py, line 93, in launch_gateway.
Check the code logic to find the root cause; then you can resolve it.

In my case the issue was that PySpark had no permission to create a temporary directory, so I just ran my IDE with sudo.

Solution 18 - Java

I have the same error in running pyspark in pycharm. I solved the problem by adding JAVA_HOME in pycharm's environment variables.

Solution 19 - Java

I had the same exception and tried everything, setting and resetting all environment variables. In the end the issue came down to a space in the appName property of the Spark session. Immediately after removing the space from the string given to appName, as in SparkSession.builder.appName("StreamingDemo").getOrCreate(), the error was resolved. I was using pyspark 2.7 with Eclipse in a Windows 10 environment.

Solution 20 - Java

For Linux (Ubuntu 18.04) with a JAVA_HOME issue, the key is to point it to the master folder:

Set Java 8 as the default with: sudo update-alternatives --config java. If Java 8 is not installed, install it with: sudo apt install openjdk-8-jdk.
Set the JAVA_HOME environment variable to the master Java 8 folder. The location is given by the first command above, removing jre/bin/java. Namely: export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/". If done on the command line, this will be relevant only for the current session (ref: export command on Linux). To verify: echo $JAVA_HOME.
To have this set permanently, add the export line above to a file that runs before you start your IDE/Jupyter/Python interpreter, for example .bashrc. This file is loaded when bash is started interactively (ref: .bashrc).
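The "remove jre/bin/java" step can be sketched as a small Python helper. The function name java_home_from_binary is made up for illustration, and it assumes a POSIX-style path that ends in bin/java, optionally preceded by a jre directory.

```python
from pathlib import PurePosixPath


def java_home_from_binary(java_binary):
    """Strip the trailing bin/java (and an optional jre/) from a java path."""
    path = PurePosixPath(java_binary)
    if path.name != "java" or path.parent.name != "bin":
        raise ValueError(f"not a .../bin/java path: {java_binary}")
    home = path.parent.parent
    # Java 8 packages often place the launcher under <home>/jre/bin/java.
    if home.name == "jre":
        home = home.parent
    return str(home)
```

The input path can be obtained with os.path.realpath(shutil.which("java")), which follows the update-alternatives symlinks to the real binary.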

Solution 21 - Java

The error occurred because Java was not installed on the machine. Spark is developed in Scala, which runs on the Java virtual machine.

Try installing Java and then executing the pyspark statements. It should work.
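Whether a Java executable can be found at all is easy to test before starting Spark. The sketch below uses a made-up helper, find_java, and assumes the lookup order used by Spark's launch scripts: JAVA_HOME/bin/java if JAVA_HOME is set, otherwise java on the PATH.

```python
import os
import shutil


def find_java(env=None):
    """Return the path of the java executable that would be used, or None."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        # With JAVA_HOME set, only JAVA_HOME/bin is consulted (no PATH fallback).
        return shutil.which("java", path=os.path.join(java_home, "bin"))
    return shutil.which("java", path=env.get("PATH"))


if find_java() is None:
    print("No usable java executable found; install a JDK or fix JAVA_HOME.")
```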

Solution 22 - Java

This usually happens if you do not have Java installed on your machine.

Go to the command prompt and check your Java version by typing: java -version

You should get output something like this:

java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)

If not, go to Oracle and download the JDK. Check this video on how to download Java and add it to the build path.

https://www.youtube.com/watch?v=f7rT0h1Q5Wo

Solution 23 - Java

After spending a good amount of time on this issue, I was able to solve it. I use macOS Catalina and work in PyCharm in an Anaconda environment.

Spark currently supports only Java 8. If you install Java through the command line, it will by default install the latest Java 10+, which causes all sorts of trouble. To solve this, follow the steps below:

1. Make sure you have Homebrew, else install Homebrew
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

2. Install X-code
xcode-select --install

3. Install Java8 through the official website (not through terminal)
https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

4. Install Apache-Spark
 brew install apache-spark 

5. Install Pyspark and Findspark (if you have anaconda)
conda install -c conda-forge findspark 
conda install -c conda-forge/label/gcc7 findspark
conda install -c conda-forge pyspark

Voilà! This should let you run PySpark without any issues.

Solution 24 - Java

Step:1

Check the Java version from the terminal.

java -version

If you see bash: java: command not found, it means you don't have Java installed on your system.

Step:2

Install Java using the following command,

sudo apt-get install default-jdk

Step:3

Now check the Java version again; you'll see the version that has been installed.

java -version

result:

openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)

Step:4

Now run the pyspark code; the error should no longer appear.

Solution 25 - Java

There are many valuable hints here; however, none solved my problem completely, so I will show the procedure that worked for me in an Anaconda Jupyter Notebook on Windows:

Download and install java and pyspark in directories without blank spaces.
[maybe unnecessary] In the anaconda prompt, type where conda and where python and add the paths of the .exe files' directories to your Path variable using the Windows environmental variables tool. Add also the variables JAVA_HOME and SPARK_HOME there with their corresponding paths.
Even doing so, I had to set these variables manually from within the Notebook along with PYSPARK_SUBMIT_ARGS (use your own paths for SPARK_HOME and JAVA_HOME):

import os
os.environ["SPARK_HOME"] = r"C:\Spark\spark-3.2.0-bin-hadoop3.2"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[3] pyspark-shell"
os.environ["JAVA_HOME"] = r"C:\Java\jre1.8.0_311"

Install findspark from the notebook with !pip install findspark.
Run import findspark and findspark.init()
Run from pyspark.sql import SparkSession and spark = SparkSession.builder.getOrCreate()

Some useful links:

https://towardsdatascience.com/installing-apache-pyspark-on-windows-10-f5f0c506bea1

https://sparkbyexamples.com/pyspark/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-port-number/

https://www.datacamp.com/community/tutorials/installing-anaconda-windows

Solution 26 - Java

I ran into this problem, and it was actually not due to the JAVA_HOME setting. I assume you are using Windows with Anaconda as your Python tooling. Please check whether you can use the command prompt. I could not run Spark because cmd was crashing. After fixing this, Spark worked well on my PC.

Solution 27 - Java

Worked hours on this. My problem was with Java 10 installation. I uninstalled it and installed Java 8, and now Pyspark works.

Solution 28 - Java

For me, the answer was to add two 'Content Roots' in 'File' -> 'Project Structure' -> 'Modules' (in IntelliJ):

YourPath\spark-2.2.1-bin-hadoop2.7\python
YourPath\spark-2.2.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip

Solution 29 - Java

This is an old thread but I'm adding my solution for those who use mac.

The issue was with the JAVA_HOME. You have to include this in your .bash_profile.

Check your java -version. If you downloaded the latest Java but it doesn't show up as the latest version, then you know that the path is wrong. Normally, the default path is export JAVA_HOME=/usr/bin/java.

So try changing the path to: /Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java

Alternatively, you could download the latest JDK from https://www.oracle.com/technetwork/java/javase/downloads/index.html, and this will automatically replace /usr/bin/java with the latest version. You can confirm this by running java -version again.

Then that should work.

Solution 30 - Java

Make sure that both your Java directory (as found in your path) AND your Python interpreter reside in directories with no spaces in them. These were the cause of my problem.
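A small Python check can flag such paths before Spark is started. This is a sketch under assumptions: paths_with_spaces is a made-up helper, and the set of paths inspected (JAVA_HOME, SPARK_HOME, and the running interpreter) is simply the ones mentioned in the answers here.

```python
import os
import sys


def paths_with_spaces(paths):
    """Return the names (sorted) whose path contains a space character."""
    return sorted(name for name, path in paths.items() if path and " " in path)


candidates = {
    "JAVA_HOME": os.environ.get("JAVA_HOME", ""),
    "SPARK_HOME": os.environ.get("SPARK_HOME", ""),
    "python": sys.executable,
}
for name in paths_with_spaces(candidates):
    print(f"{name} contains a space: {candidates[name]}")
```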

Solution 31 - Java

In my case it was because I wrote SPARK_DRIVER_MEMORY=10 instead of SPARK_DRIVER_MEMORY=10g in spark-env.sh
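This kind of typo can be caught with a simple pattern check. The sketch below is illustrative only: has_jvm_memory_unit is a made-up helper, and the accepted pattern (digits followed by a single k, m, g, or t) is an assumption based on the heap-size suffixes the JVM understands, presumably the reason a bare 10 is not read as 10 gigabytes.

```python
import re

# Digits followed by exactly one unit letter, e.g. "10g" or "512M".
_MEMORY_PATTERN = re.compile(r"^\d+[kmgt]$", re.IGNORECASE)


def has_jvm_memory_unit(value):
    """Return True if value looks like a memory size with a unit suffix."""
    return bool(_MEMORY_PATTERN.match(value.strip()))
```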

Solution 32 - Java

There are many possible reasons for this error. In my case, the pyspark version was incompatible with the Spark version: pyspark was 2.4.0, but Spark was 2.2.0. This caused Python to fail every time it started the Spark process, so Spark could not report its port to Python, and the error was "Pyspark: Exception: Java gateway process exited before sending the driver its port number".

I suggest you dive into the source code to find out the real reason when this error happens.
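A version mismatch like this can be spotted by comparing the two version strings. The sketch below uses made-up helper names and assumes, as a simplification, that the pyspark package and the Spark installation should agree on the major and minor version.

```python
def major_minor(version):
    """Return the (major, minor) pair of a dotted version string."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def versions_match(pyspark_version, spark_version):
    """Return True if both versions share the same major.minor release."""
    return major_minor(pyspark_version) == major_minor(spark_version)
```

The installed package version is available as pyspark.__version__, and spark-submit --version prints the version of the Spark installation.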

Solution 33 - Java

I got this error fixed by using the code below. I had set up SPARK_HOME, though. You may follow the simple steps from the eproblems website.

import os
spark_home = os.environ.get('SPARK_HOME', None)

Solution 34 - Java

I had the same issue when trying to run a pyspark job triggered from Airflow with a remote spark.driver.host. The cause of the issue in my case was:

> Exception: Java gateway process exited before sending the driver its port number

...

> Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.

Fixed by adding exports:

export HADOOP_CONF_DIR=/etc/hadoop/conf

And the same environment variable added in the pyspark script:

import os
os.environ["HADOOP_CONF_DIR"] = '/etc/hadoop/conf'

Solution 35 - Java

I was getting this error when using 32-bit JDK 1.8; switching to 64-bit worked for me.

The reason was that 32-bit Java cannot allocate more than about 3G of heap memory, while the Spark driver was configured to request 16G:

builder = SparkSession.builder \
        .appName("Spark NLP") \
        .master("local[*]") \
        .config("spark.driver.memory", "16G") \
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .config("spark.kryoserializer.buffer.max", "1000M") \
        .config("spark.driver.maxResultSize", "0")

I tested lowering this to 2G and it worked with 32-bit Java as well.

Solution 36 - Java

I had the same issue once when I brought up Spark in a Docker container. It turned out I had set the wrong permissions on the /tmp folder. If Spark has no write permission on /tmp, it will cause this issue too.
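Write access to the temp directory can be verified from Python by actually creating a file there. This is a minimal sketch: temp_dir_is_writable is a made-up helper, and it assumes the directory of interest is the system temp directory (usually /tmp on Linux) unless another path is passed in.

```python
import tempfile


def temp_dir_is_writable(path=None):
    """Return True if a file can actually be created in the given directory."""
    directory = path or tempfile.gettempdir()
    try:
        # The file is removed automatically when the with-block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


print("temp directory writable:", temp_dir_is_writable())
```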

Solution 37 - Java

You can simply run the following code in the terminal. Then, I hope this will resolve your error.

sudo apt-get install default-jdk

Solution 38 - Java

The error usually occurs when your system doesn't have java installed.

Check if you have Java installed: open up the terminal and run java --version

It's always advisable to use brew install for installing packages. Run brew install openjdk@11 to install Java.

Now that you have Java installed, set the path globally depending on the shell you use: Z shell or bash.

cmd + shift + H: go to the home folder
cmd + shift + [.]: show the hidden files (.zshenv or .bash_profile), then add export JAVA_HOME=/usr/local/opt/openjdk@11 to whichever file your shell uses and save it

Solution 39 - Java

If you are using Jupyter notebook on a Windows machine,

just use the following code:

spark = SparkSession.builder.appName('myapp').getOrCreate

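Don't use it like this:

spark = SparkSession.builder.appName('myapp').getOrCreate()

Be aware that without the parentheses getOrCreate is never called, so spark refers to the method itself rather than to a SparkSession, and no Java gateway is launched.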

Content Type	Original Author	Original Content on Stackoverflow
Question	mt88	View Question on Stackoverflow
Solution 1 - Java	Old Panda	View Answer on Stackoverflow
Solution 2 - Java	Anup Ash	View Answer on Stackoverflow
Solution 3 - Java	user2314737	View Answer on Stackoverflow
Solution 4 - Java	shihs	View Answer on Stackoverflow
Solution 5 - Java	kitokid	View Answer on Stackoverflow
Solution 6 - Java	quax	View Answer on Stackoverflow
Solution 7 - Java	Kiem Nguyen	View Answer on Stackoverflow
Solution 8 - Java	answerzilla	View Answer on Stackoverflow
Solution 9 - Java	Marcelo Tournier	View Answer on Stackoverflow
Solution 10 - Java	Ida	View Answer on Stackoverflow
Solution 11 - Java	Pim Schaaf	View Answer on Stackoverflow
Solution 12 - Java	Pankaj Kumar	View Answer on Stackoverflow
Solution 13 - Java	Sujan Mukherjee	View Answer on Stackoverflow
Solution 14 - Java	natbusa	View Answer on Stackoverflow
Solution 15 - Java	Josh Terrell	View Answer on Stackoverflow
Solution 16 - Java	Soumyaansh	View Answer on Stackoverflow
Solution 17 - Java	J.Bo	View Answer on Stackoverflow
Solution 18 - Java	Joon	View Answer on Stackoverflow
Solution 19 - Java	A known	View Answer on Stackoverflow
Solution 20 - Java	Ran Feldesh	View Answer on Stackoverflow
Solution 21 - Java	Tarun Reddy	View Answer on Stackoverflow
Solution 22 - Java	Arjjun	View Answer on Stackoverflow
Solution 23 - Java	Sahana M	View Answer on Stackoverflow
Solution 24 - Java	Shritam Kumar Mund	View Answer on Stackoverflow
Solution 25 - Java	CAV	View Answer on Stackoverflow
Solution 26 - Java	Ray	View Answer on Stackoverflow
Solution 27 - Java	aghd	View Answer on Stackoverflow
Solution 28 - Java	Yuuura87	View Answer on Stackoverflow
Solution 29 - Java	noiivice	View Answer on Stackoverflow
Solution 30 - Java	Steven	View Answer on Stackoverflow
Solution 31 - Java	hayj	View Answer on Stackoverflow
Solution 32 - Java	ZhangXu	View Answer on Stackoverflow
Solution 33 - Java	abhishek kumar	View Answer on Stackoverflow
Solution 34 - Java	Artyom Rebrov	View Answer on Stackoverflow
Solution 35 - Java	muzamil	View Answer on Stackoverflow
Solution 36 - Java	kennyut	View Answer on Stackoverflow
Solution 37 - Java	Nisan Chhetri	View Answer on Stackoverflow
Solution 38 - Java	archit jain	View Answer on Stackoverflow
Solution 39 - Java	Shashi Kumar Singh	View Answer on Stackoverflow