Spark - Error "A master URL must be set in your configuration" when submitting an app

Scala, Apache Spark

Scala Problem Overview


I have a Spark app which runs with no problems in local mode, but has some problems when submitted to the Spark cluster.

The error messages are as follows:

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, cluster-node-02): java.lang.ExceptionInInitializerError
	at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
	at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
	at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
	at GroupEvolutionES$.<init>(GroupEvolutionES.scala:37)
	at GroupEvolutionES$.<clinit>(GroupEvolutionES.scala)
	... 14 more

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, cluster-node-02): java.lang.NoClassDefFoundError: Could not initialize class GroupEvolutionES$
	at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
	at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
	at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

In the above log, GroupEvolutionES is the main class. The error message says "A master URL must be set in your configuration", but I have provided the "--master" parameter to spark-submit.

Does anyone know how to fix this problem?

Spark version: 1.6.1

Scala Solutions


Solution 1 - Scala

The TLDR:

.config("spark.master", "local")

a list of the options for spark.master in spark 2.2.1

I ended up on this page after trying to run a simple Spark SQL Java program in local mode. To do this, I found that I could set spark.master using:

SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .config("spark.master", "local")
    .getOrCreate();

An update to my answer:

To be clear, this is not what you should do in a production environment. In a production environment, spark.master should be specified in one of a couple of other places: either in $SPARK_HOME/conf/spark-defaults.conf (this is where Cloudera Manager will put it), or on the command line when you submit the app (e.g. spark-submit --master yarn).
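
For example, a minimal sketch of those two production-style options (the class name and jar name are placeholders, not from the original answer):

# $SPARK_HOME/conf/spark-defaults.conf
spark.master    yarn

# or on the command line at submit time
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar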

If you specify spark.master to be 'local' in this way, Spark will try to run in a single JVM, as indicated by the comments below. If you then try to specify --deploy-mode cluster, you will get the error 'Cluster deploy mode is not compatible with master "local"'. This is because setting spark.master=local means that you are NOT running in cluster mode.

Instead, for a production app, within your main function (or in functions called by your main function), you should simply use:

SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .getOrCreate();

This will use the configurations specified on the command line/in config files.

Also, to be clear on this too: --master and "spark.master" are the exact same parameter, just specified in different ways. Setting spark.master in code, like in my answer above, will override attempts to set --master, and will override values in spark-defaults.conf, so don't do it in production. It's great for tests, though.
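
If you want to verify which setting actually won, a small Scala sketch (not from the original answer) is to read the effective master back from the running context:

import org.apache.spark.sql.SparkSession

object MasterCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("master-check").getOrCreate()
    // Prints e.g. "local" if the master was hard-coded in the application,
    // or "yarn" if it came from spark-submit / spark-defaults.conf.
    println(spark.sparkContext.master)
    spark.stop()
  }
}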

Also, see this answer, which links to a list of the options for spark.master and what each one actually does.

a list of the options for spark.master in spark 2.2.1

Solution 2 - Scala

Worked for me after replacing

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");

with

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");

Found this solution in some other thread on Stack Overflow.

Solution 3 - Scala

Where is the sparkContext object defined? Is it inside the main function?

I too faced the same problem. The mistake I made was initializing the sparkContext outside the main function, directly inside the class.

When I initialized it inside the main function, it worked fine.
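
A minimal sketch of the working placement in Scala (the object name and the small job are invented for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object MyJob {
  // Do NOT create the SparkContext here at object level; that code becomes part
  // of the object's initializer and may run on executors, where no master is set.

  def main(args: Array[String]): Unit = {
    // Created once, on the driver, after spark-submit has injected spark.master.
    val conf = new SparkConf().setAppName("MyJob")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}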

Solution 4 - Scala

The default value of "spark.master" is spark://HOST:PORT. The following code tries to get a session from a standalone cluster running at HOST:PORT, and expects the HOST:PORT value to be set in the Spark config file:

SparkSession spark = SparkSession
    .builder()
    .appName("SomeAppName")
    .getOrCreate();

"org.apache.spark.SparkException: A master URL must be set in your configuration" states that HOST:PORT is not set in the spark configuration file.

If you do not want to bother about the value of "HOST:PORT", set spark.master to local:

SparkSession spark = SparkSession
    .builder()
    .appName("SomeAppName")
    .config("spark.master", "local")
    .getOrCreate();

Here is the link for the list of formats in which the master URL can be passed to spark.master.

Reference: Spark Tutorial - Setup Spark Ecosystem
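
For quick reference, the common forms that spark.master accepts look roughly like this (host names and port numbers are placeholders):

local              // run everything in one thread in a single JVM
local[4]           // run locally with 4 worker threads
local[*]           // run locally with as many threads as logical cores
spark://host:7077  // connect to a standalone cluster master
yarn               // connect to a YARN cluster (client or cluster deploy mode)
mesos://host:5050  // connect to a Mesos cluster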

Solution 5 - Scala

Just add .setMaster("local") to your code, as shown below:

val conf = new SparkConf().setAppName("Second").setMaster("local") 

It worked for me! Happy coding!

Solution 6 - Scala

If you are running a standalone application, then you have to use SparkContext instead of SparkSession:

val conf = new SparkConf().setAppName("Samples").setMaster("local")
val sc = new SparkContext(conf)
val textData = sc.textFile("sample.txt").cache()

Solution 7 - Scala

Replacing:

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");

with:

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");

did the magic.

Solution 8 - Scala

How does the Spark context in your application pick the value for the Spark master?

  • You either provide it explicitly within the SparkConf while creating the SparkContext.
  • Or it picks it up from System.getProperties (where SparkSubmit earlier put it after reading your --master argument).

Now, SparkSubmit runs on the driver -- which in your case is the machine from where you're executing the spark-submit script. And this is probably working as expected for you too.

However, from the information you've posted, it looks like you are creating a Spark context in code that is sent to the executors, and given that there is no spark.master system property available there, it fails. (And you shouldn't really be doing so, if this is the case.)
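
To make the failure mode concrete, here is a hedged sketch of the kind of layout that produces exactly the stack trace above (the names are invented; the real GroupEvolutionES code was not posted):

import org.apache.spark.{SparkConf, SparkContext}

object GroupJob {
  // Because this val lives at object level, it is part of the object's static
  // initializer. The driver initializes it fine (spark-submit injected
  // spark.master), but when the closure below runs on an executor and touches
  // GroupJob, the executor re-runs this initializer, finds no spark.master,
  // and throws ExceptionInInitializerError / NoClassDefFoundError.
  val sc = new SparkContext(new SparkConf().setAppName("GroupJob"))

  def keep(x: Int): Boolean = x > 10   // helper referenced from a task closure

  def main(args: Array[String]): Unit = {
    val n = sc.parallelize(1 to 100).filter(keep).count()   // ships GroupJob to executors
    println(n)
  }
}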

Can you please post the GroupEvolutionES code (specifically where you're creating the SparkContext(s))?

Solution 9 - Scala

I had the same problem. Here is my code before modification:

package com.asagaama

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

/**
  * Created by asagaama on 16/02/2017.
  */
object Word {

  def countWords(sc: SparkContext) = {
    // Load our input data
    val input = sc.textFile("/Users/Documents/spark/testscase/test/test.txt")
    // Split it up into words
    val words = input.flatMap(line => line.split(" "))
    // Transform into pairs and count
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile("/Users/Documents/spark/testscase/test/result.txt")
  }

  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    countWords(sc)
  }

}

And after replacing:

val conf = new SparkConf().setAppName("wordCount")

With :

val conf = new SparkConf().setAppName("wordCount").setMaster("local[*]")

It worked fine!

Solution 10 - Scala

import org.apache.spark.{SparkConf, SparkContext}

val appName: String = "test"
val conf = new SparkConf().setAppName(appName).setMaster("local[*]").set("spark.executor.memory", "1g")
val sc = SparkContext.getOrCreate(conf)
sc.setLogLevel("WARN")

Solution 11 - Scala

Try this:

Make a trait:

import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {
  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .getOrCreate()
  }
}

Then extend it:

object Preprocess extends SparkSessionWrapper {
  // the lazy `spark` session is available here
}

Solution 12 - Scala

I used this SparkContext constructor instead, and errors were gone:

val sc = new SparkContext("local[*]", "MyApp")

Solution 13 - Scala

We were missing the master setting. Once we added .master("local[*]"), the problem was resolved.

Problem:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

Solution:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .master("local[*]")
      .getOrCreate()

Solution 14 - Scala

I tried this option while learning Spark processing, setting up the Spark context on a local machine. Requisites: 1) keep the Spark session running locally; 2) add the Spark Maven dependency; 3) keep the input file in the root\input folder; 4) the output will be placed in the \output folder. The job gets the maximum share value per year. Download any CSV from Yahoo Finance (https://in.finance.yahoo.com/quote/CAPPL.BO/history/). The Maven dependency and Scala code are below:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.3</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

import org.apache.spark.{SparkConf, SparkContext}

object MaxEquityPriceForYear {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("ShareMaxPrice").setMaster("local[2]").set("spark.executor.memory", "1g")
    val sc = new SparkContext(sparkConf)
    val input = "./input/CAPPL.BO.csv"
    val output = "./output"
    sc.textFile(input)
      .filter(!_.startsWith("Date"))                              // skip the CSV header row
      .map(_.split(","))
      .map(rec => (rec(0).split("-")(0).toInt, rec(1).toFloat))   // (year, price)
      .reduceByKey((a, b) => Math.max(a, b))
      .saveAsTextFile(output)
  }
}

Solution 15 - Scala

If you are using the following code:

 val sc = new SparkContext(master, "WordCount", System.getenv("SPARK_HOME"))

then replace it with the following lines:

  val jobName = "WordCount";
  val conf = new SparkConf().setAppName(jobName);
  val sc = new SparkContext(conf)

 

In Spark 2.0 you can use the following code:

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .master("local[*]") // need to add
  .getOrCreate()

You need to add .master("local[*]") if you are running locally. Here * means use all available cores; instead of * you can specify a specific number of threads, e.g. 1, 2 or 8.

You need to set the master URL if running on a cluster.

Solution 16 - Scala

If you don't provide the Spark configuration to JavaSparkContext, then you get this error. That is: JavaSparkContext sc = new JavaSparkContext();

Solution: provide JavaSparkContext sc = new JavaSparkContext(conf);

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Shuai Zhang | View Question on Stackoverflow
Solution 1 - Scala | Jack Davidson | View Answer on Stackoverflow
Solution 2 - Scala | Sachin | View Answer on Stackoverflow
Solution 3 - Scala | Dazzler | View Answer on Stackoverflow
Solution 4 - Scala | Mallikarjun M | View Answer on Stackoverflow
Solution 5 - Scala | kumar sanu | View Answer on Stackoverflow
Solution 6 - Scala | Sasikumar Murugesan | View Answer on Stackoverflow
Solution 7 - Scala | Nazima | View Answer on Stackoverflow
Solution 8 - Scala | Sachin Tyagi | View Answer on Stackoverflow
Solution 9 - Scala | user2989087 | View Answer on Stackoverflow
Solution 10 - Scala | rio | View Answer on Stackoverflow
Solution 11 - Scala | gyuseong | View Answer on Stackoverflow
Solution 12 - Scala | remondo | View Answer on Stackoverflow
Solution 13 - Scala | KARTHIKEYAN.A | View Answer on Stackoverflow
Solution 14 - Scala | Vik_Technologist | View Answer on Stackoverflow
Solution 15 - Scala | vaquar khan | View Answer on Stackoverflow
Solution 16 - Scala | Rimi Gandhi | View Answer on Stackoverflow