How to use Column.isin with a list?

Tags: Scala, Apache Spark, Apache Spark SQL

Scala Problem Overview


val items = List("a", "b", "c")

sqlContext.sql("select c1 from table")
          .filter($"c1".isin(items))
          .collect
          .foreach(println)

The code above throws the following exception.

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(a, b, c) 
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49)
at org.apache.spark.sql.functions$.lit(functions.scala:89)
at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:642)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.Column.isin(Column.scala:642)

Below is my attempt to fix it. It compiles and runs but doesn't return any match. Not sure why.

val items = List("a", "b", "c").mkString("\"","\",\"","\"")

sqlContext.sql("select c1 from table")
          .filter($"c1".isin(items))
          .collect
          .foreach(println)

Scala Solutions


Solution 1 - Scala

According to the documentation, isin takes varargs, not a list. The parameter name list is actually confusing here. You can expand your List into varargs like this:

val items = List("a", "b", "c")

sqlContext.sql("select c1 from table")
          .filter($"c1".isin(items:_*))
          .collect
          .foreach(println)

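Your variant with mkString compiles because a single String is also a valid vararg (with the number of arguments equal to 1), but it is probably not what you want to achieve: the column is compared against the one string "a","b","c" instead of three separate values, so nothing matches.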
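The difference between passing the List itself, the joined String, and the expanded varargs can be sketched in plain Scala without Spark. The countArgs method below is a hypothetical stand-in with the same varargs shape as isin (it is not part of Spark); it only reports how many arguments it received.

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs method such as isin(list: Any*).
  // It is NOT Spark's implementation; it just counts its arguments.
  def countArgs(values: Any*): Int = values.length

  val items: List[String] = List("a", "b", "c")

  // The same string the question builds with mkString: "a","b","c"
  val joined: String = items.mkString("\"", "\",\"", "\"")

  def main(args: Array[String]): Unit = {
    println(countArgs(items))      // the whole List is one argument
    println(countArgs(joined))     // one String is also one argument
    println(countArgs(items: _*))  // each element becomes its own argument
    println(joined)
  }
}
```

Only the third call hands the method three separate values, which is what the :_* ascription in the answer does for isin.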

Solution 2 - Scala

It worked like this with the Java API (Java 8):

.isin(sampleListName.stream().toArray(String[]::new))

Here, sampleListName is a List<String>.

Solution 3 - Scala

Since 2.4.0, Spark has a method called isInCollection, which takes a collection instead of varargs and is just what you are looking for, instead of isin.

(shouldn't they unify the methods?)
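A minimal sketch of isInCollection, assuming Spark 2.4.0 or later is on the classpath and a local SparkSession is acceptable. The object name, the matching helper, and the sample rows are made up for illustration and stand in for the table from the question.

```scala
import org.apache.spark.sql.SparkSession

object IsInCollectionSketch {
  // Hypothetical helper: returns the c1 values that are contained in items.
  def matching(spark: SparkSession, items: List[String]): Seq[String] = {
    import spark.implicits._
    // Illustrative data standing in for "select c1 from table".
    val df = Seq("a", "b", "x").toDF("c1")
    df.filter($"c1".isInCollection(items)) // the List is passed as-is, no :_* needed
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isInCollection-sketch")
      .getOrCreate()
    try matching(spark, List("a", "b", "c")).foreach(println)
    finally spark.stop()
  }
}
```

With the sample rows above, the filter keeps a and b and drops x.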

Solution 4 - Scala

As Tomalak has mentioned:

isin(java.lang.Object... list)
A boolean expression that is evaluated to true if the value 
of this expression is contained by the evaluated values of the arguments.

Therefore, you could fix this by making the following change:

val items = List("a", "b", "c").map(c => s""""$c"""")
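To see what this map produces, here is a small plain-Scala check (the object name QuotingCheck is made up for illustration). The quote characters become part of each string, so the values only match a column whose contents include the quotes, and the resulting List would still need to be expanded with :_* when passed to isin.

```scala
object QuotingCheck {
  val items: List[String] = List("a", "b", "c")

  // Same expression as in the answer: wraps each value in double quotes.
  val quoted: List[String] = items.map(c => s""""$c"""")

  def main(args: Array[String]): Unit = {
    quoted.foreach(println)        // each line is the value surrounded by quotes
    println(quoted.contains("a"))  // the unquoted value is no longer in the list
  }
}
```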

Solution 5 - Scala

Even easier:

sqlContext.sql("select c1 from table")
          .filter($"c1".isin("a", "b", "c"))
          .collect
          .foreach(println)

This works unless you have a lot of list values, which usually isn't the case.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Nabegh | View Question on Stack Overflow
Solution 1 - Scala | TheMP | View Answer on Stack Overflow
Solution 2 - Scala | Anandkumar | View Answer on Stack Overflow
Solution 3 - Scala | Lucas Lima | View Answer on Stack Overflow
Solution 4 - Scala | Francis Toth | View Answer on Stack Overflow
Solution 5 - Scala | pedromorfeu | View Answer on Stack Overflow