Gang Of Coders
All Apache Spark-Sql Solutions on Gang of Coders
Total of 112 Apache Spark-Sql Solutions
Unpacking a list to select multiple columns from a spark data frame
Apache Spark
Apache Spark-Sql
Spark Dataframe
Convert a spark DataFrame to pandas DF
Pandas
Apache Spark
Apache Spark-Sql
Difference between DataFrame, Dataset, and RDD in Spark
Dataframe
Apache Spark
Apache Spark-Sql
Rdd
Apache Spark-Dataset
'PipelinedRDD' object has no attribute 'toDF' in PySpark
Python
Apache Spark
Pyspark
Apache Spark-Sql
Rdd
Automatically and Elegantly flatten DataFrame in Spark SQL
Scala
Apache Spark
Apache Spark-Sql
Spark load data and add filename as dataframe column
Apache Spark
Pyspark
Apache Spark-Sql
Find maximum row per group in Spark DataFrame
Apache Spark
Pyspark
Apache Spark-Sql
Spark sql how to explode without losing null values
Java
Apache Spark
Null
Apache Spark-Sql
Convert date from String to Date format in Dataframes
Apache Spark
Apache Spark-Sql
How do I detect if a Spark DataFrame has a column
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Spark unionAll multiple dataframes
Scala
Apache Spark
Apache Spark-Sql
get datatype of column using pyspark
Apache Spark
Pyspark
Apache Spark-Sql
DataFrame partitionBy to a single Parquet file (per partition)
Apache Spark
Apache Spark-Sql
PySpark: multiple conditions in when clause
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
Including null values in an Apache Spark Join
Sql
Scala
Apache Spark
Join
Apache Spark-Sql
Spark dataframe: collect() vs select()
Dataframe
Apache Spark
Apache Spark-Sql
How to convert Row of a Scala DataFrame into case class most efficiently?
Scala
Apache Spark
Apache Spark-Sql
DataFrame equality in Apache Spark
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Rdd
How do I check for equality using Spark Dataframe without SQL Query?
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Derive multiple columns from a single column in a Spark DataFrame
Scala
Apache Spark
Dataframe
Apache Spark-Sql
User Defined-Functions
dataframe: how to groupBy/count then filter on count in Scala
Scala
Apache Spark
Apache Spark-Sql
What is the difference between cube, rollup and groupBy operators?
Sql
Apache Spark
Apache Spark-Sql
Cube
Rollup
Spark specify multiple column conditions for dataframe join
Apache Spark
Apache Spark-Sql
Rdd
Filtering DataFrame using the length of a column
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
Spark SQL Row_number() PartitionBy Sort Desc
Python
Apache Spark
Pyspark
Apache Spark-Sql
Window Functions
Reading csv files with quoted fields containing embedded commas
Csv
Apache Spark
Pyspark
Apache Spark-Sql
Apache Spark-2.0
PySpark create new column with mapping from a dict
Python
Apache Spark
Dictionary
Pyspark
Apache Spark-Sql
How to export data from Spark SQL to CSV
Hadoop
Apache Spark
Export to-Csv
Hiveql
Apache Spark-Sql
Spark Window Functions - rangeBetween dates
Sql
Apache Spark
Pyspark
Apache Spark-Sql
Window Functions
Fetching distinct values on a column using Spark DataFrame
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Spark Dataframe
What is the meaning of partitionColumn, lowerBound, upperBound, numPartitions parameters?
Apache Spark
Jdbc
Apache Spark-Sql
How to change dataframe column names in pyspark?
Python
Apache Spark
Pyspark
Apache Spark-Sql
How to add a constant column in a Spark DataFrame?
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
How to select the first row of each group?
Sql
Scala
Apache Spark
Dataframe
Apache Spark-Sql
How can I change column types in Spark SQL's DataFrame?
Scala
Apache Spark
Apache Spark-Sql
How do I add a new column to a Spark DataFrame (using PySpark)?
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
How to sort by column in descending order in Spark SQL?
Scala
Apache Spark
Apache Spark-Sql
Concatenate columns in Apache Spark DataFrame
Sql
Apache Spark
Dataframe
Apache Spark-Sql
Spark - load CSV file as DataFrame?
Scala
Apache Spark
Hadoop
Apache Spark-Sql
Hdfs
How to convert rdd object to dataframe in spark
Scala
Apache Spark
Apache Spark-Sql
Rdd
Filter Pyspark dataframe column with None value
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
Show distinct column values in pyspark dataframe
Python
Apache Spark
Pyspark
Apache Spark-Sql
How to define partitioning of DataFrame?
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Partitioning
How to check if spark dataframe is empty?
Apache Spark
Pyspark
Apache Spark-Sql
How to change a dataframe column from String type to Double type in PySpark?
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
How to delete columns in pyspark dataframe
Apache Spark
Apache Spark-Sql
Pyspark
Load CSV file with Spark
Python
Csv
Apache Spark
Pyspark
Apache Spark-Sql
Spark Dataframe distinguish columns with duplicated name
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
Spark DataFrame groupBy and sort in the descending order (pyspark)
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
Best way to get the max value in a Spark dataframe column
Python
Apache Spark
Pyspark
Apache Spark-Sql
Convert pyspark string to date format
Python
Apache Spark
Pyspark
Apache Spark-Sql
How to create an empty DataFrame with a specified schema?
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Extract column values of Dataframe as List in Apache Spark
Scala
Apache Spark
Apache Spark-Sql
How to export a table dataframe in PySpark to csv?
Python
Apache Spark
Dataframe
Apache Spark-Sql
Export to-Csv
Concatenate two PySpark dataframes
Python
Apache Spark
Pyspark
Apache Spark-Sql
What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism?
Performance
Apache Spark
Hadoop
Apache Spark-Sql
Renaming column names of a DataFrame in Spark Scala
Scala
Apache Spark
Dataframe
Apache Spark-Sql
How to save DataFrame directly to Hive?
Scala
Apache Spark
Hive
Apache Spark-Sql
Join two data frames, select all columns from one and some columns from the other
Dataframe
Apache Spark
Pyspark
Apache Spark-Sql
Split Spark Dataframe string column into multiple columns
Apache Spark
Pyspark
Apache Spark-Sql
Overwrite specific partitions in spark dataframe write method
Apache Spark
Apache Spark-Sql
Spark Dataframe
Updating a dataframe column in spark
Python
Dataframe
Apache Spark
Pyspark
Apache Spark-Sql
Spark SQL: apply aggregate functions to a list of columns
Apache Spark
Dataframe
Apache Spark-Sql
Aggregate Functions
Renaming columns for PySpark DataFrame aggregates
Dataframe
Apache Spark
Pyspark
Apache Spark-Sql
Removing duplicates from rows based on specific columns in an RDD/Spark DataFrame
Apache Spark
Apache Spark-Sql
Pyspark
Get current number of partitions of a DataFrame
Python
Scala
Dataframe
Apache Spark
Apache Spark-Sql
How to write unit tests in Spark 2.0+?
Scala
Unit Testing
Apache Spark
Junit
Apache Spark-Sql
How to find count of Null and Nan values for each column in a PySpark dataframe efficiently?
Apache Spark
Pyspark
Apache Spark-Sql
How to pivot Spark DataFrame?
Dataframe
Apache Spark
Pyspark
Apache Spark-Sql
Pivot
pyspark dataframe filter or include based on list
Apache Spark
Filter
Pyspark
Apache Spark-Sql
Pyspark: Split multiple array columns into rows
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
how to filter out a null value from spark dataframe
Scala
Apache Spark
Apache Spark-Sql
Spark Dataframe
Cannot find col function in pyspark
Python
Apache Spark
Pyspark
Apache Spark-Sql
Pyspark Sql
How to use JDBC source to write and read data in (Py)Spark?
Python
Scala
Apache Spark
Apache Spark-Sql
Pyspark
How to join on multiple columns in Pyspark?
Python
Apache Spark
Join
Pyspark
Apache Spark-Sql
How does createOrReplaceTempView work in Spark?
Apache Spark
Apache Spark-Sql
Spark Dataframe
How to use Column.isin with list?
Scala
Apache Spark
Apache Spark-Sql
Create Spark DataFrame. Can not infer schema for type: <type 'float'>
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
How to make good reproducible Apache Spark examples
Dataframe
Apache Spark
Pyspark
Apache Spark-Sql
Removing duplicate columns after a DF join in Spark
Python
Apache Spark
Pyspark
Apache Spark-Sql
Querying Spark SQL DataFrame with complex types
Sql
Scala
Apache Spark
Dataframe
Apache Spark-Sql
How to loop through each row of dataFrame in pyspark
Apache Spark
Dataframe
For Loop
Pyspark
Apache Spark-Sql
Spark - SELECT WHERE or filtering?
Apache Spark
Apache Spark-Sql
How do I convert an array (i.e. list) column to Vector
Python
Apache Spark
Pyspark
Apache Spark-Sql
Apache Spark-Ml
How to perform union on two DataFrames with different amounts of columns in spark?
Python
Apache Spark
Pyspark
Apache Spark-Sql
Pyspark Dataframes
Add an empty column to Spark DataFrame
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
Provide schema while reading csv file as a dataframe
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Spark Csv
Errors when using OFF_HEAP Storage with Spark 1.4.0 and Tachyon 0.6.4
Apache Spark
Apache Spark-Sql
Alluxio
Take n rows from a spark dataframe and pass to toPandas()
Python
Apache Spark-Sql
Spark Dataframe
How to avoid duplicate columns after join?
Scala
Apache Spark
Apache Spark-Sql
Why does join fail with "java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]"?
Scala
Apache Spark
Join
Apache Spark-Sql
Filter df when values matches part of a string in pyspark
Python
Apache Spark
Pyspark
Apache Spark-Sql
How to convert column with string type to int form in pyspark data frame?
Python
Dataframe
Apache Spark
Pyspark
Apache Spark-Sql
How to query JSON data column using Spark DataFrames?
Scala
Apache Spark
Dataframe
Apache Spark-Sql
Spark Cassandra-Connector
How to aggregate values into collection after groupBy?
Scala
Apache Spark
Apache Spark-Sql
How to split Vector into columns - using PySpark
Python
Apache Spark
Pyspark
Apache Spark-Sql
Apache Spark-Ml
How to flatten a struct in a Spark dataframe?
Java
Apache Spark
Pyspark
Apache Spark-Sql
PySpark - rename more than one column using withColumnRenamed
Apache Spark
Pyspark
Apache Spark-Sql
Rename
How to import multiple csv files in a single load?
Apache Spark
Apache Spark-Sql
Spark Dataframe
How to get name of dataframe column in PySpark?
Apache Spark
Pyspark
Apache Spark-Sql
Columnname
Median / quantiles within PySpark groupBy
Apache Spark
Pyspark
Apache Spark-Sql
Pyspark Sql
Pyspark: Filter dataframe based on multiple conditions
Sql
Filter
Pyspark
Apache Spark-Sql
Pyspark Sql
Difference between df.repartition and DataFrameWriter partitionBy?
Apache Spark-Sql
Data Partitioning
Apache Spark -- Assign the result of UDF to multiple dataframe columns
Python
Apache Spark
Pyspark
Apache Spark-Sql
User Defined-Functions
Spark functions vs UDF performance?
Performance
Apache Spark
Pyspark
Apache Spark-Sql
User Defined-Functions
Retrieve top n in each group of a DataFrame in pyspark
Python
Apache Spark
Dataframe
Pyspark
Apache Spark-Sql
PySpark: withColumn() with two conditions and three outcomes
Apache Spark
Hive
Pyspark
Apache Spark-Sql
Hiveql
Generate a Spark StructType / Schema from a case class
Apache Spark
Apache Spark-Sql
How to melt Spark DataFrame?
Apache Spark
Pyspark
Apache Spark-Sql
Melt
aggregate function Count usage with groupBy in Spark
Java
Scala
Apache Spark
Pyspark
Apache Spark-Sql
What are the various join types in Spark?
Scala
Apache Spark
Apache Spark-Sql
Spark Dataframe
Apache Spark-2.0
How to count unique ID after groupBy in pyspark
Python
Pyspark
Apache Spark-Sql