You need to use `agg`. For example:

<!-- language: python -->

    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql import functions as F

    sc = SparkContext("local")

    sqlContext = HiveContext(sc)

    df = sqlContext.createDataFrame([
        ("a", None, None),
        ("a", "code1", None),
        ("a", "code2", "name2"),
    ], ["id", "code", "name"])

    df.show()

    +---+-----+-----+
    | id| code| name|
    +---+-----+-----+
    |  a| null| null|
    |  a|code1| null|
    |  a|code2|name2|
    +---+-----+-----+

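Note that in the example above you have to create a `HiveContext`: in Spark 1.x, `collect_set` and `collect_list` are only available with Hive support. See https://stackoverflow.com/a/35529093/690430 for dealing with different Spark versions.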

<!-- language: python -->

    (df
      .groupby("id")
      .agg(F.collect_set("code"),
           F.collect_list("name"))
      .show())

    +---+-----------------+------------------+
    | id|collect_set(code)|collect_list(name)|
    +---+-----------------+------------------+
    |  a|   [code1, code2]|           [name2]|
    +---+-----------------+------------------+
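
To make the difference between the two aggregates concrete, here is a plain-Python sketch (no Spark needed) that mimics what they compute for the example data above. It is only an illustration under stated assumptions, not Spark's implementation: `None` stands in for SQL null, both helpers drop nulls as the Spark aggregates do, and since Spark does not guarantee the element order of `collect_set`, the sketch sorts its result for determinism. The helper names are hypothetical.

```python
from collections import defaultdict

# Rows of the example DataFrame: (id, code, name); None plays the role of SQL null.
rows = [
    ("a", None, None),
    ("a", "code1", None),
    ("a", "code2", "name2"),
]


def collect_list_like(values):
    """Hypothetical stand-in for F.collect_list: every non-null value, duplicates kept."""
    return [v for v in values if v is not None]


def collect_set_like(values):
    """Hypothetical stand-in for F.collect_set: distinct non-null values.

    Spark gives no ordering guarantee here, so this sketch sorts for determinism.
    """
    return sorted({v for v in values if v is not None})


def group_and_collect(rows):
    """Group rows by id, then apply collect_set to `code` and collect_list to `name`."""
    codes = defaultdict(list)
    names = defaultdict(list)
    for row_id, code, name in rows:
        codes[row_id].append(code)
        names[row_id].append(name)
    return {key: (collect_set_like(codes[key]), collect_list_like(names[key])) for key in codes}


print(group_and_collect(rows))  # {'a': (['code1', 'code2'], ['name2'])}
```

Duplicates are where the two differ: for the values `["x", "x", None]`, the list version keeps both `"x"` entries, while the set version keeps only one.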


If your DataFrame is large, you can try a [pandas UDF (GROUPED_AGG)](https://spark.apache.org/docs/2.4.4/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf) to avoid memory errors. It can also be much faster.

> Grouped aggregate Pandas UDFs are similar to Spark aggregate functions. Grouped aggregate Pandas UDFs are used with groupBy().agg() and pyspark.sql.Window. It defines an aggregation from one or more pandas.Series to a scalar value, where each pandas.Series represents a column within the group or window. [pandas udf](https://spark.apache.org/docs/2.4.4/sql-pyspark-pandas-with-arrow.html#pandas-udfs-aka-vectorized-udfs)

Example:

```
import pyspark.sql.functions as F

@F.pandas_udf('string', F.PandasUDFType.GROUPED_AGG)
def collect_list(name):
    # `name` is a pandas.Series holding one group's values;
    # drop nulls first, because str.join raises TypeError on None
    return ', '.join(name.dropna())

grouped_df = df.groupby('id').agg(collect_list(df["name"]).alias('names'))
```
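
A grouped-aggregate pandas UDF receives all of one group's values for a column and must reduce them to a single scalar. The plain-Python sketch below mimics that contract without Spark or pandas, using hypothetical helper names; it is not how Spark executes the UDF (Spark sends each group to the Python worker as a `pandas.Series` via Arrow). It also shows why nulls are dropped before joining: `str.join` raises `TypeError` when it meets `None`.

```python
from collections import defaultdict


def join_names(values):
    """Reduce one group's values to one string, skipping nulls (None)."""
    return ", ".join(v for v in values if v is not None)


def grouped_agg(rows, key_index, value_index, reducer):
    """Hypothetical emulation of groupby(key).agg(reducer(value)) on plain tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: reducer(values) for key, values in groups.items()}


rows = [("a", None, None), ("a", "code1", None), ("a", "code2", "name2")]

print(grouped_agg(rows, 0, 2, join_names))  # {'a': 'name2'}
print(grouped_agg(rows, 0, 1, join_names))  # {'a': 'code1, code2'}

# Without the null filter, joining fails on the None values:
try:
    ", ".join([None, None, "name2"])
except TypeError as exc:
    print("TypeError:", exc)
```

A group whose values are all null reduces to an empty string in this sketch rather than raising.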

I'm using a `RecyclerView` inside a `NestedScrollView`. I also set `setNestedScrollingEnabled` to false for the `RecyclerView`.


**To support lower API levels:**

`ViewCompat.setNestedScrollingEnabled(mRecyclerView, false);`



Now, when the user scrolls the view, everything seems okay, but the views in the `RecyclerView` are not recycled, and the heap size grows quickly!

**Update**:
The `RecyclerView` layout manager is `StaggeredGridLayoutManager`.

**fragment_profile.xml**:

    <android.support.design.widget.CoordinatorLayout
        xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:app="http://schemas.android.com/apk/res-auto"
        xmlns:tools="http://schemas.android.com/tools"
        android:id="@+id/coordinator"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:orientation="vertical" >

            <android.support.design.widget.AppBarLayout
                android:id="@+id/appbar"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:theme="@style/ThemeOverlay.AppCompat.Dark.ActionBar" >
            </android.support.design.widget.AppBarLayout>

            <android.support.v4.widget.SwipeRefreshLayout
                android:id="@+id/profileSwipeRefreshLayout"
                android:layout_width="match_parent"
                android:layout_height="match_parent" >

                    <!-- RecyclerView and NestedScrollView -->
                    <include layout="@layout/fragment_profile_details" />

            </android.support.v4.widget.SwipeRefreshLayout>

    </android.support.design.widget.CoordinatorLayout>


**fragment_profile_details.xml**:

    <LinearLayout
        xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:app="http://schemas.android.com/apk/res-auto"
        xmlns:tools="http://schemas.android.com/tools"
        android:id="@+id/rootLayout"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        app:layout_behavior="@string/appbar_scrolling_view_behavior"
        android:orientation="vertical" >

            <android.support.v4.widget.NestedScrollView
                android:id="@+id/nested_scrollbar"
                android:layout_width="match_parent"
                android:layout_height="match_parent"
                android:layout_gravity="fill_vertical"
                app:layout_behavior="@string/appbar_scrolling_view_behavior"
                android:fillViewport="true"
                android:scrollbars="none" >

                    <LinearLayout
                        android:id="@+id/nested_scrollbar_linear"
                        android:layout_width="match_parent"
                        android:layout_height="wrap_content"
                        android:descendantFocusability="blocksDescendants"
                        android:orientation="vertical" >

                            <android.support.v7.widget.CardView
                                android:id="@+id/profileCardview"
                                android:layout_width="match_parent"
                                android:layout_height="wrap_content"
                                app:cardBackgroundColor="@color/card_backgroind"
                                app:cardCornerRadius="0dp"
                                app:cardElevation="0dp" >

                                <!-- Profile related stuff like avatar and etc. -->

                            </android.support.v7.widget.CardView>

                            <android.support.v7.widget.RecyclerView
                                android:id="@+id/list_view"
                                android:layout_width="match_parent"
                                android:layout_height="wrap_content"
                                android:layout_marginBottom="@dimen/four"
                                android:layout_marginEnd="@dimen/four"
                                android:layout_marginLeft="@dimen/four"
                                android:layout_marginRight="@dimen/four"
                                android:layout_marginStart="@dimen/four"
                                android:layout_marginTop="@dimen/four"
                                app:layout_behavior="@string/appbar_scrolling_view_behavior"
                                android:clipToPadding="false" />

                    </LinearLayout>
            </android.support.v4.widget.NestedScrollView>
    </LinearLayout>


**ProfileFragment.java**:

    mAdapter = new MainAdapter(getActivity(), glide, Data);

    listView = (RecyclerView) view.findViewById(R.id.list_view);

    ViewCompat.setNestedScrollingEnabled(listView, false);
    listView.setAdapter(mAdapter);

    mStaggeredLM = new StaggeredGridLayoutManager(2, StaggeredGridLayoutManager.VERTICAL);
    mStaggeredLM.setGapStrategy(StaggeredGridLayoutManager.GAP_HANDLING_MOVE_ITEMS_BETWEEN_SPANS);

    listView.setLayoutManager(mStaggeredLM);

    mScroll.setOnScrollChangeListener(new OnScrollChangeListener() {

        @Override
        public void onScrollChange(NestedScrollView arg0, int arg1, int arg2, int arg3, int arg4) {

            View view = (View) mScroll.getChildAt(mScroll.getChildCount() - 1);
            int diff = (view.getBottom() - (mScroll.getHeight() + mScroll.getScrollY()));

            if (diff == 0) {

                int visibleItemCount = mStaggeredLM.getChildCount();
                int totalItemCount = mStaggeredLM.getItemCount();

                int[] lastVisibleItemPositions = mStaggeredLM.findLastVisibleItemPositions(null);
                int lastVisibleItemPos = getLastVisibleItem(lastVisibleItemPositions);

                Log.e("getChildCount", String.valueOf(visibleItemCount));
                Log.e("getItemCount", String.valueOf(totalItemCount));
                Log.e("lastVisibleItemPos", String.valueOf(lastVisibleItemPos));

                if ((visibleItemCount + 5) >= totalItemCount) {

                    mLoadMore.setVisibility(View.VISIBLE);
                    Log.e("LOG", "Last Item Reached!");
                }

                mMore = true;
                mFresh = false;
                mRefresh = false;
                getPosts();
            }

        }

    });

P.S.: I've attached the load-more logic to the scroll view, because the `RecyclerView` triggers it continuously and it can't be stopped!

Any help is appreciated.

RecyclerView does not Recycling Views when use it inside NestedScrollView

I need to customise the look of a back button in a Swift project.

Here's what I have:
[![Default Back Button][1]][1]


Here's what I want:
[![Custom Back Button][2]][2]

I've tried creating my own `UIBarButtonItem`, but I can't figure out how to get the image to appear beside the text, rather than as a background or as a replacement for the text.

    let backButton = UIBarButtonItem(title: "Custom", style: .Plain, target: self, action: nil)
    //backButton.image = UIImage(named: "imageName") //Replaces title
    backButton.setBackgroundImage(UIImage(named: "imageName"), forState: .Normal, barMetrics: .Default) // Stretches image
    navigationItem.setLeftBarButtonItem(backButton, animated: false)

  [1]: http://i.stack.imgur.com/WBW7p.png
  [2]: http://i.stack.imgur.com/KaGCs.png

Swift Custom NavBar Back Button Image and Text

How can I use `collect_set` or `collect_list` on a DataFrame after `groupby`? For example: `df.groupby('key').collect_set('values')`. I get an error: `AttributeError: 'GroupedData' object has no attribute 'collect_set'`

pyspark collect_set or collect_list with groupby



I am trying to convert a `List` to a `Page` in Spring. I have converted it using

    new PageImpl<User>(users, pageable, users.size());

But now I'm having problems with sorting and with the pagination itself: when I pass `size` and `page`, the pagination doesn't work.

Here's the code I am using.

My controller:

    public ResponseEntity<Page<User>> getUsersByProgramId(
            @RequestParam(name = "programId", required = true) Integer programId, Pageable pageable) {

        List<User> users = userService.findAllByProgramId(programId);
        Page<User> pages = new PageImpl<User>(users, pageable, users.size());

        return new ResponseEntity<>(pages, HttpStatus.OK);
    }


Here is my user repository:

    public interface UserRepo extends JpaRepository<User, Integer> {

        public List<User> findAllByProgramId(Integer programId);
    }

Here is my service:

    public List<User> findAllByProgramId(Integer programId);


Conversion of List to Page in Spring

How can I easily create a range of consecutive integers in Dart? For example:

    // throws a syntax error :)
    var list = [1..10];

Dart: create a list from 0 to N

I am in the process of implementing a filterable list with React. The structure of the list is as shown in the image below.

[![enter image description here][1]][1]

**PREMISE**

Here's a description of how it is supposed to work:

* The state resides in the highest-level component, the `Search` component.
* The state is described as follows:
<pre>
{
    visible : boolean,
    files : array,
    filtered : array,
    query : string,
    currentlySelectedIndex : integer
}
</pre>

* `files` is a potentially very large array containing file paths (10000 entries is a plausible number).
* `filtered` is the filtered array after the user types at least 2 characters. I know it's derived data, and as such an argument could be made against storing it in the state, but it is needed for `currentlySelectedIndex`, which is the index of the currently selected element in the filtered list.

* The user types at least 2 letters into the `Input` component; the array is filtered, and for each entry in the filtered array a `Result` component is rendered.
* Each `Result` component displays the full path that partially matched the query, with the matching part of the path highlighted. For example, if the user had typed 'le', the DOM of a `Result` component would be something like this:

  `<li>this/is/a/fi<strong>le</strong>/path</li>`
* If the user presses the up or down key while the `Input` component is focused, `currentlySelectedIndex` changes based on the `filtered` array. This causes the `Result` component that matches the index to be marked as selected, causing a re-render.

**PROBLEM**

Initially I tested this with a small enough array of `files`, using the development version of React, and all worked fine. 

The problem appeared when I had to deal with a `files` array as big as 10000 entries. Typing 2 letters in the Input would generate a big list and when I pressed the up and down keys to navigate it it would be very laggy.

At first I did not have a defined component for the `Result` elements and I was merely making the list on the fly, on each render of the `Search` component, as such:


    results  = this.state.filtered.map(function(file, index) {
        var start, end, matchIndex, match = this.state.query;

        matchIndex = file.indexOf(match);
        start = file.slice(0, matchIndex);
        end = file.slice(matchIndex + match.length);

        return (
            <li onClick={this.handleListClick}
                data-path={file}
                className={(index === this.state.currentlySelected) ? "valid selected" : "valid"}
                key={file} >
                {start}
                <span className="marked">{match}</span>
                {end}
            </li>
        );
    }.bind(this));

As you can tell, every time `currentlySelectedIndex` changed, it caused a re-render and the list was re-created each time. I thought that since I had set a `key` value on each `li` element, React would avoid re-rendering every other `li` element whose `className` did not change, but apparently that wasn't the case.

I ended up defining a class for the `Result` elements, which explicitly checks whether each `Result` element should re-render, based on whether it was previously selected and on the current user input:

    var ResultItem = React.createClass({
        shouldComponentUpdate : function(nextProps) {
            if (nextProps.match !== this.props.match) {
                return true;
            } else {
                return (nextProps.selected !== this.props.selected);
            }
        },
        render : function() {
            return (
                <li onClick={this.props.handleListClick}
                    data-path={this.props.file}
                    className={
                        (this.props.selected) ? "valid selected" : "valid"
                    }
                    key={this.props.file} >
                    {this.props.children}
                </li>
            );
        }
    });

And the list is now created as such: 

    results = this.state.filtered.map(function(file, index) {
        var start, end, matchIndex, match = this.state.query, selected;

        matchIndex = file.indexOf(match);
        start = file.slice(0, matchIndex);
        end = file.slice(matchIndex + match.length);
        selected = (index === this.state.currentlySelected) ? true : false;

        return (
            <ResultItem handleClick={this.handleListClick}
                data-path={file}
                selected={selected}
                key={file}
                match={match} >
                {start}
                <span className="marked">{match}</span>
                {end}
            </ResultItem>
        );
    }.bind(this));
    }

This made performance *slightly* better, but it's still not good enough. The thing is, when I tested with the production version of React, things worked buttery smooth, with no lag at all.

**BOTTOMLINE**

**Is such a noticeable discrepancy between development and production versions of React normal?**

**Am I understanding/doing something wrong when I think about how React manages the list?** 

**UPDATE 14-11-2016**

I have found this presentation by Michael Jackson, where he tackles an issue very similar to this one: https://youtu.be/7S8v8jfLb1Q?t=26m2s

The solution is very similar to the one proposed in AskarovBeknar's [answer](https://stackoverflow.com/a/38193164/4651083) below.

**UPDATE 14-4-2018**

Since this is apparently a popular question and things have progressed since it was asked: I do encourage you to watch the video linked above to get a grasp of virtual layouts, but I also encourage you to use the [React Virtualized](https://github.com/bvaughn/react-virtualized) library if you do not want to reinvent the wheel.


  [1]: http://i.stack.imgur.com/lTcZm.png

Big list performance with React

I'm trying to make a horizontally scrollable Bootstrap row. The row contains customer reviews wrapped in divs. The width of each testimonial div is `33.333%`.

`white-space: nowrap` and `display: inline-block` don't work.

What am I doing wrong?


    <div class="row">
        <div class="col-lg-12 text-center">
            <div class="section-title">
                <div class="testimonial_group">
                    <div class="testimonial">...</div>
                    <div class="testimonial">...</div>
                    <div class="testimonial">...</div>
                    <div class="testimonial">...</div>
                    ...
                </div>
            </div>
        </div>
    </div>

Horizontal scrollable div&#39;s in a bootstrap row

I have a data frame like this in pandas:

     column1      column2
     [a,b,c]        1
     [d,e,f]        2
     [g,h,i]        3

**Expected output:**

    column1      column2
      a              1
      b              1
      c              1
      d              2
      e              2
      f              2
      g              3
      h              3
      i              3

How can I process this data?

Pandas expand rows from list data available in column

I have a dataframe that looks like this:

                  Company Name              Organisation Name  Amount
    10118  Vifor Pharma UK Ltd  Welsh Assoc for Gastro & Endo 2700.00
    10119  Vifor Pharma UK Ltd    Welsh IBD Specialist Group,  169.00
    10120  Vifor Pharma UK Ltd             West Midlands AHSN 1200.00
    10121  Vifor Pharma UK Ltd           Whittington Hospital   63.00
    10122  Vifor Pharma UK Ltd                 Ysbyty Gwynedd   75.93

How do I sum the `Amount` and count the `Organisation Name`, to get a new dataframe that looks like this?

                  Company Name             Organisation Count   Amount
    10118  Vifor Pharma UK Ltd                              5 11000.00

I know how to sum *or* count:

    df.groupby('Company Name').sum()
    df.groupby('Company Name').count()

But not how to do both!


Group dataframe and get sum AND count?

I need to count unique `ID` values in every `domain`.

I have data:

    ID, domain
    123, 'vk.com'
    123, 'vk.com'
    123, 'twitter.com'
    456, 'vk.com'
    456, 'facebook.com'
    456, 'vk.com'
    456, 'google.com'
    789, 'twitter.com'
    789, 'vk.com'

I tried `df.groupby(['domain', 'ID']).count()`

But I want to get:

    domain, count
    vk.com   3
    twitter.com   2
    facebook.com   1
    google.com   1


Count unique values per groups with Pandas

I am using this data frame:

    Fruit   Date      Name  Number
    Apples  10/6/2016 Bob    7
    Apples  10/6/2016 Bob    8
    Apples  10/6/2016 Mike   9
    Apples  10/7/2016 Steve 10
    Apples  10/7/2016 Bob    1
    Oranges 10/7/2016 Bob    2
    Oranges 10/6/2016 Tom   15
    Oranges 10/6/2016 Mike  57
    Oranges 10/6/2016 Bob   65
    Oranges 10/7/2016 Tony   1
    Grapes  10/7/2016 Bob    1
    Grapes  10/7/2016 Tom   87
    Grapes  10/7/2016 Bob   22
    Grapes  10/7/2016 Bob   12
    Grapes  10/7/2016 Tony  15

I want to aggregate this by `Name` and then by fruit to get a total number of `Fruit` per `Name`. For example:

    Bob,Apples,16

I tried grouping by `Name` and `Fruit` but how do I get the total number of Fruit?

How do I Pandas group-by to get sum?

Suppose I have a structured dataframe as follows:

    df = pd.DataFrame({"A":['a','a','a','b','b'],
                       "B":[1]*5})

The `A` column has previously been sorted. I wish to find the index of the first row where `df.A!='a'`. The end goal is to use this index to break the DataFrame into groups based on `A`.

I realise that there is `groupby` functionality. However, the DataFrame is quite large, and this is a simplified toy example. Since `A` has already been sorted, it would be faster if I could just **find the first index** where `df.A!='a'`. Therefore it is important that, whatever method you use, **the scanning stops once the first element is found**.

pandas - find first occurrence

I have data of the following form:

    df = pd.DataFrame({
        'group': [1, 1, 2, 3, 3, 3, 4],
        'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
    })
    print(df)

    #    group param
    # 0      1     a
    # 1      1     a
    # 2      2     b
    # 3      3   NaN
    # 4      3     a
    # 5      3     a
    # 6      4   NaN

Non-null values within groups are always the same. I want to count the non-null value for each group (where it exists) once, and then find the total counts for each value. 

I'm currently doing this in the following (clunky and inefficient) way:

    param = []
    for _, group in df[df.param.notnull()].groupby('group'):
        param.append(group.param.unique()[0])
    print(pd.DataFrame({'param': param}).param.value_counts())

    # a    2
    # b    1

I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.

Count unique values using pandas groupby

I can&#39;t understand the logic behind the terms *union types* and *intersection types* in TypeScript.

Pragmatically, if the properties of different types are sets of properties, then combining them with the `&` operator gives a type whose properties are the *union* of those sets. Following that logic, I would expect types like this to be called *union types*. If I combine them with `|`, I can only use their common properties, the *intersection* of the sets.

[Wikipedia](https://en.wikipedia.org/wiki/Boolean_algebra_(structure)#Examples) seems to back that logic:
> The power set (set of all subsets) of any given nonempty set S forms a Boolean algebra, an algebra of sets, with the two operations ∨ := ∪ (union) and ∧ := ∩ (intersection).

However, according to [typescriptlang.org](https://www.typescriptlang.org/docs/handbook/advanced-types.html#intersection-types), it's exactly the opposite: `&` is used to produce *intersection types* and `|` is used for *union types*.

I&#39;m sure there is another way of looking at it, but I cannot figure it out.

Naming of TypeScript&#39;s union and intersection types

It may be because Sets are relatively new to JavaScript, but I haven't been able to find an article, on Stack Overflow or anywhere else, that discusses the performance difference between the two in JavaScript. So, what is the difference in performance between the two? Specifically, when it comes to removing, adding and iterating.

Javascript Set vs. Array performance

Why does the `set` function call wipe out the dupes, but parsing a set literal does not?

    >>> from decimal import Decimal
    >>> x = Decimal('0')
    >>> y = complex(0,0)
    >>> set([0, x, y])
    {0}
    >>> {0, x, y}
    {Decimal('0'), 0j}

(Python 2.7.12.  Possibly same root cause as for [this][1] similar question)


  [1]: https://stackoverflow.com/q/40225520/674039

Set literal gives different result from set function call

How come when I change the order of the two sets in the unions below, I get different results?

    set1 = {1, 2, 3}
    set2 = {True, False}

    print(set1 | set2)
    # {False, 1, 2, 3}

    print(set2 | set1)
    #{False, True, 2, 3}



Union of 2 sets does not contain all items

I want to convert a JavaScript `Set` to a `string` with spaces.

For example, if I have a set like:

    var foo = new Set();
    foo.add('hello');
    foo.add('world');
    foo.add('JavaScript');

I'd like to print the string from the set: `hello world JavaScript` (a space between each element).

I tried the code below, but it doesn't work:

    foo.toString(); // Not working
    String(foo); // Not working

Is there a **simple and easy way** to convert a **Set** to a **string**?



How to convert Set to string with space?

I would like to fill in the blank cells of a DataFrame column (`Age`), but only in rows where another column (`Survived`) has the value 0. If `Survived` is 1 and `Age` is blank, I want to keep it as null.

I tried to use the `&&` operator, but it didn't work. Here is my code:

<!-- language: none -->

    tdata.withColumn("Age",  when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()

Any suggestions on how to handle that? Thanks.

Error message:

    SyntaxError: invalid syntax
      File "<ipython-input-33-3e691784411c>", line 1
        tdata.withColumn("Age",  when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()
                                                        ^

PySpark: multiple conditions in when clause

I have a PySpark DataFrame with a string column in the format `MM-dd-yyyy`, and I am attempting to convert it into a date column.

I tried:

```
df.select(to_date(df.STRING_COLUMN).alias('new_date')).show()
```

But I get a column of nulls. Can anyone help?

Convert pyspark string to date format

Spark now offers predefined functions that can be used in DataFrames, and they seem to be highly optimized. My original question was going to be about which is faster, but I did some testing myself and found the Spark functions to be about 10 times faster, at least in one instance. Does anyone know why this is so, and when would a UDF be faster (only for cases where an identical Spark function exists)?

Here is my testing code (run on Databricks Community Edition):

<!-- language: python -->

    # UDF vs Spark function
    from faker import Factory
    from pyspark.sql.functions import lit, concat, udf
    fake = Factory.create()
    fake.seed(4321)

    # Each entry consists of last_name, first_name, ssn, job, and age (at least 1)
    from pyspark.sql import Row
    def fake_entry():
      name = fake.name().split()
      return (name[1], name[0], fake.ssn(), fake.job(), abs(2016 - fake.date_time().year) + 1)

    # Create a helper function to call a function repeatedly
    def repeat(times, func, *args, **kwargs):
        for _ in xrange(times):
            yield func(*args, **kwargs)
    data = list(repeat(500000, fake_entry))
    print len(data)
    data[0]

    dataDF = sqlContext.createDataFrame(data, ('last_name', 'first_name', 'ssn', 'occupation', 'age'))
    dataDF.cache()

UDF function:


<!-- language: python -->

    concat_s = udf(lambda s: s+ 's')
    udfData = dataDF.select(concat_s(dataDF.first_name).alias('name'))
    udfData.count()

Spark function:


<!-- language: python -->

    spfData = dataDF.select(concat(dataDF.first_name, lit('s')).alias('name'))
    spfData.count()

I ran both multiple times: the UDF usually took about 1.1 to 1.4 s, and the Spark `concat` function always took under 0.15 s.

Spark functions vs UDF performance?

**Context:** I have a `DataFrame` with 2 columns, `word` and `vector`, where the column type of `vector` is `VectorUDT`.

An Example:

    word    |  vector
    assert  | [435,323,324,212...]

And I want to get this:

    word   |  v1 | v2  | v3 | v4 | v5 | v6 ......
    assert | 435 | 5435| 698| 356|....

**Question:**

How can I split a column of vectors into several columns, one for each dimension, using PySpark?

Thanks in advance

How to split Vector into columns - using PySpark

There's a DataFrame in PySpark with data like this:

    user_id object_id score
    user_1  object_1  3
    user_1  object_1  1
    user_1  object_2  2
    user_2  object_1  5
    user_2  object_2  2
    user_2  object_2  6

What I want is to return the 2 records with the highest score in each group with the same `user_id`. The result should therefore look like this:

    user_id object_id score
    user_1  object_1  3
    user_1  object_2  2
    user_2  object_2  6
    user_2  object_1  5

I'm really new to PySpark; could anyone give me a code snippet or point me to the related documentation for this problem? Many thanks!



| Content type | Original author (Stack Overflow) |
|---|---|
| Question | Hanan Shteingart |
| Solution 1 | Kamil Sindi |
| Solution 2 | Allen |
