For example, you can use:

    df %>% filter(!is.na(a))

to remove the rows where column `a` is `NA`.

If you are reading this in 2020 or later: once the rest of the pipeline is built, piping into `na.exclude` (`%>% na.exclude`) removes every row that contains an `NA` in any column.

From @Ben Bolker:

> [T]his has nothing specifically to do with dplyr::filter()

From @Marat Talipov:

> [A]ny comparison with NA, including NA==NA, will return NA

From a [related answer][1] by @farnsy:  

> The == operator does not treat NA's as you would expect it to.
>
> Think of NA as meaning "I don't know what's there". The correct answer
> to 3 > NA is obviously NA because we don't know if the missing value
> is larger than 3 or not. Well, it's the same for NA == NA. They are
> both missing values but the true values could be quite different, so
> the correct answer is "I don't know."
>
> R doesn't know what you are doing in your analysis, so instead of
> potentially introducing bugs that would later end up being published
> and embarrassing you, it doesn't allow comparison operators to think NA
> is a value.


  [1]: https://stackoverflow.com/questions/25100974/na-matches-na-but-is-not-equal-to-na-why
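
To see this behaviour concretely, here is a minimal base R sketch (the vector `a` is made up for illustration and stands in for column `a` of the question's data). As far as I know, `dplyr::filter()` keeps only the rows where the condition is `TRUE`, so a condition that evaluates to `NA` drops the row, which is why `filter(a != NA)` returns nothing.

```r
# Hypothetical vector standing in for column `a` of the question's data
a <- c(1, 1, NA)

# Any comparison with NA is itself NA, so this can never select a row
cmp <- a != NA
print(cmp)            # NA NA NA

# is.na() is the test that actually answers "is this value missing?"
keep <- !is.na(a)
print(keep)           # TRUE TRUE FALSE
print(a[keep])        # 1 1

# Even NA == NA is NA, not TRUE
print(NA == NA)       # NA
```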

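I always use this approach, and it works well for me:
```
# Treat empty strings as missing
cool$day[cool$day == ''] <- NA
# Recode missing values as the literal string "NA" so that == can match them
cool$day[is.na(cool$day)] <- "NA"

# Keep only the rows whose day is not "NA"
cool <- cool[cool$day != "NA", ]
```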

Sometimes a line containing a ternary operator in Python gets too long:

    answer = 'Ten for that? You must be mad!' if does_not_haggle(brian) else "It's worth ten if it's worth a shekel."

Is there a recommended way to make a line break at 79 characters with a ternary operator? I did not find it in [PEP 8](https://www.python.org/dev/peps/pep-0008/).

How to make a line break on the Python ternary operator?

I came up with this:

    n=1;
    curAvg = 0;
    loop{
      curAvg = curAvg + (newNum - curAvg)/n;
      n++;
    }
I think the highlights of this approach are:

- It avoids big numbers (and possible overflow if you were to sum first and then divide).
- It saves one register (there is no need to store the sum).

The trouble might be accumulated rounding error, but I assume that round-ups and round-downs will generally balance out, so the error should not grow dramatically.

Do you see any pitfalls in this solution?
Do you have a better proposal?
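
For reference, the update rule above can be sketched in R roughly as follows. The function name `running_mean` and the sample values are made up for illustration; the loop is only a direct transcription of `curAvg = curAvg + (newNum - curAvg)/n`, not a claim about the best way to do it.

```r
# Incremental (running) mean: fold each new value into the current average
# without ever storing the running sum.
running_mean <- function(x) {
  avg <- 0
  out <- numeric(length(x))
  for (n in seq_along(x)) {
    avg <- avg + (x[n] - avg) / n   # same update as in the pseudocode
    out[n] <- avg
  }
  out
}

values <- c(2, 4, 6, 8)             # hypothetical input stream
result <- running_mean(values)
print(result)                       # 2 3 4 5
```

Comparing the result against `cumsum(values) / seq_along(values)` is a quick way to check how much rounding error accumulates on real data.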

How to efficiently compute average on the fly (moving average)?

My data looks like this:

    library(tidyverse)

    df <- tribble(
        ~a, ~b, ~c,
        1, 2, 3,
        1, NA, 3,
        NA, 2, 3
    )

I can remove all `NA` observations with `drop_na()`:

    df %>% drop_na()

Or remove all `NA` observations in a single column (`a` for example):

    df %>% drop_na(a)

Why can't I just use a regular `!=` filter pipe?

    df %>% filter(a != NA)

Why do we have to use a special function from tidyr to remove NAs?



Removing NA observations with dplyr::filter()


My question involves summing up values across multiple columns of a data frame and creating a new column corresponding to this summation using `dplyr`. The data entries in the columns are binary (0 or 1). I am thinking of a row-wise analog of the `summarise_each` or `mutate_each` function of `dplyr`. Below is a minimal example of the data frame:

    library(dplyr)
    df=data.frame(
      x1=c(1,0,0,NA,0,1,1,NA,0,1),
      x2=c(1,1,NA,1,1,0,NA,NA,0,1),
      x3=c(0,1,0,1,1,0,NA,NA,0,1),
      x4=c(1,0,NA,1,0,0,NA,0,0,1),
      x5=c(1,1,NA,1,1,1,NA,1,0,1))
    
    > df
       x1 x2 x3 x4 x5
    1   1  1  0  1  1
    2   0  1  1  0  1
    3   0 NA  0 NA NA
    4  NA  1  1  1  1
    5   0  1  1  0  1
    6   1  0  0  0  1
    7   1 NA NA NA NA
    8  NA NA NA  0  1
    9   0  0  0  0  0
    10  1  1  1  1  1

I could use something like:

    df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)

but this would involve writing out the names of each of the columns, and I have about 50 columns.
In addition, the column names change at different iterations of the loop in which I want to implement this
operation, so I would like to avoid having to give any column names.

How can I do that most efficiently?
Any assistance would be greatly appreciated. 
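
One way to avoid naming the columns, sketched here under the assumption that every column in `df` should go into the sum, is base R's `rowSums()`. The `na.rm = TRUE` choice (counting `NA` as 0) is my assumption; drop it if a row containing any `NA` should sum to `NA`. The `dplyr` variant at the end only runs if `dplyr` 1.0.0 or later is installed, since it relies on `across()`.

```r
df <- data.frame(
  x1 = c(1, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(1, 1, NA, 1, 1, 0, NA, NA, 0, 1),
  x3 = c(0, 1, 0, 1, 1, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 1, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 1, 1, NA, 1, 0, 1)
)

# Sum every column row by row without naming any of them.
# na.rm = TRUE is an assumption: it counts missing values as 0.
sumrow <- rowSums(df, na.rm = TRUE)
print(sumrow)

# Equivalent dplyr step, guarded so the sketch still runs without dplyr
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.0.0") {
  df_sum <- dplyr::mutate(
    df,
    sumrow = rowSums(dplyr::across(dplyr::everything()), na.rm = TRUE)
  )
  print(df_sum)
}
```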

Sum across multiple columns with dplyr

I have a data.frame `df` with 600+ variables.  I'm writing a function that automates the creation of columns and need to visually check them once.

The `str` function provides a good summary: 

    str(df)
    'data.frame':	29 obs. of  602 variables:
     $ uniqueSessionsIni: POSIXct, format: "2015-01-05 15:00:00" "2015-01-05 16:00:00" "2015-01-05 17:00:00" ...
     $ uniqueSessionsEnd: POSIXct, format: "2015-01-05 15:59:00" "2015-01-05 16:59:00" "2015-01-05 17:59:00" ...
     $ m0p0             : POSIXct, format: "2015-01-05 15:00:00" "2015-01-05 15:00:00" "2015-01-05 15:00:00" ...
     $ m1p0             : POSIXct, format: "2015-01-05 15:01:00" "2015-01-05 15:01:00" "2015-01-05 15:01:00" ...
     $ m2p0             : POSIXct, format: "2015-01-05 15:02:00" "2015-01-05 15:02:00" "2015-01-05 15:02:00" ...

and it goes on...  
but truncates the output, as below:

    $ m33p1            : POSIXct, format: "2015-01-05 15:34:00" "2015-01-05 15:34:00" "2015-01-05 15:34:00" ...
    $ m34p1            : POSIXct, format: "2015-01-05 15:35:00" "2015-01-05 15:35:00" "2015-01-05 15:35:00" ...
    $ m35p1            : POSIXct, format: "2015-01-05 15:36:00" "2015-01-05 15:36:00" "2015-01-05 15:36:00" ...
    $ m36p1            : POSIXct, format: "2015-01-05 15:37:00" "2015-01-05 15:37:00" "2015-01-05 15:37:00" ...
    [list output truncated]
How can I display the full list of 602 variables?
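
A possible approach, sketched on a made-up data frame (`wide`, with 150 columns, is hypothetical and only stands in for the 602-variable one): `str()` has a `list.len` argument that caps how many list components are shown, so raising it to the number of columns should list every variable.

```r
# Hypothetical wide data frame (150 columns) standing in for the real one
wide <- as.data.frame(matrix(0, nrow = 2, ncol = 150))

# With the default list.len, str() stops early and prints "[list output truncated]"
out_default <- capture.output(str(wide))

# Raising list.len to the number of columns lists every variable
out_full <- capture.output(str(wide, list.len = ncol(wide)))

print(tail(out_default, 1))
print(length(out_full))
```

For the data frame in the question, that would be `str(df, list.len = ncol(df))`.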

list output truncated - How to expand listed variables with str() in R

I have a quite long title in a rmarkdown document and I would like to force a line break in a specific position.

Minimum example:

    ---
    title: "Quite long title want the * line break at the asterisk"
    output: html_document
    ---

I have tried: \n, \newline, \\\\ and a manual line break. None of them seem to work.

I believe it has to be quite straightforward but I haven't been able to find a solution.





How can I force a line break in rmarkdown's title?

So we are used to telling every new R user that "*`apply` isn't vectorized, check out the Patrick Burns [R Inferno](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf) Circle 4*", which says (I quote):

> A common reflex is to use a function in the apply family. **This is not**
> **vectorization, it is loop-hiding**. The apply function has a for loop in
> its definition. The lapply function buries the loop, but execution
> times tend to be roughly equal to an explicit for loop.

Indeed, a quick look on the `apply` source code reveals the loop:

    grep("for", capture.output(getAnywhere("apply")), value = TRUE)
    ## [1] "        for (i in 1L:d2) {"  "    else for (i in 1L:d2) {"

Ok so far, but a look at `lapply` or `vapply` actually reveals a completely different picture:

    lapply
    ## function (X, FUN, ...) 
    ## {
    ##     FUN <- match.fun(FUN)
    ##     if (!is.vector(X) || is.object(X)) 
    ##        X <- as.list(X)
    ##     .Internal(lapply(X, FUN))
    ## }
    ## <bytecode: 0x000000000284b618>
    ## <environment: namespace:base>

So apparently there is no R `for` loop hiding there; rather, they call an internal function written in C.

A quick look in the [rabbit](https://github.com/wch/r-source/blob/trunk/src/main/names.c#L627-L630) [hole](https://github.com/wch/r-source/blob/trunk/src/main/apply.c#L34) reveals pretty much the same picture.

Moreover, let's take the `colMeans` function for example, which was never accused of not being vectorised:

    colMeans
    # function (x, na.rm = FALSE, dims = 1L) 
    # {
    #   if (is.data.frame(x)) 
    #     x <- as.matrix(x)
    #   if (!is.array(x) || length(dn <- dim(x)) < 2L) 
    #     stop("'x' must be an array of at least two dimensions")
    #   if (dims < 1L || dims > length(dn) - 1L) 
    #     stop("invalid 'dims'")
    #   n <- prod(dn[1L:dims])
    #   dn <- dn[-(1L:dims)]
    #   z <- if (is.complex(x)) 
    #     .Internal(colMeans(Re(x), n, prod(dn), na.rm)) + (0+1i) * 
    #     .Internal(colMeans(Im(x), n, prod(dn), na.rm))
    #   else .Internal(colMeans(x, n, prod(dn), na.rm))
    #   if (length(dn) > 1L) {
    #     dim(z) <- dn
    #     dimnames(z) <- dimnames(x)[-(1L:dims)]
    #   }
    #   else names(z) <- dimnames(x)[[dims + 1]]
    #   z
    # }
    # <bytecode: 0x0000000008f89d20>
    #   <environment: namespace:base>

Huh? It also just calls `.Internal(colMeans(...` which we can also find in the [rabbit hole](https://github.com/wch/r-source/blob/trunk/src/main/names.c#L731). So how is this different from `.Internal(lapply(..`?

Actually a quick benchmark reveals that `sapply` performs no worse than `colMeans` and much better than a `for` loop for a big data set

    m <- as.data.frame(matrix(1:1e7, ncol = 1e5))
    system.time(colMeans(m))
    # user  system elapsed 
    # 1.69    0.03    1.73 
    system.time(sapply(m, mean))
    # user  system elapsed 
    # 1.50    0.03    1.60 
    system.time(apply(m, 2, mean))
    # user  system elapsed 
    # 3.84    0.03    3.90 
    system.time(for(i in 1:ncol(m)) mean(m[, i]))
    # user  system elapsed 
    # 13.78    0.01   13.93 

In other words, is it correct to say that `lapply` and `vapply` **are actually vectorised** (compared to `apply` which is a `for` loop that also calls `lapply`) and what did Patrick Burns really mean to say?


Is the "*apply" family really not vectorized?

I know that Shiny Server Pro has a password-control feature.
Shiny itself also has the function `passwordInput()`, which works like `textInput()`.
Has anybody thought about how to do the following:

1) Launch the application only after the correct password has been entered
2) Unlock part of the application only after the correct password has been entered (for example, I have some tabs in shinydashboard, and I want to give access to one of them only by password)

Thanks!

Starting Shiny app after password input

I want to select multiple columns based on their names with a regular expression. I am trying to do it with the piping syntax of the `dplyr` package. I checked the other topics, but only found answers about a single string.

With base R:

    library(dplyr)    
    mtcars[grepl('m|ar', names(mtcars))]
    ###                      mpg am gear carb
    ### Mazda RX4           21.0  1    4    4
    ### Mazda RX4 Wag       21.0  1    4    4

However, it doesn't work with the select/contains way:

    mtcars %>% select(contains('m|ar'))
    ### data frame with 0 columns and 32 rows

What's wrong?
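
As far as I know, `contains()` looks for a literal substring, so `'m|ar'` is searched for verbatim and no column name contains it. The sketch below shows the base R selection next to `matches()`, the select helper that takes a regular expression. The `dplyr` part is guarded so it only runs when `dplyr` is installed.

```r
# Base R: grepl() interprets the pattern as a regular expression
base_cols <- names(mtcars)[grepl("m|ar", names(mtcars))]
print(base_cols)   # "mpg" "am" "gear" "carb"

if (requireNamespace("dplyr", quietly = TRUE)) {
  # matches() takes a regular expression, unlike contains()
  sel <- dplyr::select(mtcars, dplyr::matches("m|ar"))
  print(names(sel))
}
```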


select columns based on multiple strings with dplyr contains()

I am new to dplyr and trying to do the following transformation without any luck. I've searched across the internet and I have found examples to do the same in ddply but I'd like to use dplyr.

I have the following data:

       month   type  count
    1  Feb-14  bbb   341
    2  Feb-14  ccc   527
    3  Feb-14  aaa  2674
    4  Mar-14  bbb   811
    5  Mar-14  ccc  1045
    6  Mar-14  aaa  4417
    7  Apr-14  bbb  1178
    8  Apr-14  ccc  1192
    9  Apr-14  aaa  4793
    10 May-14  bbb   916
    ..    ...  ...   ...

I want to use dplyr to calculate the percentage of each type (aaa, bbb, ccc) at a month level i.e.

       month   type  count  per
    1  Feb-14  bbb   341    9.6%
    2  Feb-14  ccc   527    14.87%
    3  Feb-14  aaa  2674    ..
    ..    ...  ...   ...

I've tried

    data %>%
      group_by(month, type) %>%
      summarise(count / sum(count))

This gives a 1 as each value. How do I make the sum(count) sum across all the types in the month?
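
Grouping by both `month` and `type` leaves one row per group, so each `count` is divided by itself. A sketch of one common fix, grouping by `month` only and keeping all rows, is below. It rebuilds the first nine rows of the data above, and `per` is returned as a fraction rather than a formatted percentage. The `dplyr` part is guarded so it only runs when `dplyr` is installed.

```r
data <- data.frame(
  month = rep(c("Feb-14", "Mar-14", "Apr-14"), each = 3),
  type  = rep(c("bbb", "ccc", "aaa"), times = 3),
  count = c(341, 527, 2674, 811, 1045, 4417, 1178, 1192, 4793)
)

# Base R: ave() applies the function within each month and keeps row order
data$per <- ave(data$count, data$month, FUN = function(x) x / sum(x))
print(data)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Group by month only, then divide each count by its month total
  by_month <- dplyr::mutate(
    dplyr::group_by(data, month),
    per = count / sum(count)
  )
  print(by_month)
}
```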

Finding percentage in a sub-group using group_by and summarise

So, if one wishes to apply an operation row by row in dplyr, one can use the `rowwise` function, for example: https://stackoverflow.com/questions/21818181/applying-a-function-to-every-row-of-a-table-using-dplyr/24728107#24728107

Is there an `unrowwise` function which you can use to stop doing operations row by row? Currently, it seems adding a `group_by` after the `rowwise` removes row operations, e.g.

    data.frame(a=1:4) %>% rowwise() %>% group_by(a)
    # ...
    # Warning message:
    # Grouping rowwise data frame strips rowwise nature 

Does this mean one should use `group_by(1)` if you wish to explicitly remove `rowwise`?
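
For what it's worth, `ungroup()` is, as far as I know, the usual way to drop the row-wise behaviour. A minimal sketch, guarded so it only runs when `dplyr` is installed:

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  rw <- dplyr::rowwise(data.frame(a = 1:4))
  print(inherits(rw, "rowwise_df"))      # the data frame is row-wise here

  # ungroup() returns an ordinary tibble without the row-wise class
  plain <- dplyr::ungroup(rw)
  print(inherits(plain, "rowwise_df"))
}
```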


How does one stop using rowwise in dplyr?

If I have a large data frame (hundreds and hundreds of columns) whose column names are in no particular alphabetical order:

    df.x <- data.frame(2:11, 1:10, rnorm(10))
    colnames(df.x) <- c("ID", "string", "delta")

How would I reorder the columns, each with its data, alphabetically by column name?

Essentially, I have hundreds of CSV (sep="|") text files whose columns I need to read into a single data frame, order those columns alphabetically, and then use some other dplyr functions to get a final result. I have all of this figured out except how to order the columns alphabetically. I do not want to sort the values within the columns (up and down) alphabetically; rather, I want to rearrange the column names and their corresponding data. It is analogous to cutting and pasting entire columns of data in Excel.

For example, I reviewed this approach, but it sorts the rows alphabetically, which is not what I'm looking to do.

https://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns-in-r?rq=1

Thanks!
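
A minimal base R sketch of reordering whole columns by name, using the example data frame from above. The `set.seed()` call and the `tolower()` wrapper (to make the ordering case-insensitive regardless of locale) are my additions.

```r
set.seed(1)  # only to make the random column reproducible
df.x <- data.frame(2:11, 1:10, rnorm(10))
colnames(df.x) <- c("ID", "string", "delta")

# Reorder whole columns by name; each column keeps its own data
df.sorted <- df.x[, order(tolower(colnames(df.x)))]
print(colnames(df.sorted))   # "delta" "ID" "string"
```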

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | emehex | View Question on Stackoverflow |
| Solution 1 - R | JeffZheng | View Answer on Stackoverflow |
| Solution 2 - R | shacke | View Answer on Stackoverflow |
| Solution 3 - R | emehex | View Answer on Stackoverflow |
| Solution 4 - R | Anya Sti | View Answer on Stackoverflow |
