What is the most useful R trick?

R Problem Overview


In order to share some more tips and tricks for R, what is your single-most useful feature or trick? Clever vectorization? Data input/output? Visualization and graphics? Statistical analysis? Special functions? The interactive environment itself?

One item per post, and we will see if we get a winner by means of votes.

[Edit 25-Aug 2008]: So after one week, it seems that the simple str() won the poll. As I like to recommend that one myself, it is an easy answer to accept.

R Solutions


Solution 1 - R

str() tells you the structure of any object.
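
For example, with the built-in iris data frame:

R> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.2 0.2 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

One line gives you the dimensions, each column's type, and a preview of its values.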

Solution 2 - R

One very useful function I often use is dput(), which allows you to dump an object in the form of R code.

# Use the iris data set
R> data(iris)
# dput of a numeric vector
R> dput(iris$Petal.Length)
c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 
1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1, 1.7, 1.9, 
1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 
1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 
4.5, 4.9, 4, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4, 4.7, 
3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4, 4.9, 4.7, 4.3, 4.4, 4.8, 
5, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4, 
4.4, 4.6, 4, 3.3, 4.2, 4.2, 4.2, 4.3, 3, 4.1, 6, 5.1, 5.9, 5.6, 
5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5, 5.1, 5.3, 5.5, 
6.7, 6.9, 5, 5.7, 4.9, 6.7, 4.9, 5.7, 6, 4.8, 4.9, 5.6, 5.8, 
6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 
5.9, 5.7, 5.2, 5, 5.2, 5.4, 5.1)
# dput of factor levels
R> dput(levels(iris$Species))
c("setosa", "versicolor", "virginica")

It can be very useful to post easily reproducible data chunks when you ask for help, or to edit or reorder the levels of a factor.

Solution 3 - R

head() and tail() to get the first and last parts of a data frame, vector, matrix, function, etc. Especially with large data frames, this is a quick way to check that the data have loaded OK.
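
For example, with the built-in iris data:

R> head(iris, 3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
R> tail(iris, 3)    # same idea, from the other end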

Solution 4 - R

One nice feature: Reading data uses connections which can be local files, remote files accessed via http, pipes from other programs or more.

As a simple example, consider fetching N=10 random integers between min=100 and max=200 from random.org (which supplies true random numbers based on atmospheric noise rather than a pseudo-random number generator):

R> site <- "http://random.org/integers/"         # base URL
R> query <- "num=10&min=100&max=200&col=2&base=10&format=plain&rnd=new"
R> txt <- paste(site, query, sep="?")            # concat url and query string
R> nums <- read.table(file=txt)                  # and read the data
R> nums                                          # and show it
   V1  V2
1 165 143
2 107 118
3 103 132
4 191 100
5 138 185
R>

As an aside, the random package provides several convenience functions for accessing random.org.

Solution 5 - R

I find I am using with() and within() more and more. No more $ littering my code, and no need to attach objects to the search path. More seriously, I find with() etc. make the intention of my data analysis scripts much clearer.

> df <- data.frame(A = runif(10), B = rnorm(10))
> A <- 1:10 ## something else hanging around...
> with(df, A + B) ## I know this will use A in df!
 [1]  0.04334784 -0.40444686  1.99368816  0.13871605 -1.17734837
 [6]  0.42473812  2.33014226  1.61690799  1.41901860  0.8699079

with() sets up an environment within which the R expression is evaluated. within() does the same thing but allows you to modify the data object used to create the environment.

> df <- within(df, C <- rpois(10, lambda = 2))
> head(df)
           A          B C
1 0.62635571 -0.5830079 1
2 0.04810539 -0.4525522 1
3 0.39706979  1.5966184 3
4 0.95802501 -0.8193090 2
5 0.76772541 -1.9450738 2
6 0.21335006  0.2113881 4

Something I didn't realise when I first used within() is that you have to make an assignment inside the evaluated expression and assign the returned object (as above) to get the desired effect.

Solution 6 - R

Data Input trick = RGoogleDocs package

http://www.omegahat.org/RGoogleDocs/

I have found Google spreadsheets to be a fantastic way for all collaborators to be on the same page. Furthermore, Google Forms allows one to capture data from respondents and effortlessly write it to a Google spreadsheet. Since data change frequently and are almost never final, it is far preferable for R to read a Google spreadsheet directly than to futz with downloading CSV files and reading them in.

# Get data from a Google spreadsheet
library(RGoogleDocs)
ps <- readline(prompt = "get the password in ")
auth <- getGoogleAuth("[email protected]", ps, service = "wise")
sheets.con <- getGoogleDocsConnection(auth)
ts2 <- getWorksheets("Data Collection Repos", sheets.con)
names(ts2)
init.consent <- sheetAsMatrix(ts2$Sheet1, header = TRUE, as.data.frame = TRUE, trim = TRUE)

I cannot remember which, but one or two of the following commands take several seconds:

  1. getGoogleAuth

  2. getGoogleDocsConnection

  3. getWorksheets

Solution 7 - R

Use backticks to reference non-standard names.

> df <- data.frame(x=rnorm(5),y=runif(5))
> names(df) <- 1:2
> df
           1         2
1 -1.2035003 0.6989573
2 -1.2146266 0.8272276
3  0.3563335 0.0947696
4 -0.4372646 0.9765767
5 -0.9952423 0.6477714
> df$1
Error: unexpected numeric constant in "df$1"
> df$`1`
[1] -1.2035003 -1.2146266  0.3563335 -0.4372646 -0.9952423

In this case, df[,"1"] would also work. But backticks work inside formulas!

> lm(`2`~`1`,data=df)

Call:
lm(formula = `2` ~ `1`, data = df)

Coefficients:
(Intercept)          `1`  
     0.4087      -0.3440  

[Edit] Dirk asks why one would give invalid names. I don't know! But I certainly encounter this problem in practice fairly often. For example, using hadley's reshape package:

> library(reshape)
> df$z <- c(1,1,2,2,2)
> recast(df,z~.,id.var="z")
Aggregation requires fun.aggregate: length used as default
  z (all)
1 1     4
2 2     6
> recast(df,z~.,id.var="z")$(all)
Error: unexpected '(' in "recast(df,z~.,id.var="z")$("
> recast(df,z~.,id.var="z")$`(all)`
Aggregation requires fun.aggregate: length used as default
[1] 4 6

Solution 8 - R

I don't know how well known this is, but something I've definitely taken advantage of is the pass-by-reference behaviour of environments.

zz <- new.env()
zz$foo <- c(1, 2, 3, 4, 5)
changer <- function(blah) {
   blah$foo <- 5    # environments are not copied on modify: this changes zz itself
}
changer(zz)
zz$foo              # now 5, not c(1, 2, 3, 4, 5)

This toy example doesn't show why it would be useful, but if you're passing large objects around, it can help.

Solution 9 - R

My new favorite thing is the foreach package. It lets you do all of the nice apply things, but with a somewhat easier syntax:

library(foreach)
x <- runif(100)   # example data, so the snippet runs as-is
list_powers <- foreach(i = 1:100) %do% {
  x[i]^i
}

The best part is that if you are doing something that actually requires a significant amount of time, you can switch from %do% to %dopar% (with the appropriate backend library) to instantly parallelize, even across a cluster. Very slick.
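
For instance, here is a minimal sketch of the parallel version, assuming the doParallel backend package is installed and x is defined as above:

library(doParallel)
registerDoParallel(cores = 2)                 # register a parallel backend
list_powers <- foreach(i = 1:100) %dopar% {   # %do% -> %dopar% is the only change
  x[i]^i
}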

Solution 10 - R

I do a lot of basic manipulation of data, so here are two built-in functions (transform, subset) and one package (sqldf) that I use daily.

create sample sales data

sales <- expand.grid(country = c('USA', 'UK', 'FR'),
                     product = c(1, 2, 3))
sales$revenue <- rnorm(dim(sales)[1], mean=100, sd=10)

> sales
  country product   revenue
1     USA       1 108.45965
2      UK       1  97.07981
3      FR       1  99.66225
4     USA       2 100.34754
5      UK       2  87.12262
6      FR       2 112.86084
7     USA       3  95.87880
8      UK       3  96.43581
9      FR       3  94.59259

use transform() to add a column

## transform currency to euros
usd2eur <- 1.434
transform(sales, euro = revenue * usd2eur)

>
  country product   revenue     euro
1     USA       1 108.45965 155.5311
2      UK       1  97.07981 139.2125
3      FR       1  99.66225 142.9157
...

use subset() to slice the data

subset(sales, 
       country == 'USA' & product %in% c(1, 2), 
       select = c('product', 'revenue'))

>
  product  revenue
1       1 108.4597
4       2 100.3475

use sqldf() to slice and aggregate with SQL

The sqldf package provides an SQL interface to R data frames

##  recast the previous subset() expression in SQL
# R strings can span multiple lines, so no continuation characters are needed
sqldf("SELECT product, revenue FROM sales
       WHERE country = 'USA'
       AND product IN (1, 2)")

>
  product  revenue
1       1 108.4597
2       2 100.3475

Perform an aggregation or GROUP BY

sqldf("SELECT country, SUM(revenue) AS revenue
       FROM sales
       GROUP BY country")

>
  country  revenue
1      FR 307.1157
2      UK 280.6382
3     USA 304.6860

For more sophisticated map-reduce-like functionality on data frames, check out the plyr package. And if you find yourself wanting to pull your hair out, I recommend checking out Data Manipulation with R.

Solution 11 - R

?ave

Subsets of x[] are averaged, where each subset consists of those observations with the same factor levels. Usage: ave(x, ..., FUN = mean)

I use it all the time. (e.g. in this answer here on SO)
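
For example, a quick sketch with the built-in iris data: ave() returns the group mean aligned with every original row, which is handy for adding a group-level column.

R> means <- with(iris, ave(Sepal.Length, Species, FUN = mean))
R> means[c(1, 51, 101)]    # one row per species, each carrying its group mean
[1] 5.006 5.936 6.588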

Solution 12 - R

A way to speed up code and eliminate for loops.

Instead of a for loop that walks through a data frame looking for values, just assign to the subset of the data frame with those values; it is much quicker.

So instead of:

for (i in 1:nrow(df)) {
  if (df$column1[i] == x) {
    df$column2[i] <- y   # or any other similar per-row code
  }
}

do something like this:

df$column2[df$column1 == x] <- y

That basic concept is applicable extremely often and is a great way to get rid of for loops.

Solution 13 - R

Sometimes you need to rbind multiple data frames. do.call() will let you do that (someone had to explain this to me when I asked this question, as it doesn't appear to be an obvious use).

foo <- list()

foo[[1]] <- data.frame(a=1:5, b=11:15)
foo[[2]] <- data.frame(a=101:105, b=111:115)
foo[[3]] <- data.frame(a=200:210, b=300:310)
    
do.call(rbind, foo)

Solution 14 - R

In R programming (not interactive sessions), I use if (bad.condition) stop("message") a lot. Every function starts with a few of these, and as I work through computations, I pepper these in, too. I guess I got into the habit from using assert() in C. The benefits are two-fold. First, it's a lot faster to get working code with these checks in place. Second, and probably more important, it is a lot easier to work with existing code when you see these checks on every screen in your editor. You won't have to wonder whether x>0, or trust a comment stating that it is ... you'll know, from a glance, that it is.
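
A minimal sketch of the habit (the function and its checks are invented for illustration):

safe_rate <- function(successes, trials) {
  if (!is.numeric(successes) || !is.numeric(trials)) stop("inputs must be numeric")
  if (any(trials <= 0)) stop("trials must be positive")
  if (any(successes > trials)) stop("successes cannot exceed trials")
  successes / trials
}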

PS. my first post here. Be gentle!

Solution 15 - R

The traceback() function is a must when you have an error somewhere and do not readily understand it. It will print a trace of the call stack, which is very helpful as R is not very verbose by default.

Then setting options(error=recover) will allow you to "enter" the function raising the error and try to understand exactly what happens, as if you had full control over it and could put a browser() in it.

These three functions (traceback(), recover(), and browser()) can really help in debugging your code.
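
A minimal sketch of the workflow (the functions here are invented for illustration):

f <- function(x) g(x)
g <- function(x) log(x)
f("oops")                  # Error in log(x) : non-numeric argument to mathematical function
traceback()                # prints the call stack: 2: g(x), then 1: f("oops")
options(error = recover)   # on the next error, pick a frame and browse it like browser()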

Solution 16 - R

I'm really surprised no one has posted about apply, tapply, lapply, and sapply. A general rule I use when doing stuff in R is that if I have a for loop that is doing data processing or simulations, I try to factor it out and replace it with an *apply. Some people shy away from the *apply functions because they think only single-parameter functions can be passed in. Nothing could be further from the truth! Like passing around functions with parameters as first-class objects in JavaScript, you do this in R with anonymous functions. For example:

 > sapply(rnorm(100, 0, 1), round)
  [1]  1  1  0  1  1 -1 -2  0  2  2 -2 -1  0  1 -1  0  1 -1  0 -1  0  0  0  0  0
 [26]  2  0 -1 -2  0  0  1 -1  1  5  1 -1  0  1  1  1  2  0 -1  1 -1  1  0 -1  1
 [51]  2  1  1 -2 -1  0 -1  2 -1  1 -1  1 -1  0 -1 -2  1  1  0 -1 -1  1  1  2  0
 [76]  0  0  0 -2 -1  1  1 -2  1 -1  1  1  1  0  0  0 -1 -3  0 -1  0  0  0  1  1


> sapply(rnorm(100, 0, 1), round(x, 2)) # How can we pass a parameter?
Error in match.fun(FUN) : object 'x' not found


# Wrap your function call in an anonymous function to use parameters
> sapply(rnorm(100, 0, 1), function(x) {round(x, 2)})
  [1] -0.05 -1.74 -0.09 -1.23  0.69 -1.43  0.76  0.55  0.96 -0.47 -0.81 -0.47
 [13]  0.27  0.32  0.47 -1.28 -1.44 -1.93  0.51 -0.82 -0.06 -1.41  1.23 -0.26
 [25]  0.22 -0.04 -2.17  0.60 -0.10 -0.92  0.13  2.62  1.03 -1.33 -1.73 -0.08
 [37]  0.45 -0.93  0.40  0.05  1.09 -1.23 -0.35  0.62  0.01 -1.08  1.70 -1.27
 [49]  0.55  0.60 -1.46  1.08 -1.88 -0.15  0.21  0.06  0.53 -1.16 -2.13 -0.03
 [61]  0.33 -1.07  0.98  0.62 -0.01 -0.53 -1.17 -0.28 -0.95  0.71 -0.58 -0.03
 [73] -1.47 -0.75 -0.54  0.42 -1.63  0.05 -1.90  0.40 -0.01  0.14 -1.58  1.37
 [85] -1.00 -0.90  1.69 -0.11 -2.19 -0.74  1.34 -0.75 -0.51 -0.99 -0.36 -1.63
 [97] -0.98  0.61  1.01  0.55

# Note that anonymous functions aren't being called, but being passed.
> function() {print('hello #rstats')}()
function() {print('hello #rstats')}()
> a = function() {print('hello #rstats')}
> a
function() {print('hello #rstats')}
> a()
[1] "hello #rstats"

(For those that follow #rstats, I also posted this there).

Remember: use apply, sapply, lapply, tapply, and do.call! Take advantage of R's vectorization. You should never walk up to a bunch of R code and see:

N = 10000
l = numeric()
for (i in seq(1:N)) {
    sim <- rnorm(1, 0, 1)
    l <- rbind(l, sim)
}

Not only is this not vectorized, but the array structure in R is not grown as it is in Python (doubling size when space runs out, IIRC). So each rbind step must first grow l enough to accept the result from rbind(), then copy over all of the previous l's contents. For fun, try the above in R. Notice how long it takes (you won't even need Rprof or any timing function). Then try

N=10000
l <- rnorm(N, 0, 1)

The following is better than the first version too:

N <- 10000
l <- numeric(N)          # pre-allocate the result vector
for (i in seq_len(N)) {
    sim <- rnorm(1, 0, 1)
    l[i] <- sim
}

Solution 17 - R

Upon Dirk's advice, I am posting single examples. I hope they are not too "cute" [clever, but I don't care] or trivial for this audience.

Linear models are the bread and butter of R. When the number of independent variables is high, one has two choices. The first is to use lm.fit(), which receives the design matrix x and the response y as arguments, similarly to Matlab. The drawback to this approach is that the return value is a list of objects (fitted coefficients, residuals, etc.), not an object of class "lm", which can be nicely summarized, used for prediction, stepwise selection, etc. The second approach is to create a formula:

> A
           X1         X2          X3         X4         y
1  0.96852363 0.33827107 0.261332257 0.62817021 1.6425326
2  0.08012755 0.69159828 0.087994158 0.93780481 0.9801304
3  0.10167545 0.38119304 0.865209832 0.16501662 0.4830873
4  0.06699458 0.41756415 0.258071616 0.34027775 0.7508766
   ...

> (f=paste("y ~",paste(names(A)[1:4],collapse=" + ")))
[1] "y ~ X1 + X2 + X3 + X4"

> lm(formula(f),data=A)

Call:
lm(formula = formula(f), data = A)

Coefficients:
(Intercept)           X1           X2           X3           X4  
    0.78236      0.95406     -0.06738     -0.43686     -0.06644  
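
As an aside, base R's reformulate() builds the same formula object directly:

> f2 <- reformulate(names(A)[1:4], response = "y")
> f2
y ~ X1 + X2 + X3 + X4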

Solution 18 - R

You can assign the value returned by an if-else block.

Instead of, e.g.

condition <- runif(1) > 0.5
if(condition) x <- 1 else x <- 2

you can do

x <- if(condition) 1 else 2

Exactly how this works is deep magic, but it comes down to the fact that in R, if is an expression that returns the value of whichever branch is evaluated.

Solution 19 - R

As a total noob to R and a novice at stats, I love unclass() to print all elements of a data frame as an ordinary list.

It's pretty handy for looking at a complete data set all in one go, to quickly eyeball any potential issues.
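
For instance, a quick illustration with the built-in iris data:

unclass(head(iris, 2))   # the same rows, printed as the underlying named list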

Solution 20 - R

CrossTable() from the gmodels package provides easy access to SAS- and SPSS-style crosstabs, along with the usual tests (Chisq, McNemar, etc.). Basically, it's xtabs() with fancy output and some additional tests - but it does make sharing output with the heathens easier.
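
A quick sketch, assuming the gmodels package is installed (the built-in infert data is used only for illustration):

library(gmodels)
CrossTable(infert$education, infert$case, chisq = TRUE)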

Solution 21 - R

Definitely system(). Being able to access all the Unix tools (at least under Linux/Mac OS X) from inside the R environment has rapidly become invaluable in my daily workflow.
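
For example (assuming a Unix-like shell; the CSV file name is made up):

system("wc -l mydata.csv")               # run a command for its side effect
files <- system("ls -1", intern = TRUE)  # or capture its output as a character vector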

Solution 22 - R

Here is an annoying workaround to convert a factor into a numeric (similar tricks apply to other data types as well):

old.var <- as.numeric(levels(old.var))[as.numeric(old.var)]
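
To see why the workaround is needed, a minimal sketch:

f <- factor(c("3.1", "2.5", "3.1"))
as.numeric(f)                          # wrong: returns the internal level codes 2 1 2
as.numeric(levels(f))[as.numeric(f)]   # right: 3.1 2.5 3.1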

Solution 23 - R

Although this question has been up for a while, I recently discovered a great trick on the SAS and R blog (http://sas-and-r.blogspot.com/2010/01/example-721-write-function-to-simulate.html) for using the command cut(). The command divides data into categories; I will use the iris dataset as an example and divide it into 10 categories:

> irisSL <- iris$Sepal.Length
> str(irisSL)
 num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
> cut(irisSL, 10)
  [1] (5.02,5.38] (4.66,5.02] (4.66,5.02] (4.3,4.66]  (4.66,5.02] (5.38,5.74] (4.3,4.66]  (4.66,5.02] (4.3,4.66]  (4.66,5.02]
 [11] (5.38,5.74] (4.66,5.02] (4.66,5.02] (4.3,4.66]  (5.74,6.1]  (5.38,5.74] (5.38,5.74] (5.02,5.38] (5.38,5.74] (5.02,5.38]
 [21] (5.38,5.74] (5.02,5.38] (4.3,4.66]  (5.02,5.38] (4.66,5.02] (4.66,5.02] (4.66,5.02] (5.02,5.38] (5.02,5.38] (4.66,5.02]
 [31] (4.66,5.02] (5.38,5.74] (5.02,5.38] (5.38,5.74] (4.66,5.02] (4.66,5.02] (5.38,5.74] (4.66,5.02] (4.3,4.66]  (5.02,5.38]
 [41] (4.66,5.02] (4.3,4.66]  (4.3,4.66]  (4.66,5.02] (5.02,5.38] (4.66,5.02] (5.02,5.38] (4.3,4.66]  (5.02,5.38] (4.66,5.02]
 [51] (6.82,7.18] (6.1,6.46]  (6.82,7.18] (5.38,5.74] (6.46,6.82] (5.38,5.74] (6.1,6.46]  (4.66,5.02] (6.46,6.82] (5.02,5.38]
 [61] (4.66,5.02] (5.74,6.1]  (5.74,6.1]  (5.74,6.1]  (5.38,5.74] (6.46,6.82] (5.38,5.74] (5.74,6.1]  (6.1,6.46]  (5.38,5.74]
 [71] (5.74,6.1]  (5.74,6.1]  (6.1,6.46]  (5.74,6.1]  (6.1,6.46]  (6.46,6.82] (6.46,6.82] (6.46,6.82] (5.74,6.1]  (5.38,5.74]
 [81] (5.38,5.74] (5.38,5.74] (5.74,6.1]  (5.74,6.1]  (5.38,5.74] (5.74,6.1]  (6.46,6.82] (6.1,6.46]  (5.38,5.74] (5.38,5.74]
 [91] (5.38,5.74] (5.74,6.1]  (5.74,6.1]  (4.66,5.02] (5.38,5.74] (5.38,5.74] (5.38,5.74] (6.1,6.46]  (5.02,5.38] (5.38,5.74]
[101] (6.1,6.46]  (5.74,6.1]  (6.82,7.18] (6.1,6.46]  (6.46,6.82] (7.54,7.9]  (4.66,5.02] (7.18,7.54] (6.46,6.82] (7.18,7.54]
[111] (6.46,6.82] (6.1,6.46]  (6.46,6.82] (5.38,5.74] (5.74,6.1]  (6.1,6.46]  (6.46,6.82] (7.54,7.9]  (7.54,7.9]  (5.74,6.1] 
[121] (6.82,7.18] (5.38,5.74] (7.54,7.9]  (6.1,6.46]  (6.46,6.82] (7.18,7.54] (6.1,6.46]  (5.74,6.1]  (6.1,6.46]  (7.18,7.54]
[131] (7.18,7.54] (7.54,7.9]  (6.1,6.46]  (6.1,6.46]  (5.74,6.1]  (7.54,7.9]  (6.1,6.46]  (6.1,6.46]  (5.74,6.1]  (6.82,7.18]
[141] (6.46,6.82] (6.82,7.18] (5.74,6.1]  (6.46,6.82] (6.46,6.82] (6.46,6.82] (6.1,6.46]  (6.46,6.82] (6.1,6.46]  (5.74,6.1] 
10 Levels: (4.3,4.66] (4.66,5.02] (5.02,5.38] (5.38,5.74] (5.74,6.1] (6.1,6.46] (6.46,6.82] (6.82,7.18] ... (7.54,7.9]


Solution 24 - R

Another trick. Some packages, like glmnet, only take as inputs the design matrix and the response variable. If one wants to fit a model with all interactions between features, she can't use the formula y ~ .^2. Using expand.grid() allows us to take advantage of the powerful array indexing and vector operations of R:

interArray <- function(X) {
    n <- ncol(X)
    ind <- expand.grid(1:n, 1:n)    # all ordered pairs of column indices
    X[, ind[, 1]] * X[, ind[, 2]]   # columnwise products = pairwise interactions
}

> X
          X1         X2
1 0.96852363 0.33827107
2 0.08012755 0.69159828
3 0.10167545 0.38119304
4 0.06699458 0.41756415
5 0.08187816 0.09805104

> interArray(X)
           X1          X2        X1.1        X2.1
1 0.938038022 0.327623524 0.327623524 0.114427316
2 0.006420424 0.055416073 0.055416073 0.478308177
3 0.010337897 0.038757974 0.038757974 0.145308137
4 0.004488274 0.027974536 0.027974536 0.174359821
5 0.006704033 0.008028239 0.008028239 0.009614007

Solution 25 - R

One of my favorite, if somewhat unorthodox, tricks is the use of eval() and parse(). This example perhaps illustrates how it can be helpful:

NY.Capital <- 'Albany'
state <- 'NY'
parameter <- 'Capital'
eval(parse(text=paste(state, parameter, sep='.')))

[1] "Albany"

This type of situation occurs more often than not, and use of eval() and parse() can help address it. Of course, I welcome any feedback on alternative ways of coding this up.
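
One common alternative, sketched here, is get(), which looks up an object by its name without parsing any code:

get(paste(state, parameter, sep='.'))

[1] "Albany"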

Solution 26 - R

set.seed() sets the random number generator state.

For example:

> set.seed(123)
> rnorm(1)
[1] -0.5604756
> rnorm(1)
[1] -0.2301775
> set.seed(123)
> rnorm(1)
[1] -0.5604756

Solution 27 - R

For those who are writing C to be called from R: .Internal(inspect(...)) is handy. For example:

> .Internal(inspect(quote(a+2)))
  @867dc28 06 LANGSXP g0c0 [] 
  @8436998 01 SYMSXP g1c0 [MARK,gp=0x4000] "+"
  @85768b0 01 SYMSXP g1c0 [MARK,NAM(2)] "a"
  @8d7bf48 14 REALSXP g0c1 [] (len=1, tl=0) 2

Solution 28 - R

d <- '~/R Code/Library/'

files <- list.files(d, '\\.r$')   # '\\.r$' matches a literal ".r" suffix

for (f in files) {
  if (f != 'mysource.r') {
    print(paste('Sourcing', f))
    source(paste(d, f, sep = ''))
  }
}

I keep the above code in mysource.r and use it to source all the files in that directory at startup; they contain various utility programs I use in my interactive sessions with R. I am sure there are better ways, but I find it useful for my work. The line that does this is as follows.

source("~/R Code/Library/mysource.r")

Solution 29 - R

To perform an operation on a number of variables in a data frame, here is a trick stolen from subset.data.frame:

get.vars <- function(vars, data) {
    nl <- as.list(1L:ncol(data))
    names(nl) <- names(data)                             # map column names to positions
    vars <- eval(substitute(vars), nl, parent.frame())   # evaluate unquoted names/ranges
    data[, vars]
    # do stuff here
}

get.vars(c(cyl:hwy, class), mpg)   # mpg is the example data set from ggplot2

Solution 30 - R

I've posted this once before, but I use it so much I thought I'd post it again. It's just a little function to return the names and position numbers of a data.frame. It's nothing special to be sure, but I almost never make it through a session without using it multiple times.

##creates an object from a data.frame listing the column names and location

namesind <- function(df) {
  temp1 <- names(df)
  temp2 <- seq_along(temp1)
  temp3 <- data.frame(temp1, temp2)
  names(temp3) <- c("VAR", "COL")
  temp3
}

ni <- namesind
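
For example:

> ni(iris)
           VAR COL
1 Sepal.Length   1
2  Sepal.Width   2
3 Petal.Length   3
4  Petal.Width   4
5      Species   5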

Solution 31 - R

I mention this one because there is a distinct lack of examples using it on SO.

The new(ish) formula interface to aggregate() makes it much more flexible and useful than the old list syntax. It keeps the name of the aggregated variable, is more compact, and allows you to aggregate multiple variables at the same time, including dot notation on either side of the formula.

use

newdf <- aggregate( cbind(rt, acc) ~ x + y + subj, olddf, mean )

instead of...

newdf <- with( olddf, aggregate( rt, list(x = x, y = y, subj = subj), mean ))
names(newdf)[4] <- 'rt'
newdf$acc <- with( olddf, 
                   aggregate( acc, list(x = x, y = y, subj = subj), mean ))[,4]

Perhaps as a bit of a side note, check out the aggregate.data.frame examples in ?aggregate as well. The function does a lot of things people don't know about.

Solution 32 - R

As a recent R addict, I love ?function_name and use it all the time.

-k

Solution 33 - R

It seems I cannot comment (maybe it has to do with this "reputation" business).

Anyway, further to the RGoogleDocs tips above:

ps <-readline(prompt="get the password in ")

This won't work from within Emacs, which I like to use for R, with ESS of course.

On Linux, you can use zenity to get the password from user input, and set it to hide the input, so as an additional benefit, your password is not plaintext on your screen:

mypass <- system("zenity --entry --hide-text",intern=TRUE)

Solution 34 - R

I like the expressiveness of the language, e.g. selects:

df[df$col > something, c('col2', 'col3')]

tapply:

tapply(df$col_value, df$col_factor, some_function)

etc.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Dirk Eddelbuettel | View Question on Stackoverflow
Solution 1 - R | hadley | View Answer on Stackoverflow
Solution 2 - R | juba | View Answer on Stackoverflow
Solution 3 - R | Rob Hyndman | View Answer on Stackoverflow
Solution 4 - R | Dirk Eddelbuettel | View Answer on Stackoverflow
Solution 5 - R | Gavin Simpson | View Answer on Stackoverflow
Solution 6 - R | Farrel | View Answer on Stackoverflow
Solution 7 - R | Eduardo Leoni | View Answer on Stackoverflow
Solution 8 - R | geoffjentry | View Answer on Stackoverflow
Solution 9 - R | JAShapiro | View Answer on Stackoverflow
Solution 10 - R | medriscoll | View Answer on Stackoverflow
Solution 11 - R | Eduardo Leoni | View Answer on Stackoverflow
Solution 12 - R | Dan | View Answer on Stackoverflow
Solution 13 - R | andrewj | View Answer on Stackoverflow
Solution 14 - R | dank | View Answer on Stackoverflow
Solution 15 - R | Calimo | View Answer on Stackoverflow
Solution 16 - R | Vince | View Answer on Stackoverflow
Solution 17 - R | gappy | View Answer on Stackoverflow
Solution 18 - R | Richie Cotton | View Answer on Stackoverflow
Solution 19 - R | John | View Answer on Stackoverflow
Solution 20 - R | Matt Parker | View Answer on Stackoverflow
Solution 21 - R | Paolo | View Answer on Stackoverflow
Solution 22 - R | Ryan R. Rosario | View Answer on Stackoverflow
Solution 23 - R | Stedy | View Answer on Stackoverflow
Solution 24 - R | gappy | View Answer on Stackoverflow
Solution 25 - R | andrewj | View Answer on Stackoverflow
Solution 26 - R | Christopher DuBois | View Answer on Stackoverflow
Solution 27 - R | Joshua Ulrich | View Answer on Stackoverflow
Solution 28 - R | mcheema | View Answer on Stackoverflow
Solution 29 - R | Ian Fellows | View Answer on Stackoverflow
Solution 30 - R | kpierce8 | View Answer on Stackoverflow
Solution 31 - R | John | View Answer on Stackoverflow
Solution 32 - R | knguyen | View Answer on Stackoverflow
Solution 33 - R | Marianne | View Answer on Stackoverflow
Solution 34 - R | Tomas | View Answer on Stackoverflow