What is your preferred style for naming variables in R?
RCoding StyleNaming ConventionsR Problem Overview
Which conventions for naming variables and functions do you favor in R code?
As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:
1. Use of period separator, e.g.
stock.prices <- c(12.01, 10.12)
col.names <- c('symbol','price')
Pros: Has historical precedence in the R community, prevalent throughout the R core, and recommended by Google's R Style Guide.
Cons: Rife with object-oriented connotations, and confusing to R newbies
2. Use of underscores
stock_prices <- c(12.01, 10.12)
col_names <- c('symbol','price')
Pros: A common convention in many programming langs; favored by Hadley Wickham's Style Guide, and used in ggplot2 and plyr packages.
Cons: Not historically used by R programmers; is annoyingly mapped to '<-' operator in Emacs-Speaks-Statistics (alterable with 'ess-toggle-underscore').
3. Use of mixed capitalization (camelCase)
stockPrices <- c(12.01, 10.12)
colNames <- c('symbol','price')
Pros: Appears to have wide adoption in several language communities.
Cons: Has recent precedent, but not historically used (in either R base or its documentation).
Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.
The lack of consistent style across R packages is problematic on several levels. From a developer standpoint, it makes maintaining and extending other's code difficult (esp. where its style is inconsistent with your own). From a R user standpoint, the inconsistent syntax steepens R's learning curve, by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).
R Solutions
Solution 1 - R
Good previous answers so just a little to add here:
-
underscores are really annoying for ESS users; given that ESS is pretty widely used you won't see many underscores in code authored by ESS users (and that set includes a bunch of R Core as well as CRAN authors, excptions like Hadley notwithstanding);
-
dots are evil too because they can get mixed up in simple method dispatch; I believe I once read comments to this effect on one of the R list: dots are a historical artifact and no longer encouraged;
-
so we have a clear winner still standing in the last round: camelCase. I am also not sure if I really agree with the assertion of 'lacking precendent in the R community'.
And yes: pragmatism and consistency trump dogma. So whatever works and is used by colleagues and co-authors. After all, we still have white-space and braces to argue about :)
Solution 2 - R
I did a survey of what naming conventions that are actually used on CRAN that got accepted to the R Journal :) Here is a graph summarizing the results:
Turns out (no surprises perhaps) that lowerCamelCase was most often used for function names and period.separated names most often used for parameters. To use UpperCamelCase, as advocated by Google's R style guide is really rare however, and it is a bit strange that they advocate using that naming convention.
The full paper is here:
http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf
Solution 3 - R
Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run grep("^[^\\.]*$", apropos("_"), value = T)
to see them all.
I use the official Hadley style of coding ;)
Solution 4 - R
I like camelCase when the camel actually provides something meaningful -- like the datatype.
dfProfitLoss, where df = dataframe
or
vdfMergedFiles(), where the function takes in a vector and spits out a dataframe
While I think _ really adds to the readability, there just seems to be too many issues with using .-_ or other characters in names. Especially if you work across several languages.
Solution 5 - R
This comes down to personal preference, but I follow the google style guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable in base R.
Solution 6 - R
As I point out here:
it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...
For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.
Solution 7 - R
As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.
Using dots as a separator gets a little hairy with S3 classes and the like.
In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.
Solution 8 - R
I have a preference for mixedCapitals.
But I often use periods to indicate what the variable type is:
mixedCapitals.mat is a matrix. mixedCapitals.lm is a linear model. mixedCapitals.lst is a list object.
and so on.
Solution 9 - R
Usually I rename my variables using a ix of underscores and a mixed capitalization (camelCase). Simple variables are naming using underscores, example:
PSOE_votes -> number of votes for the PSOE (political group of Spain).
PSOE_states -> Categorical, indicates the state where PSOE wins {Aragon, Andalucia, ...)
PSOE_political_force -> Categorial, indicates the position between political groups of PSOE {first, second, third)
PSOE_07 -> Union of PSOE_votes + PSOE_states + PSOE_political_force at 2007 (header -> votes, states, position)
If my variable is a result of to applied function in one/two Variables I using a mixed capitalization.
Example:
positionXstates <- xtabs(~states+position, PSOE_07)