Test for numeric elements in a character string

Regex, R

Regex Problem Overview


I want to test a character vector and see which elements could actually be numeric. I can use a regex to test for integers successfully, but I am looking to see which elements consist only of digits and at most one decimal point. Below is what I've tried:

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r")
!grepl("[^0-9]", x)   #integer test

grepl("[^0-9[\\.{0,1}]]", x)  # I know it's wrong but don't know what to do

I'm looking for a logical output so I'd expect the following results:

[1] TRUE TRUE TRUE TRUE FALSE FALSE
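
For the narrow case described above (digits only, with at most one decimal point), one possible pattern is sketched here. The name `digits_one_decimal` is made up for illustration, and signs, exponents, and a trailing decimal point such as "1." are deliberately not handled.

```r
x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r")

# optional leading digits, an optional single decimal point, then at least one digit
digits_one_decimal <- "^[0-9]*\\.?[0-9]+$"

grepl(digits_one_decimal, x)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```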

Regex Solutions


Solution 1 - Regex

There may be more complicated cases elsewhere in your data that would break this, but my first thought is:

> !is.na(as.numeric(x))
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
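
Because as.numeric() warns when coercion introduces NA, the same test can be wrapped in a small helper that silences the warning. This is only a sketch; `could_be_numeric` is a hypothetical name, not something from the original answer.

```r
x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r")

# TRUE where as.numeric() can convert the string, FALSE where it yields NA
could_be_numeric <- function(s) !is.na(suppressWarnings(as.numeric(s)))

could_be_numeric(x)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE

# as.numeric() also accepts exponents and surrounding whitespace
could_be_numeric(c("1e4", " 3 "))
# [1] TRUE TRUE
```

Note that this accepts everything as.numeric() accepts, which is broader than "digits and at most one decimal point".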

As noted below by Josh O'Brien this won't pick up things like 7L, which the R interpreter would parse as the integer 7. If you needed to include those as "plausibly numeric" one route would be to pick them out with a regex first,

> x <- c("1.2","1e4","1.2.3","5L")
> x
[1] "1.2"   "1e4"   "1.2.3" "5L"   
> grepl("^[[:digit:]]+L",x)
[1] FALSE FALSE FALSE  TRUE

...and then strip the "L" from just those elements using gsub and indexing.
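
A minimal sketch of that last step follows. It assumes the pattern is anchored with `$` so that only a trailing "L" counts, and it uses sub() rather than gsub() because there is only one "L" to remove; `is_int_literal` is a name chosen for illustration.

```r
x <- c("1.2", "1e4", "1.2.3", "5L")

# flag elements that look like R integer literals (digits followed by L)
is_int_literal <- grepl("^[[:digit:]]+L$", x)

# strip the trailing L from just those elements
x[is_int_literal] <- sub("L$", "", x[is_int_literal])
x
# [1] "1.2"   "1e4"   "1.2.3" "5"

# the as.numeric() test now treats the former "5L" as numeric
!is.na(suppressWarnings(as.numeric(x)))
# [1]  TRUE  TRUE FALSE  TRUE
```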

Solution 2 - Regex

I recently encountered a similar problem where I was trying to write a function to format values passed as a character string from another function. The formatted values would ultimately end up in a table and I wanted to create logic to identify NA, character strings, and character representations of numbers so that I could apply sprintf() on them before generating the table.

Although more complicated to read, I do like the robustness of the grepl() approach. I think this gets all of the examples brought up in the comments.

x <- c("0",37,"42","-5","-2.3","1.36e4","4L","La","ti","da",NA)

# anchored with ^ and $ so that strings such as "1.2r" are not accepted on a partial match
y <- grepl("^([-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+)$",x)

This would evaluate to (formatted to help with visualization):

x
[1] "0"  "37"   "42"  "-5"   "-2.3"   "1.36e4" "4L" "La"     "ti"     "da"     NA 

y
[1] TRUE  TRUE   TRUE  TRUE   TRUE     TRUE    TRUE FALSE   FALSE    FALSE    FALSE

The regular expression is TRUE for:

  • positive or negative numbers with no more than one decimal OR
  • positive or negative integers (e.g., 4L) OR
  • positive or negative numbers in scientific notation

Additional terms could be added to handle decimals without a leading digit (e.g., .1) if the dataset contained numbers in poor form. Numbers with a decimal point but no digits after it (e.g., 1.) are already matched by the first alternative.
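
One way such additional terms might look is sketched below. The pattern `num_pattern` is an illustrative extension, not the original answer's regex: it is anchored with ^ and $, accepts a leading + or -, allows a decimal without a leading digit, and allows a signed exponent.

```r
# decimal (with or without leading digit) plus optional exponent, OR integer with L suffix
num_pattern <- "^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?$|^[-+]?[0-9]+L$"

x <- c("0", "-5", "-2.3", ".1", "1.", "1.36e4", "1.36e-4", "4L",
       "2.3.3", "1.2r", "La", NA)

y <- grepl(num_pattern, x)
y
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
```

As in the original, grepl() returns FALSE for NA, so missing values are reported as non-numeric.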

Solution 3 - Regex

Avoid re-inventing the wheel with check.numeric() from package varhandle.

The function accepts the following arguments:

  • v: The character vector or factor vector. (Mandatory)
  • na.rm: logical. Should the function ignore NA? Default value is FALSE since NA can be converted to numeric. (Optional)
  • only.integer: logical. Only check for integers and do not accept floating point. Default value is FALSE. (Optional)
  • exceptions: A character vector containing the strings that should be considered as valid to be converted to numeric. (Optional)
  • ignore.whitespace: logical. Ignore leading and trailing whitespace characters before assessing if the vector can be converted to numeric. Default value is TRUE. (Optional)
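
A minimal usage sketch, assuming the varhandle package is installed; the call is skipped otherwise. Only the `v` argument is supplied, the other arguments keep the defaults described above, and `res` is just an illustrative variable name.

```r
x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r")

res <- NULL
if (requireNamespace("varhandle", quietly = TRUE)) {
  # one logical per element of x: can it be converted to numeric?
  res <- varhandle::check.numeric(v = x)
  print(res)
}
```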

Solution 4 - Regex

Another possibility:

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r", "1.2", "1e4", "1.2.3", "5L", ".22", -3)
locs <- sapply(x, function(n) {

    out <- try(eval(parse(text = n)), silent = TRUE)
    !inherits(out, 'try-error')

}, USE.NAMES = FALSE)

x[locs]
## [1] "0.33" ".1"   "3"    "123"  "1.2"  "1e4"  "5L"   ".22"  "-3"  

x[!locs]
## [1] "2.3.3" "1.2r"  "1.2.3"
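
Note that eval() runs whatever code the string contains, and any expression that evaluates without error (for example "ls()") passes the test above. A variant that only parses the string and checks that the result is a numeric constant is sketched below; `is_numeric_literal` is a hypothetical helper, and allowing a single leading sign is an assumption about what should count as numeric.

```r
is_numeric_literal <- function(s) {
  # parse only; the text is never evaluated
  expr <- tryCatch(parse(text = s, keep.source = FALSE), error = function(e) NULL)
  if (is.null(expr) || length(expr) != 1L) return(FALSE)
  e <- expr[[1]]
  # "-3" parses as a call to unary minus, so unwrap one leading sign
  if (is.call(e) && length(e) == 2L && is.symbol(e[[1]]) &&
      as.character(e[[1]]) %in% c("-", "+")) {
    e <- e[[2]]
  }
  is.numeric(e)
}

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r", "1.2", "1e4", "1.2.3", "5L", ".22", -3)
ok <- vapply(x, is_numeric_literal, logical(1), USE.NAMES = FALSE)

x[ok]
x[!ok]
is_numeric_literal("ls()")
```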

Solution 5 - Regex

Inspired by the answers here, my function trims leading and trailing whitespace, can handle na.strings, and optionally treats NA as numeric-like. The regular expression was enhanced as well. See the help info for details.

#' check if a str obj is actually numeric
#' @description check if a str obj is actually numeric
#' @param x a str vector, or a factor of str vector, or numeric vector. x will be coerced and trimws.
#' @param na.strings case sensitive strings that will be treated as NA.
#' @param naAsTrue whether NA (including actual NA and na.strings) will be treated as numeric-like
#' @return a logical vector (vectorized).
#' @export
#' @note Using regular expression
#' \cr TRUE for any actual numeric c(3,4,5,9.9) or c("-3","+4.4",   "-42","4L","9L",   "1.36e4","1.36E4",    NA, "NA", "","NaN", NaN): 
#' \cr positive or negative numbers with no more than one decimal c("-3","+4.4") OR
#' \cr positive or negative integers (e.g., c("-42","4L","39L")) OR
#' \cr positive or negative numbers in scientific notation c("1.36e4","1.36E4")
#' \cr NA, or na.strings
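is.numeric.like <- function(x,naAsTrue=TRUE,na.strings=c('','.','NA','na','N/A','n/a','NaN','nan')){
    # strip leading and trailing whitespace (the result is a character vector)
    x = trimws(x,'both')
    # map the configured missing-value strings to NA
    x[x %in% na.strings] = NA
    # https://stackoverflow.com/a/21154566/2292993
    # three anchored alternatives: decimal, integer with optional L suffix, scientific notation
    result = grepl("^[\\-\\+]?[0-9]+[\\.]?[0-9]*$|^[\\-\\+]?[0-9]+[L]?$|^[\\-\\+]?[0-9]+[\\.]?[0-9]*[eE][0-9]+$",x,perl=TRUE)
    # grepl() returns FALSE for NA, so optionally count NA as numeric-like
    if (naAsTrue) result = result | is.na(x)
    return(result)
}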

Solution 6 - Regex

You can also use:

readr::parse_number("I am 4526dfkljvdljkvvkv")

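This returns 4526. Note that parse_number() extracts the first number it finds in the string rather than testing whether the whole string is numeric.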

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Tyler Rinker | View Question on Stackoverflow
Solution 1 - Regex | joran | View Answer on Stackoverflow
Solution 2 - Regex | penguinv22 | View Answer on Stackoverflow
Solution 3 - Regex | qwr | View Answer on Stackoverflow
Solution 4 - Regex | Tyler Rinker | View Answer on Stackoverflow
Solution 5 - Regex | Jerry T | View Answer on Stackoverflow
Solution 6 - Regex | SteveS | View Answer on Stackoverflow