Merge Multiple spaces to single space; remove trailing/leading spaces

R, Pattern Matching

R Problem Overview


I want to merge multiple spaces into a single space (a space could also be a tab) and remove leading/trailing spaces.

For example...

string <- "Hi        buddy        what's up    Bro" 

to

"Hi buddy what's up bro"

I checked the solution given at https://stackoverflow.com/questions/1981349/regex-to-replace-multiple-spaces-with-a-single-space. Note that I don't want to put \t or \n as a literal space inside the toy string and feed that as the pattern to gsub. I want a solution in R.

Note that I am unable to put multiple spaces in the toy string. Thanks

R Solutions


Solution 1 - R

This seems to meet your needs.

string <- "  Hi buddy   what's up   Bro "
library(stringr)
str_replace(gsub("\\s+", " ", str_trim(string)), "B", "b")
# [1] "Hi buddy what's up bro"

Solution 2 - R

Or simply try the str_squish function from stringr:

library(stringr)
string <- "  Hi buddy   what's up   Bro "
str_squish(string)
# [1] "Hi buddy what's up Bro"

Solution 3 - R

Another approach using a single regex:

gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", string, perl=TRUE)

Explanation (from)

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    [\s]                     any character of: whitespace (\n, \r,
                             \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
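As a rough illustration of the node-by-node explanation above, the sketch below runs the same pattern on two made-up inputs (the variable names and test strings are assumptions, not from the original answer). One thing it suggests: because the look-behind only fires after a whitespace character, the first character of each run is kept as-is, so a run that starts with a tab stays a tab rather than becoming a space.

```r
# Pattern copied from the answer; the inputs below are hypothetical examples.
pattern <- "(?<=[\\s])\\s*|^\\s+|\\s+$"

spaces_only <- "  Hi   buddy   what's up   Bro "
with_tab    <- "Hi\t\tbuddy  what's up"

# perl = TRUE is required because look-behind is a PCRE feature.
res_spaces <- gsub(pattern, "", spaces_only, perl = TRUE)
res_tab    <- gsub(pattern, "", with_tab, perl = TRUE)

print(res_spaces)
# Check whether a tab survived in the second result.
print(grepl("\t", res_tab, fixed = TRUE))
```

If tabs should end up as plain spaces, replacing "\\s+" with " " and trimming afterwards is probably the safer route.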

Solution 4 - R

You do not need to import external libraries to perform such a task:

string <- " Hi        buddy        what's up    Bro "
string <- gsub("\\s+", " ", string)
string <- trimws(string)
string
# [1] "Hi buddy what's up Bro"

Or, in one line:

string <- trimws(gsub("\\s+", " ", string))

Much cleaner.
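If this comes up repeatedly, the one-liner could be wrapped in a small helper. This is only a sketch: squish_ws is a hypothetical name (not a base R function), and the sample vector is made up to include tabs and a newline.

```r
# squish_ws is a hypothetical helper name, not part of base R.
squish_ws <- function(x) {
  # Collapse every run of whitespace (spaces, tabs, newlines) to one space,
  # then strip the single space that may remain at either end.
  trimws(gsub("\\s+", " ", x))
}

# Made-up inputs; gsub() and trimws() are both vectorised over character vectors.
docs <- c(" Hi        buddy \t what's up    Bro ", "\tan  example\nwith   tabs ")
cleaned <- squish_ws(docs)
print(cleaned)
```

Since both functions are vectorised, the helper works on a whole column of strings without an explicit loop.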

Solution 5 - R

The qdapRegex package has the rm_white function to handle this:

library(qdapRegex)
rm_white(string)

## [1] "Hi buddy what's up Bro"

Solution 6 - R

You could also try clean from qdap

library(qdap)
library(stringr)
str_trim(clean(string))
#[1] "Hi buddy what's up Bro"

Or as suggested by @Tyler Rinker (using only qdap)

Trim(clean(string))
#[1] "Hi buddy what's up Bro"

Solution 7 - R

There is no need to load or remember any extra libraries for this, as gsub() from base R does the work.
Remove leading and trailing whitespace with trimws() and replace the extra whitespace using gsub(), as mentioned by @Adam Erickson.

string <- " Hi        buddy        what's up    Bro "
trimws(gsub("\\s+", " ", string))

Here \\s+ matches one or more whitespace characters, and gsub replaces each such run with a single space.
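To make the role of \\s+ concrete, here is a small comparison on a made-up string containing tabs (the variable names are assumptions for illustration). A pattern of literal spaces leaves tabs alone, while \\s+ or the POSIX class [[:space:]]+ treats tabs as whitespace too.

```r
# Hypothetical input mixing tabs and spaces.
string <- "Hi\t\tbuddy   what's\tup"

# " +" only matches runs of the literal space character, so tabs are untouched.
only_spaces <- gsub(" +", " ", string)

# "\\s+" matches runs of any whitespace, including tabs.
any_ws <- gsub("\\s+", " ", string)

# The POSIX character class is an equivalent spelling here.
posix_ws <- gsub("[[:space:]]+", " ", string)

print(only_spaces)
print(any_ws)
print(posix_ws)
```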

To find out what a regular expression is doing, visit this link, as mentioned by @Tyler Rinker.
Just copy and paste the regular expression you want explained and it will do the rest.

Solution 8 - R

Another solution using strsplit:

Split the text into words, then concatenate the individual words using the paste function.

string <- "Hi        buddy        what's up    Bro" 
stringsplit <- sapply(strsplit(string, " "), function(x){x[!x ==""]})
paste(stringsplit ,collapse = " ")

For more than one document:

string <- c("Hi        buddy        what's up    Bro"," an  example using       strsplit ") 
stringsplit <- lapply(strsplit(string, " "), function(x){x[!x ==""]})
sapply(stringsplit ,function(d) paste(d,collapse = " "))
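The code above splits on a single literal space, so tabs would stay inside the words. A possible variant, sketched under the assumption that tabs should count as separators too, splits on "\\s+" instead; the variable names and the sample input with a tab are made up for illustration.

```r
# Hypothetical inputs; the first contains a tab between "buddy" and "what's".
string <- c("Hi        buddy\twhat's up    Bro", " an  example using       strsplit ")

# Split on runs of any whitespace. Leading whitespace still yields one empty
# first element, so empty strings are filtered out with nzchar().
words <- strsplit(string, "\\s+")
joined <- vapply(words,
                 function(w) paste(w[nzchar(w)], collapse = " "),
                 character(1))
print(joined)
```

vapply is used rather than sapply so that the result is guaranteed to be a character vector with one element per input string.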


Solution 9 - R

This seems to work.
It doesn't remove whitespace at the beginning or end of the string, as Rich Scriven's answer does, but it does merge multiple whitespace characters into one.

library("stringr")
string <- "Hi     buddy     what's      up       Bro"
str_replace_all(string, "\\s+", " ")
# [1] "Hi buddy what's up Bro"

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     Original Author   Original Content on Stack Overflow
--------------------------------------------------------------------------------
Question         CKM               View Question on Stack Overflow
Solution 1 - R   Rich Scriven      View Answer on Stack Overflow
Solution 2 - R   Henrik            View Answer on Stack Overflow
Solution 3 - R   Tyler Rinker      View Answer on Stack Overflow
Solution 4 - R   Adam Erickson     View Answer on Stack Overflow
Solution 5 - R   Tyler Rinker      View Answer on Stack Overflow
Solution 6 - R   akrun             View Answer on Stack Overflow
Solution 7 - R   heisenbug47       View Answer on Stack Overflow
Solution 8 - R   Sam S             View Answer on Stack Overflow
Solution 9 - R   alejandro00       View Answer on Stack Overflow