Ruby split by whitespace

Tags: Ruby, Split, Whitespace

Ruby Problem Overview


How can I write a Ruby function that splits its input on any kind of whitespace and removes all whitespace from the result? For example, if the input is

 aa bbb
cc    dd ee

then it should return the array ["aa", "bbb", "cc", "dd", "ee"].

Ruby Solutions


Solution 1 - Ruby

This is the default behavior of String#split:

input = <<-TEXT
 aa bbb
cc    dd ee
TEXT

input.split

Result:

["aa", "bbb", "cc", "dd", "ee"]

This works in all versions of Ruby that I tested, including 1.8.7, 1.9.3, 2.0.0, and 2.1.2.
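
Since the question asks for a function, the one-liner can be wrapped in a method. This is only a minimal sketch; the name split_words is hypothetical and does not come from the original answer.

```ruby
# Hypothetical helper: the name split_words is an assumption for illustration.
def split_words(text)
  # With no argument, String#split treats any run of whitespace as one
  # separator and ignores leading and trailing whitespace.
  text.split
end

input = <<-TEXT
 aa bbb
cc    dd ee
TEXT

p split_words(input)
# prints ["aa", "bbb", "cc", "dd", "ee"]
```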

Solution 2 - Ruby

The following should work for the example you gave:

str.gsub(/\s+/m, ' ').strip.split(" ")

It returns:

["aa", "bbb", "cc", "dd", "ee"]

Meaning of code:

/\s+/m is the more complicated part. \s matches a single whitespace character (space, tab, newline, and so on), so \s+ matches one or more whitespace characters in a row. The trailing m is a modifier. In Ruby it makes . match newlines as well; it has no effect on \s, which matches newlines either way, so /\s+/ behaves identically here. In both forms, the pattern finds every run of one or more whitespace characters.

gsub replaces every match, so each run of whitespace becomes a single space.

strip is the equivalent of trim in other languages; it removes whitespace from the beginning and end of the string.
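
The individual steps can be checked one at a time. The sketch below uses a made-up sample string (an assumption, not the asker's data) to show that \s matches newlines and tabs with or without the m modifier, and what gsub and strip each contribute.

```ruby
# Hypothetical sample string with newlines, a tab, and repeated spaces.
str = "\n  aa bbb\n cc\tdd ee\n\n"

# \s already matches newlines and tabs; the m modifier only affects ".".
without_m = str.gsub(/\s+/, ' ')
with_m    = str.gsub(/\s+/m, ' ')
p without_m == with_m   # prints true

# gsub collapses each run of whitespace into one space...
p without_m             # prints " aa bbb cc dd ee "

# ...and strip removes the space left at each end.
stripped = without_m.strip
p stripped              # prints "aa bbb cc dd ee"

p stripped.split(" ")   # prints ["aa", "bbb", "cc", "dd", "ee"]
```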

While writing this explanation, I wondered whether a newline could be left at the beginning or end of the string. In fact the first gsub turns every newline into a space and strip then removes it, but to be explicit the code could be written as:

str.gsub(/\s+/m, ' ').gsub(/^\s+|\s+$/m, '').split(" ")

So if you had:

str = "\n     aa bbb\n    cc    dd ee\n\n"

Then you'd get:

["aa", "bbb", "cc", "dd", "ee"]

Meaning of new code:

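^\s+ matches a run of whitespace at the beginning of a line

\s+$ matches a run of whitespace at the end of a line

In Ruby, ^ and $ anchor at line boundaries, not string boundaries (those are \A and \z). Because the first gsub has already collapsed the text into a single line, gsub(/^\s+|\s+$/m, '') here removes any run of whitespace at the beginning and at the end of the string.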
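
The anchors are easy to confuse, so here is a small sketch comparing them on the string above. It is an illustration only; the variable names are made up for this example.

```ruby
str = "\n     aa bbb\n    cc    dd ee\n\n"

# After collapsing, the text is one line, so the regex trim and strip agree.
collapsed = str.gsub(/\s+/, ' ')
trimmed_with_regex = collapsed.gsub(/^\s+|\s+$/, '')
trimmed_with_strip = collapsed.strip
p trimmed_with_regex == trimmed_with_strip  # prints true

# On the original multi-line string, ^ and $ anchor at each line,
# so indentation on inner lines is removed as well.
per_line = str.gsub(/^\s+|\s+$/, '')
p per_line  # prints "aa bbb\ncc    dd ee"

# \A and \z anchor at the start and end of the whole string only.
whole = str.gsub(/\A\s+|\s+\z/, '')
p whole     # prints "aa bbb\n    cc    dd ee"
```

In practice strip is the simpler way to trim both ends of a string.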

Solution 3 - Ruby

input = <<X
     aa bbb
cc    dd ee
X

input.strip.split(/\s+/)
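
The strip call matters here. When a regular expression is used as the separator, leading whitespace produces an empty first element. A short sketch, using a plain string equivalent to the heredoc above:

```ruby
input = "     aa bbb\ncc    dd ee\n"

# Without strip, the leading whitespace yields an empty string up front.
unstripped = input.split(/\s+/)
p unstripped  # prints ["", "aa", "bbb", "cc", "dd", "ee"]

# With strip, only the words remain.
stripped = input.strip.split(/\s+/)
p stripped    # prints ["aa", "bbb", "cc", "dd", "ee"]
```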

Solution 4 - Ruby

input.split("\s")

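If "\s" is used instead of /\s/, whitespace is removed from the result. In a double-quoted Ruby string, "\s" is simply a space character, so this is the same as split(" "), which splits on runs of whitespace and ignores leading whitespace.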
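
The difference between the string and the regular expression can be seen side by side. This is a small sketch with a sample string modeled on the question's input:

```ruby
# In a double-quoted string, "\s" is an escape for a single space.
p "\s" == " "   # prints true

input = " aa bbb\ncc    dd ee\n"

# A single-space string separator triggers whitespace-run splitting.
by_string = input.split("\s")
p by_string     # prints ["aa", "bbb", "cc", "dd", "ee"]

# The regex /\s/ splits at every single whitespace character instead,
# leaving empty strings where whitespace is leading or repeated.
by_regex = input.split(/\s/)
p by_regex      # prints ["", "aa", "bbb", "cc", "", "", "", "dd", "ee"]
```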

Solution 5 - Ruby

As a slight modification to Vidaica's answer, in Ruby 2.1.1 it looks like

input.split(" ")

will handle all whitespace, be it spaces, tabs, or newlines, yielding:

["aa", "bbb", "cc", "dd", "ee"]
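
A quick way to check this claim is to feed split(" ") a string that mixes several ASCII whitespace characters. The sample string below is made up for illustration:

```ruby
# Hypothetical input mixing a tab, a newline, a carriage return plus
# newline, a form feed, and plain spaces.
mixed = "aa\tbbb\n cc\r\ndd \f  ee  "

words = mixed.split(" ")
p words  # prints ["aa", "bbb", "cc", "dd", "ee"]
```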

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | JJ Beck | View Question on Stack Overflow
Solution 1 - Ruby | Ajedi32 | View Answer on Stack Overflow
Solution 2 - Ruby | Candide | View Answer on Stack Overflow
Solution 3 - Ruby | sawa | View Answer on Stack Overflow
Solution 4 - Ruby | vidang | View Answer on Stack Overflow
Solution 5 - Ruby | J3RN | View Answer on Stack Overflow