Split a string on whitespace in Go?


Regex Problem Overview


Given an input string such as " word1 word2 word3 word4 ", what would be the best approach to split it into a slice of strings in Go? Note that there can be any number of spaces or Unicode whitespace characters between each word.

In Java I would just use someString.trim().split("\\s+").

(Note: possible duplicate https://stackoverflow.com/questions/4466091/split-string-using-regular-expression-in-go doesn't give any good quality answer. Please provide an actual example, not just a link to the regexp or strings packages reference.)

Regex Solutions


Solution 1 - Regex

The strings package has a Fields function.

someString := "one    two   three four "

words := strings.Fields(someString)

fmt.Println(words, len(words)) // [one two three four] 4

DEMO: http://play.golang.org/p/et97S90cIH

From the docs:

> Fields splits the string s around each instance of one or more consecutive white space characters, as defined by unicode.IsSpace, returning a slice of substrings of s or an empty slice if s contains only white space.
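To make the snippet above runnable and to check the Unicode claim from the docs, here is a minimal sketch. The helper name splitWords and the sample input are assumptions made for illustration; only strings.Fields comes from the answer itself.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords is a hypothetical helper that simply wraps strings.Fields.
func splitWords(s string) []string {
	return strings.Fields(s)
}

func main() {
	// Separators here are ASCII spaces, a tab, a newline and U+00A0
	// (no-break space), all of which unicode.IsSpace reports as white space.
	input := " word1 \t word2\nword3\u00a0word4 "
	words := splitWords(input)
	fmt.Printf("%q %d\n", words, len(words)) // ["word1" "word2" "word3" "word4"] 4
}
```

Leading and trailing white space is dropped without a separate trim step, which is why this is usually the shortest option when no custom separator is needed.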

Solution 2 - Regex

Use regexp.Split (part of the standard library since Go 1.1; when this answer was written it was only available on tip):

func (re *Regexp) Split(s string, n int) []string

Split slices s into substrings separated by the expression and returns a slice of the substrings between those expression matches.

The slice returned by this method consists of all the substrings of s not contained in the slice returned by FindAllString. When called on an expression that contains no metacharacters, it is equivalent to strings.SplitN.

Example:

s := regexp.MustCompile("a*").Split("abaabaccadaaae", 5)
// s: ["", "b", "b", "c", "cadaaae"]

The count determines the number of substrings to return:

n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings
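Applied to the question, a rough counterpart of the Java trim().split("\\s+") call might look like the sketch below. The names whitespace and splitTrimmed are assumptions made for illustration. Note that in Go's regexp syntax \s covers only ASCII whitespace ([\t\n\f\r ]), so this variant does not split on other Unicode spacing characters.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// whitespace matches runs of ASCII whitespace; in Go's RE2 syntax \s is
// limited to [\t\n\f\r ].
var whitespace = regexp.MustCompile(`\s+`)

// splitTrimmed is a hypothetical helper: trim first, then split on
// whitespace runs, asking for all substrings (n < 0).
func splitTrimmed(s string) []string {
	return whitespace.Split(strings.TrimSpace(s), -1)
}

func main() {
	input := " word1  word2 word3   word4 "

	// Without trimming, the leading and trailing separators produce
	// empty strings at both ends of the result.
	fmt.Printf("%q\n", whitespace.Split(input, -1)) // ["" "word1" "word2" "word3" "word4" ""]

	fmt.Printf("%q\n", splitTrimmed(input)) // ["word1" "word2" "word3" "word4"]
}
```

Compiling the expression once at package level avoids recompiling it on every call.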

Solution 3 - Regex

I came up with the following, but that seems a bit too verbose:

import "regexp"
r := regexp.MustCompile("[^\\s]+")
r.FindAllString("  word1   word2 word3   word4  ", -1)

which will evaluate to:

[]string{"word1", "word2", "word3", "word4"}

Is there a more compact or more idiomatic expression?

Solution 4 - Regex

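You can use the Split function from the strings package: strings.Split(someString, " "). Note that this splits on every single space character, so leading, trailing, or repeated spaces produce empty strings in the result, and other whitespace such as tabs is not treated as a separator.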

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | ralfoide | View Question on Stackoverflow |
| Solution 1 - Regex | I Hate Lazy | View Answer on Stackoverflow |
| Solution 2 - Regex | zzzz | View Answer on Stackoverflow |
| Solution 3 - Regex | ralfoide | View Answer on Stackoverflow |
| Solution 4 - Regex | user2368285 | View Answer on Stackoverflow |