How to split a string in Haskell?
String Problem Overview
Is there a standard way to split a string in Haskell?
lines and words work great for splitting on a space or newline, but surely there is a standard way to split on a comma? I couldn't find it on Hoogle.
To be specific, I'm looking for something where split "," "my,comma,separated,list" returns ["my","comma","separated","list"].
String Solutions
Solution 1 - String
Remember that you can look up the definition of Prelude functions!
http://www.haskell.org/onlinereport/standard-prelude.html
Looking there, the definition of words is:
words :: String -> [String]
words s = case dropWhile Char.isSpace s of
  "" -> []
  s' -> w : words s''
    where (w, s'') = break Char.isSpace s'
So, change it for a function that takes a predicate:
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'
Then call it with whatever predicate you want!
main = print $ wordsWhen (==',') "break,this,string,at,commas"
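One thing worth noting about wordsWhen: like words, it treats a run of delimiters as a single separator, so empty fields disappear. If empty fields matter (CSV-like data, for example), a variant along these lines keeps them. This is only a minimal sketch; the name fieldsWhen is hypothetical and not part of the Prelude or any library.

```haskell
-- wordsWhen is repeated from the answer above so this block is self-contained.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'

-- fieldsWhen (hypothetical name): split at every character matching p,
-- keeping the empty fields between adjacent delimiters.
fieldsWhen :: (Char -> Bool) -> String -> [String]
fieldsWhen p s = case break p s of
  (field, [])       -> [field]                      -- no delimiter left
  (field, _ : rest) -> field : fieldsWhen p rest    -- skip one delimiter

-- wordsWhen  (== ',') "a,,b,"  gives ["a","b"]
-- fieldsWhen (== ',') "a,,b,"  gives ["a","","b",""]
```

Note that fieldsWhen returns [""] for an empty input rather than [], which may or may not be what you want.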
Solution 2 - String
There is a package for this called split.
cabal install split
Use it like this:
ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]
It comes with many other functions, for example for splitting on a predicate or on any of several delimiters.
Solution 3 - String
If you use Data.Text, there is splitOn:
http://hackage.haskell.org/packages/archive/text/0.11.2.0/doc/html/Data-Text.html#v:splitOn
It is included in the Haskell Platform.
So for instance:
import qualified Data.Text as T
main = print $ T.splitOn (T.pack " ") (T.pack "this is a test")
or:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
main = print $ T.splitOn " " "this is a test"
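If the rest of your program works with String, the conversion can be wrapped once. A minimal sketch, assuming the text package is available; splitOnString is a hypothetical helper name, not something Data.Text provides. Be aware that T.splitOn raises an error when the delimiter is empty.

```haskell
import qualified Data.Text as T

-- splitOnString (hypothetical helper): split a String on a String delimiter
-- by going through Data.Text. The delimiter must be non-empty, because
-- T.splitOn raises an error for an empty pattern.
splitOnString :: String -> String -> [String]
splitOnString delim = map T.unpack . T.splitOn (T.pack delim) . T.pack

-- splitOnString "," "my,comma,separated,list"
--   gives ["my","comma","separated","list"]
```

Unlike words, this keeps empty fields: splitting "a,,b" on "," yields three elements.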
Solution 4 - String
In the module Text.Regex (part of the Haskell Platform), there is a function:
splitRegex :: Regex -> String -> [String]
which splits a string based on a regular expression. The API can be found at Hackage.
Solution 5 - String
Use Data.List.Split, which is provided by the split package:
[me@localhost]$ ghci
Prelude> import Data.List.Split
Prelude Data.List.Split> let l = splitOn "," "1,2,3,4"
Prelude Data.List.Split> :t l
l :: [[Char]]
Prelude Data.List.Split> l
["1","2","3","4"]
Prelude Data.List.Split> let { convert :: [String] -> [Integer]; convert = map read }
Prelude Data.List.Split> let l2 = convert l
Prelude Data.List.Split> :t l2
l2 :: [Integer]
Prelude Data.List.Split> l2
[1,2,3,4]
Solution 6 - String
Without importing anything, you can substitute a space for the separator character, since whitespace is what words splits on. Something like:
words [if c == ',' then ' ' else c|c <- "my,comma,separated,list"]
or
words $ let f ',' = ' '; f c = c in map f "my,comma,separated,list"
You can make this into a function with parameters. Instead of matching a single character, you can also match any of several, as in:
words [if elem c ";,.:-+@!$#?" then ' ' else c | c <- "my,comma;separated!list"]
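Wrapped up as a function, the idea might look like the sketch below. The name splitOnAny is made up for illustration. Because it relies on words, it also splits on whitespace already present in the input and drops empty fields, so it only suits data where neither matters.

```haskell
-- splitOnAny (hypothetical name): replace every character found in seps
-- with a space, then let words do the splitting.
-- Caveats inherited from words: whitespace already in the input also
-- splits, and empty fields are dropped.
splitOnAny :: [Char] -> String -> [String]
splitOnAny seps str = words [if c `elem` seps then ' ' else c | c <- str]

-- splitOnAny ",;!" "my,comma;separated!list"
--   gives ["my","comma","separated","list"]
```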
Solution 7 - String
Try this one:
import Data.List (unfoldr)
separateBy :: Eq a => a -> [a] -> [[a]]
separateBy chr = unfoldr sep where
  sep [] = Nothing
  sep l  = Just . fmap (drop 1) . break (== chr) $ l
Only works for a single char, but should be easily extendable.
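One possible way to extend it to a multi-element delimiter is sketched below, still built on unfoldr. The name separateOn and the helper go are hypothetical, and the sketch assumes a non-empty delimiter (with an empty one it would produce empty chunks forever).

```haskell
import Data.List (isPrefixOf, unfoldr)

-- separateOn (hypothetical name): like separateBy, but the delimiter is a
-- whole sublist. Assumes the delimiter is non-empty.
separateOn :: Eq a => [a] -> [a] -> [[a]]
separateOn delim = unfoldr sep
  where
    sep [] = Nothing
    sep l  = Just (go l)

    -- go returns the next chunk and the input remaining after the delimiter.
    go [] = ([], [])
    go rest@(x : xs)
      | delim `isPrefixOf` rest = ([], drop (length delim) rest)
      | otherwise               = (x : chunk, rest')
      where
        (chunk, rest') = go xs

-- separateOn " and " "this and is and a and test"
--   gives ["this","is","a","test"]
```

As with separateBy, a trailing delimiter does not produce a final empty chunk.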
Solution 8 - String
split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s
E.g.
split ';' "a;bb;ccc;;d"
> ["a","bb","ccc","","d"]
A single trailing delimiter will be dropped:
split ';' "a;bb;ccc;;d;"
> ["a","bb","ccc","","d"]
Solution 9 - String
I find this simpler to understand:
split :: Char -> String -> [String]
split c xs = case break (==c) xs of
  (ls, "")   -> [ls]
  (ls, x:rs) -> ls : split c rs
Solution 10 - String
I started learning Haskell yesterday, so correct me if I'm wrong but:
split :: Eq a => a -> [a] -> [[a]]
split x y = func x y [[]]
  where
    func x [] z = reverse $ map (reverse) z
    func x (y:ys) (z:zs) = if y==x then
        func x ys ([]:(z:zs))
      else
        func x ys ((y:z):zs)
gives:
*Main> split ' ' "this is a test"
["this","is","a","test"]
or maybe you wanted
*Main> splitWithStr " and " "this and is and a and test"
["this","is","a","test"]
which would be:
splitWithStr :: Eq a => [a] -> [a] -> [[a]]
splitWithStr x y = func x y [[]]
  where
    func x [] z = reverse $ map (reverse) z
    func x (y:ys) (z:zs) = if (take (length x) (y:ys)) == x then
        func x (drop (length x) (y:ys)) ([]:(z:zs))
      else
        func x ys ((y:z):zs)
Solution 11 - String
I don’t know how to add a comment to Steve’s answer, but I would like to recommend the GHC libraries documentation, and in there specifically the sublist functions in Data.List, which is a much better reference than just reading the plain Haskell report.
More generally, a fold with a rule for when to start a new sublist should solve it too.
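As a rough illustration of the fold idea, here is one way it could be written with foldr. This is a sketch only; splitFold and step are names invented for this example.

```haskell
-- splitFold (hypothetical name): a right fold whose accumulator is the list
-- of fields built so far; the head of that list is the field currently
-- being extended. Seeing the delimiter starts a new, empty field.
splitFold :: Eq a => a -> [a] -> [[a]]
splitFold delim = foldr step [[]]
  where
    step x acc@(current : done)
      | x == delim = [] : acc
      | otherwise  = (x : current) : done
    step _ [] = [[]]  -- unreachable: the accumulator is never empty

-- splitFold ',' "a,,b,"  gives ["a","","b",""]
```

The rule here is simply "start a new sublist on every delimiter", so empty fields are kept, including one after a trailing delimiter.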
Solution 12 - String
Example in GHCi:
> import qualified Text.Regex as R
> R.splitRegex (R.mkRegex "x") "2x3x777"
["2","3","777"]
Solution 13 - String
In addition to the efficient, pre-built functions given in the other answers, I'll add my own, which are part of a collection of Haskell functions I wrote while learning the language in my own time:
-- Correct but inefficient implementation
wordsBy :: String -> Char -> [String]
wordsBy s c = reverse (go s [])
  where
    go s' ws = case dropWhile (== c) s' of
      ""   -> ws
      rest -> go (dropWhile (/= c) rest) (takeWhile (/= c) rest : ws)

-- Breaks up by predicate function to allow for more complex conditions (\c -> c == ',' || c == ';')
wordsByF :: String -> (Char -> Bool) -> [String]
wordsByF s f = reverse (go s [])
  where
    go s' ws = case dropWhile f s' of
      ""   -> ws
      rest -> go (dropWhile (not . f) rest) (takeWhile (not . f) rest : ws)
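The helper go in both versions is at least tail-recursive, so the recursion itself won't overflow the stack. The trade-off is that the result is accumulated and reversed at the end, so nothing is returned until the whole input has been consumed.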
Solution 14 - String
I am very late, but I would like to add this here for anyone looking for a simple solution that does not rely on any bloated packages:
split :: String -> String -> [String]
split _ "" = []
split delim str =
  split' "" str []
  where
    dl = length delim
    split' :: String -> String -> [String] -> [String]
    split' h t f
      | dl > length t = f ++ [h ++ t]
      | delim == take dl t = split' "" (drop dl t) (f ++ [h])
      | otherwise = split' (h ++ take 1 t) (drop 1 t) f