Data.Text vs String


String Problem Overview


While the general opinion of the Haskell community seems to be that it's always better to use Text instead of String, the fact that the APIs of most maintained libraries are still String-oriented confuses the hell out of me. On the other hand, there are notable projects that consider String a mistake altogether and provide a Prelude with all String-oriented functions replaced by their Text counterparts.

So are there any reasons for people to keep writing String-oriented APIs, other than backwards compatibility, compatibility with the standard Prelude, and "switch-making inertia"? Are there any other drawbacks to Text as compared to String?

Particularly, I'm interested in this because I'm designing a library and trying to decide which type to use to express error messages.

String Solutions


Solution 1 - String

My unqualified guess is that most library writers don't want to add more dependencies than necessary. Since String is part of literally every Haskell distribution (it's part of the language standard!), it is a lot easier to get adopted if you use String and don't require your users to sort out the text package from Hackage.

It's one of those "design mistakes"* that you just have to live with unless you can convince most of the community to switch overnight. Just look at how long it has taken to get Applicative to be a superclass of Monad (a relatively minor but much-wanted change) and imagine how long it would take to replace all the String things with Text.


To answer your more specific question: I would go with String unless you get noticeable performance benefits by using Text. Error messages are usually rather small one-off things so it shouldn't be a big problem to use String.
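As a rough illustration of that advice, one option is to keep the error itself as a plain data type and only render it to String at the edge, so that a later switch to Text touches a single function. The names `ParseError` and `renderError` below are made up for this sketch and do not come from any real library.

```haskell
-- Hypothetical error type for an imaginary parsing library.
data ParseError
  = UnexpectedChar Char Int   -- offending character and its position
  | UnexpectedEnd             -- input ended too early
  deriving (Eq, Show)

-- Render to String: no dependency beyond base is needed.
renderError :: ParseError -> String
renderError (UnexpectedChar c pos) =
  "unexpected " ++ show c ++ " at position " ++ show pos
renderError UnexpectedEnd = "unexpected end of input"
```

Users who prefer Text can still get it by composing with `Data.Text.pack` on their side.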

On the other hand, if you are the kind of ideological purist that eschews pragmatism for idealism, go with Text.


* I put "design mistakes" in scare quotes because representing strings as lists of Chars is a neat property that makes them easy to reason about and to integrate with existing list-operating functions.
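A small sketch of what that integration looks like in practice: ordinary list functions from base work on String unchanged. The function names `shout` and `charCounts` are invented for this example.

```haskell
import Data.Char (isAlpha, toUpper)
import Data.List (group, sort)

-- Keep only letters and upper-case them, using plain list functions.
shout :: String -> String
shout = map toUpper . filter isAlpha

-- Count how often each character occurs; nothing here is String-specific,
-- the same code would work for any list of orderable elements.
charCounts :: String -> [(Char, Int)]
charCounts s = [ (c, length g) | g@(c:_) <- group (sort s) ]
```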

Solution 2 - String

If your API is targeted at processing large amounts of character-oriented data and/or various encodings, then your API should use Text.

If your API is primarily for dealing with small one-off strings, then using the built-in String type should be fine.

Using String for large amounts of text will make applications using your API consume significantly more memory. Using it with foreign encodings could seriously complicate usage depending on how your API works.
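To make the encoding point concrete, here is a minimal sketch of decoding raw UTF-8 bytes into Text with the text package's `Data.Text.Encoding.decodeUtf8'`, which returns an `Either` instead of throwing on invalid input. The wrapper name `decodeInput` is an assumption made for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Decode raw UTF-8 bytes, reporting invalid input as a Left value
-- rather than raising an exception.
decodeInput :: B.ByteString -> Either String T.Text
decodeInput bytes =
  case TE.decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right text -> Right text
```

String has no comparable story in base: a String is already a list of decoded Chars, so the byte-level decoding has to happen somewhere else before the data reaches the API.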

String is quite expensive (at least 5N words, where N is the number of Chars in the String). A word has the same number of bits as the processor architecture (e.g. 32 bits or 64 bits): http://blog.johantibell.com/2011/06/memory-footprints-of-some-common-data.html
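The 5N-word figure can be turned into a back-of-the-envelope estimate. The helper below is only a sketch of that arithmetic (its name is made up), not a measurement of what GHC actually allocates.

```haskell
-- Rough lower bound from the figure above: 5 machine words per Char.
-- First argument: bytes per machine word (4 on 32-bit, 8 on 64-bit).
-- Second argument: number of Chars in the String.
stringFootprintBytes :: Int -> Int -> Int
stringFootprintBytes wordBytes n = 5 * n * wordBytes
```

For example, `stringFootprintBytes 8 1000000` gives 40000000, so a one-million-character String on a 64-bit machine needs at least about 40 MB.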

Solution 3 - String

There are at least three reasons to use [Char] in small projects.

  1. [Char] does not rely on any arcane stuff, like foreign pointers, raw memory, raw arrays, etc., that may work differently on different platforms or even be unavailable altogether.

  2. [Char] is the lingua franca in Haskell. There are at least three 'efficient' ways to handle Unicode data in Haskell: utf8-bytestring, Data.Text.Text and Data.Vector.Unboxed.Vector Char, each requiring you to deal with an extra package.

  3. By using [Char] one gains access to all the power of the [] monad, including many specific functions (alternative string packages do try to help with this, but still).
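As a sketch of the third point, list comprehensions and `>>=` work on String directly because String is just `[Char]`. The function names are invented for this example.

```haskell
-- Every two-letter string over an alphabet, via a list comprehension
-- (syntactic sugar for the list monad).
pairs :: String -> [String]
pairs alphabet = [ [a, b] | a <- alphabet, b <- alphabet ]

-- The same monad through (>>=): double every character.
stutter :: String -> String
stutter s = s >>= \c -> [c, c]
```

Here `pairs "ab"` evaluates to `["aa","ab","ba","bb"]` and `stutter "abc"` to `"aabbcc"`.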

Personally, I consider the UTF-16-based Data.Text one of the most questionable decisions of the Haskell community, since UTF-16 combines the flaws of both the UTF-8 and UTF-32 encodings while having none of their benefits.

Solution 4 - String

I wonder whether Text is always more efficient than String.

"cons", for instance, is O(1) for Strings and O(n) for Text. Append is O(n) for Strings and O(n+m) for strict Texts. Likewise,

    let foo = "foo" ++ bigchunk
        bar = "bar" ++ bigchunk

is more space efficient for Strings than for strict Texts.
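The sharing argument can be spelled out side by side. This is a sketch with made-up function names; the comments describe the usual behaviour of `(++)` on lists and of strict `Data.Text.append`.

```haskell
import qualified Data.Text as T

-- With String, both results share the single 'bigchunk' list as their tail;
-- only the three-character prefixes are newly allocated.
sharedStrings :: String -> (String, String)
sharedStrings bigchunk = ("foo" ++ bigchunk, "bar" ++ bigchunk)

-- With strict Text, each append builds a new array that holds a copy of
-- 'bigchunk', so the big payload ends up in memory once per result.
copiedTexts :: T.Text -> (T.Text, T.Text)
copiedTexts bigchunk =
  ( T.append (T.pack "foo") bigchunk
  , T.append (T.pack "bar") bigchunk
  )
```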

Another issue, unrelated to efficiency, is pattern matching (perspicuous code) and laziness (predictably per-character in Strings, somewhat implementation-dependent in lazy Text).
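For the pattern-matching point, a minimal comparison might look like this. The function names are hypothetical; `Data.Text.uncons` is the real function from the text package.

```haskell
import Data.Char (isUpper)
import qualified Data.Text as T

-- String: the list constructors can be matched directly.
startsUpperS :: String -> Bool
startsUpperS (c:_) = isUpper c
startsUpperS []    = False

-- Text: the representation is opaque, so go through T.uncons.
startsUpperT :: T.Text -> Bool
startsUpperT t = case T.uncons t of
  Just (c, _) -> isUpper c
  Nothing     -> False
```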

Texts are obviously good for static character sequences and for in-place modification. For other forms of structural editing, String might have advantages.

Solution 5 - String

I do not think there is a single technical reason for String to remain, and I can see several for it to go.

Overall, I would first argue that in the Text/String case there is only one best solution:

  • String performance is bad, everyone agrees on that

  • Text is not difficult to use. All functions commonly used on String are available on Text, plus some more that are useful in the context of strings (substitution, padding, encoding)

  • having two solutions creates unnecessary complexity unless all base functions are made polymorphic. Proof: there are SO questions on the subject of automatic conversions. So this is a problem.
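A short sketch of the points above, using functions that exist in `Data.Text` (`replace`, `justifyRight`, `pack`, `unpack`). The wrapper names are assumptions made for this example.

```haskell
import qualified Data.Text as T

-- Substitution: replace every occurrence of one substring with another.
-- The search string passed to T.replace must be non-empty.
preferText :: T.Text -> T.Text
preferText = T.replace (T.pack "String") (T.pack "Text")

-- Padding: right-align a value in a field of the given width.
padLeft :: Int -> T.Text -> T.Text
padLeft width = T.justifyRight width ' '

-- The conversion tax of having two types: wrapping a String-based
-- function so that it can be used from Text-based code.
viaStringApi :: (String -> String) -> T.Text -> T.Text
viaStringApi f = T.pack . f . T.unpack
```

Each call to something like `viaStringApi` walks the whole text twice, which is the kind of friction the last bullet is complaining about.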

So one solution is less complex than two, and the shortcomings of String will make it disappear eventually. The sooner the better!

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author
Question              Nikita Volkov
Solution 1 - String   kqr
Solution 2 - String   Alain O'Dea
Solution 3 - String   permeakra
Solution 4 - String   Passing By
Solution 5 - String   Titou