immutable strings vs std::string
C++StringImmutabilityC++ Problem Overview
I've recent been reading about immutable strings https://stackoverflow.com/questions/93091 and https://stackoverflow.com/questions/2365272 as well some stuff about why D chose immutable strings. There seem to be many advantages.
- trivially thread safe
- more secure
- more memory efficient in most use cases.
- cheap substrings (tokenizing and slicing)
Not to mention most new languages have immutable strings, D2.0, Java, C#, Python, etc.
Would C++ benefit from immutable strings?
Is it possible to implement an immutable string class in c++ (or c++0x) that would have all of these advantages?
update:
There are two attempts at immutable strings const_string and fix_str. Neither have been updated in half a decade. Are they even used? Why didn't const_string ever make it into boost?
C++ Solutions
Solution 1 - C++
I found most people in this thread do not really understand what immutable_string
is. It is not only about the constness. The really power of immutable_string
is the performance (even in single thread program) and the memory usage.
Imagine that, if all strings are immutable, and all string are implemented like
class string {
char* _head ;
size_t _len ;
} ;
How can we implement a sub-str operation? We don't need to copy any char. All we have to do is assign the _head
and the _len
. Then the sub-string shares the same memory segment with the source string.
Of course we can not really implement a immutable_string only with the two data members. The real implementation might need a reference-counted(or fly-weighted) memory block. Like this
class immutable_string {
boost::fly_weight<std::string> _s ;
char* _head ;
size_t _len ;
} ;
Both the memory and the performance would be better than the traditional string in most cases, especially when you know what you are doing.
Of course C++ can benefit from immutable string, and it is nice to have one. I have checked the boost::const_string
and the fix_str
mentioned by Cubbi. Those should be what I am talking about.
Solution 2 - C++
As an opinion:
- Yes, I'd quite like an immutable string library for C++.
- No, I would not like std::string to be immutable.
Is it really worth doing (as a standard library feature)? I would say not. The use of const gives you locally immutable strings, and the basic nature of systems programming languages means that you really do need mutable strings.
Solution 3 - C++
My conclusion is that C++ does not require the immutable pattern because it has const semantics.
In Java, if you have a Person
class and you return the String name
of the person with the getName()
method, your only protection is the immutable pattern. If it would not be there you would have to clone()
your strings all night and day (as you have to do with data members that are not typical value-objects, but still needs to be protected).
In C++ you have const std::string& getName() const
. So you can write SomeFunction(person.getName())
where it is like void SomeFunction(const std::string& subject)
.
- No copy happened
- If anyone wants to copy he is free to do so
- Technique applies to all data types, not just strings
Solution 4 - C++
You're certainly not the only person who though that. In fact, there is const_string library by Maxim Yegorushkin, which seems to have been written with inclusion into boost in mind. And here's a little newer library, fix_str by Roland Pibinger. I'm not sure how tricky would full string interning at run-time be, but most of the advantages are achievable when necessary.
Solution 5 - C++
I don't think there's a definitive answer here. It's subjective—if not because personal taste then at least because of the type of code one most often deals with. (Still, a valuable question.)
Immutable strings are great when memory is cheap—this wasn't true when C++ was developed, and it isn't the case on all platforms targeted by C++. (OTOH on more limited platforms C seems much more common than C++, so that argument is weak.)
You can create an immutable string class in C++, and you can make it largely compatible with std::string
—but you will still lose when comparing to a built-in string class with dedicated optimizations and language features.
std::string
is the best standard string we get, so I wouldn't like to see any messing with it. I use it very rarely, though; std::string
has too many drawbacks from my point of view.
Solution 6 - C++
const std::string
There you go. A string literal is also immutable, unless you want to get into undefined behavior.
Edit: Of course that's only half the story. A const string variable isn't useful because you can't make it reference a new string. A reference to a const string would do it, except that C++ won't allow you to reassign a reference as in other languages like Python. The closest thing would be a smart pointer to a dynamically allocated string.
Solution 7 - C++
Immutable strings are great if, whenever it's necessary to create a new a string, the memory manager will always be able to determine determine the whereabouts of every string reference. On most platforms, language support for such ability could be provided at relatively modest cost, but on platforms without such language support built in it's much harder.
If, for example, one wanted to design a Pascal implementation on x86 that supported immutable strings, it would be necessary for the string allocator to be able to walk the stack to find all string references; the only execution-time cost of that would be requiring a consistent function-call approach [e.g. not using tail calls, and having every non-leaf function maintain a frame pointer]. Each memory area allocated with new
would need to have a bit to indicate whether it contained any strings and those that do contain strings would need to have an index to a memory-layout descriptor, but those costs would be pretty slight.
If a GC wasn't table to walk the stack, then it would be necessary to have code use handles rather than pointers, and have code create string handles when local variables come into scope, and destroy the handles when they go out of scope. Much greater overhead.
Solution 8 - C++
Qt also uses immutable strings with copy-on-write.
There is some debate about how much performance it really buys you with decent compilers.
Solution 9 - C++
constant strings make little sense with value semantics, and sharing isn't one of C++'s greatest strengths...
Solution 10 - C++
Strings are mutable in Ruby.
$ irb
>> foo="hello"
=> "hello"
>> bar=foo
=> "hello"
>> foo << "world"
=> "helloworld"
>> print bar
helloworld=> nil
> - trivially thread safe
I would tend to forget safety arguments. If you want to be thread-safe, lock it, or don't touch it. C++ is not a convenient language, have your own conventions.
> - more secure
No. As soon as you have pointer arithmetics and unprotected access to the address space, forget about being secure. Safer against innocently bad coding, yes.
> - more memory efficient in most use cases.
Unless you implement CPU-intensive mechanisms, I don't see how.
> - cheap substrings (tokenizing and slicing)
That would be one very good point. Could be done by referring to a string with backreferences, where modifications to a string would cause a copy. Tokenizing and slicing become free, mutations become expensive.
Solution 11 - C++
C++ strings are thread safe, all immutable objects are guaranteed to be thread safe but Java's StringBuffer is mutable like C++ string is and the both of them are thread safe. Why worry about speed, define your method or function parameters with the const keyword to tell the compiler the string will be immutable in that scope. Also if string object is immutable on demand, waiting when you absolutely need to use the string, in other words, when you append other strings to the main string, you have a list of strings until you actually need the whole string then they are joined together at that point.
immutable and mutable object operate at the same speed to my knowledge , except their methods which is a matter of pro and cons. constant primitives and variable primitives move at different speeds because at the machine level, variables are assigned to a register or a memory space which require a few binary operations, while constants are labels that don't require any of those and are thus faster (or less work is done). works only for primitives and not for object.