Light C Unicode Library
C, Unicode, UTF-8
C Problem Overview
I'm looking for a small C library to handle UTF-8 strings, specifically splitting on Unicode delimiters for use with stemming algorithms.
Related posts have suggested:
ICU http://www.icu-project.org/ (I found it too bulky for my purposes on embedded devices)
UTF8-CPP: http://utfcpp.sourceforge.net/ (Excellent, but C++ not C)
Has anyone found a platform-independent library with a small codebase for handling Unicode strings? (It doesn't need to do normalization.)
C Solutions
Solution 1 - C
A nice, light library that I use successfully is utf8proc.
Solution 2 - C
There's also MicroUTF-8, but it may require login credentials to view or download the source.
Solution 3 - C
UTF-8 is specially designed so that many byte-oriented string functions continue to work or only need minor modifications.
C's strstr function, for instance, will work perfectly as long as both its inputs are valid, null-terminated UTF-8 strings. strcpy works fine as long as its input string starts at a character boundary (for instance the return value of strstr).
So you may not even need a separate library!
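As a rough illustration of the point above, here is a minimal sketch of splitting on a multi-byte delimiter with nothing but the standard library. The names utf8_split and utf8_count are made up for this example and do not come from any library. It assumes both the text and the delimiter are valid, null-terminated UTF-8; under that assumption a strstr match can only begin at a character boundary, because a lead byte never looks like a continuation byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a valid, null-terminated
   UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx).
   This is the "minor modification" that strlen needs, since strlen
   counts bytes rather than characters. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: split `text` in place on the UTF-8 delimiter
   `delim`. Each delimiter found has its first byte overwritten with
   '\0'. Up to `max` token pointers are stored in `out`; anything after
   the last stored token is dropped. Returns the number of tokens stored. */
size_t utf8_split(char *text, const char *delim, char **out, size_t max)
{
    size_t dlen = strlen(delim);
    size_t n = 0;

    if (dlen == 0 || max == 0)
        return 0;

    while (n < max) {
        char *hit;

        out[n++] = text;
        hit = strstr(text, delim);   /* byte-wise search is safe on valid UTF-8 */
        if (hit == NULL)
            break;
        *hit = '\0';                 /* terminate the current token */
        text = hit + dlen;           /* skip the whole delimiter, not one byte */
    }
    return n;
}
```

Called on a writable buffer holding "café、naïve、z" with the delimiter "、" (U+3001, three bytes in UTF-8), utf8_split yields the tokens "café", "naïve" and "z". For "café", strlen reports 5 bytes while utf8_count reports 4 code points. This does no validation, so malformed input should be rejected before it gets this far.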