Light C Unicode Library

Tags: C, Unicode, UTF-8

C Problem Overview


I'm looking for a small C library to handle UTF-8 strings.

Specifically, I need to split strings on Unicode delimiters for use with stemming algorithms.

Related posts have suggested:

ICU: http://www.icu-project.org/ (I found it too bulky for my purposes on embedded devices)

UTF8-CPP: http://utfcpp.sourceforge.net/ (Excellent, but C++ not C)

Has anyone found any platform-independent libraries with a small codebase for handling Unicode strings? (It doesn't need to do normalisation.)

C Solutions


Solution 1 - C

A nice, light library that I have used successfully is utf8proc.

Solution 2 - C

There's also MicroUTF-8, but it may require login credentials to view or download the source.

Solution 3 - C

UTF-8 is specially designed so that many byte-oriented string functions continue to work or only need minor modifications.

C's strstr function, for instance, will work perfectly as long as both its inputs are valid, null-terminated UTF-8 strings. strcpy works fine as long as its input string starts at a character boundary (for instance the return value of strstr).
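As a rough illustration of the point about strstr, here is a minimal sketch that uses it to cut a UTF-8 string at a multi-byte delimiter. The helper name utf8_field and its signature are made up for this example, and it assumes both the text and the delimiter are valid, null-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library).
 * Returns the byte length of the field that starts at `s` and ends at the
 * next occurrence of `delim`, or at the end of the string if there is none.
 * `*next` is set to the start of the following field, or NULL if this was
 * the last one.
 * Assumption: `s` and `delim` are valid, null-terminated UTF-8 and `delim`
 * is not empty. */
size_t utf8_field(const char *s, const char *delim, const char **next)
{
    assert(s != NULL && delim != NULL && next != NULL && delim[0] != '\0');

    /* strstr compares bytes. Because a UTF-8 lead byte can never be mistaken
     * for a continuation byte, a match always starts on a character boundary. */
    const char *hit = strstr(s, delim);
    if (hit == NULL) {
        *next = NULL;
        return strlen(s);
    }
    *next = hit + strlen(delim);   /* skip the delimiter's bytes */
    return (size_t)(hit - s);
}
```

Calling it on "naïve、café" with the delimiter "、" (U+3001, three bytes in UTF-8) yields a first field of 6 bytes and a second of 5. Note that the lengths are byte counts, not character counts.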

So you may not even need a separate library!
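For the splitting use case in the question, something along these lines might be enough without any library. This is only a sketch: the function names and the delimiter set (space, comma, no-break space, ideographic space, ideographic comma) are assumptions chosen for illustration, and the decoder does no validation, so it must only be given valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point at `s` and returns the number of bytes consumed.
 * Assumption: the input is valid UTF-8. Malformed or truncated sequences
 * are not detected properly; a stray byte is reported as U+FFFD. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {                       /* 0xxxxxxx: ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {             /* 110xxxxx 10xxxxxx */
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {             /* 1110xxxx + 2 continuation */
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0) {             /* 11110xxx + 3 continuation */
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;                            /* stray continuation byte */
    return 1;
}

/* Hypothetical delimiter set, for illustration only. */
static int utf8_is_delim(uint32_t cp)
{
    switch (cp) {
    case 0x0020:   /* space */
    case 0x002C:   /* comma */
    case 0x00A0:   /* no-break space */
    case 0x3000:   /* ideographic space */
    case 0x3001:   /* ideographic comma */
        return 1;
    default:
        return 0;
    }
}

/* Returns a pointer to the next token at or after *cursor and stores its
 * length in bytes in *len. *cursor is advanced past the token. Returns NULL
 * when no tokens remain. Tokens are not null-terminated; use *len. */
const char *utf8_next_token(const char **cursor, size_t *len)
{
    assert(cursor != NULL && *cursor != NULL && len != NULL);
    const unsigned char *p = (const unsigned char *)*cursor;
    uint32_t cp;
    size_t n;

    while (*p != '\0') {                     /* skip leading delimiters */
        n = utf8_decode(p, &cp);
        if (!utf8_is_delim(cp))
            break;
        p += n;
    }
    if (*p == '\0') {
        *cursor = (const char *)p;
        return NULL;
    }

    const unsigned char *start = p;
    while (*p != '\0') {                     /* consume up to next delimiter */
        n = utf8_decode(p, &cp);
        if (utf8_is_delim(cp))
            break;
        p += n;
    }
    *len = (size_t)(p - start);
    *cursor = (const char *)p;
    return (const char *)start;
}
```

The tokenizer never writes to the input, so it also works on string literals and read-only buffers, which can matter on embedded targets. Each token can be handed to a stemmer as a pointer plus byte length.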

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Akusete | View Question on Stack Overflow
Solution 1 - C | Avi | View Answer on Stack Overflow
Solution 2 - C | xenu | View Answer on Stack Overflow
Solution 3 - C | Artelius | View Answer on Stack Overflow