Git and the Umlaut problem on Mac OS X

Tags: Git, Macos, Versioning

Git Problem Overview


Today I ran into what looks like a bug in Git on Mac OS X.

For example, I want to commit a file named Überschrift.txt, which begins with the German special character Ü. Running git status gives the following output:

Users-iMac: user$ git status

# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	"U\314\210berschrift.txt"
nothing added to commit but untracked files present (use "git add" to track)

It seems that Git 1.7.2 has a problem with German special characters on Mac OS X. Is there a way to get Git to read the file names correctly?

Git Solutions


Solution 1 - Git

Enable core.precomposeunicode on the Mac:

git config --global core.precomposeunicode true

For this to work, you need to have at least Git 1.8.2.
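Whether the installed Git is new enough can be checked before relying on the setting. This is a minimal sketch, assuming that git --version prints a line of the form "git version X.Y.Z"; the helper name version_at_least is made up for this example.

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Only the first three dot-separated numeric fields are compared.
version_at_least() {
    have="$1.0.0" want="$2.0.0"   # pad so that fields 1..3 always exist
    for i in 1 2 3; do
        h=$(printf '%s' "$have" | cut -d. -f"$i")
        w=$(printf '%s' "$want" | cut -d. -f"$i")
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
    done
    return 0   # all three fields are equal
}

# Take the third word of "git version X.Y.Z"; fall back to 0 without Git.
if command -v git >/dev/null 2>&1; then
    installed=$(git --version | awk '{print $3}')
else
    installed=0
fi

if version_at_least "$installed" 1.8.2; then
    echo "core.precomposeunicode is supported"
else
    echo "Git $installed is too old, 1.8.2 or newer is needed"
fi
```

The fields are compared numerically rather than as strings, so 1.10.0 counts as newer than 1.8.2.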

Mountain Lion ships with Git 1.7.5. To get a newer Git, use either [git-osx-installer][1] or [homebrew][2] (requires Xcode).

That's it.

[1]: https://code.google.com/p/git-osx-installer/
[2]: http://mxcl.github.com/homebrew/

Solution 2 - Git

The cause is that file systems differ in how they store the file name.

In Unicode, Ü can be represented in two ways: as the single precomposed character Ü, or as U followed by a combining umlaut character. A Unicode string can contain both forms, but since having both is confusing, a file system may normalize file names so that every umlauted U is stored consistently, either as Ü or as U + "combining umlaut character".

Linux file names typically use the former, composed form, called Normalization Form C (NFC). Mac OS X uses the latter, decomposed form, called Normalization Form D (NFD).

Apparently Git doesn't care about this point and simply uses the byte sequence of the filename, which leads to the problem you're having.
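The difference is easy to see at the byte level. The sketch below builds the same visible name in both forms with printf octal escapes (the variable names nfc and nfd are chosen for this example) and compares them the way a plain byte comparison would.

```shell
# The same visible name, "Überschrift.txt", in both normalization forms.
nfc=$(printf '\303\234berschrift.txt')    # Ü as one code point, U+00DC
nfd=$(printf 'U\314\210berschrift.txt')   # U followed by U+0308

# A byte-wise comparison sees two different names.
if [ "$nfc" = "$nfd" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi

# The encoded lengths differ as well: 16 bytes versus 17 bytes.
printf '%s' "$nfc" | wc -c
printf '%s' "$nfd" | wc -c
```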

The mailing list thread "Git, Mac OS X and German special characters" contains a patch that makes Git compare file names after normalization.

Solution 3 - Git

Putting the following in ~/.gitconfig works for me on 10.12.1 Sierra for UTF-8 names:

[core]
    precomposeunicode = true
    quotepath = false

The first option makes Git recompose the decomposed file names that Mac OS X reports, and the second stops Git from escaping non-ASCII characters in its output.
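The same two settings can also be written with git config instead of editing the file by hand. The sketch below points --file at a scratch file (the cfg variable exists only for this example), so trying it out leaves the real ~/.gitconfig untouched.

```shell
# Scratch file standing in for ~/.gitconfig (an assumption for this example).
cfg=$(mktemp)

git config --file "$cfg" core.precomposeunicode true
git config --file "$cfg" core.quotepath false

# Show the resulting file; both keys end up under a [core] section.
cat "$cfg"
```

Replacing --file "$cfg" with --global writes the settings to ~/.gitconfig.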

Solution 4 - Git

To make git add file work with umlauts in file names on Mac OS X, you may convert file path strings from composed into canonically decomposed UTF-8 using iconv.

# test case

mkdir testproject
cd testproject

git --version    # git version 1.7.6.1
locale charmap   # UTF-8

git init
file=$'\303\234berschrift.txt'    # composed UTF-8 (Linux-compatible)
touch "$file"
echo 'Hello, world!' > "$file"

# convert composed into canonically decomposed UTF-8
# cf. http://codesnippets.joyent.com/posts/show/12251
# printf '%s' "$file" | iconv -f utf-8 -t utf-8-mac | LC_ALL=C vis -fotc 
#git add "$file"
git add "$(printf '%s' "$file" | iconv -f utf-8 -t utf-8-mac)"  

git commit -a -m 'This is my commit message!'
git show
git status
git ls-files '*'
git ls-files -z '*' | tr '\0' '\n'

touch $'caf\303\251 1' $'caf\303\251 2' $'caf\303\251 3'
git ls-files --other '*'
git ls-files -z --other '*' | tr '\0' '\n'

Solution 5 - Git

Change the repository's OSX-specific core.precomposeunicode flag to true:

git config core.precomposeunicode true

To make sure new repositories get that flag, also run:

git config --global core.precomposeunicode true

Here is the relevant snippet from the manpage:

> This option is only used by Mac OS implementation of Git. When
> core.precomposeunicode=true, Git reverts the unicode decomposition of
> filenames done by Mac OS. This is useful when sharing a repository
> between Mac OS and Linux or Windows. (Git for Windows 1.7.10 or higher
> is needed, or Git under cygwin 1.7). When false, file names are
> handled fully transparent by Git, which is backward compatible with
> older versions of Git.

Solution 6 - Git

The output is correct.

Your filename is in UTF-8, with Ü represented as LATIN CAPITAL LETTER U + COMBINING DIAERESIS (Unicode 0x0308, UTF-8 0xcc 0x88) instead of LATIN CAPITAL LETTER U WITH DIAERESIS (Unicode 0x00dc, UTF-8 0xc3 0x9c). The Mac OS X HFS file system decomposes Unicode in this way. Git, in turn, shows the non-ASCII bytes of the filename in octal-escaped form.
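The escaped name from git status can be turned back into raw bytes to check which form it is in. This is a small sketch using printf and od, with the name taken from the question; the variable names are chosen for this example.

```shell
# printf interprets the octal escapes that git status printed.
name=$(printf 'U\314\210berschrift.txt')

# Dump the bytes as hex on one line, separated by single spaces.
hex=$(printf '%s' "$name" | od -An -v -tx1 | tr '\n' ' ' | tr -s ' ')
echo "$hex"

# "cc 88" is the UTF-8 encoding of U+0308 COMBINING DIAERESIS, so
# "55 cc 88" (U followed by it) means the name is in decomposed form.
case "$hex" in
    *"55 cc 88"*) echo "decomposed: U + combining diaeresis" ;;
    *"c3 9c"*)    echo "precomposed: U with diaeresis" ;;
    *)            echo "no U-umlaut found" ;;
esac
```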

Note that Unicode filenames can make your repository non-portable. For example, msysgit has had problems dealing with Unicode filenames.

Solution 7 - Git

I had a similar problem with my personal repository, so I wrote a helper script in Python 3. You can grab it here: https://github.com/sjtoik/umlaut-cleaner

The script needs a bit of manual labour, but not much.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | 0xPixelfrost | View Question on Stackoverflow |
| Solution 1 - Git | chicken | View Answer on Stackoverflow |
| Solution 2 - Git | Yuji | View Answer on Stackoverflow |
| Solution 3 - Git | el.nicko | View Answer on Stackoverflow |
| Solution 4 - Git | pete | View Answer on Stackoverflow |
| Solution 5 - Git | user1338062 | View Answer on Stackoverflow |
| Solution 6 - Git | laalto | View Answer on Stackoverflow |
| Solution 7 - Git | crysaz | View Answer on Stackoverflow |