In C# what is the difference between ToUpper() and ToUpperInvariant()?


C# Problem Overview


In C#, what is the difference between ToUpper() and ToUpperInvariant()?

Can you give an example where the results might be different?

C# Solutions


Solution 1 - C#

ToUpper uses the current culture. ToUpperInvariant uses the invariant culture.

The canonical example is Turkey, where the uppercase of "i" isn't "I" but the dotted capital "İ" (U+0130).

Sample code showing the difference:

using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;

public class Test
{
    [STAThread]
    static void Main()
    {
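        // Invariant culture: "iii" -> "III", regardless of the machine's settings.
        string invariant = "iii".ToUpperInvariant();
        // Switch the current thread to Turkish; ToUpper() now applies tr-TR rules.
        CultureInfo turkey = new CultureInfo("tr-TR");
        Thread.CurrentThread.CurrentCulture = turkey;
        string cultured = "iii".ToUpper();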
        
        Font bigFont = new Font("Arial", 40);
        Form f = new Form {
            Controls = {
                new Label { Text = invariant, Location = new Point(20, 20),
                            Font = bigFont, AutoSize = true},
                new Label { Text = cultured, Location = new Point(20, 100),
                            Font = bigFont, AutoSize = true }
            }
        };        
        Application.Run(f);
    }
}
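The WinForms sample above needs Windows to run. A minimal console sketch of the same comparison is below; the class and method names (CasingComparison, UpperUnderCurrentCulture) are made up for illustration. It assumes the runtime has full culture data (ICU or NLS); in globalization-invariant mode, tr-TR may fail to load or behave like the invariant culture. The Turkish result should be three dotted capitals "İİİ", though some consoles may not display U+0130 correctly.

```csharp
using System;
using System.Globalization;

public static class CasingComparison
{
    // Runs the culture-sensitive ToUpper() with the current culture temporarily
    // set to cultureName, then restores the caller's culture.
    // Assumes culture data for cultureName is available (not invariant mode).
    public static string UpperUnderCurrentCulture(string input, string cultureName)
    {
        CultureInfo original = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = new CultureInfo(cultureName);
            return input.ToUpper(); // reads CultureInfo.CurrentCulture
        }
        finally
        {
            CultureInfo.CurrentCulture = original;
        }
    }

    public static void Main()
    {
        string invariant = "iii".ToUpperInvariant();
        string cultured = UpperUnderCurrentCulture("iii", "tr-TR");

        Console.WriteLine($"Invariant: {invariant}");
        Console.WriteLine($"tr-TR:     {cultured}");
        Console.WriteLine($"Equal?     {invariant == cultured}");
    }
}
```

Restoring the culture in a finally block keeps the experiment from leaking into the rest of the program, which is the same reason library code usually passes a CultureInfo explicitly instead of changing the thread's culture.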

For more on Turkish, see this Turkey Test blog post.

I wouldn't be surprised to hear that there are various other capitalization issues around elided characters etc. This is just one example I know off the top of my head... partly because it bit me years ago in Java, where I was upper-casing a string and comparing it with "MAIL". That didn't work so well in Turkey...
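The "MAIL" bug described above was in Java, but the same trap exists in C#. A minimal sketch of it, with hypothetical helper names (MailCheck and its methods), passes the culture explicitly so the outcome doesn't depend on the machine. It shows two culture-independent alternatives: ToUpperInvariant(), and a comparison with StringComparison.OrdinalIgnoreCase, which avoids creating an uppercased copy at all. It assumes tr-TR culture data is available.

```csharp
using System;
using System.Globalization;

public static class MailCheck
{
    // Fragile: under tr-TR, "mail" uppercases to "MAİL" (U+0130), not "MAIL".
    public static bool IsMailCultureSensitive(string command, CultureInfo culture) =>
        command.ToUpper(culture) == "MAIL";

    // Culture-independent: always uses the invariant casing rules.
    public static bool IsMailInvariant(string command) =>
        command.ToUpperInvariant() == "MAIL";

    // Culture-independent and allocation-free: ordinal, case-insensitive comparison.
    public static bool IsMailOrdinal(string command) =>
        string.Equals(command, "MAIL", StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        var turkish = new CultureInfo("tr-TR");
        Console.WriteLine(IsMailCultureSensitive("mail", turkish)); // False
        Console.WriteLine(IsMailInvariant("mail"));                  // True
        Console.WriteLine(IsMailOrdinal("mail"));                    // True
    }
}
```

For protocol keywords and other machine-readable identifiers, an ordinal comparison is usually the better fit; culture-sensitive casing is meant for text shown to users.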

Solution 2 - C#

Jon's answer is perfect. I just wanted to add that ToUpperInvariant is the same as calling ToUpper(CultureInfo.InvariantCulture).
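A small spot check of that equivalence, comparing both calls on a few inputs including the Turkish dotless and dotted i. The names InvariantEquivalence and AllAgree are illustrative only, and agreement on a handful of samples is a sanity check, not a proof.

```csharp
using System;
using System.Globalization;

public static class InvariantEquivalence
{
    // Returns true if ToUpperInvariant() and ToUpper(CultureInfo.InvariantCulture)
    // produce identical strings for every sample.
    public static bool AllAgree(params string[] samples)
    {
        foreach (string s in samples)
        {
            if (s.ToUpperInvariant() != s.ToUpper(CultureInfo.InvariantCulture))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // ASCII, dotless i (U+0131), dotted capital I (U+0130), and a mixed phrase.
        bool agree = AllAgree("iii", "\u0131", "\u0130", "Hello, World");
        Console.WriteLine(agree ? "All samples agree" : "Mismatch found");
    }
}
```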

That makes Jon's example a little simpler:

using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;

public class Test
{
    [STAThread]
    static void Main()
    {
        string invariant = "iii".ToUpper(CultureInfo.InvariantCulture);
        string cultured = "iii".ToUpper(new CultureInfo("tr-TR"));

        Application.Run(new Form {
            Font = new Font("Times New Roman", 40),
            Controls = { 
                new Label { Text = invariant, Location = new Point(20, 20), AutoSize = true }, 
                new Label { Text = cultured, Location = new Point(20, 100), AutoSize = true }, 
            }
        });
    }
}

I also used Times New Roman because it's a cooler font.

I also set the Form's Font property instead of setting it on each of the two Label controls, because child controls inherit the Font property from their parent.

And I reduced a few other lines just because I like compact (example, not production) code.

I really had nothing better to do at the moment.

Solution 3 - C#

String.ToUpper and String.ToLower can give different results in different cultures. The best-known example is Turkish: converting the lowercase Latin "i" to uppercase doesn't produce the Latin capital "I" but the Turkish dotted capital "İ".

[Figure: Capitalization of I depending on culture. Upper row: lowercase letters; lower row: uppercase letters.]

Because I found this confusing even with the picture above (source), I wrote a program (see the source code below) to show the exact output for the Turkish example:

# Lowercase letters
Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - i (\u0069) | I (\u0049)     | İ (\u0130)   | i (\u0069)     | i (\u0069)
Turkish i - ı (\u0131) | ı (\u0131)     | I (\u0049)   | ı (\u0131)     | ı (\u0131)

# Uppercase letters
Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - I (\u0049) | I (\u0049)     | I (\u0049)   | i (\u0069)     | ı (\u0131)
Turkish i - İ (\u0130) | İ (\u0130)     | İ (\u0130)   | İ (\u0130)     | i (\u0069)

As you can see:

  1. Uppercasing lowercase letters and lowercasing uppercase letters give different results under the invariant culture and the Turkish culture.
  2. Uppercasing uppercase letters and lowercasing lowercase letters have no effect, whatever the culture.
  3. CultureInfo.InvariantCulture leaves the Turkish characters (ı and İ) unchanged.
  4. For these characters, ToUpper and ToLower are reversible: lowercasing a character after uppercasing it restores the original, as long as the same culture is used for both operations.
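Point 4 can be checked directly. The sketch below (RoundTrip and UpperThenLower are illustrative names) uppercases and then lowercases each lowercase i with a chosen pair of cultures. Per the table, using tr-TR for both steps restores the original, while uppercasing with tr-TR and lowercasing with the invariant culture does not: i becomes İ and stays İ, and ı becomes I and then i. It assumes tr-TR culture data is available, and it only covers these two characters; round-tripping is not guaranteed for every Unicode character.

```csharp
using System;
using System.Globalization;

public static class RoundTrip
{
    // Assumes tr-TR culture data is available (not globalization-invariant mode).
    static readonly CultureInfo Turkish = new CultureInfo("tr-TR");

    // Uppercases c with upperCulture, then lowercases the result with lowerCulture.
    public static char UpperThenLower(char c, CultureInfo upperCulture, CultureInfo lowerCulture) =>
        char.ToLower(char.ToUpper(c, upperCulture), lowerCulture);

    public static void Main()
    {
        CultureInfo invariant = CultureInfo.InvariantCulture;
        foreach (char c in new[] { '\u0069', '\u0131' })
        {
            // Print code points to avoid console font/encoding issues.
            Console.WriteLine(
                $"U+{(int)c:X4}: invariant/invariant -> U+{(int)UpperThenLower(c, invariant, invariant):X4}, " +
                $"tr/tr -> U+{(int)UpperThenLower(c, Turkish, Turkish):X4}, " +
                $"tr/invariant -> U+{(int)UpperThenLower(c, Turkish, invariant):X4}");
        }
    }
}
```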

According to MSDN, Turkish and Azeri are the only cultures affected for Char.ToUpper and Char.ToLower, because they are the only ones with single-character casing differences. For strings, more cultures might be affected.
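To see which cultures your own runtime treats specially for the letter i, a rough probe is to enumerate every known culture and keep those where Char.ToUpper('i', culture) differs from the invariant result. The names CasingProbe and CulturesWithSpecialI are hypothetical. The list depends on the OS and its culture data (ICU or NLS version), so no specific output is claimed here beyond Turkish showing up; it checks single characters only, so it says nothing about string-level differences.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CasingProbe
{
    // Names of all cultures known to this runtime whose uppercase of 'i'
    // differs from the invariant culture's result ('I'), sorted ordinally.
    public static List<string> CulturesWithSpecialI() =>
        CultureInfo.GetCultures(CultureTypes.AllCultures)
            .Where(c => char.ToUpper('i', c) != char.ToUpperInvariant('i'))
            .Select(c => c.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();

    public static void Main()
    {
        foreach (string name in CulturesWithSpecialI())
        {
            Console.WriteLine(name);
        }
    }
}
```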


Source code of a console application used to generate the output:

using System;
using System.Globalization;
using System.Linq;
using System.Text;

namespace TurkishI
{
	class Program
	{
		static void Main(string[] args)
		{
			var englishI = new UnicodeCharacter('\u0069', "English i");
			var turkishI = new UnicodeCharacter('\u0131', "Turkish i");

			const string header = "Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish";

			Console.WriteLine("# Lowercase letters");
			Console.WriteLine(header);
			WriteRowToConsole(englishI);
			WriteRowToConsole(turkishI);

			Console.WriteLine("\n# Uppercase letters");
			var uppercaseEnglishI = new UnicodeCharacter('\u0049', "English i");
			var uppercaseTurkishI = new UnicodeCharacter('\u0130', "Turkish i");
			Console.WriteLine(header);
			WriteRowToConsole(uppercaseEnglishI);
			WriteRowToConsole(uppercaseTurkishI);

			Console.ReadKey();
		}

		// Prints one table row: the character, then its invariant and Turkish
		// uppercase and lowercase forms.
		static void WriteRowToConsole(UnicodeCharacter character)
		{
			Console.WriteLine("{0,-9} - {1,10} | {2,-14} | {3,-12} | {4,-14} | {5,-12}",
				character.Description,
				character,
				character.UpperInvariant,
				character.UpperTurkish,
				character.LowerInvariant,
				character.LowerTurkish
			);
		}
	}


	class UnicodeCharacter
	{
		public static readonly CultureInfo TurkishCulture = new CultureInfo("tr-TR");

		public char Character { get; }

		public string Description { get; }

		public UnicodeCharacter(char character) : this(character, string.Empty) {  }

		public UnicodeCharacter(char character, string description)
		{
			if (description == null) {
				throw new ArgumentNullException(nameof(description));
			}

			Character = character;
			Description = description;
		}

		public string EscapeSequence => ToUnicodeEscapeSequence(Character);

		public UnicodeCharacter LowerInvariant => new UnicodeCharacter(Char.ToLowerInvariant(Character));

		public UnicodeCharacter UpperInvariant => new UnicodeCharacter(Char.ToUpperInvariant(Character));

		public UnicodeCharacter LowerTurkish => new UnicodeCharacter(Char.ToLower(Character, TurkishCulture));

		public UnicodeCharacter UpperTurkish => new UnicodeCharacter(Char.ToUpper(Character, TurkishCulture));


		private static string ToUnicodeEscapeSequence(char character)
		{
			var bytes = Encoding.Unicode.GetBytes(new[] {character});
			var prefix = bytes.Length == 4 ? @"\U" : @"\u";
			var hex = BitConverter.ToString(bytes.Reverse().ToArray()).Replace("-", string.Empty);
			return $"{prefix}{hex}";
		}

		public override string ToString()
		{
			return $"{Character} ({EscapeSequence})";
		}
	}
}

Solution 4 - C#

Start with MSDN

http://msdn.microsoft.com/en-us/library/system.string.toupperinvariant.aspx

> The ToUpperInvariant method is equivalent to ToUpper(CultureInfo.InvariantCulture).

Just because the capital of 'i' is 'I' in English doesn't mean it is in every culture.

Solution 5 - C#

ToUpperInvariant uses the casing rules of the invariant culture, whereas ToUpper uses those of the current culture.

Solution 6 - C#

There is no difference for English text; a difference appears only in cultures with special casing rules, such as Turkish.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Lill Lansey | View Question on Stackoverflow
Solution 1 - C# | Jon Skeet | View Answer on Stackoverflow
Solution 2 - C# | Tergiver | View Answer on Stackoverflow
Solution 3 - C# | krzychu | View Answer on Stackoverflow
Solution 4 - C# | CaffGeek | View Answer on Stackoverflow
Solution 5 - C# | taylonr | View Answer on Stackoverflow
Solution 6 - C# | Stefanvds | View Answer on Stackoverflow