Could string comparisons really differ based on culture when the string is guaranteed not to change?

Tags: C#, ReSharper, Configuration Files, CultureInfo, String Comparison

C# Problem Overview


I'm reading encrypted credentials/connection strings from a config file. ReSharper tells me "String.IndexOf(string) is culture-specific here" on this line:

if (line.Contains("host=")) {
	_host = line.Substring(line.IndexOf("host=") + "host=".Length, line.Length - "host=".Length);
}

...and so wants to change it to:

if (line.Contains("host=")) {
	_host = line.Substring(line.IndexOf("host=", System.StringComparison.Ordinal) + "host=".Length, line.Length - "host=".Length);
}

The value I'm reading will always be "host=" regardless of where the app may be deployed. Is it really sensible to add this "System.StringComparison.Ordinal" bit?

More importantly, could it hurt anything (to use it)?

C# Solutions


Solution 1 - C#

Absolutely. Per MSDN (http://msdn.microsoft.com/en-us/library/d93tkzah.aspx),

> This method performs a word (case-sensitive and culture-sensitive) search using the current culture.

So you may get different results if you run it under a different culture (via regional and language settings in Control Panel).

In this particular case, you probably won't have a problem, but throw an i in the search string and run it in Turkey and it will probably ruin your day.

See MSDN: http://msdn.microsoft.com/en-us/library/ms973919.aspx

> These new recommendations and APIs exist to alleviate misguided assumptions about the behavior of default string APIs. The canonical example of bugs emerging where non-linguistic string data is interpreted linguistically is the "Turkish-I" problem.
>
> For nearly all Latin alphabets, including U.S. English, the character i (\u0069) is the lowercase version of the character I (\u0049). This casing rule quickly becomes the default for someone programming in such a culture. However, in Turkish ("tr-TR"), there exists a capital "i with a dot," character (\u0130), which is the capital version of i. Similarly, in Turkish, there is a lowercase "i without a dot," or (\u0131), which capitalizes to I. This behavior occurs in the Azeri culture ("az") as well.
>
> Therefore, assumptions normally made about capitalizing i or lowercasing I are not valid among all cultures. If the default overloads for string comparison routines are used, they will be subject to variance between cultures. For non-linguistic data, as in the following example, this can produce undesired results:

Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
Console.WriteLine("Culture = {0}",
   Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}", 
   (String.Compare("file", "FILE", true) == 0));

Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
Console.WriteLine("Culture = {0}",
   Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}", 
   (String.Compare("file", "FILE", true) == 0));

> Because of the difference of the comparison of I, results of the comparisons change when the thread culture is changed. This is the output:

Culture = English (United States)
(file == FILE) = True
Culture = Turkish (Turkey)
(file == FILE) = False
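
For comparison, here is a minimal sketch of the same check written so that it never consults the thread culture. The class name `OrdinalCompareDemo` and the helper `SameIdentifier` are made up for illustration; the only real API involved is `string.Equals` with `StringComparison.OrdinalIgnoreCase`, which compares the strings without applying culture-specific casing rules.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class OrdinalCompareDemo
{
    // Hypothetical helper: case-insensitive comparison for non-linguistic
    // data (keys, file names, protocol tokens). It ignores the current culture.
    public static bool SameIdentifier(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Switch cultures the same way the MSDN sample does.
        foreach (var name in new[] { "en-US", "tr-TR" })
        {
            Thread.CurrentThread.CurrentCulture = new CultureInfo(name);
            Console.WriteLine("{0}: (file == FILE) = {1}",
                name, SameIdentifier("file", "FILE"));
        }
    }
}
```

Because the comparison is ordinal, both cultures should report True here, unlike the culture-sensitive `String.Compare("file", "FILE", true)` call in the quoted sample.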

Here is an example without case:

var s1 = "\u00E9"; // é as one precomposed character (U+00E9, ALT+0233)
var s2 = "e\u0301"; // 'e' plus combining acute accent U+0301 (two characters)

Console.WriteLine(s1.IndexOf(s2, StringComparison.Ordinal)); //-1
Console.WriteLine(s1.IndexOf(s2, StringComparison.InvariantCulture)); //0
Console.WriteLine(s1.IndexOf(s2, StringComparison.CurrentCulture)); //0

Solution 2 - C#

CA1309: UseOrdinalStringComparison (http://msdn.microsoft.com/en-us/library/bb385972.aspx)

It doesn't hurt to not use it, but "by explicitly setting the parameter to either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable."


What exactly is Ordinal, and why does it matter to your case?

> An operation that uses ordinal sort rules performs a comparison based on the numeric value (Unicode code point) of each Char in the string. An ordinal comparison is fast but culture-insensitive. When you use ordinal sort rules to sort strings that start with Unicode characters (U+), the string U+xxxx comes before the string U+yyyy if the value of xxxx is numerically less than yyyy.

And, as you stated... the string value you are reading in is not culture sensitive, so it makes sense to use an Ordinal comparison as opposed to a Word comparison. Just remember, Ordinal means "this isn't culture sensitive".
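
Applied to the config line from the question, an ordinal lookup might look like the sketch below. The names `ConfigLineParser` and `TryGetHost` are hypothetical, and the sketch assumes the host value runs from the end of "host=" to the end of the line, which the question does not confirm.

```csharp
using System;

public static class ConfigLineParser
{
    private const string HostKey = "host=";

    // Hypothetical helper: finds "host=" with an ordinal (culture-insensitive)
    // search and returns everything after it. Returns false when the key is absent.
    public static bool TryGetHost(string line, out string host)
    {
        int keyIndex = line.IndexOf(HostKey, StringComparison.Ordinal);
        if (keyIndex < 0)
        {
            host = string.Empty;
            return false;
        }

        // Single-argument Substring takes the rest of the line, so the key
        // does not have to sit at position 0.
        host = line.Substring(keyIndex + HostKey.Length);
        return true;
    }

    public static void Main()
    {
        string host;
        if (TryGetHost("  host=localhost", out host))
        {
            Console.WriteLine(host);
        }
    }
}
```

One design note: the two-argument `Substring` call in the question passes `line.Length - "host=".Length` as the length, which only stays in range when "host=" starts at index 0. Taking the remainder of the line avoids that assumption.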

Solution 3 - C#

To answer your specific question: No, but a static analysis tool is not going to be able to realize that your input value will never have locale-specific information in it.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author
Question | B. Clay Shannon-B. Crow Raven
Solution 1 - C# | Mark Sowul
Solution 2 - C# | myermian
Solution 3 - C# | 500 - Internal Server Error