What is the easiest/best/most correct way to iterate through the characters of a string in Java?

Tags: Java, String, Iteration, Character, Tokenize

Java Problem Overview


Some ways to iterate through the characters of a string in Java are:

  1. Using StringTokenizer?
  2. Converting the String to a char[] and iterating over that.

What is the easiest/best/most correct way to iterate?

Java Solutions


Solution 1 - Java

I use a for loop to iterate the string and use charAt() to get each character to examine it. Since the String is implemented with an array, the charAt() method is a constant time operation.

String s = "...stuff...";

for (int i = 0; i < s.length(); i++){
    char c = s.charAt(i);        
    //Process char
}

That's what I would do. It seems the easiest to me.

As far as correctness goes, I don't believe that exists here. It is all based on your personal style.

Solution 2 - Java

Two options

for(int i = 0, n = s.length() ; i < n ; i++) { 
    char c = s.charAt(i); 
}

or

for(char c : s.toCharArray()) {
    // process c
}

The first is probably faster; the second is probably more readable.

Solution 3 - Java

Note that most of the other techniques described here break down if you're dealing with characters outside of the BMP (Unicode Basic Multilingual Plane), i.e. code points that are outside of the U+0000 to U+FFFF range. This will only happen rarely, since the code points outside this range are mostly assigned to dead languages. But there are some useful characters outside it, for example some code points used for mathematical notation, and some used to encode proper names in Chinese.

In that case your code will be:

String str = "....";
int offset = 0, strLen = str.length();
while (offset < strLen) {
  int curChar = str.codePointAt(offset);
  offset += Character.charCount(curChar);
  // do something with curChar
}

The Character.charCount(int) method requires Java 5+.
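To make the difference concrete, here is a small sketch (the class and method names are made up for illustration) that walks the same string once per char and once per code point. The sample string contains U+1D538, a mathematical letter outside the BMP, written as its surrogate pair.

```java
class CodePointWalk {
    // Counts loop iterations when stepping one char (UTF-16 code unit) at a time.
    static int countChars(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            n++;
        }
        return n;
    }

    // Counts loop iterations when stepping one code point at a time.
    static int countCodePoints(String s) {
        int n = 0;
        for (int offset = 0; offset < s.length(); ) {
            int cp = s.codePointAt(offset);
            offset += Character.charCount(cp); // 1 for BMP, 2 for supplementary
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a', then U+1D538 as a surrogate pair, then 'b'
        String sample = "a\uD835\uDD38b";
        System.out.println("chars:       " + countChars(sample));
        System.out.println("code points: " + countCodePoints(sample));
    }
}
```

The char loop sees the surrogate pair as two separate values, while the code point loop sees it as one character.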

Source: http://mindprod.com/jgloss/codepoint.html

Solution 4 - Java

In Java 8 we can solve it as:

String str = "xyz";
str.chars().forEachOrdered(i -> System.out.print((char)i));
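// Character.toChars handles code points above U+FFFF, which a (char) cast would truncate
str.codePoints().forEachOrdered(cp -> System.out.print(Character.toChars(cp)));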

The method chars() returns an IntStream as mentioned in doc:

> Returns a stream of int zero-extending the char values from this sequence. Any char which maps to a surrogate code point is passed through uninterpreted. If the sequence is mutated while the stream is being read, the result is undefined.

The method codePoints() also returns an IntStream as per doc:

> Returns a stream of code point values from this sequence. Any surrogate pairs encountered in the sequence are combined as if by Character.toCodePoint and the result is passed to the stream. Any other code units, including ordinary BMP characters, unpaired surrogates, and undefined code units, are zero-extended to int values which are then passed to the stream.

How are char and code point different? As mentioned in this article:

> Unicode 3.1 added supplementary characters, bringing the total number of characters to more than the 2^16 = 65536 characters that can be distinguished by a single 16-bit char. Therefore, a char value no longer has a one-to-one mapping to the fundamental semantic unit in Unicode. JDK 5 was updated to support the larger set of character values. Instead of changing the definition of the char type, some of the new supplementary characters are represented by a surrogate pair of two char values. To reduce naming confusion, a code point will be used to refer to the number that represents a particular Unicode character, including supplementary ones.

Finally, why forEachOrdered and not forEach?

The behaviour of forEach is explicitly nondeterministic, whereas forEachOrdered performs an action for each element of this stream, in the encounter order of the stream if the stream has a defined encounter order. So forEach does not guarantee that the order will be kept. Also check this question for more.
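The ordering guarantee matters most once a stream is made parallel. The following is only a sketch (the class and method names are invented here): it rebuilds a string from a parallel code point stream, relying on forEachOrdered to hand over the elements one at a time in encounter order.

```java
class OrderedStreamDemo {
    // Rebuilds the input from a parallel stream. forEachOrdered delivers the
    // code points in encounter order, so the result equals the input.
    static String rebuildOrdered(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().parallel().forEachOrdered(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rebuildOrdered("hello world"));
    }
}
```

Swapping in forEach here would not be safe: the order would no longer be guaranteed, and the StringBuilder could be touched by several threads at once.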

For the difference between a character, a code point, a glyph and a grapheme, check this question.
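As a rough illustration of the grapheme side of that distinction, the JDK's java.text.BreakIterator can step over user-perceived characters. This is a sketch only (the class and method names are made up), and how well BreakIterator handles newer sequences such as some emoji depends on the JDK version.

```java
import java.text.BreakIterator;

class GraphemeCount {
    // Counts user-perceived characters by counting character boundaries.
    static int countGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int n = 0;
        while (it.next() != BreakIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 (combining acute accent), then 'x'
        String s = "e\u0301x";
        System.out.println("chars:       " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("graphemes:   " + countGraphemes(s));
    }
}
```

Here the accented letter is two chars and two code points, but it is treated as a single unit when stepping by character boundaries.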

Solution 5 - Java

I agree that StringTokenizer is overkill here. Actually, I tried out the suggestions above and timed them.

My test was fairly simple: create a StringBuilder with about a million characters, convert it to a String, and traverse each of them with charAt() / after converting to a char array / with a CharacterIterator a thousand times (of course making sure to do something on the string so the compiler can't optimize away the whole loop :-) ).

The result on my 2.6 GHz Powerbook (that's a mac :-) ) and JDK 1.5:

  • Test 1: charAt + String --> 3138msec
  • Test 2: String converted to array --> 9568msec
  • Test 3: StringBuilder charAt --> 3536msec
  • Test 4: CharacterIterator and String --> 12151msec

As the results are significantly different, the most straightforward way also seems to be the fastest one. Interestingly, charAt() of a StringBuilder seems to be slightly slower than the one of String.

BTW I suggest not to use CharacterIterator as I consider its abuse of the '\uFFFF' character as "end of iteration" a really awful hack. In big projects there's always two guys that use the same kind of hack for two different purposes and the code crashes really mysteriously.
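To show why the sentinel is a problem, here is a minimal sketch (class and method names are hypothetical) where the string itself contains '\uFFFF'. The usual loop that compares against CharacterIterator.DONE stops early, while a loop that checks the index visits every char.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class DoneSentinelDemo {
    // The common idiom: stops as soon as any char equals DONE ('\uFFFF').
    static int visitedWithSentinel(String s) {
        CharacterIterator it = new StringCharacterIterator(s);
        int n = 0;
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            n++;
        }
        return n;
    }

    // Index-based termination: independent of the chars' values.
    static int visitedWithIndex(String s) {
        CharacterIterator it = new StringCharacterIterator(s);
        int n = 0;
        for (it.first(); it.getIndex() < it.getEndIndex(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String tricky = "a\uFFFFb";
        System.out.println("sentinel loop visited: " + visitedWithSentinel(tricky));
        System.out.println("index loop visited:    " + visitedWithIndex(tricky));
    }
}
```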

Here's one of the tests:

	int count = 1000;
	...
	
	System.out.println("Test 1: charAt + String");
	long t = System.currentTimeMillis();
	int sum=0;
	for (int i=0; i<count; i++) {
		int len = str.length();
		for (int j=0; j<len; j++) {
			if (str.charAt(j) == 'b')
				sum = sum + 1;
		}
	}
	t = System.currentTimeMillis()-t;
	System.out.println("result: "+ sum + " after " + t + "msec");

Solution 6 - Java

There are some dedicated classes for this:

import java.text.*;

final CharacterIterator it = new StringCharacterIterator(s);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
   // process c
   ...
}

Solution 7 - Java

If you have Guava on your classpath, the following is a pretty readable alternative. Guava even has a fairly sensible custom List implementation for this case, so this shouldn't be inefficient.

for(char c : Lists.charactersOf(yourString)) {
	// Do whatever you want		
}

UPDATE: As @Alex noted, with Java 8 there's also CharSequence#chars to use. Although the type is IntStream, it can be mapped to chars like this:

yourString.chars()
        .mapToObj(c -> Character.valueOf((char) c))
        .forEach(c -> System.out.println(c)); // Or whatever you want

Solution 8 - Java

If you need to iterate through the code points of a String (see this answer) a shorter / more readable way is to use the CharSequence#codePoints method added in Java 8:

for(int c : string.codePoints().toArray()){
    ...
}

or using the stream directly instead of a for loop:

string.codePoints().forEach(c -> ...);

There is also CharSequence#chars if you want a stream of the characters (although it is an IntStream, since there is no CharStream).

Solution 9 - Java

I wouldn't use StringTokenizer, as it is one of the legacy classes in the JDK.

The javadoc says:

> StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

Solution 10 - Java

If you need performance, then you must test on your environment. No other way.

Here is some example code:

int tmp = 0;
String s = new String(new byte[64*1024]);
{
	long st = System.nanoTime();
	for(int i = 0, n = s.length(); i < n; i++) {
		tmp += s.charAt(i);
	}
	st = System.nanoTime() - st;
	System.out.println("1 " + st);
}

{
	long st = System.nanoTime();
	char[] ch = s.toCharArray();
	for(int i = 0, n = ch.length; i < n; i++) {
		tmp += ch[i];
	}
	st = System.nanoTime() - st;
	System.out.println("2 " + st);
}
{
	long st = System.nanoTime();
	for(char c : s.toCharArray()) {
		tmp += c;
	}
	st = System.nanoTime() - st;
	System.out.println("3 " + st);
}
System.out.println("" + tmp);

On Java online I get:

1 10349420
2 526130
3 484200
0

On Android x86 API 17 I get:

1 9122107
2 13486911
3 12700778
0
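Note that each block above is timed exactly once, so the first measurement also includes whatever start-up cost the runtime has (class loading, interpretation before JIT compilation). A slightly more careful sketch, with invented names and arbitrary round counts, runs each variant untimed first and then averages several timed rounds. It is still a rough harness, not a rigorous benchmark; a dedicated tool such as JMH is the safer choice when the numbers matter.

```java
import java.util.function.ToLongFunction;

class IterationTiming {
    static long sumWithCharAt(String s) {
        long sum = 0;
        for (int i = 0, n = s.length(); i < n; i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    static long sumWithCharArray(String s) {
        long sum = 0;
        for (char c : s.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    // Runs the task untimed for warmupRounds, then returns the average
    // nanoseconds per call over timedRounds.
    static long averageNanos(ToLongFunction<String> task, String input,
                             int warmupRounds, int timedRounds) {
        long sink = 0;
        for (int i = 0; i < warmupRounds; i++) {
            sink += task.applyAsLong(input);
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRounds; i++) {
            sink += task.applyAsLong(input);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the results in use
        }
        return elapsed / timedRounds;
    }

    public static void main(String[] args) {
        String s = new String(new char[64 * 1024]).replace('\0', 'x');
        System.out.println("charAt:      "
                + averageNanos(IterationTiming::sumWithCharAt, s, 1000, 1000) + " ns");
        System.out.println("toCharArray: "
                + averageNanos(IterationTiming::sumWithCharArray, s, 1000, 1000) + " ns");
    }
}
```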

Solution 11 - Java

public class Main {

    public static void main(String[] args) {
        String myStr = "Hello";
        String myStr2 = "World";

        for (int i = 0; i < myStr.length(); i++) {
            char result = myStr.charAt(i);
            System.out.println(result);
        }

        for (int i = 0; i < myStr2.length(); i++) {
            char result = myStr2.charAt(i);
            System.out.print(result);
        }
    }
}

Output:

H
e
l
l
o
World

Solution 12 - Java

See [The Java Tutorials: Strings][1].

public class StringDemo {
	public static void main(String[] args) {
		String palindrome = "Dot saw I was Tod";
		int len = palindrome.length();
		char[] tempCharArray = new char[len];
		char[] charArray = new char[len];
		
		// put original string in an array of chars
		for (int i = 0; i < len; i++) {
			tempCharArray[i] = palindrome.charAt(i);
		} 
		
		// reverse array of chars
		for (int j = 0; j < len; j++) {
			charArray[j] = tempCharArray[len - 1 - j];
		}
		
		String reversePalindrome =  new String(charArray);
		System.out.println(reversePalindrome);
	}
}

Put the length into int len and use a for loop.

[1]: http://java.sun.com/docs/books/tutorial/java/data/strings.html

Solution 13 - Java

StringTokenizer is totally unsuited to the task of breaking a string into its individual characters. With String#split() you can do that easily by using a regex that matches nothing, e.g.:

String[] theChars = str.split("|");

But StringTokenizer doesn't use regexes, and there's no delimiter string you can specify that will match the nothing between characters. There is one cute little hack you can use to accomplish the same thing: use the string itself as the delimiter string (making every character in it a delimiter) and have it return the delimiters:

StringTokenizer st = new StringTokenizer(str, str, true);

However, I only mention these options for the purpose of dismissing them. Both techniques break the original string into one-character strings instead of char primitives, and both involve a great deal of overhead in the form of object creation and string manipulation. Compare that to calling charAt() in a for loop, which incurs virtually no overhead.
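For completeness, this is roughly what the two dismissed approaches look like when run. It is a sketch with made-up class and method names, and the split result shown assumes Java 8 or later, where a zero-width match at the start no longer produces a leading empty string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class OneCharStrings {
    // Splits on a regex that matches the empty string between characters.
    static List<String> viaSplit(String s) {
        return Arrays.asList(s.split("|"));
    }

    // Uses the string as its own delimiter set and asks for the delimiters back.
    static List<String> viaTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, s, true);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaSplit("abc"));
        System.out.println(viaTokenizer("abc"));
    }
}
```

Both produce a list of one-character String objects, which is exactly the per-character allocation that a plain charAt() loop avoids.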

Solution 14 - Java

Elaborating on this answer and this answer.

The answers above point out the problem with many of the solutions here: they don't iterate by code point value, so they would have trouble with any surrogate chars. The Java docs also outline the issue here (see "Unicode Character Representations"). Anyhow, here's some code that uses some actual surrogate chars from the supplementary Unicode set, and converts them back to a String. Note that .toChars() returns an array of chars: if you're dealing with surrogates, you'll necessarily have two chars. This code should work for any Unicode character.

    String supplementary = "Some Supplementary: 𠜎𠜱𠝹𠱓";
    supplementary.codePoints().forEach(cp -> 
            System.out.print(new String(Character.toChars(cp))));

Solution 15 - Java

This example code will help you out!

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
 
public class Solution {
	public static void main(String[] args) {
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		map.put("a", 10);
		map.put("b", 30);
		map.put("c", 50);
		map.put("d", 40);
		map.put("e", 20);
		System.out.println(map);
 
		Map sortedMap = sortByValue(map);
		System.out.println(sortedMap);
	}
 
	public static Map sortByValue(Map unsortedMap) {
		Map sortedMap = new TreeMap(new ValueComparator(unsortedMap));
		sortedMap.putAll(unsortedMap);
		return sortedMap;
	}
 
}
 
class ValueComparator implements Comparator {
	Map map;
 
	public ValueComparator(Map map) {
		this.map = map;
	}
 
	public int compare(Object keyA, Object keyB) {
		Comparable valueA = (Comparable) map.get(keyA);
		Comparable valueB = (Comparable) map.get(keyB);
		return valueB.compareTo(valueA);
	}
}

Solution 16 - Java

Typically there are two ways to iterate through a string in Java. Both have already been covered by multiple people in this thread; I'm just adding my version of them.

String s = sc.next(); // assuming a Scanner named sc is defined above

// First way: charAt() is a constant-time operation, so it adds hardly any overhead
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
}

// Second way: toCharArray() takes O(n) time to copy the string's contents into a new char array
char[] str = s.toCharArray();

If performance is at stake, I recommend the first way, since each charAt() call takes constant time and nothing is copied. If it is not, the second way can make your work easier: Java strings are immutable, but the char array can be modified in place.
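A small sketch of that second point, using an invented helper name: the array returned by toCharArray() is a copy, so it can be edited freely while the original String stays as it was.

```java
class CharArrayCopyDemo {
    // Returns a new string with every 'a' replaced by '*', editing the copy in place.
    static String maskA(String s) {
        char[] chars = s.toCharArray(); // O(n) copy; s itself is untouched
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == 'a') {
                chars[i] = '*';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "banana";
        String masked = maskA(original);
        System.out.println(original);
        System.out.println(masked);
    }
}
```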

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Paul Wicks | View Question on Stackoverflow
Solution 1 - Java | jjnguy | View Answer on Stackoverflow
Solution 2 - Java | Dave Cheney | View Answer on Stackoverflow
Solution 3 - Java | sk. | View Answer on Stackoverflow
Solution 4 - Java | akhil_mittal | View Answer on Stackoverflow
Solution 5 - Java | | View Answer on Stackoverflow
Solution 6 - Java | Bruno De Fraine | View Answer on Stackoverflow
Solution 7 - Java | Touko | View Answer on Stackoverflow
Solution 8 - Java | Alex - GlassEditor.com | View Answer on Stackoverflow
Solution 9 - Java | Alan | View Answer on Stackoverflow
Solution 10 - Java | Enyby | View Answer on Stackoverflow
Solution 11 - Java | unpluggeDloop | View Answer on Stackoverflow
Solution 12 - Java | Eugene Yokota | View Answer on Stackoverflow
Solution 13 - Java | Alan Moore | View Answer on Stackoverflow
Solution 14 - Java | Hawkeye Parker | View Answer on Stackoverflow
Solution 15 - Java | devDeejay | View Answer on Stackoverflow
Solution 16 - Java | Sumit Kapoor | View Answer on Stackoverflow