How to index a String in Rust


String Problem Overview


I am attempting to index a string in Rust, but the compiler throws an error. My code (Project Euler problem 4, playground):

fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    let num_length = num_string.len();

    for i in 0 .. num_length / 2 {
        if num_string[i] != num_string[(num_length - 1) - i] {
            return false;
        }
    }
    
    true
}

The error:

error[E0277]: the trait bound `std::string::String: std::ops::Index<usize>` is not satisfied
 --> <anon>:7:12
  |
7 |         if num_string[i] != num_string[(num_length - 1) - i] {
  |            ^^^^^^^^^^^^^
  |
  = note: the type `std::string::String` cannot be indexed by `usize`

Is there a reason why String cannot be indexed? How can I access the data then?

String Solutions


Solution 1 - String

Yes, indexing into a string is not available in Rust. Rust strings are encoded in UTF-8 internally, so the concept of indexing itself would be ambiguous, and people would misuse it. Byte indexing is fast but almost always incorrect: when your text contains non-ASCII symbols, byte indexing may leave you inside a character, which is really bad if you need text processing. Char indexing is not free, because UTF-8 is a variable-length encoding, so you have to traverse the string from the start to find the required code point.
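To make the difference between bytes and code points concrete, here is a minimal sketch that uses only the standard library. The helper name `describe` and the sample strings are illustrative assumptions, not part of the original answer.

```rust
/// Returns (length in bytes, number of code points) for a string slice.
fn describe(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let ascii = "9009";
    let accented = "h\u{e9}llo"; // U+00E9 takes two bytes in UTF-8

    println!("{:?}", describe(ascii)); // (4, 4)
    println!("{:?}", describe(accented)); // (6, 5)

    // Byte offset 1 is where U+00E9 starts; byte offset 2 falls in the
    // middle of it, so it is not a valid position to split the string.
    println!("{}", accented.is_char_boundary(1)); // true
    println!("{}", accented.is_char_boundary(2)); // false
}
```

For ASCII-only text the two counts agree, which is why byte indexing happens to work for decimal digits.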

If you are certain that your strings contain ASCII characters only, you can use the as_bytes() method on &str which returns a byte slice, and then index into this slice:

let num_string = num.to_string();

// ...

let b: u8 = num_string.as_bytes()[i];
let c: char = b as char;  // if you need to get the character as a unicode code point

If you do need to index code points, you have to use the chars() iterator:

num_string.chars().nth(i).unwrap()

As I said above, this requires traversing the iterator up to the i-th code point.

Finally, in many cases of text processing, it is actually necessary to work with grapheme clusters rather than with code points or bytes. With the help of the unicode-segmentation crate, you can index into grapheme clusters as well:

use unicode_segmentation::UnicodeSegmentation;

let string: String = ...;
UnicodeSegmentation::graphemes(&string, true).nth(i).unwrap()

Naturally, grapheme cluster indexing has the same requirement of traversing the entire string as indexing into code points.

Solution 2 - String

The correct approach to doing this sort of thing in Rust is not indexing but iteration. The main problem here is that Rust's strings are encoded in UTF-8, a variable-length encoding for Unicode characters. Being variable in length, the memory position of the nth character can't be determined without looking at the string. This also means that accessing the nth character has a runtime of O(n)!

In this special case, you can iterate over the bytes, because your string is known to only contain the characters 0–9 (iterating over the characters is the more general solution but is a little less efficient).

Here is some idiomatic code to achieve this (playground):

fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    let half = num_string.len() / 2;
    
    num_string.bytes().take(half).eq(num_string.bytes().rev().take(half))
}

We go through the bytes in the string both forwards (num_string.bytes().take(half)) and backwards (num_string.bytes().rev().take(half)) simultaneously; the .take(half) part is there to halve the amount of work done. We then simply compare one iterator to the other one to ensure at each step that the nth and nth last bytes are equivalent; if they are, it returns true; if not, false.
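For the more general case mentioned above, iterating over characters instead of bytes, a sketch along the same lines might look like this. The function name `is_palindrome_str` is an assumption for illustration, and for simplicity it compares the whole string rather than only half of it.

```rust
/// Checks whether a string reads the same forwards and backwards,
/// comparing Unicode scalar values (chars) rather than bytes.
fn is_palindrome_str(s: &str) -> bool {
    // chars() is a double-ended iterator, so it can be reversed.
    s.chars().eq(s.chars().rev())
}

fn main() {
    println!("{}", is_palindrome_str("9009")); // true
    println!("{}", is_palindrome_str("9019")); // false
    // Palindrome by chars, although its UTF-8 bytes are not symmetric.
    println!("{}", is_palindrome_str("\u{e9}t\u{e9}")); // true
}
```

The last call shows why the byte-based version should be reserved for input known to be ASCII: a byte-wise comparison of the same string would report false.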

Solution 3 - String

If what you are looking for is something similar to an index, you can use .chars() and .nth() on a string:

.chars() -> Returns an iterator over the chars of a string slice.

.nth() -> Returns the nth element of the iterator, in an Option.


Now you can use the above in several ways, for example:

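let s: String = String::from("abc");
let x = 1; // position of the char to read
// If you are sure the index is in range
println!("{}", s.chars().nth(x).unwrap());
// Or, to panic with your own message if it is not
println!("{}", s.chars().nth(x).expect("message"));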
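Both unwrap() and expect() panic when the index is out of range. If that is not acceptable, the Option can be handled explicitly; the following is a small sketch, with `char_at` as a hypothetical helper name.

```rust
/// Returns the char at code point position `index`, or None if the
/// string has fewer than `index + 1` chars.
fn char_at(s: &str, index: usize) -> Option<char> {
    s.chars().nth(index)
}

fn main() {
    let s = String::from("abc");

    match char_at(&s, 1) {
        Some(c) => println!("found {}", c), // found b
        None => println!("index out of range"),
    }

    // An out-of-range index yields None instead of panicking.
    if char_at(&s, 10).is_none() {
        println!("index out of range");
    }
}
```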

Solution 4 - String

You can convert a String or &str to a Vec of chars and then index that Vec.

For example:

fn main() {
    let s = "Hello world!";
    let my_vec: Vec<char> = s.chars().collect();
    println!("my_vec[0]: {}", my_vec[0]);
    println!("my_vec[1]: {}", my_vec[1]);
}
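This approach pays for one O(n) pass up front, after which every lookup is a constant-time Vec access. A short sketch with non-ASCII input and bounds-checked access follows; the helper name `to_chars` is an assumption for illustration.

```rust
/// Collects the chars of a string into a Vec so they can be indexed.
fn to_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    let chars = to_chars("h\u{e9}llo");

    // Indexing is by char position, not by byte offset.
    println!("{}", chars[1]); // é

    // get() returns None instead of panicking when out of range.
    println!("{:?}", chars.get(4)); // Some('o')
    println!("{:?}", chars.get(5)); // None
}
```

The trade-off is memory: each char in the Vec occupies four bytes, regardless of how many bytes it took in the UTF-8 string.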
        


Solution 5 - String

The code below works fine. I am not sure about its performance and time complexity; hopefully someone can add more information about this solution.

fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    let num_length = num_string.len();
    for i in 0..num_length / 2 {
        let left = &num_string[i..i + 1];
        let right = &num_string[((num_length - 1) - i)..num_length - i];
        if left != right {
            return false;
        }
    }
    true
}
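On the open question about this approach: slicing a string by a byte range does not scan the string, so each slice is cheap, but the range must start and end on character boundaries or the slice expression panics. That makes the code safe for decimal digits only. A non-panicking variant can be sketched with str::get; the helper name `byte_slice_at` is hypothetical.

```rust
/// Returns the one-byte substring starting at byte offset `i`, or None
/// if that range is out of bounds or would split a multi-byte character.
fn byte_slice_at(s: &str, i: usize) -> Option<&str> {
    s.get(i..i + 1)
}

fn main() {
    println!("{:?}", byte_slice_at("9009", 0)); // Some("9")
    println!("{:?}", byte_slice_at("9009", 4)); // None (out of bounds)
    // U+00E9 is two bytes long, so a one-byte range cuts through it.
    println!("{:?}", byte_slice_at("\u{e9}", 0)); // None
}
```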

Solution 6 - String

Indexing on String is not allowed because (please check the book):

  • it is not clear what the indexed value should be: a byte, a character, or a grapheme cluster (what we commonly call a letter)
  • strings are vectors of bytes (u8) encoded as UTF-8, and UTF-8 is a variable-length encoding, i.e. each character can take a different number of bytes, from 1 to 4. So getting a character or grapheme cluster by index requires traversing the string from the beginning (O(n) in the average and worst cases) to determine the valid byte bounds of the character or grapheme.

So if your input doesn't contain diacritics (considered as separate characters) and it's OK to approximate a letter with a character, you can use the chars() iterator and the DoubleEndedIterator trait for a two-pointer approach:

    fn is_palindrome(num: u64) -> bool {
        let s = num.to_string();
        let mut iterator = s.chars();
        loop  {
            let ch = iterator.next();
            let ch_end = iterator.next_back();
            
            if ch.is_none() || ch_end.is_none() {
                break;
            }
            if ch.unwrap() != ch_end.unwrap() {
                return false
            }
        }
        true
    }

Solution 7 - String

This is not suitable for all uses by any means, but if you just need to reference the previous character (or, with a little rework, the next character), then it's possible to do so without iterating through the entire str.

The scenario here is that there is a str slice, string, and a pattern was found in the slice. I want to know the character immediately before the pattern.

Call prev_char like prev_char(string.as_bytes(), pattern_index), where pattern_index is the index of the first byte of the pattern in string.

UTF-8 encoding is well defined, and this works just by backing up until it finds one of the starting bytes (either high-order bit 0 or high-order bits 11) and then converting that 1-4 byte [u8] slice to a str.

This code just unwraps it because the pattern was found in a valid UTF-8 str to begin with, so no error is possible. If your data has not been validated, it might be best to return a Result rather than an Option.

enum PrevCharStates {
    Start,
    InEncoding,
}

fn prev_char(bytes: &[u8], starting_index: usize) -> Option<&str> {
    let mut ix = starting_index;
    let mut state = PrevCharStates::Start;

    while ix > 0 {
        ix -= 1;
        let byte = bytes[ix];
        match state {
            PrevCharStates::Start => {
                if byte & 0b10000000 == 0 {
                    return Some(std::str::from_utf8(&bytes[ix..starting_index]).unwrap());
                } else if byte & 0b11000000 == 0b10000000 {
                    state = PrevCharStates::InEncoding;
                }
            },
            PrevCharStates::InEncoding => {
                if byte & 0b11000000 == 0b11000000 {
                    return Some(std::str::from_utf8(&bytes[ix..starting_index]).unwrap());
                } else if byte & 0b11000000 != 0b10000000 {
                    return None;
                }
            }
        }
    }
    None
}
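When the input is already a valid &str, the same result can be sketched with the standard library alone, because chars() can be consumed from the back. This is an alternative to the hand-written state machine above, not the answer's own method; the name `prev_char_std` is an assumption, and it returns a char rather than a &str.

```rust
/// Returns the character immediately before byte offset `index`.
/// Returns None if `index` is 0, is past the end of the string, or
/// does not fall on a character boundary.
fn prev_char_std(s: &str, index: usize) -> Option<char> {
    // get(..index) checks bounds and boundaries without panicking;
    // next_back() decodes only the last character of that prefix.
    s.get(..index)?.chars().next_back()
}

fn main() {
    let s = "caf\u{e9}!";
    // '!' starts at byte offset 5: three ASCII bytes plus two for U+00E9.
    let pattern_index = s.find('!').unwrap();
    println!("{:?}", prev_char_std(s, pattern_index)); // Some('é')
    println!("{:?}", prev_char_std(s, 0)); // None
}
```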

Solution 8 - String

There are two reasons indexing is not working in Rust:

  • In Rust, strings are stored as a collection of UTF-8 encoded bytes. In memory, strings are just collections of 1s and 0s. A program needs to be able to interpret those 1s and 0s and print out the correct characters. That's where encoding comes into play.

       fn main() {
           let sample: String = String::from("2bytesPerChar");
           // We could do this in many other languages. In Rust this is a
           // compile error: a String cannot be indexed by an integer.
           let c: char = sample[0];
       }
    

A String is a collection of bytes, so what is the length of our "2bytesPerChar"? A char can be 1 to 4 bytes long. Assume that the first character takes 2 bytes. If you want to get the first char of the string using indexing, sample[0] would specify only the first byte, which is only half of the first character.

  • Another reason is that there are 3 relevant ways a word is represented in Unicode: bytes, scalar values, and grapheme clusters. If we used indexing, Rust would not know which one we want to receive: a byte, a scalar value, or a grapheme cluster. So we have to use more specific methods.
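As a rough illustration of how these views differ, the sketch below takes the letter "e" followed by a combining acute accent (U+0301), which is displayed as a single letter. The helper name `views` is hypothetical, and only the byte and scalar-value views are shown because the grapheme view needs an external crate.

```rust
/// Returns the byte view and the scalar-value view of a string.
fn views(s: &str) -> (Vec<u8>, Vec<char>) {
    (s.bytes().collect(), s.chars().collect())
}

fn main() {
    // 'e' followed by U+0301 (combining acute accent).
    let (bytes, scalars) = views("e\u{301}");

    println!("{:?}", bytes); // [101, 204, 129]
    println!("{}", scalars.len()); // 2
}
```

One displayed letter, two scalar values, three bytes: an integer index alone cannot say which of these the caller means.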

How to access the characters in String

  • Return bytes

       for b in "dsfsd".bytes(){
           // the bytes method returns an iterator over the bytes; here we print each byte
           println!("{}",b)
       }
    
  • Return scalar values:

   // we can iterate over scalar values using the chars method
   for c in "kjdskj".chars(){
       println!("{}",c)
   }
  • Return grapheme clusters:

In order to keep the Rust standard library lean, the ability to iterate over grapheme clusters is not included by default. We need to import a crate:

# in Cargo.toml
   [dependencies]
   unicode-segmentation="1.7.1"

then:

   use unicode_segmentation::UnicodeSegmentation;
   // we pass true to get extended grapheme clusters
   for g in "dada".graphemes(true){
       println!("{}",g)
   }
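When both a scalar value and its position in the underlying bytes are needed, the standard library's char_indices() yields them together. A minimal sketch follows; the helper name `offsets` and the sample string are assumptions for illustration.

```rust
/// Pairs every char with the byte offset at which it starts.
fn offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    let s = "a\u{e9}b";
    for (byte_offset, c) in offsets(s) {
        // Offsets are 0, 1 and 3, because U+00E9 occupies bytes 1 and 2.
        println!("{} starts at byte {}", c, byte_offset);
    }

    // Offsets produced this way are always safe to slice with.
    println!("{}", &s[1..3]); // é
}
```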

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author      Original Content on Stackoverflow
Question             Sam Myers            View Question on Stackoverflow
Solution 1 - String  Vladimir Matveev     View Answer on Stackoverflow
Solution 2 - String  Chris Morgan         View Answer on Stackoverflow
Solution 3 - String  Angel Angel          View Answer on Stackoverflow
Solution 4 - String  iceqing              View Answer on Stackoverflow
Solution 5 - String  Abderrahmen Hanafi   View Answer on Stackoverflow
Solution 6 - String  Maksim Ryndin        View Answer on Stackoverflow
Solution 7 - String  bmacnaughton         View Answer on Stackoverflow
Solution 8 - String  Yilmaz               View Answer on Stackoverflow