How do I check if a string is entirely made of the same substring?

JavascriptStringAlgorithm

Javascript Problem Overview


I have to create a function which takes a string, and it should return true or false based on whether the input consists of a repeated character sequence. The length of the given string is always greater than 1 and the character sequence must have at least one repetition.

"aa" // true(entirely contains two strings "a")
"aaa" //true(entirely contains three string "a")
"abcabcabc" //true(entirely containas three strings "abc")

"aba" //false(At least there should be two same substrings and nothing more)
"ababa" //false("ab" exists twice but "a" is extra so false)

I have created the below function:

function check(str){
  if(!(str.length && str.length - 1)) return false;
  let temp = '';
  for(let i = 0;i<=str.length/2;i++){
    temp += str[i]
    //console.log(str.replace(new RegExp(temp,"g"),''))
    if(!str.replace(new RegExp(temp,"g"),'')) return true;
  }
  return false;
}

console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

Checking of this is part of the real problem. I can't afford a non-efficient solution like this. First of all, it's looping through half of the string.

The second problem is that it is using replace() in each loop which makes it slow. Is there a better solution regarding performance?

Javascript Solutions


Solution 1 - Javascript

There’s a nifty little theorem about strings like these.

> A string consists of the same pattern repeated multiple times if and only if the string is a nontrivial rotation of itself.

Here, a rotation means deleting some number of characters from the front of the string and moving them to the back. For example, the string hello could be rotated to form any of these strings:

hello (the trivial rotation)
elloh 
llohe 
lohel 
ohell 

To see why this works, first, assume that a string consists of k repeated copies of a string w. Then deleting the first copy of the repeated pattern (w) from the front of the string and tacking it onto the back will give back the same string. The reverse direction is a bit trickier to prove, but the idea is that if you rotate a string and get back what you started with, you can apply that rotation repeatedly to tile the string with multiple copies of the same pattern (that pattern being the string you needed to move to the end to do the rotation).

Now the question is how to check whether this is the case. For that, there’s another beautiful theorem we can use:

> If x and y are strings of the same length, then x is a rotation of y if and only if x is a substring of yy.

As an example, we can see that lohel is a rotation of hello as follows:

hellohello
   ^^^^^

In our case, we know that every string x will always be a substring of xx (it’ll appear twice, once at each copy of x). So basically we just need to check if our string x is a substring of xx without allowing it to match at the first or halfway character. Here’s a one-liner for that:

function check(str) {
    return (str + str).indexOf(str, 1) !== str.length;
}

Assuming indexOf is implemented using a fast string matching algorithm, this will run in time O(n), where n is the length of the input string.

Hope this helps!

Solution 2 - Javascript

You can do it by a capturing group and backreference. Just check it's the repetition of the first captured value.

function check(str) {
  return /^(.+)\1+$/.test(str)
}

console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

In the above RegExp:

  1. ^ and $ stands for start and end anchors to predict the position.
  2. (.+) captures any pattern and captures the value(except \n).
  3. \1 is backreference of first captured value and \1+ would check for repetition of captured value.

Regex explanation here

For RegExp debugging use: https://regex101.com/r/pqlAuP/1/debugger

https://jsperf.com/reegx-and-loop/1 -->

Performance : https://jsperf.com/reegx-and-loop/13

Solution 3 - Javascript

Perhaps the fastest algorithmic approach is building a Z-function in linear time:

> The Z-function for this string is an array of length n where the i-th > element is equal to the greatest number of characters starting from > the position i that coincide with the first characters of s. > > In other words, z[i] is the length of the longest common prefix > between s and the suffix of s starting at i.

C++ implementation for reference:

vector<int> z_function(string s) {
    int n = (int) s.length();
    vector<int> z(n);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i <= r)
            z[i] = min (r - i + 1, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            ++z[i];
        if (i + z[i] - 1 > r)
            l = i, r = i + z[i] - 1;
    }
    return z;
}

JavaScript implementation
Added optimizations - building a half of z-array and early exit

function z_function(s) {
  var n = s.length;
  var z = Array(n).fill(0);
  var i, l, r;
  //for our task we need only a half of z-array
  for (i = 1, l = 0, r = 0; i <= n/2; ++i) {
    if (i <= r)
      z[i] = Math.min(r - i + 1, z[i - l]);
    while (i + z[i] < n && s[z[i]] == s[i + z[i]])
      ++z[i];

      //we can check condition and return here
     if (z[i] + i === n && n % i === 0) return true;
    
    if (i + z[i] - 1 > r)
      l = i, r = i + z[i] - 1;
  }
  return false; 
  //return z.some((zi, i) => (i + zi) === n && n % i === 0);
}
console.log(z_function("abacabacabac"));
console.log(z_function("abcab"));

Then you need to check indexes i that divide n. If you find such i that i+z[i]=n then the string s can be compressed to the length i and you can return true.

For example, for

string s= 'abacabacabac'  with length n=12`

z-array is

(0, 0, 1, 0, 8, 0, 1, 0, 4, 0, 1, 0)

and we can find that for

i=4
i+z[i] = 4 + 8 = 12 = n
and
n % i = 12 % 4 = 0`

so s might be represented as substring of length 4 repeated three times.

Solution 4 - Javascript

I read the answer of gnasher729 and implemented it. The idea is that if there are any repetitions, then there must be (also) a prime number of repetitions.

function* primeFactors (n) {
    for (var k = 2; k*k <= n; k++) {
        if (n % k == 0) {
            yield k
            do {n /= k} while (n % k == 0)
        }
    }
    if (n > 1) yield n
}
 
function check (str) {
    var n = str.length
    primeloop:
    for (var p of primeFactors(n)) {
        var l = n/p
        var s = str.substring(0, l)
        for (var j=1; j<p; j++) {
            if (s != str.substring(l*j, l*(j+1))) continue primeloop
        }
        return true
    }
    return false
}

A slightly different algorithm is this:

function check (str) {
    var n = str.length
    for (var p of primeFactors(n)) {
        var l = n/p
        if (str.substring(0, n-l) == str.substring(l)) return true
    }
    return false
}

I've updated the jsPerf page that contains the algorithms used on this page.

Solution 5 - Javascript

Assume the string S has length N and is made of duplicates of the substring s, then the length of s divides N. For example, if S has length 15, then the substring has length 1, 3, or 5.

Let S be made of (p*q) copies of s. Then S is also made of p copies of (s, repeated q times). We have therefore two cases: If N is prime or 1, then S can only be made of copies of the substring of length 1. If N is composite, then we only need to check substrings s of length N / p for primes p dividing the length of S.

So determine N = the length of S, then find all its prime factors in time O (sqrt (N)). If there is only one factor N, check if S is the same string repeated N times, otherwise for each prime factor p, check if S consists of p repeations of the first N / p characters.

Solution 6 - Javascript

I think a recursive function might be very fast as well. The first observation is that the maximum repeated pattern length is half as long as the total string. And we could just test all possible repeated pattern lengths: 1, 2, 3, ..., str.length/2

The recursive function isRepeating(p,str) tests if this pattern is repeated in str.

If str is longer than the pattern, the recursion requires the first part (same length as p) to be a repetition as well as the remainder of str. So str is effectively broken up into pieces of length p.length.

If the tested pattern and str are of equal size, recursion ends here, successfully.

If the length is different (happens for "aba" and pattern "ab") or if the pieces are different, then false is returned, propagating up the recursion.

function check(str)
{
  if( str.length==1 ) return true; // trivial case
  for( var i=1;i<=str.length/2;i++ ) { // biggest possible repeated pattern has length/2 characters

    if( str.length%i!=0 ) continue; // pattern of size i doesn't fit
    
    var p = str.substring(0, i);
    if( isRepeating(p,str) ) return true;
  }
  return false;
}


function isRepeating(p, str)
{
  if( str.length>p.length ) { // maybe more than 2 occurences

    var left = str.substring(0,p.length);
    var right = str.substring(p.length, str.length);
    return left===p && isRepeating(p,right);
  }
  return str===p; 
}

console.log(check('aa')) //true
console.log(check('aaa')) //true 
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

Performance: https://jsperf.com/reegx-and-loop/13

Solution 7 - Javascript

Wrote this in Python. I know it is not the platform, but it did take 30 mins of time. P.S.=> PYTHON

def checkString(string):
    gap = 1 
    index= 0
    while index < len(string)/2:
        value  = [string[i:i+gap] for i in range(0,len(string),gap) ]
       
        x = [string[:gap]==eachVal for eachVal in value]

        if all(x):
            print("THEY ARE  EQUAL")
            break 

        gap = gap+1
        index= index+1 

checkString("aaeaaeaaeaae")

Solution 8 - Javascript

My approach is similar to gnasher729, in that it uses the potential length of the substring as the main focus, but it is less math-y and process intensive:

L: Length of original string

S: Potential lengths of valid sub-strings

Loop S from (integer part of) L/2 to 1. If L/S is an integer check your original string against the fist S characters of the original string repeated L/S times.

The reason for looping from L/2 backwards and not from 1 onwards is to get the largest possible substring. If you want the smallest possible substring loop from 1 to L/2. Example: "abababab" has both "ab" and "abab" as possible substrings. Which of the two would be faster if you only care about a true/false result depends on the type of strings/substrings this will be applied to.

Solution 9 - Javascript

The following Mathematica code almost detects if the list is repeated at least once. If the string is repeated at least once, it returns true, but it might also return true if the string is a linear combination of repeated strings.

IsRepeatedQ[list_] := Module[{n = Length@list},
   Round@N@Sum[list[[i]] Exp[2 Pi I i/n], {i, n}] == 0
];

This code looks for the "full-length" contribution, which must be zero in a repeating string, but the string accbbd is also considered repeated, as it is a sum of the two repeated strings ababab and 012012.

The idea is to use Fast Fourier Transform, and look for the frequency spectra. By looking at other frequencies, one should be able to detect this strange scenario as well.

Solution 10 - Javascript

The basic idea here is to examine any potential substring, beginning at length 1 and stopping at half of the original string's length. We only look at substring lengths that divide the original string length evenly (i.e. str.length % substring.length == 0).

This implementation looks at the first character of each possible substring iteration before moving to the second character, which might save time if the substrings are expected to be long. If no mismatch is found after examining the entire substring, then we return true.

We return false when we run out of potential substrings to check.

function check(str) {
  const len = str.length;
  for (let subl = 1; subl <= len/2; ++subl) {
    if ((len % subl != 0) || str[0] != str[subl])
      continue;
    
    let i = 1;
    for (; i < subl; ++i)
    {
      let j = 0;
      for (; j < len; j += subl)
        if (str[i] != str[j + i])
          break;
      if (j != len)
        break;
    }
    
    if (i == subl)
      return true;
  }
  return false;
}

console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

Solution 11 - Javascript

It's been more than a year since this question was posted but I used length of string and object form to validate whether it is true or false.

const check = (str) => {
  let count = 0;
  let obj = {};
  if (str.length < 2) return false;
  
  for(let i = 0; i < str.length; i++) {
    if (!obj[str[i]]) {
       count+=1;
      obj[str[i]] = 0;
    };
    obj[str[i]] = obj[str[i]] + 1;
  };
  
  if (Object.values(obj).every(item => item === 1)) {
    return false
  };
  
  if ([...str].length%count === 0) {
    return true
  } else {
    return false
  };
};

console.log(check("abcabcabcac")) // false
console.log(check("aaa")) // true
console.log(check("acaca")) // false
console.log(check("aa")) // true
console.log(check("abc")) // false
console.log(check("aabc")) // false

Solution 12 - Javascript

I'm not familiar with JavaScript, so I don't know how fast this is going to be, but here is a linear time solution (assuming reasonable builtin implementation) using only builtins. I'll describe the algorithm in pseudocode.

function check(str) {
    t = str + str;
    find all overlapping occurrences of str in t;
    for each occurrence at position i
        if (i > 0 && i < str.length && str.length % i == 0)
            return true;  // str is a repetition of its first i characters
    return false;
}

The idea is similar to MBo's answer. For each i that divides the length, str is a repetition of its first i characters if and only if it remains the same after shifting for i characters.

It comes to my mind that such a builtin may be unavailable or inefficient. In this case, it is always possible to implement the KMP algorithm manually, which takes about the same amount of code as the algorithm in MBo's answer.

Solution 13 - Javascript

One of the simple ideas is to replace the string with the substring of "" and if any text exist then it is false, else it is true.

'ababababa'.replace(/ab/gi,'')
"a" // return false
'abababab'.replace(/ab/gi,'')
 ""// return true

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMaheer AliView Question on Stackoverflow
Solution 1 - JavascripttemplatetypedefView Answer on Stackoverflow
Solution 2 - JavascriptPranav C BalanView Answer on Stackoverflow
Solution 3 - JavascriptMBoView Answer on Stackoverflow
Solution 4 - Javascriptuser42723View Answer on Stackoverflow
Solution 5 - Javascriptgnasher729View Answer on Stackoverflow
Solution 6 - JavascriptAxel PodehlView Answer on Stackoverflow
Solution 7 - JavascriptJustABeginnerView Answer on Stackoverflow
Solution 8 - JavascriptSunKnight0View Answer on Stackoverflow
Solution 9 - JavascriptPer AlexanderssonView Answer on Stackoverflow
Solution 10 - JavascriptAustin MullinsView Answer on Stackoverflow
Solution 11 - JavascriptGoonGamjaView Answer on Stackoverflow
Solution 12 - Javascriptinfmagic2047View Answer on Stackoverflow
Solution 13 - JavascriptVinod kumar GView Answer on Stackoverflow