Javascript and regex: split string and keep the separator

JavascriptRegex

Javascript Problem Overview


I have a string:

var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"

And I would like to split this string with the delimiter <br /> followed by a special character.

To do that, I am using this:

string.split(/<br \/>&#?[a-zA-Z0-9]+;/g);

I am getting what I need, except that I am losing the delimiter. Here is the example: http://jsfiddle.net/JwrZ6/1/

How can I keep the delimiter?

Javascript Solutions


Solution 1 - Javascript

I was having similar but slight different problem. Anyway, here are examples of three different scenarios for where to keep the deliminator.

"1、2、3".split("、") == ["1", "2", "3"]
"1、2、3".split(/(、)/g) == ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g) == ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g) == ["", "1、", "", "2、", "3"]

Warning: The fourth will only work to split single characters. ConnorsFan presents an alternative:

// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);

Solution 2 - Javascript

Use (positive) lookahead so that the regular expression asserts that the special character exists, but does not actually match it:

string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g);

See it in action:

var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";
console.log(string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g));

Solution 3 - Javascript

If you wrap the delimiter in parantheses it will be part of the returned array.

string.split(/(<br \/>&#?[a-zA-Z0-9]+);/g);
// returns ["aaaaaa", "<br />&dagger;", "bbbb", "<br />&Dagger;", "cccc"]

Depending on which part you want to keep change which subgroup you match

string.split(/(<br \/>)&#?[a-zA-Z0-9]+;/g);
// returns ["aaaaaa", "<br />", "bbbb", "<br />", "cccc"]

You could improve the expression by ignoring the case of letters string.split(/(
)&#?[a-z0-9]+;/gi);

And you can match for predefined groups like this: \d equals [0-9] and \w equals [a-zA-Z0-9_]. This means your expression could look like this.

string.split(/<br \/>(&#?[a-z\d]+;)/gi);

There is a good Regular Expression Reference on JavaScriptKit.

Solution 4 - Javascript

answered it here also https://stackoverflow.com/questions/11230178/javascript-split-regular-expression-keep-the-delimiter/33252569#33252569

use the (?=pattern) lookahead pattern in the regex example

var string = '500x500-11*90~1+1';
string = string.replace(/(?=[$-/:-?{-~!"^_`\[\]])/gi, ",");
string = string.split(",");

this will give you the following result.

[ '500x500', '-11', '*90', '~1', '+1' ]

Can also be directly split

string = string.split(/(?=[$-/:-?{-~!"^_`\[\]])/gi);

giving the same result

[ '500x500', '-11', '*90', '~1', '+1' ]

Solution 5 - Javascript

I made a modification to jichi's answer, and put it in a function which also supports multiple letters.

String.prototype.splitAndKeep = function(separator, method='seperate'){
    var str = this;
    if(method == 'seperate'){
        str = str.split(new RegExp(`(${separator})`, 'g'));
    }else if(method == 'infront'){
        str = str.split(new RegExp(`(?=${separator})`, 'g'));
    }else if(method == 'behind'){
        str = str.split(new RegExp(`(.*?${separator})`, 'g'));
        str = str.filter(function(el){return el !== "";});
    }
    return str;
};

jichi's answers 3rd method would not work in this function, so I took the 4th method, and removed the empty spaces to get the same result.

edit: second method which excepts an array to split char1 or char2

String.prototype.splitAndKeep = function(separator, method='seperate'){
    var str = this;
    function splitAndKeep(str, separator, method='seperate'){
        if(method == 'seperate'){
            str = str.split(new RegExp(`(${separator})`, 'g'));
        }else if(method == 'infront'){
            str = str.split(new RegExp(`(?=${separator})`, 'g'));
        }else if(method == 'behind'){
            str = str.split(new RegExp(`(.*?${separator})`, 'g'));
            str = str.filter(function(el){return el !== "";});
        }
        return str;
    }
    if(Array.isArray(separator)){
        var parts = splitAndKeep(str, separator[0], method);
        for(var i = 1; i < separator.length; i++){
            var partsTemp = parts;
            parts = [];
            for(var p = 0; p < partsTemp.length; p++){
                parts = parts.concat(splitAndKeep(partsTemp[p], separator[i], method));
            }
        }
        return parts;
    }else{
        return splitAndKeep(str, separator, method);
    }
};

usage:

str = "first1-second2-third3-last";

str.splitAndKeep(["1", "2", "3"]) == ["first", "1", "-second", "2", "-third", "3", "-last"];

str.splitAndKeep("-") == ["first1", "-", "second2", "-", "third3", "-", "last"];

Solution 6 - Javascript

If you group the split pattern, its match will be kept in the output and it is by design:

> If separator is a regular expression with capturing parentheses, then > each time separator matches, the results (including any undefined > results) of the capturing parentheses are spliced into the output > array. > > https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split#description

You don't need a lookahead or global flag unless your search pattern uses one.

const str = How much wood would a woodchuck chuck, if a woodchuck could chuck wood?

const result = str.split(/(\s+)/); console.log(result);

// We can verify the result const isSame = result.join('') === str; console.log({ isSame });

You can use multiple groups. You can be as creative as you like and what remains outside the groups will be removed:

const str = How much wood would a woodchuck chuck, if a woodchuck could chuck wood?

const result = str.split(/(\s+)(\w{1,2})\w+/); console.log(result, result.join(''));

Solution 7 - Javascript

An extension function splits string with substring or RegEx and the delimiter is putted according to second parameter ahead or behind.

    String.prototype.splitKeep = function (splitter, ahead) {
        var self = this;
        var result = [];
        if (splitter != '') {
            var matches = [];
            // Getting mached value and its index
            var replaceName = splitter instanceof RegExp ? "replace" : "replaceAll";
            var r = self[replaceName](splitter, function (m, i, e) {
                matches.push({ value: m, index: i });
                return getSubst(m);
            });
            // Finds split substrings
            var lastIndex = 0;
            for (var i = 0; i < matches.length; i++) {
                var m = matches[i];
                var nextIndex = ahead == true ? m.index : m.index + m.value.length;
                if (nextIndex != lastIndex) {
                    var part = self.substring(lastIndex, nextIndex);
                    result.push(part);
                    lastIndex = nextIndex;
                }
            };
            if (lastIndex < self.length) {
                var part = self.substring(lastIndex, self.length);
                result.push(part);
            };
            // Substitution of matched string
            function getSubst(value) {
                var substChar = value[0] == '0' ? '1' : '0';
                var subst = '';
                for (var i = 0; i < value.length; i++) {
                    subst += substChar;
                }
                return subst;
            };
        }
        else {
            result.add(self);
        };
        return result;
    };

The test:

    test('splitKeep', function () {
        // String
        deepEqual("1231451".splitKeep('1'), ["1", "231", "451"]);
        deepEqual("123145".splitKeep('1', true), ["123", "145"]);
        deepEqual("1231451".splitKeep('1', true), ["123", "145", "1"]);
        deepEqual("hello man how are you!".splitKeep(' '), ["hello ", "man ", "how ", "are ", "you!"]);
        deepEqual("hello man how are you!".splitKeep(' ', true), ["hello", " man", " how", " are", " you!"]);
        // Regex
        deepEqual("mhellommhellommmhello".splitKeep(/m+/g), ["m", "hellomm", "hellommm", "hello"]);
        deepEqual("mhellommhellommmhello".splitKeep(/m+/g, true), ["mhello", "mmhello", "mmmhello"]);
    });

Solution 8 - Javascript

I've been using this:

String.prototype.splitBy = function (delimiter) {
  var 
    delimiterPATTERN = '(' + delimiter + ')', 
    delimiterRE = new RegExp(delimiterPATTERN, 'g');
  
  return this.split(delimiterRE).reduce((chunks, item) => {
    if (item.match(delimiterRE)){
      chunks.push(item)
    } else {
      chunks[chunks.length - 1] += item
    };
    return chunks
  }, [])
}

Except that you shouldn't mess with String.prototype, so here's a function version:

var splitBy = function (text, delimiter) {
  var 
    delimiterPATTERN = '(' + delimiter + ')', 
    delimiterRE = new RegExp(delimiterPATTERN, 'g');
  
  return text.split(delimiterRE).reduce(function(chunks, item){
    if (item.match(delimiterRE)){
      chunks.push(item)
    } else {
      chunks[chunks.length - 1] += item
    };
    return chunks
  }, [])
}

So you could do:

var haystack = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"
var needle =  '<br \/>&#?[a-zA-Z0-9]+;';
var result = splitBy(haystack , needle)
console.log( JSON.stringify( result, null, 2) )

And you'll end up with:

[
  "<br />&dagger; bbbb",
  "<br />&Dagger; cccc"
]

Solution 9 - Javascript

Most of the existing answers predate the introduction of lookbehind assertions in JavaScript in 2018. You didn't specify how you wanted the delimiters to be included in the result. One typical use case would be sentences delimited by punctuation ([.?!]), where one would want the delimiters to be included at the ends of the resulting strings. This corresponds to the fourth case in the accepted answer, but as noted there, that solution only works for single characters. Arbitrary strings with the delimiters appended at the end can be formed with a lookbehind assertion:

'It is. Is it? It is!'.split(/(?<=[.?!])/)
/* [ 'It is.', ' Is it?', ' It is!' ] */

Solution 10 - Javascript

I've also came up with this solution. No regex needed, very readable.

const str = "hello world what a great day today balbla"
const separatorIndex = str.indexOf("great")
const parsedString = str.slice(separatorIndex)

console.log(parsedString)

or in one line:

const str = "hello what a great day today balbla" const parsedString = str.indexOf(str.indexOf("great")) console.log(parsedString)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMilos CuculovicView Question on Stackoverflow
Solution 1 - JavascriptjichiView Answer on Stackoverflow
Solution 2 - JavascriptJonView Answer on Stackoverflow
Solution 3 - JavascriptTorsten WalterView Answer on Stackoverflow
Solution 4 - JavascriptFryView Answer on Stackoverflow
Solution 5 - JavascriptSwiftNinjaProView Answer on Stackoverflow
Solution 6 - JavascriptsnnsnnView Answer on Stackoverflow
Solution 7 - JavascriptBerezhView Answer on Stackoverflow
Solution 8 - Javascriptuser2467065View Answer on Stackoverflow
Solution 9 - JavascriptjorikiView Answer on Stackoverflow
Solution 10 - JavascriptDoneDeal0View Answer on Stackoverflow