How to split String with some separator but without removing that separator in Java?

Java, Regex, String, Split

Java Problem Overview


I'm facing a problem splitting a String.

I want to split a String on some separator, but without losing that separator.

When we use the somestring.split(String separator) method in Java, it splits the String but removes the separator from the result. I don't want this to happen.

For example, given the code below:

String string1 = "Ram-sita-laxman";
String separator = "-";
string1.split(separator);

Output:

[Ram, sita, laxman]

but I want the result like the one below instead:

[Ram, -sita, -laxman]

Is there a way to get output like this?

Java Solutions


Solution 1 - Java

string1.split("(?=-)");

This works because split actually takes a regular expression. What you're actually seeing is a "zero-width positive lookahead".
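If the separator lives in a variable, as in the question, the same lookahead can be built at runtime. The following is only a minimal sketch: the class name `SplitKeepSeparator` and the helper `splitBefore` are made up for illustration, and it assumes the separator should be treated as literal text, which is why it goes through `Pattern.quote`.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitKeepSeparator {

    // Hypothetical helper: splits 'input' just before each occurrence of
    // 'separator', so every piece after the first starts with the separator.
    static String[] splitBefore(String input, String separator) {
        // Pattern.quote makes characters such as '.' or '|' match literally.
        String regex = "(?=" + Pattern.quote(separator) + ")";
        return input.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBefore("Ram-sita-laxman", "-")));
        // prints [Ram, -sita, -laxman]
        System.out.println(Arrays.toString(splitBefore("a.b.c", ".")));
        // prints [a, .b, .c]
    }
}
```

Without the quoting step, a separator such as "." would be read as the regex "any character" and the split would happen before every character.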

I would love to explain more but my daughter wants to play tea party. :)

Edit: Back!

To explain this, I will first show you a different split operation:

"Ram-sita-laxman".split("");

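This splits your string on every zero-length string. There is a zero-length string between every character. Therefore, the result is:

["R", "a", "m", "-", "s", "i", "t", "a", "-", "l", "a", "x", "m", "a", "n"]

(On Java 7 and earlier the array also starts with an empty string "". Since Java 8, a zero-width match at the very beginning of the input no longer produces that leading empty string.)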

Now, I modify my regular expression ("") to only match zero-length strings if they are followed by a dash.

"Ram-sita-laxman".split("(?=-)");
["Ram", "-sita", "-laxman"]

In that example, the ?= means "lookahead". More specifically, it means "positive lookahead". Why the "positive"? Because you can also have negative lookahead (?!) which will split on every zero-length string that is not followed by a dash:

"Ram-sita-laxman".split("(?!-)");
["R", "a", "m-", "s", "i", "t", "a-", "l", "a", "x", "m", "a", "n"]

(On Java 7 and earlier, the array starts with an extra "".)

You can also have positive lookbehind (?<=) which will split on every zero-length string that is preceded by a dash:

"Ram-sita-laxman".split("(?<=-)");
["Ram-", "sita-", "laxman"]

Finally, you can also have negative lookbehind (?<!) which will split on every zero-length string that is not preceded by a dash:

"Ram-sita-laxman".split("(?<!-)");
["R", "a", "m", "-s", "i", "t", "a", "-l", "a", "x", "m", "a", "n"]

(On Java 7 and earlier, the array starts with an extra "".)

These four expressions are collectively known as the lookaround expressions.
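To try all four side by side, something like the small program below should do. It is a sketch rather than anything official: the class name `LookaroundSplitDemo` and the helper `splitSample` are invented for this example, and the outputs in the comments assume Java 8 or later.

```java
import java.util.Arrays;

class LookaroundSplitDemo {

    static final String SAMPLE = "Ram-sita-laxman";

    // Hypothetical helper: splits the sample string with the given regex.
    static String[] splitSample(String regex) {
        return SAMPLE.split(regex);
    }

    public static void main(String[] args) {
        // Positive lookahead: split before each dash.
        System.out.println(Arrays.toString(splitSample("(?=-)")));
        // prints [Ram, -sita, -laxman]

        // Negative lookahead: split wherever the next character is not a dash.
        System.out.println(Arrays.toString(splitSample("(?!-)")));
        // prints [R, a, m-, s, i, t, a-, l, a, x, m, a, n]

        // Positive lookbehind: split after each dash.
        System.out.println(Arrays.toString(splitSample("(?<=-)")));
        // prints [Ram-, sita-, laxman]

        // Negative lookbehind: split wherever the previous character is not a dash.
        System.out.println(Arrays.toString(splitSample("(?<!-)")));
        // prints [R, a, m, -s, i, t, a, -l, a, x, m, a, n]
    }
}
```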

Bonus: Putting them together

I just wanted to show an example I encountered recently that combines two of the lookaround expressions. Suppose you wish to split a CapitalCase (PascalCase) identifier into its tokens:

"MyAwesomeClass" => ["My", "Awesome", "Class"]

You can accomplish this using this regular expression:

"MyAwesomeClass".split("(?<=[a-z])(?=[A-Z])");

This splits on every zero-length string that is preceded by a lower case letter ((?<=[a-z])) and followed by an upper case letter ((?=[A-Z])).

This technique also works with camelCase identifiers.
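Here is a rough, runnable version of that idea. The class name `IdentifierSplitter` and the method `splitWords` are hypothetical, and the third input is included only to show a limitation: a run of capitals is not broken up, because the split needs a lower case letter on the left.

```java
import java.util.Arrays;

class IdentifierSplitter {

    // Hypothetical helper: splits between a lower case ASCII letter and an
    // upper case ASCII letter, keeping every character of the identifier.
    static String[] splitWords(String identifier) {
        return identifier.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("MyAwesomeClass")));
        // prints [My, Awesome, Class]
        System.out.println(Arrays.toString(splitWords("myAwesomeClass")));
        // prints [my, Awesome, Class]
        System.out.println(Arrays.toString(splitWords("parseHTTPResponse")));
        // prints [parse, HTTPResponse]
    }
}
```

Note that [a-z] and [A-Z] only cover ASCII letters. For identifiers with other alphabets, \p{Ll} and \p{Lu} could be used in their place.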

Solution 2 - Java

It's a bit dodgy, but you could introduce a dummy separator using a replace function. I don't know the Java methods, but in C# it could be something like:

string1.Replace("-", "#-").Split("#");

Of course, you'd need to pick a dummy separator that's guaranteed not to be anywhere else in the string.
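In Java, the same trick might look roughly like the sketch below. The class `DummySeparatorSplit` and the method `splitWithMarker` are made-up names, and the up-front check on the marker is my own addition to guard against the caveat above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DummySeparatorSplit {

    // Hypothetical helper: inserts 'marker' before every 'separator', then
    // splits on the marker so the separator itself survives in each piece.
    static String[] splitWithMarker(String input, String separator, String marker) {
        if (input.contains(marker)) {
            throw new IllegalArgumentException("marker already occurs in input: " + marker);
        }
        // String.replace works on literal text, not on regular expressions.
        String marked = input.replace(separator, marker + separator);
        // Pattern.quote keeps the marker literal when split() compiles it as a regex.
        return marked.split(Pattern.quote(marker));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithMarker("Ram-sita-laxman", "-", "#")));
        // prints [Ram, -sita, -laxman]
    }
}
```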

Solution 3 - Java

Adam hit the nail on the head! I used his answer to figure out how to insert file name text from the file dialog browser into a rich text box. The problem I ran into was adding a new line at each "\" in the file string: the string.split command was splitting at the \ and deleting it. Using part of Adam's code, I was able to create a new line after each \ in the file name.

Here is the code I used (C#, Windows Forms):

OpenFileDialog fd = new OpenFileDialog();
fd.Multiselect = true;
fd.ShowDialog();

foreach (string filename in fd.FileNames)
{
    string currentfiles = uxFiles.Text;
    string value = "\r\n" + filename;

    // This line allows the Regex command to split after each \ in the filename.
    string[] lines = Regex.Split(value, @"(?<=\\)");

    foreach (string line in lines)
    {
        uxFiles.Text = uxFiles.Text + line + "\r\n";
    }
}
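Since the question is about Java, here is roughly how the same lookbehind split could be written there. This is a sketch only: the class `SplitAfterBackslash`, its method, and the sample path are all invented for illustration. The backslash has to be escaped twice, once for the Java string literal and once for the regex.

```java
import java.util.Arrays;

class SplitAfterBackslash {

    // Hypothetical helper: splits after every backslash and keeps the
    // backslash at the end of each piece.
    static String[] splitAfterBackslash(String path) {
        // "(?<=\\\\)" in Java source is the regex (?<=\\), a lookbehind
        // for one literal backslash.
        return path.split("(?<=\\\\)");
    }

    public static void main(String[] args) {
        // Made-up sample path, written with escaped backslashes.
        String path = "C:\\Users\\me\\notes.txt";
        System.out.println(Arrays.toString(splitAfterBackslash(path)));
        // prints [C:\, Users\, me\, notes.txt]
    }
}
```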

Enjoy!

Walrusking

Solution 4 - Java

A way to do this is to split your string, then add your separator at the beginning of each extracted string except the first one.

Solution 5 - Java

String separator = "-";
String[] splitStrings = string1.split(separator);
// Re-attach the separator to every piece except the first.
for (int i = 1; i < splitStrings.length; i++)
{
    splitStrings[i] = separator + splitStrings[i];
}

That is the code for LadaRaider's answer.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | sag | View Question on Stack Overflow
Solution 1 - Java | Adam Paynter | View Answer on Stack Overflow
Solution 2 - Java | Andrew Cooper | View Answer on Stack Overflow
Solution 3 - Java | Walrusking | View Answer on Stack Overflow
Solution 4 - Java | Dalmas | View Answer on Stack Overflow
Solution 5 - Java | mad | View Answer on Stack Overflow