JS regex to split by line

Javascript, Regex, Newline

Javascript Problem Overview


How do you split a long piece of text into separate lines? Why does this return line1 twice?

/^(.*?)$/mg.exec('line1\r\nline2\r\n');

> ["line1", "line1"]

I turned on the multi-line modifier to make ^ and $ match beginning and end of lines. I also turned on the global modifier to capture all lines.

I wish to use a regex split and not String.split because I'll be dealing with both Linux \n and Windows \r\n line endings.

Javascript Solutions


Solution 1 - Javascript

arrayOfLines = lineString.match(/[^\r\n]+/g);

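As Tim said, line1 is both the entire match and the first capture. regex.exec(string) returns only the first match it finds, even with the global modifier (repeated calls on the same regex object are needed to reach later matches), whereas string.match(regex) honours the global modifier and returns all matches.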
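A minimal sketch of the difference, using the input from the question. The variable names (text, first, nonEmptyLines, viaExec) are only illustrative:

```javascript
// Input from the question: two lines with Windows line endings.
const text = 'line1\r\nline2\r\n';

// exec() returns a single match per call. Index 0 is the whole match and
// index 1 is the first capturing group, hence "line1" appearing twice.
const first = /^(.*?)$/mg.exec(text);
console.log(Array.from(first)); // ["line1", "line1"]

// match() with the g flag returns every whole match and no capture groups.
const nonEmptyLines = text.match(/[^\r\n]+/g);
console.log(nonEmptyLines); // ["line1", "line2"]

// A g-flagged regex keeps its position in lastIndex, so calling exec()
// repeatedly on the same regex object walks through the matches.
const re = /[^\r\n]+/g;
const viaExec = [];
let m;
while ((m = re.exec(text)) !== null) {
  viaExec.push(m[0]);
}
console.log(viaExec); // ["line1", "line2"]
```

Note that match() with the g flag returns null rather than an empty array when nothing matches, so `text.match(/[^\r\n]+/g) || []` may be safer for possibly empty input.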

Solution 2 - Javascript

Use

result = subject.split(/\r?\n/);

Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.
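As a quick illustration of the split above, here is a small sketch with a made-up input that mixes both line endings. One thing to be aware of: a trailing line ending produces a trailing empty string.

```javascript
// Hypothetical input mixing Windows (\r\n) and Linux (\n) line endings.
const subject = 'line1\r\nline2\nline3';

// \r? makes the carriage return optional, so both endings are consumed whole.
const result = subject.split(/\r?\n/);
console.log(result); // ["line1", "line2", "line3"]

// The string from the question ends with a line ending, so the last
// element of the array is an empty string.
const withTrailing = 'line1\r\nline2\r\n'.split(/\r?\n/);
console.log(withTrailing); // ["line1", "line2", ""]
```

If the trailing empty entry is unwanted, it can be dropped afterwards, for example with pop() when the last element is ''.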

Solution 3 - Javascript

I am assuming the following constitute newlines:

  1. \r followed by \n
  2. \n followed by \r
  3. \n present alone
  4. \r present alone

Use

var re=/\r\n|\n\r|\n|\r/g;

arrayOfLines=lineString.replace(re,"\n").split("\n");

for an array of all lines, including the empty ones.

Or use

arrayOfLines = lineString.match(/[^\r\n]+/g); 

for an array of non-empty lines.
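The two variants can be compared side by side. This is only a sketch; the sample string is made up so that it contains each of the four newline forms listed above plus one empty line:

```javascript
// Hypothetical sample: \r\n, an empty line, \n\r and a lone \r.
const lineString = 'a\r\n\r\nb\n\rc\rd';

// Two-character sequences are listed first so they are consumed as one newline.
const re = /\r\n|\n\r|\n|\r/g;

// Variant 1: normalise every newline form to \n, then split. Empty lines are kept.
const allLines = lineString.replace(re, '\n').split('\n');
console.log(allLines); // ["a", "", "b", "c", "d"]

// Variant 2: match runs of non-newline characters. Empty lines are dropped.
// match() returns null when nothing matches, hence the fallback to [].
const nonEmptyLines = lineString.match(/[^\r\n]+/g) || [];
console.log(nonEmptyLines); // ["a", "b", "c", "d"]
```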

Solution 4 - Javascript

Even simpler regex that handles all line ending combinations, even mixed in the same file, and removes empty lines as well:

var lines = text.split(/[\r\n]+/g);

With whitespace trimming:

var lines = text.trim().split(/\s*[\r\n]+\s*/g);
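A small sketch of both variants on a made-up input. Note one edge case: without the trim(), text that begins or ends with a line ending still yields an empty string at that end of the array.

```javascript
// Hypothetical input with mixed endings, blank lines and padding.
const text = '  first  \r\n\r\n  second\n\n\nthird  \n';

// Runs of \r and \n collapse into one separator, so blank lines in the
// middle disappear; the final \n still leaves a trailing empty string.
const lines = text.split(/[\r\n]+/);
console.log(lines); // ["  first  ", "  second", "third  ", ""]

// trim() removes leading and trailing whitespace (including line endings),
// and \s* on both sides of the separator strips per-line padding.
const trimmedLines = text.trim().split(/\s*[\r\n]+\s*/);
console.log(trimmedLines); // ["first", "second", "third"]
```

The g flag used in the answer is harmless but has no effect on split(), which always splits at every match.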

Solution 5 - Javascript

First replace all \r\n with \n, then String.split.
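A minimal sketch of that approach, with a made-up input string. It assumes only \r\n and \n occur; a lone \r would be left inside the line.

```javascript
// Hypothetical input mixing Windows and Linux line endings.
const text = 'line1\r\nline2\nline3\r\n';

// Step 1: normalise \r\n to \n (the g flag replaces every occurrence).
const normalised = text.replace(/\r\n/g, '\n');

// Step 2: a plain string split now handles both kinds of line ending.
const lines = normalised.split('\n');
console.log(lines); // ["line1", "line2", "line3", ""]
```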

Solution 6 - Javascript

Unicode Compliant Line Splitting

Unicode® Technical Standard #18 defines what constitutes line boundaries. That same section also gives a regular expression to match all line boundaries. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):

const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)

I don't understand why the negative look-ahead part ((?!\r\n)) is necessary, but that is what is suggested in the Unicode document.

The above document recommends defining a regular expression meta-character for matching all line ending characters and sequences. Perl has \R for that. Unfortunately, JavaScript does not include such a meta-character. Alas, I could not even find a TC39 proposal for that.
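A short usage sketch for the splitLines function above. The sample strings are made up, and splitLinesNoLookahead is a hypothetical variant added only for comparison. On these samples it gives the same result, which suggests (but does not prove) that the look-ahead is redundant for a plain split in JavaScript, where the \r\n alternative is tried first at each position.

```javascript
// Function from the answer above (regex as given in UTS #18).
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/);

// Sample mixing CRLF, LINE SEPARATOR (U+2028), NEL (U+0085) and LF.
const sample = 'a\r\nb\u2028c\x85d\n\ne';
const parts = splitLines(sample);
console.log(parts); // ["a", "b", "c", "d", "", "e"]

// The range [\n-\r] spans U+000A to U+000D: LF, VT, FF and CR.
const controls = splitLines('x\vy\fz\rw');
console.log(controls); // ["x", "y", "z", "w"]

// Assumption, not from the answer: the same regex without the look-ahead.
const splitLinesNoLookahead = s => s.split(/\r\n|[\n-\r\x85\u2028\u2029]/);
console.log(splitLinesNoLookahead(sample)); // same as parts for this sample
```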

Solution 7 - Javascript

http://jsfiddle.net/uq55en5o/

var lines = text.match(/^.*((\r\n|\n|\r)|$)/gm);

I have done something like this; the link above is my fiddle.
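Unlike the split-based answers, this match keeps the line terminator at the end of each entry. A small sketch with a made-up input; the stripping step is an addition for illustration, not part of the answer:

```javascript
// Hypothetical input mixing \r\n and \n, with no trailing line ending.
const text = 'a\r\nb\nc';

// Each match is a full line including its terminator (or the end of input).
const lines = text.match(/^.*((\r\n|\n|\r)|$)/gm);
console.log(lines); // ["a\r\n", "b\n", "c"]

// Optional: strip the terminators if only the line contents are needed.
const stripped = lines.map(line => line.replace(/[\r\n]+$/, ''));
console.log(stripped); // ["a", "b", "c"]
```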

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | JoJo | View Question on Stackoverflow
Solution 1 - Javascript | ReactiveRaven | View Answer on Stackoverflow
Solution 2 - Javascript | Tim Pietzcker | View Answer on Stackoverflow
Solution 3 - Javascript | Arup Hore | View Answer on Stackoverflow
Solution 4 - Javascript | ciscoheat | View Answer on Stackoverflow
Solution 5 - Javascript | Tim | View Answer on Stackoverflow
Solution 6 - Javascript | raphinesse | View Answer on Stackoverflow
Solution 7 - Javascript | Abhijit_Srikumar | View Answer on Stackoverflow