How to remove carriage returns and new lines in Postgresql?

RegexPostgresql

Regex Problem Overview


All,

I am stuck again trying to get my data in a format that I need it in. I have a text field that looks like this.

> "deangelo 001 deangelo > > local origin of name: italain > > from the american name deangelo > > meaning: of the angels > > emotional spectrum • he is a fountain of joy for all. > > personal integrity • his good name is his most precious asset. > personality • it’s hard to soar with eagles when you’re surrounded by > turkeys! relationships • starts slowly, but a relationship with > deangelo builds over time. travel & leisure • a trip of a lifetime > is in his future. > > career & money • a gifted child, deangelo will need to be > challenged constantly. > > life’s opportunities • joy and happiness await this blessed person. > > deangelo’s lucky numbers: 12 • 38 • 18 • 34 • 29 • 16 > > "

What would the best way be in Postgresql to remove the carriage returns and new lines? I've tried several things and none of them want to behave.

select regexp_replace(field, E'\r\c', '  ', 'g') from mytable
    WHERE id = 5520805582

SELECT regexp_replace(field, E'[^\(\)\&\/,;\*\:.\>\<[:space:]a-zA-Z0-9-]', ' ')
    FROM mytable
    WHERE field~ E'[^\(\)\&\/,;\*\:.\<\>[:space:]a-zA-Z0-9-]'
    AND id = 5520805582;

Thanks in advance, Adam

Regex Solutions


Solution 1 - Regex

select regexp_replace(field, E'[\\n\\r]+', ' ', 'g' )

read the manual http://www.postgresql.org/docs/current/static/functions-matching.html

Solution 2 - Regex

select regexp_replace(field, E'[\\n\\r\\u2028]+', ' ', 'g' )

I had the same problem in my postgres d/b, but the newline in question wasn't the traditional ascii CRLF, it was a unicode line separator, character U2028. The above code snippet will capture that unicode variation as well.

Update... although I've only ever encountered the aforementioned characters "in the wild", to follow lmichelbacher's advice to translate even more [unicode newline-like][1] characters, use this:

select regexp_replace(field, E'[\\n\\r\\f\\u000B\\u0085\\u2028\\u2029]+', ' ', 'g' )

[1]: https://en.wikipedia.org/wiki/Newline#Unicode "newline-like"

Solution 3 - Regex

OP asked specifically about regexes since it would appear there's concern for a number of other characters as well as newlines, but for those just wanting strip out newlines, you don't even need to go to a regex. You can simply do:

select replace(field,E'\n','');

I think this is an SQL-standard behavior, so it should extend back to all but perhaps the very earliest versions of Postgres. The above tested fine for me in 9.4 and 9.2

Solution 4 - Regex

In the case you need to remove line breaks from the begin or end of the string, you may use this:

UPDATE table 
SET field = regexp_replace(field, E'(^[\\n\\r]+)|([\\n\\r]+$)', '', 'g' );

Have in mind that the hat ^ means the begin of the string and the dollar sign $ means the end of the string.

Hope it help someone.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionaeupinhereView Question on Stackoverflow
Solution 1 - RegexvalgogView Answer on Stackoverflow
Solution 2 - RegexpbnelsonView Answer on Stackoverflow
Solution 3 - RegexSeldom 'Where's Monica' NeedyView Answer on Stackoverflow
Solution 4 - RegexBernharView Answer on Stackoverflow