Search and replace in bash using regular expressions

Regex Problem Overview


I've seen this example:

hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}

Which follows this syntax: ${variable//pattern/replacement}

Unfortunately the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters).

How can I search/replace a string using full regex syntax?

Regex Solutions


Solution 1 - Regex

Use sed:

MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

Note that multiple -e expressions are applied in order. Also, the g flag makes each expression replace every occurrence in the input, not just the first.
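To illustrate why the order matters, here is a small sketch (the sample string is made up for the illustration). Swapping the two expressions changes the result, because the N produced by the digit rule is itself a letter:

```shell
sample=a1b2

# Letters first, then digits: the digits are still digits when the second rule runs.
letters_first=$(printf '%s\n' "$sample" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g')

# Digits first: each N is then matched by the letter rule and becomes X.
digits_first=$(printf '%s\n' "$sample" | sed -e 's/[0-9]/N/g' -e 's/[a-zA-Z]/X/g')

echo "$letters_first"   # XNXN
echo "$digits_first"    # XXXX
```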

The same approach works with your favorite tool, such as perl or awk, e.g.:

echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'

This may allow you to do more creative matches... For example, in the snippet above, the numeric replacement is not applied unless the first expression matched (because Perl's and short-circuits). And of course, you have the full language support of Perl to do your bidding...
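A quick way to see that behaviour, assuming perl is installed, is to feed it a digits-only string (a made-up input) and compare and with a plain statement separator:

```shell
digits_only=12345

# With `and`, the second s/// runs only if the first one replaced something.
with_and=$(printf '%s\n' "$digits_only" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g')

# With `;`, both substitutions always run.
with_semicolon=$(printf '%s\n' "$digits_only" | perl -pe 's/[a-zA-Z]/X/g; s/[0-9]/N/g')

echo "$with_and"         # 12345
echo "$with_semicolon"   # NNNNN
```

If you want both replacements unconditionally, the ; form is probably what you are after.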

Solution 2 - Regex

This actually can be done in pure bash:

hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
  hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"

...yields...

howareyoudoingtodday
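To see what a single pass of that loop does, this sketch runs the match once and prints the two capture groups (the names before and after are just for illustration). Because the leading (.*) is greedy, each pass strips only the last remaining digit:

```shell
hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'

if [[ $hello =~ $re ]]; then
    before=${BASH_REMATCH[1]}   # greedy: everything up to the last digit
    after=${BASH_REMATCH[2]}    # everything after that digit
    echo "before: $before"      # before: ho02123ware38384you443d34o3434ingtod3838
    echo "after:  $after"       # after:  day
fi
```

The loop keeps going until no digit is left, so it ends up with the same result as removing all of them at once.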

Solution 3 - Regex

These examples also work in bash; there is no need to use sed:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X}
echo "${MYVAR//[0-9]/N}"

You can also use POSIX character classes inside bracket expressions:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X}
echo "${MYVAR//[[:digit:]]/N}"

output

XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

What @Lanaru wanted to know, however, if I understand the question correctly, is why the "full" or PCRE shorthand classes (\s, \S, \w, \W, \d, \D, etc.) don't work the way they do in PHP, Ruby, Python, etc. These extensions come from Perl-compatible regular expressions (PCRE) and are not part of POSIX regular expressions, so shell tools may not support them. Note also that ${variable//pattern/replacement} takes a glob-style pattern rather than a regular expression.

These don't work:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}


#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'

output with all literal "d" characters removed

ho02123ware38384you44334o3434ingto38384ay

but the following does work as expected

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'

output

howareyoudoingtodday
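If perl is not an option, the POSIX character classes cover the most common shorthands. A rough sketch of the mapping (the sample text and variable names are made up for the example):

```shell
text='id: 42, name: bob_7!'

# \d -> [[:digit:]]
no_digits=${text//[[:digit:]]/}
# \s -> [[:space:]]
no_spaces=${text//[[:space:]]/}
# \W -> [^[:alnum:]_]  (delete everything that is not a "word" character)
word_chars=${text//[^[:alnum:]_]/}
# The same classes work inside sed bracket expressions.
via_sed=$(printf '%s\n' "$text" | sed 's/[[:digit:]]//g')

echo "$no_digits"    # id: , name: bob_!
echo "$no_spaces"    # id:42,name:bob_7!
echo "$word_chars"   # id42namebob_7
echo "$via_sed"      # id: , name: bob_!
```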

Hope that clarifies things a bit more, but if you are not confused yet, try this on Mac OS X, which has the REG_ENHANCED flag enabled:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'

On most flavours of *nix you will only see the following output:

d
d
d

nJoy!

Solution 4 - Regex

If you are making repeated calls and are concerned with performance, this test reveals that the bash method is ~15x faster than forking to sed, and likely faster than any other external process.

hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X

P1=$(date +%s)

for i in {1..10000}
do
   echo $hello | sed s/X//g > /dev/null
done

P2=$(date +%s)
echo $(( P2 - P1 ))   # $[...] is deprecated; $(( ... )) is the standard arithmetic form

for i in {1..10000}
do
   echo ${hello//X/} > /dev/null
done

P3=$(date +%s)
echo $(( P3 - P2 ))   # $[...] is deprecated; $(( ... )) is the standard arithmetic form
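Since date +%s only has one-second resolution, you may get a clearer picture from bash's time keyword. This is only a sketch: the iteration count is an arbitrary small number chosen so that it finishes quickly, and the timings will differ from machine to machine.

```shell
hello=123456789X123456789X123456789X
iterations=200   # assumption: kept small so the sketch finishes quickly

TIMEFORMAT='%R seconds'   # print elapsed wall-clock time only

echo "sed:"
time for ((i = 0; i < iterations; i++)); do
    printf '%s\n' "$hello" | sed 's/X//g' > /dev/null
done

echo "parameter expansion:"
time for ((i = 0; i < iterations; i++)); do
    printf '%s\n' "${hello//X/}" > /dev/null
done
```

The timings are written to stderr, one line per loop.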

Solution 5 - Regex

Use [[:digit:]] (note the double brackets) as the pattern:

$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday

Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).

Solution 6 - Regex

I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple $1, $2, etc. backreferences...

#!/usr/bin/env bash

############################################
###  resub - regex substitution in bash  ###
############################################

resub() {
    local match="$1" subst="$2" tmp

    if [[ -z $match ]]; then
        echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
        return 1
    fi

    ### First, convert "$1" to "$BASH_REMATCH[1]" and 'single-quote' for later eval-ing...

    ### Utility function to 'single-quote' a list of strings
    squot() { local a=(); for i in "$@"; do a+=( $(echo \'${i//\'/\'\"\'\"\'}\' )); done; echo "${a[@]}"; }

    tmp=""
    while [[ $subst =~ (.*)\${([0-9]+)}(.*) ]] || [[ $subst =~ (.*)\$([0-9]+)(.*) ]]; do
        tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
        subst="${BASH_REMATCH[1]}"
    done
    subst="$(squot "${subst}")${tmp}"

    ### Now start (globally) substituting

    tmp=""
    while IFS= read -r line; do
        tmp=""    # reset per line; otherwise output accumulates across input lines
        while [[ $line =~ $match(.*) ]]; do
            eval tmp='"${tmp}${line%${BASH_REMATCH[0]}}"'"${subst}"
            line="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
        done
        echo "${tmp}${line}"
    done
}

resub "$@"

##################
###  EXAMPLES  ###
##################

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
###    The slow brown fox jumps slowly over the lazy dog

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
###    The slow brown sheep jumps quickly over the lazy dog

###  % animal="sheep"
###  % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
###    The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog

###  % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
###    one four three two five

###  % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
###    XXX two XXX four five

###  % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
###    XXX two YYY four five XXX six YYY seven eight

H/T to @Charles Duffy re: (.*)$match(.*)

Solution 7 - Regex

This example searches the input hello ugly world for the regex bad|ugly and replaces the match with nice:

#!/bin/bash

# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input              Example:  hello ugly world
# arg2 = search regex       Example:  bad|ugly
# arg3 = replace            Example:  nice
function regex_replace()
{
  # $1 = hello ugly world
  # $2 = bad|ugly
  # $3 = nice

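  # REGEX
  # Note: bash's =~ uses POSIX ERE, which has no lazy quantifiers, so the
  # leading (.*) is greedy and the LAST match of $2 in the input is replaced.
  local re="(.*)($2)(.*)"

  if [[ $1 =~ $re ]]; then
    # if there is a match

    # ${BASH_REMATCH[0]} = hello ugly world
    # ${BASH_REMATCH[1]} = "hello " (with trailing space)
    # ${BASH_REMATCH[2]} = ugly
    # ${BASH_REMATCH[3]} = " world" (with leading space)

    # hello + nice + world
    echo "${BASH_REMATCH[1]}$3${BASH_REMATCH[3]}"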
  else    
    # if no match return original input  hello ugly world
    echo "$1"
  fi    
}

# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'

# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"
exit
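The function above replaces a single match. If you need every match replaced, something along these lines might work. It is a minimal sketch, and regex_gsub is a hypothetical helper name, not part of bash. It walks the input left to right and only rescans the text after each match, so the replacement itself is never matched again. One caveat: a pattern anchored with ^ would be re-applied to each remainder.

```shell
# Hypothetical helper: replace every match of ERE $2 in $1 with the literal $3.
regex_gsub() {
    local input=$1 replacement=$3 out='' whole rest
    local re="($2)(.*)"   # group 1 = the match, last group = text after it

    while [[ -n $input && $input =~ $re ]]; do
        [[ -z ${BASH_REMATCH[1]} ]] && break            # guard against empty matches
        whole=${BASH_REMATCH[0]}                         # match plus the remainder
        rest=${BASH_REMATCH[${#BASH_REMATCH[@]}-1]}      # remainder only
        out+=${input:0:${#input}-${#whole}}$replacement  # text before the match
        input=$rest
    done
    printf '%s\n' "$out$input"
}

regex_gsub 'bad ugly bad world' 'bad|ugly' 'nice'   # nice nice nice world
```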

Solution 8 - Regex

Set the var

hello=ho02123ware38384you443d34o3434ingtod38384day

then, echo with regex replacement on var

echo ${hello//[[:digit:]]/}

and this will print:

howareyoudoingtodday

Extra - if you'd like the opposite (to get the digit characters)

echo ${hello//[![:digit:]]/}

and this will print:

021233838444334343438384
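The bracket patterns above match one character at a time. If you need something closer to the regex [0-9]+ (one or more), bash's extglob option adds repetition operators to these patterns. A small sketch, assuming extglob is acceptable in your script:

```shell
shopt -s extglob   # enables +(...), *(...), ?(...), @(...), !(...)

hello=ho02123ware38384you443d34o3434ingtod38384day

# +([[:digit:]]) matches one or more digits, like the regex [0-9]+
dashed=${hello//+([[:digit:]])/-}
echo "$dashed"   # ho-ware-you-d-o-ingtod-day
```

Each run of digits is replaced by a single dash, rather than one dash per digit.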

Solution 9 - Regex

You can use Python. This is not efficient, but it gets the job done with a somewhat more flexible syntax.

apply on file

The following Python script replaces "FROM" (but not "notFROM") with "TO".

regex_replace.py

import sys
import re

for line in sys.stdin:
    line = re.sub(r'(?<!not)FROM', 'TO', line)
    sys.stdout.write(line)

You can apply that to a text file like this:

$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla

bla  notFROM FROM

bla FROM
bla bla


$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla

bla  notFROM TO

bla TO
bla bla

apply on variable

#!/bin/bash

hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello

PYTHON_CODE=$(cat <<END
import sys
import re

for line in sys.stdin:
    line = re.sub(r'[0-9]', '', line)
    sys.stdout.write(line)
END
)
echo $hello | python -c "$PYTHON_CODE"

output

ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday
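If you would rather not hard-code the pattern in the Python source, one option is to pass it as an argument. This is only a sketch: py_resub is a made-up helper name, and it assumes python3 is on your PATH.

```shell
# Hypothetical helper: regex-replace stdin using Python's re.sub.
# $1 = pattern, $2 = replacement (passed as arguments, not spliced into the code)
py_resub() {
    python3 -c '
import re, sys
pattern, replacement = sys.argv[1], sys.argv[2]
for line in sys.stdin:
    sys.stdout.write(re.sub(pattern, replacement, line))
' "$1" "$2"
}

hello=ho02123ware38384you443d34o3434ingtod38384day
cleaned=$(printf '%s\n' "$hello" | py_resub '\d+' '')
echo "$cleaned"   # howareyoudoingtodday
```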

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author     Original Content on Stackoverflow
Question            Lanaru              View Question on Stackoverflow
Solution 1 - Regex  jheddings           View Answer on Stackoverflow
Solution 2 - Regex  Charles Duffy       View Answer on Stackoverflow
Solution 3 - Regex  nickl-              View Answer on Stackoverflow
Solution 4 - Regex  Josiah DeWitt       View Answer on Stackoverflow
Solution 5 - Regex  yegeniy             View Answer on Stackoverflow
Solution 6 - Regex  Dabe Murphy         View Answer on Stackoverflow
Solution 7 - Regex  Tono Nam            View Answer on Stackoverflow
Solution 8 - Regex  Vladimir Djuricic   View Answer on Stackoverflow
Solution 9 - Regex  Markus Dutschke     View Answer on Stackoverflow