How to extract a value from a string using regex and a shell?

Regex, Shell

Regex Problem Overview


I am in shell and I have this string: 12 BBQ ,45 rofl, 89 lol

Using the regexp: \d+ (?=rofl), I want 45 as a result.

Is it correct to use regex to extract data from a string? The best I have managed is to highlight the value in some online regex editors. Most of the time it removes the value from my string.

I am investigating expr, but all I get is syntax errors.

How can I manage to extract 45 in a shell script?

Regex Solutions


Solution 1 - Regex

You can do this with GNU grep's perl mode:

echo "12 BBQ ,45 rofl, 89 lol" | grep -P '\d+ (?=rofl)' -o
echo "12 BBQ ,45 rofl, 89 lol" | grep --perl-regexp '\d+ (?=rofl)' --only-matching

-P (--perl-regexp) selects Perl-compatible regular expressions. -o (--only-matching) prints only the matched text instead of the whole line.
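One detail worth checking: in the pattern from the question, the space sits outside the lookahead, so it is part of the match and the output is "45 " with a trailing space. The sketch below compares that with a variant that moves the space inside the lookahead. It assumes GNU grep built with PCRE support; the variable names are only for illustration.

```shell
input='12 BBQ ,45 rofl, 89 lol'

# Pattern from the question: the space is outside the lookahead, so it is matched too.
with_space=$(printf '%s\n' "$input" | grep -oP '\d+ (?=rofl)')

# Space moved into the lookahead: only the digits are matched.
result=$(printf '%s\n' "$input" | grep -oP '\d+(?= rofl)')

# Brackets make the trailing space visible.
printf '[%s]\n' "$with_space"   # [45 ]
printf '[%s]\n' "$result"       # [45]
```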

Solution 2 - Regex

Yes, regex can certainly be used to extract part of a string. Unfortunately, different flavours of *nix and different tools use slightly different regex variants.

This sed command should work on most flavours (tested on OS X and Red Hat):

echo '12 BBQ ,45 rofl, 89 lol' | sed  's/^.*,\([0-9][0-9]*\).*$/\1/g'
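The command above picks the number that directly follows a comma. If the word after the number is the more reliable marker, a possible variant anchors on " rofl" instead. This is only a sketch using basic regular expressions, which should behave the same in GNU and BSD sed; the variable names are illustrative.

```shell
input='12 BBQ ,45 rofl, 89 lol'

# -n suppresses the default output; the p flag prints a line only when the
# substitution succeeded, so non-matching lines produce nothing.
result=$(printf '%s\n' "$input" | sed -n 's/.*[^0-9]\([0-9][0-9]*\) rofl.*/\1/p')

echo "$result"   # 45
```

The [^0-9] in front of the group stops the greedy .* from eating the first digit of the number. It also means the number must not be the very first thing on the line.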

Solution 3 - Regex

It seems that you are asking multiple things. To answer them:

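  • Yes, it is OK to extract data from a string using regular expressions; that's what they're there for.

  • You say you get errors: which ones, and which shell tool are you using?

  • You can extract the number by catching it in capturing parentheses:

      .*?(\d+) rofl.*
    

    and using $1 (Perl-style tools) or \1 (sed) to get the string out. The .* parts stand for "the rest" before and after the match on the same line; the leading one has to be non-greedy (.*?), or it swallows every digit but the last.

With sed as an example, the idea becomes the following, which replaces each matching line of a file with only the number. sed has neither \d nor non-greedy quantifiers, so the pattern uses [0-9] with a non-digit in front of the group, -E enables unescaped parentheses, and the backreference is written \1:

sed -E 's/.*[^0-9]([0-9]+) rofl.*/\1/' inputFileName > outputFileName

or:

echo "12 BBQ ,45 rofl, 89 lol" | sed -E 's/.*[^0-9]([0-9]+) rofl.*/\1/'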

Solution 4 - Regex

You can use the shell itself (bash, for example), with parameter expansion:

$ string="12 BBQ ,45 rofl, 89 lol"
$ echo ${string% rofl*}
12 BBQ ,45
$ string=${string% rofl*}
$ echo ${string##*,}
45
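If a regex is still preferred, bash can also match one without any external tool, using the =~ operator inside [[ ]]. The sketch below assumes bash 3 or later (it will not run in plain sh); the name re is just a convenient variable for the pattern.

```shell
string="12 BBQ ,45 rofl, 89 lol"

# Keep the pattern in a variable so the shell does not try to interpret
# the parentheses and spaces. bash uses POSIX extended regex here, so no \d.
re='([0-9]+) rofl'

if [[ $string =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match, BASH_REMATCH[1] the first group.
    result=${BASH_REMATCH[1]}
    echo "$result"   # 45
fi
```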

Solution 5 - Regex

Using ripgrep's replace option, it is possible to change the output to a capture group:

rg --only-matching --replace '$1' '(\d+) rofl'

  • --only-matching or -o outputs only the part that matches instead of the whole line.

  • --replace '$1' or -r replaces the output by the first capture group.

Solution 6 - Regex

You can certainly extract that part of a string, and that's a great way to parse out data. Regular expression syntax varies a lot, so you need to consult the documentation for the regex tool you're using. You might try a regular expression like the following, where the optional spaces around each comma allow for the " ," and ", " in the sample string:

[0-9]+ *[a-zA-Z]+ *, *([0-9]+) *[a-zA-Z]+ *, *[0-9]+ *[a-zA-Z]+

If your regex program can do string replacement then replace the entire string with the result you want and you can easily use that result.
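As one way to try that replacement idea, the sketch below feeds the pattern to sed -E and swaps the whole string for the first capture group. The pattern is written with optional spaces around each comma, because the sample string has a space before the first comma and after the second. The variable names are illustrative.

```shell
input='12 BBQ ,45 rofl, 89 lol'

# Whole-string pattern; the parentheses capture the middle number.
pattern='[0-9]+ *[a-zA-Z]+ *, *([0-9]+) *[a-zA-Z]+ *, *[0-9]+ *[a-zA-Z]+'

# -E enables extended regex syntax; \1 inserts the captured number.
result=$(printf '%s\n' "$input" | sed -E "s/$pattern/\1/")

echo "$result"   # 45
```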

You didn't mention if you're using bash or some other shell. That would help get better answers when asking for help.
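Since the question mentions expr, here is a sketch that should work in any POSIX shell rather than only in bash. It relies on expr's ":" operator, which matches a basic regular expression anchored at the start of the string and prints whatever \( \) captured. The variable names are illustrative.

```shell
string='12 BBQ ,45 rofl, 89 lol'

# The pattern is implicitly anchored at the start, hence the leading .*
# Basic regex syntax: groups are \( \), and there is no \d or +.
result=$(expr "$string" : '.*[^0-9]\([0-9][0-9]*\) rofl')

echo "$result"   # 45
```

A common source of syntax errors with expr is leaving the string or the pattern unquoted, so that the shell splits it into several arguments.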

Solution 7 - Regex

You can use rextract to extract using a regular expression and reformat the result.

Example:

[$] echo "12 BBQ ,45 rofl, 89 lol" | ./rextract '[,]([\d]+) rofl' '${1}'
45

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author    Original Content on Stackoverflow
Question            Syl                View Question on Stackoverflow
Solution 1 - Regex  Matthew Flaschen   View Answer on Stackoverflow
Solution 2 - Regex  Steve Weet         View Answer on Stackoverflow
Solution 3 - Regex  Abel               View Answer on Stackoverflow
Solution 4 - Regex  ghostdog74         View Answer on Stackoverflow
Solution 5 - Regex  Sjoerd             View Answer on Stackoverflow
Solution 6 - Regex  Jay                View Answer on Stackoverflow
Solution 7 - Regex  Tim Savannah       View Answer on Stackoverflow