How to extract text from a string using sed?
Regex Problem Overview
My example string is as follows:
This is 02G05 a test string 20-Jul-2012
Now from the above string I want to extract 02G05. For that I tried the following regex with sed:
$ echo "This is 02G05 a test string 20-Jul-2012" | sed -n '/\d+G\d+/p'
But the above command prints nothing, and I believe the reason is that sed cannot match anything against the pattern I supplied.
So my question is: what am I doing wrong here, and how do I correct it?
When I try the same string and pattern in Python, I get the expected result:
>>> import re
>>> st = "This is 02G05 a test string 20-Jul-2012"
>>> re.findall(r'\d+G\d+', st)
['02G05']
Regex Solutions
Solution 1 - Regex
How about using grep -E?
echo "This is 02G05 a test string 20-Jul-2012" | grep -Eo '[0-9]+G[0-9]+'
Solution 2 - Regex
The pattern \d might not be supported by your sed. Try [0-9] or [[:digit:]] instead.
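To print only the actual match (not the entire matching line), use a substitution. A bare leading .* is greedy and would leave just one digit before the G, so require a non-digit immediately before the captured group (this assumes the token is not at the very start of the line):
sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'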
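A pattern that begins with an unguarded .* can silently drop leading digits from the captured group. The sketch below contrasts the two behaviours on the question's string, using only basic regular expression syntax. The variable names are illustrative, and the guarded form assumes the token is preceded by at least one non-digit character.

```shell
# Sample input from the question.
line="This is 02G05 a test string 20-Jul-2012"

# Unguarded: the greedy .* swallows the leading 0, leaving one digit for the group.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p')

# Guarded: [^0-9] must match a non-digit right before the group,
# so the group keeps every leading digit.
# Assumption: the token is not at the very start of the line.
guarded=$(printf '%s\n' "$line" | sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p')

printf 'greedy:  %s\n' "$greedy"
printf 'guarded: %s\n' "$guarded"
```

With the sample string, the first command is expected to yield 2G05 and the second 02G05.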
Solution 3 - Regex
sed doesn't recognize \d; use [[:digit:]] instead. You will also need to escape the + or use the -r switch (-E on OS X).
Note that [0-9] works as well for Arabic-Hindu numerals.
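As a minimal sketch of the two options just described, the following writes the same extraction once in basic syntax and once in extended syntax. It assumes a sed that accepts -E (current GNU and BSD versions do). It uses the POSIX interval \{1,\} rather than \+ for the basic form, since \+ is a GNU extension. The variable names are illustrative only.

```shell
# Sample input from the question.
line="This is 02G05 a test string 20-Jul-2012"

# Basic syntax: groups are written \( \) and "one or more" is \{1,\}.
# [^[:digit:]] before the group stops the greedy .* from eating leading digits.
bre=$(printf '%s\n' "$line" |
  sed -n 's/.*[^[:digit:]]\([[:digit:]]\{1,\}G[[:digit:]]\{1,\}\).*/\1/p')

# Extended syntax (-E): + and ( ) work without backslashes.
ere=$(printf '%s\n' "$line" |
  sed -n -E 's/.*[^[:digit:]]([[:digit:]]+G[[:digit:]]+).*/\1/p')

printf '%s\n%s\n' "$bre" "$ere"
```

Both forms should print 02G05 for the sample string; the extended form is mostly a readability choice.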
Solution 4 - Regex
Try this instead:
echo "This is 02G05 a test string 20-Jul-2012" | sed 's/.* \([0-9]\+G[0-9]\+\) .*/\1/'
But note that if there are two matches on one line, it will print the second.
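The caveat about multiple matches can be illustrated with a made-up input line (the string below is hypothetical, not from the question). The sketch uses [0-9][0-9]* in place of \+ to stay within portable basic syntax, and shows one possible way to get the first match instead, assuming a grep that supports -o.

```shell
# Hypothetical input containing two tokens of the same shape.
line="first 02G05 then 11G22 end"

# The leading .* is greedy, so the captured group lands on the last token.
last=$(printf '%s\n' "$line" |
  sed 's/.* \([0-9][0-9]*G[0-9][0-9]*\) .*/\1/')

# grep -o prints each match on its own line; head keeps only the first.
first=$(printf '%s\n' "$line" | grep -Eo '[0-9]+G[0-9]+' | head -n 1)

printf 'last:  %s\n' "$last"
printf 'first: %s\n' "$first"
```

Here the sed command should produce 11G22, while the grep pipeline should produce 02G05.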
Solution 5 - Regex
Try using rextract. It will let you extract text using a regular expression and reformat it.
Example:
$ echo "This is 02G05 a test string 20-Jul-2012" | ./rextract '([\d]+G[\d]+)' '${1}'
2G05