Grep and Python

Python Problem Overview


I need a way to search a file with a regular expression from the Unix command line, the way grep does. For example, when I type this on the command line:

python pythonfile.py 'RE' 'file-to-be-searched'

I need the regular expression 'RE' to be searched for in the file, and the matching lines printed out.

Here's the code I have:

import re
import sys

search_term = sys.argv[1]
f = sys.argv[2]

for line in open(f, 'r'):
    if re.search(search_term, line):
        print(line, end='')
        if line == None:
            print('no matches found')

But when I enter a word which isn't present, 'no matches found' doesn't print.

Python Solutions


Solution 1 - Python

The natural question is why not just use grep?! But assuming you can't...

import re
import sys

with open(sys.argv[2], "r") as f:
    for line in f:
        if re.search(sys.argv[1], line):
            print(line, end="")

Things to note:

  • search instead of match to find the pattern anywhere in the line
  • end="" stops print from adding a second newline (line already ends with one)
  • argv[0] is the Python file name, so the arguments start at index 1
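The first note is easy to check in isolation. Here is a minimal sketch, using a made-up sample line, of how re.match and re.search differ on a word that sits in the middle of a line:

```python
import re

line = "the quick brown fox\n"  # hypothetical sample line

# re.match only tries the pattern at the start of the string,
# so a word in the middle of the line is missed.
match_result = re.match("brown", line)

# re.search scans the whole string, which is closer to what grep does per line.
search_result = re.search("brown", line)

print(match_result)            # None
print(search_result.group(0))  # brown
```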

This doesn't handle multiple file arguments (like grep does) or expand wildcards (like the Unix shell would). If you want this functionality, you can get it with the following:

import re
import sys
import glob

for arg in sys.argv[2:]:
    for path in glob.iglob(arg):
        with open(path, 'r') as f:
            for line in f:
                if re.search(sys.argv[1], line):
                    print(line, end='')
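To see what glob.iglob contributes, here is a small sketch that builds a throwaway directory and expands a wildcard inside it. The file names are invented for the demo; the point is only that iglob does the expansion itself, which matters when the shell has not already done it (for example, when the argument was quoted).

```python
import glob
import os
import tempfile

# Build a temporary directory with hypothetical file names for the demo.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.log", "b.log", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("sample\n")

    # iglob expands the wildcard and yields only the paths that match it.
    expanded = sorted(
        os.path.basename(p) for p in glob.iglob(os.path.join(tmp, "*.log"))
    )

    # A pattern that matches nothing yields no paths at all,
    # so such an argument is silently skipped by the loop above.
    missing = list(glob.iglob(os.path.join(tmp, "*.csv")))

print(expanded)  # ['a.log', 'b.log']
print(missing)   # []
```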

Solution 2 - Python

Concise and memory efficient:

#!/usr/bin/env python
# file: grep.py
import re, sys, collections

collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)

It works like egrep (without too much error handling), e.g.:

cat input-file | grep.py "RE"

And here is the one-liner:

cat input-file | python -c "import re,sys,collections;collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)" "RE"

Note that collections.deque(..., maxlen=0) is needed in Python 3 because map is lazy there: the zero-length deque consumes the iterator, which triggers each write, without storing any of the results.
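The laziness of map can be observed directly. In this sketch, a list named seen (introduced here purely for illustration) records each item the mapped function actually handles, before and after the iterator is drained:

```python
import collections

seen = []  # records each item the mapped function actually handles

# In Python 3, map returns a lazy iterator: nothing has been called yet.
lazy = map(seen.append, ["a\n", "b\n"])
calls_before = len(seen)

# A zero-length deque pulls every item through the iterator and keeps none of them.
drained = collections.deque(lazy, maxlen=0)
calls_after = len(seen)

print(calls_before, calls_after, len(drained))  # 0 2 0
```

A plain for loop over the generator has the same effect and is arguably easier to read; the deque form mainly helps keep everything in one expression.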

Solution 3 - Python

Adapted from a grep in python.

Accepts a list of filenames via [2:], does no exception handling:

#!/usr/bin/env python
import re, sys, os

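for f in filter(os.path.isfile, sys.argv[2:]):
    with open(f) as handle:
        for line in handle:
            # re.search finds the pattern anywhere in the line; re.match only tests the start
            if re.search(sys.argv[1], line):
                print(line, end='')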

sys.argv[1] is the pattern and sys.argv[2:] are the file names. To run the script as a standalone executable, run chmod +x on it first.

Solution 4 - Python

  1. use sys.argv to get the command-line parameters
  2. use open(), read() to manipulate file
  3. use the Python re module to match lines
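The three steps above could be put together roughly as follows. This is only a sketch: the names matching_lines and main are made up for illustration, and the file is iterated line by line instead of being loaded with a single read() call, so large files are not held in memory.

```python
import re
import sys


def matching_lines(pattern, path):
    """Yield each line of the file at `path` that contains a match for `pattern`."""
    regex = re.compile(pattern)      # step 3: the re module does the matching
    with open(path, "r") as handle:  # step 2: open the file
        for line in handle:          # iterating reads one line at a time
            if regex.search(line):
                yield line


def main(argv):
    """argv[0] is the script name, argv[1] the pattern, argv[2] the file."""
    for line in matching_lines(argv[1], argv[2]):  # step 1: command-line parameters
        sys.stdout.write(line)
```

In a script, calling main(sys.argv) at the bottom would tie it to the command line.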

Solution 5 - Python

You might be interested in pyp. Citing my other answer:

> "The Pyed Piper", or pyp, is a linux command line text manipulation tool similar to awk or sed, but which uses standard python string and list methods as well as custom functions evolved to generate fast results in an intense production environment.

Solution 6 - Python

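The real problem is that the variable line always has a value, so "if line == None:" can never be true. The test for "no matches found" is whether there was a match, so that check should be replaced with an "else:" attached to the match test. Note that an else on the per-line test prints the message for every non-matching line; to print it only once, keep track of whether any line matched.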
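One way to print the message only once is to remember whether anything matched and check that after the loop. A minimal sketch, assuming the lines are already in memory; the function name report and the sample data are invented for illustration:

```python
import re


def report(pattern, lines):
    """Return the lines to print: every match, or a single fallback message."""
    output = []
    found = False  # flips to True on the first matching line
    for line in lines:
        if re.search(pattern, line):
            found = True
            output.append(line.rstrip("\n"))
    # Checked once, after the loop, so the message appears at most one time.
    if not found:
        output.append("no matches found")
    return output


sample = ["apple pie\n", "banana split\n", "cherry tart\n"]  # made-up input
print(report("an+a", sample))   # ['banana split']
print(report("mango", sample))  # ['no matches found']
```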

Solution 7 - Python

You can use python-textops3:

from textops import *

print('\n'.join(cat(f) | grep(search_term)))

With python-textops3, you can use Unix-like commands with pipes.

Solution 8 - Python

I'm not sure I understood your question, but to fix your code, change the if test as follows:

import re
import sys

search_term = sys.argv[1]
f = sys.argv[2]
found = False  # set to True once any line matches
n = 0
for line in open(f, 'r'):
    n = n + 1
    if re.search(search_term, line):
        found = True
        print(f"{line.rstrip()} found at line {n}")
if not found:
    print('no matches found')

PS: I tested it on Python 3.8.10

If you want to use grep, you could run:

grep -E '(.*)word(.*)' file.txt || echo "pattern not found"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | David | View Question on Stackoverflow
Solution 1 - Python | Nick Fortescue | View Answer on Stackoverflow
Solution 2 - Python | Giancarlo Sportelli | View Answer on Stackoverflow
Solution 3 - Python | miku | View Answer on Stackoverflow
Solution 4 - Python | jldupont | View Answer on Stackoverflow
Solution 5 - Python | Piotr Dobrogost | View Answer on Stackoverflow
Solution 6 - Python | richard | View Answer on Stackoverflow
Solution 7 - Python | Eric | View Answer on Stackoverflow
Solution 8 - Python | brunocrt | View Answer on Stackoverflow