Regular expression for a string containing one word but not another

Regex, Google Analytics, Regex Negation

Regex Problem Overview


I'm setting up some goals in Google Analytics and could use a little regex help.

Let's say I have 4 URLs:

http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1
http://www.anydotcom.com/test/search.cfm?metric=blah2&selector=style&value=1
http://www.anydotcom.com/test/search.cfm?metric=blah3&selector=size&value=1
http://www.anydotcom.com/test/details.cfm?metric=blah&selector=size&value=1

I want to create an expression that will identify any URL that contains the string selector=size but does NOT contain details.cfm.

I know that to find a string that does NOT contain another string I can use this expression:

(^((?!details.cfm).)*$)

But I'm not sure how to add in the selector=size portion.

Any help would be greatly appreciated!

Regex Solutions


Solution 1 - Regex

This should do it:

^(?!.*details\.cfm).*selector=size.*$

The ^.*selector=size.*$ part should be clear enough. The first bit, (?!.*details\.cfm), is a negative look-ahead: before matching the string, it checks that the string does not contain "details.cfm" (with any number of characters before it).
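To check the pattern against the four URLs from the question, here is a minimal Python sketch. Python's re module is only a stand-in: it is an assumption that the Google Analytics regex engine handles look-aheads the same way. The helper name matching_urls is made up for illustration.

```python
import re

# Pattern from the answer: the look-ahead rejects any string containing
# "details.cfm"; the rest requires "selector=size" somewhere in the string.
PATTERN = re.compile(r"^(?!.*details\.cfm).*selector=size.*$")

# The four example URLs from the question.
URLS = [
    "http://www.anydotcom.com/test/search.cfm?metric=blah&selector=size&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah2&selector=style&value=1",
    "http://www.anydotcom.com/test/search.cfm?metric=blah3&selector=size&value=1",
    "http://www.anydotcom.com/test/details.cfm?metric=blah&selector=size&value=1",
]


def matching_urls(urls):
    """Return the URLs that contain selector=size but not details.cfm."""
    return [url for url in urls if PATTERN.search(url)]


if __name__ == "__main__":
    for url in matching_urls(URLS):
        print(url)
```

Only the first and third URLs should be printed: the second has selector=style, and the fourth contains details.cfm.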

Solution 2 - Regex

^(?=.*selector=size)(?:(?!details\.cfm).)+$
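The pattern above flips the two checks around: a positive look-ahead requires selector=size, and the main body walks the string one character at a time, refusing to step onto any position where details.cfm begins. A rough Python sketch of that behaviour follows; is_goal_url is a hypothetical helper name, and Python's re module is assumed to be close enough to the target engine for illustration.

```python
import re

# (?=.*selector=size)      -> the required text must exist somewhere.
# (?:(?!details\.cfm).)+   -> consume characters one by one, but never at a
#                             position where "details.cfm" starts.
PATTERN = re.compile(r"^(?=.*selector=size)(?:(?!details\.cfm).)+$")


def is_goal_url(url):
    """True if url contains selector=size and does not contain details.cfm."""
    return PATTERN.match(url) is not None


if __name__ == "__main__":
    base = "http://www.anydotcom.com/test/"
    for url in (
        base + "search.cfm?metric=blah&selector=size&value=1",
        base + "search.cfm?metric=blah2&selector=style&value=1",
        base + "details.cfm?metric=blah&selector=size&value=1",
    ):
        print(is_goal_url(url), url)
```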

If your regex engine supports possessive quantifiers (though I suspect Google Analytics does not), then I guess this will perform better for large input sets:

^[^?]*+(?<!details\.cfm).*?selector=size.*$

Solution 3 - Regex

The regex could be (Perl syntax):

/^(?!.*details\.cfm).*selector=size.*$/

The look-ahead is anchored at the start and scans the whole string, so the two substrings may appear in either order.

Solution 4 - Regex

There is a problem with the regex in the accepted answer: it also matches abcselector=size, selector=sizeabc, etc.

A correct regex can be ^(?!.*\bdetails\.cfm\b).*\bselector=size\b.*$
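The effect of the \b word boundaries can be seen by running both patterns side by side. This is only a sketch in Python; the shortened sample strings and the names LOOSE, STRICT and compare are invented for illustration.

```python
import re

# Pattern from the accepted answer (no word boundaries).
LOOSE = re.compile(r"^(?!.*details\.cfm).*selector=size.*$")
# Pattern from this answer (word boundaries around both substrings).
STRICT = re.compile(r"^(?!.*\bdetails\.cfm\b).*\bselector=size\b.*$")

# Hypothetical, shortened inputs chosen to show where the two differ.
SAMPLES = [
    "search.cfm?selector=size&value=1",
    "search.cfm?abcselector=size&value=1",
    "search.cfm?selector=sizeabc&value=1",
    "details.cfm?selector=size&value=1",
]


def compare(samples):
    """Return (sample, loose_matches, strict_matches) for each sample."""
    return [(s, bool(LOOSE.search(s)), bool(STRICT.search(s))) for s in samples]


if __name__ == "__main__":
    for sample, loose, strict in compare(SAMPLES):
        print(f"{sample!r}: loose={loose} strict={strict}")
```

The second and third samples are the ones where the results diverge: the loose pattern accepts them, the strict one does not.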

[Image: explanation of the regex at regex101]

Solution 5 - Regex

I was looking for a way to avoid --line-buffered on a tail in a situation similar to the OP's, and Kobi's solution works great for me. In my case, I wanted to exclude lines containing either "bot" or "spider" while including ' / ' (for my root document).

My original command:

tail -f mylogfile | grep --line-buffered -v 'bot\|spider' | grep ' / '

It now becomes (with the -P Perl switch):

tail -f mylogfile | grep -P '^(?!.*(bot|spider)).*\s\/\s.*$'
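For anyone wanting to try the same filter outside of grep, here is a small Python sketch of the single-pattern approach. The log lines are made-up samples, keep_lines is a hypothetical helper, and the slash is written unescaped since Python does not need the backslash.

```python
import re

# Same idea as the grep -P pattern: reject lines containing "bot" or
# "spider", then require a slash surrounded by whitespace (the root document).
LINE_FILTER = re.compile(r"^(?!.*(bot|spider)).*\s/\s.*$")

# Invented access-log lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - "GET / HTTP/1.1" 200 "Mozilla/5.0"',
    '203.0.113.6 - - "GET / HTTP/1.1" 200 "examplebot/2.1"',
    '203.0.113.7 - - "GET /about HTTP/1.1" 200 "Mozilla/5.0"',
    '203.0.113.8 - - "GET / HTTP/1.1" 200 "somespider/1.0"',
]


def keep_lines(lines):
    """Return lines that request ' / ' and mention neither bot nor spider."""
    return [line for line in lines if LINE_FILTER.search(line.rstrip("\n"))]


if __name__ == "__main__":
    for line in keep_lines(SAMPLE_LOG):
        print(line)
```

Note that the match is case-sensitive, just like the grep command, so a user agent spelled "Bot" would slip through unless the pattern is adjusted.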

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Chris Stahl | View Question on Stackoverflow
Solution 1 - Regex | Kobi | View Answer on Stackoverflow
Solution 2 - Regex | Tomalak | View Answer on Stackoverflow
Solution 3 - Regex | djipko | View Answer on Stackoverflow
Solution 4 - Regex | Arvind Kumar Avinash | View Answer on Stackoverflow
Solution 5 - Regex | roon | View Answer on Stackoverflow