How to match any non-whitespace character except a particular one?

Regex, Perl

Regex Problem Overview


In Perl, \S matches any non-whitespace character.

How can I match any non-whitespace character except a backslash \?

Regex Solutions


Solution 1 - Regex

You can use a character class:

/[^\s\\]/

This matches any character that is neither whitespace nor a backslash. As a general illustration of how character classes work:

[abc] means "match a, b or c"; [^abc] means "match any character except a, b or c".
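A minimal Perl sketch of the negated class in use. The helper name has_plain_char and the sample strings are made up for illustration; only the pattern itself comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the string contains at least one
# character that is neither whitespace nor a backslash, otherwise 0.
sub has_plain_char {
    my ($text) = @_;
    return $text =~ /[^\s\\]/ ? 1 : 0;
}

# In single quotes, '\\' is one literal backslash.
for my $sample ('a', ' ', '\\', ' \\ ', 'x\\y') {
    printf "%-7s %s\n", "[$sample]",
        has_plain_char($sample) ? 'Matched' : 'Missed';
}
```

Strings made up only of whitespace and backslashes are reported as Missed; anything containing at least one other character is Matched.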

Solution 2 - Regex

You can use a lookahead:

/(?=\S)[^\\]/
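The lookahead first asserts that the next character is non-whitespace, then [^\\] consumes it only if it is not a backslash. As a rough check, assuming only ASCII input matters, this sketch compares the lookahead with the character class from Solution 1; the variable names are illustrative.

```perl
use strict;
use warnings;

my $class     = qr/[^\s\\]/;        # Solution 1
my $lookahead = qr/(?=\S)[^\\]/;    # Solution 2

# Collect the code point of every ASCII character on which the two
# patterns disagree.
my @disagree = grep {
    my $ch = chr;
    ($ch =~ $class ? 1 : 0) != ($ch =~ $lookahead ? 1 : 0);
} 0 .. 127;

print @disagree
    ? "Patterns differ on code points: @disagree\n"
    : "Both patterns agree on all ASCII characters\n";
```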

Solution 3 - Regex

This worked for me using sed:

[^ ]

while

[^\s]

didn't. As a commenter pointed out, sed does not treat \s as whitespace inside a bracket expression; there it is just a literal backslash and the letter s.

# Delete everything except space and 'g'
echo "ghai ghai" | sed "s/[^\sg]//g"
gg

echo "ghai ghai" | sed "s/[^ g]//g"
g g
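For contrast, Perl does treat \s as the whitespace class inside brackets, so the first form behaves as intended there. A small sketch, with keep_space_and_g as a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything except whitespace and 'g'.
sub keep_space_and_g {
    my ($text) = @_;
    (my $kept = $text) =~ s/[^\sg]//g;   # in Perl, \s inside [] is still whitespace
    return $kept;
}

print keep_space_and_g("ghai ghai"), "\n";   # g g
```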

Solution 4 - Regex

On my system: CentOS 5

I can use \s outside of bracket expressions, but inside them I have to use [:space:]. In fact, [:space:] works only inside a bracket expression, so to match a single whitespace character with it I have to write [[:space:]], which looks strange at first.

echo a b cX | sed -r "s/(a\sb[[:space:]]c[^[:space:]])/Result: \1/"

Result: a b cX
  • the first space is matched with \s
  • the second space is matched with the alternative [[:space:]]
  • the X is matched with "anything but whitespace", [^[:space:]]

These two will not work:

a[:space:]b  instead use a\sb or a[[:space:]]b

a[^\s]b      instead use a[^[:space:]]b
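Perl also accepts POSIX classes such as [:space:] inside brackets, so the same idea carries over and can be extended to exclude the backslash from the original question. A sketch, with illustrative variable names:

```perl
use strict;
use warnings;

# The sed substitution above, redone in Perl.
my $line = "a b cX";
(my $out = $line) =~ s/(a\sb[[:space:]]c[^[:space:]])/Result: $1/;
print "$out\n";   # Result: a b cX

# A single character that is neither whitespace nor a backslash,
# written with the POSIX class instead of \s.
my $no_space_no_backslash = qr/\A[^[:space:]\\]\z/;

for my $ch ('X', ' ', '\\') {
    printf "[%s] %s\n", $ch,
        $ch =~ $no_space_no_backslash ? 'Matched' : 'Missed';
}
```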

Solution 5 - Regex

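If you are using regular expressions in bash, grep, or some other tool rather than Perl, \S may not be available to match non-whitespace characters. The explicit equivalent of \S is [^\r\n\t\f\v ]. (This relies on the tool interpreting those backslash escapes inside brackets; where it does not, use [^[:space:]] as in Solution 4.)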

So, instead of this:

[^\s\\]

...you'll have to do this instead, excluding both the whitespace characters (\r, \n, \t, \f, \v, and the space) and the backslash (written \\ in a regex):

[^\r\n\t\f\v \\]

References:

  1. [my answer] Unix & Linux: Any non-whitespace regular expression
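As a sanity check in Perl, this sketch compares the shorthand class with the explicit list over the ASCII range. It makes two assumptions: Perl v5.18 or later, where \s also matches the vertical tab, and \x0B written in place of \v, because in Perl \v means "any vertical whitespace" rather than just the vertical tab.

```perl
use v5.18;      # from v5.18 on, \s also matches the vertical tab
use warnings;

my $shorthand = qr/[^\s\\]/;
my $explicit  = qr/[^\r\n\t\f\x0B \\]/;   # \x0B is the vertical tab

# Collect the code point of every ASCII character on which the two
# classes disagree.
my @disagree = grep {
    my $ch = chr;
    ($ch =~ $shorthand ? 1 : 0) != ($ch =~ $explicit ? 1 : 0);
} 0 .. 127;

say @disagree
    ? "Classes differ on code points: @disagree"
    : "The two classes agree on all ASCII characters";
```

Outside ASCII the two can differ, since \s may also match Unicode whitespace that the explicit list does not name.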

Solution 6 - Regex

In this case, it's easier to restate the problem of "non-whitespace without the backslash" as "not (whitespace or backslash)", as the accepted answer shows:

/[^\s\\]/

However, for trickier problems, the regex set feature might be handy. You can perform set operations on character classes to get what you want. This one subtracts the set containing just the backslash from the set of non-whitespace characters:

use v5.18;
use experimental qw(regex_sets);

my $regex = qr/abc(?[ [\S] - [\\] ])/;


while( <DATA> ) {
	chomp;
	say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
	}

__DATA__
abcd
abc d
abc\d
abcxyz
abc\\xyz

The output shows that neither whitespace nor the backslash matches after c:

[abcd] Matched
[abc d] Missed
[abc\d] Missed
[abcxyz] Matched
[abc\\xyz] Missed

This gets more interesting when the larger set would be difficult to express gracefully and set operations can refine it. I'd rather see the set operation in this example:

[b-df-hj-np-tv-z]
(?[ [a-z] - [aeiou] ])
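To check that the two spellings describe the same set, here is a small sketch that compares them over the letters a to z. It reuses the pragma lines from the example above and assumes the experimental module is installed; the variable names are illustrative.

```perl
use v5.18;
use warnings;
use experimental qw(regex_sets);

my $listed   = qr/\A[b-df-hj-np-tv-z]\z/;          # consonant ranges spelled out
my $set_math = qr/\A(?[ [a-z] - [aeiou] ])\z/;     # letters minus vowels

# Collect every lowercase letter on which the two patterns disagree.
my @disagree = grep {
    ($_ =~ $listed ? 1 : 0) != ($_ =~ $set_math ? 1 : 0);
} 'a' .. 'z';

say @disagree
    ? "Patterns differ on: @disagree"
    : "Both patterns accept exactly the same letters";
```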

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Lazer | View Question on Stackoverflow
Solution 1 - Regex | Tim Pietzcker | View Answer on Stackoverflow
Solution 2 - Regex | Denis de Bernardy | View Answer on Stackoverflow
Solution 3 - Regex | storm_m2138 | View Answer on Stackoverflow
Solution 4 - Regex | Torge | View Answer on Stackoverflow
Solution 5 - Regex | Gabriel Staples | View Answer on Stackoverflow
Solution 6 - Regex | brian d foy | View Answer on Stackoverflow