vim regex replace multiple consecutive spaces with only one space


Regex Problem Overview


I often work with text files that use a variable amount of whitespace as word separators (word processors like Word do this to distribute whitespace evenly, given the differing letter widths in certain fonts, and they keep this annoying variable number of spaces even when saving as plain text).

I would like to automate replacing these variable-length sequences of whitespace with single spaces. I suspect a regex could do it, but there is also whitespace at the beginning of paragraphs (usually four spaces, but not always), which I want to leave unchanged. So my regex must not touch the leading whitespace, which adds to the complexity.

I'm using vim, so a regex in the vim regex dialect would be very useful to me, if this is doable.

My current progress looks like this:

:%s/ \+/ /g

but it doesn't work correctly.

I'm also considering writing a Vim script that would parse the text line by line, process each line character by character, and skip the whitespace after the first one, but I have a feeling this would be overkill.

Regex Solutions


Solution 1 - Regex

This will replace 2 or more spaces:

s/ \{2,}/ /g

Or you could add an extra space before the \+ in your version:

s/  \+/ /g

Solution 2 - Regex

This will do the trick:

%s![^ ]\zs  \+! !g

Many substitutions can be done more easily in Vim than in other regex dialects by using the \zs and \ze meta-sequences. They exclude part of the match from the final result: either the part before the sequence (\zs, “s” for “start here”) or the part after it (\ze, “e” for “end here”). In this case, the pattern must match one non-space character first ([^ ]), but the following \zs says that the final match result (which is what will be replaced) starts after that character.

Since there is no way to have a non-space character in front of line-leading whitespace, it will not be matched by the pattern, so the substitution will not replace it. Simple.
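
For readers more used to other regex dialects, the effect of \zs here can be approximated with a look-behind. The following is a minimal Python sketch, not Vim code. The function name collapse_inner_spaces is made up for illustration, and "\n" is excluded in the character class only because Python sees the whole text at once rather than one line at a time.

```python
import re

# Rough Python analogue of Vim's [^ ]\zs  \+ : the look-behind requires a
# preceding non-space character but keeps that character out of the match.
# "\n" is excluded too, so indentation right after a line break is not
# treated as "preceded by a non-space".
INNER_SPACES = re.compile(r"(?<=[^ \n])  +")


def collapse_inner_spaces(text: str) -> str:
    """Collapse runs of two or more spaces that follow a non-space character."""
    return INNER_SPACES.sub(" ", text)


if __name__ == "__main__":
    sample = "    This is a new paragraph\nspanning       two lines."
    print(collapse_inner_spaces(sample))
```

Leading indentation is left alone because nothing other than a line break (or nothing at all) precedes it.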

Solution 3 - Regex

In the interests of pragmatism, I tend to just do it as a three-stage process:

:g/^    /s//XYZZYPARA/g
:g/ \+/s// /g
:g/^XYZZYPARA/s//    /g

I don't doubt that there may be a better way (perhaps using macros or even a pure regex way) but I usually find this works when I'm in a hurry. Of course, if you have lines starting with XYZZYPARA, you may want to adjust the string :-)

It's good enough to turn:

    This is a new paragraph
spanning       two lines.
    And    so    is   this but on one line.

into:

    This is a new paragraph
spanning two lines.
    And so is this but on one line.

Aside: If you're wondering why I use :g instead of :s, that's mostly just habit. :g is a way to execute an arbitrary command on selected lines, so combined with s it can do everything :s alone can and much more. The command to execute happens to be s in this case, so there's no real difference, but if you want to become a vi power user, you should look into :g at some point.
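
The three-stage placeholder idea is not specific to Vim. As a rough illustration, here is the same sequence sketched in Python. The function name collapse_with_placeholder is hypothetical, and the sketch assumes, like the Vim version, that paragraphs are indented by exactly four spaces and that no real line starts with XYZZYPARA.

```python
import re

# Assumption: this marker never appears at the start of a real line.
PLACEHOLDER = "XYZZYPARA"


def collapse_with_placeholder(text: str) -> str:
    # Stage 1: protect a four-space paragraph indent with the placeholder.
    text = re.sub(r"^    ", PLACEHOLDER, text, flags=re.MULTILINE)
    # Stage 2: collapse every remaining run of spaces to a single space.
    text = re.sub(r" +", " ", text)
    # Stage 3: turn the placeholder back into the four-space indent.
    return re.sub(r"^" + PLACEHOLDER, "    ", text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = (
        "    This is a new paragraph\n"
        "spanning       two lines.\n"
        "    And    so    is   this but on one line."
    )
    print(collapse_with_placeholder(sample))
```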

Solution 4 - Regex

There are lots of good answers here (especially Aristotle's: \zs and \ze are well worth learning). Just for completeness, you can also do this with a negative look-behind assertion:

:%s/\(^ *\)\@<! \{2,}/ /g

This says "find 2 or more spaces (' \{2,}') that are NOT preceded by 'the start of the line followed by zero or more spaces'". If you prefer to reduce the number of backslashes, you can also do this:

:%s/\v(^ *)@<! {2,}/ /g

but it only saves you two characters! You could also use ' +' instead of ' {2,}' if you don't mind it doing a load of redundant changes (i.e. changing a single space to a single space).
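
Python's built-in re module only accepts fixed-width look-behinds, so \(^ *\)\@<! has no direct equivalent there. To illustrate what the assertion achieves, the sketch below swaps in a different technique: it splits each line into its leading spaces and the rest, and collapses runs only in the rest. The function name collapse_keep_indent is made up for this example.

```python
import re


def collapse_keep_indent(line: str) -> str:
    """Collapse runs of 2+ spaces, except the run at the start of the line."""
    # Split the line into its leading spaces and everything after them.
    # The pattern can match empty strings, so it always succeeds.
    indent, rest = re.match(r"( *)(.*)", line, flags=re.DOTALL).groups()
    # Only the part after the indentation has its runs of spaces collapsed.
    return indent + re.sub(r" {2,}", " ", rest)


if __name__ == "__main__":
    for line in ["    And    so    is   this", "spanning       two lines."]:
        print(collapse_keep_indent(line))
```

This works one line at a time, which matches how the Vim substitution treats ^.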

You could also use a positive look-behind to just check for a single non-space character:

:%s/\S\@<=\s\+/ /g

which is much the same as (a slightly modified version of Aristotle's to treat spaces and tabs as the same in order to save a bit of typing):

:%s/\S\zs\s\+/ /g

See:

:help \zs
:help \ze
:help \@<!
:help zero-width
:help \v

and (read it all!):

:help pattern.txt

Solution 5 - Regex

This is already answered, but I thought I'd toss in my workflow anyway.

%s/  / /g
@:@:@:@:@:@:@:@:@:@:@:@:(repeat till clean)

Fast and simple to remember. There are far more elegant solutions above, but this is just my .02.
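
The "repeat till clean" loop can be sketched in Python as follows. This is only an illustration with a made-up function name, collapse_by_repetition. It also shows a side effect worth knowing: because the pattern is just two spaces, leading indentation is collapsed as well.

```python
def collapse_by_repetition(text: str) -> str:
    """Repeat a 'two spaces -> one space' replacement until none are left."""
    # Each pass replaces every non-overlapping pair of spaces with one
    # space, like a single run of :%s/  / /g; the loop plays the role of @:.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


if __name__ == "__main__":
    print(collapse_by_repetition("spanning       two lines."))
```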

Solution 6 - Regex

Does this work?

%s/\([^ ]\)  */\1 /g

Solution 7 - Regex

I like this version. It is similar to Aristotle Pagaltzis's \zs version, but I find it easier to understand (probably just because of my unfamiliarity with \zs).

s/\([^ ]\) \+/\1 /g

or for all whitespace

s/\(\S\)\s\+/\1 /g

I read it as "replace all occurrences of something other than a space followed by one or more spaces with the something and a single space".
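
The capture-group form carries over to most regex dialects almost unchanged. Here is a minimal Python sketch of it. The name collapse_with_group is hypothetical, and "\n" is added to the character class only because Python processes the whole text at once, so a line break must not count as the "something" before an indent.

```python
import re


def collapse_with_group(text: str) -> str:
    """Replace 'non-space character + run of spaces' with that character + one space."""
    # ([^ \n]) captures the character before the run; \1 puts it back.
    return re.sub(r"([^ \n]) +", r"\1 ", text)


if __name__ == "__main__":
    print(collapse_with_group("    And    so    is   this but on one line."))
```

Unlike the \zs form, the captured character is part of the match, so it has to be written back explicitly in the replacement.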

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stack Overflow |
| --- | --- | --- |
| Question | jedi_coder | View Question on Stack Overflow |
| Solution 1 - Regex | mikerobi | View Answer on Stack Overflow |
| Solution 2 - Regex | Aristotle Pagaltzis | View Answer on Stack Overflow |
| Solution 3 - Regex | paxdiablo | View Answer on Stack Overflow |
| Solution 4 - Regex | DrAl | View Answer on Stack Overflow |
| Solution 5 - Regex | wom | View Answer on Stack Overflow |
| Solution 6 - Regex | frogstarr78 | View Answer on Stack Overflow |
| Solution 7 - Regex | Michael Anderson | View Answer on Stack Overflow |