Removing duplicate rows in Notepad++


Duplicates Problem Overview


Is it possible to remove duplicated rows in Notepad++, leaving only a single occurrence of a line?

Duplicates Solutions


Solution 1 - Duplicates

Notepad++ with the TextFX plugin can do this, provided you are happy to sort the lines and remove the duplicates at the same time.

To install TextFX in recent releases of Notepad++, download it from: https://sourceforge.net/projects/npp-plugins/files/TextFX

Older versions of Notepad++ either shipped with the TextFX plugin or let you add it from the menu via Plugins -> Plugin Manager -> Show Plugin Manager -> Available tab -> TextFX -> Install. In some cases it is listed as TextFX Characters, but this is the same plugin.

The check boxes and buttons required will now appear in the menu under: TextFX -> TextFX Tools.

Make sure "sort outputs only unique..." is checked. Next, select a block of text (Ctrl+A selects the entire document). Finally, click "sort lines case sensitive" or "sort lines case insensitive".


Solution 2 - Duplicates

Since Notepad++ Version 6 you can use this regex in the search and replace dialogue:

^(.*?)$\s+?^(?=.*^\1$)

and replace with nothing. For each set of duplicate rows, this keeps only the last occurrence in the file.

No sorting is needed for that and the duplicate rows can be anywhere in the file!

You need to check the options "Regular expression" and ". matches newline":


  • ^ matches the start of the line.

  • (.*?) matches any characters 0 or more times, but as few as possible (so it matches exactly one row; this is needed because of the ". matches newline" option). The matched row is captured by the parentheses and is accessible as \1.

  • $ matches the end of the line.

  • \s+?^ matches the whitespace characters (the newlines!) up to the start of the next row. This removes the newline after the matched row, so that no empty row is left behind by the replacement.

  • (?=.*^\1$) is a positive lookahead assertion. This is the important part of the regex: a row is matched (and removed) only when exactly the same row occurs again somewhere later in the file.
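
As a rough illustration outside Notepad++, the same pattern can be tried with Python's re module. This is only a sketch under assumptions: Python's engine is not the one Notepad++ uses, re.DOTALL stands in for the ". matches newline" checkbox, and the function name dedupe_keep_last and the sample text are made up for the example.

```python
import re

# Same pattern as in this solution. re.MULTILINE makes ^ and $ match at line
# boundaries; re.DOTALL plays the role of the ". matches newline" checkbox.
# (Assumption: Python's re behaves like Notepad++ for this particular pattern.)
PATTERN = re.compile(r"^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE | re.DOTALL)


def dedupe_keep_last(text: str) -> str:
    """Remove every row that occurs again later in the text."""
    return PATTERN.sub("", text)


if __name__ == "__main__":
    sample = "a\nb\na\nc\nb"  # hypothetical input, no sorting applied
    print(dedupe_keep_last(sample))
```

With this sample the first a and the first b are removed, leaving a, c, b: the last occurrence of each row survives, and the remaining rows keep their relative order.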

Solution 3 - Duplicates

If the rows are immediately after each other then you can use a regex replace:

Search Pattern: ^(.*\r?\n)(\1)+

Replace with: \1
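
A small sketch of how this pattern behaves, using Python's re module as a stand-in for the Notepad++ Replace dialog (an assumption: the two engines differ in general, and collapse_adjacent is a name invented here). It shows that only adjacent repeats are collapsed, and that every repeated row must end with a line break to be matched.

```python
import re

# Solution 3's pattern; re.MULTILINE lets ^ match at the start of every line.
PATTERN = re.compile(r"^(.*\r?\n)(\1)+", re.MULTILINE)


def collapse_adjacent(text: str) -> str:
    """Collapse each run of identical adjacent rows into a single row."""
    return PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    # The run of three x rows collapses; the later, non-adjacent x is kept.
    print(repr(collapse_adjacent("x\nx\nx\ny\nx\n")))
    # Without a trailing line break the last row differs from "x\n",
    # so nothing is replaced.
    print(repr(collapse_adjacent("x\nx")))
```

Because the captured group includes the line break, a duplicate on the very last line of a file is only removed when the file ends with a newline.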

Solution 4 - Duplicates

In version 7.8, you can accomplish this without any plugins - Edit -> Line Operations -> Remove Consecutive Duplicate Lines. You will have to sort the file to place duplicate lines in consecutive order before this works, but it does work like a charm.

Sorting options are available under Edit -> Line Operations -> Sort By ...
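
Why the sort is needed can be seen in a short Python sketch. It does not call Notepad++ in any way; remove_consecutive_duplicates is a hypothetical helper that imitates what a "remove consecutive duplicates" operation does to a list of lines.

```python
from itertools import groupby


def remove_consecutive_duplicates(lines):
    """Keep one line from each run of identical adjacent lines."""
    return [line for line, _run in groupby(lines)]


if __name__ == "__main__":
    lines = ["pear", "apple", "pear", "pear", "apple"]
    # Unsorted: only the adjacent "pear", "pear" pair is collapsed.
    print(remove_consecutive_duplicates(lines))
    # Sorted first: equal lines become adjacent, so every duplicate goes.
    print(remove_consecutive_duplicates(sorted(lines)))
```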

Solution 5 - Duplicates

If you don't care about row order (which I don't think you do), then you can use a Linux/FreeBSD/Mac OS X/Cygwin box and do:

$ cat yourfile | sort | uniq > yourfile_nodups

Then open the file again in Notepad++.
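
If no Unix-like shell is at hand, roughly the same result can be sketched in Python. The helper names and the UTF-8 encoding are assumptions made for this example, and Python orders strings by code point, which may differ from the locale-dependent order that sort uses.

```python
from pathlib import Path


def sort_unique(lines):
    """Return the distinct lines in sorted order, similar to `sort | uniq`."""
    return sorted(set(lines))


def dedupe_file(src, dst, encoding="utf-8"):
    """Write the sorted, de-duplicated lines of src to dst.

    The encoding default is an assumption; adjust it to match the file.
    """
    lines = Path(src).read_text(encoding=encoding).splitlines()
    Path(dst).write_text("\n".join(sort_unique(lines)) + "\n", encoding=encoding)


if __name__ == "__main__":
    print(sort_unique(["b", "a", "b", "c", "a"]))
```

Calling dedupe_file("yourfile", "yourfile_nodups") would mirror the shell command above, with the same loss of the original row order.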

Solution 6 - Duplicates

In Notepad++, open the Replace window.

Ensure that the Regular expression radio button is selected under Search mode.

Find what:

> ^(.*)(\r?\n\1)+$

Replace with:

> $1

Before:

> and we think there
> and we think there
> single line
> Is it possible to
> Is it possible to

After:

> and we think there
> single line
> Is it possible to
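
The Before/After pair can be reproduced with a short Python sketch. Python's re module is used here only as an approximation of the Notepad++ Replace dialog, and collapse_runs is a name invented for the example. Note that Python writes the first group as \1 in the replacement string, where the dialog accepts $1.

```python
import re

# Solution 6's search pattern; re.MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)

BEFORE = "\n".join([
    "and we think there",
    "and we think there",
    "single line",
    "Is it possible to",
    "Is it possible to",
])


def collapse_runs(text: str) -> str:
    """Replace each run of identical adjacent lines with a single copy."""
    return PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    print(collapse_runs(BEFORE))
```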

Solution 7 - Duplicates

As of Notepad++ version 8.1, there is a specific command to do precisely what this popular question asks. One can remove duplicated rows in a text file with the menu command Edit > Line Operations > Remove Duplicate Lines.

There is no need to install a plugin (as the currently accepted answer suggests), or sort the lines beforehand, or use the regex syntax in the Replace dialogue as other answers suggested.
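
For comparison, an order-preserving removal of duplicates takes only a few lines in Python. This is a sketch of the general idea, not of how Notepad++ implements the command; in particular it assumes that the first occurrence of each line is the one to keep, and remove_duplicate_lines is a hypothetical name.

```python
def remove_duplicate_lines(lines):
    """Drop repeated lines, keeping first occurrences in their original order."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))


if __name__ == "__main__":
    print(remove_duplicate_lines(["b", "a", "b", "c", "a"]))
```

Unlike the sort-based approaches, nothing is reordered: the result here is b, a, c.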


Solution 8 - Duplicates

Later versions of Notepad++ apparently do not include the TextFX plugin at all. To use the plugin for sorting and eliminating duplicates, it must be either downloaded and installed manually (more involved) or added using the plugin manager.

A) Easy way (as described [here][1]).

Plugins -> Plugin Manager -> Show Plugin Manager -> Available tab -> TextFX Characters -> Install

B) More involved way, if another version is needed or the easy way does not work.

  1. Download the plugin from SourceForge:

    http://downloads.sourceforge.net/project/npp-plugins/TextFX/TextFX%20v0.26/TextFX.v0.26.unicode.bin.zip

  2. Open the zip file and extract NppTextFX.dll

  3. Place NppTextFX.dll in the Notepad++ plugins directory, such as:
    C:\Program Files\Notepad++\plugins

  4. Start Notepad++; TextFX will appear as one of the menu bar items (as shown in Solution 1 above by Colin Pickard)

After installing the TextFX plugin, follow the instructions in Solution 1 to sort and remove duplicates.

Also, consider setting up a keyboard shortcut using Settings > Shortcut Mapper if you use this command frequently or want to replicate a keyboard shortcut, such as F9 in TextPad for sorting.

[1]: https://stackoverflow.com/questions/12699833/textfx-menu-is-missing-in-notepad "here"

Solution 9 - Duplicates

As of now, it's possible to remove all consecutive duplicate lines with Notepad++ built-in functionality. Sort the lines first:

Edit > Line Operations > "Sort lines lexicographically",

then

Edit > Line Operations > "Remove Consecutive Duplicate Lines".

The regex solution suggested above did not remove all duplicate lines for me either; it only removed the consecutive ones.

Solution 10 - Duplicates

You may need a plugin to do this. You can try the cc.ddl (delete duplicate lines) command of ConyEdit, a cross-editor plugin that works with text editors including Notepad++.

With ConyEdit running in the background, follow the steps below:

  1. Enter the command line cc.ddl at the end of the text.
  2. Copy the text and the command line.
  3. Paste; the result is the text without duplicate lines.


Solution 11 - Duplicates

Search for the regular expression: \b(\w+)\b([\w\W]*)\b\1\b

Replace it with: $1$2

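Hit the Replace button until there are no more matches for the regular expression in your file. Note that this pattern matches repeated words (\w+ between word boundaries) rather than whole lines, so it suits files with one word per line.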

Solution 12 - Duplicates

None of the above worked for me.

The following did:

Replace

^(.*)\s+(\r?\n\1\s+)+$

with

\1

Solution 13 - Duplicates

The Plugin Manager currently does not ship with the Notepad++ distribution. You must install it manually (https://github.com/bruderstein/nppPluginManager/releases), and even if you do, many plugins, TextFX among them, are no longer available.

Maybe there is another plugin which contains the required functionality. Other than that, the only way to do it in Notepad++ is to use a regex for matching and then replacing (Ctrl + F, then the Replace tab).

Although many functions are available via the Edit menu (trimming, removing empty lines, sorting, converting EOL), there is no "unique" operation available.

If you have Windows 10, you can enable Bash (search for Ubuntu in the Microsoft Store and follow the instructions in the description to install it) and use cat your_file.txt | sort | uniq > your_file_edited.txt. You must either be in the same working directory as your_file.txt or refer to it by its path.

Solution 14 - Duplicates

Whether the file is sorted or not, you can use the regex below to remove duplicates wherever they occur in the file.

Find what: ^([^\r]*[^\n])(.*?)\r?\n\1$
Replace with: \1\2
Search Mode:

  • "Regular expression"
  • Check the ". matches newline" option

Click "Replace All" repeatedly until you see "0 occurrences were replaced".
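
The "repeat until nothing is replaced" step can be sketched as a loop in Python. Treat this as an approximation: Python's re module is not the Notepad++ engine, re.DOTALL stands in for ". matches newline", and replace_all_until_stable is a hypothetical helper name.

```python
import re

# Solution 14's pattern with "Regular expression" and ". matches newline".
PATTERN = re.compile(r"^([^\r]*[^\n])(.*?)\r?\n\1$", re.MULTILINE | re.DOTALL)


def replace_all_until_stable(text: str):
    """Apply Replace All repeatedly until a pass replaces nothing.

    Returns the final text and the number of passes that changed it.
    """
    passes = 0
    while True:
        # subn returns the new text and how many replacements were made,
        # much like the "N occurrences were replaced" message.
        text, count = PATTERN.subn(r"\1\2", text)
        if count == 0:
            return text, passes
        passes += 1


if __name__ == "__main__":
    result, passes = replace_all_until_stable("a\nb\na\nc\nb")
    print(result)
    print(passes)
```

Each pass removes a later copy of a row while keeping the earlier one, which is why a single Replace All is usually not enough when several different rows are duplicated.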

Solution 15 - Duplicates

Extending the top answer, you can also use a 2nd lookahead to find rows that are almost duplicates of other rows.

^(\s*(<PackageReference Include=".*" Version=).*)$\s+?^(?=.*^\2.*$)

Here I'm after multiple references to the same <PackageReference Include=".*" string, regardless of its version.

Test data

<PackageReference Include="Package1" Version="2.2.1" />

    <PackageReference Include="Package1" Version="2.2.1" /> // Match
<PackageReference Include="Package1" Version="2.2.2" />

<PackageReference Include="Package2" Version="5.1" /> // Match
<PackageReference Include="Package2" Version="5.2" />

<PackageReference Include="Package3" Version="2.2.1" /> // No match
<PackageReference Include="Package4" Version="2.2.1" />

See a breakdown of what the regex terms mean and try with your own data on this regex101 share.

Solution 16 - Duplicates

This is difficult to do in Notepad++. A better way is the following:

Download Cygwin, which provides a simple Linux-like terminal on Windows. It lets you run common Linux commands on Windows, including sort -u.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Przemysław Michalski | View Question on Stackoverflow |
| Solution 1 - Duplicates | Colin Pickard | View Answer on Stackoverflow |
| Solution 2 - Duplicates | stema | View Answer on Stackoverflow |
| Solution 3 - Duplicates | Grant Peters | View Answer on Stackoverflow |
| Solution 4 - Duplicates | dr.nixon | View Answer on Stackoverflow |
| Solution 5 - Duplicates | Pablo Santa Cruz | View Answer on Stackoverflow |
| Solution 6 - Duplicates | blueberry0xff | View Answer on Stackoverflow |
| Solution 7 - Duplicates | divenex | View Answer on Stackoverflow |
| Solution 8 - Duplicates | eeasterly | View Answer on Stackoverflow |
| Solution 9 - Duplicates | SaPropper | View Answer on Stackoverflow |
| Solution 10 - Duplicates | Donald | View Answer on Stackoverflow |
| Solution 11 - Duplicates | Hesham Eraqi | View Answer on Stackoverflow |
| Solution 12 - Duplicates | Manohar Reddy Poreddy | View Answer on Stackoverflow |
| Solution 13 - Duplicates | Patronaut | View Answer on Stackoverflow |
| Solution 14 - Duplicates | αғsнιη | View Answer on Stackoverflow |
| Solution 15 - Duplicates | RJFalconer | View Answer on Stackoverflow |
| Solution 16 - Duplicates | haykp | View Answer on Stackoverflow |