Removing duplicate characters from a string

Python Problem Overview


How can I remove duplicate characters from a string using Python? For example, let's say I have a string:

foo = 'mppmt'

How can I make the string:

foo = 'mpt'

NOTE: Order is not important

Python Solutions


Solution 1 - Python

If order does not matter, you can use

"".join(set(foo))

set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.
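
Because a set is unordered, the joined result may come out as any permutation of the unique characters. The following minimal sketch shows that, and uses sorted() for the case where a reproducible (alphabetical) result is wanted; the helper name unique_chars_sorted is hypothetical and only for illustration.

```python
foo = "mppmt"

# set() keeps one copy of each character; its iteration order is not guaranteed.
unordered = "".join(set(foo))

def unique_chars_sorted(text):
    """Return the unique characters of text in sorted order (hypothetical helper)."""
    return "".join(sorted(set(text)))

print(sorted(unordered))         # ['m', 'p', 't']
print(unique_chars_sorted(foo))  # mpt
```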

If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)

foo = "mppmt"
result = "".join(dict.fromkeys(foo))

resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.
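
For those earlier versions, the OrderedDict variant mentioned above could look like this minimal sketch:

```python
from collections import OrderedDict

foo = "mppmt"

# OrderedDict.fromkeys keeps one key per character, in first-seen order.
result = "".join(OrderedDict.fromkeys(foo))
print(result)  # mpt
```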

Solution 2 - Python

If order does matter, how about:

>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'

Solution 3 - Python

If order does not matter:

>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'

To keep the order:

>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'

Solution 4 - Python

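Build the result in a list, and use a set (which does not allow duplicates) to track the characters already seen. Method 1:

def fix(string):
    seen = set()   # characters encountered so far
    chars = []     # unique characters, in first-seen order
    for ch in string:
        if ch not in seen:
            seen.add(ch)
            chars.append(ch)

    return ''.join(chars)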

string = "Protiijaayiiii"
print(fix(string))

Method 2:

s = "Protijayi"

# keep a character only if it does not occur earlier in the string
aa = [ch for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))

Solution 5 - Python

As was mentioned, "".join(set(foo)) and collections.OrderedDict will do. I added foo = foo.lower() in case the string has both upper- and lower-case characters and you need to remove ALL duplicates regardless of case.

from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print("".join(OrderedDict.fromkeys(foo)))

prints eugnhsaw
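
Lowercasing the whole string also changes the case of the characters that are kept. If the original casing should survive, one possible variation (not part of the answer above) is to compare characters case-insensitively while keeping the first-seen form. A minimal sketch; the function name dedupe_ignore_case is hypothetical:

```python
def dedupe_ignore_case(text):
    """Drop characters already seen in either case, keeping first-seen casing."""
    seen = set()   # lowercase forms of the characters kept so far
    kept = []
    for ch in text:
        key = ch.lower()
        if key not in seen:
            seen.add(key)
            kept.append(ch)
    return "".join(kept)

print(dedupe_ignore_case("EugeneEhGhsnaWW"))  # EugnhsaW
```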

Solution 6 - Python

# Check the code and apply it in your program:
# Input: 'ppppmm'
s = 'ppppmm'
s = ''.join(set(s))
print(s)  # Output: 'pm' or 'mp' (set order is arbitrary)

Solution 7 - Python

If order is important,

seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)

Or to do it without sets:

result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)

Solution 8 - Python

def dupe(str1):
    # a set keeps one copy of each character, in arbitrary order
    s = set(str1)
    return "".join(s)

str1 = 'geeksforgeeks'
a = dupe(str1)
print(a)

This works well if order is not important.

Solution 9 - Python

d = {}
s="YOUR_DESIRED_STRING"
res=[]
for c in s:
    if c not in d:
      res.append(c)
      d[c]=1
print ("".join(res))

The for loop walks through the string s one character at a time. If the character c is not yet a key in the dict d (which starts out empty), c is appended to the list res and recorded in d by setting d[c] = 1. Once the loop has finished, res holds every unique character in first-seen order, and the joined result is printed.

Solution 10 - Python

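Since a string is a sequence of characters, passing it to dict.fromkeys() builds a dictionary with one key per distinct character, which removes all duplicates. From Python 3.7 on, dictionaries preserve insertion order, so the original order of the characters is retained.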

"".join(list(dict.fromkeys(foo)))

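To make the intermediate step visible, this short sketch (assuming Python 3.7 or later) prints the dictionary before joining its keys. Iterating over a dict yields its keys, so the list() call is optional.

```python
foo = "mppmt"

# One key per distinct character; every value defaults to None.
keys = dict.fromkeys(foo)
print(keys)           # {'m': None, 'p': None, 't': None}

# Joining the dict joins its keys in insertion order.
print("".join(keys))  # mpt
```
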
Solution 11 - Python

Functional programming style while keeping order:

import functools

def get_unique_char(a, b):
    if b not in a:
        return a + b
    else:
        return a

if __name__ == '__main__':
    foo = 'mppmt'

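    # reduce() folds the characters into one string; the '' initial value
    # makes an empty input return '' instead of raising TypeError
    result = functools.reduce(get_unique_char, foo, '')
    print(result)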

Solution 12 - Python

def remove_duplicates(value):
    result = ""
    for ch in value:
        # keep a character only the first time it appears
        if ch not in result:
            result += ch
    return result

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

Solution 13 - Python

from collections import OrderedDict

def remove_duplicates(value):
    # OrderedDict.fromkeys keeps one key per character, in first-seen order
    return ''.join(OrderedDict.fromkeys(value))

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

Solution 14 - Python

mylist = ["ABA", "CAA", "ADA"]
results = []
for item in mylist:
    buffer = []
    for char in item:
        # keep each character only the first time it appears in this item
        if char not in buffer:
            buffer.append(char)
    results.append("".join(buffer))

print(results)

Output:

['AB', 'CA', 'AD']

Solution 15 - Python

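Using regular expressions (note that this collapses only consecutive repeats of a character):

import re
pattern = r'(.)\1+'  # (.) captures any character; \1+ matches one or more repeats of it
repl = r'\1'         # replace the whole run with a single copy
text = 'shhhhh!!!'
print(re.sub(pattern, repl, text))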

output:

sh!
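
The pattern matches runs of the same character, so it does not remove a duplicate that reappears later in the string. The sketch below contrasts it with the dict.fromkeys approach on the question's example; the function name collapse_runs is hypothetical.

```python
import re

def collapse_runs(text):
    """Collapse each run of a repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", text)

print(collapse_runs("shhhhh!!!"))       # sh!
print(collapse_runs("mppmt"))           # mpmt (the second 'm' is not adjacent)
print("".join(dict.fromkeys("mppmt")))  # mpt
```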

Attributions

All content for these solutions is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | JSW189 | View Question on Stack Overflow
Solution 1 - Python | Sven Marnach | View Answer on Stack Overflow
Solution 2 - Python | DSM | View Answer on Stack Overflow
Solution 3 - Python | kev | View Answer on Stack Overflow
Solution 4 - Python | Soudipta Dutta | View Answer on Stack Overflow
Solution 5 - Python | Eugene Berezin | View Answer on Stack Overflow
Solution 6 - Python | hp_elite | View Answer on Stack Overflow
Solution 7 - Python | Kevin Coffey | View Answer on Stack Overflow
Solution 8 - Python | ravi tanwar | View Answer on Stack Overflow
Solution 9 - Python | Tarish | View Answer on Stack Overflow
Solution 10 - Python | hrnjan | View Answer on Stack Overflow
Solution 11 - Python | Olivier_s_j | View Answer on Stack Overflow
Solution 12 - Python | Abhisek Meshram | View Answer on Stack Overflow
Solution 13 - Python | swamy_teja7 | View Answer on Stack Overflow
Solution 14 - Python | Golden Lion | View Answer on Stack Overflow
Solution 15 - Python | IndPythCoder | View Answer on Stack Overflow