Removing duplicate characters from a string
PythonPython Problem Overview
How can I remove duplicate characters from a string using Python? For example, let's say I have a string:
foo = 'mppmt'
How can I make the string:
foo = 'mpt'
NOTE: Order is not important
Python Solutions
Solution 1 - Python
If order does not matter, you can use
"".join(set(foo))
set()
will create a set of unique letters in the string, and "".join()
will join the letters back to a string in arbitrary order.
If order does matter, you can use a dict
instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)
foo = "mppmt"
result = "".join(dict.fromkeys(foo))
resulting in the string "mpt"
. In earlier versions of Python, you can use collections.OrderedDict
, which has been available starting from Python 2.7.
Solution 2 - Python
If order does matter, how about:
>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
Solution 3 - Python
If order is not the matter:
>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'
To keep the order:
>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'
Solution 4 - Python
Create a list in Python and also a set which doesn't allow any duplicates. Solution1 :
def fix(string):
s = set()
list = []
for ch in string:
if ch not in s:
s.add(ch)
list.append(ch)
return ''.join(list)
string = "Protiijaayiiii"
print(fix(string))
Method 2 :
s = "Protijayi"
aa = [ ch for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))
Solution 5 - Python
As was mentioned "".join(set(foo)) and collections.OrderedDict will do. A added foo = foo.lower() in case the string has upper and lower case characters and you need to remove ALL duplicates no matter if they're upper or lower characters.
from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print "".join(OrderedDict.fromkeys(foo))
prints eugnhsaw
Solution 6 - Python
> #Check code and apply in your Program:
>
> #Input= 'pppmm'
s = 'ppppmm'
s = ''.join(set(s))
print(s)
#Output: pm
Solution 7 - Python
If order is important,
seen = set()
result = []
for c in foo:
if c not in seen:
result.append(c)
seen.add(c)
result = ''.join(result)
Or to do it without sets:
result = []
for c in foo:
if c not in result:
result.append(c)
result = ''.join(result)
Solution 8 - Python
def dupe(str1):
s=set(str1)
return "".join(s)
str1='geeksforgeeks'
a=dupe(str1)
print(a)
works well if order is not important.
Solution 9 - Python
d = {}
s="YOUR_DESIRED_STRING"
res=[]
for c in s:
if c not in d:
res.append(c)
d[c]=1
print ("".join(res))
variable 'c' traverses through String 's' in the for loop and is checked if c is in a set d (which initially has no element) and if c is not in d, c is appended to the character array 'res' then the index c of set d is changed to 1. after the loop is exited i.e c finishes traversing through the string to store unique elements in set d, the resultant res which has all unique characters is printed.
Solution 10 - Python
As string is a list of characters, converting it to dictionary will remove all duplicates and will retain the order.
"".join(list(dict.fromkeys(foo)))
Solution 11 - Python
Functional programming style while keeping order:
import functools
def get_unique_char(a, b):
if b not in a:
return a + b
else:
return a
if __name__ == '__main__':
foo = 'mppmt'
gen = functools.reduce(get_unique_char, foo)
print(''.join(list(gen)))
Solution 12 - Python
def remove_duplicates(value):
var=""
for i in value:
if i in value:
if i in var:
pass
else:
var=var+i
return var
print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))
Solution 13 - Python
from collections import OrderedDict
def remove_duplicates(value):
m=list(OrderedDict.fromkeys(value))
s=''
for i in m:
s+=i
return s
print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))
Solution 14 - Python
mylist=["ABA", "CAA", "ADA"]
results=[]
for item in mylist:
buffer=[]
for char in item:
if char not in buffer:
buffer.append(char)
results.append("".join(buffer))
print(results)
output
ABA
CAA
ADA
['AB', 'CA', 'AD']
Solution 15 - Python
Using regular expressions:
import re
pattern = r'(.)\1+' # (.) any character repeated (\+) more than
repl = r'\1' # replace it once
text = 'shhhhh!!!
re.sub(pattern,repl,text)
output:
sh!