How can I copy a Python string?

PythonStringPython 2.7

Python Problem Overview


I do this:

a = 'hello'

And now I just want an independent copy of a:

import copy

b = str(a)
c = a[:]
d = a + ''
e = copy.copy(a)

map( id, [ a,b,c,d,e ] )

Out[3]:

[4365576160, 4365576160, 4365576160, 4365576160, 4365576160]

Why do they all have the same memory address and how can I get a copy of a?

Python Solutions


Solution 1 - Python

You don't need to copy a Python string. They are immutable, and the copy module always returns the original in such cases, as do str(), the whole string slice, and concatenating with an empty string.

Moreover, your 'hello' string is interned (certain strings are). Python deliberately tries to keep just the one copy, as that makes dictionary lookups faster.

One way you could work around this is to actually create a new string, then slice that string back to the original content:

>>> a = 'hello'
>>> b = (a + '.')[:-1]
>>> id(a), id(b)
(4435312528, 4435312432)

But all you are doing now is waste memory. It is not as if you can mutate these string objects in any way, after all.

If all you wanted to know is how much memory a Python object requires, use sys.getsizeof(); it gives you the memory footprint of any Python object.

For containers this does not include the contents; you'd have to recurse into each container to calculate a total memory size:

>>> import sys
>>> a = 'hello'
>>> sys.getsizeof(a)
42
>>> b = {'foo': 'bar'}
>>> sys.getsizeof(b)
280
>>> sys.getsizeof(b) + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in b.items())
360

You can then choose to use id() tracking to take an actual memory footprint or to estimate a maximum footprint if objects were not cached and reused.

Solution 2 - Python

You can copy a string in python via string formatting :

>>> a = 'foo'  
>>> b = '%s' % a  
>>> id(a), id(b)  
(140595444686784, 140595444726400)  

Solution 3 - Python

I'm just starting some string manipulations and found this question. I was probably trying to do something like the OP, "usual me". The previous answers did not clear up my confusion, but after thinking a little about it I finally "got it".

As long as a, b, c, d, and e have the same value, they reference to the same place. Memory is saved. As soon as the variable start to have different values, they get start to have different references. My learning experience came from this code:

import copy
a = 'hello'
b = str(a)
c = a[:]
d = a + ''
e = copy.copy(a)

print map( id, [ a,b,c,d,e ] )

print a, b, c, d, e

e = a + 'something'
a = 'goodbye'
print map( id, [ a,b,c,d,e ] )
print a, b, c, d, e

The printed output is:

[4538504992, 4538504992, 4538504992, 4538504992, 4538504992]

hello hello hello hello hello

[6113502048, 4538504992, 4538504992, 4538504992, 5570935808]

goodbye hello hello hello hello something

Solution 4 - Python

To put it a different way "id()" is not what you care about. You want to know if the variable name can be modified without harming the source variable name.

>>> a = 'hello'                                                                                                                                                                                                                                                                                        
>>> b = a[:]                                                                                                                                                                                                                                                                                           
>>> c = a                                                                                                                                                                                                                                                                                              
>>> b += ' world'                                                                                                                                                                                                                                                                                      
>>> c += ', bye'                                                                                                                                                                                                                                                                                       
>>> a                                                                                                                                                                                                                                                                                                  
'hello'                                                                                                                                                                                                                                                                                                
>>> b                                                                                                                                                                                                                                                                                                  
'hello world'                                                                                                                                                                                                                                                                                          
>>> c                                                                                                                                                                                                                                                                                                  
'hello, bye'                                                                                                                                                                                                                                                                                           

If you're used to C, then these are like pointer variables except you can't de-reference them to modify what they point at, but id() will tell you where they currently point.

The problem for python programmers comes when you consider deeper structures like lists or dicts:

>>> o={'a': 10}                                                                                                                                                                                                                                                                                        
>>> x=o                                                                                                                                                                                                                                                                                                
>>> y=o.copy()                                                                                                                                                                                                                                                                                         
>>> x['a'] = 20                                                                                                                                                                                                                                                                                        
>>> y['a'] = 30                                                                                                                                                                                                                                                                                        
>>> o                                                                                                                                                                                                                                                                                                  
{'a': 20}                                                                                                                                                                                                                                                                                              
>>> x                                                                                                                                                                                                                                                                                                  
{'a': 20}                                                                                                                                                                                                                                                                                              
>>> y                                                                                                                                                                                                                                                                                                  
{'a': 30}                                                                                                                                                                                                                                                                                              

Here o and x refer to the same dict o['a'] and x['a'], and that dict is "mutable" in the sense that you can change the value for key 'a'. That's why "y" needs to be a copy and y['a'] can refer to something else.

Solution 5 - Python

It is possible, using this simple trick :

a = "Python"
b = a[ : : -1 ][ : : -1 ]
print( "a =" , a )
print( "b =" , b )
a == b  # True
id( a ) == id( b ) # False

Solution 6 - Python

As others have already explained, there's rarely an actual need for this, but nevertheless, here you go:
(works on Python 3, but there's probably something similar for Python 2)

import ctypes

copy          = ctypes.pythonapi._PyUnicode_Copy
copy.argtypes = [ctypes.py_object]
copy.restype  = ctypes.py_object

s1 = 'xxxxxxxxxxxxx'
s2 = copy(s1)

id(s1) == id(s2) # False

Solution 7 - Python

Copying a string can be done two ways either copy the location a = "a" b = a or you can clone which means b wont get affected when a is changed which is done by a = 'a' b = a[:]

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionusual meView Question on Stackoverflow
Solution 1 - PythonMartijn PietersView Answer on Stackoverflow
Solution 2 - PythonRichard UrbanView Answer on Stackoverflow
Solution 3 - Pythonkarl sView Answer on Stackoverflow
Solution 4 - PythonCharles ThayerView Answer on Stackoverflow
Solution 5 - Pythonjust4qwertyView Answer on Stackoverflow
Solution 6 - PythonEli FinkelView Answer on Stackoverflow
Solution 7 - PythonThomas YoungsonView Answer on Stackoverflow