Python string formatting: is '%' more efficient than 'format' function?

PythonPerformance

Python Problem Overview


I wanted to compare different to build a string in Python from different variables:

  • using + to concatenate (referred to as 'plus')
  • using %
  • using "".join(list)
  • using format function
  • using "{0.<attribute>}".format(object)

I compared for 3 types of scenari

  • string with 2 variables
  • string with 4 variables
  • string with 4 variables, each used twice

I measured 1 million operations of each time and performed an average over 6 measures. I came up with the following timings:

           test_plus:   0.29480
        test_percent:   0.47540
           test_join:   0.56240
         test_format:   0.72760
        test_formatC:   0.90000
      test_plus_long:   0.50520
   test_percent_long:   0.58660
      test_join_long:   0.64540
    test_format_long:   1.03400
   test_formatC_long:   1.28020
     test_plus_long2:   0.95220
  test_percent_long2:   0.81580
     test_join_long2:   0.88400
   test_format_long2:   1.51500
  test_formatC_long2:   1.97160

In each scenario, I came up with the following conclusion

  • Concatenation seems to be one of the fastest method
  • Formatting using % is much faster than formatting with format function

I believe format is much better than % (e.g. in this question) and % was almost deprecated.

I have therefore several questions:

  1. Is % really faster than format?
  2. If so, why is that?
  3. Why is "{} {}".format(var1, var2) more efficient than "{0.attribute1} {0.attribute2}".format(object)?

For reference, I used the following code to measure the different timings.

import time
def timing(f, n, show, *args):
	if show: print f.__name__ + ":\t",
	r = range(n/10)
	t1 = time.clock()
	for i in r:
		f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args)
	t2 = time.clock()
	timing = round(t2-t1, 3)
	if show: print timing
	return timing
	

class values(object):
	def __init__(self, a, b, c="", d=""):
		self.a = a
		self.b = b
		self.c = c
		self.d = d

	
def test_plus(a, b):
	return a + "-" + b

def test_percent(a, b):
	return "%s-%s" % (a, b)

def test_join(a, b):
	return ''.join([a, '-', b])
		
def test_format(a, b):
	return "{}-{}".format(a, b)

def test_formatC(val):
	return "{0.a}-{0.b}".format(val)

	
def test_plus_long(a, b, c, d):
	return a + "-" + b + "-" + c + "-" + d

def test_percent_long(a, b, c, d):
	return "%s-%s-%s-%s" % (a, b, c, d)
		
def test_join_long(a, b, c, d):
	return ''.join([a, '-', b, '-', c, '-', d])
	
def test_format_long(a, b, c, d):
	return "{0}-{1}-{2}-{3}".format(a, b, c, d)

def test_formatC_long(val):
	return "{0.a}-{0.b}-{0.c}-{0.d}".format(val)

	
def test_plus_long2(a, b, c, d):
	return a + "-" + b + "-" + c + "-" + d + "-" + a + "-" + b + "-" + c + "-" + d

def test_percent_long2(a, b, c, d):
	return "%s-%s-%s-%s-%s-%s-%s-%s" % (a, b, c, d, a, b, c, d)
	
def test_join_long2(a, b, c, d):
	return ''.join([a, '-', b, '-', c, '-', d, '-', a, '-', b, '-', c, '-', d])
			
def test_format_long2(a, b, c, d):
	return "{0}-{1}-{2}-{3}-{0}-{1}-{2}-{3}".format(a, b, c, d)

def test_formatC_long2(val):
	return "{0.a}-{0.b}-{0.c}-{0.d}-{0.a}-{0.b}-{0.c}-{0.d}".format(val)


def test_plus_superlong(lst):
	string = ""
	for i in lst:
		string += str(i)
	return string
	

def test_join_superlong(lst):
	return "".join([str(i) for i in lst])
	

def mean(numbers):
    return float(sum(numbers)) / max(len(numbers), 1)
		

nb_times = int(1e6)
n = xrange(5)
lst_numbers = xrange(1000)
from collections import defaultdict
metrics = defaultdict(list)
list_functions = [
	test_plus, test_percent, test_join, test_format, test_formatC,
	test_plus_long, test_percent_long, test_join_long, test_format_long, test_formatC_long,
	test_plus_long2, test_percent_long2, test_join_long2, test_format_long2, test_formatC_long2,
	# test_plus_superlong, test_join_superlong,
]
val = values("123", "456", "789", "0ab")
for i in n:
	for f in list_functions:
		print ".",
		name = f.__name__
		if "formatC" in name:
			t = timing(f, nb_times, False, val)
		elif '_long' in name:
			t = timing(f, nb_times, False, "123", "456", "789", "0ab")
		elif '_superlong' in name:
			t = timing(f, nb_times, False, lst_numbers)
		else:
			t = timing(f, nb_times, False, "123", "456")
		metrics[name].append(t) 

# Get Average
print "\n===AVERAGE OF TIMINGS==="
for f in list_functions:
	name = f.__name__
	timings = metrics[name]
	print "{:>20}:\t{:0.5f}".format(name, mean(timings))

Python Solutions


Solution 1 - Python

  1. Yes, % string formatting is faster than the .format method
  2. most likely (this may have a much better explanation) due to % being a syntactical notation (hence fast execution), whereas .format involves at least one extra method call
  3. because attribute value access also involves an extra method call, viz. __getattr__

I ran a slightly better analysis (on Python 3.8.2) using timeit of various formatting methods, results of which are as follows (pretty-printed with BeautifulTable) -

Type \ num_vars 1 2 5 10 50 250
f_str_str 0.056 0.063 0.115 0.173 0.754 3.717
f_str_int 0.055 0.148 0.354 0.656 3.186 15.747
concat_str 0.012 0.044 0.169 0.333 1.888 10.231
pct_s_str 0.091 0.114 0.182 0.313 1.213 6.019
pct_s_int 0.090 0.141 0.248 0.479 2.179 10.768
dot_format_str 0.143 0.157 0.251 0.461 1.745 8.259
dot_format_int 0.141 0.192 0.333 0.620 2.735 13.298
dot_format2_str 0.159 0.195 0.330 0.634 3.494 18.975
dot_format2_int 0.158 0.227 0.422 0.762 4.337 25.498

The trailing _str & _int represent the operation was carried out on respective value types.

Kindly note that the concat_str result for a single variable is essentially just the string itself, so it shouldn't really be considered.

My setup for arriving at the results -

from timeit import timeit
from beautifultable import BeautifulTable  # pip install beautifultable

times = {}

for num_vars in (250, 50, 10, 5, 2, 1):
    f_str = "f'{" + '}{'.join([f'x{i}' for i in range(num_vars)]) + "}'"
    # "f'{x0}{x1}'"
    concat = '+'.join([f'x{i}' for i in range(num_vars)])
    # 'x0+x1'
    pct_s = '"' + '%s'*num_vars + '" % (' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"%s%s" % (x0,x1)'
    dot_format = '"' + '{}'*num_vars + '".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"{}{}".format(x0,x1)'
    dot_format2 = '"{' + '}{'.join([f'{i}' for i in range(num_vars)]) + '}".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"{0}{1}".format(x0,x1)'

    vars = ','.join([f'x{i}' for i in range(num_vars)])
    vals_str = tuple(map(str, range(num_vars))) if num_vars > 1 else '0'
    setup_str = f'{vars} = {vals_str}'
    # "x0,x1 = ('0', '1')"
    vals_int = tuple(range(num_vars)) if num_vars > 1 else 0
    setup_int = f'{vars} = {vals_int}'
    # 'x0,x1 = (0, 1)'

    times[num_vars] = {
        'f_str_str': timeit(f_str, setup_str),
        'f_str_int': timeit(f_str, setup_int),
        'concat_str': timeit(concat, setup_str),
        # 'concat_int': timeit(concat, setup_int), # this will be summation, not concat
        'pct_s_str': timeit(pct_s, setup_str),
        'pct_s_int': timeit(pct_s, setup_int),
        'dot_format_str': timeit(dot_format, setup_str),
        'dot_format_int': timeit(dot_format, setup_int),
        'dot_format2_str': timeit(dot_format2, setup_str),
        'dot_format2_int': timeit(dot_format2, setup_int),
    }

table = BeautifulTable()
table.column_headers = ['Type \ num_vars'] + list(map(str, times.keys()))
# Order is preserved, so I didn't worry much
for key in ('f_str_str', 'f_str_int', 'concat_str', 'pct_s_str', 'pct_s_int', 'dot_format_str', 'dot_format_int', 'dot_format2_str', 'dot_format2_int'):
    table.append_row([key] + [times[num_vars][key] for num_vars in (1, 2, 5, 10, 50, 250)])
print(table)

I couldn't go beyond num_vars=250 because of the max arguments (255) limit with timeit.

tl;dr - Python string formatting performance : f-strings are fastest and more elegant, but at times (due to some implementation restrictions & being Py3.6+ only), you might have to use other formatting options as necessary.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJean-Francois T.View Question on Stackoverflow
Solution 1 - Pythonshad0w_wa1k3rView Answer on Stackoverflow