sprocket i/o

thomas stromberg on technology, nature, and motorcycles


String concatenation performance in Python

October 12th, 2007

There are six common techniques for concatenating strings in Python, five of which are applicable when variables are involved. Since I began learning Python a year and a half ago, I’ve heard numerous people say that you should not concatenate strings with + or += because of performance issues, and that you should use string formatting instead. This came up on a list I was on today, so I figured it was time to do a bit of benchmarking. It turns out that for common uses, this is bad advice under Python 2.4.

String Concatenation Methods

Here’s an overview of the popular methods:

  • add: a + b + c + d
  • add equals: a += b; a += c; a += d
  • format strings: '%s%s%s%s' % (a, b, c, d)
  • named format strings: '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}
  • join: ''.join([a, b, c, d])
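
As a quick sanity check, here is a minimal sketch showing that the five methods listed above all build the same string. The sample values for a, b, c, and d and the `results` dictionary are placeholders chosen for illustration; they are not part of the benchmark.

```python
# Placeholder values, chosen only for illustration.
a, b, c, d = 'spam', 'eggs', 'ham', 'toast'

results = {}

# add
results['add'] = a + b + c + d

# add equals: build the result up one piece at a time
x = a
x += b
x += c
x += d
results['add equals'] = x

# format strings (positional)
results['format'] = '%s%s%s%s' % (a, b, c, d)

# named format strings (values looked up by key)
results['named format'] = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}

# join
results['join'] = ''.join([a, b, c, d])

# Every method should yield the same concatenated string.
assert len(set(results.values())) == 1
print(results['join'])
```

Since the output is identical in every case, the choice between the methods comes down to readability and speed.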

Benchmarking Code

My goal was to run each of the above methods 1,000,000 times on strings of various lengths, to see which performed best. To do this, I used the timeit module included with Python.

#!/usr/bin/python
# benchmark various string concatenation methods. Run each 5*1,000,000 times
# and pick the best time out of the 5. Repeats for string lengths of
# 4, 16, 64, 256, 1024, and 4096. Outputs in CSV format via stdout.
import timeit
 
tests = {
  'add':  "x = a + b + c + d",
  'join': "x = ''.join([a,b,c,d])",
  'addequals': "x = a; x += b; x += c; x += d",
  'format': "x = '%s%s%s%s' % (a, b, c, d)",
  'full_format': "x = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}"
}
 
count = 1
for i in range(6):
  count = count * 4
  init = "a = '%s'; b = '%s'; c = '%s'; d = '%s'" % \
         ('a' * count, 'b' * count, 'c' * count, 'd' * count)
 
  for test in tests:
    t = timeit.Timer(tests[test], init)
    best = min(t.repeat(5, 1000000))
    print("'%s',%s,%s" % (test, count, best))
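
For a quicker spot check, the same timeit pattern can be scaled down. The sketch below is an assumption-laden variant, not the script used for the results: the helper name `time_statement` and the reduced repeat and run counts are hypothetical, picked so it finishes in well under a second. Taking the minimum of the repeats follows the usual timeit practice, since slower runs mostly reflect interference from other processes.

```python
import timeit

def time_statement(stmt, setup, repeat=3, number=10000):
    """Return the best total time in seconds for `number` runs of stmt.

    Hypothetical helper: the counts are far smaller than the
    5 x 1,000,000 runs used in the full benchmark.
    """
    timer = timeit.Timer(stmt, setup)
    return min(timer.repeat(repeat, number))

# Four 16-character strings, built the same way as in the full script.
setup = "a = 'a' * 16; b = 'b' * 16; c = 'c' * 16; d = 'd' * 16"

best_add = time_statement("x = a + b + c + d", setup)
best_join = time_statement("x = ''.join([a, b, c, d])", setup)

print("add:  %.4f s" % best_add)
print("join: %.4f s" % best_join)
```

The absolute numbers will differ from machine to machine and between Python versions, so only the relative ordering is worth comparing.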

The Results

Here is the raw data behind the graph, in seconds per 1,000,000 runs (best of 5). The best score for each string length is marked with an asterisk.

type           4 char   16 char   64 char   256 char   1024 char   4096 char
add            0.38*    0.45*     0.67*     0.99*      2.33        5.03
add equals     0.48     0.55      0.73      1.16       1.77        7.61
format         0.71     0.73      1.21      2.05       3.07        7.64
join           0.76     0.76      0.96      1.1        1.56*       4.63*
named format   1.35     1.35      2.06      2.77       3.82        9.82

In Summary

  • Simple addition is the fastest string concatenation for small strings, followed by add equals.
  • ''.join() is the fastest string concatenation for large strings.
  • Named format is always the worst performer.
  • Using string formatting for joins is about as good as add equals for the largest strings (4096 characters), but for small strings it’s mediocre.
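
The points above can be checked directly against the results table. This is a small sketch that copies the published timings into a dictionary and picks the fastest and slowest method for each string length; the variable names are illustrative only.

```python
# Timings copied from the results table (seconds per 1,000,000 runs).
lengths = [4, 16, 64, 256, 1024, 4096]
timings = {
    'add':          [0.38, 0.45, 0.67, 0.99, 2.33, 5.03],
    'add equals':   [0.48, 0.55, 0.73, 1.16, 1.77, 7.61],
    'format':       [0.71, 0.73, 1.21, 2.05, 3.07, 7.64],
    'join':         [0.76, 0.76, 0.96, 1.10, 1.56, 4.63],
    'named format': [1.35, 1.35, 2.06, 2.77, 3.82, 9.82],
}

fastest = []
slowest = []
for i, length in enumerate(lengths):
    # Pair each time with its method name so min/max sort by time.
    column = [(times[i], name) for name, times in timings.items()]
    fastest.append(min(column)[1])
    slowest.append(max(column)[1])
    print("%4d chars: fastest=%s, slowest=%s" % (length, fastest[-1], slowest[-1]))
```

In this data, add wins up to 256 characters, join wins at 1024 and 4096, and named format is slowest in every column.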

I learned something indeed. It makes a lot of sense if you think about it.

Tags: technology
