There are six common techniques for concatenating strings in Python, five of which apply when variables are involved. Since I began learning Python a year and a half ago, I’ve heard numerous people say that you should not concatenate strings with + or += because of performance issues, and that you should use string formatting instead. This came up on a list I was on today, so I figured it was time to do a bit of benchmarking. It turns out that for common uses, this is bad advice under Python 2.4.
String Concatenation Methods
Here’s an overview of the popular methods:
- add: a + b + c + d
- add equals: a += b; a += c; a += d
- format strings: '%s%s%s%s' % (a, b, c, d)
- named format strings: '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}
- join: ''.join([a,b,c,d])
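As a quick sanity check, the five expressions above can be confirmed to build the same string. This is only a minimal sketch: the sample values for a, b, c and d are placeholders chosen for illustration and are not the benchmark inputs.

```python
# Placeholder inputs (assumed for illustration only).
a, b, c, d = 'spam', 'eggs', 'ham', 'toast'

results = {}

# add
results['add'] = a + b + c + d

# add equals
x = a
x += b
x += c
x += d
results['add equals'] = x

# format strings
results['format'] = '%s%s%s%s' % (a, b, c, d)

# named format strings
results['named format'] = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}

# join
results['join'] = ''.join([a, b, c, d])

# Every method should yield the same concatenated string.
assert len(set(results.values())) == 1
print(results['add'])
```

Since the outputs are identical, the only thing left to compare is how long each method takes.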
Benchmarking Code
My goal was to run each of the above methods 1,000,000 times on strings of various lengths, to see which performed best. To do this, I used the timeit module included with Python.
    #!/usr/bin/python
    # benchmark various string concatenation methods. Run each 5*1,000,000 times
    # and pick the best time out of the 5. Repeats for string lengths of
    # 4, 16, 64, 256, 1024, and 4096. Outputs in CSV format via stdout.
    import timeit

    tests = {
        'add': "x = a + b + c + d",
        'join': "x = ''.join([a,b,c,d])",
        'addequals': "x = a; x += b; x += c; x += d",
        'format': "x = '%s%s%s%s' % (a, b, c, d)",
        'full_format': "x = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}"
    }

    count = 1
    for i in range(6):
        count = count * 4
        init = "a = '%s'; b = '%s'; c = '%s'; d = '%s'" % \
            ('a' * count, 'b' * count, 'c' * count, 'd' * count)

        for test in tests:
            t = timeit.Timer(tests[test], init)
            best = min(t.repeat(5, 1000000))
            print "'%s',%s,%s" % (test, count, best)
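The script above uses the Python 2 print statement. For anyone who wants to try the same idea on a newer interpreter, here is a rough sketch that should run on both Python 2 and Python 3. The run_benchmark helper and its default arguments are my own assumptions, with fewer lengths and far fewer iterations so it finishes quickly. Timings from it are not comparable to the Python 2.4 numbers in this post.

```python
import timeit

# Same statements as the benchmark above, keyed by method name.
TESTS = {
    'add': "x = a + b + c + d",
    'join': "x = ''.join([a, b, c, d])",
    'addequals': "x = a; x += b; x += c; x += d",
    'format': "x = '%s%s%s%s' % (a, b, c, d)",
    'full_format': "x = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}",
}


def run_benchmark(lengths=(4, 16, 64), repeat=3, number=10000):
    """Hypothetical helper: return {(method, length): best time in seconds}."""
    results = {}
    for length in lengths:
        # Build the four input strings once, in the timer's setup code.
        setup = "a = 'a' * %d; b = 'b' * %d; c = 'c' * %d; d = 'd' * %d" % (
            (length,) * 4)
        for name, stmt in TESTS.items():
            timer = timeit.Timer(stmt, setup)
            # Keep the best (lowest) total time out of `repeat` rounds.
            results[(name, length)] = min(timer.repeat(repeat, number))
    return results


if __name__ == '__main__':
    # CSV-style output: method, string length, best time.
    for (name, length), best in sorted(run_benchmark().items()):
        print("'%s',%s,%.4f" % (name, length, best))
```

Taking the minimum of several rounds, as the original script does, reduces the influence of other processes competing for the CPU.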
The Results
Here is the raw data behind the graph. Each value is the best time, in seconds, for 1,000,000 runs; the best score for each string length is in bold.
| method | 4 char | 16 char | 64 char | 256 char | 1024 char | 4096 char |
| add | **0.38** | **0.45** | **0.67** | **0.99** | 2.33 | 5.03 |
| add equals | 0.48 | 0.55 | 0.73 | 1.16 | 1.77 | 7.61 |
| format | 0.71 | 0.73 | 1.21 | 2.05 | 3.07 | 7.64 |
| join | 0.76 | 0.76 | 0.96 | 1.1 | **1.56** | **4.63** |
| named format | 1.35 | 1.35 | 2.06 | 2.77 | 3.82 | 9.82 |
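To double-check which method wins and loses at each length, the table can be dropped into a small script. This is just a sketch: the numbers are copied from the table above, and fastest_and_slowest is a made-up helper name.

```python
# Timings copied from the table above (seconds per 1,000,000 runs).
LENGTHS = [4, 16, 64, 256, 1024, 4096]
TIMINGS = {
    'add':          [0.38, 0.45, 0.67, 0.99, 2.33, 5.03],
    'add equals':   [0.48, 0.55, 0.73, 1.16, 1.77, 7.61],
    'format':       [0.71, 0.73, 1.21, 2.05, 3.07, 7.64],
    'join':         [0.76, 0.76, 0.96, 1.10, 1.56, 4.63],
    'named format': [1.35, 1.35, 2.06, 2.77, 3.82, 9.82],
}


def fastest_and_slowest(timings, index):
    """Hypothetical helper: return (fastest, slowest) method for one column."""
    ranked = sorted(timings, key=lambda name: timings[name][index])
    return ranked[0], ranked[-1]


for i, length in enumerate(LENGTHS):
    fastest, slowest = fastest_and_slowest(TIMINGS, i)
    print("%4d chars: fastest=%s, slowest=%s" % (length, fastest, slowest))
```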
In Summary
- Simple addition is the fastest string concatenation for small strings, followed by add equals.
- ''.join() is the fastest string concatenation for large strings.
- Named format is always the worst performer.
- String formatting is about as good as add equals for the largest strings (4096 characters), but for smaller strings it’s mediocre.
I learned something indeed. It makes a lot of sense if you think about it.