There are six common techniques for concatenating strings in Python, five of which apply when variables are involved. Since I began learning Python a year and a half ago, I've heard numerous people say that you should not concatenate strings using + or += because of performance problems, and that you should use string formatting instead. This came up on a list I was on today, so I figured it was time to do a bit of benchmarking. It turns out that for common uses, this is bad advice under Python 2.4.
String Concatenation Methods
Here’s an overview of the popular methods:
- add: a + b + c + d
- add equals: a += b; a += c; a += d
- format strings: '%s%s%s%s' % (a, b, c, d)
- named format strings: '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}
- join: ''.join([a,b,c,d])
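As a quick sanity check, here is a minimal sketch showing that all five methods build the same string. The sample values ("spam", "eggs" and so on) are my own placeholders, not part of the benchmark.

```python
# Placeholder inputs, chosen only for illustration.
a, b, c, d = "spam", "eggs", "ham", "toast"

# Build the same result with each of the single-expression methods.
results = {
    "add": a + b + c + d,
    "format": "%s%s%s%s" % (a, b, c, d),
    "named format": "%(a)s%(b)s%(c)s%(d)s" % {"a": a, "b": b, "c": c, "d": d},
    "join": "".join([a, b, c, d]),
}

# "add equals" needs several statements, so build it separately.
x = a
x += b
x += c
x += d
results["add equals"] = x

# Every method should have produced an identical string.
all_equal = len(set(results.values())) == 1
print(all_equal)
```

Since the output is identical, the only thing left to compare is speed.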
Benchmarking Code
My goal was to run each of the above methods 1,000,000 times on strings of various lengths, to see which performed best. To do this, I used the timeit module included with Python.
    #!/usr/bin/python
    # benchmark various string concatenation methods. Run each 5*1,000,000 times
    # and pick the best time out of the 5. Repeats for string lengths of
    # 4, 16, 64, 256, 1024, and 4096. Outputs in CSV format via stdout.
    import timeit

    tests = {
        'add': "x = a + b + c + d",
        'join': "x = ''.join([a,b,c,d])",
        'addequals': "x = a; x += b; x += c; x += d",
        'format': "x = '%s%s%s%s' % (a, b, c, d)",
        'full_format': "x = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}"
    }

    count = 1
    for i in range(6):
        count = count * 4
        init = "a = '%s'; b = '%s'; c = '%s'; d = '%s'" % \
            ('a' * count, 'b' * count, 'c' * count, 'd' * count)
        for test in tests:
            t = timeit.Timer(tests[test], init)
            best = min(t.repeat(5, 1000000))
            print "'%s',%s,%s" % (test, count, best)
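The script above is written for Python 2.4. If you want to try the same idea on Python 3, something like the following sketch should work. The function name run_benchmark and its reduced defaults (two string lengths, 3 repeats of 10,000 runs) are my own assumptions to keep it quick, so its timings are not comparable to the numbers in this post.

```python
import timeit

# Same five statements as in the original benchmark.
TESTS = {
    "add": "x = a + b + c + d",
    "join": "x = ''.join([a, b, c, d])",
    "addequals": "x = a; x += b; x += c; x += d",
    "format": "x = '%s%s%s%s' % (a, b, c, d)",
    "full_format": "x = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}",
}


def run_benchmark(lengths=(4, 16), repeat=3, number=10000):
    """Return (test name, string length, best time in seconds) tuples.

    Hypothetical helper: the defaults are far smaller than the
    5 x 1,000,000 runs used for the results in this post.
    """
    rows = []
    for length in lengths:
        # Setup code that defines a, b, c, d as strings of the given length.
        setup = "a = %r; b = %r; c = %r; d = %r" % (
            "a" * length, "b" * length, "c" * length, "d" * length)
        for name, stmt in TESTS.items():
            # Keep the best (lowest) time out of the repeats.
            best = min(timeit.repeat(stmt, setup, repeat=repeat, number=number))
            rows.append((name, length, best))
    return rows


rows = run_benchmark()
for name, length, best in rows:
    print("%s,%d,%.4f" % (name, length, best))
```

Calling run_benchmark(lengths=(4, 16, 64, 256, 1024, 4096), repeat=5, number=1000000) would match the parameters of the original run, at the cost of a much longer wait.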
The Results
Here is the raw data behind the graph. Times are in seconds for 1,000,000 runs (the best of 5 repeats), and the best score for each string length is in bold.
| type | 4 char | 16 char | 64 char | 256 char | 1024 char | 4096 char |
|---|---|---|---|---|---|---|
| add | **0.38** | **0.45** | **0.67** | **0.99** | 2.33 | 5.03 |
| add equals | 0.48 | 0.55 | 0.73 | 1.16 | 1.77 | 7.61 |
| format | 0.71 | 0.73 | 1.21 | 2.05 | 3.07 | 7.64 |
| join | 0.76 | 0.76 | 0.96 | 1.1 | **1.56** | **4.63** |
| named format | 1.35 | 1.35 | 2.06 | 2.77 | 3.82 | 9.82 |
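To double-check which method wins at each string length, here is a small sketch that picks the fastest and slowest entry per column. The numbers are copied straight from the table; the variable names are just my own.

```python
# String lengths and timings (seconds) copied from the results table.
lengths = [4, 16, 64, 256, 1024, 4096]
timings = {
    "add": [0.38, 0.45, 0.67, 0.99, 2.33, 5.03],
    "add equals": [0.48, 0.55, 0.73, 1.16, 1.77, 7.61],
    "format": [0.71, 0.73, 1.21, 2.05, 3.07, 7.64],
    "join": [0.76, 0.76, 0.96, 1.1, 1.56, 4.63],
    "named format": [1.35, 1.35, 2.06, 2.77, 3.82, 9.82],
}

winners = {}
losers = {}
for i, length in enumerate(lengths):
    # Lowest time in this column is the winner, highest is the loser.
    winners[length] = min(timings, key=lambda name: timings[name][i])
    losers[length] = max(timings, key=lambda name: timings[name][i])

for length in lengths:
    print("%d chars: fastest=%s, slowest=%s"
          % (length, winners[length], losers[length]))
```

In this data, add wins up to 256 characters, join wins at 1024 and 4096, and named format is slowest in every column.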
In Summary
- Simple addition is the fastest string concatenation for small strings (up to 256 characters here), followed by add equals.
- ''.join() is the fastest string concatenation for large strings (1024 characters and up).
- Named format is always the worst performer.
- String formatting is about as fast as add equals for the largest strings (4096 characters), but for small strings it's mediocre.
I learned something indeed. It makes a lot of sense if you think about it.