Python data types are big in memory!

If you want to write a Python app the right way, you must know how much memory each data structure you use costs.
In my case, I was surprised to see 2GB of comma-separated input strings blow up to 7GB after a simple string.split.
So once again, when you use Python for MapReduce jobs or other tasks, convert the input strings into more complex data structures only when you really need that data.. if you can!
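To see the split blow-up for yourself, here is a small sketch (the sample CSV line is made up; exact byte counts vary by Python version and platform, but the ratio is what matters):

```python
import sys

# A made-up CSV-like line, standing in for one line of real input.
line = ",".join(str(i) for i in range(1000))
fields = line.split(",")

# Size of the single string vs. the list holding the split pieces
# plus every piece itself.
whole = sys.getsizeof(line)
split_total = sys.getsizeof(fields) + sum(sys.getsizeof(f) for f in fields)

print(whole, split_total)  # split_total is several times larger
```

Each short field pays the full per-string overhead, which the original single string paid only once.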
Below you can have a look at the sizes used by the common Python data structures (the figures were taken on a Python 2 interpreter; exact values vary by version and platform).
My hints in a nutshell: use strings as much as you can, and prefer tuples (immutable data structures) over lists, which reserve more memory than they need.
Moreover, please keep in mind that getsizeof usually doesn't return the size of the content (see the list of dictionaries) but only the size of the data structure itself plus the pointers to the data in it.
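Since getsizeof is shallow, measuring a container of containers needs a recursive helper. This is only a sketch (it handles the common built-in containers, nothing more), but it makes the shallow-vs-deep difference obvious:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size: the object plus everything it references.
    A sketch only -- it does not cover every container type."""
    if seen is None:
        seen = set()
    if id(obj) in seen:       # don't count shared objects twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

# A made-up list of dictionaries, like parsed CSV rows.
records = [{"id": i, "name": "user%d" % i} for i in range(100)]
print(sys.getsizeof(records))   # shallow: list header + pointers only
print(deep_getsizeof(records))  # includes the dicts, their keys and values
```

The shallow number stays small no matter how fat the dictionaries get; the deep number is the one that actually shows up in your process memory.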

# string
>>> sys.getsizeof("")
37
>>> sys.getsizeof("a")
38

# unicode
>>> sys.getsizeof(u"a")
56
>>> sys.getsizeof(u"")
52

# bytearray
>>> sys.getsizeof(bytearray(""))
48
>>> sys.getsizeof(bytearray("a"))
50

# tuple
>>> sys.getsizeof(tuple(""))
56
>>> sys.getsizeof(tuple("a"))
64

# list
>>> sys.getsizeof(list(""))
72
>>> sys.getsizeof(list("a"))
104
>>> sys.getsizeof(list("ab"))
112
>>> list("ab")
['a', 'b']

# dictionary
>>> sys.getsizeof(dict())
280
>>> sys.getsizeof(dict({1:1}))
280
>>> sys.getsizeof(dict({"ciao":1}))
280
