Notes: High Performance Python

Profiling Toolbox:
Print the duration of a computation using time.time(). Wrapping the timing logic in a decorator keeps the code simple and elegant.

''' copied from the book
import time
from functools import wraps
def timefn(fn):
    @wraps(fn)  # keep the wrapped function's name and docstring
    def measure_time(*args, **kwargs):
        t1 = time.time()
        result = fn(*args, **kwargs)
        t2 = time.time()
        print("@timefn:" + fn.__name__ + " took " + str(t2 - t1) + " seconds")
        return result
    return measure_time
@timefn
def calculate(para1, para2, para3):
    ...  # function body omitted in these notes
'''

Use the Unix time command; call /usr/bin/time directly rather than the shell's built-in time.
Use cProfile to profile a whole Python module: python -m cProfile -s <sort_key> <script.py> (for example, -s cumulative).
Use line_profiler with the @profile decorator.
Use memory_profiler with the @profile decorator.
Print hp.heap() to inspect the objects on the heap (requires installing guppy).
Dowser: live performance monitoring.
The dis module helps to inspect CPython bytecode: dis.dis(module.func)
perf: Linux tool to inspect paging, cache misses, CPU usage and a lot more.
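
As a rough illustration of two of the standard-library tools above, the sketch below profiles one call with cProfile and then disassembles the same function with dis. The slow_sum function is a made-up workload, not something from the book.

```python
import cProfile
import dis
import io
import pstats


def slow_sum(n):
    # Hypothetical workload used only for illustration.
    total = 0
    for i in range(n):
        total += i * i
    return total


# Profile a single call and capture the report as text.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())

# Disassemble the function to see the bytecode CPython executes.
bytecode = io.StringIO()
dis.dis(slow_sum, file=bytecode)
print(bytecode.getvalue())
```

Sorting by "cumulative" mirrors the -s option on the command line; the exact bytecode printed depends on the Python version.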

List vs. Tuple
Use a tuple as an immutable list. Both store references, so both can hold objects of different types.
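
A small sketch of these points, using made-up sample values: a list and a tuple built from the same items hold references to the very same objects, but only the list can be modified. The size comparison at the end is a CPython detail and may vary between versions.

```python
import sys

items = [1, "two", 3.0, None]      # sample values of different types
as_list = list(items)
as_tuple = tuple(items)

# Both containers hold references to the very same objects.
same_objects = all(a is b for a, b in zip(as_list, as_tuple))

# A list can be changed in place...
as_list.append([4])

# ...a tuple cannot.
try:
    as_tuple[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# In CPython a tuple of the same length reports a smaller size than a list.
list_size = sys.getsizeof([1, 2, 3, 4])
tuple_size = sys.getsizeof((1, 2, 3, 4))
print(same_objects, tuple_is_mutable, list_size, tuple_size)
```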

Dicts and Sets
Python's hash tables use open addressing. Usually the last K bits of the hash value select the bucket. On a collision, another p bits of the hash are used to compute an offset, which determines the next bucket to try.
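
The probing idea can be sketched roughly as follows. This is a simplified model, not the real implementation: the recurrence and the PERTURB_SHIFT constant are assumptions modeled on CPython's dict source, and probe_sequence is a hypothetical helper that only accepts non-negative hash values.

```python
PERTURB_SHIFT = 5  # assumed value, taken from CPython's dict implementation


def probe_sequence(hash_value, table_size, max_probes=8):
    """Yield the first bucket indices tried for a hash (simplified sketch)."""
    mask = table_size - 1          # table_size must be a power of two
    perturb = hash_value
    index = hash_value & mask      # the last K bits pick the first bucket
    for _ in range(max_probes):
        yield index
        perturb >>= PERTURB_SHIFT  # pull in higher bits of the hash
        index = (5 * index + perturb + 1) & mask


# Two hashes that share their last 3 bits start in the same bucket
# of an 8-bucket table but diverge on the second probe.
print(list(probe_sequence(10, 8)))
print(list(probe_sequence(306, 8)))
```

Because the higher bits feed into the offset, keys that collide on their low bits usually stop colliding after a probe or two.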

Resizing happens on insertion, never on deletion. A table that is at most 2/3 full is considered optimal. On resize, the number of buckets grows by 4x up to 50,000 elements, and by 2x after that. The table can also shrink when necessary.
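
One way to observe this from Python is to watch sys.getsizeof while inserting and then deleting keys. This is only a sketch: the exact sizes and growth factors depend on the CPython version and may not match the figures above.

```python
import sys

d = {}
sizes = []
for i in range(1000):
    d[i] = None
    sizes.append(sys.getsizeof(d))

# Each change in reported size marks a resize triggered by an insertion.
resize_points = [i + 1 for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]

size_full = sys.getsizeof(d)
for i in range(1000):
    del d[i]                       # deletions alone do not resize the table
size_after_deletes = sys.getsizeof(d)

print(resize_points, size_full, size_after_deletes)
```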

Namespace lookup cost, slowest to fastest: global lookup plus attribute lookup > lookup of a directly imported name > locally assigned variable.
math.sin > sin (after from math import sin) > a = math.sin (then call a)
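
The three variants can be timed with timeit, as in the sketch below. The function names are made up for this example, and the measured gap depends on the interpreter version and machine, so the numbers should be read as indicative only.

```python
import math
import timeit
from math import sin


def use_module_attribute(n=1000):
    total = 0.0
    for i in range(n):
        total += math.sin(i)      # global lookup of math, then attribute lookup
    return total


def use_imported_name(n=1000):
    total = 0.0
    for i in range(n):
        total += sin(i)           # single global lookup
    return total


def use_local_name(n=1000):
    local_sin = math.sin          # bound once to a local variable
    total = 0.0
    for i in range(n):
        total += local_sin(i)     # local variable lookup
    return total


timings = {
    fn.__name__: timeit.timeit(fn, number=200)
    for fn in (use_module_attribute, use_imported_name, use_local_name)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```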

Iterators and Generators

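Iterators and generators are useful because they are evaluated lazily: each value is produced only when it is requested, so the full sequence never has to sit in memory. This is an advanced topic with many more details to consider.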
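
A minimal sketch of lazy evaluation, with a made-up squares generator: the generator is infinite, yet only the values actually requested are ever computed.

```python
from itertools import islice


def squares():
    """Infinite generator: each value is computed only when requested."""
    n = 0
    while True:
        yield n * n
        n += 1


first_five = list(islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# A generator expression feeds sum() one value at a time,
# unlike the equivalent list comprehension, which builds the whole list first.
lazy_total = sum(x * x for x in range(1_000_000))
print(lazy_total)
```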

Matrix and Vector Computation
System calls (paging, I/O, etc.) are slow. Allocating new memory is slow (in-place operations are fast). Cache misses slow down execution. Memory fragmentation slows down execution. Branching is slow (the CPU may mispredict an if/else while loading data into the cache).
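
The difference between an in-place operation and a fresh allocation can be seen even with plain lists, used here as a stand-in so the sketch needs only the standard library. The variable names are illustrative.

```python
a = [1, 2, 3]
b = [4, 5, 6]

a_before = a
a += b                      # in place: extends the existing list object
in_place_kept_object = a is a_before

c = [1, 2, 3]
c_before = c
c = c + b                   # allocates a brand-new list and copies both inputs
new_object_created = c is not c_before

print(in_place_kept_object, new_object_created)
```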

Bringing a chunk of useful data into cache and memory together is important. This requires an appropriate data structure, e.g. a numpy array, vector or matrix, that groups the useful data together. In contrast, a normal Python list holds only references, with the actual data scattered all over memory.
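
The layout difference can be sketched with the standard-library array module, used here as a stand-in for a numpy array so that nothing has to be installed. The byte counts are CPython-specific estimates from sys.getsizeof.

```python
import sys
from array import array

n = 100_000
as_list = [float(i) for i in range(n)]
as_array = array("d", as_list)   # one contiguous block of C doubles

# The list holds n references; each float is a separate heap object.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# The array stores the raw values inline, next to each other.
array_bytes = sys.getsizeof(as_array)

print(list_bytes, array_bytes)
```

The contiguous version is both smaller and friendlier to the cache, because neighbouring values sit next to each other in memory.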

Fewer CPU instructions often mean less execution time. Use the Linux perf tool here to gain a deep understanding of the program.

