IPython Notebook on AWS EC2

1. Install Anaconda3. It is much easier to start from this Python distribution straight away.

wget [link to the latest version of anaconda]
bash Anaconda-*.sh

2. generate password in IPython

from IPython.lib import passwd
passwd()  # type the password when prompted and save the generated code
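To make it clearer what the "generated code" is, here is a minimal standard-library sketch. It assumes the generated string has the form algorithm:salt:hexdigest, which is how I understand the output of passwd(). The function names make_password_hash and check_password are hypothetical and are not IPython's API.

```python
import hashlib
import secrets


def make_password_hash(passphrase, algorithm="sha1", salt_len=12):
    """Return a salted hash string shaped like 'algorithm:salt:hexdigest' (assumed format)."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hex characters
    digest = hashlib.new(algorithm)
    digest.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, digest.hexdigest()))


def check_password(hashed, passphrase):
    """Recompute the digest with the stored salt and compare it to the stored one."""
    algorithm, salt, expected = hashed.split(":", 2)
    digest = hashlib.new(algorithm)
    digest.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return digest.hexdigest() == expected


hashed = make_password_hash("my secret")
print(hashed.split(":")[0])  # the algorithm name is stored in clear text
```

Only the salted hash goes into the config file, so the plain password never has to be stored on the server.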

3. generate notebook profile

ipython profile create nbserver

4. generate openssl cert

conda update openssl

openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem

5. edit ipython_notebook_config.py as follows.

# Configuration file for ipython-notebook.
c = get_config()
# Kernel config
c.IPKernelApp.pylab = 'inline'
# Notebook config
c.NotebookApp.certfile = u'/PATH_TO_CERT/mycert.pem'
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = [GENERATED_CODE]
c.NotebookApp.port = 8888

6. start IPython notebook server

ipython notebook --profile=nbserver

Visit https://ec2-IP.compute-1.amazonaws.com:8888 to use.

Notes: High Performance Python

Profiling Toolbox:
Print the duration of a computation using time.time(). Wrapping the timing logic in a decorator keeps the code simple and elegant.

# copied from the book
import time
from functools import wraps

def timefn(fn):
    @wraps(fn)
    def measure_time(*args, **kwargs):
        t1 = time.time()
        result = fn(*args, **kwargs)
        t2 = time.time()
        print("@timefn:" + fn.__name__ + " took " + str(t2 - t1) + " seconds")
        return result
    return measure_time

@timefn
def calculate(para1, para2, para3):
    ...  # function body omitted in these notes

Use the Unix time command; make sure to call /usr/bin/time directly rather than the shell built-in.
Use cProfile to profile the whole Python module: python -m cProfile -s cumulative a.py
Use line_profiler with the @profile decorator.
Use memory_profiler with the @profile decorator.
Print hp.heap() to inspect objects on the heap (requires guppy).
Dowser: live performance monitoring.
The dis module helps to inspect CPython bytecode: dis.dis(module.func)
perf: Linux tool to inspect paging, cache misses, CPU usage and a lot more.
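cProfile can also be driven from inside a script instead of the command line. The sketch below is an illustration only: slow_sum and profile_call are hypothetical names, and the report is captured into a string so it can be printed or logged.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """A deliberately simple workload to have something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(fn, *args):
    """Run fn under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 100000)
print(report)
```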

List v.s. Tuple
Use a tuple as an immutable list. Both store references, hence both can hold objects of different types.
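A small sketch of the difference, using made-up variable names: both containers hold mixed types, only the list accepts item assignment, and because a tuple stores references, a mutable object inside it can still change.

```python
mixed_list = [1, "two", 3.0]
mixed_tuple = (1, "two", 3.0)

# Lists are mutable: item assignment works.
mixed_list[0] = "one"

# Tuples are immutable: item assignment raises TypeError.
try:
    mixed_tuple[0] = "one"
    tuple_error = None
except TypeError as exc:
    tuple_error = exc

# A tuple stores references, so the object it points to can still be mutated.
shared = []
holder = (shared,)
shared.append(42)
print(holder)
```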

Dict and Sets
The hashing implementation in Python uses open addressing. Usually the last K bits of the hash value are used to pick the bucket. On a collision, another p bits are used to compute an offset, which selects the next bucket position.
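The probing idea can be sketched in a few lines. This is a toy model, not CPython's actual implementation: it assumes non-negative integer keys that act as their own hash, a table size that is a power of two, and a perturbation shift of 5 bits. probe_sequence and insert are hypothetical helper names.

```python
PERTURB_SHIFT = 5  # assumed number of hash bits consumed per collision


def probe_sequence(hash_value, table_size, limit):
    """Return the first `limit` bucket indexes probed for a non-negative hash."""
    mask = table_size - 1            # table_size must be a power of two
    perturb = hash_value
    index = hash_value & mask        # the last K bits choose the first bucket
    indexes = []
    for _ in range(limit):
        indexes.append(index)
        perturb >>= PERTURB_SHIFT    # bring higher hash bits into play
        index = (5 * index + perturb + 1) & mask
    return indexes


def insert(table, key):
    """Place an integer key in the first free bucket along its probe sequence."""
    for index in probe_sequence(key, len(table), 4 * len(table)):
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
    raise RuntimeError("no free slot found")


table = [None] * 8
slot_a = insert(table, 1)   # 1 & 7 == 1, bucket 1 is free
slot_b = insert(table, 9)   # 9 & 7 == 1 collides, so the next probe is used
print(slot_a, slot_b)
```

Keys 1 and 9 share the same low three bits, so the second insert has to follow the probe sequence to find a free bucket.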

Resizing happens on insertion, not on deletion. A table that is at most 2/3 full is optimal. On resize, the number of buckets increases by 4x until 50,000, after which it increases by 2x. A resize can also reduce the size when necessary.
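Resizing can be observed indirectly. The sketch below assumes CPython, where sys.getsizeof reflects the allocated table; the exact sizes and growth factors depend on the interpreter version, so no specific numbers are claimed. growth_points is a hypothetical helper.

```python
import sys


def growth_points(n_items):
    """Insert n_items keys and record (item_count, size_in_bytes) at each size change."""
    d = {}
    last_size = sys.getsizeof(d)
    points = []
    for i in range(n_items):
        d[i] = None
        size = sys.getsizeof(d)
        if size != last_size:        # the table was reallocated
            points.append((len(d), size))
            last_size = size
    return points


points = growth_points(1000)
for count, size in points:
    print(count, size)
```

The size changes only at a handful of insertions, which is the resize-on-insert behaviour described above.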

Namespacing, ordered by lookup cost from slowest to fastest: global lookup plus attribute lookup > global lookup > local variable.
math.sin > sin [from math import sin] > a = math.sin [a is a local variable]
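The three variants can be compared at the bytecode level with the dis module. This is a sketch with hypothetical function names. Exact opcode names vary between CPython versions, so it only checks whether a global lookup (LOAD_GLOBAL) is present.

```python
import dis
import math
from math import sin


def attribute_lookup(x):
    return math.sin(x)          # global lookup of math, then attribute lookup


def global_lookup(x):
    return sin(x)               # a single global lookup


def local_lookup(x, sin=math.sin):
    return sin(x)               # sin is a local name, bound once at definition time


def opnames(fn):
    """List the opcode names CPython compiled for fn."""
    return [ins.opname for ins in dis.get_instructions(fn)]


for fn in (attribute_lookup, global_lookup, local_lookup):
    print(fn.__name__, "LOAD_GLOBAL" in opnames(fn))
```

Binding the function to a local name only pays off inside hot loops; elsewhere the readability cost is usually not worth it.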

Iterators and Generators

Iterators are very useful because they are evaluated lazily. This is an advanced topic with many more details to consider.
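A minimal illustration of lazy evaluation, with a hypothetical generator named squares: the sequence is infinite, yet only the requested values are ever computed.

```python
from itertools import islice


def squares():
    """Yield 0, 1, 4, 9, ... forever; each value is computed only on demand."""
    n = 0
    while True:
        yield n * n
        n += 1


gen = squares()
first_five = list(islice(gen, 5))   # pulls exactly five values
next_value = next(gen)              # the generator resumes where it stopped
print(first_five, next_value)
```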

Matrix and Vector Computation
System calls (paging, IO, etc.) are slow. New memory allocation is slow (in-place operations are fast). Cache misses cause slow execution. Memory fragmentation causes slow execution. Branching is slow when the CPU fails to predict the if/else outcome correctly while loading data into cache.

Bringing a chunk of useful data into cache and memory together is important. This requires an appropriate data structure, e.g. a numpy array, vector or matrix, that groups useful data together. In contrast, a normal Python list holds only references, with the actual data scattered all over the place.
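The difference in memory layout can be made visible without extra packages. The sketch below uses the standard library's array module as a stand-in for a numpy array (an assumption made so that it runs anywhere). The byte counts rely on CPython's sys.getsizeof.

```python
import sys
from array import array

n = 1000
boxed = [float(i) for i in range(n)]   # list of references to separate float objects
packed = array("d", boxed)             # one contiguous buffer of C doubles

# The list's footprint is the reference table plus every float object it points to.
list_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
packed_bytes = sys.getsizeof(packed)

print("list  :", list_bytes, "bytes")
print("array :", packed_bytes, "bytes")
```

The contiguous buffer is both smaller and friendlier to the cache, since neighbouring elements sit next to each other in memory.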

Fewer CPU instructions often mean less execution time. Use the Linux perf tool here to gain a deep understanding of the program.

Configure Jenkins for a Basic Python Project

Install and set up Jenkins following the official document.

Plugins: Install Jenkins Cobertura Plugin + Git Plugin + Jenkins Violations.

Step 1: Create a new Item “py_analysis” and select “Build a free-style software project”

Step 2: Version Control. I put Git here, with repo URL and credential. I added “SSH username with private key” here. The category of credential may vary depending on users’ situation.

Step 3: Add build step – Execute shell


# Delete previously built virtualenv
#if [ -d $PYENV_HOME ]; then
#    rm -rf $PYENV_HOME
#fi

# Create virtualenv and install necessary packages
virtualenv $PYENV_HOME
. $PYENV_HOME/bin/activate
pip install --quiet <LIBS I LOVE>
pip install --quiet pylint
pip install --quiet $WORKSPACE/project/  # where your setup.py lives
#nosetests --with-xcoverage --with-xunit --cover-package=model --cover-erase
cd $WORKSPACE/project/testPackage 
nosetests --with-xunit --cover-package=model --cover-erase -a '!slow'
pylint -f parseable $WORKSPACE/project/package | tee pylint.out
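The -a '!slow' option selects tests by attribute. As a sketch, a test module could mark an expensive test by setting a plain function attribute, which is what nose's attribute selection looks at. The test names here are made up for illustration.

```python
import time


def test_fast_addition():
    # Runs on every build.
    assert 1 + 1 == 2


def test_large_sum():
    time.sleep(0.01)  # stands in for an expensive computation
    assert sum(range(1000)) == 499500


# Tag the expensive test; a run with -a '!slow' is expected to skip it.
test_large_sum.slow = True
```

Dropping the -a option from the nosetests command runs the tagged tests as well.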

Step 4: Post-build Actions
Publish Cobertura Coverage Report – Put Cobertura xml report pattern as “**/coverage.xml”

Publish Unit Test Report – Put test report XMLs pattern as “**/nosetests.xml”

Step 5: Save and build.

reference: http://bhfsteve.blogspot.ie/2012/04/automated-python-unit-testing-code_27.html

Python Virtualenv on OSX

This subject looks easy and simple, but the workflow is still worth writing down in this post.

Part I: Use MacPorts to install Python on OSX

sudo port install python26 python_select
sudo port select --set python python26


sudo port install python30 python_select
sudo port select --set python python30

(On Linux, run sudo apt-get install python-setuptools python-dev build-essential instead.)

sudo port install -U pip

Part II: Create virtualenv

sudo easy_install -U pip
sudo easy_install -U virtualenv
virtualenv myenv
$ virtualenv --python=/usr/bin/python2.7 myenv-py27
$ source /tmp/myenv/bin/activate
$ pip install yolk
$ source /tmp/myenv/bin/activate
$ yolk -l
$ deactivate
$ source /tmp/myenv/bin/activate
$ pip freeze > /tmp/requirements.txt
$ source /tmp/myenv/bin/activate
$ pip uninstall Django
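After activating an environment it can be handy to confirm from Python itself that the interpreter really runs inside it. The helper below is a hypothetical sketch: it relies on sys.real_prefix (set by older virtualenv releases) and sys.base_prefix (set by newer virtualenv releases and the venv module).

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer tooling sets sys.base_prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


print("inside a virtualenv:", in_virtualenv())
print("prefix:", sys.prefix)
```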

A very useful reference.