Python Concurrency & Parallelism - Part I: Threading and Multi-Processing

An introduction to threading and processing concepts and tools with examples in Python.

June 20, 2015 - 5 minute read -
python

Writing concurrent and parallel programs is not easy, not even in Python. In the modern computing world, there are many levels of concurrency and parallelism. From bottom to top, they are coroutines, threads, processes and inter-node computing. This tutorial is about handling them in the Python programming language.

To Readers
This tutorial was first written on 20 Jun 2015 and has been revised many times since.
It is written for Python 2.7, but many of the concepts also apply to Python 3.

Part I: The Tradition - Threading and Multi-Processing

Threads and processes date back to early operating systems. They made concurrency possible on a single computing machine, e.g. we can write blogs and listen to music at the same time.

By definition, a process is an instance of a computer program. One process talks to another via OS APIs (pipes, sockets, shared memory and so on). Usually, processes don’t share memory. A thread is usually created by a process to do some smaller task. Threads usually share the same memory with the other threads in the same process.

Nowadays we can say the fundamental difference between threads and processes is whether they share memory. Do remember that threads are the small units of execution managed by the OS scheduler, so the OS does context switching among threads, not processes. Still, context switching between threads of the same process is faster than switching between threads from different processes, because threads of the same process share one address space and the OS doesn’t have to switch memory mappings.
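To make the shared-memory difference concrete, here is a minimal sketch in Python 3 syntax (the names counter, increment and run_in are hypothetical, chosen for illustration). The same function increments a global list once in a thread and once in a process; only the thread’s change is visible to the parent afterwards, because the child process works on its own copy of memory.

```python
import multiprocessing
import threading

# Hypothetical global object used to show which workers share memory.
counter = [0]


def increment():
    counter[0] += 1


def run_in(worker_class):
    # Start increment() in a new thread or process, wait for it to finish,
    # then report the value that the *parent* sees.
    counter[0] = 0
    worker = worker_class(target=increment)
    worker.start()
    worker.join()
    return counter[0]


if __name__ == "__main__":
    print("after a thread:  %d" % run_in(threading.Thread))         # prints 1
    print("after a process: %d" % run_in(multiprocessing.Process))  # prints 0
```

The same start()/join() calls work for both classes, since multiprocessing.Process deliberately mirrors the threading.Thread API.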

Thread and Process in Python

In CS lectures, professors will usually tell you that the term “multi-processing” is the same as “multi-processor”, which stands for computing machines with more than one processing core. In Python, “multi-processing” has a different meaning: “multi-threading” means your program has more than one thread, and “multi-processing” means your program has more than one process.

In computing, a process is an instance of a computer program that is being executed (from Wikipedia). Usually, if your program needs to run some existing programs, you will create subprocesses for them, e.g. create a subprocess of “ls” in Linux and read the list of files in a directory from its standard output. In Python, however, you will more often create more than one process of the same program. Why?
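Here is a minimal sketch of that “ls” example using the standard subprocess module (Python 3 syntax; it assumes a Unix-like system with ls on the PATH, and list_directory is a hypothetical helper name).

```python
import subprocess


def list_directory(path):
    """Run the external `ls` program as a subprocess and return the
    file names it prints on its standard output."""
    # check_output starts the child process, waits for it to exit and
    # returns everything it wrote to stdout as bytes. It raises
    # CalledProcessError if `ls` exits with a non-zero status.
    output = subprocess.check_output(["ls", path])
    # When stdout is a pipe rather than a terminal, ls prints one name per line.
    return output.decode("utf-8").splitlines()


if __name__ == "__main__":
    for name in list_directory("."):
        print(name)
```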

This is basically because Python (or CPython, to be more specific) has something called the global interpreter lock.

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe.

(from the official Python Wiki)

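Thanks to the GIL, only one thread at a time can execute Python bytecode, so all the threads in a Python program together can use at most about one CPU core for pure-Python computation. (A thread waiting on I/O releases the GIL, which is why threads still help with I/O-bound work.) One may argue that the GIL has some (or a lot of) benefits, but it really makes our life harder.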
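A quick way to see the effect of the GIL is to time a CPU-bound, pure-Python loop run twice in sequence and then in two threads at once. The sketch below uses Python 3 syntax; count_down, time_sequential and time_threaded are hypothetical names. On a standard CPython build, expect the threaded run to take roughly as long as the sequential one (often a little longer), because the threads take turns holding the GIL. Exact timings depend on your machine and Python version, and free-threaded CPython builds (3.13 and later) can behave differently.

```python
import threading
import time


def count_down(n):
    # Pure-Python, CPU-bound loop: it needs the GIL the whole time it runs.
    while n > 0:
        n -= 1


def time_sequential(n, repeats):
    start = time.time()
    for _ in range(repeats):
        count_down(n)
    return time.time() - start


def time_threaded(n, repeats):
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    N = 5 * 10 ** 6
    print("sequential: %.2fs" % time_sequential(N, 2))
    print("2 threads:  %.2fs" % time_threaded(N, 2))
```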

Python’s standard library has two modules you might be interested in:

  • threading - Higher-level threading interface
  • multiprocessing - Process-based “threading” interface

When to use Python’s threads / processes

Below are my suggestions on when to use threads or processes in Python.

Use threads from threading when:

  • Your code needs to do something concurrently, e.g. non-blocking user interfaces.
  • You don’t want one task to block the main thread, e.g. downloading a large file.
  • You don’t need multiple CPU cores to speed up a computation.
  • You would rather not write code to share data between processes manually.
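For the I/O-bound cases in this list, here is a minimal sketch (Python 3 syntax; fake_download and download_all are hypothetical names, and time.sleep stands in for waiting on a real network download). A thread that is sleeping or blocked on I/O releases the GIL, so the three waits overlap and the whole batch takes about as long as the slowest one.

```python
import threading
import time


def fake_download(name, seconds, results):
    # time.sleep stands in for waiting on the network. While a thread
    # sleeps (or blocks on real I/O) it releases the GIL, so the other
    # threads can run in the meantime.
    time.sleep(seconds)
    results[name] = "done"


def download_all(names, seconds):
    results = {}
    threads = [threading.Thread(target=fake_download, args=(name, seconds, results))
               for name in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every "download" has finished
    return results


if __name__ == "__main__":
    start = time.time()
    print(download_all(["a.zip", "b.zip", "c.zip"], 1.0))
    # Roughly 1s rather than 3s, since the three sleeps overlap.
    print("took %.1fs" % (time.time() - start))
```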

Use processes from multiprocessing when:

  • You want to speed up a computationally intensive program by using more CPU cores.
  • You are ready to handle sharing data between processes.
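For the CPU-bound case, here is a minimal sketch with multiprocessing.Pool (Python 3 syntax; count_primes is a hypothetical, deliberately naive workload). Each task runs in a separate worker process with its own interpreter and its own GIL, so the tasks can occupy several cores at the same time.

```python
import multiprocessing


def count_primes(limit):
    # Deliberately naive, CPU-bound work: count the primes below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    # The guard matters: child processes may re-import this module.
    limits = [20000, 30000, 40000, 50000]
    pool = multiprocessing.Pool()  # one worker per CPU core by default
    try:
        # Each call runs in a worker process; results come back in input order.
        print(pool.map(count_primes, limits))
    finally:
        pool.close()
        pool.join()
```

Arguments and return values are pickled on their way to and from the workers, which is part of the data-sharing cost of using processes.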

Using threading.Thread

Create a thread

Creating a thread is very simple: subclass threading.Thread and override its run method. For example, the code below declares a class named “MyStupidThread”.

import threading

class MyStupidThread(threading.Thread):

    def run(self):
        for i in range(10):
            print "hi i am running..."

Then you can create an instance of this thread with dummyThread = MyStupidThread(). Creating the instance does not start the thread yet.

Then we can use the following methods to work with the thread instance.

dummyThread.start()  # Start running the thread.
dummyThread.join()   # Block the execution of current code until the dummy thread finishes.
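threading.Thread can also run a plain function passed as target, with its arguments in args, which saves writing a subclass when the thread only needs to call one function. A minimal sketch in Python 3 syntax; greet is a hypothetical function.

```python
import threading


def greet(name, times):
    for _ in range(times):
        print("hi %s, i am running..." % name)


# Instead of subclassing threading.Thread, pass the function to run as
# `target` and its arguments as `args`; Thread.run() will call it.
worker = threading.Thread(target=greet, args=("reader", 3))
worker.start()            # run greet("reader", 3) in a new thread
worker.join()             # block until greet() returns
print(worker.is_alive())  # prints False: the thread has finished
```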

Stop an executing thread

If you just want to use a separate thread to do some task, the thread finishes when the run method returns. If you want a thread that does some background task until you ask it to stop, you will need to know how to stop a thread.

To implement such a thread, we usually put an infinite loop in the run method, so it never returns on its own.

def run(self):
    while True:
        print "hi i am running..."

There are hacky ways to force a thread to terminate (the threading module itself offers no public API for it). But I assume you don’t want this, because it will probably break your task in the middle of its execution.

To stop a thread gracefully, we need to figure out a way to tell the run method to stop. To do this, we can create a “stop” flag: when we set it, the thread should gracefully stop.

A flag can be a threading.Event object, but we can also simply use an instance attribute. We modify the thread class above as an example.

import threading
import time

class MyStupidThread(threading.Thread):

    def __init__(self):
        super(MyStupidThread, self).__init__()
        self._stopped = False

    def run(self):
        while not self._stopped:
            print "hi i am running..."
            time.sleep(1)
        print "i am stopped"

    def stop(self):
        self._stopped = True

I put a time.sleep(1) in the loop so the thread doesn’t flood the console with output.

Testing this code manually is very simple: create an instance, start it, and stop it with a keyboard input.

dummy = MyStupidThread()
dummy.start()
raw_input()   # Block the main thread until the user presses Enter.
dummy.stop()
dummy.join()  # Wait until the loop exits and "i am stopped" is printed.

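By introducing a flag, we make sure the thread only stops after the loop finishes its current iteration, never in the middle of one, and you can also write clean-up code after the loop. Note that the flag is only checked once per iteration, so here stopping may take up to about one second because of the time.sleep(1).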
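Since a threading.Event can also serve as the stop flag, here is a sketch of that variant (Python 3 syntax; the class name StoppableThread and its interval and iterations attributes are hypothetical). Event.wait(timeout) doubles as the sleep, so stop() wakes the thread immediately instead of after the rest of a one-second time.sleep(), and the clean-up step lives in a finally block that runs once the loop exits.

```python
import threading


class StoppableThread(threading.Thread):
    """Background worker that stops gracefully when stop() is called."""

    def __init__(self, interval=1.0):
        super(StoppableThread, self).__init__()
        self._stop_event = threading.Event()
        self.interval = interval
        self.iterations = 0

    def run(self):
        try:
            while not self._stop_event.is_set():
                self.iterations += 1  # one unit of "work" per iteration
                # Sleeps for up to `interval` seconds, but returns as soon
                # as stop() sets the event.
                self._stop_event.wait(self.interval)
        finally:
            # Clean-up code goes here; it runs once the loop has exited.
            print("i am stopped")

    def stop(self):
        self._stop_event.set()


if __name__ == "__main__":
    worker = StoppableThread(interval=1.0)
    worker.start()
    worker.stop()
    worker.join()
```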

Using multiprocessing.Process

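Be aware that when a new process of the current program is started by launching a fresh Python interpreter (the “spawn” start method, which is the only option on Windows), the child re-imports your main module, so all module-level code runs again. With the “fork” start method, available on Unix, the child instead starts from a copy of the parent’s memory. Please be careful not to accidentally re-create objects.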
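Here is a minimal sketch of how to cope with that (Python 3 syntax; worker and the printed message are hypothetical). The print at module level shows when the module is re-imported in a child. Everything that creates processes sits under the if __name__ == "__main__": guard, as the multiprocessing documentation recommends, so a child that re-imports the module does not start processes of its own. Results travel back through a multiprocessing.Queue because the children don’t share the parent’s memory.

```python
import multiprocessing
import os

# Module-level code like this line runs again in every child process
# that re-imports the main module (e.g. under the "spawn" start method).
print("module loaded in process %d" % os.getpid())


def worker(queue, number):
    # Runs in the child process; send the result back through the queue,
    # because the child does not share memory with the parent.
    queue.put((number, number * number, os.getpid()))


if __name__ == "__main__":
    # Only the original (parent) process executes this block.
    queue = multiprocessing.Queue()
    processes = [multiprocessing.Process(target=worker, args=(queue, n))
                 for n in range(3)]
    for p in processes:
        p.start()
    results = [queue.get() for _ in processes]  # read before join()
    for p in processes:
        p.join()
    print(sorted(results))
```

The results are read from the queue before join() is called: a child that has put data on a queue may not exit until that data has been consumed, so joining first can deadlock.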

This section is to be written.