Modern computers are good at multitasking. Quad-core devices like the Pi 2 and 3 can share work between their cores, and your programs can run up to four times faster if you know how to code them in parallel. However, this is not always easy.
The full article can be found in The MagPi 57 and was written by James Hobro.
Programming languages like Python are sequential, executing instructions one at a time. What if you want to use all four cores? Luckily, there is help from the multiprocessing module, which allows parts of your program to run in parallel. To demonstrate how it works, we will adapt a program so that its central part runs in parallel, creating some beautiful fractal images in the process. First, you may need to install the matplotlib module, using:
sudo apt-get update
sudo apt-get install python-matplotlib
Embarrassingly parallel fractals
The program mandelbrot.py plots an image of the Mandelbrot set, which has an infinitely complex shape based on a surprisingly simple mathematical rule. At its core, it performs a simple but potentially expensive calculation for every pixel. These calculations are all independent, so they can be run in any order or in parallel (this is called an ‘embarrassingly parallel’ problem). Take a look at the program. The work is done by a pair of nested for loops (labelled ‘main loops’) which call the function mandelbrot(), defined a few lines earlier, for each pixel in the image. This is the time-consuming part. As with all basic loops in Python, the calculations are performed sequentially, or one at a time. Try running the program from the command line (unfortunately, multi-process programs cannot be launched from IDLE):
python mandelbrot.py
It should generate a Mandelbrot set image like the one on the left. You can resize it and use the controls on the bottom left of the window to zoom and examine it in detail. This program uses only one processing core and runs in about 40 seconds on a Pi 3. We would like to run the main loops in parallel. This is relatively simple, thanks to multiprocessing.
Before we can parallelise the loops, we need to make a small change. Replace the main loops with:
Z = [complex(x,y) for y in Y for x in X]
N = map(mandelbrot,Z)
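If you would like to convince yourself that this rewrite gives the same answers as the nested loops, the short sketch below compares the two forms on a small grid. The 11 x 11 grid and the names N_loops and N_map are our own choices for illustration and are not part of mandelbrot.py. The list() call is an addition: on Python 3, map() returns an iterator rather than a list, so it has to be converted before it can be compared or reshaped.

```python
# Small grid so the check runs quickly; the article's program uses 1000 x 1000.
maxiter = 50
X = [-2.0 + 0.25 * i for i in range(11)]    # x from -2.0 to 0.5
Y = [-1.25 + 0.25 * j for j in range(11)]   # y from -1.25 to 1.25

def mandelbrot(z):
    """Count iterations until |z| exceeds 2 (same rule as mandelbrot.py)."""
    c = z
    for n in range(maxiter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return maxiter

# Original form: nested loops appending one result per pixel.
N_loops = []
for y in Y:
    for x in X:
        N_loops.append(mandelbrot(complex(x, y)))

# Rewritten form: build the argument list first, then map over it.
Z = [complex(x, y) for y in Y for x in X]
N_map = list(map(mandelbrot, Z))   # list() is needed on Python 3

print(N_loops == N_map)   # prints True
```

Both versions visit the pixels in the same order (y in the outer position, x in the inner one), which is why the two lists match element for element.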
Now run the program again. It should take a similar amount of time to run. What has changed? Instead of calling the mandelbrot() function many times within the nested loops, we now build up a list of arguments for these calls in advance (Z) and use the built-in map() command to call the function on each entry in the list, placing the results in a new list (N). This does exactly the same thing as before, but our code is now in a form that we can parallelise more easily. Add the following line at the top of the program:
from multiprocessing import Pool
and change the main loop section to:
p = Pool()
Z = [complex(x,y) for y in Y for x in X]
N = p.map(mandelbrot,Z)
This is where multiprocessing works its magic. It creates a multi-process pool (p) and uses it to call a special version of the map() command. It makes all the same calls to mandelbrot() as before, but this time the work is split up and distributed in parallel using the pool. The number of processes in the pool matches the number of cores you have available (four in the Pi 2 and 3) by default. The results are collected back together into a single list (N) at the end. This program should run about twice as fast as before. If you run the top command in another Terminal window while the program is running, you will see the four extra Python processes that make up the pool. We have doubled the speed with a very simple change to our code.
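To see the pool in isolation, here is a minimal, self-contained sketch of the same idea on a coarse grid. The make_points() helper and the 60 x 60 grid size are assumptions made for this example only. The if __name__ == "__main__": guard is not in the original listing. We include it because multiprocessing needs it on platforms that start worker processes by re-importing the script, and it does no harm on the Pi.

```python
from multiprocessing import Pool, cpu_count

maxiter = 50

def mandelbrot(z):
    """Count iterations until |z| exceeds 2 (same rule as mandelbrot.py)."""
    c = z
    for n in range(maxiter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return maxiter

def make_points(nx, ny):
    """Hypothetical helper: a coarse grid over the article's x and y ranges."""
    xs = [-2.0 + 2.5 * i / (nx - 1) for i in range(nx)]
    ys = [-1.25 + 2.5 * j / (ny - 1) for j in range(ny)]
    return [complex(x, y) for y in ys for x in xs]

if __name__ == "__main__":
    Z = make_points(60, 60)
    serial = list(map(mandelbrot, Z))   # one process, one call at a time

    p = Pool()                          # one worker per core by default
    parallel = p.map(mandelbrot, Z)     # same calls, shared across the pool
    p.close()                           # no more work will be submitted
    p.join()                            # wait for the workers to exit

    print("cores reported: %d" % cpu_count())
    print("same results: %s" % (serial == parallel))
```

Pool.map() returns its results in the same order as the input list, so the parallel list can be compared with the sequential one directly.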
Bunch them up
Can we do better? It turns out that the calls to mandelbrot() are too brief to run efficiently in parallel across the pool. For better performance we must bunch them together in groups. We can do this easily by returning to our two nested loops. For the inner loop, which produces the image along a single line in the x direction, define this function just below the definition of mandelbrot():
def compute_all_x(y):
    Z = [complex(x,y) for x in X]
    return map(mandelbrot,Z)
This runs the inner loop sequentially, not in parallel. Now, for our main loop, we call this function in parallel, spreading the calls across the pool:
p = Pool()
N = p.map(compute_all_x,Y)
This maps the work in larger chunks (whole inner loops rather than individual pixels). This program should now run roughly four times as fast as the original.
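To measure the speed-up on your own Pi rather than take our word for it, you could time the row-by-row version with and without the pool. The sketch below is one way to do that, under some assumptions: the timed() helper and the reduced 200 x 200 resolution are ours, and compute_all_x() returns a list built with a comprehension (instead of calling map()) so that the result can also be sent back from the workers on Python 3. The timings will vary from machine to machine.

```python
import time
from multiprocessing import Pool

maxiter = 50
nx, ny = 200, 200   # smaller than the article's 1000 x 1000, to finish quickly
X = [-2.0 + 2.5 * i / (nx - 1) for i in range(nx)]
Y = [-1.25 + 2.5 * j / (ny - 1) for j in range(ny)]

def mandelbrot(z):
    """Count iterations until |z| exceeds 2 (same rule as mandelbrot.py)."""
    c = z
    for n in range(maxiter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return maxiter

def compute_all_x(y):
    """One whole image row, computed sequentially inside a single worker."""
    return [mandelbrot(complex(x, y)) for x in X]

def timed(label, func):
    """Hypothetical helper: run func() and print the elapsed wall-clock time."""
    start = time.time()
    result = func()
    print("%s: %.2f s" % (label, time.time() - start))
    return result

if __name__ == "__main__":
    serial = timed("serial rows", lambda: [compute_all_x(y) for y in Y])

    p = Pool()
    parallel = timed("parallel rows", lambda: p.map(compute_all_x, Y))
    p.close()
    p.join()

    print("same results: %s" % (serial == parallel))
```

Each task sent to the pool is now a whole row of nx pixels, so the cost of passing work to and from the workers is paid once per row instead of once per pixel.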
With some simple changes and the use of multiprocessing, we have distributed our time-consuming loop over four cores instead of one. Our code is still simple and readable, although slightly more complex than before. For advanced programmers, there are much faster methods of plotting fractal images. The graphics processor on the Pi can outperform its CPU, as demonstrated by the Mandelbrot set program in the Raspberry Pi documentation. This program, written in C and OpenGL, is much faster than ours, but it is also far more complicated and nearly 20 times as long.
from numpy import linspace, reshape
from matplotlib import pyplot

xmin, xmax = -2.0 ,0.5    # x range
ymin, ymax = -1.25,1.25   # y range
nx , ny = 1000,1000       # resolution
maxiter = 50              # max iterations

def mandelbrot(z):        # computation for one pixel
    c = z
    for n in range(maxiter):
        if abs(z)>2: return n   # divergence test
        z = z*z + c
    return maxiter

X = linspace(xmin,xmax,nx)   # lists of x and y
Y = linspace(ymin,ymax,ny)   # pixel co-ordinates

# main loops
N = []
for y in Y:
    for x in X:
        z = complex(x,y)
        N += [mandelbrot(z)]

N = reshape(N, (ny,nx))   # change to rectangular array (ny rows of nx pixels)

pyplot.imshow(N)   # plot the image
pyplot.show()