A Concise Guide to Concurrent Programming
Summer II 2010, David Matuszek

Fundamental Concepts

The purpose of parallel/concurrent programming is to take advantage of multiple cores (processors) in order to make programs run faster. That said, however, parallelism is not a guarantee of speed: as noted under Analysis below, a parallel program may even run more slowly than its sequential equivalent.

The terms thread and process each refer to a single sequential flow of execution. Separate threads run in the same memory space, and interact by reading from and writing to shared memory locations. Separate processes each have their own memory, and interact by sending messages.
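
As a minimal sketch in Java (the class name and loop count are just for illustration), here are two threads that interact by updating the same shared memory location; separate processes would instead have to exchange this information by sending messages:

    import java.util.concurrent.atomic.AtomicInteger;

    public class SharedCounter {
        // Shared memory location, visible to both threads.
        static final AtomicInteger counter = new AtomicInteger(0);

        public static void main(String[] args) throws InterruptedException {
            Runnable task = new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        counter.incrementAndGet();   // atomic update of shared state
                    }
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start();
            t2.start();
            t1.join();   // wait for both threads to finish
            t2.join();
            System.out.println("Final count: " + counter.get());   // prints 2000
        }
    }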

Parallel processing refers to two or more threads running simultaneously, on different cores (processors), in the same computer. It is essentially a hardware term.

Concurrent processing refers to two or more threads running asynchronously, on one or more cores, usually in the same computer. It is essentially a software term.

Asynchronous means that you cannot tell whether operation A in thread #1 happens before, during, or after operation B in thread #2. Asynchronous processes may be running simultaneously, on different cores, or they may be sharing time on the same core.
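
A tiny Java demonstration of asynchrony (the class name and letters are arbitrary): each thread prints its own letter, and the interleaving of A's and B's can differ from run to run, because neither thread's operations are ordered with respect to the other's.

    public class Interleaving {
        public static void main(String[] args) {
            Thread a = new Thread(new Runnable() {
                public void run() { for (int i = 0; i < 5; i++) System.out.print("A"); }
            });
            Thread b = new Thread(new Runnable() {
                public void run() { for (int i = 0; i < 5; i++) System.out.print("B"); }
            });
            a.start();   // the two threads now run asynchronously
            b.start();
        }
    }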

An asynchronous function call is a call that starts another function executing but does not wait for it to return. Instead, the calling thread continues processing, and some other mechanism is used for retrieving the function result later.
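
One common mechanism in Java is the Future returned by an ExecutorService; in the sketch below, expensiveComputation and doOtherWork are hypothetical placeholders. submit() starts the computation and returns immediately, and get() retrieves the result later, blocking only if it is not yet ready.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class AsyncCall {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();

            // Start the computation; submit() does not wait for it to finish.
            Future<Integer> answer = pool.submit(new Callable<Integer>() {
                public Integer call() {
                    return expensiveComputation();   // hypothetical helper, defined below
                }
            });

            doOtherWork();   // the calling thread keeps going

            // The Future is the "other mechanism" for retrieving the result later.
            System.out.println("Result: " + answer.get());
            pool.shutdown();
        }

        static int expensiveComputation() { return 42; }
        static void doOtherWork() { /* ... */ }
    }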

There are two primary approaches to concurrent programming: shared state, in which threads communicate by reading and writing shared memory, and message passing, in which processes (or threads) communicate by sending messages to one another.

Shared state is appropriate when the problem has fine granularity--that is, the threads must work closely together and communicate often. Message passing is appropriate for problems with coarse granularity--that is, the processes can work independently for long periods of time.
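
As a sketch of message passing between two threads in Java (the BlockingQueue stands in for a message channel; the class name and message strings are just for illustration), a producer sends work items and a consumer receives them:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class MessagePassing {
        public static void main(String[] args) {
            // The queue plays the role of a message channel between the threads.
            final BlockingQueue<String> channel = new ArrayBlockingQueue<String>(10);

            Thread producer = new Thread(new Runnable() {
                public void run() {
                    try {
                        channel.put("work item 1");   // send a message
                        channel.put("work item 2");
                        channel.put("DONE");          // sentinel: no more work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });

            Thread consumer = new Thread(new Runnable() {
                public void run() {
                    try {
                        String message;
                        while (!(message = channel.take()).equals("DONE")) {   // receive
                            System.out.println("Received: " + message);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });

            producer.start();
            consumer.start();
        }
    }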

Problems

Concurrency introduces a new set of possible errors, such as race conditions (the result depends on the unpredictable order in which threads access shared data), deadlock (two or more threads each wait forever for a resource that another holds), livelock, and starvation.
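
As a minimal sketch of the most common of these, a race condition, consider the following Java program (the class name and loop counts are just for illustration). Two threads increment a shared counter without synchronization, so some updates are lost:

    public class RaceCondition {
        static int count = 0;   // shared, unsynchronized

        public static void main(String[] args) throws InterruptedException {
            Runnable task = new Runnable() {
                public void run() {
                    for (int i = 0; i < 100000; i++) {
                        count++;   // read-modify-write: not atomic
                    }
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start(); t2.start();
            t1.join();  t2.join();
            // Expected 200000, but updates can be lost when the threads interleave,
            // so the printed value is often smaller -- and differs from run to run.
            System.out.println("count = " + count);
        }
    }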

Analysis

When analyzing the running time of a non-parallel, sequential process, we use the RAM (Random Access Machine) model, which assumes that access to any location in memory takes the same amount of time as access to any other location. This assumption is typically false, but is a very useful simplification--for sequential programs.

The PRAM (Parallel Random Access Machine) model is the obvious extension of the RAM model: it assumes that any number of processors can each access any shared memory location at the same unit cost. It is, however, horribly inappropriate on many architectures, because no real machine can give every processor equally fast access to all of memory.

More suitable for concurrent programs is the CTA (Candidate Type Architecture) model, which assumes a collection of standard sequential computers (each processor with its own local memory) connected by a communication network. A local memory reference costs one unit of time; a non-local reference costs lambda units, where lambda is much larger than one. This leads to the following rule.

Locality rule: Minimize the number of non-local references.
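
As a rough, purely hypothetical illustration of the rule: suppose a local reference costs 1 time unit and lambda = 100. An algorithm that makes 10,000 local references and 500 non-local references then spends about 10,000 + 500 x 100 = 60,000 units, more than 80% of it on communication; halving the non-local references saves 25,000 units, while halving the local ones saves only 5,000.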

Multicore architectures vary greatly. There is not really any substitute for knowing the specific architecture and designing and tweaking programs accordingly. Nevertheless, the CTA model can provide a rough guide to the general quality of an algorithm.

Parallelism does not necessarily make a program run faster; it may even make it slower. Here's why:

  1. Creating, scheduling, and coordinating threads takes time.
  2. Threads must spend time communicating and synchronizing with one another.
  3. The parts of the program that cannot be parallelized limit the overall speedup, no matter how many cores are available.

Writing concurrent programs

One approach is to write and debug the program as a sequential program, then look for potential concurrency; this avoids having to debug sequential errors and concurrency errors at the same time. Alternatively, you can design for concurrency from the start: there are some known good concurrent algorithms, and there are operations (map, scan, reduce) that are especially useful in creating opportunities for concurrency, as in the sketch below.
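
Here is a minimal reduce sketch in Java (the class name, array size, and two-way split are just for illustration): each thread sums half of an array, and the partial results are combined sequentially at the end.

    public class ParallelSum {
        public static void main(String[] args) throws InterruptedException {
            final int[] data = new int[1000000];
            for (int i = 0; i < data.length; i++) data[i] = 1;

            final long[] partial = new long[2];   // one slot per thread

            // Each thread reduces (sums) its own half of the array.
            Thread low = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < data.length / 2; i++) partial[0] += data[i];
                }
            });
            Thread high = new Thread(new Runnable() {
                public void run() {
                    for (int i = data.length / 2; i < data.length; i++) partial[1] += data[i];
                }
            });

            low.start(); high.start();
            low.join();  high.join();

            // Combine the partial results sequentially.
            System.out.println("Sum = " + (partial[0] + partial[1]));
        }
    }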

The "90/10 rule" states that a program spends 90% of its time in 10% of its code. Humans are very bad at guessing where that 10% lies. Use a profiler (a program that samples your program at frequent intervals to determine where it is spending most of its time) so that you don't waste your efforts where they won't make any difference.

Intel's "Threading Methodology" is:

  1. Analysis. Determine where concurrency might be used and would be helpful.
  2. Design and implementation.
  3. Test for correctness. Be aware that "Testing can show the presence of errors, but never their absence." (Dijkstra)
  4. Tune for performance.

References

Principles of Parallel Programming, by Calvin Lin and Lawrence Snyder. Addison-Wesley, 2008.

The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, by Clay Breshears. O'Reilly, 2009.

Java Concurrency in Practice, by Brian Goetz. Addison-Wesley, 2006.