All About Recursion (Python version)

Copyright (c) 1980, 1999, 2012 by David Matuszek

I. Recursive definitions
II. A Simple Recursive Procedure
III. The Principle of Information Hiding
IV. When Do You Use Recursion?
V. The Four Rules
VI. Another Simple Example

I. Recursive definitions

A recursive definition is a definition in which the thing being defined occurs as part of its own definition.

Sounds circular, doesn't it? Circular definitions, in computer science as elsewhere, are valueless and should be avoided. The distinction between a recursive and a circular definition lies in the use of the work "part": any recursive definition has a noncircular part, or basis, which is like an ordinary definition, and directly specifies some objects that fit the definition; and also a recursive (circular) part, which allows you to use the objects that fit the definition in determining whether new objects also fit the definition.

Some examples should clarify these ideas.

Here is an ordinary definition:

Vowel: one of the letters "a," "e," "i," "o," "u," or "y".

Clearly, there are exactly six things that fit the above definition of a vowel. Now consider the following circular definition:

Yowl: any yowl followed by a vowel.

By this definition, any yowl followed by a vowel is also a yowl; thus, if we could find one thing which is a yowl, then any things formed by adding vowels to that yowl would themselves be yowls. The problem is in finding that first yowl. Since we have no place to start, the definition is circular, and valueless.

Now consider the following definition:


  1. the letter "h," or
  2. any howl followed by a vowel.

This definition is recursive. The first part is an ordinary definition, and tells us that the letter "h" is a howl; this gives up a basis, or place to start. The second part is the recursive part; it tells us that, since "h" is a howl, so are "ha," "he," "hi," "ho," "hu," and "hy". But since these are howls too, so also must be "haa," "hae,"..., "hyy,".... "haaeeuooeea," etc.

Note that this is a "good" definition in the sense that some things are howls, some things are not howls, and it is easy to tell from the definition which are which. The "circularity" of the second part of the definition causes no harm, because the first part provides a basis.

Recursion abounds in computer science. A typical definition of "identifier" (or "variable name"), for example, is as follows.


  1. a letter, or
  2. an identifier followed by a letter, digit, or underscore.

Notice that the definitions of both "howl" and "identifier" allow arbitrarily long strings. It is possible to write recursive definitions which limit the size of the things being defined, but in general this is neither easy nor desirable. Instead, if there must be limits, they are set by some external agency. Some programming languages, for example, allow arbitrarily long identifiers, but require that two different identifiers must differ in the first k characters, where k is set by the implementation. In the same way, the maximum length of a howl might be determined by your lung capacity.

You have probably noticed that the above definitions don't have to be recursive; they could be made into ordinary definitions by using the phrase "any number of...". This is true--we don't need recursion. In fact, recursion is never absolutely necessary, merely useful.

The definitions we have considered so far have all been examples of direct recursion. For an example of indirect recursion, consider the following pair of definitions (adapted from Lisp):

S-expression: an identifier or a list.

List: a sequence consisting of

  1. a left parenthesis,
  2. zero or more S- expressions separated by blanks, and
  3. a right parenthesis.

Thus, the following things are lists (and also S-expressions):




(() IXOHOXI ())

These definitions are mutually recursive: each is defined in terms of the other. The basis for the S-expression is an identifier, while the basis for the list is the sequence ().

To show that these definitions "work," consider the S-expression (NOT (HARD)).

  1. (NOT (HARD)) is an S-expression because it is a list.
  2. (NOT (HARD)) is a list because it consists of a left parenthesis, the two S-expressions NOT and (HARD), and a right parenthesis.
  3. NOT is an S-expression because it is an identifier.
  4. (HARD) is an S-expression because it is a list.
  5. (HARD) is a list because it consists of a left parenthesis, the S- expression HARD, and a right parenthesis.
  6. HARD is an S-expression because it is an identifier.


Of course you don't have to go through this complete process every time you see an S-expression, but we were being very formal.

Now is a good time to put in a plug for the value of recursion. The definition of "list" given above may seem confusing at first (if you're not used to recursive definitions), but I challenge anyone to write a reasonable definition of "list" which is equivalent to the one given above, yet does not use any form of recursion.

II. A Simple Recursive Procedure

A recursive procedure (or function, or method, or subroutine) is a procedure which calls itself to to part of the work.

Again, the trick is in the use of the word "part". If it called itself to do all of the work, the result would be an infinite recursion (the recursive equivalent of an infinite loop). Just as a loop does some part of the work during each iteration of the loop, so must a recursive procedure do some part of the work at each level of recursion, until eventually all the work is done.

The usual first example of recursion is the factorial function. This is in many respects a poor example, but we will use it for the same reason everyone else does: it is simple.

The factorial of a positive integer n, written n!, is the product of all the positive integers from 1 up to and including n. Thus, we have

1! = 1

2! = 1 * 2 * 2

3! = 1 * 2 * 3 = 6

4! = 1 * 2 * 3 * 4 = 24


Note that, for example, 4! = 1 * 2 * 3 * 4 = (1 * 2 * 3) * 4 = 3! * 4, and in general n! = (n - 1)! * n. This leads to the following recursive definition:

The factorial of a positive integer n, written n!, is

(1) one, if n = 0 (basis), or

(2) (n - 1)! * n, if n > 0 (recursion).

This definition leads immediately to the following computer program:

def factorial(n):
    "Computes the factorial of its argument."
    if n == 0:
        return 1
        return factorial(n - 1) * n

This function has a basis (when n == 0) and a recursive part (when n is positive), and it does part of the work (multiplying by n) at each level. Thus it seems as though the function might possibly work. But if this is the first time you have seen recursion, you probably are not at all comfortable with it.

In fact, the program does work. The main thing wrong with it is that it's a stupid way to compute a factorial--a loop would be simpler and better. Have patience.

Many authors suggest that the best way to understand a recursive routine is to trace through it, keeping track of what happens at every level of the recursion. By all means work through such a trace, if doing so helps you believe recursion can actually work. But you should never think of tracing through a recursive procedure as a means of understanding, or debugging, such a procedure. "Tracing through" has probably kept thousands of people from ever really understanding recursion. The purpose of this paper is to describe a better technique.

III. The Principle of Information Hiding

Perhaps the single most important tool we have for controlling the complexity of programs is modularization: breaking a single complex program up into several simpler, logically independent computations. Depending on the programming language, these are called by various names: functions, methods, procedures, etc.

Whenever a program is broken into two parts, there comes into being an interface between them. This interface consists of the parameter list, any global or common variables they may share between, them, and often more subtle ways of information transmission. If modularization is to succeed in reducing complexity, this interface must be kept as narrow and as explicit as possible.

The best way to keep the interface narrow is to ensure that each routine does just one thing, and that that thing can be easily described in an English sentence with few or no "buts" and "excepts" in it. Keeping parameter lists short and avoiding global variables also helps.

The best way to make the interface explicit is to ensure, whenever possible, that all information transmission occurs via the parameter list; shun global variables. There are those who feel global variables should be avoided entirely, but I don't take such an extreme position. It is often more convenient to have global access to arrays and other large data structures; just take special care not to mess them up.

Side effects should also be avoided. The main effect of a routine is the thing it's supposed to do; side effects are the little extra things it does. Side effects are unhealthful even when they're intentional. For example, if a function factorial(n) computes n!, that's its main effect; but if it also stores the result in a global variable, that's a side effect. The bad thing about it is that it happens in the factorial function, but errors which result from this side effect will show up in any code which calls factorial. Since these routines may themselves be correct, the likely result is that the programmer spends hours trying to debug the wrong routines.

We are now in a position to state the Principle of Information Hiding: every routine should mind its own business. No routine should meddle in the affairs of another. Or, somewhat more formally,

Similar rules hold for classes and objects. An object holds various kinds of data; the class of the object holds methods to manipulate that data. An object's data should be accessed and modified only by the methods given in its class, never from outside the class.

Here are three simple examples of violations of the Principle of Information Hiding. In each case the function is supposed to return the minimum or maximum value in a list of integers. Can you spot the violations?

  1. def minimum(numbers):
        "Returns the smallest number in a list."
        return numbers[0]
  2. def minimum(numbers):
        "Returns the smallest number in a list."
        global i
        min = numbers[0]
        for i in range(1, len(numbers)):
            if numbers[i] < min:
                min = numbers[i]
        return min
  3. def maximum(numbers):
        "Returns the largest number in a list."
        max = 0
        for i in range(0, len(numbers)):
            if numbers[i] > max:
                max = numbers[i]
        return max


  1. Sorting the list is a rather drastic side effect.
  2. The function changes the global variable "i".
  3. This function should not know that the list contains any non-negative numbers.

The last of these (maximum) deserves a little more explanation. As a general-purpose function, it is an error to assume that the input is not a list of negative numbers. As part of a program where this never occurs, the function will work, but it knows (or assumes) too much about its context. At some future date, code may be added to the program which violates this assumption. The best solution is to fix the code; next best is to modify the comment to specify " a list of numbers, not all of which are negative."

These examples are trivial, and in real life are not likely to cause major problems. Things get worse when the rest of the program gets more interesting.

The following skeleton program controls a two-player game between the human and the computer, in which the players alternate moves. The Principle of Information Hiding has been largely respected in this example; indeed, it is impossible to deduce from the program just which game is being played.

def main():
    player = choose_who_goes_first()
    game_over = False
    while not game_over:
        if player == 'human':
            move = ask_human_for_move()
            (game_over, result) = make_move(player, move)
            player = 'computer'
        else: # player == 'computer'
            move = decide_move()
            (game_over, result) = make_move(player, move)
            player = 'human'
    print "Game over!"
    print result

def ask_human_for_move():
    move = raw_input('Enter your move: ')
    if legal_move(move):
        return move
        print "That's not legal. Try again."            
        return ask_human_for_move()

It is possible to understand this program and verify its correctness, given very simple assumptions about what the called routines do. For example, we would expect choose_who_goes_first() to somehow decide who plays first, and to set its parameter to "human" or "computer" accordingly. All information relevant to the operation of the main program is fully specified in the parameter lists.

I have not shown any of the declarations. There is, presumably, a structured variable "board" which represents the playing board; since there is only one of it, and since it is used by all routines, it seems to me to make more sense to make "board" a global variable rather than to put it in the parameter list of every routine.

Now consider an alternate version of the same program.

def main():
    global k
    while not game_over:
        if player == 'human':
            k = k - 1
        else: # player == 'computer'
            k = k + 1
    print "Game over!"
    print result

Due to the many violations of the Principle of Information Hiding, it is impossible to tell whether the program is correct or not, without extensive reference to the called routines.

It may seem at this point that we have drifted away from the topic of recursion, and perhaps we have, since the Principle of Information Hiding applies to all programs. As we shall see in section V, however, it has a special importance for recursive programs.

IV. When Do You Use Recursion?

It is an easily proved fact that you never need recursion; any recursive program can be changed into a nonrecursive one, for example by the use of stacks. (In the same way you can prove than any program could be written in absolute binary.)

The best answer to the question of when to use recursion is simply, when you happen to find it useful. You never just set out to "use recursion," any more than you just set out to "use a loop". You program; sometimes you loop, sometimes you use an array, sometimes you do input/output, and sometimes you recur.

But that's not a very useful answer when you're first getting started. We'll try to be more specific.

One fairly good rule of thumb is to use recursion when you're processing recursively defined data structures. If you try to evaluate an arithmetic expression, for example, parentheses may be used to enclose a "subexpression" which must be evaluated first, and is an expression in its own right. The only reasonable way to to this is to write a recursive expression evaluation routine; loops alone don't suffice.

Another (equivalent) rule is to handle nested structures with recursion. As an example, in many languages any statement may be nested in an IF statement, even another IF statement. There is a clear use for recursion in any processor (compiler, preprocessor, interpreter, debugger, pretty printer) for such a language.

Finally, programs which use a single stack can often be written more clearly as recursive programs which don't use a stack. (Programs which use two or more stacks, however, often cannot be rewritten as recursive programs without any stacks; but the proof of this fact is beyond the scope of the current paper.)

V. The Four Rules.

Here we propose four rules which, if understood and followed, will result in working recursive programs. Once you become comfortable with recursion you will realize that these rules are not absolute, and can be violated for good cause, if care is taken; but stick with them for now.

1. Handle the base cases first.

You already know that a recursive routine must have some base cases, cases that it can compute directly without need for recursion. For example, in the factorial program the only base case is n - 1, and the result returned is 1. The very first thing you should do upon entry to a recursive routine is to test whether you have a base case,

and if you do, to whatever needs to be done for that case and return. Don't mess around, and don't recur, just to what needs to be done and get out.

2. Recur only with a simpler case.

This rule is crucial for preventing circularity and infinite recursion. If you ever, even once, recur with the same (or harder) problem, your program immediately disappears off into Cloud- Cuckoo Land.

What, however, does it mean to be "simpler"? That depends on the problem. "Simpler" can mean: with a smaller number, with a smaller array, with a more nearly sorted array, with a shorter expression, with a larger number, a more nearly random array, or just about anything else.

Once you have established your base cases, those are the "simplest" cases, and "simpler" is anything that is in some clear way "closer" to one or more of those base cases. Sometimes it is obvious when we are getting closer, as in the factorial function: for n > 1, clearly n - 1 is closer to 1 than n is. Similarly when we evaluate expressions, it may be obvious that a shorter expression is a simpler one; or perhaps it is an expression with fewer parentheses, or fewer arithmetic operators. When you write a recursive function, you must be clear in your mind just when a problem is "closer" to the base case (hence "simpler"), and you must stick to it.

Suppose we try to speed up the factorial function by doing two multiplications at each level of recursion, rather than just one. We might get

def factorial(n):
    if n == 0:
        return 1
        return n * (n - 1) * factorial(n - 2)

This doesn't work. Before you read any further, try to figure our the error for yourself. Then figure out the simple fix.

The problem is that n - 2 isn't necessarily "closer" to 1 than n is. In particular, if n is even, so is n - 2, and soon we will overshoot 1 and try to compute the factorials of 0, -2, -4, -6, and so on. It is part of the notion of "closer" that (in a finite system), if we get closer enough times, sooner or later we will get there. (This can be corrected by changing the condition to if n == 0 or n == 1.)

So you to have to be careful that your notion of "closer" will sooner or later get you there.

If this all sounds vague and mysterious, don't panic. For any particular, concrete problem, it is 99% certain that it will be obvious what meaning to attach to the word "simpler". But do take twenty seconds, before you rush ahead with the program, to decide what the word means. Then each time you write a recursive call, make sure you recur with a simpler problem.

3. Don't interfere with the correct operation of the calling routine.

Consider the following sequence:

  1. Compute the value of x,
  2. call some routine,
  3. use x for something.

If the routine destroys the value of x, then clearly there is a problem. We return, finally, to the Principle of Information Hiding: unless it is the specific purpose of a routine to modify x for us, the routine has no business changing x, or even knowing that x exists. In short, the routine should not use x.

If the routine call is recursive--if the routine we are in calls itself--then the called routine must use the same variable names as the calling routine. After all, it's really the same routine both times. However, it doesn't necessarily use the same storage locations.

Local variables just don't cause any problems. However, if a recursive routine uses and modifies a global variable, then you can only understand and debug the routine by understanding how the calling routine manipulates the variable and also how the called routine manipulates the variable. This requires you to think about many levels of the recursive routine, all at the same time.

Briefly, you cannot avoid using the same names over again in a recursive routine, but if they are local, they are safe from whatever happens at other levels of the recursion.

There is one additional caution. If you use a data structure (such as a list, or set, or dictionary) as a parameter, you get a reference to the one actual data structure, and the values inside it can be manipulated in the usual way. Hence, the values internal to a data structure behave somewhat like global variables.

A recursive routine thus may access three types of variables: (1) local variables, which are declared inside the routine itself, and are totally safe; (2) parameters, which behave rather like local variables, and are safe so long as you are careful to avoid side effects; and (3) global variables, which are extremely unsafe--you may look at their current values, but if you change those values you are playing with fire.

Moral: Make sure all your variables are either local or are parameters. If you can't, make sure you never change the value of a global variable. And if you must alter global variables, think long and hard about what you're doing.

Fortunately, it is easy to get into the habit of avoiding globals. You don't have to think through the reasons each time. Just avoid them.

4. Don't look down.

That is, when you reach a recursive call, don't go back to the top of your routine and start working through it all over again. Don't try to "trace through" recursive calls. That way lies madness.

Consider this: if you are working your way through a complex but nonrecursive routine, and it calls, say a sorting routine, do you drop everything and rush off to see if the sorting routine works? Of course not. You simply assume that the sorting routine works (and if you have doubts you save them for later), and keep going in the routine you're checking. Rule 4 simply says to treat all calls this way, even recursive ones.

It may seem odd at first to assume the called routine works, when that called routine happens to be the very one you're checking. However, you must do so: the human mind does have limitations. Pretend, if it helps, that you are not actually recurring, but rather calling an entirely different routine (which happens to have the same name) written by someone you trust.

What is involved here is perhaps best described as an "act of faith." You have to learn to trust recursion; to believe that, if you can get the routine correct at this level, then you don't have to worry about any deeper levels. Until you can bring yourself to make this commitment, you will be compelled to look down into the recursion, and you will never untangle what you see there.

Think about rules one to three, and try to convince yourself logically that a program written according to these rules ought to work. Then, even if you can't yet manage rule four, pretend you do when you write your program.

Now run your program. It will have errors, of course--programs always do--but steadfastly refuse to look down. Go over rules one, two, and three, and make sure you didn't violate one of these. Next, go over the program the same way you always would, looking for errors that have nothing to do with the recursion itself (the error doesn't have to be in the recursion, remember). If this too fails, get help.

Don't even think about looking down, for this one simple reason: it doesn't help.

If the notion of "faith" bothers you, think about the novice programmer who complains that his program can't be wrong, so the computer must have made a mistake. You may have once done this yourself. In time, all programmers learn to trust the computer; now you must learn to trust recursion.

VI. Another Simple Example

This time we will devise a recursive Python program to find the maximum value in an array. Again, alas, it would be better to do this without recursion, but it is difficult to find simple examples that really need recursion, at least until we have learned about some recursively defined data structures.

We will use recursion to find the maximum value in an array. What is a simpler problem of the same sort? Obviously, finding the maximum of a smaller array. So we will plan to recur only with smaller arrays, and our base case will be the smallest array--say, an array of only one element.

Here is one approach: take a number from one end of the array; recursively find the maximum of the remaining array; compare the maximum to the one number taken, and the maximum is the larger of these. For a base case, the maximum of a single-element array is that element.

While this approach is workable, it isn't very interesting. Here's an approach I like better: divide the array (approximately) in half; recursively find the maximum of each half; return the larger of these two maxima as the grand maximum. Again, the base case is a one-element array, with that element as the maximum.

This logic translates easily into Python code.

def maximum(numbers):
    "Find the maximum value in the list 'numbers'."
    if len(numbers) == 1:
        return numbers[0]
    mid = len(numbers) / 2
    leftmax = maximum(numbers[0:mid])
    rightmax = maximum(numbers[mid:len(numbers)])
    if leftmax > rightmax:
        return leftmax
        return rightmax

Does this program satisfy the four rules?

  1. Does it handle the base case(s) first? Yes, if there is only one number in the list, that number is returned.
  2. Does it recur only with a simpler case? Yes, there are two recursions, each with a smaller array.
  3. Does it have side effects that interfere with the calling program? No, because every variable is global, and the numbers list is not changed.
  4. Do we need to "look down" into the recursion, or otherwise consider what is happening at some other level? No.

And we're done.