CMSC 14100 — Lecture 23

Accumulation and tail recursion

I made a claim earlier that for loops were really just a specific expression of recursion. Let's see this with something we've seen a bajillion times: summing a list of numbers.

First, the iterative version that we were introduced to so many weeks ago.


def sum_nums1(lst):
    sum = 0
    for i in lst:
        sum = sum + i
    return sum
    

Now, the recursive version we'd write based on our discussion a few classes ago.


def sum_nums2(lst):
    if lst == []:
        return 0
    else:
        first, *rest = lst
        return first + sum_nums2(rest)
    

Suppose we use a list [2,4,3,4]. If we expand the recursion carefully, we get something that looks like: \begin{align*} \mathtt{sum\_nums2([2,4,3,4])} =& 2 + \mathtt{sum\_nums2([4,3,4])} \\ =& 2 + (4 + \mathtt{sum\_nums2([3,4])}) \\ =& 2 + (4 + (3 + \mathtt{sum\_nums2([4])})) \\ =& 2 + (4 + (3 + (4 + \mathtt{sum\_nums2([])}))) \\ =& 2 + (4 + (3 + (4 + 0))) \end{align*}

This differs from how we'd think about computing the sum with our loop, which would look more like \[((((0 + 2) + 4) + 3) + 4).\]

Notice that the terms are collected in a different order. Mathematically, this doesn't make a difference—so much so that we often ignore the parentheses. But to a computer, this does matter, because a computer can only do one thing at a time.

Here, the consequence is that the recursive version of the function delays evaluation until it gets to the base case. On the other hand, the loop evaluates the partial sum immediately and remembers this sum.

Is there a way to do something like this kind of partial evaluation with recursion? Yes—but we have to account for this explicit memory.


def sum_nums3(lst, sum_so_far):
    if lst == []:
        return sum_so_far
    else:
        first, *rest = lst
        return sum_nums3(rest, sum_so_far + first)
    

If we write out the recursion, we get \begin{align*} \mathtt{sum\_nums3([2,4,3,4], 0)} =& \mathtt{sum\_nums3([4,3,4], (0 + 2))} \\ =& \mathtt{sum\_nums3([3,4], ((0 + 2) + 4))} \\ =& \mathtt{sum\_nums3([4], (((0 + 2) + 4) + 3))} \\ =& \mathtt{sum\_nums3([], ((((0 + 2) + 4) + 3) + 4))} \\ =& ((((0 + 2) + 4) + 3) + 4) \end{align*}

Here, I've left the expression unevaluated, so we can see how the terms are gathered, but the actual function would evaluate on each step: \begin{align*} \mathtt{sum\_nums3([2,4,3,4], 0)} =& \mathtt{sum\_nums3([4,3,4], 0+2)} \\ =& \mathtt{sum\_nums3([3,4], 2+4)} \\ =& \mathtt{sum\_nums3([4], 6+3)} \\ =& \mathtt{sum\_nums3([], 9+4)} \\ =& 13 \\ \end{align*}

Here, we see that sum_so_far plays the role of the accumulator variable sum in the loop of the iterative version. And this is really the secret: for loops are really this version of recursion.

Are there any advantages to this? Well, we can see that there's no delayed evaluation like with our original recursive version. Like the iterative version, we're computing the result as we go along. This comes at a slight cost of having to pass along some information, the accumulator. But passing around information is actually a useful trick, since it allows you to "remember" things as you evaluate the recursion.
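
In fact, the correspondence runs both ways: a tail-recursive function like sum_nums3 can be translated mechanically into a while loop. Here's a sketch of that translation (sum_nums4 is a name introduced here): the parameters of the recursive call become variables that get updated on each pass through the loop.

```python
def sum_nums4(lst):
    # sum_so_far starts at the same value we'd pass as the initial
    # accumulator argument to sum_nums3.
    sum_so_far = 0
    while lst != []:
        first, *rest = lst
        # Each pass updates the "arguments" exactly as the recursive
        # call sum_nums3(rest, sum_so_far + first) would.
        sum_so_far = sum_so_far + first
        lst = rest
    return sum_so_far
```

This translation matters in practice: Python doesn't eliminate tail calls, so sum_nums2 and sum_nums3 will both hit the interpreter's recursion limit on a long enough list, while the loop version won't.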

Generative recursion

So far, we've been focusing on structural recursion. Functions use structural recursion when they follow the structural definition of the data they are applied to. This trick is what gives us those nice templates. Because the functions follow the definitions so closely, half the work is done for you by the time you have to put everything together.

However, there are some disadvantages to this. What if our problem doesn't follow the structure of the data? Or what if following the structure so closely makes our algorithm less efficient than it needs to be?

We can ask this question for some problems we've seen already. If we're looking for an element in a list, do we really need to search the entire list for it? Or if we're sorting, do we really need to compare every element to every other element in the list?

As we just saw, structural recursion has a tight link with iteration with for loops—for loops are structural too! They very closely follow the structure of the data that we're iterating over. But we'll also encounter problems that require us to break out of the structure that for loops provide.

For looping, we know that we can use while loops when for loops don't fit. But what about recursion?

Recursion that doesn't follow the recursive definition of its input is sometimes called generative recursion because it's recursion that "generates" subproblems. Like while loops, using generative recursion is trickier and less straightforward than using structural recursion. Since we no longer have the recursive structure of the input to guide us, we have to rely solely on the recursive structure of the problem. There's a very general template for this.
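
In Python-like pseudocode, that template looks something like this (the helper names here are placeholders, not real functions — what counts as "trivially solvable," and how to generate and combine subproblems, depends entirely on the problem):

```python
def solve(problem):
    if trivially_solvable(problem):      # base case: small enough to answer directly
        return solve_directly(problem)
    else:
        subproblems = generate(problem)  # break into smaller problems somehow
        solutions = [solve(sub) for sub in subproblems]
        return combine(solutions)        # reassemble the sub-answers
```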

Let's see some examples of this.

Mergesort

Our first sorting algorithm, insertion sort, relied on structural recursion. As a result, we were bound to the structure of the list: we have to look at every element of the list, which could lead to comparing each element to every other element in the list. This is as many as $\binom n 2$ comparisons, if $n$ is the size of the list. This is a well known quantity: it's $\frac{n(n-1)}{2}$, which is quadratic, or $O(n^2)$.

What this tells us is that if we want to do something faster, we have to break out of the recursive structure of the list. How might we go about doing this?

One of the assumptions we had in the recursive case of insertion sort was that we had a sorted list after a recursive call and that we could easily find a spot for one element in this list.

But what if we generalize this: instead of having one sorted list, we have two? Then if there was an easy way to combine the two sorted lists, we could be in business.

The business of merging is pretty standard: it's just going through both lists, comparing the first elements, and putting them in the right order. Here, I show the recursive version, but you can just as easily accomplish this with loops, since we're just going down the lists.


def merge(lst1, lst2):
    """
    Combines two sorted lists into one sorted list.

    Input:
        lst1 (list[int]): first sorted list
        lst2 (list[int]): second sorted list

    Output: sorted list of all elements from lst1 and lst2 (list[int])
    """
    if lst1 == []:
        return lst2
    elif lst2 == []:
        return lst1
    else:
        first1, *rest1 = lst1
        first2, *rest2 = lst2
        if first1 < first2:
            return [first1] + merge(rest1, lst2)
        else:
            return [first2] + merge(lst1, rest2)
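
As mentioned above, merging can just as easily be done with a loop. Here's a sketch of that loop version (merge_loop is a name introduced here, not used elsewhere in these notes):

```python
def merge_loop(lst1, lst2):
    """
    Loop-based version of merge: repeatedly take the smaller of the
    two front elements, walking an index down each list.
    """
    result = []
    i, j = 0, 0
    while i < len(lst1) and j < len(lst2):
        if lst1[i] < lst2[j]:
            result.append(lst1[i])
            i = i + 1
        else:
            result.append(lst2[j])
            j = j + 1
    # At least one list is exhausted; append whatever remains of the other.
    return result + lst1[i:] + lst2[j:]
```

Notice that the indices i and j play the role of the shrinking lists rest1 and rest2 in the recursive version — the same accumulator idea from earlier in these notes.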
    

This idea leads us to a sorting algorithm first proposed by John von Neumann in 1945. If we have two sorted lists, we can merge them easily. But how do we get two sorted lists out of one unsorted one? By breaking it in half and recursively sorting each half.


def mergesort(lst):
    """
    Sorts a list of integers.

    Input:
        lst (list[int]): list of integers

    Output: sorted list of elements from lst (list[int])
    """
    if 0 <= len(lst) <= 1:
        return lst
    else:
        mid = len(lst) // 2
        left = mergesort(lst[0:mid])
        right = mergesort(lst[mid:])
        return merge(left, right)
    

Notice that unlike the standard list recursion, we have two base cases: the empty list and the list containing one element. It's worth thinking about why this is necessary (hint: it has to do with our recursive case).
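
To see why, look at what the split in the recursive case would do to a one-element list:

```python
# If the recursive case ever ran on a one-element list, the split
# would not shrink the problem: the right "half" is the whole list.
lst = [7]                # a hypothetical one-element input
mid = len(lst) // 2      # mid is 0
print(lst[0:mid])        # prints []
print(lst[mid:])         # prints [7] -- the same list we started with
```

So without the one-element base case, mergesort([7]) would call mergesort([7]) again and recurse forever.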

How is this any better than insertion sort? With insertion sort, we saw that we might end up comparing every pair of items in the list. So we need a way to guarantee that we can avoid some of these comparisons.

Notice that we break our list in half every time before doing a recursive call. Intuitively, this breaking up of the search space in half each time makes it so that many of the comparisons between elements are eliminated. There's a more formal analysis that you'll see if you take Theory of Algorithms, but this leads us to $O(n \log n)$ time.

In other words, insertion sort requires us to decompose the problem along the structure of the list, while mergesort breaks us out of that and skips some comparisons.
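
To get a feel for the difference between the two growth rates, we can tabulate them for a few list sizes. (The $n \log n$ column ignores constant factors, so it's a rough order-of-magnitude estimate rather than an exact count of mergesort's comparisons.)

```python
from math import log2

for n in [10, 100, 1000, 10000]:
    quadratic = n * (n - 1) // 2       # binom(n, 2): insertion sort's worst case
    linearithmic = round(n * log2(n))  # the order of mergesort's running time
    print(n, quadratic, linearithmic)
```

Already at $n = 1000$ the gap is large: 499500 comparisons in the worst case for insertion sort, versus something on the order of ten thousand for mergesort.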