CMSC 14100 — Lecture 7

Let's examine the loop from our primality testing function more closely. We have


    result = True
    if n <= 1:
        result = False
    for d in range(2,n):
        if n % i == 0:
            result = False

Notice that this has the form of an accumulation: we initialize the accumulator result and our code is really trying to accomplish is computing a boolean expression:


    result = (n % 2 != 0) and (n % 3 != 0) and  ... and (n % n-1 != 0)

If we think of this in the same way as a sum, we could write our loop using the accumulator pattern we've seen before.


    result = True
    if n <= 1:
        result = False
    for d in range(2,n):
        result = result and (n % d != 0)

Here, we set result = True, since we can think of True as the "identity" for the boolean connective and. But if we consider the definition of a prime number again, we can be a bit more clever.

Recall that the usual definition of a prime number $n$ includes the condition that $n \geq 2$. And in fact, this is the first condition we check. So really, we want a number n that satisfies the boolean expression


    (n > 1) and (n % 2 != 0) and (n % 3 != 0) and  ... and (n % n-1 != 0)

I claim that the following function body will do exactly that:


    result = n > 1
    for d in range(2,n):
        result = result and (n % d != 0)

So why did we jump to writing result = False before? It's because we intuitively recognized that we have a conjunction of several propostions and that any one of them evaluating to False will make the entire result evalute to False.

When we look at it like this and remember our discussion about short-circuiting, we might be able to see a slight inefficiency in this loop: it evaluates every iteration, even though we know we can stop once result is set to False the very first time.

What we would like to do is to force execution of the loop to stop when we hit this condition for the first time.


    result = n > 1
    for d in range(2,n):
        if n % d == 0:
            result = False
            break

To do this, we use the break statement. The break statement will cause the program to immediately exit a loop, regardless of any other conditions on the loop.

If there is no other work to be done in the function, we can achieve the same effect by using return statements instead of storing the desired result.


def is_prime(n):
    """
    Determine whether n is a prime number. A number greater than 2
    is prime if its only divisors are 1 and itself.

    Input:
        n (int): positive integer to test

    Output (bool): True if n is prime, False otherwise
    """
    if n == 1:
        return False
    for d in range(2,n):
        if n % i == 0:
            return False
    return True

Since return will end execution of the function, this will have the same effect as break ending execution of the loop. Note that this would not be a good choice if there was more work left to do in the function.

One example of this is if we wanted to repeatedly test a range of numbers to see if they are prime. Let's think through the approach for this.

We want to know whether each number from 2 up to, say, 20 is prime. This suggests that we need to loop over range(2, 21). Then for each integer we test, we want to run our primality testing loop. Ordinarily, we should just use the function is_prime that we just wrote, but here, we'll pull out the loop to see that we can nest loops.

In the following, we print whether each number from 2 to 30 is prime or not.


>>> for n in range(2,21):
...     is_prime = True
...     for i in range(2,n):
...         if n % i == 0:
...             is_prime = False
...             break
...     print(n, is_prime)
2 True
3 True
4 False
5 True
6 False
7 True
8 False
9 False
10 False
11 True
12 False
13 True
14 False
15 False
16 False
17 True
18 False
19 True
20 False

Observe that break will only exit the innermost loop. When this happens, the rest of the body of the outer loop still gets executed.

More on lists

We mentioned lists very informally earlier. However, it turns out that lists are actually a built-in data type in Python with all sorts of useful properties and tools. Lists are ordered sequences of values—they are a fundamental data type for working with more complex data that is made up of more than one piece. Lists can contain all sorts of items.


    grocery = ["garlic", "egg", "pita", "cereal", "yogurt", "carrot"]
    scores = [1, -3, 4, 1, 2, 0, 1, -2, 5, -1]

The items of a list do not need to be of the same type.


    mixed_list = ["some data", 23, 4.3, True]

The empty list is the list containing no elements. This is also a list denoted by [].

Lists can even contain other lists.


    list_list = [89, [], 233, ["true", "not true"]]

Indexing

Lists are said to be a sequence, which means that they are ordered. Since the items have an ordering, we can access particular elements based on their position in the list. The position of the item in the list is said to be the index of the item in the list.

We can access the $i$th element in a list by referring to the name of the list, along with the position of the desired item starting from 0. So grocery[2] will evaluate to "pita".


    >>> grocery = ["garlic", "egg", "pita", "cereal", "yogurt", "carrot"]
    >>> grocery[2]
    'pita'

However, we must be careful not to try to access a location in the list that doesn't exist. If we try to access grocery[30], Python will throw an error.


    >>> grocery = ["garlic", "egg", "pita", "cereal", "yogurt", "carrot"]
    >>> grocery[30]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: list index out of range

A very convenient feature of Python's lists is the expressivity in how we can access particular elements. For example, we typically think of list indices as nonnegative integers, based on their position. However, we can also use negative integers to refer to items from the end of the list.


    >>> grocery = ["garlic", "egg", "pita", "cereal", "yogurt", "carrot"]
    >>> grocery[-1]
    'carrot'

So grocery[-1] will not produce an error, but it will evaluate to "carrot". One way to think about this is to think of the indices as being anchored relative to the start of the list. Then an index $i$ will reach the $(0+i)$th item.

Another very convenient feature is the ability to identify and access sublists, or slices of the list. For example, grocery[1:4] will evaluate to the list ["egg", "pita", "cereal"]. There is a lot of useful stuff you can do with slices, which is discussed in much more detail in the textbook.


    >>> grocery = ["garlic", "egg", "pita", "cereal", "yogurt", "carrot"]
    >>> grocery[1:4]
    ['egg', 'pita', 'cereal']

Arithmetic on lists

Just like strings, there is a notion of "addition" on lists, which is really just concatenation.


    [7, 6, 5] + [3, 4, 6]

This makes sense in a way: both strings and lists can be thought of as sequences. While lists are sequences of arbitrary items, strings are sequences of "characters".

Also like strings, we have a notion of repeated addition. (One may consider it more similar to "power")


    [3, 1, 9, 3] * 3

Using these ideas, we can consider another kind of pattern. Suppose we have a list. We would like to take the items in the list and create a new list based on some transformation on these items. This idea is called mapping. Here's an example that takes a list of strings and makes a corresponding list of the strings' lengths.


    >>> strings = ["cat", "", "pepper", "shoe", "paper"]
    >>> lengths = []
    >>> for string in strings:
    ...    lengths = lengths + [len(string)]
    >>> lengths
    [3, 0, 6, 4, 5]

The idea behind mapping is that mathematically, we can think of this as mapping a list of items [a_1, a_2, ..., a_n] to a new list of items [b_1, b_2, ..., b_n]. Here, each a_i gets "mapped" to an item b_i. Mathematically, we can view this map as a function $a_i \mapsto b_i$.

We can see that the structure of the pattern looks very similar to the accumulator. In fact, one can think of it this way: our accumulator is the new list.


    <new_list> = [] # initialize empty list
    for <item> in <list>:
        # do something with item
        <new_item> = ... <item> ...
        <new_list> = <new_list> + [<new_item>]

However, there's a subtle problem with this approach, and with the list arithmetic we've been doing so far. Each time we use these operations, we are creating new lists.

This doesn't sound like a problem, but imagine that we have two very large lists that we want to manipulate. Whenever we create new lists, we make copies of the existing lists. So every time we want to "add" an item to a list in the way we've described, we're really creating an entirely new list with all the old items in it.

This seems a bit wasteful. Is there a better way?