CMSC 14100 — Lecture 6

From last class, our prime checking code.


    n = 7
    is_prime = True
    for i in range(2,n):
        if n % i == 0:
            is_prime = False

And the observation that our code is really computing the equivalent of the following boolean expression:


    is_prime = (n % 2 != 0) and (n % 3 != 0) and  ... and (n % n-1 != 0)

In other words, we have an accumulator, which follows this pattern:


    <accumulator> = <initial value>
    for <item> in <list>:
        # combine accumulator with item
        <accumulator> = ... <accumulator> ... <item> ...

And so a more literal accumulator loop might look like this instead:


    n = 7
    is_prime = True
    for i in range(2,n):
        is_prime = is_prime and (n % i != 0)

When we look at it like this and remember our discussion about short-circuiting, we might be able to see a slight inefficiency with our program: it essentially evaluates every expression, even though we know we can stop once is_prime is set to False.

It is possible to accomplish this behaviour by using the break statement. The break statement will cause the program to immediately exit a loop, regardless of any other conditions on the loop.


    n = 7
    is_prime = True
    for i in range(2,n):
        if n % i == 0:
            is_prime = False
            break

In this case, we choose to exit the loop as soon as is_prime is set to False.

We can use this to demonstrate the fact that we can put a loop inside a loop (since it's a statement). In the following, we print whether each number from 2 to 30 is prime or not.


    for n in range(2,31):
        is_prime = True
        for i in range(2,n):
            if n % i == 0:
                is_prime = False
                break
        print(n, is_prime)

Observe that break will only exit the innermost loop.

Finally, here's one last pattern we can make use of. Suppose we have a list. We would like to take the items in the list and create a list based on these items. This is called mapping. Here's an example that takes a list of strings and makes a corresponding list of the strings' lengths.


    strings = ["cat", "", "pepper", "shoe", "paper"]
    lengths = []
    for string in strings:
        lengths.append(len(string))

Here, we make use of the append method. This method allows us to add items to an existing list. In the above example, we create a new empty list lengths for storing the lengths of the strings. Our loop takes each item and "maps" each item to its length, which gets added to the new list.

We can generalize this pattern.


    <new_list> = [] # initialize empty list
    for <item> in <list>:
        # do something with item
        <new_item> = ... <item> ...
        <new_list>.append(<new_item>)

If you look hard enough, this looks strangely like an accumulator. It turns out the accumulator pattern is pretty general! The interpretation here is that we are "accumulating" our mapped values into a single value: a new list.

Now, if we think about it, it might seem a bit strange to change a list in this way. Can we really take a list and add things to it? Does this really make sense? So let's talk about lists for a bit.

Lists

We mentioned lists very informally so far. It turns out that lists are actually a built-in data type in Python with all sorts of useful properties and tools. Lists are ordered sequences of values—they are a fundamental data type for working with more complex data that is made up of more than one piece. Lists can contain all sorts of items.


    grocery = ["garlic", "egg", "pita", "cereal", "yogurt", "carrot"]
    measurements = [1, 2, 2, 1, 2, 0, 1, -2, 1, -1]

The items of a list do not need to be of the same type.


    mixed_list = ["some data", 23, 4.3, True]

The empty list is the list containing no elements. This is also a list, denoted by [].

Lists can even contain other lists.


    list_list = [89, [], 233, ["true", "not true"]]

Indexing

Because lists are ordered, we can access particular elements based on their position in the list. This is done by referring to the name of the list, along with the position of the desired item starting from 0. So grocery[2] will evaluate to "pita".

However, we must be careful not to try to access a location in the list that doesn't exist. If we try to access grocery[30], Python will throw an error (an IndexError to be specific).

A very convenient feature of Python's lists is the expressivity in how we can access particular elements. For example, we typically think of list indices as nonnegative integers, based on their position. However, we can also use negative integers to refer to items from the end of the list.

So grocery[-1] will not produce an error, but it will evaluate to "carrot". One way to think about this is to think of the indices as being anchored relative to the start of the list. Then an index $i$ will reach the $0+i$th item.

Another very convenient feature is the ability to identify and access sublists, or slices of the list. For example, grocery[1:4] will evaluate to the list ["egg", "pita", "cereal"]. There is a lot of useful stuff you can do with slices, which is discussed in much more detail in the textbook.

Mutation

Unlike the types of data we've seen before, lists are mutable. This means that we can change the list. This idea doesn't quite make sense for the atomic types of data we've seen before (what does it mean to "change" 7 into -3?), but since lists are composed of multiple pieces of information, the ability to change those pieces makes some sense.

To do this, just as we refer to a position in the list to access the item located there, we can use this to assign a new value at that location.


    grocery[2] = "jam"

In addition to changing the value at a particular location in the list, we can extend the list with new items. We've already seen how to do this:


    grocery.append("tofu")

It is very important to note here that append does not return a list. Instead, it mutates an existing list. Since it does not produce a value, this function actually returns None—that is, it does not produce a value at all.

Lists and variables

This discussion about mutating lists should make you wonder about how lists are represented and stored. Because they are mutable and they are made of multiple pieces of data, we need to be careful about how we understand their representation.

Suppose we would like to make a copy of a list. Easy enough: we saw this happen when we created a new variable and assigned a the value of the old variable to it a few classes ago.


    lst1 = [1,2,3,4]
    lst2 = lst1
    lst2[0] = 100

If we examine this, we notice that both lst1 and lst2 give us [100, 2, 3, 4], even though we intended to update only lst2..

What is happening? Here, we peel back another layer of half-truths we've been overly simplifying. What is actually going on is that there's really one more layer than we've let on: in fact, variables always point to an address that contains a value.

The id function will tell us the address for any variable.


    x = 24
    y = 24
    id(x), id(y)

If you tried running the above, you'd notice a peculiar thing: both x and y have the same reference, even though we didn't assign y = x. But it's true: all of these values are hanging out somewhere in memory and these variables are just referring to them.

For the types we've seen so far, this distinction that variables are holding references rather than literal values isn't actually that important. This is because they are immutable. They can't change and if it seems like they do change, we're really just replacing them with a different value, and therefore a different reference.

But lists and other more complex types of data are mutable—they can and do change. This means that there's a significant difference between operations that create a new list and those that change an existing list.


    lst1 = [1,2,3]
    lst2 = lst1
    lst3 = [1,2,3]
    id(lst1), id(lst2), id(lst3)

Running the above code will show that lst1 and lst2 are really referring to the same list but lst3 is referring to a different one.

So if we call lst1.append(5), we will find an extended list whether we check lst1 or lst2. The append method mutates the list.

However, if we concatenate two lists, by using the list concatenation operator + to achieve a similar result, like in lst2 = lst2 + [6], we find that a new list gets created.

All of this means that if you're not careful, you can end up doing strange things. For example, suppose we would like to set up a list that contains five different lists that we will add things to.


    a = [[]] * 5
    a[1].append("thing")

If we then take a look at a, we will find something unexpected:

[["thing"], ["thing"], ["thing"], ["thing"], ["thing"]]

What happened here? Our first mistake was our interpretation of [[]] * 5. Here, we created an empty list inside a list. But the list actually holds a reference to this particular empty list. And the multiplication created copies of the reference. This means that this creates a list containing five copies of a reference to the same list.

This leads to an interesting question. What does equality mean in such a context? In math, two things are equal if they're the same, but this idea doesn't quite translate directly into a realm where things are able to change.

The way that Python handles this is by defining different types of equality. The first is whether two items are equivalent, in the sense that their values are the same. This is called value equality and is expressed with the equality operator we're familiar with, ==.

The second notion of equality is in the sense of whether the things are literally the same thing—that the two variables are pointing to the same thing. This is called reference equality and is expressed using the is operator.

Arithmetic of lists

Here, we'll introduce some list operations we performed above.

Just like strings, we have a notion of "addition" on lists, which is really just concatenation. For example,


    [7, 6, 5] + [3, 4, 6]

produces the list [7, 6, 5, 3, 4, 6].

Also like strings, we have a notion of "scalar multiplication", which is defined just to be repeated "addition" (i.e. repeated concatenation). So the product


    [3, 1, 9, 3] * 3

produces the list [3, 1, 9, 3, 3, 1, 9, 3, 3, 1, 9, 3].

These operations behave like typical arithmetic operators: they create new values rather than mutate them.

(Why do I call this the arithmetic of lists? These define basic operations like + and * in the same way as we did earlier (briefly) for strings. Maybe a more accurate term is an algebra for lists, in the same way we've defined a logical algebra with Boolean algebra. Recall that in the Boolean algebra, we can think of or as + and and as *.)