CMSC 14100 — Lecture 8

Mutation

Unlike the types of data we've seen before, lists are said to be mutable. This means that we can change the list. This idea doesn't quite make sense for the atomic types of data we've seen before (what does it mean to "change" 7 into -3?), but since lists are composed of multiple pieces of information, the ability to change those pieces makes some sense.

To do this, we refer to a position in the list, just as we do when accessing the item located there, and assign a new value to that position.


    >>> grocery = ["garlic", "egg", "pita", "cereal", "yogurt", "carrot"]
    >>> grocery[2] = "jam"
    >>> grocery
    ['garlic', 'egg', 'jam', 'cereal', 'yogurt', 'carrot']
    

In addition to changing the value at a particular location in the list, we can extend the list with new items, without creating an entirely new list. This is done by using the append method.


    >>> grocery.append("coffee")
    >>> grocery
    ['garlic', 'egg', 'jam', 'cereal', 'yogurt', 'carrot', 'coffee']
    

This method will add the provided item to the end of the list. We can use this to build a list more efficiently. Modifying our mapping pattern from earlier, we have the following code.


    >>> strings = ["cat", "", "pepper", "shoe", "paper"]
    >>> lengths = []
    >>> for string in strings:
    ...     lengths.append(len(string))
    ...
    >>> lengths
    [3, 0, 6, 4, 5]
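
The same build-by-appending loop can also live inside a function. The sketch below is only an illustration: the name squares and the sample input are made up and do not come from earlier lectures.

```python
def squares(numbers):
    """Return a new list containing the square of each number."""
    result = []  # initialize empty list
    for n in numbers:
        # compute the new item from the current item, then append it
        result.append(n * n)
    return result


print(squares([1, 2, 3, 4]))  # prints [1, 4, 9, 16]
```

The function builds and returns a new list, so the list passed in is left untouched.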
    

So our generalized pattern now looks like:


    <new_list> = [] # initialize empty list
    for <item> in <list>:
        # do something with item
        <new_item> = ... <item> ...
        <new_list>.append(<new_item>)
    

It is very important to note here that append does not return a list. Instead, it mutates the existing list and returns None, the special value Python uses when a function has nothing meaningful to give back. The interpreter displays nothing when an expression evaluates to None, which is why the last line below prints no output.


    >>> new_item = grocery.append("cucumber")
    >>> grocery
    ['garlic', 'egg', 'jam', 'cereal', 'yogurt', 'carrot', 'coffee', 'cucumber']
    >>> new_item
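
A slip that follows directly from this is assigning the result of append back to the list's own variable. The sketch below, using the made-up names broken and todo, shows what goes wrong and one way to avoid it.

```python
# Mistake: append returns None, so the assignment replaces our list with None.
broken = ["laundry"]
broken = broken.append("dishes")
print(broken)  # prints None

# Fix: call append for its effect and leave the variable alone.
todo = ["laundry"]
todo.append("dishes")
print(todo)  # prints ['laundry', 'dishes']
```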
    

Lists and variables

This discussion about mutating lists should make you wonder about how lists are represented and stored. Because they are mutable and they are made of multiple pieces of data, we need to be careful about how we understand their representation.

Suppose we would like to make a copy of a list. Easy enough: we saw this happen when we created a new variable and assigned the value of the old variable to it a few classes ago.


    >>> lst1 = [1, 2, 3, 4]
    >>> lst2 = lst1
    >>> lst2
    [1, 2, 3, 4]
    >>> lst2[0] = 100
    >>> lst2
    [100, 2, 3, 4]
    >>> lst1
    [100, 2, 3, 4]
    

What is happening? Here, we peel back another layer of the simplified story we've been telling. A variable does not hold a value directly: it points to an address in memory, and the value lives at that address. The assignment lst2 = lst1 copied the address, not the list, so both names refer to the same list.

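The id function will tell us this address for whatever value a variable refers to. (Strictly speaking, id reports an identity number for the value; in CPython, the standard Python interpreter, that number is the memory address.)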


    >>> id(lst1)
    4387473792
    >>> id(lst2)
    4387473792
    

Note that the exact address will be different each time you run a program because the program will be using memory in different places.

Now, let's consider something interesting.


    >>> x = 24
    >>> y = 24
    >>> id(x)
    4414466112
    >>> id(y)
    4414466112
    

Wait a second: both x and y have the same address, even though we didn't assign y = x! This suggests something that might be a bit surprising: while we originally thought of a variable as a place where a value is stored, in fact it's the other way around. Values live somewhere in memory and variables just refer to them. (Whether two equal numbers end up at the same address is up to the interpreter. CPython reuses one object for each small integer such as 24, and you should not rely on this.)

For the types we've seen so far, this distinction isn't actually that important. This is because they are immutable. They can't change and if it seems like they do change, we're really just replacing them with a different value, and therefore a different reference.
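
To see this kind of replacement in action, the following sketch keeps a second name for the old value before reassigning. The variable names are chosen only for illustration.

```python
x = 24
old = x      # a second name for the same value 24
x = x + 1    # builds the value 25 and makes x refer to it

print(x)                 # prints 25
print(old)               # prints 24: the value 24 itself was never changed
print(id(x) == id(old))  # prints False: x now refers to a different value
```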

But lists and other more complex types of data are mutable—they can and do change. This means that there's a significant difference between operations that create a new list and those that change an existing list.


    >>> lst1 = [1,2,3]
    >>> lst2 = lst1
    >>> lst3 = [1,2,3]
    >>> id(lst1)
    4389995712
    >>> id(lst2)
    4389995712
    >>> id(lst3)
    4387473792
    

Here, we see that lst1 and lst2 are really referring to the same list but lst3 is referring to a different one. So if we call lst1.append(5), we will find an extended list whether we check lst1 or lst2. The append method mutates the list.

However, suppose we try to achieve a similar result with the list concatenation operator +, as in lst2 = lst2 + [6]. Concatenation creates a new list, and the assignment makes lst2 refer to it, so lst1 is left unchanged.


    >>> lst1 = [1, 2, 3, 4]
    >>> lst2 = lst1
    >>> lst1.append(5)
    >>> lst1
    [1, 2, 3, 4, 5]
    >>> lst2
    [1, 2, 3, 4, 5]
    >>> lst2 = lst2 + [6]
    >>> lst1
    [1, 2, 3, 4, 5]
    >>> lst2
    [1, 2, 3, 4, 5, 6]
    >>> id(lst1)
    4387933632
    >>> id(lst2)
    4389995840
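
If what we wanted all along was an independent copy, we have to ask for a new list explicitly. One way to do so, sketched here with illustrative variable names, is the list's copy method.

```python
original = [1, 2, 3, 4]
duplicate = original.copy()  # a new list with the same items

duplicate[0] = 100           # mutate only the copy

print(original)   # prints [1, 2, 3, 4]
print(duplicate)  # prints [100, 2, 3, 4]
print(id(original) == id(duplicate))  # prints False
```

Note that copy duplicates only the outer list; any lists nested inside it are still shared.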
    

All of this means that if you're not careful, you can end up doing strange things. For example, suppose we would like to set up a list that contains five different lists that we will add things to.


    >>> lst = [[]] * 5
    >>> lst
    [[], [], [], [], []]
    >>> lst[1].append("thing")
    >>> lst
    [['thing'], ['thing'], ['thing'], ['thing'], ['thing']]
    

What happened here? Our mistake was in our interpretation of [[]] * 5. The expression [[]] creates a list that holds a reference to one particular empty list. Multiplying by 5 copies that reference, not the list it refers to. The result is a list containing five references to the same empty list, so appending through any one of them shows up in all five positions.


    >>> id(lst[0])
    4388499840
    >>> id(lst[1])
    4388499840
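
One way to get five genuinely separate lists is to create a fresh empty list on each pass through a loop, using the append pattern from earlier. This is a sketch; the name buckets is made up.

```python
buckets = []
for i in range(5):
    buckets.append([])  # a brand-new empty list each time through the loop

buckets[1].append("thing")
print(buckets)  # prints [[], ['thing'], [], [], []]
print(id(buckets[0]) == id(buckets[1]))  # prints False
```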
    

This leads to an interesting question. What does equality mean in such a context? In math, two things are equal if they're the same, but this idea doesn't quite translate directly into a realm where things are able to change.

The way that Python handles this is by defining different types of equality. The first is whether two items are equivalent, in the sense that their values are the same. This is called value equality and is expressed with the equality operator we're familiar with, ==.

The second notion of equality is in the sense of whether the things are literally the same thing—that the two variables are pointing to the same thing. This is called reference equality and is expressed using the is operator.
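
The difference between the two shows up with the lists from earlier. In this sketch, lst1 and lst2 are two names for one list, while lst3 is a separate list that happens to have the same contents.

```python
lst1 = [1, 2, 3]
lst2 = lst1        # another name for the same list
lst3 = [1, 2, 3]   # a separate list with equal contents

print(lst1 == lst2)  # prints True
print(lst1 is lst2)  # prints True
print(lst1 == lst3)  # prints True: same values
print(lst1 is lst3)  # prints False: different lists
```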

Iteration on lists

Because lists can be of arbitrary length, we often need to iterate over the elements of a list to work with it. We saw how to do this with for loops.


    for item in lst:
        print(item)
    

A common temptation is to iterate over the indices of the list instead. For example, you might want the index for some purpose beyond accessing the element at that position. A classical way to do this, inherited from other programming languages, is to loop over the integers from 0 up to, but not including, the length of the list.


    for i in range(len(lst)):
        print(i, lst[i])
    

This is generally discouraged in Python, especially if all you're doing is accessing each element. However, if you need to work specifically with the index, the preferred method is to use the function enumerate. This function enumerates the items of a sequence and provides both the item and the index.


    for i, item in enumerate(lst):
        print(i, item)
    

It's worth discussing what exactly enumerate does. Simply put, this function "numbers off" each item in the list. Later, we will see that Python has other types that, like lists, contain multiple values. We can enumerate, or number off, the items of those too, even though they may not have a natural order.

But since lists are ordered, enumerate is guaranteed to enumerate the list in the order that the items are positioned in the list, and so the numbering happens to align with the index of the items in the list.

However, looking at this, you might wonder why the for loop with enumerate has two loop variables instead of one. That's a bit strange. The answer is that enumerate produces tuples.
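
To make this concrete, we can ask for everything enumerate produces at once by passing it to list. This sketch uses a made-up list; each pair it shows is a tuple holding an index and an item.

```python
lst = ["garlic", "egg", "pita"]

pairs = list(enumerate(lst))
print(pairs)  # prints [(0, 'garlic'), (1, 'egg'), (2, 'pita')]

# The two loop variables split each pair into its index and its item.
for i, item in enumerate(lst):
    print(i, item)
```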