Looping statements allow us to repeat execution of parts of our program based on some condition. Most programming languages have two main flavours of loops, for and while loops, and this is the case for Python. But why might we want to repeat execution of code?
Up to now, we've worked with largely fixed-size data. For example, we can think of an integer, a float, a string, or a boolean as one "piece" of data. In an arbitrary function, we may be given a few pieces of data to work with, but this is always fixed, say, by the number of parameters.
Suppose we want to do something simple like sum a few values. This is something we can do quite easily. But if we are asked to sum more than a few values, such an expression can get repetitive. For example,
>>> a1 = 2
>>> a2 = 4
>>> a3 = 3
>>> a4 = 4
>>> a5 = 3
>>> a6 = 1
>>> a7 = 5
>>> a1 + a2 + a3 + a4 + a5 + a6 + a7
22
If we had a way to repeat the action of addition, then we wouldn't have to write the final expression explicitly. But this is not the real reason we care about repetition. Right now this is just a nice convenience rather than something we fundamentally need to express.
Now, notice that in order to sum up our numbers, we need to know how many values we are adding. This is something we know if we deal with a fixed number of items. But one of the powerful ideas of computer science is the ability to work with arbitrarily large data. By arbitrarily large, we mean data that can vary in size and has no guarantees on how "big" it can be. This is in contrast to what we've been dealing with before, where we know exactly how much and how big our data is.
So if we want to sum up an arbitrary number of values, then we can't write an expression in the way that we did before, because we don't know how exactly how many values we'll have. But what does arbitrarily large data look like?
The first example of arbitrarily large data we'll see is the list. A list is a data structure that contains multiple pieces of data in sequence. Lists are denoted by square brackets [ and ], with each item separated by a comma:
[2,4,3,4,3,1,5]
Just like other values, lists are their own distinct type, which dictates what kinds of actions we can take with them.
>>> type([2,4,3,4,3,1,5])
list
Lists can be arbitrarily large. We do not have to specify a size when creating a list. In other words, if we are given an arbitrary list of items, we don't have any guarantee about how many items are in the list. It can contain five or five hundred items.
What this means is that if we are given a list, we need to be able to handle it in the same way, no matter its size. This means we need a way to execute statements based on the items in the sequence.
for loopsfor loops are based on looping over items in a given collection of items, executing the same block of statements for each item. The form of a for loop is
for <loop_variable> in <collection>:
<statements>
Just like conditional statements, the statements to be executed in the loop need to be appropriately indented.
One can read this as "for each item in the given sequence, do the following". Here's a small example.
>>> nums = [2,4,3,4,3,1,5]
>>> for number in nums:
... print(number)
2
4
3
4
3
1
5
How does this work? Each item of the list is assigned to the loop variable number, in the order that the items appear in the list. This allows us to "do something" with each item in the list. So for each item in the list, the statements in the body of the loop are executed. A single execution of the loop body is called an iteration of the loop. Once we have gone through the entire list of items, the loop ends.
Let's walk through this in more detail for the above for loop.
number is assigned to the first value of the list, 2.
print(number) is executed.
4, then 3, and so on...)
This seems straightforward enough, but I will mention something briefly for you to think about: lists can be empty (denoted by []), so what happens if we try to loop over an empty list?
>>> nums = []
>>> for number in nums:
... print(number)
It doesn't seem like anything happens. This is because we have "reached the end" of the list before we started: there are "no more" items in the list to execute the loop for, so the loop body never gets executed.
Going back to our original intent, we would like to use this to sum the items in the list. Roughly speaking, we are taking the items of our list and accumulating them into a single value.
>>> nums = [2,4,3,4,3,1,5]
>>> total = 0
>>> for n in nums:
... total = total + n
>>> total
22
There are a few things to notice about this loop, because it represents a very common pattern among loops that aggregate values.
First, notice that in order to accumulate our value, we need a way to store information in between each iteration. We use a variable to do this (in this case, total) which we call the accumulator. Observe that the accumulator gets reassigned with every iteration of the loop. Because the accumulator changes so frequently, it's important to be careful when reasoning about it and making sure to understand how it changes with each iteration.
A common question is where the accumulator should be initialized. One common error is initializing the accumulator inside the body of the loop. If we're careful in tracing our code, it's easy to see why this is a problem: placing this inside the loop means that the variable gets initialized (and therefore reset) with each iteration.
Another common error is forgetting to initialize the accumulator in the first place. One can see how omitting this will lead to issues in the first iteration: total = total + n will attempt to evaluate the variable total when performing the assignment, but it doesn't exist yet!
So here is what the accumulator pattern looks like:
<accumulator> = <initial value>
for <item> in <collection>:
# combine accumulator with item
<accumulator> = ... <accumulator> ... <item> ...
Notice that the "combine" step is a bit vague. This is because we could be dealing with all kinds of data that could be combined in all kinds of ways and not just numbers.
This discussion of initializing the accumulator brings us back to an earlier question: what happens when we try to run this loop on an empty list? What should we expect should happen?
>>> nums = []
>>> total = 0
>>> for n in nums:
... total = total + n
>>> total
0
Again, there are no items in the list, so the body of the loop never gets executed. This means that the value of the accumulator total never gets modified.
Now, recall that in both conditional statements and loops, the body of these statements can be a series of any other statements. This means that we can use conditional statements inside the body of a loop or a loop in the branches of a conditional.
>>> nums = [2,4,3,4,3,1,5]
>>> for n in nums:
... if n % 2 == 0:
... print(n, "is even")
... else:
... print(n, "is odd")
2 is even
4 is even
3 is odd
4 is even
3 is odd
1 is odd
5 is odd
Notice here that indentation is important for delimiting which blocks belong to which statements.
Now, one of our original motivations had to do with repeating an action a certain numbers of times. In other words, we would like to be able to count. Our discussion of for loops so far suggests the following.
>>> nums = [0,1,2,3,4,5,6]
>>> for n in nums:
... if n % 2 == 0:
... print(n, "is even")
... else:
... print(n, "is odd")
0 is even
1 is odd
2 is even
3 is odd
4 is even
5 is odd
6 is even
This is a bit silly and it seems like we should be able to avoid having to type 1000 numbers if we just wanted to count to 1000. Python has a mechanism for this, the range function.
>>> for n in range(6):
... if n % 2 == 0:
... print(n, "is even")
... else:
... print(n, "is odd")
0 is even
1 is odd
2 is even
3 is odd
4 is even
5 is odd
From what we observe, it seems that range(6) produces a list of integers from 0 through 5. That is, we can think of the parameter as indicating the number of items rather than the final item.
However, the range function doesn't produce a list—it produces a range.
>>> range(6)
range(0, 6)
>>> type(range(6))
range
But like we've done for other values of other types, we can cast a range as a list.
>>> list(range(6))
[0, 1, 2, 3, 4, 5]
Why might this choice have been made? There are a few reasons. First, if we want to capture a range for the purposes for counting, we don't really need to create one item for each step. Imagine we want to count up to 10000—using a list would require a list of 10000 items.
The second reason is that we may not want to consider a list. We mentioned earlier that lists are only one of several possible kinds of collection types in Python (and it turns out that range could be thought of as another one).
The range function can take one or two inputs (and technically more), which denote the start and end of the range. If range(n) is specified, then it assumes the start is 0, and represents the numbers $0, 1, \dots, n-1$. This is important to keep in mind: the end of the range is not included in the range.
Then if range(m,n) is specified, then the numbers $m, m+1, \dots, n-1$ are represented. Again, notice that $m$, the starting number, is included and the end, $n$ is not.
Here's one last interesting question: what happens if we call range(m,n) with $n \gt m$?
>>> list(range(21, 3))
[]
A common assumption would be that this causes an error, but we see that Python handles it quietly and simply produces an empty range. This is useful for a number of reasons and is important to keep in mind.
Suppose that we want to test whether a number is prime (in truth, much more complicated to do efficiently than it appears). Recall that a prime number is a positive integer greater than 1 that only has 1 and itself as its divisor. Notice that 1 is not a prime number—there is an actual technical explanation for this, but most people just accept this restriction.
So numbers like 2, 13, and 37 are prime, while numbers like 6, 16, and 57 are not. Our definition suggests that we can determine the primality of a number if we verify that there are no divisors between 1 and itself (any larger number can't be a divisor). This gives us a relatively straightforward approach involving loops.
def is_prime(n):
"""
Determine whether n is a prime number. A number greater than 2
is prime if its only divisors are 1 and itself.
Input:
n (int): positive integer to test
Output (bool): True if n is prime, False otherwise
"""
result = True
if n == 1:
result = False
for d in range(2,n):
if n % i == 0:
result = False
return result
>>> is_prime(11)
True
>>> is_prime(1)
False
>>> is_prime(57)
False
A common mistake in the above program is forgetting about the case when $n = 1$. If we didn't test for that condition, then our loop would be over the range range(2, 1), which is empty. Since the loop is never executed, result is never modified and still evaluates to True at the end of the function.
While the code seems a bit more complicated, notice that it still has the form of an accumulation. But what is being accumulated?