CMSC 14100 — Lecture 11

Scope

One issue that we've avoided until now but becomes quite important with the definition of functions is the usage of names. Suppose I have two functions each with loops with loop variables named i. Are these names going to be in conflict?

For a simpler example, consider the following function that just adds one to a given number.


def incr(a):
    one = 1
    return a + one

If we try to access one outside of the function, we'll get an error. In some ways, this makes sense—a variable shouldn't exist outside of where it's defined. This notion is called scope. Variables that are defined within a function only exist inside of that function and are called local variables.

Function parameters are also considered local variables—their scope is the body of the function. When a function is called, values get assigned to those names for the duration of the function execution. This ties in to the general principle that a function should be considered independent units of computation.


    a = 24
    z = incr(a * 2)

This principle works in both directions. Because functions cannot access variables defined in other functions, they cannnot rely on or interfere with some other computation. On the other hand, because functions cannot access variables defined in other functions, they do not need to worry about or keep track of anything going on outside of the function.

One thing to note is that this separation occurs between functions and is not just a matter of whether the variable "exists" or not. For example, consider the following functions.


    def add(m,n):
        ans = incr(m)
        return ans

    def incr(a):
        return a + n

Though incr is called during the execution of a call to add, it still cannot (and should not be able to) access the variable n which was defined by add.

Of course, that should make sense: what happens if some other function calls incr but doesn't have a variable named n? Whoever wrote incr shouldn't have to worry about that.

The Call Stack

Here, we'll expand a bit on how all of this stuff is represented internally. How exactly does the computer keep track of all of the function calls and the various variables that each function call defines?

This is all represented by what is called the call stack. Each time a function call is made, a corresponding entry is added to the stack. Such an entry is called a stack frame.

Why is it called a stack? As we make more and more function calls, more and more frames are added to the stack. However, we always execute the most recent function calls first in order to evaluate them. In other words, the call that is made last is always executed first. The stack then gives us the natural order in which our function calls are all executed.

Each stack frame consists of all of the local defintions that each function makes. This includes the values assigned to the parameters, the local variables that are defined, and at the very end, the value that is returned as output.

Consider the following example.


def f(i):
    i = i + 1
    return i

def g(i):
    j = h(i*5)
    return j

def h(i):
    return i+1

def main():
    i = 7;
    j = k = l = 0;

    j = f(i)
    k = f(j*3)
    l = g(i)
    
    return i, j, k, l

Suppose we call main(). This creates a stack frame.

`main`

Then we execute the assignments.

`main`
`i`	`7`
`j`	`0`
`k`	`0`
`l`	`0`

When we make our first function call, we add a new stack frame for the function. Note that here, the two i's are different: the value of i in main is assigned to the i in f. This means that the value was copied.

`f`
`i`	`7`
`main`
`i`	`7`
`j`	evaluating `f(i)`
`k`	`0`
`l`	`0`

Then we change i in f

`f`
`i`	`8`
`main`
`i`	`7`
`j`	evaluating `f(i)`
`k`	`0`
`l`	`0`

When we execute the return statement, we produce the value that i evaluates to and assign it to j in main. Since the execution of the function is finished, the stack frame for the function is removed.

`main`
`i`	`7`
`j`	`8`
`k`	`0`
`l`	`0`

Our next assignment involves a function call to f(j*3). Again, we evaluate j * 3 to assign that value to i in f.

`f`
`i`	`24`
`main`
`i`	`7`
`j`	`8`
`k`	evaluating `f(j*3)`
`l`	`0`

Working through the same process as before, we end up with 25 assigned to k.

`main`
`i`	`7`
`j`	`8`
`k`	`25`
`l`	`0`

Finally, we make a call g(i). This creates the following stack frame.

`g`
`i`	`7`
`main`
`i`	`7`
`j`	`8`
`k`	`25`
`l`	evaluating `g(i)`

Now, this function makes another function call, so evaluate i*5 and add another stack frame.

`h`
`i`	`35`
`g`
`i`	`7`
`j`	evaluating `h(i*5)`
`main`
`i`	`7`
`j`	`8`
`k`	`25`
`l`	evaluating `g(i)`

This function then returns the value 36, which is i + 1, its stack frame gets removed and the value gets assigned to j in g.

`g`
`i`	`7`
`j`	`36`
`main`
`i`	`7`
`j`	`8`
`k`	`25`
`l`	evaluating `g(i)`

Then g returns its value, so its stack frame goes away and its value gets assigned to l.

`main`
`i`	`7`
`j`	`8`
`k`	`25`
`l`	`36`

The function main then produces all of these values as a tuple and the stack frame gets cleared.

When viewed in this way, the mechanics of functions become a bit more clear:

Variables defined by other functions are inaccessible.
Local variable names clearly live in their own scope, independent of other what happens in other functions.
Parameters are assigned by copying.
Control flow follows the "top" of the stack.

Lists and other complex data

What we described above is the simple case. Unfortunately, things become a bit more complicated when we remember we have to deal with lists.

Recall that the value of a list variable is a reference to the list, not the list itself. This means that when a function is called with a list as an argument, it really receives the location of the list. In this case, all the usual caveats about how list access works hold: it is possible for a function to mutate a list that is defined somewhere else. This is one of those things that makes mutation complicated to reason about.


def incr_zeroth(lst):
    lst[0] = lst[0] + 1

Notice that this function does not have a return statement! Since the function mutates the list, it does not produce any value. Suppose we try to make an assignment:


old_list = [2,3,4,6,8,11]
new_list = incr_zeroth(old_list)

Let's examine the call stack. First, we have our list variable, which really holds a reference (this reference is made up). Notice we have a section for "the memory", which is ill-defined at this point, but can be reached by knowing the address of things. We prepare to make an assignment to new_list.

`main`
`old_list`	`0x239A374`
`new_list`
Address	Contents
`0x2392374`	`[2,3,4,6,8,11]`

When we make a call to incr_zeroth, we add a new stack frame and copy the value, which is a reference, for assignment to the parameter.

`incr_zeroth`
`lst`	`0x239A374`
`main`
`old_list`	`0x239A374`
`new_list`	evaluating `incr_zeroth(old_list)`
Address	Contents
`0x2392374`	`[2,3,4,6,8,11]`

When incr_zeroth makes an assignment, it refers to the original list because we copied a reference.

`incr_zeroth`
`lst`	`0x239A374`
`main`
`old_list`	`0x239A374`
`new_list`	evaluating `incr_zeroth(old_list)`
Address	Contents
`0x2392374`	`[3,3,4,6,8,11]`

Once the function runs out of statements to execute, the stack frame gets removed. Since there was no return statement, the function did not produce any value. So the value None gets assigned to new_list.

`main`
`old_list`	`0x239A374`
`new_list`	`None`
Address	Contents
`0x2392374`	`[3,3,4,6,8,11]`

If we want to produce a new list with the desired values, we have to explicitly create a new list.


def incr_zeroth(lst):
    new_list = lst[:]
    new_list[0] = new_list[0] + 1
    return new_list

Global variables

This talk of scope and mutation can lead to some questionable ideas. For example, if passing a list into a function can result in the list being mutated, why not just define our variables so that they're in scope for all the functions we write and mutate all sorts of variables?

Such variables, which are defined in a file outside of any functions are called global variables. They are "in scope" for all functions in the file. For example, the following code works, even users and emails are defined outside of any functions.


def parse_input():
    lines = s.strip().split("\n")
    
    for l in lines[1:]:
        username, fname, lname = l.split(",")
        users.append((username, fname, lname))
        
def get_emails():
    for username, _, _ in users:
        emails.append(username + "@uchicago.edu")

def print_emails():
    for email in emails:
        print(email)
    
# Pretend that this is read from a CSV file instead
# of being hardcoded in the program
s = """username,fname,lname
aelmore,Aaron,Elmore
amr,Anne,Rogers
kalea,Alex,Kale
timng,Tim,Ng
"""

users = []
emails = []
parse_input()
get_emails()
print_emails()

There are a few issues here. First, one has to be careful in tracking where things are stored and executed. Secondly, this makes reusability a challenge. For example, get_emails depends on the users variable being set to something of the right form before it is called.

Instead, we should write


def parse_input(s):
    lines = s.strip().split("\n")
    
    users = []
    for l in lines[1:]:
        username, fname, lname = l.split(",")
        users.append((username, fname, lname))
    return users
        
def get_emails(users):
    emails = []
    for username, _, _ in users:
        emails.append(username + "@uchicago.edu")
    return emails

def print_emails(emails):
    for email in emails:
        print(email)
    
# Pretend that this is read from a CSV file instead
# of being hardcoded in the program
s = """username,fname,lname
aelmore,Aaron,Elmore
amr,Anne,Rogers
kalea,Alex,Kale
timng,Tim,Ng
"""

users = parse_input(s)
emails = get_emails(users)
print_emails(emails)

Here, each function is self contained and the inputs for each function are explicitly provided. If we want to follow the flow of the program, there is a natural path and order to it.

You should never use global variables in this class. There is no good techincal reason to do so for any of the problems in this class. You are guaranteed to be penalized for it.

The lone exception is when you are defining variables for the purpose of naming constants. Why is this okay? Constants remain the same and are not meant to change.