CMSC 14100 — Lecture 12

Optional parameters

Some terminology:

Parameters are the formal name for the variables that are assigned to things that get passed to the function when the function is call.
The specific values that parameters get bound to are called arguments.

Recall that when a function call occurs, the expressions that the function is called with are evaluated into values. These values get assigned to the parameters for use in the body of the function. In effect, the values get copied when they are passed to a function. (Recall however, that the specifics of "values" means we're really dealing with references)

Let's consider the following problem: we would like to generate $n$ random integers. If we try to come up with examples, we run into some questions: which integers are we considering? Is there a range? If so, what should our inputs to the function be?

If we decide that we should specify a range, then we can describe our function more carefully.


def gen_randint(n, low, high):
    """
    Given integers n, low, and high, produces a list of n random 
    integers between low and high.

    Input:
        n (int): number of integers to generate, should be positive
        low (int): lower bound
        high (int): upper bound, should be greater than low

    Output: list of n integers between low and high (list[int])
    """
    lst = []
    for i in range(n):
        lst.append(random.randint(low, high))
    return lst

However, suppose we realize that most of the times we call this function, we really want to generate numbers between 0 and 100. One neat feature of Python is the ability to set default values for parameters.


def gen_randint(n, low=0, high=100):
    """
    Given integers n, low, and high, produces a list of n random 
    integers between low and high.

    Input:
        n (int): number of integers to generate, should be positive
        low (int): lower bound (default: 0)
        high (int): upper bound, greater than low (default: 100)

    Output: list of n integers between low and high (list[int])
    """
    lst = []
    for i in range(n):
        lst.append(random.randint(low, high))
    return lst

Parameters with default values become optional. This adds a complication. Ordinarily, arguments are provided to a function call in order. But suppose we want to use a different upper bound. Does this mean we are also forced to use a lower bound?

The way to get around this is to include the name of the optional parameter.


    gen_randint(20, high=2000)

We formally split up arguments into two types:

positional arguments are those argument that are passed to the function based on their position;
keyword arguments are those argument that are passed to the function with a specified keyword.

Because positional arguments are determined based on their position, optional parameters must be placed after all parameters that are not optional.

Global variables

This talk of scope and mutation can lead to some questionable ideas. For example, if passing a list into a function can result in the list being mutated, why not just define our variables so that they're in scope for all the functions we write and mutate all sorts of variables?

Such variables, which are defined in a file outside of any functions are called global variables (as opposed to local variables). They are "in scope" for all functions in the file. You should never use global variables in this class. There is no good techincal reason to do so for any of the problems in this class. You are guaranteed to be penalized for it.

The lone exception is when you are defining variables for the purpose of naming constants. Why is this okay? Constants remain the same and are not meant to change.

Dictionaries

In Homework 4, you were asked to write a function count_words, which produced a list of (word, count) tuples. Suppose that we wanted to make use of this list and look up the count of a word. How do we do this?


    def how_many(word_counts, term):
        """
        Look up how many instances of term are recorded in word_counts.

        Input:
            word_counts (list[tuple[str,int]]): list of (word, count) pairs
            term (str): term to look up in word_counts

        Output (int): number of instances of term recorded in word_counts
        """
        for word, count in word_counts:
            if term == word:
                return count
        return 0


    >>> counts = hw4.count_words(['computer', 'ibm', 'program', 'modern', 'customer'], hw4_texts.IBM)
    >>> how_many(counts, 'ibm')
    18

How long does this approach take? Since our specification for word counts required that the list of pairs is in the order of the original word list, there's no particular ordering to the list. This means we have to search the entire list for our term. In the worst case, our term isn't actually in the list and we still have to search the entire list to figure this out. This means that the longer the list is, the longer it can take to run this function.

This leads to the question of whether a list of pairs is really the best representation of our data.

So far all of the complex data types that are collections of other data that we have seen have been sequential: lists, tuples, and strings all rely on data being stored in a linear sequence. But this is not the only way to manage collections of data. After all, not all data has a natural linear ordering. Furthermore, we will not want to access data in a linear way, since this necessarily becomes more expensive as we work with larger collections of data.

This sets the stage for our first non-sequential collection data type. Dictionaries are a collection data type that associates values with "keys" rather than position. For this reason, dictionaries are sometimes called key-value structures or mapping structures. The idea behind each name is the same: rather than refer to a value by its position in a list, we can refer to it by a key that is mapped to it.

Dictionaries are denoted by braces. We can create an empty dictionary in this way.


    >>> d = {}
    >>> d
    {}
    >>> type(d)
    dict

We can add items to the dictionary by assignment, in a way that looks very similar to assignment on lists.


    >>> d["A"] = 90
    >>> d
    {'A': 90}

Unlike lists, which would give you an out of range error, if the entry doesn't exist in the dictionary already, it gets added.

Just as lists are represented by a comma-delimited list of items within square brackets and tuples are the same but with parentheses, we represent dictionaries as key-value pairs inside of braces, where each pair is delimited by commas and the key and value are separated by a colon.


    {key_1: value_1, key_2: value_2, ..., key_n: value_n}

Using this notation, we can directly specify a dictionary.


    >>> d = {"A": 90, "A-": 75, "B+": 60, "B": 50}
    >>> d 
    {"A": 90, "A-": 75, "B+": 60, "B": 50}

It's clear that we can store any value we like, as usual, but what about keys? So far, we've been using strings as keys, but what about other types? Can we use integers or floats? The short answer is yes, but you should be careful. In general, you should use keys that are immutable, though the specific constraints are more complicated. You should also be careful not to treat a dictionary as a list if you use integer keys, since dictionaries are not sequential—there is no guarantee of order in a dictionary.

Using the "subscript" notation, we can access dictionary items in the same way as lists, just using a key rather than an index.


    >>> points = d["A"]
    >>> points
    90

If we attempt to access a key that doesn't exist in the dictionary, we get an error, just like we would with lists.


    >>> d[73]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 73

On the other hand, if we attempt to "reassign" a key in our dictionary to another value, we can do this—dictionaries are mutable.


    >>> d['B+'] = 73
    >>> d
    {'A': 90, 'A-': 75, 'B+': 73, 'B': 50}

Notice that this also means that keys are unique—you can only assign one value to a particular key (hence why dictionaries are mapping structures).

Just like lists, we can use in to test membership. Note that this tests whether the key is contained in the dictionary, not the value.


    >>> 'A' in d
    True
    >>> 90 in d
    False

Similarly, we can use del to remove items from the dictionary, again by key.


    >>> del(d['B+'])
    >>> d
    {'A': 90, 'A-': 75, 'B': 50}

Notice that we almost always interact with the dictionary based on keys. For instance, as with other data types that are collections of data, we can iterate through the dictiionary with for. But notice that we iterate through its keys.


    >>> for i in d:
    ...    print(i)
    A
    A-
    B

It's important to remember that although this seems to give us some ordering for the dictionary (and there is an underlying one), we shouldn't think too hard about it because dictionaries are treated as unordered. That is, two dictionaries with the same key-value pairs are considered equal.

It's possible to access only the keys or only the values through the dictionary methods keys and values. However, if we want to iterate through both keys and values, it is more useful to use the items method, which will produce key-value pairs in the form of tuples.


    >>> for k, v in d.items():
    ...    print(k, v)
    A 90
    A- 75
    B 50

Something you may ask from this is what order the keys get iterated in. The official answer is that dictionaries happen to preserve the order of the items in the order that they were inserted in. However, you shouldn't rely on this fact, since dictionaries are considered equal if they contain the same key-value pairs, no matter what order they're in.

Let's revisit the problem that we started with. Here, we'll write a function that takes a list of words and counts the number of occurrences of each word.


    def count_all_words(words):
        """
        Count the number of occurrences of each item in the list.

        Input:
            words (list[str]): list of words

        Output (dict[str,int]): dictionary that maps a string to the number
            of times it occurred in the list
        """
        counts = {}
        for word in words:
            if word not in counts:
                counts[word] = 0
            counts[word] = counts[word] + 1
        return counts


    >>> count_all_words(['it', 'is', 'a', 'melancholy', 'object', 'to',
    ... 'those', 'who', 'walk', 'through', 'this', 'great', 'town', 'or',
    ... 'travel', 'in', 'the', 'country', 'when', 'they', 'see', 'the',
    ... 'streets', 'the', 'roads', 'and', 'cabbin-doors', 'crowded', 'with',
    ... 'beggars', 'of', 'the', 'female', 'sex', 'followed', 'by', 'three',
    ... 'four', 'or', 'six', 'children', 'all', 'in', 'rags', 'and',
    ... 'importuning', 'every', 'passenger', 'for', 'an', 'alms', 'these',
    ... 'mothers', 'instead', 'of', 'being', 'able', 'to', 'work', 'for',
    ... 'their', 'honest', 'livelihood', 'are', 'forced', 'to', 'employ',
    ... 'all', 'their', 'time', 'in', 'stroling', 'to', 'beg', 'sustenance',
    ... 'for', 'their', 'helpless', 'infants', 'who', 'as', 'they', 'grow',
    ... 'up', 'either', 'turn', 'thieves', 'for', 'want', 'of', 'work',
    ... 'or', 'leave', 'their', 'dear', 'native', 'country', 'to', 'fight',
    ... 'for', 'the', 'pretender', 'in', 'spain', 'or', 'sell',
    ... 'themselves', 'to', 'the', 'barbadoes']) 
    {'it': 1, 'is': 1, 'a': 1, 'melancholy': 1, 'object': 1, 'to': 6,
     'those': 1, 'who': 2, 'walk': 1, 'through': 1, 'this': 1, 'great': 1,
     'town': 1, 'or': 4, 'travel': 1, 'in': 4, 'the': 6, 'country': 2,
     'when': 1, 'they': 2, 'see': 1, 'streets': 1, 'roads': 1, 'and': 2,
     'cabbin-doors': 1, 'crowded': 1, 'with': 1, 'beggars': 1, 'of': 3,
     'female': 1, 'sex': 1, 'followed': 1, 'by': 1, 'three': 1, 'four': 1,
     'six': 1, 'children': 1, 'all': 2, 'rags': 1, 'importuning': 1,
     'every': 1, 'passenger': 1, 'for': 5, 'an': 1, 'alms': 1, 'these': 1,
     'mothers': 1, 'instead': 1, 'being': 1, 'able': 1, 'work': 2,
     'their': 4, 'honest': 1, 'livelihood': 1, 'are': 1, 'forced': 1,
     'employ': 1, 'time': 1, 'stroling': 1, 'beg': 1, 'sustenance': 1,
     'helpless': 1, 'infants': 1, 'as': 1, 'grow': 1, 'up': 1, 'either': 1,
     'turn': 1, 'thieves': 1, 'want': 1, 'leave': 1, 'dear': 1, 'native': 1,
     'fight': 1, 'pretender': 1, 'spain': 1, 'sell': 1, 'themselves': 1,
     'barbadoes': 1}

We now have more options for representing different kinds of collections of data.

Lists represent sequential data.
Tuples represent compound data, but fixed-size data.
Dictionaries represent key-value data.

As we have more options, it is worth asking which representation is the most effective one for our data and the problems we want to solve.

We haven't really addressed how this data is represented in memory. Recall that when we discussed lists, we depicted them as a block of contiguous memory. Though there is a bit more to it than that, this is relatively accurate and accurately depicts the idea that lists are traversed sequentially or via position. But what about dictionaries?

The implementation of dictionaries is pretty interesting, but requires some additional mathematics and data structures knowledge to fully grasp. The short answer is that through some clever mathematics, the structure is able to map a key to a position in memory that allows for fast retrieval. This is why dictionaries can be a more effective choice for representing data than lists, given the right applications and circumstances.