CMSC 14100 — Lecture 24

Binary search

Here's another place where the immediate recursive solution gives us something that's not as fast as it could be: searching a sorted list.

Why? Following the recursive structure of the list means we have to look at each item in order. Obviously to do any faster we have to break out of that and "skip" some elements somehow.

If we think back to binary search trees, we can borrow an idea: try to figure out where to look.


    def binary_search(lst, x):
        """
        Searches a sorted list for the given element.

        Input:
            lst (list[int]): list of integers
            x (int): item to find

        Output: True if x is in lst, False otherwise.
        """
        if lst == []:
            return False
        else:
            mid = len(lst) // 2
            if x == lst[mid]:
                return True
            elif x < lst[mid]:
                return binary_search(lst[0:mid], x)
            else:
                return binary_search(lst[mid+1:], x)
    

This is like our binary search trees, in that if we look in the middle of the list, we naturally divide our search space in half (though with binary search trees this was not a given).

There is one problem with this: we're creating new lists whenever we do some slicing. While we're creating smaller and smaller lists, this cost still adds up.

One trick is to use what we learned with our discussion about accumulation: pass some information along. The observation here is that instead of recreating copies of the relevant parts of the list, we just pass the endpoints for the sublist and restrict our search there. So instead of re-creating the left half of the list, we just say "look in between these two indices"). This way, we're not creating entire lists at each step.


    def binary_search(lst, x, low, high):
        """
        Performs binary search for item in list in-place.

        Input:
            lst (list[int]): list of integers
            x (int): item to look for
            low (int): left index of search space in list
            high (int): right index of search space in list
        
        Output: index of item if item is lst, False otherwise
        """
        if (low >= high):          
            return False
        else:
            mid = (low + high)//2;
            if x == lst[mid]:
                return mid
            elif x < lst[mid]:
                return binary_search(lst, x, low, mid)
            else:
                return binary_search(lst, x, mid+1, high)
    

Since this approach only passes some integers down to the next function call, we get substantial improvements in speed.

Comprehensions

Recall that whenever we want to transform a list into another list, we have a standard mapping pattern,


    lst = [3, 1, 9, 3]
    new_lst = []
    for item in lst:
        new_item = item ** 2
        new_list.append(new_item)
    

It turns out there is a shorthand for this.


    new_lst = [item ** 2 for item in lst]
    

These expressions are called comprehensions. Comprehensions are used to describe collecdtions of items. This comes from the term set comprehension, which you may recall from math: \[\{2x \mid x \in \{0,1,\dots, 9\} \}.\]

We can write comprehensions over sets,


    {2*x for x in range(10)}
    

We would call this a set comprehension, while a comprehension over lists is a list comprehension.

We would read them out in very much the same way, that this set contains $2x$ for $x$ from 0 to 9, and this reading works whether you have a list or set.


    new_lst = [<expression> for <item> in <collection>]
    new_set = {<expression> for <item> in <collection>}
    

Anyhow, comprehensions give a very convenient way to describe new lists or sets. However, sometimes we may not want to simply map every item in a list to another item, but we may also want to do some filtering. For instance, consider the following,


    lst = [2,4,3,4]
    new_lst = []
    for item in lst:
        if item % 2 == 0:
            new_item = item ** 2
            new_list.append(new_item)
    

Here, we don't jut take every item, we select based on a test. This test can be incorporated into a comprehension in the following way.


    new_lst = [item ** 2 for item in lst if item % 2 == 0]
    

The boolean condition acts as a filter on the items and those that satisfy the condition are included, while those that do not are left out.


    new_lst = [<expression> for <item> in <collection> if <condition>]
    

Finally, suppose that we have something similar to a filtering test, but instead of filtering out elements, we want to include a different mapping. For instance,


    lst = [2,4,3,4]
    new_lst = []
    for item in lst:
        new_item = item ** 1/2 
        if item % 2 == 0:
            new_item = item ** 2
        new_list.append(new_item)
    

We can use the following.


    new_lst = [item ** 2 if item % 2 == 0 else item ** 1/2 for item in lst]
    

This is a bit strange, since before, the if was at the end of the expression, but here, it's before. What gives?

It turns out that this is actually not a feature of the comprehension expression, but are expressions in their own right. The expression


    <var> = <val1> if <condition> else <val2>
    

will assign to var the value val1 if the condition is met, otherwise it will assign val2. This is equivalent to


    if <condition>:
        <var> = <val1> 
    else:
        <var> = <val2> 
    
or

    <var> = <val2> 
    if <condition>:
        <var> = <val1> 
    

Note that since this construct is an expression, we can use it whereever we use expressions, like in the following


    lst = [2,4,3,4]
    new_lst = []
    for item in lst:
        new_item = item ** 2 if item % 2 == 0 else item ** 1/2
        new_list.append(new_item)
    

This is a bit hard to read. Personally, I like to break up these expressions so they're a bit easier to read in the following way.


    lst = [2,4,3,4]
    new_lst = []
    for item in lst:
        new_item = (item ** 2 
                    if item % 2 == 0 
                    else item ** 1/2)
        new_list.append(new_item)
    

Here, each value is on its own line and the condition is also on its own line.

One thing to note is that this expression must have an else value, since expressions must evaluate to a value. If you try otherwise, you will get a syntax error!

We can also describe dictionaries using comprehensions, though it's a bit more complicated.


    lst = ["one", "two", "three", "four"]
    d = {s: len(s) for s in lst}
    

In the case of dictionaries, we need to provide two items: a key and a value. As usual, you want to be careful because both dictionaries and sets use the same curly braces.

Also keep in mind that you can take advantage of tuple unpacking in the usual way.


    vals = [v for k, v in d.items()]
    

What about tuple comprehensions? One would very reasonably conclude that we can write them in the same way:


    (2*x for x in range(10))
    

Unfortunately, this does not produce a tuple, for reasons similar to the fact that {} does not produce an empty set: the notation is overloaded to mean more than one thing. What we have above is a generator expression, which creates a generator, which is a special kind of object that we won't be getting into in this class. Luckily, you can turn one of these things into a tuple in the usual way.