CMSC 14100 — Lecture 21

Recall that we can view the list of projects described in the stimulus data as a tree. We want to aggregate these values so, for example, the IL node contained the sum of all the project nodes under it.

This involves two things:

computing a value based on the subtrees, and
mutating the tree so we store the values that we get.

So let's think about how we might do this. We have a tree, so let's break it up into the base and recursive cases.

Here, the base case is not the empty tree! It's actually the leaf nodes, since they contain the values we're looking for. Then the value we want is just the value of the leaf node and we don't need to change anything about the node.
Now, consider the recursive case. Our problem is that we want the value of the node to take on the sum of all the nodes underneath it. This means we need to inspect each node and compute its sum. This is where we apply our recursive call. We take this result and set the value in our node and then pass on the result so the parent node can use it.

This gives us the following function (notice it's not a method, sometimes we want to compute things about trees).


def aggregate_stimulusvalues(tree):
    """
    Given a tree, aggregate the values of the tree and fill in the 
    value.

    Input:
        tree (Tree): tree containing integers as values
    
    Output: sum of values in subtrees of tree (int)
    """
    if tree.children == []:
        return tree.value
    else:
        total_cost = 0
        for subtree in tree.children:
            total_cost += aggregate_stimulusvalues(subtree)
        tree.value = total_cost
        return total_cost

Binary search trees

Trees are useful not just because they can represent hierarchical data, but because, like lists, they are a natural starting point for many other data structures. One such data structure is a binary search tree.

The idea behind a binary search tree is that it's a way to store a set of data that has an ordering and stores it in a way that it can be retrieved quickly. BSTs have two important properties that make them different from ordinary trees.

BSTs are binary trees. That is, every tree has exactly two children. These are called the left child and the right child. It is important that they're ordered and it is important that they must both exist. (Recall that a tree can be empty!)
BSTs have the binary search property:
Given a non-empty tree with value $k$, all the nodes in the left child must have values that are strictly less than $k$ and all the nodes in the right child must have values that are strictly greater than $k$.

Here's an example.

 48
  ├──28
  │  ├──15
  │  │  ├──EMPTY
  │  │  └──21
  │  │     ├──EMPTY
  │  │     └──EMPTY
  │  └──31
  │     ├──EMPTY
  │     └──EMPTY
  └──56
     ├──54
     │  ├──EMPTY
     │  └──EMPTY
     └──93
        ├──92
        │  ├──EMPTY
        │  └──EMPTY
        └──94
           ├──EMPTY
           └──EMPTY

In this rendering, each node has two subtrees: the "upper" one is the left subtree and the "lower" one is the right subtree.

So our BST class will look a bit different from the ordinary Tree class.


class BinarySearchTree:
    """
    A class representing a binary search tree. A binary search tree is 
    either
    - empty, or
    - a node containing a value with a left subtree and a right 
      subtree, such that the binary search tree condition is satisfied.

    Attributes:
    - value: a value, which is None if empty
    - left: BinarySearchTree, or None if empty
    - right: BinarySearchTree, or None if empty
    """

    def __init__(self, value=None):
        """
        Creates a new BST with root value. If the given value is None,
        then this creates an empty BST.
        """
        if value is None:
            self.value = None
            self.left = None
            self.right = None
        else:
            self.value = value
            self.left = BinarySearchTree()
            self.right = BinarySearchTree()

    def is_empty(self):
        """
        Returns True if the tree is empty and False otherwise.
        """
        return self.value is None

Not only is the structure of the tree constrained, but since we have a particular property we need to preserve, we need to control access to it. We've seen this idea before—stacks and queues can be based on lists but have their access controlled. So similarly, BSTs are based on trees and have their access controlled. There are many other data structures (quadtrees, binary heaps, etc.) that are based on restrictions on trees.

So instead of using add_child, we need an insert function specifically for BSTs. There is an algorithm for this and we find that as we insert items into the tree, there's exactly one place for each item to go. An observation that we can make is that the exact tree you get depends on the order that you insert the items.

For example, here is another possible BST using the same integers as above. In the above example, we inserted items into the tree in the following order: 48, 28, 56, 15, 31, 54, 93, 21, 92, 94. In the following example, we insert items in the following order: 54, 21, 92, 15, 28, 56, 93, 31, 94, 48.

 54
  ├──21
  │  ├──15
  │  │  ├──EMPTY
  │  │  └──EMPTY
  │  └──28
  │     ├──EMPTY
  │     └──31
  │        ├──EMPTY
  │        └──48
  │           ├──EMPTY
  │           └──EMPTY
  └──92
     ├──56
     │  ├──EMPTY
     │  └──EMPTY
     └──93
        ├──EMPTY
        └──94
           ├──EMPTY
           └──EMPTY

Instead, let's talk about how BSTs improves lookup by implementing __contains__. Recall that for Tree, we needed to check every child of a node for our value. We claim that a BST improves on this.

Consider the base case: an empty tree. Then clearly our value is not in this tree, so the result is False.
Then the recursive case is a node with two subtrees.
- If the value is in the current node, then we're done! The answer is True.
- But what if it's not? For an ordinary tree, we need to check each subtree. But in a BST, there are only two! Not only that, we know something important about these subtrees: everything on the left is guaranteed to have a value less than that of the current node. Conversely, everything on the right is guaranteed to have a value greater than that of the current node. This gives us a way forward.
  - If our value is less than the value of the current node, we only check the left subtree.
  - If our value is greater than the value of the current node, we only check the right subtree.

This gives us the following algorithm.


    def __contains__(self, item):
        """
        Determine whether the given item is present in the tree.

        Input:
            item (int): the item to be found

        Output: True if item is in the tree, False otherwise (bool)
        """
        if self.is_empty():
            return False
        else:
            if item == self.value:
                return True
            elif item < self.value:
                return self.left.__contains__(item)
            else:
                return self.right.__contains__(item)

And now that we've seen this, it turns out we do essentially the same thing when we insert an item: we need to find where the item should be and if it isn't there already, we add it.


    def insert(self, item):
        """
        Add the given item into the tree if it isn't already present in the 
        tree.

        Input:
            item (int): the item to be inserted
        """
        if self.is_empty():
            self.value = item
            self.left = BinarySearchTree()
            self.right = BinarySearchTree()
        else:
            if item < self.value:
                self.left.insert(item)
            elif item > self.value:
                self.right.insert(item)

In theory, searching in a binary search tree is a huge improvement: we only ever go down exactly one branch of the tree. With a list or an ordinary tree, we have to check every element, since we don't know anything about how the data is organized. But here, our property allows us to cut down our search space by up to a half on each step in ideal conditions.

This sounds very impressive until you notice that I've couched my words a bit. Why only up to half the search space? What does ideal conditions mean?