CMSC 14100 — Lecture 21

Recall that we can view the list of projects described in the stimulus data as a tree. We want to aggregate these values so, for example, the IL node contained the sum of all the project nodes under it.

This involves two things:

  1. computing a value based on the subtrees, and
  2. mutating the tree so we store the values that we get.

So let's think about how we might do this. We have a tree, so let's break it up into the base and recursive cases.

This gives us the following function (notice it's not a method, sometimes we want to compute things about trees).


def aggregate_stimulusvalues(tree):
    """
    Given a tree, aggregate the values of the tree and fill in the 
    value.

    Input:
        tree (Tree): tree containing integers as values
    
    Output: sum of values in subtrees of tree (int)
    """
    if tree.children == []:
        return tree.value
    else:
        total_cost = 0
        for subtree in tree.children:
            total_cost += aggregate_stimulusvalues(subtree)
        tree.value = total_cost
        return total_cost
    

Binary search trees

Trees are useful not just because they can represent hierarchical data, but because, like lists, they are a natural starting point for many other data structures. One such data structure is a binary search tree.

The idea behind a binary search tree is that it's a way to store a set of data that has an ordering and stores it in a way that it can be retrieved quickly. BSTs have two important properties that make them different from ordinary trees.

  1. BSTs are binary trees. That is, every tree has exactly two children. These are called the left child and the right child. It is important that they're ordered and it is important that they must both exist. (Recall that a tree can be empty!)
  2. BSTs have the binary search property:
    Given a non-empty tree with value $k$, all the nodes in the left child must have values that are strictly less than $k$ and all the nodes in the right child must have values that are strictly greater than $k$.

Here's an example.

 48
  ├──28
  │  ├──15
  │  │  ├──EMPTY
  │  │  └──21
  │  │     ├──EMPTY
  │  │     └──EMPTY
  │  └──31
  │     ├──EMPTY
  │     └──EMPTY
  └──56
     ├──54
     │  ├──EMPTY
     │  └──EMPTY
     └──93
        ├──92
        │  ├──EMPTY
        │  └──EMPTY
        └──94
           ├──EMPTY
           └──EMPTY

In this rendering, each node has two subtrees: the "upper" one is the left subtree and the "lower" one is the right subtree.

So our BST class will look a bit different from the ordinary Tree class.


class BinarySearchTree:
    """
    A class representing a binary search tree. A binary search tree is 
    either
    - empty, or
    - a node containing a value with a left subtree and a right 
      subtree, such that the binary search tree condition is satisfied.

    Attributes:
    - value: a value, which is None if empty
    - left: BinarySearchTree, or None if empty
    - right: BinarySearchTree, or None if empty
    """

    def __init__(self, value=None):
        """
        Creates a new BST with root value. If the given value is None,
        then this creates an empty BST.
        """
        if value is None:
            self.value = None
            self.left = None
            self.right = None
        else:
            self.value = value
            self.left = BinarySearchTree()
            self.right = BinarySearchTree()

    def is_empty(self):
        """
        Returns True if the tree is empty and False otherwise.
        """
        return self.value is None
    

Not only is the structure of the tree constrained, but since we have a particular property we need to preserve, we need to control access to it. We've seen this idea before—stacks and queues can be based on lists but have their access controlled. So similarly, BSTs are based on trees and have their access controlled. There are many other data structures (quadtrees, binary heaps, etc.) that are based on restrictions on trees.

So instead of using add_child, we need an insert function specifically for BSTs. There is an algorithm for this and we find that as we insert items into the tree, there's exactly one place for each item to go. An observation that we can make is that the exact tree you get depends on the order that you insert the items.

For example, here is another possible BST using the same integers as above. In the above example, we inserted items into the tree in the following order: 48, 28, 56, 15, 31, 54, 93, 21, 92, 94. In the following example, we insert items in the following order: 54, 21, 92, 15, 28, 56, 93, 31, 94, 48.

 54
  ├──21
  │  ├──15
  │  │  ├──EMPTY
  │  │  └──EMPTY
  │  └──28
  │     ├──EMPTY
  │     └──31
  │        ├──EMPTY
  │        └──48
  │           ├──EMPTY
  │           └──EMPTY
  └──92
     ├──56
     │  ├──EMPTY
     │  └──EMPTY
     └──93
        ├──EMPTY
        └──94
           ├──EMPTY
           └──EMPTY

Instead, let's talk about how BSTs improves lookup by implementing __contains__. Recall that for Tree, we needed to check every child of a node for our value. We claim that a BST improves on this.

This gives us the following algorithm.


    def __contains__(self, item):
        """
        Determine whether the given item is present in the tree.

        Input:
            item (int): the item to be found

        Output: True if item is in the tree, False otherwise (bool)
        """
        if self.is_empty():
            return False
        else:
            if item == self.value:
                return True
            elif item < self.value:
                return self.left.__contains__(item)
            else:
                return self.right.__contains__(item)
    

And now that we've seen this, it turns out we do essentially the same thing when we insert an item: we need to find where the item should be and if it isn't there already, we add it.


    def insert(self, item):
        """
        Add the given item into the tree if it isn't already present in the 
        tree.

        Input:
            item (int): the item to be inserted
        """
        if self.is_empty():
            self.value = item
            self.left = BinarySearchTree()
            self.right = BinarySearchTree()
        else:
            if item < self.value:
                self.left.insert(item)
            elif item > self.value:
                self.right.insert(item)
    

In theory, searching in a binary search tree is a huge improvement: we only ever go down exactly one branch of the tree. With a list or an ordinary tree, we have to check every element, since we don't know anything about how the data is organized. But here, our property allows us to cut down our search space by up to a half on each step in ideal conditions.

This sounds very impressive until you notice that I've couched my words a bit. Why only up to half the search space? What does ideal conditions mean?