Recall that we can view the list of projects described in the stimulus data as a tree. We want to aggregate these values so, for example, the IL node contained the sum of all the project nodes under it.
This involves two things:
So let's think about how we might do this. We have a tree, so let's break it up into the base and recursive cases.
This gives us the following function (notice it's not a method, sometimes we want to compute things about trees).
def aggregate_stimulusvalues(tree):
"""
Given a tree, aggregate the values of the tree and fill in the
value.
Input:
tree (Tree): tree containing integers as values
Output: sum of values in subtrees of tree (int)
"""
if tree.children == []:
return tree.value
else:
total_cost = 0
for subtree in tree.children:
total_cost += aggregate_stimulusvalues(subtree)
tree.value = total_cost
return total_cost
Trees are useful not just because they can represent hierarchical data, but because, like lists, they are a natural starting point for many other data structures. One such data structure is a binary search tree.
The idea behind a binary search tree is that it's a way to store a set of data that has an ordering and stores it in a way that it can be retrieved quickly. BSTs have two important properties that make them different from ordinary trees.
Given a non-empty tree with value $k$, all the nodes in the left child must have values that are strictly less than $k$ and all the nodes in the right child must have values that are strictly greater than $k$.
Here's an example.
48
├──28
│ ├──15
│ │ ├──EMPTY
│ │ └──21
│ │ ├──EMPTY
│ │ └──EMPTY
│ └──31
│ ├──EMPTY
│ └──EMPTY
└──56
├──54
│ ├──EMPTY
│ └──EMPTY
└──93
├──92
│ ├──EMPTY
│ └──EMPTY
└──94
├──EMPTY
└──EMPTY
In this rendering, each node has two subtrees: the "upper" one is the left subtree and the "lower" one is the right subtree.
So our BST class will look a bit different from the ordinary Tree class.
class BinarySearchTree:
"""
A class representing a binary search tree. A binary search tree is
either
- empty, or
- a node containing a value with a left subtree and a right
subtree, such that the binary search tree condition is satisfied.
Attributes:
- value: a value, which is None if empty
- left: BinarySearchTree, or None if empty
- right: BinarySearchTree, or None if empty
"""
def __init__(self, value=None):
"""
Creates a new BST with root value. If the given value is None,
then this creates an empty BST.
"""
if value is None:
self.value = None
self.left = None
self.right = None
else:
self.value = value
self.left = BinarySearchTree()
self.right = BinarySearchTree()
def is_empty(self):
"""
Returns True if the tree is empty and False otherwise.
"""
return self.value is None
Not only is the structure of the tree constrained, but since we have a particular property we need to preserve, we need to control access to it. We've seen this idea before—stacks and queues can be based on lists but have their access controlled. So similarly, BSTs are based on trees and have their access controlled. There are many other data structures (quadtrees, binary heaps, etc.) that are based on restrictions on trees.
So instead of using add_child, we need an insert function specifically for BSTs. There is an algorithm for this and we find that as we insert items into the tree, there's exactly one place for each item to go. An observation that we can make is that the exact tree you get depends on the order that you insert the items.
For example, here is another possible BST using the same integers as above. In the above example, we inserted items into the tree in the following order: 48, 28, 56, 15, 31, 54, 93, 21, 92, 94. In the following example, we insert items in the following order: 54, 21, 92, 15, 28, 56, 93, 31, 94, 48.
54
├──21
│ ├──15
│ │ ├──EMPTY
│ │ └──EMPTY
│ └──28
│ ├──EMPTY
│ └──31
│ ├──EMPTY
│ └──48
│ ├──EMPTY
│ └──EMPTY
└──92
├──56
│ ├──EMPTY
│ └──EMPTY
└──93
├──EMPTY
└──94
├──EMPTY
└──EMPTY
Instead, let's talk about how BSTs improves lookup by implementing __contains__. Recall that for Tree, we needed to check every child of a node for our value. We claim that a BST improves on this.
False.
True.
This gives us the following algorithm.
def __contains__(self, item):
"""
Determine whether the given item is present in the tree.
Input:
item (int): the item to be found
Output: True if item is in the tree, False otherwise (bool)
"""
if self.is_empty():
return False
else:
if item == self.value:
return True
elif item < self.value:
return self.left.__contains__(item)
else:
return self.right.__contains__(item)
And now that we've seen this, it turns out we do essentially the same thing when we insert an item: we need to find where the item should be and if it isn't there already, we add it.
def insert(self, item):
"""
Add the given item into the tree if it isn't already present in the
tree.
Input:
item (int): the item to be inserted
"""
if self.is_empty():
self.value = item
self.left = BinarySearchTree()
self.right = BinarySearchTree()
else:
if item < self.value:
self.left.insert(item)
elif item > self.value:
self.right.insert(item)
In theory, searching in a binary search tree is a huge improvement: we only ever go down exactly one branch of the tree. With a list or an ordinary tree, we have to check every element, since we don't know anything about how the data is organized. But here, our property allows us to cut down our search space by up to a half on each step in ideal conditions.
This sounds very impressive until you notice that I've couched my words a bit. Why only up to half the search space? What does ideal conditions mean?