CMSC 15100 — Lecture 25

Implementing traversals, continued

Last time, we saw the breadth-first traversal algorithm in code. This is another example of generative recursion and the use of accumulators. Notice that the recursion here occurs on the queue. Unlike in many of the other functions we've seen, the queue does not steadily shrink towards the base case. In fact, we begin with only one element on the queue and add elements to it as the algorithm runs. So how do we guarantee that the recursion stops?

Curiously, this is guaranteed by the marked vertices. Though it does not represent the final result, the list of marked vertices acts as context for the computation in the same way that an accumulator does. As the list of marked vertices grows, the number of elements that can be added to the queue shrinks, until there are no additional candidates left.

Now, let's return to depth-first traversal. We need one more helper function: testing whether a vertex has been marked.


(: marked? (-> Symbol (Listof Symbol) Boolean))
;; Given a vertex label and list of marked vertices, determine whether 
;; the vertex has been marked.
(define (marked? v marked)
  (ormap (lambda ([u : Symbol]) (symbol=? u v)) marked))
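
;; For example (sample checks, not part of the course code):
(marked? 'b (list 'a 'b 'c))  ;; evaluates to #t
(marked? 'd (list 'a 'b 'c))  ;; evaluates to #f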
    

This gives us the following function.


(: depth-first-traversal (-> Symbol Graph (Listof Symbol)))
;; Produce a list of labels of vertices visited via depth-first traversal
(define (depth-first-traversal v g)
  (local
    {(: dft (-> Graph (Stack Symbol) (Listof Symbol) (Listof Symbol)))
     (define (dft g s marked)
       (match (pop s)
         ['None '()]
         [(Some (cons curr next))
          (local
            {(define us (unmarked-neighbours curr g marked))}
            (if (marked? curr marked)
                (dft g next marked)
                (cons curr (dft g 
                                (push-all us next) 
                                (append marked (list curr))))))]))}
    (dft g (push v '()) '())))
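
To see the traversal order concretely, here is a small, self-contained sketch of the same algorithm. Note that a plain list stands in for the Stack, an association list stands in for the course's Graph type, and sample-graph, neighbours, and dft-demo are illustrative names, not part of the course library. For simplicity, this version pushes all neighbours and relies solely on the marked? check to skip repeats.


(define-type AdjList (Listof (Pairof Symbol (Listof Symbol))))

(: sample-graph AdjList)
;; a stand-in graph: each vertex paired with its list of neighbours
(define sample-graph
  (list (cons 'a (list 'b 'c))
        (cons 'b (list 'd))
        (cons 'c (list 'd))
        (cons 'd '())))

(: neighbours (-> Symbol AdjList (Listof Symbol)))
;; look up the neighbour list of a vertex in the association list
(define (neighbours v adj)
  (match adj
    ['() '()]
    [(cons (cons u ns) rest)
     (if (symbol=? u v) ns (neighbours v rest))]))

(: dft-demo (-> Symbol AdjList (Listof Symbol)))
;; depth-first traversal, with a plain list standing in for the Stack
(define (dft-demo start adj)
  (local
    {(: go (-> (Listof Symbol) (Listof Symbol) (Listof Symbol)))
     (define (go stack marked)
       (match stack
         ['() '()]
         [(cons curr rest)
          (if (marked? curr marked)
              (go rest marked)
              (cons curr
                    (go (append (neighbours curr adj) rest)
                        (append marked (list curr)))))]))}
    (go (list start) '())))

(dft-demo 'a sample-graph)  ;; evaluates to '(a b d c)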
    

State

One of the things that many people find strange about the functional programming model of computation is the fact that there is no such thing as "state". In other words, there is no mechanism for the computer to "remember" things, nor is there any way for us to "update" this knowledge.

In some ways, we get close to the idea of "memory" when we deal with Universe programs and accumulators. In the case of Universe programs, the World can be considered the "state" of the program, while in the case of accumulators, the accumulators "remember" the context of previous levels of recursion. However, both of these are still just values: they cannot be modified in place. Every time we "update" a World or an accumulator, we are really tossing out the old one and creating an entirely new one.

It's worth thinking about why we chose to defer using such a mechanism and how this fits into the broader discipline of computer science. Recall that computer science, despite our best efforts, has its roots firmly in mathematics. In mathematics, we're not especially concerned with how long a process takes; once a statement is assumed or proved, we can simply use it. This thinking extends to the idea of state: it's simply not necessary to think about.

This changes when we have to work with real machines. Real machines have state and can remember things. This adds a complication to how we think about programs. Before, all we had to be concerned with was the input and output of functions. But what happens when functions can do things other than produce a value? All of a sudden, our assumption that a function produces exactly one thing no longer necessarily holds.

When a function does this, we say that it has side effects. An example of this is the display function. This function produces no useful output value, but it does something: it takes your input and puts it on the screen. This is not technically an output, because there's no value produced that can be passed on to some other function. How do we reason about such functions?
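
For instance, running display in the interactions pane prints its argument, but gives back nothing we can use:


(display "hello")  ;; prints hello, but produces no value to pass along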

We saw that even without true state, reasoning about Worlds or accumulators is more difficult than reasoning about structural recursion. It's because of this that we avoided notions of state when first introducing ideas about computer science. But now, you should have enough familiarity with working with data and abstractions to begin thinking about ideas that don't fit neatly into the paradigm of functions taking inputs and producing outputs.

Vectors

A vector is a fixed-length array of arbitrary values. Vectors can be created in a similar way to lists, by using the vector function:


(vector 1 2 3)
    

If you try running this in the interactions pane, DrRacket will tell you that this is


'#(1 2 3)
    

Note that this is very similar to the result of (list 1 2 3), which is '(1 2 3). The difference is the #, which denotes that this is a vector rather than a list.

Like lists, vectors are a polymorphic type and have the type constructor Vectorof. So the above vector can be said to be a (Vectorof Integer).

To most people, the difference between lists and vectors is insignificant: they're both linear structures that hold a bunch of stuff. In some languages and settings, they're treated very similarly. But the underlying ideas behind each of these structures are quite different. As we've seen, lists are a recursive structure. But what about vectors?

Note that a vector is of fixed length. This differs from a list, which can grow and shrink. But why is it important that a vector has fixed length?

Recall that one of the issues that we keep running into with lists is that in order to access a specific item in the middle of our list, say the $n$th item, we need to traverse our list in order to get there. In other words, we need to step through $n$ things to get there, which can be a problem if $n$ is large.

But since vectors have a fixed size, they can support immediate access to any element in the vector. To understand why this is, we need to dig a bit deeper into how they're stored in the computer.

When programs are run, they use the computer's memory. Memory is not just an abstract concept or a massive cloud or blob; it has structure. Crucially, when something is stored in memory, the computer needs to know where to find it. Each of these locations has an address.

Recall that a list has a head and a tail, which is another list. Internally, these can be thought of as two pieces of information: the value we want and the address of another list. This is why it's necessary to comb through a list: the address for the next part of the list can be anywhere. In essence, we're forced to follow this trail of addresses as though they were breadcrumbs.

In contrast, since vectors have a fixed size, we can reserve a contiguous chunk of memory for them. This allows us to calculate the address of a particular element in the vector very easily: suppose the front of the vector sits at address $a$, and suppose for simplicity that each element occupies one unit of memory. Then the $i$th element of the vector can be found at the address $a+i$ (incidentally, this is one reason why we start counting at 0).

You will see a lot more of this kind of thing in CS 152.

Vectors have a variety of built-in functions that are basically carbon copies of list functions. For instance, we can access a particular element in the array:


(: vector-ref (All (A) (-> (Vectorof A) Integer A)))
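;; For example, access is immediate, no matter the index:
(vector-ref (vector 10 20 30) 0)  ;; evaluates to 10
(vector-ref (vector 10 20 30) 2)  ;; evaluates to 30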
    

We can also ask for the length of a vector:


(: vector-length (All (A) (-> (Vectorof A) Integer)))
    

Recall that finding the length of a list requires us to traverse the list. With vectors, we can avoid that because the size is set when the vector is first created, which means the vector can just keep that information around. This is particularly useful for protecting the program against attempts to access memory addresses that are outside of the vector (a common source of bugs and security issues).
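
For example:


(vector-length (vector 'a 'b 'c))  ;; evaluates to 3, with no traversal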

We can convert a list to a vector and vice-versa:


(: vector->list (All (A) (-> (Vectorof A) (Listof A))))
(: list->vector (All (A) (-> (Listof A) (Vectorof A))))
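;; For example:
(vector->list (vector 1 2 3))  ;; evaluates to '(1 2 3)
(list->vector (list 1 2 3))    ;; evaluates to '#(1 2 3)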