CMSC 15100 — Lecture 6

Pattern matching

While structures are very handy for representing many kinds of data and they are very convenient in how Racket produces all of the necessary machinery for interacting with them, the process of extracting fields is quite cumbersome. We can mitigate some of this by using local definitions, as we saw last class, but it still seems a bit inelegant.

For instance, suppose we wanted to work with our fictional Property structure. We'd have to do something like the following


(define-struct Property
  ([address : String]
   [land-value : Integer]
   [trees : Integer]
   [impv-area : Integer]
   [district : String]
   [pops : Boolean]))

...

(local
  {(define addr (Property-address pr))
   (define val (Property-land-value pr))
   (define trees (Property-trees pr)) ...})

Setting this laundry list up is slightly better than having to repeatedly write (Property-...) but not by much. However, there is an alternative mechanism for this: pattern matching.

To do this, we use a match expression, which has the following form:


(match expr
  [pat-1 expr-1]
  [pat-2 expr-2]
  ...
  [pat-n expr-n])

The interpretation of this statement is that it will evaluate expr and try to match the value against one of the given patterns. If expr evaluates to a value that corresponds to a pattern the entire expression will evaluate to the expression for that pattern.

Here, each pat is a pattern. But what is a pattern? Surprisingly, a pattern is not the same thing as an expression. There is a lengthy formal definition of what constitutes a pattern, but we will not be so exhaustive. Here are the pieces relevant for our purposes.

A pattern can be a

constant, such as a Boolean, String, or Number or other literal constant,
name/variable,
wildcard (denoted by _, an underscore), or
a structure pattern.

Let's see what happens with each. The constant is the easiest: the expression simply takes the branch that matches the value exactly.


(match (* 2 3)
  [3 "first"]
  [6 "second"]
  ['a "third"])

A few things to note from this example. First, unlike cond expressions, there is no requirement that we need to cover every possible case. Secondly, it's important to remember that patterns are not just expressions. For instance, the following would not be a valid match statement and Racket will be upset with it.


(match (* 2 3)
  [(* 1 6) "first"]
  [6 "second"]
  ['a "third"])

Next, matching a name or variable is quite simple: every expression will match to a name.


(match (* 2 3)
  [72 "first"]
  [n (+ n 100)]
  ['a "third"])

An interesting thing occurs when we match on a name: the expression becomes bound to the name and we can use it in our result. For example, the above expression matches on the name n and we can compute a result based on n.

The wildcard, _ is similar. This will match every expression, but unlike names, the result is discarded.


(match (* 2 3)
  [72 "first"]
  [_ 100]
  ['a "third"])

However, it is the structure matching that is going to be of the most interest to us. Suppose we have a point p. Currently, if we want to bind names to the fields of p, we would have to write something like


(local
  {(define x (Point-x p))
   (define y (Point-y p))}
  expr)

If we use pattern matching, our expression becomes


(match p
  [(Point x y) expr])

We can return to our quadrant example from last class and rewrite it as follows.


(: quad (-> Point Integer))
;; Given a point, which quadrant does it lie in?
(define (quad p)
  (match p
    [(Point _ 0) 0]
    [(Point 0 _) 0]
    [(Point x y) (cond
                   [(and (> x 0) (> y 0)) 1]
                   [(and (< x 0) (> y 0)) 2]
                   [(and (< x 0) (< y 0)) 3]
                   [else 4])]))

More generally, structure patterns have the form


(struct-name pat1 pat2 ... patn)

where struct-name is the name of a structure type and $n$ patterns for $n$ fields, in the order of the fields as defined in the structure. Note from our example above that we made use of constants, wildcards, and names in our structure pattern, but this also suggests that structure patterns can contain other structure patterns.

It's important to note that match expressions are used differently from cond expressions. Because they have a very similar structure, it's easy to conflate the two. But cond is for testing conditions, which can be any boolean expression one wishes, while match simply tries to match a given expression to some value.

Union Types

Sometimes we want things to be of more than one possible type. For example, consider your instructors, who will be compiling your grades at the end of the quarter. At the University of Chicago, we use letter grades for everything. How should these be represented? Since these seem like text, strings appear to be a natural choice. But since there are only a small handful of strings that are actually valid grades, symbols would also be a fine choice. For simplicity, we'll go with strings for now.

But not every school uses letter grades. For example, one of your instructors has spent most of his career at schools which use a percentage grade system, which assigns a grade from 0 to 100. So one might think that a reasonable type to use for such a system is Integer. However, as you may or may not be aware, grades can be more than just your quality grade: we can have Pass/Fail grading, codes for indicating whether a student has withdrawn, and many other special cases. So it makes sense that we might want to make grades both Integer or String.

This is the idea behind what are called union types in Racket. These go by several different names: tagged unions, sum types, variant types. The idea is that something can be one of several types (but not at the same time).

As an aside, this is in contrast with structures, or product types, also called tuples or records. Sum types (unions) and product types (structures) are the most common examples of algebraic data types.

First, let's define the syntax for union types. A union type consisting of types Type-1, Type-2, ..., Type-k is defined by

(U Type-1 Type-2 ... Type-k)

You can then use this type anywhere you'd use a type name. For example, from our grade example above, we may have the following grade field in a structure for student information

(define-struct Student
  (...
   [grade : (U Integer String)]
   ...))

Of course, using (U Integer String) everywhere is kind of a pain and not very illuminating. We can give this type a name, using define-type.

(define-type Grade (U Integer String))

Recall that symbols define the type consisting of themselves. A very simple use of union types is to collect a bunch of symbols into a single type. These are called enumerations because they are really just finite sets that we enumerate. An example of these would be something like, say, cardinal directions.

(define-type Directions (U 'north 'east 'south 'west))

Defining a type for enumerations like this is very useful because one of our problems with using strings or symbols before was that there are very many more possible strings and symbols than we'd like to account for. The type checker would be displeased if we didn't handle those cases. For example, something like

(: time24 (-> ... Symbol Image))
(define (time24 ... ampm)
  (cond
    [(symbol=? 'am ampm) ...]
    [(symbol=? 'pm ampm) ...]))

would set the type checker off because what if we gave it 'sourdough as a possible am/pm? Using a type like (U 'am 'pm) instead of Symbol fixes this by giving Racket enough information to determine that we've covered all of the cases. We can then push the responsibility of checking for bad inputs back onto the type checker, which is really the entire job of the type checker anyway.

But recall that structs also define a type! Suppose we want to define a structure for a circle:


(define-struct Circle
  ([centre : Point]
   [radius : Real]
   [colour : Image-Color]))

That wasn't so hard. We can do something similar for rectangles.


(define-struct Rectangle
  ([centre : Point]
   [width : Real]
   [height : Real]
   [colour : Image-Color]))

Now, these are both shapes that happen to have sizes and can be drawn by Racket. What we could do is define a functions for doing these things for each type on their own. But if we think about it, these are similar enough that they will likely have the same kinds of functions associated with them: looking up widths, heights, getting drawn, and so on. We can unify these into one type and create functions that operate on those types.

(define-type Shape (U Circle Rectangle))

Now, here's a sticky situation: if we're given a Shape, how do we determine whether it's a Circle or Rectangle? One way would be to write a conditional to test the Shape using Circle? and Rectangle?. However, this is another great use for pattern matching:

(: shape-area (-> Shape Real))
;; Computes the area of the given shape
(define (shape-area shape)
  (match shape
    [(Circle _ rad _) (* pi (sqr rad))]
    [(Rectangle _ w h _) (* w h)]))

This is where pattern matching really shines. Here, we took an arbitrary Shape and used pattern matching to identify the specific kind of shape and assign useful names to the corresponding fields. Without pattern matching, we would have had to first test for the specific kind of shape and then write definitions or use the appropriate selectors for each case.

Here's another example of a function, this time for drawing a shape.

(: draw-shape (-> Shape Image))
;; Produces an image for the given shape
(define (draw-shape shape)
  (match shape
    [(Circle _ rad col) (circle rad 'solid col)]
    [(Rectangle _ w h col) (rectangle w h 'solid col)]))

Note that since we're not placing the shapes on a plane, we can safely ignore the centres. But if we put a bit more work in, we certainly could.