CMSC 15100 — Lecture 1

Welcome to CMSC 15100! Basic course information can be found on the home page of the course website.

What is this course about?

This course is an introduction to computer science. But what is computer science? Typically, when we think about CS, we think about programming: coding, building websites and apps. Some people will get further along to the idea of writing programs to solve problems. So you should expect to learn how to program and how to solve problems by writing programs. Of course, a common assumption is in just how much of each of these things we will be doing. More specifically, we are interested in solving problems by writing programs rather than programming.

This understanding of computer science stretches back to its origins in the early 20th Century, before computers as we know them existed. One of the big questions in mathematics, and in particular mathematical logic, was whether there was a procedure that one could follow to determine the truth or falsehood of a statement in first-order logic. The idea is that since the rules of logic were set, one could just follow the rules and come up with a proof for a particular statement. Back then, these ideas were really about formalism: whether we could formally express all of mathematics in a way such that any mathematical result could be derived by applying a small set of rigidly defined rules.

Nowadays, we recognize this as something that computers are very good at. But all of these ideas led to a very surprising result: such a procedure doesn't and can't exist! In modern terms, there is no algorithm or program that will automatically prove whether a mathematical statement is true or false. Mathematics is impossible to encode and solve automatically using a computer!

This is the very first result in what we now understand to be computer science. And it was first proved in 1936, by a guy named Alan Turing. This timeline may be surprising to you, as it's much more well known that Turing was employed as a cryptanalyst by the Government of the United Kingdom during the Second World War and that computers themselves started appearing at around that time as well. 1936 would place the birth of computer science before the development of computers!

While computer science as a discipline has grown tremendously since the 1930s, the core of computer science still remains the same. The questions that lie at the heart of computer science are still about how difficult it is for computers to solve various problems. Are computers able to solve this problem? Can they do it efficiently? And as with the first steps in answering these questions, these are really questions about problems in mathematics rather than computers. In that sense, computer science is really the mathematics of problems and how to solve them.

Why should you take this course?

One obvious answer is that this is a course that you will need to complete a major in CS. However, you could have taken CMSC 16100 just as well, which accomplishes the same things that we hope to. This leads to the question of what it is exactly that we want to do here and how this might appeal to someone who might not be planning to complete a CS major or might be wondering what the difference is between this course and CMSC 12100.

Basically, this course introduces the basic ideas behind computational problem solving via programming. These goals are also shared by CMSC 12100, but we focus on a more theoretical approach that is intended to give you the foundation to go on to deeper computational thinking.

What will we be covering in this course?

The vehicle through which we will be learning to program is the programming language Racket. Racket is a functional language that is a descendant of a variant of Lisp called Scheme. It is primarily used for introductory computer science education and programming language theory research, but there is a very active community that supports it and does lots of interesting stuff with it.

Programming is really about telling the computer what to do. In this analogy, we need to learn a language that the computer understands in order for us to be able to tell it things. There are many such languages out there that give us different ways to tell the computer things.

Most popular programming languages which are used in software engineering are what we call imperative languages. These languages more closely mimic and resemble the operation of the computer as a machine. Programming in these languages is about manipulating and maintaining the state of the machine. In other words, expressions are imperative in that they're like commands we give to the computer to do something.

Racket is a functional language. The idea behind functional programming languages the idea of functions as the primary unit of computation. Here, expressions evaluate to values and we are concerned with the resulting values from every expression. This resembles the way we think about mathematics more than thinking about a machine. It is for this reason that we introduce the ideas behind computing through a functional paradigm rather than an imperative one.

You will learn to think and program in both of these paradigms by the time you finish the introductory CS sequence. While this course is focused on functional programming, you will learn and work with imperative programming in CS 152.

A terminological aside

Now, you might notice that functional is not really the same as imperative in that it doesn't quite tell you what you're saying. Perhaps a more suitable foil to imperative is something like declarative. Indeed, functional languages tend to be declarative, because we declare a bunch of functions, which themselves are compositions of functions. There's not much that's imperative.

Declarative langauges encompass functional languages, but there are declarative languages that are not functional. The most obvious example is Prolog, which is a logic programming language. Much like functional programming languages, a logic programming language consists of declaring a bunch of logical statements or sentences. The computation of a logical statement depends on the statements that have been declared before it, much like how functions are compositions of functions.

The Shell

One of the immediate challenges we face, before getting into more high-minded ideas about computer science, is the question of how the heck to tell computers what to do. We interact with computers in a variety of ways. With personal computers, we use keyboards and mice and interact through a graphical user interface. With phones and tablets, we interact using touch. And an increasingly common way to interact with computers nowadays is through voice. But before many of these, we primarily interacted with computers through text commands.

For programmers, text is still an important way of interacting with the computer. If you think about it, a program itself is really just a big old text file containing a bunch of instructions. Much like expressions in programming languages, text commands are composable in a way that is not easily replicated in a graphical interface.

The textual interface that we use is called the command-line shell. The shell itself is a piece of software that interprets user input and produces output. The program which we use to access the shell is called a terminal emulator or terminal for short. This is a callback to the olden days when terminals referred to the actual computer that you used. This distinction is important because there are many different shells and terminal emulators that one can use.

For this course, you will need to use a Unix platform for your computing. We will assume that you are using one of the three most popular computing platforms: Linux, Mac OS, or Windows. Luckily, both Linux and Mac OS are Unix platforms. Things are slightly more complicated if you use Windows.

Assuming that you are prepared to move on, you can open a terminal and be presented with a prompt, which may or may not look like

timng@linux1:~$

At this point, the computer is waiting for you to give it a command. However, even here, we can divine a few things. Here, linux1 refers to the name of the machine that you have logged into (linux1 is one of the CS department's Linux machines) and timng is my username. This helpfully tells me that in this particular terminal, I am logged into the machine linux1 as the user timng, which is useful information if I have several terminal sessions open.

The prompt also has tells me that I am in the directory ~, which is the Unix shorthand for my home directory. Each new terminal session typically begins in your home directory. But where is my home directory exactly? To see what the current directory is, we use pwd.

timng@linux1:~$ pwd
/home/timng

This is a good time to talk about paths and the file system. When you begin a terminal session, you are placed somewhere within the computer's file system, usually in your home directory. You then execute commands relative to your location in the file system. So, it's important to understand how to navigate the file system.

Files, which are actual useful data, are organized and placed in directories, which we often think of as folders. Directories can be placed in other directories, and this gives us a nice tree structure to our file system. This allows us to name and find things by using paths. These paths act as an address or directions telling us how to get to a particular location.

There are two ways to think of the locations in the file system: the absolute path or the relative path. The absolute path begins with a /, which is the root directory. This is the topmost directory in Unix systems. Every location in the file system is reachable from the root directory and has a unique path from the root. So /home/timng can be found starting from the root directory, entering the home directory, and entering the timng directory.

The other way to think about paths is relative to your current location. Recall that sessions typically begin in your home directory. We refer to the current directory by a single period .. It's clear how we would refer to files and directories within this directory, but what about those outside of it? We can refer to the parent directory by two periods, ...

Consider the following directory structure.


//
|-- home/
|   |-- timng/
|   |   |-- cs/
|   |   |-- html/

Suppose our current location is /home/timng. Then ./html refers to the path /home/timng/html and .. refers to the path /home. Furthermore, ../.. would then refer to the path /. Since the root is reachable from any location, in effect, it is possible to refer to any location in the file system using a relative path.

Note that on Unix systems, the path is delimited by forward slashes /, while on Windows, the path is delimited by backslashes \. How do you tell which one is which? Imagine a stick |. Since we're reading left to right, if the stick leans forwards in the direction of reading / then it's a slash. If it leans backwards against the direction of reading \ then it's a backslash.

Now that we have a notion of naming where we want to go, we can learn how to move around the file system. First, we can see what's contained in the current directory by using ls

timng@linux1:~$ ls
cs  html

These happen to be directories. This is not immediately obvious. One way we can get some more information from ls is to modify it with a flag, -l.

timng@linux1:~$ ls -l
total 4
drwxrwxrwx  2 timng timng   28 Sep 22 15:04 cs
drwxr-xr-x 12 timng timng 4096 Jul 29 17:26 html

Commands in the command-line interface can be augmented with arguments which follow the command and are separated by spaces. These arguments can modify the behaviour of the command or are sometimes used as input for the command. Here, -l tells ls to list the contents of the directory using the longer listing mode.

This is all very useful, but how are we supposed to know that -l will give us more information? There are two main ways (other than asking Google, of course). The first is by using the --help flag. This will usually give you more information on how to use a particular command.

The other way is to use man. This will give you the man(ual) page for a command, which is often far more comprehensive than what the --help flag will give you. One thing to note is that if you are reading a man page, it may not be clear how you should stop reading it: once you are finished reading, you can press q to return to the prompt. These will also often show up online.

Now, we can change our location by using the command cd for change directory.

timng@linux1:~$ cd html
timng@linux1:html$ pwd
/home/timng/html

Recall that commands are relative to our position in the file system, which is why moving into the html directory was possible just by specifying html rather than its absolute path, /home/timng/html. However, we could have chosen to do so. For example, suppose we wanted to get to the root directory. The easiest way to do this is to use the absolute path /. However, from our current location in /home/timng/html, we can also use the relative path ../../...

Version Control

In this course, you will be submitting homework assignments via git. git is a version control system, originally authored by a guy named Linus Torvalds. It is available for download on all of the major platforms at git-scm.com.

Beyond being handy for submitting assignments for our course, version control systems are extremely useful tools for keeping track of changes and managing large projects where multiple people will simulatneously be working on parts of it. While they were developed for managing software projects, version control systems can be used to manage many other kinds of projects where being able to track changes and manage concurrent work by multiple people is necessary.

There are many version control systems out there (subversion, cvs, Perforce, etc.) but git is currently the most popular and widespread one in use. In addition to being used almost everywhere nowadays, it will be used in many future CS courses here.

Often, you will need to clone a repository so that you can begin working on it. In CS 151, a repository will be created for you, which you will need to clone. Your assignment submissions will then be tracked via the repository.

timng@linux1:~$ git clone https://<url-for-your-project>

Cloning a repository creates a local copy of the entire repository, including the history. Unlike many other version control systems, there is no central or master repository—everyone who is working on the project has an entire copy of the repository. This will not be something that will come up in your work in this course.

You can then happily work and make changes on your local copy of the repository. However, at some point, you will likely want to make the changes in your repository available to everyone else who is working on the project. In the context of this course, this will be when you want to make a submission.

In many other version control systems, one simply adds files and commits changes to a central repository. Things are a bit more involved with git, since there is no central repository. Here is the process in full:

Add files that have changed that you want to commit to the staging area.
Perform a commit. Write a good commit message.
Push your commit to everyone's repositories.

git uses the idea of a staging area to prepare a commit. The idea is that you may not want to commit every change that you've made, particularly if you're working on different parts of the codebase. The staging area allows you to control which changes are ready to be committed. Files can be added to the staging area by

timng@linux1:your-project$ git add work.rkt

To commit changes that have been staged, use

timng@linux1:your-project$ git commit -m "A useful commit message"

You also have the option of specifying files to commit. If you do, the files will be committed, without committing the staged files. The -m flag allows you to specify a commit message. If you do not provide a message this way, git will explicitly ask you for one, so this is more of a step to save time.

Commits are then distinguished from pushes. A commit does not mean that you changes are available to everyone else. At first, this separation seems a bit strange, but this allows you to make commits locally to act as checkpoints. You can then manage revisions locally before making major changes available to others. To do so, use

timng@linux1:your-project$ git push

So you should always remember the process:

Add files that have changed that you want to commit to the staging area.
Perform a commit. Write a good commit message.
Push your commit to everyone's repositories.

In particular, for this course, if you don't push your changes, we will not receive them; i.e. you will not have submitted your work until you push.

Of course, it's important to remmeber that git is a version control system and not an assignment submission system. Because students often treat git as a submission system, we often find that students commit and push very rarely except near a deadline. This is very bad for a number of reasons, such as the chance that something catastrophic happens and you are unable to submit your work (this happens surprisingly often).

Instead, you should take advantage of the fact that git's job is to keep track of changes that you make. Don't be afraid to try things out and commit them when you get one part working. Not only does this ensure that you have something working at all times that you can fall back on, but it also creates a record of your work.

Racket

As mentioned earlier, we will be using the Racket programming language. It can be conveniently downloaded from racket-lang.org. Sinmply follow the instructions to install it.

The main program that we are concerned with is DrRacket, which is the development environment for Racket. While it was designed largely for teaching purposes, it is powerful and complete enough that it remains the most popular development environment for general Racket programmers.

One of the first things we see when we open DrRacket is a window with two panes. The top pane is Definitions pane, where we will edit our program. The bottom pane is the Interactions pane, where we can interact with the code that we've written. You will also notice that the Definitions pane starts out with a file with the line

#lang racket

This line says that the language that we would like to use is, uh, Racket, which is a bit strange.

As it turns out, Racket is not just a single language, but it comes with multiple languages and is a very convenient programming language for creating programming languages. This is going to be important, because we are not going to be using the standard Racket language, but one of the alternative languages, Typed Racket.

We can change to this language simply by changing the first line to

#lang typed/racket

Part of the first lab will involve setting up DrRacket for use in this course. We will also begin to cover the basics of (Typed) Racket next class.