CMSC 14100 — Lecture 3

Working with data

So far, we've seen exactly one program: display "Hello world" on the terminal. This is not a particularly interesting program, largely because it never changes. Computers and programs are useful because of their ability to take some input and produce some result based on that input.

Part of solving problems computationally is understanding the information your problem receives and produces, and managing its representation as data in your programs. Python has several basic tools and concepts for this purpose.

Types

When a value is stored or represented by the computer, what does it look like? In reality, it's literally just a bunch of ones and zeroes—all of it, whether it's numbers, text, or anything else. So how does a computer interpret what all of these things actually mean?

One way that programming languages deal with this is by assigning types to values. Every value in Python has a type associated with it. Because Python is dynamically typed, we do not need to tell it what type some value is; it will work it out on its own. (Other languages are statically typed and typically require us to declare types explicitly.)

Types are important because they describe the possible values a piece of data can take on (i.e. what kind of "thing" is it?) and they dictate what kinds of actions and operations we can apply to a value.

Python has a number of built-in types. Here are a few of the more common ones that we'll be using (these are also found in other programming languages).

Integers (int) are the same as integers ($\mathbb Z$) you may remember from math. Because they are whole numbers, they are stored and represented exactly. Integer literals can be created simply by typing them, like -7 or 2434. This is as opposed to...
Reals (float), or real numbers, which are also called floating-point numbers. Floats are representations of the real numbers you see in math ($\mathbb R$). Unlike integers, most real numbers cannot be represented exactly in a finite amount of memory. Instead, an approximate representation, called floating-point, is used. This is a standard representation across almost all programming languages. Like integers, float literals can be created by the obvious representation, such as 3.15 or -9.8. Note that to create a float of a whole number, we include the fractional part explicitly: 5.0 is a float while 5 is considered an integer.
Strings (str) are text data, which can be interpreted as sequences of characters. String literals are denoted by being surrounded by either single or double quotes. It is important to remember the quotes: "A good idea" and 'A good idea' are interpreted as the same string, while A good idea without quotes is not recognized as a string (or anything in particular).
Booleans (bool) are truth values, either True or False. Note that this is case sensitive: true (which will not be recognized) is not the same as True (a boolean value).
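To see these types in action, we can ask Python what type it has assigned to a value using the built-in function type. The following is a small sketch; the variable names are made up for illustration, and variables themselves are discussed later in these notes.

```python
# Each literal below is given a type by Python automatically.
whole = 5              # an integer
fractional = 5.0       # a float, because the fractional part is written out
text = "A good idea"   # a string
flag = True            # a boolean

print(type(whole))       # <class 'int'>
print(type(fractional))  # <class 'float'>
print(type(text))        # <class 'str'>
print(type(flag))        # <class 'bool'>

# Single and double quotes produce the same string.
print("A good idea" == 'A good idea')  # True

# Floats are approximations, so some results are slightly off.
print(0.1 + 0.2)  # 0.30000000000000004
```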

None

Python also has a special value called None which has type NoneType. The value None denotes "no value". This is semantically different from 0 or False.

For example, consider what the difference in meaning may be between a grade of None (no grade assigned) and a grade of 0 (received a grade of 0).
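As a rough illustration of the grade example, the sketch below uses two hypothetical variables. It also uses the is operator, which has not been introduced in these notes; it is the usual way to check whether a value is None.

```python
grade_not_assigned = None  # no grade has been recorded
grade_zero = 0             # a grade was recorded, and it was 0

print(type(grade_not_assigned))          # <class 'NoneType'>
print(grade_not_assigned is None)        # True
print(grade_zero is None)                # False

# None is not the same value as 0.
print(grade_not_assigned == grade_zero)  # False
```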

Variables

Something about computers that we can take advantage of is the fact that they can remember a lot of things. Where do they remember them? In their memory. So how do we put things into memory and get them back out? As you may know, computers are very complicated machines, so this is very complicated, but luckily we don't actually need to worry about that, because the programming language abstracts this process away.

Instead of dealing with the memory directly, our programming language uses variables to manage this information. Just as in math, variables give us a way to refer to specific values that we would like to remember. We give a variable a name and we assign a value to it. In Python, one assigns a value to a variable in the following way:


    variable_name = <value>

Because of the use of = (a questionable decision from decades ago that we must all now live with), this is very easy to misread as variable_name is equal to or is the same as <value>, but this is not what it means. Rather, we should read this as "let $variable\_name$ be $\langle \mathrm{value} \rangle$". This means that variable_name is taking on the value <value>.


    message = "hello world"

Once we've defined a variable, we can use it in other places.


    print(message)

Here, Python will evaluate the variable and look up the value of the variable message to use in the print function.

Now, suppose we try to evaluate a variable that we have not defined yet.


    print(something_else)

Because this variable hasn't been defined yet, Python doesn't know what it is and it will tell you.


NameError: name 'something_else' is not defined

The final line of an error will usually tell you what went wrong, and that is usually what you should focus on. Of course, you should take a moment to consider why you're receiving that error. For example, if you defined a variable like message but then typed something like mesage, you'll get the same kind of error. In this case, you might think that you've already defined your variable, but you made a typo.
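The typo scenario can be reproduced with the sketch below. It uses try and except, which are not covered in these notes, purely so that the example runs to completion and lets us look at the error message instead of stopping the program.

```python
message = "hello world"

try:
    print(mesage)  # deliberate typo: the variable we defined is message
except NameError as error:
    # The error message names the misspelled variable, which is the clue.
    caught = str(error)
    print(caught)
```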

Unlike what's typical in mathematics, the value of a variable can also change.


    a = 0
    print(a)
    a = 2
    print(a)

Since variables hold values, we can assign to one variable the value of some other variable.


    b = a
    print(b)

Suppose we update a after this.


    a = 4

What does b evaluate to now?

A common misconception is that the assignment b = a assigns the value of a to b at all times. But this is not the case. Rather, the assignment will always evaluate the right hand side and then perform assignment on the resulting value.

Let's walk through this more carefully.

Originally, $a \gets 2$.
Then we performed the assignment b = a.
- In performing this assignment, a evaluates to 2, so we have that $b \gets 2$.
After this, we execute the assignment a = 4 so $a \gets 4$.
So now a evaluates to 4 and b evaluates to 2.
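The walkthrough above can be run as one short program, with each step annotated:

```python
a = 2
b = a      # the right-hand side evaluates to 2, so b is now 2
a = 4      # only a is reassigned; b is not affected

print(a)   # 4
print(b)   # 2
```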

Expressions

If we think back to math, we did more than just work with values. We also had expressions to describe more complicated values. Python has a number of basic operations built into it. For example, since we have number types, the usual arithmetic operations mostly work the way you would expect.


    2 + 3
    2 - 3

However, we also have the ability to store values in variables. So we can evaluate these variables and use them with these operations, as well as store the results.


    m = 23
    n = 34
    p = m + n

Multiplication is denoted by *.


    3 * 4

Division is denoted by / and behaves slightly differently. Since the integers aren't closed under division, dividing two numbers in Python is defined to always produce a float.


    22 / 7

Because of this quirk, there's another division operator, //, which, when applied to two integers, will always produce an integer: the quotient of the division, rounded down.


    22 // 7

These correspond to the two understandings of division you may already have. In the first case, we have division over the real numbers, which will always produce a real number. In the second case, we have division as a function on integers, which will always produce a quotient and a remainder.
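A quick way to see the difference between the two operators is to look at the types of their results, even when the division comes out evenly:

```python
print(22 / 7)        # 3.142857142857143
print(22 // 7)       # 3

# / produces a float even when the result is a whole number.
print(6 / 3)         # 2.0
print(type(6 / 3))   # <class 'float'>

# // on two integers produces an integer.
print(6 // 3)        # 2
print(type(6 // 3))  # <class 'int'>
```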

With that said, you might notice that // only produces the quotient. If you would also like the remainder, the modulus operator, %, will produce the remainder of the division of its operands.


    22 % 7

You can verify that it's the case that // and % together will give you everything you need for integer division.


    q = 22 // 7
    r = 22 % 7
    print(q * 7 + r)
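The same check works for negative numbers, which is a useful way to see that // rounds down rather than toward zero. The sketch below uses hypothetical names dividend and divisor, and also shows the built-in function divmod, which returns the quotient and remainder together.

```python
dividend = -22
divisor = 7

q = dividend // divisor  # -4, since the quotient is rounded down
r = dividend % divisor   # 6

print(q, r)                         # -4 6
print(q * divisor + r == dividend)  # True

# divmod gives both the quotient and the remainder at once.
print(divmod(22, 7))                # (3, 1)
```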

One final note is that all of these operations are written infix. This is a bit of a problem because complicated expressions with multiple operations can be ambiguous. Luckily, Python follows the usual established convention for order/precedence of operations:

Parentheses,
Exponents,
Multiplication and Division,
Addition and Subtraction.

If two operations have the same precedence, they are evaluated from left to right. (The one exception is exponentiation, which is grouped from right to left.)
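Here are a few expressions that illustrate these rules. Note that ** is Python's exponentiation operator, which has not otherwise been introduced in these notes.

```python
print(2 + 3 * 4)    # 14, multiplication happens before addition
print((2 + 3) * 4)  # 20, parentheses are evaluated first
print(2 * 3 ** 2)   # 18, the exponent is applied before multiplying
print(10 - 4 - 3)   # 3, same precedence, so left to right: (10 - 4) - 3
print(24 / 4 * 2)   # 12.0, left to right: (24 / 4) * 2
```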

String expressions

It is also possible to write expressions for strings. For example,


    "hot" + "dog"

This seems strange if you think of $+$ as "addition" on numbers, but we've actually seen an example of an operation working differently based on types: division. Just as we have division for reals and division for integers, we can have addition for numbers and addition for strings. This idea that we can define similar operations for different types of objects is common in algebra.

This operation is called concatenation. Informally, it takes two strings and joins them together. At first glance, it seems like a very arbitrary thing to do, but concatenation is actually a very well-defined operation that has its origins in the study of algebra on strings.

One interesting property that concatenation doesn't share with addition on numbers can be seen in the following example,


    "dog" + "hot"

The order of the operands matters in concatenation: $x+y$ is not necessarily the same as $y+x$, which is not what we understand from numeric addition. Formally, addition on numbers is said to be commutative and concatenation on strings is not commutative.
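We can check this directly. The variable names first and second below are just for illustration.

```python
first = "hot"
second = "dog"

print(first + second)  # hotdog
print(second + first)  # doghot

# Swapping the operands changes the result for strings...
print(first + second == second + first)  # False

# ...but not for numbers.
print(2 + 3 == 3 + 2)  # True
```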

Boolean expressions

Along with arithmetic expressions, another important type of expression is the Boolean expression. These are expressions that evaluate to Boolean values, True or False. For example, statements like "5 is greater than 7" evaluate to either true or false (this statement is false, by the way).

Python	Math	English
`x < y`	$x \lt y$	$x$ is less than $y$
`x > y`	$x \gt y$	$x$ is greater than $y$
`x <= y`	$x \leq y$	$x$ is less than or equal to $y$; $x$ is at most $y$
`x >= y`	$x \geq y$	$x$ is greater than or equal to $y$; $x$ is at least $y$
`x == y`	$x = y$	$x$ is equal to $y$
`x != y`	$x \neq y$	$x$ is not equal to $y$
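As a small example, here is each operator from the table applied to the numbers 5 and 7:

```python
x = 5
y = 7

print(x < y)   # True
print(x > y)   # False, since 5 is not greater than 7
print(x <= y)  # True
print(x >= y)  # False
print(x == y)  # False
print(x != y)  # True
```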

We can also call these relational operators comparisons, since they compare values $x$ and $y$. These have a very well-established meaning for numbers, but it turns out that these operators can be used on strings as well. It's not hard to see how == and != can easily be applied on strings, but it may be surprising to learn that the other operators can be applied to strings too.

How exactly should we interpret "cat" < "dog"? Whenever we can use a relation like $\lt$, it means that the objects we're working with have some sort of order. Informally, we can think of strings as being ordered alphabetically (though that is not really the entire story, we'll leave it at that for now...).
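The following sketch tries a few string comparisons. The last one hints at why "alphabetically" is not the entire story: uppercase letters are ordered before lowercase ones.

```python
print("cat" < "dog")      # True, c comes before d
print("apple" < "apply")  # True, the first four letters match and e comes before y
print("cat" == "cat")     # True
print("cat" != "Cat")     # True, comparisons are case sensitive

# Uppercase letters come before all lowercase letters.
print("Zebra" < "apple")  # True
```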

We can take these basic Boolean expressions (also called predicates) and combine them into more complicated expressions by using Boolean connectives. Suppose we have two Boolean expressions p and q. The Boolean connectives are defined by

conjunction: p and q is True if both p and q evaluate to True, and False otherwise.
disjunction: p or q is True if at least one of p or q evaluates to True, and False otherwise. Note that this definition of or is a bit different from many casual uses in English, where or tends to mean exactly one of the choices. For instance, if you are asked whether you would like tea or coffee, you would get weird looks if you say both.
negation: not p is False if p evaluates to True, and it is True otherwise.

The following table summarizes each of the cases above.

`p`	`q`	`p and q`	`p or q`	`not p`
`True`	`True`	`True`	`True`	`False`
`True`	`False`	`False`	`True`	`False`
`False`	`True`	`False`	`True`	`True`
`False`	`False`	`False`	`False`	`True`
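We can have Python produce this table for us. The sketch below uses for loops and a list, neither of which has been covered in these notes; the name rows is made up for illustration.

```python
rows = []

# Try every combination of truth values for p and q.
for p in (True, False):
    for q in (True, False):
        row = (p, q, p and q, p or q, not p)
        rows.append(row)
        print(row)

# (True, True, True, True, False)
# (True, False, False, True, False)
# (False, True, False, True, True)
# (False, False, False, False, True)
```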