What is the mathematics of computer science and data analysis anyway? In short, what we really mean is linear algebra. What is it and why does it underpin the science of data analysis?
Broadly speaking, linear algebra is, simply put, the algebra of linear systems. Historically, linear algebra arose from an exercise you may be familiar with: solving systems of linear equations—these are equations where the variables have degree at most 1, so we don't deal with any squares or cubes or anything complicated like that. Much of the early theory arose in the 17th through 19th centuries, due to mathematicians like Leibniz, Gauss, Cauchy, and many others.
At some point, we realized that it makes sense to treat such a system as one "thing" and to study it like that. And so mathematicians ask the usual questions about "things": what do these things do, what do they look like, and so on. As with a lot of mathematics, this led to a great number of applications in engineering, physics, chemistry, and all across the sciences. But for a long time, this was limited as much by our imagination as it was by our capability to work with large enough objects.
This changed as soon as we had access to machines that could do some number crunching for us. One of the beautiful things about computer science is how it has revived and revitalized many areas of mathematics. This is not surprising for a machine that is essentially built out of math.
One view of data science and machine learning is that it's nothing but searching through patterns using linear algebra. Everything sounds pretty easy when you think about it like this. Of course, one can say the same about quantum computing or anything that computers do: they just do a bunch of math.
Our purpose is to understand the how and why of all of this. Of course, with the proliferation of computing power and smart people writing programs, it's easy to wonder what the point of learning the mathematics is. So obviously, we're not learning this stuff so we can whip up our own linear algebra libraries, or even our own machine learning algorithms.
Rather, it's the recognition that when we solve a "real-world" problem using computers, somewhere underneath, after scratching enough layers off, what we're really dealing with is some form of math problem. And there's a bit of a paradox here—understanding how we get to that math problem and what we do with it is not just a mathematical problem.
Giving up control of this process amounts to letting the software engineers make the decisions. This may have been fine ten or fifteen years ago, but half the reason we're here is because we recognize that it isn't enough just to leave it in their hands.
One of the reasons that data science works is the idea that data is not just statistical. Certainly, that's a big part of it, but the underlying mechanisms that make, say, machine learning work are about the hidden structures in the data. Such structures are algebraic in nature, and so we exploit linear algebra to uncover them.
A natural question is why we should stop at linearity—surely it can't be the case that such complex relationships are just the product of linear relations. And it's true that it's impossible for everything in the world to be related purely linearly. But it's important to understand the goal of all of this stuff.
Linearity is practical: it's easy to understand, easy to explain, and easy to compute. But we shouldn't confuse easy with simple. As you'll see, restricting ourselves to linearity still gives us a wealth of tools to work with and develops a theory that's deep and rich enough to explore for a lifetime.
So we've covered the "linear" part of "linear algebra", but what about the "algebra" part? One thing that's helpful to do when entering a new mathematical area is to try to orient it relative to other areas of mathematics you may already be familiar with. In your case, these will likely be statistics and calculus. Of the two, calculus will be the more helpful contrast.
Mathematics, particularly pure mathematics, is concerned with understanding the world through structures. For example, calculus is the mathematics of continuity and concerns itself with the study of objects like the real numbers, continuous functions, and curves.
You may be familiar with algebra as the part of math where you're trying to solve equations by manipulating them. However, that's not really the essence of algebra. Rather, algebra is all about structures, but these are typically not continuous (though this is not always the case). In this sense, it is very similar to an area of mathematics that you may not think of as math, or may not want to believe is math: computer science.
In primary schooling, we start off with very simple algebra: arithmetic on the integers. We learn about the structure of numbers by two basic operations: addition and multiplication. This leads most people to think of algebra as trying to solve equations (find $x$, here it is), but algebra is actually all about taking some basic operations and applying them to all sorts of different objects and seeing what kinds of structures we get out of them. This is what we'll be doing.
We'll introduce some of the basic objects in linear algebra. The first is the vector.
A vector is an ordered collection of numbers, called scalars.
Because they are so important, we will distinguish vectors visually from other variables. In the notes (and in many texts), they will be set in boldface, so $\mathbf v$ is a vector, while $v$ is not. When writing by hand, we can't exactly write boldface, so we typically draw an arrow over the variable, like $\vec v$.
Unfortunately, this is not really a formal or rigorous definition, for reasons we'll eventually see. But for most people, including us, this works well enough. For our purposes, we are dealing with finite-dimensional vectors over the real numbers $\mathbb R$. One of the great things about math is how a lot of general structures can be set up and used in the same ways with weirder and more complex objects. For example, we can have vectors of complex numbers or vectors of polynomials. We don't need to think about any of this stuff, but it's interesting to know that the same framework has a logic that carries us beyond just plain old real numbers.
So you can think of vectors over real numbers in a few different ways. Traditionally, we can think of a vector as an arrow with one end starting at the origin, with a length and direction. In other words, a vector is an object that's situated in some geometric space. This interpretation is particularly popular in physics.
Consider the vector $\begin{bmatrix}3 \\ 2\end{bmatrix}$. We can represent this on a plane as:
Now, if we have the vector $\begin{bmatrix}3 \\ 2 \\ 1\end{bmatrix}$, we have
We will not try to draw a vector with four components, but you get the idea.
A more abstract understanding of a vector is as an object with several components, kind of like a tuple. One can see how such an object can be used to represent a "real" object, with each component storing a particular piece of information about the object.
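As an illustrative sketch (in plain Python; storing a vector as a list is our choice here, not the only possible representation), a vector is just an ordered collection of numbers whose components we can inspect:

```python
# A vector stored as an ordered collection of numbers (a plain Python list).
v = [3, 2]      # the 2-dimensional vector with components 3 and 2
w = [3, 2, 1]   # a 3-dimensional vector

# Each component can store one piece of information about some object.
print(v[0], v[1])  # access individual components
print(len(w))      # the number of components (the dimension)
```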
Consider the vector $\begin{bmatrix}3 \\ 2\end{bmatrix}$. Here are some possible interpretations of this vector: a point in the plane with coordinates $(3, 2)$; an inventory holding 3 units of one item and 2 units of another; or a record storing two measured features of some object.
But what separates vectors from plain old tuples? The difference is in what we can do with vectors. For a similar idea, the thing that makes a list different from a string is the things we're allowed to do with strings that we can't necessarily do with lists. This may or may not be obvious, but operations in math need to be defined before we can use them. For example, it's not obvious that addition should work on vectors. It certainly doesn't work the same way as with numbers, so we have to be a bit careful.
This may seem a bit finicky, but it's not any different from what you'd have to do in a programming language. Vectors are a type or class and support a variety of operations and methods. Playing fast and loose with the rules is what gets you into errors.
Let $\mathbf u = \begin{bmatrix} u_1 \\ \vdots \\ u_n\end{bmatrix}$ and $\mathbf v = \begin{bmatrix}v_1 \\ \vdots \\ v_n \end{bmatrix}$. Then we define $\mathbf u + \mathbf v$ by \[\mathbf u + \mathbf v = \begin{bmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{bmatrix}.\]
In other words, the sum of two vectors $\mathbf u$ and $\mathbf v$ is the vector that has components that are just the sums of the components of $\mathbf u$ and $\mathbf v$.
Consider the vectors $\begin{bmatrix}3 \\ 5\end{bmatrix}$ and $\begin{bmatrix} 7 \\ 2 \end{bmatrix}$. Then $\begin{bmatrix}3 \\ 5 \end{bmatrix} + \begin{bmatrix}7 \\ 2\end{bmatrix} = \begin{bmatrix}10 \\ 7 \end{bmatrix}$.
Note that when adding two vectors together, by definition, they need to be the same size. For instance, we cannot add the vectors $\begin{bmatrix}3 \\ 1\end{bmatrix}$ and $\begin{bmatrix}-2 \\ 4 \\ 7 \end{bmatrix}$ together.
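Here is a minimal sketch in plain Python (the helper `vec_add` is our own illustrative name, not from the notes) that implements componentwise addition exactly as in the definition, and rejects vectors of different sizes:

```python
def vec_add(u, v):
    # The sum is defined only when u and v have the same number of components.
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    # Componentwise addition, matching the definition above.
    return [ui + vi for ui, vi in zip(u, v)]

print(vec_add([3, 5], [7, 2]))  # [10, 7]
```

Trying `vec_add([3, 1], [-2, 4, 7])` raises the error, mirroring the restriction that the two vectors must be the same size.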
One helpful interpretation of vector addition is to view it geometrically:
Other than vectors, we have scalars.
A scalar is a real number.
Technically, scalars are just the things that go in vectors but are not themselves vectors. Since we've already said we're happy to restrict ourselves to real numbers, a scalar is just a real number. However, if we ever wanted to work with, say, complex numbers, then we can have complex scalars.
We typically reserve lowercase (Latin) letters for scalars, though lowercase Greek letters are also a popular choice.
Why are these called scalars? Well, we use scalars to define scalar multiplication.
Let $\mathbf v = \begin{bmatrix}v_1 \\ v_2 \\ \vdots \\ v_n\end{bmatrix}$ be a vector and let $c$ be a scalar. Then we define the scalar multiplication of $\mathbf v$ by $c$ by \[c \mathbf v = \begin{bmatrix}cv_1 \\ cv_2 \\ \vdots \\ cv_n\end{bmatrix}.\]
Let $\mathbf u = \begin{bmatrix}4 \\ -2\end{bmatrix}$ and $\mathbf v = \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix}$. Then $3\mathbf u = \begin{bmatrix}12 \\ -6\end{bmatrix}$ and $\frac 1 4 \mathbf v = \begin{bmatrix}\frac 3 4 \\ - \frac 1 4 \\ \frac 5 4 \end{bmatrix}$.
Observe that the effect of scalar multiplication is scaling: each component of the resulting vector is $c$ times the corresponding component of the original (i.e. the vector is scaled by $c$).
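A quick sketch in plain Python (again, `scalar_mul` is our own illustrative helper) reproducing the example above:

```python
def scalar_mul(c, v):
    # Multiply every component of v by the scalar c.
    return [c * vi for vi in v]

print(scalar_mul(3, [4, -2]))        # [12, -6]
print(scalar_mul(0.25, [3, -1, 5]))  # [0.75, -0.25, 1.25]
```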
Linear algebra is built around these two operations. It's all you need! This tells us something very neat: that the only thing we need to know how to do is add two things together and scale them. Just from these two operations, we get an enormous amount of utility.
Again, we note that the ability to add and scale vectors is what separates them from tuples. Indeed, a formal definition of a vector is an object that allows us to apply these two operations. This means that it's not the case that vectors are just arbitrary ordered collections of items—these items must themselves support addition and scaling.
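To see how far these two operations alone take us, here is a sketch (in plain Python; the helper names are our own) that builds an arbitrary linear combination $c_1\mathbf v_1 + c_2\mathbf v_2 + \cdots$ out of nothing but scaling and adding:

```python
def scalar_mul(c, v):
    # Scale every component of v by c.
    return [c * vi for vi in v]

def vec_add(u, v):
    # Componentwise sum; u and v must be the same size.
    return [ui + vi for ui, vi in zip(u, v)]

def linear_combination(coeffs, vectors):
    # Accumulate c1*v1 + c2*v2 + ..., using only our two operations.
    result = [0] * len(vectors[0])
    for c, v in zip(coeffs, vectors):
        result = vec_add(result, scalar_mul(c, v))
    return result

# 2 * [1, 2] + 3 * [3, 4]
print(linear_combination([2, 3], [[1, 2], [3, 4]]))  # [11, 16]
```

Everything we build later—from solving systems to the workhorse routines of machine learning—reduces, at bottom, to compositions of exactly these two operations.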