As we've discussed them, abstract data types are a good idea in the abstract (ha ha) but practically, they are nothing more than convention. In our implementation of a stack, we use a list as a stack, but there's nothing stopping someone from treating that stack like a list, knowingly or not. We would like some mechanisms to formalize the separation between the interface for our abstract data type and its implementation.
To start, let's consider a simpler example. Suppose we want to represent a point in $\mathbb R^2$. We can accomplish this very easily with a 2-tuple (or pair) of floats. However, not every pair of floats is necessarily a point! We can represent many different kinds of things as pairs of floats. For example, a complex number can also be represented as a pair of floats: the first component is the real part and the second component is the imaginary part.
Now this might seem to be a silly distinction, but it becomes important when we think about what we're allowed to do with each of these things. For example, we might try to define multiplication over complex numbers, but this would not make sense for points. But we have no way to tell them apart because they appear identical.
This goes back to our understanding of the need for types. Types tell us what values our data can take on and what functions we can apply to them. Ideally, we would be able to define a type for a point and another type for a complex number. We can then define functions that specify the appropriate types for their inputs and outputs.
In object-oriented programming languages like Python, a class is a type definition. A class contains two kinds of data:
Once we have defined a class, we can create objects of that class. An object is a particular instance of the class. In non-object terms, a class is a type and an object is a value.
You may wonder what distinguishes a class from a type. In Python, the answer is: there is no distinction—every type is a class. For some types, this is not surprising. For example, you are very familiar by now that strings and lists all have methods. But what about, say, integers or booleans?
Yes, even those are classes, though Python tries its best to hide this from us. We've already discussed the fact that every variable in Python really stores a reference to a value, whether that's a list or an integer. The truth is that those store references to objects.
What does it mean that a particular list or integer is an object? Clearly, there's an associated value that those objects store. But what about methods? We've seen list methods, but what about integer methods? It turns out that many of the "operations" we use on integers are really syntactic sugar that hides the fact that we're really applying an object's methods.
[2, 3, "a", True, "ooo"].pop()
True.__eq__(False)
# Note that we need the parentheses here because otherwise it would
# seem like we're trying to write a float
(23).__add__(-4)
How do we define a class? We need to know a few things. First, we need a name for our class. Secondly, we need a way to create an object of that class. The creation of an object is performed by a method called the constructor.
Typically, an object is created by using its class name in the same way as a function. This does not really square with your experience in Python because the built-in classes behave slightly differently. But they also have explicit constructors that can be used, which we have used without knowing it.
s = str([3, 1, 5])
n = int("500000000")
What we've called casting (i.e. "converting types") is really constructing a new object using the class' constructor and supplying our old object as an argument to it. They construct objects even when we don't pass any arguments.
e = str()
z = int()
l = list()
So to define a class, we provide a name and define a constructor. The constructor will then create the object's attributes. In Python, the constructor has the name __init__.
class Point:
"""
Represents a point in 2D space.
"""
def __init__(self, x, y):
"""
Inputs:
x (float): x-coordinate, real
y (float): y-coordinate, real
"""
self.x = x
self.y = y
Let's take note of a few things. First, the constructor is a function and we define it in the same way, using def. However, this function definition is inside the definition of the class. It must be one indentation level in to denote that it belongs in this class.
Secondly, the attributes of the point are x and y. These are the values associated with the point, its $x$-coordinate and $y$-coordinate. These attributes are not denoted at the top level of the class (on the same indentation level as the methods) but are initialized inside the constructor.
We can then create a Point in the following way.
p = Point(3.4, -25)
That is, we treat the class name like a function and pass in arguments intended for the constructor. The constructor then assigns the given values to the attributes. Of course, the constructor doesn't have to do just that—it's a function and we can define a function to do whatever we want. So the constructor can do some processing on the arguments, like performing checks in the process of setting up the object.
Here, we note the appearance of a new concept, self. You may have noticed that while we passed in two arguments, the definition of __init__ has three parameters. The missing argument is self, which is the first parameter of every method that gets defined in the class.
The keyword self is meant to represent the object that is using the method. Whenever a method is called, the object that we used to call the method is always passed to the method as an argument, as self.
So what happens when a constructor is called is that it is really syntactic sugar for the following process:
p = Point(4, -5)
our_object.
our_object is assigned to p.
Point.__init__(our_object, 4, -5) is called.
our_object by setting x and y in the way that we defined.
This idea is the same for any method that's defined. For example, let's define a method for measuring the distance of the point from the origin. Recall that for a point $p = (x,y)$, the distance of $p$ from $(0,0)$ is \[d(p) = \sqrt{x^2 + y^2}.\] To add such a method, we define it in the usual way, but inside the class (i.e. one indentation level in.
class Point:
"""
Represents a point in 2D space.
"""
def __init__(self, x, y):
"""
Inputs:
x (float): x-coordinate, real
y (float): y-coordinate, real
"""
self.x = x
self.y = y
def dist_orig(self):
"""
Computes the distance from (0,0) to the point.
"""
return (self.x ** 2 + self.y ** 2) ** 0.5
Then a call d = p.dist_orig() is really turned into something like
d = Point.dist_orig(p)
This is not so different from what we were originally doing with our stack ADT, where we had to explicitly pass in the stack as an argument. However, this syntax makes it more clear that we should think of methods as functions that belong to a particular class and are applied on the object that it is attached to.
With this in mind, let's try to define a stack that we can use without having to pretend that it's a list.
It turns out in working through the ADT and its implemenation, we've already got most of the actual class definition finished. The ADT gives us our desired methods and our implementation gives us the idea to store a list maintaining the stack as an attribute.
One change that we make is that create is an obvious choice for our constructor, so we replace it with __init__.
class Stack:
"""
A collection of items with controlled last in, first out access.
"""
def __init__(self):
"""
Initializes an empty stack.
"""
self.items = []
def is_empty(self):
"""
Return whether this stack is empty.
"""
return self.items == []
def pop(self):
"""
Remove the item at the top of this stack and return it.
The stack cannot be empty.
"""
return self.items.pop()
def push(self, item):
"""
Add the given item to the top of this stack.
"""
self.items.append(item)
Now, we can use the stack as a Stack.
st = Stack()
st.push(3)
st.push(-23)
st.push("hey")
st.pop()
Now it is very clear how to use the stack and much more convenient to use it in the way that's prescribed by our ADT.
However, there is still a bit of a hitch: Python does not actually restrict access to the functions or attributes of a class in any way. So someone who is determined to do wrong can still access the list:
st.items[1]
The extent to which Python attempts to hide information stops here.
You might balk at this, after the whole discussion from last class about how terrible it was that people could access the list and use it as a list, and here we are, still able to accesss the list.
There are conventions. One is that attributes and methods that are not intended to be part of the public interface are prefixed with a single underscore. There is also a mechanism to use two underscores that seems to do what we wish.
class C:
def __init__(self,c):
self.__c = c
a = C(23)
b = a.__c
However, like all of the other mechanisms, all this really does just makes it harder to access in non-standard ways.
b = a._C__c
Why is this the case? Philosophically, Python has chosen to make respecting the boundaries of the class a convention rather than a technical restriction.
Like other mechanisms in the real world, design is often enough to make it easy and natural to follow the rules.
This perhaps speaks to something a bit deeper about the design of systems: the design choices you make can have a significant impact even without putting up hard technical barriers.