One of the conveniences of Python is that it defines a set of methods that any class can implement so that its objects behave like built-in types.
Python officially calls these "special method names". They are also informally called magic methods or dunder methods. Here, "dunder" is short for "double underscore".
For example, what if we want to know how many items are in our stack? We could access the internal list and use len on it.
len(st._items)
But as we mentioned earlier, this is implementation-dependent. It would be better if we could write something like len(st), but Python will tell us that's not possible, at least for now.
Python offers a way for us to provide such information: by implementing the __len__ method. Underneath, all the built-in len function does is call an object's __len__ method.
[1,2,3,4,5,6].__len__()
So let's add it to our class and implement it.
    def __len__(self):
        """
        Reports the number of items in the stack.
        """
        return len(self._items)
Now we can call len(st) as desired. This is good for two reasons. First, it lets us call len directly on a stack. Second, it goes back to the idea that code outside the class can't rely on _items being how the stack is implemented: anything that calls len(st) keeps working even if the internal representation changes.
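To make this concrete, here is a minimal sketch. The Stack class below is a trimmed stand-in for the one in these notes; storing the items in a list named _items with the top at the end is an assumption carried over from the snippets above.

```python
class Stack:
    """Minimal stand-in for the stack class used in these notes."""

    def __init__(self):
        # Assumption: items live in a list, with the top at the end.
        self._items = []

    def push(self, value):
        self._items.append(value)

    def __len__(self):
        return len(self._items)


st = Stack()
for value in [10, 20, 30]:
    st.push(value)

# The built-in len and a direct __len__ call report the same count.
print(len(st))        # 3
print(st.__len__())   # 3
```

One side effect worth knowing: once __len__ is defined and no __bool__ is, Python treats an empty stack as false in an if statement.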
There are many other such methods and we'll implement a few of them. First, you may have noticed that if we print our stack, we don't get anything particularly useful. It tells us that we have an object of a certain class and what its reference is. However, we can implement a dunder method so that we get something more useful.
    def __repr__(self):
        """
        Produces a string representation of the stack.
        """
        strs = []
        for val in self._items:
            strs.append(str(val))
        # Join the collected strings so the result is built from strs.
        return f"STACK: {', '.join(strs)} (Top)"
In fact, there are two such methods: __repr__ and __str__. Both produce string representations of our object, but they have slightly different purposes. __repr__ is meant to be an unambiguous representation of the object, intended for developers. On the other hand, __str__ is meant to be a readable string representation intended for a user.
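The difference shows up in which method Python picks in which situation. The sketch below uses a trimmed, hypothetical Stack with both methods defined; the exact strings are illustrative choices, not the ones from these notes.

```python
class Stack:
    """Trimmed, hypothetical stack used only to contrast the two methods."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def __repr__(self):
        # Developer-facing: unambiguous about type and contents.
        return f"Stack({self._items!r})"

    def __str__(self):
        # User-facing: a readable summary.
        return "STACK: " + ", ".join(str(v) for v in self._items) + " (Top)"


st = Stack()
for value in [1, 2, 3]:
    st.push(value)

print(st)         # print uses __str__:  STACK: 1, 2, 3 (Top)
print(repr(st))   # repr uses __repr__:  Stack([1, 2, 3])
print([st])       # containers use __repr__ for elements: [Stack([1, 2, 3])]
```

If a class defines only __repr__, then str and print fall back to it, which is why implementing __repr__ alone already improves what print shows.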
Another useful method is __eq__, which is for checking equality. Without specifying otherwise, Python will treat x == y by evaluating x is y. In other words, by default, equality tests whether the given objects are the same object. We can implement __eq__ to describe what we want equality to mean.
    def __eq__(self, st):
        """
        Two stacks are the same if they contain the same values in the same
        order.

        Input:
            st (Stack): the stack being compared to

        Output: True if both stacks are equal, False otherwise
        """
        return self._items == st._items
Again, it's important to highlight that the reason we want to do this is so that we can be clear about our definition of equality. The obvious case is when we want to change the implementation of our class. But maybe we don't want to consider two stacks equal just because they contain the same items in the same order; maybe there's some other notion of equality that we would rather have. Implementing __eq__ lets us make those choices.
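The difference between the default behaviour and a custom __eq__ can be seen side by side. Both classes below are hypothetical stand-ins whose constructors take an initial list, which is an assumption made to keep the example short. The isinstance check is an addition that these notes do not use: returning NotImplemented lets Python handle comparisons against unrelated types instead of failing on a missing _items attribute.

```python
class PlainStack:
    """Hypothetical stack with no __eq__: == falls back to identity."""

    def __init__(self, items):
        self._items = list(items)


class Stack:
    """Hypothetical stack that defines equality by contents."""

    def __init__(self, items):
        self._items = list(items)

    def __eq__(self, other):
        # Decline to compare with unrelated types; Python then tries the
        # other operand and finally falls back to an identity check.
        if not isinstance(other, Stack):
            return NotImplemented
        return self._items == other._items


a, b = PlainStack([1, 2]), PlainStack([1, 2])
print(a == b)        # False: different objects, no __eq__ defined
print(a == a)        # True: same object

c, d = Stack([1, 2]), Stack([1, 2])
print(c == d)        # True: same contents
print(c is d)        # False: still two distinct objects
print(c == [1, 2])   # False: a list is not a Stack
```

A class that defines __eq__ without also defining __hash__ becomes unhashable, so such stacks can no longer be used as dictionary keys or set members.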
Here's a fun one: if a class implements the method __add__, objects of that class can be used with the + operator. This is the reason why we can add two numbers, concatenate lists, or concatenate strings using +.
Again, we have to think: what does "adding" two stacks mean? Does it make sense for it to be the concatenation of the two lists? That can be a choice you make and reason through.
Here, we'll take addition to mean: taking the stack on the right, popping each element from it, and pushing each one onto the stack on the left in the order it was popped.
    def __add__(self, st):
        """
        Adds the contents of st to this stack by popping each item from st
        and pushing it onto self.
        """
        while not st.is_empty():
            self.push(st.pop())
        # Return the stack so that an expression like a + b has a value.
        return self
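It helps to trace this definition on a small case. The sketch below assumes a list-backed Stack with push, pop, and is_empty, and it has __add__ return self so that left + right evaluates to the combined stack. Note what this choice of "addition" does: the right-hand stack ends up empty, and its items arrive on the left in reverse order.

```python
class Stack:
    """Trimmed, hypothetical stack; the top is the end of the list."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return len(self._items) == 0

    def __add__(self, st):
        while not st.is_empty():
            self.push(st.pop())
        return self


left = Stack()
for value in [1, 2]:
    left.push(value)

right = Stack()
for value in [3, 4]:
    right.push(value)

result = left + right
print(left._items)      # [1, 2, 4, 3]
print(right._items)     # []
print(result is left)   # True
```

Because both operands are modified, this behaves differently from + on lists or strings, which leave their operands untouched and build a new object. That is a design decision worth stating in the docstring.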
Many large software systems can be modelled by classes and objects that interact with each other. We'll walk through a larger example of this using data from the 2013 Divvy Data Challenge. We'll be working with two datasets: Divvy trips and Divvy stations in 2013.
The goal is to answer the following question: What is the total duration and distance of all Divvy trips taken in 2013? How do we do that?
The problem itself is not too hard: For each trip, figure out the distance between the start and stop stations and get the duration of the trip. Then we just sum over these two pieces of information.
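The plan above can be sketched on a handful of values. These notes do not say how distance is measured; the sketch assumes the great-circle (haversine) distance between the two stations' coordinates, and it assumes durations are in seconds. The coordinates and durations are taken from a few stations and trips that appear later in these notes, and the names haversine_km, station_coords, and sample_trips are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# station id -> (latitude, longitude)
station_coords = {
    25: (41.89766, -87.62351),
    44: (41.8847302, -87.62773357),
}

# (from_station_id, to_station_id, duration)
sample_trips = [(25, 44, 1214), (25, 44, 1023), (44, 25, 858)]

total_duration = 0
total_distance_km = 0.0
for from_id, to_id, duration in sample_trips:
    total_duration += duration
    total_distance_km += haversine_km(*station_coords[from_id],
                                      *station_coords[to_id])

print(total_duration)               # 3095
print(round(total_distance_km, 2))  # straight-line total, in km
```

This measures the straight-line distance between stations, not the route a rider actually took, so it is a lower bound on the distance travelled.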
All of this information is provided in CSV files, which we can read into dictionaries as we've done before. Once we do that, it becomes a matter of picking out the right pieces of information. Here's an example of a station and a trip after we hypothetically process those files in the way we've done before:
# stations[276]
{'id': '328',
'name': 'Ellis Ave & 58th St',
'latitude': '41.788746',
'longitude': '-87.601334',
'dpcapacity': '15',
'landmark': '365',
'online date': '9/25/2013'}
# trips[8857]
{'trip_id': '27923',
'starttime': '2013-07-05 14:54',
'stoptime': '2013-07-05 15:14',
'bikeid': '484',
'tripduration': '1188',
'from_station_id': '328',
'from_station_name': 'Ellis Ave & 58th St',
'to_station_id': '97',
'to_station_name': 'Museum Campus',
'usertype': 'Customer',
'gender': '',
'birthday': ''}
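For reference, dictionaries like these are what csv.DictReader from the standard library produces. The sketch below reads from an in-memory string so that it runs on its own; the header row is reconstructed from the keys of the station shown above, and the real file's name and column order are not given in these notes.

```python
import csv
import io

# Stand-in for an open file. With a real file, one would typically use
# open(path, newline="") in a with statement instead.
sample_csv = io.StringIO(
    "id,name,latitude,longitude,dpcapacity,landmark,online date\n"
    "328,Ellis Ave & 58th St,41.788746,-87.601334,15,365,9/25/2013\n"
)

# Each row becomes a dictionary keyed by the header row.
stations = [dict(row) for row in csv.DictReader(sample_csv)]

print(stations[0]["name"])             # Ellis Ave & 58th St
print(type(stations[0]["latitude"]))   # <class 'str'>
```

Every value comes back as a string, which is why the numbers above appear in quotes. Any arithmetic on latitude or tripduration needs an explicit conversion with float or int first.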
This is the same thing we've seen: dictionaries can be used to represent an entity or object that is "real" in some sense. But representing data as dictionaries can lead to the same issues as representing stacks as lists. That is, we're always left with working with a dictionary underneath. This means that whenever we try to write functions for manipulating such entities, we always have to ensure that we're providing the "right" dictionaries to use.
Here, we're going to be juggling two different kinds of dictionaries: dictionaries representing stations and dictionaries representing trips. Since they have similar fields, we have to be extra careful that we're working with the right data at the right time.
That approach might work if all you're doing is computing one thing and you never expect to use that data again. However, it's often the case that you'll revisit a dataset or model multiple times for different reasons. In cases like these, it's worth doing some work to set up a model to facilitate those kinds of computations.
So instead, we can use classes to model this information. The keys we've been using in the dictionaries now have a very obvious analogue as attributes in a class. But now we can also associate methods with our class as well. This way, we won't have to worry about what happens if someone calls a function with an arbitrary dictionary.
As a result, we can get a fairly obvious class structure by looking at the dictionaries above and using their keys as attributes. We'll also throw in some string representations to make things convenient for us to read.
class DivvyStation:
    def __init__(self, station_id, name, latitude, longitude,
                 dpcapacity, landmark, online_date):
        self.station_id = station_id
        self.name = name
        self.latitude = latitude
        self.longitude = longitude
        self.dpcapacity = dpcapacity
        self.landmark = landmark
        self.online_date = online_date

    def __repr__(self):
        return f"{self.station_id}, name='{self.name}'"

    def __str__(self):
        return f"Divvy Station #{self.station_id} {self.name}"
This has the obvious benefits we discussed: being able to specify functions that work with our class specifically. But using classes allows us to control the kinds of actions we can take much more closely. For instance, we can ensure in our constructor that the data that is being supplied makes sense.
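As a sketch of what such a check might look like, the hypothetical ValidatedStation below keeps only four of the attributes and rejects values that cannot describe a real station. The specific rules (coordinate ranges, non-empty name) are illustrative assumptions, not requirements stated in these notes.

```python
class ValidatedStation:
    """Hypothetical variant of DivvyStation with constructor checks."""

    def __init__(self, station_id, name, latitude, longitude):
        if not name:
            raise ValueError("station name must not be empty")
        if not -90 <= latitude <= 90:
            raise ValueError(f"latitude out of range: {latitude}")
        if not -180 <= longitude <= 180:
            raise ValueError(f"longitude out of range: {longitude}")
        self.station_id = station_id
        self.name = name
        self.latitude = latitude
        self.longitude = longitude


ok = ValidatedStation(328, "Ellis Ave & 58th St", 41.788746, -87.601334)

try:
    # A misplaced decimal point in the latitude is caught immediately.
    ValidatedStation(328, "Ellis Ave & 58th St", 4178.8746, -87.601334)
except ValueError as err:
    message = str(err)
    print(f"rejected: {message}")
```

With dictionaries, a bad value like this would only surface later, wherever the latitude happened to be used. With the check in the constructor, the error appears at the point where the bad data enters the program.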
Ideally, we would have code that could read a file and construct the desired objects. This is something that's not too hard to write. If you're really lazy, you could write a function that takes the dictionaries we have and constructs objects out of them.
But here's a useful trick when you're thinking about how to model classes and you have a large amount of data: just take a few data points and manually construct those objects.
s25 = DivvyStation(25, "Michigan Ave & Pearson St",
                   41.89766, -87.62351, 23, 34, "6/28/2013")
s44 = DivvyStation(44, "State St & Randolph St",
                   41.8847302, -87.62773357, 27, 2, "6/28/2013")
s52 = DivvyStation(52, "Michigan Ave & Lake St",
                   41.88605812, -87.62428934, 23, 43, "6/28/2013")
stations = [s25, s44, s52]
This is a handy way to create a small number of items to test whatever functions you might want to write. Rather than deal with an incomprehensibly large dataset, you have some inputs that you can manage and check by hand.
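Once the hand-built objects behave as expected, the function mentioned earlier for turning a dictionary into an object can be sketched as follows. The helper name station_from_dict is hypothetical, the class is repeated here only so the example runs on its own, and converting the numeric fields with int and float is an assumption about how the values should be stored.

```python
class DivvyStation:
    """Same attributes as the class above, repeated to keep this runnable."""

    def __init__(self, station_id, name, latitude, longitude,
                 dpcapacity, landmark, online_date):
        self.station_id = station_id
        self.name = name
        self.latitude = latitude
        self.longitude = longitude
        self.dpcapacity = dpcapacity
        self.landmark = landmark
        self.online_date = online_date


def station_from_dict(d):
    """Hypothetical helper: build a DivvyStation from one row dictionary."""
    return DivvyStation(
        station_id=int(d["id"]),
        name=d["name"],
        latitude=float(d["latitude"]),
        longitude=float(d["longitude"]),
        dpcapacity=int(d["dpcapacity"]),
        landmark=int(d["landmark"]),
        online_date=d["online date"],   # kept as a string in this sketch
    )


row = {'id': '328',
       'name': 'Ellis Ave & 58th St',
       'latitude': '41.788746',
       'longitude': '-87.601334',
       'dpcapacity': '15',
       'landmark': '365',
       'online date': '9/25/2013'}

station = station_from_dict(row)
print(station.station_id, station.latitude)   # 328 41.788746
```

Doing the string-to-number conversion in one place means the rest of the code can treat latitude and longitude as floats without converting them again.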
We can also do the same kind of translation from dictionary to class for trips.
class DivvyTrip:
    def __init__(self, trip_id, starttime, stoptime, bikeid,
                 tripduration, from_station_id, from_station_name,
                 to_station_id, to_station_name,
                 usertype, gender, birthyear):
        self.trip_id = trip_id
        self.starttime = starttime
        self.stoptime = stoptime
        self.bikeid = bikeid
        self.tripduration = tripduration
        self.from_station_id = from_station_id
        self.from_station_name = from_station_name
        self.to_station_id = to_station_id
        self.to_station_name = to_station_name
        self.usertype = usertype
        self.gender = gender
        self.birthyear = birthyear

    def __repr__(self):
        return (f"{self.trip_id}, "
                f"from_station={self.from_station_id}, "
                f"to_station={self.to_station_id}")

    def __str__(self):
        return (f"Divvy Trip #{self.trip_id} from "
                f"{self.from_station_name} to {self.to_station_name}")
However, we can be a bit more clever with this: we have DivvyStations we can use.
class DivvyTrip:
    def __init__(self, trip_id, starttime, stoptime, bikeid,
                 tripduration, from_station, to_station, usertype,
                 gender, birthyear):
        self.trip_id = trip_id
        self.starttime = starttime
        self.stoptime = stoptime
        self.bikeid = bikeid
        self.tripduration = tripduration
        self.from_station = from_station
        self.to_station = to_station
        self.usertype = usertype
        self.gender = gender
        self.birthyear = birthyear

    def __repr__(self):
        return (f"{self.trip_id}, "
                f"from_station={self.from_station.station_id}, "
                f"to_station={self.to_station.station_id}")

    def __str__(self):
        return (f"Divvy Trip #{self.trip_id} "
                f"from {self.from_station.name} to "
                f"{self.to_station.name}")
Now, we can create a few objects.
trip5433 = DivvyTrip(5433, "2013-06-28 10:43", "2013-06-28 11:03",
                     218, 1214, s25, s44, "Customer", None, None)
trip4666 = DivvyTrip(4666, "2013-06-27 20:33", "2013-06-27 21:22",
                     242, 2936, s44, s52, "Customer", None, None)
trip11236 = DivvyTrip(11236, "2013-06-30 15:41", "2013-06-30 15:58",
                      906, 1023, s25, s44, "Customer", None, None)
trip4646 = DivvyTrip(4646, "2013-06-27 20:22", "2013-06-27 20:39",
                     477, 996, s52, s52, "Customer", None, None)
trip13805 = DivvyTrip(13805, "2013-07-01 13:21", "2013-07-01 13:35",
                      469, 858, s44, s25, "Customer", None, None)
trips = [trip5433, trip4666, trip11236, trip4646, trip13805]
Notice that we're providing the stations that we created earlier as the origin and destination stations for the trips. (This is something you'd have to figure out if you're choosing a small number of trips to try out first.)
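Holding station objects rather than station ids pays off as soon as we compute something. The sketch below uses deliberately trimmed versions of both classes, keeping only the attributes it needs, so their constructors differ from the full ones above. The ids, names, durations, and station pairings are the same as in the five trips just created.

```python
class DivvyStation:
    """Trimmed to the attributes this sketch needs."""

    def __init__(self, station_id, name):
        self.station_id = station_id
        self.name = name


class DivvyTrip:
    """Trimmed to the attributes this sketch needs."""

    def __init__(self, trip_id, tripduration, from_station, to_station):
        self.trip_id = trip_id
        self.tripduration = tripduration
        self.from_station = from_station
        self.to_station = to_station


s25 = DivvyStation(25, "Michigan Ave & Pearson St")
s44 = DivvyStation(44, "State St & Randolph St")
s52 = DivvyStation(52, "Michigan Ave & Lake St")

trips = [
    DivvyTrip(5433, 1214, s25, s44),
    DivvyTrip(4666, 2936, s44, s52),
    DivvyTrip(11236, 1023, s25, s44),
    DivvyTrip(4646, 996, s52, s52),
    DivvyTrip(13805, 858, s44, s25),
]

# Total duration is a plain sum over one attribute.
total_duration = sum(trip.tripduration for trip in trips)
print(total_duration)   # 7027

# Station details are one attribute access away; no lookup by id needed.
departures = {}
for trip in trips:
    name = trip.from_station.name
    departures[name] = departures.get(name, 0) + 1
print(departures)

# Trips that start at the same station share the same station object.
print(trips[0].from_station is trips[2].from_station)   # True
```

A small sample like this is easy to check by hand: five durations to add up and three stations to count, which is exactly what makes it useful before running the same code on the full dataset.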