CMSC 14100 — Lecture 16

Private attributes

One of our motivations for classes was to hide implementation details from users. However, there is still a bit of a hitch: Python does not actually restrict access to the functions or attributes of a class in any way. So someone who is determined to do wrong can still access the list for our Stack class:


    >>> st = Stack()
    >>> st.push(23)
    >>> st.push('hi')
    >>> st.push(9.3)
    >>> st.items[1]
    'hi'
    >>> st.items = ['something totally different']
    >>> st.pop()
    'something totally different'

Technically, Python does not have any way to enforce information hiding to the degree that you might see from other languages. However, there are increasing levels of social convention and obfuscation.

attr_name denotes an attribute of a class that is visible and is intended to be accessed by others.
_attr_name, which is a name prefixed by a single underscore, denotes an attribute of a class that is not intended to be used by others. However, nothing stops someone from accessing it if they so wish (which is bad behaviour).
__attr_name, which is a name prefixed by a double underscore, denotes an attribute of a class that is not intended to be used by others, and Python will take steps to make access difficult. However, someone who is determined to access it can still do so with some work (which is very bad behaviour).
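
As a minimal sketch of the three conventions, consider a hypothetical Account class (it is not part of the running example, and its attribute names are made up for illustration) with one attribute of each kind:

```python
class Account:
    """Hypothetical class used only to illustrate the naming conventions."""
    def __init__(self, owner, balance):
        self.owner = owner          # public: meant to be used by others
        self._balance = balance     # single underscore: "please don't touch"
        self.__pin = 1234           # double underscore: Python mangles the name


acct = Account("Ada", 100)
print(acct.owner)       # fine: public attribute
print(acct._balance)    # works, but breaks the convention
try:
    print(acct.__pin)   # fails: no attribute with this exact name exists
except AttributeError as err:
    print("AttributeError:", err)
```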

Let's see how this plays out with our stack example.


    class Stack:
        """
        A collection of items with controlled last in, first out access.
        """
        def __init__(self):
            """
            Initializes an empty stack.
            """
            self.__items = []

        def is_empty(self):
            """
            Determine whether this stack is empty.

            Output (bool): True if stack is empty, False otherwise.
            """
            return self.__items == []

        def push(self, item):
            """
            Add the given item to the top of this stack.

            Input:
                item (Any): an item to put on the top of the stack
            """
            self.__items.append(item)

        def pop(self):
            """
            Remove the item at the top of this stack and return it.
            The stack cannot be empty.

            Output (Any): the item that is removed from the top of the stack
            """
            return self.__items.pop()


    >>> st = Stack()
    >>> st.push(24)
    >>> st.push(34)
    >>> st.is_empty()
    False
    >>> st.__items
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Stack' object has no attribute '__items'

The double underscores make it so that the attribute can be reached by its plain name only from methods inside the class. However, like all of the other mechanisms, all this really does is make the attribute harder to access in non-standard ways.


    >>> st._Stack__items
    [24, 34]
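
This works because Python rewrites (mangles) any __name attribute used inside a class body to _ClassName__name. A small sketch, using a cut-down version of the Stack class, shows the name that is actually stored on the object:

```python
class Stack:
    """Cut-down Stack, just enough to inspect how the attribute is stored."""
    def __init__(self):
        self.__items = []       # stored on the object as _Stack__items

    def push(self, item):
        self.__items.append(item)


st = Stack()
st.push(24)
st.push(34)

# vars() returns the object's attribute dictionary, mangled names included.
print(vars(st))                 # {'_Stack__items': [24, 34]}
```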

Why is this the case? Philosophically, Python has chosen to make respecting the boundaries of the class a convention rather than a technical restriction. Like other mechanisms in the real world, design is often enough to make it easy and natural to follow the rules. This perhaps speaks to something a bit deeper about the design of systems: the design choices you make can have a significant impact even without putting up hard technical barriers.

Dunder methods

One of the conveniences of Python is that it offers several methods that any class can implement, which allow that class to behave in ways similar to built-in types.

Python officially calls these "special method names". They are also informally called magic methods or dunder methods. Here, "dunder" is short for "double underscore".

For example, what if we want to know how many items are in our stack? We could access the internal list and use len on it.


    >>> st = Stack()
    >>> st.push(54)
    >>> st.push('pie')
    >>> st.push('puddle')
    >>> len(st.items)
    3

But as we mentioned earlier, this is implementation-dependent, and if we chose to hide this attribute, we wouldn't be able to do this. It would be better if we could write something like len(st), but Python will tell us that's not possible, at least for now.

Python offers a way for us to provide such information: by implementing the __len__ method. Underneath, all the built-in len function does is call an object's __len__ method.


    >>> [1,2,3,4,5,6].__len__()
    6

So let's add it to our class and implement it.


    def __len__(self):
        """
        Reports the number of items in the stack.
        
        Output (int): number of items on the stack
        """
        return len(self.__items)

Now, we see that we can call len(st) as desired. This is good for two reasons. First, obviously, this lets us call len directly on a stack. But secondly, this goes back to the idea that we can't rely on __items being how the stack is implemented.
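
To see this in action, here is a minimal sketch with a cut-down Stack that keeps only the methods needed for the demonstration:

```python
class Stack:
    def __init__(self):
        self.__items = []

    def push(self, item):
        self.__items.append(item)

    def __len__(self):
        return len(self.__items)


st = Stack()
st.push(54)
st.push('pie')
st.push('puddle')
print(len(st))          # 3
print(st.__len__())     # 3, the same call made explicitly
```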

There are many other such methods and we'll implement a few of them. First, you may have noticed that if we "print" our stack, we don't get anything particularly useful. It tells us that we have an object of a certain class and what its reference is. However, we can implement a dunder method so that we get something more useful.


    def __str__(self):
        """
        Produces a string representation of the stack.

        Output (str): List of items on the stack and top indicator
        """
        vals = []
        for item in self.__items:
            vals.append(str(item))
        vals = ' <- '.join(vals)
        return f"Stack: {vals} (Top)"

In fact, there are two such methods: __repr__ and __str__. Both produce string representations of our object, but they serve slightly different purposes. __repr__ is meant to be an unambiguous representation of the object, aimed at developers. The official documentation provides this advice:

If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned.

So a __repr__ implementation for our stack may look something like


    def __repr__(self):
        """
        Produces a string representation of the stack.

        Output (str): Representation of internal state of stack
        """
        return f"<Stack: {self.__items}>"

On the other hand, __str__ is meant to be a string representation as a value. This is described in the following way in the official documentation:

Called by str(object), the default __format__() implementation, and the built-in function print(), to compute the “informal” or nicely printable string representation of an object. The return value must be a str object.
This method differs from object.__repr__() in that there is no expectation that __str__() return a valid Python expression: a more convenient or concise representation can be used.
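
The difference is easiest to see side by side. The sketch below uses a cut-down Stack with the two methods from above (the __str__ body is condensed, but it builds the same string):

```python
class Stack:
    def __init__(self):
        self.__items = []

    def push(self, item):
        self.__items.append(item)

    def __str__(self):
        vals = ' <- '.join(str(item) for item in self.__items)
        return f"Stack: {vals} (Top)"

    def __repr__(self):
        return f"<Stack: {self.__items}>"


st = Stack()
st.push(54)
st.push('pie')

print(st)           # uses __str__:  Stack: 54 <- pie (Top)
print(repr(st))     # uses __repr__: <Stack: [54, 'pie']>
print([st])         # containers show their elements with __repr__
```

If a class defines only __repr__, then str and print fall back to it, so __repr__ is usually the first of the two worth writing.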

Another useful method is __eq__, which is for checking equality. Without specifying otherwise, Python will treat x == y by evaluating x is y. In other words, by default, equality tests whether the given objects are the same object. We can implement __eq__ to describe what we want equality to mean.


    def __eq__(self, other):
        """
        Two stacks are the same if they contain the same values in the same
        order.

        Input:
            other (Stack): the stack being compared to

        Output (bool): True if both stacks contain the same items in the same
            order, False otherwise
        """
        return self.__items == other.__items

Notice here that __items is accessible for both Stack objects, because we are referring to it in code inside the class definition.
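
The sketch below contrasts the default behaviour with a custom __eq__. Two things are assumptions added for the demonstration and are not part of the lecture's class: the constructor accepts initial items, and __eq__ returns NotImplemented when the other operand is not a Stack, so that comparing against an unrelated type does not raise an AttributeError. PlainStack is a hypothetical class with no __eq__ at all.

```python
class PlainStack:
    """Hypothetical stack with no __eq__: == falls back to identity."""
    def __init__(self, items):
        self._items = list(items)


class Stack:
    """Stack with a custom __eq__, plus a type check (an assumption)."""
    def __init__(self, items=()):
        self.__items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Stack):
            return NotImplemented   # let Python try the other operand
        return self.__items == other.__items


a, b = PlainStack([1, 2]), PlainStack([1, 2])
print(a == b)       # False: different objects, default comparison
print(a == a)       # True: same object

s, t = Stack([1, 2]), Stack([1, 2])
print(s == t)       # True: same items in the same order
print(s == [1, 2])  # False: a list is not a Stack
```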

Again, it's important to highlight that the reason we want to do this is so that we can be clear about our definition of equality. The obvious case is when we want to change the implementation of our class, but maybe we don't want to consider two stacks to be equal if they contain the same items in the same order. Maybe there's some other notion of equality that we would rather have. We can make those choices in this way.

Here's a fun one: if a class implements the method __add__, then objects of that class can be used with the + operator. This is the reason why we can add two numbers, concatenate lists, or concatenate strings using +.

Again, we have to think: what does "adding" two stacks mean? Does it make sense for it to be the concatenation of the two lists? That can be a choice you make and reason through.

For example, we can take addition to mean taking the stack on the right, popping each element from it and pushing it onto the stack on the left in that order.


    def __add__(self, other):
        """
        Adds the contents of other to this stack by popping each item from
        other and pushing it onto self.

        Input:
            other (Stack): the other stack
        """
        while not other.is_empty():
            self.push(other.pop())

This would actually not be a good choice for the + operator. This implementation has the effect of mutating both stacks and producing no output. Typically, for expressions, we would expect some value to arise from evaluation and for the operands to remain untouched. However, Python will still let us go ahead with this if we so wished.
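
One possible alternative, sketched here as an illustration rather than as the definitive design, keeps the same "pop from the right, push onto the left" ordering but builds and returns a new stack, leaving both operands unchanged:

```python
class Stack:
    def __init__(self):
        self.__items = []

    def is_empty(self):
        return self.__items == []

    def push(self, item):
        self.__items.append(item)

    def pop(self):
        return self.__items.pop()

    def __add__(self, other):
        """
        Return a new stack: the items of self on the bottom, followed by
        the items of other in the order they would be popped.
        """
        result = Stack()
        result.__items = self.__items + other.__items[::-1]
        return result


left, right = Stack(), Stack()
left.push(1)
left.push(2)
right.push(3)
right.push(4)

combined = left + right
top = combined.pop()
print(top)              # 3: right's bottom item ends up on top
print(left.is_empty())  # False: the operands were not modified
```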

Modeling with classes

In addition to using classes to define new data structures, we can also use classes to model systems. For instance, if we return to our Student class, we can imagine that we might want to add additional information, like which classes a student may be taking.


    class Student:
        """
        Represents a University of Chicago student.
        """
        def __init__(self, name, ucid, cnetid):
            """
            Inputs:
                name (str): Full name of student
                ucid (int): UCID number
                cnetid (str): CNetID
            """
            self.name = name
            self.institution = 'University of Chicago'
            self.id_number = ucid
            self.email = cnetid + '@uchicago.edu'
            self.courses = []
            self.credits = 0

        def get_cnetid(self):
            """
            Retrieves the CNetID of the student based on their email

            Output (str): the CNetID
            """
            cnetid = self.email.removesuffix('@uchicago.edu')
            return cnetid

        def __repr__(self):
            return f"Student({self.name}, {self.id_number}, {self.get_cnetid()}, {self.courses}, {self.credits})"

A quick and dirty implementation would simply have us adding the courses to a student as strings. However, courses themselves have information associated with them and students have restrictions on which courses they can take. We may want to model this sort of information.


    class Course:
        """
        A course listing
        """
        def __init__(self, name, number, department, units):
            """
            Inputs:
                name (str): course title
                number (int): course number
                department (str): home department for course
                units (int): number of units
            """
            self.__name = name
            self.__number = number
            self.__department = department
            self.__units = units

        def units(self):
            """
            Retrieves the number of units for the course

            Output (int): units
            """
            return self.__units

        def __repr__(self):
            return f"Course({self.__name}, {self.__number}, {self.__department}, {self.__units})"

A simple restriction on the courses that a student can take is the number of credits/units they are taking. We don't let students take more than four courses at a time without permission, and four courses typically amount to 400 units. So we can add that as a property to check whenever we try to add a course to a student.

Luckily, this is information that is contained in our definition of a course. However, the course is designed so that its attributes are not accessible. This means that we need to provide a way to access this information. In our case, we provide an explicit method for doing so.

It may seem like a waste to go through all of this trouble simply to produce the value of an attribute. However, this can be a desirable property. For instance, allowing access to the attribute would also mean allowing a user to reassign the attribute, which may not be something we want to allow.
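
A small sketch, using a cut-down Course with only a name and units, illustrates the point. Reading goes through the method, while a plain assignment from outside the class does not change what the method reports (short of deliberately using the mangled name):

```python
class Course:
    def __init__(self, name, units):
        self.__name = name
        self.__units = units

    def units(self):
        return self.__units


course = Course("Intro to CS", 100)
print(course.units())       # 100

# Assigning from outside the class does not touch the mangled attribute;
# it only creates a new, unrelated attribute literally named __units.
course.__units = 0
print(course.units())       # still 100
```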

In general, using methods to control access to attributes also gives us a way to control how these items are accessed. The course enrollment issue is an example of how we can allow controlled mutation: only add a course if doing so would not take the student past the number of credits allowed:


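    def enroll(self, course):
        """
        Enroll the student in a course, provided the 400-unit limit holds.

        Input:
            course (Course): the course to add
        """
        units = course.units()
        assert self.credits + units <= 400, "Can't enroll in more than 400 units"

        self.courses.append(course)
        self.credits += units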
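
Here is a self-contained sketch of the enrollment check in use, with cut-down Student and Course classes. It differs from the method above in one respect, which is a choice made for this sketch: it raises a ValueError instead of using assert, because assertions are skipped when Python runs with the -O flag. The constant name MAX_UNITS and the course titles are hypothetical.

```python
MAX_UNITS = 400  # limit taken from the text; the constant name is hypothetical


class Course:
    def __init__(self, name, units):
        self.__name = name
        self.__units = units

    def units(self):
        return self.__units


class Student:
    def __init__(self, name):
        self.name = name
        self.courses = []
        self.credits = 0

    def enroll(self, course):
        units = course.units()
        if self.credits + units > MAX_UNITS:
            raise ValueError(f"Can't enroll in more than {MAX_UNITS} units")
        self.courses.append(course)
        self.credits += units


student = Student("Sam")
for title in ["Course A", "Course B", "Course C", "Course D"]:
    student.enroll(Course(title, 100))
print(student.credits)          # 400

try:
    student.enroll(Course("Course E", 100))
except ValueError as err:
    print("Rejected:", err)
print(len(student.courses))     # still 4
```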