assert statement to validate a test case.
pdb module, or by using %debug in a Jupyter code cell.
class and a function in Python.
class.
instance attributes and class attributes.
methods, class methods and static methods.
subclassing/inheritance with Python classes.

The material assumes no prior knowledge of Python. Experience with programming concepts or another programming language will help, but is not required to understand the material.
This topic material is based on the Python Programming for Data Science book and adapted for our purposes in the course.
We discussed Python functions. But how can we be sure that our function is doing exactly what we expect it to do? Unit testing is the process of testing our function to ensure it's giving us the results we expect. Let's briefly introduce the concept here.
assert Statements
assert statements are the most common way to test your functions. They cause your program to fail if the tested condition is False. The syntax is:
assert expression, "Error message if expression is False."
assert 1 == 2, "1 is not equal to 2."
Asserting that two numbers are approximately equal can also be helpful. Due to the limitations of floating-point arithmetic in computers, numbers we expect to be equal are sometimes not:
assert 0.1 + 0.2 == 0.3, "Not equal!"
import math # we'll learn about importing modules later
assert math.isclose(0.1 + 0.2, 0.3), "Not equal!"
You can test any expression that evaluates to a boolean:
assert 'varada' in ['gabriel', 'davi', 'juliana'], "Instructor not present!"
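To tie assert statements back to testing functions, here is a minimal sketch of a few checks for a small function. The function celsius_to_fahrenheit is a hypothetical example and is not part of the course material.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit (hypothetical example)."""
    return celsius * 9 / 5 + 32


# Each assert passes silently when its condition is True.
assert celsius_to_fahrenheit(0) == 32, "Freezing point should be 32 F."
assert celsius_to_fahrenheit(100) == 212, "Boiling point should be 212 F."
# Use math.isclose for results that involve floating-point arithmetic.
assert math.isclose(celsius_to_fahrenheit(36.6), 97.88), "Body temperature check failed."
```

If any of these conditions were False, Python would raise an AssertionError showing the message after the comma.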
Test Driven Development (TDD) is where you write your tests before your actual function (more here). This may seem a little counter-intuitive, but you're creating the expectations of your function before the actual function. This can be helpful for several reasons:
In general, the approach is as follows:
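Somewhat related to testing and function design are the philosophies EAFP ("easier to ask for forgiveness than permission") and LBYL ("look before you leap").
These two acronyms refer to coding philosophies about how to write your code: EAFP tries an operation and handles the error if it fails, while LBYL checks the preconditions first. Let's see an example: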
d = {'name': 'Doctor Python',
     'superpower': 'programming',
     'weakness': 'mountain dew',
     'enemies': 10}
# EAFP
try:
    d['address']
except KeyError:
    print('Please forgive me!')
# LBYL
if 'address' in d.keys():
    d['address']
else:
    print('Saved you before you leapt!')
While EAFP is often favoured in Python, neither style is always right; the better choice is often context-specific.
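As a rough illustration of how the two styles compare when wrapped in functions, the sketch below returns a fallback value instead of printing. The function names get_address_eafp and get_address_lbyl and the 'unknown' fallback are assumptions made for this example.

```python
d = {'name': 'Doctor Python', 'enemies': 10}


def get_address_eafp(record):
    """EAFP: try the lookup and handle the failure if it happens."""
    try:
        return record['address']
    except KeyError:
        return 'unknown'


def get_address_lbyl(record):
    """LBYL: check that the key exists before using it."""
    if 'address' in record:
        return record['address']
    return 'unknown'


print(get_address_eafp(d))
print(get_address_lbyl(d))
# For this particular case, dict.get does the lookup and the fallback in one call.
print(d.get('address', 'unknown'))
```

All three calls give the same result here; the styles differ mainly in where the "missing key" case is handled.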
So if your Python code doesn't work: what do you do? At the moment, most of you probably do "manual testing" or "exploratory testing". You keep changing your code until it works, maybe add some print statements around the place to isolate any problems. For example, consider the random_walker code below, adapted from COS 126, Conditionals and Loops:
from random import random
def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result after each step.
    Returns the distance from the origin.
    Parameters
    ----------
    T : int
        The number of steps to take.
    Returns
    -------
    float
        The distance from the origin to the endpoint, rounded to 2 decimals.
    Examples
    --------
    >>> random_walker(3)
    (0, -1)
    (0, 0)
    (0, -1)
    1.0
    """
    x = 0
    y = 0
    for i in range(T):
        rand = random()
        if rand < 0.25:
            x += 1
        if rand < 0.5:
            x -= 1
        if rand < 0.75:
            y += 1
        else:
            y -= 1
        print((x, y))
    return round((x ** 2 + y ** 2) ** 0.5, 2)
random_walker(5)
If we re-run the code above, our random walker never goes right (the x-coordinate is never positive). We might try to add some print statements here to see what's going on:
def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result after each step.
    Returns the distance from the origin.
    Parameters
    ----------
    T : int
        The number of steps to take.
    Returns
    -------
    float
        The distance from the origin to the endpoint, rounded to 2 decimals.
    Examples
    --------
    >>> random_walker(3)
    (0, -1)
    (0, 0)
    (0, -1)
    1.0
    """
    x = 0
    y = 0
    for i in range(T):
        rand = random()
        print(rand)
        if rand < 0.25:
            print("I'm going right!")
            x += 1
        if rand < 0.5:
            print("I'm going left!")
            x -= 1
        if rand < 0.75:
            y += 1
        else:
            y -= 1
        print((x, y))
    return round((x ** 2 + y ** 2) ** 0.5, 2)
random_walker(5)
Ah! We see that every time after an "I'm going right!" we immediately get an "I'm going left!". The problem is in our if statements: we should be using elif for each condition after the initial if, otherwise multiple conditions may be met on each step.
This was a pretty simple debugging case; adding print statements is not always helpful or efficient. Alternatively, we can use the pdb module. pdb is the Python Debugger included with the standard library. We can use breakpoint() to leverage pdb and set a "break point" at any point in our code and then inspect our variables. See the pdb docs here and this cheatsheet for help interacting with the debugger console.
def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result after each step.
    Returns the distance from the origin.
    Parameters
    ----------
    T : int
        The number of steps to take.
    Returns
    -------
    float
        The distance from the origin to the endpoint, rounded to 2 decimals.
    Examples
    --------
    >>> random_walker(3)
    (0, -1)
    (0, 0)
    (0, -1)
    1.0
    """
    x = 0
    y = 0
    for i in range(T):
        rand = random()
        breakpoint()  # execution pauses here on every iteration
        if rand < 0.25:
            print("I'm going right!")
            x += 1
        if rand < 0.5:
            print("I'm going left!")
            x -= 1
        if rand < 0.75:
            y += 1
        else:
            y -= 1
        print((x, y))
    return round((x ** 2 + y ** 2) ** 0.5, 2)
random_walker(5)
So the correct code should be:
def random_walker(T):
    """
    Simulates T steps of a 2D random walk, and prints the result after each step.
    Returns the distance from the origin.
    Parameters
    ----------
    T : int
        The number of steps to take.
    Returns
    -------
    float
        The distance from the origin to the endpoint, rounded to 2 decimals.
    Examples
    --------
    >>> random_walker(3)
    (0, -1)
    (0, 0)
    (0, -1)
    1.0
    """
    x = 0
    y = 0
    for i in range(T):
        rand = random()
        if rand < 0.25:
            x += 1
        elif rand < 0.5:
            x -= 1
        elif rand < 0.75:
            y += 1
        else:
            y -= 1
        print((x, y))
    return round((x ** 2 + y ** 2) ** 0.5, 2)
random_walker(5)
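Print statements and the debugger helped find the bug; an automated check can help keep it from coming back. The sketch below assumes a hypothetical helper, walk_path, that uses the same elif logic but returns the visited points instead of printing them, so that assert statements can inspect every step. The seed value is arbitrary.

```python
from random import random, seed


def walk_path(T):
    """Return the list of (x, y) points visited in T steps (hypothetical helper)."""
    x, y = 0, 0
    path = []
    for _ in range(T):
        rand = random()
        if rand < 0.25:
            x += 1
        elif rand < 0.5:
            x -= 1
        elif rand < 0.75:
            y += 1
        else:
            y -= 1
        path.append((x, y))
    return path


seed(2024)  # fix the seed so the run is reproducible
path = walk_path(1000)

# Each step should move exactly one unit along exactly one axis.
previous = (0, 0)
right_steps = 0
for point in path:
    step = abs(point[0] - previous[0]) + abs(point[1] - previous[1])
    assert step == 1, f"Bad step from {previous} to {point}"
    if point[0] - previous[0] == 1:
        right_steps += 1
    previous = point

# The walker should move right at least once in 1000 steps.
assert right_steps > 0, "The walker never moved right."
```

With the original if/if/if version, a draw between 0.25 and 0.5 changes both x and y in the same step, so the first assertion would fail.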
I wanted to show pdb because it's the standard Python debugger. Most Python IDEs also have their own debugging workflow; for example, here's a tutorial on debugging in VSCode. Within Jupyter, there are some "magic" commands that you can use. The one we are interested in here is %debug. There are a few ways you can use it, but the easiest is this: if a cell raises an error, we can create a new cell underneath, write %debug, and run that cell to debug our previous error.
x = 1
x + 'string'
%debug
Here's a step-by-step guide on how to proceed within the IPython debugger (ipdb) environment:
Examine the Context: Once in the ipdb> prompt, you can type where or w to see the context of the error, which shows you the stack trace of the exception.
Variable Inspection: You can inspect the value of variables at the current point in the stack by simply typing the variable name. For example, typing x will show you the value of x.
Understand the Error: Based on the error message and the context provided by the debugger, you should be able to identify that the issue is the attempt to add an integer to a string.
Testing Solutions: The debugger allows you to execute Python code directly in the debugger to test potential solutions. For example, you could try converting the integer to a string before concatenation, str(x) + 'string', to see if that resolves the issue.
Exiting the Debugger: Once you understand the error and know how to fix it, you can exit the debugger by typing exit or q (for quit).
Using the %debug magic command in Jupyter notebooks is an effective way to interactively diagnose errors in your code. It allows for a deeper investigation into the state of your program at the point where an exception occurs, facilitating a more informed and precise correction process.
The JupyterLab variable inspector extension is another related helpful tool.
We've seen data types like dict and list which are built into Python. We can also create our own data types. These are called classes and an instance of a class is called an object (classes documentation here). The general approach to programming using classes and objects is called object-oriented programming.
d = dict()
Here, d is an object, whereas dict is a type:
type(d)
type(dict)
We say d is an instance of the type dict. Hence:
isinstance(d, dict)
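For reference, here is a small sketch of what these calls return when run as a script rather than in separate notebook cells. The numbers list is an extra variable added only for contrast.

```python
d = dict()
numbers = [1, 2, 3]  # extra example object, not from the text above

print(type(d))                    # <class 'dict'>
print(type(dict))                 # <class 'type'>
print(isinstance(d, dict))        # True
print(isinstance(numbers, dict))  # False
# isinstance also accepts a tuple of types to check against.
print(isinstance(numbers, (list, tuple)))  # True
```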
"Classes provide a means of bundling data and functionality together" (from the Python docs), in a way that's easy to use, reuse and build upon. It's easiest to discover the utility of classes through an example so let's get started!
Say we want to start storing information about students and instructors in the University's Data Science Program (DSP).
We'll start with first name, last name, and email address in a dictionary:
dsp_1 = {'first': 'Davi',
'last': 'Moreira',
'email': 'davi.moreira@dsp.com'}
We also want to be able to extract a member's full name from their first and last name, but don't want to have to write out this information again. A function could be good for this:
def full_name(first, last):
    """Concatenate first and last with a space."""
    return f"{first} {last}"
full_name(dsp_1['first'], dsp_1['last'])
We can just copy-paste the same code to create new members:
dsp_2 = {'first': 'Juliana',
'last': 'Carnaval',
'email': 'juliana.carnaval@dsp.com'}
full_name(dsp_2['first'], dsp_2['last'])
The above was pretty inefficient. You can imagine that the more objects we want and the more complicated the objects get (more data, more functions) the worse this problem becomes! However, this is a perfect use case for a class! A class can be thought of as a blueprint for creating objects, in this case DSP members.
Terminology alert:
Syntax alert:
Classes are defined with the class keyword, followed by a name and a colon (:):
class dsp_member:
    pass
dsp_1 = dsp_member()
type(dsp_1)
We can add an __init__ method to our class which will be run every time we create a new instance, for example, to add data to the instance. Let's add an __init__ method to our dsp_member class. self refers to the instance of the class and is always the first parameter of the methods we define here.
class dsp_member:
    def __init__(self, first, last):
        # the below are called "attributes"
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
dsp_1 = dsp_member('Gabriel', 'Moreira')
print(dsp_1.first)
print(dsp_1.last)
print(dsp_1.email)
To get the full name, we can use the function we defined earlier:
full_name(dsp_1.first, dsp_1.last)
But a better way to do this is to integrate this function into our class as a method:
class dsp_member:
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
    def full_name(self):
        return f"{self.first} {self.last}"
dsp_1 = dsp_member('Gabriel', 'Moreira')
print(dsp_1.first)
print(dsp_1.last)
print(dsp_1.email)
print(dsp_1.full_name())
Notice that we need the parentheses above because we are calling a method (think of it as a function), not an attribute.
Attributes like dsp_1.first are sometimes called instance attributes. They are specific to the object we have created. But we can also set class attributes, which are the same amongst all instances of a class; they are defined outside of the __init__ method.
class dsp_member:
    role = "DSP member"  # class attributes
    campus = "Atlanta"
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
    def full_name(self):
        return f"{self.first} {self.last}"
All instances of our class share the class attribute:
dsp_1 = dsp_member('Davi', 'Moreira')
dsp_2 = dsp_member('Juliana', 'Carnaval')
print(f"{dsp_1.first} is at campus {dsp_1.campus}.")
print(f"{dsp_2.first} is at campus {dsp_2.campus}.")
We can even change the class attribute after our instances have been created. This will affect all of our created instances:
dsp_1 = dsp_member('Davi', 'Moreira')
dsp_2 = dsp_member('Gabriel', 'Moreira')
dsp_member.campus = 'Atlanta Main'
print(f"{dsp_1.first} is at campus {dsp_1.campus}.")
print(f"{dsp_2.first} is at campus {dsp_2.campus}.")
You can also change the class attribute for just a single instance (strictly speaking, the assignment creates an instance attribute with the same name that hides the class attribute for that instance). But this is typically not recommended because if you want differing attributes for instances, you should probably use instance attributes.
class dsp_member:
    role = "DSP member"
    campus = "Atlanta"
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
    def full_name(self):
        return f"{self.first} {self.last}"
dsp_1 = dsp_member('Davi', 'Moreira')
dsp_2 = dsp_member('Gabriel', 'Moreira')
dsp_1.campus = 'Atlanta Main'
print(f"{dsp_1.first} is at campus {dsp_1.campus}.")
print(f"{dsp_2.first} is at campus {dsp_2.campus}.")
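To see why only one instance is affected, it can help to look at what is actually stored on each object. The sketch below uses a trimmed-down dsp_member (only a first name, an assumption made to keep the example short) and the built-in vars() function, which returns an instance's own attributes.

```python
class dsp_member:
    campus = "Atlanta"  # class attribute

    def __init__(self, first):
        self.first = first  # instance attribute


dsp_1 = dsp_member('Davi')
dsp_2 = dsp_member('Gabriel')

dsp_1.campus = 'Atlanta Main'  # creates an instance attribute on dsp_1 only

print(vars(dsp_1))        # {'first': 'Davi', 'campus': 'Atlanta Main'}
print(vars(dsp_2))        # {'first': 'Gabriel'}
print(dsp_member.campus)  # Atlanta
print(dsp_2.campus)       # Atlanta (falls back to the class attribute)
```

The class attribute itself is untouched; dsp_1 simply has its own campus entry that is found first during attribute lookup.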
The methods we've seen so far are sometimes called "regular" methods; they act on an instance of the class (i.e., take self as an argument). We also have class methods that act on the actual class. Class methods are often used as "alternative constructors". As an example, let's say that somebody commonly wants to use our class with comma-separated names like the following:
name = 'Davi,Moreira'
Unfortunately, those users can't do this:
dsp_member(name)
To use our class, they would need to parse this string into first and last:
first, last = name.split(',')
print(first)
print(last)
Then they could make an instance of our class:
dsp_1 = dsp_member(first, last)
If this is a common use case for the users of our code, we don't want them to have to coerce the data every time before using our class. Instead, we can facilitate their use-case with a class method. There are two things we need to do to use a class method:
identify our method as a class method using the decorator @classmethod (more on decorators in a bit);
use cls instead of self as the first argument.
class dsp_member:
    role = "DSP member"
    campus = "Atlanta"
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
    def full_name(self):
        return f"{self.first} {self.last}"
    @classmethod
    def from_csv(cls, csv_name):
        first, last = csv_name.split(',')
        return cls(first, last)
Now we can use our comma-separated values directly!
dsp_1 = dsp_member.from_csv('Davi,Moreira')
dsp_1.full_name()
There is a third kind of method called a static method. Static methods do not operate on either the instance or the class; they are just simple functions. But we might want to include them in our class because they are somehow related to our class. They are defined using the @staticmethod decorator:
class dsp_member:
    role = "DSP member"
    campus = "Atlanta"
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
    def full_name(self):
        return f"{self.first} {self.last}"
    @classmethod
    def from_csv(cls, csv_name):
        first, last = csv_name.split(',')
        return cls(first, last)
    @staticmethod
    def is_quizweek(week):
        return True if week in [3, 5] else False
Note that the method is_quizweek() does not accept or use the self argument. But it is still DSP-related, so we might want to include it here.
dsp_1 = dsp_member.from_csv('Davi,Moreira')
print(f"Is week 1 a quiz week? {dsp_1.is_quizweek(1)}")
print(f"Is week 3 a quiz week? {dsp_1.is_quizweek(3)}")
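Because a static method needs neither an instance nor the class, it can also be called directly on the class without creating an object first. The sketch below uses a trimmed-down dsp_member that keeps only the static method; it also returns the result of the membership test directly, which is already a boolean.

```python
class dsp_member:
    @staticmethod
    def is_quizweek(week):
        # `week in [3, 5]` already evaluates to True or False
        return week in [3, 5]


# No instance is needed to call a static method.
print(dsp_member.is_quizweek(3))  # True
print(dsp_member.is_quizweek(4))  # False
```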
Decorators can be quite a complex topic; you can read more about them here. Briefly, they are what they sound like: they "decorate" functions/methods with additional functionality. You can think of a decorator as a function that takes another function and adds functionality.
Let's create a decorator as an example. Recall that functions are objects in Python, so they can be passed to other functions. So a decorator simply takes a function as an argument, adds some more functionality to it, and returns a "decorated function" that can be executed.
# some function we wish to decorate
def original_func():
    print("I'm the original function!")
# a decorator
def my_decorator(original_func):  # takes our original function as input
    def wrapper():  # wraps our original function with some extra functionality
        print(f"A decoration before {original_func.__name__}.")
        result = original_func()
        print(f"A decoration after {original_func.__name__}.")
        return result
    return wrapper  # returns the unexecuted wrapper function which we can execute later
The my_decorator() function will return to us a function which is the decorated version of our original function.
my_decorator(original_func)
As a function was returned to us, we can execute it by adding parentheses:
my_decorator(original_func)()
We can decorate any arbitrary function with our decorator:
def another_func():
print("I'm a different function!")
my_decorator(another_func)()
The syntax of calling our decorator is not that readable. Instead, we can use the @ symbol as "syntactic sugar" to improve readability and reusability of decorators:
@my_decorator
def one_more_func():
    print("One more function...")
one_more_func()
Okay, let's make something a little more useful. We will create a decorator that times the execution time of any arbitrary function:
import time  # import the time module, we'll learn about imports later
def timer(my_function):  # the decorator
    def wrapper():  # the added functionality
        t1 = time.time()
        result = my_function()  # the original function
        t2 = time.time()
        print(f"{my_function.__name__} ran in {t2 - t1:.3f} sec")  # print the execution time
        return result
    return wrapper
@timer
def silly_function():
    for i in range(10_000_000):
        if (i % 1_000_000) == 0:
            print(i)
        else:
            pass
silly_function()
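The wrapper above takes no arguments, so it only works for functions that take none. One common way to generalize it, sketched here as an assumption rather than as part of the original material, is to forward *args and **kwargs and to apply functools.wraps so the decorated function keeps its own name and docstring. The function sum_of_squares is a hypothetical example, and time.perf_counter is swapped in for time.time because it is intended for measuring short durations.

```python
import functools
import time


def timer(my_function):
    @functools.wraps(my_function)  # keep the original name and docstring
    def wrapper(*args, **kwargs):  # accept any positional/keyword arguments
        t1 = time.perf_counter()
        result = my_function(*args, **kwargs)
        t2 = time.perf_counter()
        print(f"{my_function.__name__} ran in {t2 - t1:.3f} sec")
        return result
    return wrapper


@timer
def sum_of_squares(n, offset=0):
    """Sum the squares of 0..n-1, plus an optional offset."""
    return sum(i ** 2 for i in range(n)) + offset


total = sum_of_squares(1_000, offset=5)
print(total)
print(sum_of_squares.__name__)  # sum_of_squares, thanks to functools.wraps
```

Without functools.wraps, the last line would print "wrapper", because the name sum_of_squares is rebound to the inner function.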
Python's built-in decorators like classmethod and staticmethod are coded in C.
Just like it sounds, inheritance allows us to "inherit" methods and attributes from another class. So far, we've been working with a dsp_member class. But let's get more specific and create a dsp_student and a dsp_instructor class. Recall this was dsp_member:
class dsp_member:
    role = "DSP member"
    campus = "Atlanta"
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
    def full_name(self):
        return f"{self.first} {self.last}"
    @classmethod
    def from_csv(cls, csv_name):
        first, last = csv_name.split(',')
        return cls(first, last)
    @staticmethod
    def is_quizweek(week):
        return True if week in [3, 5] else False
We can create a dsp_student class that inherits all of the attributes and methods from our dsp_member class by simply passing the dsp_member class as an argument to the dsp_student class definition:
class dsp_student(dsp_member):
    pass
student_1 = dsp_student('Juliana', 'Carnaval')
student_2 = dsp_student('Gabriel', 'Moreira')
print(student_1.full_name())
print(student_2.full_name())
What happened here is that our dsp_student instance first looked in the dsp_student class for an __init__ method, which it didn't find. It then looked for the __init__ method in the inherited dsp_member class and found something to use! This order is called the "method resolution order". We can inspect it directly using the help() function:
help(dsp_student)
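The output of help() is fairly long. As a more compact alternative, every class also exposes its method resolution order through the __mro__ attribute. The sketch below uses empty stand-in classes with the same names so that it runs on its own.

```python
class dsp_member:
    pass


class dsp_student(dsp_member):
    pass


# Classes are searched from left to right when looking up an attribute or method.
print([cls.__name__ for cls in dsp_student.__mro__])
# ['dsp_student', 'dsp_member', 'object']
```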
Okay, let's fine-tune our dsp_student class. The first thing we might want to do is change the role of the student instances to "DSP student". We can do that by simply adding a class attribute to our dsp_student class. Any attributes or methods not "over-ridden" in the dsp_student class will just be inherited from the dsp_member class.
class dsp_student(dsp_member):
    role = "DSP student"
student_1 = dsp_student('Juliana', 'Carnaval')
print(student_1.role)
print(student_1.campus)
print(student_1.full_name())
Now let's add an instance attribute to our class called grade. You might be tempted to do something like this:
class dsp_student(dsp_member):
    role = "DSP student"
    def __init__(self, first, last, grade):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
        self.grade = grade
student_1 = dsp_student('Juliana', 'Carnaval', 'B+')
print(student_1.email)
print(student_1.grade)
But this is not DRY code; remember that we've already typed most of this in our dsp_member class. So what we can do is let the dsp_member class handle our first and last arguments and we'll just worry about grade. We can do this easily with the super() function. Things can get pretty complicated with super() (you can read more here), but all you really need to know is that super() gives you access to the methods of the parent class, such as its __init__.
class dsp_student(dsp_member):
    role = "DSP student"
    def __init__(self, first, last, grade):
        super().__init__(first, last)
        self.grade = grade
student_1 = dsp_student('Juliana', 'Carnaval', 'B+')
print(student_1.email)
print(student_1.grade)
Amazing! Hopefully you can start to see how powerful inheritance can be. Let's create another subclass called dsp_instructor, which has two new methods add_course() and remove_course().
class dsp_instructor(dsp_member):
    role = "DSP instructor"
    def __init__(self, first, last, courses=None):
        super().__init__(first, last)
        self.courses = ([] if courses is None else courses)
    def add_course(self, course):
        self.courses.append(course)
    def remove_course(self, course):
        self.courses.remove(course)
instructor_1 = dsp_instructor('Davi', 'Moreira', ['210', '350', '100'])
print(instructor_1.full_name())
print(instructor_1.courses)
instructor_1.add_course('485')
instructor_1.remove_course('100')
instructor_1.courses
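The courses=None default in the constructor above may look odd. A likely reason for it is that default argument values in Python are created once, when the function is defined, so a default of [] would be shared between all instances that rely on it. The sketch below contrasts the two approaches; the class names shared_default and fresh_default are hypothetical.

```python
class shared_default:
    def __init__(self, courses=[]):    # one list object shared by every call
        self.courses = courses


class fresh_default:
    def __init__(self, courses=None):  # a new list is made for each instance
        self.courses = [] if courses is None else courses


a, b = shared_default(), shared_default()
a.courses.append('210')
print(b.courses)  # ['210'] because a and b share the same list

c, d = fresh_default(), fresh_default()
c.courses.append('210')
print(d.courses)  # []
```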
There's one more important topic to talk about with Python classes and that is getters/setters/deleters. The necessity for these actions is best illustrated by example. Here's a stripped down version of the dsp_member class from earlier:
class dsp_member:
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.email = first.lower() + "." + last.lower() + "@dsp.com"
    def full_name(self):
        return f"{self.first} {self.last}"
dsp_1 = dsp_member('David', 'Moreira')
print(dsp_1.first)
print(dsp_1.last)
print(dsp_1.email)
print(dsp_1.full_name())
Imagine that I mis-spelled the name of this class instance and wanted to correct it. Watch what happens...
dsp_1.first = 'Davi'
print(dsp_1.first)
print(dsp_1.last)
print(dsp_1.email)
print(dsp_1.full_name())
Uh oh... the email didn't update with the new first name! We didn't have this problem with the full_name() method because it just calls the current first and last name. You might think that the best thing to do here is to create a method for email() like we have for full_name(). But this is bad coding for a variety of reasons; for example, it means that users of your code will have to change every call to the email attribute to a call to the email() method. We'd call that a breaking change to our software and we want to avoid that where possible. What we can do instead is define our email like a method, but keep it as an attribute using the @property decorator.
class dsp_member:
    def __init__(self, first, last):
        self.first = first
        self.last = last
    def full_name(self):
        return f"{self.first} {self.last}"
    @property
    def email(self):
        return self.first.lower() + "." + self.last.lower() + "@dsp.com"
dsp_1 = dsp_member('David', 'Moreira')
dsp_1.first = 'Davi'
print(dsp_1.first)
print(dsp_1.last)
print(dsp_1.email)
print(dsp_1.full_name())
We could do the same with the full_name() method if we wanted to...
class dsp_member:
    def __init__(self, first, last):
        self.first = first
        self.last = last
    @property
    def full_name(self):
        return f"{self.first} {self.last}"
    @property
    def email(self):
        return self.first.lower() + "." + self.last.lower() + "@dsp.com"
dsp_1 = dsp_member('David', 'Moreira')
dsp_1.full_name
But what happens if we instead want to make a change to the full name now?
dsp_1.full_name = 'Dave Moreira'
We get an error... Our class instance doesn't know what to do with the value it was passed. Ideally, we'd like our class instance to use this full name information to update self.first and self.last. To handle this action, we need a setter, defined using the decorator @<attribute>.setter:
class dsp_member:
    def __init__(self, first, last):
        self.first = first
        self.last = last
    @property
    def full_name(self):
        return f"{self.first} {self.last}"
    @full_name.setter
    def full_name(self, name):
        first, last = name.split(' ')
        self.first = first
        self.last = last
    @property
    def email(self):
        return self.first.lower() + "." + self.last.lower() + "@dsp.com"
dsp_1 = dsp_member('David', 'Moreira')
dsp_1.full_name = 'Dave Moreira'
print(dsp_1.first)
print(dsp_1.last)
print(dsp_1.email)
print(dsp_1.full_name)
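One limitation of the setter above is that name.split(' ') only unpacks cleanly when the name contains exactly one space. A setter is also a natural place to validate input. The sketch below is one possible variation, not the course's implementation: it splits on the first space only and raises a ValueError for a single-word name. The name 'Juliana de Souza' is a made-up example.

```python
class dsp_member:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def full_name(self):
        return f"{self.first} {self.last}"

    @full_name.setter
    def full_name(self, name):
        parts = name.split(' ', maxsplit=1)  # split on the first space only
        if len(parts) != 2:
            raise ValueError("Expected a first and last name separated by a space.")
        self.first, self.last = parts


dsp_1 = dsp_member('David', 'Moreira')
dsp_1.full_name = 'Juliana de Souza'  # hypothetical multi-word last name
print(dsp_1.first)  # Juliana
print(dsp_1.last)   # de Souza
```

Assigning a value such as 'Madonna' would raise the ValueError instead of failing with a less informative unpacking error.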
Almost there! We've talked about getting information and setting information, but what about deleting information? This is typically used to do some clean up and is defined with the @<attribute>.deleter decorator. I rarely use this method but I want you to see it:
class dsp_member:
    def __init__(self, first, last):
        self.first = first
        self.last = last
    @property
    def full_name(self):
        return f"{self.first} {self.last}"
    @full_name.setter
    def full_name(self, name):
        first, last = name.split(' ')
        self.first = first
        self.last = last
    @full_name.deleter
    def full_name(self):
        print('Name deleted!')
        self.first = None
        self.last = None
    @property
    def email(self):
        return self.first.lower() + "." + self.last.lower() + "@dsp.com"
dsp_1 = dsp_member('David', 'Moreira')
delattr(dsp_1, "full_name")
print(dsp_1.first)
print(dsp_1.last)