Introduction
This topic material is based on Professor Mike Gelbart Algorithms and Data Structures course. It was adapted for our purposes.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict, Counter
Exercise: tricky recursive code¶
Explain what the following code does, and how it works:
def f(letters, n):
"""
Does something mysterious.
Parameters
----------
letters : str
?????
n : int
?????
Returns
-------
???
?????
"""
if n == 0:
return [""]
return [letter + l for letter in letters for l in f(letters, n-1)]
f("QTM!", 1)
Answer:
Exercise: Set implementation with BSTs¶
In this exercise, you will implement a set data structure based on a binary search tree. You will write the tree as a Python class. We are providing some starter code for you below.
¶
Implement a recursive function insert
that takes a new element and inserts it into the tree. Your function should work by recursively calling insert
on the left or right subtree depending on whether the new value is less than or greater than the tree's value, respectively. If the element is already in the tree, then the call to insert
should do nothing.
Hint: When inserting something into the tree, you should be creating a new TreeSet
object with TreeSet()
, then inserting the value into this newly created TreeSet
, and then making sure this new TreeSet
is stored in your current TreeSet
as either self.left
or self.right
.
class TreeSet:
"""
A set implementation based on a binary tree.
"""
def __init__(self):
self.value = None
self.left = None
self.right = None
def contains(self, value):
"""
Check to see if the binary tree has a certain value
Parameters
----------
value : object
the value to search for within the tree
Returns
-------
bool
if contained in the tree returns True, otherwise False
Example
--------
>>> my_set = TreeSet()
>>> my_set.insert("Attempt")
>>> my_set.contains("Failure")
False
"""
if value == self.value:
return True
if value < self.value:
if self.left is None:
return False
else:
return self.left.contains(value)
else:
if self.right is None:
return False
else:
return self.right.contains(value)
def __str__(self, s=""):
"""
A crude way to print the tree. A better way would be to print the tree by depth.
Note: __str__ is a special method, like __init__, that returns a string representation of an object.
Parameters
----------
s : str
the starting string value. Default is empty string
Returns
-------
str
aggregated items in the set
Example
--------
>>> my_set = TreeSet()
>>> my_set.insert("Try")
>>> my_set.insert("your")
>>> my_set.insert("best")
>>> print(my_set)
Try, your, best,
"""
if self.value is None:
return "(An empty tree)"
if self.left is not None:
s += self.left.__str__()
s += str(self.value) + ", "
if self.right is not None:
s += self.right.__str__()
return s
my_set = TreeSet()
my_set.insert("today")
my_set.insert("hello")
my_set.insert("data science")
my_set.insert("jerry")
my_set.insert("apple")
my_set.insert("17")
my_set.insert("hello")
print(my_set)
assert my_set.contains("data science")
assert my_set.contains("apple")
assert not my_set.contains("18")
assert not my_set.contains("blah")
my_set = TreeSet()
my_set.insert(3)
my_set.insert(5)
my_set.insert(10)
print(my_set)
Answer:
¶
In this topic, we empirically timed the searching operation using four approaches:
- Linear search on an unsorted list
- Binary search on an sorted list
- Python's
in
operator on an unsorted list in
with Python's built-inset
Similar code to that from lecture, for just Python's set
, is reproduced below for your convenience:
list_sizes = [100, 1000, 10_000, 100_000, 1_000_000]
results = defaultdict(list)
results["size"] = list_sizes
key = -1
x_all = np.random.randint(1e8, size=max(list_sizes))
for list_size in list_sizes:
print('List size: ', list_size)
x = x_all[:list_size]
x_set = set(x)
time = %timeit -q -o -r 1 (key in x_set)
results["Python set in"].append(time.average)
df = pd.DataFrame(results)
df
Empirically measure the speed of contains
with your TreeSet
implementation, and then add them to the DataFrame for printing. Print out the DataFrame.
(Note: for reasons of speed, we only go up to $n=10^6$ here. Populating the TreeSet
objects with $10^7$ items would take a long time.
Answer:
¶
Discuss your results from the previous part. How do Python's set
and your TreeSet
compare? Specifically:
- Which method is faster?
- What is the theoretical time complexity of
in
with aset
, andcontains
with aTreeSet
? - Are the empirical results consistent with the theoretical time complexities?
- Are the results what you expected, overall?
Answer:
!jupyter nbconvert _04-py-recursive-data-structures-practice.ipynb --to html --template classic --output 04-py-recursive-data-structures-practice.html