Introduction


The material assumes no prior knowledge of Python. Experience with programming concepts or another programming language will help, but is not required to understand the material.


This topic material is based on the Python Programming for Data Science book and adapted for our purposes in the course.

Basic Python Data Types


Common built-in Python data types

| English name | Type name | Type category | Description | Example |
|---|---|---|---|---|
| integer | int | Numeric Type | positive/negative whole numbers | 42 |
| floating point number | float | Numeric Type | real number in decimal form | 3.14159 |
| boolean | bool | Boolean Values | true or false | True |
| string | str | Sequence Type | text | "I Can Has Cheezburger?" |
| list | list | Sequence Type | a collection of objects - mutable & ordered | ['Ali', 'Xinyi', 'Miriam'] |
| tuple | tuple | Sequence Type | a collection of objects - immutable & ordered | ('Thursday', 6, 9, 2018) |
| dictionary | dict | Mapping Type | mapping of key-value pairs | {'name':'QTM', 'code':350, 'credits':4} |
| none | NoneType | Null Object | represents no value | None |

Numeric data types

A value is a piece of data that a computer program works with, such as a number or text. There are different types of values: 42 is an integer and "Hello!" is a string. A variable is a name that refers to a value. In mathematics and statistics, we usually use variable names like $x$ and $y$. In Python, we can use almost any word as a variable name, as long as it starts with a letter or an underscore and contains only letters, digits, and underscores. However, it must not be a reserved word in Python such as for, while, class, lambda, etc., as these words encode special functionality in Python that we can't overwrite!

It can be helpful to think of a variable as a box that holds some information (a single number, a vector, a string, etc). We use the assignment operator = to assign a value to a variable.

Image modified from: medium.com

Tip: See the Python 3 documentation for a summary of the standard built-in Python datatypes.

There are three distinct numeric types: integers, floating point numbers, and complex numbers (not covered here). We can determine the type of an object in Python using type(). We can print the value of the object using print().

In [ ]:
x = 42
In [ ]:
type(x)
In [ ]:
print(x)

In Jupyter/IPython (an interactive version of Python), the last line of a cell will automatically be printed to screen so we don't actually need to explicitly call print().

In [ ]:
x  # Anything after the pound/hash symbol is a comment and will not be run
In [ ]:
pi = 3.14159
pi
In [ ]:
type(pi)

Arithmetic Operators

Below is a table of the syntax for common arithmetic operations in Python:

Operator Description
+ addition
- subtraction
* multiplication
/ division
** exponentiation
// integer division / floor division
% modulo

Let's have a go at applying these operators to numeric types and observe the results.

In [ ]:
1 + 2 + 3 + 4 + 5  # add
In [ ]:
2 * 3.14159  # multiply
In [ ]:
2 ** 10  # exponent

Division may produce a different data type than expected: the / operator always returns a float, even when both operands are int.

In [ ]:
int_2 = 2
type(int_2)
In [ ]:
int_2 / int_2  # division
In [ ]:
type(int_2 / int_2)

But the syntax // allows us to do "integer division" (aka "floor division") and retain the int data type. It always rounds down.

In [ ]:
101 / 2
In [ ]:
101 // 2  # "floor division" - always rounds down

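We refer to this as "integer division" or "floor division" because the result is rounded down to the nearest integer, i.e., it is "floored". For positive numbers this matches calling int on the result of a division. For negative results the two differ, because int truncates toward zero while // rounds toward negative infinity.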

In [ ]:
int(101 / 2)
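To make the difference between flooring and truncating concrete, here is a minimal sketch comparing // with int() on a positive and a negative division. The variable names are only illustrative.

```python
import math

positive_floor = 7 // 2        # floor division
positive_trunc = int(7 / 2)    # true division, then truncate

negative_floor = -7 // 2       # rounds down, toward negative infinity
negative_trunc = int(-7 / 2)   # truncates toward zero

print(positive_floor, positive_trunc)   # the two agree for positive results
print(negative_floor, negative_trunc)   # and differ for negative results
print(math.floor(-7 / 2))               # math.floor agrees with //
```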

The % "modulo" operator gives us the remainder after division.

In [ ]:
100 % 2  # "100 mod 2", or the remainder when 100 is divided by 2
In [ ]:
101 % 2  # "101 mod 2", or the remainder when 101 is divided by 2
In [ ]:
100.5 % 2
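Floor division and modulo belong together: the quotient and the remainder always rebuild the original number. The sketch below checks that relationship and also uses the built-in divmod(), which returns both values at once.

```python
a, b = 101, 2

quotient = a // b
remainder = a % b

# divmod() returns (quotient, remainder) as a tuple
both = divmod(a, b)

# The two operators satisfy: a == quotient * b + remainder
reconstructed = quotient * b + remainder

print(quotient, remainder, both, reconstructed)
```

A common use of % is testing whether a number is even: n % 2 == 0.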

None

NoneType is its own type in Python. It only has one possible value, None, which represents an object with no value. We'll see it again later.

In [ ]:
x = None
In [ ]:
print(x)
In [ ]:
type(x)

Strings

Text is stored as a data type called a string. We can think of a string as a sequence of characters.

Tip: Strictly speaking, a string is a sequence of Unicode code points. Here's a great blog post on Unicode if you're interested: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

We write strings as characters enclosed with either:

  • single quotes, e.g., 'Hello'
  • double quotes, e.g., "Goodbye"

There's no difference between the two methods, but there are cases where having both is useful (more on that below)! We also have triple double quotes, which are typically used for function documentation (more on that later), e.g., """This function adds two numbers""".

In [ ]:
my_name = "Davi Moreira"
In [ ]:
my_name
In [ ]:
type(my_name)
In [ ]:
course = 'QTM 350'
In [ ]:
course
In [ ]:
type(course)

If the string contains a quotation or apostrophe, we can use a combination of single and double quotes to define the string.

In [ ]:
sentence = "It's a rainy day."
In [ ]:
sentence
In [ ]:
type(sentence)
In [ ]:
quote = 'Donald Knuth: "Premature optimization is the root of all evil."'
In [ ]:
quote
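When a string needs both kinds of quotes, mixing single and double quotes is no longer enough. Two options that should work are escaping a quote with a backslash, or wrapping the text in triple quotes. A short sketch, with made-up variable names:

```python
escaped = 'It\'s a rainy day.'       # the backslash escapes the apostrophe
same_sentence = "It's a rainy day."

# Triple quotes (single or double) can contain both kinds of quote
both_quotes = '''She said, "It's a rainy day."'''

print(escaped == same_sentence)
print(both_quotes)
```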

Boolean

The Boolean (bool) type has two values: True and False.

In [ ]:
the_truth = True
In [ ]:
the_truth
In [ ]:
type(the_truth)
In [ ]:
lies = False
In [ ]:
lies
In [ ]:
type(lies)

Comparison Operators

We can compare objects using comparison operators, and we'll get back a Boolean result:

Operator Description
x == y is x equal to y?
x != y is x not equal to y?
x > y is x greater than y?
x >= y is x greater than or equal to y?
x < y is x less than y?
x <= y is x less than or equal to y?
x is y is x the same object as y?
In [ ]:
2 < 3
In [ ]:
"Deep learning" == "Solve all the world's problems"
In [ ]:
2 != "2"
In [ ]:
2 is 2  # True, but Python 3.8+ emits a SyntaxWarning for "is" with a literal; prefer ==
In [ ]:
2 == 2.0
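The difference between == (same value) and is (same object) is easier to see with lists than with numbers. A minimal sketch, using illustrative names a, b and c:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same contents
c = a           # a second name for the very same list

print(a == b)   # equal values
print(a is b)   # but two different objects
print(a is c)   # c and a refer to one object
```

In practice, is is mostly used for comparisons with None (e.g. x is None), and == for everything else.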

Boolean Operators

We also have so-called "boolean operators", which also evaluate to either True or False:

Operator Description
x and y are x and y both True?
x or y is at least one of x and y True?
not x is x False?
In [ ]:
True and True
In [ ]:
True and False
In [ ]:
True or False
In [ ]:
False or False
In [ ]:
("Python 2" != "Python 3") and (2 <= 3)
In [ ]:
True
In [ ]:
not True
In [ ]:
not not True

Note: Python also has bitwise operators like & and |. Bitwise operators literally compare the bits of two integers.

In [ ]:
print(f"Bit representation of the number 5: {5:0b}")
print(f"Bit representation of the number 4: {4:0b}")
print(f"                                    ↓↓↓")
print(f"                                    {5 & 4:0b}")
print(f"                                     ↓ ")
print(f"                                     {5 & 4}")
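As a further sketch of the same idea, the cell below applies & and | to the numbers 6 and 3 (chosen arbitrarily for illustration) and contrasts them with the boolean operator and, which works on truth values rather than on individual bits.

```python
a, b = 6, 3          # binary 110 and 011

bit_and = a & b      # keeps the bits set in both numbers
bit_or = a | b       # keeps the bits set in either number

print(f"{a:03b} & {b:03b} = {bit_and:03b} ({bit_and})")
print(f"{a:03b} | {b:03b} = {bit_or:03b} ({bit_or})")

# Boolean operators combine whole truth values instead
both_positive = (a > 0) and (b > 0)
print(both_positive)
```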

Casting

Sometimes we need to explicitly cast a value from one type to another. We can do this using functions like str(), int(), and float(). Python tries to do the conversion, or throws an error if it can't.

In [ ]:
x = 5.0
type(x)
In [ ]:
x = int(5.0)
x
In [ ]:
type(x)
In [ ]:
x = str(5.0)
x
In [ ]:
type(x)
In [ ]:
str(5.0) == 5.0
In [ ]:
int(5.3)
In [ ]:
float("hello")
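The conversions above can be collected into one cell. This sketch uses a try/except block, which this section has not introduced, only so that the failing conversion does not stop the cell; the variable names are illustrative.

```python
whole = int("42")         # string of digits -> int
decimal = float("3.5")    # numeric string -> float
text = str(42)            # int -> string
truncated = int(5.9)      # float -> int drops the fractional part

try:
    float("hello")        # not a numeric string
except ValueError as err:
    message = str(err)

print(whole, decimal, text, truncated)
print(message)
```

Note that int() on a float does not round: it simply discards everything after the decimal point.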

Lists and Tuples


Lists and tuples allow us to store multiple things ("elements") in a single object. The elements are ordered (we'll explore what that means a little later). We'll start with lists. Lists are defined with square brackets [].

In [ ]:
my_list = [1, 2, "THREE", 4, 0.5]
In [ ]:
my_list
In [ ]:
type(my_list)

Lists can hold any datatype - even other lists!

In [ ]:
another_list = [1, "two", [3, 4, "five"], True, None, {"key": "value"}]
another_list

You can get the length of the list with the function len():

In [ ]:
len(my_list)

Tuples look similar to lists but have a key difference (they are immutable - but more on that a bit later). They are defined with parentheses ().

In [ ]:
today = (1, 2, "THREE", 4, 0.5)
In [ ]:
today
In [ ]:
type(today)
In [ ]:
len(today)

Indexing and Slicing Sequences

We can access values inside a list, tuple, or string using square bracket syntax. Python uses zero-based indexing, which means the first element of the list is in position 0, not position 1.

In [ ]:
my_list
In [ ]:
my_list[0]
In [ ]:
my_list[2]
In [ ]:
len(my_list)
In [ ]:
my_list[5]  # IndexError: the list has 5 elements, so valid indices are 0 to 4

We can use negative indices to count backwards from the end of the list.

In [ ]:
my_list
In [ ]:
my_list[-1]
In [ ]:
my_list[-2]

We can use the colon : to access a sub-sequence. This is called "slicing".

In [ ]:
my_list[1:3]

Note from the above that the start of the slice is inclusive and the end is exclusive. So my_list[1:3] fetches elements 1 and 2, but not 3.

Strings behave the same as lists and tuples when it comes to indexing and slicing. Remember, we think of them as a sequence of characters.

In [ ]:
alphabet = "abcdefghijklmnopqrstuvwxyz"
In [ ]:
alphabet[0]
In [ ]:
alphabet[-1]
In [ ]:
alphabet[-3]
In [ ]:
alphabet[:5]
In [ ]:
alphabet[12:20]
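Two consequences of the "inclusive start, exclusive end" rule are worth checking by hand: a slice [a:b] has b - a elements, and splitting a sequence at any index gives two pieces that glue back together exactly. A minimal sketch (the tuple numbers is made up for illustration):

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

middle = alphabet[12:20]    # indices 12 through 19
print(middle, len(middle))

# Splitting at index k and joining the parts restores the original
k = 5
rejoined = alphabet[:k] + alphabet[k:]
print(rejoined == alphabet)

# The same rules apply to lists and tuples
numbers = (10, 20, 30, 40, 50)
print(numbers[1:3], numbers[-2:])
```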

List Methods

A list is an object and it has methods for interacting with its data. A method is like a function in that it performs some operation with the data, but it differs from a function in that it is defined on the object itself and accessed using a period (.). For example, my_list.append(item) appends an item to the end of the list called my_list. You can see the documentation for more list methods.

In [ ]:
primes = [2, 3, 5, 7, 11]
primes
In [ ]:
len(primes)
In [ ]:
primes.append(13)
In [ ]:
primes
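A few other list methods follow the same pattern as append. The sketch below is not exhaustive; it just shows extend, pop, insert and remove on the primes list from above.

```python
primes = [2, 3, 5, 7, 11]

primes.append(13)          # add one item to the end
primes.extend([17, 19])    # add several items from another list
last = primes.pop()        # remove and return the last item
primes.insert(0, 1)        # insert the value 1 at position 0
primes.remove(1)           # remove the first occurrence of the value 1

print(primes, last)
```

These methods change the list in place, which is why we don't assign their result back to primes.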

Sets

Another built-in Python data type is the set, which stores an unordered collection of unique items. Being unordered, sets do not record element position or order of insertion and so do not support indexing.

In [ ]:
s = {2, 3, 5, 11}
s
In [ ]:
{1, 2, 3} == {3, 2, 1}
In [ ]:
[1, 2, 3] == [3, 2, 1]
In [ ]:
s.add(2)  # does nothing
s
In [ ]:
s[0]

Above: throws an error because the elements of a set are not ordered and can't be indexed.
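A typical use of sets is removing duplicates and testing membership. The sketch below uses a made-up list of names; if we do need positions afterwards, one option is to convert the set to a sorted list.

```python
visits = ["Ali", "Xinyi", "Ali", "Miriam", "Xinyi"]

unique_visitors = set(visits)       # duplicates are dropped
print(len(visits), len(unique_visitors))

print("Ali" in unique_visitors)     # membership test

# Sets can't be indexed, but a sorted list built from one can
ordered = sorted(unique_visitors)
print(ordered[0])
```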

Mutable vs. Immutable Types

Strings and tuples are immutable types, which means they can't be modified. Lists are mutable and we can assign new values to their entries. This is the main difference between lists and tuples.

In [ ]:
names_list = ["Indiana", "Fang", "Linsey"]
names_list
In [ ]:
names_list[0] = "Cool guy"
names_list
In [ ]:
names_tuple = ("Indiana", "Fang", "Linsey")
names_tuple
In [ ]:
names_tuple[0] = "Not cool guy"

Same goes for strings. Once defined, we cannot modify the characters of the string.

In [ ]:
my_name = "Tom"
In [ ]:
my_name[-1] = "q"
In [ ]:
x = ([1, 2, 3], 5)
In [ ]:
x[1] = 7
In [ ]:
x
In [ ]:
x[0][1] = 4
In [ ]:
x
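The last few cells show a subtle point: a tuple is immutable, but it can still hold a mutable object such as a list. The tuple's slots can't be reassigned, while the list inside can be changed. The same steps in one cell, with try/except used only so the cell runs to the end:

```python
x = ([1, 2, 3], 5)

try:
    x[1] = 7             # reassigning a tuple slot is not allowed
except TypeError as err:
    print(f"TypeError: {err}")

x[0][1] = 4              # the list inside the tuple is still mutable
print(x)
```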

String Methods


There are various useful string methods in Python.

In [ ]:
all_caps = "HOW ARE YOU TODAY?"
all_caps
In [ ]:
new_str = all_caps.lower()
new_str

Note that the method lower doesn't change the original string but rather returns a new one.

In [ ]:
all_caps

There are many string methods. Check out the documentation.

In [ ]:
all_caps.split()
In [ ]:
all_caps.count("O")

One can explicitly cast a string to a list:

In [ ]:
caps_list = list(all_caps)
caps_list
In [ ]:
"".join(caps_list)
In [ ]:
"-".join(caps_list)

We can also chain multiple methods together (more on this when we get to NumPy and Pandas):

In [ ]:
"".join(caps_list).lower().split(" ")

String formatting

Python has ways of creating strings by "filling in the blanks" and formatting them nicely. This is helpful when you want to print statements that include variables or expressions. There are a few ways of doing this, but I use and recommend f-strings, which were introduced in Python 3.6. All you need to do is put the letter "f" at the front of your string, and then you can include variables with curly-bracket notation {}.

In [ ]:
name = "Newborn Baby"
age = 4 / 12
day = 20
month = 3
year = 2020
template_new = f"Hello, my name is {name}. I am {age:.2f} years old. I was born {day}/{month:02}/{year}."
template_new

Note: In the code above, the notation after the colon in the curly braces is for formatting. For example, :.2f means "print this variable with 2 decimal places", and :02 pads the number with leading zeros to a width of 2. See format code options here.
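A few more format codes, as a sketch rather than a complete list. The values (price, share, month) are made up for illustration.

```python
price = 2499999
share = 0.256
month = 3

price_text = f"{price:,}"       # thousands separator
share_text = f"{share:.1%}"     # percentage with 1 decimal place
month_text = f"{month:02}"      # zero-padded to width 2
padded_text = f"{share:8.3f}"   # total width 8, 3 decimal places

print(price_text, share_text, month_text, padded_text)
```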

Dictionaries


A dictionary is a mapping of key-value pairs and is defined with curly brackets:

In [ ]:
house = {
    "bedrooms": 3,
    "bathrooms": 2,
    "city": "West Lafayette",
    "price": 2499999,
    "date_sold": (1, 3, 2015),
}

condo = {
    "bedrooms": 2,
    "bathrooms": 1,
    "city": "Atlanta",
    "price": 699999,
    "date_sold": (27, 8, 2011),
}

We can access a specific field of a dictionary with square brackets:

In [ ]:
house["price"]
In [ ]:
condo["city"]

We can also edit dictionaries (they are mutable):

In [ ]:
condo["price"] = 5  # price already in the dict
condo
In [ ]:
condo["flooring"] = "wood"
In [ ]:
condo

We can also delete fields entirely (though I rarely use this):

In [ ]:
del condo["city"]
In [ ]:
condo

And we can easily add fields:

In [ ]:
condo[5] = 443345
In [ ]:
condo

Keys may be any immutable (more precisely, hashable) data type, even a tuple!

In [ ]:
condo[(1, 2, 3)] = 777
condo

You'll get an error if you try to access a non-existent key:

In [ ]:
condo["not-here"]
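If a key might be missing, two ways to avoid the KeyError are the in operator and the dictionary method get(). A minimal sketch with a trimmed-down version of the condo dictionary:

```python
condo = {"bedrooms": 2, "bathrooms": 1, "price": 699999}

has_city = "city" in condo                       # membership test on keys
city = condo.get("city")                         # None instead of a KeyError
city_or_default = condo.get("city", "unknown")   # custom fallback value

print(has_city, city, city_or_default)
```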

Empties


Sometimes you'll want to create empty objects that will be filled later on.

In [ ]:
lst = list()  # empty list
lst
In [ ]:
lst = []  # empty list
lst

There's no real difference between the two methods above, although [] is reportedly marginally faster.

In [ ]:
tup = tuple()  # empty tuple
tup
In [ ]:
tup = ()  # empty tuple
tup
In [ ]:
dic = dict()  # empty dict
dic
In [ ]:
dic = {}  # empty dict
dic
In [ ]:
st = set()  # empty set
st
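Notice that there is no literal for an empty set: empty curly braces always create a dictionary, so set() is the only way to make an empty set. A quick check:

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))   # dict, not set
print(type(empty_set))

empty_set.add(2)            # now the set has one element
print(empty_set)
```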

Conditionals


Conditional statements allow us to write programs where only certain blocks of code are executed depending on the state of the program. Let's look at some examples and take note of the keywords, syntax and indentation.

In [ ]:
name = "Davi"

if name.lower() == "davi":
    print("That's my name too!")
elif name.lower() == "gabriel":
    print("That's a funny name.")
else:
    print(f"Hello {name}! That's a cool name!")
print("Nice to meet you!")

The main points to notice:

  • Use keywords if, elif and else
  • The colon : ends each conditional expression
  • Indentation (by convention, 4 spaces) defines code blocks
  • In an if statement, the first block whose conditional statement returns True is executed and the program exits the if block
  • if statements don't necessarily need elif or else
  • elif lets us check several conditions
  • else lets us evaluate a default block if all other conditions are False
  • the end of the entire if statement is where the indentation returns to the same level as the first if keyword

If statements can also be nested inside of one another:

In [ ]:
name = "Super Davi"

if name.lower() == "davi":
    print("That's my name too!")
elif name.lower() == "gabriel":
    print("That's a funny name.")
else:
    print(f"Hello {name}! That's a cool name.")
    if name.lower().startswith("super"):
        print("Do you really have superpowers?")

print("Nice to meet you!")

Inline if/else


We can write simple if statements "inline", i.e., in a single line, for simplicity.

In [ ]:
words = ["the", "list", "of", "words"]

x = "long list" if len(words) > 10 else "short list"
x
In [ ]:
if len(words) > 10:
    x = "long list"
else:
    x = "short list"
In [ ]:
x

Truth Value Testing

Any object can be tested for "truth" in Python, for use in if and while statements.

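  • True values: every object is considered true unless it is one of the false values listed below (more precisely, unless its class defines __bool__() to return False or __len__() to return 0)
  • False values: None, False, zero of any numeric type (0, 0.0), and empty sequences and collections: '', (), [], {}, set()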

Tip: Read more in the docs here.

In [ ]:
x = 1

if x:
    print("I'm truthy!")
else:
    print("I'm falsey!")
In [ ]:
x = False

if x:
    print("I'm truthy!")
else:
    print("I'm falsey!")
In [ ]:
x = []

if x:
    print("I'm truthy!")
else:
    print("I'm falsey!")
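We can also check truthiness directly with the built-in bool(), without writing an if statement. The sketch below groups a few values that should be falsy and a few that are truthy even though they may look "empty" at first glance.

```python
# All of these are falsy
falsy_results = (bool(None), bool(0), bool(0.0), bool(""), bool([]), bool({}))

# All of these are truthy: non-zero numbers and non-empty containers,
# even when their contents look "false"
truthy_results = (bool(1), bool(-1), bool("False"), bool([0]), bool(" "))

print(falsy_results)
print(truthy_results)
```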

Short-circuiting

Python supports a concept known as "short-circuiting". This means that Python automatically stops evaluating a boolean operation as soon as the truth value of the expression has been determined.

In [ ]:
fake_variable  # not defined
In [ ]:
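True or fake_variable  # True: fake_variable is never evaluated
In [ ]:
True and fake_variable  # NameError: fake_variable has to be evaluated
In [ ]:
False and fake_variable  # False: fake_variable is never evaluated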
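
The following table summarizes how or and and short-circuit:

| Expression | Result | Detail |
|---|---|---|
| A or B | If A is truthy then A else B | B is only evaluated if A is falsy |
| A and B | If A is falsy then A else B | B is only evaluated if A is truthy |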
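
To see short-circuiting without relying on an undefined variable, the sketch below uses a hypothetical helper, record(), that notes every time it is actually called. It also shows that or and and return one of their operands, not necessarily a bool.

```python
calls = []

def record(value):
    # Hypothetical helper: remembers that it was evaluated
    calls.append(value)
    return value

result_or = True or record("a")        # right side skipped
result_and = False and record("b")     # right side skipped
result_called = False or record("c")   # right side evaluated

# or/and hand back one of their operands
name = "" or "anonymous"
first_falsy = 0 and "never reached"

print(result_or, result_and, result_called)
print(calls)
print(name, first_falsy)
```

The name = "" or "anonymous" pattern is a common way to supply a default when a value is empty.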

Thank you!