Topic 1: Course Setup
About
This document is a guide to the course infrastructure.[1]
Introduction
Throughout the semester, we will use a combination of tools. This is a summary of the main ones:
Command line: primarily to interact with Git, install programs, and run a few scripts
Python3: for programming tasks
Git/GitHub: for version control and reproducibility, for sharing materials, and for your problem sets and final project
Jupyter Notebooks: as the main IDE for working with Python
Quarto via RStudio: as a secondary IDE (or your primary one, if you prefer) for coding in Python, R, and SQL
Google Colab: a free cloud service hosted by Google that lets anyone write and execute arbitrary Python code in the browser. It is particularly useful for machine learning and data analysis.
Let’s cover how to set up each of these tools on your local machine.
If you run into issues, please reach out to the Teaching Assistant.
Command Line
At times, we’ll use a Unix-based command line. The command line will feature in our discussion of using Git and also in running Python programs. If you use a Mac or a Linux operating system, a functioning command line comes with your operating system. On Apple machines, this is the Terminal.
For Windows (specifically Windows 10), you can enable the Linux Bash shell. The following offers a tutorial on how to do this.
If you’re using a version of Windows that predates Windows 10, Git Bash provides a program that allows you to use git commands from your Windows machine.
We will cover some concepts of working with the command line. You can find a full notebook with an introduction to the command line in the materials for Topic 2.
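Once your command line is set up, one quick way to confirm that git is reachable is to ask Python's standard library where the executable lives. This is a minimal sketch that assumes Python3 is already installed (covered in the Python3 section); `shutil.which` returns the full path of a program found on your PATH, or `None` if it is missing. The list of programs checked (`git`, `bash`) is an illustrative choice, not a course requirement.

```python
import shutil

# Programs to look for on the PATH; this list is illustrative, adjust as needed.
PROGRAMS = ["git", "bash"]


def find_programs(names):
    """Map each program name to its full path on the PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_programs(PROGRAMS).items():
        status = path if path is not None else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```

If `git` reports NOT FOUND, the shell you are using cannot see it yet; on Windows this usually means running the check from Git Bash or the Linux Bash shell instead.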
Python3
We’ll use Python3 throughout this course. You can install Python3 using a command-line package manager (Homebrew for Mac, Chocolatey for Windows).
An alternative way to install Python3 is to download the Anaconda distribution. I will use pip rather than conda in the instructions for installing Python modules. These are simply two ways of downloading and managing open-source software packages; choose whichever works best for you.
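If you are unsure whether the Python you are running came from Anaconda, the sketch below applies a common heuristic (an assumption, not an official conda API): conda environments contain a `conda-meta` folder inside the interpreter's prefix directory, `sys.prefix`. The function name `looks_like_conda` is hypothetical.

```python
import sys
from pathlib import Path


def looks_like_conda(prefix: str = sys.prefix) -> bool:
    """Heuristic only: conda environments keep a 'conda-meta' folder in their prefix."""
    return (Path(prefix) / "conda-meta").is_dir()


if __name__ == "__main__":
    manager = "conda (Anaconda/Miniconda)" if looks_like_conda() else "likely pip-managed"
    print(f"Interpreter prefix: {sys.prefix}")
    print(f"Package manager: {manager}")
```

Knowing which one you have helps you pick between `pip install ...` and `conda install ...` when following later instructions.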
Many computers already have Python3 installed. You can check whether yours does from the command line:
python3 --version
On some versions of Windows, you may need to use py instead of python3:
py --version
In either case, the output of this command should look something like Python 3.8.5.
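The same check can be made from inside Python, which is handy in a notebook where you cannot easily see which interpreter is running. This minimal sketch uses the standard `sys` and `platform` modules; the only requirement it tests is that the major version is 3, since the course uses Python3 and no minimum minor version is stated here. The helper name `is_python3` is hypothetical.

```python
import platform
import sys


def is_python3() -> bool:
    """Return True when the running interpreter is Python 3.x."""
    return sys.version_info.major == 3


if __name__ == "__main__":
    # platform.python_version() returns a string such as "3.8.5"
    print(f"Python version: {platform.python_version()}")
    print(f"Executable: {sys.executable}")
    print("OK: Python 3 detected" if is_python3() else "Warning: this is not Python 3")
```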
Jupyter Notebooks
Once you have Python3 on your computer, you can install Jupyter Notebook. If you installed Python3 using Anaconda, Jupyter Notebook comes with the distribution and requires no further installation on your part. If you are not using Anaconda, you can install Jupyter Notebook by running the following command in your command line:
# on your command line
pip install jupyter
You can then launch Jupyter Notebook from the command line by typing:
# on your command line
jupyter notebook
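To confirm that the installation worked without launching a server, you can ask Python whether the relevant modules can be imported. This sketch uses `importlib.util.find_spec`, which returns `None` for top-level modules that are not installed. It assumes that `pip install jupyter` provides the `notebook` and `jupyter_core` modules, which is how the `jupyter` metapackage is normally packaged; `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Modules assumed to be installed by `pip install jupyter`.
REQUIRED_MODULES = ["notebook", "jupyter_core"]


def missing_modules(names):
    """Return the module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Jupyter Notebook appears to be installed.")
```

Run it with the same `python3` (or `py`) you used for `pip install`; a mismatch between the two is a common reason Jupyter seems to be missing.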
Workflow for working with Jupyter Notebooks from the command line:
Open the terminal.
Navigate (using cd) to the folder you want to be the root of your Jupyter notebook.
Open the notebook (jupyter notebook).
For example, this is how I would open a notebook in the folder I have for this course:
# open terminal
cd qtm_350
jupyter notebook
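The folder you start from matters because relative file paths in your notebooks resolve against the kernel's working directory, which in Jupyter is typically the folder containing the notebook. A minimal sketch you could run in a notebook cell to check this, using only `pathlib`; the helper name `describe_working_directory` is hypothetical.

```python
from pathlib import Path


def describe_working_directory() -> str:
    """Describe the current working directory and list what it contains."""
    cwd = Path.cwd()
    entries = sorted(p.name for p in cwd.iterdir())
    return f"Working directory: {cwd}\nContains: {entries}"


if __name__ == "__main__":
    # For a notebook saved directly in qtm_350, the first line should end in qtm_350.
    print(describe_working_directory())
```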
Workflow with Anaconda:
If you installed Python using the Anaconda distribution (available here: https://www.anaconda.com/products/individual), you can open Jupyter through a point-and-click interface.
The lecture notes also include an introduction to Jupyter Notebook.
RStudio + Reticulate | Quarto
In your classes that focus on R, RStudio will be your main IDE. However, RStudio is not just for R: it can handle a number of different languages. We can use Python in RStudio through the reticulate package.
Check the introduction to Quarto notebook to see how to use Python in RStudio. Here we cover some of the installation steps:
To install RStudio, download it from the following link. reticulate is an R package that lets you run a Python REPL in the R console. It also lets you read in and use Python code, and pass data between R and Python. The following provides instructions on installing reticulate.
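When several Python installations coexist (for example, one from Anaconda and one from Homebrew), reticulate may pick up a different interpreter than the one you use on the command line. Running the sketch below in the Python REPL that reticulate opens, or in a Python chunk, prints which interpreter is active. It relies only on the standard `sys` and `sysconfig` modules and assumes nothing about reticulate's own functions; `interpreter_info` is a hypothetical helper name.

```python
import sys
import sysconfig


def interpreter_info() -> dict:
    """Collect basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "prefix": sys.prefix,
        # Pure-Python library directory; pip typically installs packages here
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key:>13}: {value}")
```

Comparing this output with `python3 --version` on the command line shows whether RStudio and your terminal are using the same Python.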
With reticulate, you can use RStudio as an IDE for Python. Another option is to use Quarto (the next-generation version of R Markdown) as a unified framework for generating notebooks that combine text and code. If you are an R Markdown user, you will see that Quarto extends the capabilities previously provided by R Markdown: instead of .Rmd files, we now have .qmd files. Quarto comes bundled with recent versions of RStudio.
Git
Git/GitHub instructions to check before the next session:
1. This document was originally developed by Professor Tiago Ventura and adapted for our course's purposes.