Topic 1: Course Set Up

About

This document is a guide to the course infrastructure.[1]

Introduction

Throughout the semester, we will use a combination of tools. This is a summary of the main ones:

  • CommandLine: primarily to interact with git, install programs and run a few scripts

  • Python3: for programming tasks

  • Git/GitHub: for version control, reproducibility, sharing materials, your problem sets, and the final project

  • Jupyter Notebooks: as a main IDE to work with Python

  • Quarto via RStudio: as a secondary IDE (you can make it your primary one if you prefer) for coding in Python/R/SQL

  • Google Colab: a free cloud service hosted by Google that allows anyone to write and execute arbitrary Python code through the browser. It’s particularly useful for machine learning and data analysis.

Let’s cover how to set up each of these tools on your local machine.

Warning

If you run into issues, please reach out to the Teaching Assistant for assistance.

CommandLine

At times, we’ll use a Unix-based commandline. The commandline will feature in our discussion of using git and running Python programs. If you use a Mac or a Linux operating system, then a functioning commandline comes with your operating system. On Apple machines, this is the Terminal.

On Windows (specifically Windows 10), you can enable the Linux Bash shell. The following offers a tutorial on how to do this.

If you’re using a version of Windows that pre-dates version 10, then Git Bash is a program that will allow you to use git commands from your Windows machine.
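The choice described above depends only on your operating system. Once Python is available (locally or in Google Colab), a minimal sketch like the following can report which option applies. The function name `commandline_hint` and the wording of the hints are illustrative assumptions, not part of the course materials.

```python
import platform


def commandline_hint(system_name: str) -> str:
    """Map an OS name (as returned by platform.system()) to the
    commandline option suggested in this guide."""
    if system_name == "Darwin":  # macOS reports itself as "Darwin"
        return "Terminal (built in)"
    if system_name == "Linux":
        return "your distribution's terminal (built in)"
    if system_name == "Windows":
        return "Linux Bash shell on Windows 10, or Git Bash on older versions"
    return "unknown system: ask the Teaching Assistant"


current = platform.system()
print(current, "->", commandline_hint(current))
```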

We will cover some concepts of working with the commandline. You can find a full notebook with an intro to the commandline in the materials for Topic 2.

Python3

We’ll use Python3 throughout this course. Below are instructions for downloading Python3 using a commandline package manager (Homebrew for Mac, Chocolatey for Windows).

An alternative way to install Python3 is to download an Anaconda distribution. I will use pip rather than conda in the instructions for downloading Python modules. These are simply two ways of downloading and managing open-source software packages. Choose whichever works best for you.

Many computers already have Python3 installed. You can check whether that is the case for yours from your commandline:

python3 --version

On some versions of Windows, you may need to use py instead of python3:

py --version

In either case, the output of this command should be something like Python 3.8.5.
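The same check can be made from inside Python itself, which is handy in a notebook or in Google Colab where you may not have a commandline open. This is a minimal sketch; the helper name `is_python3` is an assumption made for illustration.

```python
import sys


def is_python3(version_info=sys.version_info) -> bool:
    """Return True when the major version number is 3."""
    return version_info[0] == 3


# Print the running interpreter's version in the same style as `python3 --version`
print("Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Is this Python 3?", is_python3())
```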

Jupyter Notebooks

Once you have Python3 on your computer, you can install Jupyter Notebook. If you downloaded Python3 using Anaconda, then Jupyter Notebook comes with the distribution and requires no further installation on your part. If you are not using Anaconda, you can install Jupyter Notebook by running the following command on your commandline.

# on your command line
pip install jupyter

You can then start Jupyter Notebook from the commandline by typing:

# on your command line
jupyter notebook
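If the command is not found, it can help to confirm that the installation reached the Python interpreter you are actually using. The sketch below checks whether a package can be imported without importing it. The helper name `is_installed` is illustrative, and the two package names checked are an assumption about what `pip install jupyter` typically brings in.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Assumed package names: the notebook server and the core Jupyter tooling
for name in ("notebook", "jupyter_core"):
    print(name, "installed:", is_installed(name))
```

If both lines report False, the packages were probably installed for a different Python than the one running this check.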

Workflow for working with Jupyter Notebooks from the commandline:

  1. Open the terminal

  2. Navigate (using cd) to the folder you want to be the root of your jupyter notebook

  3. Open the notebook (jupyter notebook)

This is what it looks like when I open a notebook in the folder I have for this course:

# open terminal
cd qtm_350
jupyter notebook
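The same workflow can be scripted from Python, which some people find convenient for a shortcut. This is a sketch under assumptions: it presumes the `notebook` package is installed for the current interpreter, and the function names `notebook_command` and `launch_notebook` are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def notebook_command(folder: str):
    """Build the command and working directory equivalent to
    `cd <folder>` followed by `jupyter notebook`."""
    root = Path(folder).expanduser()
    # `python -m notebook` starts the notebook server with this interpreter
    command = [sys.executable, "-m", "notebook"]
    return command, root


def launch_notebook(folder: str) -> None:
    """Start Jupyter Notebook with `folder` as its root (blocks until stopped)."""
    command, root = notebook_command(folder)
    subprocess.run(command, cwd=root, check=True)


# Example call (not run here): launch_notebook("qtm_350")
```

Running the server through `sys.executable` keeps it tied to the same Python that has your course packages installed.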

Workflow with Anaconda.

If you installed Python using the Anaconda distribution (here: https://www.anaconda.com/products/individual), you can open Jupyter through a point-and-click interface.

In the lecture notes, you can also find an introduction to Jupyter Notebook.

RStudio + Reticulate|Quarto

In your classes that are focused on using R, RStudio will be your main IDE. However, RStudio is not just for R. It can handle a number of different languages. We can use Python in RStudio using the reticulate package.

Check the intro to Quarto notebook to see how to use Python in RStudio. Let’s cover some of the installation steps here:

To install RStudio, download it from the following link. reticulate is an R package that allows one to run a Python REPL in the R console. In addition, it allows one to read in and use Python code, and to pass data between R and Python. The following provides instructions on installing reticulate.
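Once reticulate is installed, a quick sanity check is to see which Python it picked up. The sketch below is plain Python meant to be typed into the Python REPL that reticulate opens in the R console (for example after calling `reticulate::repl_python()` in R). The helper name `interpreter_info` is an assumption for illustration.

```python
import sys


def interpreter_info() -> dict:
    """Collect the path and version of the Python interpreter in use."""
    return {
        "executable": sys.executable,        # path to the Python binary
        "version": sys.version.split()[0],   # e.g. "3.8.5"
    }


info = interpreter_info()
print("Python executable:", info["executable"])
print("Python version:", info["version"])
```

If the path is not the Python installation you expected, your packages may have been installed for a different interpreter than the one RStudio is using.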

With reticulate, you can use RStudio as an IDE for Python. Another option is to use Quarto (the next-generation version of R Markdown) as a unified framework to generate notebooks with text + code. If you are an R Markdown user, you will see that Quarto extends the capabilities previously provided by R Markdown. Now, instead of .rmd files, we have .qmd files. Recent versions of RStudio come with Quarto already installed.

Git

Git/GitHub instructions to check before the next session:

1. This document was originally developed by Professor Tiago Ventura and adapted to our course's purposes.