QTM 350: Data Science Computing

Using Jupyter Notebooks, Magic Commands, & Extensions

Davi Moreira


About

This document offers a short guide to using Jupyter Notebooks effectively for data science projects.[1]

What is a Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain

  • live code,
  • equations,
  • visualizations and
  • narrative text.

Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

.ipynb is really a JSON file

A Jupyter notebook is a JSON (JavaScript Object Notation) file.

Open a notebook with a text editor and you will see it!
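
To make this concrete, the sketch below builds a tiny notebook as a Python dictionary, serializes it to JSON, and reads it back. The top-level keys (cells, metadata, nbformat, nbformat_minor) follow version 4 of the notebook format; the cell contents are made up for illustration.

```python
import json

# A minimal, hypothetical notebook following the nbformat 4 layout.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# This string is what a text editor shows when you open an .ipynb file.
raw_text = json.dumps(notebook, indent=1)

# Parsing it back gives ordinary Python dicts and lists.
parsed = json.loads(raw_text)
cell_types = [cell["cell_type"] for cell in parsed["cells"]]
print(cell_types)
```

Because the file is plain JSON, any tool that reads JSON can inspect a notebook's cells without starting Jupyter.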

Pros and Cons of Jupyter Notebooks

Pros:

  • Ubiquitous: notebooks are widely used.
  • Reproducible: they make it easy to transmit and convey results.
  • Interactive: we can build code interactively (like we do in R). This makes Jupyter notebooks particularly friendly when you are first learning Python.
  • Stable.

Cons:

  • There is a process to spinning a Notebook up.
  • It doesn't allow you to run the code line-by-line
  • For those used to working with a text editor, writing code in cells in a notebook can be frustrating.
  • Non-linear: In a Jupyter Notebook, you write and execute code in cells, which can be run in any order. This flexibility is one of the Notebook's strengths, but it can also lead to confusion or errors.
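
The non-linear point can be illustrated with a small simulation. This is only a sketch: the three "cells" are hypothetical strings, and exec with a shared dictionary stands in for the kernel's namespace. Running the same cells in a different order leaves a different value behind.

```python
# Each string stands in for one notebook cell (hypothetical contents).
cells = {
    "cell_1": "total = 10",
    "cell_2": "total = total * 2",
    "cell_3": "total = total + 1",
}


def run_cells(order):
    """Run the cells in the given order in one shared namespace, like a kernel."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


top_to_bottom = run_cells(["cell_1", "cell_2", "cell_3"])  # (10 * 2) + 1
out_of_order = run_cells(["cell_1", "cell_3", "cell_2"])   # (10 + 1) * 2
print(top_to_bottom, out_of_order)
```

Restarting the kernel and running all cells from top to bottom before sharing a notebook is a simple way to check that the saved results do not depend on an accidental execution order.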

Initializing a Notebook

There are three primary ways to initialize a notebook.

  1. Via the command line + Jupyter
    • Go into the working directory containing your .ipynb notebook.
      • e.g. cd /Users/me/Desktop/
    • type jupyter notebook
    • the web application will open up in your default browser.
    • from there, click on the notebook and "spin it up". The notebook will then be "running".
    • We can close the notebook by clicking on the Quit and Logout buttons on the page.
      • Quit == close the local server (i.e. the web application connection)
      • Logout == shut down the home page of the web application (but keep the server running)
    • We can also close the server connection by pressing Control-C in the console.
    • We can also find the server again (say, if we accidentally close the Notebook tab) by using the local URL printed when the notebook server first starts.
  • JupyterLab

Instead of using the classic Jupyter Notebook, you can follow the same steps above with JupyterLab. Jupyter Notebook offers only a very simple interface. JupyterLab offers a more interactive interface that includes notebooks, consoles, terminals, CSV editors, Markdown editors, interactive maps, and more.

To use Jupyter lab:

  • Install JupyterLab
  • Go into the terminal
  • Type jupyter lab
  • Navigate to your folder, and code.
  2. Point and Click with Anaconda Distribution

If you prefer point-and-click, you can start Jupyter Notebook through your Anaconda distribution.

  • Make sure Anaconda is installed
  • Go to Applications and click on the Anaconda-Navigator icon
  • Click on the Launch icon under Jupyter Notebook

  3. Google Colab

Using Jupyter Notebooks with Google Colab is a great way to leverage cloud computing resources for data science and machine learning projects. Google Colab offers a free service where you can run Jupyter Notebooks in the cloud, with access to powerful hardware like GPUs and TPUs. Here are some guidelines to get started and use it effectively:

  • Visit Google Colab.
  • You can start a new notebook by clicking on 'New Notebook', or you can upload an existing Jupyter notebook from your computer.
  • Google Colab's interface is similar to Jupyter Notebook. It consists of cells where you can write and execute Python code or add text in Markdown format.
  • The toolbar offers options to add new cells, run cells, change cell types, and access other useful features.
  • Write Python code in code cells and execute it by pressing Shift + Enter.
  • You can import libraries and modules as you would in a local Jupyter environment.
  • For documentation, use text cells. You can format your text using Markdown syntax. This is useful for explaining your code and analyses.
  • Working with Data: You can upload data files directly to your Colab environment using the file upload feature, or you can mount your Google Drive to access files stored there.
  • Installing Additional Libraries: If you need libraries that are not pre-installed, you can install them using !pip install or !apt-get install commands.
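
As a hedged illustration of the Google Drive step, the sketch below mounts Drive only when the code is actually running in Colab. The google.colab module exists only inside a Colab runtime, so the import is guarded; the mount point /content/drive is the conventional location, and the in_colab flag is a name chosen here for illustration.

```python
# google.colab only exists inside a Colab runtime, so guard the import.
try:
    from google.colab import drive
    in_colab = True
except ImportError:
    in_colab = False

if in_colab:
    # Prompts for authorization, then exposes Drive under /content/drive.
    drive.mount("/content/drive")
else:
    print("Not running in Colab; skipping the Drive mount.")
```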

Kernels

A kernel is a computational engine that executes the code contained in a notebook document. A cell (or "Chunk") is a container for text to be displayed in the notebook or code to be executed by the notebook's kernel.

We can only have one type of kernel running for any given notebook (we cannot change between kernels in the middle of a notebook). Here is a list of all the kernels that you can use with a Jupyter notebook. For example, we can easily employ an R kernel in a Jupyter notebook. This was always the notebook's original intent. In fact, "Jupyter" is a loose acronym meaning Julia, Python and R.
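
To see which kernels are installed on your own machine, one option is to ask the jupyter_client package, which ships with Jupyter. The sketch below assumes that package may or may not be available, so the import is guarded; kernel_specs is a name chosen here for illustration.

```python
# jupyter_client is installed alongside Jupyter; guard the import in case
# this code runs in an environment without it.
try:
    from jupyter_client.kernelspec import KernelSpecManager
except ImportError:
    KernelSpecManager = None

if KernelSpecManager is None:
    kernel_specs = {}
else:
    # Maps each kernel name (e.g. "python3") to its resource directory.
    kernel_specs = KernelSpecManager().find_kernel_specs()

for name, path in sorted(kernel_specs.items()):
    print(f"{name}: {path}")
```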


Usage

Code Chunks

Code chunks are what we use to execute Python code (or code for whatever kernel we have running). In addition, we can write prose in a chunk by changing its type from Code to Markdown.

There are two states of a code chunk:

  • Edit Mode: Edit mode is indicated by a green cell border and a prompt showing in the editor area. When a cell is in edit mode, you can type into the cell, like a normal text editor. Enter edit mode by pressing Enter or using the mouse to click on a cell's editor area.

  • Command Mode: Command mode is indicated by a grey cell border with a blue left margin. When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently. For example, if you are in command mode and you press c, you will copy the current cell - no modifier is needed. Don't try to type into a cell in command mode. Enter command mode by pressing Esc or using the mouse to click outside a cell's editor area.

We can switch between Markdown and Code chunks either

  • By using the drop down menu in the tool bar (in either mode)

  • By using the shortcut:

    • Press y when on the cell in Command Mode to switch to a code chunk.
    • Press m when on the cell in Command Mode to switch to a markdown chunk

Executing Code

A code chunk will always reflect the behavior of the kernel that you're using (e.g. a Python code chunk will follow Python syntax).

Best Practices

  • Break code chunks up!
  • Every code chunk should render some output (the aim is to be able to read what we were doing without needing to fire the notebook back up)
  • Use spaces. Keep the chunk readable. Less is more.

Using Markdown

Markdown chunks use Markdown syntax and allow for writing mathematical equations using LaTeX.
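
As a small illustration, a Markdown chunk could contain text like the following. The formulas are arbitrary examples chosen here: single dollar signs give inline math, and double dollar signs give a displayed equation.

```latex
The sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
$$
```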

Shortcuts

There are a number of shortcuts that help you perform common tasks in Jupyter Notebooks.

We can access a full (searchable) list of commands and their keyboard shortcuts by pressing p when in Command Mode (this opens the command palette), or by clicking the keyboard icon in the toolbar.

Important ones while in Command Mode:

  • a: create a new code chunk above the current one.
  • b: create a new code chunk below the current one.
  • ii: interrupt the kernel (really useful when some code is running too long or you've accidentally initiated an infinite loop!)
  • y: code mode
  • m: markdown mode
  • shift + m: merge cells (when more than one cell is highlighted)
  • dd: delete cell.

Important ones while in Edit Mode:

  • shift + ctrl + minus: split cell

Magic Commands

Magic commands are special commands provided by the IPython kernel and are prefixed by the % character. They are designed to succinctly solve various common problems in standard data analysis.

Magic commands come in two flavors:

  • line magics, which are denoted by a single % prefix and operate on a single line of input,
  • cell magics, which are denoted by a double %% prefix and operate on multiple lines of input.

List all the available magic commands:

In [ ]:
%lsmagic

Or consult the quick reference sheet of all available magic commands:

In [ ]:
%quickref

Useful Magic

Here are some useful magic commands that come in handy as you're working with code.

Bookmarking

"Come back here later"

In [ ]:
%bookmark Home

See below

Changing working directories

In [ ]:
cd ~/Dropbox
In [ ]:
pwd

Using the bookmark to return to where we were...

In [ ]:
%cd -b Home
In [ ]:
%pwd
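
For comparison, the bookmark-and-return pattern can be mimicked in plain Python. This is a rough sketch, not how IPython stores bookmarks: the bookmarks dictionary is a hypothetical stand-in, and a temporary directory replaces ~/Dropbox so the example runs anywhere.

```python
import os
import tempfile

bookmarks = {}                     # hypothetical stand-in for %bookmark storage
bookmarks["Home"] = os.getcwd()    # like: %bookmark Home

with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)              # like: cd <some directory>
    moved = os.getcwd() != bookmarks["Home"]
    os.chdir(bookmarks["Home"])    # like: %cd -b Home

back_home = os.getcwd() == bookmarks["Home"]
print(moved, back_home)
```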

Writing code to files

Extremely useful when we develop some functionality that we would like to utilize later on.

In [ ]:
%%writefile my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x
In [ ]:
%ls # list files ( see our function)

Reading in files

In [ ]:
# %load my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x

Run an external file as a program

In [ ]:
%run my_fib_func.py
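
Outside of a notebook, the same write-then-run workflow can be approximated with the standard library. The sketch below is an assumption about how one might mimic %%writefile and %run, not how IPython implements them: it writes the fib function to a temporary directory (instead of the working directory) and executes the file with runpy.run_path, which returns the script's namespace as a dictionary.

```python
import runpy
import tempfile
from pathlib import Path

source = '''\
def fib(n):
    """Return the first n Fibonacci numbers."""
    x = [0] * n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i - 2] + x[i - 1]
    return x
'''

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "my_fib_func.py"
    script.write_text(source)                # roughly what %%writefile does
    namespace = runpy.run_path(str(script))  # roughly what %run does

# %run places fib directly in the notebook's namespace; here we pull it
# out of the returned dictionary by hand.
fib = namespace["fib"]
print(fib(10))
```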

Timing Code

How fast does what we wrote run?

In [ ]:
%time fib(10)

How long do many runs take (statistical sample)?

In [ ]:
%timeit fib(10)
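
The two timing magics have rough counterparts in the standard library. The sketch below is an approximation under stated assumptions: the repeat count (7) and the number of calls per run (10,000) are picked here for illustration, whereas %timeit chooses its loop count automatically.

```python
import statistics
import time
import timeit


def fib(n):
    """Return the first n Fibonacci numbers (same results as the version above)."""
    x = [0] * n
    for i in range(n):
        if i < 2:
            x[i] = i
        else:
            x[i] = x[i - 2] + x[i - 1]
    return x


# Roughly what %time does: time a single call.
start = time.perf_counter()
fib(10)
single_run = time.perf_counter() - start

# Roughly what %timeit does: repeat many calls and summarize.
calls_per_run = 10_000
totals = timeit.repeat(lambda: fib(10), repeat=7, number=calls_per_run)
per_call = [t / calls_per_run for t in totals]

print(f"single call: {single_run:.2e} s")
print(f"mean of 7 runs: {statistics.mean(per_call):.2e} s per call "
      f"(std. dev. {statistics.stdev(per_call):.2e} s)")
```

A single measurement is noisy, which is why the repeated version is usually the better guide when comparing two implementations.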

Look up object names in the namespace

In [ ]:
main_dat = [1,2,3,4]
main_key = ["a","b"]
x = 5
y = 6
In [ ]:
%psearch main*
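
A similar wildcard lookup can be sketched in plain Python with the fnmatch module. Here an ordinary dictionary stands in for the notebook's namespace (an assumption made so the example is self-contained); in a live session one could pass globals() instead.

```python
import fnmatch

# A dictionary standing in for the notebook's namespace.
namespace = {
    "main_dat": [1, 2, 3, 4],
    "main_key": ["a", "b"],
    "x": 5,
    "y": 6,
}

# Keep only the names that match the wildcard pattern.
matches = sorted(fnmatch.filter(namespace, "main*"))
print(matches)
```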

Whenever you encounter an error or exception, just open a new notebook cell, type %debug and run the cell. This will open a command line where you can test your code and inspect all variables right up to the line that threw the error. Type n and hit Enter to run the next line of code (The -> arrow shows you the current position). Use c to continue until the next breakpoint. q quits the debugger and code execution.
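
To give a feel for what the debugger lets you inspect, the sketch below catches an exception and reads the local variables of the frame where it was raised. The function risky_divide and its inputs are hypothetical; the interactive equivalent outside a notebook would be pdb.post_mortem, which is only mentioned in a comment because it waits for keyboard input.

```python
import sys


def risky_divide(values, divisor):
    """Hypothetical function that fails when divisor is zero."""
    total = sum(values)
    return total / divisor


try:
    risky_divide([1, 2, 3], 0)
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # Walk to the innermost frame, i.e. the line that actually raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = dict(tb.tb_frame.f_locals)
    # Interactive alternative: import pdb; pdb.post_mortem(tb)

print(failing_function, failing_locals)
```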

Asking for help

In [ ]:
%%timeit?

Notebook Extensions

We can expand the functionality of Jupyter notebooks through extensions. Extensions allow us to create and use new features that better customize the notebook's user experience. For example, there are extensions for spell checking, a table of contents to ease navigation, running code in parallel, and viewing differences between notebooks when using version control.

Download the Python module to install notebook extensions: https://github.com/ipython-contrib/jupyter_contrib_nbextensions

Using pip (the PyPI package installer):

pip install jupyter_nbextensions_configurator jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

Using Conda (the Anaconda package manager):

conda install -c conda-forge jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

Extensions can be enabled most easily from the home screen that appears when you first start Jupyter Notebook.

Useful Extensions

  • Collapsible headings: allows you to collapse some parts of the notebooks.
  • Notify: sends a notification when the notebook becomes idle (for long running tasks)
  • Code folding: folds functions, loops, and indented code chunks (makes things tidy)
  • nbdime: provides tools for git differencing and merging of Jupyter Notebooks.
    • Requires installation: pip install nbdime

1. This document was originally developed by Professor Tiago Ventura and adapted to our course's purposes.