QTM 350: Data Science Computing

Command Line

Davi Moreira


About

This document offers a short guide to using the command line effectively for data science projects.[1]

Why the Command Line?

When we are first introduced to computers, we are usually steered towards point-and-click interfaces. These are called graphical user interfaces (GUIs). GUIs are great for many tasks, but not for all of them. A GUI works for what it was developed to do, so it limits how you interact with your computer to what its developers anticipated. If you need more, you need to speak directly to your operating system.

The command-line interface (CLI) is a program that lets you interact directly with your operating system. By using the command line, you have more control over the programs that run on your machine. From the command line, you can run scripts, call programs like Python and R directly, and build all sorts of data pipelines.
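As a small illustration of what a pipeline can look like, the sketch below chains standard Unix tools with the pipe operator (|), which sends the output of one command into the next. The sample data and the variable name top_language are made up for this example.

```shell
# Hypothetical sample data: one programming language per line.
# sort      groups identical lines together
# uniq -c   counts how many times each line appears
# sort -rn  orders the counts from largest to smallest
# head -n 1 keeps only the first (most frequent) line
top_language=$(printf 'python\nr\npython\nsql\npython\nr\n' | sort | uniq -c | sort -rn | head -n 1)

# Print the most frequent entry together with its count.
echo "$top_language"
```

Each tool does one small job, and the pipe combines them. Swapping the printf for cat <file> would run the same count on a real file.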

The CLI will help us:

  1. Understand file paths on our computers
  2. Work from a common hub
  3. Generate reproducible coding sequences (by running scripts)
  4. Streamline our workflow
    • set projects up
    • work between languages
    • batch process heavy loads
  5. Communicate with a computing cluster, work on a virtual machine, or SSH into another computer. SSH, or Secure Shell, is a protocol used to securely access and manage a computer remotely.
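To make the ideas of reproducible scripts and batch processing more concrete, here is a minimal sketch of a loop that applies the same step to every file in a folder. The file names (data_1.csv and so on) are hypothetical, and mktemp, wc, basename, and rm are standard Unix tools that this guide does not otherwise cover; they are used here only to keep the example self-contained.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
workdir=$(mktemp -d)

# Create three small hypothetical data files.
printf 'a\nb\n' > "$workdir/data_1.csv"
printf 'a\nb\nc\n' > "$workdir/data_2.csv"
printf 'a\n' > "$workdir/data_3.csv"

# Batch process: run the same step (count lines) on every .csv file.
total_lines=0
for file in "$workdir"/*.csv; do
    n=$(wc -l < "$file")
    echo "$(basename "$file"): $n lines"
    total_lines=$((total_lines + n))
done
echo "total: $total_lines lines"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

Saved in a file, these lines become a script that can be rerun at any time and gives the same result, which is what makes the workflow reproducible.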

Important: All this will come with time, and you will not finish the course as an expert in the CLI. We start the course with this introductory tool because the command line will feature in our discussions of git and of running Python programs.

Accessing the Command Line

How you access the command line differs depending on the machine you are running.

  • If you're on a Mac, a Unix command line comes installed on your machine. This is the Terminal, an application available on all Macs.

  • If you're on a Windows machine, you'll need to activate the Ubuntu terminal by turning on developer mode on your computer. Instructions on how to do that can be found here. (Note that there are also alternatives, such as PuTTY.)

For a more in-depth overview of what the Unix command line can do, see The Linux Command Line by William Shotts.

Common command line commands

The following outlines a few common commands that will be useful as you move forward. Disclaimer: some of these commands may differ depending on your operating system, but it only takes a quick Google/GPT search to find out how things are done on your machine.

  • pwd: print the current working directory
  • cd <path>: change working directory
    • cd ..: go up to the parent directory
    • cd ~: go to your home directory
    • cd -: go back to the previous working directory
  • ls: list the files in the working directory
  • mkdir <dir name>: make a directory
  • mv <old path> <new path>: move a file from the old path to the new path
  • cp <old path> <new path>: copy a file from the old path to the new path
  • Ctrl + C: stop the current execution
  • cat <file>: print the entire file
  • head: view the first $N$ lines of a file
    • head -n 3 file
  • tail: view the last $N$ lines of a file
    • tail -n 3 file
  • Making a file:
    • touch <file name>
    • echo 'text' > file (this overwrites file if it already exists)
  • Renaming a file:
    • mv <old file name> <new file name>
  • Asking for help:
    • man <command name>
    • <command name> -h (many commands use --help instead)
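The commands above can be strung together into a short practice session. The sketch below is only an illustration: the directory and file names (project, notes.txt, and so on) are made up, and mktemp and printf are not in the list above. They are used here only to create a safe scratch directory and a multi-line file.

```shell
# Start in a temporary scratch directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch"
pwd                                 # print where we are

mkdir project                       # make a directory
cd project                          # move into it
printf 'line 1\nline 2\nline 3\nline 4\n' > notes.txt   # make a four-line file
touch empty.txt                     # make an empty file

cp notes.txt backup.txt             # copy a file
mv empty.txt renamed.txt            # rename a file
ls                                  # list the files in this directory

first_two=$(head -n 2 notes.txt)    # the first two lines
last_line=$(tail -n 1 notes.txt)    # the last line
echo "$first_two"
echo "$last_line"

cd ..                               # up to the parent directory
cd -                                # back to project (also prints its path)
```

Typing these lines one at a time, and running pwd and ls between steps, is a good way to see what each command changes.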

1. This document was originally developed by Professor Tiago Ventura and adapted for the purposes of our course.