MGMT 17300: Data Mining Lab

Installing Packages and Importing Data in R

Professor: Davi Moreira

August 01, 2024

Overview

  • R Packages and Their Importance
    • Installing and loading R packages
    • Using popular packages like ggplot2, dplyr, and tidyr
  • Data Import in R
    • Importing data from text files, Excel sheets, CSV files, and SQL databases
  • Exporting Data from R
    • Exporting data to CSV files, Excel files, and text files
    • Saving R-specific objects using RDS files

R Packages

R Packages and Their Importance


R's functionality is extended through packages, which work much like apps in a smartphone ecosystem: each package adds a set of functions for a particular task.

# Example of installing a package
install.packages("readxl")

# Loading a package to use its functions
library(readxl)

The Tidyverse Project



In space, no one can hear you scream.

– Alien (1979)



The Tidyverse Project

[Figure: Tidyverse Project]

The tidyverse is a collection of R packages designed for data science. All packages share an underlying philosophy and common APIs.
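The whole collection can be installed and loaded as a single meta-package. The sketch below assumes internet access and uses the CRAN cloud mirror (https://cloud.r-project.org) as an example repository. `library(tidyverse)` attaches only the core packages. Others, such as readxl, are installed alongside them but still need their own `library()` call.

```r
# Install the tidyverse only if it is missing (requires internet access)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, readr, ...)
library(tidyverse)

# List every package the tidyverse installs, including those it does not attach
pkgs <- tidyverse_packages()
print(pkgs)
```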

The Tidyverse Data Science Workflow

[Figure: Tidyverse Project]

Installing Packages


You can install packages in two ways:

  1. Using the R command line with install.packages()
  2. Using RStudio: open the “Packages” tab, click “Install”, and enter the name of the package.

# Installing multiple packages at once
install.packages(c("readxl", "ggplot2"))
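Reinstalling every package each time a script runs is slow. A common pattern is to install only the packages that are missing. The minimal sketch below reuses the package list from the example above. The repository URL is an assumption, and any CRAN mirror works.

```r
# Packages this script depends on
pkgs <- c("readxl", "ggplot2")

# Keep only those that are not yet installed on this machine
missing <- setdiff(pkgs, rownames(installed.packages()))

# Install the missing ones from CRAN (requires an internet connection)
if (length(missing) > 0) {
  install.packages(missing, repos = "https://cloud.r-project.org")
}
```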

Loading Packages


A package is installed once per machine, but it must be loaded with library() in every new R session before its functions can be called by name:

# Loading the readxl package
library(readxl)


Note: Some commonly used packages for data analysis are ggplot2, dplyr, and tidyr.
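An installed package can also be used without attaching it, by writing `package::function()`. The sketch below uses the tools package because it ships with every R installation but is not attached at start-up. The file names are only illustrative strings, and no files are read.

```r
# tools is installed with R but not attached when a session starts.
# pkg::fun() calls a function from an installed package without attaching it.
ext <- tools::file_ext("data/SalesData.xlsx")
print(ext)  # "xlsx"

# library() attaches the package, so its functions can be called by name
library(tools)
ext2 <- file_ext("output_data.csv")
print(ext2)

# search() lists the attached packages; tools now appears there
print("package:tools" %in% search())
```

The `::` form makes it explicit which package a function comes from. This helps when two attached packages export functions with the same name.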

Data Import in R



R allows importing data from various file formats like text files, Excel sheets, CSV files, and SQL databases.

Importing Text Files


# Importing a comma-separated text file whose first line holds the column names
data <- read.table("data.txt", header = TRUE, sep = ",")
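The `sep` argument controls how columns are split. This self-contained sketch writes a small, made-up, space-separated file to a temporary location and reads it with the default `sep = ""`, which splits on any run of spaces or tabs.

```r
# Create a small whitespace-separated text file in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("region units price",
             "North 12 3.5",
             "South 7 4.0"), path)

# sep = "" (the default) splits on spaces or tabs;
# header = TRUE uses the first line as column names
sales <- read.table(path, header = TRUE)
str(sales)
```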

Importing Excel Files


Make sure you have the readxl package installed:

# Importing data from an Excel file
library(readxl)
data <- read_excel("data/SalesData.xlsx")


To import a specific sheet:

# Importing a specific sheet from an Excel file
data <- read_excel("c:/mydata/data.xlsx", sheet = "Sheet1")
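When you do not know a workbook's sheet names, `excel_sheets()` lists them. A sheet can also be chosen by position, and `range` limits the import to a block of cells. The sketch below assumes readxl is installed and uses `readxl_example("datasets.xlsx")`, an example workbook that ships with readxl, so it runs without any local files.

```r
library(readxl)

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names before choosing one
sheets <- excel_sheets(path)
print(sheets)

# Select a sheet by position instead of by name
first_sheet <- read_excel(path, sheet = 1)

# Read only cells A1:C6 of the "mtcars" sheet (header row plus 5 data rows)
mt_corner <- read_excel(path, sheet = "mtcars", range = "A1:C6")
print(mt_corner)
```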

Importing CSV Files


CSV (comma-separated values) files are the most common exchange format. read.csv() imports them and assumes by default that the first row holds the column names:

# Importing data from a CSV file
data <- read.csv("data.csv")
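Real CSV files often mark missing values with codes other than `NA`. The sketch below writes a small, made-up file to a temporary location. It then uses `na.strings` to tell `read.csv()` which text values mean "missing".

```r
# A small CSV with a blank field and an "n/a" code for missing values
path <- tempfile(fileext = ".csv")
writeLines(c("region,units,price",
             "North,12,3.5",
             "South,,4.0",
             "East,7,n/a"), path)

# Blank fields in numeric columns already become NA; "n/a" must be listed,
# otherwise the whole price column would be read as text
data <- read.csv(path, na.strings = c("NA", "n/a"))
str(data)
```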

Importing Data from SQL Databases


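To import data from a SQL database, you can use the RODBC package. odbcConnect() opens a connection to an ODBC data source (DSN) that is already configured on your machine, and sqlQuery() returns the query result as a data frame: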

# Importing from SQL database
library(RODBC)
conn <- odbcConnect("database_name")
data <- sqlQuery(conn, "SELECT * FROM table_name")
odbcClose(conn)
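If no ODBC data source is set up, the same connect, query, and close steps can be practiced with a different tool: the DBI package with the RSQLite driver, which creates a throwaway in-memory database. This is a swapped-in alternative to RODBC, not the method used above. It assumes DBI and RSQLite are installed, and the table name `cars_tbl` is hypothetical.

```r
library(DBI)

# Open a temporary in-memory SQLite database (nothing is written to disk)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into the database as a table
dbWriteTable(con, "cars_tbl", mtcars)

# Run a SQL query and get the result back as a data frame
res <- dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM cars_tbl GROUP BY cyl")
print(res)

# Close the connection when you are done
dbDisconnect(con)
```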


Note: For this course, we will primarily focus on importing data from Excel spreadsheets.

Exporting Data from R



Once your data analysis is complete, you’ll often need to export the data for further use or reporting. R provides several ways to export datasets to various formats, including CSV, Excel, and text files.

Exporting Data to CSV Files


One of the most common ways to export data from R is to save it as a CSV file with the write.csv() function.

# Exporting a dataset to a CSV file
write.csv(data, "output_data.csv", row.names = FALSE)
  • The first argument is the data frame you want to export.
  • The second argument is the file name (path) for the exported file.
  • row.names = FALSE stops R from writing the row names (by default 1, 2, 3, and so on) as an extra first column.
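To see what `row.names = FALSE` changes, this sketch writes the same three rows of the built-in `mtcars` data twice to a temporary file and reads each version back. The unnamed row-name column comes back as a column called `X`.

```r
path <- tempfile(fileext = ".csv")

# Default row.names = TRUE writes the row names as an unnamed first column
write.csv(mtcars[1:3, 1:2], path)
with_names <- read.csv(path)
print(names(with_names))     # "X" "mpg" "cyl"

# row.names = FALSE writes only the data columns
write.csv(mtcars[1:3, 1:2], path, row.names = FALSE)
without_names <- read.csv(path)
print(names(without_names))  # "mpg" "cyl"
```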

Exporting Data to Excel Files


You can export data to Excel using the writexl package. First, make sure it’s installed.

# Install the writexl package
install.packages("writexl")

# Exporting a dataset to an Excel file
library(writexl)
write_xlsx(data, "data/output_data.xlsx")


Note: write_xlsx() takes the data frame first and the destination path second; a relative path such as data/output_data.xlsx is resolved from the current working directory.
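`write_xlsx()` also accepts a named list of data frames and writes each one to its own sheet. The sketch below assumes writexl and readxl are installed, writes to a temporary file, and uses the hypothetical sheet names `cars` and `flowers`. It reads the sheet names back with readxl to confirm the result.

```r
library(writexl)
library(readxl)

path <- tempfile(fileext = ".xlsx")

# Each element of the named list becomes a sheet named after it
write_xlsx(list(cars = mtcars, flowers = iris), path)

# Confirm the sheet names by reading them back
print(excel_sheets(path))
```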

Exporting Data to Text Files


For exporting data to a text file, you can use the write.table() function. This is particularly useful when you want to use a delimiter other than commas, such as tabs.

# Exporting a dataset to a tab-delimited text file
write.table(data, "output_data.txt", sep = "\t", row.names = FALSE)
  • The sep argument specifies the delimiter used in the file (in this case, tabs).
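By default, `write.table()` wraps text values in double quotes. This sketch uses a small, made-up data frame. It turns quoting off with `quote = FALSE`, inspects the raw lines of the file, and reads the file back with `read.delim()`, which is `read.table()` preset for tab-delimited files with a header row.

```r
path <- tempfile(fileext = ".txt")
sales <- data.frame(region = c("North", "South"), units = c(12, 7))

# quote = FALSE stops write.table() from putting quotes around text values
write.table(sales, path, sep = "\t", row.names = FALSE, quote = FALSE)
print(readLines(path))

# read.delim() expects tab separators and a header line
back <- read.delim(path)
str(back)
```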

Exporting Data to RDS Files


RDS is R's own file format: saveRDS() writes a single R object to a file, and readRDS() restores it later, including its column types.

# Exporting data to an RDS file
saveRDS(data, "data.rds")

# Loading the RDS file back into R
data <- readRDS("data.rds")


Note: RDS files are useful when you want to save R objects for later use within R itself.
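The advantage over CSV shows up with column types. This sketch saves a small, made-up data frame with a `Date` column and a `factor` column in both formats, using temporary files. Only the RDS copy comes back unchanged. The CSV copy returns both columns as plain text, and the factor's level order is lost.

```r
df <- data.frame(
  when  = as.Date(c("2024-08-01", "2024-08-02")),
  level = factor(c("low", "high"), levels = c("low", "high"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(df, rds_path)
write.csv(df, csv_path, row.names = FALSE)

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path)

print(identical(from_rds, df))   # TRUE: Date and factor columns survive
print(class(from_csv$when))      # "character": the CSV stored dates as text
```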

Summary


  • Packages are essential for extending R’s functionality.

  • You can install and load packages with the install.packages() and library() functions.

  • R supports importing data from multiple sources, including text files, Excel sheets, CSV files, and SQL databases.

  • You can export datasets to various formats in R, including CSV, Excel, text files, and RDS.

    • write.csv() and write_xlsx() are common functions for CSV and Excel exports.
    • write.table() allows for more customizable exports, such as tab-delimited files.
    • Use saveRDS() and readRDS() for saving and reloading R-specific objects.

Thank you!