Python - Basic Wrangling With Pandas - Practice

QTM 350: Data Science Computing

Davi Moreira

Introduction

This topic material is based on the Python Programming for Data Science book and adapted for our purposes in the course.

Exercise

In this set of practice exercises we'll return to the dataset of food consumption and carbon footprints from the previous set of practice exercises. It was compiled by Kasia Kulma and contributed to R's Tidy Tuesday project.

Let's start by importing pandas with the alias pd.

# Your answer here.

Exercise

As a reminder, the dataset has the following columns:

column	description
country	Country Name
food_category	Food Category
consumption	Consumption (kg/person/year)
co2_emmission	CO2 Emission (kg CO2/person/year)

Import the dataset as a dataframe named df from this URL: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv

# Your answer here.
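
As a reminder of the loading pattern, here is a minimal sketch. It assumes a tiny made-up CSV held in memory (the rows and the name toy are illustrative, not taken from the real file) so that it runs offline; pd.read_csv accepts a URL string in the same position.

```python
import io

import pandas as pd

# Two made-up rows in the same column layout as the exercise dataset
# (illustration only, not real values).
csv_text = (
    "country,food_category,consumption,co2_emmission\n"
    "A,Beef,10.0,300.0\n"
    "A,Rice,30.0,40.0\n"
)

# read_csv takes a path, a URL string, or a file-like object;
# StringIO stands in for the remote file here.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)
print(toy.dtypes)
```

Checking the shape and dtypes right after loading is a quick way to confirm the columns were parsed as expected.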

Exercise

What country consumes the most food per person per year (across all food categories)?

# Your answer here.
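
One possible pattern for this kind of question is to group, aggregate, and then ask for the label of the largest value. The sketch below assumes a small made-up frame named toy (hypothetical countries "A" and "B" with invented numbers), so it shows the technique rather than the answer for the real data.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "food_category": ["Beef", "Rice", "Beef", "Rice"],
    "consumption": [10.0, 30.0, 25.0, 5.0],
    "co2_emmission": [300.0, 40.0, 750.0, 6.0],
})

# Sum consumption over all food categories within each country.
totals = toy.groupby("country")["consumption"].sum()

# idxmax returns the index label (here, the country) of the largest total.
top_country = totals.idxmax()

print(top_country, totals.max())
```

Sorting the totals with sort_values(ascending=False) instead of calling idxmax would show the full ranking rather than only the top entry.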

Exercise

Which food category is the biggest contributor to the above country's consumption total?

# Your answer here.
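
A hedged sketch of one way to approach this: filter the rows with a boolean mask, then look for the largest value within that subset. The frame toy and the country label "A" are made up for illustration.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "food_category": ["Beef", "Rice", "Beef", "Rice"],
    "consumption": [10.0, 30.0, 25.0, 5.0],
})

# Keep only the rows for one country using a boolean mask.
subset = toy[toy["country"] == "A"]

# Index by food_category so that idxmax returns the category name
# rather than a row number.
top_category = subset.set_index("food_category")["consumption"].idxmax()

print(top_category)
```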

Exercise

What country produces the most kg CO2 per person per year?

# Your answer here.

Exercise

Which food category is the biggest contributor to the above country's CO2 emissions?

# Your answer here.

Exercise

What food category produces the most CO2 per person per year across all countries?

# Your answer here.

Exercise

What food category is consumed the most across all countries per person per year? What food category is consumed the least across all countries?

# Your answer here.
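
Questions about the largest and smallest group can share one aggregation. The sketch below assumes a made-up frame named toy and assumes that "across all countries" means summing over countries; using the mean instead would be an equally reasonable reading.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "food_category": ["Beef", "Rice", "Milk", "Beef", "Rice", "Milk"],
    "consumption": [10.0, 30.0, 50.0, 25.0, 8.0, 40.0],
})

# Aggregate once per food category (sum is an assumption; see the lead-in).
by_category = toy.groupby("food_category")["consumption"].sum()

# idxmax / idxmin return the category labels of the extremes.
most_consumed = by_category.idxmax()
least_consumed = by_category.idxmin()

print(most_consumed, least_consumed)
```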

Exercise

Make the dataset wide by pivoting on the food_category column. You'll end up with a "multi-index" dataframe, with multiple levels of columns.

# Your answer here.
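
To see what a pivot with multiple value columns looks like, here is a minimal sketch on a made-up frame named toy. It assumes each (country, food_category) pair appears exactly once, which DataFrame.pivot requires.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "food_category": ["Beef", "Rice", "Beef", "Rice"],
    "consumption": [10.0, 30.0, 25.0, 5.0],
    "co2_emmission": [300.0, 40.0, 750.0, 6.0],
})

# One row per country; one column per (value column, food category) pair.
wide = toy.pivot(
    index="country",
    columns="food_category",
    values=["consumption", "co2_emmission"],
)

# The columns are now a MultiIndex with two levels.
print(wide.columns.nlevels)
print(wide)
```

If a (country, food_category) pair were duplicated, pivot would raise an error; pivot_table with an aggregation function is the usual alternative in that case.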

Exercise

Now that the dataset is wide, I want you to answer the same question as in the fifth exercise above: "What country produces the most kg CO2 per person per year?" Specifically, I want you to notice that the way we answer the same data analysis question changes depending on the format of the data (wide vs. long). You can form your own opinion on which option you prefer, but remember that many visualization libraries work best with long data. Hint: you can index the outer level of a multi-index column using the same syntax we've seen previously, df['co2_emmission']. To access an inner level, you have to use a tuple: df[("consumption", "Beef")].

# Your answer here.
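
The contrast between the two formats can be sketched on a small made-up frame (the names toy and wide and all numbers are illustrative). In long format the aggregation goes through groupby; in wide format it becomes a row-wise sum over the columns under one outer level.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "food_category": ["Beef", "Rice", "Beef", "Rice"],
    "consumption": [10.0, 30.0, 25.0, 5.0],
    "co2_emmission": [300.0, 40.0, 750.0, 6.0],
})
wide = toy.pivot(
    index="country",
    columns="food_category",
    values=["consumption", "co2_emmission"],
)

# Long format: group rows by country, then sum.
top_long = toy.groupby("country")["co2_emmission"].sum().idxmax()

# Wide format: select the outer column level, then sum across each row.
top_wide = wide["co2_emmission"].sum(axis=1).idxmax()

# A tuple selects a single inner column.
beef_consumption = wide[("consumption", "Beef")]

print(top_long, top_wide)
```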

Have fun!