Python - Basic Wrangling With Pandas - Practice

QTM 350: Data Science Computing

Davi Moreira

Introduction



This material is based on the book Python Programming for Data Science and has been adapted for this course.

Exercise

In this set of practice exercises we'll return to the dataset on the consumption and carbon footprints of different foods that we used in the last set of practice exercises. It was compiled by Kasia Kulma and contributed to R's Tidy Tuesday project.

Let's start by importing pandas with the alias pd.

In [1]:
# Your answer here.

Exercise

As a reminder, the dataset has the following columns:

column | description
country | Country name
food_category | Food category
consumption | Consumption (kg/person/year)
co2_emmission | CO2 emission (kg CO2/person/year)

Import the dataset as a dataframe named df from this URL: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv
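
As a warm-up, here is a minimal sketch of how pd.read_csv parses comma-separated text into a dataframe. It reads from an in-memory string so it runs offline; the two rows are invented for illustration and only the column names mirror the real dataset. The same function also accepts a URL string in place of the file-like object.

```python
import io

import pandas as pd

# Made-up rows for illustration; only the header matches the real dataset.
csv_text = """country,food_category,consumption,co2_emmission
A,Beef,10.0,300.0
A,Rice,5.0,6.0
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy.dtypes)
```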

In [2]:
# Your answer here.

Exercise

What country consumes the most food per person per year (across all food categories)?
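
One possible approach to questions of this kind is to group the long data by a key, sum within each group, and ask for the label of the largest total. The sketch below illustrates that pattern on a toy dataframe with invented values (the names toy, totals, and top_country are hypothetical), so you still need to apply it to df yourself.

```python
import pandas as pd

# Toy data with made-up numbers; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "food_category": ["Beef", "Rice", "Beef", "Rice"],
    "consumption": [10.0, 5.0, 2.0, 20.0],
})

# Sum consumption within each country across all food categories.
totals = toy.groupby("country")["consumption"].sum()

# idxmax returns the index label (here, the country) of the largest total.
top_country = totals.idxmax()
print(top_country, totals.max())
```

The same groupby pattern works with a different grouping column or a different value column, which covers several of the exercises that follow.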

In [3]:
# Your answer here.

Exercise

Which food category is the biggest contributor to the above country's consumption total?

In [4]:
# Your answer here.

Exercise

What country produces the most kg CO2 per person per year?

In [5]:
# Your answer here.

Exercise

Which food category is the biggest contributor to the above country's CO2 emissions?

In [6]:
# Your answer here.

Exercise

What food category produces the most CO2 per person per year across all countries?

In [7]:
# Your answer here.

Exercise

What food category is consumed the most across all countries per person per year? What food category is consumed the least across all countries?

In [8]:
# Your answer here.

Exercise

Make the dataset wide by pivoting on the food_category column. You'll end up with a "multi-index" dataframe, with multiple levels of columns.
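
To see what a "multi-index" result looks like before trying it on df, here is a small sketch on a toy dataframe with invented values. It assumes DataFrame.pivot is the tool you reach for; leaving out the values argument keeps every remaining column, which is what produces the two levels of column labels.

```python
import pandas as pd

# Toy data with made-up numbers; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "food_category": ["Beef", "Rice", "Beef", "Rice"],
    "consumption": [10.0, 5.0, 2.0, 20.0],
    "co2_emmission": [300.0, 6.0, 60.0, 25.0],
})

# One row per country, one column per (measure, food category) pair.
wide = toy.pivot(index="country", columns="food_category")

print(wide.columns.nlevels)  # number of levels in the column index
print(wide)
```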

In [9]:
# Your answer here.

Exercise

Now that the dataset is wide, answer the same question as in the fifth exercise above: "What country produces the most kg CO2 per person per year?". Notice that the way we answer the same data analysis question changes depending on the format of the data (wide vs long). You can form your own opinion on which option you prefer, but remember that many visualization libraries work best with long data. Hint: you can index the outer level of a multi-index column using the same syntax we've seen previously: df['co2_emmission']. To access an inner level, you have to use a tuple: df[("consumption", "Beef")].
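
The two indexing forms from the hint can be tried on a small wide dataframe first. This sketch builds one by hand with invented values (the names wide, outer, inner, and row_totals are hypothetical) and uses the consumption columns, leaving the CO2 question for you.

```python
import pandas as pd

# Hand-built wide frame with two column levels; all numbers are made up.
columns = pd.MultiIndex.from_product(
    [["consumption", "co2_emmission"], ["Beef", "Rice"]]
)
wide = pd.DataFrame(
    [[10.0, 5.0, 300.0, 6.0], [2.0, 20.0, 60.0, 25.0]],
    index=["A", "B"],
    columns=columns,
)

# Outer level: returns a dataframe whose columns are the food categories.
outer = wide["consumption"]

# Inner level: a tuple selects a single column and returns a series.
inner = wide[("consumption", "Beef")]

# In wide format, totals per country are sums across columns (axis=1).
row_totals = outer.sum(axis=1)
print(row_totals)
```

Compare this with the long format, where the same totals come from a groupby on the country column rather than a row-wise sum.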

In [10]:
# Your answer here.

Have fun!