Working with Data

Learning Objectives

After completing this assignment, students should be able to:

install and load an R package

understand the data manipulation functions of dplyr

carry out a simple import-and-analyze data scenario

Reading

Topics
- dplyr
Readings
- dplyr vignette
- Optional Resources:
  - Analyzing data with dplyr
  - R4DS - Data transformation

Lecture Notes

dplyr

1) Week 4 “Quiz” on Canvas

NOTE: Big Q this week is 6!

Complete the 5 exercises for the week.

This week you will need to create a project, zip up the directory, and upload the compressed file. Remember to comment your work completely and write your script so that I can reproduce your work on my computer. Tip: Exercise 5, Q3 asks for a text file. This should be in a subdirectory in your project.

2) Summary Sheets Due Thursday

This is a sheet of paper where you use a writing instrument to create a mindmap / cheat sheet.

Exercises

-- dplyr --

Install and familiarize yourself with the dplyr package. The library() step(s) should always be located at the very top of a script.
```
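# Install dplyr (only needed once per R installation)
install.packages("dplyr")

# Load dplyr so its functions are available in this session
library(dplyr)

# Open the help index for the package
help(package = dplyr)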
```
This vignette is a great reference for data manipulation verbs to keep in mind.
-- Shrub Volume Data Basics --

This is a follow-up to Shrub Volume Data Frame.

Dr. Granger is interested in studying the factors controlling the size and carbon storage of shrubs. This research is part of a larger area of research trying to understand carbon storage by plants. She has conducted a small preliminary experiment looking at the effect of three different treatments on shrub volume at four different locations. She has placed the data file on the web for you to download:
- shrub dimensions data
Download this into your data folder and get familiar with the data by importing the shrub dimensions data using read.csv() and then:
1. Check the column names in the data using the function names().
2. Use str() to show the structure of the data frame and its individual columns.
3. Print out the first few rows of the data using the function head().
  
  Use dplyr to complete the remaining tasks.
4. Select the data from the length column and print it out.
5. Select the data from the site and experiment columns and print it out.
6. Filter the data for all of the plants with heights greater than 5 and print out the result.
7. Create a new data frame called shrub_data_w_vols that includes all of the original data and a new column containing the volumes, and display it.
[click here for output]
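
As a rough illustration of the dplyr verbs used in the tasks above, here is a minimal sketch on a made-up data frame. The `fish` data, its column names, and the derived `mass_kg` column are all hypothetical and are not part of Dr. Granger's data; the point is only to show the shape of a `select()`, `filter()`, and `mutate()` call.

```r
library(dplyr)

# Hypothetical toy data (not the shrub data)
fish <- data.frame(
  species   = c("perch", "pike", "perch", "trout"),
  lake      = c("North", "North", "South", "South"),
  length_cm = c(18, 55, 22, 30),
  mass_g    = c(90, 1500, 150, 320)
)

# select() keeps only the named column(s)
fish_species <- select(fish, species)

# filter() keeps only the rows where the condition is TRUE
long_fish <- filter(fish, length_cm > 20)

# mutate() returns all original columns plus a newly computed one
fish_w_kg <- mutate(fish, mass_kg = mass_g / 1000)

print(fish_species)
print(long_fish)
print(fish_w_kg)
```

Each verb takes a data frame as its first argument and returns a new data frame, so the original `fish` object is left unchanged.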
-- Shrub Volume Aggregation --

This is a follow-up to Shrub Volume Data Basics.

Dr. Granger wants some summary data of the plants at her sites and for her experiments. Make sure you have her shrub dimensions data.

This code calculates the average height of a plant at each site:
```
by_site <- group_by(shrub_dims, site)
avg_height <- summarize(by_site, avg_height = mean(height))
```
1. Modify the code to calculate and print the average height of a plant in each experiment.
2. Use max() to determine the maximum height of a plant at each site.
[click here for output]
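
The same group-then-summarize pattern can also be written with the pipe operator `%>%`, which passes the result of one step as the first argument of the next. The sketch below assumes a made-up `fish` data frame with hypothetical `lake` and `length_cm` columns, and it uses `n()` and `median()` rather than the functions asked for in the tasks.

```r
library(dplyr)

# Hypothetical toy data (not the shrub data)
fish <- data.frame(
  lake      = c("North", "North", "South", "South"),
  length_cm = c(18, 55, 22, 30)
)

# Group rows by lake, then compute one summary row per group
lake_summary <- fish %>%
  group_by(lake) %>%
  summarize(
    n_fish        = n(),               # number of rows in each group
    median_length = median(length_cm)  # median length within each group
  )

print(lake_summary)
```

The result has one row per group, with the grouping column followed by each named summary column.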
-- Shrub Volume Join --

This is a follow-up to Shrub Volume Aggregation.

Dr. Granger has kept a separate table that describes the manipulation for each experiment. Add the experiments data to your data folder.

Import the experiments data and then use inner_join() to combine it with the shrub dimensions data to add a manipulation column to the shrub data.
[click here for output]
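
To show how `inner_join()` behaves, here is a small sketch with two made-up tables. The `catches` and `lakes` data frames and the `lake_id` key are hypothetical; in your own work the key is whichever column the two tables share.

```r
library(dplyr)

# Hypothetical toy tables (not the shrub or experiments data)
catches <- data.frame(
  lake_id   = c(1, 1, 2, 3),
  length_cm = c(18, 55, 22, 30)
)
lakes <- data.frame(
  lake_id   = c(1, 2),
  lake_name = c("North", "South")
)

# inner_join() keeps only the rows whose key appears in BOTH tables,
# and adds the columns from the second table to the first
catches_named <- inner_join(catches, lakes, by = "lake_id")

print(catches_named)
```

Note that the catch with `lake_id` 3 is dropped because `lakes` has no matching row, so it is worth comparing row counts before and after a join.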
-- Fix the Code --

This is a follow-up to Shrub Volume Aggregation. If you haven’t already downloaded the shrub volume data, do so now and store it in your data directory.

The following code is supposed to import the shrub volume data and calculate the average shrub volume for each site and, separately, for each experiment:
```
read.csv("data/shrub-volume-experiment.csv")
shrub_data %>%
  mutate(volume = length * width * height) %>%
  group_by(site) %>%
  summarize(mean_volume = max(volume))
shrub_data %>%
  mutate(volume = length * width * height)
  group_by(experiment) %>%
  summarize(mean_volume = mean(volume))
```
1. Fix the errors in the code so that it does what it’s supposed to.
2. Add a comment to the top of the code explaining what it does.
3. In a text file, discuss how you know that your fixed version of the code is right and how you would try to make sure it was right if the data file were thousands of lines long.
[click here for output]

Introduction to Data Analysis for Aquatic Sciences
