A relatively large number of mammalian species went extinct roughly 10,000 years ago. To help explain why these extinctions happened, scientists want to know whether the species that went extinct differed in body size from those that did not.
To address this question we can use the
largest dataset on mammalian body size in the world,
which has data on the mass of recently extinct mammals as well as extant mammals
(i.e., those that are still alive today). Take a look at the
metadata to
understand the structure of the data. One key thing to remember is that species
can occur on more than one continent, and if they do then they will occur more
than once in this dataset. Also let’s ignore species that went extinct in the
very recent past (designated by the word "historical" in the status
column).
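Once the data are loaded and the columns are named, dropping the "historical" rows might look like the sketch below. The data frame here is a made-up stand-in with placeholder values (not records from the real dataset), and the status labels "extant" and "extinct" are assumed for illustration.

```r
# Made-up stand-in for the imported data (placeholder values only).
mammal_sizes <- data.frame(
  continent     = c("NA", "NA", "AF"),
  status        = c("extant", "historical", "extinct"),
  combined_mass = c(100, 250, 4000),
  stringsAsFactors = FALSE
)

# Keep every row whose status is not "historical".
mammal_sizes <- mammal_sizes[mammal_sizes$status != "historical", ]

# Count the remaining rows per status to confirm the filter worked.
table(mammal_sizes$status)
```

The same filter could also be written with dplyr's filter() if you are already using that package for the rest of the analysis.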
Import the data into R. If you’ve looked at a lot of data you’ll realize that
this dataset is tab delimited. Use the argument sep = "\t" in read.csv() to
parse the data correctly. There is no header row, so use header = FALSE. The
unknown value used in the dataset is -999. By default R treats the string "NA"
as a missing value, but in this dataset "NA" is the code for North America. Use
the additional arguments stringsAsFactors = FALSE, na.strings = "-999" in
read.csv() to get R to keep "NA" as a string and transform -999 to NA.
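As a minimal sketch of how these arguments work together, the snippet below writes two made-up rows to a temporary file and reads them back. The rows, the temporary file, and the name toy_sizes are assumptions for illustration only. With the real data you would pass the path to your downloaded file instead.

```r
# Two made-up, tab-delimited rows in a nine-column layout.
# These are placeholders, NOT real records from the dataset.
toy_lines <- c(
  "NA\textant\tOrderA\tFamilyA\tGenusA\tspecies_a\t2.5\t316.2\t1",
  "AF\textinct\tOrderB\tFamilyB\tGenusB\tspecies_b\t-999\t-999\t2"
)
toy_path <- tempfile(fileext = ".txt")
writeLines(toy_lines, toy_path)

# Read the file with the arguments described above.
toy_sizes <- read.csv(toy_path, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE, na.strings = "-999")

toy_sizes$V1         # "NA" is kept as the string code for North America
is.na(toy_sizes$V7)  # only the -999 entry is treated as missing
```

Without na.strings = "-999", the first continent code would be read as a missing value and the -999 entries would be treated as real masses.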
It’s probably a good idea to add column names to help identify columns:
colnames(mammal_sizes) <- c("continent", "status", "order",
"family", "genus", "species", "log_mass", "combined_mass",
"reference")
Calculate the average mass of the extant and the extinct species on each continent (dplyr would be one way to do
this). Export your results to a csv file where the first entry on each
line is the continent, the second entry is the average mass of the extant
species on that continent, and the third entry is the average mass of the
extinct species on that continent (spread() from tidyr is a handy way to
convert the standard dplyr output to this form). Call the file
continent_mass_differences.csv.
Graph the distributions of body mass using log_mass rather
than the mass itself so that you can see the form of the distributions more
clearly. facet_grid or facet_wrap may be useful for laying out the
subplots. Label the plots to make it clear to someone viewing them what they
are looking at. Save the graph or graphs as .png file(s) (this should
happen automatically in the code).
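One possible shape for the continent summary and the plots is sketched below. It assumes the dplyr, tidyr, and ggplot2 packages are installed, and it runs on a made-up data frame with placeholder values rather than the real dataset. The output directory, the plot file name mass_distributions.png, the choice of histograms, and the base-10 log used for the toy values are all assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in data (placeholder values, not real measurements).
mammal_sizes <- data.frame(
  continent     = c("NA", "NA", "NA", "AF", "AF", "AF"),
  status        = c("extant", "extinct", "extant",
                    "extant", "extinct", "extinct"),
  combined_mass = c(10, 1000, 30, 100, 10000, 30000),
  stringsAsFactors = FALSE
)
# Assumption: base-10 log for the toy values.
mammal_sizes$log_mass <- log10(mammal_sizes$combined_mass)

# Average mass per continent and status, then one column per status.
continent_means <- mammal_sizes %>%
  filter(status %in% c("extant", "extinct")) %>%
  group_by(continent, status) %>%
  summarize(mean_mass = mean(combined_mass, na.rm = TRUE),
            .groups = "drop") %>%
  spread(status, mean_mass)

# Assumption: write outputs to a temporary directory.
out_dir <- tempdir()
csv_path <- file.path(out_dir, "continent_mass_differences.csv")
write.csv(continent_means, csv_path, row.names = FALSE)

# Histograms of log_mass, one panel per continent and status.
mass_plot <- ggplot(mammal_sizes, aes(x = log_mass)) +
  geom_histogram(bins = 10) +
  facet_grid(continent ~ status) +
  labs(x = "log_mass", y = "Number of species",
       title = "Body mass distributions by continent and status")

# Save the figure from code so no manual export step is needed.
png_path <- file.path(out_dir, "mass_distributions.png")
ggsave(png_path, mass_plot, width = 6, height = 4)
```

After spread(), the columns come out as continent, extant, extinct, which matches the column order requested for the csv file. Using facet_wrap instead of facet_grid would be a reasonable alternative if some continent and status combinations have no data.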