Learning Objectives
Following this assignment students should be able to:
- understand the basic query structure of SQL
- execute SQL commands to select, sort, group, and aggregate data
- use joins to combine tables in SQL
Reading
- Databases Intro
- Basic Queries - Selecting, Filtering, Sorting, Nulls
- Aggregation - Video, Reading
- Joins - Video, Reading
Lecture Notes
Exercises
-- SELECT --
For this and many of the following problems you will create queries that retrieve the relevant information from the Portal small mammal survey database. Download the data. As you begin to familiarize yourself with the database you will need to know some details regarding what is in this database in order to answer the questions. For example, you may need to know what species is associated with the two character species ID or you may need to know the units for the individual’s weight. This type of information associated with data is called metadata and the metadata for this dataset is available online at Ecological Archives.
- Write a query that displays all of the records for all of the fields (
*
) in the main table. Save it as a view namedall_survey_data
. - We want to generate data for an analysis of body size differences (using
both weight and hind foot length) between males and females of each
species. We have decided that we can ignore the information related to when
and where the individuals were trapped. Create a query that returns all of
the necessary information, but nothing else. Save this as
size_differences_among_sexes_data
.
- Write a query that displays all of the records for all of the fields (
-- WHERE --
A population biologist (Dr. Undomiel) who studies the population dynamics of Dipodomys spectabilis would like to use some data from the Portal Project, but she doesn’t know how to work with large datasets. Being the kind and benevolent person that you are, write a query to extract the data that she needs. She wants only the data for her species of interest (
DS
in thespecies_id
column), when each individual was trapped, and what sex it was. She doesn’t care about the plot the individual was trapped on or the size of the individuals. She also doesn’t need the species codes because you’re only providing her with the data for one species, and since she isn’t looking at the database itself the two character abbreviation would probably be confusing. Save this query as a view with the namespectabilis_population_data
.Scrolling through the results of your query you notice that the data on sex is missing for some species. You send Dr. Undomiel a short e-mail* asking what she would like you to do regarding this complexity. Dr. Undomiel asks that you create two additional queries so that she can decided what to do about this issue later. Add a query that retrieves the same data as above, but only for cases where the sex is known to be male, and an additional query with the same data, but only where the sex is known to be female. Save these as views with the names
spectabilis_population_data_males
andspectabilis_population_data_females
.*Short for elven-mail
[click here for output] [click here for output] [click here for output]-- ORDER BY --
The graduate students that work at the Portal site are hanging out late one evening drinking… soda pop… and they decide it would be an epically awesome idea to put together a list of the 100 largest rodents ever sampled at the site. Since you’re the resident computer genius they text you, and since you’re up late working and this sounds like a lot more fun than the homework you’re working on (which isn’t really saying much, if you know what I’m saying) you decide you’ll make the list for them.
The rules that the Portal students have come up with (and they did spend a sort of disturbingly long time coming up with these rules; I guess you just had to be there) are:
- The data should include the
species_id
,year
, and theweight
. These columns should be presented in this order. - Individuals should be sorted in descending order with respect to mass.
- Since individuals often have the same mass, ties should be settled by
sorting next by
hindfoot_length
and finally by theyear
.
Since you need to limit this list to the top 100 largest rodents, you’ll need to add the SQL command
[click here for output]LIMIT 100
to the end of the query. Save the final query as100_largest_individuals
.- The data should include the
-- DISTINCT --
Write a query that returns a list of the dates that mammal surveys took place at Portal with no duplicates. Save it as
[click here for output]dates_sampled
.-- Missing Data --
Write a query that returns the
[click here for output]year
,month
,day
,species_id
, andweight
for every record were there is no missing data in any of these fields. Save it asno_missing_data
.-- GROUP BY --
Using GROUP BY, write a query that returns a list of dates on which individuals of the species Dipodomys spectabilis (indicated by the
[click here for output]DS
species code) were trapped (with no duplicates). Sort the list in chronological order (from oldest to newest). Save it asdates_with_dipodomys_spectabilis
.-- COUNT --
Write a query that returns the number of individuals trapped in each year for all known species combined (i.e., where the
[click here for output]species_id
is not null). Name the count columntotal_abundance
and sort it chronologically. Include the year in the output. Save it astotal_abundance_by_year
. There should only be one value for each year since this is a count of the individuals across all species in that year.-- SUM --
Write a query that returns the number of individuals of each species captured in each year (
[click here for output]total_abundance
) and thetotal_biomass
of those individuals (the sum of theweight
). The units for biomass should be in kilograms. Include theyear
andspecies_id
in the output. Sort the result chronologically by year and then alphabetically by species. Save asmass_abundance_data
.-- Basic Join --
Write a query that returns the
[click here for output]year
,month
, andday
for each individual captured as well as it’sgenus
andspecies
names. This can be accomplished by joining thespecies
table to thesurveys
table using thespecies_id
column in both tables. Save this query asspecies_captures_by_date
.-- Multi-table Join --
The
plots
table in the Portal database can be joined to thesurveys
table by joiningplot_id
toplot_id
and thespecies
table can be joined to thesurveys
table by joiningspecies_id
tospecies_id
.The Portal mammal data include data from a number of different experimental manipulations. You want to do a time-series analysis of the population dynamics of all of the species at the site, taking into account the different experimental manipulations. Write a query that returns the
[click here for output]year
,month
,day
,genus
andspecies
of every individual as well as theplot_id
andplot_type
of the plot they are captured on. Save this query asspecies_plot_data
.-- Filtered Join --
You are curious about what other kinds of animals get caught in the Sherman traps used to census the rodents. Write a query that returns a list of the
[click here for output]genus
,species
, andtaxa
(from thespecies
table) for non-rodent individuals that are caught on theControl
plots. Non-rodents are indicated in thetaxa
column of thespecies
table. You are only interested in which species are captured, so make this list unique (only one line for each species). Save this query asnon_rodents_on_controls
.-- Detailed Join --
We want to do an analysis comparing the size of individuals on the
[click here for output]Control
plots to theLong-term Krat Exclosures
. Write a query that returns theyear
,genus
,species
,weight
and theplot_type
for all cases where the plot type is eitherControl
orLong-term Krat Exclosure
. Be sure to choose only rodents and exclude individuals that have not been identified to species (i.e., exclude species withsp.
in the species column). Remove any records where theweight
is missing. Save this query assize_comparison_controls_vs_krat_exclosures
.-- Aggregated Join --
Write a query that displays the total number of rodent individuals sampled on each
[click here for output]plot_type
. Save this query asindividuals_per_plot_type
.