Remember to
- download
portal_mammals.sqlite.- connect
portal_mammals.sqliteto SQLite Manager.
Aggregation
- Aggregation lets us combine rows into groups based on their values and calculate combined values for each group.
SELECT AVG(weight)
FROM surveys;
- Other aggregation functions include MIN,MAX,SUM, andCOUNT.
SELECT MIN(weight), MAX(weight), AVG(weight)
FROM surveys;
- We can use GROUP BYto calculate these values for different groups. For example, to get the min, max, and average weight for each species.
SELECT species_id, MIN(weight), MAX(weight), AVG(weight)
FROM surveys
GROUP BY species_id;
- We can group by multiple columns as well.
SELECT species_id, plot_id, MIN(weight), MAX(weight), AVG(weight)
FROM surveys
GROUP BY species_id, plot_id;
- Aggregation functions remove null values from the calculations.
- To count the number of individuals identified to species
SELECT species_id, plot_id, COUNT(species_id)
FROM surveys;
- To count the number of individuals trapped use
SELECT species_id, plot_id, COUNT(*)
FROM surveys;
- Using *counts any row with at least one non-null value
- We can name aggregated columns using as
SELECT species_id, plot_id, COUNT(*) as count
FROM surveys;
Filtering after aggregation
- To filter by an outcome of an aggregation use HAVING
SELECT species_id, plot_id, MIN(weight), MAX(weight), AVG(weight)
FROM surveys
GROUP BY species_id, plot_id
HAVING MIN(weight) IS NOT NULL;
- This works after aggregation, whereas WHEREworks before aggregation
SELECT species_id, plot_id, MIN(weight), MAX(weight), AVG(weight)
FROM surveys
WHERE weight > 10
GROUP BY species_id, plot_id
HAVING MIN(weight) IS NOT NULL;
Only use grouped or aggregated fields in SELECT
- When using GROUP BYonly fields that are in theGROUP BYclause or aggregated fields should be used in theSELECT
SELECT species_id, plot_id, MIN(weight), MAX(weight), AVG(weight)
FROM surveys
WHERE weight IS NOT NULL
GROUP BY species_id, plot_id;
- Using fields that aren’t grouped or aggregated give counter intuitive results.
SELECT species_id, plot_id, MIN(weight), MAX(weight), AVG(weight)
FROM surveys
WHERE weight IS NOT NULL
GROUP BY species_id;
