Test
Oly-Genetics-NF-Testing
Laura H Spencer
January 17, 2018
Oly Genetics Analysis, playing around with data
Today I played with the 2016/2017 Oly microsatellite data in R, generating some summary statistics by following a few online tutorials. What I’ve learned: * The spreadhseet “Olympic Oyster NFH_NFW (1).xlsx” from Crystal houses data from NF population (Fidalgo Bay) from two populations: 2016 hatchery-produced (F1), and 2017 wild. Crystal already determined bins, rounded the microsatellite data to produce: 100 samples with diploid genotype data across 6 loci (2 alleles per loci).
* microsatellite data is often in a particular spreadsheet format, whith metadata in the first 2 columns, and summary data in the first 2 rows. I exported the rounded data from Crystal’s spreadsheet, modified to conform with the required format, and saved as 3 separate files: 1. NFH-2016 alone 2. NFW-2017 alone 3. NFH-2016 & NFW-2017 merged * NOTE: There are additional microsatellite data from 2015, available online from Rick/Andy: https://www.dropbox.com/sh/0tz6a4f8tz8rwap/AACXhjgrZc2UWYSAk4N-uoM6a?dl=0
My github repo: https://github.com/laurahspencer/O.lurida_genetics
# Import 2016/2017 microsatellite data
# Used the following reference: https://grunwaldlab.github.io/Population_Genetics_in_R/TOC.html
NFH.2016 <- read.genalex("Data/Oly2016NFH_Rounded.csv", ploidy=2) #read
NFW.2017 <- read.genalex("Data/Oly2017NFW_Rounded.csv", ploidy=2)
NF <- read.genalex("Data/Oly2016NFH+2017NFW_Merged.csv", ploidy=2)
summary(NFH.2016)
##
## // Number of individuals: 100
## // Group sizes: 100
## // Number of alleles per locus: 31 14 14 17 18 18
## // Number of alleles per group: 112
## // Percentage of missing data: 3 %
## // Observed heterozygosity: 0.93 0.71 0.95 0.94 0.94 0.93
## // Expected heterozygosity: 0.94 0.75 0.86 0.92 0.92 0.92
summary(NFW.2017) #summary of wild samples
##
## // Number of individuals: 99
## // Group sizes: 99
## // Number of alleles per locus: 30 13 15 19 20 19
## // Number of alleles per group: 116
## // Percentage of missing data: 1.85 %
## // Observed heterozygosity: 0.98 0.71 0.88 0.93 0.94 0.93
## // Expected heterozygosity: 0.94 0.74 0.87 0.93 0.93 0.93
summary(NF) #summary of hatchery and wild combined
##
## // Number of individuals: 199
## // Group sizes: 99 100
## // Number of alleles per locus: 34 15 17 19 20 19
## // Number of alleles per group: 116 112
## // Percentage of missing data: 2.43 %
## // Observed heterozygosity: 0.95 0.71 0.91 0.93 0.94 0.93
## // Expected heterozygosity: 0.95 0.75 0.87 0.93 0.93 0.93
info_table(NF, type="missing", plot=TRUE) #see how the missing data is distributed over the 2 populations
## Locus
## Population Olur10 Olur11 Olur12 Olur13 Olur15 Olur19 Mean
## NFW-2017 0.020 0.040 0.010 0.010 0.010 0.020 0.019
## NFH-2016 0.020 0.060 . 0.040 0.020 0.040 0.030
## Total 0.020 0.050 0.005 0.025 0.015 0.030 0.024
mlg.table(NF) #genotype eveness. Result is N=199; MLG=199
## Warning: package 'bindrcpp' was built under R version 3.3.2
## MLG.1 MLG.2 MLG.3 MLG.4 MLG.5 MLG.6 MLG.7 MLG.8 MLG.9 MLG.10
## NFW-2017 0 0 1 1 1 0 0 1 0 0
## NFH-2016 1 1 0 0 0 1 1 0 1 1
## MLG.11 MLG.12 MLG.13 MLG.14 MLG.15 MLG.16 MLG.17 MLG.18 MLG.19
## NFW-2017 0 0 0 1 1 0 1 1 0
## NFH-2016 1 1 1 0 0 1 0 0 1
## MLG.20 MLG.21 MLG.22 MLG.23 MLG.24 MLG.25 MLG.26 MLG.27 MLG.28
## NFW-2017 1 0 1 0 0 1 0 1 1
## NFH-2016 0 1 0 1 1 0 1 0 0
## MLG.29 MLG.30 MLG.31 MLG.32 MLG.33 MLG.34 MLG.35 MLG.36 MLG.37
## NFW-2017 1 1 0 1 1 0 1 0 1
## NFH-2016 0 0 1 0 0 1 0 1 0
## MLG.38 MLG.39 MLG.40 MLG.41 MLG.42 MLG.43 MLG.44 MLG.45 MLG.46
## NFW-2017 1 0 1 1 0 0 0 1 1
## NFH-2016 0 1 0 0 1 1 1 0 0
## MLG.47 MLG.48 MLG.49 MLG.50 MLG.51 MLG.52 MLG.53 MLG.54 MLG.55
## NFW-2017 0 1 0 1 0 1 1 0 0
## NFH-2016 1 0 1 0 1 0 0 1 1
## MLG.56 MLG.57 MLG.58 MLG.59 MLG.60 MLG.61 MLG.62 MLG.63 MLG.64
## NFW-2017 0 0 1 0 0 1 0 0 0
## NFH-2016 1 1 0 1 1 0 1 1 1
## MLG.65 MLG.66 MLG.67 MLG.68 MLG.69 MLG.70 MLG.71 MLG.72 MLG.73
## NFW-2017 0 1 1 0 0 0 0 1 1
## NFH-2016 1 0 0 1 1 1 1 0 0
## MLG.74 MLG.75 MLG.76 MLG.77 MLG.78 MLG.79 MLG.80 MLG.81 MLG.82
## NFW-2017 1 1 1 1 0 0 1 0 1
## NFH-2016 0 0 0 0 1 1 0 1 0
## MLG.83 MLG.84 MLG.85 MLG.86 MLG.87 MLG.88 MLG.89 MLG.90 MLG.91
## NFW-2017 1 0 1 1 0 0 1 0 0
## NFH-2016 0 1 0 0 1 1 0 1 1
## MLG.92 MLG.93 MLG.94 MLG.95 MLG.96 MLG.97 MLG.98 MLG.99 MLG.100
## NFW-2017 1 0 0 1 0 0 1 1 1
## NFH-2016 0 1 1 0 1 1 0 0 0
## MLG.101 MLG.102 MLG.103 MLG.104 MLG.105 MLG.106 MLG.107 MLG.108
## NFW-2017 0 1 1 1 1 0 0 1
## NFH-2016 1 0 0 0 0 1 1 0
## MLG.109 MLG.110 MLG.111 MLG.112 MLG.113 MLG.114 MLG.115 MLG.116
## NFW-2017 1 1 1 0 0 0 1 0
## NFH-2016 0 0 0 1 1 1 0 1
## MLG.117 MLG.118 MLG.119 MLG.120 MLG.121 MLG.122 MLG.123 MLG.124
## NFW-2017 0 1 0 1 0 1 1 1
## NFH-2016 1 0 1 0 1 0 0 0
## MLG.125 MLG.126 MLG.127 MLG.128 MLG.129 MLG.130 MLG.131 MLG.132
## NFW-2017 1 0 1 0 0 0 0 0
## NFH-2016 0 1 0 1 1 1 1 1
## MLG.133 MLG.134 MLG.135 MLG.136 MLG.137 MLG.138 MLG.139 MLG.140
## NFW-2017 1 0 0 0 0 1 0 0
## NFH-2016 0 1 1 1 1 0 1 1
## MLG.141 MLG.142 MLG.143 MLG.144 MLG.145 MLG.146 MLG.147 MLG.148
## NFW-2017 1 1 0 0 0 0 1 0
## NFH-2016 0 0 1 1 1 1 0 1
## MLG.149 MLG.150 MLG.151 MLG.152 MLG.153 MLG.154 MLG.155 MLG.156
## NFW-2017 1 0 1 1 1 0 1 1
## NFH-2016 0 1 0 0 0 1 0 0
## MLG.157 MLG.158 MLG.159 MLG.160 MLG.161 MLG.162 MLG.163 MLG.164
## NFW-2017 0 1 1 0 0 1 0 1
## NFH-2016 1 0 0 1 1 0 1 0
## MLG.165 MLG.166 MLG.167 MLG.168 MLG.169 MLG.170 MLG.171 MLG.172
## NFW-2017 1 1 0 1 1 1 1 0
## NFH-2016 0 0 1 0 0 0 0 1
## MLG.173 MLG.174 MLG.175 MLG.176 MLG.177 MLG.178 MLG.179 MLG.180
## NFW-2017 0 0 0 0 1 1 0 0
## NFH-2016 1 1 1 1 0 0 1 1
## MLG.181 MLG.182 MLG.183 MLG.184 MLG.185 MLG.186 MLG.187 MLG.188
## NFW-2017 1 1 1 1 1 0 0 1
## NFH-2016 0 0 0 0 0 1 1 0
## MLG.189 MLG.190 MLG.191 MLG.192 MLG.193 MLG.194 MLG.195 MLG.196
## NFW-2017 1 0 1 0 0 0 1 0
## NFH-2016 0 1 0 1 1 1 0 1
## MLG.197 MLG.198 MLG.199
## NFW-2017 0 1 1
## NFH-2016 1 0 0
Generate more summary statistics using poppr
function
Abbreviations used in t Statistic
* Pop: Population name.
* N: Number of individuals observed.
* MLG: Number of multilocus genotypes (MLG) observed.
* eMLG: The number of expected MLG at the smallest sample size ≥ 10 based on rarefaction
* SE: Standard error based on eMLG.
* H: Shannon-Wiener Index of MLG diversity (Shannon, 2001).
* G: Stoddart and Taylor’s Index of MLG diversity (Stoddart & Taylor, 1988).
* lambda: Simpson’s Index (Simpson, 1949). 0 = no genotypes are differet; 1 = all genotypes are different
* E.5: Evenness, E5E5 (Pielou, 1975; Ludwig & Reynolds, 1988; Grünwald et al., 2003).
* Hexp: Nei’s unbiased gene diversity (Nei, 1978).
* Ia: The index of association, IAIA (Brown, Feldman & Nevo, 1980; Smith et al., 1993).
* rbarD: The standardized index of association, r¯dr¯d [@].
NF.pop <- poppr(NF) #summary stats on each population
NF.pop
## Pop N MLG eMLG SE H G lambda E.5 Hexp Ia rbarD
## 1 NFW-2017 99 99 99 0.00e+00 4.60 99 0.990 1 0.894 0.611 0.124
## 2 NFH-2016 100 100 99 2.58e-08 4.61 100 0.990 1 0.891 0.590 0.120
## 3 Total 199 199 99 0.00e+00 5.29 199 0.995 1 0.893 0.605 0.122
## File
## 1 NF
## 2 NF
## 3 NF
(NF.pop$N / (NF.pop$N - 1)) * NF.pop$lambda #corrected simpson's index (N/(N-1)) #all different genotypes
## [1] 1 1 1
Are our populations in Hardy-Weinberg equilibrium?
Hardy‐Weinberg Assumptions include: • infinite population • discrete generations • random mating • no selection • no migration in or out of population • no mutation • equal initial genotype frequencies in the two sexes Equilibrium is reached after one generation of mating under the Hardy‐Weinberg assumptions…Genotype frequencies remain the same from generation to generation.
library("pegas")
## Warning: package 'pegas' was built under R version 3.3.2
## Loading required package: ape
## Warning: package 'ape' was built under R version 3.3.2
##
## Attaching package: 'pegas'
## The following object is masked from 'package:ape':
##
## mst
## The following object is masked from 'package:ade4':
##
## amova
NF.HW <- seppop(NF) %>% lapply(hw.test, B=1000) #all P-values >0.05; reject the null that these populations are under HWE
NF.HW
## $`NFW-2017`
## chi^2 df Pr(chi^2 >) Pr.exact
## Olur10 447.58545 435 0.3280911 0.285
## Olur11 48.88706 78 0.9959968 0.789
## Olur12 70.89750 105 0.9956481 0.918
## Olur13 166.41974 171 0.5846400 0.319
## Olur15 191.69997 190 0.4517938 0.315
## Olur19 165.39521 171 0.6065314 0.325
##
## $`NFH-2016`
## chi^2 df Pr(chi^2 >) Pr.exact
## Olur10 417.06735 465 0.9459973 0.615
## Olur11 58.00523 91 0.9972368 0.806
## Olur12 67.18583 91 0.9711158 0.822
## Olur13 137.46193 136 0.4487866 0.130
## Olur15 159.88890 153 0.3350415 0.085
## Olur19 159.56688 153 0.3415811 0.101
Hardy-Weinberg test results: reject the null that these populations are under HWE for all 6 loci. Here is a table with p-values for all loci, for each population:
NF.HW.P <- sapply(NF.HW, "[", i=TRUE, j=3) #pvalues of HW chi-squared test for all loci, both pops combined into a dataframe
NF.HW.P
## NFW-2017 NFH-2016
## Olur10 0.3280911 0.9459973
## Olur11 0.9959968 0.9972368
## Olur12 0.9956481 0.9711158
## Olur13 0.5846400 0.4487866
## Olur15 0.4517938 0.3350415
## Olur19 0.6065314 0.3415811
Are populations in linkage disequilibrium?
Test the null hypothesis that alleles observed at different loci are not linked. This is the case if populations are sexual while alleles recombine freely into new genotypes during the process of sexual reproduction: IA =VO/VE -1 … where V0 is the observed variance of K and VE is the expected variance of K, where K is the number of loci at which two individuals differ.
library("magrittr")
NF.ia.H <- ia(popsub(NF, "NFH-2016"), sample=999)