--- title: "metaGE-vignette" author: "Annaig De Walsche" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{metaGE-vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", out.width = '70%' ) ``` This package provides functionalities to perform a meta-analysis of genetic-wide association studies results in plant quantitative genetics. The present document displays the code and results corresponding to the analysis of the package data set. # Package installation First one need to install the `metaGE` package. ```{r intall metaGE} if (!require('metaGE')){ install.packages('metaGE') } library('metaGE') ``` The analysis requires the following R packages: ```{r setup,warning=FALSE,message=FALSE} library(tidyverse) library(DT) ## For graphical displays library(data.table) library(corrplot) library(ggplot2) ``` # Build the dataset The starting point of the analysis is the list of files corresponding to the association studies results performed in the different environments. Here we consider only three association files for simplification. These files are included in the package. ```{r listing files of GWAS results } ## Get the folder containing the association file RepData <- system.file("extdata", package = "metaGE") ## Get the complete list of association files File.list <- list.files(RepData ,full.names = TRUE) %>% tibble(Names = .)%>% mutate(ShortNames = Names %>% str_remove(pattern = paste0(RepData,"/")) %>% str_remove(pattern = "_3DF.txt")) %>% select(ShortNames,Names) %>% deframe File.list ``` Here is what the data look like in a single file: ```{r looking at single file } ## Have a look at the first one fread(File.list[1]) %>% head() %>% datatable() ``` One now needs to collect the results of all association files into a single dataset using the `metaGE.collect` function. Notice that all files may not have the same SNPs (due to filtering per environment). This will result in NAs when files are merged. By default, any marker with NAs is discarded. Specify NA.rmv = FALSE if you want to keep the marker with NAs. ```{r metaGE collect} ###Build the dataset ## First provide the list of variable names Names.list <- list(MARKER="Marker_Name", CHR="Chromosome", POS="Marker_Position", FREQ="Maf", EFFECT="SNP_Weight", PVAL="Pvalue", ALLELE0="Allele1", ALLELE1="Allele2") MinFreq <- 0.07 ## Now collect MetaData <- metaGE.collect(FileNames = File.list, VariableNames = Names.list, MinFreq = MinFreq) head(MetaData$Data) %>% datatable() ``` The association files can be in several folders. In this case, one can define the FileNames argument as a list of lists. The argument VariableNames may also be a list of lists. ```{r metaGE collect folders} ## Get the list of the association files File.list1 <- File.list[1:2] ## Get the list of the other association files File.list2 <- File.list[3] File.listc <- list("List1" = File.list1, "List2" = File.list2) File.listc ###Build the dataset ## First provide the list of variable names Names.listc <- list(Names.list, Names.list) MinFreq <- 0.07 ## Now collect MetaData <- metaGE.collect(FileNames = File.listc, VariableNames = Names.listc, MinFreq = MinFreq) head(MetaData$Data) %>% datatable() ``` From now on, we consider the dataset `metaData` included in the package `metaGE`. This dataset contains the results of 10 different genetic association studies on maize lines testing the association between a set of 27 411 markers and the grain yield. ```{r import data} # Import the data data("metaData") ``` # Accounting for correlations between individual GWAS One can compute the correlations between the individual GWAS using the `metaGE.cor` function. ```{r build matcorr} Threshold <- 0.8 MatCorr <- metaGE.cor(metaData, Threshold = Threshold) ``` Here is what the correlation matrix looks like: ```{r show matcorr,fig.align = 'center',fig.height=8,fig.width=8} corrplot(MatCorr,order = "hclust") ``` # Global meta-analysis procedures One may fit different models with the `metaGE.fit` function depending on the argument `Method` : - if `Method = 'Fe'`, then the Fixed Effect (Fe) model is fitted, and a test to identify globally significant markers is performed. - if `Method ='Re'`, then Random Effect (Re) model is fitted, and a test that allows some heterogeneity on the effect of the marker is performed. Here are the two different models: ```{r metaGE fit} ## Fixed Effect FeDF <- metaGE.fit(metaData, MatCorr, Method = "Fe") head(FeDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE)) %>% datatable() ## Random Effect ReDF <- metaGE.fit(metaData, MatCorr, Method = "Re") head(ReDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE)) %>% datatable() ``` One can have a look at the pvalues one gets for the different sets of tests to check for any problem. This can be done using the `metaGE.pvalplot` function, which displays the p-value distribution and the QQplot of the -log10(pvalues). ```{r metaGE pvalplot,fig.height=3,fig.width=5,out.width='45%'} Pvalue.list <- list('PVALUE.Fe'= FeDF$PVALUE, 'PVALUE.Re'= ReDF$PVALUE) plott <- map(names(Pvalue.list),~metaGE.pvalplot(Pvalue.list[[.x]], Main= .x) ) ``` ## Check the candidates First apply some multiple testing procedure (here Benjamini-Hochberg with $H_0$ prior estimation). ```{r correcting pvalues} CorrectingPValues <- function(Pvalues){ ## Get a p0 estimate p0 = min(2*sum(Pvalues > 0.5)/length(Pvalues),1-1/length(Pvalues)) ## Get the corrected p-values CorrectedPval <- stats::p.adjust(Pvalues, method = 'BH')*p0 return(CorrectedPval) } ``` Here is the number of significant markers at a 0.05 threshold for the different sets of tests : ```{r FDR control} ## FDR control Alpha <- 0.05 Signif <- lapply(Pvalue.list, FUN = function(i){ return(CorrectingPValues(i) %>% `<`(Alpha) %>% which)}) lapply(X = Signif,FUN = length) ``` ## Manhattan plot and heatmap One can draw the corresponding manhattan plot using the `metaGE.manhattan` : ```{r manhattan plot,fig.align='center',fig.height=6,fig.width=10} PvalThresholdFe <-FeDF[Signif$PVALUE.Fe,]$PVALUE%>% max %>% max(.,0) manhattanFe <- metaGE.manhattan(Data = FeDF,VarName = 'PVALUE', Threshold = PvalThresholdFe,Main = '-log10(Pval) alongside the chromosome Fe method' ) print(manhattanFe) PvalThresholdRe <- ReDF[Signif$PVALUE.Re,]$PVALUE%>% max %>% max(.,0) manhattanRe <- metaGE.manhattan(Data = ReDF,VarName = 'PVALUE', Threshold = PvalThresholdRe,Main = '-log10(Pval) alongside the chromosome Re method') print(manhattanRe) ``` One can draw the corresponding heatmap using the `metaGE.heatmap` : ```{r heatmap,fig.align='center',fig.height=6,fig.width=10} heatmapFe <- metaGE.heatmap(Data = FeDF[Signif$PVALUE.Fe,],Prefix = "Z.",Main = "Z-scores of Fe significant markers across environments") heatmapRe <- metaGE.heatmap(Data = ReDF[Signif$PVALUE.Re,],Prefix = "Z.", Main = "Z-scores of Re significant markers across environments") ``` # Score local One can use the score local approach developed by Fariello MI, Boitard S, Mercier S, et al.(2017) using the `metaGE.lscore`. This approach aims to detect significant regions in a genome sequence by accumulating single marker p-values. The technical details of the computation can be found in Fariello MI, Boitard S, Mercier S, et al. Accounting for linkage disequilibrium in genome scans for selection without individual genotypes: The local score approach. Mol Ecol. 2017;00:1–15. https://doi.org/10.1111/mec.14141. ```{r local score FE} x <- 3 FeDF_ls <- metaGE.lscore(Data = FeDF, PvalName = "PVALUE", xi = x) ``` Here are the significant regions founded by the score local approach : ```{r sigzones Fe} FeDF_ls$SigZones %>% datatable() ``` One can draw the manhattan plot of the score local and highlight the significtives markers using the `metaGE.manhattan` : ```{r manhattan Fe lscore,fig.align='center',fig.height=6,fig.width=10} manhattanFe_lscore <- metaGE.manhattan(Data = FeDF_ls$Data, VarName = "SCORE", Score = T, SigZones = FeDF_ls$SigZones ) print(manhattanFe_lscore+ggtitle('Score local alongside the chromosome Fe method')) ``` # Tests for GxE interactions Different tests may be performed with the `metaGE.test` function depending on the argument : - if `Incidence` is provided to the function and `Contrast = NULL`, then a test of contrast to identify markers significant for at least one subclass of environments is performed. - if `Incidence` and `Contrast` are provided to the function, then the test of contrast specified is performed. - if `Covariate` is provided to the function, then Random Effect regression, a test to identify significant markers correlated to environments covariate, is performed. One can perform several tests by providing a list of the arguments `Contrast` and/or `Incidence` and/or `Covariate`. Some covariates describing the environments are available in the `envDesc` data set : ```{r import environment data} data("envDesc") envDesc %>% datatable() ``` ## Tests of contrast First, one must build the incidence matrix using the `metaGE.incidence` function. ```{r incidence matrix} ## Build the incidence matrix (Incidence.Temp <- metaGE.incidence(VarName = "Temp",Covariate = envDesc,EnvName = "ShortName", Data = metaData)) ``` One can test whether the markers are significant in at least one environment subclass by setting `Contrast` to `NULL`. One can also identify significant markers with a distinct effect for the different subclasses of environments by specifying the appropriate `Contrast`. One can use the `metaGE.test` function to perform tests of contrast. ```{r metaGE contrast test} ## Build the list of Incidence Incidence.list <- list(Temp = Incidence.Temp, Diff.Temp = Incidence.Temp) #Build the list of Contrast Contrast.list <- list(Temp = NULL, Diff.Temp = matrix(c(1,-1,0,0,1,-1),2,byrow = T)) ContrastDF <- metaGE.test(Data = metaData, MatCorr = MatCorr, Incidence = Incidence.list, Contrast = Contrast.list) ``` Here are the number of significant markers at a 0.05 threshold on control FDR for the different sets of tests : (Benjamini-Hochberg with $H_0$ prior estimation) ```{r FDR control test contrast} Alpha <- 0.05 SignifContrast <- apply(ContrastDF %>% select(contains("PVALUE.")),2, FUN = function(i){return(CorrectingPValues(i) %>% `<`(Alpha) %>% which)}) lapply(SignifContrast,length) ``` One can draw the corresponding heatmap using the `metaGE.heatmap` : ```{r contrast heatmap,fig.align='center',fig.height=6,fig.width=10} # Specify the groups of environment heatmap_Temp <- metaGE.heatmap(Data = ContrastDF[SignifContrast$PVALUE.Temp,],EnvGroups = envDesc[,1:2],Prefix = "Z.",Main = "Z-scores of markers with contrasted effect according the temperature") ``` ## Tests of meta-regression One may want to identify markers correlated to environments covariate. These can be done by performing a meta-regression test thanks to the function `metaGE.test` by providing a data frame containing one column corresponding to the environments (the first one) and column(s) corresponding to the covariate(s) in the argument `Covariate`.One can test whether the markers are significant in at least one environment subclass by setting `Contrast` to `NULL`. One can also identify significant markers with a distinct effect for the different subclasses of environments by specifying the appropriate `Contrast`. One can use the `metaGE.test` function to perform tests of contrast. ```{r regression test} RegressionDF <- metaGE.test(Data=metaData,MatCorr = MatCorr,Covariate = envDesc[,c(1,5,6)],EnvName = "ShortName") ``` Here are the number of significant markers at a 0.05 threshold on control FDR for the meta-regression tests : (Benjamini-Hochberg with $H_0$ prior estimation) ```{r FDR control test regression} SignifRegression <- apply(RegressionDF %>% select(contains("PVALUE.")),2, FUN = function(i){return(CorrectingPValues(i) %>% `<`(Alpha) %>% which)}) lapply(SignifRegression,length) ``` Here are the five most significant markers correlated to the Tnight.mean covariate. ```{r significant marker test regression} RegressionDF[SignifRegression$PVALUE.Tnight.mean,] %>% select(CHR, POS, MARKER, PVALUE.Tnight.mean) %>% arrange(PVALUE.Tnight.mean) %>% head() %>% datatable() ``` One can draw the graph of the marker effects according to the covariate using the `metaGE.regplot`. ```{r metaGE regplot,fig.align='center',fig.height=7,fig.width=10} metaGE.regplot(Data = metaData, Covariate = envDesc,VarName = "Tnight.mean",EnvName = "ShortName", MarkerName = "AX-90548528", aesCol = "Classification") ```