--- title: 'SWATH2stats example script' output: pdf_document: keep_tex: yes vignette: > %\VignetteIndexEntry{SWATH2stats example script} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- Example R code showing the usage of the SWATH2stats package. The data processed is the publicly available dataset of *S.pyogenes* (Röst et al. 2014) (http://www.peptideatlas.org/PASS/PASS00289). The results file 'rawOpenSwathResults\_1pcnt\_only.tsv' can be found on PeptideAtlas (ftp://PASS00289@ftp.peptideatlas.org/../Spyogenes/results/). This is a R Markdown file, showing the result of processing this data. The lines shaded in grey represent the R code executed during this analysis. The SWATH2stats package can be directly installed from Bioconductor using the commands below (http://bioconductor.org/packages/devel/bioc/html/SWATH2stats.html). ```{r, eval=FALSE} if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("SWATH2stats") ``` # Part 1: Loading and annotation Load the SWATH-MS example data from the package, this is a reduced file in order to limit the file size of the package. ```{r} library(SWATH2stats) library(data.table) data('Spyogenes', package = 'SWATH2stats') ``` Alternatively the original file downloaded from the Peptide Atlas can be loaded from the working directory. ```{r, eval=FALSE} data <- data.frame(fread('rawOpenSwathResults_1pcnt_only.tsv', sep='\t', header=TRUE)) ``` Extract the study design information from the file names. Alternatively, the study design table can be provided as an external table. ```{r, tidy=TRUE} Study_design <- data.frame(Filename = unique(data$align_origfilename)) Study_design$Filename <- gsub('.*strep_align/(.*)_all_peakgroups.*', '\\1', Study_design$Filename) Study_design$Condition <- gsub('(Strep.*)_Repl.*', '\\1', Study_design$Filename) Study_design$BioReplicate <- gsub('.*Repl([[:digit:]])_.*', '\\1', Study_design$Filename) Study_design$Run <- seq_len(nrow(Study_design)) head(Study_design) ``` The SWATH-MS data is annotated using the study design table. ```{r} data.annotated <- sample_annotation(data, Study_design, column_file = "align_origfilename") ``` Remove the decoy peptides for a subsequent inspection of the data. ```{r} data.annotated.nodecoy <- subset(data.annotated, decoy==FALSE) ``` \newpage # Part 2: Analyze correlation, variation and signal Count the different analytes for the different injections. ```{r} count_analytes(data.annotated.nodecoy) ``` Plot the correlation of the signal intensity. ```{r, fig.height=2.5, fig.width = 6} correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'Intensity') ``` Plot the correlation of the delta_rt, which is the deviation of the retention time from the expected retention time. ```{r, fig.height=2.5, fig.width = 6} correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'delta_rt') ``` \newpage Plot the variation of the signal across replicates. ```{r, fig.height=2.5, fig.width = 5.5} variation <- plot_variation(data.annotated.nodecoy) variation[[2]] ``` Plot the total variation versus variation within replicates. ```{r, fig.height=2.5, fig.width = 5.5} variation_total <- plot_variation_vs_total(data.annotated.nodecoy) variation_total[[2]] ``` Calculate the summed signal per peptide and protein across samples. ```{r} peptide_signal <- write_matrix_peptides(data.annotated.nodecoy) protein_signal <- write_matrix_proteins(data.annotated.nodecoy) head(protein_signal) ``` \newpage # Part 3: FDR estimation Estimate the overall FDR across runs using a target decoy strategy. ```{r, fig.height = 3.5} par(mfrow = c(1, 3)) fdr_target_decoy <- assess_fdr_overall(data.annotated, n_range = 10, FFT = 0.25, output = 'Rconsole') ``` According to this FDR estimation one would need to filter the data with a lower mscore threshold to reach an overall protein FDR of 5%. ```{r} mscore4protfdr(data, FFT = 0.25, fdr_target = 0.05) ``` # Part 4: Filtering Filter data for values that pass the 0.001 mscore criteria in at least two replicates of one condition. ```{r} data.filtered <- filter_mscore_condition(data.annotated, 0.001, n_replica = 2) ``` Select only the 10 peptides showing strongest signal per protein. ```{r} data.filtered2 <- filter_on_max_peptides(data.filtered, n_peptides = 10) ``` \newpage Filter for proteins that are supported by at least two peptides. ```{r} data.filtered3 <- filter_on_min_peptides(data.filtered2, n_peptides = 2) ``` # Part 5: Conversion Convert the data into a transition-level format (one row per transition measured). ```{r} data.transition <- disaggregate(data.filtered3) ``` Convert the data into the format required by MSstats. ```{r} MSstats.input <- convert4MSstats(data.transition) head(MSstats.input) ``` Convert the data into the format required by mapDIA. ```{r} mapDIA.input <- convert4mapDIA(data.transition) head(mapDIA.input) ``` Convert the data into the format required by aLFQ. ```{r} aLFQ.input <- convert4aLFQ(data.transition) head(aLFQ.input) ``` Session info on the R version and packages used. ```{r} sessionInfo() ```