INSPEcT 1.18.1
The life cycle of RNAs is composed of three main steps: transcription of the premature RNA (\(P\)), its processing into mature RNA (\(M\)), and degradation of the mature RNA. The kinetic rates governing these steps define the dynamics of each transcript (\(k_{1-3}\) for synthesis, processing and degradation, respectively), and their role in transcriptional regulation is often underestimated. A complete understanding of how the rates of the RNA life cycle affect premature and mature RNA requires mathematical and/or computational skills to solve the corresponding system of differential equations:
\[\begin{equation}\label{eq:modelsystem} \left\{ \begin{array}{l l} \dot{P}=k_1 - k_2 \, \cdot \, P \\ \dot{M}=k_2 \, \cdot \, P - k_3 \, \cdot \, M \end{array} \right. \end{equation}\]
This system of differential equations is used by INSPEcT to estimate the rates of the RNA life cycle when transcriptomic data and (possibly) newly synthesized RNA data are available. INSPEcT aims at assessing the dynamics of each gene by modeling the temporal behavior of the RNA kinetic rates with either constant or variable functions.
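As a minimal sketch of how the system above behaves, the following base-R code integrates it numerically with a fixed-step fourth-order Runge-Kutta scheme. The function names (`rna_model`, `simulate_rna`) are hypothetical helpers written for this illustration and are not part of the INSPEcT API; the solver used internally by INSPEcT may differ. The rate values are the ones used in the first case study later in this document.

```r
# Right-hand side of the two-equation model.
# k1, k2, k3 are functions of time (assumption: this lets rates be variable).
rna_model <- function(t, state, k1, k2, k3) {
  P <- state[["P"]]
  M <- state[["M"]]
  c(P = k1(t) - k2(t) * P,
    M = k2(t) * P - k3(t) * M)
}

# Fixed-step fourth-order Runge-Kutta integrator (base R only).
simulate_rna <- function(times, init, k1, k2, k3) {
  out <- matrix(NA_real_, nrow = length(times), ncol = 2,
                dimnames = list(NULL, c("P", "M")))
  out[1, ] <- init[c("P", "M")]
  for (i in seq_along(times)[-1]) {
    t0 <- times[i - 1]
    h  <- times[i] - t0
    y  <- out[i - 1, ]
    s1 <- rna_model(t0,         y,              k1, k2, k3)
    s2 <- rna_model(t0 + h / 2, y + h / 2 * s1, k1, k2, k3)
    s3 <- rna_model(t0 + h / 2, y + h / 2 * s2, k1, k2, k3)
    s4 <- rna_model(t0 + h,     y + h * s3,     k1, k2, k3)
    out[i, ] <- y + h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
  }
  data.frame(time = times, out)
}

# Constant rates, starting from a cell with no RNA for this gene.
trajectory <- simulate_rna(
  times = seq(0, 16, by = 0.01),
  init  = c(P = 0, M = 0),
  k1 = function(t) 10,   # synthesis
  k2 = function(t) 20,   # processing
  k3 = function(t) 2     # degradation
)
tail(trajectory, 1)
```

Starting from zero, premature RNA settles quickly because processing is fast, while mature RNA approaches its plateau more slowly, at a pace set by the degradation rate.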
To visualize and interact with the output of the INSPEcT modeling procedure, and to facilitate the understanding of the impact of RNA kinetic rates on the dynamics of premature and mature RNA, we developed a Graphical User Interface (GUI). Specifically, the GUI allows the user to:
Importantly, we developed two wrapper functions (inspectFromBAM and inspectFromPCR, see INSPEcT vignette for more details), which streamline the generation of novel INSPEcT datasets, to be uploaded in the GUI.
The GUI is distributed within the INSPEcT package, and starts with the following command line operations:
library(INSPEcT)
runINSPEcTGUI()
The GUI is divided into 4 sections (Fig. 1):
Figure 1: Representation of the GUI divided into its 4 main sections
At startup, the software loads a predefined INSPEcT object, which contains 10 genes and can be used to explore the software functionalities. This object can be replaced by any INSPEcT dataset previously saved in the “rds” format (“Choose INSPEcT file”, Fig. 2). Genes that are part of the INSPEcT object are divided according to their regulation class. This is encoded by a string where letters representing the step(s) of the RNA life-cycle that are regulated (‘s’ for synthesis, ‘p’ for processing and ‘d’ for degradation) are concatenated. For example: “p” represents a gene only regulated in its processing rate, “sd” a gene regulated in its synthesis and degradation rates. When no rates are identified as regulated the corresponding class is named “no-reg”. Once a regulation class is selected via “Select class”, a specific gene can be chosen from the list that appears in “Select gene” (Fig. 2). Experimental profiles might be smoothed to reduce the noise associated with this kind of data (“Smooth experimental data” in “Select input”, Fig. 2). Nonetheless, raw experimental data are selected by default (“Raw experimental data” in “Select input”, Fig. 2). The “User defined” mode in “Select input” will be covered in section 4.
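The class encoding described above can be sketched in a few lines of R. The helper `regulation_class` is hypothetical and written only for illustration; in particular, the fixed letter order (s, p, d) is an assumption based on the examples in the text, not on the INSPEcT source.

```r
# Build the regulation-class string from three logical flags.
# Assumption: letters are always concatenated in the order s, p, d.
regulation_class <- function(synthesis, processing, degradation) {
  regulated <- c("s", "p", "d")[c(synthesis, processing, degradation)]
  if (length(regulated) == 0) {
    "no-reg"
  } else {
    paste(regulated, collapse = "")
  }
}

regulation_class(FALSE, TRUE,  FALSE)  # only processing regulated
regulation_class(TRUE,  FALSE, TRUE)   # synthesis and degradation regulated
regulation_class(FALSE, FALSE, FALSE)  # nothing regulated
```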
Figure 2: Interaction with an object of class INSPEcT (section 1)
For the selected gene, the experimental quantifications of the premature and mature RNA levels (estimated from RNA-seq data) are plotted together with their standard deviations (Fig. 3). If nascent RNA has been profiled, the rate of synthesis is also considered part of the experimental data (since it derives directly from nascent RNA profiling), and it is plotted with its standard deviation. Otherwise, the rate of synthesis is inferred from total RNA-seq data and lacks the standard deviation. The results of the INSPEcT modeling are plotted with continuous lines within the synthesis, pre-RNA, processing, mature RNA and degradation panels (Fig. 3), and can be downloaded in PDF (image) or TSV (tabular) formats. Below the plot panel, the visualization options allow the user to:
Figure 3: Visualization of the RNA dynamics for a single gene (section 2)
The minimization status corresponding to the modeling is reported in section 3 of the GUI (Fig. 4). In particular, the p-value associated with the goodness-of-fit statistic and the Akaike information criterion (AIC) indicate the ability of a model to explain the experimental observations. Both metrics are penalized for the complexity of the model, meaning that they measure a trade-off between its performance and its simplicity, and they can be used to compare models of different complexity. The complexity of a model depends on the functional forms that describe the RNA life-cycle kinetic rates and equals their number of parameters: 1 for a constant rate, 4 for a sigmoid, and 6 for an impulse. In practice, when two models explain the data adequately well, the simpler one is selected (lower p-value of the goodness-of-fit statistic and lower AIC). Additionally, the goodness-of-fit p-value is used to assess whether the model under consideration adequately explains the experimental data (e.g. p<0.05). Finally, the minimization status is reported, i.e. whether the minimization converged to a local minimum or not. Additional iterations can be run to search for a better minimum, using either the Nelder-Mead method (NM, used by INSPEcT) or the quasi-Newton BFGS method (button “Optimization - Run”).
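To make the complexity penalty concrete, here is a small sketch that counts the parameters of a model from the functional forms of its three rates and plugs the count into the textbook definition AIC = 2k - 2 log(L). The helper names and the Gaussian log-likelihood are assumptions made for this illustration; the exact likelihood and goodness-of-fit statistic computed by INSPEcT are not specified in this document.

```r
# Parameters per functional form, as stated in the text.
n_parameters <- c(constant = 1, sigmoid = 4, impulse = 6)

# Complexity of a model = total number of parameters over the three rates.
model_complexity <- function(forms) sum(n_parameters[forms])

# Assumed Gaussian log-likelihood of observations around the model fit.
gaussian_loglik <- function(observed, fitted, sd) {
  sum(dnorm(observed, mean = fitted, sd = sd, log = TRUE))
}

# Textbook Akaike information criterion.
aic <- function(loglik, k) 2 * k - 2 * loglik

simple_model  <- c(synthesis = "sigmoid", processing = "constant",
                   degradation = "constant")
complex_model <- c(synthesis = "impulse", processing = "constant",
                   degradation = "sigmoid")

k_simple  <- model_complexity(simple_model)
k_complex <- model_complexity(complex_model)

# With an identical fit quality, the simpler model gets the lower AIC.
same_loglik <- -10
aic(same_loglik, k_simple)
aic(same_loglik, k_complex)
```

Under this definition each extra parameter costs two AIC units, so a more complex model is preferred only if it improves the log-likelihood by more than the number of parameters it adds.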
Figure 4: Model minimization (section 3)
Parameters of the modeling can be directly modified “by hand”. For each kinetic rate of the RNA life-cycle, the parameters describing the selected functional form are provided in the right part of the GUI (Fig. 5). Constant rates are described by a single parameter, which corresponds to the value of the rate throughout the time-course. Variable rates, instead, can be described either by sigmoid or impulse functions. Sigmoids are S-shaped functions described by four parameters: starting level, final level, time of transition between starting and final levels, and slope of the response. Impulse functions allow more complex behaviors with two additional parameters that describe the time and level of a second transition, possibly encoding bell-shaped responses. The range of the sliders for starting and final levels can be set for each rate (“set min” and “set max”), giving full flexibility in setting the rate levels. At startup, these ranges are set to cover the range of all parameters of the example dataset. Each time a new dataset is loaded, ranges are updated accordingly.
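The two variable forms can be sketched as follows. The sigmoid takes the four parameters listed above, in the same order used by the case studies later in this document (start, final, time, slope). The impulse is written as a product of two sigmoids, a commonly used form with six parameters; the exact parameterization used inside INSPEcT is not given in this text, so the function bodies and argument names here are assumptions.

```r
# Sigmoid: start level, final level, transition time, slope.
sigmoid <- function(t, start, final, t_mid, slope) {
  start + (final - start) / (1 + exp(-slope * (t - t_mid)))
}

# Impulse (assumed product-of-sigmoids form): the rate moves from h0 to h1
# around t1, then from h1 to h2 around t2. Requires h1 != 0.
impulse <- function(t, h0, h1, h2, t1, t2, slope) {
  first  <- h0 + (h1 - h0) / (1 + exp(-slope * (t - t1)))
  second <- h2 + (h1 - h2) / (1 + exp( slope * (t - t2)))
  first * second / h1
}

times <- seq(0, 16, by = 0.5)

# Processing rate of case study 2: sigmoidal{5,20,8,1}.
processing <- sigmoid(times, start = 5, final = 20, t_mid = 8, slope = 1)

# A bell-shaped response: rises from 1 to 4, then settles at 2.
bell <- impulse(times, h0 = 1, h1 = 4, h2 = 2, t1 = 5, t2 = 10, slope = 2)

plot(times, bell, type = "l", xlab = "time", ylab = "rate")
```

Setting `h2` equal to `h0` gives a transient pulse that returns to its starting level, while setting `h2` equal to `h1` reduces the impulse to a plain sigmoid.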
Figure 5: Direct interaction with RNA life-cycle kinetic rates (section 4)
The user can test regulatory scenarios that are alternative to the one selected by INSPEcT, by tuning each function parameter, or by changing the functional form assigned to one or more rates. In the latter case, the new parameter settings can be defined by hand, or searched via minimization of the error over the data (3.3). Notably, within the derivative framework, variable functional forms are constrained to be identical among kinetic rates. Therefore, a combination of sigmoid (or impulse) and constant functions is allowed, while sigmoid and impulse cannot be combined. Each time the model is modified, its plot and minimization status are updated.
The button “Conf.Int.” below the plot panel (Fig. 2) determines 95% confidence intervals of the modeled rates. The running time of this procedure varies according to the analysis framework, the number of time points, and the complexity of the model. The analysis framework depends on the settings of INSPEcT at the time of generation of the loaded dataset. In general, the quantification of confidence intervals is faster in the derivative framework and slower in the integrative one (in the worst case, up to a few minutes). Confidence intervals are then used to assess the variability of each rate, testing the null hypothesis that the rate profile can be fit by a constant model. Test p-values are reported in the y-axis labels of the corresponding rate. Every change in the model requires a fresh computation of confidence intervals. For this reason, especially in the integrative framework, we suggest computing confidence intervals only once the model is considered definitive.
The procedure explained in the previous sections mostly refers to the interaction with experimental data. Nonetheless, the software is also designed to test hypotheses independently of experiments. This can be easily achieved by choosing “User defined (No input)” in “Select input” (Fig. 2). In this mode, neither the experimental data (raw or smoothed) nor the minimization status are displayed. The user is thus free to set the functional form (and the corresponding parameters) for each RNA kinetic rate, directly assessing the impact on premature and mature RNA dynamics. In the following subsections, we report some case studies to exemplify the role of the RNA kinetic rates in the definition of premature and mature RNA dynamics.
The combination of kinetic rates determines the abundance of mature and premature RNA species. Specifically, the ratio between the rates of synthesis and processing sets premature RNA levels, and the ratio between synthesis and degradation sets the mature ones. When no perturbations occur, this long standing condition is called steady state. (Set rates to: synthesis = constant{10}; processing = constant{20}; degradation = constant{2})
Figure 6: Case study 1: constant kinetic rates
As a consequence, steady state premature RNA levels are independent of degradation rates, and (more importantly and less intuitively) steady state mature RNA levels are independent of processing rates.
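These relations follow directly from the model equations by setting both time derivatives to zero. The subscript \(ss\) is notation introduced here for the steady-state values.

```latex
% Steady state: both derivatives of the model vanish.
0 = k_1 - k_2 \, P_{ss}
\;\Rightarrow\;
P_{ss} = \frac{k_1}{k_2}
\qquad
0 = k_2 \, P_{ss} - k_3 \, M_{ss}
\;\Rightarrow\;
M_{ss} = \frac{k_2 \, P_{ss}}{k_3} = \frac{k_1}{k_3}
```

With the rates of case study 1 (synthesis 10, processing 20, degradation 2), this gives a premature RNA level of 10/20 = 0.5 and a mature RNA level of 10/2 = 5. The processing rate cancels out of the expression for the mature RNA, which is why it does not affect its steady-state level.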
As mentioned above, steady state mature RNA levels are independent of processing rates. Nonetheless, a perturbation in the processing dynamics leads to a transient variation of mature RNA levels. Conversely, it produces a lasting effect on premature RNA levels. (Set rates to: synthesis = constant{10}; processing = sigmoidal{5,20,8,1}; degradation = constant{2})
Figure 7: Case study 2: modulation of processing rate
Notably, a reduction in the rate of degradation reduces the amplitude of mature RNA perturbation. (Set degradation = constant{0.5})
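This case study can also be reproduced outside the GUI with a short simulation. The sketch below uses a simple forward Euler scheme with a small step, assumes the sigmoid form start + (final - start) / (1 + exp(-slope (t - time))), and starts from the steady state implied by the rates at time 0. The helper names are hypothetical, and the amplitude of the mature RNA perturbation is measured here as the peak level relative to the initial level, which is an interpretation of the statement above rather than a definition taken from INSPEcT.

```r
# Assumed sigmoid parameterization: start, final, transition time, slope.
sigmoid <- function(t, start, final, t_mid, slope) {
  start + (final - start) / (1 + exp(-slope * (t - t_mid)))
}

k1 <- function(t) 10                       # synthesis = constant{10}
k2 <- function(t) sigmoid(t, 5, 20, 8, 1)  # processing = sigmoidal{5,20,8,1}

# Forward Euler integration of the model for a constant degradation rate k3.
simulate_case2 <- function(k3, dt = 0.001, t_end = 16) {
  times <- seq(0, t_end, by = dt)
  P <- M <- numeric(length(times))
  P[1] <- k1(0) / k2(0)   # premature RNA at steady state
  M[1] <- k1(0) / k3      # mature RNA at steady state
  for (i in seq_along(times)[-1]) {
    t  <- times[i - 1]
    dP <- k1(t) - k2(t) * P[i - 1]
    dM <- k2(t) * P[i - 1] - k3 * M[i - 1]
    P[i] <- P[i - 1] + dt * dP
    M[i] <- M[i - 1] + dt * dM
  }
  data.frame(time = times, P = P, M = M)
}

fast_decay <- simulate_case2(k3 = 2)    # degradation = constant{2}
slow_decay <- simulate_case2(k3 = 0.5)  # degradation = constant{0.5}

# Peak of mature RNA relative to its starting level.
relative_peak <- function(sim) max(sim$M) / sim$M[1]
c(fast = relative_peak(fast_decay), slow = relative_peak(slow_decay))
```

In both runs the premature RNA drops permanently as processing accelerates, while the mature RNA shows only a temporary bump whose relative height is smaller when degradation is slower.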
A modulation in the synthesis rate determines a similar modulation for premature and mature RNAs (in terms of fold change compared to the untreated condition). (Set rates to: synthesis = sigmoidal{10,20,8,1}; processing = constant{20}; degradation = constant{2})
Figure 8: Case study 3: modulation of synthesis rate
Notably, the time-lag between the modulation of synthesis and the response of premature RNA is inversely proportional to the processing rate (Set processing = constant{10}). Following the same logic, the time-lag between the response of premature and mature RNAs is inversely proportional to the degradation rate (Set degradation = constant{0.5}).
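The inverse proportionality can be seen from a simplified calculation. As an idealization that is not part of the original text, suppose the synthesis rate jumps instantaneously from \(k_1\) to \(k_1'\) at \(t = 0\) while \(k_2\) stays constant. The first model equation then has the closed-form solution below, where \(t_{1/2}\) denotes the time needed to cover half of the distance to the new level.

```latex
% Assumption: step change of synthesis from k_1 to k_1' at t = 0, constant k_2.
P(t) = \frac{k_1'}{k_2} + \left( \frac{k_1}{k_2} - \frac{k_1'}{k_2} \right) e^{-k_2 t},
\qquad
t_{1/2} = \frac{\ln 2}{k_2}
```

Halving the processing rate from 20 to 10 therefore doubles the half-response time of premature RNA, from about 0.035 to about 0.069 time units. The same form applies to mature RNA once premature RNA has settled, with \(k_3\) in place of \(k_2\): the half-response time grows from about 0.35 to about 1.39 time units when degradation drops from 2 to 0.5.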
When synthesis and degradation rates are modulated in the same direction, they have opposite effects on mature RNA levels, leaving a trace of the transcriptional regulation only on premature RNA levels (a similar response has been observed in yeast, during heat shock responses). (Set rates to: synthesis = sigmoidal{10,20,8,1}; processing = constant{20}; degradation = sigmoidal{0.25,0.5,8,1}). Conversely, the opposite modulation of synthesis and degradation reinforces the modulation of mature RNA. (Set degradation = sigmoidal{0.5,0.25,8,1})
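The compensation can be checked with simple arithmetic on the levels reached before and after the transition. The sketch below assumes the steady-state relations P = k1 / k2 and M = k1 / k3, obtained by setting both derivatives of the model to zero, and uses the starting and final values of the sigmoids given above. The helper `steady_state` is hypothetical.

```r
# Steady-state levels implied by the model for a given set of rates.
steady_state <- function(k1, k2, k3) c(P = k1 / k2, M = k1 / k3)

# Same direction: synthesis 10 -> 20, degradation 0.25 -> 0.5.
before_same <- steady_state(k1 = 10, k2 = 20, k3 = 0.25)
after_same  <- steady_state(k1 = 20, k2 = 20, k3 = 0.5)
fold_same   <- after_same / before_same

# Opposite direction: synthesis 10 -> 20, degradation 0.5 -> 0.25.
before_opposite <- steady_state(k1 = 10, k2 = 20, k3 = 0.5)
after_opposite  <- steady_state(k1 = 20, k2 = 20, k3 = 0.25)
fold_opposite   <- after_opposite / before_opposite

rbind(same_direction = fold_same, opposite_direction = fold_opposite)
```

In the first scenario premature RNA doubles while mature RNA ends at the level it started from; in the second, premature RNA still doubles but mature RNA increases fourfold.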
Figure 9: Case study 4: concomitant modulation of synthesis and degradation rates
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
##  [2] GenomicFeatures_1.40.1                 
##  [3] AnnotationDbi_1.50.3                   
##  [4] GenomicRanges_1.40.0                   
##  [5] GenomeInfoDb_1.24.2                    
##  [6] IRanges_2.22.2                         
##  [7] S4Vectors_0.26.1                       
##  [8] INSPEcT_1.18.1                         
##  [9] BiocParallel_1.22.0                    
## [10] Biobase_2.48.0                         
## [11] BiocGenerics_0.34.0                    
## [12] BiocStyle_2.16.1                       
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6                matrixStats_0.57.0         
##  [3] bit64_4.0.5                 RColorBrewer_1.1-2         
##  [5] progress_1.2.2              httr_1.4.2                 
##  [7] tools_4.0.2                 R6_2.4.1                   
##  [9] KernSmooth_2.23-17          DBI_1.1.0                  
## [11] colorspace_1.4-1            tidyselect_1.1.0           
## [13] prettyunits_1.1.1           DESeq2_1.28.1              
## [15] bit_4.0.4                   curl_4.3                   
## [17] compiler_4.0.2              DelayedArray_0.14.1        
## [19] rtracklayer_1.48.0          bookdown_0.20              
## [21] scales_1.1.1                genefilter_1.70.0          
## [23] askpass_1.1                 rappdirs_0.3.1             
## [25] stringr_1.4.0               digest_0.6.25              
## [27] Rsamtools_2.4.0             rmarkdown_2.4              
## [29] XVector_0.28.0              pkgconfig_2.0.3            
## [31] htmltools_0.5.0             highr_0.8                  
## [33] fastmap_1.0.1               dbplyr_1.4.4               
## [35] rlang_0.4.7                 RSQLite_2.2.1              
## [37] shiny_1.5.0                 generics_0.0.2             
## [39] gtools_3.8.2                dplyr_1.0.2                
## [41] RCurl_1.98-1.2              magrittr_1.5               
## [43] GenomeInfoDbData_1.2.3      Matrix_1.2-18              
## [45] Rcpp_1.0.5                  munsell_0.5.0              
## [47] lifecycle_0.2.0             stringi_1.5.3              
## [49] pROC_1.16.2                 yaml_2.2.1                 
## [51] plgem_1.60.0                rootSolve_1.8.2.1          
## [53] MASS_7.3-53                 SummarizedExperiment_1.18.2
## [55] zlibbioc_1.34.0             plyr_1.8.6                 
## [57] BiocFileCache_1.12.1        grid_4.0.2                 
## [59] blob_1.2.1                  gdata_2.18.0               
## [61] promises_1.1.1              crayon_1.3.4               
## [63] lattice_0.20-41             Biostrings_2.56.0          
## [65] splines_4.0.2               annotate_1.66.0            
## [67] hms_0.5.3                   magick_2.4.0               
## [69] locfit_1.5-9.4              knitr_1.30                 
## [71] pillar_1.4.6                geneplotter_1.66.0         
## [73] biomaRt_2.44.1              XML_3.99-0.5               
## [75] glue_1.4.2                  evaluate_0.14              
## [77] BiocManager_1.30.10         deSolve_1.28               
## [79] vctrs_0.3.4                 httpuv_1.5.4               
## [81] gtable_0.3.0                openssl_1.4.3              
## [83] purrr_0.3.4                 assertthat_0.2.1           
## [85] ggplot2_3.3.2               xfun_0.18                  
## [87] mime_0.9                    xtable_1.8-4               
## [89] later_1.1.0.1               survival_3.2-7             
## [91] tibble_3.0.3                GenomicAlignments_1.24.0   
## [93] memoise_1.1.0               ellipsis_0.3.1