hpar 1.22.2
From the Human Protein Atlas (Uhlén et al. 2005; Uhlen et al. 2010) site:
The Swedish Human Protein Atlas project, funded by the Knut and Alice Wallenberg Foundation, has been set up to allow for a systematic exploration of the human proteome using Antibody-Based Proteomics. This is accomplished by combining high-throughput generation of affinity-purified antibodies with protein profiling in a multitude of tissues and cells assembled in tissue microarrays. Confocal microscopy analysis using human cell lines is performed for more detailed protein localisation. The program hosts the Human Protein Atlas portal with expression profiles of human proteins in tissues and cells.
The hpar package provides access to HPA data from the R interface. It also distributes the following data sets:
hpaNormalTissue Normal tissue data: Expression profiles for
proteins in human tissues based on immunohistochemisty using tissue
micro arrays. The comma-separated file includes Ensembl gene
identifier (“Gene”), tissue name (“Tissue”), annotated cell type
(“Cell type”), expression value (“Level”), the type of annotation
(annotated protein expression (APE), based on more than one
antibody, or staining, based on one antibody only) (“Expression
type”), and the reliability or validation of the expression value
(“Reliability”).}
hpaCancer Cancer tumor data: Staining profiles for proteins in
human tumor tissue based on immunohistochemisty using tissue micro
arrays. The comma-separated file includes Ensembl gene identifier
(“Gene”), tumor name (“Tumor”), staining value (“Level”), the number
of patients that stain for this staining value (“Count patients”),
the total amount of patients for this tumor type (“Total patients”)
and the type of annotation staining (“Expression type”). }
rnaGeneTissue RNA gene data: RNA levels in 45 cell lines and 32
tissues based on RNA-seq. The comma-separated file includes Ensembl
gene identifier (“Gene”), analysed sample (“Sample”), fragments per
kilobase of transcript per million fragments mapped (“Value” and
“Unit”), and abundance class (“Abundance”). }
rnaGeneCellLine RNA gene data: RNA levels in 45 cell lines and
32 tissues based on RNA-seq. The comma-separated file includes
Ensembl gene identifier (“Gene”), analysed sample (“Sample”),
fragments per kilobase of transcript per million fragments mapped
(“Value” and “Unit”), and abundance class (“Abundance”). }
hpaSubcellularLoc Subcellular location data: Subcellular
localization of proteins based on immunofluorescently stained
cells. The comma-separated file includes Ensembl gene identifier
(“Gene”), main subcellular location of the protein (“Main
location”), other locations (“Other location”), the type of
annotation (annotated protein expression (APE), based on more than
one antibody, or staining, based on one antibody only) (“Expression
type”), and the reliability or validation of the expression value
(“Reliability”). }
hpaSubcellularLoc14 and *16.1: Same as above, for versions 14
and 16.1.
The use of data and images from the HPA in publications and presentations is permitted provided that the following conditions are met:
hpar is available through the Bioconductor project. Details about the package and the installation procedure can be found on its landing page. To install using the dedicated Bioconductor infrastructure, run :
source("http://bioconductor.org/biocLite.R")
## or, if you have already used the above before
library("BiocInstaller") ## and to install the package
biocLite("hpar")After installation, hpar will have to be explicitly loaded with
library("hpar")## This is hpar version 1.22.2,
## based on the Human Protein Atlas
##   Version: 18
##   Release data: 2017.12.01
##   Ensembl build: 88.38
## See '?hpar' or 'vignette('hpar')' for details.so that all the package’s functionality and data is available to the user.
The data sets described above can be loaded with the data function,
as illustrated below for hpaNormalTissue below. Each data set is a
data.frame and can be easily manipulated using standard R
functionality. The code chunk below illustrates some of its
properties.
data(hpaNormalTissue)
dim(hpaNormalTissue)## [1] 1053330       6names(hpaNormalTissue)## [1] "Gene"        "Gene.name"   "Tissue"      "Cell.type"   "Level"      
## [6] "Reliability"## Number of genes
length(unique(hpaNormalTissue$Gene))## [1] 13206## Number of cell types
length(unique(hpaNormalTissue$Cell.type))## [1] 82head(levels(hpaNormalTissue$Cell.type))## [1] "adipocytes"                  "bile duct cells"            
## [3] "cells in anterior "          "cells in cortex/medulla"    
## [5] "cells in cuticle"            "cells in endometrial stroma"## Number of tissues
length(unique(hpaNormalTissue$Tissue))## [1] 58head(levels(hpaNormalTissue$Tissue))## [1] "adrenal gland" "appendix"      "bone marrow"   "breast"       
## [5] "bronchus"      "caudate"The package provides a interface to the HPA data. The getHpa allows
to query the data sets described above. It takes three arguments,
id, hpadata and type, that control the query, what data set to
interrogate and how to report results respectively. The HPA data uses
Ensembl gene identifiers and id must be a valid
identifier. hpadata must be one of available dataset. type can be
either "data" or "details". The former is the default and returns
a data.frame containing the information relevant to id. It is also
possible to obtained detailed information, (including cell images) as
web pages, directly from the HPA web page, using "details".
We will illustrate this functionality with using the TSPAN6 (tetraspanin 6) gene (ENSG00000000003) as example.
id <- "ENSG00000000003"
head(getHpa(id, hpadata = "hpaNormalTissue"))##              Gene Gene.name        Tissue           Cell.type        Level
## 1 ENSG00000000003    TSPAN6 adrenal gland     glandular cells Not detected
## 2 ENSG00000000003    TSPAN6      appendix     glandular cells       Medium
## 3 ENSG00000000003    TSPAN6      appendix     lymphoid tissue Not detected
## 4 ENSG00000000003    TSPAN6   bone marrow hematopoietic cells Not detected
## 5 ENSG00000000003    TSPAN6        breast          adipocytes Not detected
## 6 ENSG00000000003    TSPAN6        breast     glandular cells         High
##   Reliability
## 1    Approved
## 2    Approved
## 3    Approved
## 4    Approved
## 5    Approved
## 6    ApprovedgetHpa(id, hpadata = "hpaSubcellularLoc")##              Gene Gene.name Reliability Enhanced Supported Approved
## 1 ENSG00000000003    TSPAN6    Approved                     Cytosol
##   Uncertain Single.cell.variation.intensity Single.cell.variation.spatial
## 1                                                                        
##   Cell.cycle.dependency                GO.id
## 1                       Cytosol (GO:0005829)head(getHpa(id, hpadata = "rnaGeneCellLine"))##              Gene Gene.name    Sample Value Unit
## 1 ENSG00000000003    TSPAN6     A-431  27.8  TPM
## 2 ENSG00000000003    TSPAN6      A549  37.6  TPM
## 3 ENSG00000000003    TSPAN6      AF22 108.1  TPM
## 4 ENSG00000000003    TSPAN6    AN3-CA  51.8  TPM
## 5 ENSG00000000003    TSPAN6  ASC diff  32.3  TPM
## 6 ENSG00000000003    TSPAN6 ASC TERT1  17.7  TPMIf we ask for "detail", a browser page pointing to the
relevant page is open (see figure below)
getHpa(id, type = "details")The HPA web page for the tetraspanin 6 gene (ENSG00000000003).
If a user is interested specifically in one data set, it is possible
to set hpadata globally and omit it in getHpa. This is done by
setting the hpar options hpardata with the setHparOptions
function. The current default data set can be tested with
getHparOptions.
getHparOptions()## $hpar
## $hpar$hpadata
## [1] "hpaNormalTissue"setHparOptions(hpadata = "hpaSubcellularLoc")
getHparOptions()## $hpar
## $hpar$hpadata
## [1] "hpaSubcellularLoc"getHpa(id)##              Gene Gene.name Reliability Enhanced Supported Approved
## 1 ENSG00000000003    TSPAN6    Approved                     Cytosol
##   Uncertain Single.cell.variation.intensity Single.cell.variation.spatial
## 1                                                                        
##   Cell.cycle.dependency                GO.id
## 1                       Cytosol (GO:0005829)Information about the HPA release used to build the installed
hpar package can be accessed with getHpaVersion,
getHpaDate and getHpaEnsembl. Full release details can be found on
the HPA release history
page.
getHpaVersion()## version 
##    "18"getHpaDate()##         date 
## "2017.12.01"getHpaEnsembl()## ensembl 
## "88.38"Let’s compare the subcellular localisation annotation obtained from the HPA subcellular location data set and the information available in the Bioconductor annotation packages.
id <- "ENSG00000001460"
getHpa(id, "hpaSubcellularLoc")##              Gene Gene.name Reliability Enhanced Supported    Approved
## 8 ENSG00000001460     STPG1    Approved                    Nucleoplasm
##   Uncertain Single.cell.variation.intensity Single.cell.variation.spatial
## 8                                                                        
##   Cell.cycle.dependency                    GO.id
## 8                       Nucleoplasm (GO:0005654)Below, we first extract all cellular component GO terms available for
id from the org.Hs.eg.db human annotation and
then retrieve their term definitions using the GO.db
database.
library("org.Hs.eg.db")
library("GO.db")
ans <- select(org.Hs.eg.db, keys = id,
              columns = c("ENSEMBL", "GO", "ONTOLOGY"),
              keytype = "ENSEMBL")## 'select()' returned 1:many mapping between keys and columnsans <- ans[ans$ONTOLOGY == "CC", ]
ans##           ENSEMBL         GO EVIDENCE ONTOLOGY
## 2 ENSG00000001460 GO:0005622      IMP       CC
## 3 ENSG00000001460 GO:0005634      IEA       CC
## 4 ENSG00000001460 GO:0005739      IEA       CCsapply(as.list(GOTERM[ans$GO]), slot, "Term")##      GO:0005622      GO:0005634      GO:0005739 
## "intracellular"       "nucleus" "mitochondrion"## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] hpar_1.22.2          GO.db_3.6.0          org.Hs.eg.db_3.6.0  
## [4] AnnotationDbi_1.42.1 IRanges_2.14.10      S4Vectors_0.18.2    
## [7] Biobase_2.40.0       BiocGenerics_0.26.0  BiocStyle_2.8.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.17    knitr_1.20      magrittr_1.5    bit_1.1-13     
##  [5] stringr_1.3.1   blob_1.1.1      tools_3.5.0     xfun_0.1       
##  [9] DBI_1.0.0       htmltools_0.3.6 yaml_2.1.19     bit64_0.9-7    
## [13] rprojroot_1.3-2 digest_0.6.15   bookdown_0.7    memoise_1.1.0  
## [17] evaluate_0.10.1 RSQLite_2.1.1   rmarkdown_1.9   stringi_1.2.2  
## [21] compiler_3.5.0  backports_1.1.2 pkgconfig_2.0.1Uhlen, Mathias, Per Oksvold, Linn Fagerberg, Emma Lundberg, Kalle Jonasson, Mattias Forsberg, Martin Zwahlen, et al. 2010. “Towards a knowledge-based Human Protein Atlas.” Nature Biotechnology 28 (12). Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.:1248–50. https://doi.org/10.1038/nbt1210-1248.
Uhlén, Mathias, Erik Björling, Charlotta Agaton, Cristina Al-Khalili A. Szigyarto, Bahram Amini, Elisabet Andersen, Ann-Catrin C. Andersson, et al. 2005. “A human protein atlas for normal and cancer tissues based on antibody proteomics.” Molecular & Cellular Proteomics : MCP 4 (12). Department of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), SE-106 91 Stockholm, Sweden. mathias.uhlen@biotech.kth.se:1920–32. https://doi.org/10.1074/mcp.M500279-MCP200.