First load the package. We also load several other packages to help quickly explore the data.
library(getTBinR)
library(ggplot2)
library(knitr)
library(magrittr)
library(dplyr)
Get TB burden data with a single function call. This downloads the data if it has never been accessed and saves a local copy to R’s temporary directory (see tempdir()). If a local copy exists from the current session, it is loaded instead.
tb_burden <- get_tb_burden(download_data = TRUE, save = TRUE)
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
#> Saving data to: /tmp/RtmpdtvapX/TB_burden.rds
tb_burden
#> # A tibble: 3,651 x 71
#> count… iso2 iso3 iso_n… g_whor… year e_pop_… e_in… e_in… e_in… e_in…
#> <chr> <chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 Afgha… AF AFG 004 Easter… 2000 2.01e⁷ 190 123 271 38000
#> 2 Afgha… AF AFG 004 Easter… 2001 2.10e⁷ 189 123 271 40000
#> 3 Afgha… AF AFG 004 Easter… 2002 2.20e⁷ 189 122 270 42000
#> 4 Afgha… AF AFG 004 Easter… 2003 2.31e⁷ 189 122 270 44000
#> 5 Afgha… AF AFG 004 Easter… 2004 2.41e⁷ 189 122 270 46000
#> 6 Afgha… AF AFG 004 Easter… 2005 2.51e⁷ 189 122 270 47000
#> 7 Afgha… AF AFG 004 Easter… 2006 2.59e⁷ 189 122 270 49000
#> 8 Afgha… AF AFG 004 Easter… 2007 2.66e⁷ 189 122 270 50000
#> 9 Afgha… AF AFG 004 Easter… 2008 2.73e⁷ 189 122 270 52000
#> 10 Afgha… AF AFG 004 Easter… 2009 2.80e⁷ 189 123 270 53000
#> # ... with 3,641 more rows, and 60 more variables: e_inc_num_lo <int>,
#> # e_inc_num_hi <int>, e_inc_num_f014 <int>, e_inc_num_f014_lo <int>,
#> # e_inc_num_f014_hi <int>, e_inc_num_f15plus <int>,
#> # e_inc_num_f15plus_lo <int>, e_inc_num_f15plus_hi <int>,
#> # e_inc_num_f <int>, e_inc_num_f_lo <int>, e_inc_num_f_hi <int>,
#> # e_inc_num_m014 <int>, e_inc_num_m014_lo <int>,
#> # e_inc_num_m014_hi <int>, e_inc_num_m15plus <int>,
#> # e_inc_num_m15plus_lo <int>, e_inc_num_m15plus_hi <int>,
#> # e_inc_num_m <int>, e_inc_num_m_lo <int>, e_inc_num_m_hi <int>,
#> # e_inc_num_014 <int>, e_inc_num_014_lo <int>, e_inc_num_014_hi <int>,
#> # e_inc_num_15plus <int>, e_inc_num_15plus_lo <int>,
#> # e_inc_num_15plus_hi <int>, e_tbhiv_prct <dbl>, e_tbhiv_prct_lo <dbl>,
#> # e_tbhiv_prct_hi <dbl>, e_inc_tbhiv_100k <dbl>,
#> # e_inc_tbhiv_100k_lo <dbl>, e_inc_tbhiv_100k_hi <dbl>,
#> # e_inc_tbhiv_num <int>, e_inc_tbhiv_num_lo <int>,
#> # e_inc_tbhiv_num_hi <int>, e_mort_exc_tbhiv_100k <dbl>,
#> # e_mort_exc_tbhiv_100k_lo <dbl>, e_mort_exc_tbhiv_100k_hi <dbl>,
#> # e_mort_exc_tbhiv_num <int>, e_mort_exc_tbhiv_num_lo <int>,
#> # e_mort_exc_tbhiv_num_hi <int>, e_mort_tbhiv_100k <dbl>,
#> # e_mort_tbhiv_100k_lo <dbl>, e_mort_tbhiv_100k_hi <dbl>,
#> # e_mort_tbhiv_num <int>, e_mort_tbhiv_num_lo <int>,
#> # e_mort_tbhiv_num_hi <int>, e_mort_100k <dbl>, e_mort_100k_lo <dbl>,
#> # e_mort_100k_hi <dbl>, e_mort_num <int>, e_mort_num_lo <int>,
#> # e_mort_num_hi <int>, cfr <dbl>, cfr_lo <dbl>, cfr_hi <dbl>,
#> # c_newinc_100k <dbl>, c_cdr <dbl>, c_cdr_lo <dbl>, c_cdr_hi <dbl>
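The caching behaviour described above (download once, then reuse a copy in tempdir()) can be sketched in base R. This is a minimal illustration, not getTBinR's implementation: load_cached and fake_fetch are hypothetical names, and the toy data frame stands in for a real download.

```r
# Hypothetical helper (not part of getTBinR): fetch once, then reuse the cached copy.
load_cached <- function(name, fetch, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    message("Loading data from: ", path)
    return(readRDS(path))
  }
  data <- fetch()
  message("Saving data to: ", path)
  saveRDS(data, path)
  data
}

# Stand-in for a real download; counts how often it is called.
n_fetches <- 0
fake_fetch <- function() {
  n_fetches <<- n_fetches + 1
  data.frame(country = c("A", "B"), e_inc_100k = c(10, 20))
}

# Start from a clean slate so the first call has to "download".
unlink(file.path(tempdir(), "toy_tb_burden.rds"))

first  <- load_cached("toy_tb_burden", fake_fetch)  # fetches and saves
second <- load_cached("toy_tb_burden", fake_fetch)  # loads the saved copy
```

Because tempdir() is cleared when the R session ends, a cache like this only lasts for the current session, which matches the behaviour described above.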
The WHO provides a large, detailed data dictionary for use with the TB burden data. However, searching through this dictionary can be tedious. To streamline this process, getTBinR provides a search function that finds the definitions of one or more variables. As before, the first call downloads the data dictionary to the temporary directory, and subsequent calls load the local copy.
vars_of_interest <- search_data_dict(var = c("country",
                                             "e_inc_100k",
                                             "e_inc_100k_lo",
                                             "e_inc_100k_hi"),
                                     download_data = TRUE,
                                     save = TRUE)
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=dictionary
#> Saving data to: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 4 results found for your variable search for country, e_inc_100k, e_inc_100k_lo, e_inc_100k_hi
knitr::kable(vars_of_interest)
variable_name | dataset | code_list | definition |
---|---|---|---|
country | Country identification | | Country or territory name |
e_inc_100k | Estimates | | Estimated incidence (all forms) per 100 000 population |
e_inc_100k_hi | Estimates | | Estimated incidence (all forms) per 100 000 population, high bound |
e_inc_100k_lo | Estimates | | Estimated incidence (all forms) per 100 000 population, low bound |
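As a rough check of what "per 100 000 population" means, the rate can be recomputed from the case and population counts. The sketch below uses the first three Afghanistan rows from the tibble printed earlier. It assumes that the truncated column headers in that print are e_pop_num, e_inc_100k and e_inc_num. Because the printed values are rounded, the recomputed rate only approximately matches the reported one.

```r
# First three Afghanistan rows (2000-2002) as printed earlier; values are rounded.
afg <- data.frame(
  year       = 2000:2002,
  e_pop_num  = c(2.01e7, 2.10e7, 2.20e7),  # assumed column name (truncated in the print)
  e_inc_num  = c(38000, 40000, 42000),     # assumed column name (truncated in the print)
  e_inc_100k = c(190, 189, 189)
)

# Incidence per 100 000 population = cases / population * 100 000
afg$recomputed_100k <- afg$e_inc_num / afg$e_pop_num * 1e5

afg[, c("year", "e_inc_100k", "recomputed_100k")]
```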
We might also want to search the variable definitions for key phrases, for example mortality.
defs_of_interest <- search_data_dict(def = c("mortality"))
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 6 results found for your definition search for mortality
knitr::kable(defs_of_interest)
variable_name | dataset | code_list | definition |
---|---|---|---|
e_mort_exc_tbhiv_100k | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population |
e_mort_exc_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound |
e_mort_exc_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound |
e_mort_tbhiv_100k | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population |
e_mort_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound |
e_mort_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound |
Finally, we can search for a known variable and for key phrases in variable definitions in the same call.
vars_defs_of_interest <- search_data_dict(var = c("country"),
                                          def = c("mortality"))
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for country
#> 6 results found for your definition search for mortality
knitr::kable(vars_defs_of_interest)
variable_name | dataset | code_list | definition |
---|---|---|---|
country | Country identification | | Country or territory name |
e_mort_exc_tbhiv_100k | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population |
e_mort_exc_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound |
e_mort_exc_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound |
e_mort_tbhiv_100k | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population |
e_mort_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound |
e_mort_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound |
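To make the two search modes concrete, here is a small base R sketch over a toy dictionary built from rows in the tables above. It assumes exact matching on variable names and case-insensitive pattern matching on definitions; search_toy_dict is a hypothetical helper, and the matching rules inside search_data_dict may differ.

```r
# Toy dictionary built from rows shown in the tables above.
toy_dict <- data.frame(
  variable_name = c("country", "e_inc_100k",
                    "e_mort_exc_tbhiv_100k", "e_mort_tbhiv_100k"),
  definition = c(
    "Country or territory name",
    "Estimated incidence (all forms) per 100 000 population",
    "Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population",
    "Estimated mortality of TB cases who are HIV-positive, per 100 000 population"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: exact match on names, pattern match on definitions.
search_toy_dict <- function(dict, var = NULL, def = NULL) {
  by_var <- rep(FALSE, nrow(dict))
  by_def <- rep(FALSE, nrow(dict))
  if (!is.null(var)) {
    by_var <- dict$variable_name %in% var
  }
  if (!is.null(def)) {
    # Several phrases are combined into one "phrase1|phrase2" pattern.
    by_def <- grepl(paste(def, collapse = "|"), dict$definition,
                    ignore.case = TRUE)
  }
  # Keep rows that match either search.
  dict[by_var | by_def, , drop = FALSE]
}

both <- search_toy_dict(toy_dict, var = "country", def = "mortality")
both$variable_name
```

Combining the two logical vectors with `|` returns the union of both searches, mirroring the combined table above.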
To start exploring the WHO TB data we map global TB incidence rates in 2016. Mapping data can help identify spatial patterns.
getTBinR::map_tb_burden(metric = "e_inc_100k",
                        year = 2016)
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
To showcase how quickly we can go from no data to informative graphs, we explore incidence rates for all countries in the WHO data.
getTBinR::plot_tb_burden_overview(metric = "e_inc_100k",
                                  interactive = FALSE)
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
Now that we have the data, let's plot a sample of 9 countries using the inbuilt plot_tb_burden function. We again plot incidence rates, this time with 95% confidence intervals. As you can see, this isn't a hugely informative graph. Let's improve it!
## Take a random sample of countries
sample_countries <- sample(unique(tb_burden$country), 9)
plot_tb_burden(tb_burden, metric = "e_inc_100k",
               countries = sample_countries)
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
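For readers who want to see how an estimate with interval bounds can be drawn, here is a minimal ggplot2 sketch using the Afghanistan rows from the tibble printed earlier. It assumes the truncated columns in that print are e_inc_100k, e_inc_100k_lo and e_inc_100k_hi. It shows the general line-plus-ribbon technique and is not necessarily how plot_tb_burden builds its plots.

```r
library(ggplot2)

# Afghanistan, 2000-2009, as printed earlier (column names assumed from the
# truncated headers in that print).
afg <- data.frame(
  year          = 2000:2009,
  e_inc_100k    = c(190, 189, 189, 189, 189, 189, 189, 189, 189, 189),
  e_inc_100k_lo = c(123, 123, 122, 122, 122, 122, 122, 122, 122, 123),
  e_inc_100k_hi = c(271, 271, 270, 270, 270, 270, 270, 270, 270, 270)
)

p <- ggplot(afg, aes(x = year, y = e_inc_100k)) +
  # Shaded band between the low and high bounds
  geom_ribbon(aes(ymin = e_inc_100k_lo, ymax = e_inc_100k_hi), alpha = 0.2) +
  # Central estimate
  geom_line() +
  labs(x = "Year", y = "Estimated incidence per 100 000 population")

p
```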
We have faceted by country so that we can more easily see what is going on. This makes it easy to explore between-country variation; depending on the sample, there is likely to be a lot of this.
plot_tb_burden(tb_burden, metric = "e_inc_100k",
               countries = sample_countries,
               facet = "country")
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
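Because sample_countries is drawn at random, these plots change from run to run. If a reproducible selection is wanted, one option is to fix the random seed before sampling. The sketch below uses a short illustrative country list in place of unique(tb_burden$country), and the seed value is arbitrary.

```r
# Illustrative list; in the walkthrough this would be unique(tb_burden$country).
countries <- c("Afghanistan", "Brazil", "China", "India", "Kenya", "Peru")

set.seed(1234)
first_draw <- sample(countries, 3)

set.seed(1234)
second_draw <- sample(countries, 3)

# TRUE: resetting the same seed reproduces the same sample.
identical(first_draw, second_draw)
```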
To explore within-country variation, we need to free the y-axis scale so that each facet gets its own range.
plot_tb_burden(tb_burden, metric = "e_inc_100k",
               countries = sample_countries,
               facet = "country",
               scales = "free_y")
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
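The effect of a free y axis is easiest to see on a small example. The sketch below uses made-up values for two hypothetical countries with very different incidence levels; it illustrates ggplot2's facet_wrap() with scales = "free_y" and is not taken from the WHO data.

```r
library(ggplot2)

# Made-up values for two hypothetical countries (not WHO data).
toy <- data.frame(
  country    = rep(c("Country A", "Country B"), each = 5),
  year       = rep(2012:2016, times = 2),
  e_inc_100k = c(500, 480, 470, 450, 440, 12, 11, 11, 10, 9)
)

base_plot <- ggplot(toy, aes(x = year, y = e_inc_100k)) +
  geom_line()

# Shared y axis: Country B's trend is flattened by Country A's scale.
p_fixed <- base_plot + facet_wrap(~country)

# Free y axis: each facet gets its own range, so both trends are visible.
p_free <- base_plot + facet_wrap(~country, scales = "free_y")

p_free
```

The trade-off is that with free scales the panels can no longer be compared by height alone, so the axis labels need to be read for each facet.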
We might also be interested in mortality in both HIV-negative and HIV-positive cases in our sample countries. We can look at this using plot_tb_burden as follows. Note that we can do this without specifying the TB burden data; the plotting function will automatically find it, either locally or remotely.
plot_tb_burden(metric = "e_mort_exc_tbhiv_100k",
               countries = sample_countries,
               facet = "country",
               scales = "free_y")
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_mort_exc_tbhiv_100k
plot_tb_burden(metric = "e_mort_tbhiv_100k",
               countries = sample_countries,
               facet = "country",
               scales = "free_y")
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_mort_tbhiv_100k