Getting started with getTBinR

Sam Abbott

2018-01-04

Using the package

First, load the package. We also load several other packages to help explore the data quickly.

library(getTBinR)
library(ggplot2)
library(knitr)
library(magrittr)
library(dplyr)

Getting TB burden data

Get the TB burden data with a single function call. The first time it is called, this downloads the data and saves a local copy to R’s temporary directory (see tempdir()). If a local copy already exists from the current session, that copy is loaded instead.

tb_burden <- get_tb_burden(download_data = TRUE, save = TRUE)
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
#> Saving data to: /tmp/RtmpdtvapX/TB_burden.rds

tb_burden
#> # A tibble: 3,651 x 71
#>    count… iso2  iso3  iso_n… g_whor…  year e_pop_… e_in… e_in… e_in… e_in…
#>    <chr>  <chr> <chr> <chr>  <chr>   <int>   <int> <dbl> <dbl> <dbl> <int>
#>  1 Afgha… AF    AFG   004    Easter…  2000  2.01e⁷   190   123   271 38000
#>  2 Afgha… AF    AFG   004    Easter…  2001  2.10e⁷   189   123   271 40000
#>  3 Afgha… AF    AFG   004    Easter…  2002  2.20e⁷   189   122   270 42000
#>  4 Afgha… AF    AFG   004    Easter…  2003  2.31e⁷   189   122   270 44000
#>  5 Afgha… AF    AFG   004    Easter…  2004  2.41e⁷   189   122   270 46000
#>  6 Afgha… AF    AFG   004    Easter…  2005  2.51e⁷   189   122   270 47000
#>  7 Afgha… AF    AFG   004    Easter…  2006  2.59e⁷   189   122   270 49000
#>  8 Afgha… AF    AFG   004    Easter…  2007  2.66e⁷   189   122   270 50000
#>  9 Afgha… AF    AFG   004    Easter…  2008  2.73e⁷   189   122   270 52000
#> 10 Afgha… AF    AFG   004    Easter…  2009  2.80e⁷   189   123   270 53000
#> # ... with 3,641 more rows, and 60 more variables: e_inc_num_lo <int>,
#> #   e_inc_num_hi <int>, e_inc_num_f014 <int>, e_inc_num_f014_lo <int>,
#> #   e_inc_num_f014_hi <int>, e_inc_num_f15plus <int>,
#> #   e_inc_num_f15plus_lo <int>, e_inc_num_f15plus_hi <int>,
#> #   e_inc_num_f <int>, e_inc_num_f_lo <int>, e_inc_num_f_hi <int>,
#> #   e_inc_num_m014 <int>, e_inc_num_m014_lo <int>,
#> #   e_inc_num_m014_hi <int>, e_inc_num_m15plus <int>,
#> #   e_inc_num_m15plus_lo <int>, e_inc_num_m15plus_hi <int>,
#> #   e_inc_num_m <int>, e_inc_num_m_lo <int>, e_inc_num_m_hi <int>,
#> #   e_inc_num_014 <int>, e_inc_num_014_lo <int>, e_inc_num_014_hi <int>,
#> #   e_inc_num_15plus <int>, e_inc_num_15plus_lo <int>,
#> #   e_inc_num_15plus_hi <int>, e_tbhiv_prct <dbl>, e_tbhiv_prct_lo <dbl>,
#> #   e_tbhiv_prct_hi <dbl>, e_inc_tbhiv_100k <dbl>,
#> #   e_inc_tbhiv_100k_lo <dbl>, e_inc_tbhiv_100k_hi <dbl>,
#> #   e_inc_tbhiv_num <int>, e_inc_tbhiv_num_lo <int>,
#> #   e_inc_tbhiv_num_hi <int>, e_mort_exc_tbhiv_100k <dbl>,
#> #   e_mort_exc_tbhiv_100k_lo <dbl>, e_mort_exc_tbhiv_100k_hi <dbl>,
#> #   e_mort_exc_tbhiv_num <int>, e_mort_exc_tbhiv_num_lo <int>,
#> #   e_mort_exc_tbhiv_num_hi <int>, e_mort_tbhiv_100k <dbl>,
#> #   e_mort_tbhiv_100k_lo <dbl>, e_mort_tbhiv_100k_hi <dbl>,
#> #   e_mort_tbhiv_num <int>, e_mort_tbhiv_num_lo <int>,
#> #   e_mort_tbhiv_num_hi <int>, e_mort_100k <dbl>, e_mort_100k_lo <dbl>,
#> #   e_mort_100k_hi <dbl>, e_mort_num <int>, e_mort_num_lo <int>,
#> #   e_mort_num_hi <int>, cfr <dbl>, cfr_lo <dbl>, cfr_hi <dbl>,
#> #   c_newinc_100k <dbl>, c_cdr <dbl>, c_cdr_lo <dbl>, c_cdr_hi <dbl>
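The caching described above can be checked by looking for the cached copy in the temporary directory. The sketch below takes the file name TB_burden.rds from the "Saving data to:" message printed above. It assumes the cache is an ordinary .rds file holding the same tibble, so treat the readRDS() step as illustrative rather than as a documented getTBinR feature.

```r
# Location of the cached burden data for this R session. The file name is
# inferred from the "Saving data to:" message printed by get_tb_burden().
cache_path <- file.path(tempdir(), "TB_burden.rds")

# TRUE once get_tb_burden(save = TRUE) has run in the current session
has_cache <- file.exists(cache_path)
has_cache

# Assumption: the cache is a plain .rds file, so it can be read back
# directly without going through getTBinR
if (has_cache) {
  cached_burden <- readRDS(cache_path)
}
```

Because the per-session temporary directory is normally removed when R exits, a fresh session will typically need to download the data again.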

Searching for variable definitions

The WHO provides a large and detailed data dictionary for use with the TB burden data. However, searching through it by hand can be tedious. To streamline this process, getTBinR provides a search function that finds the definitions of one or more variables. As with get_tb_burden(), the first time this function is used it downloads the data dictionary to the temporary directory; subsequent uses load the local copy.

vars_of_interest <- search_data_dict(var = c("country",
                                             "e_inc_100k",
                                             "e_inc_100k_lo",
                                             "e_inc_100k_hi"),
                                     download_data = TRUE, 
                                     save = TRUE)
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=dictionary
#> Saving data to: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 4 results found for your variable search for country, e_inc_100k, e_inc_100k_lo, e_inc_100k_hi

knitr::kable(vars_of_interest)
|variable_name |dataset                |code_list |definition                                                          |
|:-------------|:----------------------|:---------|:-------------------------------------------------------------------|
|country       |Country identification |          |Country or territory name                                           |
|e_inc_100k    |Estimates              |          |Estimated incidence (all forms) per 100 000 population              |
|e_inc_100k_hi |Estimates              |          |Estimated incidence (all forms) per 100 000 population, high bound  |
|e_inc_100k_lo |Estimates              |          |Estimated incidence (all forms) per 100 000 population, low bound   |
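The variable names returned by search_data_dict() can be fed straight back into the burden data. The helper select_found_vars() below is hypothetical (it is not part of getTBinR). It assumes the search result has a variable_name column, as in the table above, and uses dplyr's select() with tidyselect's all_of().

```r
library(dplyr)

# Hypothetical helper (not part of getTBinR): keep the columns of `burden`
# whose names appear in a search_data_dict() result, plus `year`.
select_found_vars <- function(burden, dict_result) {
  # Drop any dictionary entries that are not columns of the burden data
  vars <- intersect(as.character(dict_result$variable_name), names(burden))
  burden %>% select(all_of(vars), year)
}

# Usage with the objects created above:
# select_found_vars(tb_burden, vars_of_interest) %>% filter(year == 2016)
```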

We might also want to search the variable definitions for key phrases, for example "mortality".

defs_of_interest <- search_data_dict(def = c("mortality"))
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 6 results found for your definition search for mortality

knitr::kable(defs_of_interest)
|variable_name            |dataset   |code_list |definition                                                                                       |
|:------------------------|:---------|:---------|:------------------------------------------------------------------------------------------------|
|e_mort_exc_tbhiv_100k    |Estimates |          |Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population                |
|e_mort_exc_tbhiv_100k_hi |Estimates |          |Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound   |
|e_mort_exc_tbhiv_100k_lo |Estimates |          |Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound    |
|e_mort_tbhiv_100k        |Estimates |          |Estimated mortality of TB cases who are HIV-positive, per 100 000 population                     |
|e_mort_tbhiv_100k_hi     |Estimates |          |Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound         |
|e_mort_tbhiv_100k_lo     |Estimates |          |Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound          |

Finally, we can search for a known variable and for key phrases in variable definitions at the same time.

vars_defs_of_interest <- search_data_dict(var = c("country"),
                                          def = c("mortality"))
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for country
#> 6 results found for your definition search for mortality

knitr::kable(vars_defs_of_interest)
|variable_name            |dataset                |code_list |definition                                                                                       |
|:------------------------|:----------------------|:---------|:------------------------------------------------------------------------------------------------|
|country                  |Country identification |          |Country or territory name                                                                        |
|e_mort_exc_tbhiv_100k    |Estimates              |          |Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population                |
|e_mort_exc_tbhiv_100k_hi |Estimates              |          |Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound   |
|e_mort_exc_tbhiv_100k_lo |Estimates              |          |Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound    |
|e_mort_tbhiv_100k        |Estimates              |          |Estimated mortality of TB cases who are HIV-positive, per 100 000 population                     |
|e_mort_tbhiv_100k_hi     |Estimates              |          |Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound         |
|e_mort_tbhiv_100k_lo     |Estimates              |          |Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound          |

Mapping Global Incidence Rates

To start exploring the WHO TB data, we map global TB incidence rates (per 100 000 population) in 2016. Mapping the data can help identify spatial patterns.

getTBinR::map_tb_burden(metric = "e_inc_100k",
                        year = 2016)
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k

Plotting Incidence Rates for All Countries

To show how quickly we can go from no data to informative graphs, we explore incidence rates for all countries in the WHO data.

getTBinR::plot_tb_burden_overview(metric = "e_inc_100k",
                                  interactive = FALSE)
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
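The overview plot can be complemented by a simple ranking table. This is a minimal dplyr sketch. top_countries() is a hypothetical helper, and the year (2016) and number of countries (10) in the usage line are arbitrary choices.

```r
library(dplyr)

# Hypothetical helper: the `n` countries with the highest value of `metric`
# (a column name given as a string) in year `yr`.
top_countries <- function(burden, metric, yr, n = 10) {
  burden %>%
    filter(year == yr) %>%
    select(country, value = all_of(metric)) %>%
    arrange(desc(value)) %>%   # missing values are sorted last
    head(n)
}

# Usage:
# top_countries(tb_burden, metric = "e_inc_100k", yr = 2016)
```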

Plotting Incidence Rates over Time in 9 Randomly Sampled Countries

Now that we have the data, let's plot a sample of 9 countries using the built-in plot_tb_burden function. We again plot incidence rates, but this time with 95% confidence intervals. As you can see, this isn't a hugely informative graph. Let's improve it!

## Take a random sample of countries
sample_countries <- sample(unique(tb_burden$country), 9)
plot_tb_burden(tb_burden, metric = "e_inc_100k",
               countries = sample_countries)
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
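Because sample() draws a different set of countries on each run, the plots in this section will differ from run to run. One way to make them reproducible is to fix the random seed before sampling. The helper below is a hypothetical sketch, and its default seed value is arbitrary.

```r
# Hypothetical helper: draw `size` distinct countries reproducibly by
# fixing the random seed first. Note that this resets the global RNG state.
sample_countries_reproducibly <- function(countries, size = 9, seed = 2018) {
  set.seed(seed)
  sample(unique(countries), size)
}

# Usage with the data loaded above:
# sample_countries <- sample_countries_reproducibly(tb_burden$country)
```

The same seed can still give a different sample on R versions before and after 3.6.0, because the default sampling algorithm changed in that release.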

Here we facet by country so that we can more easily see what is going on. This lets us explore between-country variation; depending on the sample, there is likely to be a lot of it.

plot_tb_burden(tb_burden, metric = "e_inc_100k",
               countries = sample_countries,
               facet = "country")
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k

To explore within-country variation, we need to let the y axis scale vary between countries (scales = "free_y").

plot_tb_burden(tb_burden, metric = "e_inc_100k",
               countries = sample_countries,
               facet = "country",
               scales = "free_y")
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
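To see roughly what such a plot involves, the sketch below builds a similar figure directly with ggplot2. It is a hand-rolled approximation, not the implementation of plot_tb_burden(). It assumes, as the column names in the burden data suggest, that each estimate has matching _lo and _hi bound columns.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not getTBinR code): line plot of `metric` with its
# low/high bounds as a shaded ribbon, one panel per country, each panel
# with its own y axis.
plot_metric_free_y <- function(burden, metric, countries) {
  burden %>%
    filter(country %in% countries) %>%
    ggplot(aes(x = year,
               y = .data[[metric]],
               ymin = .data[[paste0(metric, "_lo")]],
               ymax = .data[[paste0(metric, "_hi")]])) +
    geom_ribbon(alpha = 0.3) +
    geom_line() +
    facet_wrap(~country, scales = "free_y") +
    labs(x = "Year", y = metric)
}

# Usage:
# plot_metric_free_y(tb_burden, "e_inc_100k", sample_countries)
```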

We might also be interested in TB mortality among both HIV-negative and HIV-positive cases in our sample countries. We can look at this with plot_tb_burden as follows. Note that we can do this without passing in the TB burden data: the plotting function will find it automatically, either locally or remotely.

plot_tb_burden(metric = "e_mort_exc_tbhiv_100k",
               countries = sample_countries,
               facet = "country",
               scales = "free_y")
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_mort_exc_tbhiv_100k

plot_tb_burden(metric = "e_mort_tbhiv_100k",
               countries = sample_countries,
               facet = "country",
               scales = "free_y")
#> Loading data from: /tmp/RtmpdtvapX/TB_burden.rds
#> Loading data from: /tmp/RtmpdtvapX/TB_data_dict.rds
#> 1 results found for your variable search for e_mort_tbhiv_100k
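The two mortality estimates can also be compared side by side in a table. The sketch below is a hypothetical dplyr helper. Its derived hiv_pos_share column (HIV-positive mortality as a fraction of the sum of the two rates) is an illustrative quantity of our own, not a WHO variable.

```r
library(dplyr)

# Hypothetical helper: TB mortality per 100 000 among HIV-negative
# (e_mort_exc_tbhiv_100k) and HIV-positive (e_mort_tbhiv_100k) cases in
# year `yr`, with the HIV-positive share of the combined rate.
mortality_by_hiv <- function(burden, countries, yr) {
  burden %>%
    filter(country %in% countries, year == yr) %>%
    select(country, e_mort_exc_tbhiv_100k, e_mort_tbhiv_100k) %>%
    mutate(hiv_pos_share = e_mort_tbhiv_100k /
             (e_mort_exc_tbhiv_100k + e_mort_tbhiv_100k))
}

# Usage:
# mortality_by_hiv(tb_burden, sample_countries, yr = 2016)
```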