Introduction to tidyquant

Matt Dancho

Bringing quantitative financial analysis to the tidyverse

Overview

tidyquant integrates the best quantitative resources for collecting and analyzing quantitative data, xts, quantmod, and TTR, with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between each package while working within the tidyverse.

The three primary quantitative packages that form the backbone of quantitative financial analysis in R are:

  • xts: extensible time-series data management
  • quantmod: quantitative financial modeling and data collection
  • TTR: technical trading rules and indicators

The tidy data principles are a cornerstone of data management and the data modeling workflow. The foundation for tidy data management is the tidyverse, a collection of R packages (ggplot2, dplyr, tidyr, purrr, readr, and tibble) that work in harmony, are built for scalability, and are well documented in R for Data Science. Using this infrastructure and the core tidy concepts, tidyquant integrates the tidy data principles with the best quantitative financial analysis packages.

Prerequisites

Load the tidyquant package to get started.

# Loads tidyquant, tidyverse, lubridate, xts, quantmod, TTR 
library(tidyquant)  

Benefits

The tidyquant philosophy:

A Few Core Functions with A Lot of Power

Minimizing the number of functions reduces the learning curve. Functions are grouped into verbs for efficient collection and manipulation of quantitative data:

Get Quantitative Data

The tq_get() function is used to collect all data by changing the get argument. The options include stock lists for 18 stock indexes from marketvolume.com, stock prices, dividends and splits from Yahoo Finance, financial statements from Google Finance, metal prices and exchange rates from Oanda, and economic data from the FRED database. To see the full list, execute tq_get_options().

tq_get_options()
## [1] "stock.prices"   "stock.index"    "dividends"      "splits"        
## [5] "financials"     "economic.data"  "exchange.rates" "metal.prices"

Stock Index:

A wide range of stock index / exchange lists can be retrieved using get = "stock.index". To get a full list of the options, use tq_get_stock_index_options().

tq_get_stock_index_options()
##  [1] "DOWJONES"    "DJI"         "DJT"         "DJU"         "SP100"      
##  [6] "SP400"       "SP500"       "SP600"       "RUSSELL1000" "RUSSELL2000"
## [11] "RUSSELL3000" "AMEX"        "AMEXGOLD"    "AMEXOIL"     "NASDAQ"     
## [16] "NASDAQ100"   "NYSE"        "SOX"

Set x to one of the options in the list above, and set get = "stock.index" to get the desired stock index / exchange.

tq_get("sp500", get = "stock.index")
## # A tibble: 501 × 2
##    symbol                   company
##     <chr>                     <chr>
## 1     MMM                        3M
## 2     ABT       ABBOTT LABORATORIES
## 3    ABBV                ABBVIE INC
## 4     ACN                 ACCENTURE
## 5    ATVI       ACTIVISION BLIZZARD
## 6     AYI             ACUITY BRANDS
## 7    ADBE             ADOBE SYSTEMS
## 8     AAP        ADVANCE AUTO PARTS
## 9     AET                     AETNA
## 10    AMG AFFILIATED MANAGERS GROUP
## # ... with 491 more rows

The data source is www.marketvolume.com.

Stock Prices, Dividends and Splits:

The stock prices can be retrieved succinctly using get = "stock.prices".

aapl_prices <- tq_get("AAPL", get = "stock.prices", from = "1990-01-01")
aapl_prices
## # A tibble: 6,805 × 7
##          date   open   high   low  close   volume adjusted
##        <date>  <dbl>  <dbl> <dbl>  <dbl>    <dbl>    <dbl>
## 1  1990-01-02 35.250 37.500 35.00 37.250 45799600 1.132075
## 2  1990-01-03 38.000 38.000 37.50 37.500 51998800 1.139673
## 3  1990-01-04 38.250 38.750 37.25 37.625 55378400 1.143471
## 4  1990-01-05 37.750 38.250 37.00 37.750 30828000 1.147270
## 5  1990-01-08 37.500 38.000 37.00 38.000 25393200 1.154868
## 6  1990-01-09 38.000 38.000 37.00 37.625 21534800 1.143471
## 7  1990-01-10 37.625 37.625 35.75 36.000 49929600 1.094086
## 8  1990-01-11 36.250 36.250 34.50 34.500 52763200 1.048499
## 9  1990-01-12 34.250 34.750 33.75 34.500 42974400 1.048499
## 10 1990-01-15 34.500 35.750 34.25 34.250 40434800 1.040901
## # ... with 6,795 more rows

Dividends are obtained using get = "dividends".

aapl_divs <- tq_get("AAPL", get = "dividends", from = "1990-01-01")
aapl_divs
## # A tibble: 42 × 2
##          date dividends
##        <date>     <dbl>
## 1  1990-02-16   0.00393
## 2  1990-05-21   0.00393
## 3  1990-08-20   0.00393
## 4  1990-11-16   0.00429
## 5  1991-02-15   0.00429
## 6  1991-05-20   0.00429
## 7  1991-08-19   0.00429
## 8  1991-11-18   0.00429
## 9  1992-02-14   0.00429
## 10 1992-06-01   0.00429
## # ... with 32 more rows

Stock splits are obtained using get = "splits".

aapl_splits <- tq_get("AAPL", get = "splits", from = "1990-01-01")
aapl_splits
## # A tibble: 3 × 2
##         date    splits
##       <date>     <dbl>
## 1 2000-06-21 0.5000000
## 2 2005-02-28 0.5000000
## 3 2014-06-09 0.1428571

The data source is Yahoo Finance.

Financial Statements:

For any given stock, a total of six financial statements is retrieved as nested tibbles, one for each combination of statement type (Income Statement, Balance Sheet, and Cash Flow) and period (annual and quarterly).

fb_financials <- tq_get("FB", get = "financials")
fb_financials
## # A tibble: 3 × 3
##    type             annual            quarter
## * <chr>             <list>             <list>
## 1    BS <tibble [168 × 4]> <tibble [210 × 4]>
## 2    CF  <tibble [76 × 4]>  <tibble [76 × 4]>
## 3    IS <tibble [196 × 4]> <tibble [245 × 4]>

The statement information can be extracted by selecting (dplyr::select()) and filtering (dplyr::filter()) to the desired statement and unnesting (tidyr::unnest()) the results.

fb_financials %>%
    filter(type == "IS") %>%
    select(annual) %>%
    unnest()
## # A tibble: 196 × 4
##    group             category       date value
##    <int>                <chr>     <date> <dbl>
## 1      1              Revenue 2015-12-31 17928
## 2      1              Revenue 2014-12-31 12466
## 3      1              Revenue 2013-12-31  7872
## 4      1              Revenue 2012-12-31  5089
## 5      2 Other Revenue, Total 2015-12-31    NA
## 6      2 Other Revenue, Total 2014-12-31    NA
## 7      2 Other Revenue, Total 2013-12-31    NA
## 8      2 Other Revenue, Total 2012-12-31    NA
## 9      3        Total Revenue 2015-12-31 17928
## 10     3        Total Revenue 2014-12-31 12466
## # ... with 186 more rows

A slightly more powerful example is looking at all quarterly statements together. This is easy to do with unnest and spread from the tidyr package.

fb_financials %>%
    unnest(quarter) %>% 
    spread(key = date, value = value)
## # A tibble: 110 × 8
##     type group                         category `2015-09-30` `2015-12-31`
## *  <chr> <int>                            <chr>        <dbl>        <dbl>
## 1     BS     1               Cash & Equivalents         1621         2409
## 2     BS     2           Short Term Investments        11526        14322
## 3     BS     3  Cash and Short Term Investments        15834        18434
## 4     BS     4 Accounts Receivable - Trade, Net         2010         2559
## 5     BS     5              Receivables - Other           NA           NA
## 6     BS     6           Total Receivables, Net         2010         2559
## 7     BS     7                  Total Inventory           NA           NA
## 8     BS     8                 Prepaid Expenses         1295          659
## 9     BS     9      Other Current Assets, Total           NA           NA
## 10    BS    10             Total Current Assets        19139        21652
## # ... with 100 more rows, and 3 more variables: `2016-03-31` <dbl>,
## #   `2016-06-30` <dbl>, `2016-09-30` <dbl>

The data source is Google Finance.

Economic Data:

A wealth of economic data can be extracted from the Federal Reserve Economic Data (FRED) database. The WTI Crude Oil Prices are shown below.

wti_price_usd <- tq_get("DCOILWTICO", get = "economic.data")
wti_price_usd 
## # A tibble: 2,867 × 2
##          date price
##        <date> <dbl>
## 1  2006-01-02    NA
## 2  2006-01-03 63.11
## 3  2006-01-04 63.41
## 4  2006-01-05 62.81
## 5  2006-01-06 64.21
## 6  2006-01-09 63.56
## 7  2006-01-10 63.41
## 8  2006-01-11 63.91
## 9  2006-01-12 63.96
## 10 2006-01-13 63.86
## # ... with 2,857 more rows

FRED contains over 10,000 data sets that are free to use. See the FRED categories to narrow down the search and to find data codes.
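Any other FRED series works the same way once you have its code. As a sketch, the code "GDP" (US Gross Domestic Product) can be pulled with an explicit date range:

```r
# "GDP" is the FRED code for US Gross Domestic Product (quarterly series)
us_gdp <- tq_get("GDP", get = "economic.data",
                 from = "2000-01-01", to = "2016-12-31")
us_gdp
```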

Exchange Rates:

Exchange rates are entered as currency pairs using "/" notation (e.g. "EUR/USD"), and by setting get = "exchange.rates".

eur_usd <- tq_get("EUR/USD", get = "exchange.rates", from = "2000-01-01")
eur_usd 
## # A tibble: 1,827 × 2
##          date exchange.rate
##        <date>         <dbl>
## 1  2011-12-31       1.29618
## 2  2012-01-01       1.29590
## 3  2012-01-02       1.29375
## 4  2012-01-03       1.30038
## 5  2012-01-04       1.30036
## 6  2012-01-05       1.28717
## 7  2012-01-06       1.27698
## 8  2012-01-07       1.27195
## 9  2012-01-08       1.27151
## 10 2012-01-09       1.27272
## # ... with 1,817 more rows

The data source is Oanda, and the list of currencies to compare can be found on Oanda’s currency converter. It may make more sense to get this data from FRED (see Economic Data) since the maximum period for Oanda is five years.

Metal Prices:

Metal prices are very similar to stock prices. Set get = "metal.prices" along with the appropriate commodity symbol (e.g. XAU (gold), XAG (silver), XPD (palladium), or XPT (platinum)).

plat_price_eur <- tq_get("plat", get = "metal.prices", 
                         from = "2000-01-01", base.currency = "EUR")
plat_price_eur 
## # A tibble: 1,827 × 2
##          date   price
##        <date>   <dbl>
## 1  2011-12-31 1080.87
## 2  2012-01-01 1081.11
## 3  2012-01-02 1085.99
## 4  2012-01-03 1080.45
## 5  2012-01-04 1080.47
## 6  2012-01-05 1091.55
## 7  2012-01-06 1100.26
## 8  2012-01-07 1104.61
## 9  2012-01-08 1097.12
## 10 2012-01-09 1096.08
## # ... with 1,817 more rows

The data source is Oanda. It may make more sense to get this data from FRED (see Economic Data) since the maximum period for Oanda is five years.

Transform and Mutate Quantitative Data

Transform and mutate functions enable the xts, quantmod, and TTR functions to shine (see Leverage the Quantitative Power of xts, quantmod and TTR).

Transform Quantitative Data, tq_transform():

Transforms the results of tq_get(). The result is typically a different shape than the input (hence “transformed”), although this is not a requirement. An example is periodicity aggregation from daily to monthly.

fb_prices <- tq_get("FB") 
fb_prices %>%
    tq_transform(x_fun = OHLCV, transform_fun = to.monthly)
## # A tibble: 56 × 6
##        date  open  high   low close    volume
##       <chr> <dbl> <dbl> <dbl> <dbl>     <dbl>
## 1  May 2012 28.55 29.67 26.83 29.60 111639200
## 2  Jun 2012 31.92 31.99 30.76 31.10  19526900
## 3  Jul 2012 23.37 23.37 21.61 21.71  56179400
## 4  Aug 2012 18.68 18.70 18.03 18.06  58764200
## 5  Sep 2012 20.57 21.95 20.50 21.66  65486000
## 6  Oct 2012 20.82 21.50 20.73 21.11  99378200
## 7  Nov 2012 27.26 28.00 26.76 28.00 127049600
## 8  Dec 2012 26.20 26.99 26.11 26.62  60374500
## 9  Jan 2013 29.15 31.47 28.74 30.98 190744900
## 10 Feb 2013 26.84 27.30 26.34 27.25  83027800
## # ... with 46 more rows

Let’s go through what happened. x_fun is one of the various quantmod Open, High, Low, Close (OHLC) functions (see ?quantmod::OHLC). It returns a column or set of columns from data that are passed to the transform_fun. In the example above, OHLCV selects the full set of prices and volumes from data and sends them to the transform function, to.monthly, which converts the periodicity from daily to monthly. Additional arguments can be passed to the transform_fun by way of ....
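As a sketch of passing arguments through ..., the period argument below is forwarded to the underlying xts to.period function to get weekly bars instead of monthly ones:

```r
# `period = "weeks"` is passed through `...` to xts::to.period()
fb_prices %>%
    tq_transform(x_fun = OHLCV, transform_fun = to.period,
                 period = "weeks")
```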

Mutate Quantitative Data, tq_mutate():

Adds a column or set of columns to the tibble with the calculated attributes (hence the original tibble is returned, mutated with the additional columns). An example is getting the MACD from Cl (close price), which mutates the original input by adding MACD and Signal columns.

fb_prices %>%
    tq_mutate(x_fun = Cl, mutate_fun = MACD)
## # A tibble: 1,163 × 9
##          date  open  high   low close    volume adjusted  macd signal
##        <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl>
## 1  2012-05-18 42.05 45.00 38.00 38.23 573576400    38.23    NA     NA
## 2  2012-05-21 36.53 36.66 33.00 34.03 168192700    34.03    NA     NA
## 3  2012-05-22 32.61 33.59 30.94 31.00 101786600    31.00    NA     NA
## 4  2012-05-23 31.37 32.50 31.36 32.00  73600000    32.00    NA     NA
## 5  2012-05-24 32.95 33.21 31.77 33.03  50237200    33.03    NA     NA
## 6  2012-05-25 32.90 32.95 31.11 31.91  37149800    31.91    NA     NA
## 7  2012-05-29 31.48 31.69 28.65 28.84  78063400    28.84    NA     NA
## 8  2012-05-30 28.70 29.55 27.86 28.19  57267900    28.19    NA     NA
## 9  2012-05-31 28.55 29.67 26.83 29.60 111639200    29.60    NA     NA
## 10 2012-06-01 28.89 29.15 27.39 27.72  41855500    27.72    NA     NA
## # ... with 1,153 more rows

Note that a mutation can occur if, and only if, the result has the same structure as the original tibble. In other words, the calculation must return the same number of rows and row names (or date fields); otherwise the mutation cannot be performed.
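A valid mutation returns one value per input row. A simple moving average qualifies, for example (a sketch using the SMA function from TTR):

```r
# SMA returns the same number of rows as the input,
# so it can be added as a new column via tq_mutate()
fb_prices %>%
    tq_mutate(x_fun = Cl, mutate_fun = SMA, n = 50)
```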

xy Variants, tq_transform_xy and tq_mutate_xy:

Enables working with:

  1. Transformation functions that require two primary inputs (e.g. EVWMA, VWAP, etc)
  2. Data that is not in OHLC format.

Transformation with two primary inputs:

EVWMA (exponential volume-weighted moving average) requires two inputs, price and volume, that are not in OHLC format. To work with these columns, we can switch to the xy variants, tq_transform_xy() and tq_mutate_xy(). The only difference is that instead of an x_fun argument, you use the .x and .y arguments to pass the columns needed, based on the transform_fun or mutate_fun documentation.

fb_prices %>%
    tq_mutate_xy(.x = close, .y = volume, mutate_fun = EVWMA)
## # A tibble: 1,163 × 8
##          date  open  high   low close    volume adjusted    V1
##        <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl> <dbl>
## 1  2012-05-18 42.05 45.00 38.00 38.23 573576400    38.23    NA
## 2  2012-05-21 36.53 36.66 33.00 34.03 168192700    34.03    NA
## 3  2012-05-22 32.61 33.59 30.94 31.00 101786600    31.00    NA
## 4  2012-05-23 31.37 32.50 31.36 32.00  73600000    32.00    NA
## 5  2012-05-24 32.95 33.21 31.77 33.03  50237200    33.03    NA
## 6  2012-05-25 32.90 32.95 31.11 31.91  37149800    31.91    NA
## 7  2012-05-29 31.48 31.69 28.65 28.84  78063400    28.84    NA
## 8  2012-05-30 28.70 29.55 27.86 28.19  57267900    28.19    NA
## 9  2012-05-31 28.55 29.67 26.83 29.60 111639200    29.60    NA
## 10 2012-06-01 28.89 29.15 27.39 27.72  41855500    27.72 27.72
## # ... with 1,153 more rows

Working with non-OHLC data:

Data returned from FRED, Oanda, and other sources does not have the open, high, low, close, and volume (OHLCV) format. The following example shows how to transform WTI Crude daily prices to monthly prices. Since we only have a single column to pass, we set .x = price and leave .y = NULL. This sends the price column to the to.period transformation function.

wti_prices <- tq_get("DCOILWTICO", get = "economic.data") 
wti_prices %>%    
    tq_transform_xy(.x = price, transform_fun = to.period,
                    period = "months")
## # A tibble: 132 × 2
##          date price
##        <dttm> <dbl>
## 1  2006-01-31 67.86
## 2  2006-02-28 61.37
## 3  2006-03-31 66.25
## 4  2006-04-28 71.80
## 5  2006-05-31 71.42
## 6  2006-06-30 73.94
## 7  2006-07-31 74.56
## 8  2006-08-31 70.38
## 9  2006-09-29 62.90
## 10 2006-10-31 58.72
## # ... with 122 more rows

Coercing Time Series Objects To and From Tibble

Sometimes you want to work using a tibble, and other times you want to work using an xts object. The as_tibble() and as_xts() functions are the key.

Coerce from time-series to tibble, as_tibble():

The tidyquant::as_tibble() function includes a preserve_row_names argument, which is useful when coercing one of the many time formats (e.g. xts, zoo, timeSeries, ts) or matrix objects that contain valuable information in the row names. This makes bridging the gap between the various quantitative analysis packages and the tidyverse much easier.

Let’s start with an xts object.

# Create xts object from a matrix
vals <- matrix(c(500, 504, 503))
date <- c("2016-01-01", "2016-01-02", "2016-01-03")
rownames(vals) <- date
time_series_xts <- as_xts(vals)
time_series_xts
##            [,1]
## 2016-01-01  500
## 2016-01-02  504
## 2016-01-03  503

We can easily coerce to a tibble by setting preserve_row_names = TRUE. Note that the returned column is row.names, with class character.

time_series_tbl <- as_tibble(time_series_xts, preserve_row_names = TRUE)
time_series_tbl
## # A tibble: 3 × 2
##    row.names    V1
##        <chr> <dbl>
## 1 2016-01-01   500
## 2 2016-01-02   504
## 3 2016-01-03   503

Converting to date is one extra step with lubridate.

time_series_tbl <- time_series_tbl %>%
    mutate(row.names = lubridate::ymd(row.names))
time_series_tbl
## # A tibble: 3 × 2
##    row.names    V1
##       <date> <dbl>
## 1 2016-01-01   500
## 2 2016-01-02   504
## 3 2016-01-03   503

Coerce from tibble to xts, as_xts():

We can convert back to xts with the tidyquant as_xts() function. Make sure to set the date column (date_col) argument to the column name containing the date (date_col = row.names). The date column must be in a date format (inherits either Date or POSIXct classes).

time_series_xts <- time_series_tbl %>%
    as_xts(date_col = row.names)
time_series_xts
##             V1
## 2016-01-01 500
## 2016-01-02 504
## 2016-01-03 503

Working in the tidyverse

You probably already know and love tidyverse packages like dplyr, tidyr, purrr, readr, and tibble, along with lubridate for working with dates and datetimes. tidyquant works solely in tibbles, so all of the tidyverse functionality is intact.

A simple example inspired by Kan Nishida’s blog shows the dplyr and lubridate capability. Say we want the growth in the stock over the past year. We can do this with dplyr operations.

Getting the last year is simple with dplyr and lubridate. We first select the date and adjusted price (adjusted for stock splits). We then filter using lubridate date functions.

aapl_prices %>%
    select(date, adjusted) %>%
    filter(date >= today() - years(1))
## # A tibble: 253 × 2
##          date  adjusted
##        <date>     <dbl>
## 1  2015-12-31 102.96903
## 2  2016-01-04 103.05706
## 3  2016-01-05 100.47452
## 4  2016-01-06  98.50827
## 5  2016-01-07  94.35077
## 6  2016-01-08  94.84967
## 7  2016-01-11  96.38550
## 8  2016-01-12  97.78438
## 9  2016-01-13  95.27031
## 10 2016-01-14  97.35395
## # ... with 243 more rows

We can also get a baseline price using the dplyr first() function. Adding to our workflow, this looks like:

aapl_prices %>%
    select(date, adjusted) %>%
    filter(date >= today() - years(1)) %>%
    mutate(baseline = first(adjusted))
## # A tibble: 253 × 3
##          date  adjusted baseline
##        <date>     <dbl>    <dbl>
## 1  2015-12-31 102.96903  102.969
## 2  2016-01-04 103.05706  102.969
## 3  2016-01-05 100.47452  102.969
## 4  2016-01-06  98.50827  102.969
## 5  2016-01-07  94.35077  102.969
## 6  2016-01-08  94.84967  102.969
## 7  2016-01-11  96.38550  102.969
## 8  2016-01-12  97.78438  102.969
## 9  2016-01-13  95.27031  102.969
## 10 2016-01-14  97.35395  102.969
## # ... with 243 more rows

Growth and growth percent versus baseline columns can be added now. We tack on a final select statement to remove unnecessary columns. The final workflow looks like this:

aapl_prices %>%
    select(date, adjusted) %>%
    filter(date >= today() - years(1)) %>%
    mutate(baseline = first(adjusted),
           growth = adjusted - baseline,
           growth_pct = growth / baseline * 100) %>%
    select(-(baseline:growth))
## # A tibble: 253 × 3
##          date  adjusted growth_pct
##        <date>     <dbl>      <dbl>
## 1  2015-12-31 102.96903  0.0000000
## 2  2016-01-04 103.05706  0.0854995
## 3  2016-01-05 100.47452 -2.4225751
## 4  2016-01-06  98.50827 -4.3321348
## 5  2016-01-07  94.35077 -8.3697559
## 6  2016-01-08  94.84967 -7.8852393
## 7  2016-01-11  96.38550 -6.3936946
## 8  2016-01-12  97.78438 -5.0351540
## 9  2016-01-13  95.27031 -7.4767271
## 10 2016-01-14  97.35395 -5.4531690
## # ... with 243 more rows

Leverage the Quantitative Power of xts, quantmod and TTR

You may already know and love xts, quantmod, and TTR, which is why the core functionality is fully intact. Using tq_transform() and tq_mutate(), we can apply the xts, quantmod, and TTR functions. Entering tq_transform_fun_options() returns a list of the transform functions by package. We’ll briefly discuss these options by package.

tq_transform_fun_options() %>% str()
## List of 3
##  $ xts     : chr [1:27] "apply.daily" "apply.monthly" "apply.quarterly" "apply.weekly" ...
##  $ quantmod: chr [1:25] "allReturns" "annualReturn" "ClCl" "dailyReturn" ...
##  $ TTR     : chr [1:61] "adjRatios" "ADX" "ALMA" "aroon" ...

xts Functionality

# Get xts functions that work with tq_transform and tq_mutate
tq_transform_fun_options()$xts
##  [1] "apply.daily"     "apply.monthly"   "apply.quarterly"
##  [4] "apply.weekly"    "apply.yearly"    "diff.xts"       
##  [7] "lag.xts"         "period.apply"    "period.max"     
## [10] "period.min"      "period.prod"     "period.sum"     
## [13] "periodicity"     "to.daily"        "to.hourly"      
## [16] "to.minutes"      "to.minutes10"    "to.minutes15"   
## [19] "to.minutes3"     "to.minutes30"    "to.minutes5"    
## [22] "to.monthly"      "to.period"       "to.quarterly"   
## [25] "to.weekly"       "to.yearly"       "to_period"

The xts functions that are compatible are listed above. Generally speaking, these are the:

  • Period Apply Functions:
    • Apply a function to a time segment (e.g. max, min, mean, etc).
    • Form: apply.daily(x, FUN, ...).
    • Options include apply.daily, weekly, monthly, quarterly, yearly.
  • To-Period Functions:
    • Convert a time series to time series of lower periodicity (e.g. convert daily to monthly periodicity).
    • Form: to.period(x, period = 'months', k = 1, indexAt, name = NULL, OHLC = TRUE, ...).
    • Options include to.minutes, hourly, daily, weekly, monthly, quarterly, yearly.
    • Note 1 (Important): The return structure is different for to.period and the to.monthly (to.weekly, to.quarterly, etc) forms. to.period returns a date, while to.monthly returns a character in MON YYYY format. It is best to use to.period if you want to work with time series via lubridate.
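To illustrate Note 1, the to.period form keeps a true date index even when aggregating to monthly periodicity (a sketch, reusing the fb_prices tibble from earlier):

```r
# Unlike to.monthly (which returns "MON YYYY" character strings),
# to.period with period = "months" returns a real date index
fb_prices %>%
    tq_transform(x_fun = OHLCV, transform_fun = to.period,
                 period = "months")
```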

quantmod Functionality

# Get quantmod functions that work with tq_transform and tq_mutate
tq_transform_fun_options()$quantmod
##  [1] "allReturns"      "annualReturn"    "ClCl"           
##  [4] "dailyReturn"     "Delt"            "HiCl"           
##  [7] "Lag"             "LoCl"            "LoHi"           
## [10] "monthlyReturn"   "Next"            "OpCl"           
## [13] "OpHi"            "OpLo"            "OpOp"           
## [16] "periodReturn"    "quarterlyReturn" "seriesAccel"    
## [19] "seriesDecel"     "seriesDecr"      "seriesHi"       
## [22] "seriesIncr"      "seriesLo"        "weeklyReturn"   
## [25] "yearlyReturn"

The quantmod functions that are compatible are listed above. Generally speaking, these are the:

  • Percentage Change (Delt) and Lag Functions
    • Delt: Delt(x1, x2 = NULL, k = 0, type = c("arithmetic", "log"))
      • Variations of Delt: ClCl, HiCl, LoCl, LoHi, OpCl, OpHi, OpLo, OpOp
      • Form: OpCl(OHLC)
    • Lag: Lag(x, k = 1) / Next: Next(x, k = 1) (Can also use dplyr::lag and dplyr::lead)
  • Period Return Functions:
    • Get the arithmetic or logarithmic returns for various periodicities, which include daily, weekly, monthly, quarterly, and yearly.
    • Form: periodReturn(x, period = 'monthly', subset = NULL, type = 'arithmetic', leading = TRUE, ...)
  • Series Functions:
    • Return values that describe the series. Options include describing the increases/decreases, accelerations/decelerations, and hi/low.
    • Forms: seriesHi(x), seriesIncr(x, thresh = 0, diff. = 1L), seriesAccel(x)
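As an example, the period return functions drop straight into tq_transform() (a sketch, reusing the fb_prices tibble from earlier):

```r
# Monthly arithmetic returns computed from the adjusted close
fb_prices %>%
    tq_transform(x_fun = Ad, transform_fun = monthlyReturn,
                 type = "arithmetic")
```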

TTR Functionality

# Get TTR functions that work with tq_transform and tq_mutate
tq_transform_fun_options()$TTR
##  [1] "adjRatios"          "ADX"                "ALMA"              
##  [4] "aroon"              "ATR"                "BBands"            
##  [7] "CCI"                "chaikinAD"          "chaikinVolatility" 
## [10] "CLV"                "CMF"                "CMO"               
## [13] "DEMA"               "DonchianChannel"    "DPO"               
## [16] "DVI"                "EMA"                "EMV"               
## [19] "EVWMA"              "GMMA"               "growth"            
## [22] "HMA"                "KST"                "lags"              
## [25] "MACD"               "MFI"                "momentum"          
## [28] "OBV"                "PBands"             "ROC"               
## [31] "rollSFM"            "RSI"                "runCor"            
## [34] "runCov"             "runMAD"             "runMax"            
## [37] "runMean"            "runMedian"          "runMin"            
## [40] "runPercentRank"     "runSD"              "runSum"            
## [43] "runVar"             "SAR"                "SMA"               
## [46] "SMI"                "stoch"              "TDI"               
## [49] "TRIX"               "ultimateOscillator" "VHF"               
## [52] "VMA"                "volatility"         "VWAP"              
## [55] "VWMA"               "wilderSum"          "williamsAD"        
## [58] "WMA"                "WPR"                "ZigZag"            
## [61] "ZLEMA"

Here’s a brief description of the most popular functions from TTR:

  • Welles Wilder’s Directional Movement Index:
    • ADX(HLC, n = 14, maType, ...)
  • Bollinger Bands:
    • BBands(HLC, n = 20, maType, sd = 2, ...): Bollinger Bands
  • Rate of Change / Momentum:
    • ROC(x, n = 1, type = c("continuous", "discrete"), na.pad = TRUE): Rate of Change
    • momentum(x, n = 1, na.pad = TRUE): Momentum
  • Moving Averages (maType):
    • SMA(x, n = 10, ...): Simple Moving Average
    • EMA(x, n = 10, wilder = FALSE, ratio = NULL, ...): Exponential Moving Average
    • DEMA(x, n = 10, v = 1, wilder = FALSE, ratio = NULL): Double Exponential Moving Average
    • WMA(x, n = 10, wts = 1:n, ...): Weighted Moving Average
    • EVWMA(price, volume, n = 10, ...): Elastic, Volume-Weighted Moving Average
    • ZLEMA(x, n = 10, ratio = NULL, ...): Zero Lag Exponential Moving Average
    • VWAP(price, volume, n = 10, ...): Volume-Weighted Moving Average Price
    • VMA(x, w, ratio = 1, ...): Variable-Length Moving Average
    • HMA(x, n = 20, ...): Hull Moving Average
    • ALMA(x, n = 9, offset = 0.85, sigma = 6, ...): Arnaud Legoux Moving Average
  • MACD Oscillator:
    • MACD(x, nFast = 12, nSlow = 26, nSig = 9, maType, percent = TRUE, ...)
  • Relative Strength Index:
    • RSI(price, n = 14, maType, ...)
  • runFun:
    • runSum(x, n = 10, cumulative = FALSE): returns sums over a n-period moving window.
    • runMin(x, n = 10, cumulative = FALSE): returns minimums over a n-period moving window.
    • runMax(x, n = 10, cumulative = FALSE): returns maximums over a n-period moving window.
    • runMean(x, n = 10, cumulative = FALSE): returns means over a n-period moving window.
    • runMedian(x, n = 10, non.unique = "mean", cumulative = FALSE): returns medians over a n-period moving window.
    • runCov(x, y, n = 10, use = "all.obs", sample = TRUE, cumulative = FALSE): returns covariances over a n-period moving window.
    • runCor(x, y, n = 10, use = "all.obs", sample = TRUE, cumulative = FALSE): returns correlations over a n-period moving window.
    • runVar(x, y = NULL, n = 10, sample = TRUE, cumulative = FALSE): returns variances over a n-period moving window.
    • runSD(x, n = 10, sample = TRUE, cumulative = FALSE): returns standard deviations over a n-period moving window.
    • runMAD(x, n = 10, center = NULL, stat = "median", constant = 1.4826, non.unique = "mean", cumulative = FALSE): returns median/mean absolute deviations over a n-period moving window.
    • wilderSum(x, n = 10): returns a Welles Wilder style weighted sum over a n-period moving window.
  • Stochastic Oscillator / Stochastic Momentum Index:
    • stoch(HLC, nFastK = 14, nFastD = 3, nSlowD = 3, maType, bounded = TRUE, smooth = 1, ...): Stochastic Oscillator
    • SMI(HLC, n = 13, nFast = 2, nSlow = 25, nSig = 9, maType, bounded = TRUE, ...): Stochastic Momentum Index
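As a quick illustration, any of the single-series TTR functions above can be applied through the mutate interface (a sketch, reusing the fb_prices tibble from earlier):

```r
# Add a 14-period RSI of the close price as a new column
fb_prices %>%
    tq_mutate(x_fun = Cl, mutate_fun = RSI, n = 14)
```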

Quantitative Power In Action

We’ll go through some examples, but first let’s get some data. The default for tq_get() is get = "stock.prices", so all we need to do is give x a stock symbol.

AAPL <- tq_get("AAPL")

Example 1: Getting the max close price for each quarter.

The xts::apply.quarterly() function, part of the period apply group, can be used to apply functions by quarterly time segments. Because we are seeking a return structure on a different time scale than the input (quarterly versus daily), we need a transform function. We select tq_transform() and pass the close price using OHLC format via x_fun = Cl, then send this subset of the data to the apply.quarterly function via the transform_fun argument. Looking at the documentation for apply.quarterly, we see that we can pass a function to the argument FUN. We want the maximum values, so we set FUN = max. The result is the quarters returned as a date and the maximum closing price during each quarter returned as a double.

AAPL %>%
    tq_transform(x_fun = Cl, transform_fun = apply.quarterly, FUN = max)
## # A tibble: 44 × 2
##          date  close
##        <dttm>  <dbl>
## 1  2006-03-31  85.59
## 2  2006-06-30  71.89
## 3  2006-09-29  77.61
## 4  2006-12-29  91.81
## 5  2007-03-30  97.10
## 6  2007-06-29 125.09
## 7  2007-09-28 154.50
## 8  2007-12-31 199.83
## 9  2008-03-31 194.93
## 10 2008-06-30 189.96
## # ... with 34 more rows

Note that, as an alternative, you could use the xy form, replacing x_fun = Cl with .x = close.
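The xy form of the same transformation looks like this:

```r
# Equivalent to tq_transform(x_fun = Cl, ...), but selecting
# the close column by name instead of by OHLC function
AAPL %>%
    tq_transform_xy(.x = close, transform_fun = apply.quarterly,
                    FUN = max)
```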

Example 2: Getting daily log returns

The quantmod::periodReturn() function generates returns by periodicity. We have a few options here. Normally I go with a transform function, tq_transform, because the periodReturn function accepts different periodicity options, and anything other than daily will blow up a mutation. But in our situation the period return periodicity is the same as the stock price periodicity (both daily), so we can use either. We want to use the adjusted closing price column (adjusted for stock splits, which can make it appear that a stock is performing poorly if a split is included), so we set x_fun = Ad. Researching the periodReturn function, we find that it accepts type = "log" and period = "daily", which returns the daily log returns.

AAPL %>%
    tq_transform(x_fun = Ad, transform_fun = periodReturn, 
                 type = "log", period = "daily")
## # A tibble: 2,769 × 2
##          date daily.returns
##        <dttm>         <dbl>
## 1  2006-01-03   0.000000000
## 2  2006-01-04   0.002938752
## 3  2006-01-05  -0.007900889
## 4  2006-01-06   0.025485766
## 5  2006-01-09  -0.003281888
## 6  2006-01-10   0.061328315
## 7  2006-01-11   0.036906288
## 8  2006-01-12   0.004637594
## 9  2006-01-13   0.015305252
## 10 2006-01-17  -0.010334731
## # ... with 2,759 more rows

Example 3: Adding MACD and Bollinger Bands to an OHLC data set

In reviewing the available options in the TTR package, we see that the MACD and BBands functions will get us where we need to be. Researching the documentation, the return is in the same periodicity as the input and the functions work with OHLC inputs, so we can use tq_mutate(). MACD requires a price, so we select the close price using Cl; BBands requires high, low, and close prices, so we use HLC. We can chain the calls together using the pipe (%>%) since mutate just adds columns. The result is a tibble containing the MACD and Bollinger Band results.

AAPL %>%
    tq_mutate(Cl, MACD) %>%
    tq_mutate(HLC, BBands)
## # A tibble: 2,769 × 13
##          date  open  high   low close    volume  adjusted  macd signal
##        <date> <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl>
## 1  2006-01-03 72.38 74.75 72.25 74.75 201808600  9.726565    NA     NA
## 2  2006-01-04 75.13 75.98 74.50 74.97 154900900  9.755191    NA     NA
## 3  2006-01-05 74.83 74.90 73.75 74.38 112355600  9.678420    NA     NA
## 4  2006-01-06 75.25 76.70 74.55 76.30 176114400  9.928252    NA     NA
## 5  2006-01-09 76.73 77.20 75.74 76.05 168760200  9.895722    NA     NA
## 6  2006-01-10 76.25 81.89 75.83 80.86 569967300 10.521606    NA     NA
## 7  2006-01-11 83.84 84.80 82.59 83.90 373448600 10.917174    NA     NA
## 8  2006-01-12 84.97 86.40 83.62 84.29 320202400 10.967921    NA     NA
## 9  2006-01-13 84.99 86.01 84.60 85.59 194076400 11.137079    NA     NA
## 10 2006-01-17 85.70 86.38 83.87 84.71 208905900 11.022573    NA     NA
## # ... with 2,759 more rows, and 4 more variables: dn <dbl>, mavg <dbl>,
## #   up <dbl>, pctB <dbl>

Note that for the MACD, we could have used tq_mutate_xy(), setting .x = close. However, for the BBands, we are forced to use tq_mutate() because of the HLC input.
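For intuition, the Bollinger Band arithmetic is an n-period simple moving average (mavg) plus and minus a multiple of the rolling standard deviation (the up and dn bands), with pctB locating the price within the band. The following is a simplified base-R sketch on toy data, not TTR::BBands itself (which defaults to n = 20 and sd = 2):

```r
# Simplified Bollinger Band sketch (toy data; TTR::BBands defaults to n = 20, sd = 2)
bbands_sketch <- function(price, n = 5, k = 2) {
    mavg <- as.numeric(stats::filter(price, rep(1 / n, n), sides = 1))  # n-period SMA
    sdev <- vapply(seq_along(price), function(i) {
        if (i < n) return(NA_real_)
        sd(price[(i - n + 1):i])  # rolling standard deviation
    }, numeric(1))
    dn   <- mavg - k * sdev
    up   <- mavg + k * sdev
    pctB <- (price - dn) / (up - dn)
    data.frame(price, dn, mavg, up, pctB)
}

bbands_sketch(c(10, 11, 12, 11, 10, 11, 13, 12, 11, 10))
```

As with the tq_mutate() output above, the first n - 1 rows are NA because a full window is not yet available.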

Example 4: Getting the Percentage Difference Between Open and Close from Zero to Five Periods

We can’t use the OpCl function for this task since it only returns the percentage difference for a period lag of zero. Digging further, we find the base Delt function from quantmod. Researching the function, we see that Delt takes one or two price inputs, a series of lags k, and a type of difference, either arithmetic or log. We set .x = open, .y = close, and k = 0:5 to get lags of zero through five periods. The default type = "arithmetic" is acceptable, so there is no need to specify it. The result is the percentage difference between the open and close prices for periods zero to five.

AAPL %>%
    tq_mutate_xy(.x = open, .y = close, mutate_fun = Delt, k = 0:5) %>%
    select(-c(high, low, volume, adjusted))
## # A tibble: 2,769 × 9
##          date  open close Delt.0.arithmetic Delt.1.arithmetic
##        <date> <dbl> <dbl>             <dbl>             <dbl>
## 1  2006-01-03 72.38 74.75      0.0327438653                NA
## 2  2006-01-04 75.13 74.97     -0.0021296021       0.035783351
## 3  2006-01-05 74.83 74.38     -0.0060135910      -0.009982657
## 4  2006-01-06 75.25 76.30      0.0139534485       0.019644528
## 5  2006-01-09 76.73 76.05     -0.0088622703       0.010631203
## 6  2006-01-10 76.25 80.86      0.0604590009       0.053825127
## 7  2006-01-11 83.84 83.90      0.0007155892       0.100327799
## 8  2006-01-12 84.97 84.29     -0.0080028479       0.005367330
## 9  2006-01-13 84.99 85.59      0.0070596538       0.007296705
## 10 2006-01-17 85.70 84.71     -0.0115518788      -0.003294505
## # ... with 2,759 more rows, and 4 more variables: Delt.2.arithmetic <dbl>,
## #   Delt.3.arithmetic <dbl>, Delt.4.arithmetic <dbl>,
## #   Delt.5.arithmetic <dbl>

For comparison, we’ll inspect the output of the OpCl() function using tq_mutate(). We send the OHLC prices to the OpCl function. As expected, the OpCl.. column returned is the same as Delt.0.arithmetic from above.

AAPL %>%
    tq_mutate(OHLC, OpCl) %>%
    select(-c(high, low, volume, adjusted))
## # A tibble: 2,769 × 4
##          date  open close        OpCl..
##        <date> <dbl> <dbl>         <dbl>
## 1  2006-01-03 72.38 74.75  0.0327438653
## 2  2006-01-04 75.13 74.97 -0.0021296021
## 3  2006-01-05 74.83 74.38 -0.0060135910
## 4  2006-01-06 75.25 76.30  0.0139534485
## 5  2006-01-09 76.73 76.05 -0.0088622703
## 6  2006-01-10 76.25 80.86  0.0604590009
## 7  2006-01-11 83.84 83.90  0.0007155892
## 8  2006-01-12 84.97 84.29 -0.0080028479
## 9  2006-01-13 84.99 85.59  0.0070596538
## 10 2006-01-17 85.70 84.71 -0.0115518788
## # ... with 2,759 more rows
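The equivalence is easy to verify by hand: for the default type = "arithmetic", Delt at lag k computes (close - lag(open, k)) / lag(open, k), which at k = 0 reduces to the open-to-close change that OpCl() returns. A quick base-R check against the first few AAPL rows shown above:

```r
open  <- c(72.38, 75.13, 74.83)   # first three AAPL opens from the output above
close <- c(74.75, 74.97, 74.38)   # first three AAPL closes from the output above

delt0 <- (close - open) / open    # k = 0: matches OpCl.. / Delt.0.arithmetic

open_lag1 <- c(NA, head(open, -1))
delt1 <- (close - open_lag1) / open_lag1  # k = 1: matches Delt.1.arithmetic

round(delt0, 6)
round(delt1, 6)
```

The values line up with the Delt.0.arithmetic and Delt.1.arithmetic columns above.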

Designed to be Used and Scaled with the tidyverse

Each function has one primary input and one output. This allows chaining operations with the pipe (%>%) and mapping with purrr to extend the analysis to lists of many stocks, exchange rates, metals, economic data, financial statements, and more. The rationale is simple: let the function handle the operation, and let the tidyverse handle the iteration.

Rather than explain, let’s go through a simple workflow using the tidyverse. We set up a two-step workflow:

  1. Analyze a single stock
  2. Scale to many stocks

Analyze a Single Stock

In our hypothetical situation, we want to compare the mean monthly log returns (MMLR). First, let’s come up with a function to help us collect log returns. The function below performs three operations internally. It first gets the stock prices using tq_get(). Then, it transforms the stock prices to period returns using tq_transform(). We add the type = "log" and period = "monthly" arguments to ensure we retrieve a tibble of monthly log returns. Last, we take the mean of the monthly returns to get MMLR.

my_stock_analysis_fun <- function(stock.symbol) {
    period.returns <- stock.symbol %>%
        tq_get(get = "stock.prices") %>%
        tq_transform(x_fun = Ad, transform_fun = periodReturn, 
                     type = "log", period = "monthly")
    mean(period.returns$monthly.returns)
}

And, let’s test it out. We now have the mean monthly log returns over the past ten years.

my_stock_analysis_fun("AAPL")
## [1] 0.01876649

Extrapolate to Many Stocks Using the tidyverse

Now that we have one stock down, we can scale to many stocks. For brevity, we’ll randomly sample ten stocks from the S&P500 with a call to dplyr::sample_n().

set.seed(100)
stocks <- tq_get("SP500", get = "stock.index") %>%
    sample_n(10)
stocks
## # A tibble: 10 × 2
##    symbol            company
##     <chr>              <chr>
## 1     EMC                EMC
## 2    XRAY      DENTSPLY INTL
## 3     MNK   MALLINCKRODT PLC
## 4     AIG      AMERICAN INTL
## 5    INTC              INTEL
## 6     IVZ            INVESCO
## 7      SE     SPECTRA ENERGY
## 8    FLIR       FLIR SYSTEMS
## 9       L              LOEWS
## 10    CNP CENTERPOINT ENERGY

We can now apply our analysis function to the stocks using dplyr::mutate and purrr::map_dbl. The mutate() function adds a column to our tibble, and the map_dbl() function maps our my_stock_analysis_fun to our tibble of stocks using the symbol column.

stocks <- stocks %>%
    mutate(mmlr = map_dbl(symbol, my_stock_analysis_fun)) %>%
    arrange(desc(mmlr))
stocks
## # A tibble: 10 × 3
##    symbol            company         mmlr
##     <chr>              <chr>        <dbl>
## 1    FLIR       FLIR SYSTEMS  0.009359061
## 2     CNP CENTERPOINT ENERGY  0.008680819
## 3     IVZ            INVESCO  0.007007588
## 4      SE     SPECTRA ENERGY  0.006579937
## 5    XRAY      DENTSPLY INTL  0.006211669
## 6     EMC                EMC  0.006210331
## 7    INTC              INTEL  0.005163667
## 8       L              LOEWS  0.003371969
## 9     MNK   MALLINCKRODT PLC  0.002366373
## 10    AIG      AMERICAN INTL -0.021078924

And, we’re done! We now have the MMLR over ten years of stock data for ten stocks, and we can easily extend the analysis to larger lists or entire stock indexes. For example, the entire S&P500 could be analyzed by removing the sample_n() call following tq_get("SP500", get = "stock.index").

Function tq_get() Designed to Handle Errors Gracefully

Eventually you will run into a stock index, stock symbol, FRED data code, etc. that cannot be retrieved. Possible reasons are:

  • The website changes
  • An index becomes out of date
  • A company goes private
  • A stock ticker symbol changes
  • Yahoo / FRED just doesn’t like your stock symbol / FRED code

This becomes painful when scaling if the functions return errors. So, tq_get() is designed to handle errors gracefully: when an error is generated, an NA value is returned along with a gentle warning. There are pros and cons to this approach that you may not agree with, but I believe it helps in the long run. Just be aware of what happens:

  • Pros: Long running scripts are not interrupted because of one error

  • Cons: Errors flow downstream if not looking at warnings and not reviewing results
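The pattern behind this behavior is a tryCatch() that converts the error into a warning and returns NA. Here is a minimal sketch of the idea, using a made-up fetch_prices() stand-in for the real download (this is not tidyquant's internal code):

```r
# Hypothetical downloader standing in for the real web request
fetch_prices <- function(symbol) {
    if (symbol == "BAD APPLE") stop("no data for symbol")
    data.frame(symbol = symbol, price = 100)  # pretend result
}

get_gracefully <- function(symbol) {
    tryCatch(
        fetch_prices(symbol),
        error = function(e) {
            warning("Error at stock symbol ", symbol, ". Returning NA.")
            NA  # downstream code can detect failures by their "logical" class
        }
    )
}

get_gracefully("AAPL")       # returns the pretend data frame
get_gracefully("BAD APPLE")  # warns and returns NA
```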

With tq_get(), Bad Apples Fail Gracefully:

Let’s see an example of mapping tq_get() over a long list of stocks with one BAD APPLE.

stock_list_with_one_bad_apple <- tibble( 
    symbol = c("AAPL", "GOOG", "AMZN", "FB", "BAD APPLE",
               "AVGO", "SWKS","NVDA", "V", "MA")
)
stock_list_with_one_bad_apple <- stock_list_with_one_bad_apple %>%
    mutate(stock.prices = map(.x = symbol, ~ tq_get(.x, get = "stock.prices")))
## Warning in value[[3L]](cond): Error at stock symbol BAD APPLE during call
## to quantmod::getSymbols.

We get warned that there was an issue in the operation. With that said, we still get the full list of stocks.

stock_list_with_one_bad_apple
## # A tibble: 10 × 2
##       symbol         stock.prices
##        <chr>               <list>
## 1       AAPL <tibble [2,769 × 7]>
## 2       GOOG <tibble [2,769 × 7]>
## 3       AMZN <tibble [2,769 × 7]>
## 4         FB <tibble [1,163 × 7]>
## 5  BAD APPLE            <lgl [1]>
## 6       AVGO <tibble [1,865 × 7]>
## 7       SWKS <tibble [2,769 × 7]>
## 8       NVDA <tibble [2,769 × 7]>
## 9          V <tibble [2,214 × 7]>
## 10        MA <tibble [2,670 × 7]>

Say, hypothetically, we didn’t recognize the error message. An error then shows up during the next operation. As an example, we’ll attempt to get yearly period returns using tq_transform(). The operation is wrapped in a tryCatch() statement so we can print the error message.

tryCatch({
    stock_list_with_one_bad_apple %>%
    mutate(annual.returns = map(.x = stock.prices, 
                                ~ tq_transform(.x,
                                               x_fun = Ad, 
                                               transform_fun = periodReturn, 
                                               period = "yearly")
                                )
           )
}, error = function(e) {
    print(e)
})
## <Rcpp::eval_error in eval(substitute(expr), envir, enclos): `data` must be a tibble or data.frame.>

The operation grinds to a halt because the BAD APPLE row passed its stock.prices value of NA to the tq_transform() function. The error message tells us that data is not a tibble or data.frame.

The rationale behind the error-handling approach is that long-running scripts should not fail because of minor issues. For example, if you have a list of 3000 stocks and the 3000th is bad, the program could take 20+ minutes to fail, which is disheartening. We allow tq_get() to continue fetching data even if an error is encountered. Failure occurs during tq_transform() and tq_mutate() to prevent errors from propagating too far downstream.

Recognizing how tq_get() works (and gracefully fails), we can adjust our workflow. It’s a good idea to collect stock information in one independent step, review any warnings / errors, and remove “bad apples” if present before moving on to any transformations or mutations.

Here’s an example of a good workflow:

stock_list_with_one_bad_apple <- tibble( 
    symbol = c("AAPL", "GOOG", "AMZN", "FB", "BAD APPLE",
               "AVGO", "SWKS","NVDA", "V", "MA")
    ) %>%
    # Step 1: Get stock prices
    mutate(stock.prices = map(.x = symbol, ~ tq_get(.x, get = "stock.prices")),
           class = map_chr(stock.prices, ~ class(.x)[[1]])) %>%
    # Step 2: Filter out errors; errors have a class of "logical"
    filter(class != "logical") %>%
    select(-class) %>%
    # Step 3: Perform period returns
    mutate(annual.returns = map(.x = stock.prices, 
                                ~ tq_transform(.x,
                                               x_fun = Ad, 
                                               transform_fun = periodReturn, 
                                               period = "yearly")
                                )
           )
stock_list_with_one_bad_apple
## # A tibble: 9 × 3
##   symbol         stock.prices    annual.returns
##    <chr>               <list>            <list>
## 1   AAPL <tibble [2,769 × 7]> <tibble [11 × 2]>
## 2   GOOG <tibble [2,769 × 7]> <tibble [11 × 2]>
## 3   AMZN <tibble [2,769 × 7]> <tibble [11 × 2]>
## 4     FB <tibble [1,163 × 7]>  <tibble [5 × 2]>
## 5   AVGO <tibble [1,865 × 7]>  <tibble [8 × 2]>
## 6   SWKS <tibble [2,769 × 7]> <tibble [11 × 2]>
## 7   NVDA <tibble [2,769 × 7]> <tibble [11 × 2]>
## 8      V <tibble [2,214 × 7]>  <tibble [9 × 2]>
## 9     MA <tibble [2,670 × 7]> <tibble [11 × 2]>
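The filter in Step 2 works because a failed lookup returns NA, whose first class is "logical", while successful lookups return tibbles. A toy base-R illustration of the same check (a plain list and data frame, not the tibbles above):

```r
# Successes carry data-frame-like classes; failures are plain NA ("logical")
results <- list(good = data.frame(x = 1), bad = NA)
vapply(results, function(x) class(x)[[1]], character(1))
# "data.frame" for good, "logical" for bad
```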

Fall Back for Stock Indexes:

There’s a fallback for the stock indexes too. Since the source, www.marketvolume.com, could change over time, an option is provided to pull stored data within the tidyquant package. The downside is that the data is only as accurate as the last update to tidyquant. Here’s how to get the stock indexes locally if for some reason the website is down or has changed.

tq_get("SP500", get = "stock.index", use_fallback = TRUE)
## Using fallback dataset last downloaded 2016-12-23.
## # A tibble: 501 × 2
##    symbol                   company
##     <chr>                     <chr>
## 1     MMM                        3M
## 2     ABT       ABBOTT LABORATORIES
## 3    ABBV                ABBVIE INC
## 4     ACN                 ACCENTURE
## 5    ATVI       ACTIVISION BLIZZARD
## 6     AYI             ACUITY BRANDS
## 7    ADBE             ADOBE SYSTEMS
## 8     AAP        ADVANCE AUTO PARTS
## 9     AET                     AETNA
## 10    AMG AFFILIATED MANAGERS GROUP
## # ... with 491 more rows

Recap

Hopefully you now see how tidyquant integrates the best quantitative financial analysis packages with the tidyverse. With a few easy-to-use core functions, you can efficiently leverage the quantitative power of xts, quantmod, and TTR with the data management infrastructure and scalability of the tidyverse.