rucrdtw provides R bindings for functions from the UCR Suite (Rakthanmanon et al. 2012), which enables ultrafast subsequence search under both Dynamic Time Warping and Euclidean Distance.
Install rucrdtw from GitHub:
install.packages("devtools")
devtools::install_github("pboesu/rucrdtw")
Load the rucrdtw package:
library("rucrdtw")
Create a long random time series:
set.seed(123)
rwalk <- cumsum(runif(1e7, min = -0.5, max = 0.5))
Pick a random subsequence of 100 elements as a query:
qlength <- 100
qstart <- sample(length(rwalk) - qlength + 1, 1)  # keep the whole query inside rwalk
query <- rwalk[qstart:(qstart + qlength - 1)]      # exactly qlength elements
Since both query and data are R vectors, we use the vector-vector methods for the search.
system.time(dtw_search <- ucrdtw_vv(data = rwalk, query = query, qlength = qlength, dtwwindow = 0.05))
## user system elapsed
## 0.97 0.00 1.00
all.equal(qstart, dtw_search$location)
## [1] TRUE
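The dtwwindow argument limits how far the warping path may stray from the diagonal. The following base R sketch shows the idea under stated assumptions: dtw_band is a hypothetical helper, not part of rucrdtw, and it assumes the window is a fraction of the series length that defines a symmetric band, with squared pointwise costs and a final square root. See ?ucrdtw_vv for the package's actual definition.

```r
# Hypothetical helper, not part of rucrdtw: DTW distance restricted to a band.
dtw_band <- function(x, y, window = 0.05) {
  n <- length(x)
  stopifnot(length(y) == n)          # equal lengths, as in subsequence search
  r <- floor(window * n)             # band half-width in elements (assumption)
  cost <- matrix(Inf, n + 1, n + 1)  # cumulative cost with a padded border
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    # only cells within r steps of the diagonal are ever filled in
    for (j in max(1, i - r):min(n, i + r)) {
      d <- (x[i] - y[j])^2
      cost[i + 1, j + 1] <- d + min(cost[i, j], cost[i, j + 1], cost[i + 1, j])
    }
  }
  sqrt(cost[n + 1, n + 1])
}

x <- sin(seq(0, 2 * pi, length.out = 100))
y <- sin(seq(0, 2 * pi, length.out = 100) + 0.2)  # same shape, slightly shifted
dtw_band(x, y, window = 0)     # zero-width band: reduces to Euclidean distance
dtw_band(x, y, window = 0.05)  # a wider band can only lower the distance
```

With a zero-width band only the diagonal is allowed, so the result equals the Euclidean distance. Widening the band adds admissible warping paths and can therefore never increase the distance.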
system.time(ed_search <- ucred_vv(data = rwalk, query = query, qlength = qlength))
## user system elapsed
## 1.82 0.01 1.86
all.equal(qstart, ed_search$location)
## [1] TRUE
And in a matter of seconds we have searched 10 million data points and rediscovered our query!
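For intuition about what the search computes, here is a brute-force sketch in base R. It is not the package's algorithm: znorm and naive_ed_search are hypothetical helpers, and the sketch assumes every candidate window and the query are z-normalised before comparison (sd() uses the sample standard deviation, which may differ from the package's choice but does not change which window wins). It visits every window with no pruning, so it is only practical on short series.

```r
# Hypothetical helpers, not part of rucrdtw.
znorm <- function(x) (x - mean(x)) / sd(x)

naive_ed_search <- function(data, query) {
  m <- length(query)
  q <- znorm(query)
  starts <- seq_len(length(data) - m + 1)   # every possible window start
  dists <- vapply(starts, function(s) {
    sqrt(sum((znorm(data[s:(s + m - 1)]) - q)^2))
  }, numeric(1))
  list(location = which.min(dists), distance = min(dists))
}

set.seed(1)
small_walk <- cumsum(runif(5000, min = -0.5, max = 0.5))
small_query <- small_walk[1201:1300]        # plant a query at a known position
naive_ed_search(small_walk, small_query)$location
```

The query was cut from index 1201, so that is the location to expect, with a distance of zero. The cost grows with the product of data length and query length, which is why this sketch is run on 5,000 points and not on the 10 million used above.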
Searching for an exact match, however, is somewhat artificial. The real power of the similarity search is finding structurally similar subsequences in complex sets of time series. To demonstrate this, we load an example data set:
data("synthetic_control")
This data set contains 600 time series of length 60 from 6 classes (Alcock et al. 1999). The data set documentation, which can be displayed with the command ?synthetic_control, contains further information about these data. We can plot an example of each class:
par(mfrow = c(3, 2),
    mar = c(1, 1, 1, 1))
classes <- c("Normal", "Cyclic", "Increasing", "Decreasing", "Upward shift", "Downward shift")
for (i in 1:6) {
  # rows 1, 101, ..., 501 are the first series of each class
  plot(synthetic_control[i * 100 - 99, ], type = "l", xaxt = "n", yaxt = "n",
       ylab = "", xlab = "", bty = "n", main = classes[i])
}
Since we are now comparing a query against a set of time series, we only need to do comparisons for non-overlapping data sequences. The vector-matrix methods ucrdtw_vm and ucred_vm provide this functionality.
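As a rough picture of a row-wise comparison, the base R sketch below scores a query against each row of a matrix separately. nearest_row is a hypothetical helper, not part of rucrdtw, and the toy matrix is made up. The sketch assumes Euclidean distance on z-normalised series, which makes the match insensitive to offset and scale.

```r
# Hypothetical helper, not part of rucrdtw.
nearest_row <- function(mat, query) {
  znorm <- function(x) (x - mean(x)) / sd(x)
  q <- znorm(query)
  # one distance per row: each series is compared as a whole, never across rows
  dists <- apply(mat, 1, function(row) sqrt(sum((znorm(row) - q)^2)))
  list(location = which.min(dists), distance = min(dists))
}

set.seed(42)
toy <- matrix(rnorm(20 * 60), nrow = 20)  # 20 made-up series of length 60
toy_query <- 3 * toy[7, ] + 10            # row 7, rescaled and shifted
nearest_row(toy, toy_query)$location
```

Because both sides are z-normalised, the rescaled and shifted copy of row 7 still matches row 7 with a distance of essentially zero.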
We can demonstrate this by removing a query from the data set, and then searching for a closest match:
index <- 600
query <- synthetic_control[index,]
# search all remaining rows for the closest match to the held-out query
dtw_search <- ucrdtw_vm(synthetic_control[-index, ], query, length(query), 0.05, byrow = TRUE)
ed_search <- ucred_vm(synthetic_control[-index, ], query, length(query), byrow = TRUE)
And plot the results:
plot(synthetic_control[dtw_search$location,], type="l", ylim=c(0,55), ylab="")
lines(query, col="red")
lines(synthetic_control[ed_search$location,], col="blue", lty=3, lwd=3)
legend("topright", legend = c("query", "DTW match", "ED match"), col=c("red", "black", "blue"), lty=c(1,1,3), bty="n")
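To check whether a match comes from the same class as the query, a row index can be mapped back to its class label. The sketch below assumes the 600 rows are stored as six consecutive blocks of 100, in the order used for the class plots above. row_class is a hypothetical helper, not part of rucrdtw.

```r
classes <- c("Normal", "Cyclic", "Increasing", "Decreasing",
             "Upward shift", "Downward shift")

# Hypothetical helper: map row indices (1..600) to class labels,
# assuming six consecutive blocks of 100 rows each.
row_class <- function(row) classes[ceiling(row / 100)]

row_class(c(1, 100, 101, 600))
## [1] "Normal"         "Normal"         "Cyclic"         "Downward shift"
```

Because the held-out query is the last row, row numbers in synthetic_control[-index, ] coincide with those of the full matrix, so row_class(dtw_search$location) and row_class(ed_search$location) can be compared directly with row_class(index).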
Alcock, R. J., and Y. Manolopoulos. 1999. “Time-Series Similarity Queries Employing a Feature-Based Approach.” In 7th Hellenic Conference on Informatics, Ioannina, 27–29.
Rakthanmanon, Thanawin, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. 2012. “Searching and Mining Trillions of Time Series Subsequences Under Dynamic Time Warping.” In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 262–70. ACM. doi:10.1145/2339530.2339576.