---
title: "ympes"
output:
  litedown::html_format:
    options:
      embed_resources: ["all"]
vignette: >
  %\VignetteEngine{litedown::vignette}
  %\VignetteIndexEntry{ympes}
  %\VignetteEncoding{UTF-8}
---

ympes provides a collection of lightweight helper functions (imps) both for
interactive use and for inclusion within other packages. It's my attempt to save
some functionality that would otherwise get lost in a script somewhere on my
computer. To that end it's a bit of a hodgepodge of things that I've found
useful at one time or another and, more importantly, remembered to include here!

```{r}
library(ympes)
```

## Visualising palettes

I often want to quickly see what a palette looks like to ensure I can
distinguish the different colours. The imaginatively named `plot_palette()`
provides a quick overview:

```{r}
#| fig.alt = "A plot with 3 rectangular regions, coloured green, red and black."
plot_palette(c("#5FE756", "red", "black"))
```

We can make the plot square(ish) by setting the argument `square = TRUE`. A nice
side effect of this is that the labels are automatically adjusted to account for
the underlying colour:

```{r}
#| fig.alt = "A plot of the 8 colours that define the 'R4' palette. The plot is
#|   divided into a 3 by 3 square (one square is left blank)."
plot_palette(palette.colors(palette = "R4"), square = TRUE)
```

## Finding strings

Sometimes you just want to find rows of a data frame where a particular string
occurs. `greprows()` searches for pattern matches within a data frame's columns
and returns the related rows or row indices. It is a thin wrapper around a
`grep()`-based approach built from subset, lapply and reduce steps.

```{r}
dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
greprows(dat, "A|b")
```

`grepvrows()` is identical to `greprows()` except that it defaults to
`value = TRUE`.

```{r}
grepvrows(dat, "A|b")
greprows(dat, "A|b", value = TRUE)
```

`greplrows()` returns a logical vector (match or not for each row of `dat`).

```{r}
greplrows(dat, "A|b", ignore.case = TRUE)
```

## Capturing strings

One of my favourite functions in R is `strcapture()`. This function allows you
to extract the captured elements of a regular expression into a tabular data
structure. Being able to parse input strings from a file and split them
correctly into the columns of a data frame in a single function call feels so
elegant.

To illustrate this, we generate some synthetic movement data which we pretend to
have loaded from a file. Each entry has the form "Name-Direction-Value", with
the first two fields being character strings and the last an integer value.

```{r}
movements <- function(length) {
    x <- lapply(
        list(c("Bob", "Mary", "Rose"), c("Up", "Down", "Right", "Left"), 1:10),
        sample,
        size = length,
        replace = TRUE
    )
    do.call(paste, c(x, sep = "-"))
}

# just a small sample to begin with
(dat <- movements(3))
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto <- data.frame(Name = "", Direction = "", Value = 1L)
strcapture(pattern, dat, proto = proto, perl = TRUE)
```

For small (define as you wish) data sets this works fine. Unfortunately, as the
number of entries increases the performance decays (see
https://bugs.r-project.org/show_bug.cgi?id=18728 for a more detailed analysis).
`fstrcapture()` attempts to improve upon this by utilising an approach I saw
implemented by Toby Hocking in the [nc](https://cran.r-project.org/package=nc)
package, in the function `nc::capture_first_vec()`.

```{r}
# Now a larger number of strings
dat <- movements(1e5)
(t <- system.time(r <- strcapture(pattern, dat, proto = proto, perl = TRUE)))
(t2 <- system.time(r2 <- fstrcapture(dat, pattern, proto = proto)))
t[["elapsed"]] / t2[["elapsed"]]
```

As well as the improved performance, you will notice two other differences
between the two function signatures. Firstly, to make things more pipeable, the
data parameter `x` appears before the `pattern` parameter.
Secondly, `fstrcapture()` works only with Perl-compatible regular expressions.

## Combining values for lazy people

`cc()` is for those of us that get fed up typing quotation marks. It accepts
either comma-separated, unquoted names that you wish to quote or a length one
character vector that you wish to split by whitespace. Intended mainly for
interactive use, an example is likely more enlightening than my description:

```{r}
cc(dale, audrey, laura, hawk)
cc("dale audrey laura hawk")
```
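
For the curious, a helper along these lines can be put together with base R's
`substitute()` and `deparse()`. The following is only a minimal sketch: the name
`cc_sketch()` is hypothetical, and it is not necessarily how ympes implements
`cc()`. It assumes just the two input styles described above (bare names, or a
single string literal).

```{r}
# Hypothetical stand-in for illustration only; not the ympes implementation.
cc_sketch <- function(...) {
    # Capture the arguments without evaluating them
    args <- as.list(substitute(list(...)))[-1L]
    if (length(args) == 1L && is.character(args[[1L]])) {
        # A single string literal: split it on runs of whitespace
        strsplit(trimws(args[[1L]]), "[[:space:]]+")[[1L]]
    } else {
        # Unquoted names: turn each captured expression into a string
        vapply(args, deparse, character(1L), USE.NAMES = FALSE)
    }
}

cc_sketch(dale, audrey, laura, hawk)
cc_sketch("dale audrey laura hawk")
```

Both calls return the same four-element character vector. Because the arguments
are captured rather than evaluated, the sketch never looks up `dale` or `audrey`
as variables, which is what makes the unquoted form possible.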