---
title: "ympes"
output:
    litedown::html_format:
        options:
            embed_resources: ["all"]
        
vignette: >
  %\VignetteEngine{litedown::vignette}
  %\VignetteIndexEntry{ympes}
  %\VignetteEncoding{UTF-8}
---

ympes provides a collection of lightweight helper functions (imps) both for
interactive use and for inclusion within other packages. It's my attempt to save
some functionality that would otherwise get lost in a script somewhere on my
computer. To that end it's a bit of a hodgepodge of things that I've
found useful at one time or another and, more importantly, remembered to include
here!

```{r}
library(ympes)
```

## Visualising palettes

I often want to quickly see what a palette looks like to ensure I can
distinguish the different colours. The imaginatively named `plot_palette()`
thus provides a quick overview.

```{r}
#| fig.alt = "A plot with 3 rectangular regions, coloured green, red and black."
plot_palette(c("#5FE756", "red", "black"))
```

We can make the plot square(ish) by setting the argument `square = TRUE`. A nice
side effect of this is that the labels are automatically adjusted to account for
the underlying colour.

```{r}
#| fig.alt = "A plot of the 8 colours that define the 'R4' palette. The plot is
#|            divided into a 3 by 3 square (one square is left blank)."
plot_palette(palette.colors(palette = "R4"), square = TRUE)
```
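
As a rough illustration of how labels can be adapted to the colour beneath them,
here is a minimal sketch using only base R. It is not necessarily how
`plot_palette()` works internally; the helper name `label_colour()`, the luma
weights and the 0.5 threshold are all assumptions made for the example.

```{r}
# Hypothetical helper (not part of ympes): pick a black or white label
# depending on how bright each fill colour is.
label_colour <- function(cols) {
    # col2rgb() returns a 3-row matrix (red, green, blue) on a 0-255 scale
    channels <- grDevices::col2rgb(cols) / 255
    # Rec. 601 luma weights as a rough measure of perceived brightness
    luma <- 0.299 * channels["red", ] +
        0.587 * channels["green", ] +
        0.114 * channels["blue", ]
    # Assumed threshold: dark text on bright fills, light text on dark fills
    unname(ifelse(luma > 0.5, "black", "white"))
}

label_colour(c("#5FE756", "red", "black"))
```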

## Finding strings

Sometimes you just want to find rows of a data frame where a particular string
occurs. `greprows()` searches for pattern matches within a data frame's columns
and returns the related rows or row indices. It is a thin wrapper around a
`grep()`-based approach built from subsetting, `lapply()` and `Reduce()`.

```{r}
dat <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)
greprows(dat, "A|b")
```

`grepvrows()` is identical to `greprows()` except that `value = TRUE` is the
default.

```{r}
grepvrows(dat, "A|b")
greprows(dat,  "A|b", value = TRUE)
```

`greplrows()` returns a logical vector (match or not for each row of `dat`).

```{r}
greplrows(dat, "A|b", ignore.case = TRUE)
```
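
The subset, `lapply()` and `Reduce()` idea behind these functions can be
sketched in a few lines of base R. This is only an illustration under my own
assumptions, not the package's actual code: the function `greprows_sketch()`,
its `cols` argument and the `demo` data frame are hypothetical names used for
the example.

```{r}
# Hypothetical sketch (not the ympes implementation): search every selected
# column with grep() and combine the matching row indices.
greprows_sketch <- function(dat, pattern, cols = names(dat), ...) {
    # subset the columns of interest, then grep() each one in turn
    idx <- lapply(dat[cols], grep, pattern = pattern, ...)
    # reduce the per-column indices to a single sorted set of rows
    sort(Reduce(union, idx))
}

demo <- data.frame(
    first = letters,
    second = factor(rev(LETTERS)),
    third = "Q"
)

# row indices, then the rows themselves
(rows <- greprows_sketch(demo, "A|b"))
demo[rows, , drop = FALSE]
```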

## Capturing strings

One of my favourite functions in R is `strcapture()`. This function allows you
to extract the captured elements of a regular expression into a tabular data
structure. Being able to parse input strings from a file and split them
correctly into the columns of a data frame in a single function call feels so
elegant.

To illustrate this, we generate some synthetic movement data which we pretend
to have loaded in from a file. Each entry has the form "Name-Direction-Value",
with the first two entries representing character strings and the last entry
an integer value.

```{r}
movements <- function(length) {
    x <- lapply(
        list(c("Bob", "Mary", "Rose"), c("Up", "Down", "Right", "Left"), 1:10),
        sample,
        size = length,
        replace = TRUE
    )
    do.call(paste, c(x, sep = "-"))
}

# just a small sample to begin with
(dat <- movements(3))
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto   <- data.frame(Name = "", Direction = "", Value = 1L)
strcapture(pattern, dat, proto = proto, perl = TRUE)
```

For small (define as you wish) data sets this works fine. Unfortunately, as the
number of entries increases, the performance decays (see
https://bugs.r-project.org/show_bug.cgi?id=18728 for a more detailed analysis).
`fstrcapture()` attempts to improve upon this by utilising an approach I saw
implemented by Toby Hocking in the [nc](https://cran.r-project.org/package=nc)
package, in the function `nc::capture_first_vec()`.

```{r}
# Now a larger number of strings
dat <- movements(1e5)
(t  <- system.time(r <- strcapture(pattern, dat, proto = proto, perl = TRUE)))
(t2 <- system.time(r2 <- fstrcapture(dat, pattern, proto = proto)))
t[["elapsed"]] / t2[["elapsed"]]
```

As well as the improved performance, you will notice two other differences
between the two function signatures. Firstly, to make things more pipeable, the
data parameter `x` appears before the `pattern` parameter. Secondly,
`fstrcapture()` works only with Perl-compatible regular expressions.
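
To give a flavour of how a vectorised capture can be done in base R, here is a
minimal sketch built on `regexpr(perl = TRUE)`, which records the position of
each capture group in the `capture.start` and `capture.length` attributes. It is
my own simplified illustration and not necessarily what `fstrcapture()` or
`nc::capture_first_vec()` do internally; `capture_sketch()` and the `demo_*`
objects are hypothetical names.

```{r}
# Hypothetical sketch: extract capture groups for all strings at once.
capture_sketch <- function(x, pattern, proto) {
    m <- regexpr(pattern, x, perl = TRUE)
    # one row per string, one column per capture group
    start <- attr(m, "capture.start")
    len <- attr(m, "capture.length")
    out <- lapply(seq_len(ncol(start)), function(j) {
        # substring() is vectorised over the strings and both positions
        value <- substring(x, start[, j], start[, j] + len[, j] - 1L)
        # strings that did not match at all become missing values
        value[m == -1L] <- NA_character_
        value
    })
    # coerce each captured column to the class of the matching proto column
    out <- Map(
        function(value, template) methods::as(value, class(template)[1L]),
        out,
        proto
    )
    names(out) <- names(proto)
    as.data.frame(out, stringsAsFactors = FALSE)
}

demo_pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
demo_proto <- data.frame(Name = "", Direction = "", Value = 1L)
capture_sketch(c("Bob-Up-3", "Mary-Left-10"), demo_pattern, demo_proto)
```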

## Combining values for lazy people

`cc()` is for those of us that get fed up typing quotation marks. It accepts
either comma-separated, unquoted names that you wish to quote or a length-one
character vector that you wish to split by whitespace. It is intended mainly
for interactive use, and an example is likely more enlightening than my
description.

```{r}
cc(dale, audrey, laura, hawk)
cc("dale audrey laura hawk")
```
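
For the curious, this kind of quoting can be achieved with non-standard
evaluation. The following is a minimal sketch of one way to do it, assuming
nothing about how `cc()` is actually written; `cc_sketch()` is a hypothetical
name used only here.

```{r}
# Hypothetical sketch (not the ympes implementation) of quoting unquoted names.
cc_sketch <- function(...) {
    # capture the arguments as unevaluated expressions
    args <- as.list(substitute(list(...)))[-1L]
    # a single string is split on runs of whitespace
    if (length(args) == 1L && is.character(args[[1L]])) {
        return(strsplit(trimws(args[[1L]]), "[[:space:]]+")[[1L]])
    }
    # otherwise each unquoted name is turned back in to text
    vapply(args, deparse, character(1L))
}

cc_sketch(dale, audrey, laura, hawk)
cc_sketch("dale audrey laura hawk")
```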