When building a web service, it is desirable to save commonly requested products in a cache directory to avoid wasting time reproducing them unnecessarily. Because the cache has finite disk space allocated to it, it should be routinely purged of old or outdated files to make room for new ones. The manageCache() utility function simplifies this process.

A Product Cache Example

Let's first make a cache directory and put some data products in it.

# Create a cache directory
CACHE_DIR <- file.path(tempdir(), 'cache')
if ( !dir.exists(CACHE_DIR) ) {
  dir.create(CACHE_DIR)
}

# Add a few files to the cache
write.csv(matrix(1,400,500), file=file.path(CACHE_DIR,'m1.csv'))
Sys.sleep(1) # wait a bit between each to give them different mtimes
write.csv(matrix(2,400,500), file=file.path(CACHE_DIR,'m2.csv'))
Sys.sleep(1)
write.csv(matrix(3,400,500), file=file.path(CACHE_DIR,'m3.csv'))
Sys.sleep(1)
write.csv(matrix(4,400,500), file=file.path(CACHE_DIR,'m4.csv'))

We can look in our new cache directory and see the four files we just added. The directory contains about 1.6 MB of data.

cachedFiles <- list.files(CACHE_DIR, full.names = TRUE)
infoDF <- file.info(cachedFiles)
cacheSize <- sum(infoDF$size) / 1e6 # in MB
print(list.files(CACHE_DIR))
#> [1] "m1.csv" "m2.csv" "m3.csv" "m4.csv"
sprintf("Cache size = %s MB", cacheSize)
#> [1] "Cache size = 1.622748 MB"

To simulate file requests, let's read two of the files to update their access times.

# Access two of the files, updating their atime
invisible( read.csv(file.path(CACHE_DIR, 'm1.csv')) )
invisible( read.csv(file.path(CACHE_DIR, 'm2.csv')) )
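
Whether a read actually changes the access time depends on the operating system and on how the filesystem is mounted (options such as noatime or relatime can suppress or delay the update). The following is a minimal sketch for checking this on your own system; the names `probe`, `before`, `after` and `atimeUpdated` are illustrative and not part of MazamaCoreUtils.

```r
# Create a small throwaway file (illustrative; not part of the cache example)
probe <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), probe, row.names = FALSE)

# Record the access time, wait, read the file, and record it again
before <- file.info(probe)$atime
Sys.sleep(1)
invisible( read.csv(probe) )
after <- file.info(probe)$atime

# TRUE if this filesystem updated atime on read, FALSE otherwise
atimeUpdated <- after > before
print(atimeUpdated)
```

If this prints FALSE, sorting by access time may not reflect recent reads on that system.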

Now, let's use manageCache() to get our cache down to 1 MB.

# Use manageCache() to get cache to 1 MB
library(MazamaCoreUtils)
manageCache(CACHE_DIR, extensions = 'csv', maxCacheSize = 1)

When we check our cache again, we will see that the two files with the oldest access times (m3.csv and m4.csv, which were never read) are gone and the cache size is now under 1 MB.

# Check cache contents and total size again
cachedFiles <- list.files(CACHE_DIR, full.names = TRUE)
infoDF <- file.info(cachedFiles)
cacheSize <- sum(infoDF$size) / 1e6 # in MB
print(list.files(CACHE_DIR))
#> [1] "m1.csv" "m2.csv"
sprintf("Cache size = %s MB", cacheSize)
#> [1] "Cache size = 0.811374 MB"
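
To make the purging behavior concrete, here is a rough sketch of how a size-based purge could work in base R: sort files from oldest to newest and remove the oldest ones until the remaining total fits under the limit. The function `purgeOldest()` and its arguments are hypothetical and written only for illustration; this is not the implementation used by manageCache(). The demo sets modification times explicitly with Sys.setFileTime() and sorts by "mtime" so that the result does not depend on how the filesystem handles access times.

```r
# Hypothetical helper: a sketch of size-based purging, NOT the package's code
purgeOldest <- function(cacheDir, pattern = "\\.csv$", maxSizeMB = 1, sortBy = "atime") {
  files <- list.files(cacheDir, pattern = pattern, full.names = TRUE)
  if ( length(files) == 0 ) return(invisible(character(0)))
  info <- file.info(files)
  # Nothing to do if the cache already fits under the limit
  if ( sum(info$size) / 1e6 <= maxSizeMB ) return(invisible(character(0)))
  # Oldest files first, according to the chosen timestamp column
  info <- info[order(info[[sortBy]]), , drop = FALSE]
  # Cache size (MB) that would remain after removing files 1..i
  remainingMB <- (sum(info$size) - cumsum(info$size)) / 1e6
  # Smallest number of removals that brings the cache under the limit
  nRemove <- which(remainingMB <= maxSizeMB)[1]
  toRemove <- rownames(info)[seq_len(nRemove)]
  file.remove(toRemove)
  invisible(toRemove)
}

# Demo: four files of about 1 KB each, with f1 the oldest and f4 the newest
demoDir <- file.path(tempdir(), "purge_demo")
dir.create(demoDir, showWarnings = FALSE)
demoFiles <- file.path(demoDir, paste0("f", 1:4, ".csv"))
for ( i in seq_along(demoFiles) ) {
  writeLines(strrep("x", 1000), demoFiles[i])
  Sys.setFileTime(demoFiles[i], Sys.time() - (5 - i) * 3600)
}

# Keep at most 2500 bytes, removing the least recently modified files first
removed <- purgeOldest(demoDir, maxSizeMB = 0.0025, sortBy = "mtime")
print(basename(removed))
print(list.files(demoDir))
```

Removing whole files from the oldest end means the cache can end up well below the limit, as in the example above, where deleting two of four files took the cache from about 1.6 MB to about 0.8 MB.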

Other Use Cases

For a product cache, the typical behavior is to sort files by last access time, so manageCache() uses sortBy = "atime" by default. It is also possible to sort by modification time ("mtime") or change time ("ctime").

A use case for sortBy = "mtime" might involve files that are considered stale if their contents have not been updated recently.
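
As an illustration of what "stale by modification time" can mean, the following base R sketch lists files whose contents have not been modified in the last 24 hours. The directory, file names and 24-hour threshold are assumptions chosen for the example, and the code does not call manageCache().

```r
# Illustrative directory with one fresh file and one artificially aged file
staleDir <- file.path(tempdir(), "stale_demo")
dir.create(staleDir, showWarnings = FALSE)
fresh <- file.path(staleDir, "fresh.csv")
old <- file.path(staleDir, "old.csv")
writeLines("a,b", fresh)
writeLines("a,b", old)

# Pretend old.csv was last modified two days ago
Sys.setFileTime(old, Sys.time() - 48 * 3600)

# Age of each file in hours, based on modification time
info <- file.info(list.files(staleDir, full.names = TRUE))
ageHours <- as.numeric(difftime(Sys.time(), info$mtime, units = "hours"))

# Files not modified within the (assumed) 24-hour threshold
staleFiles <- rownames(info)[ageHours > 24]
print(basename(staleFiles))
```

With manageCache(), passing sortBy = "mtime" in place of the default would order purging by this same timestamp, so the least recently modified files are removed first.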

A use case for sortBy = "ctime" is less clear.