# Intro

Registers managed by the Swedish regional cancer centres (RCC), that is, the quality registers and the cancer register, store date variables in several different formats. This package helps to recognise and handle these dates.

library(rccdates)

# Ordinary dates

RCC dates are usually in the form %Y-%m-%d, such as “2016-06-17”. These are recognised by ordinary R functions such as as.Date if there are no missing values or if missing values are coded as NA. In RCC data, however, missing dates are commonly coded as empty strings, in which case as.Date fails:

d <- c("", "2016-06-17")
as.Date(d)
## Error in charToDate(x): character string is not in a standard unambiguous format

The as.Dates function (note the plural) might then be easier to use.

as.Dates(d)
## x coerced to Date
## [1] NA           "2016-06-17"
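For comparison, a similar result can be obtained in base R by recoding the empty strings as NA before calling as.Date. This is only a minimal sketch of the manual workaround, not a description of how as.Dates works internally:

```r
d <- c("", "2016-06-17")

# Recode empty strings as missing values; %in% is used so that
# existing NA values in d do not cause problems in the index.
d[d %in% ""] <- NA

res <- as.Date(d)
res
## [1] NA           "2016-06-17"
```

The drawback is that this step has to be repeated for every date variable, which is what as.Dates avoids.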

# Non standard dates

The original motivation for the package was to handle old date variables from the cancer register. Days and even months are sometimes coded as “00” (unknown). When this happens, as.Dates (note the plural) might still recognise the date and will replace “00” by an approximate value:

as.Date("2000-01-00") # as.Date fails!
## Error in charToDate(x): character string is not in a standard unambiguous format
as.Dates("2000-01-00") # Missing day
## x coerced to Date
## [1] "2000-01-15"
as.Dates("2000-00-01") # Missing month
## x coerced to Date
## [1] "2000-07-01"
as.Dates("2000-00-00") # Missing month and day
## x coerced to Date
## [1] "2000-07-15"
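The outputs above suggest that an unknown month is approximated by July and an unknown day by the 15th. A minimal base R sketch of that idea is shown below. The helper name approx_date is hypothetical and the code is an illustration under that assumption, not the package's implementation:

```r
# Hypothetical helper: replace "00" month/day codes by mid-period values.
approx_date <- function(x) {
  x <- sub("^([0-9]{4})-00-", "\\1-07-", x)  # unknown month -> July
  x <- sub("-00$", "-15", x)                 # unknown day   -> the 15th
  as.Date(x, format = "%Y-%m-%d")
}

approx <- approx_date(c("2000-01-00", "2000-00-01", "2000-00-00"))
approx
## [1] "2000-01-15" "2000-07-01" "2000-07-15"
```

Choosing the middle of the unknown period keeps the approximation error roughly symmetric.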

Some old dates might also be in the format %y%V (see ?strptime), such as “7403” for week 3 in 1974. This is tricky for several reasons:

• The exact date is unknown and has to be approximated.
• Different countries number the weeks of the year differently. Sweden uses ISO 8601 (“If the week [starting on Monday] containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1.”), but R does not.
• Procedures for computing week numbers also differ between operating systems.

as.Date("7403")
## Error in charToDate(x): character string is not in a standard unambiguous format
as.Dates("7403")
## x coerced to Date
## [1] "1974-01-17"
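The ISO 8601 rule quoted above is equivalent to saying that week 1 is the week containing 4 January. A platform-independent sketch of a week-to-date approximation can be built on that. The function name iso_week_midpoint and the choice of Thursday as the representative day are assumptions made for illustration; the package may do this differently:

```r
# Hypothetical helper: approximate an ISO 8601 week by its Thursday.
iso_week_midpoint <- function(year, week) {
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(year)))
  # POSIXlt weekday is 0 (Sunday) to 6 (Saturday); convert to
  # the number of days elapsed since Monday (0 to 6).
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6) %% 7
  monday_week1 <- jan4 - days_since_monday
  # Monday of the requested week, plus 3 days to reach Thursday.
  monday_week1 + (week - 1) * 7 + 3
}

wk <- iso_week_midpoint(1974, 3)
wk
## [1] "1974-01-17"
```

Because the calculation uses only date arithmetic, it does not depend on how the operating system interprets week-number format codes.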

It is also possible to mix different date formats within the same vector:

as.Dates(c("", NA, "2000-01-01", "20000101", "20000000", "7403"))
## x coerced to Date
## [1] NA           NA           "2000-01-01" "2000-01-01" "2000-07-15"
## [6] "1974-01-17"

# Convert all date variables to dates

Another common issue with RCC data is that the number of columns might be huge (several hundred variables). When data are imported to R from tab/csv files, date columns are recognised only as characters (and are therefore treated as factors by default). All date columns must then be converted to dates manually before further processing.

This process can sometimes be simplified by assuming a common naming structure for date variables, for example:

df1 <- df2 <- data.frame(
  important_date = "1985-05-04",
  another_date   = "2001-09-11",
  something_else = "halleluja!"
)
str(df1)
## 'data.frame':    1 obs. of  3 variables:
##  $ important_date: Factor w/ 1 level "1985-05-04": 1
##  $ another_date  : Factor w/ 1 level "2001-09-11": 1
##  $ something_else: Factor w/ 1 level "halleluja!": 1
dts <- grepl("dat", names(df1))
df1[dts] <- lapply(df1[dts], as.Date)
str(df1)
## 'data.frame':    1 obs. of  3 variables:
##  $ important_date: Date, format: "1985-05-04"
##  $ another_date  : Date, format: "2001-09-11"
##  $ something_else: Factor w/ 1 level "halleluja!": 1

It is hopefully obvious that this solution is not optimal (for several reasons)!

However, as.Dates is a generic function with a method for data frames that tries to automate this process:

df2 <- as.Dates(df2)
## The following variables were recognised as potential dates and therefore coerced to such:
## * important_date
## * another_date
str(df2)
## 'data.frame':    1 obs. of  3 variables:
##  $ important_date: Date, format: "1985-05-04"
##  $ another_date  : Date, format: "2001-09-11"
##  $ something_else: Factor w/ 1 level "halleluja!": 1

This can simplify date handling quite a lot!
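To illustrate the general idea of detecting date columns by content rather than by name, here is a minimal base R sketch. The helper looks_like_date is hypothetical and only checks for the %Y-%m-%d pattern; it is not the detection logic used by as.Dates:

```r
# Hypothetical helper: TRUE if all non-missing, non-empty values
# of a column match the pattern YYYY-MM-DD.
looks_like_date <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x) & x != ""]
  length(x) > 0 && all(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x))
}

df <- data.frame(
  important_date = "1985-05-04",
  another_date   = "2001-09-11",
  something_else = "halleluja!",
  stringsAsFactors = FALSE
)

# Convert only the columns whose content looks like dates.
is_date <- vapply(df, looks_like_date, logical(1))
df[is_date] <- lapply(df[is_date], as.Date)
str(df)
```

Unlike the name-based approach, this does not rely on variables having "dat" in their names, but it would need extending to cover the non-standard formats described earlier.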

# Year variables

Another feature of the package is a new way to handle year data.

Cohort data are often presented by year. The rccdates package introduces a new S3 class, “year”. This might be preferred to converting years to character strings:

# Let's make some random dates
x <- Sys.Date() - sample(365:(5 * 365), 5)

# The year is usually treated as a string in one of two ways:
(y1 <- substr(x, 1, 4))
## [1] "2014" "2011" "2012" "2014" "2013"
(y2 <- format(x, format = "%Y"))
## [1] "2014" "2011" "2012" "2014" "2013"

This is fine as long as we just want to treat the year as a “label”, but we can then no longer use the year for any kind of arithmetic:

max(y1) - min(y1)
## Error in max(y1) - min(y1): non-numeric argument to binary operator
y1 + 10
## Error in y1 + 10: non-numeric argument to binary operator

We could of course treat years as numerics instead, but then we might do all sorts of crazy stuff that doesn’t make any sense at all:

y1 <- as.numeric(y1)
log(y1)
## [1] 7.607878 7.606387 7.606885 7.607878 7.607381
y1 ^ 3
## [1] 8169178744 8132727331 8144865728 8169178744 8157016197

We can instead use the year class to only allow operations that actually make sense:

table(y3 <- as.year(x))
##
## 2011 2012 2013 2014
##    1    1    1    2
max(y3) - min(y3)
## [1] 3
y3 + 10
## [1] "2024" "2021" "2022" "2024" "2023"
log(y3)
## Error in log(y3): non-numeric argument to mathematical function
y3 ^ 3
## Error in y3^3: non-numeric argument to binary operator
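Restricting operations in this way is typically done with S3 group generics. The sketch below shows the mechanism with a hypothetical class "yr" and constructor as_yr; it is an illustration of the technique only and does not reproduce the package's year class (for instance, it returns plain numbers from arithmetic):

```r
# Hypothetical constructor: extract the year from a Date as an integer.
as_yr <- function(x) structure(as.integer(format(x, "%Y")), class = "yr")

# Allow only addition, subtraction and comparisons between operands.
Ops.yr <- function(e1, e2) {
  allowed <- c("+", "-", "==", "!=", "<", ">", "<=", ">=")
  if (missing(e2) || !.Generic %in% allowed) {
    stop("'", .Generic, "' is not a meaningful operation for years")
  }
  get(.Generic)(unclass(e1), unclass(e2))
}

# Block log(), sqrt(), exp() and the other Math group functions.
Math.yr <- function(x, ...) {
  stop("'", .Generic, "' is not a meaningful operation for years")
}

y <- as_yr(as.Date(c("2011-03-01", "2014-06-30")))
max(y) - min(y)  # allowed: a difference in years
y + 10           # allowed: shift by ten years
```

With these methods defined, calls such as log(y) or y ^ 3 stop with an error instead of silently returning meaningless numbers.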