I am having problems with reading excel. A column with dates is being read some cells in serial and others as date as below.

date<-c("43942", "43945", "43952", "17/05/2020", "17/05/2020","18/05/2020", "19/05/2020", "18/05/2020", "22/05/2020")

How could I convert the vector below to dates? I put it as a string because this is how the column I want to convert looks after reading.

The command

as.Date(date, origin = "1899-12-30")

return error and

as.Date(as.numeric(date), origin = "1899-12-30")

convert in NA the non-serial dates.

2

Best Answer


You can handle this in two steps in any order.

Convert numbers to date

new_date <- as.Date(as.numeric(date), origin = "1899-12-30")

This will return warnings for dates that cannot be converted to numeric.

Convert remaining dates using "dmy" format

new_date[is.na(new_date)] <- as.Date(date[is.na(new_date)], "%d/%m/%Y")#Or using lubridate's dmy#new_date[is.na(new_date)] <- lubridate::dmy(date[is.na(new_date)])new_date#[1] "2020-04-21" "2020-04-24" "2020-05-01" "2020-05-17" "2020-05-17" "2020-05-18"#[7] "2020-05-19" "2020-05-18" "2020-05-22"

We can use grepl without any warnings

i1 <- grepl("^[0-9]+$", date)newdate <- as.Date(rep(NA, length(date)))newdate[i1] <- as.Date(as.numeric(date[i1]), origin = "1899-12-30")newdate[!i1] <- as.Date(date[!i1], "%d/%m/%Y")newdate#[1] "2020-04-21" "2020-04-24" "2020-05-01" "2020-05-17" "2020-05-17" #[6] "2020-05-18" "2020-05-19" "2020-05-18" "2020-05-22"