Speed isn’t everything, but we do care about it. Ways to make your code run faster:
* If your pipeline uses `tidyr::complete` and its inclusion immediately slows down the driver, it may be working in groups. If the groups are not actually necessary for the pipeline, try adding `ungroup()` at the beginning of the pipeline or just before the call to `tidyr::complete`.
* Try the `.fast_left_join` function, which uses a different library to do the join.
* Use `replace_na()`, which is much faster than using `mutate` + `if_else` in combination.
* To replace some values (e.g. `NA`) while leaving other values alone, use `replace` instead of `if_else`; it is also much faster.
* Avoid consecutive `mutate()` calls, e.g. `x %>% mutate(...) %>% mutate(...)`. Combining these into a single `x %>% mutate(..., ...)` call can result in a 30-40% speedup. Note that the package checks scan for this and will raise an error.

Some things that seem like they might help but haven't been verified with experiments:

* It may be faster to convert strings to factors prior to doing a join. However, the dplyr team has been working on fixing this, and it's not clear whether the difference is as stark as it used to be. See discussion here: https://github.com/tidyverse/dplyr/issues/1386
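The verified tips above can be sketched on a toy data frame. This is a minimal illustration rather than a benchmark: the data, column names, and variable names (`df`, `grouped`, `completed`, and so on) are made up for the example, and it only shows that the faster form gives the same result as the slower one. The package-internal `.fast_left_join` is left out because its signature is not described here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; the column names are illustrative only.
df <- tibble(
  id    = c(1, 1, 2, 2),
  year  = c(2000, 2001, 2000, 2002),
  value = c(10, NA, 30, NA)
)

# Tip 1: drop groups that are not needed before tidyr::complete().
# `grouped` stands in for data that arrives grouped from an earlier step.
grouped <- df %>% group_by(id)
completed <- grouped %>%
  ungroup() %>%
  complete(id, year)

# Tip 2: replace_na() instead of mutate() + if_else().
slow <- df %>% mutate(value = if_else(is.na(value), 0, value))
fast <- df %>% replace_na(list(value = 0))

# Tip 3: replace() instead of if_else() when only some values change.
with_if_else <- df %>% mutate(year = if_else(year == 2002, 2001, year))
with_replace <- df %>% mutate(year = replace(year, year == 2002, 2001))

# Tip 4: one mutate() call instead of a chain of them.
chained  <- df %>% mutate(a = value * 2) %>% mutate(b = year + 1)
combined <- df %>% mutate(a = value * 2, b = year + 1)
```

To check whether a change actually helps in your own pipeline, time both versions on realistic data, since the speedups quoted above will depend on data size and grouping.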