Everybody develops its own coding habits and style. Some people take a lot of effort in making their source code readable, while others don’t bother at all. Working together with other people is easier when everyone uses the same standard.
The checklist package defines a set of standards and
provides tools to validate whether your project or R package adheres to
these standards. It integrates several existing tools like rcmdcheck, lintr, devtools, desc, hunspell, pkgdown.
checklist always uses these tools with hard-coded settings,
ensuring that everyone uses the same settings. Since version 0.5.0,
every organisation can define its own settings in a central repository.
More information on that in vignette("organisation").
checklist tries to make on-boarding as easy as possible.
We do this by providing the interactive functions
create_package() and create_project(). These
function don’t just provide a template which the user must fill in.
Instead they guide the user through a series of questions to set up the
project or package according to the best practices. Hence the user can
start from a solid basis rather than having to figure out all the best
practices by him/herself. We deliberately choose the highest quality
level for R packages by enforcing all relevant checks. In case of
project we allow the user to choose which checks to apply. More
information on that in vignette("getting_started") or
vignette("getting_started_project"). You can apply (or
update) the checklist setting to an existing project or package using
setup_package() or setup_project(). However,
we recommend that you first get used to checklist by
creating a new project or package from scratch. Since
checklist enforces several best practices, it is easier to
learn them from the start rather than trying to adapt an existing
project. You might need to refactor some parts of your existing code to
meet the quality standards.
Available checks
Currently, checklist handles two different types of
projects: R packages and non-package R projects. For both types of
projects, checklist provides a set of checks to validate
the quality of your code. You can either choose to run the entire set of
checks at once, or run individual checks.
We have a set of rules that are relevant for both packages and non-package projects.
-
check_spelling(): checks the spelling in R code and markdown files usinghunspell. More information invignette("spelling"). -
check_lintr(): runslintr::lint_dir()to check the code style. More information invignette("coding_style"). -
check_filename(): checks whether all file names meet the naming conventions. More information invignette("file_name"). -
check_folder(): checks whether the folder structure meets the conventions. More information invignette("file_name"). -
check_license(): checks whether a valid license is present. -
update_citation(): checks the citation metadata and update the citation files
On a package, you can run check_package() to run these
additional checks.
-
check_cran(): runsrcmdcheck::rcmdcheck()to check whether the package meets CRAN standards. -
check_description(): checks theDESCRIPTIONfile for common problems. -
check_documentation(): checks whether the documentation is up-to-date. -
check_codemeta(): checks whether thecodemeta.jsonfile is present and valid. - build the documentation website with
pkgdown - check the code coverage of the unit tests with
covr::package_coverage(). Adding more code should not (significantly) decrease the code coverage.
You can run these functions interactively on your machine. We
recommend to run the individual checks while developing your code to fix
problems as soon as they arise. Before pushing your code to GitHub,
always run check_package() or check_project().
You can also add these checks as GitHub actions, which
runs them automatically after every push to the repository
on GitHub. We recommend to set up these
checks as required checks on GitHub. This prevents merging code to the
main branch that does not meet the quality standards. In case of an R
package, merging a pull request on GitHub will update the pkgdown
website automatically too and creates a release of the new version.
Therefore you must increase the version of the package in the
DESCRIPTION file when starting a new branch.
Extensive report on checks
Both check_package() and check_project()
generate an extensive report on the checks that were performed. The
report contains the following sections:
- used organisation settings
- output of the individual checks
- extracted citation metadata
- list of files for spell checking, aggregated by language
- session info with loaded packages and their version
- any changes made to files during the checks
- a summary of errors, warnings and notes
Errors, warnings and notes
The report lists all errors, warnings and notes that were found
during the checks. You can use this report to fix the problems in your
code. After fixing the problems, you can run the checks again to see
whether all problems are solved. You must fix all errors in order to
pass check_package() or check_project(). You
should try to fix as many warnings and notes as possible too. Some
warnings and notes might be acceptable in certain situations. You can
document these exceptions via write_checklist() in the
checklist.yml file. The report can have three categories of
warnings and notes: “new”, “allowed” and “missing”.
-
New warnings and notes are those that were found
during the checks, but are not yet documented as allowed exceptions. You
should try to fix these problems. They result in failing
check_package()orcheck_project(). Allow them, when fixing is not possible at the moment. -
Allowed warnings and notes are those that were
found during the checks and are documented as allowed exceptions.
Allowing warnings and notes require a short motivation. This motivation
is included in the report and in the
checklist.ymlfile. They don’t result in failingcheck_package()orcheck_project(). -
Missing warnings and notes are those that were
documented as allowed exceptions, but are no longer found during the
checks. They result in failing
check_package()orcheck_project(). You should remove these exceptions from the documentation viawrite_checklist().
Changes
When you use version control like git,
checklist can detect which files were changed since the
last commit. This is useful because some checks rewrite an improved
version of a file. E.g. check_documentation() regenerates
the documentation files from the roxygen2 tags in the R
code. If you forgot to regenerate the documentation after changing the
code, check_documentation() will detect that the
documentation files are out-of-date. The checklist will
report this and show which changes took place.
Session info
Most problems are related to the R version or the versions of the packages that are used. To help you debug problems, the report contains the session info. This is especially important when checks pass locally but fail on GitHub Actions. When this occurs, you can compare the session info of your local machine with the session info on GitHub Actions to find out which package versions differ. Then use the same package versions locally to reproduce and fix the problem.
The GitHub Actions run on a Docker image based on rocker/verse:latest.
It should contain the latest R version and the latest versions of the
most common R packages. Any other packages that your code need are
installed on the fly.
Making code findable and citable
When you share your code with other people, you want them to be able
to find it easily. You also want them to be able to cite a specific
version of your code in a report or paper. checklist helps
you to achieve this by enforcing a strict usage of metadata either in
the DESCRIPTION (package) or README.md
(non-package). Because of the strict format of the person information,
we recommend to add most people when creating the project or package via
the interactive functions create_project() or
create_package(). The functions not only add the people to
the metadata, but also store their information for later reuse.
You can define a list of default organisations in your organisations’
checklist repository. This ensures that everyone in the
organisation uses the standard names of these organisations. Based on
matching e-mail domains, checklist enforces that persons
use their official organisation name as affiliation. More information in
vignette("organisation").
Integration with Zenodo and ORCID
When you push your code to GitHub, you can use the GitHub-Zenodo
integration to create a DOI for every release of your code.
checklist automatically adds the required metadata to a
.zenodo.json in your repository so that Zenodo can create a
proper citation for your code. When authors add their ORCID to their
person information, this ORCID is included in the Zenodo metadata too.
Then every release of your code can automatically flow into you ORCID
profile. Which makes it easier to maintain an up-to-date list of your
publications. And your organisation can import the publications of all
its researchers based on their ORCID. More information in
vignette("zenodo").
Bundling your code in a package
Most users think of an R package as a collection of generic functions that they can use to run their analysis. However, an R package is a useful way to bundle and document a stand-alone analysis too! Suppose you want to pass your code to a collaborator or your future self who is working on a different computer. If you have a project folder with a bunch of files, people will need to get to know your project structure, find out what scripts to run and which dependencies they need. Unless you documented everything well they (including your future self!) will have a hard time figuring out how things work.
Having the analysis as a package and running
check_package() to ensure a minimal quality standard, makes
things a lot easier for the user. Agreed, it will take a bit more time
to create the analysis, especially with the first few projects. In the
long run you save time due to a better quality of your code. Try to
start by packaging a recurrent analysis or standardised report when you
want to learn writing a package. Once you have some experience, it is
little overhead to do it for smaller analysis. Keep in mind that you
seldom run an analysis exactly once.
Benefits of using a package for your analysis
- The package itself is a way to cite (a specific version of) the analysis in a report or paper.
- You have to list all dependencies on other R packages. This makes
installing your code as simple as running
remotes::install_github("inbo/packagename"). - You must split your analysis in a set of functions. Say goodbye to unreadable scripts with thousands lines of code.
- Functions make it easy to re-use code. Need to run the same thing with a different parameter value? Add the parameter as an argument to the function and run the function once for every different parameter value. This avoids the need to copy-paste large chunks of scripts and replace a few values. The copy-paste work flow typically results in hard-to-read long scripts. Imagine you made a mistake in the code and copy-pasted that mistake several times before you found it. You have to check your entire project to fix the mistake several times. Having it as a function reduces the workload to fixing only the function.
- Packages require that every object is either defined within the package or imported from another package. Global variables are not allowed. The user only needs to load your package and run the function with the required arguments. The results will not depend on any other packages loaded nor by user-defined objects like vectors or dataframes (unless the user passes them explicitly as arguments to a function).
- A package gives the opportunity to add documentation to your code. Afterwards you can simply consult this documentation rather than having to dig into your code to find out what it is actually doing. Every function needs at least a title and an entry for every argument.
- Most likely you would still need a short script that combines a few
high level functions of your package to run the analysis. The
instfolder is an ideal place to bundle such scripts within the package. You can also use it to store small (!) datasets or rmarkdown reports.