For a practical recipe to set up n2khab_data, go to Getting started.
See vignette("v022_example") if you’d like to be guided by a hands-on example.
Apart from several textual datasets, provided directly with this package, other N2KHAB data sources [1] are binary or large data. Those are made available through cloud-based infrastructure, preserved for the future at least via Zenodo (see below). An overview of data distribution pathways is given here.
Zenodo is a scientific repository funded by the European Commission and hosted at CERN.
Data sources evolve, and hence, data source versions succeed one another. To ease reproducibility of analytical workflows, this package assumes locally stored data sources.
n2khab functions, which read these data and return them in R in a standardized way, always provide arguments to specify the file’s name and location, so you can in fact freely choose these. However, to ease collaboration in scripting, it is highly recommended to follow the standard locations and filenames below (see: Getting started). Moreover, the functions assume these conventions by default in order to make your life easier!
There is a major distinction between:
raw data, to be stored in n2khab_data/10_raw;
processed data, to be stored in n2khab_data/20_processed. These data sources have been derived from the raw data sources, but are distributed on their own because of the time-consuming or intricate calculations needed to reproduce them.
You can reproduce the processed data sources with a shell script on GitHub, but it will take hours.
As you see, when storing these binary or large data, we avoid using a folder named data, for several reasons:
The n2khab_data name is a better fit when the folder does not sit inside one project or repository (see further) but instead serves several projects / repositories.
A project may need its own data folder with locally generated or extra input data, part or all of which is to be version-controlled, and which may use its own substructure.
n2khab_data should always be ignored by version control systems.
It is easier for n2khab functions to automatically detect the right location when using a more special name.
Mind that, if you store the n2khab_data folder inside a version-controlled repository (e.g. using Git), it must be ignored by version control!
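For a Git repository, ignoring the folder comes down to one rule in .gitignore. The following is a minimal base R sketch, assuming the n2khab_data folder sits at the repository root. The helper name ignore_n2khab_data() is hypothetical and not part of n2khab, and the demo runs in a temporary directory.

```r
# Hypothetical helper (not part of n2khab): make sure .gitignore in
# 'repo_root' contains a rule that ignores the n2khab_data folder.
ignore_n2khab_data <- function(repo_root = ".") {
  gitignore <- file.path(repo_root, ".gitignore")
  rule <- "n2khab_data/"
  existing <- if (file.exists(gitignore)) {
    readLines(gitignore, warn = FALSE)
  } else {
    character(0)
  }
  # Only append the rule when it is not yet present
  if (!rule %in% existing) {
    writeLines(c(existing, rule), gitignore)
  }
  invisible(gitignore)
}

# Demo in a throwaway directory standing in for a repository
repo <- file.path(tempdir(), "demo_repo")
dir.create(repo, showWarnings = FALSE)
ignore_n2khab_data(repo)
ignore_n2khab_data(repo) # a second call leaves the file unchanged
gitignore_lines <- readLines(file.path(repo, ".gitignore"))
print(gitignore_lines)
```

Adding the line n2khab_data/ to .gitignore by hand has the same effect.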
Decide where you want to store the n2khab_data folder.
The n2khab_data folder can be put inside the project / repository folder. This approach has the advantage that you can store versions of data sources different from those in another repository (where you also have an n2khab_data folder).
For the functions to succeed in finding the n2khab_data folder in each collaborator’s file system, make sure that the folder is present either in the working directory of your R scripts or in a path 1 to 10 levels above this working directory. By default, the functions search for the folder in that order and use the first n2khab_data folder encountered. (Otherwise, you would need to actively set the path to the data folder with the path argument in each function call.)
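The search order described above can be illustrated with a small base R sketch. It is not the package's own implementation: find_data_folder() and its arguments are hypothetical names, and the code only mirrors the stated rule (working directory first, then up to 10 levels above, first match wins).

```r
# Hypothetical illustration (not n2khab code): look for a folder called
# 'name' in 'start', then in each parent directory up to 'max_up' levels.
find_data_folder <- function(start = getwd(),
                             name = "n2khab_data",
                             max_up = 10) {
  current <- normalizePath(start, winslash = "/", mustWork = TRUE)
  for (level in 0:max_up) {
    candidate <- file.path(current, name)
    if (dir.exists(candidate)) {
      return(candidate) # first folder encountered wins
    }
    parent <- dirname(current)
    if (identical(parent, current)) break # reached the file system root
    current <- parent
  }
  stop("No '", name, "' folder found within ", max_up, " levels above ", start)
}

# Demo: the data folder sits two levels above the working directory
base <- file.path(tempdir(), "search_demo")
dir.create(file.path(base, "n2khab_data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(base, "project", "src"), recursive = TRUE, showWarnings = FALSE)
found <- find_data_folder(start = file.path(base, "project", "src"))
print(found)
```

Because the first match is used, an n2khab_data folder inside a project takes precedence over one higher up in the file system.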
From your working directory, use fileman_folders() to specify the desired location (using the function’s arguments). It will check the existence of the folders n2khab_data, n2khab_data/10_raw and n2khab_data/20_processed and create them if they don’t exist.
fileman_folders(root = "rproj")
#> Created <clipped_path_prefix>/n2khab_data
#> Created subfolder 10_raw
#> Created subfolder 20_processed
#> "<clipped_path_prefix>/n2khab_data"
From the cloud storage (links: raw data | processed data), download the respective data files of a data source. You can also use the function download_zenodo() to do that, using the DOI of each data source version. For each data source, put its file(s) in an appropriate subfolder below either n2khab_data/10_raw or n2khab_data/20_processed (depending on the data source). Use the data source’s default name for the subfolder. You get a list of the data source names with XXX. These names are version-agnostic! The names of the n2khab ‘read’ functions and their documentation make clear which data sources you will need.
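As a rough sketch of this step, the base R code below creates the version-agnostic subfolder for one processed data source and puts a file in it. Everything runs in a temporary directory. The file name is a dummy standing in for a real download, and the commented-out download_zenodo() call assumes that the function takes doi and path arguments (check its documentation) and leaves the DOI as a placeholder.

```r
# Demo root; in practice this is your own n2khab_data folder
root <- file.path(tempdir(), "place_demo", "n2khab_data")

# Version-agnostic data source name, used as the subfolder name
source_name <- "habitatmap_stdized"
target <- file.path(root, "20_processed", source_name)
dir.create(target, recursive = TRUE, showWarnings = FALSE)

# With n2khab installed and a real DOI at hand, the files could be
# downloaded straight into 'target' (argument names are an assumption):
# n2khab::download_zenodo(doi = "<DOI of the data source version>", path = target)

# Stand-in for a manually downloaded file (dummy, empty file)
downloaded <- file.path(tempdir(), "habitatmap_stdized.gpkg")
file.create(downloaded)
file.copy(downloaded, target, overwrite = TRUE)

placed <- list.files(target)
print(placed)
```

Because the subfolder name carries no version, switching to another version of a data source means replacing the files inside the same subfolder.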
Below is an example of correctly organised N2KHAB data folders:
n2khab_data
├── 10_raw
│   ├── habitatmap          -> contains habitatmap.shp, habitatmap.dbf etc.
│   ├── soilmap
│   └── GRTSmaster_habitats
└── 20_processed
    ├── habitatmap_stdized
    └── GRTSmh_diffres
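A layout like the one above can be checked with a few lines of base R. This is only a sketch: check_layout() is a hypothetical helper, the expected subfolders are copied from the example tree, and the demo builds a temporary folder that deliberately lacks soilmap.

```r
# Subfolders expected below n2khab_data, taken from the example tree
expected <- c(
  "10_raw/habitatmap",
  "10_raw/soilmap",
  "10_raw/GRTSmaster_habitats",
  "20_processed/habitatmap_stdized",
  "20_processed/GRTSmh_diffres"
)

# Hypothetical helper: report for each expected subfolder whether it exists
check_layout <- function(root, expected) {
  present <- dir.exists(file.path(root, expected))
  names(present) <- expected
  present
}

# Demo: create every subfolder except 10_raw/soilmap
root <- file.path(tempdir(), "layout_demo", "n2khab_data")
for (d in setdiff(expected, "10_raw/soilmap")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

status <- check_layout(root, expected)
missing_sources <- names(status)[!status]
print(missing_sources)
```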
[1] N2KHAB data sources are a list of public, standard data sources, important to analytical workflows concerning Natura 2000 (n2k) habitats (hab) in Flanders. They are in a public repository in order to be easily findable and to be preserved in a durable way.