This document describes how we map the non-validated red list data for Flanders to Darwin Core. The source file for this document can be found here.

1 Setup

Load libraries:

# devtools::install_github(c("tazinho/snakecase"))

library(tidyverse)      # To do data science
library(magrittr)       # To use %<>% pipes
library(here)           # To find files
library(janitor)        # To clean input data
library(digest)         # To generate hashes
library(rgbif)          # To use GBIF services

2 Read source data

Create a data frame input_data from the source data:

input_data <- read_delim(here("data", "raw", "tblFlandersRedListsAll.tsv"), delim = "\t")

Filter on NonValidated taxa:

input_data %<>% filter(Validated == "NonValidated")

Number of records:

input_data %>% nrow()
## [1] 3161

3 Process source data

3.1 Tidy data

Clean data somewhat:

input_data %<>%
  remove_empty("rows") %>%    # Remove empty rows
  clean_names()               # Have sensible (lowercase) column names

3.2 Scientific names

The scientific names contain trailing spaces:

input_data %<>% mutate(
  speciesname_as_published = str_trim(speciesname_as_published),
  speciesname_unique = str_trim(speciesname_unique)
)
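As a small illustration of the trimming step, the sketch below applies str_trim() to a hypothetical data frame (the names are invented for the example, not taken from the source file):

```r
library(dplyr)
library(stringr)

# Hypothetical names with stray whitespace (not taken from the source data)
example_names <- tibble(
  speciesname_as_published = c("Bufo bufo ", " Rana temporaria", "Hyla arborea")
)

# str_trim() removes whitespace at both ends and leaves inner spaces intact
example_names <- example_names %>%
  mutate(speciesname_as_published = str_trim(speciesname_as_published))

example_names$speciesname_as_published
```

Trimming before the name parsing step matters because the later join matches on the exact name string.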

3.3 Taxon ranks

Use the GBIF nameparser to retrieve nomenclatural information for the scientific names in the checklist:

parsed_names <- input_data %>%
  distinct(speciesname_as_published) %>%
  pull() %>% # Create vector from dataframe
  parsenames() # An rgbif function

The nameparser function also provides information about the rank of the taxon (in rankmarker). Here we join this information with our checklist. Cleaning these ranks will be done in the Taxon Core mapping:

input_data %<>% left_join(
  select(parsed_names, scientificname, rankmarker),
  by = c("speciesname_as_published" = "scientificname"))
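The join can be tried offline with a hand-written stand-in for the parser output. In this sketch, example_checklist and example_parsed are hypothetical objects; only the column names scientificname and rankmarker follow the code above:

```r
library(dplyr)

# Hypothetical checklist rows; the real ones come from input_data
example_checklist <- tibble(
  speciesname_as_published = c("Bufo bufo", "Salix repens subsp. repens", "Bufo bufo")
)

# Hand-written stand-in for the parser output, one row per distinct name
example_parsed <- tibble(
  scientificname = c("Bufo bufo", "Salix repens subsp. repens"),
  rankmarker = c("sp.", "subsp.")
)

example_joined <- example_checklist %>%
  left_join(
    select(example_parsed, scientificname, rankmarker),
    by = c("speciesname_as_published" = "scientificname")
  )

# A left join on one-row-per-name parser output should keep the row count unchanged
nrow(example_joined) == nrow(example_checklist)
```

Comparing the row count before and after the join is a cheap way to catch duplicated names in the parser output.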

3.4 References

Since the source data only includes codes for references, we load an additional file with more complete reference information:

references <- read_csv(here("data", "raw", "references.csv"))

Join source data with references:

input_data %<>% left_join(
  references,
  by = c("reference" = "reference", "taxonomic_group" = "taxonomic_group")
)
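A left join silently leaves reference columns empty when a code has no match. One possible check, sketched here on hypothetical data (example_data, example_references and the citation column are invented for illustration), is an anti_join() on the same two keys:

```r
library(dplyr)

# Hypothetical source rows; "R9" has no counterpart in the lookup table
example_data <- tibble(
  unique_id = c("a1", "a2", "a3"),
  reference = c("R1", "R1", "R9"),
  taxonomic_group = c("Amphibians", "Fishes", "Fishes")
)

# Hypothetical lookup table keyed on reference and taxonomic_group
example_references <- tibble(
  reference = c("R1", "R1"),
  taxonomic_group = c("Amphibians", "Fishes"),
  citation = c("Hypothetical citation A", "Hypothetical citation B")
)

# Rows whose (reference, taxonomic_group) pair is missing from the lookup
unmatched <- example_data %>%
  anti_join(example_references, by = c("reference", "taxonomic_group"))

unmatched
```

An empty result would mean every record received its reference information.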

3.5 Preview data

Show the number of taxa per red list and taxonomic group:

Show the number of taxa per kingdom and rank:
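The code for these two summaries is not shown here. A minimal sketch of how such counts could be produced, assuming the red list is identified by source_red_list and the rank by rankmarker (both assumptions), and using invented example rows:

```r
library(dplyr)

# Hypothetical rows; column names follow those used elsewhere in this document
example_data <- tibble(
  source_red_list = c("List A", "List A", "List B"),
  taxonomic_group = c("Amphibians", "Amphibians", "Fishes"),
  kingdom = c("Animalia", "Animalia", "Animalia"),
  rankmarker = c("sp.", "subsp.", "sp.")
)

# Number of taxa per red list and taxonomic group
per_list <- example_data %>% count(source_red_list, taxonomic_group)

# Number of taxa per kingdom and rank
per_rank <- example_data %>% count(kingdom, rankmarker)
```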

Preview data:

input_data %>% head()

4 Taxon core

4.1 Pre-processing

taxon <- input_data

4.2 Term mapping

Map the data to Darwin Core Taxon.

4.2.1 language

taxon %<>% mutate(dwc_language = "en")

4.2.2 license

taxon %<>% mutate(dwc_license = "http://creativecommons.org/publicdomain/zero/1.0/")

4.2.3 rightsHolder

taxon %<>% mutate(dwc_rightsHolder = "INBO")

4.2.4 accessRights

taxon %<>% mutate(dwc_accessRights = "https://www.inbo.be/en/norms-data-use")

4.2.5 datasetID

taxon %<>% mutate(dwc_datasetID = "https://doi.org/10.15468/54nwog")

4.2.6 institutionCode

taxon %<>% mutate(dwc_institutionCode = "INBO")

4.2.7 datasetName

taxon %<>% mutate(dwc_datasetName = "Non-validated red lists of Flanders, Belgium")

4.2.8 taxonID

taxon %<>% mutate(dwc_taxonID = unique_id)

4.2.9 scientificName

taxon %<>% mutate(dwc_scientificName = speciesname_as_published)

4.2.10 kingdom

taxon %<>% mutate(dwc_kingdom = kingdom)

4.2.11 phylum

taxon %<>% mutate(dwc_phylum = phylum)

4.2.12 class

taxon %<>% mutate(dwc_class = class)

4.2.13 order

taxon %<>% mutate(dwc_order = order)

4.2.14 family

taxon %<>% mutate(dwc_family = family)

4.2.15 genus

taxon %<>% mutate(dwc_genus = genus)

4.2.16 taxonRank

Inspect values:

taxon %>%
  group_by(rankmarker) %>%
  count()

Map values by recoding to the GBIF rank vocabulary:

taxon %<>% mutate(dwc_taxonRank = recode(rankmarker,
  "sp." = "species",
  "infrasp." = "infraspecificname",
  "subsp." = "subspecies",
  "var." = "variety",
  .default = "",
  .missing = ""
))

Inspect mapped values:

taxon %>%
  group_by(rankmarker, dwc_taxonRank) %>%
  count()
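To see how .default and .missing behave in this recoding, the sketch below runs the same mapping on a hypothetical vector of rank markers. The value "f." is an invented example of an unmapped marker:

```r
library(dplyr)

# Hypothetical rank markers, including an unmapped value and a missing one
example_markers <- c("sp.", "subsp.", "var.", "infrasp.", "f.", NA)

recoded <- recode(example_markers,
  "sp." = "species",
  "infrasp." = "infraspecificname",
  "subsp." = "subspecies",
  "var." = "variety",
  .default = "",   # Any marker not listed above becomes an empty string
  .missing = ""    # NA becomes an empty string as well
)

recoded
```

Both unmapped and missing markers end up as empty strings, so they cannot be told apart afterwards. The inspection by rankmarker and dwc_taxonRank is what reveals unmapped values.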

4.2.17 vernacularName

taxon %<>% mutate(dwc_vernacularName = speciesname_dutch)

4.2.18 nomenclaturalCode

taxon %<>% mutate(dwc_nomenclaturalCode = case_when(
  kingdom == "Animalia" ~ "ICZN",
  kingdom == "Plantae" ~ "ICBN"
))
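Note that case_when() returns NA for rows that match none of the conditions. The sketch below shows this on a hypothetical kingdom vector ("Fungi" is an invented example of an unlisted kingdom):

```r
library(dplyr)

# Hypothetical kingdoms; "Fungi" is not covered by any condition
example_kingdoms <- c("Animalia", "Plantae", "Fungi")

codes <- case_when(
  example_kingdoms == "Animalia" ~ "ICZN",
  example_kingdoms == "Plantae" ~ "ICBN"
)

codes  # The third value is NA because no condition matched
```

Because the CSV is later written with na = "", such rows would get an empty nomenclaturalCode rather than a literal NA.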

4.3 Post-processing

Only keep the Darwin Core columns:

taxon %<>% select(starts_with("dwc_"))

Drop the dwc_ prefix:

colnames(taxon) <- str_replace(colnames(taxon), "dwc_", "")
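str_replace() replaces the first match wherever it occurs in the string. A slightly stricter variant, sketched here on hypothetical column names, anchors the pattern to the start of the name:

```r
library(stringr)

# Hypothetical column names carrying the dwc_ prefix
example_columns <- c("dwc_taxonID", "dwc_scientificName", "dwc_order")

# "^" anchors the pattern, so only a leading prefix is removed
stripped <- str_replace(example_columns, "^dwc_", "")

stripped
```

For the columns in this document both forms give the same result, since every kept column starts with dwc_.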

Sort on taxonID (to maintain some consistency between updates of the dataset):

taxon %<>% arrange(taxonID)

Preview data:

taxon %>% head()

Save to CSV:

write_csv(taxon, here("data", "processed", "nonvalidated", "taxon.csv"), na = "")

5 Distribution extension

5.1 Pre-processing

distribution <- input_data

5.2 Term mapping

Map the data to Species Distribution.

5.2.1 taxonID

distribution %<>% mutate(dwc_taxonID = unique_id)

5.2.2 locationID

distribution %<>% mutate(dwc_locationID = "ISO_3166:BE-VLG")

5.2.3 locality

distribution %<>% mutate(dwc_locality = "Flanders")

5.2.4 countryCode

distribution %<>% mutate(dwc_countryCode = "BE")

5.2.5 occurrenceStatus

Set to absent for regionally extinct species (RE), otherwise present:

distribution %<>% mutate(dwc_occurrenceStatus = recode(rlc, 
  "RE" = "absent",
  .default = "present",
  .missing = "present"
))
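The effect of this recoding can be checked on a few hypothetical category values (the vector below is invented, not taken from the source data):

```r
library(dplyr)

# Hypothetical red list categories, including a missing value
example_rlc <- c("RE", "CR", "LC", NA)

status <- recode(example_rlc,
  "RE" = "absent",
  .default = "present",  # Every other category counts as present
  .missing = "present"   # Records without a category are also treated as present
)

status
```

Only regionally extinct taxa become absent. Missing categories fall back to present, which is worth keeping in mind when interpreting the output.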

5.2.6 threatStatus

There are two red list category columns:

distribution %>%
  group_by(rlc, rlc_as_published) %>%
  count()

These will be mapped as follows:

  • rlc → threatStatus: IUCN equivalent of the Flemish status, following the expected vocabulary.
  • rlc_as_published → occurrenceRemarks: Flemish status as originally published in the red list. It does not follow the vocabulary, but is important to include.

distribution %<>% mutate(dwc_threatStatus = rlc)
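Since rlc is expected to follow a controlled vocabulary, it may be worth listing any values that fall outside it. The sketch below uses an assumed set of IUCN-style category codes (assumed_categories is a hypothetical helper, to be adjusted to the vocabulary actually targeted) and invented example rows:

```r
library(dplyr)

# Assumed IUCN-style category codes, including the regional "RE"
assumed_categories <- c("EX", "EW", "RE", "CR", "EN", "VU", "NT", "LC", "DD", "NE")

# Hypothetical distribution rows; "??" stands in for an unexpected value
example_distribution <- tibble(
  unique_id = c("a1", "a2", "a3"),
  rlc = c("CR", "VU", "??")
)

# Distinct rlc values that are not part of the assumed vocabulary
unexpected <- example_distribution %>%
  filter(!rlc %in% assumed_categories) %>%
  distinct(rlc)

unexpected
```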

5.2.7 establishmentMeans

distribution %<>% mutate(dwc_establishmentMeans = "native")

5.2.8 eventDate

distribution %<>% mutate(dwc_eventDate = year_published)

5.2.9 source

The source for the distribution information is the red list:

distribution %<>% mutate(dwc_source = source_red_list)

5.2.10 occurrenceRemarks

distribution %<>% mutate(dwc_occurrenceRemarks = rlc_as_published)

5.3 Post-processing

Only keep the Darwin Core columns:

distribution %<>% select(starts_with("dwc_"))

Drop the dwc_ prefix:

colnames(distribution) <- str_replace(colnames(distribution), "dwc_", "")

Sort on taxonID:

distribution %<>% arrange(taxonID)

Preview data:

distribution %>% head()

Save to CSV:

write_csv(distribution, here("data", "processed", "nonvalidated", "distribution.csv"), na = "")