This document describes how we map the validated red list data for Flanders to Darwin Core. The source file for this document can be found here.

1 Setup

Load libraries:

# devtools::install_github(c("tazinho/snakecase"))

library(tidyverse)      # To do data science
library(magrittr)       # To use %<>% pipes
library(here)           # To find files
library(janitor)        # To clean input data
library(digest)         # To generate hashes
library(rgbif)          # To use GBIF services
library(snakecase)      # To convert case of descriptions

2 Read source data

Create a data frame input_data from the source data:

input_data <- read_delim(here("data", "raw", "tblFlandersRedListsAll.tsv"), delim = "\t")

Filter on Validated taxa:

input_data %<>% filter(Validated == "Validated")

Number of records:

input_data %>% nrow()

## [1] 3063

3 Process source data

3.1 Tidy data

Clean data somewhat:

input_data %<>%
  remove_empty("rows") %>%    # Remove empty rows
  clean_names()               # Have sensible (lowercase) column names

3.2 Scientific names

The scientific names contain trailing spaces:

input_data %<>% mutate(
  speciesname_as_published = str_trim(speciesname_as_published),
  speciesname_unique = str_trim(speciesname_unique)
)
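As a minimal illustration of what `str_trim()` does in this step, the sketch below uses a made-up data frame. The object `example_names` and its values are hypothetical and not taken from the source data.

```r
library(dplyr)
library(stringr)

# Hypothetical example rows, not taken from the source data
example_names <- tibble(
  speciesname_as_published = c("Lacerta agilis ", " Vipera berus")
)

# Strip leading and trailing whitespace, as in the step above
example_names <- example_names %>%
  mutate(speciesname_as_published = str_trim(speciesname_as_published))

# Check whether any name still starts or ends with whitespace
any(str_detect(example_names$speciesname_as_published, "^\\s|\\s$"))
```

A check like the last line can be run on the real columns to confirm that no padded names remain.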

3.3 Taxon ranks

Use the GBIF nameparser to retrieve nomenclatural information for the scientific names in the checklist:

parsed_names <- input_data %>%
  distinct(speciesname_as_published) %>%
  pull() %>% # Create vector from dataframe
  parsenames() # An rgbif function

The nameparser function also provides information about the rank of the taxon (in rankmarker). Here we join this information with our checklist. Cleaning these ranks will be done in the Taxon Core mapping:

input_data %<>% left_join(
  select(parsed_names, scientificname, rankmarker),
  by = c("speciesname_as_published" = "scientificname")
)
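Because this is a left join, names that the parser did not return keep their row but get an empty rankmarker. The sketch below shows this with hypothetical stand-ins for `input_data` and `parsed_names` (the names `example_input`, `example_parsed` and all values are made up for illustration).

```r
library(dplyr)

# Hypothetical stand-ins for input_data and the parsenames() output
example_input <- tibble(
  speciesname_as_published = c("Aus bus", "Aus bus subsp. cus", "Dus")
)
example_parsed <- tibble(
  scientificname = c("Aus bus", "Aus bus subsp. cus"),
  rankmarker = c("sp.", "subsp.")
)

example_joined <- example_input %>%
  left_join(
    select(example_parsed, scientificname, rankmarker),
    by = c("speciesname_as_published" = "scientificname")
  )

# Names without a match keep their row, with rankmarker set to NA
example_joined %>% filter(is.na(rankmarker))
```

Running the same filter on the real data would list the names for which no rank information was retrieved.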

3.4 References

Since the source data only includes codes for references, we load an additional file with more complete reference information:

references <- read_csv(here("data", "raw", "references.csv"))

Join source data with references:

input_data %<>% left_join(
  references,
  by = c("reference" = "reference", "taxonomic_group" = "taxonomic_group")
)
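A join on two key columns only leaves the number of records unchanged if each combination of reference and taxonomic_group occurs once in the reference file. The sketch below illustrates a simple row-count check on made-up data. The column `citation` and all values are hypothetical, since the structure of references.csv is not shown here.

```r
library(dplyr)

# Hypothetical stand-ins; only the two key column names come from the join above
example_input <- tibble(
  unique_id = c("a1", "a2", "a3"),
  reference = c("R1", "R1", "R2"),
  taxonomic_group = c("Birds", "Fishes", "Birds")
)
example_references <- tibble(
  reference = c("R1", "R1", "R2"),
  taxonomic_group = c("Birds", "Fishes", "Birds"),
  citation = c("Citation A", "Citation B", "Citation C")  # hypothetical column
)

example_joined <- example_input %>%
  left_join(example_references, by = c("reference", "taxonomic_group"))

# With unique keys in the reference table, the row count stays the same
nrow(example_joined) == nrow(example_input)
```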

3.5 Preview data

Show the number of taxa per red list and taxonomic group:

Show the number of taxa per kingdom and rank:

Preview data:

input_data %>% head()

4 Taxon core

4.1 Pre-processing

taxon <- input_data

4.2 Term mapping

Map the data to Darwin Core Taxon.

4.2.1 language

taxon %<>% mutate(dwc_language = "en")

4.2.2 license

taxon %<>% mutate(dwc_license = "http://creativecommons.org/publicdomain/zero/1.0/")

4.2.3 rightsHolder

taxon %<>% mutate(dwc_rightsHolder = "INBO")

4.2.4 accessRights

taxon %<>% mutate(dwc_accessRights = "https://www.inbo.be/en/norms-data-use")

4.2.5 datasetID

taxon %<>% mutate(dwc_datasetID = "https://doi.org/10.15468/8tk3tk")

4.2.6 institutionCode

taxon %<>% mutate(dwc_institutionCode = "INBO")

4.2.7 datasetName

taxon %<>% mutate(dwc_datasetName = "Validated Red Lists of Flanders, Belgium")

4.2.8 taxonID

taxon %<>% mutate(dwc_taxonID = unique_id)

4.2.9 scientificName

Use the name as originally published on the checklist:

taxon %<>% mutate(dwc_scientificName = speciesname_as_published)

4.2.10 kingdom

taxon %<>% mutate(dwc_kingdom = kingdom)

4.2.11 phylum

taxon %<>% mutate(dwc_phylum = phylum)

4.2.12 class

taxon %<>% mutate(dwc_class = class)

4.2.13 order

taxon %<>% mutate(dwc_order = order)

4.2.14 family

taxon %<>% mutate(dwc_family = family)

4.2.15 genus

taxon %<>% mutate(dwc_genus = genus)

4.2.16 taxonRank

Inspect values:

taxon %>%
  group_by(rankmarker) %>%
  count()

Map values by recoding to the GBIF rank vocabulary:

taxon %<>% mutate(dwc_taxonRank = recode(rankmarker,
  "sp." = "species",
  "infrasp." = "infraspecificname",
  "subsp." = "subspecies",
  "var." = "variety",
  .default = "",
  .missing = ""
))
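To make the effect of `.default` and `.missing` concrete, the sketch below applies the same recoding to a small made-up vector. The values `"f."` and `NA` are included only to show how unmapped and missing rank markers are handled.

```r
library(dplyr)

# Made-up rank markers, including an unmapped value ("f.") and a missing one
example_ranks <- c("sp.", "subsp.", "var.", "f.", NA)

example_mapped <- recode(example_ranks,
  "sp." = "species",
  "infrasp." = "infraspecificname",
  "subsp." = "subspecies",
  "var." = "variety",
  .default = "",   # Any other marker becomes an empty string
  .missing = ""    # NA becomes an empty string as well
)

example_mapped
```

Both the unmapped marker and the missing value end up as empty strings, so taxonRank is left blank for them.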

Inspect mapped values:

taxon %>%
  group_by(rankmarker, dwc_taxonRank) %>%
  count()

4.2.17 vernacularName

taxon %<>% mutate(dwc_vernacularName = speciesname_dutch)

4.2.18 nomenclaturalCode

taxon %<>% mutate(dwc_nomenclaturalCode = case_when(
  kingdom == "Animalia" ~ "ICZN",
  kingdom == "Plantae" ~ "ICBN"
))
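Since this `case_when()` has no fallback condition, any kingdom other than the two listed would get NA, which is written as an empty value in the CSV. The sketch below shows that behaviour on a made-up vector. Whether the source data contains other kingdoms is not stated here, and `"Fungi"` is purely illustrative.

```r
library(dplyr)

# Made-up kingdoms; "Fungi" is only here to show the unmatched case
example_kingdoms <- c("Animalia", "Plantae", "Fungi")

example_codes <- case_when(
  example_kingdoms == "Animalia" ~ "ICZN",
  example_kingdoms == "Plantae" ~ "ICBN"
)

# Values that match no condition are returned as NA
example_codes
```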

4.3 Post-processing

Only keep the Darwin Core columns:

taxon %<>% select(starts_with("dwc_"))

Drop the dwc_ prefix:

colnames(taxon) <- str_replace(colnames(taxon), "dwc_", "")
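As a small illustration of the renaming step, the sketch below strips the prefix from a few column names. It uses the anchored pattern `"^dwc_"`, which is an optional variation that only matches at the start of a name. For the column names used in this mapping it gives the same result as the unanchored pattern.

```r
library(stringr)

# A few of the prefixed column names created during term mapping
example_columns <- c("dwc_taxonID", "dwc_scientificName", "dwc_source")

# "^" restricts the match to the start of each column name
example_renamed <- str_replace(example_columns, "^dwc_", "")

example_renamed
```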

Sort on taxonID (to maintain some consistency between updates of the dataset):

taxon %<>% arrange(taxonID)

Preview data:

taxon %>% head()

Save to CSV:

write_csv(taxon, here("data", "processed", "validated", "taxon.csv"), na = "")

5 Distribution extension

5.1 Pre-processing

distribution <- input_data

5.2 Term mapping

Map the data to Species Distribution.

5.2.1 taxonID

distribution %<>% mutate(dwc_taxonID = unique_id)

5.2.2 locationID

distribution %<>% mutate(dwc_locationID = "ISO_3166:BE-VLG")

5.2.3 locality

distribution %<>% mutate(dwc_locality = "Flanders")

5.2.4 countryCode

distribution %<>% mutate(dwc_countryCode = "BE")

5.2.5 occurrenceStatus

Set to absent for regionally extinct species, otherwise present:

distribution %<>% mutate(dwc_occurrenceStatus = recode(rlc, 
  "RE" = "absent",
  .default = "present",
  .missing = "present"
))

5.2.6 threatStatus

There are two red list category columns:

distribution %>%
  group_by(rlc, rlc_as_published) %>%
  count()

This will be mapped as follows:

rlc → threatStatus: IUCN equivalent of the Flemish status, which follows the expected vocabulary.
rlc_as_published → occurrenceRemarks: Flemish status as originally published in the red list. It does not follow the vocabulary, but is important to include.

distribution %<>% mutate(dwc_threatStatus = rlc)

5.2.7 establishmentMeans

distribution %<>% mutate(dwc_establishmentMeans = case_when(
  rlc_as_published == "Niet-inheemse broedvogel" ~ "introduced",
  TRUE ~ "native"
))
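Note that the `TRUE ~ "native"` fallback also applies to records where rlc_as_published is missing, because a comparison with NA does not count as a match in `case_when()`. The sketch below shows this on a made-up vector. The value `"Bedreigd"` is only an illustrative status.

```r
library(dplyr)

# Made-up statuses; the second value is illustrative, the third is missing
example_status <- c("Niet-inheemse broedvogel", "Bedreigd", NA)

example_means <- case_when(
  example_status == "Niet-inheemse broedvogel" ~ "introduced",
  TRUE ~ "native"   # Catches every other value, including NA
)

example_means
```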

5.2.8 eventDate

distribution %<>% mutate(dwc_eventDate = year_published)

5.2.9 source

The source for the distribution information is the red list:

distribution %<>% mutate(dwc_source = source_red_list)

5.2.10 occurrenceRemarks

distribution %<>% mutate(dwc_occurrenceRemarks = rlc_as_published)

5.3 Post-processing

Only keep the Darwin Core columns:

distribution %<>% select(starts_with("dwc_"))

Drop the dwc_ prefix:

colnames(distribution) <- str_replace(colnames(distribution), "dwc_", "")

Sort on taxonID:

distribution %<>% arrange(taxonID)

Preview data:

distribution %>% head()

Save to CSV:

write_csv(distribution, here("data", "processed", "validated", "distribution.csv"), na = "")

6 Description extension

6.1 Pre-processing

description_ext <- input_data

Gather description columns to rows:

description_ext %<>% gather(
  key = type, value = description,
  biome, biotope1, biotope2, lifespan, cuddliness, mobility, spine, nutrient_level,
  na.rm = TRUE
)
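To show what this reshaping does, the sketch below gathers two of the description columns from a tiny made-up data frame. The object names and values are hypothetical. Each non-missing trait value becomes its own row, and `na.rm = TRUE` drops the combinations without a value.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with two of the description columns
example_wide <- tibble(
  unique_id = c("a1", "a2"),
  biome = c("Terrestrial", "Freshwater"),
  mobility = c("Mobile", NA)
)

example_long <- example_wide %>%
  gather(
    key = type, value = description,
    biome, mobility,
    na.rm = TRUE   # Drop rows where the trait has no value
  )

# One row per taxon and trait: two for biome, one for mobility
example_long
```

A taxon can therefore appear in several rows of the description extension, one for each trait that has a value.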

Rename biotope1 and biotope2 to biotope:

description_ext %<>% mutate(type = recode(type,
  "biotope1" = "biotope",
  "biotope2" = "biotope"
))

Inspect values:

description_ext %>%
  select(type, description) %>%
  group_by(type, description) %>%
  count()

Convert descriptions from CamelCase to lower case:

description_ext %<>% mutate(
  clean_description = str_to_lower(to_sentence_case(description)))
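For readers who want to see the effect of this conversion without the snakecase package, the sketch below approximates it with a regular expression from stringr. This is a rough stand-in and not the `to_sentence_case()` implementation, and the example values are made up.

```r
library(stringr)

# Made-up CamelCase descriptions
example_descriptions <- c("VeryMobile", "Short", "NutrientPoor")

# Insert a space between a lower case letter and the upper case letter
# that follows it, then convert everything to lower case
example_clean <- str_to_lower(
  str_replace_all(example_descriptions, "([a-z])([A-Z])", "\\1 \\2")
)

example_clean
```

The regular expression only handles simple CamelCase boundaries, so values with digits, abbreviations or punctuation may come out differently than with snakecase.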

Inspect mapped values:

description_ext %>%
  group_by(description, clean_description) %>%
  count()

6.2 Term mapping

Map the data to Taxon Description.

6.2.1 taxonID

description_ext %<>% mutate(dwc_taxonID = unique_id)

6.2.2 description

description_ext %<>% mutate(dwc_description = clean_description)

6.2.3 type

description_ext %<>% mutate(dwc_type = case_when(
  type == "nutrient_level" ~ "nutrient level",
  TRUE ~ type
))

6.2.4 source

The source for the life-history traits is not the red list, but a separate source:

description_ext %<>% mutate(dwc_source = source_for_traits)

6.2.5 language

description_ext %<>% mutate(dwc_language = "en")

6.3 Post-processing

Only keep the Darwin Core columns:

description_ext %<>% select(starts_with("dwc_"))

Drop the dwc_ prefix:

colnames(description_ext) <- str_replace(colnames(description_ext), "dwc_", "")

Sort on taxonID:

description_ext %<>% arrange(taxonID)

Preview data:

description_ext %>% head()

Save to CSV:

write_csv(description_ext, here("data", "processed", "validated", "description.csv"), na = "")

Darwin Core mapping

For: Validated red lists of Flanders, Belgium

Dimitri Brosens

Peter Desmet

Lien Reyserhove

Dirk Maes

2020-02-07
