vignettes/gbif_name_match.Rmd
Working with different partners, institutes and researchers results in a diversity of taxonomic names being used to identify species. This complicates comparisons among datasets, because we often want to aggregate data or filter on specific species. Translating all species names to a common taxonomic backbone, which provides a unique ID for each species name, makes this possible. The `gbif_species_name_match` function supports matching against the GBIF taxonomic backbone.
The function adds species information from the GBIF backbone to any data table (`data.frame`) by requesting this information from the GBIF API. For each match, the corresponding accepted name is looked up. Nevertheless, matching errors will always occur, so the results still need to be checked!
The `gbif_species_name_match` function extends the name matching function provided by rgbif so that it works on a `data.frame`.
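As a rough illustration of what "extending a single-name matcher to a `data.frame`" involves, the sketch below matches each distinct name once and joins the results back onto the original rows with `merge()`. It is not the inborutils implementation: `mock_backbone_lookup()` is a hypothetical offline stand-in for the GBIF API (it does not call rgbif), and `match_names_sketch()` is a hypothetical helper. The two backbone keys are copied from the output tables in this vignette.

```r
# HYPOTHETICAL offline stand-in for a GBIF backbone lookup (not the rgbif API):
# returns a one-row data.frame with a few backbone fields for a given name.
mock_backbone_lookup <- function(name) {
  backbone <- data.frame(
    verbatim = c("Alopochen aegyptiaca", "Cotoneaster hylmoei"),
    usageKey = c(2498252L, 3025758L),
    rank = c("SPECIES", "SPECIES"),
    stringsAsFactors = FALSE
  )
  hit <- backbone[backbone$verbatim == name, c("usageKey", "rank")]
  if (nrow(hit) == 0) {
    # No match: return NAs so that every input row still gets an entry
    hit <- data.frame(usageKey = NA_integer_, rank = NA_character_,
                      stringsAsFactors = FALSE)
  }
  hit
}

# HYPOTHETICAL helper: match each distinct name once, then join the
# results back onto all rows of the data.frame (left join)
match_names_sketch <- function(df, name) {
  distinct_names <- unique(df[[name]])
  matches <- do.call(rbind, lapply(distinct_names, mock_backbone_lookup))
  matches[[name]] <- distinct_names
  merge(df, matches, by = name, all.x = TRUE, sort = FALSE)
}

species <- data.frame(
  speciesName = c("Alopochen aegyptiaca", "Cotoneaster hylmoei", "Unknown name"),
  kingdom = c("Animalia", "Plantae", "Plantae"),
  stringsAsFactors = FALSE
)
result <- match_names_sketch(species, name = "speciesName")
```

Matching distinct names only once keeps the number of lookups (API requests, in the real case) down when a name occurs in many rows. Note that `merge()` does not guarantee the original row order.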
To use the function, load the inborutils package with `library(inborutils)`.
Consider the example data set `species_example`:

```r
knitr::kable(species_example)
```
speciesName | kingdom | euConcernStatus |
---|---|---|
Alopochen aegyptiaca | Animalia | under consideration |
Cotoneaster ganghobaensis | Plantae | NA |
Cotoneaster hylmoei | Plantae | NA |
To add the species information, using the `speciesName` column and the default fields:
```r
my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName")
## [1] "All column names present"
## New names:
## • `kingdom` -> `kingdom...2`
## • `kingdom` -> `kingdom...10`
knitr::kable(my_data_update)
```
speciesName | kingdom...2 | euConcernStatus | usageKey | scientificName | rank | order | matchType | phylum | kingdom...10 | genus | class | confidence | synonym | status | family |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | SPECIES | Anseriformes | EXACT | Chordata | Animalia | Alopochen | Aves | 99 | FALSE | ACCEPTED | Anatidae |
Cotoneaster ganghobaensis | Plantae | NA | 3025989 | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 99 | FALSE | ACCEPTED | Rosaceae |
Cotoneaster hylmoei | Plantae | NA | 3025758 | Cotoneaster hylmoei Flinck & J.Fryer | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 99 | FALSE | ACCEPTED | Rosaceae |
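Since matches still need to be checked, one option is to flag the rows whose match is not clearly reliable and review those by hand. The sketch below uses only columns that appear in the output above (`matchType`, `confidence`, `synonym`, `status`). The review rule, the threshold of 95 and the helper name `needs_review()` are assumptions, and the last row is a hypothetical example, not real GBIF output.

```r
# First three rows copied from the output table above; the last row is a
# HYPOTHETICAL fuzzy synonym match, added only to show a flagged row.
matched <- data.frame(
  speciesName = c("Alopochen aegyptiaca", "Cotoneaster ganghobaensis",
                  "Cotoneaster hylmoei", "Hypothetical name"),
  matchType   = c("EXACT", "EXACT", "EXACT", "FUZZY"),
  confidence  = c(99, 99, 99, 85),
  synonym     = c(FALSE, FALSE, FALSE, TRUE),
  status      = c("ACCEPTED", "ACCEPTED", "ACCEPTED", "SYNONYM"),
  stringsAsFactors = FALSE
)

# ASSUMED review rule (tune to your needs): flag anything that is not an
# exact, accepted, non-synonym match with a confidence of at least
# min_confidence. %in% and is.na() make missing values count as "flag".
needs_review <- function(x, min_confidence = 95) {
  !(x$matchType %in% "EXACT") |
    !(x$status %in% "ACCEPTED") |
    x$synonym %in% TRUE |
    is.na(x$confidence) | x$confidence < min_confidence
}

to_check <- matched[needs_review(matched), "speciesName"]
to_check
```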
If the default fields (`usageKey`, `scientificName`, `rank`, `order`, `matchType`, `phylum`, `kingdom`, `genus`, `class`, `confidence`, `synonym`, `status`, `family`) do not suit your needs, you can change them with the `gbif_terms` argument, for example:
```r
gbif_terms_to_use <- c("canonicalName", "order")
my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName",
                                          gbif_terms = gbif_terms_to_use)
## [1] "All column names present"
knitr::kable(my_data_update)
```
speciesName | kingdom | euConcernStatus | canonicalName | order |
---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | Alopochen aegyptiaca | Anseriformes |
Cotoneaster ganghobaensis | Plantae | NA | Cotoneaster ganghobaensis | Rosales |
Cotoneaster hylmoei | Plantae | NA | Cotoneaster hylmoei | Rosales |
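Rather than typing a new list of fields from scratch, you can start from the defaults listed above and adjust them with base R set operations. This is a small sketch that assumes the default list in this vignette matches the version of inborutils you have installed; the variable names are illustrative.

```r
# Default GBIF fields as listed in this vignette (check ?gbif_species_name_match
# for the defaults of your installed version)
default_terms <- c("usageKey", "scientificName", "rank", "order", "matchType",
                   "phylum", "kingdom", "genus", "class", "confidence",
                   "synonym", "status", "family")

# Keep the defaults but drop some higher ranks you do not need...
fewer_terms <- setdiff(default_terms, c("phylum", "class", "order"))

# ...or extend the defaults with an extra field such as canonicalName
more_terms <- union(default_terms, "canonicalName")
```

Either vector can then be passed as `gbif_terms = fewer_terms` (or `more_terms`).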
If the name of a GBIF field is already used as a column name in your `data.frame`, the suffix `1` is appended to the GBIF column name and a warning is returned. For example:
```r
df <- species_example
names(df) <- c("scientificName", names(species_example)[2:3])
gbif_terms_to_use <- c("scientificName", "order")
my_data_update <- gbif_species_name_match(df,
                                          name = "scientificName",
                                          gbif_terms = gbif_terms_to_use)
## [1] "All column names present"
## Warning in gbif_species_name_match(df, name = "scientificName", gbif_terms
## = gbif_terms_to_use): Column with names 'scientificName' is also one
## of the returned gbif_terms. GBIF column name is authomatically recalled
## 'scientificName1'.
knitr::kable(my_data_update)
```
scientificName...1 | kingdom | euConcernStatus | scientificName...4 | order |
---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | Alopochen aegyptiaca (Linnaeus, 1766) | Anseriformes |
Cotoneaster ganghobaensis | Plantae | NA | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | Rosales |
Cotoneaster hylmoei | Plantae | NA | Cotoneaster hylmoei Flinck & J.Fryer | Rosales |
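The renaming rule described above can be sketched in a few lines of base R. `rename_clashing_terms()` is a hypothetical helper that illustrates the rule; it is not the inborutils source. Note that the output above also shows additional name repair of the two `scientificName` columns, so the final column names in your own output may differ from this sketch.

```r
# HYPOTHETICAL helper illustrating the rule: GBIF terms that clash with an
# existing column name get the suffix "1", and a warning is emitted.
rename_clashing_terms <- function(df_names, gbif_terms) {
  clashing <- gbif_terms %in% df_names
  if (any(clashing)) {
    warning("GBIF terms also used as column names, renamed with suffix 1: ",
            paste(gbif_terms[clashing], collapse = ", "), call. = FALSE)
  }
  ifelse(clashing, paste0(gbif_terms, "1"), gbif_terms)
}

new_names <- rename_clashing_terms(
  df_names = c("scientificName", "kingdom", "euConcernStatus"),
  gbif_terms = c("scientificName", "order")
)
new_names
```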
Sometimes a scientific name occurs in different kingdoms; such names are called hemihomonyms. To avoid misidentifying a taxon, it can then be useful to specify the kingdom it belongs to. You can also add other taxonomy-related parameters such as `rank`, `family` or `genus`. In the example below, `kingdom = "kingdom"` refers to the `kingdom` column of `species_example`. It is also possible to pass parameters that are not taxonomy-related, e.g. `strict`, which gives more control over the matching behaviour. For more information about all parameters accepted by GBIF, see the documentation of the GBIF species match API.
```r
my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName",
                                          kingdom = "kingdom",
                                          strict = TRUE)
## [1] "All column names present"
knitr::kable(my_data_update)
```
speciesName | kingdom...2 | euConcernStatus | usageKey | scientificName | rank | order | matchType | phylum | kingdom...10 | genus | class | confidence | synonym | status | family |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | SPECIES | Anseriformes | EXACT | Chordata | Animalia | Alopochen | Aves | 100 | FALSE | ACCEPTED | Anatidae |
Cotoneaster ganghobaensis | Plantae | NA | 3025989 | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 100 | FALSE | ACCEPTED | Rosaceae |
Cotoneaster hylmoei | Plantae | NA | 3025758 | Cotoneaster hylmoei Flinck & J.Fryer | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 100 | FALSE | ACCEPTED | Rosaceae |
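To see why a per-row kingdom helps with hemihomonyms, consider the genus name *Oenanthe*, which is used both for a bird genus (wheatears, family Muscicapidae) and for a plant genus (family Apiaceae). The sketch below uses a hypothetical offline lookup table and a hypothetical `match_one()` helper instead of the GBIF API; it only shows how pairing each name with the kingdom from the same row removes the ambiguity.

```r
# HYPOTHETICAL offline lookup keyed on (name, kingdom); not the GBIF API.
lookup <- data.frame(
  name    = c("Oenanthe", "Oenanthe"),
  kingdom = c("Animalia", "Plantae"),
  family  = c("Muscicapidae", "Apiaceae"),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL matcher: optionally restrict to a kingdom and return the family
# only when exactly one candidate remains (NA if ambiguous or not found)
match_one <- function(name, kingdom = NA_character_) {
  hits <- lookup[lookup$name == name, ]
  if (!is.na(kingdom)) hits <- hits[hits$kingdom == kingdom, ]
  if (nrow(hits) == 1) hits$family else NA_character_
}

records <- data.frame(
  speciesName = c("Oenanthe", "Oenanthe"),
  kingdom     = c("Animalia", "Plantae"),
  stringsAsFactors = FALSE
)

# Pair each name with the kingdom from the same row
records$family <- mapply(match_one, records$speciesName, records$kingdom,
                         USE.NAMES = FALSE)

# Without a kingdom the name is ambiguous, so no family is returned
match_one("Oenanthe")
```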