gbif_name_match.Rmd
Working with different partners, institutes and researchers results in a
variety of taxonomic names being used for the same species. This makes
datasets harder to compare, since the goal is often to aggregate data or
to filter on specific species. Translating all species names to a common
taxonomic backbone, which guarantees a unique ID for each species name,
makes this possible. The `gbif_species_name_match` function supports
matching with the GBIF taxonomic backbone.
This function adds species information from the GBIF backbone to any
data table (`data.frame`) by requesting it from the GBIF API. For each
match, the corresponding accepted name is looked up. Matching errors
remain possible, however, so the results should always be checked.
The `gbif_species_name_match` function extends the matching function
provided by `rgbif` so that it works with a `data.frame`.
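The idea of extending a per-name matcher to a whole `data.frame` can be sketched as follows. This is a minimal illustration and not the `inborutils` implementation: `match_names_sketch` and its `matcher` argument are hypothetical names, and the sketch assumes that `rgbif::name_backbone()` returns a single row of named fields for each name.

```r
# Hypothetical helper (NOT the inborutils code): look up every name with a
# matcher function and bind the requested fields to the input rows.
match_names_sketch <- function(df, name_col, fields,
                               matcher = rgbif::name_backbone) {
  stopifnot(is.data.frame(df), name_col %in% names(df))
  matched <- lapply(as.character(df[[name_col]]), function(nm) {
    res <- as.data.frame(matcher(name = nm), stringsAsFactors = FALSE)
    # Keep only the requested fields; use NA for fields that were not returned
    out <- lapply(fields, function(f) if (f %in% names(res)) res[[f]][1] else NA)
    names(out) <- fields
    as.data.frame(out, stringsAsFactors = FALSE)
  })
  # One row of matched fields per input row, in the original order
  cbind(df, do.call(rbind, matched))
}

# Usage (requires the rgbif package and network access):
# match_names_sketch(species_example, "speciesName", c("usageKey", "status"))
```

Passing the matcher as an argument keeps the row-wise logic separate from the API call, so it can be tried offline with a stub function.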
The functionality is available after loading the `inborutils` package:
library(inborutils)
Consider the example data set `species_example`:
knitr::kable(species_example)
speciesName | kingdom | euConcernStatus |
---|---|---|
Alopochen aegyptiaca | Animalia | under consideration |
Cotoneaster ganghobaensis | Plantae | NA |
Cotoneaster hylmoei | Plantae | NA |
To add the species information, using the `speciesName` column and the
default fields:
my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName")
## [1] "All column names present"
## New names:
## • `kingdom` -> `kingdom...2`
## • `kingdom` -> `kingdom...10`
knitr::kable(my_data_update)
speciesName | kingdom...2 | euConcernStatus | usageKey | scientificName | rank | order | matchType | phylum | kingdom...10 | genus | class | confidence | synonym | status | family |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | SPECIES | Anseriformes | EXACT | Chordata | Animalia | Alopochen | Aves | 99 | FALSE | ACCEPTED | Anatidae |
Cotoneaster ganghobaensis | Plantae | NA | 3025989 | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 99 | FALSE | ACCEPTED | Rosaceae |
Cotoneaster hylmoei | Plantae | NA | 3025758 | Cotoneaster hylmoei Flinck & J.Fryer | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 98 | TRUE | SYNONYM | Rosaceae |
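Since the results still need to be checked, one simple approach is to flag every row that is not an exact match to an accepted name. The sketch below rebuilds a few columns of the table above by hand so that it runs without network access; the variable names `matched` and `needs_review` are illustrative only.

```r
# A few columns of the matched table above, copied by hand for illustration
matched <- data.frame(
  speciesName = c("Alopochen aegyptiaca",
                  "Cotoneaster ganghobaensis",
                  "Cotoneaster hylmoei"),
  matchType = c("EXACT", "EXACT", "EXACT"),
  confidence = c(99, 99, 98),
  status = c("ACCEPTED", "ACCEPTED", "SYNONYM"),
  stringsAsFactors = FALSE
)

# Flag rows that are not an exact match or not an accepted name
needs_review <- matched$matchType != "EXACT" | matched$status != "ACCEPTED"

# Rows to inspect manually
matched[needs_review, c("speciesName", "matchType", "status")]
```

With these values only `Cotoneaster hylmoei` is flagged, because GBIF reports it as a synonym.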
If the default fields
(`'usageKey', 'scientificName', 'rank', 'order', 'matchType', 'phylum', 'kingdom', 'genus', 'class', 'confidence', 'synonym', 'status', 'family'`)
do not suit your needs, you can change them with the `gbif_terms`
argument, for example:
gbif_terms_to_use <- c("canonicalName", "order")
my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName",
                                          gbif_terms = gbif_terms_to_use)
## [1] "All column names present"
knitr::kable(my_data_update)
speciesName | kingdom | euConcernStatus | canonicalName | order |
---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | Alopochen aegyptiaca | Anseriformes |
Cotoneaster ganghobaensis | Plantae | NA | Cotoneaster ganghobaensis | Rosales |
Cotoneaster hylmoei | Plantae | NA | Cotoneaster hylmoei | Rosales |
If the name of a GBIF field is already used as a column name in your
`data.frame`, the suffix `1` is added to the GBIF column name and a
warning is issued. For example:
df <- species_example
names(df) <- c("scientificName", names(species_example)[2:3])
gbif_terms_to_use <- c("scientificName", "order")
my_data_update <- gbif_species_name_match(df,
                                          name = "scientificName",
                                          gbif_terms = gbif_terms_to_use)
## [1] "All column names present"
## Warning in gbif_species_name_match(df, name = "scientificName", gbif_terms =
## gbif_terms_to_use): Column with names 'scientificName' is also one of the
## returned gbif_terms. GBIF column name is authomatically recalled
## 'scientificName1'.
knitr::kable(my_data_update)
scientificName...1 | kingdom | euConcernStatus | scientificName...4 | order |
---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | Alopochen aegyptiaca (Linnaeus, 1766) | Anseriformes |
Cotoneaster ganghobaensis | Plantae | NA | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | Rosales |
Cotoneaster hylmoei | Plantae | NA | Cotoneaster hylmoei Flinck & J.Fryer | Rosales |
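In the output above the clashing columns end up with positional suffixes, which can be awkward to work with. One way to sidestep the clash is to rename the input columns before matching. The sketch below is an assumption about a convenient workflow and not something the package does for you; the `input_` prefix is an arbitrary choice.

```r
# Input table whose name column clashes with a requested GBIF term
df <- data.frame(
  scientificName = c("Alopochen aegyptiaca",
                     "Cotoneaster ganghobaensis",
                     "Cotoneaster hylmoei"),
  kingdom = c("Animalia", "Plantae", "Plantae"),
  euConcernStatus = c("under consideration", NA, NA),
  stringsAsFactors = FALSE
)
gbif_terms_to_use <- c("scientificName", "order")

# Prefix every input column whose name is also a requested GBIF term
clash <- names(df) %in% gbif_terms_to_use
names(df)[clash] <- paste0("input_", names(df)[clash])

names(df)

# The renamed column can then be passed as the name column
# (requires inborutils and network access):
# gbif_species_name_match(df, name = "input_scientificName",
#                         gbif_terms = gbif_terms_to_use)
```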
Sometimes the same scientific name occurs in different kingdoms; such
names are called hemihomonyms. To avoid a taxon being misidentified, it
can be useful to specify the kingdom it belongs to. You can also add
other taxonomic parameters such as rank, family or genus. It is also
possible to pass non-taxonomic parameters, e.g. `strict`, which gives
more control over the matching behaviour. For more information about all
parameters accepted by GBIF, see the documentation on GBIF match.
my_data_update <- gbif_species_name_match(species_example,
                                          name = "speciesName",
                                          kingdom = "kingdom",
                                          strict = TRUE)
## [1] "All column names present"
knitr::kable(my_data_update)
speciesName | kingdom...2 | euConcernStatus | usageKey | scientificName | rank | order | matchType | phylum | kingdom...10 | genus | class | confidence | synonym | status | family |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alopochen aegyptiaca | Animalia | under consideration | 2498252 | Alopochen aegyptiaca (Linnaeus, 1766) | SPECIES | Anseriformes | EXACT | Chordata | Animalia | Alopochen | Aves | 100 | FALSE | ACCEPTED | Anatidae |
Cotoneaster ganghobaensis | Plantae | NA | 3025989 | Cotoneaster ganghobaensis J.Fryer & B.Hylmö | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 100 | FALSE | ACCEPTED | Rosaceae |
Cotoneaster hylmoei | Plantae | NA | 3025758 | Cotoneaster hylmoei Flinck & J.Fryer | SPECIES | Rosales | EXACT | Tracheophyta | Plantae | Cotoneaster | Magnoliopsida | 100 | TRUE | SYNONYM | Rosaceae |