4 Map user data
4.1 Read temporary user data
We start from the temporary user data saved in the TSV file users.tsv in the folder data/interim:
# read_tsv() and the col_*() helpers come from readr
library(readr)

users <- read_tsv(
  here::here("data", "interim", "users.tsv"),
  col_types = cols(
    .default = col_character(),
    Nummer = col_double(),
    Wachtwoord = col_logical()
  )
)
Number of users:
## [1] 2039
4.2 Map user data
We map the original fields to SOVON fields, denoted by the prefix sovon_.
4.2.1 E-mail
## mutate: new variable 'sovon_user_email' (character) with 1,211 unique values and 40% NA
4.2.2 First name
## mutate: new variable 'sovon_user_first_name' (character) with 897 unique values and 1% NA
4.2.3 Last name
## mutate: new variable 'sovon_user_last_name' (character) with 1,868 unique values and 0% NA
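The chunks behind the two name mappings are not shown in the rendered output. A minimal sketch, assuming the SOVON fields are plain copies of the source columns Voornaam and Familienaam (both names appear in the list of dropped variables in section 4.3). The toy table and its values are invented for illustration, and the actual mapping may do more, such as trimming whitespace.

```r
library(dplyr)

# Toy stand-in for the users table; the values are invented for illustration.
users <- tibble(
  Nummer = c(1, 2, 3),
  Voornaam = c("An", NA, "Piet"),
  Familienaam = c("Peeters", "Janssens", "De Smet")
)

# Assumption: the name fields are copied unchanged from the source columns.
users <- users %>%
  mutate(
    sovon_user_first_name = Voornaam,
    sovon_user_last_name = Familienaam
  )
```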
4.2.4 Address
## mutate: new variable 'sovon_user_address' (character) with 196 unique values and 90% NA
4.2.5 Place
## mutate: new variable 'sovon_user_place' (character) with 174 unique values and 90% NA
4.2.6 Postal code
## mutate: new variable 'sovon_user_postal_code' (character) with 175 unique values and 91% NA
4.2.7 Country
Countries present:
## distinct: removed 2,026 rows (99%), 13 rows remaining
## mutate: new variable 'sovon_user_country' (character) with 13 unique values and 66% NA
4.2.8 User ID
User identifiers are provided by SOVON, so we set this field to NA.
## mutate: new variable 'sovon_user_id' (logical) with one unique value and 100% NA
4.2.9 User reference
We use the unique ID in Nummer:
## mutate: new variable 'sovon_user_reference' (double) with 2,039 unique values and 0% NA
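The two identifier fields can be sketched as follows. This is an illustration on an invented toy table, not the original chunk: it assumes the user ID is a bare NA (which gives the logical column reported above) and that the reference is a direct copy of Nummer.

```r
library(dplyr)

# Toy stand-in for the users table; the values are invented for illustration.
users <- tibble(Nummer = c(101, 102, 103))

users <- users %>%
  mutate(
    # A bare NA is logical and is recycled to every row.
    sovon_user_id = NA,
    # Assumption: the reference is copied from Nummer without changes.
    sovon_user_reference = Nummer
  )
```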
4.3 Save user data
Export the SOVON fields to crbirding_users:
## select: dropped 13 variables (Nummer, Familienaam, Voornaam, Adres, Postcode, …)
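One way to keep only the mapped fields is to select on the prefix. The sketch below assumes that approach and uses an invented toy table; the original chunk is not shown and may select the columns differently.

```r
library(dplyr)

# Toy table mixing original and mapped columns; the values are invented.
users <- tibble(
  Nummer = c(1, 2),
  Voornaam = c("An", "Piet"),
  sovon_user_reference = c(1, 2),
  sovon_user_first_name = c("An", "Piet")
)

# Keep only the columns whose name starts with the sovon_ prefix.
crbirding_users <- users %>%
  select(starts_with("sovon_"))
```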
Remove the prefix sovon_:
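A possible way to strip the prefix from every column name, shown on an invented toy table. It assumes dplyr's rename_with() combined with base sub(); the original chunk is not shown and may use another function.

```r
library(dplyr)

# Toy table with prefixed column names; the values are invented.
crbirding_users <- tibble(
  sovon_user_id = NA,
  sovon_user_reference = c(1, 2)
)

# Drop the leading "sovon_" from each column name.
crbirding_users <- crbirding_users %>%
  rename_with(~ sub("^sovon_", "", .x))
```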
The desired order of columns in crbirding_users:
cr_users_cols <- c(
  "user_id", "user_reference", "user_email", "user_first_name",
  "user_last_name", "user_address", "user_postal_code", "user_place",
  "user_country", "user_language", "user_role"
)
Fields still not mapped:
## [1] "user_role"
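The check above amounts to comparing the desired columns with those already present. A minimal sketch in base R, where mapped_cols is a hypothetical stand-in for names(crbirding_users):

```r
cr_users_cols <- c(
  "user_id", "user_reference", "user_email", "user_first_name",
  "user_last_name", "user_address", "user_postal_code", "user_place",
  "user_country", "user_language", "user_role"
)

# Hypothetical stand-in for names(crbirding_users) at this point.
mapped_cols <- c(
  "user_email", "user_first_name", "user_last_name", "user_address",
  "user_place", "user_postal_code", "user_country", "user_language",
  "user_id", "user_reference"
)

# Desired columns that are not present yet.
not_mapped <- cr_users_cols[!cr_users_cols %in% mapped_cols]
not_mapped
```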
The field user_role cannot be filled at the moment: it will be mapped at the end of the next chapter.
Set column order:
## select: columns reordered (user_id, user_reference, user_email, user_first_name, user_last_name, …)
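Reordering can be sketched with select() and any_of(), which follows the order of cr_users_cols and silently skips columns that do not exist yet, such as user_role. This is an assumption about the approach, and the toy table is invented; the original chunk is not shown.

```r
library(dplyr)

cr_users_cols <- c(
  "user_id", "user_reference", "user_email", "user_first_name",
  "user_last_name", "user_address", "user_postal_code", "user_place",
  "user_country", "user_language", "user_role"
)

# Toy table with columns in arbitrary order; the values are invented.
crbirding_users <- tibble(
  user_last_name = c("Peeters", "De Smet"),
  user_reference = c(1, 2),
  user_id = NA
)

# any_of() tolerates names that are missing from the table.
crbirding_users <- crbirding_users %>%
  select(any_of(cr_users_cols))
```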
Preview data (e-mail, first and last names removed for privacy reasons):
## select: dropped 3 variables (user_email, user_first_name, user_last_name)
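A privacy-preserving preview along these lines could look as follows. The sketch uses an invented toy table and assumes head() is used to limit the number of rows shown.

```r
library(dplyr)

# Toy table; all values are invented for illustration.
crbirding_users <- tibble(
  user_id = NA,
  user_reference = c(1, 2),
  user_email = c("an@example.org", NA),
  user_first_name = c("An", "Piet"),
  user_last_name = c("Peeters", "De Smet"),
  user_place = c(NA, "Gent")
)

# Drop the personal fields before printing the first rows.
preview <- crbirding_users %>%
  select(-c(user_email, user_first_name, user_last_name)) %>%
  head(n = 10)
preview
```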
Save to the CSV file crbirding_users.csv in ./data/processed/, as requested by SOVON:
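The export step can be sketched with readr's write_csv(). To keep the example runnable anywhere, it writes to a temporary directory rather than to the project's data/processed folder; writing NA as an empty string is an assumption, not a documented SOVON requirement, and the toy table is invented.

```r
library(readr)

# Toy table; the values are invented for illustration.
crbirding_users <- data.frame(
  user_id = NA,
  user_reference = c(1, 2),
  user_place = c(NA, "Gent")
)

# Temporary stand-in for ./data/processed/ in the project.
out_dir <- file.path(tempdir(), "data", "processed")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "crbirding_users.csv")

# Assumption: missing values are written as empty strings.
write_csv(crbirding_users, out_file, na = "")
```

In the project itself the target path would presumably be built with here::here("data", "processed", "crbirding_users.csv"), mirroring the read step in section 4.1.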