4 Map user data

4.1 Read temporary user data

We start from the temporary user data saved in the TSV file users.tsv in the folder data/interim:

users <- read_tsv(here::here("data", "interim", "users.tsv"),
  col_types = cols(
    .default = col_character(),
    Nummer = col_double(),
    Wachtwoord = col_logical()
  )
)

Number of users:

nrow(users)
## [1] 2039

4.2 Map user data

We map the original fields to SOVON fields denoted by prefix sovon_.

4.2.1 E-mail

users <-
  users %>%
  mutate(sovon_user_email = Email)
## mutate: new variable 'sovon_user_email' (character) with 1,211 unique values and 40% NA

4.2.2 First name

users <-
  users %>%
  mutate(sovon_user_first_name = Voornaam)
## mutate: new variable 'sovon_user_first_name' (character) with 897 unique values and 1% NA

4.2.3 Last name

users <-
  users %>%
  mutate(sovon_user_last_name = Familienaam)
## mutate: new variable 'sovon_user_last_name' (character) with 1,868 unique values and 0% NA

4.2.4 Address

users <-
  users %>%
  mutate(sovon_user_address = Adres)
## mutate: new variable 'sovon_user_address' (character) with 196 unique values and 90% NA

4.2.5 Place

users <-
  users %>%
  mutate(sovon_user_place = Gemeente)
## mutate: new variable 'sovon_user_place' (character) with 174 unique values and 90% NA

4.2.6 Postal code

users <-
  users %>%
  mutate(sovon_user_postal_code = Postcode)
## mutate: new variable 'sovon_user_postal_code' (character) with 175 unique values and 91% NA

4.2.7 Country

Countries present:

users %>%
  distinct(LandCode)
## distinct: removed 2,026 rows (99%), 13 rows remaining

users <-
  users %>%
  mutate(sovon_user_country = LandCode)
## mutate: new variable 'sovon_user_country' (character) with 13 unique values and 66% NA
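As a complement to distinct(), counting the codes shows how often each country occurs and how many records have no country at all. The following is a minimal sketch on a toy table; users_toy and its values are hypothetical and only stand in for users.

```r
library(dplyr)

# Toy stand-in for `users` (hypothetical values, not the real data)
users_toy <- tibble(LandCode = c("BE", "BE", "NL", NA, NA, NA))

# One row per country code, most frequent first; NA is counted as its own group
country_counts <-
  users_toy %>%
  count(LandCode, sort = TRUE)

country_counts
```

On the real data the same call would be applied to users instead of users_toy.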

4.2.8 User ID

User identifiers will be assigned by SOVON, so we set this field to NA:

users <-
  users %>%
  mutate(sovon_user_id = NA)
## mutate: new variable 'sovon_user_id' (logical) with one unique value and 100% NA

4.2.9 User reference

We use the unique ID in Nummer:

users <-
  users %>%
  mutate(sovon_user_reference = Nummer)
## mutate: new variable 'sovon_user_reference' (double) with 2,039 unique values and 0% NA
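Since the reference must identify each user, it may be worth checking explicitly that it has no duplicates or missing values. A minimal sketch in base R, assuming a toy vector nummer_toy (hypothetical) in place of users$Nummer:

```r
# Toy stand-in for users$Nummer (hypothetical values)
nummer_toy <- c(101, 102, 103)

# anyDuplicated() returns 0 when no value is repeated
has_duplicates <- anyDuplicated(nummer_toy) > 0

# anyNA() returns TRUE if at least one value is missing
has_missing <- anyNA(nummer_toy)

# Stop with an error if the reference is not a valid unique key
stopifnot(!has_duplicates, !has_missing)
```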

4.2.10 User language

This field is not present in users. We leave it empty:

users <-
  users %>%
  mutate(sovon_user_language = NA_character_)
## mutate: new variable 'sovon_user_language' (character) with one unique value and 100% NA
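The one-to-one mappings above could also be expressed with a single lookup vector. This is only a sketch of an alternative, not the approach used in this chapter: users_toy, field_map and users_toy_mapped are hypothetical names, and it assumes a dplyr/tidyselect version that provides all_of().

```r
library(dplyr)

# Toy stand-in for `users` (hypothetical values, not the real data)
users_toy <- tibble(
  Nummer = c(1, 2),
  Email = c("a@example.org", NA),
  Voornaam = c("An", "Bo")
)

# Hypothetical lookup: new SOVON field name = original field name
field_map <- c(
  sovon_user_email = "Email",
  sovon_user_first_name = "Voornaam",
  sovon_user_reference = "Nummer"
)

# select(all_of(<named vector>)) picks the columns and renames them;
# bind_cols() then appends the renamed copies to the original table
users_toy_mapped <- bind_cols(
  users_toy,
  select(users_toy, all_of(field_map))
)

names(users_toy_mapped)
```

Mapping field by field, as done above, is more verbose but lets each step report its own summary.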

4.3 Save user data

Select the SOVON fields into a new data frame crbirding_users:

crbirding_users <-
  users %>%
  select(starts_with("sovon_"))
## select: dropped 13 variables (Nummer, Familienaam, Voornaam, Adres, Postcode, …)

Remove prefix sovon_:

names(crbirding_users) <- str_remove(
  names(crbirding_users),
  pattern = "sovon_"
)
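Selecting the prefixed columns and stripping the prefix can also be done in one pipeline. The sketch below uses a toy table (users_toy and crbirding_toy are hypothetical) and assumes dplyr 1.0.0 or later for rename_with(); the pattern is anchored with ^ so that only a leading sovon_ is removed.

```r
library(dplyr)
library(stringr)

# Toy stand-in for `users` (hypothetical values, not the real data)
users_toy <- tibble(
  Nummer = c(1, 2),
  sovon_user_reference = c(1, 2),
  sovon_user_country = c("BE", NA)
)

crbirding_toy <-
  users_toy %>%
  # Keep only the mapped SOVON fields
  select(starts_with("sovon_")) %>%
  # Strip the prefix at the start of each column name
  rename_with(~ str_remove(.x, "^sovon_"))

names(crbirding_toy)
```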

The desired order of columns in crbirding_users:

cr_users_cols <- c(
  "user_id", "user_reference", "user_email", "user_first_name",
  "user_last_name", "user_address", "user_postal_code", "user_place",
  "user_country", "user_language", "user_role"
)

Fields still not mapped:

setdiff(cr_users_cols, names(crbirding_users))
## [1] "user_role"

The field user_role cannot be filled at the moment: it will be mapped at the end of the next chapter.

Set column order:

crbirding_users <-
  crbirding_users %>%
  # all_of() makes explicit that the names come from an external character vector
  select(all_of(cr_users_cols[cr_users_cols != "user_role"]))
## select: columns reordered (user_id, user_reference, user_email, user_first_name, user_last_name, …)

Preview data (e-mail, first and last names removed for privacy reasons):

crbirding_users %>%
  select(-c(user_email, user_first_name, user_last_name)) %>%
  head(n = 10)
## select: dropped 3 variables (user_email, user_first_name, user_last_name)

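Save to the CSV file crbirding_users.csv in ./data/processed/, as requested by SOVON:

crbirding_users %>%
  write_csv(
    # Output path passed by position: the argument is named `path` in older
    # readr versions and `file` in newer ones
    here::here("data", "processed", "crbirding_users.csv"),
    na = ""
  )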
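To confirm that missing values end up as empty fields rather than the string NA, the written file can be inspected and read back. This is a minimal sketch on a toy table written to a temporary file; toy, tmp, raw_lines and back are hypothetical names, and the real export path is not touched.

```r
library(readr)

# Toy stand-in for `crbirding_users` (hypothetical values, not the real data)
toy <- data.frame(
  user_id = NA,
  user_reference = c(1, 2),
  user_country = c("BE", NA),
  stringsAsFactors = FALSE
)

# Write to a temporary file with the same NA handling as the export above
tmp <- tempfile(fileext = ".csv")
write_csv(toy, tmp, na = "")

# Raw lines of the file: header plus one line per user
raw_lines <- readLines(tmp)

# Read back as character; empty fields are parsed as NA by default
back <- read_csv(tmp, col_types = cols(.default = col_character()))

# Same number of rows, and user_id is still entirely missing
stopifnot(nrow(back) == nrow(toy), all(is.na(back$user_id)))
```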