Getting started: connecting and selecting locations • watina

General note: the below vignette contains frozen output of 14 Jan 2021. This makes it possible to build the package with vignettes without access to the Watina database.

Connecting

You first need a connection object to connect to Watina; let’s simply name it watina:

watina <- connect_watina()

Selecting locations

Selecting locations is a typical starting point. Afterwards, you can use the returned object as input to other functions.

By default, a tbl_lazy object is given (lazy query).¹ The returned data include some essential metadata of the locations. The following example shows a few rows:

get_locs(watina) %>% 
  head(5) %>% 
  collect %>% 
  as.data.frame
#>   loc_wid loc_code area_code area_name      x      y loc_validitycode
#> 1      45  AABP001       AAB    Aabeek 177604 209714              VLD
#> 2    6291  AABP002       AAB    Aabeek 177903 209459              VLD
#> 3      28  AABP003       AAB    Aabeek 178353 209155              VLD
#> 4      29  AABP004       AAB    Aabeek 179544 210259              VLD
#> 5      30  AABP005       AAB    Aabeek 179610 210173              VLD
#>   loc_validity loc_typecode        loc_typename obswell_statecode
#> 1  Gevalideerd            P Peilbuis/Piëzometer             CONFN
#> 2  Gevalideerd            P Peilbuis/Piëzometer             CONFN
#> 3  Gevalideerd            P Peilbuis/Piëzometer             CONFN
#> 4  Gevalideerd            P Peilbuis/Piëzometer             CONFN
#> 5  Gevalideerd            P Peilbuis/Piëzometer             CONFN
#>       obswell_state soilsurf_ost measuringref_ost tubelength filterlength
#> 1 OK - niet bemeten       10.136           10.656          4            2
#> 2 OK - niet bemeten        9.590           10.120          4            2
#> 3 OK - niet bemeten        9.660           10.150          4            2
#> 4 OK - niet bemeten       10.310           10.730          4            2
#> 5 OK - niet bemeten       10.480           10.950          4            2
#>   filterdepth
#> 1        2.48
#> 2        2.47
#> 3        2.51
#> 4        2.58
#> 5        2.53

Selections can be done in a variety of ways – and these ways can be combined:

by using area codes:

get_locs(watina,
         area_codes = c("KAL", "KBR"))
#> # Source:   lazy query [?? x 17]
#> # Database: Microsoft SQL Server
#>    loc_wid loc_code area_code area_name      x      y loc_validitycode
#>      <int> <chr>    <chr>     <chr>      <dbl>  <dbl> <chr>           
#>  1    2824 KALP022  KAL       Kalmthou… 155356 230811 VLD             
#>  2    1546 KALP030  KAL       Kalmthou… 152990 238600 VLD             
#>  3    6205 KALP032  KAL       Kalmthou… 159110 239010 VLD             
#>  4   10069 KALP034  KAL       Kalmthou… 152962 231345 VLD             
#>  5    1549 KALP035  KAL       Kalmthou… 153000 236850 VLD             
#>  6    1552 KALP039  KAL       Kalmthou… 160940 235060 VLD             
#>  7    4327 KALP050  KAL       Kalmthou… 153881 233802 VLD             
#>  8    6210 KALP051  KAL       Kalmthou… 153822 233761 VLD             
#>  9    1562 KALP052  KAL       Kalmthou… 153822 233761 VLD             
#> 10    1563 KALP053  KAL       Kalmthou… 153781 233736 VLD             
#> # … with more rows, and 10 more variables: loc_validity <chr>,
#> #   loc_typecode <chr>, loc_typename <chr>, obswell_statecode <chr>,
#> #   obswell_state <chr>, soilsurf_ost <dbl>, measuringref_ost <dbl>,
#> #   tubelength <dbl>, filterlength <dbl>, filterdepth <dbl>

The number of involved locations is given by:

get_locs(watina,
         area_codes = c("KAL", "KBR")) %>% 
  count
#> # Source:   lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#>       n
#>   <int>
#> 1   207

you can specify the type of location (device):

get_locs(watina,
         area_codes = c("KAL", "KBR"),
         loc_type = c("P", "S")) %>% 
  count
#> # Source:   lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#>       n
#>   <int>
#> 1   247

for groundwater piezometers: by setting conditions on filterdepth. The filterdepth is the depth of the piezometer’s filter below soil surface (unit: m), more specifically referring to the position halfway the length of the filter. Conditions can be set by the arguments filterdepth_range and filterdepth_na (see documentation of get_locs()). Moreover, by setting filterdepth_guess = TRUE (default FALSE) you can replace missing filterdepths by a conservative value, before applying these conditions. By default, only filterdepths between 0 and 3 meters below soilsurface are returned and locations with missing filterdepth are dropped.

Indeed, more records are returned when allowing groundwater piezometers with missing filterdepth:

get_locs(watina,
         area_codes = c("KAL", "KBR"),
         loc_type = c("P", "S"),
         filterdepth_na = TRUE) %>% 
  count
#> # Source:   lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#>       n
#>   <int>
#> 1   289

Of course you can set further conditions for other variables, by building upon the lazy result of get_locs(), e.g.:

get_locs(watina,
         area_codes = c("KAL", "KBR"),
         loc_type = c("P", "S"),
         filterdepth_na = TRUE) %>%
  filter(obswell_state == "OK - niet bemeten") %>% 
  count
#> # Source:   lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#>       n
#>   <int>
#> 1   157

by using a spatial bounding box (rectangular area), using the Belgian Lambert 72 coordinate reference system (EPSG-code 31370):

get_locs(watina,
         bbox = c(xmin = 1.4e+5,
                  xmax = 1.7e+5,
                  ymin = 1.6e+5,
                  ymax = 1.9e+5))
#> # Source:   lazy query [?? x 17]
#> # Database: Microsoft SQL Server
#>    loc_wid loc_code area_code area_name      x      y loc_validitycode
#>      <int> <chr>    <chr>     <chr>      <dbl>  <dbl> <chr>           
#>  1   10261 ATGP002  ATG       Antitank… 169572 186043 VLD             
#>  2     777 BABP012  BAB       Barebeek  160190 180988 VLD             
#>  3    5629 DORP016  DOR       Dorent    158521 186074 VLD             
#>  4    5630 DORP017  DOR       Dorent    158442 185925 VLD             
#>  5    5632 DORP019  DOR       Dorent    158304 184915 VLD             
#>  6    5634 DORP020  DOR       Dorent    158352 184259 VLD             
#>  7    5635 DORP021  DOR       Dorent    158103 183853 VLD             
#>  8    5586 DORP022  DOR       Dorent    157715 184013 VLD             
#>  9    5587 DORP023  DOR       Dorent    154284 182533 VLD             
#> 10    6420 DORP024  DOR       Dorent    154823 182431 VLD             
#> # … with more rows, and 10 more variables: loc_validity <chr>,
#> #   loc_typecode <chr>, loc_typename <chr>, obswell_statecode <chr>,
#> #   obswell_state <chr>, soilsurf_ost <dbl>, measuringref_ost <dbl>,
#> #   tubelength <dbl>, filterlength <dbl>, filterdepth <dbl>

if you already have specific location codes in mind, you can provide these directly as a vector:

get_locs(watina,
         loc_vec = c("KBRP081", "KBRP090", "KBRP095"))
#> # Source:   lazy query [?? x 17]
#> # Database: Microsoft SQL Server
#>   loc_wid loc_code area_code area_name      x      y loc_validitycode
#>     <int> <chr>    <chr>     <chr>      <dbl>  <dbl> <chr>           
#> 1     664 KBRP081  KBR       Polders … 145846 204871 VLD             
#> 2     698 KBRP090  KBR       Polders … 145988 204702 VLD             
#> 3     709 KBRP095  KBR       Polders … 145949 204712 VLD             
#> # … with 10 more variables: loc_validity <chr>, loc_typecode <chr>,
#> #   loc_typename <chr>, obswell_statecode <chr>, obswell_state <chr>,
#> #   soilsurf_ost <dbl>, measuringref_ost <dbl>, tubelength <dbl>,
#> #   filterlength <dbl>, filterdepth <dbl>

you can use a geospatial filter of class sf to make selections, by using the mask argument:

get_locs(watina,
         mask = westfl,  # sf polygon of the 'West-Vlaanderen' province
         buffer = 0)
#> As a mask always invokes a collect(), the argument 'collect = FALSE' will be ignored.
#> Warning in get_locs(watina, mask = westfl, buffer = 0): Dropped 44 locations from which x or y coordinates were missing.
#> Warning in warn_xy_duplicates(locs$x, locs$y): 28 different coordinate pairs occur more than once.
#> # A tibble: 905 x 16
#>    loc_code area_code area_name     x      y loc_validitycode loc_validity
#>  * <chr>    <chr>     <chr>     <dbl>  <dbl> <chr>            <chr>       
#>  1 ASSP001  ASS       Assebroe… 74324 208838 VLD              Gevalideerd 
#>  2 ASSP002  ASS       Assebroe… 74526 208787 VLD              Gevalideerd 
#>  3 ASSP003  ASS       Assebroe… 74169 208790 VLD              Gevalideerd 
#>  4 ASSP004  ASS       Assebroe… 74108 208605 VLD              Gevalideerd 
#>  5 ASSP005  ASS       Assebroe… 73910 208646 VLD              Gevalideerd 
#>  6 ASSP006  ASS       Assebroe… 74033 208916 VLD              Gevalideerd 
#>  7 ASSP007  ASS       Assebroe… 73609 208954 VLD              Gevalideerd 
#>  8 ASSP008  ASS       Assebroe… 73948 209104 VLD              Gevalideerd 
#>  9 ASSP009  ASS       Assebroe… 73613 208584 VLD              Gevalideerd 
#> 10 ASSP010  ASS       Assebroe… 73583 208390 VLD              Gevalideerd 
#> # … with 895 more rows, and 9 more variables: loc_typecode <chr>,
#> #   loc_typename <chr>, obswell_statecode <chr>, obswell_state <chr>,
#> #   soilsurf_ost <dbl>, measuringref_ost <dbl>, tubelength <dbl>,
#> #   filterlength <dbl>, filterdepth <dbl>

Note that the default value for buffer is 10 meters; it is used to enlarge mask (or shrink it, if buffer < 0). Also, note there are a few warnings to inform about dropped locations and duplicated coordinates.

Instead of leaving the output as a tbl_lazy object, you can also retrieve it as a tibble (i.e. an easier-to-use type of dataframe) by setting collect = TRUE.

get_locs(watina,
         area_codes = c("KAL", "KBR"),
         loc_type = c("P", "S"),
         collect = TRUE)
#> Warning in warn_xy_duplicates(locs$x, locs$y): 15 different coordinate pairs occur more than once.
#> # A tibble: 247 x 16
#>    loc_code area_code area_name      x      y loc_validitycode loc_validity
#>    <chr>    <chr>     <chr>      <dbl>  <dbl> <chr>            <chr>       
#>  1 KALP022  KAL       Kalmthou… 155356 230811 VLD              Gevalideerd 
#>  2 KALP030  KAL       Kalmthou… 152990 238600 VLD              Gevalideerd 
#>  3 KALP032  KAL       Kalmthou… 159110 239010 VLD              Gevalideerd 
#>  4 KALP034  KAL       Kalmthou… 152962 231345 VLD              Gevalideerd 
#>  5 KALP035  KAL       Kalmthou… 153000 236850 VLD              Gevalideerd 
#>  6 KALP039  KAL       Kalmthou… 160940 235060 VLD              Gevalideerd 
#>  7 KALP050  KAL       Kalmthou… 153881 233802 VLD              Gevalideerd 
#>  8 KALP051  KAL       Kalmthou… 153822 233761 VLD              Gevalideerd 
#>  9 KALP052  KAL       Kalmthou… 153822 233761 VLD              Gevalideerd 
#> 10 KALP053  KAL       Kalmthou… 153781 233736 VLD              Gevalideerd 
#> # … with 237 more rows, and 9 more variables: loc_typecode <chr>,
#> #   loc_typename <chr>, obswell_statecode <chr>, obswell_state <chr>,
#> #   soilsurf_ost <dbl>, measuringref_ost <dbl>, tubelength <dbl>,
#> #   filterlength <dbl>, filterdepth <dbl>

Note that the lazy object contains a loc_wid column, which is an internal database variable to identify locations and make relations between tables. It should not be regarded as a stable ID; hence don’t store it for later use (note that lazy objects don’t store data). Consequently, the loc_wid column is omitted when using collect = TRUE.

Optionally, you can retrieve the individual observation wells that are linked to each location (where each observation well is from a different timeframe). In this case, more attributes are returned (specific observation well attributes). The below example also demonstrates the use of tibble::glimpse() to preview all columns:

get_locs(watina,
         obswells = TRUE,
         area_codes = c("KAL", "KBR"),
         loc_type = c("P", "S")) %>% 
  glimpse()
#> Rows: ??
#> Columns: 21
#> $ loc_wid             <int> 1546, 1549, 1552, 1562, 1563, 1564, 1565, 1566, 1…
#> $ loc_code            <chr> "KALP030", "KALP035", "KALP039", "KALP052", "KALP…
#> $ area_code           <chr> "KAL", "KAL", "KAL", "KAL", "KAL", "KAL", "KAL", …
#> $ area_name           <chr> "Kalmthoutse heide", "Kalmthoutse heide", "Kalmth…
#> $ x                   <dbl> 152990, 153000, 160940, 153822, 153781, 154754, 1…
#> $ y                   <dbl> 238600, 236850, 235060, 233761, 233736, 232459, 2…
#> $ loc_validitycode    <chr> "VLD", "VLD", "VLD", "VLD", "VLD", "VLD", "VLD", …
#> $ loc_validity        <chr> "Gevalideerd", "Gevalideerd", "Gevalideerd", "Gev…
#> $ loc_typecode        <chr> "P", "P", "P", "P", "P", "P", "P", "P", "P", "P",…
#> $ loc_typename        <chr> "Peilbuis/Piëzometer", "Peilbuis/Piëzometer", "Pe…
#> $ obswell_code        <chr> "KALP030X", "KALP035X", "KALP039X", "KALP052X", "…
#> $ obswell_rank        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1…
#> $ obswell_statecode   <chr> "CONFN", "CONFN", "CONFN", "CONFN", "CONFN", "CON…
#> $ obswell_state       <chr> "OK - niet bemeten", "OK - niet bemeten", "OK - n…
#> $ obswell_installdate <date> 1982-03-06, 1982-06-28, 1982-03-06, 1999-11-03, …
#> $ obswell_stopdate    <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2020-05-10, …
#> $ soilsurf_ost        <dbl> 13.420, 15.200, 17.270, 21.060, 21.050, 20.540, 2…
#> $ measuringref_ost    <dbl> 13.580, 15.000, 17.150, 21.580, 21.490, 20.860, 2…
#> $ tubelength          <dbl> 3.12, 0.50, 2.26, 2.02, 2.04, 1.82, 3.25, 2.82, 2…
#> $ filterlength        <dbl> 1.00, 0.50, 1.00, 0.20, 0.30, 0.20, 0.20, 0.20, 0…
#> $ filterdepth         <dbl> 2.460, 0.450, 1.880, 1.400, 1.450, 1.400, 2.700, …

With get_locs(obswells = FALSE), which is the default, the observation well variables are aggregated for each location using one of four methods. The method is set with the argument obswell_aggr. The default, obswell_aggr = "latest", returns the attributes of the most recent observation well that fulfills the filterdepth_range and filterdepth_na criteria.

Disconnecting

At the end of your work (a script, a bookdown document etc.), close the connection:

dbDisconnect(watina)

See the dbplyr package to get you started with this type of object.↩︎