Getting started: connecting and selecting locations
Source:vignettes/v010_getstarted.Rmd
v010_getstarted.RmdGeneral note: the below vignette contains frozen output of 14 Jan 2021. This makes it possible to build the package with vignettes without access to the Watina data warehouse.
Connecting
You first need a connection object to connect to Watina; let’s simply
name it watina:
watina <- connect_watina()Selecting locations
Selecting locations is a typical starting point. Afterwards, you can use the returned object as input to other functions.
By default, a tbl_lazy object is given (lazy query).1 The
returned data include some essential metadata of the locations. The
following example shows a few rows:
get_locs(watina) %>%
head(5) %>%
collect() %>%
as.data.frame()
#> loc_wid loc_code area_code area_name x y loc_validitycode
#> 1 45 AABP001 AAB Aabeek 177604 209714 VLD
#> 2 6291 AABP002 AAB Aabeek 177903 209459 VLD
#> 3 28 AABP003 AAB Aabeek 178353 209155 VLD
#> 4 29 AABP004 AAB Aabeek 179544 210259 VLD
#> 5 30 AABP005 AAB Aabeek 179610 210173 VLD
#> loc_validity loc_typecode loc_typename obswell_statecode
#> 1 Gevalideerd P Peilbuis/Piëzometer CONFN
#> 2 Gevalideerd P Peilbuis/Piëzometer CONFN
#> 3 Gevalideerd P Peilbuis/Piëzometer CONFN
#> 4 Gevalideerd P Peilbuis/Piëzometer CONFN
#> 5 Gevalideerd P Peilbuis/Piëzometer CONFN
#> obswell_state soilsurf_ost measuringref_ost tubelength filterlength
#> 1 OK - niet bemeten 10.136 10.656 4 2
#> 2 OK - niet bemeten 9.590 10.120 4 2
#> 3 OK - niet bemeten 9.660 10.150 4 2
#> 4 OK - niet bemeten 10.310 10.730 4 2
#> 5 OK - niet bemeten 10.480 10.950 4 2
#> filterdepth
#> 1 2.48
#> 2 2.47
#> 3 2.51
#> 4 2.58
#> 5 2.53Selections can be done in a variety of ways – and these ways can be combined:
- by using area codes:
get_locs(
watina,
area_codes = c("KAL", "KBR")
)
#> # Source: lazy query [?? x 17]
#> # Database: Microsoft SQL Server
#> loc_wid loc_code area_code area_name x y loc_validitycode
#> <int> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 2824 KALP022 KAL Kalmthou… 155356 230811 VLD
#> 2 1546 KALP030 KAL Kalmthou… 152990 238600 VLD
#> 3 6205 KALP032 KAL Kalmthou… 159110 239010 VLD
#> 4 10069 KALP034 KAL Kalmthou… 152962 231345 VLD
#> 5 1549 KALP035 KAL Kalmthou… 153000 236850 VLD
#> 6 1552 KALP039 KAL Kalmthou… 160940 235060 VLD
#> 7 4327 KALP050 KAL Kalmthou… 153881 233802 VLD
#> 8 6210 KALP051 KAL Kalmthou… 153822 233761 VLD
#> 9 1562 KALP052 KAL Kalmthou… 153822 233761 VLD
#> 10 1563 KALP053 KAL Kalmthou… 153781 233736 VLD
#> # … with more rows, and 10 more variables: loc_validity <chr>,
#> # loc_typecode <chr>, loc_typename <chr>, obswell_statecode <chr>,
#> # obswell_state <chr>, soilsurf_ost <dbl>, measuringref_ost <dbl>,
#> # tubelength <dbl>, filterlength <dbl>, filterdepth <dbl>The number of involved locations is given by:
get_locs(
watina,
area_codes = c("KAL", "KBR")
) %>%
count()
#> # Source: lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#> n
#> <int>
#> 1 207- you can specify the type of location (device):
get_locs(
watina,
area_codes = c("KAL", "KBR"),
loc_type = c("P", "S")
) %>%
count()
#> # Source: lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#> n
#> <int>
#> 1 247- for groundwater piezometers: by setting conditions on filterdepth.
The filterdepth is the depth of the piezometer’s filter below soil
surface (unit: m), more specifically referring to the position
halfway the length of the filter. Conditions can be set by the
arguments
filterdepth_rangeandfilterdepth_na(see documentation ofget_locs()). Moreover, by settingfilterdepth_guess = TRUE(defaultFALSE) you can replace missing filterdepths by a conservative value, before applying these conditions. By default, only filterdepths between 0 and 3 meters below soilsurface are returned and locations with missing filterdepth are dropped.
Indeed, more records are returned when allowing groundwater piezometers with missing filterdepth:
get_locs(
watina,
area_codes = c("KAL", "KBR"),
loc_type = c("P", "S"),
filterdepth_na = TRUE
) %>%
count()
#> # Source: lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#> n
#> <int>
#> 1 289Of course you can set further conditions for other variables, by
building upon the lazy result of get_locs(), e.g.:
get_locs(
watina,
area_codes = c("KAL", "KBR"),
loc_type = c("P", "S"),
filterdepth_na = TRUE
) %>%
filter(obswell_state == "OK - niet bemeten") %>%
count()
#> # Source: lazy query [?? x 1]
#> # Database: Microsoft SQL Server
#> n
#> <int>
#> 1 157- by using a spatial bounding box (rectangular area), using the Belgian Lambert 72 coordinate reference system (EPSG-code 31370):
get_locs(
watina,
bbox = c(
xmin = 1.4e+5,
xmax = 1.7e+5,
ymin = 1.6e+5,
ymax = 1.9e+5
)
)
#> # Source: lazy query [?? x 17]
#> # Database: Microsoft SQL Server
#> loc_wid loc_code area_code area_name x y loc_validitycode
#> <int> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 10261 ATGP002 ATG Antitank… 169572 186043 VLD
#> 2 777 BABP012 BAB Barebeek 160190 180988 VLD
#> 3 5629 DORP016 DOR Dorent 158521 186074 VLD
#> 4 5630 DORP017 DOR Dorent 158442 185925 VLD
#> 5 5632 DORP019 DOR Dorent 158304 184915 VLD
#> 6 5634 DORP020 DOR Dorent 158352 184259 VLD
#> 7 5635 DORP021 DOR Dorent 158103 183853 VLD
#> 8 5586 DORP022 DOR Dorent 157715 184013 VLD
#> 9 5587 DORP023 DOR Dorent 154284 182533 VLD
#> 10 6420 DORP024 DOR Dorent 154823 182431 VLD
#> # … with more rows, and 10 more variables: loc_validity <chr>,
#> # loc_typecode <chr>, loc_typename <chr>, obswell_statecode <chr>,
#> # obswell_state <chr>, soilsurf_ost <dbl>, measuringref_ost <dbl>,
#> # tubelength <dbl>, filterlength <dbl>, filterdepth <dbl>- if you already have specific location codes in mind, you can provide these directly as a vector:
get_locs(
watina,
loc_vec = c("KBRP081", "KBRP090", "KBRP095")
)
#> # Source: lazy query [?? x 17]
#> # Database: Microsoft SQL Server
#> loc_wid loc_code area_code area_name x y loc_validitycode
#> <int> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 664 KBRP081 KBR Polders … 145846 204871 VLD
#> 2 698 KBRP090 KBR Polders … 145988 204702 VLD
#> 3 709 KBRP095 KBR Polders … 145949 204712 VLD
#> # … with 10 more variables: loc_validity <chr>, loc_typecode <chr>,
#> # loc_typename <chr>, obswell_statecode <chr>, obswell_state <chr>,
#> # soilsurf_ost <dbl>, measuringref_ost <dbl>, tubelength <dbl>,
#> # filterlength <dbl>, filterdepth <dbl>- you can use a geospatial filter of class sf to make selections, by
using the
maskargument:
get_locs(
watina,
mask = westfl,
# sf polygon of the 'West-Vlaanderen' province
buffer = 0
)
#> As a mask always invokes a collect(), the argument 'collect = FALSE' will be ignored.
#> Warning in get_locs(watina, mask = westfl, buffer = 0): Dropped 44 locations from which x or y coordinates were missing.
#> Warning in warn_xy_duplicates(locs$x, locs$y): 28 different coordinate pairs occur more than once.
#> # A tibble: 905 x 16
#> loc_code area_code area_name x y loc_validitycode loc_validity
#> * <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 ASSP001 ASS Assebroe… 74324 208838 VLD Gevalideerd
#> 2 ASSP002 ASS Assebroe… 74526 208787 VLD Gevalideerd
#> 3 ASSP003 ASS Assebroe… 74169 208790 VLD Gevalideerd
#> 4 ASSP004 ASS Assebroe… 74108 208605 VLD Gevalideerd
#> 5 ASSP005 ASS Assebroe… 73910 208646 VLD Gevalideerd
#> 6 ASSP006 ASS Assebroe… 74033 208916 VLD Gevalideerd
#> 7 ASSP007 ASS Assebroe… 73609 208954 VLD Gevalideerd
#> 8 ASSP008 ASS Assebroe… 73948 209104 VLD Gevalideerd
#> 9 ASSP009 ASS Assebroe… 73613 208584 VLD Gevalideerd
#> 10 ASSP010 ASS Assebroe… 73583 208390 VLD Gevalideerd
#> # … with 895 more rows, and 9 more variables: loc_typecode <chr>,
#> # loc_typename <chr>, obswell_statecode <chr>, obswell_state <chr>,
#> # soilsurf_ost <dbl>, measuringref_ost <dbl>, tubelength <dbl>,
#> # filterlength <dbl>, filterdepth <dbl>Note that the default value for buffer is 10 meters; it
is used to enlarge mask (or shrink it, if buffer < 0).
Also, note there are a few warnings to inform about dropped locations
and duplicated coordinates.
Instead of leaving the output as a tbl_lazy object, you
can also retrieve it as a tibble (i.e. an easier-to-use type of data
frame) by setting collect = TRUE.
get_locs(
watina,
area_codes = c("KAL", "KBR"),
loc_type = c("P", "S"),
collect = TRUE
)
#> Warning in warn_xy_duplicates(locs$x, locs$y): 15 different coordinate pairs occur more than once.
#> # A tibble: 247 x 16
#> loc_code area_code area_name x y loc_validitycode loc_validity
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 KALP022 KAL Kalmthou… 155356 230811 VLD Gevalideerd
#> 2 KALP030 KAL Kalmthou… 152990 238600 VLD Gevalideerd
#> 3 KALP032 KAL Kalmthou… 159110 239010 VLD Gevalideerd
#> 4 KALP034 KAL Kalmthou… 152962 231345 VLD Gevalideerd
#> 5 KALP035 KAL Kalmthou… 153000 236850 VLD Gevalideerd
#> 6 KALP039 KAL Kalmthou… 160940 235060 VLD Gevalideerd
#> 7 KALP050 KAL Kalmthou… 153881 233802 VLD Gevalideerd
#> 8 KALP051 KAL Kalmthou… 153822 233761 VLD Gevalideerd
#> 9 KALP052 KAL Kalmthou… 153822 233761 VLD Gevalideerd
#> 10 KALP053 KAL Kalmthou… 153781 233736 VLD Gevalideerd
#> # … with 237 more rows, and 9 more variables: loc_typecode <chr>,
#> # loc_typename <chr>, obswell_statecode <chr>, obswell_state <chr>,
#> # soilsurf_ost <dbl>, measuringref_ost <dbl>, tubelength <dbl>,
#> # filterlength <dbl>, filterdepth <dbl>Note that the lazy object contains a loc_wid column,
which is an internal data warehouse variable to identify locations and
make relations between tables. It should not be regarded as a stable ID;
hence don’t store it for later use (note that lazy objects don’t store
data). Consequently, the loc_wid column is omitted when
using collect = TRUE.
Optionally, you can retrieve the individual observation
wells that are linked to each location (where each observation
well is from a different timeframe). In this case, more attributes are
returned (specific observation well attributes). The below example also
demonstrates the use of tibble::glimpse() to preview all
columns:
get_locs(
watina,
obswells = TRUE,
area_codes = c("KAL", "KBR"),
loc_type = c("P", "S")
) %>%
glimpse()
#> Rows: ??
#> Columns: 21
#> $ loc_wid <int> 1546, 1549, 1552, 1562, 1563, 1564, 1565, 1566, 1…
#> $ loc_code <chr> "KALP030", "KALP035", "KALP039", "KALP052", "KALP…
#> $ area_code <chr> "KAL", "KAL", "KAL", "KAL", "KAL", "KAL", "KAL", …
#> $ area_name <chr> "Kalmthoutse heide", "Kalmthoutse heide", "Kalmth…
#> $ x <dbl> 152990, 153000, 160940, 153822, 153781, 154754, 1…
#> $ y <dbl> 238600, 236850, 235060, 233761, 233736, 232459, 2…
#> $ loc_validitycode <chr> "VLD", "VLD", "VLD", "VLD", "VLD", "VLD", "VLD", …
#> $ loc_validity <chr> "Gevalideerd", "Gevalideerd", "Gevalideerd", "Gev…
#> $ loc_typecode <chr> "P", "P", "P", "P", "P", "P", "P", "P", "P", "P",…
#> $ loc_typename <chr> "Peilbuis/Piëzometer", "Peilbuis/Piëzometer", "Pe…
#> $ obswell_code <chr> "KALP030X", "KALP035X", "KALP039X", "KALP052X", "…
#> $ obswell_rank <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1…
#> $ obswell_statecode <chr> "CONFN", "CONFN", "CONFN", "CONFN", "CONFN", "CON…
#> $ obswell_state <chr> "OK - niet bemeten", "OK - niet bemeten", "OK - n…
#> $ obswell_installdate <date> 1982-03-06, 1982-06-28, 1982-03-06, 1999-11-03, …
#> $ obswell_stopdate <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2020-05-10, …
#> $ soilsurf_ost <dbl> 13.420, 15.200, 17.270, 21.060, 21.050, 20.540, 2…
#> $ measuringref_ost <dbl> 13.580, 15.000, 17.150, 21.580, 21.490, 20.860, 2…
#> $ tubelength <dbl> 3.12, 0.50, 2.26, 2.02, 2.04, 1.82, 3.25, 2.82, 2…
#> $ filterlength <dbl> 1.00, 0.50, 1.00, 0.20, 0.30, 0.20, 0.20, 0.20, 0…
#> $ filterdepth <dbl> 2.460, 0.450, 1.880, 1.400, 1.450, 1.400, 2.700, …With get_locs(obswells = FALSE), which is the default,
the observation well variables are aggregated for each location using
one of four methods. The method is set with the argument
obswell_aggr. The default,
obswell_aggr = "latest", returns the attributes of the most
recent observation well that fulfills the filterdepth_range
and filterdepth_na criteria.
Disconnecting
At the end of your work (a script, a bookdown document etc.), close the connection:
dbDisconnect(watina)