For a dataset as returned by get_chem,
return summary statistics (data availability and/or
numeric properties) of
the specified hydrochemical variables, for each location.
Arguments
- data
An object returned by get_chem.
- chem_var
A character vector to select chemical variables for which statistics will be computed. To specify chemical variables, use the codes from the column chem_variable in data.
- type
A string defining the requested type of summary statistics. See section 'Value'. Either:
  - "avail": availability statistics (the default);
  - "num": numeric summary statistics;
  - "both": both types will be returned.
- uniformity_test
Should the availability statistic pval_uniform_totalspan be added (see section 'Value')? Defaults to FALSE as this takes much more time to calculate than everything else.
Value
A tibble with variables loc_code (see get_locs),
chem_variable (character; see Details)
and summary statistics (at the level
of loc_code x chem_variable) depending on the value of type
(type = "both" combines both cases):
With type = "avail" (the default):
- nryears: number of calendar years for which the chemical variable is available (on one date at least)
- nrdates: number of dates at which the chemical variable is available
- firstdate: earliest date at which the chemical variable is available
- lastdate: latest date at which the chemical variable is available
- timespan_years: duration as years from the first calendar year (year of firstdate) to the last calendar year (year of lastdate) with data available
- timespan_totalspan_ratio: the ratio of timespan_years to the 'total span', which is the duration as years from the first to the last calendar year for the whole of data.
- nryears_totalspan_ratio: the ratio of nryears to the 'total span'
- pval_uniform_totalspan: Only returned with uniformity_test = TRUE. p-value of an exact, one-sample two-sided Kolmogorov-Smirnov test for the discrete uniform distribution of the calendar years with data of the chemical variable available within the series of consecutive calendar years defined by the 'total span' (see above). The smaller the p-value, the less uniformly the years are spread within the total span. A perfectly uniform spread results in a p-value of 1.
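The availability statistics can be illustrated with a small base R sketch for a single location and chemical variable. This is not the package's implementation. The dates and the total span are made up, and the sketch assumes that durations count calendar years inclusively (last year - first year + 1), which the text above does not state explicitly.

```r
# Hypothetical sampling dates for one loc_code x chem_variable combination
dates <- as.Date(c("2011-03-15", "2011-09-20", "2013-04-02", "2016-05-11"))

# Assumption: the whole dataset spans calendar years 2010 to 2019,
# counted inclusively
totalspan_years <- 2019 - 2010 + 1

years <- as.integer(format(dates, "%Y"))

avail <- data.frame(
  nryears = length(unique(years)),             # calendar years with data
  nrdates = length(unique(dates)),             # distinct dates with data
  firstdate = min(dates),
  lastdate = max(dates),
  timespan_years = max(years) - min(years) + 1 # assumption: inclusive count
)
avail$timespan_totalspan_ratio <- avail$timespan_years / totalspan_years
avail$nryears_totalspan_ratio <- avail$nryears / totalspan_years

avail
```

Under these assumptions the variable covers 3 calendar years within a timespan of 6 years, so the two ratios differ: the first describes how much of the total span the series stretches over, the second how many years within the total span actually have data.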
With type = "num": summary statistics based on the chemical values,
i.e. excluding the 'combined' level:
- val_min: minimum value
- val_pct10, val_pct25, val_pct50, val_pct75, val_pct90: percentiles
- val_max: maximum value
- val_range: range
- val_mean: mean
- val_geometric_mean: geometric mean. Only calculated when all values are strictly positive.
- unit
- prop_below_loq: the proportion of measurements below the limit of quantification (as derived from below_loq == TRUE), i.e. relative to the total number of measurements for which below_loq is not NA. This statistic neglects measurements with value NA for below_loq! If all measurements have value NA for below_loq, this statistic is set to NA.
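The rules for prop_below_loq and val_geometric_mean can be sketched in base R as follows. The vectors are made-up measurements for one location and chemical variable, and the code only mirrors the description above; it is not taken from the package. Returning NA for the geometric mean when a value is not strictly positive is an assumption about what "only calculated" means.

```r
# Hypothetical measurements and their below_loq flags
value <- c(0.05, 0.20, 0.10, 0.40, 0.05)
below_loq <- c(TRUE, FALSE, NA, FALSE, TRUE)

# Proportion of TRUE among the non-NA flags; NA when every flag is NA
prop_below_loq <- if (all(is.na(below_loq))) {
  NA_real_
} else {
  mean(below_loq, na.rm = TRUE)
}

# Geometric mean via the mean of the logs, only for strictly positive values
# (assumption: NA is returned otherwise)
val_geometric_mean <- if (all(value > 0)) {
  exp(mean(log(value)))
} else {
  NA_real_
}

prop_below_loq
val_geometric_mean
```

Here the third measurement has below_loq = NA, so it is left out of the denominator: 2 of the remaining 4 measurements are below the limit of quantification.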
Details
For the availability statistics, an extra level
"combined" is added in the column chem_variable whenever the
arguments data and
chem_var imply more than one
chemical variable to be investigated.
This 'combined' level defines data availability for a water sample as the
availability of data for all corresponding chemical variables.
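A minimal base R sketch of the 'combined' rule, assuming a hypothetical sample identifier column sample_id and treating a missing value as "no data". Both are assumptions for illustration only and may differ from how the package identifies water samples.

```r
chem_var <- c("P-PO4", "N-NO3")

# Hypothetical long-format measurements; sample_id is an assumed column name
toy <- data.frame(
  sample_id = c("s1", "s1", "s2", "s3", "s3"),
  chem_variable = c("P-PO4", "N-NO3", "P-PO4", "P-PO4", "N-NO3"),
  value = c(0.02, 1.1, 0.03, NA, 0.9)
)

# Keep rows that carry data for one of the requested variables
has_data <- toy[!is.na(toy$value) & toy$chem_variable %in% chem_var, ]

# Number of distinct requested variables with data, per water sample
n_vars <- tapply(
  has_data$chem_variable,
  has_data$sample_id,
  function(x) length(unique(x))
)

# A sample counts for the 'combined' level only if all variables have data
combined_available <- names(n_vars)[n_vars == length(chem_var)]
combined_available
```

Only sample s1 has data for both variables, so it is the only sample that would contribute to the 'combined' availability statistics in this toy case.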
See also
Other functions to evaluate locations:
eval_xg3_avail(),
eval_xg3_series()
Examples
if (FALSE) { # \dontrun{
  watina <- connect_watina()
  library(dplyr)
  mylocs <- get_locs(watina, area_codes = "ZWA")
  # Retrieve the hydrochemical data of these locations from 1/1/2010 onwards
  mydata <-
    mylocs %>%
    get_chem(watina, "1/1/2010")
  mydata %>% arrange(loc_code, date, chem_variable)
  # First and last calendar year in the dataset (these define the 'total span')
  mydata %>%
    pull(date) %>%
    lubridate::year(.) %>%
    (function(x) c(firstyear = min(x), lastyear = max(x)))
  # Availability statistics (the default type)
  mydata %>%
    eval_chem(chem_var = c("P-PO4", "N-NO3", "N-NO2", "N-NH4")) %>%
    arrange(desc(loc_code))
  # Availability and numeric statistics together
  mydata %>%
    eval_chem(
      chem_var = c("P-PO4", "N-NO3", "N-NO2", "N-NH4"),
      type = "both"
    ) %>%
    arrange(desc(loc_code)) %>%
    as.data.frame() %>%
    head(10)
  # Add the (slower) uniformity test to the availability statistics
  mydata %>%
    eval_chem(
      chem_var = c("P-PO4", "N-NO3", "N-NO2", "N-NH4"),
      uniformity_test = TRUE
    ) %>%
    arrange(desc(loc_code)) %>%
    select(loc_code, chem_variable, pval_uniform_totalspan)
  # Disconnect:
  dbDisconnect(watina)
} # }