This function calculates statistics for the given data
(e.g. from the git repository forresdat) on the specified level
(e.g. forest_reserve, period and species) and for the specified variables
(e.g. basal_area and volume).
Calculated statistics include number of observations, mean, variance
and confidence interval with lower and upper limit (lci and uci).
These summary statistics are calculated on the given data, not taking into account absence of observations unless explicitly added as a record with value zero. E.g. if a certain species only occurs in 3 plots out of 10 and no records are added for the 7 remaining plots, the summary statistics (e.g. mean coverage) are calculated on 3 plots. Records with value zero for certain variables (e.g. coverage of a certain species or number of trees for a certain diameter class) can automatically be added using the function add_zeros().
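The effect of missing zero records can be illustrated with a small base R sketch. The data frame, its column names and the coverage values below are hypothetical, and the zero records are added by hand with merge(); this is not how add_zeros() is implemented, only a demonstration of why the records matter.

```r
# Hypothetical coverage data: the species was recorded in 3 of 10 plots
presence_only <- data.frame(plot_id = c(1, 4, 7), coverage = c(20, 30, 10))

# Mean over the 3 recorded plots only
mean_presence <- mean(presence_only$coverage)

# Add explicit zero records for the 7 plots where the species is absent
all_plots <- data.frame(plot_id = 1:10)
with_zeros <- merge(all_plots, presence_only, by = "plot_id", all.x = TRUE)
with_zeros$coverage[is.na(with_zeros$coverage)] <- 0

# Mean over all 10 plots
mean_with_zeros <- mean(with_zeros$coverage)

mean_presence    # 20
mean_with_zeros  # 6
```

Without the zero records the mean describes only the plots where the species occurs; with them it describes all surveyed plots.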
In case of intervals, the variance and confidence interval are calculated
based on the minimum and maximum values of the intervals of the individual
records (which is considered a CI, so lci and uci can serve as min and max).
For this, dataset must contain columns with minimum and maximum values,
variables must contain a name for the output of this variable, and
interval_information must contain the variable names for minimum, maximum
and output that should be used.
In interval_information it can be specified if a logarithmic transformation
is needed to compensate for unequal interval widths.
In this case, mean and the confidence interval are transformed back,
but variance is not, as this result would be confusing rather than useful.
For typical forresdat variables, the default value of interval_information
can be used and in this case, the variable mentioned in variables should
be named after the values in forresdat, omitting min_, _min, max_ or _max
(see example on interval data).
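As a rough sketch of the interval logic for a single record, the helper below treats the interval as a 95 % confidence interval, takes its midpoint as the mean and derives the variance from its width, optionally on the log scale. The function interval_summary() and its arguments are hypothetical and not part of forrescalc; the 95 % level is an assumption, and the way the package combines several records per group is not covered here.

```r
# Hypothetical helper (not part of forrescalc): summarise ONE interval record,
# assuming [lower, upper] is a 95 % confidence interval.
interval_summary <- function(lower, upper, log_transform = FALSE) {
  z <- qnorm(0.975)  # assumed confidence level: 95 %
  if (log_transform) {
    lower <- log(lower)
    upper <- log(upper)
  }
  centre <- (lower + upper) / 2
  variance <- ((upper - lower) / (2 * z))^2
  if (log_transform) {
    centre <- exp(centre)  # mean is transformed back, variance is not
  }
  c(mean = centre, variance = variance)
}

# Interval 85-95 without transformation
tree <- interval_summary(85, 95)
# Interval 35-45 with log transformation
shrub <- interval_summary(35, 45, log_transform = TRUE)

signif(tree, 3)
signif(shrub, 3)
```

Under these assumptions the results (mean 90 with variance 6.51, and mean 39.7 with variance 0.00411) agree with the single-observation tree_cover and shrub_cover rows in the example on interval data.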
create_statistics(
  dataset,
  level = c("period", "forest_reserve"),
  variables,
  include_year_range = FALSE,
  na_rm = FALSE,
  interval_information = suppressMessages(read_csv2(system.file("extdata/class_data.csv",
    package = "forrescalc")))
)
dataset: dataset with data to be summarised with at least columns year
and period, e.g. table from git repository forresdat
level: grouping variables that determine on which level the values should be calculated (e.g. forest_reserve, year and species), given as a string or a vector of strings. Defaults to forest_reserve and period.
variables: variable(s) of which summary statistics should be calculated (given as a string or a vector of strings)
include_year_range: Should min_year and max_year be calculated based on a given column year in dataset? Defaults to FALSE.
na_rm: Should NA values in the dataset be ignored? Defaults to FALSE. If TRUE, levels without any non-NA data are kept (resulting in NA values).
interval_information: overview of names for interval data,
including columns var_name (= name for output), var_min and var_max
(= names for minimum and maximum value in input dataset), and
preferred_transformation (= "log" if log-transformation is desired).
Defaults to a table containing all interval variables in forresdat,
where log transformation is applied in variables where class widths differ.
(In cover data in the Longo scale, log transformation is only applied in
variables where most observations have a low coverage, e.g. moss cover,
in congruence with the fact that class widths only differ in the lower part
of the Longo scale.)
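A custom interval_information table could be built along the following lines. The four column names come from the description above; the variable names (herb_cover, moss_cover and their _min/_max columns) are hypothetical, and using NA for "no transformation" is an assumption rather than documented behaviour.

```r
# Hypothetical overview of interval variables for a custom dataset
custom_intervals <- data.frame(
  var_name = c("herb_cover", "moss_cover"),          # names used in the output
  var_min = c("herb_cover_min", "moss_cover_min"),   # lower bound columns
  var_max = c("herb_cover_max", "moss_cover_max"),   # upper bound columns
  # "log" requests a log transformation; NA (assumed) requests none
  preferred_transformation = c(NA, "log"),
  stringsAsFactors = FALSE
)

# The table would then be passed on, e.g. (my_data is a placeholder):
# create_statistics(
#   dataset = my_data,
#   variables = c("herb_cover", "moss_cover"),
#   interval_information = custom_intervals
# )
```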
dataframe with the columns chosen for level, a column variable with
the chosen variables, and the columns n_obs, mean, variance,
lci (lower limit of confidence interval) and
uci (upper limit of confidence interval)
library(forrescalc)
dendro_by_plot <- read_forresdat_table(tablename = "dendro_by_plot")
#> Rows: 17 Columns: 16
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): plottype
#> dbl (15): plot_id, year, period, number_of_tree_species, number_of_trees_ha,...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 17 Columns: 13
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): forest_reserve, plottype
#> dbl (4): plot_id, period, survey_number, year_dendro
#> lgl (7): survey_trees, survey_deadw, survey_veg, survey_reg, game_impact_veg...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: The dataset only contains presence data and lacks zero observations (except for 1 observation per plot_id and period to indicate that observations are done). Please use function add_zeros() to add zero observations when needed.
create_statistics(
  dataset = dendro_by_plot,
  level = c("forest_reserve", "period"),
  variables = "vol_alive_m3_ha"
)
#> # A tibble: 9 × 9
#> forest_reserve period variable n_obs mean variance lci uci logaritmic
#> <chr> <dbl> <chr> <int> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1 Everzwijnbad 1 vol_alive… 1 541. NA NA NA FALSE
#> 2 Everzwijnbad 2 vol_alive… 1 592. NA NA NA FALSE
#> 3 Kersselaerspleyn 1 vol_alive… 1 107. NA NA NA FALSE
#> 4 Kersselaerspleyn 2 vol_alive… 1 126. NA NA NA FALSE
#> 5 Kersselaerspleyn 3 vol_alive… 1 133. NA NA NA FALSE
#> 6 Liedekerke 1 vol_alive… 1 16.1 NA NA NA FALSE
#> 7 Liedekerke 2 vol_alive… 1 29.7 NA NA NA FALSE
#> 8 Withoefse heide 1 vol_alive… 1 12.9 NA NA NA FALSE
#> 9 Withoefse heide 2 vol_alive… 1 17.2 NA NA NA FALSE
dendro_by_diam_plot_species <-
  read_forresdat_table(tablename = "dendro_by_diam_plot_species")
#> Rows: 158 Columns: 15
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): plottype, dbh_class_5cm
#> dbl (13): plot_id, year, period, species, stem_number_alive_ha, stem_number_...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 17 Columns: 13
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): forest_reserve, plottype
#> dbl (4): plot_id, period, survey_number, year_dendro
#> lgl (7): survey_trees, survey_deadw, survey_veg, survey_reg, game_impact_veg...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: The dataset only contains presence data and lacks zero observations (except for 1 observation per plot_id and period to indicate that observations are done). Please use function add_zeros() to add zero observations when needed.
create_statistics(
  dataset = dendro_by_diam_plot_species,
  level = c("forest_reserve", "year", "species", "dbh_class_5cm"),
  variables = c("basal_area_alive_m2_ha", "basal_area_dead_m2_ha")
)
#> Warning: Are you sure you don't want to include period in level? Your dataset has measurements in different periods.
#> # A tibble: 124 × 11
#> forest_reserve year species dbh_class_5cm variable n_obs mean variance
#> <chr> <dbl> <dbl> <chr> <chr> <int> <dbl> <dbl>
#> 1 Everzwijnbad 2002 16 10 - 15 cm basal_area_a… 1 1.35 NA
#> 2 Everzwijnbad 2002 16 10 - 15 cm basal_area_d… 1 0 NA
#> 3 Everzwijnbad 2002 16 5 - 10 cm basal_area_a… 1 0.198 NA
#> 4 Everzwijnbad 2002 16 5 - 10 cm basal_area_d… 1 0 NA
#> 5 Everzwijnbad 2002 28 5 - 10 cm basal_area_a… 1 0.944 NA
#> 6 Everzwijnbad 2002 28 5 - 10 cm basal_area_d… 1 0 NA
#> 7 Everzwijnbad 2002 87 10 - 15 cm basal_area_a… 1 0 NA
#> 8 Everzwijnbad 2002 87 10 - 15 cm basal_area_d… 1 0 NA
#> 9 Everzwijnbad 2002 87 35 - 40 cm basal_area_a… 1 3.78 NA
#> 10 Everzwijnbad 2002 87 35 - 40 cm basal_area_d… 1 0 NA
#> # ℹ 114 more rows
#> # ℹ 3 more variables: lci <dbl>, uci <dbl>, logaritmic <lgl>
# example on interval data (shrub_cover and tree_cover)
veg_by_plot <- read_forresdat_table(tablename = "veg_by_plot")
#> Rows: 43 Columns: 29
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): plottype
#> dbl (27): plot_id, subplot_id, period, year_main_survey, number_of_species,...
#> dttm (1): date_vegetation
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 17 Columns: 13
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): forest_reserve, plottype
#> dbl (4): plot_id, period, survey_number, year_dendro
#> lgl (7): survey_trees, survey_deadw, survey_veg, survey_reg, game_impact_veg...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: The dataset only contains presence data and lacks zero observations (except for 1 observation per plot_id and period to indicate that observations are done). Please use function add_zeros() to add zero observations when needed.
create_statistics(
  dataset = veg_by_plot,
  level = c("forest_reserve", "period", "plottype"),
  variables = c("number_of_species", "shrub_cover", "tree_cover")
)
#> # A tibble: 24 × 10
#> forest_reserve period plottype variable n_obs mean variance lci uci
#> <chr> <dbl> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Everzwijnbad 1 CP number_of_… 1 2 NA NA NA
#> 2 Everzwijnbad 1 CP shrub_cover 1 39.7 0.00411 35.0 45
#> 3 Everzwijnbad 1 CP tree_cover 1 90 6.51 85 95
#> 4 Everzwijnbad 2 CP number_of_… 1 2 NA NA NA
#> 5 Everzwijnbad 2 CP shrub_cover 1 29.6 0.00737 25.0 35
#> 6 Everzwijnbad 2 CP tree_cover 1 90 6.51 85 95
#> 7 Kersselaerspleyn 1 CP number_of_… 1 1 NA NA NA
#> 8 Kersselaerspleyn 1 CP shrub_cover 1 NA NA NA NA
#> 9 Kersselaerspleyn 1 CP tree_cover 1 NA NA NA NA
#> 10 Kersselaerspleyn 2 CP number_of_… 1 1 NA NA NA
#> # ℹ 14 more rows
#> # ℹ 1 more variable: logaritmic <lgl>
# example on data with confidence interval (number_established_ha and
# number_seedlings_ha)
reg_by_plot <-
  read_forresdat_table(tablename = "reg_by_plot")
#> Rows: 30 Columns: 25
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): plottype
#> dbl (24): plot_id, subplot_id, period, year, number_of_tree_species, nr_of_t...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 17 Columns: 13
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): forest_reserve, plottype
#> dbl (4): plot_id, period, survey_number, year_dendro
#> lgl (7): survey_trees, survey_deadw, survey_veg, survey_reg, game_impact_veg...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: The dataset only contains presence data and lacks zero observations (except for 1 observation per plot_id and period to indicate that observations are done). Please use function add_zeros() to add zero observations when needed.
create_statistics(
  dataset = reg_by_plot,
  level = c("forest_reserve", "period", "plot_id"),
  variables = c("number_established_ha", "number_seedlings_ha")
)
#> # A tibble: 18 × 10
#> forest_reserve period plot_id variable n_obs mean variance lci uci
#> <chr> <dbl> <dbl> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Everzwijnbad 1 101 number_… 1 7.86e2 1.08e-22 7.86e2 786.
#> 2 Everzwijnbad 1 101 number_… 1 4.84e4 4.55e-23 4.84e4 48415.
#> 3 Everzwijnbad 2 101 number_… 1 9.10e3 4.75e- 3 7.95e3 10422.
#> 4 Everzwijnbad 2 101 number_… 1 8.09e4 2.82e- 3 7.29e4 89790.
#> 5 Kersselaerspl… 1 2006 number_… 1 0 Inf 0 NaN
#> 6 Kersselaerspl… 1 2006 number_… 1 2.52e3 1.80e-22 2.52e3 2515.
#> 7 Kersselaerspl… 2 2006 number_… 1 4.69e3 2.26e-25 4.69e3 4686.
#> 8 Kersselaerspl… 2 2006 number_… 1 1.73e4 6.30e-25 1.73e4 17330.
#> 9 Kersselaerspl… 3 2006 number_… 1 6.19e3 1.30e-25 6.19e3 6189.
#> 10 Kersselaerspl… 3 2006 number_… 1 1.14e4 2.04e- 2 8.59e3 15043.
#> 11 Liedekerke 1 1005 number_… 1 0 Inf 0 NaN
#> 12 Liedekerke 1 1005 number_… 1 0 Inf 0 NaN
#> 13 Liedekerke 2 1005 number_… 1 0 Inf 0 NaN
#> 14 Liedekerke 2 1005 number_… 1 0 Inf 0 NaN
#> 15 Withoefse hei… 1 204 number_… 1 8.75e1 8.16e-23 8.75e1 87.5
#> 16 Withoefse hei… 1 204 number_… 1 3.79e2 7.23e- 3 3.21e2 448.
#> 17 Withoefse hei… 2 204 number_… 1 1.75e2 5.47e-23 1.75e2 175.
#> 18 Withoefse hei… 2 204 number_… 1 3.47e2 2.41e- 3 3.15e2 382.
#> # ℹ 1 more variable: logaritmic <lgl>