R/add_zeros.R
add_zeros.Rd
The datasets for which this package has been developed typically contain
measurements of observations.
Absence is often not reported explicitly (e.g. there exists no record of
a species that is not observed in a plot),
while it can be important to include these zero values in an analysis
(e.g. mean coverage per species in a certain forest reserve; mean stem number
per diameter class in a forest reserve).
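To make the effect concrete, the following minimal sketch compares the mean coverage per species with and without an explicit zero record. The table is made up for illustration: the column names plot_id, species and coverage and all values are assumptions, not data from forresdat.

```r
# Hypothetical presence-only data: species 87 was not recorded in plot 2.
obs <- data.frame(
  plot_id  = c(1, 1, 2),
  species  = c(16, 87, 16),
  coverage = c(40, 20, 10)
)

# Mean over recorded rows only: the absence in plot 2 is silently ignored.
mean_presence_only <- tapply(obs$coverage, obs$species, mean)

# Same data with the missing combination (plot 2, species 87) added as zero.
obs_with_zero <- rbind(
  obs,
  data.frame(plot_id = 2, species = 87, coverage = 0)
)
mean_with_zeros <- tapply(obs_with_zero$coverage, obs_with_zero$species, mean)

mean_presence_only[["87"]]  # 20: averaged over one plot
mean_with_zeros[["87"]]     # 10: averaged over both plots
```

Species 16 occurs in both plots, so its mean is unaffected; only the species with an unreported absence changes.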
This function automatically adds missing combinations with value zero to
the dataset for each combination of values of the variables given
in comb_vars (within each value of grouping_vars).
All variables that are not mentioned in comb_vars or grouping_vars
are considered to be numerical variables and will get value 0 (zero).
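The behaviour described above can be approximated in base R as follows. This is a simplified sketch and not the forrescalc implementation: the helper name fill_zeros, the toy table and its column names are all hypothetical, and the optional arguments of add_zeros() are ignored.

```r
# Sketch only: a simplified stand-in for the idea behind add_zeros().
fill_zeros <- function(dataset, comb_vars, grouping_vars) {
  # One subset per combination of grouping values that occurs in the data
  groups <- split(dataset, dataset[grouping_vars], drop = TRUE)
  filled <- lapply(groups, function(grp) {
    # All combinations of the comb_vars values observed within this group
    combos <- expand.grid(
      lapply(grp[comb_vars], unique),
      KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE
    )
    for (v in grouping_vars) combos[[v]] <- grp[[v]][1]
    # Left join: combinations without a record get NA in the other columns
    out <- merge(combos, grp, by = c(comb_vars, grouping_vars), all.x = TRUE)
    # Every remaining column is treated as numeric and filled with zero
    value_vars <- setdiff(names(out), c(comb_vars, grouping_vars))
    for (v in value_vars) out[[v]][is.na(out[[v]])] <- 0
    out
  })
  out <- do.call(rbind, unname(filled))
  rownames(out) <- NULL
  out
}

# Made-up example data
toy <- data.frame(
  forest_reserve = c("A", "A", "A", "B"),
  plot           = c(1, 1, 2, 3),
  species        = c(16, 87, 16, 87),
  stems          = c(5, 2, 7, 4)
)

result <- fill_zeros(
  toy,
  comb_vars = c("plot", "species"),
  grouping_vars = "forest_reserve"
)
result
```

In reserve A the missing combination (plot 2, species 87) is added with stems equal to 0. Species 16 is not added to reserve B, because combinations are only built from values that occur within each group.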
Note that if a certain value is not present in the dataset
(or in one of the subsets defined by grouping_vars), it will not be
added automatically;
at least one record should be added manually for this value
(e.g. a plot or diameterclass that doesn't exist in the given dataset,
but has to be included in the output).
The data in forresdat already contain one record with zeros per plot
(with NA value for species and/or diameterclass), so the missing records
are added automatically when 'plot_id' is included in comb_vars.
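Why such a placeholder record matters can be seen in a small base R sketch. The table below is made up (column names and values are assumptions for illustration); it only shows which plots are available when combinations are built from the values present in the data.

```r
# Hypothetical data: plot 2 was surveyed but no species were found, so it is
# represented by a single placeholder record with species NA and zero stems.
obs <- data.frame(
  plot_id = c(1, 1, 2),
  species = c(16, 87, NA),
  stems   = c(5, 2, 0)
)

# All combinations of the values that occur in the data
combos <- expand.grid(
  plot_id = unique(obs$plot_id),
  species = unique(obs$species)
)
unique(combos$plot_id)  # plot 2 is included thanks to the placeholder

# Without the placeholder, plot 2 never occurs in the data,
# so no zero records can be generated for it.
obs_no_placeholder <- obs[!is.na(obs$species), ]
unique(obs_no_placeholder$plot_id)
```

The combinations also contain rows with species NA, which is why it can be useful to remove those records again once the zeros have been added.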
add_zeros(
  dataset,
  comb_vars,
  grouping_vars,
  add_zero_no_na = NA,
  remove_na_records_in_comb_vars = NA,
  defaults_to_na = NA
)
dataset
data.frame in which records should be added
comb_vars
variables (given as a vector of strings) for which all combinations of their values should have a record in the dataset.
grouping_vars
one or more variables for which the combination of
values of the variables given in comb_vars should be made for each value,
e.g. if grouping_vars = "forest_reserve" and
comb_vars = c("plot", "species"),
all combinations of the values in "plot" and "species" are made
within each value of "forest_reserve".
add_zero_no_na
variable indicating which records of the grouping_vars
should get a zero value (variable should be TRUE) or an NA
value (variable should be FALSE).
E.g. a variable indicating whether or not observations were made.
If no variable name is given (default NA), all added records get zero values.
remove_na_records_in_comb_vars
In which of the given comb_vars
should records with NA values be removed after adding the records with zero
values for all combinations?
In some cases, e.g. if no species are observed in a plot, the dataset in
forresdat has records with species NA and zeros for measured variables,
to make sure zero values for all species are added for each plot when using
this function.
After zero records have been added for all missing species, the records with
species NA are superfluous.
They can be removed by adding argument
remove_na_records_in_comb_vars = "species".
This argument defaults to NA (= no NA records are removed).
defaults_to_na
Columns in which the function should add NA instead of zero in the records that are added to complete the dataset.
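The three optional arguments can be pictured as post-processing steps on a table in which all combinations already exist. The sketch below is one reading of the argument descriptions above, written in base R on made-up data; the columns surveyed and is_new and all values are hypothetical, and the real function may differ in details.

```r
# Made-up table in which all plot/species combinations already exist.
# `is_new` marks records that were generated rather than observed;
# their measurements start out as NA.
filled <- data.frame(
  plot_id        = c(1, 1, 2, 2, 2),
  species        = c(16, 87, NA, 16, 87),
  surveyed       = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  is_new         = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  stems          = c(5, NA, NA, NA, NA),
  stems_per_tree = c(1.5, NA, NA, NA, NA)
)

# add_zero_no_na = "surveyed":
# new records get zero only where `surveyed` is TRUE, otherwise they stay NA
zero_rows <- filled$is_new & filled$surveyed
filled$stems[zero_rows] <- 0
filled$stems_per_tree[zero_rows] <- 0

# defaults_to_na = "stems_per_tree":
# this column is NA instead of zero in all new records
filled$stems_per_tree[filled$is_new] <- NA

# remove_na_records_in_comb_vars = "species":
# drop the placeholder records that have NA for species
result <- filled[!is.na(filled$species), ]
result
```

In the result, the added record for the surveyed plot 1 has stems equal to 0 and stems_per_tree equal to NA, while the added records for the unsurveyed plot 2 are NA in both columns.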
data.frame based on dataset, to which records are added with
value 0 (zero) for each measurement.
library(forrescalc)
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#>
#>     intersect, setdiff, setequal, union
dendro_by_plot_species <-
  read_forresdat_table(tablename = "dendro_by_plot_species") %>%
  select(
    -year, -plottype, -starts_with("survey_"), -data_processed,
    -starts_with("game_")
  )
#> Rows: 39 Columns: 16
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): plottype
#> dbl (15): plot_id, year, period, species, number_of_trees_ha, stem_number_ha...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 17 Columns: 13
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): forest_reserve, plottype
#> dbl (4): plot_id, period, survey_number, year_dendro
#> lgl (7): survey_trees, survey_deadw, survey_veg, survey_reg, game_impact_veg...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: The dataset only contains presence data and lacks zero observations (except for 1 observation per plot_id and period to indicate that observations are done). Please use function add_zeros() to add zero observations when needed.
add_zeros(
  dataset = dendro_by_plot_species,
  comb_vars = c("plot_id", "period", "species"),
  grouping_vars = c("forest_reserve")
)
#> # A tibble: 16 × 16
#>    plot_id period species number_of_trees_ha stem_number_ha
#>      <dbl>  <dbl>   <dbl>              <dbl>          <dbl>
#>  1     204      1      26               23.9           23.9
#>  2    2006      1       7               10.6           10.6
#>  3    2006      1      87                0              0
#>  4     101      1      16               78.6          157.
#>  5     101      1      28              157.           236.
#>  6     101      1      87              128.           128.
#>  7    1005      1      87               42.4           42.4
#>  8    2006      2       7               10.6           10.6
#>  9    2006      2      87                0              0
#> 10     204      2      26               23.9           23.9
#> 11     101      2      16               78.6          157.
#> 12     101      2      28                0              0
#> 13     101      2      87              128.           128.
#> 14    1005      2      87               42.4           42.4
#> 15    2006      3       7               10.6           10.6
#> 16    2006      3      87                0              0
#> # ℹ 11 more variables: basal_area_alive_m2_ha <dbl>,
#> #   basal_area_dead_m2_ha <dbl>, vol_alive_m3_ha <dbl>,
#> #   vol_dead_standing_m3_ha <dbl>, vol_bole_alive_m3_ha <dbl>,
#> #   vol_bole_dead_m3_ha <dbl>, vol_log_m3_ha <dbl>, vol_deadw_m3_ha <dbl>,
#> #   stems_per_tree <dbl>, forest_reserve <chr>, year_dendro <dbl>
add_zeros(
  dataset = dendro_by_plot_species,
  comb_vars = c("plot_id", "period", "species"),
  grouping_vars = c("forest_reserve"),
  remove_na_records_in_comb_vars = "species"
)
#> # A tibble: 16 × 16
#>    plot_id period species number_of_trees_ha stem_number_ha
#>      <dbl>  <dbl>   <dbl>              <dbl>          <dbl>
#>  1     204      1      26               23.9           23.9
#>  2    2006      1       7               10.6           10.6
#>  3    2006      1      87                0              0
#>  4     101      1      16               78.6          157.
#>  5     101      1      28              157.           236.
#>  6     101      1      87              128.           128.
#>  7    1005      1      87               42.4           42.4
#>  8    2006      2       7               10.6           10.6
#>  9    2006      2      87                0              0
#> 10     204      2      26               23.9           23.9
#> 11     101      2      16               78.6          157.
#> 12     101      2      28                0              0
#> 13     101      2      87              128.           128.
#> 14    1005      2      87               42.4           42.4
#> 15    2006      3       7               10.6           10.6
#> 16    2006      3      87                0              0
#> # ℹ 11 more variables: basal_area_alive_m2_ha <dbl>,
#> #   basal_area_dead_m2_ha <dbl>, vol_alive_m3_ha <dbl>,
#> #   vol_dead_standing_m3_ha <dbl>, vol_bole_alive_m3_ha <dbl>,
#> #   vol_bole_dead_m3_ha <dbl>, vol_log_m3_ha <dbl>, vol_deadw_m3_ha <dbl>,
#> #   stems_per_tree <dbl>, forest_reserve <chr>, year_dendro <dbl>
add_zeros(
  dataset = dendro_by_plot_species,
  comb_vars = c("plot_id", "period", "species"),
  grouping_vars = c("forest_reserve"),
  defaults_to_na = "stems_per_tree"
)
#> # A tibble: 16 × 16
#>    plot_id period species number_of_trees_ha stem_number_ha
#>      <dbl>  <dbl>   <dbl>              <dbl>          <dbl>
#>  1     204      1      26               23.9           23.9
#>  2    2006      1       7               10.6           10.6
#>  3    2006      1      87                0              0
#>  4     101      1      16               78.6          157.
#>  5     101      1      28              157.           236.
#>  6     101      1      87              128.           128.
#>  7    1005      1      87               42.4           42.4
#>  8    2006      2       7               10.6           10.6
#>  9    2006      2      87                0              0
#> 10     204      2      26               23.9           23.9
#> 11     101      2      16               78.6          157.
#> 12     101      2      28                0              0
#> 13     101      2      87              128.           128.
#> 14    1005      2      87               42.4           42.4
#> 15    2006      3       7               10.6           10.6
#> 16    2006      3      87                0              0
#> # ℹ 11 more variables: basal_area_alive_m2_ha <dbl>,
#> #   basal_area_dead_m2_ha <dbl>, vol_alive_m3_ha <dbl>,
#> #   vol_dead_standing_m3_ha <dbl>, vol_bole_alive_m3_ha <dbl>,
#> #   vol_bole_dead_m3_ha <dbl>, vol_log_m3_ha <dbl>, vol_deadw_m3_ha <dbl>,
#> #   stems_per_tree <dbl>, forest_reserve <chr>, year_dendro <dbl>