The datasets this package was developed for typically contain measurements of observations. Absences are often not reported explicitly (e.g. there is no record for a species that was not observed in a plot), yet it can be important to include these zero values in an analysis (e.g. the mean coverage per species or the mean stem number per diameter class in a forest reserve).

This function adds a record with value zero for each missing combination of the values of the variables given in comb_vars (within each value of grouping_vars). All variables not mentioned in comb_vars or grouping_vars are considered to be numerical variables and get value 0 (zero) in the added records.

Note that a value that is not present in the dataset (or in one of the subsets defined by grouping_vars) is not added automatically; at least one record should be added manually for such a value (e.g. a plot or diameter class that doesn't exist in the given dataset but has to be included in the output). The data in forresdat already contain one record with zeros per plot (with NA for species and/or diameter class), so the missing records are added automatically if 'plot_id' is included in comb_vars.
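The idea described above can be sketched on a small example. The sketch below does not call add_zeros() and is not its implementation: it uses tidyr::complete() on a hypothetical toy dataset (the object names and values are made up for illustration) to show what completing combinations within a group means, and how the mean per species changes once the zeros are included.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: only observed species are recorded per plot
obs <- tibble(
  forest_reserve = c("A", "A", "B"),
  plot_id = c(1, 2, 3),
  species = c(16, 87, 87),
  stem_number_ha = c(150, 120, 40)
)

# Complete plot_id x species within each forest_reserve and give the
# measurement value 0 in the added records
completed <- obs %>%
  group_by(forest_reserve) %>%
  complete(plot_id, species, fill = list(stem_number_ha = 0)) %>%
  ungroup()

# Mean stem number per species in reserve A, without and with zeros
mean_presence_only <- obs %>%
  filter(forest_reserve == "A") %>%
  group_by(species) %>%
  summarise(mean_stems = mean(stem_number_ha))

mean_with_zeros <- completed %>%
  filter(forest_reserve == "A") %>%
  group_by(species) %>%
  summarise(mean_stems = mean(stem_number_ha))
```

In this toy dataset, species 16 never occurs in reserve B, so no record for it is added there. This mirrors the note above that values absent from a subset are not added automatically.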

add_zeros(
  dataset,
  comb_vars,
  grouping_vars,
  add_zero_no_na = NA,
  remove_na_records_in_comb_vars = NA,
  defaults_to_na = NA
)

Arguments

dataset

data.frame to which records should be added

comb_vars

variables (given as a vector of strings) for which every combination of their values should have a record in the dataset.

grouping_vars

one or more variables that define the groups within which the combinations of values of the variables given in comb_vars are made, e.g. if grouping_vars = "forest_reserve" and comb_vars = c("plot", "species"), all combinations of the values in "plot" and "species" are made within each value of "forest_reserve".

add_zero_no_na

variable indicating which records of the grouping_vars should get a zero value (variable is TRUE) or an NA value (variable is FALSE), e.g. a variable indicating whether or not observations were done. If no variable name is given (default NA), all added records get zero values.

remove_na_records_in_comb_vars

The comb_vars in which records with NA values should be removed after the records with zero values have been added for all combinations. In some cases, e.g. if no species are observed in a plot, the dataset in forresdat has records with species NA and zeros for the measured variables, to make sure that zero values for all species are added for each plot when using this function. After the zero records for all missing species have been added, the records with species NA are superfluous. They can be removed by setting remove_na_records_in_comb_vars = "species". This argument defaults to NA (= no NA records are removed).

defaults_to_na

Columns in which the function should add NA instead of zero in the records that are added to complete the dataset.
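The effect of the last two arguments can be illustrated with a minimal sketch. It does not call add_zeros(): it mimics the described behaviour with tidyr::complete() and dplyr::filter() on a hypothetical toy dataset (names and values are made up), under the assumption that a ratio such as stems_per_tree is better left NA than set to zero when no trees are present.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: plot 2 was surveyed but is empty, so it only has
# a placeholder record with species NA and zero for the measurement
obs <- tibble(
  plot_id = c(1, 1, 2),
  species = c(16, 87, NA),
  stem_number_ha = c(157, 128, 0),
  stems_per_tree = c(2, 1, NA)
)

completed <- obs %>%
  # Only stem_number_ha is filled with 0; stems_per_tree is left out of
  # `fill`, so it stays NA in the added records (compare defaults_to_na)
  complete(plot_id, species, fill = list(stem_number_ha = 0)) %>%
  # Drop the placeholder records with species NA, which are superfluous
  # now (compare remove_na_records_in_comb_vars = "species")
  filter(!is.na(species))
```

The result has one record per plot and observed species. The two records added for plot 2 have stem_number_ha equal to 0 and stems_per_tree equal to NA.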

Value

data.frame based on dataset, to which records with value 0 (zero) for each measurement are added.

Examples

library(forrescalc)
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
dendro_by_plot_species <-
  read_forresdat_table(tablename = "dendro_by_plot_species") %>%
  select(
    -year, -plottype, -starts_with("survey_"), -data_processed,
    -starts_with("game_")
  )
#> Rows: 39 Columns: 16
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): plottype
#> dbl (15): plot_id, year, period, species, number_of_trees_ha, stem_number_ha...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 17 Columns: 13
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): forest_reserve, plottype
#> dbl (4): plot_id, period, survey_number, year_dendro
#> lgl (7): survey_trees, survey_deadw, survey_veg, survey_reg, game_impact_veg...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: The dataset only contains presence data and lacks zero observations (except for 1 observation per plot_id and period to indicate that observations are done).  Please use function add_zeros() to add zero observations when needed.
add_zeros(
  dataset = dendro_by_plot_species,
  comb_vars = c("plot_id", "period", "species"),
  grouping_vars = c("forest_reserve")
)
#> # A tibble: 16 × 16
#>    plot_id period species number_of_trees_ha stem_number_ha
#>      <dbl>  <dbl>   <dbl>              <dbl>          <dbl>
#>  1     204      1      26               23.9           23.9
#>  2    2006      1       7               10.6           10.6
#>  3    2006      1      87                0              0  
#>  4     101      1      16               78.6          157. 
#>  5     101      1      28              157.           236. 
#>  6     101      1      87              128.           128. 
#>  7    1005      1      87               42.4           42.4
#>  8    2006      2       7               10.6           10.6
#>  9    2006      2      87                0              0  
#> 10     204      2      26               23.9           23.9
#> 11     101      2      16               78.6          157. 
#> 12     101      2      28                0              0  
#> 13     101      2      87              128.           128. 
#> 14    1005      2      87               42.4           42.4
#> 15    2006      3       7               10.6           10.6
#> 16    2006      3      87                0              0  
#> # ℹ 11 more variables: basal_area_alive_m2_ha <dbl>,
#> #   basal_area_dead_m2_ha <dbl>, vol_alive_m3_ha <dbl>,
#> #   vol_dead_standing_m3_ha <dbl>, vol_bole_alive_m3_ha <dbl>,
#> #   vol_bole_dead_m3_ha <dbl>, vol_log_m3_ha <dbl>, vol_deadw_m3_ha <dbl>,
#> #   stems_per_tree <dbl>, forest_reserve <chr>, year_dendro <dbl>
add_zeros(
  dataset = dendro_by_plot_species,
  comb_vars = c("plot_id", "period", "species"),
  grouping_vars = c("forest_reserve"),
  remove_na_records_in_comb_vars = "species"
)
#> # A tibble: 16 × 16
#>    plot_id period species number_of_trees_ha stem_number_ha
#>      <dbl>  <dbl>   <dbl>              <dbl>          <dbl>
#>  1     204      1      26               23.9           23.9
#>  2    2006      1       7               10.6           10.6
#>  3    2006      1      87                0              0  
#>  4     101      1      16               78.6          157. 
#>  5     101      1      28              157.           236. 
#>  6     101      1      87              128.           128. 
#>  7    1005      1      87               42.4           42.4
#>  8    2006      2       7               10.6           10.6
#>  9    2006      2      87                0              0  
#> 10     204      2      26               23.9           23.9
#> 11     101      2      16               78.6          157. 
#> 12     101      2      28                0              0  
#> 13     101      2      87              128.           128. 
#> 14    1005      2      87               42.4           42.4
#> 15    2006      3       7               10.6           10.6
#> 16    2006      3      87                0              0  
#> # ℹ 11 more variables: basal_area_alive_m2_ha <dbl>,
#> #   basal_area_dead_m2_ha <dbl>, vol_alive_m3_ha <dbl>,
#> #   vol_dead_standing_m3_ha <dbl>, vol_bole_alive_m3_ha <dbl>,
#> #   vol_bole_dead_m3_ha <dbl>, vol_log_m3_ha <dbl>, vol_deadw_m3_ha <dbl>,
#> #   stems_per_tree <dbl>, forest_reserve <chr>, year_dendro <dbl>
add_zeros(
  dataset = dendro_by_plot_species,
  comb_vars = c("plot_id", "period", "species"),
  grouping_vars = c("forest_reserve"),
  defaults_to_na = "stems_per_tree"
)
#> # A tibble: 16 × 16
#>    plot_id period species number_of_trees_ha stem_number_ha
#>      <dbl>  <dbl>   <dbl>              <dbl>          <dbl>
#>  1     204      1      26               23.9           23.9
#>  2    2006      1       7               10.6           10.6
#>  3    2006      1      87                0              0  
#>  4     101      1      16               78.6          157. 
#>  5     101      1      28              157.           236. 
#>  6     101      1      87              128.           128. 
#>  7    1005      1      87               42.4           42.4
#>  8    2006      2       7               10.6           10.6
#>  9    2006      2      87                0              0  
#> 10     204      2      26               23.9           23.9
#> 11     101      2      16               78.6          157. 
#> 12     101      2      28                0              0  
#> 13     101      2      87              128.           128. 
#> 14    1005      2      87               42.4           42.4
#> 15    2006      3       7               10.6           10.6
#> 16    2006      3      87                0              0  
#> # ℹ 11 more variables: basal_area_alive_m2_ha <dbl>,
#> #   basal_area_dead_m2_ha <dbl>, vol_alive_m3_ha <dbl>,
#> #   vol_dead_standing_m3_ha <dbl>, vol_bole_alive_m3_ha <dbl>,
#> #   vol_bole_dead_m3_ha <dbl>, vol_log_m3_ha <dbl>, vol_deadw_m3_ha <dbl>,
#> #   stems_per_tree <dbl>, forest_reserve <chr>, year_dendro <dbl>