Filter predicates
Damiano Oldoni
2024-12-09
Source:vignettes/filter-predicates.Rmd
This vignette shows what filter predicates are and how to use them in get_*() functions or in map_dep().
Setup
Load packages:
library(camtraptor)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Loading the camtraptor package makes a camera trap data package called mica available. This data package contains camera trap data of muskrats and coypus. We will use the mica variable from now on.
Filter predicates
All filter predicates are functions whose names start with the pred prefix. They fall into four categories based on the type of input they accept:
- one argument, one value
- one argument, no value
- one argument, multiple values (vector)
- multiple predicates
They are called filter predicates because they build a (dplyr) filter statement. Filter predicates return objects of class filter_predicate, which are a particular kind of list with the following slots:
- arg, the argument
- value, the value
- type, the type of filter predicate
- expr, the dplyr filter expression
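As a small sketch of this structure, the slots can be read with the usual list accessor $. The values in the comments are the ones printed for pred(arg = "a", value = 5) further down in this vignette; the variable name p is only illustrative.

```r
library(camtraptor)

# Build a basic equality predicate (p is an illustrative name)
p <- pred(arg = "a", value = 5)

class(p)  # "filter_predicate"
p$arg     # "a"
p$value   # 5
p$type    # "equals"
```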
One argument - one value predicates
These filter predicates accept one argument and one value.
pred(), pred_not()
pred() is the most basic predicate and expresses an equality statement. For example, to select rows where column a is equal to 5:
pred(arg = "a", value = 5)
#> $arg
#> [1] "a"
#>
#> $value
#> [1] 5
#>
#> $type
#> [1] "equals"
#>
#> $expr
#> (a == 5)
#>
#> attr(,"class")
#> [1] "filter_predicate"
The opposite of pred() (equals) is pred_not() (notEquals):
pred_not(arg = "a", value = 5)
#> $arg
#> [1] "a"
#>
#> $value
#> [1] 5
#>
#> $type
#> [1] "notEquals"
#>
#> $expr
#> (a != 5)
#>
#> attr(,"class")
#> [1] "filter_predicate"
pred_gt(), pred_gte(), pred_lt(), pred_lte()
These predicates express > (greaterThan), >= (greaterThanOrEqual), < (lessThan) and <= (lessThanOrEqual) respectively. For example, to select rows where column a is greater than 5:
pred_gt(arg = "a", value = 5)
#> $arg
#> [1] "a"
#>
#> $value
#> [1] 5
#>
#> $type
#> [1] "greaterThan"
#>
#> $expr
#> (a > 5)
#>
#> attr(,"class")
#> [1] "filter_predicate"
One argument - no value predicates
The predicate pred_na() checks whether the argument is NA. To select rows where column a is NA:
pred_na(arg = "a")
#> $arg
#> [1] "a"
#>
#> $value
#> [1] NA
#>
#> $type
#> [1] "na"
#>
#> $expr
#> (is.na(a))
#>
#> attr(,"class")
#> [1] "filter_predicate"
To select all the rows where a is not NA, use the opposite predicate pred_notna():
pred_notna(arg = "a")
#> $arg
#> [1] "a"
#>
#> $value
#> [1] NA
#>
#> $type
#> [1] "notNa"
#>
#> $expr
#> (!is.na(a))
#>
#> attr(,"class")
#> [1] "filter_predicate"
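These predicates rely on is.na() rather than an equality test because, in R, comparing with NA using == yields NA instead of TRUE or FALSE. A minimal base R illustration on a made-up vector:

```r
# Made-up example vector with one missing value
a <- c(1, NA, 3)

a == NA    # NA NA NA: equality with NA is never TRUE
is.na(a)   # FALSE TRUE FALSE: identifies the missing value
!is.na(a)  # TRUE FALSE TRUE: the condition used by pred_notna()
```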
One argument - multiple values predicates
These predicates accept a vector of multiple values as the value. To get rows where column a is one of the values c(1,3,5):
pred_in(arg = "a", value = c(1,3,5))
#> $arg
#> [1] "a"
#>
#> $value
#> [1] 1 3 5
#>
#> $type
#> [1] "in"
#>
#> $expr
#> (a %in% c(1,3,5))
#>
#> attr(,"class")
#> [1] "filter_predicate"
The opposite of pred_in() is pred_notin():
pred_notin(arg = "a", value = c(1,3,5))
#> $arg
#> [1] "a"
#>
#> $value
#> [1] 1 3 5
#>
#> $type
#> [1] "notIn"
#>
#> $expr
#> (!(a %in% c(1,3,5)))
#>
#> attr(,"class")
#> [1] "filter_predicate"
Multiple predicates: pred_and() and pred_or()
You can combine the predicates described above into more complex filter statements by using pred_and() (AND operator) and pred_or() (OR operator).
Some examples. Select rows where column a is equal to 5 and column b is not NA:
pred_and(pred("a", 5), pred_notna("b"))
#> $arg
#> $arg[[1]]
#> [1] "a"
#>
#> $arg[[2]]
#> [1] "b"
#>
#>
#> $value
#> $value[[1]]
#> [1] 5
#>
#> $value[[2]]
#> [1] NA
#>
#>
#> $type
#> $type[[1]]
#> [1] "equals"
#>
#> $type[[2]]
#> [1] "notNa"
#>
#>
#> $expr
#> ((a == 5) & (!is.na(b)))
#>
#> attr(,"class")
#> [1] "filter_predicate"
Select rows where column a is equal to 5 or column b is not NA:
pred_or(pred("a", 5), pred_notna("b"))
#> $arg
#> $arg[[1]]
#> [1] "a"
#>
#> $arg[[2]]
#> [1] "b"
#>
#>
#> $value
#> $value[[1]]
#> [1] 5
#>
#> $value[[2]]
#> [1] NA
#>
#>
#> $type
#> $type[[1]]
#> [1] "equals"
#>
#> $type[[2]]
#> [1] "notNa"
#>
#>
#> $expr
#> ((a == 5) | (!is.na(b)))
#>
#> attr(,"class")
#> [1] "filter_predicate"
Notice how these two predicates return filter_predicate objects with the same structure as any other predicate, but with slots arg, value and type that are lists containing one element per input predicate they combine.
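To see what the combined expressions do once they reach dplyr, here is a rough sketch that applies the two expr values shown above to a made-up data frame. The data frame df and its values are assumptions for illustration only; they are not part of camtraptor.

```r
library(dplyr)

# Made-up data, for illustration only
df <- tibble(
  a = c(5, 5, 3, NA),
  b = c("x", NA, "y", "z")
)

# Expression built by pred_and(pred("a", 5), pred_notna("b"))
and_rows <- df %>% filter((a == 5) & (!is.na(b)))

# Expression built by pred_or(pred("a", 5), pred_notna("b"))
or_rows <- df %>% filter((a == 5) | (!is.na(b)))

nrow(and_rows)  # 1: only the first row satisfies both conditions
nrow(or_rows)   # 4: every row satisfies at least one condition
```

Note that the last row is kept by the OR filter even though a is NA there, because NA | TRUE evaluates to TRUE in R.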
How to use filter predicates
Filter predicates are useful for selecting a subset of deployments in the get_*() functions and in the visualization function map_dep().
One predicate
Apply get_*() functions only to the deployments with location name “B_DL_val 5_beek kleine vijver” or “B_DL_val 3_dikke boom”:
get_n_obs(mica,
          pred_in("locationName",
                  c("B_DL_val 5_beek kleine vijver", "B_DL_val 3_dikke boom")))
#> df %>% dplyr::filter((locationName %in% c("B_DL_val 5_beek kleine vijver","B_DL_val 3_dikke boom")))
#> # A tibble: 18 × 3
#> deploymentID scientificName n
#> <chr> <chr> <int>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Anas strepera 3
#> 2 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Anas platyrhynchos 4
#> 3 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Castor fiber 0
#> 4 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Mustela putorius 0
#> 5 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Vulpes vulpes 0
#> 6 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Martes foina 0
#> 7 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Ardea cinerea 0
#> 8 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Ardea 0
#> 9 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Homo sapiens 0
#> 10 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Anas strepera 0
#> 11 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Anas platyrhynchos 0
#> 12 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Castor fiber 1
#> 13 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Mustela putorius 3
#> 14 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Vulpes vulpes 1
#> 15 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Martes foina 1
#> 16 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Ardea cinerea 0
#> 17 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Ardea 0
#> 18 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Homo sapiens 0
get_effort(mica,
           pred_in("locationName",
                   c("B_DL_val 5_beek kleine vijver", "B_DL_val 3_dikke boom")))
#> df %>% dplyr::filter((locationName %in% c("B_DL_val 5_beek kleine vijver","B_DL_val 3_dikke boom")))
#> # A tibble: 2 × 4
#> deploymentID effort unit effort_duration
#> <chr> <dbl> <chr> <Duration>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 239. hour 859859s (~1.42 weeks)
#> 2 577b543a-2cf1-4b23-b6d2-cda7e2eac372 219. hour 786802s (~1.3 weeks)
get_n_species(mica,
              pred_in("locationName",
                      c("B_DL_val 5_beek kleine vijver", "B_DL_val 3_dikke boom")))
#> df %>% dplyr::filter((locationName %in% c("B_DL_val 5_beek kleine vijver","B_DL_val 3_dikke boom")))
#> # A tibble: 2 × 2
#> deploymentID n
#> <chr> <int>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 2
#> 2 577b543a-2cf1-4b23-b6d2-cda7e2eac372 4
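Since the same predicate is passed to each function above, it can presumably be stored once and reused. The sketch below assumes that get_*() functions accept a predicate held in a variable just as they accept one written inline; loc_filter is a hypothetical name.

```r
library(camtraptor)

# Store the predicate once (loc_filter is an illustrative name)
loc_filter <- pred_in(
  "locationName",
  c("B_DL_val 5_beek kleine vijver", "B_DL_val 3_dikke boom")
)

# Reuse it across several get_*() calls
n_obs <- get_n_obs(mica, loc_filter)
effort <- get_effort(mica, loc_filter)
n_species <- get_n_species(mica, loc_filter)
```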
Multiple predicates
As shown above, you can combine several predicates for more complex filtering. For example, calculate the number of species detected by the deployments with location name B_DL_val 5_beek kleine vijver or B_DL_val 3_dikke boom, or by deployments located south of latitude 50.7 degrees:
get_n_species(mica,
              pred_or(
                pred_in("locationName",
                        c("B_DL_val 5_beek kleine vijver", "B_DL_val 3_dikke boom")),
                pred_lt("latitude", 50.7)))
#> df %>% dplyr::filter(((locationName %in% c("B_DL_val 5_beek kleine vijver","B_DL_val 3_dikke boom")) | (latitude < 50.7)))
#> # A tibble: 3 × 2
#> deploymentID n
#> <chr> <int>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 2
#> 2 577b543a-2cf1-4b23-b6d2-cda7e2eac372 4
#> 3 62c200a9-0e03-4495-bcd8-032944f6f5a1 2
The same syntax works for visualizing such information with the map_dep() function:
map_dep(mica,
        feature = "n_species",
        pred_or(
          pred_in("locationName",
                  c("B_DL_val 5_beek kleine vijver", "B_DL_val 3_dikke boom")),
          pred_lt("latitude", 50.7)))
#> df %>% dplyr::filter(((locationName %in% c("B_DL_val 5_beek kleine vijver","B_DL_val 3_dikke boom")) | (latitude < 50.7)))
Notice also that you can pass as many predicates as you want to get_*() functions or map_dep() by separating them with commas: they will be combined internally using the AND operator. For example, to get the number of species for deployments south of latitude 51 degrees AND east of longitude 4 degrees, you can simplify this:
get_n_species(mica,
              pred_and(pred_lt("latitude", 51),
                       pred_gt("longitude", 4)))
#> df %>% dplyr::filter(((latitude < 51) & (longitude > 4)))
#> # A tibble: 1 × 2
#> deploymentID n
#> <chr> <int>
#> 1 62c200a9-0e03-4495-bcd8-032944f6f5a1 2
by omitting pred_and():
get_n_species(mica,
              pred_lt("latitude", 51),
              pred_gt("longitude", 4))
#> df %>% dplyr::filter((latitude < 51) & (longitude > 4))
#> # A tibble: 1 × 2
#> deploymentID n
#> <chr> <int>
#> 1 62c200a9-0e03-4495-bcd8-032944f6f5a1 2
This is similar to the behaviour of dplyr’s filter() function, where this:
mica$data$deployments %>%
  dplyr::filter(latitude < 51 & longitude > 4)
#> # A tibble: 1 × 24
#> deploymentID locationID locationName longitude latitude coordinateUncertainty
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 62c200a9-0e0… ce943ced-… B_DM_val 4_… 4.01 50.7 NA
#> # ℹ 18 more variables: start <dttm>, end <dttm>, setupBy <chr>, cameraID <chr>,
#> # cameraModel <chr>, cameraInterval <dbl>, cameraHeight <dbl>,
#> # cameraTilt <dbl>, cameraHeading <dbl>, timestampIssues <lgl>,
#> # baitUse <fct>, session <chr>, array <chr>, featureType <fct>,
#> # habitat <chr>, tags <chr>, comments <chr>, `_id` <chr>
is exactly the same as this:
mica$data$deployments %>%
  dplyr::filter(latitude < 51, longitude > 4)
#> # A tibble: 1 × 24
#> deploymentID locationID locationName longitude latitude coordinateUncertainty
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 62c200a9-0e0… ce943ced-… B_DM_val 4_… 4.01 50.7 NA
#> # ℹ 18 more variables: start <dttm>, end <dttm>, setupBy <chr>, cameraID <chr>,
#> # cameraModel <chr>, cameraInterval <dbl>, cameraHeight <dbl>,
#> # cameraTilt <dbl>, cameraHeading <dbl>, timestampIssues <lgl>,
#> # baitUse <fct>, session <chr>, array <chr>, featureType <fct>,
#> # habitat <chr>, tags <chr>, comments <chr>, `_id` <chr>
Happy filtering!