class: center, middle ![:scale 30%](/coding-club/assets/images/coding_club_logo_1.png) # 25 MAY 2021 ## INBO coding club Exclusively on INBOflix --- class: center, middle ![:scale 90%](/coding-club/assets/images/20210525/20210525_badge_data_joining.png) --- class: center, middle ![:scale 100%](/coding-club/assets/images/20210525/20210525_cheatsheet_dplyr.png) [Download cheatsheet here](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20181129_cheat_sheet_data_transformation.pdf) --- class: center, middle ### How to get started? Check the [Each session setup](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup) to get started. ### First time coding club? Check the [First time setup](https://inbo.github.io/coding-club/gettingstarted.html#first-time-setup) section to setup. --- class: left, middle ## Coding club tables **Like** a card club, we will split us in different tables (aka _breakout rooms_) and we will help each other! We return all together after a fixed amount of time to discuss each challenge apart. **Unlike** a card club, in our club there is no winner, neither within a table nor among tables! ![:scale 65%](/coding-club/assets/images/breakout_rooms.png) --- class: center, middle ### Share your code during the coding session! Go to https://hackmd.io/qjtMSYw4Tm2-V60C_hPx5Q?both and start by adding your name in section "Participants".
--- class: left, middle # Download data and code Today no R code script. We put all dataframes in a unique `.Rda` file You can download the material: - automatically via `inborutils::setup_codingclub_session()`* - manually** from GitHub folder [coding-club/data/20210525](https://github.com/inbo/coding-club/blob/master/data/20210525/)
__\* Note__: you can use the date in "YYYYMMDD" format to download the coding club material of a specific day, e.g. run `setup_codingclub_session("20201027")` to download the coding club material of October, 27 2020. If date is omitted, the date of today is used. For all options, check the [tutorial online](https://inbo.github.io/tutorials/tutorials/r_setup_codingclub_session/).
__\*\* Note__: check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)
--- class: left, middle # Load libraries and data The R package [`tidylog`](https://github.com/elbersb/tidylog#tidylog) can be very helpful as it provides feedback about dplyr and tidyr operations. ```r install.packages("tidylog") library(tidyverse) library(tidylog) ``` ```r load("./data/20210525/20210525_data.Rda") ``` --- class: left, middle # Data description Today we work on both dummy data and more serious biological data. ## Dummy data Two dataframes: `coding_club_core_team` and `bmk_team` ## Biological data GBIF butterfly checklist data, inspired by Dirk's [Eurobutt checklist](https://www.gbif.org/dataset/f9af6ffd-febc-4626-b2e8-809b1c60fa01#description). - `taxon`: dataframe with taxonomic information of butterflies found in West-Europe. Each butterfly has a unique identifier (column `id`) - `vernacularname`: vernacular names in English. Link to `taxon` in column `taxon_id` - `distribution_BE`, `distribution_NL`, `distribution_LU` : data.frames about distribution and threat status of butterflies in Belgium (BE), the Netherlands (NL) and Luxembourg (LU). Link to `taxon` in column `taxon_id` --- background-image: url(/coding-club/assets/images/background_challenge_1.png) class: left, middle # Challenge 1 1. How to add orcid information to `coding_club_core_team`? 2. How are the columns ordered? Are the columns of `coding_club_core_team` on the left? Try to get the same result but put them on the right 3. Join `coding_club_core_team` with `bmk_team` and retain only people who are part of both 4. How to invert the order of the columns? 5. Now, join both again but retain ALL people --- class: left, middle # Intermezzo: combine variables vs cases In number 5 of the 1st challenge, we not only add columns (_variables_), but also rows (_cases_). Why can't we do this using functions which combine rows? ![:scale 90%](/coding-club/assets/images/20210525/20210525_cobmine_variables_cases.png) --- background-image: url(/coding-club/assets/images/background_challenge_2.png) class: left, middle # Challenge 2 1. Distributions from Belgium, Netherlands and Luxembourg should not overlap. How can you check it? There are several ways to do it, but try to use a data joining technique, which is likely the shortest and more readable way 2. Merge the three `distribution_*` dataframes into a data.frame called `distribution` 3. Check that all taxa in `vernacularname` point to taxa in `taxon` 4. Which taxa in `taxon` do not have vernacular names? --- background-image: url(/coding-club/assets/images/background_challenge_3.png) class: left, middle # Challenge 3 1. Add `vernacularname` information to `taxon`. Save the result as a new df called `extended_taxon`. What happens to columns with the same name? Distinguish them by adding suffix "_taxon" to the column(s) from `taxon`, and "_distribution" to the column(s) from `distribution`. 2. Add `distribution` to `extended_taxon` and retain only the taxa found in Belgium, Netherlands or Luxembourg, i.e. only taxa in `distribution` 3. Dirk points to East, he wants to include butterfly data from the Balcans. His dear Albanian colleagues send him an Albanian national list based on the same format as the Belgian/Dutch/Luxembourg ones. He got three files: `taxon_AL`, `distribution_AL` and `vernacularname_AL`. Which species are found only in Albania? And which one in West Europe (`taxon`) but not in Albania? 4. Which species are found both in Albania (`taxon_AL`) and in West-Europe? 5. Help him to merge `taxon_AL`, `distribution_AL` and `vernacularname_AL` to `taxon`, `distribution` and `vernacularname` respecitvely --- class: center, middle # Feedback We would like to know your opinion about the coding club structure (breakout rooms, participation in discussing solutions, ...) ![:scale 90%](/coding-club/assets/images/time_for_review.jpg) --- class: left, middle ## Resources - R script with [challenge solutions](https://github.com/inbo/coding-club/blob/master/src/2021050525/20210525_challenges_solutions.R) - [video recording](https://vimeo.com/558065790) of this coding club is available on the [INBO vimeo channel](https://vimeo.com/inbo) - [dplyr package documentation](https://dplyr.tidyverse.org/) - [dplyr cheat sheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20181129_cheat_sheet_data_transformation.pdf) - [tidylog package](https://github.com/elbersb/tidylog#tidylog) on GitHub - [tutorial on joining with dplyr](https://dplyr.tidyverse.org/articles/two-table.html) --- class: center, middle ![:scale 30%](/coding-club/assets/images/coding_club_logo_1.png) Room: 01.05 - Isala Van Diest(?)
Date: __24/06/2021__, van 10:00 tot 12:00
Subject: **functions in R**
(registration announced via DG_useR@inbo.be)