Exploring biodiversity data is tidier than ever with {galah}

Dax Kellie

ALA4R

ALA4R still had problems

  • No function naming convention
    • abbreviations: aus()
    • snake case: ala_fields()
    • single words: occurrences(), images()
    • contractions: fieldguide()
  • Confusing syntax
    • unclear function names: ala_list(), ala_lists(), specieslist()
    • abbreviated argument names: wkt, fq, qa
    • required Solr queries: "taxon_name:\"Alaba vibex\""
  • Inconsistent behaviour
    • functions return one of:
      • a data.frame
      • a list
      • a PDF

tidyverse

The tidyverse brought a set of recognised standards and syntax

galah

  • Query the Atlas of Living Australia (ALA) and other national GBIF nodes
  • Use tidy, pipe-able syntax

galah

Lookup        Narrow a query      Run a query
show_all()    galah_identify()    atlas_counts()
search_all()  galah_filter()      atlas_occurrences()
              galah_select()      atlas_species()
              galah_group_by()    atlas_media()
              galah_geolocate()

Build a query

library(galah)

galah_call() |>
  galah_identify("Eolophus roseicapilla") |> # galahs
  galah_filter(year >= 2019,
               stateProvince == "New South Wales") |>
  galah_group_by(year, dataResourceName) |>
  atlas_counts()

# A tibble: 15 × 3
   dataResourceName            year  count
   <chr>                       <chr> <int>
 1 eBird Australia             2021  17823
 2 eBird Australia             2020  17054
 3 eBird Australia             2019  13535
 4 NSW BioNet Atlas            2020   2218
 5 NSW BioNet Atlas            2019   2026
 6 NSW BioNet Atlas            2021    634
 7 NSW BioNet Atlas            2022    308
 8 iNaturalist Australia       2021    509
 9 iNaturalist Australia       2020    448
10 iNaturalist Australia       2022    437
11 iNaturalist Australia       2019    232
12 Earth Guardians Weekly Feed 2019    135
13 Earth Guardians Weekly Feed 2020    100
14 Earth Guardians Weekly Feed 2021     52
15 Earth Guardians Weekly Feed 2022     38
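
The query above is meant to read like a dplyr pipeline. For comparison, here is a rough sketch of the same steps applied to a small local data frame. The `records` table is invented purely for illustration (it is not ALA data), and the pairing of each dplyr verb with a galah function in the comments is an informal analogy rather than a statement about how galah works internally.

```r
library(dplyr)
library(tibble)

# Hypothetical local records, invented for illustration only
records <- tribble(
  ~scientificName,         ~year, ~stateProvince,    ~dataResourceName,
  "Eolophus roseicapilla", 2018,  "New South Wales", "eBird Australia",
  "Eolophus roseicapilla", 2019,  "New South Wales", "eBird Australia",
  "Eolophus roseicapilla", 2019,  "Victoria",        "eBird Australia",
  "Eolophus roseicapilla", 2020,  "New South Wales", "iNaturalist Australia",
  "Eolophus roseicapilla", 2020,  "New South Wales", "iNaturalist Australia"
)

local_counts <- records |>
  # compare with galah_filter(): keep recent records from one state
  filter(year >= 2019, stateProvince == "New South Wales") |>
  # compare with galah_group_by(): one group per year and data provider
  group_by(year, dataResourceName) |>
  # compare with atlas_counts(): count the records in each group
  summarise(count = n(), .groups = "drop")

print(local_counts)
```

The difference is where the work happens: dplyr filters rows already in memory, whereas the galah verbs describe a request that is only sent when an `atlas_` function runs.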

Build a query

library(galah)

galah_call() |>
  galah_identify("Eolophus roseicapilla") |> # galahs
  galah_filter(year >= 2022,
               stateProvince == "New South Wales") |>
  galah_select(scientificName, decimalLongitude, decimalLatitude) |>
  atlas_occurrences()
# A tibble: 809 × 3
   scientificName        decimalLongitude decimalLatitude
   <chr>                            <dbl>           <dbl>
 1 Eolophus roseicapilla             151.           -34.5
 2 Eolophus roseicapilla             149.           -36.4
 3 Eolophus roseicapilla             152.           -30.5
 4 Eolophus roseicapilla             151.           -32.9
 5 Eolophus roseicapilla             149.           -35.4
 6 Eolophus roseicapilla             148.           -34.6
 7 Eolophus roseicapilla             149.           -32.3
 8 Eolophus roseicapilla             149.           -36.4
 9 Eolophus roseicapilla             151.           -33.9
10 Eolophus roseicapilla             151.           -32.9
# … with 799 more rows
# ℹ Use `print(n = ...)` to see more rows
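
Unlike atlas_counts(), atlas_occurrences() downloads records, and galah expects an email address registered with the chosen atlas to be set first. A minimal setup sketch, where the address is a placeholder rather than a real account:

```r
library(galah)

# Placeholder address (an assumption): replace it with an email
# that is registered with the ALA before requesting occurrences
galah_config(email = "your-email@example.com")
```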

galah 1.5.0

Expanding to other Living Atlases

library(gt)
show_all_atlases() |> gt()
atlas           institution                                                              acronym  url
Australia       Atlas of Living Australia                                                ALA      https://www.ala.org.au
Austria         Biodiversitäts-Atlas Österreich                                          BAO      https://biodiversityatlas.at
Brazil          Sistemas de Informações sobre a Biodiversidade Brasileira                SiBBr    https://sibbr.gov.br
Estonia         eElurikkus                                                               NA       https://elurikkus.ee
France          Inventaire National du Patrimoine Naturel                                INPN     https://inpn.mnhn.fr
Guatemala       Sistema Nacional de Información sobre Diversidad Biológica de Guatemala  SNIBgt   https://snib.conap.gob.gt
Portugal        GBIF Portugal                                                            GBIF.pt  https://www.gbif.pt
Spain           GBIF Spain                                                               GBIF.es  https://www.gbif.es
Sweden          Swedish Biodiversity Data Infrastructure                                 SBDI     https://biodiversitydata.se
United Kingdom  National Biodiversity Network                                            NBN      https://nbn.org.uk

Expanding to other Living Atlases

galah_config(atlas = "Spain")

galah_call() |>
  galah_identify("reptilia") |>
  atlas_counts()
# A tibble: 1 × 1
   count
   <int>
1 213909

Expanding to other Living Atlases

galah_config(atlas = "Brazil")

galah_call() |>
  galah_identify("reptilia") |>
  atlas_counts()
# A tibble: 1 × 1
   count
   <int>
1 304368

Expanding to other Living Atlases

galah_config(atlas = "Sweden")

galah_call() |>
  galah_identify("reptilia") |>
  atlas_counts()
# A tibble: 1 × 1
  count
  <int>
1 44864
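
The three queries above differ only in the atlas name, so they could be folded into one helper. The sketch below is one possible way to do that, not code from the package: `reptile_count()` and `count_across()` are hypothetical names introduced here, and the `count_one` argument exists only so the looping logic can be tried with a stand-in function when no network is available.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: count reptile records in one atlas
# (needs the galah package and network access when called)
reptile_count <- function(atlas_name) {
  galah::galah_config(atlas = atlas_name)
  galah::galah_call() |>
    galah::galah_identify("reptilia") |>
    galah::atlas_counts()
}

# Hypothetical helper: run `count_one` for each atlas and
# label each resulting row with the atlas it came from
count_across <- function(atlas_names, count_one = reptile_count) {
  atlas_names |>
    set_names() |>            # name each element after itself
    map(count_one) |>         # one small tibble per atlas
    bind_rows(.id = "atlas")  # stack them, keeping the names
}
```

Calling `count_across(c("Spain", "Brazil", "Sweden"))` would then return one row per atlas. Because galah_config() changes a session-wide setting, the last atlas in the vector stays active afterwards, so it may be worth resetting it with `galah_config(atlas = "Australia")`.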

Expanding to other Living Atlases

library(purrr)
library(tibble)
library(dplyr)

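# One row per supported Living Atlas
atlases <- show_all_atlases()

# Point galah at each atlas in turn and fetch its total record count
counts <- map(atlases$atlas, ~ {
  galah_config(atlas = .x)
  atlas_counts()
  }) |> bind_rows()

# Pair each atlas with its count and show the largest first
tibble(
  atlas = atlases$atlas,
  n = counts$count) |>
  arrange(desc(n)) |>
  gt() |>
  fmt_number(columns = n,
             decimals = 0)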
atlas                     n
United Kingdom  208,656,365
Australia       112,510,239
Sweden          103,417,222
France           87,443,384
Spain            38,295,465
Brazil           23,828,179
Portugal         16,043,865
Austria           8,032,930
Estonia           7,002,639
Guatemala         3,586,634

Downloading biodiversity data is tidier than ever

  • galah makes downloading data like wrangling data with dplyr

  • Package architecture is flexible for other biodiversity databases

  • Lots of documentation, plus how-to articles on ALA Labs




Thank you


Dax Kellie
Data Analyst | Science & Decision Support | ALA
e: dax.kellie@csiro.au
t: @daxkellie
gh: @daxkellie

galah development team
Martin Westgate
Matilda Stevenson
Shandiya Balasubramaniam
Peggy Newman


These slides were made using Quarto & RStudio