Add function to retrieve taxonomy version and tests to verify version #132

jrdnbradford · 2024-07-11T23:47:24Z

Closes #131.

Description

This adds a GitHub Action Workflow to check whether the eBird taxonomy has updated and opens an issue if it has.

.github/workflows/taxonomy-check.yaml: a workflow file with two jobs. The first runs an R script that checks whether the tax dataset is the same as that returned from ebirdtaxonomy(). If they aren't identical the second job runs. This job checks whether there is already an issue with a particular title (env.ISSUE_TITLE) and opens one if it doesn't exist. This keeps the workflow from opening duplicate issues if no one can get to the first in time.
data-raw/tax_check.R: a script run by the taxonomy check that checks if the current dataset is identical with that retrieved from eBird
R/ebirdtaxonomy.R: I updated this to sort comNameCodes and sciNameCodes, which are both characters that are comma-separated and multi-valued. In testing I found the API often returns these codes in different orders within the column (how fun!). By sorting, we ensure easier comparability between the package dataset and the most recent from eBird
R/zzz.R: added a sort_comma_separated function to sort the above codes
data-raw/tax.R: updated the comments. Looks like use_data() used to be called with internal = TRUE and the old location was left over
data/tax.rda: the same ole taxonomy we know and love, but with comNameCodes and sciNameCodes sorted for easier comparison.

Let me know what I can do to make this better. 🐦

Add GHA to check taxonomy

slager · 2024-07-12T02:32:52Z

Thanks! Most of my current comments are centered around simplifying this a bit.

I don't think we want to re-sort the taxonomy, either in ebirdtaxonomy() or in the rda file. It should currently be returned in taxonomic order, which is the desired order. If there are issues, I'd recommend reordering only at the point of comparison.

All functions/code for sorting/comparison would probably best just go in the R file in data-raw/, I don't see a need to create a new package function in zzz.R since it is unlikely to be used anywhere else.

In other words, I think this can/should be done without changing anything in R/, and without changing the current tax.rda.

I'm not clear on how the badge would get produced for this, and feel like it is probably not necessary.

The taxonomy update cycle is quite predictable and seasonal. This probably only needs to check between about the beginning of August and the end of November or so.

For naming the issue, I'd recommend appending a datestamp so that it doesn't get confused next year when there's already an issue by the same name.

@sebpardo might have other comments. I'd be happy to give this a test run next but I think it would be nice to boil it down a bit as suggested above.

sebpardo · 2024-07-12T16:35:58Z

I agree that we want to avoid sorting inside ebirdtaxonomy() and especially individual columns and reassigning them into the original data frame as this is ripe for causing issues down the line. If at all, we should sort the whole data frame at once.

A potentially simpler way of checking for version updates is to create a function that uses eBird's Taxonomic Version API. This would require for our tax data frame to have a version attribute set when it's created that could be checked against the TRUE/FALSE value in the "latest" field of that API request. Checking this way, we avoid having to do any taxonomic list comparisons and we'd also have a function for another API request for which there's none at the moment. Getting two birds stoned at once!

Thoughts?

jrdnbradford · 2024-07-12T16:56:01Z

Awesome, thanks for this feedback @slager and @sebpardo! It looks like the eBird's Taxonomic Version API is definitely the way to go here. Should definitely simplify things and make it easier to maintain. I'll make some changes!

Tax update action

jrdnbradford · 2024-07-13T13:30:13Z

Round 2. This is much better.

R/ebirdtaxonomy.R - now adds a version attribute to the data frame
data/tax.rda - now includes the version attribute
R/ebirdtaxonomyversion.R - new function that retrieves taxonomic version data
tests/ - expanded to test the new function, that the tax dataset version attribute can be set, and whether tax version is the latest
.github/workflows/taxonomy-version-check.yaml - now runs a single test file that verifies the version is the latest. If not it opens an issue. To prevent multiple, essentially identical issues from being opened, it checks if there is already an open issue with the right title and opens one if not.

slager · 2024-07-15T15:14:18Z

I need a little more time to actually review this but on first glance it's looking great @jrdnbradford

jrdnbradford · 2024-07-18T13:41:30Z

No rush here! I found this too late to use it myself, but it looks like you can test GitHub actions locally: https://github.com/nektos/act. Seems much better than committing and pushing tests of the action to the repo, if you'd like to test out the action.

jrdnbradford · 2024-07-20T00:52:57Z

It occurred to me that the GitHub Action here is actually redundant. I already added a test that checks whether the tax version is up to date. Instead of having this GitHub Action to maintain when it's only relevant for a small portion of the year, maybe it's better to add a schedule trigger to the R-CMD-check:

on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
  schedule:
    - cron: '0 0 * 8-11 1' # Midnight on Mondays from August through November

When the version updates, the version check test will fail and the repo owners will get a workflow failure email to check out. Of course it won't be as explicit as the new GitHub Action I created, you'll need to open up the workflow logs to see it's actually an update issue. But maybe this explicitness is worth giving up so that you don't have the maintain this action.

Thoughts?

slager

This is looking good, had a few review comments.

.github/workflows/taxonomy-version-check.yaml

R/ebirdtaxonomyversion.R

tests/testthat/test-data.R

Tax update action

slager · 2024-08-06T15:44:38Z

I will defer to @sebpardo here.

I don't have strong preferences on this. As an eBird reviewer I do receive multiple emails every year letting me know the status of each taxonomic update as they happen.

sebpardo · 2024-08-06T23:43:42Z

Looks great. Regarding your suggested options, I think either of these two would be good:

throw in the unique bits of the tax version check into the R-CMD-check and run those steps only if the test step fails

remove the tax version check and add a schedule to R-CMD-check, with no issue created

The first would remove redundancy and would make it so that the tax-version-check bits would only run when tests fail, which is all we need. I personally like having an issue created when the taxonomy is out of date, but I would also be fine with the last option given its simplicity and the tests failing is enough of an indicator to fix things.

@slager, what are your thoughts on having R-CMD-check run on a regular interval, perhaps every 2 weeks?

@jrdnbradford, I really have no preference so I'd be happy with whichever option you choose to pursue.

slager · 2024-08-09T16:10:39Z

It seems fine to schedule the modified R CMD check action to do the taxonomy check! The existing R CMD check GHA workflow was very boilerplate from usethis, so in the future if something breaks with this, it should be pretty easy to get the R CMD check action back to the basic functionality we have now.

As an aside, speaking of automated GitHub Actions, another thing we had talked about doing at one point is scheduling one to run at pretty long intervals that would do the R CMD check tests, but with @sebpardo 's API key and without using VCR. If we tightened up the tests a bit and implemented that, this could catch changes to the eBird API, changes to which don't seem very well documented these days.

jrdnbradford · 2024-08-12T13:30:38Z

@slager and @sebpardo awesome, thanks for your continued help and discussion on this. Based on the discussion here I will opt for adding a schedule to R-CMD-check, with no issue created. The main arguments here being:

simplicity
there are already taxonomy update notifications sent to you all as eBird reviewers
running R-CMD-check on a regular basis will catch other potential issues with {rebird} and the eBird API earlier
running on a schedule seems to fit better with future testing aspirations for the package

I'll set this up and comment when ready.

Final updates

jrdnbradford · 2024-08-12T14:16:47Z

This should be set now. If you have other preferences for R-CMD-check's cron schedule please edit.

jrdnbradford added 5 commits July 11, 2024 16:37

Initial action

1a2027f

Add tax with sorted NameCodes

c6f005e

Add badge to README

876675f

Finalize GHA on

e02f688

Merge pull request #10 from jrdnbradford/tax-update-action

84dd1a6

Add GHA to check taxonomy

jrdnbradford added 16 commits July 12, 2024 21:29

Update from feedback

cdb6bb4

Test version check

45ebea3

Remove output

d6c3a4a

Fix if

b63e9ee

Add continue-on-error

0c80666

Load package from source

2e9b74d

Fix needs

8c35415

Test failure

eb2987e

Fix check-and-create-issue-job if

d8c61b2

Debugging if

3b49dc5

Re-add

8e2afdc

Test passed test

38df744

Update docs

1ab9b5c

Update test vars

be13030

Update cron comment

5e0b663

Merge pull request #12 from jrdnbradford/tax-update-action

b940ba8

Tax update action

Fix documentation typo

8b12cfe

jrdnbradford changed the title ~~Add GitHub Action to check taxonomy~~ Add function to retrieve taxonomy version and GitHub Action to verify version Jul 13, 2024

jrdnbradford added 2 commits July 13, 2024 14:06

Update README

c5d9609

Update README

e482fa8

slager self-requested a review July 27, 2024 16:34

slager reviewed Jul 27, 2024

View reviewed changes

.github/workflows/taxonomy-version-check.yaml Outdated Show resolved Hide resolved

R/ebirdtaxonomyversion.R Outdated Show resolved Hide resolved

tests/testthat/test-data.R Outdated Show resolved Hide resolved

jrdnbradford added 6 commits July 28, 2024 15:10

Update tax

ddb62ec

Test workflow_run

029645b

Merge branch 'master' into tax-update-action

21e6b31

Merge pull request #14 from jrdnbradford/tax-update-action

56236ef

Tax update action

Add failed test

14cfeb3

Test reusable workflow

ff8a43c

jrdnbradford added 3 commits August 12, 2024 09:47

Use scheduled R-CMD-check

a6cf1ef

Bump version and update changelog

dfc9212

Merge pull request #15 from jrdnbradford/tax-update-action

4ca3ca5

Final updates

sebpardo merged commit 2e74bdf into ropensci:master Aug 12, 2024
6 checks passed

jrdnbradford changed the title ~~Add function to retrieve taxonomy version and GitHub Action to verify version~~ Add function to retrieve taxonomy version and tests to verify version Aug 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add function to retrieve taxonomy version and tests to verify version #132

Add function to retrieve taxonomy version and tests to verify version #132

jrdnbradford commented Jul 11, 2024

slager commented Jul 12, 2024

sebpardo commented Jul 12, 2024 •

edited

Loading

jrdnbradford commented Jul 12, 2024

jrdnbradford commented Jul 13, 2024

slager commented Jul 15, 2024

jrdnbradford commented Jul 18, 2024 •

edited

Loading

jrdnbradford commented Jul 20, 2024

slager left a comment

slager commented Aug 6, 2024

sebpardo commented Aug 6, 2024 •

edited

Loading

slager commented Aug 9, 2024

jrdnbradford commented Aug 12, 2024 •

edited

Loading

jrdnbradford commented Aug 12, 2024

Add function to retrieve taxonomy version and tests to verify version #132

Add function to retrieve taxonomy version and tests to verify version #132

Conversation

jrdnbradford commented Jul 11, 2024

Description

slager commented Jul 12, 2024

sebpardo commented Jul 12, 2024 • edited Loading

jrdnbradford commented Jul 12, 2024

jrdnbradford commented Jul 13, 2024

slager commented Jul 15, 2024

jrdnbradford commented Jul 18, 2024 • edited Loading

jrdnbradford commented Jul 20, 2024

slager left a comment

Choose a reason for hiding this comment

slager commented Aug 6, 2024

sebpardo commented Aug 6, 2024 • edited Loading

slager commented Aug 9, 2024

jrdnbradford commented Aug 12, 2024 • edited Loading

jrdnbradford commented Aug 12, 2024

sebpardo commented Jul 12, 2024 •

edited

Loading

jrdnbradford commented Jul 18, 2024 •

edited

Loading

sebpardo commented Aug 6, 2024 •

edited

Loading

jrdnbradford commented Aug 12, 2024 •

edited

Loading