-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function to retrieve taxonomy version and tests to verify version #132
Conversation
Thanks! Most of my current comments are centered around simplifying this a bit. I don't think we want to re-sort the taxonomy, either in All functions/code for sorting/comparison would probably best just go in the R file in data-raw/, I don't see a need to create a new package function in zzz.R since it is unlikely to be used anywhere else. In other words, I think this can/should be done without changing anything in I'm not clear on how the badge would get produced for this, and feel like it is probably not necessary. The taxonomy update cycle is quite predictable and seasonal. This probably only needs to check between about the beginning of August and the end of November or so. For naming the issue, I'd recommend appending a datestamp so that it doesn't get confused next year when there's already an issue by the same name. @sebpardo might have other comments. I'd be happy to give this a test run next but I think it would be nice to boil it down a bit as suggested above. |
I agree that we want to avoid sorting inside A potentially simpler way of checking for version updates is to create a function that uses eBird's Taxonomic Version API. This would require for our Thoughts? |
Awesome, thanks for this feedback @slager and @sebpardo! It looks like the eBird's Taxonomic Version API is definitely the way to go here. Should definitely simplify things and make it easier to maintain. I'll make some changes! |
Tax update action
Round 2. This is much better.
|
I need a little more time to actually review this but on first glance it's looking great @jrdnbradford |
No rush here! I found this too late to use it myself, but it looks like you can test GitHub actions locally: https://github.com/nektos/act. Seems much better than committing and pushing tests of the action to the repo, if you'd like to test out the action. |
It occurred to me that the GitHub Action here is actually redundant. I already added a test that checks whether the tax version is up to date. Instead of having this GitHub Action to maintain when it's only relevant for a small portion of the year, maybe it's better to add a on:
push:
branches: [main, master]
pull_request:
branches: [main, master]
schedule:
- cron: '0 0 * 8-11 1' # Midnight on Mondays from August through November When the version updates, the version check test will fail and the repo owners will get a workflow failure email to check out. Of course it won't be as explicit as the new GitHub Action I created, you'll need to open up the workflow logs to see it's actually an update issue. But maybe this explicitness is worth giving up so that you don't have the maintain this action. Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, had a few review comments.
I will defer to @sebpardo here. I don't have strong preferences on this. As an eBird reviewer I do receive multiple emails every year letting me know the status of each taxonomic update as they happen. |
Looks great. Regarding your suggested options, I think either of these two would be good:
The first would remove redundancy and would make it so that the tax-version-check bits would only run when tests fail, which is all we need. I personally like having an issue created when the taxonomy is out of date, but I would also be fine with the last option given its simplicity and the tests failing is enough of an indicator to fix things. @slager, what are your thoughts on having R-CMD-check run on a regular interval, perhaps every 2 weeks? @jrdnbradford, I really have no preference so I'd be happy with whichever option you choose to pursue. |
It seems fine to schedule the modified R CMD check action to do the taxonomy check! The existing R CMD check GHA workflow was very boilerplate from usethis, so in the future if something breaks with this, it should be pretty easy to get the R CMD check action back to the basic functionality we have now. As an aside, speaking of automated GitHub Actions, another thing we had talked about doing at one point is scheduling one to run at pretty long intervals that would do the R CMD check tests, but with @sebpardo 's API key and without using VCR. If we tightened up the tests a bit and implemented that, this could catch changes to the eBird API, changes to which don't seem very well documented these days. |
@slager and @sebpardo awesome, thanks for your continued help and discussion on this. Based on the discussion here I will opt for adding a schedule to R-CMD-check, with no issue created. The main arguments here being:
I'll set this up and comment when ready. |
This should be set now. If you have other preferences for R-CMD-check's cron schedule please edit. |
Closes #131.
Description
This adds a GitHub Action Workflow to check whether the eBird taxonomy has updated and opens an issue if it has.
.github/workflows/taxonomy-check.yaml
: a workflow file with two jobs. The first runs an R script that checks whether thetax
dataset is the same as that returned fromebirdtaxonomy()
. If they aren't identical the second job runs. This job checks whether there is already an issue with a particular title (env.ISSUE_TITLE
) and opens one if it doesn't exist. This keeps the workflow from opening duplicate issues if no one can get to the first in time.data-raw/tax_check.R
: a script run by the taxonomy check that checks if the current dataset is identical with that retrieved from eBirdR/ebirdtaxonomy.R
: I updated this to sortcomNameCodes
andsciNameCodes
, which are both characters that are comma-separated and multi-valued. In testing I found the API often returns these codes in different orders within the column (how fun!). By sorting, we ensure easier comparability between the package dataset and the most recent from eBirdR/zzz.R
: added asort_comma_separated
function to sort the above codesdata-raw/tax.R
: updated the comments. Looks likeuse_data()
used to be called withinternal = TRUE
and the old location was left overdata/tax.rda
: the same ole taxonomy we know and love, but withcomNameCodes
andsciNameCodes
sorted for easier comparison.Let me know what I can do to make this better. 🐦