Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error in check_extension_eventids #72

Open
diodon opened this issue Aug 17, 2019 · 4 comments
Open

error in check_extension_eventids #72

diodon opened this issue Aug 17, 2019 · 4 comments

Comments

@diodon
Copy link

diodon commented Aug 17, 2019

the argument field is not used when extracting the matching rows (line 2):
rows <- which(!extension[[field]] %in% event$eventID)

The correct form that actually uses the field argument, should be:
rows <- which(!extension[[field]] %in% event[[field]])

Also, a more informative message could be produced using the function arguments:

Here the full edited function:

check_extension_eventids <- function(event, extension, field = "eventID") {
  rows <- which(!extension[[field]] %in% event[[field]])
  if (length(rows) > 0) {
    return(data.frame(
      field = field,
      level = "error",
      row = rows,
      message = paste0(field, " ", extension[[field]][rows], " has no corresponding ", field, " in ", deparse(substitute(event))),
      stringsAsFactors = FALSE
    ))
  } else {
    return(data_frame())
  }
}
@pieterprovoost
Copy link
Member

@diodon Do you have an example where the check fails? The field in the extension which points to the core can have any name (often just id) while the the event core will always have an eventID field. That's why extension[[field]] is checked against event$eventID.

There is a test here which ensures that the field parameter is taken into account: https://github.com/iobis/obistools/blob/master/tests/testthat/test_check_eventids.R#L52

It may theoretically be possible to point from the extension to another field in the core instead of eventID, but I don't think I have ever seen this and in that case there would need to be two parameters (coreField and extensionField).

@diodon
Copy link
Author

diodon commented Aug 18, 2019

The function could be also useful to check if that occurrenceID in the MOF files matches with the same field in the occurrence file. I tried to do that kind of check and of course it failed as it checks against the hardcoded eventID in the "event" file.

I agree that the function is intended to check the eventID but it could be used also for my purpose with the suggested changes.

@sformel-usgs
Copy link

I encountered a use case for the most recent comment in this issue. occurrenceID will always link the occurrence core and DNA extension, not eventID. I suggest either adding an analogous function for occurrenceID, or a more generalized function, as discussed above.

@sformel-usgs
Copy link

Actually, I guess it would be relatively simple to make the analogous function. @pieterprovoost , if you think this would be useful, I'm happy to submit a PR for this function. I can also add some example content to the README.

function (occurrence, extension, field = "occurrenceID") 
{
  rows <- which(!extension[[field]] %in% occurrence$occurrenceID)
  if (length(rows) > 0) {
    return(data.frame(field = field, level = "error", row = rows, 
      message = paste0(field, " ", extension[[field]][rows], 
        " has no corresponding occurrenceID in the core"), 
      stringsAsFactors = FALSE))
  }
  else {
    return(data_frame())
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants