Nodes with single value in multivalued field triggering validation errors #354
Comments
@kevinschaper This is definitely a bug in KGX. It should be aware that The fix should be straightforward. I'll have a PR for this soon. I think the way the multivalued properties are parsed is a little brittle (my bad!). This can be a good opportunity to tidy a few things up :)
@kevinschaper - is this impacting the pydantic validation you're doing downstream?
Whoops, I should have replied a year ago. I was just looking at whether it would be practical to put kgx validate back into my pipeline and I noticed this problem again. I'm not sure if I made a different issue for it at the time, but the biggest challenge is definitely the number of repeats for each type of error. It would definitely be more useful if it just reported the first N occurrences (5, 10, 30?) of each error, rather than an exhaustive list of each record that failed.
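To make the "first N occurrences" idea concrete, here is a minimal sketch. It is not KGX's reporting code: the function name `summarize_errors` and the `(error_type, message)` input shape are assumptions made purely for illustration.

```python
from collections import defaultdict


def summarize_errors(errors, max_per_type=5):
    """Keep only the first `max_per_type` messages for each error type.

    `errors` is an iterable of (error_type, message) pairs. This input
    shape is hypothetical and is not what KGX emits.
    """
    shown = defaultdict(list)   # first few messages per error type
    counts = defaultdict(int)   # total occurrences per error type
    for error_type, message in errors:
        counts[error_type] += 1
        if len(shown[error_type]) < max_per_type:
            shown[error_type].append(message)

    lines = []
    for error_type, messages in shown.items():
        lines.extend(f"[{error_type}] {m}" for m in messages)
        hidden = counts[error_type] - len(messages)
        if hidden > 0:
            # One summary line replaces the remaining repeats.
            lines.append(f"[{error_type}] ... and {hidden} more")
    return lines
```

A cap like this keeps every distinct error type visible while collapsing the repeats into a single count, which is usually enough for a validate/fix/validate cycle.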
Hmm... didn't we fix the KGX reporting some time ago, to be less verbose?
Hi folks. I've been playing with kgx a bit and faced the same problem. I introduced this in my validate test to make the output a little more tractable. Seems to work for me in a validate/fix/validate dev cycle.
Resolved by #415 (will be in master once the PR is processed; not sure when the patch will show up in an official release).
@RichardBruskiewich the problem with reporting very large numbers of errors isn't limited to the case here. I suppose that is a separate issue, more related to the interface with the validation system.
Hi @goodb, That said, I'll take another look at it over the next few days to confirm the status of things. If the validation reporting is still problematic, we might wish to open a separate ticket. I'm also going to compare our latest Translator validation code (in the pypi.org reasoner-validator package), which generally validates Translator web service data (including knowledge graphs), to check whether there is suitable overlap in validation processes to apply a DRY upgrade to the KGX validation. Cheers
It looks like #415 never made it in. I went looking for this issue because I just got the error with kgx 1.7.2. I wasn't able to run 2.0 in the CLI, but I'll try running it as a module.
Hi @kevinschaper, I've just pinged Sierra to inquire about this since I'm not that active on the KGX front right now.
I noticed that I was getting a lot of these validation errors:
When I looked at the file, I found that I had single values in multi-value fields (these were publication nodes, so it was xref and author). For example:
It looks like FB:FBrf0187243 and P.G. Wilson aren't recognized as lists simply because they only have a single value. Am I right that the format for a multivalued field doesn't include any sort of brackets to denote a list, just the pipe separator?
I'm not sure what the right approach to fix it is, but my guess is that the TSV reader should be Biolink-aware enough to put single values into a list if it's a multivalued field.
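A minimal sketch of that guess, not KGX's actual reader: the `MULTIVALUED_FIELDS` set is hard-coded here as an assumption (a Biolink-aware reader would presumably derive it from the model), and `parse_row` is a hypothetical helper name.

```python
# Hypothetical, hard-coded set of multivalued field names. A Biolink-aware
# reader would presumably look these up in the model instead.
MULTIVALUED_FIELDS = {"xref", "author"}


def parse_row(row, multivalued=MULTIVALUED_FIELDS, delimiter="|"):
    """Convert a TSV row (dict of strings) so multivalued fields are always lists.

    A field listed in `multivalued` is split on `delimiter` even when it
    holds a single value, so "P.G. Wilson" becomes ["P.G. Wilson"].
    """
    parsed = {}
    for key, value in row.items():
        if key in multivalued:
            # Empty strings yield an empty list rather than [""].
            parsed[key] = [v for v in value.split(delimiter) if v]
        else:
            parsed[key] = value
    return parsed


# The xref and author values come from the report above.
node = parse_row({"xref": "FB:FBrf0187243", "author": "P.G. Wilson"})
```

Deciding list-ness from the field name rather than from the presence of a pipe means a single value and several values take the same path, which is what the validator appears to expect.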