
Nodes with single value in multivalued field triggering validation errors #354

Open
kevinschaper opened this issue Nov 2, 2021 · 11 comments

@kevinschaper
Collaborator

I noticed that I was getting a lot of these validation errors:

[ERROR][INVALID_NODE_PROPERTY_VALUE_TYPE] FB:FBrf0013765 - Multi-valued node property 'xref' expected to be of type '<class 'list'>'
[ERROR][INVALID_NODE_PROPERTY_VALUE_TYPE] FB:FBrf0013765 - Multi-valued node property 'authors' expected to be of type '<class 'list'>'
[ERROR][INVALID_NODE_PROPERTY_VALUE_TYPE] FB:FBrf0000035 - Multi-valued node property 'xref' expected to be of type '<class 'list'>'

When I looked at the file, I found that I had single values in multi-valued fields (these were publication nodes, so the fields were xref and authors). For example, one row, laid out here as column: value pairs:

id:            …
category:      biolink:Publication|biolink:NamedThing
name:          Centrosome inheritance in the male germ line of Drosophila requires hu-li tai-shao function.
xref:          FB:FBrf0187243
provided_by:
authors:       P.G. Wilson
creation_date: 2005
keywords:
mesh_terms:
summary:       ....abstract text...
type:          IAO:0000013

It looks like FB:FBrf0187243 and P.G. Wilson aren't recognized as lists simply because they only have a single value. Am I right that the format for a multivalued field doesn't use any brackets to denote a list, just the pipe separator?

I'm not sure what the right approach to fixing this is, but my guess is that the TSV reader should be biolink-aware enough to wrap single values in a list if the field is multivalued? Something roughly like the sketch below.
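
A minimal sketch of the kind of coercion I have in mind, assuming a hard-coded set of multivalued columns and the "|" delimiter (both are placeholders here, not KGX's actual internals):

```python
# Hypothetical helper, not KGX code: the column set and delimiter are assumptions.
MULTIVALUED_COLUMNS = {"category", "xref", "authors", "keywords", "mesh_terms"}
LIST_DELIMITER = "|"

def coerce_multivalued(row: dict) -> dict:
    """Turn pipe-delimited strings (or lone values) into lists for multivalued columns."""
    for column, value in row.items():
        if column in MULTIVALUED_COLUMNS and isinstance(value, str):
            # "a|b|c" -> ["a", "b", "c"]; a lone value "a" -> ["a"]
            row[column] = value.split(LIST_DELIMITER) if value else []
    return row

# e.g. coerce_multivalued({"id": "FB:FBrf0187243", "authors": "P.G. Wilson"})
#      -> {"id": "FB:FBrf0187243", "authors": ["P.G. Wilson"]}
```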

@kevinschaper kevinschaper added the bug Something isn't working label Nov 2, 2021
@deepakunni3
Member

@kevinschaper This is definitely a bug in KGX. It should be aware that xref, authors (and other fields) are multi-valued and parse them as such.

The fix should be straightforward. I'll have a PR for this soon. I think the way the multivalued properties are parsed is a little brittle (my bad!). This can be a good opportunity to tidy a few things up :)
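
Roughly, the idea is to ask the Biolink Model which properties are multivalued rather than hard-coding them. A hedged sketch of that idea using the Biolink Model Toolkit (this is not the actual KGX code, and the helper names are placeholders):

```python
# Sketch only, not the KGX implementation; assumes bmt slot elements expose a
# `multivalued` flag (treat that as an assumption to verify).
from bmt import Toolkit

toolkit = Toolkit()

def is_multivalued(slot_name: str) -> bool:
    """Ask the Biolink Model whether a node/edge property is multivalued."""
    element = toolkit.get_element(slot_name)
    return bool(element is not None and getattr(element, "multivalued", False))

def normalize_value(slot_name: str, value):
    """Split pipe-delimited strings, and wrap lone values, for multivalued slots."""
    if is_multivalued(slot_name) and isinstance(value, str):
        return value.split("|") if value else []
    return value
```

Whatever the final shape, doing this in one place in the TSV reader would keep the multivalued handling consistent across node and edge properties.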

@sierra-moxon
Member

@kevinschaper - is this impacting the pydantic validation you're doing downstream?

@kevinschaper
Collaborator Author

Whoops, I should have replied a year ago. I was just looking at whether it would be practical to put kgx validate back into my pipeline and I noticed this problem again.

I'm not sure if I made a separate issue for it at the time, but the biggest challenge is definitely the number of repeats for each type of error. It would be much more useful if it reported only the first N occurrences (5, 10, 30?) of each error rather than an exhaustive list of every record that failed.

@RichardBruskiewich
Collaborator

> I'm not sure if I made a separate issue for it at the time, but the biggest challenge is definitely the number of repeats for each type of error. It would be much more useful if it reported only the first N occurrences (5, 10, 30?) of each error rather than an exhaustive list of every record that failed.

Hmm... didn't we fix the KGX reporting some time ago, to be less verbose?

@goodb

goodb commented Nov 2, 2022

Hi folks. I've been playing with kgx a bit and ran into the same problem. I added the snippet below to my validation test to make the output a little more tractable, and it seems to work for me in a validate/fix/validate dev cycle.

```python
# n_error_examples caps how many offending records are printed per error type
n_error_examples = 10

validator.validate(the_graph)
if len(validator.errors) > 0:
    for e in validator.errors:
        error_dict = validator.errors[e]
        for k in error_dict.keys():
            print("root error type:", k, "specific errors:", list(error_dict[k].keys()))
            for ek in error_dict[k].keys():
                print(ek, "element count", len(error_dict[k][ek]))
                # print only the first n_error_examples offending elements
                for n, element in enumerate(error_dict[k][ek], start=1):
                    print(element)
                    if n >= n_error_examples:
                        break
        print("error")  # separator between top-level error groups
```

@RichardBruskiewich
Collaborator

Resolved by #415 (it will be in master once the PR is processed; not sure when the patch will show up in an official release...)

@goodb

goodb commented Nov 2, 2022

@RichardBruskiewich the problem with reporting very large numbers of errors isn't limited to the case here. I suppose that is a separate issue, more related to the interface with the validation system.

@RichardBruskiewich
Collaborator

Hi @goodb,
As I noted earlier in this issue, I seem to recall that we reworked the error reporting some time ago to be somewhat more concise and possibly hierarchical. That said, I haven't looked at that part of the code base in well over a year.

I'll take another look over the next few days to confirm the status of things. If the validation reporting is still problematic, we may want to open a separate ticket.

I'm also going to compare this with our latest Translator validation code (the reasoner-validator package on pypi.org), which validates Translator web service data (including knowledge graphs), to see whether there is enough overlap in the validation processes to apply a DRY upgrade to the KGX validation.

Cheers
Richard

@kevinschaper kevinschaper reopened this Mar 16, 2023
@kevinschaper
Collaborator Author

It looks like #415 never made it in. I went looking for this issue because I just hit the error again with kgx 1.7.2. I wasn't able to run 2.0 from the CLI, but I'll try running it as a module.

@RichardBruskiewich
Collaborator

Hi @kevinschaper, I've just pinged Sierra to inquire about this since I'm not that active on the KGX front right now.

@sierra-moxon
Member

sierra-moxon commented Mar 21, 2023

I went ahead and fixed up the changes in #415 to satisfy the tests (see #439); I don't think they solve this issue entirely, so I'm keeping it open pending further changes.
