New command-line dh-validator.py tool for validating csv, tsv, xls, xlsx data files against a schema.yaml file #450
ddooley announced in Announcements
A new command-line dh-validate.py script simplifies the validation of DataHarmonizer-generated csv, tsv, xls, and xlsx files. We look forward to your feedback on using it below.
The linkml-validate command works well for data in .json or .yaml format, but tabular csv, tsv, xls, and xlsx inputs often don't validate well, for two main reasons. dh-validator.py resolves both by generating a temporary .yaml version of the tabular input, with the necessary adjustments made according to the given schema, and then sends that file to linkml-validate for processing. The following adjustments are made:
We will be evolving this script to report any mismatched columns/fields, which will make it easier, for example, to validate older tabular data against a newer LinkML schema version.
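For readers who want a feel for the workflow described above (tabular input converted to a temporary .yaml file, which is then handed to linkml-validate), here is a minimal sketch. It is not the dh-validator.py source: the function names and the `Sample` target class are hypothetical, the schema-driven adjustments are not reproduced, and the temporary file is written with `json.dump` only because JSON is a subset of YAML 1.2 and this keeps the sketch free of third-party dependencies. The sketch only builds the linkml-validate command line; it does not run it.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def rows_from_tabular(text, delimiter=","):
    """Parse delimited text (csv or tsv) into a list of row dicts.

    Values are kept as strings; the real script's schema-driven
    adjustments are NOT reproduced here.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def write_temp_yaml(rows):
    """Write rows to a temporary .yaml file and return its path.

    Assumption: JSON output is used because it is valid YAML 1.2,
    which avoids requiring a YAML library in this sketch.
    """
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".yaml", delete=False, encoding="utf-8"
    )
    with handle:
        json.dump(rows, handle, indent=2)
    return Path(handle.name)


def build_validate_command(schema_path, data_path, target_class):
    """Build (but do not run) a linkml-validate command line.

    target_class is a hypothetical class name from the schema.
    """
    return [
        "linkml-validate",
        "--schema", str(schema_path),
        "--target-class", target_class,
        str(data_path),
    ]
```

To try it, pass the returned list to `subprocess.run(...)` on a machine where LinkML is installed, for example `build_validate_command("schema.yaml", write_temp_yaml(rows), "Sample")`, and delete the temporary file afterwards.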