Data Frame validation megaissue #57

Open
multimeric opened this issue Mar 5, 2021 · 3 comments

multimeric (Owner) commented Mar 5, 2021

There is a general need for validations with a scope wider than just a single series. This includes DataFrame level validations, as well as multiple dependent Series validations, such as "ensure each row is distinct, using two columns".

I am aware of this need, and am (very slowly) working on this feature in this branch: https://github.com/TMiguelT/PandasSchema/tree/bitwise. However, progress has been slow, as I don't have a lot of time to devote to this project.

I have made this issue so that I can close the duplicate issues with slightly different requests that ultimately come down to this.
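
For readers landing here, the "ensure each row is distinct, using two columns" case mentioned above can be approximated in plain pandas while the feature is pending. This is only a sketch under stated assumptions: the sample data and the column names `order` and `line` are hypothetical, and it uses `DataFrame.duplicated` directly rather than any PandasSchema API.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "order": ["A1", "A1", "A2", "A2"],
    "line": [1, 2, 1, 1],
})

# keep=False flags every row in a duplicated group, not only the repeats,
# so all offending rows can be reported together.
duplicate_mask = df.duplicated(subset=["order", "line"], keep=False)
offending_rows = df[duplicate_mask]

print(offending_rows)
```

The mask is a boolean Series aligned with the frame's index, so it can be turned into whatever error report is needed.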

@praveentiru

I needed composite key validation: checking that all rows are unique when two columns are combined. I wrote a custom validation to address this. Its constructor looks like this:
CompositeDistinctValidation(sibling=source['Sales Order Line Number'])

Here, I pass the other column's series as input. If this signature is OK, I can share the code. Otherwise, let me know if you have other thoughts. I can work on a PR for this.
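
The commenter's code is not shown in the thread, so the following is only a guess at what a sibling-based check of this shape might do. The class name `CompositeDistinctCheck`, its `validate` method, and the `Sales Order Number` column are hypothetical; the sketch is plain pandas and does not subclass anything from PandasSchema.

```python
import pandas as pd


class CompositeDistinctCheck:
    """Hypothetical helper: flags rows whose (series, sibling) pair repeats."""

    def __init__(self, sibling: pd.Series):
        # The second column of the composite key.
        self.sibling = sibling

    def validate(self, series: pd.Series) -> pd.Series:
        # Pair the two columns by index label, then mark a row as valid
        # (True) only if its pair occurs exactly once.
        pairs = pd.concat(
            [series.rename("primary"), self.sibling.rename("sibling")],
            axis=1,
        )
        return ~pairs.duplicated(keep=False)


# Assumed example data; only 'Sales Order Line Number' appears in the thread.
source = pd.DataFrame({
    "Sales Order Number": ["SO1", "SO1", "SO2", "SO2"],
    "Sales Order Line Number": [10, 20, 10, 10],
})

check = CompositeDistinctCheck(sibling=source["Sales Order Line Number"])
valid = check.validate(source["Sales Order Number"])
print(valid)
```

Passing the sibling as a Series keeps the check attached to a single column, at the cost of having to bind the data before the check is built.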

@vovavili

Well, the absence of this feature makes me sad. While it is unavailable, does anyone have an idea of how to validate a value in a pandas dataframe based on the value in another field of that same row? Is biting the bullet and using the painfully slow df.iterrows the only way to do this?
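
One possible answer to the question above, outside of PandasSchema: many row-wise rules that depend on a second column can be written as a vectorized boolean mask instead of a `df.iterrows` loop. The rule and the column names below (`status`, `tracking_id`) are made up for illustration and are not taken from any linked issue.

```python
import pandas as pd

# Hypothetical data: a shipped row is expected to carry a tracking id.
df = pd.DataFrame({
    "status": ["shipped", "pending", "shipped"],
    "tracking_id": ["T-100", None, None],
})

# Both comparisons run column-wise; '&' combines them element by element,
# so no Python-level loop over rows is needed.
violations = (df["status"] == "shipped") & df["tracking_id"].isna()

# Index labels of the rows that break the rule.
bad_rows = df.index[violations].tolist()
print(bad_rows)
```

Rules that cannot be expressed with column operations may still need `DataFrame.apply` with `axis=1`, which is slower but avoids hand-written iteration.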

Maybe we can set up some sort of collective bounty system to get this megaissue going? I'd be willing to shell out 10 or 15 euro personally.

vovavili commented Dec 19, 2021

> I had a need for composite key validation where I had to validate that all rows are unique when two columns are combined. I created a custom validation to address this. The constructor for validation is as below:
> CompositeDistinctValidation(sibling=source['Sales Order Line Number'])
>
> Here, I am providing the other column series as input. If this signature is ok, I can provide the same code. Else, let me know if you have any other thoughts. I can work on a PR for same.

I am a bit confused, sorry. What does your source dataframe look like? What is the output of this custom validator? How would you use this code to, say, resolve the issue outlined in the example from #55?
