Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checking numbers in range and text at the same time #28

Open
cnarvaa opened this issue Jan 31, 2020 · 8 comments
Open

checking numbers in range and text at the same time #28

cnarvaa opened this issue Jan 31, 2020 · 8 comments

Comments

@cnarvaa
Copy link

cnarvaa commented Jan 31, 2020

Hey, I'm very thankful for this library, it makes lots of things easier for us 😄 .
Well, at this moment we have to validate that a cell contains "a, -, 4.5". And, if the cell has a number, that number must be between 0 and 10.

The problem is that, since we have some chars there, we can not use the range InRangeValidation. The error says Unable to parse string "-" at position 4.
I was able to fix this using

def validate_score_number(number):
    valid_number = False
    try:
        valid_number = 0<=float(number)<=MAX_SCORE_VALUE
    except Exception as e:
        valid_number = False
    return valid_number
CustomElementValidation(validate_score_number, "")

It seems like a very straigth forward answer, but it works.
Some advice? should I implement this and publish a PR?

@multimeric
Copy link
Owner

multimeric commented Feb 1, 2020

Hi @cnarvaa. This is mostly a by-product of the fact that I use vectorized operations wherever possible in PandasSchema, meaning that I can't look at each element individually to check if it's a number or not in the same operation. It also means I can't use your exact code because it's not vectorized.

The ideal way I'd want this to work is using an OrValidation, specifically:

InListValidation(['a', '-']) | InRangeValidation(0, MAX_SCORE)

However, it is a problem that the InRangeValidation throws an error here, because that prevents you using an OrValidation. What I (or you, if you have time) need to do is get the InRangeValidation to not fail if it encounters a non-numeric. I think it should use pd.to_numeric(errors='coerce') so that non-numerics are treated as nan when checking if it's in the range, which will always return false.

@cnarvaa
Copy link
Author

cnarvaa commented Feb 3, 2020

I would also like to receive the exact reason why a value doesn't accomplish a rule when I use the boolean operators.
The error I'm getting is this:
(is not in the list of legal options (a, -)) <built-in function or_> (Score value should be between 0 and 10000))
I would like to get, in this case, that the value is not in the legal options.

I was checking the code but I don't get were to change that.

@multimeric
Copy link
Owner

Under the current version, you can probably do this:

v = InListValidation(['a', '-']) | InRangeValidation(0, MAX_SCORE)
v._custom_message = 'Your custom message'

However that's a bit ugly, and there should be a way of setting a custom message as part of the constructor. What I might aim to do is improve the _CombinedValidation constructor (which is the class that's implicitly created when you use the |) so that it takes a message, and you can do this:

_CombinedValidation(InListValidation(['a', '-']),  InRangeValidation(0, MAX_SCORE), operator.or_, message='Your custom message')

Then I'll have to actually add _CombinedValidation to the documentation, though it's currently hidden. I'll look into this once I've finished my large rewrite, but you if desperately need this fix soon then I would accept a PR.

@cnarvaa
Copy link
Author

cnarvaa commented Feb 5, 2020

Well, the main idea is to know the exact reason (condition) why Validation is false. Using a generic message is not correct for us in this moment.

Maybe something in between the moment we get the first false?
Let me check for that constructor.

@multimeric
Copy link
Owner

multimeric commented Feb 5, 2020

Yes, I understand you want a message that explains why the validation failed. But you can specify anything in the custom message:

v._custom_message = 'was not in the range (0, 100) and also was not one of the allowed characters "a" or "-"'

Maybe something in between the moment we get the first false?

Can you explain what you mean by this?

@cnarvaa
Copy link
Author

cnarvaa commented Feb 7, 2020

Hey Miguel,
So, imagine we are using 10 rules for a single column but even if only one rule is broke, all the rules are displayed.

We are getting this message as validation error (is not in the list of legal options (a, -)) <built-in function or_> (Score value should be between 0 and 10000))
Suppose we have a score = 11000. We want the message to say Score value should be between 0 and 10000, Without including all the possible validations.

Using a custom message, like the example you show me, is not pretty clear for the users. It's more clear if they know exactly what they did wrong. We are doing some tricks on our side but I think this could be a good feature.

I will check the _CombinedValidation function and then I can share an Idea.

Edit

I just checked the logic in _CombinedValidation.validate, it will be pretty hard to get the exact values that result in an error.
The operator result is a Set, we can get the error indexes with something like ~operator_results and then get the v_a and v_b results for those indexes. Finally, we can save the necessary indexes and keep going on with new and complicated logic 😢.

Perhaps, even if we can set something like self.detailed_message to show what had failed, It will be very difficult in a case like (A | B) & (C & D) because for each comparison we will need to store some states or messages.

I don't know if you have any idea about this. By now, I'm letting it go!

@multimeric
Copy link
Owner

Ah okay, I see your point. You want the message to only mention validations that actually failed, not the others that passed. I agree with that, for what it's worth. Since it's a bit tricky to do, it's something I'm endeavouring to fix in the next major version, which is currently in the form of this PR: https://github.com/TMiguelT/PandasSchema/pull/29/files.

@multimeric
Copy link
Owner

Update: this is half fixed by #30, but it still returns both validation messages if only one validation fails, so this is still an ongoing issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants