-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate bare items on construction #102
base: master
Are you sure you want to change the base?
Conversation
Do you see us keeping both Validated and unvalidated enum types, or do you intend to remove the unvalidated ones?
I think changing the enum in any way might count as a breaking change. In any case, as mentioned earlier, we need to make a breaking change to add the Date type anyway, so we're not constrained by that. |
Initially it was to keep the amount of touched LOC low, then I thought it might be nicer to transition over. Forgot at that moment that you either way wanted to introduce a breaking change. One could argue that the "struct wrappers" are not needed for all the enum variants, like Idea: let's rename the current enum members to something like |
Would they still be in the same enum as the validated types? Either way, I'd love to look at some code. The IETF list also has some suggestions about adding https://lists.w3.org/Archives/Public/ietf-http-wg/2023JanMar/0034.html |
Just as a quick update from some work on the weekend: I decided it's not really worth it to keep both the "unsafe" variants and the validated ones at the same time, as it becomes a bit weird to deal with. The parsing would have yielded "unsafe" variants, even though the parser validated that they're correct. So the code change will be a bit bigger, as it changes most of the test invocations from
Edit: The problem here was to expect that we pass parseable inputs. This was a wrong assumption on my end. We need serializable values and therefore use the Once that's done I'll open up this PR for review, with one commit per P.S.: Didn't know about the |
f1475a1
to
51fa5b3
Compare
@valenting I updated the PR description to explain the work that has happened. Let me know if there's anything controversial to it. Specifically I'm interested what you think about the whole I'm happy to be pointed to whatever you think is worth changing, I'm sure there's something I missed 🙂 |
// TODO: must_fail has to always fail on creation | ||
// As not all cases are moved yet, we take a two-step approach here: | ||
// Either creation fails or serialization fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can currently not be tackled, because Dictionary keys have their own validations that only happen on serialization, so for the time being we would have to live with the workaround or try to accommodate for valid keys on construction time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're just testing serialization, I think we can still create invalid bareItems if we simply don't use try_into, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I try to use e.g. the Integer
value struct in build_bare_item
, I get
cannot initialize a tuple struct which contains private fields
Which is the correct way for external usage. Not sure whether we can make the tests "part of the crate module tree" to allow it access to the pub(crate)
value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validating keys can be done similar to this PR with a new Key
value struct that is used with IndexMap
.
I have drafted the change here, but would probably take that in its own PR: zcei/sfv@validate-bare-items...zcei:sfv:validate-keys
One ergonomic advantage, but not sure whether it justifies the implementation overhead, would be to allow .get
with any unvalidated &str
and return None
early if it can't validate. Only for valid keys we would check back the IndexMap whether it contains the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're on the right track here.
My only concern is that all these try_into
seem overly verbose, Also it seems inellegant that some implement into while others are try_into. So I'm wondering if we should have some helper functions on BareItem that abstract away all the differences and easily allow you to create them.
fn new_boolean(value: bool) -> Result<BareItem, &str>
fn new_decimal(value: f64) -> Result<BareItem, &str>
fn new_decimal(value: &str) -> Result<BareItem, &str>
etc.
We also need to resolve the performance issue with the unnecessary serialization when constructing the bare item.
src/bare_item.rs
Outdated
type Error = &'static str; | ||
fn try_from(value: rust_decimal::Decimal) -> Result<Self, Self::Error> { | ||
let mut output = String::new(); | ||
Serializer::serialize_decimal(value, &mut output)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, doing an actual serialization in the try_from implementation seems less than optimal.
That means for every bare item that we construct, we allocate a string we then throw away. And then we do the serialization again when we actually serialize it into a buffer.
I think what we should do here is split the serialization from the validity checks, so try_from for Decimal would only check if the decimal is in the allowed range, and not actually serialize it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not happy with validating by serializing either, and as mentioned in the PR description we could (and should 😅) move validations to a place where both can access it.
To get my idea validated and not change another hundreds of lines, I decided to keep it like this for the moment - but I'm totally supportive of the fact that this needs to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a new ValidateValue
trait that the value structs implement. It will either return a Result
where the Ok
value is either the input (passing back ownership) or a new "sanitized" value (important for Decimal
as we can round to three decimal places already)
// TODO: must_fail has to always fail on creation | ||
// As not all cases are moved yet, we take a two-step approach here: | ||
// Either creation fails or serialization fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're just testing serialization, I think we can still create invalid bareItems if we simply don't use try_into, right?
/// use rust_decimal::Decimal; | ||
/// # fn main() -> Result<(), &'static str> { | ||
/// let decimal_number = Decimal::from_f64(48.01).unwrap(); | ||
/// let bare_item: BareItem = decimal_number.try_into()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to wonder if try_into is the most ergonomic solution.
Whouldn't it be nicer to have let decimal_number = BareItem::decimal_from_f64(48.01)?;
- and similar for the other subtypes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kinda like 3.i. from the issue (just with a better method naming than mine):
We can have static methods on the primitives that allow early validation, like BareItem::try_token("foo") which returns a Result of either the token or the validation error.
The actual token would be "unverified" in the sense that one could still create BareItem::Token("12345").Edit: this wouldn't apply now, as we would keep the value structs that can't be created externally without going through validation.
Happy to explore that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I do think both ways can be supported. TryFrom<f64>
could simply call BareItem::decimal_from_f64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@valenting what's your stance on having the external API only accepting f64
? Do you want to keep the usage of rust_decimal::Decimal
? The spec seems to only allow 12 digits on the integer component and 3 digits on the fractional component. f64 should cover that completely? 🤔
We can consider that for a follow-up change. For now let's keep using
Decimal.
…On Wed, 15 Feb 2023 at 10:51, Stephan Schneider ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/bare_item.rs
<#102 (comment)>:
> + /// ```
+ fn try_from(item: i64) -> Result<Self, Self::Error> {
+ Ok(BareItem::Integer(item.try_into()?))
+ }
+}
+
+impl TryFrom<rust_decimal::Decimal> for BareItem {
+ type Error = &'static str;
+ /// Converts `rust_decimal::Decimal` into `BareItem::Decimal`.
+ /// ```
+ /// # use sfv::{BareItem, FromPrimitive};
+ /// # use std::convert::TryInto;
+ /// use rust_decimal::Decimal;
+ /// # fn main() -> Result<(), &'static str> {
+ /// let decimal_number = Decimal::from_f64(48.01).unwrap();
+ /// let bare_item: BareItem = decimal_number.try_into()?;
@valenting <https://github.com/valenting> what's your stance on having
the external API only accepting f64? Do you want to keep the usage of
rust_decimal::Decimal? The spec seems to only allow 12 digits on the
integer component and 3 digits on the fractional component. f64 should
cover that completely? 🤔
—
Reply to this email directly, view it on GitHub
<#102 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALDEOLEM4LJZORBS5RRIT3WXSRKRANCNFSM6AAAAAAUCRR3FY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Aligns doc comments for each BareItem* value struct
Allowed ways to construct a
|
Closes #98
Hej folks,
as promised in the related issue, I gave a stab at validating the bare items on construction.
For the moment I have introducedBareItem::Validated*
variants adjacent to their unvalidated enum member counterparts. UsingDeref
I was able to quite easily pass them to their respectiveRefBareItem
version and thus could avoid a breaking change if wanted. I think eventually the shorthand version of aBareItem
member should be the safe variant, as shorter names tend to be used more frequently.To demonstrate that constructing thebare_item::*
structs without using(Try)From
is not possible, I've added an unused function in the bench setup. When using from within the same module tree, one can access the private struct members, so I had to do it outside of it.BareItem variant validations
I looked at all the currently existing enum variants and mapped out whether they have validations that run in the parse or serialize steps.
(Right now all have some form of parse validations, to verify integrity of the input strings. I decided later to not focus on the parsing step for the moment, so this is just mentioned for completeness.)
On the serialization side, all but
Boolean
andByteSeq
check for specific conditions on the backing data structure.For example
i64
supports a bigger number range than the spec describes for anInteger
.Token
s have to start with specific characters, etc.Approach
For each variant that needs validation while being serialized, their value struct implements a
TryFrom
which uses theSerializer::serialize_*
method to check for correctness on construction. Is is somewhat wasteful and in a follow up PR validation could be moved out to their respective structs and called upon construction and then be omitted during serialization.The value structs inner value is public within the crate. This means after parsing an input we can create the value structs without going through validation again.
For the two variants that don't need validation and can always be serialized (
Boolean
andByteSeq
) theFrom
trait is implemented. Backing value structs are created to a) keep consistency in the implementation and b) to allow for validations/parsing to be offloaded into the value structs without a breaking change if that is ever desired.Non-goals
As parsing usually happens from a complete structured field value and not their parts, I omitted this for the moment. This means right now you can only construct those value structs from their "native representation" and not from their serialized string value (e.g.
42_i64
can be converted into abare_item::Integer
, but"42"
can not).If ever needed, parsing can be offloaded into the value structs and
bare_item::Integer::parse("42")
or so could become reality.