Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace regex with custom parsers #113

Closed
wants to merge 1 commit into from
Closed

replace regex with custom parsers #113

wants to merge 1 commit into from

Conversation

oll3
Copy link
Contributor

@oll3 oll3 commented May 20, 2024

This PR removes both the regex and lazy_static as direct dependencies. This mainly to decrease the binary size.

In release mode, with stripped binary (x86_64), a simple RRuleSet parse example is decreased with ~1.1 MiB compared to when build before this PR.
I haven't measured how it compares performance wise.

@fmeringdal
Copy link
Owner

Hey! Can you share the exact commands you ran to get that information? 🙏 I am a bit surprised the binary size would reduce as regex would still be a transitive dependency of rrule even if it was not directly specified in Cargo.toml.

@oll3
Copy link
Contributor Author

oll3 commented Jun 2, 2024

Hey! Can you share the exact commands you ran to get that information? 🙏 I am a bit surprised the binary size would reduce as regex would still be a transitive dependency of rrule even if it was not directly specified in Cargo.toml.

Hi and thanks for responding!

Yes, I think you are partly right... Though, if I read the dependency tree correct, regex is only an (indirect) build-time dependency of chrono-tz so it's never included in built binary.

My sample program looks like below (Cargo.toml+main.rs):

[package]
name = "rrules-test"
edition = "2021"
[profile.release]
strip = true
[dependencies]
rrule = "0.12"
fn main() {
    let exp = std::env::args().nth(1).unwrap();
    let rrule: rrule::RRuleSet = exp.parse().unwrap();
    println!("{:#?}", rrule);
}

When I build this program (in release mode) prior to this PR the size of the binary is 3979720 bytes. After the PR it's 2812328 bytes.

@oll3
Copy link
Contributor Author

oll3 commented Jun 3, 2024

Fixed a lint:

51 |         let (time, zulu_timezone_set) = if len >= 15 && len <= 16 && &val[8..9] == "T" {
   |                                            ^^^^^^^^^^^^^^^^^^^^^^ help: use: `(15..=16).contains(&len)`

@fmeringdal
Copy link
Owner

regex is only an (indirect) build-time dependency of chrono-tz so it's never included in built binary.

I missed that, makes sense.

I have to admit that I am a bit skeptical of merging this as the regex parsing has worked well and IIRC was copied from some other much more used rrule implementation in another language. The size reduction is quite impressive though!

Do you have a use-case where the size reduction would be beneficial? I can imagine that this crate is already too big to be used in any resource constrained devices

@oll3
Copy link
Contributor Author

oll3 commented Jun 3, 2024

Do you have a use-case where the size reduction would be beneficial? I can imagine that this crate is already too big to be used in any resource constrained devices

It's for use on a resource constrained device yes. :) Not that 1 MiB is impossible to fit but it's quite a waste of both precious IoT bandwidth and disk space.
But I do understand that tested and true is valuable so sure understand your side. Would adding more tests make any difference?

Some(part) => part.as_str() == "Z",
None => false,
// Parse date (YYYYMMDD).
let year = val[0..4]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if the character at byte 3 is a multi-byte character.

Such an input would be absolute garbage, but the parser should not crash in such a case and return an error instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that valid RRULEs are strictly ASCII, you can probably treat this as a raw byte array instead of a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if the character at byte 3 is a multi-byte character.

That's very true, thanks for catching! And I agree that it should be handled gracefully.

And yes, either ensuring that string is ascii or using .get(..) instead of [..] would probably solve this. I will add some tests and a fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that valid RRULEs are strictly ASCII, you can probably treat this as a raw byte array instead of a string.

Btw, raw byte arrays (&[u8]) doesn't have the convenient parsing methods for integers so that's why I didn't convert it but just checked that it's all ascii.

@oll3
Copy link
Contributor Author

oll3 commented Jun 10, 2024

Rebased on main and added test and fix for the error @WhyNotHugo pointed out.

Replace use of regex with custom parsers to decrease dependencies and binary size.
@oll3 oll3 closed this Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants