replace regex with custom parsers #113

oll3 · 2024-05-20T13:14:36Z

This PR removes both regex and lazy_static as direct dependencies, mainly to decrease the binary size.

In release mode, with a stripped binary (x86_64), a simple RRuleSet parsing example is ~1.1 MiB smaller than when built before this PR.
I haven't measured how it compares performance-wise.

fmeringdal · 2024-06-02T14:58:35Z

Hey! Can you share the exact commands you ran to get that information? 🙏 I am a bit surprised the binary size would reduce as regex would still be a transitive dependency of rrule even if it was not directly specified in Cargo.toml.

oll3 · 2024-06-02T21:43:59Z

> Hey! Can you share the exact commands you ran to get that information? 🙏 I am a bit surprised the binary size would reduce as regex would still be a transitive dependency of rrule even if it was not directly specified in Cargo.toml.

Hi and thanks for responding!

Yes, I think you are partly right... Though, if I read the dependency tree correctly, regex is only an (indirect) build-time dependency of chrono-tz, so it's never included in the built binary.

My sample program looks like this (Cargo.toml + main.rs):

```toml
[package]
name = "rrules-test"
edition = "2021"
[profile.release]
strip = true
[dependencies]
rrule = "0.12"
```

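```rust
fn main() {
    // Take the RRuleSet string from the first command-line argument.
    let exp = std::env::args().nth(1).unwrap();
    // Parse it via `FromStr` and pretty-print the result.
    let rrule: rrule::RRuleSet = exp.parse().unwrap();
    println!("{:#?}", rrule);
}
```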

When I build this program in release mode before this PR, the binary is 3979720 bytes. After the PR it is 2812328 bytes, i.e. 1167392 bytes (about 1.11 MiB) smaller.

oll3 · 2024-06-03T06:50:35Z

Fixed a Clippy lint (`manual_range_contains`):

```text
51 |         let (time, zulu_timezone_set) = if len >= 15 && len <= 16 && &val[8..9] == "T" {
   |                                            ^^^^^^^^^^^^^^^^^^^^^^ help: use: `(15..=16).contains(&len)`
```
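For illustration only, here is a minimal sketch of a length-and-separator check written in the form Clippy suggests. `looks_like_date_time` is a hypothetical helper, not the PR's actual code; it assumes a `YYYYMMDDTHHMMSS[Z]` value and also swaps `&val[8..9]` for `str::get`, so an input with a multi-byte character around byte 8 yields `false` instead of panicking:

```rust
/// Hypothetical helper (not the PR's code): true when `val` has the length of
/// `YYYYMMDDTHHMMSS` (15 bytes) or `YYYYMMDDTHHMMSSZ` (16 bytes) and a `T` at byte 8.
fn looks_like_date_time(val: &str) -> bool {
    let len = val.len();
    // Form suggested by clippy::manual_range_contains instead of `len >= 15 && len <= 16`.
    // `str::get` returns `None` (rather than panicking) if 8..9 is not on char boundaries.
    (15..=16).contains(&len) && val.get(8..9) == Some("T")
}
```

The range form is purely stylistic; it compiles to the same comparison as the original `len >= 15 && len <= 16`.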

fmeringdal · 2024-06-03T22:32:58Z

> regex is only an (indirect) build-time dependency of chrono-tz so it's never included in built binary.

I missed that, makes sense.

I have to admit that I am a bit skeptical of merging this, as the regex parsing has worked well and, IIRC, was copied from a much more widely used rrule implementation in another language. The size reduction is quite impressive though!

Do you have a use case where the size reduction would be beneficial? I imagine this crate is already too big to be used on resource-constrained devices.

oll3 · 2024-06-03T22:46:15Z

> Do you have a use-case where the size reduction would be beneficial? I can imagine that this crate is already too big to be used in any resource constrained devices

Yes, it's for use on a resource-constrained device. :) Not that 1 MiB is impossible to fit, but it's quite a waste of both precious IoT bandwidth and disk space.
But I do understand that tried and tested is valuable, so I understand your side. Would adding more tests make any difference?

WhyNotHugo · 2024-06-10T08:00:49Z

rrule/src/parser/parsers.rs

```diff
-            Some(part) => part.as_str() == "Z",
-            None => false,
+        // Parse date (YYYYMMDD).
+        let year = val[0..4]
```


This will panic if the character at byte 3 is a multi-byte character.

Such an input would be absolute garbage, but the parser should not crash in such a case; it should return an error instead.

Given that valid RRULEs are strictly ASCII, you can probably treat this as a raw byte array instead of a string.
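A minimal sketch of the byte-slice idea, assuming the value starts with `YYYYMMDD`; `parse_digits` and `parse_date_bytes` are hypothetical helpers, not rrule's API. Working on `val.as_bytes()` means no slice can land inside a UTF-8 character, and any non-digit byte (including the bytes of a multi-byte character) becomes `None` rather than a panic:

```rust
/// Hypothetical helper: parse a run of ASCII digits from a byte slice.
/// Any other byte (including parts of a multi-byte UTF-8 character) gives `None`.
fn parse_digits(bytes: &[u8]) -> Option<u32> {
    if bytes.is_empty() {
        return None;
    }
    bytes.iter().try_fold(0u32, |acc: u32, &b: &u8| -> Option<u32> {
        if !b.is_ascii_digit() {
            return None;
        }
        // Checked arithmetic so very long digit runs fail instead of overflowing.
        acc.checked_mul(10)?.checked_add(u32::from(b - b'0'))
    })
}

/// Hypothetical: split a `YYYYMMDD` prefix into (year, month, day) on raw bytes.
fn parse_date_bytes(val: &str) -> Option<(u32, u32, u32)> {
    let b = val.as_bytes();
    // Slicing `&[u8]` has no char-boundary rule; `get` also rejects short input.
    let year = parse_digits(b.get(0..4)?)?;
    let month = parse_digits(b.get(4..6)?)?;
    let day = parse_digits(b.get(6..8)?)?;
    Some((year, month, day))
}
```

Range checks (for example month 1 to 12) are left out to keep the sketch short.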

oll3 · 2024-06-10

> This will panic if the character at byte 3 is a multi-byte character.

That's very true, thanks for catching that! And I agree that it should be handled gracefully.

And yes, either ensuring that the string is ASCII or using `.get(..)` instead of `[..]` would probably solve this. I will add some tests and a fix.
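A sketch of the `.get(..)` option, under the same `YYYYMMDD` assumption (`digit_field` and `parse_date_get` are hypothetical names, not the PR's code). `str::get` returns `None` when a range is out of bounds or splits a character, so a malformed value becomes an `Err`. The explicit digit check is there because `str::parse::<u32>` on its own also accepts a leading `+`:

```rust
use std::ops::Range;

/// Hypothetical helper: read `val[range]` without panicking and require it to
/// be all ASCII digits before parsing.
fn digit_field(val: &str, range: Range<usize>) -> Result<u32, String> {
    // `None` if the range is out of bounds or does not fall on char boundaries.
    let part = val
        .get(range.clone())
        .ok_or_else(|| format!("invalid value {val:?}: no field at {range:?}"))?;
    // `str::parse::<u32>` alone would also accept a leading `+`.
    if part.is_empty() || !part.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("invalid value {val:?}: {part:?} is not a number"));
    }
    part.parse::<u32>().map_err(|e| format!("invalid value {val:?}: {e}"))
}

/// Hypothetical: parse a `YYYYMMDD` prefix into (year, month, day).
fn parse_date_get(val: &str) -> Result<(u32, u32, u32), String> {
    Ok((digit_field(val, 0..4)?, digit_field(val, 4..6)?, digit_field(val, 6..8)?))
}
```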

> Given that valid RRULEs are strictly ASCII, you can probably treat this as a raw byte array instead of a string.

Btw, raw byte slices (`&[u8]`) don't have the convenient integer parsing methods, which is why I didn't convert to bytes but just checked that the string is all ASCII.
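The "check that it's all ASCII" approach can be sketched like this (again hypothetical helpers `ascii_number` and `parse_date_ascii`, assuming a `YYYYMMDD` prefix, not the PR's actual code). Once `str::is_ascii` holds, every byte index is a char boundary, so `&val[a..b]` can only panic on an out-of-bounds range, which the length check rules out, and the usual `str` parsing methods stay available:

```rust
/// Hypothetical helper: parse an all-digit ASCII field.
fn ascii_number(s: &str) -> Result<u32, String> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("{s:?} is not a number"));
    }
    s.parse::<u32>().map_err(|e| format!("{s:?}: {e}"))
}

/// Hypothetical sketch of the "check ASCII first" approach for a `YYYYMMDD` prefix.
fn parse_date_ascii(val: &str) -> Result<(u32, u32, u32), String> {
    // After this check every byte index is a char boundary, and the length check
    // rules out out-of-bounds ranges, so the `[..]` slicing below cannot panic.
    if !val.is_ascii() || val.len() < 8 {
        return Err(format!("invalid date value: {val:?}"));
    }
    Ok((
        ascii_number(&val[0..4])?,
        ascii_number(&val[4..6])?,
        ascii_number(&val[6..8])?,
    ))
}
```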

oll3 · 2024-06-10T08:39:29Z

Rebased on main and added a test and a fix for the error @WhyNotHugo pointed out.


WhyNotHugo reviewed Jun 10, 2024


oll3 requested a review from WhyNotHugo June 10, 2024 08:39

oll3 mentioned this pull request Jun 13, 2024

remove crate defined timezone constants #117

Closed

replace regex with custom parsers

2f881ce

Replace use of regex with custom parsers to decrease dependencies and binary size.

oll3 closed this Aug 22, 2024
