Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement geo-traits for writing to WKT & perf improvement #124

Merged
merged 22 commits into from
Nov 26, 2024

Conversation

kylebarron
Copy link
Member

@kylebarron kylebarron commented Oct 26, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.
  • I ran cargo fmt

Changes:

This slightly changes the implementation for writing to WKT, partially based on @b4l's PR here geoarrow/geoarrow-rs#788.

While porting the code to use traits, I realized that there was an awful lot of string concatenation being performed. So for example, when writing a Polygon to WKT, every single coordinate would be allocated to a separate string, and then the coords of each ring would be concatenated and then all rings would be concatenated.

wkt/src/types/polygon.rs

Lines 42 to 52 in e1d838d

let strings = self
.0
.iter()
.map(|l| {
l.0.iter()
.map(|c| format!("{} {}", c.x, c.y))
.collect::<Vec<_>>()
.join(",")
})
.collect::<Vec<_>>()
.join("),(");

This instead writes coordinates directly to the output writer. So perhaps the performance differences in the discussion here #118 were not primarily due to ryu.

This PR also implements writing for geo-traits objects, and delegates the std::fmt::Display implementation to the underlying geo-traits impl.

Benchmark against main:

> cargo bench --bench write
Gnuplot not found, using plotters backend
to_string small wkt     time:   [719.98 ns 723.19 ns 726.51 ns]
                        change: [-33.175% -32.566% -32.039%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  5 (5.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

to_string big wkt       time:   [2.3337 ms 2.3425 ms 2.3524 ms]
                        change: [-44.006% -43.647% -43.297%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

geo: serialize small wkt string
                        time:   [818.73 ns 826.57 ns 836.32 ns]
                        change: [-44.024% -37.443% -31.192%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

geo: serialize big wkt string
                        time:   [2.4689 ms 2.5142 ms 2.5899 ms]
                        change: [-45.157% -42.370% -39.852%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

geo: write small wkt    time:   [815.24 ns 821.34 ns 831.63 ns]
                        change: [-30.751% -28.806% -27.455%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

geo: write big wkt      time:   [2.4814 ms 2.4877 ms 2.4941 ms]
                        change: [-49.589% -45.273% -42.171%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  14 (14.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

I also added another bench for writing the geo::Geometry object directly instead of first converting it to a wkt::Wkt. It's not 100% apples to oranges because I can't use std::io::sink() for a std::fmt::Write trait input. But it's still faster 🤷‍♂️

geo: write small wkt    time:   [809.53 ns 812.46 ns 815.71 ns]
                        change: [-1.9040% -1.1064% -0.5136%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

geo: write big wkt      time:   [2.5784 ms 2.6684 ms 2.8063 ms]
                        change: [+3.6982% +7.2652% +13.562%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

geo: write small wkt using trait
                        time:   [689.63 ns 692.96 ns 696.94 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

geo: write big wkt using trait
                        time:   [2.3611 ms 2.3872 ms 2.4283 ms]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

@kylebarron
Copy link
Member Author

All existing tests pass; just a question of whether geo-traits should be excluded from default features because it depends on geo-types, which is excluded.

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, can you address the todo! before marking a PR as ready for review? I don't want to accidentally miss one and then merge code that willfully explodes on the happy path. It seems like a good way to get users to not trust you.

@kylebarron
Copy link
Member Author

Well I consider the todo (at least for this pr) "I don't know the right way to handle this, I'm open to advice"

Is there a better way to signify this?

@michaelkirk
Copy link
Member

Draft PRs are a great way to get line by line feedback and signify "please don't merge this" as is.

I'll also often leave comments on my own PR with specific questions and relevant context at the spot I'm seeking help.

@urschrei
Copy link
Member

Another alternative / complementary approach is to deny todo! in Clippy so a merge becomes gated on no todos

@kylebarron
Copy link
Member Author

Draft PRs are a great way to get line by line feedback and signify "please don't merge this" as is.

Ok! I've found different expectations in different communities on github of whether draft means "don't look at this yet" or "I'm soliciting feedback on things, but it's not yet ready for merge"

@kylebarron kylebarron marked this pull request as draft October 29, 2024 18:51
@kylebarron
Copy link
Member Author

Here the remaining questions are:

  • Should geo-traits not be a required dependency, because it depends on geo-types?
  • How should we handle input geometries with unknown dimension? Should we default 3D to XYZ and document it? Should we have a user parameter for which dimension to interpret unknown 3D geometries as?

I should be able to fix the rect and triangle issues separately

@kylebarron
Copy link
Member Author

Other questions:

  • Should we change the existing geo_types_to_wkt behavior? That clones geo-types objects into wkt objects, and then writes those. Could we deprecate that entire module?
  • This is a PR into the other branch. Any opinions on whether to review and merge that PR first?

let max_coord = rect.max();

// Note: Even if the rect has more than 2 dimensions, we omit the other dimensions when
// converting to a Polygon.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this for Rect's?

Instead can it look more like the triangle impl which avoids this loss of dims and heap allocation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triangle and Line store all coordinates, just with the simplifying case that they only have a specific number of points. Rect doesn't do that, or else Rect would store 4 coordinates, not 2. We need to do some sort of conversion from the min and max coordinates to a continuous ring.

It's obvious how to do this for 2D input, but less clear how to do this for 3D or 4D. I'd suggest that we should error for 3D/4D, as the user can choose how to transform a 3D or 4D Rect.

Or perhaps we could even change the RectTrait so that it doesn't have the same dimensionality as the others. Because Rect means "axis aligned bounding box" and when you have 3 or 4 axes, it's less clear how that maps to simple-features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh geeze, yeah that's a mess.

Given that we're only handling 2d, can we at least avoid the vec allocation in that case?

As for erroring on 3-d/4-d, that seems reasonable, assuming you mean a runtime error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can avoid the vec allocation, just makes the code slightly more complicated.

Also, what should we do with the error? Currently it returns std::fmt::Error and there's no error type in this crate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to propose something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an Error enum and now error on non-2d Rect input.

src/to_wkt/geo_trait_impl.rs Outdated Show resolved Hide resolved
src/types/coord.rs Show resolved Hide resolved
src/to_wkt/mod.rs Outdated Show resolved Hide resolved
@kylebarron
Copy link
Member Author

Let's get #123 merged and then we can come back to this to decide how we want to write Rects

Base automatically changed from kyle/geo-traits to main November 26, 2024 05:47
@kylebarron kylebarron marked this pull request as ready for review November 26, 2024 19:08
@kylebarron
Copy link
Member Author

I updated the Rect writing, changed the todo!() to Error and this should be good for review now.

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

I left a bunch of stylistic nitpicks, but I don't feel that strongly about any of them. Apply whatever makes sense to you and merge away.

fn from(value: Error) -> Self {
match value {
Error::FmtError(err) => err,
_ => std::fmt::Error,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::fmt here, but core::fmt at the top — settle on one.

Copy link
Member Author

@kylebarron kylebarron Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, oops. I don't actually know the difference between std and core. I changed to std everywhere. I hope that's ok

Copy link
Member

@michaelkirk michaelkirk Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core is a subset of std.

std re-exports all of core and then has some other fancier functionality. So referencing things from core or std is equivalent, it just makes me twitch a little to see both used interchangeably in such close proximity. 🤪

The distinction is typically only relevant if you are trying to target a no-std build for very lightweight platforms (like an embedded system) which might not have support for the std lib.


pub fn write_point<T: WktNum + fmt::Display, G: PointTrait<T = T>, W: Write>(
g: &G,
f: &mut W,
Copy link
Member

@michaelkirk michaelkirk Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f: &mut W,
f: &mut impl Write

It's a little more readable to use impl Trait for these kinds of params.

style nit #2: I think conventionally the impl Write is the first param for free functions, e.g. :

So all together...

pub fn write_point<T: WktNum + fmt::Display, G: PointTrait<T = T>>(f: &mut impl Write, g: &G)

// e.g.
write_point(&mut some_writable, &some_point)

(similar for other methods in this file)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little more readable to use impl Trait for these kinds of params.

Happy to switch. Can you clarify what you mean by "these kinds"? Do you just mean "simple bounds without any parameters"? Like you switched the impl Write but not the impl PointTrait

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think anything that can be written as impl Trait probably should be. Maybe there's an exception, but I can't think of one.

src/to_wkt/geo_trait_impl.rs Outdated Show resolved Hide resolved
src/to_wkt/geo_trait_impl.rs Outdated Show resolved Hide resolved
src/to_wkt/geo_trait_impl.rs Outdated Show resolved Hide resolved
@kylebarron kylebarron merged commit 458a73f into main Nov 26, 2024
1 check passed
@kylebarron kylebarron deleted the kyle/geo-traits-writer branch November 26, 2024 20:25
f.write_str("(")?;
write_coord_sequence(f, coords.iter(), PhysicalCoordinateDimension::Two)?;
Ok(f.write_char(')')?)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be as well be direct like:

write!(
        f,
        "POLYGON ({0} {1},{2} {1},{2} {3},{0} {3},{0} {1})",
        min.x(),
        min.y(),
        max.x(),
        max.y(),
 )?;

It was not materialized at the time it was copied; sorry for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants