-
-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to add 'standard' base64 bytes support #446
Conversation
Byron#442 flags an issue where some APIs respond with non-valid base64 bytes values for the "URL safe" flavor of configuration. This adds support for a "standard" wrapper adjacent to the URL safe one with the intention of finding a way to flag which structures should use which configuration.
I ran the regenerate process and pushed everything to https://github.com/emarcotte/google-apis-rs/tree/regenerate, then pointed a contrived program to it which was able to successfully read a secret that would not decode as url-safe. |
Fantastic! I think the solution could be (instead of relying to have per-API settings for this which need to be created by hand in absence of a way to infer this) to have a wrapper that tries both encodings and fails only once both attempts fail. That way we only have to choose the most common encoding to be tried first in order to avoid too many wasted cycles (even though it would probably detect early when the encoding doesn't pan out so failure is cheap anyway). What do you think? |
I'll give it a shot sometime this week. I'm not an expert in base64 so my only question would be if there's any possible scenario where we do an incorrect, but non-erroring parse. BTW; any hints on that clippy lint? |
Thanks for your help!
Me neither, but: the question here is if decoding could succeed by one but produce garbage. And I'd think the answer for this is strictly: no. Of course, this might just be wishful thinking, but I am inclined to believe it's true and not worry about it.
Each time the Rust version increments The none-lint failure is more of a problem though as it seems python related (Oh I have so learned my lesson when it comes to python: do not use for anything that has to last). Thanks again. |
On the "fetching" side I think a fallback approach can work but I am not so confident that on the sending side. If say PubSub msg payloads need to be URL safe then we'd be posting invalid data which is a bit less ideal than burning cycles in decoding. Would you expect that we only do the fallback on reading side? BTW I did finally realize google standard APIs CLAIM to be URL safe variant but clearly that isn't followed very precisely (or the docs are bad). |
Right, I agree! The fallback would only work for the reading side. The question really is if the user has a way to know. Is there something in the API description that indicates what is what? I am pretty sure that the generators that Google does provide for other programming languages have figured this out, so maybe one could learn from there? |
Haven't forgotten this, just been swamped. Been trying to research how go and python clients work and mostly just getting more confused how this fixed anything and why the original code didn't work... |
I think this is related to some APIs acting differently, but I also never validated this myself. So me it seemed the fix broke more than it fixed and maybe this tells use we should revert first and then try to fix the original issue another way. If you agree, please feel free to open a revert-PR for that. |
Sorry, just dropping in. I assume the issue is random apis use random different base64 standards, and it's not part of the metadata? It could be like 50/50 urlsafe vs standard? What about one of the following:
|
Uhh, so AFAICT the golang go library uses the standard go |
Maybe it can just be switched back to that then. From my experience, after it changed to what it is now the number of issues multiplied - so what was a fix for few turned out to be a problem for many. Maybe you can find the commit and revert it, which should then be standard base64 as well. Alternatively, figure out a way to let the user handle encoding and decoding to a greater extend. By the way, I am very sorry you all have to suffer through this and my hands-off approach to the project. Too many people use it and it's in a barely-community-maintained state - right now it's not good enough. I simply can't understand why Google doesn't provide their own bindings for Rust. |
Looking at python, there's two bytes handlers registered here: https://github.com/googleapis/proto-plus-python/blob/161f89f5b7a6f493602ed8890ee9a11ebbc77836/proto/marshal/marshal.py (I think this is the library python uses) The first one comes from googles protobuf library, and I think the "bytes" type has id 18. The 2nd one is defined locally, with id 12. I think maybe the python methods might be automatically generated based on protobuf specs? The first one uses standard for encode, and urlsafe for decode ❓ https://github.com/protocolbuffers/protobuf/blob/a7b0421c78412baa48704d727601a17ac0f451d1/python/google/protobuf/json_format.py AFAICT ... that's using id type 12 though. Maybe I read the 18 thing wrong. The 2nd one uses urlsafe. |
One wonders why google does a lot of things |
And this is way better still than writing it from scratch, especially if you don't mind reviewing and merging! |
I'm going to try some read/write api with a couple libraries and see what goes over the wire. |
So doing a file upload with checksum=crc32c Golang sends
receives:
Python sends
receives
but no body because apparently in python it's impossible to log http response bodies. There is a lot of explicit encoding/decoding though in both, stuff that looks like it wasn't automatically generated (from protobuf or otherwise). The checksum is like that, so it might not be a good example. |
As someone who seems deeply familiar with implementing GCP api libraries, if you don't mind a totally out of left-field ping @codyoss - would you be able to lend advice on what base64 encoding is canonical here? (or maybe confirm that it's a case-by-case basis thing) Otherwise I think:
Would be the safest... |
Was this the commit? #379 - If I'm reading that, before it was just In that case I'm not sure overrides make sense. I kind of do feel reverting is safest and easiest here, and having |
Thanks a lot for your help, I will look at the PR in a bit!
Such a mechanism exists in principle, as each API can have overrides to any of its properties, but the implementation would have to be adjusted to use that to drive the decoding or encoding of values. With #463 that can probably be postponed indefinitely though. |
FWIW I made #463 . Having base64/datetime automatically parsed would be awesome but I don't have any great ideas. Maybe if someone else wants to tackle it it could be un-reverted. |
@andrewbaxter can you provide me with a little more context what are you encoding/decoding? It depends on which part of a req/resp you are dealing with which type to use. Generally speaking message bodies: standard RFC 4648. (all off the top of my head) |
Oh yeah! Thanks for taking the time to respond! I think it's just the JSON request/response bodies from the APIs so it wouldn't include query parameters or auth. The generation comes from, for example: https://developers.google.com/discovery/v1/type-format says
So the assumption was that this should be web safe base64 everywhere, but people encountered various APIs which are returning standard base64 strings. AFAICT the golang implementation seems to contradict that and is nearly all standard base64, but there are a few things that appear to use urlsafe, and python appears to use urlsafe if I read the code correctly... I was wondering if you could give a hint on which one is correct, or if you had hints on the correct way to deal with base64 in json in an API implementation. So you think standard encoding would be the safest to use here? Do you know if there are a lot of apis that deviate from that (i.e. few enough to be fixable by hand)? |
You are correct for our Go libraries. I can't speak for the python ones -- I assume some libraries have either fall back behaviour or just handle bytes in different ways at a language level. A good test case for handling bytes is a body is a pubsub message receive RPC.
Some fields laid out in discovery are query params and not a part of a body
Yes |
Perfect, thanks @codyoss ! In that case @Byron would you mind if we do this?
I just generated the urlsafe code from this branch locally and I'll do some testing. I'm not sure if there are tests that need to be fixed, it looks like the jobs expired - are you able to rerun the tests? Or I could make a new MR (unless @emarcotte wanted to continue this). |
Actually it doesn't let me re-run the tests even, so I think it will be easiest if you would build up on this PR and make it work as much as possible. Thanks so much for pushing this fix to completion. |
Hey guys, sorry i've been absent for months, I've been completely slammed by Real Life (TM). I had intended to just go back to byte value but never found time :( Edit: bytes, not string, brain is fried :( |
No worries! Cool, I'll take it over then if that's alright. |
I made #464 with a new build, no new changes |
Great! After all, the PR could be merged and apparently we all lost our track here in this PR somehow. Thanks @andrewbaxter for guiding us to completion. Before re-releasing all crates I think it's worth to take another look and make small last-minute changes in the next couple of days. Then when deemed ready, I can cut the release. Thanks everyone. |
Awesome, thank you too for help with the merge! Sounds good to me. |
#442, #445, and #444 all flag an issue where some APIs respond with non-valid base64 bytes values for the "URL safe" flavor of configuration.
This adds support for a "standard" wrapper adjacent to the URL safe one with the intention of finding a way to flag which structures should use which configuration.
I'm still trying to figure out how to properly test this and I have no clue how we could mark some as URL safe and some as standard. Open to any tips!