Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization #255

Open
wants to merge 105 commits into
base: master
Choose a base branch
from
Open

Serialization #255

wants to merge 105 commits into from

Conversation

GreyCat
Copy link
Member

@GreyCat GreyCat commented Aug 1, 2023

No description provided.

GreyCat and others added 30 commits March 24, 2017 05:37
…iting that gets expression to write properly
…r to use new KaitaiStruct.* abstract classes
…ry size checks, terminator inclusion checks, etc
generalmimon and others added 15 commits January 2, 2023 14:12
These languages use UniversalFooter, which implements
fetchInstancesFooter, but fetchInstancesHeader does nothing there - so
there was a misplaced footer that did not correspond to any active
block, and all generated code was typically broken by that.
See b5c2e92

{,write}alignToByte() methods are now called inside the runtime library as needed:
kaitai-io/kaitai_struct_python_runtime@1cb84b8
+ let --read-write imply --zero-copy-substream=false, see
kaitai-io/kaitai_struct#1060 (comment)
Comment on lines 372 to 379
var wasUnaligned = false
seq.foreach { (attr) =>
val nowUnaligned = isUnalignedBits(attr.dataType)
if (wasUnaligned && !nowUnaligned)
lang.alignToByte(lang.normalIO)
lang.attrWrite(attr, attr.id, defEndian)
wasUnaligned = nowUnaligned
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to avoid using var and imperative. foldLeft seems to do the trick here, cleanly conveying the semantics and allowing e.g. advanced parallelization of such loops:

Suggested change
var wasUnaligned = false
seq.foreach { (attr) =>
val nowUnaligned = isUnalignedBits(attr.dataType)
if (wasUnaligned && !nowUnaligned)
lang.alignToByte(lang.normalIO)
lang.attrWrite(attr, attr.id, defEndian)
wasUnaligned = nowUnaligned
}
seq.foldLeft(false) { case (wasUnaligned, attr) =>
val nowUnaligned = isUnalignedBits(attr.dataType)
if (wasUnaligned && !nowUnaligned)
lang.alignToByte(lang.normalIO)
lang.attrWrite(attr, attr.id, defEndian)
nowUnaligned
}

Copy link
Member

@generalmimon generalmimon Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, one of the major changes in Java and Python runtime libraries related to serialization was that all alignToByte insertion logic is handled in the runtime libraries themselves and the alignToByte() call generation is disabled in the compiler for both Java and Python:

// NOTE: the compiler does not need to output alignToByte() calls for Java anymore, since the byte
// alignment is handled by the runtime library since commit
// https://github.com/kaitai-io/kaitai_struct_java_runtime/commit/1bc75aa91199588a1cb12a5a1c672b80b66619ac
override def alignToByte(io: String): Unit = {}

// NOTE: the compiler does not need to output alignToByte() calls for Python anymore, since the byte
// alignment is handled by the runtime library since commit
// https://github.com/kaitai-io/kaitai_struct_python_runtime/commit/1cb84b84d358e1cdffe35845d1e6688bff923952
override def alignToByte(io: String): Unit = {}

(and now I see a small mistake in the comment in PythonCompiler - technically it should be "the compiler does not need to output alignToByte() align_to_byte()" per Python naming)


So I guess the piece of code you're reviewing can be simplified to just lang.attrWrite(attr, attr.id, defEndian).

out.puts
out.puts("def _fetch_instances(self):")
out.inc
out.puts("pass")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a limitation to make the insertion of pass conditional? Like in readHeader with isEmpty. For this particular function it doesn't look so straightforward given that attrs seems to always be non-empty.

Although innocuous, having a "dangling" pass stands out on first read and makes one wonder if there was some kind of bug in the code generation or in the source ksy file. Probably a nit though!

def _fetch_instances(self):
    pass
    self.attr_01._fetch_instances()
    ...

Same occurrences in:

Copy link
Member

@generalmimon generalmimon Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a limitation to make the insertion of pass conditional? Like in readHeader with isEmpty. For this particular function it doesn't look so straightforward

Yep, it's not very straightforward with how the current compiler code is internally structured. The problem is that when the compiler generates a method header (typically in the *Header methods), it doesn't know whether that function will end up having some body or not. In case of readHeader, we're just very lucky that the body will be empty if and only if there are seq fields. Things get a bit more complicated in case of writeHeader, but it's doable - generated _write methods might not be empty even if there are no seq fields, provided there are parse instances that need their "write itself upon access" flags initialized:

lang.writeHeader(defEndian, !instances.values.exists(i => i.isInstanceOf[ParseInstanceSpec]) && seq.isEmpty)

For _check or _fetchInstances (let alone something reused for many different purposes like condIfHeader!), trying to predict ahead of time with an ultimate boolean expression whether the function will or won't be empty borders on insanity (again, with the current code structure). It's of course theoretically possible, just absolutely unmaintainable and definitely not worth spending time on.

I'll rather take all the superfluous passes - the worst thing that can happen is that someone inspects the generated code and the passes "stand out" to them, as you put it. Far worse would be if we tried to eliminate them, but in that effort we forget to insert a pass in some cases when it should be there, which means the entire generated Python code will be rejected by Python interpreter and people will actually have to patch these manually because KS compiler produced invalid code.

So until there's time and willingness to refactor the compiler code so that we can reliably know whether the function body is empty or not at the time of generating its header, I don't think there's any point in doing anything about this. And this is very low priority for me, given that it only affects code purity. Almost every KS issue should have a higher priority than this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the long and insightful answer! This confirms my suspicion that it was tricky. I agree this is very low priority, sorry for the noise[1].
Thanks also for the tremendous effort done in serialization. I'm super excited about this feature :D

[1] At the risk of adding more noise, just in case I wanted to point out that an empty docstring is an alternative to using pass for an empty body function. But probably without content it's still not worth it.

Copy link
Member

@generalmimon generalmimon Oct 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness: I've realized it wouldn't be hard to only insert passes when they're needed if we added pass in the *Footer method instead of the *Header method. At the time of when *Footer is called, we already know whether there was some code inserted since the corresponding *Header.

I may explore this idea one day.

Relevant tests were added in these commits:

- Java: kaitai-io/kaitai_struct_tests@e92fb33
- Python: kaitai-io/kaitai_struct_tests@e7869f0

This commit is needed for the `testCheckBadValidOldIo` /
`test_check_bad_valid_old_io` test methods to pass.

The _check() method is intended to verify pure data consistency and is
supposed to be called at the time when the actual `_io` is not available
yet (or is not in the correct state) and should not be used even if it's
not `null`. Before this commit, if we wanted to initialize a KS object
by reading an existing stream and then edit the data and write them,
_check() would read the position from the old `_io` used for reading and
report it in the validation error, which is wrong.
@generalmimon
Copy link
Member

@GreyCat Could you please continue with the review?

@dakhnod
Copy link

dakhnod commented Oct 27, 2024

@GreyCat @generalmimon Hey, how is this getting along? Would be great to have serialization support in main!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants