Serialization #255

GreyCat · 2023-08-01T20:19:28Z

No description provided.

These languages use UniversalFooter, which implements fetchInstancesFooter, but fetchInstancesHeader does nothing there - so there was a misplaced footer that did not correspond to any active block, and all generated code was typically broken by that.

GreyCat · 2023-08-05T18:25:02Z

shared/src/main/scala/io/kaitai/struct/ClassCompiler.scala

+    var wasUnaligned = false
+    seq.foreach { (attr) =>
+      val nowUnaligned = isUnalignedBits(attr.dataType)
+      if (wasUnaligned && !nowUnaligned)
+        lang.alignToByte(lang.normalIO)
+      lang.attrWrite(attr, attr.id, defEndian)
+      wasUnaligned = nowUnaligned
+    }


Let's try to avoid using var and the imperative style here. foldLeft seems to do the trick, cleanly conveying the semantics and allowing e.g. advanced parallelization of such loops:

Suggested change

-    var wasUnaligned = false
-    seq.foreach { (attr) =>
-      val nowUnaligned = isUnalignedBits(attr.dataType)
-      if (wasUnaligned && !nowUnaligned)
-        lang.alignToByte(lang.normalIO)
-      lang.attrWrite(attr, attr.id, defEndian)
-      wasUnaligned = nowUnaligned
-    }
+    seq.foldLeft(false) { case (wasUnaligned, attr) =>
+      val nowUnaligned = isUnalignedBits(attr.dataType)
+      if (wasUnaligned && !nowUnaligned)
+        lang.alignToByte(lang.normalIO)
+      lang.attrWrite(attr, attr.id, defEndian)
+      nowUnaligned
+    }
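To make the pattern in the suggestion concrete, here is a minimal sketch of the same state-carrying fold, written in Python purely for illustration. The `emit_writes` helper, the `(name, is_unaligned_bits)` tuples and the emitted strings are all hypothetical stand-ins, not compiler APIs. Note that the fold still visits the fields strictly in order, because each step depends on the flag returned by the previous one.

```python
from functools import reduce


def emit_writes(fields):
    """Return pseudo-calls for a list of (name, is_unaligned_bits) tuples.

    Hypothetical model of the loop under review: an alignment call is
    emitted whenever a byte-aligned field follows an unaligned bit field.
    """
    out = []

    def step(was_unaligned, field):
        name, now_unaligned = field
        if was_unaligned and not now_unaligned:
            out.append("align_to_byte()")
        out.append(f"write({name})")
        # The returned flag becomes `was_unaligned` for the next field,
        # replacing the mutable `var` of the imperative version.
        return now_unaligned

    reduce(step, fields, False)
    return out


calls = emit_writes([("a", True), ("b", True), ("c", False), ("d", False)])
```

In this sketch the accumulator is just the boolean flag, and the emitted calls are still a side effect, which mirrors the Scala suggestion where `lang.attrWrite` is called for its effect.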

Actually, one of the major changes in Java and Python runtime libraries related to serialization was that all alignToByte insertion logic is handled in the runtime libraries themselves and the alignToByte() call generation is disabled in the compiler for both Java and Python:

kaitai_struct_compiler/shared/src/main/scala/io/kaitai/struct/languages/JavaCompiler.scala

Lines 497 to 500 in 2eca3de

  // NOTE: the compiler does not need to output alignToByte() calls for Java anymore, since the byte
  // alignment is handled by the runtime library since commit
  // https://github.com/kaitai-io/kaitai_struct_java_runtime/commit/1bc75aa91199588a1cb12a5a1c672b80b66619ac
  override def alignToByte(io: String): Unit = {}

kaitai_struct_compiler/shared/src/main/scala/io/kaitai/struct/languages/PythonCompiler.scala

Lines 486 to 489 in 2eca3de

  // NOTE: the compiler does not need to output alignToByte() calls for Python anymore, since the byte
  // alignment is handled by the runtime library since commit
  // https://github.com/kaitai-io/kaitai_struct_python_runtime/commit/1cb84b84d358e1cdffe35845d1e6688bff923952
  override def alignToByte(io: String): Unit = {}

(and now I see a small mistake in the comment in PythonCompiler - technically it should say "the compiler does not need to output align_to_byte() calls" rather than alignToByte(), per Python naming)

So I guess the piece of code you're reviewing can be simplified to just lang.attrWrite(attr, attr.id, defEndian).
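As a rough illustration of what "alignment handled in the runtime library" can mean, the toy writer below aligns itself whenever a byte-level write follows bit-level writes, so the caller never has to request alignment. This is only a sketch under assumptions: the class `SketchWriter` and its method names are made up for this example and are not the actual Kaitai Struct runtime implementation.

```python
class SketchWriter:
    """Toy big-endian bit/byte writer (hypothetical, not the real runtime)."""

    def __init__(self):
        self.out = bytearray()
        self._bits = 0      # value of the pending, not yet flushed bits
        self._bits_len = 0  # number of pending bits, always 0..7 after a call

    def write_bits_int_be(self, n, val):
        # Append n bits of val to the pending bits, flushing full bytes.
        self._bits = (self._bits << n) | (val & ((1 << n) - 1))
        self._bits_len += n
        while self._bits_len >= 8:
            self._bits_len -= 8
            self.out.append((self._bits >> self._bits_len) & 0xFF)
        self._bits &= (1 << self._bits_len) - 1

    def _align_to_byte(self):
        # Pad a partially filled byte with zero bits and flush it.
        if self._bits_len > 0:
            self.out.append((self._bits << (8 - self._bits_len)) & 0xFF)
            self._bits = 0
            self._bits_len = 0

    def write_bytes(self, data):
        # Byte-level writes align first, so generated code need not do it.
        self._align_to_byte()
        self.out += data


w = SketchWriter()
w.write_bits_int_be(3, 0b101)
w.write_bytes(b"\xab")
```

With this design, generated code can simply emit one write call per field, which is why the compiler-side alignment logic becomes unnecessary.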

Brachi · 2023-09-03T16:59:22Z

shared/src/main/scala/io/kaitai/struct/languages/PythonCompiler.scala

+    out.puts
+    out.puts("def _fetch_instances(self):")
+    out.inc
+    out.puts("pass")


Is there a limitation to make the insertion of pass conditional? Like in readHeader with isEmpty. For this particular function it doesn't look so straightforward given that attrs seems to always be non-empty.

Although innocuous, having a "dangling" pass stands out on first read and makes one wonder if there was some kind of bug in the code generation or in the source ksy file. Probably a nit though!

def _fetch_instances(self):
    pass
    self.attr_01._fetch_instances()
    ...

Same occurrences in:

checkHeader

checkInstanceHeader

condIfHeader

condRepeatCommonHeader

switchIfCaseFirstStart

switchIfCaseStart

switchIfElseStart

> Is there a limitation to make the insertion of pass conditional? Like in readHeader with isEmpty. For this particular function it doesn't look so straightforward

Yep, it's not very straightforward with how the current compiler code is internally structured. The problem is that when the compiler generates a method header (typically in the *Header methods), it doesn't know whether that function will end up having some body or not. In the case of readHeader, we're just very lucky that the body will be empty if and only if there are no seq fields. Things get a bit more complicated in the case of writeHeader, but it's doable - generated _write methods might not be empty even if there are no seq fields, provided there are parse instances that need their "write itself upon access" flags initialized:

kaitai_struct_compiler/shared/src/main/scala/io/kaitai/struct/ClassCompiler.scala

Line 333 in cb0c1eb

lang.writeHeader(defEndian, !instances.values.exists(i => i.isInstanceOf[ParseInstanceSpec]) && seq.isEmpty)

For _check or _fetchInstances (let alone something reused for many different purposes like condIfHeader!), trying to predict ahead of time with an ultimate boolean expression whether the function will or won't be empty borders on insanity (again, with the current code structure). It's of course theoretically possible, just absolutely unmaintainable and definitely not worth spending time on.

I'd rather take all the superfluous passes - the worst thing that can happen is that someone inspects the generated code and the passes "stand out" to them, as you put it. Far worse would be if we tried to eliminate them, but in that effort forgot to insert a pass in some case where it should be there, which would mean the entire generated Python code is rejected by the Python interpreter and people actually have to patch it manually because the KS compiler produced invalid code.
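The trade-off can be checked directly with Python's built-in `compile()`. The snippet below is a small sketch: the helper name `is_valid_python` and the sample sources are made up for illustration and are not actual compiler output.

```python
def is_valid_python(source):
    """Return True if `source` compiles as a Python module."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# A superfluous `pass` before real statements is harmless.
with_extra_pass = (
    "def _fetch_instances(self):\n"
    "    pass\n"
    "    self.attr_01._fetch_instances()\n"
)

# A body consisting only of `pass` is fine too.
only_pass = "def _fetch_instances(self):\n    pass\n"

# A header with no body at all is rejected by the interpreter.
no_body = "def _fetch_instances(self):\n"

results = [is_valid_python(s) for s in (with_extra_pass, only_pass, no_body)]
```

Only the last case fails, and it fails for the whole module, which is the scenario the comment above wants to rule out.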

So until there's time and willingness to refactor the compiler code so that we can reliably know whether the function body is empty or not at the time of generating its header, I don't think there's any point in doing anything about this. And this is very low priority for me, given that it only affects code purity. Almost every KS issue should have a higher priority than this.

Thanks a lot for the long and insightful answer! This confirms my suspicion that it was tricky. I agree this is very low priority, sorry for the noise[1].
Thanks also for the tremendous effort done in serialization. I'm super excited about this feature :D

[1] At the risk of adding more noise, just in case I wanted to point out that an empty docstring is an alternative to using pass for an empty body function. But probably without content it's still not worth it.
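For reference, a minimal sketch of the docstring alternative mentioned in the footnote: a function whose body is only a docstring is accepted by Python without a `pass`. The docstring text here is invented for the example.

```python
# Source of a function whose only body statement is a docstring.
source = (
    "def _fetch_instances(self):\n"
    '    """No instances to fetch."""\n'
)

namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
fetch_instances = namespace["_fetch_instances"]

# Calling it does nothing and returns None, just like a `pass` body.
result = fetch_instances(None)
```

The generator would still need to know whether the body ends up empty in order to choose a meaningful docstring, so this does not remove the underlying difficulty.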

For completeness: I've realized it wouldn't be hard to insert passes only when they're needed if we added the pass in the *Footer method instead of the *Header method. By the time *Footer is called, we already know whether any code has been inserted since the corresponding *Header.

I may explore this idea one day.
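A minimal sketch of how that footer-based idea could work, assuming an output buffer that remembers how many lines existed when each block was opened. The class `PyOutput` and its `header`/`footer` methods are hypothetical and only model the approach; they are not the compiler's actual classes.

```python
class PyOutput:
    """Toy line buffer that adds `pass` in the footer only for empty blocks."""

    def __init__(self):
        self.lines = []
        self.indent = 0
        self._block_starts = []  # line counts recorded right after each header

    def puts(self, text):
        self.lines.append("    " * self.indent + text)

    def header(self, text):
        # Emit the block header and remember where its body starts.
        self.puts(text)
        self.indent += 1
        self._block_starts.append(len(self.lines))

    def footer(self):
        # If nothing was emitted since the matching header, the body is empty.
        start = self._block_starts.pop()
        if len(self.lines) == start:
            self.puts("pass")
        self.indent -= 1


empty = PyOutput()
empty.header("def _fetch_instances(self):")
empty.footer()

filled = PyOutput()
filled.header("def _fetch_instances(self):")
filled.puts("self.attr_01._fetch_instances()")
filled.footer()
```

Because the check happens at footer time, no up-front prediction of the body's contents is needed, and nested blocks work the same way since each header pushes its own start position.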

Relevant tests were added in these commits:

- Java: kaitai-io/kaitai_struct_tests@e92fb33
- Python: kaitai-io/kaitai_struct_tests@e7869f0

This commit is needed for the `testCheckBadValidOldIo` / `test_check_bad_valid_old_io` test methods to pass.

The _check() method is intended to verify pure data consistency and is supposed to be called at a time when the actual `_io` is not available yet (or is not in the correct state), so `_io` should not be used even if it's not `null`. Before this commit, if we wanted to initialize a KS object by reading an existing stream and then edit the data and write it, _check() would read the position from the old `_io` used for reading and report it in the validation error, which is wrong.

generalmimon · 2023-10-22T10:39:03Z

@GreyCat Could you please continue with the review?

... as in Java, see https://github.com/kaitai-io/kaitai_struct_compiler/blob/829a14f1e33e8e48eeae726c8a287a5967bcb668/shared/src/main/scala/io/kaitai/struct/languages/JavaCompiler.scala#L153 This commit was extracted from the `serialization` branch (originally e776c98), see #255

dakhnod · 2024-10-27T00:39:20Z

@GreyCat @generalmimon Hey, how is this getting along? Would be great to have serialization support in main!

GreyCat and others added 30 commits March 24, 2017 05:37

Added --read-write switch

de0938e

JavaMain: ensure that exceptions are really thrown, even on compile stage

d24a36d

Started PoC seq writer support in JavaCompiler

0b757b9

JavaCompiler: very basic bytes writing support

93b37a1

Allow simple serialization of FixedBytesType

75ec0c2

Translators: added strToBytes, implemented in Java translator

eed5d07

Added _i iteration number identifier

5cf4edd

JavaTranslator: added translation of _i

4a3535f

ClassTypeProvider: use constants + added ITERATOR_I handling

568964d

Writing: added string handling, some repeat-expr handling, BytesEos handling

9e7bffa

Serialization: basic support for user types and BytesLimitType

8fd1002

Merge branch 'master' into serialization

0d73de2

Serialization: implemented unprocess and slightly better user type writing that gets expression to write properly

ea06fec

AllocateIOLocalVar: Added allocation of fixed and growing IOs

9c1793e

Serialization: implemented fixed-size preallocated IO buffers

9f0e331

JavaCompiler: generate new Readable/Writable interface implements

969401c

Serialization: more intricate process-on-top-of-user-type support

885469f

JavaCompiler: implemented writing of repeat-eos

0f3a2aa

Serialization: implemented switch types support, reworked JavaCompiler to use new KaitaiStruct.* abstract classes

e4a56f1

Added basic _check implementation, added checks for a proper number of members in a collection

44185e1

Added String#to_b(encoding) and ByteArray#size methods + JavaTranslator implementations

769dabc

Added limited byte / string sizing checks

040b689

Translators: added BytesType#first and #last, implemented for Java

28e8d06

GenericChecks: implemented tons of checks for BytesLimitType - boundary size checks, terminator inclusion checks, etc

1b66897

Merge branch 'master' into serialization

7d2295f

Merge branch 'master' into serialization

dd0880e

JavaCompiler: only use ReadOnly when _read is public, i.e. when debug is true

d73ec6b

Merge branch 'master' into serialization

2081fb2

Merge branch 'master' into serialization

7dee284

WIP

75729f7

generalmimon and others added 15 commits January 2, 2023 14:12

Add checks for _root and _parent built-in parameters

2accf38

Add valid checks as in _read() to _check()/_write()

5afa962

Add necessary fixes after testing on https://github.com/kaitai-io/kaitai_struct_formats

793f58b

Going forward, starting 0.11-SNAPSHOT

533aaa8

Delete unused method attrWriteStreamToStream

203613b

Port serialization support to Python

d1f16dd

Fix Python 2 compatibility when creating a fixed stream

d986731

Python: disable calls to alignToByte()

fca43d9

See b5c2e92 {,write}alignToByte() methods are now called inside the runtime library as needed: kaitai-io/kaitai_struct_python_runtime@1cb84b8

Python: use self as default _root only in top-level types

e776c98

as in Java, see https://github.com/kaitai-io/kaitai_struct_compiler/blob/829a14f1e33e8e48eeae726c8a287a5967bcb668/shared/src/main/scala/io/kaitai/struct/languages/JavaCompiler.scala#L153

Add "implies --no-auto-read" to --read-write CLI option description

00c831f

Add "Java and Python only" to --read-write CLI option description

f1dc857

Merge branch 'master' into serialization

952c89c

+ let --read-write imply --zero-copy-substream=false, see kaitai-io/kaitai_struct#1060 (comment)

Improve setting zeroCopySubstream to false in readWrite mode

9142a1b

Bring back init of instance flags in non-readWrite mode (C++, C#)

2eca3de

(fix regression from d1f16dd)

GreyCat commented Aug 5, 2023

generalmimon added 2 commits August 5, 2023 20:53

PythonCompiler: fix "{alignToByte => align_to_byte}()" in comment

a477800

Remove alignToByte() insertion logic when writing

8b258e2

See #255 (comment)

GreyCat commented Aug 6, 2023

shared/src/main/scala/io/kaitai/struct/format/Identifier.scala

GreyCat commented Aug 6, 2023

shared/src/main/scala/io/kaitai/struct/datatype/Endianness.scala (outdated)

Endianness.fromString(): revert incorrect case {None => _} change

cb0c1eb

See #255 (comment)

Brachi reviewed Sep 3, 2023

generalmimon added 4 commits September 25, 2023 10:17

Merge branch 'master' into serialization

31f1359

Fix Java 7 compat: add final in allocateIOFixed()

7d830fa

Merge branch 'master' into serialization

dea74e1

Merge branch 'master' into serialization

4066d1e
