Implementing a mixin for Flatbuffers #79

timrulebosch · 2022-05-12T19:36:38Z

Is your feature request related to a problem? Please describe.
I'm interested in supporting Flatbuffers via a Mixin. I already have a DataClass based encoder/decoder which uses getattr(...) to call the generated Flatbuffer code as well as loading modules with importlib.import_module(). However, that could be much faster if the code to do the encoding/decoding would be generated once for the schema.

Describe the solution you'd like
So far I can see how to implement the various serialization hooks (pre/post), but what would be the best way to implement the field serialization.

Generally, the code for each hooks needs to; based on the table/field name; load a module, call getattr() to find the right method to call, and then somehow emit the code in a way which can be used by the code builder. Possibly a default_encoder (Encoder)? Essentially, at some point, I need the list of fields, and a way to emit the necessary function calls to encode/decode data.

The pre/post hooks would take care of the "framing" of the Flatbuffer table (i.g. calling Start() and End() as well as creating a buffer at some point).

Describe alternatives you've considered
Currently I use getattr() calls each time a DataClass is serialized. So, I would like to generate the code only once, based on the DataClass, and thus get hopefully a significant performance boost.

Additional context
If its feasible, I don't mind to do implementation of the Mixin.

The text was updated successfully, but these errors were encountered:

BrutalSimplicity · 2022-05-30T03:58:59Z

I'm not the author so won't speak of what's possible, but upon review of the code for any of the json, msgpack, or yaml serializers it appears that all of the code building happens upon conversion to a dictionary. There is no code building being applied to serialize/deserialize objects for the formats supported.

I do think this could be done without a code building strategy, by leveraging a cache on the mixin you create that keeps a mapping of field -> method call. From there you could handle both serialization and deserialization by executing the mappings against the flatbuffer field -> method lookup table.

Not all that familiar with flatbuffers, but maybe something like...

[Edit]: Simplified to its essence.

from typing import Any, Mapping, Optional, Type, TypeVar

from mashumaro.mixins.dict import DataClassDictMixin
from mashumaro.serializer.json import DEFAULT_DICT_PARAMS
from typing_extensions import ClassVar, Protocol

T = TypeVar("T", bound="DataClassFlatBufferMixin")

def get_encoder(type: Type[T]):
    # use type and params to lookup module and methods
    field_encoders = {
        'field_name': lambda buffer, **kwargs: bytearray() # method call here
    }
    def encoder(buffer: bytearray, obj: Mapping[str, Any]):
        for key in obj.keys():
            field_encoders[key](buffer)
        return buffer

    return encoder
    
def get_decoder(type: Type[T]):
    # use type and params to lookup module and methods
    field_decoders = {
        'field_name': lambda buffer, **kwargs: 0 # method call here
    }

    def decoder(buffer: bytearray):
        return {key: field_decoder(buffer) for key, field_decoder in field_decoders.items()}

    return decoder

class Decoder(Protocol):
    def __call__(self, buffer: bytearray) -> Mapping[str, Any]: ...

class Encoder(Protocol):
    def __call__(self, buffer: bytearray, obj: Mapping[str, Any]) -> bytearray: ...

class DataClassFlatBufferMixin(DataClassDictMixin):
    __slots__ = ()
    __flatbuffer_encoder: ClassVar[Optional[Encoder]]
    __flatbuffer_decoder: ClassVar[Optional[Decoder]]

    # similar to a metaclass (but simpler)
    # allows setting class variables on any subclass of this type
    def __init_subclass__(cls: Type[T], **kwargs):
        super().__init_subclass__(**kwargs)
        cls.__flatbuffer_encoder = None
        cls.__flatbuffer_decoder = None

    def to_flatbuffer(self: T, buffer: bytearray):
        clazz = type(self)
        if not clazz.__flatbuffer_encoder:
            clazz.__flatbuffer_encoder = get_encoder(type(self))
        return clazz.__flatbuffer_encoder(
            buffer,
            self.to_dict(**dict(DEFAULT_DICT_PARAMS)),
        )

    @classmethod
    def from_flatbuffer(
        cls: Type[T],
        data: bytearray,
    ) -> T:
        if not cls.__flatbuffer_decoder:
            cls.__flatbuffer_decoder = get_decoder(cls)
        return cls.from_dict(
            cls.__flatbuffer_decoder(data),
            **dict(DEFAULT_DICT_PARAMS),
        )

timrulebosch · 2022-06-07T10:21:27Z

For reference, what I currently do is something like this:

Flatbuffer Schema:

namespace MyGame.Sample;

table Weapon {
  name:string;
  damage:short;
}

Using generated code (API generated by Flatbuffer compiler):

import flatbuffers
import MyGame.Sample.Weapon

builder = flatbuffers.Builder(1024)

weapon = builder.CreateString('Sword')
MyGame.Sample.Weapon.Start(builder)
MyGame.Sample.Weapon.AddName(builder, weapon)
MyGame.Sample.Weapon.AddDamage(builder, 3)
sword = MyGame.Sample.Weapon.End(builder)

builder.Finish(sword)
buf = builder.Output() // Of type `bytearray`.

And then I have a dataclass defined like this:

@dataclass
class Weapon(FlatbufferTable):
    name: str = None
    damage: int = None
    _fbs_table: type = field(default=MyGame.Sample.Weapon, init=False, repr=False, compare=False)

which is used by my encoder library, which operates based on the dataclass definition and "generates" code:

getattr(self._fbs_table, 'Start')(builder)
object_map['name'] = builder.CreateString('Sword')
getattr(self._fbs_table, 'AddName')(builder, object_map['name'])
getattr(self._fbs_table, 'AddDamage')(builder, 3)
getattr(self._fbs_table, 'End')(builder)

timrulebosch · 2022-06-07T10:37:23Z

@BrutalSimplicity thanks for that suggestion. Do you think your idea would work with the "string" case above? For that I would need to call a few functions:

Note that each string in this code is normally generated from the dataclass fields, its hardcoded here for brevity, so the actual code would have a few extra calls.

object_map['name'] = builder.CreateString("Sword")
getattr(self._fbs_table, 'AddName')(builder, object_map['name'])

I think those getattr calls are going to be expensive(?), however, perhaps its possible to emit them as a code object and use it inplace of the lambda as you suggested. Its seems like it would work.

Fatal1ty · 2022-06-07T10:40:56Z

Hi, guys!

I don't have experience with FlatBuffers, so it'll take me time to dive into this. But we can create subpackage mashumaro.mixins.third_party with the idea that anyone could create a mixin and put it there even if the code quality would have concerns. If someone responsible wants to create mashumaro.mixins.third_party.flatbuffers, I would accept such a pull request without any thoughts :)

Fatal1ty added the feature label Jun 7, 2022

Fatal1ty added enhancement New feature or request good first issue Good for newcomers and removed feature labels Mar 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementing a mixin for Flatbuffers #79

Implementing a mixin for Flatbuffers #79

timrulebosch commented May 12, 2022 •

edited

Loading

BrutalSimplicity commented May 30, 2022 •

edited

Loading

timrulebosch commented Jun 7, 2022

timrulebosch commented Jun 7, 2022

Fatal1ty commented Jun 7, 2022

Implementing a mixin for Flatbuffers #79

Implementing a mixin for Flatbuffers #79

Comments

timrulebosch commented May 12, 2022 • edited Loading

BrutalSimplicity commented May 30, 2022 • edited Loading

timrulebosch commented Jun 7, 2022

timrulebosch commented Jun 7, 2022

Fatal1ty commented Jun 7, 2022

timrulebosch commented May 12, 2022 •

edited

Loading

BrutalSimplicity commented May 30, 2022 •

edited

Loading